System Reliability Engineer (Application Support)

Company:  Fulcrum Digital Inc
Location: Kansas City
Closing Date: 29/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Who are we
Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation to implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.
The Role
  • Provide L2 support for production systems, including application, database, and middleware components as well as infrastructure and network components
  • Manage production incidents end-to-end within defined SLAs, with a focus on resolution rather than on who caused them.
  • Interact with various stakeholders such as release managers, program leads, service managers, and development and test leads
  • Review operational readiness requirements such as monitoring and alerting, log rotation, and component resilience, and report any gaps
  • Provide pre-implementation support with activities such as release notes review and implementation dry runs.
  • Protect production components by running health checks, monitoring latency and memory utilization.
  • Automate day-to-day activities and propose changes that improve reliability
  • Participate in CAB and provide feedback on change requests
  • Support the DevOps team in testing the promotion pipelines and suggest automation of configuration items.
  • Follow incident management best practices and perform root cause analysis (RCA).
  • Participate in disaster recovery tests and operational acceptance tests
  • Analyze the technology stack that makes up the product and optimize the recovery time objective (RTO).
  • Work with team members spread across time zones
  • Share knowledge, document improvements, and mentor junior team members
  • Support code deployments into multiple lower environments, maintaining current processes as needed with an emphasis on automating as much as possible, as early as possible.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
Requirements
  • Deployments to MTF/Prod, maintenance items (including stop/start and disaster recovery-related activities), and change requests (CRs) for changes in MTF/Prod
  • Tools:
      • Log monitoring tool: Splunk or similar
      • Application monitoring tool: Dynatrace or similar
      • Incident/problem management ticketing tool: Remedy
  • Skills:
      • Linux and shell scripting
      • ITIL / ITSM
      • PL/SQL
      • Troubleshooting
      • Jenkins: CI/CD (basic)