Site Reliability Engineer

Company: Altimetrik
Location: Mountain View
Closing Date: 03/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description

Design, implement, and maintain complex data systems supporting millions of customers, applying cloud-native principles and best practices to ensure highly available, secure, performant, and scalable database systems

Build and maintain CI/CD pipelines in Jenkins

Build and deploy services to Kubernetes clusters using Helm, Kustomize, etc.

Contribute to infrastructure changes in AWS, with a deep understanding of AWS services

Participate in on-call for pre-production and production systems supporting millions of users

Write and review RCA docs to prevent recurrence of incidents, and share the learnings

Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes

Create operational playbooks, contribute to how-to articles, and gain domain knowledge to drive changes in the team

Participate in and contribute to FMEA/chaos testing, security remediations, etc.

Share best practices and patterns for operational excellence and cost optimization

Reduce or eliminate manual steps by automating as much as possible

Continuously look for opportunities to increase developer velocity and productivity

Qualifications:

Bachelor's or master's degree in computer science or a related technical field. Equivalent experience will be considered.

4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS

Extensive experience with performance monitoring, troubleshooting, and tuning

Experience with AWS services and hands-on knowledge of hosting in the cloud

Experience with scripting languages for DevOps automation

Experience with at least one of the following programming languages: Java, Python, or Ruby

Knowledge of Docker, Kubernetes, and ArgoCD

Experience with monitoring and observability using Splunk, Wavefront, AppDynamics, Prometheus, tracing tools, etc.
