Site Reliability Engineer

Company: ACL Digital
Location: Atlanta
Closing Date: 03/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Title: Site Reliability Engineer
Atlanta, GA
Duration: 12 months
Site Reliability Engineer (SRE) with AWS Cloud and Application Monitoring Experience
We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and robust application monitoring capabilities.
As an integral part of our team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems and applications.
Responsibilities:
- Implement and improve monitoring, alerting, and logging solutions to detect and respond to incidents.
- Collaborate closely with the development team to deploy applications and services and ensure they meet reliability and performance standards.
- Automate deployment, configuration management, and troubleshooting processes to streamline operations.
- Participate in the on-call rotation, triage production incidents, lead root cause analyses (RCAs), and implement preventive actions.
- Conduct capacity planning and performance analysis to handle growing user traffic and data volume effectively.
- Establish and enforce best practices for security, monitoring, and disaster recovery.
- Continuously evaluate and implement new technologies to optimize infrastructure efficiency and reliability.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.
- Proven experience as a Site Reliability Engineer or similar role, with a strong focus on AWS cloud infrastructure.
- Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation).
- Hands-on experience with monitoring tools such as CloudWatch, SumoLogic, Dynatrace, Grafana, or similar for application performance monitoring and alerting.
- Proficiency in scripting and automation (e.g., Python, Bash) to build and maintain deployment pipelines and infrastructure.
- Strong analytical and troubleshooting skills to diagnose and resolve complex infrastructure, application, and data issues.
- Experience with containerization (Docker, Kubernetes) and serverless architecture (AWS Lambda).
- Familiarity with CI/CD pipelines and version control systems (Git) for continuous integration and deployment.
- Excellent communication skills and ability to collaborate effectively with cross-functional teams.
AWS certification is a plus.