Staff Site Reliability Engineer - 45461BR

Company:  Synopsys Inc
Location: Mountain View
Closing Date: 06/11/2024
Salary: $117,000 - $204,000 Per Annum (U.S. base range)
Hours: Full Time
Type: Permanent
Job Requirements / Description

We are seeking a talented and experienced professional to join our team as a Staff Site Reliability Engineer (SRE). The successful candidate will be responsible for designing, implementing, and maintaining the observability platform that monitors the health of our production systems, both on-premises and in the cloud. The candidate should have an exceptional background in software development, system administration, and monitoring tools, as well as a passion for building scalable and reliable systems.

Key Responsibilities

  1. Design and implement the SRE & Observability platform to monitor the health of our production systems, providing a holistic view of the environment.
  2. Partner with other teams to ensure that data and monitoring tools are effectively integrated with other systems and processes.
  3. Ensure that the SRE & Observability platform is scalable, reliable, and can handle large volumes of data.
  4. Design and implement SRE best practices for the team and identify KPIs for various systems, organizations, and stakeholders.
  5. Partner with multiple teams to identify the data points needed to define SLAs, SLIs, SLOs, error budgets, and KPIs.
  6. Automate the deployment and configuration of monitoring tools to reduce human error and increase efficiency.
  7. Develop custom scripts and tools to extend the functionality of the monitoring platform, including, but not limited to, proactive remediation and self-healing.
  8. Perform root cause analysis on incidents, prepare detailed reports to present to the stakeholders, and develop solutions to prevent similar incidents from occurring in the future.
  9. Provide guidance and mentorship to junior members of the team.
  10. Drive the design and implementation of major SRE initiatives.
  11. Act as a subject-matter expert (SME) on SRE & Observability, providing guidance to other teams across the organization.
  12. Continuously evaluate and implement new tools and technologies to improve the SRE platform.

Qualifications

  1. A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
  2. Deep knowledge of the Linux OS, networking, and NFS.
  3. Experience with data stores and search engines such as Elasticsearch is a must. Experience with Prometheus, Grafana, and similar technologies is a plus.
  4. Solid Python programming skills and experience.
  5. Expertise in cloud computing platforms such as Azure, AWS, or GCP.
  6. Experience with EDA workloads and environments, including technologies such as Grid Engine, LSF, SLURM, Linux, networking, and NFS, is required.
  7. Familiarity with containerization technologies such as Docker, Swarm and Kubernetes.
  8. Excellent problem-solving skills and attention to detail.
  9. Ability to work collaboratively with other teams and stakeholders.
  10. Proven communication skills, both verbal and written.
  11. Ability to work in a fast-paced and dynamic environment.
  12. Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  13. 10+ years of experience in software development, system administration and monitoring tools.

The base salary range across the U.S. for this role is between $117,000-$204,000. In addition, this role may be eligible for an annual bonus, equity, and other discretionary bonuses. Synopsys offers comprehensive health, wellness, and financial benefits as part of a competitive total rewards package. The actual compensation offered will be based on a number of job-related factors, including location, skills, experience, and education. Your recruiter can share more specific details on the total rewards package upon request.
