Staff Site Reliability Engineer - 45461BR

Company: Synopsys Inc

Location: Mountain View

Closing Date: 06/11/2024

Salary: $117,000 - $204,000 Per Annum

Hours: Full Time

Type: Permanent

Job Requirements / Description

We are seeking a talented and experienced professional to join our team as a Staff Site Reliability Engineer (SRE). The successful candidate will be responsible for designing, implementing, and maintaining the observability platform that monitors the health of our production systems, both on-prem and in the cloud. The candidate should have an exceptional background in software development, system administration, and monitoring tools, as well as a passion for building scalable and reliable systems.

Key Responsibilities

Design and implement the SRE & Observability platform to monitor the health of our production systems and provide a holistic view of the environment.
Partner with other teams to ensure that data and monitoring tools are effectively integrated with other systems and processes.
Ensure that the SRE & Observability platform is scalable, reliable, and can handle large volumes of data.
Design and implement SRE best practices for the team and identify KPIs for various systems, organizations, and stakeholders.
Partner with multiple teams to identify the data points needed to define SLAs, SLIs, SLOs, error budgets, and KPIs.
Automate the deployment and configuration of monitoring tools to reduce human error and increase efficiency.
Develop custom scripts and tools to extend the functionality of the monitoring platform, including, but not limited to, proactive remediation and self-healing.
Perform root cause analysis on incidents, prepare detailed reports to present to the stakeholders, and develop solutions to prevent similar incidents from occurring in the future.
Provide guidance and mentorship to junior members of the team.
Drive the design and implementation of major SRE initiatives.
Act as a subject matter expert (SME) on SRE & Observability, providing guidance to other teams across the organization.
Continuously evaluate and implement new tools and technologies to improve the SRE platform.

Qualifications

A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
Deep knowledge of Linux, networking, and NFS technologies.
Experience with data stores and search engines such as Elasticsearch is a must. Experience with Prometheus, Grafana, and similar technologies is a plus.
Solid Python programming skills and experience.
Expertise in cloud computing platforms such as Azure, AWS, or GCP.
Experience with EDA workloads and environments, including technologies such as Grid Engine, LSF, Slurm, Linux, networking, and NFS, is required.
Familiarity with containerization technologies such as Docker, Swarm, and Kubernetes.
Excellent problem-solving skills and attention to detail.
Ability to work collaboratively with other teams and stakeholders.
Proven communication skills, both verbal and written.
Ability to work in a fast-paced and dynamic environment.
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
10+ years of experience in software development, system administration, and monitoring tools.

The base salary range across the U.S. for this role is between $117,000 and $204,000. In addition, this role may be eligible for an annual bonus, equity, and other discretionary bonuses. Synopsys offers comprehensive health, wellness, and financial benefits as part of a competitive total rewards package. The actual compensation offered will be based on a number of job-related factors, including location, skills, experience, and education. Your recruiter can share more specific details on the total rewards package upon request.
