The Microsoft Azure Silver Storage Team designs, builds, and operates critical services that our clients rely on every day, transforming a pile of servers into a connected cloud. We are responsible for ensuring the reliability of these systems in the face of constant change, relying on cutting-edge research and a stellar engineering team. We have built multiple clouds thus far, and the best is yet to come as we continue expanding at a rapid pace. Our division is responsible for a wide range of technologies and challenges, including the physical compute fabric, networking, storage, and a myriad of "dial-tone" services that are crucial to the customer experience. We’re looking for a Service Engineer II - Cloud Support Engineer who is passionate about delivering value to customers in mission-critical environments, enjoys a growth hacking culture, and is eager to play one of the most important long games for Microsoft.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day, and we need you as a Service Engineer II - Cloud Support Engineer.
Responsibilities
As a Service Engineer II / Cloud Support Engineer, you will work with other engineers and teams within the Azure Silver Storage team to provide support for complex Azure Storage issues that affect the availability or latency that customers experience when using any of the core Storage services. The successful candidate will be trained to troubleshoot a wide range of errors involving authentication, authorization, availability, permissions, networking, data partitions, and node health across Azure Object Storage, File Storage, and Disk Storage.
You will investigate issues and identify their root causes using logs, dashboards, TSGs, and patterns from previous investigations. The successful candidate will be able to identify patterns of incidents and suggest or implement process or engineering changes that reduce or eliminate incidents and drive improvements back into the service.
Technical Knowledge and Expertise
- Demonstrates expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale. Contributes to service design by identifying and recommending optimal configurations of technology components with awareness of cost management. Adjusts configurations and defines infrastructures to improve the availability, reliability, efficiency, observability, and/or performance of supported products and services, with minimal guidance from other engineers. Actively participates in reviews with the engineering teams that develop and/or manage services and shares learnings and recommendations across engineering teams working on related services within their organization.
- Stays current in knowledge and expertise as technology landscape evolves. Contributes to the adoption of new solutions. Proactively seeks opportunities to learn and receive feedback.
Operational Excellence
- Implements reliable, scalable, and high-performance solutions across teams. Contributes to design documents. Owns implementation and rollback plans. Maintains quality checklist and related documentation with minimal guidance.
- Monitors and takes action on telemetry data and performs analyses to identify patterns that reveal errors and unexpected problems that are affecting the system availability, reliability, performance, and/or efficiency, with minimal guidance. Develops scripting and/or automation used in monitoring based on observations and experience.
- Responds to incidents during regular on-call rotations by identifying the level of impact, troubleshooting issues, and deploying appropriate fixes to resolve root cause(s). Alerts product teams and owners to major customer impacting issues and escalates resolution of complex and highly impactful issues affecting multiple components or features to other engineers or engineering teams as needed. Shares details related to incidents and their resolution through postmortem reports and during regular review meetings.
- Learns and adheres to prescriptive guidance for security, privacy, and compliance standards in alignment with direction from the business and technical experts. Works with security, privacy, and compliance teams to identify and address issues relevant to their services with minimal guidance.
Collaboration and Knowledge Sharing
- Collaborates within and across teams by proactively and systematically sharing information with an appropriate level of detail for their audience. Proactively manages dependencies for their work with others.
- Shares insights and best practices that can be applied to improve development and operations of the system, platform, or product components and features by participating in design reviews, incident drills and debriefs, and regular meetings, as well as interactions with more experienced Service Engineers and members of product engineering teams.
Specialty Responsibilities
- Leverages technical expertise, judgment, and decision making to coordinate multiple work streams and resources in crisis situations, driving the mitigation plan and resolving the crisis by engaging necessary teams and escalating to appropriate stakeholders. Applies diagnostic expertise. Provides guidance to other engineers working to mitigate and resolve issues. Communicates customer impact and other relevant information to key stakeholders, leadership, and customers. Develops projects and programs to improve crisis response by creating standard practices for consistent response across engineering teams. Fosters increased stability. Reduces noise by adjusting telemetry and alarming.
- Identifies security issues and recommends potential mitigation strategies to address underlying causes. Develops security guidance and models to address issues and to contribute to the definition of best practices. Suggests and drives appropriate guidance, models, response, and remediation for issues. Troubleshoots system issues and partners with engineering teams to conduct root cause analyses. Communicates and drives adherence to security policies and procedures.
Other
- Embody our Culture and Values
Qualifications
Required / Minimum Qualifications:
- Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, or systems engineering
- OR equivalent experience.
Other Requirements
Security Clearance Verification: Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
- Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local United States government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport.
Additional / Preferred Qualifications
- Experience with PowerShell, Python, or similar scripting/coding languages
- Experience with data visualization tools such as Power BI or similar
- Proficiency in a structured query language such as SQL, KQL, or similar
- 1+ year(s) technical experience working with large-scale cloud or distributed systems.
- Management Information Systems (MIS), or other industry- or product-specific Engineering Certifications.
- Enjoys problem solving and troubleshooting. Able to triage and prioritize customer requests and issues.
- Able to work across and collaborate with different functional areas to troubleshoot an issue.
Service Engineering IC3 - The typical base pay range for this role across the U.S. is USD $94,300 - $182,600 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area, where the base pay range for this role is USD $120,900 - $198,600 per year. Certain roles may be eligible for benefits and other compensation.
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.