Full-Stack MLOps Systems Engineering Lead

Company: Accenture
Location: Oklahoma City
Closing Date: 07/11/2024
Salary: £100 - £125 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

This is a Senior Manager/Associate Director-level position:

We Are:

Nextira, now part of Accenture, builds cloud-based solutions and services with cutting-edge engineering skills, artificial intelligence (AI), machine learning (ML), and data analytics that enable clients to design, build, launch, and optimize high-performance computing environments. Nextira joined the Accenture AWS Business Group and our AWS North America delivery practice in June 2023. The Accenture AWS Business Group is part of Accenture Cloud First, Accenture’s industry-leading business group focused on redefining how organizations operate and create value by using cloud, data, and artificial intelligence for total enterprise reinvention.

You Are:

An experienced, highly motivated MLOps Engineering Lead looking to join our team in supporting large-scale GPU-based AI training and research clusters hosted on cloud providers such as AWS, Azure, and GCP. The ideal candidate has expert knowledge of Linux at the kernel level and can configure and troubleshoot NVIDIA drivers and utilities, particularly on virtual machines running in the cloud. The candidate will use this expertise to guide experienced ML engineers as they train large AI models in the cloud.

The Work:

You (Full-Stack MLOps Engineering Lead) will lead the design, development, and operational management of cloud-native computing clusters for ML training and inference. You will also lead the development and delivery of the tooling ML engineers need to optimize performance and troubleshoot issues with their training workloads.

You will manage high-performance computing (HPC) clusters, including schedulers such as Slurm and compute nodes accelerated with NVIDIA GPUs. You’ll help experienced ML engineers configure and manage their Conda environments, optimizing them for their specific AI training and research needs. As an MLOps Engineering Lead, you will design, deploy, and maintain cloud infrastructure with infrastructure-as-code (IaC) tools such as Terraform and AWS CDK.

Our Senior MLOps Engineers communicate clearly and effectively with highly technical users, providing support and guidance on a wide range of cluster-related topics. They use their strong Linux skills to troubleshoot and resolve issues, optimize system performance, and ensure a stable, reliable environment for AI training and research.

Travel may be required for this role. The amount of travel will vary from 0 to 100% depending on business need and client requirements.


Here’s What You Need (Basic Qualifications):

  • Bachelor's degree or equivalent work experience (minimum 12 years). If you hold an Associate’s degree, a minimum of 6 years of work experience is required.

  • Minimum of 6 years of professional experience working in a software engineering or DevOps role

  • Minimum of 4 years of experience in Linux Systems Administration, including kernel tuning, networking, and storage

  • Minimum of 3 years of experience with at least three of the following: Python, Docker/Kubernetes, C++, GPU stack (e.g. CUDA, SMI, ROCm)

  • Minimum of 4 years of experience in a platform engineering or developer role in a cloud environment

  • Minimum of 12 months of experience with IaC tools, e.g. Terraform

  • Excellent problem-solving and analytical skills

Bonus points if you have (Preferred Qualifications):

  • Experience in MLOps, Artificial Intelligence (AI), Large Language Models (LLMs), or High Performance Computing (HPC)

  • Experience with full-stack application development, particularly using cloud provider APIs

  • Experience with parallel file systems, e.g. Lustre, GPFS, Weka

  • Experience managing virtual Python environments, e.g. conda, pyenv.

  • Experience working in a consulting environment, engaging with client stakeholders at a senior level.

  • Experience leading a team of cloud platform engineers.

  • Strong written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical audiences.

Compensation at Accenture varies depending on a wide array of factors, which may include but are not limited to the specific office location, role, skill set, and level of experience.

