Senior Systems Reliability Engineer

Company: DRW
Location: Chicago
Closing Date: 03/11/2024
Salary: £150 - £200 Per Annum
Hours: Full Time
Type: Permanent
Job Requirements / Description

DRW is a diversified trading firm with over three decades of experience bringing sophisticated technology and exceptional people together to operate in markets around the world. We value autonomy and the ability to quickly pivot to capture opportunities, so we operate using our own capital and trade at our own risk.

Headquartered in Chicago with offices throughout the U.S., Canada, Europe, and Asia, we trade a variety of asset classes including Fixed Income, ETFs, Equities, FX, Commodities and Energy across all major global markets. We have also leveraged our expertise and technology to expand into three non-traditional strategies: real estate, venture capital and cryptoassets.

We operate with respect, curiosity and open minds. The people who thrive here share our belief that it’s not just what we do that matters; it’s how we do it. DRW is a place of high expectations, integrity, innovation and a willingness to challenge consensus.

We are seeking a Systems Reliability Engineer to join our Fixed Income Commodities and Currency Options (FICCO) and Cumberland team in either Chicago or London. In this role, you will be responsible for designing and supporting highly available systems within a technologically diverse stack used for global research and trading of FICCO and Cryptoassets. Leveraging tools such as AWS, Docker, Kubernetes, CI/CD, Python, Prometheus and Grafana, you will develop a repeatable and supportable tech stack to meet the demanding needs of our business.

Core Responsibilities:

  1. Collaborate with our FICCO and Cumberland technology and trading teams regarding their CI/CD processes.
  2. Collaborate with development teams to troubleshoot software build issues and optimize packaging processes.
  3. Automate deployment processes to improve efficiency and reduce manual intervention.
  4. Implement and manage infrastructure as code tools such as Terraform and Ansible.
  5. Maintain, design, and troubleshoot our observability stack.
  6. Drive initiatives to modernize environments by developing and optimizing processes using appropriate cloud and container tools, such as AWS and Kubernetes.
  7. Consistently challenge the norm and advocate for change.

Skills and Qualifications:

  1. Proven experience as a DevOps Engineer, Site Reliability Engineer, or similar software engineering role.
  2. Strong expertise with observability tools such as Prometheus, Alertmanager, Grafana, Sentry, and OpenTelemetry.
  3. Proficiency with Python, Java, and C++ software builds and packaging.
  4. Hands-on experience with CI/CD tools like TeamCity, Concourse, Argo Workflows, and/or GitHub Actions.
  5. Solid understanding of Infrastructure as Code (IaC) tools such as Terraform, Terragrunt, and Ansible.
  6. Skills in Python for troubleshooting and maintaining environment dependencies.
  7. Proficient with Docker for image creation, networking, and execution.
  8. Experienced with Kubernetes, including deployment and management of applications.
  9. Knowledge of ArgoCD, Helm, and Kustomize for Kubernetes application management.
  10. Fundamental understanding of Git and familiarity with Git repository tools such as GitHub and GitLab.
  11. Linux experience with Debian- and Red Hat-based systems.
  12. Excellent organizational skills, with the ability to effectively plan and prioritize tasks.
  13. Strong collaborative team spirit and communication skills.

Preferred Qualifications:

  1. Bachelor’s degree in Computer Science, Engineering, or a relevant field.
  2. Experience using Conda, including environment management and conda-build for creating conda packages.
  3. Experience deploying and maintaining CI/CD pipelines in a large-scale production environment.
  4. Hands-on experience with cloud platforms and services, such as AWS, GCP, or Azure.
  5. Experience supporting the infrastructure and systems that facilitate electronic trading functions or other high-performance computing environments.
  6. Experience in consolidating diverse and redundant approaches to common problems.