SRE

Company:  Datum Technologies Group
Location: Kansas City
Closing Date: 01/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Site Reliability Engineer (SRE)
Function: Technology Engineering and Service Operations
Family: Automation Engineering
We are seeking a dynamic Site Reliability Engineer skilled in provisioning and managing technology infrastructure that supports enterprise-level services. The ideal candidate has a proven track record of designing, building, and maintaining efficient technology platforms that meet the needs of internal and external customers while effectively managing the associated risks.
Key Responsibilities
  • Collaborate in a DevSecOps environment to build and operate large-scale, distributed, fault-tolerant systems.
  • Partner with development and operations teams to design highly available, cost-effective systems with superior uptime metrics.
  • Resolve trouble tickets with the cloud operations team; develop scripts for troubleshooting and incident management.
  • Create tools and scripts for automated incident remediation, and implement comprehensive monitoring and alerting systems.
  • Develop Infrastructure as Code (IaC) patterns compliant with security and engineering standards using Terraform and cloud SDKs.
  • Serve as a first responder in a 24/7 operations model for incident and problem management.
Key Skills
  • DevSecOps: Apply best practices to enhance product resilience, ensuring well-engineered solutions through coding, testing, and documentation.
  • Systems Thinking: Leverage knowledge of system integrations to enhance operational performance and achieve availability goals.
  • Operational Excellence: Monitor and measure systems against key metrics, identifying process improvements for efficiency.
  • Troubleshooting: Employ methodical approaches for problem resolution, analyzing patterns to implement preventative measures.
  • Technical Communication: Effectively convey technical concepts to stakeholders, demonstrating strong verbal and written communication skills.

Experience
  • Education:
    BS in Computer Science or a related technical field (e.g., Physics, Mathematics), or equivalent experience.
  • Professional Experience:
    • 4-7 years in software engineering, systems administration, database management, and networking.
    • 2+ years developing and administering software in public cloud environments.
    • Experience monitoring infrastructure and application uptime to meet functional and performance objectives.
    • Proficiency in languages such as Python, Bash, Java, Go, and JavaScript (including Node.js).
    • Cross-functional knowledge of systems, storage, networking, security, and databases.
    • System administration skills in Linux and Windows, including automation and orchestration with Terraform, Chef, Ansible, Docker, and Kubernetes.
    • Proficiency in CI/CD tooling and practices.
  • Certifications:
    Cloud Certification (Strongly Preferred)

"All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran."