Company: Datum Technologies Group
Location: Kansas City
Closing Date: 01/11/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Site Reliability Engineer (SRE)
Function: Technology Engineering and Service Operations
Family: Automation Engineering
Dynamic Site Reliability Engineer skilled in provisioning and managing technology infrastructure to support enterprise-level services. Proven track record in designing, building, and maintaining efficient technology platforms that meet both internal and external customer needs while effectively managing associated risks.
Key Responsibilities
- Collaborate in a DevSecOps environment to build and operate large-scale, distributed, fault-tolerant systems.
- Partner with development and operations teams to design highly available, cost-effective systems with superior uptime metrics.
- Resolve trouble tickets with the cloud operations team; develop scripts for troubleshooting and incident management.
- Create tools and scripts for auto-remediation of incidents, implementing comprehensive monitoring and alerting systems.
- Develop Infrastructure as Code (IaC) patterns compliant with security and engineering standards using Terraform and cloud SDKs.
- Serve as a first responder in a 24/7 operations model for incident and problem management.
- DevSecOps: Apply best practices to enhance product resilience, ensuring well-engineered solutions through coding, testing, and documentation.
- Systems Thinking: Leverage knowledge of system integrations to enhance operational performance and achieve availability goals.
- Operational Excellence: Monitor and measure systems against key metrics, identifying process improvements for efficiency.
- Troubleshooting: Employ methodical approaches for problem resolution, analyzing patterns to implement preventative measures.
- Technical Communication: Effectively convey technical concepts to stakeholders, demonstrating strong verbal and written communication skills.
Experience
- Education:
BS in Computer Science or related technical field (e.g., Physics, Mathematics), or equivalent experience.
- Professional Experience:
- 4-7 years in software engineering, systems administration, database management, and networking.
- 2+ years developing and administering software in public cloud environments.
- Experience monitoring infrastructure and application uptime to meet functional and performance objectives.
- Proficient in languages such as Python, Bash, Java, Go, JavaScript, and Node.js.
- Cross-functional knowledge in systems, storage, networking, security, and databases.
- System administration skills with automation/orchestration in Linux/Windows using Terraform, Chef, Ansible, Docker, and Kubernetes.
- Proficient in CI/CD tooling and practices.
- Certifications:
Cloud Certification (Strongly Preferred)
"All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran."