Site Reliability Engineer

Company: TekWissen
Location: Tampa
Closing Date: 19/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Overview:
TekWissen Group is a workforce management provider throughout the USA and many other countries in the world. Our client is an American multinational information technology services and consulting company and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.
Title: Site Reliability Engineer
Work Location: Tampa, FL 33607
Job Type: Contract
Work Type: Onsite
Duration: 3 Months
Job Description:
Top Qualifications:
Advanced Kubernetes -
  • Must have strong skills in Kubernetes at scale using one of GKE, AKS, EKS, or RKE. Experience with kubectl and Helm.
Containers -
  • Experience deploying Java (Spring Boot) microservices in dockerized environments.
Observability -
  • Experience setting up tools like Prometheus/Grafana, Datadog, AppDynamics, and Splunk to give actionable intel on a microservice environment, including but not limited to synthetics, application performance monitoring, logging, and alerting (PagerDuty/OpsGenie integrations).
Good CI/CD expertise -
  • Jenkins, Azure DevOps, GitHub Actions, ArgoCD, Artifactory, Azure Container Registry, Google Container Registry, and other similar tooling.
SCM -
  • Working with tools like GitHub/GitLab for source code management, as well as experience with branching strategies like GitFlow and trunk-based development.
Job Summary:
  • We are looking for a seasoned Site Reliability Engineer to augment our team and support its strategy of driving products and technology into everything it delivers to accelerate business growth.
  • As an SRE, you'll work as part of a team of problem solvers, helping to solve complex business issues from strategy to execution.
  • The team covers a variety of responsibilities that are executed by DevSecOps, Site Reliability and ML Ops Engineers, including:
  • Defining standard reliability and resilience for infrastructure and application components.
  • Proactive optimization of redundancies, monitoring and alerting practices and patterns
  • Developing resilient and highly available distributed systems.
  • Infrastructure as Code development for building cloud tools.
  • Secrets and configuration management
  • Monitoring systems and services, providing incident and emergency response to triage and resolve system or client issues
  • Management of the application ecosystem improving platform infrastructure and applications with high reliability, resiliency, performance, and quality
  • Supporting documentation, knowledge articles, and runbooks
  • Designing, building, and implementing SRE patterns that adhere to our client's security

TekWissen® Group is an equal opportunity employer supporting workforce diversity.