Posting title: Site Reliability Engineer (SRE)
Experience: 7+ Years
Location: Bangalore
Work mode: Hybrid
Primary skills: Terraform, Ansible, AWS, Kubernetes, OpenShift, CI/CD
Qualification: Any degree in engineering or computing
Responsibilities:
- Design, build, and maintain highly available, scalable, and performant
infrastructure for our applications.
- Implement and manage Infrastructure as Code (IaC) using Terraform and
Ansible.
- Automate deployments and configuration management for AWS, on-premises
environments, and container orchestration platforms (Kubernetes, EKS, OpenShift).
- Integrate infrastructure provisioning and configuration changes into CI/CD
pipelines so that they run automatically.
- Implement and maintain monitoring solutions (e.g., Datadog or Prometheus)
to proactively identify and troubleshoot infrastructure issues.
- Collaborate with development teams to ensure infrastructure aligns with
application requirements and best practices.
- Participate in incident response procedures for infrastructure and application
issues.
- Continuously improve infrastructure performance, scalability, and reliability.
- Stay up to date with the latest trends and technologies in cloud computing,
containers, and monitoring.
Key Attributes and Qualifications:
- Proven experience as a Site Reliability Engineer (SRE) or in a related role.
- Strong understanding of cloud computing concepts (AWS preferred).
- Experience with on-premises deployments (optional, but a plus).
- Expert knowledge of Infrastructure as Code (IaC) tools like Terraform and Ansible.
- Familiarity with container orchestration platforms (Kubernetes, EKS, OpenShift).
- Experience with CI/CD pipelines and tools.
- Proficiency with configuration management tools.
- Experience with monitoring tools like Datadog or Prometheus.
- Strong scripting skills (Bash, Python preferred).
- Excellent problem-solving and analytical skills.
- Ability to work independently and as part of a team.
- Strong communication and collaboration skills.