Job Title: Cloud SRE
Experience: 8+ Years
Key Responsibilities:
- Manage, monitor, and optimize cloud infrastructure, preferably on Azure and either Google Cloud Platform (GCP) or AWS, ensuring high availability and performance.
- Design, deploy, and maintain containerized applications using Kubernetes and related tooling.
- Implement infrastructure as code (IaC) using tools like Terraform and Ansible to automate environment provisioning, configuration, and scaling.
- Build and maintain CI/CD pipelines using Jenkins, Git, and GitOps principles to ensure smooth deployment and integration processes.
- Apply SRE best practices to improve system reliability, availability, and performance through monitoring, alerting, and automation.
- Work closely with development, operations, and QA teams to streamline processes and promote a culture of continuous improvement.
- Play a key role in diagnosing and resolving production issues.
- Maintain comprehensive documentation for systems, procedures, and processes.
Required Skills and Experience:
- Strong hands-on experience in Azure or GCP cloud environments.
- Proficiency in Kubernetes, Ansible, Terraform, and Git.
- Solid understanding of CI/CD pipelines and related tools such as Jenkins and GitOps.
- Familiarity with DevOps practices, including automation, continuous integration, and continuous deployment.
- Knowledge of software development and its intersection with infrastructure and operations.
- Experience with SRE principles, such as monitoring, alerting, reliability metrics, and incident management.
- Experience with scripting languages such as Python and shell scripting.
- Certification in Azure, GCP, or Kubernetes is a plus.
- Experience with monitoring and logging tools like Prometheus, Grafana, or the ELK stack.
- Excellent problem-solving skills and a proactive attitude.
- Strong communication and collaboration skills.