We are looking for Site Reliability Engineers who can manage, maintain, and troubleshoot Alkira's world-class cloud networking solution around the clock. In this role, you will work in a product company where you get to sharpen your existing skills and gain exposure to a wide range of technologies and constructs, including microservices, DevOps methodologies, Kubernetes, Terraform, data networking, and security.
Responsibilities:
- You will be responsible for the availability and integrity of the infrastructure that underpins Alkira's Cloud Networking platform
- Hold the production systems together and troubleshoot issues that arise in production deployments
- Provide 24x7 coverage as part of a scheduled shift and on-call rotation
- Work with multiple tools, such as Prometheus, Grafana, and Jira, to monitor, manage, triage, and document infrastructure issues in real time
- Automate infrastructure deployment using CI/CD
- Build the tools needed to evolve how we maintain and monitor our solution
- Develop and execute system and integration test plans
Requirements:
- At least 2 years' experience managing production systems
- Self-starter with a solution-oriented mindset. You see potential challenges as opportunities to learn and grow
- Experience with cloud providers such as AWS, Azure, or GCP
- Experience with computer networking and network technologies
- Experience with CI/CD tools such as Concourse CI or Jenkins
- Experience with Kubernetes
- Excellent problem-solving skills and the ability to quickly grasp new concepts
- HashiCorp Certified: Terraform Associate certification is highly desirable