Senior Site Reliability Engineer

Company name confidential

3-6 Years

INR 9.1 - 15.2 LPA

ITES/BPO/Call Center

Job Description

HIRING FOR TATA CONSULTANCY SERVICES (TCS)

Responsibilities:

You will create infrastructure as code (IaC) and automate manual processes using tools such as Bash.
You will automate the deployment of applications and services to staging and production environments. This includes building CI/CD pipelines, containerizing and orchestrating workloads, managing configuration, and so on.
You will build auto-scaling systems that scale up or down based on user demand.
You will build observability into systems, making it easier to find and resolve issues before they blow up in production.
You will implement ways to improve system performance and optimize cloud costs.
You will meticulously write RCAs (root cause analyses), runbooks, and checklists, and follow them diligently.
You will own the reliability of systems running in production.
Auto-scaling eKYC machine learning workloads to handle 2 million API requests per day.
Migrating 1.3 TB of primary data from self-hosted MySQL to GCP Cloud SQL.
Building a control plane for a multi-cluster Kubernetes setup.
Implementing GitOps for continuous deployment of microservices.
Migrating background jobs from VMs to Kubernetes using KEDA.

Requirements:

Understanding of basic Bash scripting and computer networking (SSH, TCP, HTTP).
Experience using a programming language (we primarily use Go) to build a basic REST API.
Experience using Git for version control.
Experience deploying a three-tier web app on any cloud provider (AWS, GCP, etc.).
A high-level understanding of system components (databases, caches, reverse proxies, CDNs) and how and where they fit into the big picture.
Experience creating CI/CD pipelines to build and deploy at least a simple REST API application to dev and prod environments.
Ability to take code from local development to production by applying Continuous Integration and Continuous Delivery principles.
Exposure to building, scaling, and deploying software following the 12-factor app methodology (https://12factor.net/principles).
Experience working with microservices and container orchestration tools such as Kubernetes or Nomad.
Experience using observability tools and setting up monitoring and alerting for microservices with Prometheus, Grafana, Loki, the ELK stack, Datadog, and the like.
Experience implementing everything as code, from infrastructure to policies, security, and configuration, using relevant tools such as Terraform, OPA, and Ansible.
Experience building cloud-agnostic, homogeneous deployment solutions.