Sr. Site Reliability Engineer

Avalara

Exp: 4-7 Years

Full time

Delhi, Hyderabad / Secunderabad, Telangana, Bengaluru / Bangalore, Chennai, Kolkata, Mumbai, Pune, India

Job Description

As a member of our Reliability Engineering Product SRE team, you will be responsible for building products and services that align with reliability standards and principles, meet the highest level of MVRs (Minimum Viable Requirements) and SMMs (Software Maturity Metrics), and ensure customer satisfaction through your expertise in SRE domain skills.
We are seeking an individual who is interested in automation and efficiency.
You will use a bundled tech stack to show how the customer, product, and infrastructure interact and behave.
You will have a keen eye for customer satisfaction based on numbers (SLOs, SLIs, SLAs) and will be expected to know the golden metrics that drive it.
You will approach MVRs programmatically using programming languages, and you should know scripting languages.

Job Duties

* Build and automate products and infrastructure that meet the highest level of MVRs/SMMs and align with reliability standards and principles.
* Act as a key stakeholder and embedded SRE from the RELE team, collaborating and partnering with engineering teams from product inception to production.
* Apply experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring.
* Bring solid coding skills in Go, Python, or Perl, experience with GitLab, and a history of automating workloads.
* Conduct workshops in SRE champion meetings to evangelize MVRs/SMMs to the engineering organization.
* Set up and operate observability tools across multiple cloud providers.
* Create reusable observability components to help teams onboard to observability tools.
* Assist development teams in defining SLO/SLI dashboards and alerts.
* Bring deep expertise in the mindset, processes, and tools needed to deliver five-nines (99.999%) SLAs.
* Manage and administer observability tools such as Grafana, Prometheus, and Loki across multiple cloud providers.
* Onboard feature development teams to RELE platforms and standards.
* Participate in the on-call rotation, troubleshooting and supporting production environments.

What You'll Need to Be Successful

* Minimum of 7 years of experience in a SaaS environment.
* A four-year bachelor's engineering degree in Computer Science.
* Ability to participate in an on-call rotation.
* Networking: A good understanding of the OSI model, TCP/IP, and DNS, particularly as they relate to cloud environments.
* Linux Fundamentals: Solid experience with the administration, security hardening, and performance tuning of one or more distributions of Linux.
* Troubleshooting: A passion for tracking down the root causes of technical issues in distributed systems and software.
* Observability: Experience with developing service level indicators and objectives, instrumenting software, and building alerts.
* Software Engineering: An understanding of software engineering fundamentals and experience developing software as part of a team of engineers.
* Automation: A strong desire to automate all of the things and eliminate toil.
* Containers: A solid understanding of the underpinnings of container technology such as cgroups and namespaces.
* Container Orchestration Systems: Experience with the operations, administration, and development of orchestration systems such as Kubernetes, ECS, Mesos, and Nomad.
* Infrastructure-as-Code: Experience deploying and maintaining infrastructure as code with tools such as Terraform and Pulumi.
* Technical Writing: Most of the services we develop are greenfield, so you will need to create documentation and diagrams for other engineering teams.
* Customer Satisfaction: Keen eye for customer satisfaction (our customers are other engineering teams and Avalara customers).
* Passion for Learning: Interest in the broader technology space with a constant desire to expand your understanding.
* Adaptability: Experience working on a variety of projects. In short, we want people with T-shaped skills.
* Tools & Technologies we are looking for as part of the skill set: Terraform, Grafana, Prometheus, Loki, Alertmanager, Pushgateway, Prometheus exporters & client libraries, PromQL, LogQL, Fluentd, Fluent Bit, Sumo Logic, Splunk, Tempo, Jaeger, OpenTelemetry, Cortex, etc.
* Other common tools & technologies expected: AWS, GCP, Oracle Cloud, Terraform, GitLab, Artifactory, Atlassian suite, Git, Kubernetes, Go, C#, Python, Bash, PowerShell, Docker, Windows, Linux, etc.