- As a member of our Reliability Engineering Product SRE team, you will build products and services that align with reliability standards and principles, meet the highest level of MVRs (Minimum Viable Requirements) and SMMs (Software Maturity Metrics), and ensure customer satisfaction through your expertise in the SRE domain
- We are seeking an individual who is interested in automation and efficiency
- You will use a bundled tech stack to show how the customer, product, and infrastructure interact and behave
- You will have a keen eye for customer satisfaction based on numbers (SLO, SLI, SLA) and will be expected to know the golden metrics that drive it
- You will approach MVRs programmatically, using both programming and scripting languages
Job Duties
* Build and automate products and infrastructure with the highest level of MVRs/SMMs, aligned to reliability standards and principles.
* Be a key stakeholder and embedded SRE from the RELE team, collaborating and partnering with engineering teams from product inception to production.
* Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring.
* Solid coding skills in Go, Python, or Perl, experience with GitLab, and a history of automating workloads.
* Conduct workshops in SRE champion meetings to evangelize MVRs/SMMs to the engineering org.
* Set up and operate observability tools across multiple cloud providers.
* Create reusable observability components to assist with onboarding to observability tools.
* Assist development teams in defining SLO/SLI dashboards and alerts.
* Deep expertise in the mentality, processes, and tools needed to deliver five-nines SLAs.
* Manage and administer observability tools such as Grafana, Prometheus, and Loki across multiple cloud providers.
* Onboard feature development teams to RELE platforms and standards.
* Participate in the on-call rotation; troubleshoot and support the production environments.
What You'll Need to be Successful
* Minimum of 7 years of experience in a SaaS environment.
* A 4-year bachelor's engineering degree in Computer Science.
* Ability to participate in an on-call rotation.
* Networking: A good understanding of the OSI model, TCP/IP, and DNS; particularly as it relates to cloud environments.
* Linux Fundamentals: Solid experience with the administration, security hardening, and performance tuning of one or more distributions of Linux.
* Troubleshooting: A passion for tracking down the technical root causes of problems in distributed systems and software.
* Observability: Experience with developing service level indicators and objectives, instrumenting software, and building alerts.
* Software Engineering: An understanding of software engineering fundamentals with experience developing software with a team of engineers.
* Automation: A strong desire to automate all of the things and eliminate toil.
* Containers: A solid understanding of the underpinnings of container technology such as cgroups and namespaces.
* Container Orchestration Systems: Experience with the operations, administration, and development of orchestration systems such as Kubernetes, ECS, Mesos, or Nomad.
* Infrastructure-as-Code: Experience with deploying and maintaining infrastructure as code with tools such as Terraform or Pulumi.
* Technical Writing: Most of the services we develop are greenfield, and you will need to build documentation and diagrams for other engineering teams.
* Customer Satisfaction: Keen eye for customer satisfaction (our customers are other engineering teams and Avalara customers).
* Passion for Learning: Interest in the broader technology space with a constant desire to expand your understanding.
* Adaptability: Experience working on a variety of projects. In short, we want people with T-shaped skills.
* Tools & Technologies we are looking for as part of the skillset: Terraform, Grafana, Prometheus, Loki, Alertmanager, Pushgateway, Prometheus exporters & client libraries, PromQL, LogQL, Fluentd, Fluent Bit, Sumo Logic, Splunk, Tempo, Jaeger, OpenTelemetry, Cortex, etc.
* Other Common Tools & Technologies expected: AWS, GCP, Oracle Cloud, Terraform, GitLab, Artifactory, Atlassian suite, Git, Kubernetes, Go, C#, Python, Bash, PowerShell, Docker, Windows, Linux, etc.