Senior Site Reliability Engineer - AWS

athenahealth

Exp: 7-9 Years

Bengaluru / Bangalore, India

Job Description

We are looking for a Senior Site Reliability Engineer to join our Cloud Infrastructure Engineering division. Cloud Infrastructure Engineering ensures the continuous availability of the technologies and systems that are the foundation of athenahealth's services. We are directly responsible for thousands of servers and petabytes of storage, and we handle thousands of web requests per second, all while sustaining growth at a meteoric rate. We enable an operating system for the medical office that abstracts away administrative complexity, leaving doctors free to practice medicine.

What we are looking for:

You're a seasoned engineer with a passion for identifying and resolving reliability and scalability challenges. You are a curious team player, someone who loves to explore, learn, and make things better. You are excited to uncover inefficiencies in business processes, creative in finding ways to automate solutions, and relentless in your pursuit of greatness. You're a nimble learner capable of quickly absorbing complex solutions and an excellent communicator who can help evangelize engineering excellence.

The Team:

We are a group of Site Reliability Engineers who are passionate about reliability, automation, and scalability. We use an agile-based framework to execute our work, ensuring we are always focused on the most important and impactful needs of the business. We support systems in both private and public cloud and make data-driven decisions about which one best suits the needs of the business. We are relentless in automating away manual, repetitive work so we can focus on projects that help move the business forward.

Primary Responsibilities

Deploying, automating, maintaining, and managing AWS production systems.
Ensuring that AWS production systems are reliable, secure, and scalable.
Resolving problems across multiple platforms and application domains using system troubleshooting and problem-solving techniques.
Providing primary operational support and engineering for all Cloud and Enterprise deployments.
Monitoring system performance and identifying downtime along with its underlying causes.
Designing and building cost-effective systems within an AWS account.

Secondary Responsibilities

Working closely with developers, testers, and system administrators.
Introducing processes, tools, and methodologies to balance needs throughout the SDLC, pipeline management, and data flow.
Integrating security measures into the development lifecycle.

Typical Qualifications

7+ years of experience building, scaling, and supporting highly available systems and services.
Expertise in the delivery, maintenance, and support of Linux systems and infrastructure.
Experience building AWS platforms.
Extensive AWS experience: working familiarity with commonly used AWS services (compute/EC2, networking, content delivery, containers/ECS/EKS, storage/S3, CloudFormation, serverless computing/Lambda, load balancing, AMIs, operations management best practices, etc.) required.
Expertise in configuration management tools such as Puppet. Experience with Infrastructure-as-Code, Linux, and API integration. Familiarity with Terraform desired.
Proficiency in at least one scripting, programming, or automation language or tool (Ansible, Python, Go, Ruby, shell, etc.).
Experience implementing solutions using SRE and DevOps principles, continuous integration and continuous delivery, and source code management/version control (Bitbucket, GitHub).
Familiarity with telemetry, observability, and current monitoring and visualization tools (e.g., Prometheus, Alertmanager, Grafana, or similar) desired.
Expertise in promoting and driving system visibility to aid in the rapid detection and resolution of issues.

Behaviors & Abilities Required:

Ability to learn and adapt in a fast-paced environment.
Ability to work collaboratively on a cross-functional team with a wide range of experience levels.
Ability to prioritize both individual time and the time of the team.
Strong negotiation and problem-solving skills.
Ability to keep projects on track and provide regular progress updates.
Ability to context-switch when required and manage multiple projects simultaneously.
Participation in rotational on-call duty and/or shift work.