Job Description And Requirements
Site Reliability Engineering, Sr Staff
The Engineering Excellence Group drives innovation velocity and enterprise infrastructure automation, which are critical elements of our growth and scaling strategy. This team is chartered to drive significant productivity, robustness, agility, and time-to-market advantage in the creation of Synopsys products and solutions. The group also leads corporate infrastructure transformation as we continue to drive IT operations leadership and invest in the next wave of disruptive technologies.
Responsibilities
Key Roles & Responsibilities
- Discover, design, and implement changes to existing IT infrastructure, with a focus on improved reliability, performance, and standardization.
- Collaborate with Engineering and business units to translate customer, business, and technical requirements into SRE architectural designs and enhancements.
- Ensure efficient resource utilization and continuously improve processes by leveraging automation and internal tools, resulting in enhanced service delivery, maturity, and scalability.
- Troubleshoot production issues, provide root cause analysis, and design solutions to prevent recurrence.
- Monitor services and create intelligent alerting for quicker incident detection and resolution.
- Maintain vulnerability management processes and policies using a risk-based priority methodology.
- Collaborate with teams and platform owners on all vulnerability management and reporting.
- Mentor and coach other SRE team members.
- Strategically apply architectural and infrastructure disciplines to solve business problems.
Required Skills
- Extensive experience with a wide range of infrastructure technologies, including but not limited to Linux, Windows, high-performance computing, storage platforms, networking, cloud computing, cloud services (IaaS, PaaS, SaaS, etc.), virtualization, OpenStack, containerization, and orchestration technologies (e.g., Docker, Kubernetes). Solid understanding of the underpinnings of container technology, such as cgroups and namespaces.
- Deep understanding of IT infrastructure-related services and their dependencies, as required to troubleshoot issues and define mitigations.
- Solid experience with the administration, security hardening, and performance tuning of Linux and Windows OS. In-depth knowledge of CIS benchmarking standards.
- Experience with developing service level indicators and objectives, instrumenting software, and building alerts.
- An understanding of software engineering fundamentals with experience developing software with a team of engineers. Strong experience in the practice of testing.
- Experience with the operations, administration, and development of orchestration systems such as Kubernetes, ECS, and Mesos.
- Passion for tracking down the technical root causes of issues in distributed systems and software.
- Experience with ITAM, Service Mapping, and CMDB (ServiceNow).
- Strong technical foundation, with the ability to engage deeply on technical topics related to data center and cloud infrastructure, software reliability, and operational practices.
- Proficiency in ITIL (Information Technology Infrastructure Library) processes and frameworks.
- Service availability-oriented mindset with a proactive approach to problem solving. The ideal candidate should be able to develop automated solutions to prevent recurring problems.
- Ability and willingness to challenge the status quo and optimize current processes and procedures.
Experience & Education
- Master's or bachelor's degree with a minimum of 12 years of experience in IT infrastructure and operations, including 6+ years in an SRE role.
- 12+ years of experience with infrastructure architecture design, implementation, and support in large organizations.
- Implementation experience with infrastructure automation tools and frameworks such as GitHub, Maven/Gradle, Jenkins, Terraform (IaC), Ansible, and shell scripting.
- Hands-on experience with one or more of Java, Python, Go, or Node.js.
- Well versed in SDLC, Agile processes, and CI/CD tools.
- Well versed in ITIL processes, including incident, request, and change management.
- Strong understanding of cloud, automation, networking, and SIEM tools.
- Excellent verbal and written communication skills.
- Excellent problem-solving skills and the ability to work through issues and challenges.
Job Category
Information Technology
Country
India
Job Subcategory
Site Reliability
Hire Type
Employee