Senior Monitoring Engineer

SailPoint Technologies

Exp: 7-10 Years

Full time

Pune, India

Job Description

As a member of the 24/7 NOC team, oversee the whole platform to ensure stability and performance,
and monitor production releases based on their complexity and risk assessment.
Make it easy for everyone to create, consume, manage, and scale reliable cloud production services
to achieve more
Work independently or collaboratively on SailPoint SaaS services to design, develop, and improve
end-to-end reliability and maintainability for all services
Coach engineering teams on observability best practices such as setting up well-defined Service Level
Objectives (SLOs).
Lead engineering teams through post-incident reviews to define effective preventive actions
Collaborate effectively with developers to increase system reliability through short-term embedding
programs
Enable our engineering teams to scale our enterprise operations by providing guidance, best practices
and support as part of an SRE Centre of Excellence
Manage cross-functional requirements working with Engineering, Product, Services, and other
departments
Develop and implement automation tools and processes to streamline operations and enhance system
performance.
Be a mentor of quality for design reviews, code, test cases, automation, observability, root cause
analysis, and self-healing
Influence architectural design, implementation, consolidation, and simplification for global scale
Focus on expanding your own skills and improving your teammates' skills
Drive operational excellence to deliver frictionless operations, a happy on-call experience, and an optimal
customer experience

Requirements

7-10 years of experience working in agile software development, infrastructure operations, or
application management at SaaS software or cloud service provider organizations.
5+ years of experience using NOC or SRE practices to monitor engineering production operations
supporting a highly available environment for a SaaS software or cloud service provider.
Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code.
Experience with containerization technology and/or Kubernetes
Experience with metrics, tracing, and logging observability tools such as Prometheus, Grafana,
Honeycomb, Jaeger, and Kibana
Experience with incident management, including conducting incident reviews
Good to have: experience with programming languages (Java, Python, Go, etc.).
Strong understanding of Linux, software development, systems, networking, and cloud concepts.
Experience working with remote teams (US time zones).
Strong interpersonal and teamwork skills, including the ability to set and enforce process and influence
engineers who are not direct reports.
Excellent communication skills, including fluency in English.

Preferred:

Bachelor's degree in Computer Science or another technical discipline