Movius

SRE II - Observability & Reliability

Job Description

Job Summary

We are seeking a Senior Software Engineer to join our Site Reliability Engineering (SRE) team, with a focus on Observability and Reliability. As a key member of the team, you will play a critical role in ensuring the performance, stability, and availability of our applications and systems, with particular attention to Application Performance Management, Observability, and Reliability of the platform.

The Senior Software Engineer will be responsible for the design, implementation, and maintenance of our observability and reliability infrastructure, with a primary focus on the ELK stack (Elasticsearch, Logstash, and Kibana). The role involves configuring, fine-tuning, and automating alerts, integrating Elastic solutions with other tools and applications, generating reports, and optimizing the observability and monitoring systems.

Key Duties & Responsibilities

1. Collaborate with cross-functional teams to define and implement observability and reliability standards and best practices.
2. Design, deploy, and maintain the ELK stack for log aggregation, monitoring, and analysis.
3. Develop and maintain alerts and monitoring systems, ensuring early detection of issues and rapid incident response.
4. Create, customize, and maintain dashboards in Kibana for different stakeholders.
5. Collaborate with software development teams to identify performance bottlenecks and recommend solutions.
6. Automate manual tasks and workflows to streamline observability and reliability processes.
7. Conduct regular system and application performance analysis and optimization, and contribute to automation and tooling, capacity planning, security practices and compliance adherence, documentation and knowledge sharing, and disaster recovery and backup.
8. Generate and deliver detailed reports on system performance and reliability metrics.
9. Stay up to date with industry trends and best practices in observability and reliability engineering.

Qualifications/Skills/Abilities

Minimum Requirements

Formal Education

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).

Experience (type & duration)

5+ years of experience in Site Reliability Engineering, Observability & Reliability, DevOps.

Skills

  • Proficiency in configuring and maintaining the ELK stack (Elasticsearch, Logstash, Kibana) is mandatory.
  • Strong scripting and automation skills, with expertise in Python, Bash, or similar languages.
  • Experience designing data structures with Elasticsearch indices.
  • Experience writing data ingestion pipelines with Logstash.
  • Experience with infrastructure as code (IaC) and configuration management tools (e.g., Ansible, Terraform).
  • Hands-on experience with cloud platforms (AWS preferred) and containerization technologies (e.g., Docker, Kubernetes).
  • Telecom domain expertise is good to have but not mandatory.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a production environment.
  • Excellent communication and collaboration skills.

Accreditation/certifications/licenses

Relevant certifications (e.g., Elastic Certified Engineer) are a plus.

More Info

Industry: Other

Job Type: Permanent Job

Date Posted: 08/10/2024

Job ID: 95391095

Last Updated: 20-10-2024 07:52:29 PM
Location: Bengaluru / Bangalore