The Site Reliability Engineer (SRE) will focus on the scalability, high availability, performance, stability, and reliability of software applications. The SRE will build automation to simplify operations and processes, collaborate with cross-functional teams to create proactive engineering mechanisms, and ensure positive end-user experiences. With a good understanding of products and their interdependencies, and a blend of pragmatic, operational, and software development skills, the SRE will apply sound engineering principles, operational discipline, and mature automation to our operating environments and the codebase.
General Duties & Responsibilities
Build software solutions and systems to manage platform infrastructure and applications.
Partner with Tech teams to improve services through rigorous testing & release procedures.
Participate in system design consulting, platform management, and capacity planning.
Improve reliability, quality, and time-to-market of our suite of software solutions.
Build monitoring that alerts on symptoms rather than on outages.
Run the production environment by monitoring availability and taking a holistic view of system health.
Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
Provide primary engineering support for multiple large, distributed software applications.
Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability with well-defined service level objectives.
Partner with stakeholders to design & deliver a reliable, scalable, secure & performant platform.
Stay current on technical trends to suggest innovative tools and approaches to problems.
Take a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Identify and resolve problems promptly to meet and improve service levels and standards.
Technical Skill Requirements
Comfort with Linux/Unix command line.
Strong conceptual grounding in the basics of technology, infrastructure, and the domain.
Knowledge of at least one of the following: Unix/Windows scripting, Python, Java, C++, C#.
Exceptional and demonstrable web development experience.
Experience with relational and NoSQL databases.
Ability to automate repetitive tasks (scripting, RPA, Power Automate, UiPath, StackStorm, etc.).
Knowledge of DevOps (CI/CD) and relevant tools (e.g., Jenkins).
Experience with Docker in a production environment, including container orchestration (e.g., Nomad, Mesos, Kubernetes).
Experience working on cloud-based infrastructure (e.g., AWS, GCP, Azure).
Experience with infrastructure as code (Terraform or CloudFormation).
Knowledge of configuration management systems like Ansible, Chef or Puppet.
Knowledge of operating systems, networking, middleware, databases, SSL (Secure Sockets Layer), and load balancers.
Strong knowledge of tools like Dynatrace and Splunk, and the ability to create dashboards, views, and alerts.
Strong problem-solving skills to troubleshoot and resolve problems quickly.
Educational Requirements
Bachelor's degree in computer science or another highly technical or scientific discipline.
Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript.
General Knowledge, Skills & Abilities
Previous success in technical engineering
Coding experience beyond simple scripts
Detective and problem-solving skills
Analytical and proactive mindset
Strong and assertive communication
Passion for continuous improvement
Date Posted: 20/06/2024
Job ID: 82336985