WEX

Site Reliability Engineer 1/2

Job Description

About The Team/Role

Working closely with the Platform Operations Lead, the Site Reliability Engineer is responsible for building out WEX's Travel engineering solutions and resolving operational problems, with a focus on optimizing existing systems, building infrastructure and eliminating work through automation in an Agile environment.

How you'll make an impact

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
  • Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate
  • Scale and optimize existing infrastructure and services sustainably through mechanisms, including automation, and evolve them by improving reliability and efficiency
  • Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
  • Maintain infrastructure and services by measuring and monitoring system metrics to proactively identify operational efficiencies, potential outages and security threats in Development, UAT, Staging and Production environments
  • Practice sustainable incident response and blameless postmortems
  • Build infrastructure and drive projects that break things with the aim of improving the robustness of production systems
  • Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
  • Step back to observe patterns and develop innovative tools and automation to eliminate or minimize menial tasks. Use those learnings to drive the best operational practices
  • Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE
  • Preserve operational visibility and response capabilities by fixing and improving our dashboards, alerts, and automation
  • Take part in the on-call rotation as part of the Platform Operations team supporting the WEX Travel Platform

Experience You'll Bring

  • Proficient in one or more of the following scripting languages: JavaScript, Node.js, Python, PowerShell, Bash, etc.
  • 2 years' experience working with public cloud platforms, Azure preferred
  • Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible, etc.
  • Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI Model, Subnetting and Load Balancing strategies
  • Understanding of Serverless Application Framework
  • Experience in containerised workloads and management platforms such as Docker or Kubernetes
  • Familiarity with distributed systems, including microservices, is a plus
  • Experience in infrastructure automation tools such as CloudFormation, Terraform
  • Understanding of CI/CD processes and experience with deployment automation tools such as CodePipeline, CodeDeploy, Jenkins, Bamboo
  • Strong debugging, troubleshooting, and problem-solving skills
  • Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties
  • Experience liaising with developers, operations staff and third-party resources
  • Understanding of API integration
  • JIRA & Confluence (Desirable)
  • Software Engineering or Computer Science equivalent degree (Desirable)

More Info

Industry: Other

Function: Technology

Job Type: Permanent Job

Date Posted: 21/10/2024

Job ID: 97282719

Last Updated: 23-11-2024 07:31:00 PM