WEX Inc.

Site Reliability Engineer 1/2

Job Description

About the Team/Role

Working closely with the Platform Operations Lead, the Site Reliability Engineer is responsible for building out WEX's Travel engineering solutions and resolving operational problems, with a focus on optimizing existing systems, building infrastructure, and eliminating work through automation in an Agile environment.

How you'll make an impact

Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement

Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate

Scale and optimize existing infrastructure and services sustainably through mechanisms, including automation, and evolve them by improving reliability and efficiency

Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence

Maintain infrastructure and services by measuring and monitoring system metrics to proactively identify operational efficiencies, potential outages and security threats in Development, UAT, Staging and Production environments

Practice sustainable incident response and blameless postmortems

Build infrastructure and drive projects that break things with the aim of improving the robustness of production systems

Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform

Step back to observe patterns and develop innovative tools and automation to eliminate or minimize menial tasks. Use those learnings to drive the best operational practices

Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE

Preserve operational visibility and response capabilities by fixing and improving our dashboards, alerts, and automation

Take part in the on-call rotation as part of the Platform Operations team supporting the WEX Travel Platform

Experience you'll bring

Proficient in one or more of the following scripting languages: JavaScript, Node.js, Python, PowerShell, Bash, etc.

2 years' experience working with public cloud platforms, Azure preferred

Experience handling large numbers of diverse systems with configuration management systems like Puppet, Chef, Ansible, etc.

Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI Model, Subnetting and Load Balancing strategies

Understanding of Serverless Application Framework

Experience in containerised workloads and management platforms such as Docker or Kubernetes

Familiarity with distributed systems, including microservices, is a plus

Experience with infrastructure automation tools such as CloudFormation and Terraform

Understanding of CI/CD processes and experience with deployment automation tools such as CodePipeline, CodeDeploy, Jenkins, Bamboo

Strong debugging, troubleshooting, and problem-solving skills

Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties

Experience liaising with developers, operations staff and third-party resources

Understanding of API integration

JIRA & Confluence (Desirable)

Software Engineering or Computer Science equivalent degree (Desirable)

More Info

Industry: Other

Job Type: Permanent Job

Date Posted: 08/10/2024

Job ID: 95365155

Last Updated: 20-10-2024 05:30:39 PM