Welcome to the Production Operations Engineering team, the backbone of PayPal's Critical Response Engineering. Our team is a vibrant and enthusiastic mix of engineers who are dedicated to triaging issues and designing innovative solutions. We focus on implementing automation to streamline incident management processes at PayPal. By joining our team, you'll be part of a dynamic group that plays a crucial role in ensuring the reliability and efficiency of our services.
Job Description:
Your way to impact
As a member of the Production Operations Engineering team, you have the unique opportunity to create a significant impact in several ways:
- Enhancing System Reliability: By quickly identifying and remediating production-impacting incidents, you ensure the high availability and reliability of PayPal's services, directly contributing to a seamless customer experience.
- Innovating Automation: Design and implement cutting-edge automation solutions that simplify and expedite incident management processes. Your work in this area can reduce manual intervention, increase efficiency, and prevent future issues.
- Collaborative Problem-Solving: Engage with cross-functional teams to troubleshoot and resolve complex technical issues. Your collaborative efforts help foster a culture of continuous improvement and innovation within the organization.
- Proactive Monitoring: Develop and refine monitoring tools that provide actionable alerts, enabling the team to address potential issues before they impact users. Your contributions here enhance our proactive stance on system health and performance.
- Knowledge Sharing: Participate in and lead knowledge transfer sessions, sharing your expertise and insights with peers and new team members. This helps build a stronger, more knowledgeable team and prepares the next generation of engineers for success.
By leveraging your technical skills and innovative mindset, you can drive meaningful change and support PayPal's mission to be the world's most trusted and convenient digital payments platform.
Your day to day
Command Center Production Operations Engineers are the core of PayPal's Critical Response Engineering team. In addition to actively contributing to PayPal's monitoring and automation platforms, Prod Ops engineers will leverage their system administration knowledge and software engineering skills to quickly identify and remediate production-impacting incidents.
- Act as first responder to any system and application events or impacts detected in the PayPal Command Center by various monitoring systems
- Develop and enhance production monitoring and management capabilities leveraging existing platforms and tools
- Work independently and within a team to triage and remediate production system and application incidents
- Handle escalations from PayPal partners about critical issues
- Work with the Technical Duty Officer and other support teams inside and outside the Command Center to troubleshoot, escalate, and resolve critical site incidents
- Identify recurring system and application issues and work with cloud teams, infra teams, product development, vendors, and other stakeholders to investigate and resolve their cause
- Send external communications regarding outages to PayPal merchants, partners and customers
- Maintain accurate documentation of site incidents, including impact details, timelines, and steps taken for mitigation or resolution
- Develop and maintain technical documentation for use by all of PayPal operations interacting with the live site
What do you need to bring
- Proficiency in Unix/Linux environments, including troubleshooting and managing application and system logs.
- Ability to quickly identify, triage, and remediate production-impacting incidents, along with strong communication skills to effectively update stakeholders during major incidents.
- Familiarity with tools like Datadog, SignalFx, and Splunk for monitoring and analyzing system performance.
Nice-to-haves include:
- Experience with automation and scripting languages (e.g., Python, Bash).
- Understanding of cloud platforms (e.g., AWS, GCP).
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).