There are NO limits to your career: come shape the future and be part of a truly unique global culture at OutSystems!
Site Reliability Engineer
Production Engineering - SRE
SREs at OutSystems work closely with development teams, acting as an extension of each team, to adopt the reliability tenets with the shared goal of meeting Service Level Objectives (SLOs) and delivering a smooth, frictionless customer experience.
Overview
As an SRE at OutSystems, your key responsibilities and duties are to:
- Lead the onboarding of services and teams to the reliability tenets;
- Establish and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs);
- Design and implement scalable, reliable, and secure infrastructure, while ensuring cloud-native best practices;
- Collaborate with software development teams to ensure systems are resilient (observable, fault-tolerant, recoverable, scalable) and performant;
- Implement monitoring, alerting, logging, and tracing solutions to detect and respond to incidents;
- Lead incident response efforts, ensuring quick resolution and minimal downtime, and conduct root cause analyses (RCAs) and post-mortems;
- Automate every operational task, with a special focus on fast incident detection & recovery;
- Foster a culture of continuous improvement and knowledge sharing;
- Communicate effectively with stakeholders, providing updates on system reliability and performance;
- Participate in the on-call rotation to provide 24/7 support for production systems.
Qualifications
- STEM degree (BSc or MSc in Software Engineering, Computer Science, or a related field).
- 5+ years of experience in software development and/or operations.
- Proficiency in at least one high-level programming language (C++, Python, Java, C#, etc.).
- Strong troubleshooting and debugging skills.
- Fluency in English and excellent communication skills.
Soft Skills
- Communication - communicates effectively in English, both orally and in writing, showing empathy for the other person.
- Humility - accepts mistakes with a humble attitude, apologizes for them, and mitigates them as soon as possible to limit their impact.
- Accountability - takes ownership of problems and sees them through. Even without all the necessary knowledge to move on alone, involves the right people to reach closure.
- Negotiation Skills - has tough and politically complex conversations with colleagues and customers, defusing disagreements and leading towards a mutual agreement and understanding of all parties involved.
- Process Oriented - is organized and follows defined processes, while challenging inefficient ones and suggesting improvements.
- Problem-solving - takes a top-down approach to problems, starting with a wide scope, breaking them into smaller pieces, and narrowing down as the analysis progresses. Thinks critically, analyzing information objectively and making reasoned judgments.
Technical Skills
Experience in any of the following is valued but not strictly required:
- Containerization technologies and orchestration platforms, mainly Kubernetes (CKA, CKAD, CKS certifications are valued).
- Automation and Infrastructure as Code (IaC) tools, such as AWS CloudFormation, Terraform, Puppet, Chef, Spacelift, etc.
- Python, Go, Bash/Shell scripting, or other automation tools/languages.
- AWS services like EC2, RDS, ELB, CloudFront, Lambda, etc.
- Monitoring and troubleshooting complex distributed systems.
- Grafana, the ELK stack, Prometheus, or similar tools.
- Designing resilient and fault-tolerant systems.
- Debugging complex distributed systems.
Join us in disrupting the status quo of the low-code market: we give you the power to Ask Why, and you give our customers the power to innovate through software!