Use Terraform to design, implement, and manage cloud infrastructure and applications, with a focus on high availability, fault tolerance, and auto-scaling.
Use Dynatrace to monitor and analyze the performance and reliability of our systems, and implement automated solutions that resolve potential issues preemptively.
Use GitLab for continuous integration/continuous deployment (CI/CD) pipelines, improving our deployment strategies to deliver seamless, reliable updates to our services.
Lead incident response and blameless post-mortems, driving root cause analysis and implementing preventive measures to mitigate future incidents.
Work closely with development teams to advocate for reliability and performance best practices, incorporating SRE principles into the software development lifecycle.
Develop and maintain documentation for system architecture, processes, and disaster recovery plans.
Stay up to date with the latest industry trends and technologies, continuously seeking to improve our systems and processes.
Mentor junior staff and help them succeed.
Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.
5+ years of experience in a Site Reliability Engineering or DevOps role, with a demonstrated ability to work independently and self-manage.
Strong experience with infrastructure as code (IaC) tools, specifically Terraform.
Strong background in maintaining mission-critical components across multiple production environments.
Proficient in monitoring and observability tools, with substantial experience in Dynatrace.
Extensive knowledge of CI/CD processes and tools, with a focus on GitLab.
Proficient with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Experience in one or more of the following programming languages: Elixir, Golang, or Python.
Excellent problem-solving skills, with the ability to analyze complex systems and identify performance bottlenecks and areas for improvement.
Strong communication and collaboration skills, capable of working effectively across different teams and departments.