Circles is a leader in leveraging innovative technologies to enhance operational efficiencies and deliver exceptional user experiences. As we continue to grow and evolve, we are seeking a skilled Senior Site Reliability Engineer (SRE) to join our team. This role is pivotal in ensuring the reliability, performance, and security of our infrastructure and applications.
Position Overview
The Senior SRE will be responsible for leading initiatives to improve system reliability, automate operational processes, and ensure the scalability and security of our systems. The ideal candidate will have a strong background in Linux systems, cloud technologies, containerization, and automation, along with a proactive approach to problem-solving and a commitment to continuous improvement.
Key Responsibilities
- Design and implement automation solutions for infrastructure provisioning, configuration, and management using Ansible, promoting consistency and reliability across environments.
- Lead the development and maintenance of CI/CD pipelines using Jenkins, ensuring efficient deployment processes and integrating quality checks.
- Manage and optimize containerized applications using Docker and Kubernetes, focusing on scalability, efficiency, and security.
- Architect and maintain secure, scalable, and resilient cloud infrastructure on AWS, including performance tuning and cost optimization.
- Administer Linux systems, including performance tuning, security hardening, and troubleshooting.
- Develop and maintain Python scripts to automate tasks and integrate systems, enhancing operational efficiency.
- Collaborate with development and operations teams to implement SRE principles, fostering a culture of reliability and performance.
- Monitor system performance, identify bottlenecks, and implement solutions to ensure high availability and optimal user experience.
- Lead incident response efforts, minimizing impact and conducting post-mortem analyses to prevent future occurrences.
- Mentor junior team members and contribute to the development of best practices and standards within the SRE team.
Required Skills and Experience
- Minimum of 5 years of experience in a senior SRE or similar role, with a proven track record of improving system reliability and performance.
- Expertise in Linux administration, performance optimization, and security practices.
- Strong experience with container technologies (Docker) and orchestration systems (Kubernetes).
- Extensive knowledge of AWS cloud services, architecture, and management.
- Proficiency in Python for scripting and automation.
- Solid understanding of Ansible for infrastructure automation and Jenkins for continuous integration and delivery.
- Excellent problem-solving skills, with the ability to troubleshoot complex system issues effectively.
- Strong communication and collaboration abilities, capable of leading projects and working across teams to achieve objectives.
Education
- Bachelor's degree in Computer Science, Information Technology, or a related field preferred.