Skills:
Linux Administration, Network Troubleshooting, Scripting (Bash/Python), Cloud Infrastructure Management, Monitoring Tools (e.g. Nagios), Automation (Ansible/Chef/Puppet), Security Best Practices, Database Management
We are seeking a Systems Operations Engineer with deep knowledge of Kubernetes and AWS. The ideal band member will play a crucial role in the architecture, deployment, and management of our cloud infrastructure, containerized applications, and mobile device management (MDM). This role requires someone who can actively design, implement, and maintain complex systems while ensuring their scalability, performance, and security.
Key Responsibilities
- Design, deploy, and manage scalable and reliable cloud infrastructure on AWS, ensuring high availability and disaster recovery.
- Architect, manage, and optimize Kubernetes clusters for deploying containerized applications, including implementing best practices for scaling, monitoring, and securing the clusters.
- Develop automation scripts and tools using languages and tools such as Python, Bash, or Terraform to streamline operations and reduce manual intervention.
- Implement comprehensive monitoring solutions to track system performance, identify bottlenecks, and optimize resources for cost-effectiveness.
- Implement security best practices, conduct regular security assessments, and ensure compliance with relevant regulations to secure the cloud infrastructure and DistroKid-owned devices.
- Work closely with development teams on continuous integration/continuous deployment (CI/CD) pipelines, troubleshoot issues, and provide operational assistance.
- Monitor and optimize AWS resource usage and costs by employing scaling policies and choosing the appropriate services and configurations to meet performance needs economically.
- Develop and maintain disaster recovery plans, leveraging AWS capabilities for backup and replication to ensure business continuity.
- Maintain documentation of systems, processes, and procedures to ensure knowledge transfer and operational consistency.
- Provide technical leadership and mentorship to junior engineers, fostering a culture of continuous learning and improvement.
Requirements
- 5+ years of experience in systems operations, focusing on cloud infrastructure and containerized environments.
- Proven experience with AWS services (EC2, S3, RDS, IAM, VPC, etc.) and Kubernetes in production environments.
- Experience with infrastructure-as-code tools such as Terraform, CloudFormation, or Ansible.
- Strong knowledge of Linux/Unix systems and shell scripting.
- Proficiency in one or more programming languages (Python, Go, etc.).
- Experience with CI/CD tools such as Jenkins, Bitbucket, or similar.
- Familiarity with monitoring tools (Prometheus, Grafana, CloudWatch) and logging solutions (Datadog, ELK, Fluentd).
Benefits
- Work from home
- 5-day work week