Responsibilities:
- Construct robust data pipelines and scalable infrastructure supporting our machine learning systems.
- Implement monitoring tools such as Prometheus and Grafana for real-time analytics.
- Transition models from development to production, ensuring systems are highly available and fault-tolerant.
- Develop tools and automate services for machine learning training and inference using tools such as Ansible and Terraform.
- Proactively identify and integrate advanced technologies to enhance system performance, maintainability, and reliability.
- Embed DevOps best practices in our machine learning lifecycle, including establishing CI/CD pipelines with AWS CI/CD, Jenkins, or CircleCI, and managing configuration and automation.
- Prioritize auditability, version control, and data security to meet stringent regulatory and compliance benchmarks.
- Lead initiatives to develop and deploy experimental models, assessing their efficiency and impact.
- Work with teams across the organization to ensure machine learning solutions are integrated smoothly and contribute to ongoing improvement efforts.
Requirements:
- Excellent problem-solving capabilities.
- Effective communication skills, capable of conveying technical concepts to non-technical stakeholders.
- Strong collaborative spirit, working comfortably across diverse teams.
- Solid background as a Platform Engineer, ML DevOps Engineer, Data Engineer, or similar role with extensive data engineering experience.
- Proficient with cloud services (AWS, Azure, GCP), using tools such as EC2, S3, SageMaker, and Google Cloud ML Engine for model scaling and deployment.
- Skilled in Docker and Kubernetes for containerization and orchestration.
- Experienced with workflow frameworks such as Airflow, Kubeflow, or Argo.
- Deep understanding of data ingestion, transformation, and storage technologies, including SQL, NoSQL, Hadoop, and Spark.
- Highly proficient in Python; additional scripting knowledge is beneficial.
- Familiar with machine learning frameworks such as PyTorch, TensorFlow, and TFX.
- Well-versed in version control (Git), CI/CD (Jenkins, AWS), and infrastructure automation (Ansible, Terraform).
- Experience with testing, monitoring tools (Prometheus, ELK Stack), and logging frameworks (Logstash, Fluentd).
Location: Mumbai (work from office only)
Job Type: Full-time
Pay: 1,400,000.00 - 2,000,000.00 per year
Schedule:
- Day shift
Ability to commute/relocate:
- Mumbai, Maharashtra: Reliably commute or plan to relocate before starting work (Required)
Experience:
- ML Ops: 3 years (Required)
Work Location: In person