We are seeking a skilled MLOps Engineer with 2-3 years of experience to join our team. In this role, you will collaborate with data scientists, software engineers, and IT teams to ensure smooth deployment, scaling, and monitoring of machine learning models. You will build, automate, and manage infrastructure and pipelines for AI/ML workflows.
Key Responsibilities:
- Design, develop, and maintain automated pipelines for continuous integration and deployment (CI/CD) of machine learning models.
- Manage model versioning, deployment, and monitoring in production environments.
- Optimize the performance and scalability of machine learning models post-deployment.
- Collaborate with data science teams to improve model reproducibility, experiment tracking, and data workflows.
- Implement monitoring, alerting, and logging solutions to ensure model performance and detect anomalies in production.
- Manage and scale the infrastructure (on-premises or cloud) required for model training and inference.
- Work closely with DevOps teams to ensure seamless integration of MLOps practices into existing development workflows.
- Implement security and compliance practices for AI/ML pipelines, including data governance.
- Troubleshoot issues in production environments and ensure high availability of models.
Qualifications:
- Education: Bachelor's degree in Computer Science, Engineering, or a related field.
- Experience: 2-3 years of hands-on experience in MLOps, DevOps, or related fields.
- Experience with machine learning lifecycle management tools such as MLflow, Kubeflow, or TFX.
- Strong knowledge of cloud platforms such as AWS, Google Cloud, or Azure (experience in setting up AI/ML services is a plus).
- Proficiency in scripting and automation (Python, Bash, etc.).
- Experience with containerization (Docker) and orchestration tools (Kubernetes).
- Familiarity with CI/CD tools like Jenkins, CircleCI, or GitLab CI for deploying machine learning models.
- Knowledge of version control systems (e.g., Git) and infrastructure-as-code (e.g., Terraform, CloudFormation).
- Understanding of monitoring and logging frameworks (e.g., Prometheus, Grafana, the ELK stack).
- Familiarity with data engineering tools (e.g., Apache Airflow, Kafka) is a plus.