As a Staff Engineer, you will play a pivotal role in building an end-to-end ML lifecycle at scale. Our recommendation engine creates personalized home page experiences for millions of users globally, adapting to their interests in real time. Leveraging ML algorithms and A/B testing, we continuously refine our recommendations. You'll thrive here if you're passionate about tackling complex technical challenges with big data and ML models, and about building scalable services.
KEY RESPONSIBILITIES:
- Serve as the functional lead, with deep knowledge of core systems, and take full responsibility for the team's code, design, and project delivery, ensuring high quality and on-time completion
- Own the end-to-end machine learning lifecycle, working with a diverse range of frameworks and technologies, from big data frameworks such as Spark, Flink, and Kafka to ML frameworks such as TensorFlow Serving
- Design and implement low-latency caching and storage solutions
- Build stream processing pipelines to compute near real-time features that will help us serve real-time recommendations for our users
- Continuously evaluate relevant technologies, and influence and drive architecture and design discussions
- Tackle technical challenges in complex systems independently, demonstrating technical depth and proficiency
- Collaborate with Product Managers and cross-functional teams to understand requirements and deploy high-quality software solutions that meet business objectives
SKILLS & ATTRIBUTES FOR SUCCESS:
- Strong coding skills in Python, Java, Golang, or Scala
- Experience with big data frameworks such as Spark, and with Kubernetes, for data and ML workloads
- Proficient in implementing API performance monitoring solutions using tools such as Grafana, Prometheus, and CloudWatch
- Skilled in developing end-to-end machine learning pipelines that ensure consistency between development and production environments
- Ability to design scalable ML architectures tailored to site traffic and feature complexity for predictive algorithms
- Familiar with model and data versioning, resource allocation and scaling, and logging to build optimal systems
- Experienced in creating systems that monitor and react to faults in resources, data streams, and model responses
- Expertise with MLOps tools for scalable, production-level deployment, including feature stores, model hosting and versioning, data versioning, prediction and drift monitoring, and automated remediation
- Strong problem-solving skills and the ability to work effectively in a fast-paced environment
- Commitment to continuous learning and professional development