redBus

Senior Data Engineer

Job Description

redBus.in is India's largest online bus ticketing platform, operating across India, SEA & LATAM (part of the GO-MMT group, Nasdaq: MMYT). We have the largest network of bus operators, with satisfied customers booking up to 250,000 transactions a day through mobile and desktop channels. With over 300,000 live routes across multiple continents, we host the largest inventory of bus seats from private bus operators and state road transport corporations. We are a true Indian MNC with a global presence and operations across India, Singapore, Malaysia, Indonesia, Vietnam, Peru and Colombia. In India, redBus works with 3,000+ bus operators, with services across 50,000+ routes and a customer base of more than 10 million.

redBus at its heart is about people. As the pioneer in our space, we are passionate about creating amazing experiences for passengers, partners and our people. Energized by a great work environment where talent is nurtured, innovation is celebrated and challenges are conquered, we journey towards creating fulfilling moments for everyone whose lives we touch, always striking a balance between getting the job done and having fun along the way. So here's to being a Great Place to Work!

Key Responsibilities:

Data Infrastructure & Pipeline Development:

  • Design, build, and maintain scalable data and machine learning pipelines to support real-time processing and high-volume data flows. Oversee the entire pipeline lifecycle to ensure efficient data collection, transformation, and integration across platforms.
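
For illustration only (not part of the original posting): a minimal PySpark Structured Streaming sketch of the kind of high-volume ingest pipeline this responsibility describes, reading booking events from a Kafka topic and landing parsed records in object storage. The broker address, topic, schema, and S3 paths are hypothetical placeholders, not redBus's actual stack, and the Kafka connector package is assumed to be on the Spark classpath.

```python
# Illustrative sketch only: a streaming ingest job of the kind this role describes.
# Broker address, topic name, schema, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("booking-events-ingest").getOrCreate()

event_schema = StructType([
    StructField("booking_id", StringType()),
    StructField("route_id", StringType()),
    StructField("fare", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "booking-events")              # hypothetical topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/booking_events/")                   # hypothetical sink
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .outputMode("append")
         .start())
query.awaitTermination()
```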

Pattern Mining & Anomaly Detection:

  • Develop frameworks for pattern mining and anomaly detection within user behavior and transaction data. Continuously monitor and analyze data to identify actionable patterns and anomalies, supporting proactive decision-making.
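
As a hedged illustration of one simple check of the kind this responsibility covers (not the posting's actual framework): a rolling z-score flag over hourly transaction counts, with the baseline shifted so the current point does not dilute its own score. Column names, window size, and threshold are assumptions.

```python
# Illustrative sketch only: flag hourly transaction counts that deviate sharply
# from a trailing rolling baseline. Window size and threshold are hypothetical.
import pandas as pd

def flag_anomalies(counts: pd.Series, window: int = 24, z_threshold: float = 3.0) -> pd.Series:
    """Return a boolean Series marking points whose z-score against a trailing baseline exceeds the threshold."""
    baseline = counts.shift(1)  # exclude the current point from its own baseline
    rolling_mean = baseline.rolling(window, min_periods=window).mean()
    rolling_std = baseline.rolling(window, min_periods=window).std()
    z_scores = (counts - rolling_mean) / rolling_std
    return z_scores.abs() > z_threshold

# Example usage with synthetic hourly counts; the 5000 spike is flagged.
hourly = pd.Series(
    [1000, 1020, 980, 995, 5000, 1010],
    index=pd.date_range("2024-11-20", periods=6, freq="h"),
)
print(flag_anomalies(hourly, window=3))
```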

Workflow Automation:

  • Automate and orchestrate workflows to optimize data movement, transformation, and ML model deployment, ensuring high availability and operational efficiency across data workflows.
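
Purely as an illustrative sketch of workflow orchestration (assuming a recent Apache Airflow 2.x, one of the tools named under good-to-have skills): a minimal DAG chaining extract, transform, and load tasks. The DAG id, schedule, and task bodies are hypothetical placeholders.

```python
# Illustrative sketch only: a minimal Airflow DAG chaining extract -> transform -> load.
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw events from source systems")

def transform():
    print("clean and aggregate events")

def load():
    print("publish curated tables to the warehouse")

with DAG(
    dag_id="example_events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```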

Generative AI for Data Insights:

  • Leverage generative AI tools to automate diagnostic summaries, enhance feature engineering, and make data insights accessible to both technical and non-technical users.
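
A hedged sketch of what such a generative-AI diagnostic summary could look like; the OpenAI client, model name, and prompt below are assumptions for illustration only, and any hosted or self-hosted model could play this role.

```python
# Illustrative sketch only: turning an anomaly record into a plain-language summary
# with an LLM. The OpenAI client and model name are assumptions, not the posting's
# actual tooling.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

anomaly_report = {
    "metric": "hourly_bookings",
    "expected": 1005,
    "observed": 5000,
    "window": "2024-11-20 10:00-11:00 IST",
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical model choice
    messages=[
        {"role": "system", "content": "You summarize data anomalies for non-technical stakeholders."},
        {"role": "user", "content": f"Write a two-sentence diagnostic summary of this anomaly: {anomaly_report}"},
    ],
)
print(response.choices[0].message.content)
```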

ML Model Deployment & Monitoring:

  • Deploy, monitor, and manage machine learning models in production, focusing on performance and accuracy for anomaly detection and pattern recognition. Use monitoring tools to ensure model health and proactively address issues.
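
For illustration, a minimal sketch of logging an anomaly-detection model and its evaluation metrics to MLflow (named under good-to-have skills) so model health can be tracked over time; the experiment name, synthetic features, and parameters are hypothetical.

```python
# Illustrative sketch only: log an anomaly-detection model and its metrics to MLflow.
# Experiment name, data, and parameter values are hypothetical placeholders.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import IsolationForest

mlflow.set_experiment("anomaly-detection")  # hypothetical experiment name

rng = np.random.default_rng(42)
X_train = rng.normal(size=(1000, 4))  # stand-in for engineered transaction features

with mlflow.start_run():
    model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
    model.fit(X_train)

    scores = model.score_samples(X_train)  # higher means more normal under the model
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("contamination", 0.01)
    mlflow.log_metric("mean_score", float(scores.mean()))
    mlflow.sklearn.log_model(model, "model")
```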

Data Warehousing & Real-Time Analytics:

  • Develop and manage robust data warehousing and real-time analytics solutions to support rapid decision-making and effective anomaly detection across the organization.
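
A hedged example of the warehousing side: querying an Amazon Redshift cluster over its Postgres-compatible interface for recent flagged-anomaly counts. The endpoint, credentials, and table/column names are hypothetical placeholders.

```python
# Illustrative sketch only: pull near-real-time anomaly counts from a Redshift cluster
# via its Postgres-compatible endpoint. Connection details and schema are hypothetical.
import os

import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.ap-south-1.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="analytics",
    user="readonly_user",
    password=os.environ.get("REDSHIFT_PASSWORD", ""),
)

query = """
    SELECT date_trunc('hour', event_time) AS hour,
           COUNT(*)                       AS flagged_events
    FROM booking_anomalies
    WHERE event_time >= dateadd(hour, -24, getdate())
    GROUP BY 1
    ORDER BY 1;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for hour, flagged in cur.fetchall():
        print(hour, flagged)
```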

Educational Qualification:

Bachelor's or Master's degree in Computer Science or a related field.

Years of Experience:

3 to 5 years in big data engineering.

Nature and Scope of Responsibilities:

The ideal candidate will be responsible for developing cutting-edge services for pattern mining and anomaly detection, utilizing billions of user behavior and transaction signals daily. This role involves handling big data, building resilient and stable data engineering pipelines for high-volume processing, and performing data modeling for databases and data warehouses to support scalable solutions. Collaboration with cross-functional teams, including product, business development and engineering, is essential to drive actionable insights and enhance data accessibility. Key responsibilities include automating data workflows, deploying models, and utilizing generative AI tools to enhance data usability for diverse stakeholders.

Must-Have Skills:

  • Expertise in at least two big data and data pipeline technologies, such as Apache Kafka, Apache NiFi, and Apache Spark.
  • Strong grasp of data warehousing concepts and database modeling, with experience in table formats such as Apache Iceberg/Hudi and data warehouses such as Amazon Redshift.
  • Advanced programming skills in Python, Java, and SQL for data engineering and ML workflows.
  • Strong familiarity with AWS cloud services, including RDS, Redshift, Athena, EC2, ECS, Lambda, Glue, and Fargate.
  • Exposure to advanced statistical and machine learning techniques for anomaly detection and pattern recognition.
  • Familiarity with generative AI applications to enhance data insights.

Good-to-Have Skills:

  • Experience with OLAP databases such as Apache Druid and ClickHouse.
  • Strong experience with at least one workflow automation and orchestration tool, such as Apache Airflow, Prefect, or Luigi.
  • Knowledge of predictive maintenance for data pipelines.
  • Proficiency in ML lifecycle management and model monitoring using tools such as Amazon SageMaker and MLflow.
  • Knowledge of GCP and Azure services.
  • Knowledge of front-end application development.

More Info

Industry: Other

Function: Technology

Job Type: Permanent Job

Date Posted: 20/11/2024

Job ID: 100989507

Last Updated: 22-11-2024 07:47:42 PM