Data and AI Engineer

SourceBae

4 months ago

Exp: 8-10 Years

India

Job Description

Lead Data Engineer

Experience: 8+ Years

About Us: We are seeking a talented and experienced Data & AI Engineer with strong Azure cloud competencies to join our dynamic team.

Role Overview: Deliver projects in customer environments, including bringing use cases into production, machine learning projects, and large migrations, in order to deliver on the value proposition.

Key Responsibilities:

Architecture and Design for Data Engineering and Machine Learning Projects: Establish architecture and target design for data engineering and machine learning projects.
Requirement Analysis, Planning, Effort and Resource Needs Estimation: Analyze the current inventory, review and formalize requirements, and develop project and execution plans.
Advisory Services and Best Practices: Provide troubleshooting, performance tuning, cost optimization, operational runbooks, and mentoring.
Large Migrations: Assist customers with large migrations to Databricks from Hadoop ecosystems, data warehouses (Teradata, Netezza), ETL engines (Informatica, DataStage, Ab Initio), SAS, SQL, and cloud-based data platforms such as Redshift, Snowflake, and EMR.
Design, Build, and Optimize Data Pipelines: Implement best-in-class Databricks solutions with flexibility for future iterations.
Production Readiness: Assist with production readiness for customers, including exception handling, production cutover, capture analysis, alert scheduling, and monitoring.
Machine Learning (ML) Model Review, Tuning, ML Operations, and Optimization: Build and review ML models, ensure ML best practices, manage model lifecycle, work with ML frameworks, and deploy models in production.

Must Have:

Hands-on experience with distributed computing frameworks such as Databricks and the Spark ecosystem (Spark Core, PySpark, Spark Streaming, Spark SQL).
Willingness to work with product teams to optimize product features/functions.
Experience with batch workloads and real-time streaming of high-volume, high-frequency data.
Performance optimization on Spark workloads.
Environment setup, user management, authentication, and cluster management on Databricks.
Professional curiosity and the ability to learn new technologies and tasks independently.
Good understanding of SQL and a strong grasp of relational and analytical database management theory and practice.

Key Skills:

Proficiency in Python, SQL, and PySpark.
Experience with the big data ecosystem (Hadoop, Hive, Sqoop, HDFS, HBase).
Expertise in the Spark ecosystem (Spark Core, Spark Streaming, Spark SQL) and Databricks.
Strong knowledge of Azure (ADF, ADB, Logic Apps, Azure SQL Database, Azure Key Vault, ADLS, Synapse).
Familiarity with AWS (Lambda, AWS Glue, S3, Redshift).
Strong understanding of data modeling and ETL methodology.

If you are a seasoned data engineer looking for a challenging and rewarding opportunity, we would love to hear from you. Apply now: share your CV at [Confidential Information] or via WhatsApp at 9109436045.