Job Title: Lead Data Engineer
Job Type: Full-time
Company Overview:
EXL is a global leader in data-driven solutions, offering a range of services designed to help businesses unlock the full potential of their data. We specialize in building scalable, high-performance data systems, enabling organizations to drive innovation and achieve their business goals. We are seeking a Lead Data Engineer with expertise in PySpark, Databricks, Azure, SQL, and DevOps to join our talented team.
In this role, you will lead the design, development, and implementation of cutting-edge data engineering solutions that leverage Databricks and Azure. You will collaborate closely with cross-functional teams to deliver end-to-end data pipelines, automate processes, and optimize workflows. The ideal candidate will have deep technical expertise in distributed data processing, cloud architecture, and best practices for building and maintaining robust, scalable data systems.
Key Responsibilities:
Design & Development of Data Solutions:
- Lead the design, development, and implementation of scalable and efficient data pipelines using PySpark and Databricks on Azure cloud infrastructure (a brief illustrative sketch follows this list).
- Architect and implement ETL pipelines, data processing workflows, and batch/real-time data integrations using Databricks and other Azure data services.
- Optimize SQL queries and databases for large-scale data processing, ensuring high performance, scalability, and efficiency in data retrieval and analysis.
- Build and maintain data lakes, data warehouses, and other data storage solutions using Azure services like Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics.
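To give a flavor of the pipeline work described above, here is a minimal, hypothetical sketch of a batch PySpark job: it reads raw events from Azure Data Lake Storage, applies basic cleansing, and writes a curated Delta table. The storage path, column names, and table name are invented for illustration and assume a Databricks (Delta-enabled) environment; real pipelines in this role will be more involved.

    # Illustrative only: paths, columns, and table names are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-orders-etl").getOrCreate()

    # Read raw JSON events landed in Azure Data Lake Storage.
    raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/2024-01-01/")

    # Basic cleansing and enrichment.
    cleaned = (
        raw.dropDuplicates(["order_id"])
           .withColumn("order_date", F.to_date("order_timestamp"))
           .filter(F.col("amount") > 0)
    )

    # Write a Delta table for downstream analytics (e.g., Synapse or Power BI).
    cleaned.write.format("delta").mode("overwrite").saveAsTable("curated.orders_daily")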
Cloud Infrastructure & Architecture:
- Lead the development of cloud-native data architectures and solutions on Azure to support the needs of business units and data scientists.
- Collaborate with cross-functional teams to integrate Azure services such as Azure Data Factory, Azure Databricks, Azure Event Hubs, and Azure Synapse to ensure a seamless flow of data across environments.
- Apply best practices for data governance, security, and scalability to ensure that data pipelines are resilient and compliant with organizational and regulatory standards.
Automation & CI/CD Integration:
- Drive the automation of data workflows using DevOps practices and tools such as Azure DevOps, Terraform, Git, and Jenkins to ensure robust continuous integration and deployment (CI/CD) processes.
- Implement Infrastructure as Code (IaC) for the automated deployment and management of data infrastructure on Azure.
- Set up and maintain automated testing, monitoring, and logging for data pipelines to ensure high-quality and error-free execution.
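As one small example of the automated testing mentioned above, a pipeline might ship with data quality checks that run in CI before a deployment is promoted. This is a hedged sketch: the table name, columns, and thresholds below are hypothetical, and the tests assume a local Spark session with access to the curated table.

    # Illustrative pytest-style data quality checks; table and rules are hypothetical.
    import pytest
    from pyspark.sql import SparkSession

    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

    def test_orders_have_no_null_keys(spark):
        df = spark.table("curated.orders_daily")
        assert df.filter(df.order_id.isNull()).count() == 0

    def test_order_amounts_are_positive(spark):
        df = spark.table("curated.orders_daily")
        assert df.filter(df.amount <= 0).count() == 0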
Collaboration & Stakeholder Engagement:
- Work closely with business stakeholders, data scientists, and analysts to understand their data needs and provide innovative, data-driven solutions.
- Collaborate with product owners and project managers to prioritize deliverables, ensure smooth execution of data projects, and meet business objectives.
- Lead the troubleshooting of data pipeline issues and provide technical support to engineers, ensuring fast resolution of data processing bottlenecks or failures.
Leadership & Mentorship:
- Mentor and guide junior data engineers, sharing your expertise in PySpark, Databricks, Azure, and SQL to elevate the technical capabilities of the team.
- Establish and enforce coding standards, best practices, and documentation to ensure consistency, maintainability, and scalability of data engineering solutions.
Required Qualifications & Skills:
Education: Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field, or equivalent work experience.
Experience:
- 5+ years of experience as a Data Engineer, with significant hands-on expertise in PySpark, Databricks, Azure, and SQL.
- Extensive experience designing, building, and optimizing large-scale ETL pipelines and distributed data processing solutions using PySpark and Databricks.
- In-depth knowledge of Azure cloud services (e.g., Azure Data Lake, Azure Synapse Analytics, Azure Data Factory, Azure SQL Database), and experience in building cloud-based data architectures.
- Strong proficiency in SQL for querying, managing, and optimizing relational and non-relational databases.
- Proven experience with DevOps tools and practices, including Azure DevOps, Terraform, Git, and Jenkins, for continuous integration and deployment.
Technical Skills:
- Strong proficiency in PySpark for distributed data processing and building data workflows in Databricks.
- Expertise in SQL query optimization and data modeling for large datasets (see the brief sketch after this list).
- Experience with Azure cloud services and integrating them into end-to-end data workflows.
- Familiarity with Infrastructure as Code (IaC) tools like Terraform or Azure ARM templates.
- Knowledge of cloud security best practices, data governance, and regulatory compliance standards.
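As a short illustration of the SQL optimization and data modeling skills listed above, one common technique is partitioning a large table by a date column so that filtered queries prune entire partitions. The schema and table name below are hypothetical, and the example assumes a Delta-enabled Spark session such as Databricks.

    # Hypothetical example: partition a large Delta table by date for partition pruning.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partitioning-example").getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS curated.page_views (
            user_id STRING,
            url STRING,
            view_ts TIMESTAMP,
            view_date DATE
        )
        USING DELTA
        PARTITIONED BY (view_date)
    """)

    # Because the filter hits the partition column, Spark reads only matching partitions.
    spark.sql(
        "SELECT COUNT(*) AS views FROM curated.page_views WHERE view_date >= '2024-01-01'"
    ).show()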
Preferred Qualifications:
- Certifications: Azure Data Engineer, Azure Solutions Architect, or similar credentials are highly desirable.
- Experience with additional Azure capabilities, such as Azure Databricks for ML workflows or Azure Event Hubs for real-time data streaming.
- Familiarity with containerized data applications (e.g., Docker, Kubernetes) is a plus.
- Experience with machine learning pipelines or advanced analytics using Databricks is a plus.
Why EXL:
- Competitive salary and comprehensive benefits package.
- Flexible work environment.
- Opportunities for career growth and professional development.
- Work with a talented, collaborative team and cutting-edge technologies.
- Focus on innovation and continuous improvement in everything we do.
EXL is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.