Overview
The Data Engineer plays a crucial role in an organization, developing, building, testing, and maintaining architectures such as databases and large-scale processing systems. They transform data into formats that can be easily analyzed, optimize data workflows, and build data pipelines. They work closely with data scientists, analysts, and other stakeholders to understand business requirements and to provide infrastructure support for data generation, analysis, and reporting.
Key Responsibilities
- Design and build data pipelines using Spark SQL and PySpark in Azure Databricks.
- Design and build ETL pipelines using Azure Data Factory (ADF).
- Build and maintain a Lakehouse architecture in ADLS/Databricks.
- Perform data preparation tasks including data cleaning, normalization, deduplication, and type conversion (see the sketch after this list).
- Work with the DevOps team to deploy solutions in production environments.
- Control data processes and take corrective action when errors are identified; this may include executing a workaround and then identifying the root cause and solution for the data errors.
- Participate as a full member of the global Analytics team, providing solutions for, and insights into, data-related issues.
- Collaborate with Data Science and Business Intelligence colleagues around the world to share key learnings, leverage ideas and solutions, and propagate best practices. Lead projects that include other team members, and participate in projects led by others.
- Apply change management practices, including training, communication, and documentation, to manage upgrades, changes, and data migrations.
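To make the pipeline and data-preparation responsibilities concrete, here is a minimal PySpark sketch of the kind of work described above: cleaning, type-converting, and deduplicating a raw feed, then writing it to a Delta table in the Lakehouse. The paths, column names, and table name are illustrative assumptions, not part of this posting, and the Delta write assumes a Databricks (or delta-spark enabled) runtime.

```python
# A minimal, illustrative data-preparation pipeline. Paths, column
# names, and the target table are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-prep").getOrCreate()

# Read a raw feed landed in the lake (path is an assumption).
raw = spark.read.option("header", True).csv("/mnt/raw/customers/")

prepared = (
    raw
    # Cleaning/normalization: trim and lowercase emails, drop rows
    # missing the primary key.
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .dropna(subset=["customer_id"])
    # Type conversion: cast strings to proper types.
    .withColumn("customer_id", F.col("customer_id").cast("bigint"))
    .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))
    # Deduplication: keep one row per customer.
    .dropDuplicates(["customer_id"])
)

# Spark SQL view over the same data, e.g. for downstream queries.
prepared.createOrReplaceTempView("customers_prepared")
spark.sql("SELECT COUNT(*) AS row_count FROM customers_prepared").show()

# Persist to a Delta table, the storage layer behind a Databricks Lakehouse.
prepared.write.format("delta").mode("overwrite").saveAsTable("silver.customers")
```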
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience in data engineering, data warehousing, or data integration.
- Proficiency in programming languages such as Python, Java, or Scala.
- Strong understanding of SQL and database management systems.
- Experience with big data technologies like Hadoop, Spark, or NoSQL databases.
- Hands-on experience with cloud platforms such as AWS, Azure, or GCP.
- Ability to design and optimize data models for diverse data types.
- Expertise in building and maintaining ETL pipelines and data workflows.
- Strong problem-solving skills and attention to detail.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
- Familiarity with data governance, security, and compliance best practices.
Skills: Python, SQL, data modeling, ETL, big data, data warehousing, data analysis, MLflow, Azure, DevOps