Skills:
Databricks, SQL, PySpark, Python, Airflow, Scala
Responsibilities
Design and build reusable components, frameworks, and libraries at scale to support analytics products.
Design and implement product features in collaboration with business and technology stakeholders.
Anticipate, identify, and resolve data management issues to improve data quality.
Clean, prepare and optimize data at scale for ingestion and consumption.
Drive the implementation of new data management projects and the restructuring of the current data architecture.
Implement complex automated workflows and routines using workflow scheduling tools such as Airflow (see the sketch after this list).
Build continuous integration, test-driven development, and production deployment frameworks.
Drive collaborative reviews of design, code, test plans, and dataset implementation performed by other data engineers to maintain data engineering standards.
Analyze and profile data for the purpose of designing scalable solutions.
Troubleshoot complex data issues and perform root cause analysis to proactively resolve
product and operational issues.
Mentor and develop other data engineers in adopting best practices.
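
To make the workflow-scheduling responsibility above concrete, here is a minimal, purely illustrative Airflow DAG. The DAG ID, schedule, and task callables are hypothetical placeholders, not details from this posting, and the "schedule" argument assumes Airflow 2.4 or later.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder tasks; a real pipeline would extract from and write to
# actual source and target systems.
def extract():
    print("extracting raw data")

def transform():
    print("cleaning and preparing the extracted data")

# The dag_id below is hypothetical. "schedule" assumes Airflow 2.4+;
# older releases use schedule_interval instead.
with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # transform runs only after extract succeeds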
Qualifications
Primary skillset:
Experience working with distributed data processing technologies for developing batch and streaming pipelines (a sketch follows at the end of this section) using
- SQL, Spark, Python, PySpark [4+ years]
- Airflow [3+ years]
- Scala [2+ years]
Able to write code that is optimized for performance.
Experience with a cloud platform such as AWS, GCP, or Azure.
Able to quickly pick up new programming languages, technologies, and frameworks.
Strong skills in building positive relationships across Product and Engineering.
Able to influence and communicate effectively, both verbally and in writing, with team members and business stakeholders.
Experience creating and configuring Jenkins pipelines for a smooth CI/CD process for managed Spark jobs, building Docker images, etc.
Working knowledge of data warehousing, data modelling, governance, and data architecture.
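
As a sketch of the batch and streaming pipeline work named in the primary skillset, the following PySpark Structured Streaming job counts events per type over ten-minute windows. The input path, schema fields, and output locations are hypothetical assumptions, not details from this posting.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("example_stream").getOrCreate()

# Hypothetical schema for incoming JSON event files.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

# Read a stream of JSON files landing in a hypothetical input directory.
events = (
    spark.readStream
    .schema(schema)
    .json("/data/incoming/events")
)

# Count events per type in 10-minute windows; the watermark bounds how
# late data may arrive before a window is finalized.
counts = (
    events
    .withWatermark("event_time", "15 minutes")
    .groupBy(F.window("event_time", "10 minutes"), "event_type")
    .count()
)

# Write finalized window counts to a hypothetical Parquet location.
query = (
    counts.writeStream
    .outputMode("append")
    .format("parquet")
    .option("path", "/data/output/event_counts")
    .option("checkpointLocation", "/data/checkpoints/event_counts")
    .start()
)

query.awaitTermination()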
Good To Have
Experience working with data platforms, including EMR, Airflow, and Databricks (Data Engineering and Delta Lake components, and the Lakehouse Medallion architecture; a bronze-to-silver sketch follows at the end of this posting).
Experience working in an Agile/Scrum development process.
Experience with EMR/EC2, Databricks, etc.
Experience working with data warehousing tools, including SQL databases, Presto, and Snowflake.
Experience architecting data products on streaming, serverless, and microservices architectures and platforms.
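
Finally, as an illustration of the Databricks / Delta Lake medallion point above, a bronze-to-silver promotion step might look like the sketch below. The table names (bronze.events, silver.events) and cleaning rules are hypothetical, and the code assumes a Spark session with Delta Lake support, e.g., on Databricks.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example_medallion").getOrCreate()

# Read raw, append-only events from a hypothetical bronze Delta table.
bronze = spark.read.table("bronze.events")

# Silver layer: deduplicate, drop malformed rows, and normalize values.
silver = (
    bronze
    .dropDuplicates(["event_id"])
    .filter(F.col("event_time").isNotNull())
    .withColumn("event_type", F.lower(F.col("event_type")))
)

# Write the cleaned data to a hypothetical silver Delta table.
(
    silver.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("silver.events")
)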