HCLTech

Data Engineer - PySpark, Scala

Job Description

Responsibilities -

  1. Perform assigned tasks expertly under supervision.
  2. Develop and maintain end-to-end data pipelines using cloud-native solutions to extract, load, and transform data from disparate data sources into a cloud data warehouse (see the sketch after this list).
  3. Format and distribute custom data extracts through various means (e.g., custom SFTPs, RESTful APIs, and other bulk data transfer methods) and optimize data storage options based on business requirements.
  4. Help design and develop database structures and functions, schemas, and database testing protocols.
  5. Contribute to defining company data assets (data models), custom client workflows, and standardized data quality protocols.
  6. Troubleshoot database issues and tune queries, independently and collaboratively, to improve data retrieval times across various systems (e.g., via SQL).
  7. Collaborate with technical and non-technical stakeholders, including IT, Data Science, and team members across a diverse array of business units.
  8. Work closely with the IT team as needed to facilitate, troubleshoot, or develop database connectivity between internal and external resources (e.g., on-premises systems, Azure Data Lakes, Data Warehouses, and Data Hubs).
  9. Help implement and enforce the enterprise reference architecture, and ensure that data infrastructure design reflects enterprise business rules as well as data governance and security guidelines.
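
As a rough illustration of responsibility 2, here is a minimal PySpark sketch of an extract-load-transform pipeline. All paths, container names, and column names are hypothetical, and Parquet stands in for whatever the actual warehouse target is; this is a sketch of the pattern, not a prescribed implementation.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("elt-sketch").getOrCreate()

    # Extract: read raw CSVs from a landing zone (hypothetical ADLS path).
    raw = (
        spark.read.option("header", "true")
        .option("inferSchema", "true")
        .csv("abfss://landing@example.dfs.core.windows.net/orders/")
    )

    # Transform: deduplicate, type the columns, drop unusable rows.
    clean = (
        raw.dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
        .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
        .filter(F.col("order_id").isNotNull())
    )

    # Load: write partitioned Parquet to a warehouse staging area; a Delta,
    # Synapse, or Snowflake writer would slot in here depending on the target.
    (
        clean.write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("abfss://warehouse@example.dfs.core.windows.net/staging/orders/")
    )

    spark.stop()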

Criteria -

  1. Experience with the Azure stack (Data Lake/Blob Storage, Azure Data Factory or equivalent, Databricks) and production-level experience with on-premises Microsoft SQL Server required.
  2. Experience with ETL/ELT, taking data from various sources and formats and ingesting it into a cloud-native data warehouse, required.
  3. Experience with Python and PySpark, or with Scala, as well as standard analytic libraries/packages (e.g., pandas, NumPy, dplyr, data.table, stringr, Slick, and/or Kafka) and related distributed computing frameworks required.
  4. Strong verbal and written communication skills required.
  5. Experience with DataRobot, Domino Data Labs, Salesforce MC, and Veeva CRM preferred.
  6. Familiarity with the Snowflake data warehouse preferred.

Preferred - Immediate joiners

More Info

Industry: Other

Function: Technology

Job Type: Permanent

Date Posted: 25/11/2024

Job ID: 101417515

Last Updated: 25/11/2024 06:39:05 PM