Skills:
Azure Databricks, Python, Hadoop, Data Lake, Apache Spark, PySpark, ETL tools
The ideal candidate is a self-motivated multi-tasker and a proven team player. You will be a lead developer responsible for developing new software products and enhancing existing ones. You should excel at working with large-scale applications and frameworks and have outstanding communication and leadership skills.
Responsibilities
- Design and implement data engineering solutions in the big data space
- Design and implement batch and streaming solutions to support near real-time analytics
- Collaborate with architects, tech leads, QA, business stakeholders, and project managers to design and develop solutions in AWS Databricks.
- Develop end-to-end, enterprise-grade solutions; facilitate discussions with business and IT stakeholders, highlighting the outcomes these solutions can deliver
- Develop data models based on source systems
- Participate in a 24x7 on-call rotation for production support (short-term)
- Troubleshoot production issues and fix bugs
- Translate business requirements into technical specifications
- Coach fellow developers while remaining a hands-on contributor on a global team
Qualifications
- Bachelor's degree in Computer Science (or related field)
- 8+ years of IT experience in the big data space (Hadoop, data lakes, data engineering) using Python and Spark
- Minimum 3 years of experience with Databricks workspaces, Databricks notebooks, job clusters, Delta Lake, the Databricks Lakehouse, and Unity Catalog
- Minimum 5 years of experience in PySpark and Python development
- 6+ years of experience designing and/or implementing data pipelines
- Experience designing and developing data pipelines and ETL/ELT jobs to ingest and process data in a data lake
- Strong working knowledge of SQL Server, NoSQL, Spark SQL, data modeling, identity and access management, query optimization, and parallel processing
- Experience processing streaming data using Kafka or Pub/Sub
- Experience with Agile development, Scrum, and Application Lifecycle Management (ALM)
- Must have knowledge of data warehouse concepts
- Hands-on experience across the full software development life cycle
- Experience with Infrastructure as Code (IaC) using Terraform is a plus
- Experience with Pandas and TensorFlow is a plus
- Experience with Java and Spark's Java API is a plus
- Databricks certifications are a plus