The Data Engineer will be responsible for designing, implementing, and maintaining robust data pipelines on the Databricks platform. The ideal candidate will have a deep understanding of big data technologies, ETL processes, and data integration. This role requires close collaboration with data scientists, analysts, and other stakeholders to ensure data availability and integrity for analytics and reporting.
Key Responsibilities
- Design and Develop Data Pipelines: Create and maintain scalable and efficient data pipelines using Databricks and Apache Spark.
- Data Integration: Integrate data from various sources, including databases, APIs, and external datasets, ensuring data quality and consistency.
- ETL Processes: Develop and optimize ETL (Extract, Transform, Load) processes to support data analytics and business intelligence.
- Performance Optimization: Optimize data processing performance, including tuning Spark jobs and managing resource allocation.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and provide solutions.
- Data Quality: Implement data validation and cleansing procedures to ensure data accuracy and reliability.
- Documentation: Document data pipelines, processes, and architecture for maintenance and knowledge sharing.
- Monitoring and Troubleshooting: Monitor data pipelines for performance and reliability, and troubleshoot issues as they arise.
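To illustrate the data-quality responsibility above, a validation-and-cleansing step might look like the following minimal Python sketch. The record fields (`id`, `email`) and rules are hypothetical examples, not part of the role description; in practice this logic would typically run inside a Spark job on Databricks.

```python
def cleanse(records):
    """Drop records missing a primary key and normalize email casing.

    `records` is a list of dicts; the `id`/`email` fields are illustrative.
    """
    out = []
    for rec in records:
        if rec.get("id") is None:
            continue  # validation: reject records without a primary key
        rec = dict(rec)  # copy so the input is not mutated
        if rec.get("email"):
            rec["email"] = rec["email"].strip().lower()  # cleansing step
        out.append(rec)
    return out
```

In a Databricks pipeline the same checks would usually be expressed over Spark DataFrames (e.g. filtering nulls and normalizing columns) so they scale across partitions.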