Job Description
Experience: 4 to 9 Years
Work Location: Bangalore
Job Description:
We are seeking skilled PySpark Developers with expertise in Azure Databricks for map development to join our dynamic team. As a PySpark Developer, you will play a crucial role in designing, implementing, and optimizing data processing pipelines using PySpark on the Azure Databricks platform. Your primary responsibility will be to develop and maintain scalable and efficient mapping solutions to support our business objectives.
Responsibilities:
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
- Design, develop, and implement PySpark-based data processing pipelines on Azure Databricks for efficient data manipulation and transformation.
- Optimize Spark jobs for performance and scalability to handle large volumes of data effectively.
- Develop custom map functions and transformations using PySpark to meet specific business needs.
- Implement data quality checks and ensure data integrity throughout the data processing pipeline.
- Work closely with data engineers and data scientists to integrate machine learning models into the PySpark pipelines.
- Troubleshoot and resolve issues related to data processing, performance, and scalability.
- Stay updated with the latest advancements in PySpark, Spark, and Azure Databricks, and incorporate best practices into the development process.

Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a PySpark Developer with a strong understanding of Spark architecture and internals.
- Hands-on experience with Azure Databricks, including cluster management, notebook development, and job scheduling.
- Proficiency in Python and experience with PySpark libraries for data manipulation.
- Solid understanding of distributed computing principles and experience with large-scale data processing.
- Experience with SQL and NoSQL databases for data storage and retrieval.
- Strong analytical and problem-solving skills with keen attention to detail.
- Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
Nice to have:
- Experience with geospatial data processing and mapping libraries (e.g., GeoPandas, Folium).
- Certification in PySpark or Azure Databricks.
- Familiarity with containerization technologies (e.g., Docker, Kubernetes, or WSL).