Spark Developer

Purple Drive Technologies

Exp: 4-6 Years

Bengaluru / Bangalore, India

Job Description

Job Title: PySpark Developer

Experience: 4+ Years (Relevant)

Location: Bangalore

Budget: Up to 9 LPA

Notice Period: Immediate to 30 days preferred

Job Description:

Purple Drive Technologies is seeking an experienced PySpark Developer to join our team in Bangalore. The ideal candidate will have more than 4 years of relevant experience working with PySpark, Big Data technologies, and distributed computing environments. The candidate will be responsible for developing and optimizing large-scale data processing pipelines to support analytics and data engineering tasks.

Key Responsibilities:

Design, build, and maintain scalable data pipelines using PySpark in a distributed computing environment.

Develop and implement optimized ETL (Extract, Transform, Load) processes to ensure efficient data processing and loading into the data lake or data warehouse.

Collaborate with data engineers, data scientists, and other stakeholders to ensure data consistency and accuracy.

Perform data transformations, aggregations, and joins using PySpark for large datasets.

Ensure that the code follows best practices in terms of performance, security, and scalability.

Troubleshoot and resolve data processing issues, including performance bottlenecks and failures in distributed environments.

Conduct performance tuning and optimization of PySpark applications.

Maintain thorough documentation for the developed systems and data flows.

Required Skills:

4+ years of experience in PySpark development.

Strong knowledge of Apache Spark and distributed computing concepts.

Hands-on experience in designing and developing large-scale data pipelines and ETL frameworks using PySpark.

Proficiency in Python programming, with an in-depth understanding of PySpark APIs.

Experience with Big Data technologies such as Hadoop, HDFS, Hive, etc.

Familiarity with cloud platforms like AWS, Azure, or Google Cloud.

Solid understanding of data warehousing and data lakes.

Ability to work with large and complex datasets efficiently.

Knowledge of best practices in performance optimization and tuning of Spark jobs.

Preferred Skills:

Experience with Kafka and streaming data processing.

Familiarity with SQL and NoSQL databases.

Hands-on experience in version control systems like Git and CI/CD pipelines.

Experience working in Agile or Scrum environments.

Educational Qualification: Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

Job Type: Full-time

Pay: 234,53 - 946,875.06 per year

Benefits:
Provident Fund
Schedule:
Day shift
Monday to Friday
Supplemental Pay: