Key Responsibilities
- Designing and developing robust PySpark applications for large-scale data processing (see the sketch after this list).
- Building and optimizing data ingestion, transformation, and storage processes.
- Implementing efficient algorithms and data structures for distributed computing.
- Collaborating with cross-functional teams to integrate data-driven solutions into business processes.
- Troubleshooting performance bottlenecks and ensuring high availability and reliability of data pipelines.
- Writing and optimizing SQL queries for data extraction and manipulation.
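A minimal sketch of the kind of pipeline work these responsibilities describe: ingestion from a columnar source, a transformation, a SQL extraction, and partitioned storage. The paths, app name, and column names (user_id, amount, event_ts) are illustrative assumptions, not part of any actual role; a production pipeline would also add schema enforcement and error handling.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-totals").getOrCreate()

# Ingestion: read columnar source data (path is hypothetical).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Transformation: drop bad records, then aggregate per user per day.
daily_totals = (
    events
    .filter(F.col("amount").isNotNull())
    .groupBy("user_id", F.to_date("event_ts").alias("event_date"))
    .agg(F.sum("amount").alias("total_amount"))
)

# SQL extraction: the same data can be queried via Spark SQL.
daily_totals.createOrReplaceTempView("daily_totals")
top_users = spark.sql("""
    SELECT user_id, SUM(total_amount) AS lifetime_total
    FROM daily_totals
    GROUP BY user_id
    ORDER BY lifetime_total DESC
    LIMIT 10
""")

# Storage: write partitioned output for downstream consumers.
daily_totals.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_totals/"
)
```

Partitioning the output by event_date is a common layout choice: downstream readers that filter on the date column can prune whole partitions instead of scanning every file.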
Required Skills and Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience (3-10 years) in Python development with a focus on PySpark.
- Strong understanding of distributed computing principles and experience with Apache Spark.
- Proficiency in SQL and experience with relational databases (MySQL, PostgreSQL, etc.).
- Experience with data serialization formats such as JSON, Parquet, and Avro (compared briefly in the sketch after this list).
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes) is a plus.
- Excellent problem-solving skills and ability to work independently or as part of a team.
- Good communication skills and the ability to collaborate effectively with stakeholders.
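A short sketch contrasting the serialization formats named above, using an illustrative DataFrame and hypothetical output paths. One assumption to note: writing Avro from Spark requires the external spark-avro connector, which must be supplied on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats-demo").getOrCreate()

# Illustrative data; columns and paths are hypothetical.
df = spark.createDataFrame(
    [(1, "alice", 42.0), (2, "bob", 17.5)],
    ["id", "name", "score"],
)

# JSON: human-readable and widely supported, but verbose on disk.
df.write.mode("overwrite").json("/tmp/demo/json/")

# Parquet: columnar and compressed; the usual choice for analytics.
df.write.mode("overwrite").parquet("/tmp/demo/parquet/")

# Avro: row-oriented with an embedded schema; needs the spark-avro
# package (e.g., --packages org.apache.spark:spark-avro_2.12:3.5.0).
df.write.mode("overwrite").format("avro").save("/tmp/demo/avro/")
```

Parquet tends to be the default for Spark analytics because its columnar layout enables column pruning and predicate pushdown, while row-oriented formats like Avro suit record-at-a-time exchange.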
Preferred Qualifications
- Certification in Apache Spark or related technologies.
- Experience with streaming platforms such as Apache Kafka (a minimal Structured Streaming sketch follows this list).
- Knowledge of machine learning frameworks (e.g., TensorFlow, PyTorch) for data analysis.
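As a rough illustration of the Kafka experience mentioned above, here is a minimal Spark Structured Streaming consumer. The broker address and topic name are placeholder assumptions, and the spark-sql-kafka connector must be available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

# Hypothetical Kafka source; requires the spark-sql-kafka connector
# (e.g., --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; decode the payload to a string.
decoded = raw.select(F.col("value").cast("string").alias("payload"))

# Sink to the console for demonstration; a real job would write to
# durable storage and configure a checkpoint location for recovery.
query = (
    decoded.writeStream
    .outputMode("append")
    .format("console")
    .start()
)
query.awaitTermination()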
Additional Information
Required Qualifications
- Master of Computer Applications (M.C.A.)
- Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.)