Must Have:
1. Ability to set up and manipulate large data sets using Python - data structures such as lists, strings, dictionaries, and tuples.
2. Expertise in the pandas and NumPy libraries.
3. Expertise in end-to-end data pipelines, from the raw data layer to the advanced analytics layer.
4. Expertise in data exploration, visualization, and comparing metrics across large CSV and Parquet files, including partitioned Parquet files.
5. Strong skills with joins, merges, pivot tables, grouping, and window functions in Python or SQL (see the pandas sketch after this list).
6. Version-control tools: Git, including git clone and git push.
7. Good exposure to working on large data sets.
8. Capable of capturing business requirements and converting them into technical designs.
9. Good exposure to working in an Agile methodology.
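The pandas sketch below illustrates items 2, 4, and 5 together: reading a partitioned Parquet dataset, merging it with a CSV lookup table, then grouping, pivoting, and applying a window function. The file paths and column names (sales, regions, amount, and so on) are hypothetical, chosen only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical paths and columns, for illustration only.
# Read a partitioned Parquet dataset; pandas (via pyarrow) discovers
# partition directories such as year=2024/month=01 automatically.
sales = pd.read_parquet("data/sales/")
regions = pd.read_csv("data/regions.csv")

# Join/merge the two tables on a shared key.
df = sales.merge(regions, on="region_id", how="left")

# Grouping and aggregation.
totals = df.groupby("region_name")["amount"].sum()

# Pivot table: one row per region, one column per year.
pivot = df.pivot_table(index="region_name", columns="year",
                       values="amount", aggfunc="sum")

# Window function: a 7-row rolling mean per region, analogous to SQL's
# AVG(amount) OVER (PARTITION BY region ORDER BY date ROWS 6 PRECEDING).
df = df.sort_values("date")
df["rolling_avg"] = (df.groupby("region_name")["amount"]
                       .transform(lambda s: s.rolling(7, min_periods=1).mean()))

print(totals.head(), pivot.head(), sep="\n")
```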
Value-adds:
1. Good experience with object-oriented programming patterns, multithreading, and multiprocessing (a multiprocessing sketch follows this list).
2. Good experience in building REST APIs.
3. API security: API-key validation, authorization, authentication, and identity (see the API-key sketch after this list).
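As a minimal sketch of item 1's multiprocessing requirement, the snippet below fans CPU-bound work over chunks of a large data set across worker processes; the chunking scheme and the summarize worker are hypothetical placeholders.

```python
from multiprocessing import Pool

def summarize(chunk):
    """CPU-bound work on one chunk (placeholder: sum of squares)."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    # Split 4 million values into four chunks, one per worker process.
    chunks = [range(i, i + 1_000_000) for i in range(0, 4_000_000, 1_000_000)]
    with Pool(processes=4) as pool:
        results = pool.map(summarize, chunks)
    print(sum(results))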
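For items 2 and 3, here is a minimal sketch of a REST endpoint guarded by API-key validation. Flask is an assumption (the posting names no framework), and the key store, header name, and endpoint are hypothetical; a production service would load keys from a secrets store and layer on authentication and authorization.

```python
from functools import wraps
from flask import Flask, jsonify, request

app = Flask(__name__)
VALID_KEYS = {"demo-key-123"}  # hypothetical; load from a secrets store in practice

def require_api_key(view):
    """Reject requests whose X-API-Key header is missing or unknown."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        if request.headers.get("X-API-Key") not in VALID_KEYS:
            return jsonify(error="invalid or missing API key"), 401
        return view(*args, **kwargs)
    return wrapper

@app.route("/metrics")
@require_api_key
def metrics():
    return jsonify(rows_processed=1_000_000, status="ok")

if __name__ == "__main__":
    app.run(port=5000)
```

A client would then pass the key on each request, e.g. curl -H "X-API-Key: demo-key-123" http://localhost:5000/metrics.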
Good to have:
1. Developing Spark applications using Python.
2. Strong background in building ETL data pipelines.
3. Familiarity with Apache Spark, i.e., Spark SQL, Spark Streaming, DataFrames, RDDs, and PySpark (see the PySpark sketch after this list).
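The PySpark sketch below shows the DataFrame API and Spark SQL side by side on the same aggregation; the input path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read a partitioned Parquet dataset into a DataFrame.
sales = spark.read.parquet("data/sales/")

# DataFrame API: filter, group, aggregate.
by_region = (sales.filter(F.col("amount") > 0)
                  .groupBy("region_id")
                  .agg(F.sum("amount").alias("total")))

# The equivalent Spark SQL over a temporary view.
sales.createOrReplaceTempView("sales")
by_region_sql = spark.sql(
    "SELECT region_id, SUM(amount) AS total FROM sales "
    "WHERE amount > 0 GROUP BY region_id")

by_region.show(5)
spark.stop()
```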
Education:
1. Bachelor's degree in Computer Science or a related field.