Responsibilities:
Design, develop, and maintain data pipelines and ETL processes on AWS infrastructure.
Utilize big data tools such as Hadoop, Spark, and Kafka to process and analyze large volumes of data efficiently.
Develop and optimize SQL and NoSQL queries for data manipulation and retrieval.
Implement advanced data processing solutions using Python and PySpark.
Collaborate with cross-functional teams to understand data requirements and implement scalable solutions.
Utilize Kubernetes for container orchestration and management to ensure scalability and reliability of data applications.
Stay updated with the latest trends and advancements in data engineering and contribute to the continuous improvement of data processes and infrastructure.
Provide mentorship and guidance to junior team members, fostering a culture of knowledge sharing and collaboration.
Requirements:
Bachelor's degree in Computer Science, Engineering, or a related field.
Minimum 5 years of experience in data engineering roles, with a strong focus on building and managing solutions on AWS.
Proficiency in big data tools such as Hadoop, Spark, and Kafka.
Solid expertise in relational (SQL) and NoSQL databases for data manipulation and retrieval.
Advanced scripting skills in Python and PySpark for data processing tasks.
Experience with Kubernetes for container orchestration and management.
Familiarity with machine learning concepts and techniques is desirable.
Knowledge of Google Cloud Platform (GCP) is a plus.
Excellent problem-solving skills and the ability to thrive in a fast-paced, collaborative environment.
Strong communication and interpersonal skills, with the ability to interact effectively with stakeholders at all levels.