Own the build-out of a scalable data infrastructure, aligned with the needs of product managers, business stakeholders, and the analytics team.
Collaborate with product managers, UX designers, and other backend teams to understand requirements and translate them into functional specifications.
Design, develop, and maintain efficient data pipelines, data warehouses, and data integration processes.
Implement and optimize ETL (Extract, Transform, Load) processes to handle large volumes of structured and unstructured data.
Ensure data quality and integrity by implementing data validation and cleansing techniques, and define and enforce data governance, security, and compliance policies.
Build and maintain data models, fact tables, schemas, and database systems to support analytics and reporting requirements.
Requirements:
Education: BS in Computer Science or equivalent.
2+ years of data engineering experience.
Experience with batch and stream processing frameworks such as Spark Streaming, Kafka Streams, Spring Batch, or equivalent.
Proficiency in programming languages such as Python, Java, or Scala for data processing and scripting.
Strong SQL experience, including writing complex queries and working with complex data structures.
Proficiency in setting up and configuring AWS services.
Experience creating ETL pipelines from sources such as Google Analytics, Mixpanel, and AWS Aurora using AWS Glue or equivalent technologies.
In-depth knowledge of relational databases (e.g., MySQL, PostgreSQL) and experience with database design and optimization.
Hands-on experience with big data technologies such as Hadoop, Spark, or NoSQL databases.
Excellent communication and interpersonal skills to effectively collaborate with cross-functional teams and stakeholders.