Data Architecture: Architect end-to-end solutions, including data acquisition, preprocessing, reporting, user interface, model development, and deployment. Ensure the scalability, reliability, monitoring, and maintainability of AI systems.
ETL Design & Data Modeling: Support Advanced Analytics delivery teams through data modelling design discussions and design reviews, capturing data mappings and ensuring reuse and best-practice adoption.
Software Development: Lead development of code frameworks for data integration, data pipelining, data lineage, etc.
Deployment and Integration: Design and oversee the deployment of developed solutions into production environments. Collaborate with software engineers to ensure seamless integration with existing products/systems.
Thought Leadership: Provide guidance, mentorship, and technical expertise to team members (Data Engineers, DevOps, Data Scientists, Machine Learning Engineers) and to non-technical stakeholders such as Product Owners and Delivery Managers.
Review & establish standards and best practices: guide and lead developers on training and testing frameworks, coding standards, and engineering practices.
Stay up to date with the latest technology advancements and AI developments, suggest improvements to our roadmap and approaches accordingly, and disseminate learnings to colleagues.
Qualifications & Skills:
Bachelor's or Master's degree in Computer Science, Data Science, or a related field, with 5+ years of proven experience working as a Data Engineer in a cloud-based environment
Proficiency in developing batch and streaming ETL pipelines using Python, SQL, and PySpark; open-source table and file formats (e.g., Iceberg/Hudi/Delta, Parquet, Avro, ORC); and Spark optimisation
Proven track record of leveraging and developing frameworks and reusable solutions, and of defining standards, e.g. Kedro, ZenML, Great Expectations, Airflow, Airbyte, Singer, Alation, etc.
Experience developing data integration, data pipeline, data quality, and data preprocessing solutions using Spark and Spark Streaming to build data lakes for Advanced Analytics projects
Experience with cloud platforms, specifically AWS: S3, Lambda, EC2, EMR, RDS, AWS Glue, etc.
Understanding of distributed systems such as Hadoop, and streaming technologies such as Kafka, Flink, and Spark Streaming
Experience designing and developing data architecture following data lake paradigms
Familiarity with Snowflake or other data warehousing platforms
Experience with data mapping, data lineage, and data modelling, plus an understanding of data governance and data security practices
Proficiency in database design, SQL, and NoSQL databases such as Redis and DynamoDB
Good understanding of Agile methodologies and experience working with Scrum/SAFe practices
Experience on Advanced Analytics projects and an understanding of AI/ML techniques
Knowledge of software development best practices, including version control (Git) and continuous integration (CI/CD) processes.
Strong problem-solving and debugging skills.
Effective communication skills and the ability to work collaboratively with cross-functional teams.