We are seeking talented Data Engineers to join our team and play a crucial role in designing, building, and maintaining the data infrastructure on cloud platforms for our clients. In this position, you will work closely with data architects, analysts, and cross-functional teams to develop robust and scalable data pipelines, ensuring the efficient flow of data across various systems and platforms.
About AIonOS
AIonOS, a joint venture between InterGlobe and Assago, is set to transform the AI landscape globally. It is an AI business venture where innovation meets human ingenuity to bring precision and purpose to every decision businesses make. By bringing a paradigm shift to the way enterprises run in conjunction with AI, AIonOS will transform businesses into AI-native enterprises, creating a seamless ecosystem of infrastructure, data, and AI to unlock new levels of productivity and profitability.
Responsibilities
- Design and develop data ingestion pipelines to extract data from various sources, including databases, data lakes, and streaming platforms, into cloud-based data repositories.
- Build and maintain ETL/ELT processes using cloud-native services and tools (e.g., Google Cloud Dataflow, Azure Data Factory, or AWS Glue) to transform and load data into data warehouses or data lakes (see the illustrative sketch after this list).
- Implement and optimize data processing workflows using distributed computing frameworks like Apache Spark, Apache Beam, or cloud-native services like Azure Databricks, Google Dataproc, or Amazon EMR.
- Develop and maintain data storage solutions, such as data lakes (e.g., Azure Data Lake Storage, Google Cloud Storage, Amazon S3) and data warehouses (e.g., Azure Synapse Analytics, Google BigQuery, Amazon Redshift).
- Collaborate with data architects, data scientists, and analysts to understand data requirements and implement efficient data models and schemas.
- Ensure data quality, integrity, and security by implementing data validation, monitoring, and governance processes.
- Automate and orchestrate data pipelines using cloud-native tools (e.g., Azure Data Factory, Google Cloud Composer, AWS Step Functions) for efficient and reliable data processing.
- Optimize data pipelines and infrastructure for performance, scalability, and cost-effectiveness, leveraging cloud-native services and best practices.
- Provide technical support and documentation for data processing solutions and infrastructure.
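For illustration only, a pipeline of the kind described above might look like the following minimal Apache Beam (Python) sketch, which reads a hypothetical CSV export from Cloud Storage, drops invalid rows, and appends the results to a BigQuery table. The bucket, project, dataset, table, and column names are placeholder assumptions, not an actual AIonOS pipeline, and the same structure could run on Dataflow by changing the runner options.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_order(line):
    """Parse one CSV line of the assumed schema: order_id,customer_id,amount."""
    order_id, customer_id, amount = line.split(",")
    return {"order_id": order_id, "customer_id": customer_id, "amount": float(amount)}


def run():
    # Default options use the local DirectRunner; on GCP you would pass
    # --runner=DataflowRunner plus project, region, and temp_location flags.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadOrders" >> beam.io.ReadFromText(
                "gs://example-bucket/exports/orders.csv", skip_header_lines=1
            )
            | "ParseRows" >> beam.Map(parse_order)
            | "DropInvalid" >> beam.Filter(lambda row: row["amount"] > 0)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.orders",
                schema="order_id:STRING,customer_id:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```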
Qualifications
- Bachelor's or master's degree in computer science, engineering, or a related field.
- 2-8 years of experience as a Data Engineer (minimum 2 years), with a strong focus on cloud technologies.
- Proven expertise in at least one major cloud platform (Azure, GCP, or AWS) and its data services and tools, preferably Google Cloud Platform (GCP) with services such as Cloud Dataflow, Cloud Dataprep, Cloud Dataproc, and BigQuery.
- Proficient in Python, SQL, and scripting languages like Bash or PowerShell.
- Experience with data processing frameworks and libraries such as Apache Spark, Apache Beam, and pandas.
- Knowledge of data warehousing concepts, data modeling techniques (e.g., star schema, dimensional modeling), and ETL/ELT processes (a minimal star-schema sketch follows this list).
- Familiarity with big data technologies and frameworks such as Apache Hadoop and Apache Kafka.
- Understanding of data governance, data security, and compliance best practices.
- Experience with containerization technologies like Docker and Kubernetes.
- Strong problem-solving, analytical, and troubleshooting skills.
- Excellent communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Familiarity with agile methodologies and DevOps practices.
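As a rough illustration of the dimensional-modeling knowledge listed above, the sketch below creates a hypothetical star schema (one customer dimension and one sales fact table) in BigQuery using the Python client library. The dataset, table, and column names are assumptions made for the example only.

```python
# Minimal star-schema sketch for an assumed "analytics" dataset in BigQuery.
# All names below are hypothetical placeholders.
from google.cloud import bigquery

DDL_STATEMENTS = [
    # Dimension table: one row per customer, descriptive attributes only.
    """
    CREATE TABLE IF NOT EXISTS analytics.dim_customer (
        customer_key INT64,
        customer_id  STRING,
        region       STRING
    )
    """,
    # Fact table: one row per order, with a foreign key to the dimension
    # (customer_key) plus additive measures such as amount.
    """
    CREATE TABLE IF NOT EXISTS analytics.fact_sales (
        order_id     STRING,
        customer_key INT64,
        order_date   DATE,
        amount       NUMERIC
    )
    """,
]


def create_star_schema():
    client = bigquery.Client()  # uses Application Default Credentials
    for ddl in DDL_STATEMENTS:
        client.query(ddl).result()  # run each DDL statement and wait for it


if __name__ == "__main__":
    create_star_schema()
```

In a star schema, analysts join the narrow fact table to the wider dimension tables at query time, which keeps measures additive and descriptive attributes in one place.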
While expertise in Google Cloud Platform (GCP) is preferred, candidates with experience in other cloud platforms like AWS or Azure will also be considered, as the core data engineering principles and technologies are transferable across cloud providers.