In This Role, You Will:
Data infrastructure and processing:
- Lead the design and development of data pipelines for seamless integration of data from various sources into the G2 Data Platform.
- Optimize data pipelines, ensuring cost effectiveness, scalability, and reliability.
- Continuously evaluate emerging tools and practices to keep the data stack current.
- Fulfill data requests from users of the G2 Data Platform.
- Actively contribute to data modeling and design reviews, striving for improved adoption and efficiency.
- Execute the project tasks aligned with project timelines and objectives under the guidance of senior team members.
- Develop repeatable, scalable data-processing code that makes data available in the platform as close to real time as possible.
- Actively contribute to the development and advancement of the data platform.
- Promote architectural changes that increase the scalability of our data infrastructure while maintaining efficiency in all phases of the development lifecycle.
Data quality assurance and governance:
- Learn and adopt best practices in data engineering to contribute to robust solutions.
- Own the implementation of data quality and data governance initiatives and drive them to completion.
- Recommend privacy and security standards and ensure the data platform complies with them.
- Document the data architecture, data models, and workflows, including how the data should be consumed.
Mentorship and collaboration:
- Guide junior engineers by providing technical support, expertise, best practices, and constructive feedback on data engineering techniques.
- Collaborate with peers, actively participating in knowledge-sharing sessions and contributing to a supportive team environment.
- Seek guidance and mentorship from senior team members to enhance technical and analytical skills.
Minimum Qualifications:
- 4+ years of experience as a data engineer or ETL developer.
- 2+ years of development experience with sound skills in data modeling, optimization, and database architecture.
- Experience in the design and development of data pipelines using cloud and open-source tools.
- Proficiency in writing and debugging SQL queries.
- Good programming skills in Python or Java.
- Strong knowledge of performance tuning, optimization, and debugging of data pipelines.
- Working knowledge of AWS data services (DynamoDB, RDS, Data Pipeline, EMR, Lambda, Glue, ECS, etc.) and cloud data warehouses such as Snowflake.
- Proficiency in handling structured and unstructured data.
- Proficiency in ETL/ELT tools like AWS Glue, Step Functions, Data Pipeline, Airflow, Airbyte, and dbt.
- Familiarity with distributed computing and frameworks like Apache Spark, Hadoop, and Apache Kafka for handling large volumes of data.
- Familiarity with software engineering principles and best practices.
What Can Help Your Application Stand Out:
- Experience with Docker and Kubernetes.
- Proficiency in data modeling, schema design, and optimizing data structures for performance in Snowflake.
- Working experience in startup environments.
- Experience with Agile methodologies, CI/CD automation, and Test-Driven Development.
- Knowledge of data governance, security, and compliance standards within cloud-based data solutions.
- Understanding of reporting tools such as Tableau, QlikView, Looker, Power BI, etc.
- Database administration background.