Overview
The GCP Data Engineer manages and transforms data within Google Cloud Platform, a role central to organizations that rely on data-driven insights for strategic decisions. The Data Engineer designs, implements, and maintains scalable data pipelines that extract, transform, and load data into BigQuery and other analytics platforms, and ensures high-quality data management so that data scientists and analysts can work from reliable datasets for analysis and reporting.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Google Cloud services.
- Implement ETL processes for data ingestion from various sources.
- Optimize and manage data storage solutions in Google Cloud Storage.
- Collaborate with data scientists to understand their data requirements for analytics.
- Develop robust data models to support reporting and dashboard efforts.
- Monitor and troubleshoot data pipeline performance and integrity.
- Implement data security practices and manage access controls.
- Utilize BigQuery to query and analyze large datasets efficiently.
- Automate and schedule jobs for data pipelines to ensure timely delivery.
- Document data flows, architecture, and operational procedures.
- Stay updated on GCP advancements and industry trends.
Required Qualifications
- 7 to 10 years of IT experience.
- A software engineering background, ideally with experience in a complex cloud environment such as Google Cloud Platform.
- Proficiency in Python and SQL.
- Experience building data pipelines and ETL frameworks, both batch and real-time, using Python and GCP tools such as Apache Beam, Dataflow, and Data Fusion.
- Experience using Terraform for Infrastructure as Code.
- Familiarity with Big Data Technologies (Spark, Cloud SQL, BigQuery), preferably in an enterprise environment.
- Experience orchestrating end-to-end data pipelines with tools like Cloud Composer and Dataform is highly desired.
- Experience managing complex and reusable dataflow pipelines is a plus.
- Proficiency with Airflow is essential.
- Experience working with operations and architecture teams to develop scalable and supportable solutions is desired but not mandatory.
- Ability to understand customer needs and make sound judgments.
- Attention to detail and strong problem-solving capabilities to develop and deliver quality solutions.
- Understanding of the current technology landscape and relevant technologies.
Skills: big data technologies, Python, SQL, data modeling, Spark, Google Cloud Platform (GCP), Cloud Storage, ETL processes, Cloud Composer, Dataflow, Terraform, Airflow, BigQuery, data pipelines, Apache Beam, data warehousing, Cloud SQL, Dataform, Data Fusion