Preferred - Data Engineering Background
Required Skills - GCP data engineering experience, BigQuery, SQL, Cloud Composer/Python, Cloud Functions, Dataproc + PySpark, Python ingestion, Dataflow + Pub/Sub
Job Requirements:
- Has architected and implemented solutions on Google Cloud Platform (GCP) using its core components.
- Experience with Apache Beam/Google Dataflow/Apache Spark in creating end-to-end data pipelines.
- Experience in some of the following: Python, Hadoop, Spark, SQL, BigQuery, Bigtable, Cloud Storage, Datastore, Spanner, Cloud SQL, Machine Learning.
- Programming experience in Java, Python, or similar languages.
- Expertise in at least two of these technologies: relational databases, analytical databases, NoSQL databases.
- Google Professional Data Engineer or Solution Architect certification is a major advantage.
Skills Required:
- 3+ years of experience in IT, or professional services experience in IT delivery or large-scale IT analytics projects.
- Candidates must have expert knowledge of Google Cloud Platform; experience with other cloud platforms is nice to have.
- Expert knowledge in SQL development.
- Expertise in building data integration and preparation tools using cloud technologies (e.g., SnapLogic, Google Dataflow, Cloud Dataprep, Python).
- Identify downstream implications of data loads/migrations (e.g., data quality, regulatory requirements).
- Implement data pipelines to automate the ingestion, transformation, and augmentation of data sources, and provide best practices for pipeline operations.
- Ability to work in a rapidly changing business environment and to enable simplified user access to massive datasets by building scalable data solutions.
- Advanced SQL writing skills and experience in data mining (SQL, ETL, data warehousing, etc.) and in using databases in a business environment with complex datasets.