Ford

Software Engineer

Job Description

  • Bachelor's degree in Computer Science, Data Engineering, or a related field.
  • 3+ years of experience with Python.
  • 3+ years of experience with SQL and NoSQL databases.
  • 1+ years of experience building data pipelines from scratch in a highly distributed and fault-tolerant manner using Apache Airflow and Python.
  • Experience with GCP cloud services (e.g., BigQuery, Cloud Run, Dataflow, Cloud SQL, GCS, Cloud Functions, and Pub/Sub); a brief BigQuery example follows this list.
  • Strong understanding of database management, data modeling, and ETL processes.
  • Familiarity with internal data sources and the ability to extract and transform data effectively.
  • Experience in creating database schemas, views, indexes, and stored procedures, and in managing data pipelines.
  • Proficiency in Apache Airflow and PySpark for task automation and data processing.
  • Ability to work collaboratively in a team environment and communicate effectively with stakeholders.
  • Strong problem-solving skills and attention to detail in data processing tasks.
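
As a rough illustration of the BigQuery, SQL, and Python requirements above, here is a minimal sketch of running a parameterized query with the official google-cloud-bigquery client. The project, dataset, table, and column names are hypothetical placeholders, not actual Ford resources.

    # Minimal sketch: run a parameterized BigQuery query from Python.
    # Project, dataset, table, and column names below are hypothetical.
    from google.cloud import bigquery

    def fetch_recent_api_usage(project_id="example-gcp-project"):
        """Return recent API-usage rows as a list of dictionaries."""
        client = bigquery.Client(project=project_id)

        query = """
            SELECT subscription_id, usage_date, api_calls
            FROM `example-gcp-project.analytics.api_usage`
            WHERE usage_date >= @start_date
        """
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("start_date", "DATE", "2024-06-01"),
            ]
        )

        # result() waits for the query job to finish, then streams the rows.
        rows = client.query(query, job_config=job_config).result()
        return [dict(row) for row in rows]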

Desired Skills

  • Familiarity with big data and machine learning tools and platforms.
  • Comfortable with open-source technologies, including Apache Spark, Hadoop, and Kafka.
  • Knowledge of database systems (SQL, NoSQL).
  • Understanding of DevOps practices and tools.

Responsibilities

You will be responsible for the development, support, and maintenance of a brand-new offering under Ford's Integrated Services pillar. Your work will involve:

  • Work with Python frameworks such as Django, Flask, or FastAPI.
  • Utilize Apache Airflow, Apache Spark, and Python to design and implement robust data engineering solutions.
  • Gain a deep understanding of internal Ford data sources to effectively extract, transform, and load data for analysis.
  • Develop database schemas, indexes and views within each data source to optimize data processing.
  • Define Apache Airflow tasks and explore PySpark for handling task failures and data processing challenges.
  • Establish and maintain an internal database for managing subscription and API usage data, and ensure data integrity and security.
  • Collaborate with the internal data factory team to establish data pipelines, synchronize data, and perform ETL activities.
  • Implement Apache Airflow tasks in Python to automate data synchronization processes and maintain data consistency (a minimal DAG sketch follows this list).
  • Design, develop, and optimize data pipelines using BigQuery to extract, transform, and load data from diverse sources.
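
The Airflow automation described in this list might look roughly like the following DAG. This is a minimal sketch under assumed names: the DAG id, task id, schedule, and the sync_subscription_data callable are illustrative placeholders rather than Ford's actual pipeline.

    # Minimal sketch: a daily Airflow DAG that retries a Python sync task.
    # The DAG id, task id, and the sync logic are illustrative placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def sync_subscription_data(**context):
        """Placeholder for the real extract/load step (e.g., source DB -> BigQuery)."""
        print("Syncing subscription data for", context["ds"])

    with DAG(
        dag_id="subscription_data_sync",
        start_date=datetime(2024, 6, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={
            "retries": 3,                          # retry failed tasks for fault tolerance
            "retry_delay": timedelta(minutes=10),
        },
    ) as dag:
        PythonOperator(
            task_id="sync_subscription_data",
            python_callable=sync_subscription_data,
        )

The retries and retry_delay entries in default_args provide the basic fault tolerance the listing asks for; a production pipeline would typically add alerting, backfill handling, and PySpark or BigQuery steps for the heavy lifting.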

More Info

Industry: Other

Function: Data Engineering

Job Type: Permanent Job

Location: Chennai

Date Posted: 10/06/2024

Job ID: 81328847

Last Updated: 10-06-2024 01:21:52 PM