Expertise in AWS services such as EC2, CloudFormation, S3, IAM, SNS, SQS, EMR, Athena, Glue, and Lake Formation.
Expertise in Hadoop/EMR/Databricks, with strong debugging skills for resolving Hive and Spark issues.
Sound grasp of database fundamentals and experience with relational and non-relational databases (SQL, key-value, graph, etc.).
Experience in infrastructure provisioning using CloudFormation, Terraform, Ansible, etc.
Experience in programming languages such as Python/PySpark.
Excellent written and verbal communication skills.
Key Responsibilities
Work closely with the data lake engineers to provide technical guidance and consultation and to resolve their queries.
Assist in developing best practices, processes, technology and solution patterns, and automation (including CI/CD) for simple and advanced analytics.
Work collaboratively with various stakeholders in the US team.
Develop data pipelines in Python/PySpark to run in the AWS cloud.
Set up analytics infrastructure in AWS using CloudFormation templates.
Develop mini/micro-batch and streaming ingestion patterns using Kinesis/Kafka.
Upgrade applications seamlessly to newer platform versions (for example, Spark/EMR upgrades).
Participate in code reviews of the developed modules and applications.
Provide input for formulating best practices for ETL processes and jobs written in languages such as PySpark, and for BI processes.
Work with column-oriented storage formats such as Parquet, interactive query services such as Athena, and event-driven compute services such as Lambda.
Research the latest big data tools on the market, perform comparative analysis, and recommend the best tool for the enterprise's current and future needs.
Required Qualifications
Bachelor's or Master's degree in Computer Science or a similar field.
2-4 years of strong experience in big data development.