Building and maintaining our data infrastructure on the AWS cloud platform, with a specific focus on supporting AI model training, development, and deployment
This includes designing, developing, and implementing data pipelines, ensuring the secure and efficient flow of data for our AI models
Responsibilities
Data for AI: Design and implement data pipelines specifically tailored for AI training and model development
Partner with AI data scientists and engineers to understand their data needs and ensure efficient data access
Implement data pre-processing and feature engineering techniques for optimal model performance
AWS expertise: Design, develop, and maintain data infrastructure on the AWS cloud platform, leveraging services like S3, Redshift, Glue, Lambda, and Kinesis
Build and maintain data lakes and warehouses optimized for AI workloads
Implement automated data transformation, cleansing, and validation processes to ensure data quality
Coding proficiency: Write code using Python, SQL, and other relevant programming languages, with a focus on libraries and frameworks commonly used in AI (e.g., TensorFlow, PyTorch)
Automation and monitoring: Automate data pipelines using CI/CD tools and related techniques
Monitor and troubleshoot data pipelines for performance, reliability, and data quality issues
Qualifications
3 to 8 years of experience as a Data Engineer or in a similar role, with demonstrated experience supporting data pipelines for AI applications and their data requirements
Proven experience with AWS services, including S3, Redshift, Glue, Lambda, and Kinesis
Strong experience with Python and SQL
Strong understanding of AI libraries and frameworks (e.g., TensorFlow, PyTorch)
Experience with data warehousing and data modeling concepts
Excellent problem-solving and analytical skills
Ability to work independently and as part of a collaborative team