Job Scope:
The AI Infrastructure Lead will be responsible for overseeing the design, implementation, and optimization of infrastructure solutions tailored to support AI and machine learning workloads. This role requires expertise in network architecture, cloud computing, security solutions, and automation tools.
Job Responsibilities:
Lead projects from conception through design and implementation, ensuring infrastructure solutions meet the computational needs of AI teams.
Design and architect secure, scalable network infrastructure, drawing on a strong background in network architectures such as Cisco ACI.
Implement and manage security solutions including WAF, DoS/DDoS protection, SSL, IPS, and other relevant security measures.
Apply a deep understanding of NVIDIA GPUs, NVSwitch, and NVLink architecture to incorporate specialized hardware for optimized AI workloads.
Architect and deploy cloud infrastructure solutions, including master nodes, worker nodes, pods, and containers, optimized for AI and machine learning tasks.
Integrate workloads with the network layer, leveraging detailed knowledge of routing and switching technologies.
Develop and implement cloud automation and orchestration approaches using tools such as Terraform and Python to streamline deployment and management tasks.
Apply experience in service decomposition and microservices architecture, along with expertise in traditional multi-tier application architectures.
Lead end-to-end cloud migration and transformation projects, designing and engineering solutions for seamless transition.
Collaborate closely with AI teams to understand computational needs and translate them into infrastructure requirements.
Monitor, manage, and optimize cloud resources to maximize performance and minimize costs, utilizing automation scripts and infrastructure as code.
Stay updated on new data center (DC) technology approaches and make timely decisions on adopting advancements that improve overall data center capabilities.
Document solutions through high-level design (HLD) and low-level design (LLD) documents covering end-to-end DC and cloud requirements.
Good-to-Have Skills:
Experience with containerization technologies such as Docker and Kubernetes.
Knowledge of cloud computing delivery models (IaaS, PaaS, SaaS) and deployment models (public, private, and hybrid cloud).
Previous experience in solutions design and engineering, particularly in the context of cloud infrastructure.
Familiarity with AI and machine learning concepts and workflows, with the ability to align infrastructure solutions accordingly.
Behavioral Attributes:
Qualification and Experience:
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
Minimum of 10 years of experience in infrastructure architecture, cloud computing, and network design.
Proven track record of leading infrastructure projects, particularly in the context of AI and machine learning workloads.
Experience with cloud automation tools, programming languages such as Python, and infrastructure-as-code practices.
Date Posted: 11/07/2024
Job ID: 84183769