Gruve Inc.

Software Engineer (Platform Engineering)

Job Description

As a Software Developer in GPU Infrastructure Automation, you will be responsible for designing, developing, and optimizing software solutions that effectively manage and schedule GPU resources. You will work closely with various software teams to ensure seamless integration and optimal performance of our GPU infrastructure.

Key Responsibilities:

Design and implement GPU cluster management and observability tools.

Develop tools and APIs for other computational layers.

Conduct performance profiling and optimization using tools like NVIDIA Nsight.

Participate in code reviews, design discussions, and continuous integration/continuous deployment (CI/CD) processes.

Validate GPU cluster performance with benchmarking tools like MLPerf.

Implement and maintain synchronization mechanisms for managing concurrency and shared resources.

Develop an infrastructure software toolkit for GPU clustering, capacity, and scheduling automation.

Required Skills and Qualifications:

Bachelor's or Master's degree in Computer Science or a related field.

Strong proficiency in Golang, C/C++, and experience with GPU schedulers like SLURM.

Strong proficiency in Kubernetes (K8s) technologies.

Strong proficiency in at least one public cloud infrastructure and PaaS platform (AWS, GCP, Azure).

In-depth understanding of GPU architectures and parallel computing principles.

Excellent understanding of REST APIs and experience with threading, concurrency, and synchronization mechanisms.

Knowledge of Linux operating systems.

Familiarity with scheduling algorithms and load balancing techniques.

Strong understanding of data structures, algorithms, and numerical methods.

Proficient in creating and using well-structured CI/CD pipelines.

Excellent problem-solving skills and attention to detail.

Strong communication and teamwork abilities.

More Info

Industry: Other

Function: Technology

Job Type: Permanent Job

Date Posted: 18/06/2024

Job ID: 82139449

Last Updated: 18-06-2024 02:17:35 PM