Job Description:
5+ years of experience with different flavours of Linux such as SLES, RHEL and Ubuntu/Debian.
Experience managing HPC clusters, with a good understanding of their architecture.
Skilled in installation and configuration of various applications on Linux.
Install, administer, and maintain hardware, system software, networking, accounts, and security measures in a VMware environment.
Diagnose and resolve system issues and performance issues.
Should have experience in drafting technical SOPs, action plans and knowledge documents.
Should have a good understanding of different cloud platforms.
Restore system integrity as quickly as possible following an outage to minimize downtime.
Triage and solve user-submitted tickets, especially when they relate to the infrastructure.
Track resource usage using monitoring and queuing software.
Willingness to assist peers is an added advantage.
Technical Skills:
Demonstrated expertise with Linux system administration, including OS, networking, storage, Docker and security.
Experience with high-speed networking such as InfiniBand and 10/40 Gigabit Ethernet.
Familiarity with large storage systems (Lustre, GPFS, others).
Experience with HPC cluster managers (xCAT, HPCM, Bright Cluster Manager).
Experience in server hardware patching and troubleshooting.
Experience managing HPC clusters and GPUs.
Experience using and supporting job schedulers such as SLURM, PBS or other schedulers.
Familiar with Shell/Python scripting and Ansible.
Familiar with monitoring tools like Grafana/Nagios/OpsRamp.
Familiar with virtualization technologies like KVM, VMware, vCenter.
Date Posted: 29/10/2024
Job ID: 98434113