Simplismart

Infrastructure Engineer

Job Description

About Simplismart

A bit about our product: Simplismart is an MLOps platform with three major suites:

  • Training suite: Assemble and train any model, including LLMs, vision, audio, tabular, and tree models.
  • Deployment suite: Most companies fail to make models production-ready. Our proprietary model deployment suite is 6x faster than Hugging Face's enterprise suite and 12x faster than replicate.ai. Users can easily deploy and auto-scale models trained on Simplismart (which are further optimised), import any model from Hugging Face, or bring their own PyTorch, TensorFlow, ONNX, or JAX artefact.
  • Observability suite: Monitor model health, including load, latency, uptime, data drift, and concept drift.

Position Overview

As an Infrastructure Engineer, you will help build a highly available, global, multi-cloud PaaS platform on open-source technologies to support Simplismart's rapid growth. The platform spans diverse environments (Kubernetes, VMs, and bare-metal compute) and provides a cohesive, reliable abstraction for running AI workloads. You will work with cutting-edge technologies and solve complex problems.

To be successful in this role, you need to be deeply technical, communicate and collaborate well, and have experience with infrastructure-as-code. Proficiency with tools such as Terraform and Ansible, along with strong software development fundamentals, is essential. You should also have solid systems knowledge and strong troubleshooting abilities.

Requirements

  • 5+ years of experience in platform engineering and in writing high-performance, well-tested, production-quality code.
  • Proficiency in at least one backend programming language (Python preferred; C++ is a plus).
  • Demonstrated experience with high-performance or distributed cloud microservices architectures.
  • Ideally, experience building and operating systems globally across multiple cloud providers such as AWS, Azure, or GCP.
  • A good understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Pragmatic, methodical, well-organized, detail-oriented, and self-starting.
  • Experience with Kubernetes, containerization, Terraform, and Ansible.
  • Experience with PyTorch or TensorFlow is a plus but not required.
  • Knowledge of GPU programming, NCCL, and CUDA is a plus.

Responsibilities

  • Designing the high-level architecture of the MLOps platform from the ground up.
  • Handling formalisation of diverse GPU-based workloads.
  • Developing a robust internal system for continuous deployment of various services and modules in diverse environments.
  • Creating frameworks for reliable, fault-tolerant systems that run mission-critical workloads.

Skills And Attributes

  • Deep technical expertise.
  • Strong communication and collaboration skills.
  • Experience in infrastructure-as-code (Terraform, Ansible).
  • Strong software development fundamentals.
  • Good systems knowledge and troubleshooting abilities.
  • Ability to work independently and as part of a team.
  • Proactive and self-motivated.

Why should you join Simplismart?

Let's skip the conventional perks and focus instead on what you WON'T experience here:

  • Legacy System Headaches: You won't have to endlessly grapple with outdated legacy systems that hinder your productivity and creativity.
  • Bossy Culture: At Simplismart, we believe in collaboration and empowerment, not hierarchy. Instead of a boss breathing down your neck, you'll have colleagues who support your growth.
  • Dark Circles: Late nights and overwork are not the norm here. We prioritize work-life balance, ensuring you won't be sporting those tired, dark circles under your eyes.
  • Stagnation: Say goodbye to redundant and stagnant tasks. We thrive on innovation and dynamic challenges that keep you engaged and motivated.

Skills: infrastructure, Prometheus, Grafana, Terraform, Kubernetes

More Info

Industry: Other

Job Type: Permanent

Date Posted: 08/10/2024

Job ID: 95427821

Similar Jobs

Software Engineer II Infrastructure Core

Google Inc

Network Engineer IT Technical Analyst Network Infrastructure

The Human Capital Exchange

Last Updated: 17-11-2024 09:17:29 PM