Machine Learning Infrastructure Engineer

Menlo Park, CA | On-Site | Full-Time/Direct Hire

Inference Optimization is a MUST 

Looking for ML infrastructure experts (Bay Area preferred) with deep experience in CUDA, GPU optimization, vLLM, and LLM inference. Pure language focus; no vision or audio.

This is for our diffusion-LLM startup client in Menlo Park, CA.

We’re looking for an ML Infrastructure Engineer to help build the infrastructure that powers large-scale model training and real-time inference. You’ll collaborate with world-class researchers and engineers to design high-performance, distributed systems that bring advanced LLMs into production.

Responsibilities

  • Design and manage distributed infrastructure for ML training at scale
  • Optimize model serving systems for low-latency inference
  • Build automated pipelines for data processing, model training, and deployment
  • Implement observability tools to monitor performance in production
  • Maximize resource utilization across GPU clusters and cloud environments
  • Translate research requirements into robust, scalable system designs

Must-Haves

  • Master's or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
  • Strong foundation in software engineering, systems design, and distributed systems
  • Experience with cloud platforms (AWS, GCP, or Azure)
  • Proficiency in Python and at least one systems-level language (C++, Rust, or Go)
  • Hands-on experience with Docker, Kubernetes, and CI/CD workflows
  • Familiarity with ML frameworks like PyTorch or TensorFlow from a systems perspective