Machine Learning Engineer

The Chan Zuckerberg Biohub (CZ Biohub SF) is seeking a highly skilled and motivated Machine Learning Engineer to lead the development of state-of-the-art multimodal large language model (LLM) agents that will enable breakthrough research and discoveries in biology. We are interested in pursuing these new ideas for zebrafish, a powerful model organism, to understand mechanisms of infection and immunity, organ regeneration, and organismal development. The ideal candidate will have established expertise in machine learning, self-supervised learning, and pretraining of multimodal models that integrate natural language with another modality, such as omics or imaging data. The successful candidate will report directly to Yasin Şenbabaoğlu (Director of Computational Biology) at CZ Biohub, San Francisco.

You will

  • Design, develop, and help to deploy multimodal LLMs that integrate textual and multi-omic data
  • Lead the research and development of novel algorithms to process and align scientific literature with biological datasets for downstream analysis
  • Collaborate closely with computational biologists and experimental scientists to understand domain-specific challenges and optimize model performance
  • Manage large-scale datasets (scientific texts, omics data, and imaging) and build efficient data pipelines for training and evaluation
  • Mentor junior scientists and engineers, fostering a culture of collaboration and continuous learning across research projects

You have

Required –

  • PhD in Computer Science, Machine Learning, Computational Biology, Bioinformatics, or a related field; or a Master's degree with equivalent experience
  • 3+ years of experience with Python and relevant deep learning libraries (e.g., PyTorch, TensorFlow)
  • 3+ years of experience in designing innovative multimodal AI systems and/or architectures
  • Experience with model deployment, containerization, cloud-based platforms and version control systems
  • Experience in integrating and aligning heterogeneous data sources (text, omics, images) for AI-driven applications
  • Proven track record of impactful publications and conference presentations in relevant areas
  • Excellent problem-solving skills and ability to work in an interdisciplinary environment

Nice to have –

  • Expertise in natural language processing and self-supervised training techniques
  • Familiarity with bioinformatics tools and scientific literature databases (e.g., PubMed, arXiv)
  • Experience with Python backend frameworks (e.g., Flask, FastAPI) and high-performance computing (HPC) environments
  • Strong leadership and project management skills