Master's Intern, Distributed Systems/AI Infrastructure

About Odyn Network

Odyn Network is at the forefront of artificial intelligence innovation, building transformative AI solutions that demand cutting-edge, high-performance infrastructure. Our mission is to accelerate AI development through scalable, efficient, and reliable systems. We are redefining the future of distributed computing by providing instant access to a global network of GPUs for AI workloads.

Role Overview

We are seeking exceptional Master's students pursuing degrees in Computer Engineering, Computer Science, or related fields to join our team as Research Interns. This internship offers hands-on experience working alongside our senior engineering team on real-world projects involving GPU infrastructure, distributed computing, and AI orchestration platforms.

As an intern, you will contribute meaningful work to projects that directly impact our production systems while gaining exposure to cutting-edge technologies in GPU computing, Kubernetes orchestration, high-performance networking, and cloud-native infrastructure.

What You'll Work On

Interns will be integrated into one of our core technical areas and will work on projects such as:

GPU Infrastructure & Orchestration

  • Assist in designing and implementing GPU cluster management solutions using Kubernetes, Slurm, or Ray
  • Develop scripts and tools for monitoring GPU utilization, telemetry, and performance metrics
  • Contribute to resource scheduling algorithms and workload optimization strategies
  • Profile and analyze CUDA/NCCL performance in distributed training environments

Distributed Systems & Networking

  • Work on data transfer optimization and high-speed networking configurations (InfiniBand, RoCE)
  • Develop tools for fault tolerance monitoring and resilience testing
  • Contribute to software-defined networking (SDN) solutions for compute clusters
  • Implement checkpointing and graceful eviction mechanisms for distributed workloads

Research & Development

  • Research and prototype advanced scheduling and resource allocation algorithms
  • Investigate performance optimization techniques across heterogeneous GPU hardware
  • Contribute to technical documentation, white papers, and internal research reports
  • Explore innovations in AI model deployment, inference serving, and workload orchestration

Platform Development

  • Build automation tools using Python, Go, or C++ for infrastructure management
  • Develop Kubernetes operators, controllers, or admission webhooks
  • Create dashboards and visualization tools for system monitoring using Prometheus/Grafana
  • Contribute to open-source projects and internal tooling libraries


What We're Looking For

Required Qualifications

  • Currently enrolled in a Master's program in Computer Engineering, Computer Science, or a related field
  • Strong programming skills in Python, C++, Go, or similar languages
  • Understanding of computer architecture, operating systems, and networking fundamentals
  • Solid grasp of data structures, algorithms, and software engineering principles
  • Excellent problem-solving skills and analytical mindset
  • Strong written and verbal communication skills
  • Self-motivated, with the ability to work both independently and collaboratively in a fast-paced environment

Preferred Qualifications

  • Coursework or projects involving GPU computing, CUDA, parallel programming, or distributed systems
  • Experience with containerization technologies (Docker, Kubernetes) or orchestration platforms
  • Exposure to machine learning frameworks (PyTorch, TensorFlow) or AI workloads
  • Familiarity with high-performance computing (HPC), cluster computing, or cloud platforms (AWS, GCP, Azure)
  • Knowledge of networking protocols, Linux internals, or systems programming
  • Prior internship or research experience in related fields
  • Contributions to open-source projects or personal projects demonstrating technical skills
  • Understanding of infrastructure-as-code tools (Terraform, Ansible, Helm)


What We Offer

Professional Development

  • Mentorship from experienced engineers and researchers in GPU computing and distributed systems
  • Hands-on experience with production-scale infrastructure powering AI workloads
  • Exposure to cutting-edge technologies including H100/A100 GPUs, Kubernetes, Ray, and cloud-native platforms
  • Opportunity to contribute to open-source projects and publish technical work
  • Professional networking opportunities with industry experts and academic researchers

Compensation & Benefits

  • Competitive monthly compensation
  • Remote/flexible work arrangements with optional in-person collaboration in London
  • Access to state-of-the-art GPU infrastructure and development tools
  • Potential for return internship offers or full-time employment opportunities post-graduation

Work Environment

  • Collaborative, inclusive culture that values diverse perspectives and innovative thinking
  • Flexible working hours that accommodate academic schedules
  • Regular technical talks, workshops, and learning sessions
  • Flat organizational structure with direct access to senior leadership
  • Startup environment with opportunity to make significant impact from day one


Odyn Network is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and interns. We encourage applications from candidates of all backgrounds, including those historically underrepresented in technology and engineering fields.