Master's Intern, Distributed Systems/AI Infrastructure

About Odyn Network

Odyn Network is at the forefront of artificial intelligence innovation, building transformative AI solutions that demand cutting-edge, high-performance infrastructure. Our mission is to accelerate AI development through scalable, efficient, and reliable systems. We are redefining the future of distributed computing by providing instant access to a global network of GPUs for AI workloads.

Role Overview

We are seeking exceptional Master's students pursuing degrees in Computer Engineering, Computer Science, or related fields to join our team as Research Interns. This internship offers hands-on experience working alongside our senior engineering team on real-world projects involving GPU infrastructure, distributed computing, and AI orchestration platforms.

As an intern, you will contribute meaningful work to projects that directly impact our production systems while gaining exposure to cutting-edge technologies in GPU computing, Kubernetes orchestration, high-performance networking, and cloud-native infrastructure.

What You'll Work On

Interns will be integrated into one of our core technical areas and will work on projects such as:

GPU Infrastructure & Orchestration

  • Assist in designing and implementing GPU cluster management solutions using Kubernetes, Slurm, or Ray
  • Develop scripts and tools for monitoring GPU utilization, telemetry, and performance metrics
  • Contribute to resource scheduling algorithms and workload optimization strategies
  • Profile and analyze CUDA/NCCL performance in distributed training environments

Distributed Systems & Networking

  • Work on data transfer optimization and high-speed networking configurations (InfiniBand, RoCE)
  • Develop tools for fault tolerance monitoring and resilience testing
  • Contribute to software-defined networking (SDN) solutions for compute clusters
  • Implement checkpointing and graceful eviction mechanisms for distributed workloads

Research & Development

  • Research and prototype advanced scheduling and resource allocation algorithms
  • Investigate performance optimization techniques across heterogeneous GPU hardware
  • Contribute to technical documentation, white papers, and internal research reports
  • Explore innovations in AI model deployment, inference serving, and workload orchestration

Platform Development

  • Build automation tools using Python, Go, or C++ for infrastructure management
  • Develop Kubernetes operators, controllers, or admission webhooks
  • Create dashboards and visualization tools for system monitoring using Prometheus/Grafana
  • Contribute to open-source projects and internal tooling libraries


What We're Looking For

Required Qualifications

  • Currently enrolled in a Master's program in Computer Engineering, Computer Science, or a related field
  • Strong programming skills in Python, C++, Go, or similar languages
  • Understanding of computer architecture, operating systems, and networking fundamentals
  • Solid grasp of data structures, algorithms, and software engineering principles
  • Excellent problem-solving skills and analytical mindset
  • Strong written and verbal communication skills
  • Self-motivated, with the ability to work both independently and collaboratively in a fast-paced environment

Preferred Qualifications

  • Coursework or projects involving GPU computing, CUDA, parallel programming, or distributed systems
  • Experience with containerization technologies (Docker, Kubernetes) or orchestration platforms
  • Exposure to machine learning frameworks (PyTorch, TensorFlow) or AI workloads
  • Familiarity with high-performance computing (HPC), cluster computing, or cloud platforms (AWS, GCP, Azure)
  • Knowledge of networking protocols, Linux internals, or systems programming
  • Prior internship or research experience in related fields
  • Contributions to open-source projects or personal projects demonstrating technical skills
  • Understanding of infrastructure-as-code tools (Terraform, Ansible, Helm)


What We Offer

Professional Development

  • Mentorship from experienced engineers and researchers in GPU computing and distributed systems
  • Hands-on experience with production-scale infrastructure powering AI workloads
  • Exposure to cutting-edge technologies including H100/A100 GPUs, Kubernetes, Ray, and cloud-native platforms
  • Opportunity to contribute to open-source projects and publish technical work
  • Professional networking opportunities with industry experts and academic researchers

Compensation & Benefits

  • Competitive monthly compensation
  • Remote/flexible work arrangements with optional in-person collaboration in London
  • Access to state-of-the-art GPU infrastructure and development tools
  • Potential for return internship offers or full-time employment opportunities post-graduation

Work Environment

  • Collaborative, inclusive culture that values diverse perspectives and innovative thinking
  • Flexible working hours that accommodate academic schedules
  • Regular technical talks, workshops, and learning sessions
  • Flat organizational structure with direct access to senior leadership
  • Startup environment with opportunity to make significant impact from day one


Odyn Network is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees and interns. We encourage applications from candidates of all backgrounds, including those historically underrepresented in technology and engineering fields.