HPC Engineer

Job Details
We are seeking a skilled and motivated HPC Engineer to support our High-Performance Computing (HPC) and Digital Platform operations. This role plays a critical part in maintaining and optimizing Linux-based systems in a complex, data-driven environment. You’ll work alongside a global IT team to ensure seamless operation of our HPC and cloud infrastructure, supporting scientific discovery and data-intensive workloads.

This position requires flexibility to work variable shifts, including occasional evenings, weekends, or holiday coverage based on business and system support needs.

Key Responsibilities:

Install, configure, and maintain Linux systems and related applications across HPC and cloud environments
Monitor system performance, analyze logs, and proactively identify and address potential issues
Provide technical support to end users, resolving system, hardware, and software issues
Manage system backups, software upgrades, and security patches
Support in-house software, troubleshoot performance issues, and ensure adherence to IT policies
Utilize ticketing systems to manage and resolve support requests efficiently
Collaborate with cross-functional teams to support evolving business and research needs
Contribute to automation efforts using tools like Ansible, GitLab, Puppet, or equivalent
Support job scheduling and workload management using SLURM (preferred)
Stay current with evolving technologies and best practices in HPC and cloud computing

Required Skills & Qualifications:

Bachelor’s degree in Computer Science, IT, Engineering, or a related field – or equivalent work experience
1–5 years of hands-on Linux system administration experience, preferably in an HPC environment
Proficient in scripting (Bash, Python, or Perl)
Experience with Docker and container orchestration
Familiarity with configuration management tools (Ansible, Chef, Puppet, Salt, etc.)
Exposure to SQL-based databases such as MySQL or MariaDB
Strong troubleshooting, problem-solving, and communication skills
Ability to work variable shifts in a 24/7 environment as needed

Preferred Qualifications:

Experience with SLURM workload manager
Exposure to DevOps practices and tools (CI/CD, Kubernetes, OpenStack)
Understanding of hardware infrastructure, including CPU, GPU, and storage systems
Cloud administration experience (AWS, Azure, GCP, etc.)
Certifications such as CompTIA Network+, CCNA, or ITIL Foundation

Location: Houston, Texas

Schedule: Hybrid – 4 days onsite, 1 day work-from-home
Employment Type: Full-Time | Variable Shifts Required