Whisper ASR Engineer

 Software Developer


 

Overview:

We’re seeking a skilled developer to work with the Whisper Automatic Speech Recognition (ASR) codebase and optimize it for real-time captioning, transcription, and translation applications. This role will focus on improving performance, latency, and accuracy.


 

Key Responsibilities:

 

  • Work directly with the open-source Whisper ASR model and its underlying PyTorch codebase.
  • Optimize inference speed and memory efficiency for real-time or near-real-time transcription.
  • Implement low-latency streaming pipelines for audio capture, buffering, and incremental transcription.
  • Profile and improve GPU/CPU utilization, and apply model quantization or pruning where applicable.
  • Develop and maintain API endpoints, CLI tools, or SDKs for Whisper-based real-time services.
  • Collaborate with hardware, firmware, and product teams to ensure compatibility with on-premises embedded systems.
  • Stay current on emerging speech recognition architectures and contribute ideas for R&D improvements.
  • Conduct performance benchmarking, regression testing, and documentation of all optimizations.


 

Required Skills & Qualifications:

 

  • Strong experience with Python, PyTorch, and TensorRT.
  • Familiarity with OpenAI Whisper or other ASR architectures (wav2vec2, Conformer, DeepSpeech, etc.).
  • Experience with real-time or streaming data processing (e.g., WebSockets, asyncio, FFmpeg).
  • Experience working with LLMs and designing robust systems that utilize them.
  • Background in CUDA optimization and GPU acceleration techniques.
  • Understanding of audio preprocessing, signal processing, and speech-to-text workflows.
  • Experience working in Linux environments.
  • Strong debugging, profiling, and performance tuning skills.


 

Preferred Qualifications:

 

  • Prior work on low-latency broadcast or captioning systems.
  • Knowledge of Python and C++ for performance-critical components.
  • Experience deploying models on embedded devices (Jetson, Orin, or similar).
  • Familiarity with real-time translation or multi-language ASR.