Whisper ASR Engineer
Software Developer
Overview:
We’re seeking a skilled developer to work with the Whisper Automatic Speech Recognition (ASR) codebase and optimize it for real-time captioning, transcription, and translation applications. This role will focus on improving throughput and accuracy while reducing latency.
Key Responsibilities:
- Work directly with the open-source Whisper ASR model and its underlying PyTorch codebase.
- Optimize inference speed and memory efficiency for real-time or near-real-time transcription.
- Implement low-latency streaming pipelines for audio capture, buffering, and incremental transcription.
- Profile and improve GPU/CPU utilization, applying model quantization or pruning where applicable.
- Develop and maintain API endpoints, CLI tools, or SDKs for Whisper-based real-time services.
- Collaborate with hardware, firmware, and product teams to ensure compatibility with on-premises embedded systems.
- Stay current on emerging speech recognition architectures and contribute ideas for R&D improvements.
- Conduct performance benchmarking, regression testing, and documentation of all optimizations.
Required Skills & Qualifications:
- Strong experience with Python, PyTorch, and TensorRT.
- Familiarity with OpenAI Whisper or other ASR architectures (wav2vec2, Conformer, DeepSpeech, etc.).
- Experience with real-time or streaming data processing (e.g., WebSockets, asyncio, FFmpeg).
- Experience working with LLMs and designing robust systems that utilize them.
- Background in CUDA optimization and GPU acceleration techniques.
- Understanding of audio preprocessing, signal processing, and speech-to-text workflows.
- Experience working in Linux environments.
- Strong debugging, profiling, and performance tuning skills.
Preferred Qualifications:
- Prior work on low-latency broadcast or captioning systems.
- Knowledge of Python and C++ for performance-critical components.
- Experience deploying models on embedded devices (NVIDIA Jetson, such as Orin, or similar).
- Familiarity with real-time translation or multi-language ASR.