Whisper ASR Engineer

Software Developer

Overview:

We’re seeking a skilled developer to work with the Whisper Automatic Speech Recognition (ASR) codebase and optimize it for real-time captioning, transcription, and translation applications. This role will focus on improving throughput, reducing latency, and increasing accuracy.

Key Responsibilities:

Work directly with the open-source Whisper ASR model and underlying PyTorch codebase.
Optimize inference speed and memory efficiency for real-time or near-real-time transcription.
Implement low-latency streaming pipelines for audio capture, buffering, and incremental transcription.
Profile and improve GPU/CPU utilization, applying model quantization or pruning where applicable.
Develop and maintain API endpoints, CLI tools, or SDKs for Whisper-based real-time services.
Collaborate with hardware, firmware, and product teams to ensure compatibility with on-premise embedded systems.
Stay current on emerging speech recognition architectures and contribute ideas for R&D improvements.
Conduct performance benchmarking, regression testing, and documentation of all optimizations.

Required Skills & Qualifications:

Strong experience with Python, PyTorch, and TensorRT.
Familiarity with OpenAI Whisper or other ASR architectures (wav2vec2, Conformer, DeepSpeech, etc.).
Experience with real-time or streaming data processing (e.g., WebSockets, asyncio, FFmpeg).
Experience working with LLMs and designing robust systems that utilize them.
Background in CUDA optimization and GPU acceleration techniques.
Understanding of audio preprocessing, signal processing, and speech-to-text workflows.
Experience working in Linux environments.
Strong debugging, profiling, and performance tuning skills.

Preferred Qualifications:

Prior work on low-latency broadcast or captioning systems.
Knowledge of Python and C++ for performance-critical components.
Experience deploying models on embedded devices (Jetson, Orin, or similar).
Familiarity with real-time translation or multi-language ASR.