The first part of the project aims to provide a speech-to-text solution that transcribes air traffic control (ATC) commands received at the aircraft and displays them in front of the pilot.
The first step is to extend the language models of pre-trained noise-robust speech recognition models (built with the Kaldi ASR toolkit) so that they can decode ATC-specific commands and phrases.
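A language model extension of this kind is ultimately built from n-gram statistics over domain phrases. As a minimal sketch (not the project's actual Kaldi recipe), the hypothetical ATC-style phrases below illustrate how bigram counts and maximum-likelihood probabilities are estimated from a small in-domain corpus:

```python
from collections import Counter

# Hypothetical ATC-style phrases; a real extension would use a large
# transcript corpus of controller/pilot exchanges.
atc_phrases = [
    "cleared for takeoff runway two seven",
    "descend and maintain flight level two four zero",
    "contact tower one one eight decimal seven",
]

def bigram_counts(phrases):
    """Count bigrams over tokens, with <s> and </s> sentence markers."""
    counts = Counter()
    for phrase in phrases:
        tokens = ["<s>"] + phrase.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

def p(w2, w1, counts):
    """Maximum-likelihood bigram probability P(w2 | w1)."""
    total = sum(c for (a, _), c in counts.items() if a == w1)
    return counts[(w1, w2)] / total if total else 0.0

counts = bigram_counts(atc_phrases)
```

In a Kaldi pipeline these statistics (usually with smoothing) would be compiled into the decoding graph; the toy corpus and helper names here are illustrative only.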
We have also experimented with ATC speech signal enhancement by denoising (Wiener filtering, MMSE estimation, deep denoising auto-encoders) to improve speech recognition accuracy.
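To make the Wiener approach concrete, here is a minimal single-channel sketch (an illustration, not the project's exact enhancement chain): each frequency bin of the noisy power spectrum is attenuated by the gain snr / (snr + 1), where snr is estimated by subtracting a noise power estimate:

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=1e-10):
    """Per-bin Wiener gain computed from power spectra."""
    # Estimated a-priori SNR; clamped at zero so the gain stays in [0, 1).
    snr = np.maximum(noisy_power - noise_power, 0.0) / (noise_power + floor)
    return snr / (snr + 1.0)

# Hypothetical per-bin powers for a single analysis frame.
noisy = np.array([4.0, 1.0, 0.5])
noise = np.array([1.0, 1.0, 0.5])
gain = wiener_gain(noisy, noise)   # bins dominated by noise are suppressed
```

In practice the gain would be applied to the short-time spectrum frame by frame and the signal resynthesized by overlap-add; the noise estimate here is assumed given.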
[Figure: Automatic Speech Recognition (ASR) in a nutshell]
Future work includes using an ATC speech corpus (80 hours, available via the Linguistic Data Consortium) to train better acoustic models and to improve the language models for ATC grammar. We are also preparing noise-adulterated data (matching the voice quality of ATC radio communication) from open-source speech corpora (e.g., LibriSpeech, Mozilla DeepSpeech) and performing multi-condition training for improved accuracy.
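A common recipe for preparing noise-adulterated data is to scale a noise signal so that the mixture reaches a chosen signal-to-noise ratio. The sketch below (an assumed recipe, not the project's exact pipeline, with random stand-ins for real recordings) mixes speech and noise at a target SNR in dB:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so the mixture has the target SNR (dB)."""
    ps = np.mean(speech ** 2)                      # speech power
    pn = np.mean(noise ** 2)                       # noise power
    scale = np.sqrt(ps / (pn * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for a 1 s clean clip at 16 kHz
noise = rng.standard_normal(16000)    # stand-in for radio/cabin noise
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range (e.g. 0 to 20 dB) yields the multiple noise conditions used in multi-condition training.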
In-cabin speech recognition:
The second part of the project aims to provide speech recognition of in-cabin flight announcements for passengers. Speech decoding has to run both on a smartphone application with a native speech recognition engine (CMU PocketSphinx) and on a cloud server running Kaldi models accessed through a RESTful API.
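The cloud path amounts to posting audio to the server and reading back a transcript. The sketch below builds such a request with the Python standard library; the endpoint URL and the octet-stream payload convention are assumptions for illustration, since the project's actual API is not specified here:

```python
import urllib.request

def build_decode_request(url, audio_bytes):
    """Build (but do not send) an HTTP POST carrying raw audio bytes."""
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# Hypothetical endpoint; two dummy bytes stand in for an audio buffer.
req = build_decode_request("http://example.com/decode", b"\x00\x01")
# urllib.request.urlopen(req) would then return the server's response,
# e.g. JSON containing the decoded transcript.
```

Keeping request construction separate from sending makes the client easy to test without a live Kaldi server.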
Unlike the ATC case, where the speech suffers non-linear distortions and radio-channel noise, in-cabin speech is much cleaner and is affected mainly by additive background engine noise and reverberation. Language models prepared from open-source clean speech and transcripts of in-flight announcements are suitable in this case. For training acoustic models we are preparing a noise-adulterated dataset by re-recording the clean corpus through smartphones. A multi-condition training approach will be used to create a noise-robust in-cabin ASR system.
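Reverberation is typically simulated for multi-condition training by convolving clean speech with a room impulse response (RIR). The sketch below (an assumed recipe; a real setup would use measured or simulated cabin RIRs rather than the toy decay here) shows the core operation:

```python
import numpy as np

def reverberate(speech, rir):
    """Convolve speech with an impulse response, trimmed to input length."""
    return np.convolve(speech, rir)[: len(speech)]

speech = np.array([1.0, 0.0, 0.0, 0.0])   # a unit impulse as toy "speech"
rir = np.array([1.0, 0.5, 0.25])          # toy exponentially decaying RIR
wet = reverberate(speech, rir)            # each sample smeared by the RIR
```

Combined with the additive-noise mixing above, this produces training data matching the reverberant, engine-noise-corrupted cabin conditions.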
People: Harshpal Singh, Praveen Rajagopal, Aditya Udupa, Payal Aich, Ramanathan Rahul