Augmenting Automatic Speech Recognition (ASR) through Web Crawling (Project)
- An automatic speech recognition (ASR) system consists of an acoustic model (AM) of speech sounds and a language model (LM) that describes word sequences. Training an LM requires a large amount of text data that is similar to the target speech recognition task.
- Because transcribed data are costly to obtain, especially for conversational speech, the amount of LM training data is often limited. Limited training data restricts the performance of the ASR system and often leads to out-of-vocabulary (OOV) problems: a word that is absent from the recognizer's vocabulary cannot be recognized correctly. The web contains large amounts of text and is a good source for reducing OOV rates. However, it is critical to retrieve only data that are suitable for the ASR task.
- To help users process audio data, an ASR system can first convert the audio to text, on which further analytics can then be performed. To boost the ASR system's performance and reduce errors on OOV words, one possible approach is to crawl the web for relevant text and use it to improve the LM's ability to recognize those words.
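The data-selection problem above can be illustrated with two simple measurements. The first is the OOV rate of text against a vocabulary. The second is the perplexity of candidate web sentences under an LM trained on in-domain transcripts, where lower perplexity suggests text closer to the ASR task. This is a minimal sketch under stated assumptions. Tokenization is lower-cased whitespace splitting. A toy add-one-smoothed bigram LM stands in for a toolkit-trained n-gram model. The names `tokenize`, `oov_rate`, `BigramLM` and `select_web_sentences` are hypothetical. Ranking by in-domain perplexity is only one possible selection criterion; cross-entropy difference between an in-domain and a general LM is a common alternative.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased whitespace tokenization (a simplification of real text normalization)."""
    return text.lower().split()


def oov_rate(vocab: set[str], text: str) -> float:
    """Fraction of running words in `text` that are not in `vocab`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w not in vocab for w in words) / len(words)


class BigramLM:
    """Toy add-one (Laplace) smoothed bigram LM, standing in for a toolkit-trained n-gram LM."""

    def __init__(self, sentences: list[str]):
        self.history_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        words_seen: set[str] = set()
        for s in sentences:
            words = ["<s>"] + tokenize(s) + ["</s>"]
            words_seen.update(words[1:-1])
            self.history_counts.update(words[:-1])
            self.bigram_counts.update(zip(words[:-1], words[1:]))
        # Add-one denominator size: seen words, the </s> marker, and one slot for unseen words.
        self.vocab_size = len(words_seen) + 2

    def logprob(self, prev: str, word: str) -> float:
        num = self.bigram_counts[(prev, word)] + 1
        den = self.history_counts[prev] + self.vocab_size
        return math.log(num / den)

    def perplexity(self, sentence: str) -> float:
        words = ["<s>"] + tokenize(sentence) + ["</s>"]
        pairs = list(zip(words[:-1], words[1:]))
        total = sum(self.logprob(p, w) for p, w in pairs)
        return math.exp(-total / len(pairs))


def select_web_sentences(lm: BigramLM, candidates: list[str], keep: int) -> list[str]:
    """Return the `keep` candidate web sentences with the lowest in-domain perplexity."""
    return sorted(candidates, key=lm.perplexity)[:keep]


if __name__ == "__main__":
    in_domain = ["book a flight to singapore", "cancel my flight booking"]
    web = ["cheap flight booking to singapore", "the quarterly earnings report rose sharply"]
    vocab = {w for s in in_domain for w in tokenize(s)}
    print(oov_rate(vocab, "book a cheap flight"))  # 0.25 ("cheap" is OOV)
    lm = BigramLM(in_domain)
    print(select_web_sentences(lm, web, keep=1))  # ['cheap flight booking to singapore']
```

In practice, the selected sentences would be used to train a web LM that is combined with the in-domain LM. The OOV rate on held-out transcripts would then be re-measured against the enlarged vocabulary.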
Scope & Deliverables
- Review research literature to understand state-of-the-art techniques for augmenting ASR systems' LMs with web documents.
- Experiment with existing techniques for retrieving web documents based on existing transcripts, and for training LMs on the retrieved text.
- Conceive and implement a methodology to score the accuracy of LMs and ASR systems (e.g. perplexity and OOV rate for LMs, word error rate for ASR systems).
- Implement and evaluate ASR systems augmented with web documents.
- Interim presentation on web retrieval techniques (depending on the length of the internship).
- Test suite for evaluating ASR systems.
- Technical write-up on the algorithm used and its performance.
- Lab demo and presentation.
- Functioning, commented code, with documentation.
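One simple way to retrieve web documents based on existing transcripts is to turn each transcript into a search query made of its most distinctive words. The sketch below scores words by TF-IDF across the transcript collection and joins the top `k` words into a query string. The function name `build_queries`, the small stop-word list and the default `k` are illustrative assumptions, not a prescribed method. Sending the queries to a search engine or crawler is deliberately not shown.

```python
import math
from collections import Counter

# Hypothetical minimal stop-word list (including filler words); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "i", "my", "is", "was", "in", "for", "on", "uh", "um"}


def build_queries(transcripts: list[str], k: int = 3) -> list[str]:
    """Return one query per transcript: its k highest-scoring TF-IDF non-stop words."""
    docs = [[w for w in t.lower().split() if w not in STOP_WORDS] for t in transcripts]
    n_docs = len(docs)
    # Document frequency: the number of transcripts each word appears in.
    df = Counter(w for doc in docs for w in set(doc))
    queries = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, so words that occur in every transcript still keep a small positive weight.
        scores = {w: c * (math.log((1 + n_docs) / (1 + df[w])) + 1) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically so the output is deterministic.
        top = sorted(scores, key=lambda w: (-scores[w], w))[:k]
        queries.append(" ".join(top))
    return queries


if __name__ == "__main__":
    transcripts = [
        "um I want to book a flight to singapore",
        "the flight to tokyo was delayed",
        "uh my flight to singapore got cancelled",
    ]
    for query in build_queries(transcripts):
        print(query)
```

Words that occur in every transcript, such as "flight" here, are down-weighted. Each query therefore favours the words that distinguish one transcript from the others, which is where OOV candidates tend to appear.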
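For the scoring methodology and the test suite, word error rate (WER) is the standard ASR accuracy measure. It is the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words. The sketch below computes this with a word-level Levenshtein distance. Lower-casing and whitespace splitting are assumed to be the only text normalization, and the function names are hypothetical. On the LM side, perplexity and OOV rate on held-out transcripts are common complementary measures.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning `hyp` into `ref` (Levenshtein on words)."""
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from the hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in the hypothesis (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return edits / ref_words


if __name__ == "__main__":
    pairs = [
        ("book a flight to singapore", "book flight to singapore please"),  # 1 deletion + 1 insertion
        ("cancel my booking", "cancel my booking"),                         # exact match
    ]
    print(corpus_wer(pairs[:1]))  # 0.4
    print(corpus_wer(pairs))      # 0.25
```

This sketch pools edits over all utterances rather than averaging per-utterance WERs, so longer utterances carry more weight. A test suite would need to fix that choice explicitly, along with the text normalization, so that a baseline ASR system and a web-augmented one are scored identically.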
Requirements
- Proficient in Python programming and debugging
- Innovative and willing to learn on the job
- Good communication and presentation skills
- Analytical, with good troubleshooting skills