Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics focused on developing computer-based methods and technologies to translate spoken language into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT).

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "call home"), call routing (e.g., "I would like to make a collect call"), and home automation (e.g., "turn off the kitchen lights"). There are also productivity applications, such as searching audio recordings (e.g., by creating a transcript), simple data entry (e.g., speaking a credit card number aloud), preparation of structured documents (e.g., a radiology report), determining speaker characteristics,[1] speech-to-text processing (e.g., word processors or emails), and controlling aircraft (usually termed direct voice input).

Automatic pronunciation assessment is used in education, for example in spoken language learning.

The term voice recognition[2][3][4] or speaker identification[5][6][7] refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can authenticate or verify the identity of a speaker as part of a security process.
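Speaker identification can be illustrated with the Gaussian mixture model (GMM) approach of Reynolds and Rose.[6] Each enrolled speaker is represented by a GMM over acoustic feature vectors, and an utterance is attributed to the model under which its frames are most likely. The Python sketch below is a minimal illustration under stated assumptions, not that paper's exact system. It assumes the models are already trained (for example with expectation-maximization on MFCC features) and uses hypothetical names (`identify_speaker`, `verify_speaker`) and toy parameters. The verification function, which compares a claimed speaker's model against a background model using a threshold, is an added assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances).
# Diagonal covariances are assumed here for simplicity.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log-density of feature vector x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def frame_log_likelihood(frame: Sequence[float], gmm: GMM) -> float:
    """log p(frame | gmm), combining components with log-sum-exp for stability."""
    terms = [math.log(w) + log_gaussian_diag(frame, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def utterance_score(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Total log-likelihood of an utterance, treating frames as independent."""
    return sum(frame_log_likelihood(f, gmm) for f in frames)


def identify_speaker(frames: Sequence[Sequence[float]],
                     models: Dict[str, GMM]) -> str:
    """Closed-set identification: return the enrolled speaker whose model scores highest."""
    return max(models, key=lambda name: utterance_score(frames, models[name]))


def verify_speaker(frames: Sequence[Sequence[float]], claimed: GMM,
                   background: GMM, threshold: float = 0.0) -> bool:
    """Hypothetical verification rule: accept the claimed identity if the
    per-frame log-likelihood ratio against a background model exceeds a threshold."""
    if not frames:
        raise ValueError("need at least one feature frame")
    llr = utterance_score(frames, claimed) - utterance_score(frames, background)
    return llr / len(frames) > threshold


# Hypothetical toy models over 2-D "features"; real systems train GMMs
# on acoustic features extracted from enrollment recordings.
TOY_MODELS: Dict[str, GMM] = {
    "alice": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "bob": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [5.0, 1.0], [1.0, 1.0])],
}
BACKGROUND: GMM = [(1.0, [2.0, 1.0], [4.0, 4.0])]

if __name__ == "__main__":
    utterance = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.0]]
    print(identify_speaker(utterance, TOY_MODELS))                     # alice
    print(verify_speaker(utterance, TOY_MODELS["alice"], BACKGROUND))  # True
    print(verify_speaker(utterance, TOY_MODELS["bob"], BACKGROUND))    # False
```

Identification here is closed-set: the utterance is always assigned to one of the enrolled speakers. Verification instead answers a yes/no question about a single claimed identity. In this sketch, normalizing the score by the number of frames keeps the threshold independent of utterance length.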

  1. P. Nguyen (2010). "Automatic classification of speaker characteristics". International Conference on Communications and Electronics 2010. pp. 147–152. doi:10.1109/ICCE.2010.5670700. ISBN 978-1-4244-7055-6. S2CID 13482115.
  2. "British English definition of voice recognition". Macmillan Publishers Limited. Archived from the original on 16 September 2011. Retrieved 21 February 2012.
  3. "voice recognition, definition of". WebFinance, Inc. Archived from the original on 3 December 2011. Retrieved 21 February 2012.
  4. "The Mailbag LG #114". Linuxgazette.net. Archived from the original on 19 February 2013. Retrieved 15 June 2013.
  5. Sarangi, Susanta; Sahidullah, Md; Saha, Goutam (September 2020). "Optimization of data-driven filterbank for automatic speaker verification". Digital Signal Processing. 104 102795. arXiv:2007.10729. Bibcode:2020DSP...10402795S. doi:10.1016/j.dsp.2020.102795. S2CID 220665533.
  6. Reynolds, Douglas; Rose, Richard (January 1995). "Robust text-independent speaker identification using Gaussian mixture speaker models" (PDF). IEEE Transactions on Speech and Audio Processing. 3 (1): 72–83. doi:10.1109/89.365379. ISSN 1063-6676. OCLC 26108901. S2CID 7319345. Archived (PDF) from the original on 8 March 2014. Retrieved 21 February 2014.
  7. "Speaker Identification (WhisperID)". Microsoft Research. Microsoft. Archived from the original on 25 February 2014. Retrieved 21 February 2014. When you speak to someone, they don't just recognize what you say: they recognize who you are. WhisperID will let computers do that, too, figuring out who you are by the way you sound.

From Wikipedia, the free encyclopedia