Open Access

Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

  • Jeffrey Schuster1,
  • Kshitij Gupta1,
  • Raymond Hoare1 and
  • Alex K Jones1
EURASIP Journal on Embedded Systems 2006, 2006:048085

https://doi.org/10.1155/ES/2006/48085

Received: 21 December 2005

Accepted: 27 June 2006

Published: 2 November 2006

Abstract

This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium-sized vocabularies in real time. By creating three dedicated pipelines, one for each of the major operations in the system, we were able to maximize throughput while minimizing the number of pipeline stalls. Further, implementing a token-passing scheme between the later stages of the system greatly reduced the complexity of the control and minimized the amount of active data present in the system at any time. Additionally, through in-depth analysis of the SPHINX 3 large-vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, yield a system capable of performing real-time speech recognition in a wide range of environments.

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]

Authors’ Affiliations

(1)
University of Pittsburgh

References

  1. Agaram KK, Keckler SW, Burger D: Characterizing the SPHINX speech recognition system. In Tech. Rep. TR2001-18. Department of Computer Sciences, University of Texas at Austin, Austin, Tex, USA; 2001.
  2. Lai C, Lu S-L, Zhao Q: Performance analysis of speech recognition software. Proceedings of the 5th Workshop on Computer Architecture Evaluation Using Commercial Workloads, February 2002, Cambridge, Mass, USA.
  3. Ravishankar M, Singh R, Raj B, Stern R: The 1999 CMU 10x real time broadcast news transcription system. Proceedings of DARPA Workshop on Automatic Transcription of Broadcast News, May 2000, Washington, DC, USA.
  4. Rabiner L, Juang BH: Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series. Prentice Hall, Englewood Cliffs, NJ, USA; 1993.
  5. Huang X, Acero A, Hon H: Spoken Language Processing. Prentice Hall, Englewood Cliffs, NJ, USA; 2001.
  6. Results or a medium vocabulary test, CMU Sphinx, http://cmusphinx.sourceforge.net/MediumVocabResults.html
  7. ARM922T (Rev 0) Technical Reference Manual, ARM
  8. Anantharaman TS, Bisiani R: A hardware accelerator for speech recognition algorithms. Proceedings of the 13th Annual International Symposium on Computer Architecture (ISCA '86), June 1986, Tokyo, Japan 216-223.
  9. Nedevschi S, Patra RK, Brewer EA: Hardware speech recognition for user interfaces in low cost, low power devices. Proceedings of Design Automation Conference (DAC '05), June 2005, Anaheim, Calif, USA 684-689.
  10. Placeway P, Chen S, Eskenazi M, et al.: The 1996 Hub-4 Sphinx-3 system. Proceedings of the DARPA Speech Recognition Workshop, February 1997, Chantilly, Va, USA 85-89.
  11. Hoare R, Gupta K, Schuster J: Speech silicon: a data-driven SoC for performing hidden Markov model based speech recognition. In Proceedings of High Performance Embedded Computing Workshop (HPEC '05), September 2005, Lexington, Mass, USA. MIT.
  12. Hoare R, et al.: A hardware based acoustic modeling pipeline for hidden Markov model based speech recognition. Proceedings of 13th Reconfigurable Architectures Workshop (RAW '06), April 2006, Rhodes Island, Greece.
  13. Nouza J: Feature selection methods for hidden Markov model-based speech recognition. Proceedings of the 13th International Conference on Pattern Recognition, August 1996, Vienna, Austria 2: 186-190.
  14. Mathew B, Davis A, Fang Z: A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '03), November 2003, San Jose, Calif, USA 210-219.
  15. CMU Sphinx, http://cmusphinx.sourceforge.net/html/cmusphinx.php
  16. Linguistic Data Consortium, http://www.ldc.upenn.edu/
  17. Li X, Bilmes J: Feature pruning in likelihood evaluation of HMM-based speech recognition. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03), November-December 2003, St. Thomas, Virgin Islands, USA 303-308.
  18. Ravishankar M: Efficient algorithms for speech recognition, M.S. thesis. Carnegie Mellon University, Pittsburgh, Pa, USA; 1996. CMU-CS-96-143.

Copyright

© Jeffrey Schuster et al. 2006

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.