  • Research Article
  • Open access

Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

Abstract

This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium-sized vocabularies in real time. By creating three dedicated pipelines, one for each of the major operations in the system, we were able to maximize throughput while minimizing the number of pipeline stalls. Further, implementing a token-passing scheme between the later stages of the system greatly reduced the control complexity and minimized the amount of active data present in the system at any time. Additionally, through in-depth analysis of the SPHINX 3 large-vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, yield a system capable of performing real-time speech recognition in a wide range of environments.
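
The token-passing idea mentioned in the abstract can be pictured with a small software sketch. This is not the hardware design described in the paper; it is a generic Viterbi-style token-passing step, written in Python under stated assumptions. The function name `pass_tokens`, the dictionary-based data layout, the toy three-state model in `TRANSITIONS`, and the `beam` pruning parameter are all hypothetical and chosen only for illustration. The point it shows is that only states currently holding a token are evaluated, which keeps the amount of active data small.

```python
import math

# Hypothetical 3-state left-to-right HMM: state -> [(next_state, log_prob)].
TRANSITIONS = {
    0: [(0, math.log(0.5)), (1, math.log(0.5))],
    1: [(1, math.log(0.5)), (2, math.log(0.5))],
    2: [(2, 0.0)],
}


def pass_tokens(tokens, transitions, emit_logp, beam):
    """Advance the active tokens by one frame (illustrative sketch only).

    tokens      -- dict: active state -> best log score reaching it so far
    transitions -- dict: state -> list of (next_state, log transition prob)
    emit_logp   -- dict: state -> log emission score for the current frame
    beam        -- tokens more than `beam` below the best score are dropped
    """
    new_tokens = {}
    # Only states that currently hold a token are visited.
    for state, score in tokens.items():
        for nxt, log_a in transitions.get(state, []):
            candidate = score + log_a + emit_logp[nxt]
            # Keep the best-scoring token arriving at each state.
            if candidate > new_tokens.get(nxt, -math.inf):
                new_tokens[nxt] = candidate
    if not new_tokens:
        return new_tokens
    # Beam pruning: discard tokens that fall too far behind the best one.
    best = max(new_tokens.values())
    return {s: v for s, v in new_tokens.items() if v >= best - beam}
```

Calling `pass_tokens({0: 0.0}, TRANSITIONS, frame_scores, beam=5.0)` once per frame, with `frame_scores` supplied by whatever acoustic scoring stage precedes it, yields the set of states that remain active for the next frame.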

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]

References

  1. Agaram KK, Keckler SW, Burger D: Characterizing the SPHINX speech recognition system. In Tech. Rep. TR2001-18. Department of Computer Sciences, University of Texas at Austin, Austin, Tex, USA; 2001.

  2. Lai C, Lu S-L, Zhao Q: Performance analysis of speech recognition software. Proceedings of the 5th Workshop on Computer Architecture Evaluation Using Commercial Workloads, February 2002, Cambridge, Mass, USA

  3. Ravishankar M, Singh R, Raj B, Stern R: The 1999 CMU 10x real time broadcast news transcription system. Proceedings of DARPA Workshop on Automatic Transcription of Broadcast News, May 2000, Washington, DC, USA

  4. Rabiner L, Juang BH: Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series. Prentice Hall, Englewood Cliffs, NJ, USA; 1993.

  5. Huang X, Acero A, Hon H: Spoken Language Processing. Prentice Hall, Englewood Cliffs, NJ, USA; 2001.

  6. Results for a medium vocabulary test, CMU Sphinx, http://cmusphinx.sourceforge.net/MediumVocabResults.html

  7. ARM922T (Rev 0) Technical Reference Manual, ARM

  8. Anantharaman TS, Bisiani R: A hardware accelerator for speech recognition algorithms. Proceedings of the 13th Annual International Symposium on Computer Architecture (ISCA '86), June 1986, Tokyo, Japan 216-223.

  9. Nedevschi S, Patra RK, Brewer EA: Hardware speech recognition for user interfaces in low cost, low power devices. Proceedings of Design Automation Conference (DAC '05), June 2005, Anaheim, Calif, USA 684-689.

  10. Placeway P, Chen S, Eskenazi M, et al.: The 1996 Hub-4 Sphinx-3 system. Proceedings of the DARPA Speech Recognition Workshop, February 1997, Chantilly, Va, USA 85-89.

  11. Hoare R, Gupta K, Schuster J: Speech silicon: a data-driven SoC for performing hidden Markov model based speech recognition. In Proceedings of High Performance Embedded Computing Workshop (HPEC '05), September 2005, Lexington, Mass, USA. MIT.

  12. Hoare R, et al.: A hardware based acoustic modeling pipeline for hidden Markov model based speech recognition. Proceedings of 13th Reconfigurable Architectures Workshop (RAW '06), April 2006, Rhodes Island, Greece

  13. Nouza J: Feature selection methods for hidden Markov model-based speech recognition. Proceedings of the 13th International Conference on Pattern Recognition, August 1996, Vienna, Austria 2: 186-190.

  14. Mathew B, Davis A, Fang Z: A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '03), November 2003, San Jose, Calif, USA 210-219.

  15. CMU Sphinx, http://cmusphinx.sourceforge.net/html/cmusphinx.php

  16. Linguistic Data Consortium, http://www.ldc.upenn.edu/

  17. Li X, Bilmes J: Feature pruning in likelihood evaluation of HMM-based speech recognition. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03), November-December 2003, St. Thomas, Virgin Islands, USA 303-308.

  18. Ravishankar M: Efficient algorithms for speech recognition, M.S. thesis. Carnegie Mellon University, Pittsburgh, Pa, USA; 1996. CMU-CS-96-143


Author information

Corresponding author

Correspondence to Jeffrey Schuster.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Schuster, J., Gupta, K., Hoare, R. et al. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition. J Embedded Systems 2006, 048085 (2006). https://doi.org/10.1155/ES/2006/48085

  • DOI: https://doi.org/10.1155/ES/2006/48085
