Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

Abstract

This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium-sized vocabularies in real time. By creating three dedicated pipelines, one for each of the major operations in the system, we were able to maximize throughput while minimizing the number of pipeline stalls. Further, implementing a token-passing scheme between the later stages of the system greatly reduced the complexity of the control and minimized the amount of active data present in the system at any time. Additionally, through in-depth analysis of the SPHINX 3 large vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, serve to create a system capable of performing real-time speech recognition in a wide range of environments.
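As a rough illustration of the token-passing idea mentioned in the abstract, the following Python sketch shows a generic software formulation of token-passing Viterbi decoding over a hidden Markov model. It is not the hardware design described in the paper. The function name `token_passing`, the `beam` parameter, and the two-state toy model are all assumptions made for this example. The point it illustrates is that only states currently holding a token are touched in each frame, which keeps the amount of active data small.

```python
import math

def token_passing(observations, log_trans, log_emit, start, beam=float("inf")):
    """Generic token-passing Viterbi search (illustrative sketch only).

    observations: sequence of observation symbols, one per frame
    log_trans:    dict mapping state -> list of (next_state, log_probability)
    log_emit:     function (state, observation) -> log_probability
    start:        state that holds the single initial token
    beam:         tokens scoring more than `beam` below the best are dropped
    Returns (best_log_score, best_state_path).
    """
    # One token per active state: (log score, path of states visited so far).
    tokens = {start: (log_emit(start, observations[0]), (start,))}
    for obs in observations[1:]:
        new_tokens = {}
        # Only states that currently hold a token are expanded.
        for state, (score, path) in tokens.items():
            for nxt, log_p in log_trans.get(state, ()):
                cand = score + log_p + log_emit(nxt, obs)
                # Each state keeps only its best incoming token.
                if nxt not in new_tokens or cand > new_tokens[nxt][0]:
                    new_tokens[nxt] = (cand, path + (nxt,))
        if not new_tokens:
            raise ValueError("no token survived; check the transition table")
        best = max(score for score, _ in new_tokens.values())
        # Beam pruning limits how many tokens stay active.
        tokens = {s: t for s, t in new_tokens.items() if t[0] >= best - beam}
    return max(tokens.values(), key=lambda tok: tok[0])

# Hypothetical two-state left-to-right model, used only to exercise the sketch.
EMIT = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.1, "b": 0.9}}
LOG_TRANS = {
    0: [(0, math.log(0.5)), (1, math.log(0.5))],
    1: [(1, 0.0)],
}

def log_emit(state, obs):
    return math.log(EMIT[state][obs])

best_score, best_path = token_passing(["a", "a", "b", "b"], LOG_TRANS, log_emit, start=0)
print(best_path)
```

With a finite `beam`, low-scoring tokens are discarded after each frame, trading a small risk of search error for less active state.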

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]

References

  1. Agaram KK, Keckler SW, Burger D: Characterizing the SPHINX speech recognition system. In Tech. Rep. TR2001-18. Department of Computer Sciences, University of Texas at Austin, Austin, Tex, USA; 2001.

  2. Lai C, Lu S-L, Zhao Q: Performance analysis of speech recognition software. Proceedings of the 5th Workshop on Computer Architecture Evaluation Using Commercial Workloads, February 2002, Cambridge, Mass, USA

  3. Ravishankar M, Singh R, Raj B, Stern R: The 1999 CMU 10x real time broadcast news transcription system. Proceedings of DARPA Workshop on Automatic Transcription of Broadcast News, May 2000, Washington, DC, USA

  4. Rabiner L, Juang BH: Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series. Prentice Hall, Englewood Cliffs, NJ, USA; 1993.

  5. Huang X, Acero A, Hon H: Spoken Language Processing. Prentice Hall, Englewood Cliffs, NJ, USA; 2001.

  6. Results or a medium vocabulary test, CMU Sphinx, http://cmusphinx.sourceforge.net/MediumVocabResults.html

  7. ARM922T (Rev 0) Technical Reference Manual, ARM

  8. Anantharaman TS, Bisiani R: A hardware accelerator for speech recognition algorithms. Proceedings of the 13th Annual International Symposium on Computer Architecture (ISCA '86), June 1986, Tokyo, Japan 216-223.

  9. Nedevschi S, Patra RK, Brewer EA: Hardware speech recognition for user interfaces in low cost, low power devices. Proceedings of Design Automation Conference (DAC '05), June 2005, Anaheim, Calif, USA 684-689.

  10. Placeway P, Chen S, Eskenazi M, et al.: The 1996 Hub-4 Sphinx-3 system. Proceedings of the DARPA Speech Recognition Workshop, February 1997, Chantilly, Va, USA 85-89.

  11. Hoare R, Gupta K, Schuster J: Speech silicon: a data-driven SoC for performing hidden Markov model based speech recognition. In Proceedings of High Performance Embedded Computing Workshop (HPEC '05), September 2005, Lexington, Mass, USA. MIT;

  12. Hoare R, et al.: A hardware based acoustic modeling pipeline for hidden Markov model based speech recognition. Proceedings of 13th Reconfigurable Architectures Workshop (RAW '06), April 2006, Rhodes Island, Greece

  13. Nouza J: Feature selection methods for hidden Markov model-based speech recognition. Proceedings of the 13th International Conference on Pattern Recognition, August 1996, Vienna, Austria 2: 186-190.

  14. Mathew B, Davis A, Fang Z: A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '03), November 2003, San Jose, Calif, USA 210-219.

  15. CMU Sphinx, http://cmusphinx.sourceforge.net/html/cmusphinx.php

  16. Linguistic Data Consortium, http://www.ldc.upenn.edu/

  17. Li X, Bilmes J: Feature pruning in likelihood evaluation of HMM-based speech recognition. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03), November-December 2003, St. Thomas, Virgin Islands, USA 303-308.

  18. Ravishankar M: Efficient algorithms for speech recognition, M.S. thesis. Carnegie Mellon University, Pittsburgh, Pa, USA; 1996. CMU-CS-96-143

Author information

Correspondence to Jeffrey Schuster.

About this article

Keywords

  • Silicon
  • Active Data
  • Speech Recognition
  • Control Structure
  • Major Operation