Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition

Schuster, Jeffrey; Gupta, Kshitij; Hoare, Raymond; Jones, Alex K

doi:10.1155/ES/2006/48085

Research Article
Open access
Published: 02 November 2006

EURASIP Journal on Embedded Systems volume 2006, Article number: 048085 (2006)

Abstract

This paper examines the design of an FPGA-based system-on-a-chip capable of performing continuous speech recognition on medium-sized vocabularies in real time. By creating three dedicated pipelines, one for each of the major operations in the system, we were able to maximize throughput while minimizing the number of pipeline stalls. Further, implementing a token-passing scheme between the later stages of the system greatly reduced the complexity of the control and minimized the amount of active data present in the system at any time. Additionally, through in-depth analysis of the SPHINX 3 large-vocabulary continuous speech recognition engine, we were able to design models that could be efficiently benchmarked against a known software platform. These results, combined with the ability to reprogram the system for different recognition tasks, yield a system capable of performing real-time speech recognition in a wide range of environments.

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18]

References

1. Agaram KK, Keckler SW, Burger D: Characterizing the SPHINX speech recognition system. In Tech. Rep. TR2001-18. Department of Computer Sciences, University of Texas at Austin, Austin, Tex, USA; 2001.
2. Lai C, Lu S-L, Zhao Q: Performance analysis of speech recognition software. Proceedings of the 5th Workshop on Computer Architecture Evaluation Using Commercial Workloads, February 2002, Cambridge, Mass, USA.
3. Ravishankar M, Singh R, Raj B, Stern R: The 1999 CMU 10x real time broadcast news transcription system. Proceedings of DARPA Workshop on Automatic Transcription of Broadcast News, May 2000, Washington, DC, USA.
4. Rabiner L, Juang BH: Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series. Prentice Hall, Englewood Cliffs, NJ, USA; 1993.
5. Huang X, Acero A, Hon H: Spoken Language Processing. Prentice Hall, Englewood Cliffs, NJ, USA; 2001.
6. Results of a medium vocabulary test, CMU Sphinx, http://cmusphinx.sourceforge.net/MediumVocabResults.html
7. ARM922T (Rev 0) Technical Reference Manual, ARM.
8. Anantharaman TS, Bisiani R: A hardware accelerator for speech recognition algorithms. Proceedings of the 13th Annual International Symposium on Computer Architecture (ISCA '86), June 1986, Tokyo, Japan, 216-223.
9. Nedevschi S, Patra RK, Brewer EA: Hardware speech recognition for user interfaces in low cost, low power devices. Proceedings of Design Automation Conference (DAC '05), June 2005, Anaheim, Calif, USA, 684-689.
10. Placeway P, Chen S, Eskenazi M, et al.: The 1996 Hub-4 Sphinx-3 system. Proceedings of the DARPA Speech Recognition Workshop, February 1997, Chantilly, Va, USA, 85-89.
11. Hoare R, Gupta K, Schuster J: Speech silicon: a data-driven SoC for performing hidden Markov model based speech recognition. In Proceedings of High Performance Embedded Computing Workshop (HPEC '05), September 2005, Lexington, Mass, USA. MIT;
12. Hoare R, et al.: A hardware based acoustic modeling pipeline for hidden Markov model based speech recognition. Proceedings of 13th Reconfigurable Architectures Workshop (RAW '06), April 2006, Rhodes Island, Greece.
13. Nouza J: Feature selection methods for hidden Markov model-based speech recognition. Proceedings of the 13th International Conference on Pattern Recognition, August 1996, Vienna, Austria, 2: 186-190.
14. Mathew B, Davis A, Fang Z: A low-power accelerator for the SPHINX 3 speech recognition system. Proceedings of International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '03), November 2003, San Jose, Calif, USA, 210-219.
15. CMU Sphinx, http://cmusphinx.sourceforge.net/html/cmusphinx.php
16. Linguistic Data Consortium, http://www.ldc.upenn.edu/
17. Li X, Bilmes J: Feature pruning in likelihood evaluation of HMM-based speech recognition. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '03), November-December 2003, St. Thomas, Virgin Islands, USA, 303-308.
18. Ravishankar M: Efficient algorithms for speech recognition, M.S. thesis. Carnegie Mellon University, Pittsburgh, Pa, USA; 1996. CMU-CS-96-143.

Author information

Authors and Affiliations

University of Pittsburgh, Pittsburgh, PA, 15261, USA
Jeffrey Schuster, Kshitij Gupta, Raymond Hoare & Alex K Jones

Corresponding author

Correspondence to Jeffrey Schuster.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Schuster, J., Gupta, K., Hoare, R. et al. Speech Silicon: An FPGA Architecture for Real-Time Hidden Markov-Model-Based Speech Recognition. J Embedded Systems 2006, 048085 (2006). https://doi.org/10.1155/ES/2006/48085

Received: 21 December 2005
Revised: 08 June 2006
Accepted: 27 June 2006
Published: 02 November 2006
