Open Access

A Systematic Approach to Design Low-Power Video Codec Cores

  • Kristof Denolf (1),
  • Adrian Chirila-Rus (4),
  • Paul Schumacher (4),
  • Robert Turney (4),
  • Kees Vissers (4),
  • Diederik Verkest (1, 2, 3) and
  • Henk Corporaal (5)
EURASIP Journal on Embedded Systems 2007, 2007:064569

https://doi.org/10.1155/2007/64569

Received: 2 June 2006

Accepted: 5 March 2007

Published: 8 May 2007

Abstract

The higher resolutions and new functionality of video applications increase their throughput and processing requirements. At the same time, the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory- and communication-centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of communication channels. In hardware, these channels are implemented as a restricted set of communication primitives, which enable an automated RTL development strategy for rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps consumes 71 mW in a 180 nm, 1.62 V UMC technology.
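
The cyclo-static dataflow model mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' tool flow: the class names (Fifo, Actor), the two-actor graph, the rate sequences, and the round-robin schedule are all assumptions made for illustration. Each actor cycles through a fixed sequence of phases, each with its own token rates, and a bounded FIFO blocks the writer when full.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Fifo:
    """Bounded channel (hypothetical): a full FIFO blocks the writer."""
    capacity: int
    tokens: deque = field(default_factory=deque)

    def can_read(self, n: int) -> bool:
        return len(self.tokens) >= n

    def can_write(self, n: int) -> bool:
        return len(self.tokens) + n <= self.capacity


@dataclass
class Actor:
    """Cyclo-static actor: token rates cycle through one entry per phase."""
    name: str
    consume: list  # tokens read per phase
    produce: list  # tokens written per phase (same length as consume)
    phase: int = 0
    firings: int = 0

    def try_fire(self, src, dst) -> bool:
        c = self.consume[self.phase]
        p = self.produce[self.phase]
        # An actor fires only if its input holds enough tokens
        # and its output has enough free space.
        if src is not None and not src.can_read(c):
            return False
        if dst is not None and not dst.can_write(p):
            return False
        if src is not None:
            for _ in range(c):
                src.tokens.popleft()
        if dst is not None:
            dst.tokens.extend([self.name] * p)
        self.phase = (self.phase + 1) % len(self.consume)
        self.firings += 1
        return True


def simulate(capacity: int, steps: int):
    """Round-robin schedule of a producer/consumer pair (assumed example)."""
    fifo = Fifo(capacity)
    # Producer writes 2 tokens, then 1 token, then repeats.
    producer = Actor("producer", consume=[0, 0], produce=[2, 1])
    # Consumer reads 3 tokens in its single phase.
    consumer = Actor("consumer", consume=[3], produce=[0])
    for _ in range(steps):
        producer.try_fire(None, fifo)
        consumer.try_fire(fifo, None)
    return producer.firings, consumer.firings, len(fifo.tokens)


if __name__ == "__main__":
    print(simulate(capacity=3, steps=6))  # (6, 3, 0)
    print(simulate(capacity=2, steps=6))  # (1, 0, 2)
```

With a FIFO of three tokens the pair runs indefinitely. With a FIFO of two tokens the producer stalls in its second phase and the consumer never sees three tokens, so the graph deadlocks. In this toy model, that is one way channel capacity shows up as an implementation-specific property of a communication channel.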

[1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52]

Authors’ Affiliations

(1) D6, IMEC
(2) Department of Electrical Engineering, Katholieke Universiteit Leuven (KUL)
(3) Department of Electrical Engineering, Vrije Universiteit Brussel (VUB)
(4) Xilinx Inc
(5) Faculty of Electrical Engineering, Technical University Eindhoven

References

  1. Viredaz MA, Wallach DA: Power evaluation of a handheld computer. IEEE Micro 2003, 23(1):66-74. doi:10.1109/MM.2003.1179900
  2. Lambrechts A, Raghavan P, Leroy A, et al.: Power breakdown analysis for a heterogeneous NoC platform running a video application. Proceedings of the 16th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP '05), July 2005, Samos, Greece 179-184.
  3. Fujiyoshi T, Shiratake S, Nomura S, et al.: A 63-mW H.264/MPEG-4 audio/visual codec LSI with module-wise dynamic voltage/frequency scaling. IEEE Journal of Solid-State Circuits 2006, 41(1):54-62. doi:10.1109/JSSC.2005.859337
  4. Horowitz M, Alon E, Patil D, Naffziger S, Kumar R, Bernstein K: Scaling, power, and the future of CMOS. Proceedings of IEEE International Electron Devices Meeting (IEDM '05), December 2005, Washington, DC, USA 7.
  5. Bilsen G, Engels M, Lauwereins R, Peperstraete J: Cyclo-static dataflow. IEEE Transactions on Signal Processing 1996, 44(2):397-408. doi:10.1109/78.485935
  6. Pirsch P, Berekovic M, Stolberg H-J, Jachalsky J: VLSI architectures for MPEG-4. Proceedings of International Symposium on VLSI Technology, Systems, and Applications (VTSA '03), October 2003, Hsinchu, Taiwan 208A-208E.
  7. Chien S-Y, Huang Y-W, Chen C-Y, Chen HH, Chen L-G: Hardware architecture design of video compression for multimedia communication systems. IEEE Communications Magazine 2005, 43(8):123-131. doi:10.1109/MCOM.2005.1497562
  8. Lian C-J, Huang Y-W, Fang H-C, Chang Y-C, Chen L-G: JPEG, MPEG-4, and H.264 codec IP development. Proceedings of Design, Automation and Test in Europe (DATE '05), March 2005, Munich, Germany 2: 1118-1119.
  9. Edwards S, Lavagno L, Lee EA, Sangiovanni-Vincentelli A: Design of embedded systems: formal models, validation, and synthesis. Proceedings of the IEEE 1997, 85(3):366-390. doi:10.1109/5.558710
  10. Mazzoni L: Power aware design for embedded systems. IEE Electronics Systems and Software 2003, 1(5):12-17. doi:10.1049/ess:20030502
  11. Catthoor F, Wuytack S, de Greef E, Balasa F, Nachtergaele L, Vandecappelle A: Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design. Kluwer Academic Publishers, Norwell, Mass, USA; 1998.
  12. Panda PR, Catthoor F, Dutt ND, et al.: Data and memory optimization techniques for embedded systems. ACM Transactions on Design Automation of Electronic Systems 2001, 6(2):149-206. doi:10.1145/375977.375978
  13. Denolf K, de Vleeschouwer C, Turney R, Lafruit G, Bormans J: Memory centric design of an MPEG-4 video encoder. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(5):609-619. doi:10.1109/TCSVT.2005.846430
  14. Lee EA, Parks TM: Dataflow process networks. Proceedings of the IEEE 1995, 83(5):773-801. doi:10.1109/5.381846
  15. Davare A, Zhu Q, Moondanos J, Sangiovanni-Vincentelli A: JPEG encoding on the Intel MXP5800: a platform-based design case study. Proceedings of the 3rd IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTImedia '05), September 2005, New York, NY, USA 89-94.
  16. Hwang H, Oh T, Jung H, Ha S: Conversion of reference C code to dataflow model: H.264 encoder case study. Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC '06), January 2006, Yokohama, Japan 152-157.
  17. Haim F, Sen M, Ko D-I, Bhattacharyya SS, Wolf W: Mapping multimedia applications onto configurable hardware with parameterized cyclo-static dataflow graphs. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '06), May 2006, Toulouse, France 3: 1052-1055.
  18. Williamson MC, Lee EA: Synthesis of parallel hardware implementations from synchronous dataflow graph specifications. Proceedings of the 30th Asilomar Conference on Signals, Systems and Computers, November 1996, Pacific Grove, Calif, USA 2: 1340-1343.
  19. Horstmannshoff J, Meyr H: Efficient building block based RTL code generation from synchronous data flow graphs. Proceedings of the 37th Conference on Design Automation (DAC '00), June 2000, Los Angeles, Calif, USA 552-555.
  20. Jung H, Lee K, Ha S: Efficient hardware controller synthesis for synchronous dataflow graph in system level design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2002, 10(4):423-428. doi:10.1109/TVLSI.2002.807765
  21. Dalcolmo J, Lauwereins R, Ade M: Code generation of data dominated DSP applications for FPGA targets. Proceedings of the 9th IEEE International Workshop on Rapid System Prototyping, June 1998, Leuven, Belgium 162-167.
  22. Grou-Szabo R, Ghattas H, Savaria Y, Nicolescu G: Component-based methodology for hardware design of a dataflow processing network. Proceedings of the 5th International Workshop on System-on-Chip for Real-Time Applications (IWSOC '05), July 2005, Banff, Alberta, Canada 289-294.
  23. Keutzer K, Newton AR, Rabaey JM, Sangiovanni-Vincentelli A: System-level design: orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2000, 19(12):1523-1543. doi:10.1109/43.898830
  24. Denning D, Harold N, Devlin M, Irvine J: Using system generator to design a reconfigurable video encryption system. Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL '03), September 2003, Lisbon, Portugal 980-983.
  25. Nakamura Y, Hosokawa K, Kuroda I, Yoshikawa K, Yoshimura T: A fast hardware/software co-verification method for system-on-a-chip by using a C/C++ simulator and FPGA emulator with shared register communication. Proceedings of the 41st Design Automation Conference (DAC '04), June 2004, San Diego, Calif, USA 299-304.
  26. Siripokarpirom R, Mayer-Lindenberg F: Hardware-assisted simulation and evaluation of IP cores using FPGA-based rapid prototyping boards. Proceedings of the 15th IEEE International Workshop on Rapid Systems Prototyping, June 2004, Geneva, Switzerland 96-102.
  27. Amer I, Sayed M, Badawy W, Jullien G: On the way to an H.264 HW/SW reference model: a systemC modeling strategy to integrate selected IP-blocks with the H.264 software reference model. Proceedings of IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS '05), November 2005, Athens, Greece 178-181.
  28. Amer I, Rahman CA, Mohamed T, Sayed M, Badawy W: A hardware-accelerated framework with IP-blocks for application in MPEG-4. Proceedings of the 5th International Workshop on System-on-Chip for Real-Time Applications (IWSOC '05), July 2005, Banff, Alberta, Canada 211-214.
  29. Irwin MJ, Kandemir MT, Vijaykrishnan N, Sivasubramaniam A: A holistic approach to system level energy optimization. In Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation (PATMOS '00), September 2000, Göttingen, Germany. Springer; 88-107.
  30. http://www.imec.be/design/atomium/
  31. Schumacher P, Denolf K, Chirila-Rus A, et al.: A scalable, multi-stream MPEG-4 video decoder for conferencing and surveillance applications. Proceedings of IEEE International Conference on Image Processing (ICIP '05), September 2005, Genova, Italy 2: 886-889.
  32. Information technology - generic coding of audio-visual objects - part 2: visual. ISO/IEC 14496-2:2004, June 2004.
  33. Bhaskaran V, Konstantinides K: Image and Video Compression Standards, Algorithms and Architectures. Kluwer Academic Publishers, Boston, Mass, USA; 1997.
  34. Information technology - generic coding of audio-visual objects - part 5: reference software. ISO/IEC 14496-5:2001, December 2001.
  35. de Vleeschouwer C: Model-based rate control implementation for low-power video communications systems. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(12):1187-1194. doi:10.1109/TCSVT.2003.819181
  36. de Vleeschouwer C, Nilsson T, Denolf K, Bormans J: Algorithmic and architectural co-design of a motion-estimation engine for low-power video devices. IEEE Transactions on Circuits and Systems for Video Technology 2002, 12(12):1093-1105. doi:10.1109/TCSVT.2002.806810
  37. Sriram S, Bhattacharyya SS: Embedded Multiprocessors: Scheduling and Synchronization. Marcel Dekker, New York, NY, USA; 2000.
  38. Wiggers M, Bekooij M, Jansen P, Smit G: Efficient computation of buffer capacities for multi-rate real-time systems with back-pressure. Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06), October 2006, Seoul, Korea 10-15.
  39. Rintaluoma T, Silven O, Raekallio J: Interface overheads in embedded multimedia software. Proceedings of the 6th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS '06), July 2006, Samos, Greece 5-14.
  40. Amphion: Standalone MPEG-4 video encoders. 2003.
  41. Nakayama H, Yoshitake T, Komazaki H, et al.: An MPEG-4 video LSI with an error-resilient codec core based on a fast motion estimation algorithm. Proceedings of IEEE International Solid-State Circuits Conference (ISSCC '02), February 2002, San Francisco, Calif, USA 1: 368-474.
  42. Yamada T, Irie N, Nishimoto J, et al.: A 133 MHz 170 mW 10 μA standby application processor for 3G cellular phones. Proceedings of IEEE International Solid-State Circuits Conference (ISSCC '02), February 2002, San Francisco, Calif, USA 1: 370-474.
  43. Arakida H, Takahashi M, Tsuboi Y, et al.: A 160 mW, 80 nA standby, MPEG-4 audiovisual LSI with 16 Mb embedded DRAM and a 5 GOPS adaptive post filter. Proceedings of IEEE International Solid-State Circuits Conference (ISSCC '03), February 2003, San Francisco, Calif, USA 1: 42-476.
  44. Chang Y-C, Chao W-M, Chen L-G: Platform-based MPEG-4 video encoder SOC design. Proceedings of IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS '04), October 2004, Austin, Tex, USA 251-256.
  45. Yamauchi H, Okada S, Watanabe T, et al.: An 81MHz, 1280 × 720pixels × 30frames/s MPEG-4 video/audio codec processor. Proceedings of IEEE International Solid-State Circuits Conference (ISSCC '05), February 2005, San Francisco, Calif, USA 1: 130-589.
  46. Watanabe Y, Yoshitake T, Morioka K, et al.: Low power MPEG-4 ASP codec IP macro for high quality mobile video applications. Proceedings of IEEE International Conference on Consumer Electronics (ICCE '05), January 2005, Las Vegas, Nev, USA 337-338.
  47. Lin C-P, Tseng P-C, Chiu Y-T, et al.: A 5mW MPEG4 SP encoder with 2D bandwidth-sharing motion estimation for mobile applications. Proceedings of IEEE International Solid-State Circuits Conference (ISSCC '06), February 2006, San Francisco, Calif, USA 1: 1626-1635.
  48. http://www.annapmicro.com/products.html
  49. ISO/IEC JTC1/SC29/WG11: Information technology - generic coding of audio-visual objects - part 2: visual, amendment 2: new levels for simple profile. Tech. Rep. N6496, 2004.
  50. http://www.synopsys.com/
  51. http://www.model.com/
  52. http://www.itrs.net/Links/2006Update/FinalToPost/02_Design_2006Update.pdf

Copyright

© Denolf et al. 2007

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.