
A Systematic Approach to Design Low-Power Video Codec Cores


The higher resolutions and new functionality of video applications increase their throughput and processing requirements, while the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory- and communication-centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of the communication channels. In hardware, these channels are implemented with a restricted set of communication primitives, which enable an automated RTL development strategy for rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps consumes 71 mW in a 180 nm, 1.62 V UMC technology.
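To put the closing figure in perspective, the following minimal sketch converts the reported operating point into a macroblock rate and an average energy per frame. It assumes the standard 4CIF luma resolution of 704 × 576 and 16 × 16 macroblocks, neither of which is stated in the abstract. The function name `workload` and its dictionary keys are purely illustrative.

```python
# Back-of-the-envelope figures derived from the abstract's headline numbers.
# Assumptions (not stated in the abstract): 4CIF is 704x576 luma samples and
# the codec processes 16x16 macroblocks.

WIDTH, HEIGHT = 704, 576   # assumed 4CIF luma resolution
MB_SIZE = 16               # assumed macroblock edge, in pixels
FPS = 30                   # frame rate quoted in the abstract
POWER_W = 0.071            # 71 mW quoted in the abstract


def workload(width=WIDTH, height=HEIGHT, fps=FPS, power_w=POWER_W):
    """Return macroblock rates and average energy figures (hypothetical helper)."""
    mbs_per_frame = (width // MB_SIZE) * (height // MB_SIZE)
    mbs_per_second = mbs_per_frame * fps
    return {
        "mbs_per_frame": mbs_per_frame,
        "mbs_per_second": mbs_per_second,
        # Average energy only: actual consumption varies with video content.
        "energy_per_frame_j": power_w / fps,
        "energy_per_mb_j": power_w / mbs_per_second,
    }


if __name__ == "__main__":
    for name, value in workload().items():
        print(f"{name}: {value:.6g}")
```

Under these assumptions the encoder handles 1584 macroblocks per frame, or 47,520 per second, at an average of roughly 2.4 mJ per frame. These are averages derived from the quoted power figure, not measurements reported by the authors.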


Author information



Corresponding author

Correspondence to Kristof Denolf.

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Denolf, K., Chirila-Rus, A., Schumacher, P. et al. A Systematic Approach to Design Low-Power Video Codec Cores. J Embedded Systems 2007, 064569 (2007).



Keywords

  • External Memory
  • Memory Hierarchy
  • Video Codec
  • Algorithmic Tuning
  • Memory Optimization