A Systematic Approach to Design Low-Power Video Codec Cores
EURASIP Journal on Embedded Systems volume 2007, Article number: 064569 (2007)
The higher resolutions and new functionality of video applications increase their throughput and processing requirements. In contrast, the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory- and communication-centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of the communication channels. In the step towards hardware, these channels are implemented with a restricted set of communication primitives, which enable an automated RTL development strategy with rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps consumes 71 mW in a 180 nm, 1.62 V UMC technology.
Denolf, K., Chirila-Rus, A., Schumacher, P. et al. A Systematic Approach to Design Low-Power Video Codec Cores. J Embedded Systems 2007, 064569 (2007). https://doi.org/10.1155/2007/64569
- External Memory
- Memory Hierarchy
- Video Codec
- Algorithmic Tuning
- Memory Optimization