# Dynamic voltage and frequency scaling over delay-constrained mobile multimedia service using approximated relative complexity estimation

Jihyeok Yun^{1} (Email author), Deepak Kumar Singh^{1} and Doug Young Suh^{1}

**2013**:13

https://doi.org/10.1186/1687-3963-2013-13

© Yun et al.; licensee Springer. 2013

**Received: **28 January 2013

**Accepted: **11 June 2013

**Published: **4 September 2013

## Abstract

This paper deals with dynamic voltage and frequency scaling (DVFS) in mobile multimedia services. Multimedia services consume a large amount of energy and therefore cannot be used continuously on mobile devices because of battery limitations. DVFS has been applied to multimedia services in previous studies; however, those studies addressed only power saving and overlooked the fact that mobile multimedia services are sensitive to delay. The proposed method applies DVFS to multimedia services while accounting for potential delays. Another problem with previous studies is that they either employed separate devices or determined appropriate frequency scaling values through complicated calculation processes. In contrast, the proposed method determines appropriate frequency scaling values from the characteristics of the multimedia content itself, without separate devices or complicated calculations, which allows DVFS to be applied to real-time multimedia content. This paper proposes a DVFS application method that divides multimedia services into video conferencing, a real-time service, and video streaming, a non-real-time service, and reduces energy consumption in a simple manner while respecting service delay constraints.

## Introduction

The quality requirements for handheld devices’ video services have been continuously increasing, and it has become challenging to maintain the high level of quality required to satisfy consumers. In this paper, we propose a complexity estimation algorithm for dynamic voltage and frequency scaling (DVFS) that produces power-saving effects close to those of previous DVFS methods without additional devices or computational complexity. DVFS reduces a processor's power consumption by dynamically adjusting the voltage and frequency applied to the processor.

According to [1–3], the power consumption of a processor is proportional to the square of its supply voltage, and the supply voltage is proportional to the frequency. Based on these relationships, power consumption can be reduced by adjusting the voltage and frequency appropriately: after estimating the complexity required for decoding, a voltage and frequency appropriate to that estimate are applied.
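Because P ∝ V² · f and V ∝ f, dynamic power grows roughly cubically with frequency while the work (cycle count) stays fixed, so energy per decoded frame falls as frequency drops. The following is a toy numerical sketch of that relationship, not the paper's exact model; the coefficients are arbitrary placeholders:

```python
# Illustration: dynamic power P = K_eff * V^2 * f with V proportional to f,
# so P grows roughly as f^3 while the work (cycle count) is unchanged.

def dynamic_power(freq_hz, v_per_hz=1.0e-9, k_eff=1.0e-9):
    """Hypothetical dynamic power model: P = K_eff * V^2 * f, with V = v_per_hz * f."""
    v = v_per_hz * freq_hz
    return k_eff * v * v * freq_hz

def decode_energy(cycles, freq_hz):
    """Energy to retire a fixed number of cycles at a given frequency:
    E = P * t, with t = cycles / f, so E scales as f^2 under this toy model."""
    t = cycles / freq_hz
    return dynamic_power(freq_hz) * t
```

Under this sketch, halving the frequency quarters the energy for the same cycle count, which is the intuition behind DVFS-based energy saving.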

In the case of [1], although DVFS was adopted as a power-saving method for wireless mobile devices with limited power, video quality was allowed to deteriorate to reduce the complexity of the codec as with [4]. However, the methods proposed by [1, 4] are not suitable for the recent trend of mobile video services in which high-resolution and high-definition video services are preferred.

Ma et al. [2] proposed modeling the complexity of video frames by analyzing the individual module units of video decoders using the complexity model of [5, 6], which adds a complexity profiler to the video decoder. This model increases complexity because of the extra profiler and does not consider the frame drops or buffering caused by estimation errors that may occur because of jitter.

Cho and Cho [3] proposed an algorithm that finds the combination of frequency and voltage for which the decoding slack time is zero, storing decoding-time information for the frequency and voltage applied to the processor. To this end, they used complexity interpolation based on frame information (e.g., frame size, frame type) following [7] and the feedback control proposed in [8]. However, since Cho and Cho [3] evaluated this interpolation-based complexity estimation only on very low-resolution sequences (e.g., 240 × 128, 192 × 144, 192 × 112) with small complexity differences between frames, its applicability to the current trend of high-resolution images is uncertain. High-resolution and high-definition video involves large complexity differences between frames, so complexity estimation errors from linear interpolation are substantial. In addition, the algorithm has to store decoding information over certain periods and cannot prevent frame drops or buffering, since the estimate for the next frame is calculated only after accounting for the overhead of previous estimation errors.

In this paper, we propose an estimation method that requires only simple profilers or calculations for complexity estimation, together with a DVFS method that prevents frame drops and buffering, in contrast with the methods proposed in [2, 3]. Our estimation method keeps profiling and calculation simple by exploiting the fact that multimedia content repeatedly processes similar calculations.

For video content, frames of the same coded type that are temporally closer are more similar. This is the most fundamental principle of video codec compression: as described in [9], the H.264/AVC standard, the most widely used reference codec, also increases compression efficiency by exploiting the similarity between temporally close frames. It is therefore effective to perform voltage and frequency scaling using the complexity information of the most recently decoded frames of the same coded type, which requires no extra calculation. In addition, our proposed method sets a bound on the delay caused by estimation errors; if this bound is exceeded, the corresponding frame is decoded at the processor's maximum frequency. Decoding one frame at maximum frequency solves the delay problem at the cost of some power overhead. However, since estimation errors largely cancel out as they are added and subtracted repeatedly, and decoding at maximum frequency shortens the decoding time considerably, this overhead is minimal. With this small additional energy cost, the frame drops and buffering caused by estimation errors do not occur, and energy savings are still obtained, unlike in existing methods.

### Complexity estimation with delay control

The equations for estimation depend on the structure of the group of pictures (GOP). The GOP structures usually used in H.264/AVC are shown in Figure 1a, b: Figure 1a shows a non-dyadic hierarchical structure, and Figure 1b a dyadic hierarchical structure.

Since our proposed model targets video services, it exploits one characteristic of video: the similarity between temporally close frames. In our method, the frames referenced for complexity can be found using the GOP size and the intra-period, which are parameters of the video coder according to [9, 10].

Let the GOP size be *s* and the intra-period *p*. Then the *n*th expected complexity of the I-frame, *c*_{exp _ i}[*n*], can be calculated using Equation 1. Similarly, the *n*th expected complexity of the P-frame, *c*_{exp _ p}[*n*], can be calculated using Equation 2, and the *n*th expected complexity of the B-frame, *c*_{exp _ b}[*n*], using Equations 3 and 4, which cover the non-dyadic and the dyadic hierarchical structure, respectively. (*n* % *p*) denotes (*n* modulo *p*).
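Since the per-type estimates above reference the most recently decoded frame of the same coded type, the estimation step can be sketched as a simple lookup. This is a hypothetical reading of Equations 1 to 4, which are not reproduced here; the class name and structure are illustrative, not from the paper:

```python
# Sketch: estimate a frame's decoding complexity from the most recently
# decoded frame of the same coded type (I, P, or B). Equations 1-4 in the
# paper index that reference frame via the GOP size s and intra-period p;
# here we simply remember the last measured complexity per type.

class ComplexityEstimator:
    def __init__(self):
        self.last = {}  # frame type -> last measured complexity (cycles)

    def expect(self, frame_type):
        """Return c_exp[n] for the next frame of this type, or None if no
        anchor frame of this type has been decoded yet."""
        return self.last.get(frame_type)

    def update(self, frame_type, measured_cycles):
        """Record the measured complexity after decoding, anchoring the
        estimate for the next frame of the same type."""
        self.last[frame_type] = measured_cycles
```

The first frame of each type has no anchor and must be decoded at maximum frequency, as the text describes; after that, `expect` returns a usable estimate.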

Δ is the difference between *t*_{slack} ⋅ (*n* - 1), the time by which frame *n* - 1 should have been decoded, and *t*[*n* - 1], the time when its decoding actually finishes. |Δ| is controlled to converge to 0 and to be no larger than *J*_{max}, the jitter limit.

*c*_{exp}[*n*] and *f*_{exp}[*n*], represented by Equation 6, estimate, respectively, the complexity of frame *n* and the frequency that allows the frame to be decoded within *t*_{slack}. In order to lower jitter, the expected frequency *f*_{exp}[*n*] is modified according to Δ.

The frame is decoded at the maximum frequency, as in Figure 2a, for *J*_{max} < Δ, while the process is accelerated, as in Figure 2b, for 0 < Δ ≤ *J*_{max}, and decelerated, as shown in Figure 2c, otherwise, to control delay variation (Equation 7).
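The delay-control behavior just described can be sketched as follows. The exact adjustment in Equation 7 is not reproduced in the text, so the time-budget adjustment below is a hypothetical simplification of the accelerate/decelerate rule:

```python
def select_frequency(c_exp, t_slack, delta, j_max, f_max):
    """Hypothetical sketch of delay-controlled frequency selection.

    delta > j_max      : jitter limit exceeded -> decode at maximum
                         frequency to recover (the paper's fallback).
    0 < delta <= j_max : behind schedule -> accelerate by shrinking
                         the effective time budget.
    delta <= 0         : ahead of schedule -> decelerate by enlarging
                         the budget.
    """
    if delta > j_max:
        return f_max
    # Adjust the per-frame time budget by the accumulated jitter and
    # clamp the resulting frequency to the processor maximum.
    budget = max(t_slack - delta, 1e-9)
    return min(c_exp / budget, f_max)
```

With `c_exp`/`t_slack` as the baseline frequency, a positive Δ shrinks the budget (higher frequency) and a negative Δ enlarges it (lower frequency), driving |Δ| back toward zero.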

Ma et al. [2] assumed that jitter would not occur, as shown in Figure 2c, and thus that spare time for frame decoding would always exist. In the present study, however, the jitter that may occur when DVFS is applied during video decoding is considered, in preparation for the situations shown in Figure 2a, b.

In our proposed estimation method, when there is no anchor for estimation, such as at the start of decoding or after a channel change, decoding is performed at the processor's maximum frequency. Once one frame of each frame type has been decoded this way, it can act as an anchor for subsequent estimation.

Measuring all modules of H.264/AVC using [11] shows that the aforementioned main modules account for an average of 80% of the entire execution time. From the measured execution time of each module and the current processor frequency, Equation 6 yields the complexity of the frame; this process is the complexity profiling used in [2, 10, 12, 13]. Since this share hardly changes across the diverse compression options supported by H.264/AVC, the present study uses an approximate value estimated from the main modules' complexity, as shown in Figure 3, as the anchor complexity information.
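Converting a measured module execution time and the current frequency into an anchor complexity reduces to cycles = time × frequency, scaled by the main modules' roughly 80% share of total decode time. A minimal sketch under that assumption (the function name is illustrative):

```python
def frame_complexity(exec_time_s, freq_hz, module_share=0.8):
    """Estimate a frame's total complexity in cycles from the profiled
    execution time of the main decoder modules.

    exec_time_s  : measured execution time of the profiled modules (s)
    freq_hz      : processor frequency during the measurement (Hz)
    module_share : fraction of total decode time covered by those modules
                   (about 80% for the main H.264/AVC modules, per the text)
    """
    profiled_cycles = exec_time_s * freq_hz  # cycles spent in the modules
    return profiled_cycles / module_share    # extrapolate to the whole frame
```

For example, 10 ms of module time at 1 GHz implies about 1.25 × 10⁷ cycles for the whole frame.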

If many scene changes occur, the proposed complexity estimation method generates estimation errors because of reduced similarity between neighboring frames. If the estimate is larger than the actual value, energy is wasted; if it is smaller, delay results. As shown in Figure 6, the estimation error is symmetric about zero, so successive errors largely offset one another. As a result, delay and power-saving efficiency have a tradeoff relationship with each other.

Figure 7a illustrates a case where two types of frames - I-frames and P-frames - were used to compose image sequences, and Figure 7b depicts a case where three types of frames - I-frames, P-frames, and B-frames - were used to compose image sequences.

Figures 6 and 7 show the estimation accuracy of the proposed method. From these figures, we can see that unexpectedly large estimation errors may occur. Usually, the correlation between frames is high because they share the same frame type and are adjacent to each other; during events such as scene changes, however, the correlation becomes low, and large estimation errors occur. In this case, the stored complexity information cannot be used to estimate the next frame and must be updated, so we perform complexity profiling at the processor's maximum frequency, as explained in Equation 7.

Regarding the delay and wasted power that remain despite the offsetting of estimation errors in our proposed method, we compare them against the accepted delay bound determined by the buffer and service characteristics [14, 15]. Our proposed delay control, shown in Figure 2, is performed using Equation 7.

The proposed complexity profiling operates at the beginning of decoding or when there is a scene change, and is done separately for each frame type (i.e., I-, P-, and B-frames). The proposed profiler measures the execution time of the decoder modules (i.e., inverse transform, inverse quantization, interpolation, motion compensation, and loop filter), the same modules as in [2, 10, 12]. When the correlation between frames of the same type is high, we skip the profiler and estimate from the execution time of the previous frame.

In [2], profiling is performed on every frame. Each module has a complexity coefficient, defined as the complexity of processing one bit in that module. During decoding, the number of bits handled by each module is known, so the total complexity of the frame can be determined.
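The coefficient model of [2], as described above, can be sketched as a weighted sum over per-module bit counts. The module names and coefficient values below are placeholders for illustration, not values from [2]:

```python
def total_frame_complexity(bits_per_module, cycles_per_bit):
    """Complexity model of [2], sketched: each decoder module has a
    coefficient (cycles needed to process one bit), and the frame's
    complexity is the sum of coefficient * bits over all modules."""
    return sum(cycles_per_bit[m] * bits_per_module[m] for m in bits_per_module)

# Placeholder coefficients and bit counts, for illustration only.
coeffs = {'idct': 12.0, 'iq': 4.0, 'mc': 30.0, 'loop_filter': 8.0}
bits = {'idct': 5000, 'iq': 5000, 'mc': 9000, 'loop_filter': 3000}
frame_cycles = total_frame_complexity(bits, coeffs)
```

The bit counts become known as the frame is parsed, which is why this model requires profiling on every frame.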

In [10], complexity profiling is done on every frame using execution time, similar to our proposed profiling method.

In [12], complexity profiling is done in every frame using the execution cycle of each module.

In [2, 10, 12], complexity is predicted using different algorithms, but all of them perform post-decoding profiling on every frame. In our proposed method, post-decoding is performed only at scene changes and on the first frame of each frame type (i.e., I, P, B). The proposed method therefore avoids post-decoding every frame, which may also reduce accesses to profiling memory.

### Energy consumption

DVFS controls a processor's clock frequency to reduce the amount of energy used for calculations: when the number of calculations is large, the voltage is raised to increase the processor's frequency. The correlation can be calculated using formulas 8 and 9, as in [2, 16, 17].

The frequency is inversely proportional to the decoding time *t* (s) and proportional to the number of calculations *c* (cycles). The supply voltage in complementary metal-oxide-semiconductor (CMOS) circuits can be expressed as shown in formula 9.

*ω*, *φ*, and *θ* are coefficients determined by the underlying platform. As a representative example, the Intel Pentium M 1.6 GHz (Intel Corporation, Santa Clara, CA, USA), built in a 90-nm process, has *θ* = 0.61, *φ* = 1, and *ω* = 5.6 × 10^{-10}, as shown in [18]. As mentioned in [16], the power (W) of CMOS circuits is expressed by formula 10 below.

According to [16], *P*_{dyn} is the dynamic power consumption, determined by the supply voltage and the frequency, and *P*_{DC} is the static power consumption, i.e., the leakage power of the CMOS devices, determined by the supply voltage and a constant. *P*_{on} is the power that maintains the ‘power on’ state of the processor and is assumed to be 0.1 W in the present study.

*P*_{dyn} is calculated in watts using the following formula,

where *K*_{eff} is the effective circuit capacitance.

The dynamic energy consumption is *P*_{dyn} multiplied by time *t*, as shown in formula 13.

*P*_{DC} can be calculated using formula 16 and Table 1, based on [17].

**Underlying coefficients of the CMOS circuit**

| Constant | Value | Constant | Value |
|---|---|---|---|
| | 0.063 | | 5.26 × 10 |
| | 0.153 | | -0.144 |
| | 5.38 × 10 | | 4.8 × 10 |
| | 1.83 | | -0.7 |
| | 4.19 | | |

*V*_{bs} is the body bias voltage, assumed to be -0.7 V in the present study. *I*_{j} is the reverse bias junction current, which is a constant. *I*_{subn} is the sub-threshold current, calculated in amperes using formula 17 below and Table 1.

Leakage power consumption *P*_{DC} and *P*_{on} can be obtained using the above formulas, and leakage energy consumption *E*_{DC} and *E*_{on} can be obtained by multiplying by the calculation time *t*.

The coefficients *ω*, *φ*, and *θ* determined by the underlying platform were set to 5.6 × 10^{-10}, 1, and 0.61, respectively, and a fixed complexity was decoded at different frequencies. Reviewing Figure 8 shows that, for the same complexity, the energy decreases as the frequency is reduced and increases exponentially as the frequency is increased.
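Under one plausible reading of formulas 9 through 13 — supply voltage V(f) = θ + ω · f^φ and dynamic energy E_dyn = K_eff · V(f)² · c for a fixed complexity c — the trend described for Figure 8 can be reproduced numerically. K_EFF below is a placeholder, not a value from the paper:

```python
# Sketch of the energy/frequency tradeoff for a fixed complexity, assuming
# V(f) = theta + omega * f**phi (one plausible reading of formula 9) and
# E_dyn = K_eff * V(f)**2 * c (from P_dyn = K_eff * V^2 * f over t = c / f).
THETA, PHI, OMEGA = 0.61, 1.0, 5.6e-10   # Pentium M 90-nm values from [18]
K_EFF = 1.0e-9                            # effective capacitance: placeholder

def supply_voltage(freq_hz):
    """Supply voltage as a function of frequency (assumed model)."""
    return THETA + OMEGA * freq_hz ** PHI

def dynamic_energy(cycles, freq_hz):
    """Dynamic energy to decode a fixed complexity at a given frequency."""
    v = supply_voltage(freq_hz)
    return K_EFF * v * v * cycles
```

Under this model, a lower frequency permits a lower supply voltage, so the same complexity costs less dynamic energy, matching the Figure 8 trend.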

### Results of theoretical simulation

To display the raw files output by the decoder, they must be transformed into RGB files. This transformation performs floating-point computations per pixel; therefore, in the usual case where consecutive images have the same frame size, the transformation causes no complexity differences between frames.

However, even though the complexity of the transformation remains constant, the transformation is affected by frequency scaling when DVFS is applied, and thus its energy consumption varies by frame.

### Complexity estimation

The first simulation compares the processor's decoding energy consumption among a method without DVFS, the DVFS method of [2] based on complexity modeling, and our proposed method. The simulation uses a science fiction action movie [19], which is the worst-case environment for our proposed method. We also assume that the estimation error of the comparison method [2] is 0%, representing its best possible estimation.

### Delay control

The second simulation compares energy consumption when a frame-drop prevention method is used to support QoE against drops caused by DVFS estimation error. Since Ma et al. [2] do not consider delay, overcoming an estimation error of 3% requires performing DVFS with a 3% margin on the complexity estimate. Our proposed method prevents frame drops by setting a delay threshold that acts as a buffer during DVFS operation; a larger buffer can absorb a larger estimation error. This buffer makes the proposed method suitable for real-time video services. When burst scene changes occur, the total time required for post-decoding is large, and the buffer absorbs the resulting delay.

Threshold values for delay, *D*_{th}, are set to three values: 0.01 s (10 ms), 0.1 s (100 ms), and 1 s (1,000 ms).

The third simulation compares the reduction of post-decoding. We assume scene changes, giving a larger error rate than in [2, 10, 12]; we then perform post-decoding to make the estimation error rate equal to the average estimation error of [2, 10, 12].

In the case of [2], the average estimation error rate was 3%, so our assumed estimation error rate was higher than 3%. Similarly, for [10, 12], the assumed rates were based on their average estimation error rates.

### Results of experimental simulation

Reviewing the theoretical simulation results shows that substantial energy savings can be obtained with the proposed method when the movie [19] is decoded using H.264/AVC reference software version 18.3. However, the theoretical simulation covered only the energy consumed by the central processing unit (CPU) and did not consider the energy consumed by the numerous background programs (e.g., display, system software, widgets) that run when video services are used on actual mobile devices.

The energy-saving effects of applying the method from the theoretical simulation to actual mobile devices were therefore measured.

**Specific embedded testbed for experiment**

| | Description |
|---|---|
| Processor | ARM Cortex™-A9 dual core (2 GHz) |
| Memory | 1 GB LPDDR2 |
| LCD | 7” 800 × 400 resolution |
| System software | Linux Kernel 2.6.35.7, Android 2.3.5 (Gingerbread) |

**Characteristics of CPU governors on Linux-based operating system**

| Governor | Frequency scaling method |
|---|---|
| Performance | Uses only the maximum clock frequency that can be provided by the CPU |
| Powersave | Uses only the minimum clock frequency that can be provided by the CPU |
| Ondemand | Changes between the maximum/minimum frequencies depending on load conditions |
| Conservative | Gradually changes between the maximum and minimum frequencies depending on load conditions |
| Userspace | Uses the frequency designated by the user |

Android-based mobile devices produced for commercial purposes generally use the ondemand governor, which changes frequencies depending on the load conditions of the CPU. However, since load-driven frequency changes are not suitable when the load varies with the type of frame being decoded, this experiment used the userspace governor, which applies the frequency requested by the user.

Note that the frequencies that can be designated with the userspace governor on the aforementioned embedded testbed are limited to 2 GHz, 1.6 GHz, 1 GHz, and 400 MHz.
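Because only these four levels are available, a frequency computed by the estimator has to be rounded up to the nearest level that still meets the deadline. A minimal sketch of that quantization (the function name is illustrative):

```python
# The embedded testbed exposes only four userspace-governor levels (Hz).
AVAILABLE_FREQS = [400e6, 1.0e9, 1.6e9, 2.0e9]

def quantize_frequency(target_hz, levels=AVAILABLE_FREQS):
    """Round a computed DVFS target up to the nearest available level so
    the frame still finishes within its slack time; fall back to the
    maximum level if the target exceeds every available frequency."""
    for f in sorted(levels):
        if f >= target_hz:
            return f
    return max(levels)
```

For example, a 1.2 GHz target must run at the 1.6 GHz level on this testbed, which slightly reduces the savings relative to a continuous frequency range.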

As in the theoretical simulation, this experiment compared decoding energy consumption among the case without DVFS, the complexity-modeling method of [2], and the proposed method. The experiment used [19], a science fiction action movie with the lowest level of similarity among frames; this environment shows the worst-case performance of the proposed method, while the comparison method [2] was again given its best case with an estimation error of 0%.

Reviewing Figure 13 shows that the best-performing case of [2] consumed 91.08% of the energy used without DVFS, while the proposed method consumed 91.41%, even though it performed no separate computation for estimation. The energy-saving efficiency was lower in this experiment than in the theoretical simulation, which measured only CPU energy, because this experiment ran numerous background programs together, as in actual use environments. Nevertheless, compared with [2], almost the same energy-saving effect was observed even though the experimental environment was the worst case for the proposed method, consistent with the theoretical simulation results.

## Conclusion

As discussed in the ‘Introduction’, conventional methods that apply DVFS to video decoding use extra estimation profilers or post-decoding (i.e., calculations), which adds complexity and estimation errors. This paper proposes an estimation method that exploits the correlation between video frames of the same frame type, in contrast to estimation methods that rely only on post-decoding. The proposed method may have larger estimation errors than the conventional methods because it performs fewer post-decodings.

When the estimate exceeds the actual value, our proposed method keeps the decoding jitter caused by the estimation error within the threshold (i.e., *J*_{max} in Figure 2). On the other hand, since conventional methods such as those of [2, 3] do not consider delay, they can drop frames as a result of estimation errors. As shown in our simulation results, our proposed method, which performs fewer post-decodings, achieved performance comparable to that of [2], which performs more.

As shown in Figure 11, the proposed method sets the delay threshold (i.e., *J*_{max} in Figure 2) according to [14]. It can therefore be applied to different video types and services with different delay tolerances (e.g., a delay threshold of 100 ms for video conversation and 1 s for video streaming) and still save energy.

Our proposed method thus reduces the number of post-decodings, unlike conventional algorithms in which post-decoding is applied to all frames. This may reduce the profiler's memory accesses, although no memory analysis was performed to confirm it.

In this paper, we have discussed DVFS considering only video decoding. Future research can extend these energy-saving methods to the overall video service system, including video data reception, memory access, video decoding, and video display.

## Declarations

### Acknowledgments

This research was supported by the MKE (Ministry of Knowledge Economy), Korea under the ITRC (Information Technology Research Center) support program (NIPA-2012-H0301-12-1006) supervised by the NIPA (National IT Industry Promotion Agency). This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea in the ICT R&D Program 2013.

## References

1. He Z, Liang Y: **Power-rate-distortion analysis for wireless video communication under energy constraints.** *IEEE Trans. Circuits Syst. Video Technol.* 2005, **15**(5):645-658.
2. Ma Z, Hu H, Wang Y: **On complexity modeling of H.264/AVC video decoding and its application for energy efficient decoding.** *IEEE Trans. Multimedia* 2011, **13**:1240-1255.
3. Cho J, Cho I: **A combined approach for QoS-guaranteed and low-power video decoding.** *IEEE Trans. Consumer Elec.* 2011, **57**:651-657.
4. van der Schaar M, Andreopoulos Y: **Rate-distortion-complexity modeling for network and receiver aware adaptation.** *IEEE Trans. Multimedia* 2005, **7**(3):471-479.
5. Horowitz M, Joch A: **H.264/AVC baseline profile decoder complexity analysis.** *IEEE Trans. Circuits Syst. Video Technol.* 2003, **13**(7):704-716.
6. Ma Z, Zhang Z: **Complexity modeling of H.264 entropy decoding.** *Proc. ICIP*, San Diego, October 2008.
7. Choi K, Cheng W: **Frame-based dynamic voltage and frequency scaling for an MPEG decoder.** *Proc. ICCAD*, San Jose, November 2002.
8. Lu Z, Lach J: **Reducing multimedia decode power using feedback control.** *Proceedings of the International Conference on Computer Design*, San Jose, October 2003.
9. Wiegand T, Sullivan GJ: **Overview of the H.264/AVC video coding standard.** *IEEE Trans. Circuits Syst. Video Technol.* 2003, **13**(7):560-576.
10. Kontorinis N, Andreopoulos Y: **Statistical framework for video decoding complexity modeling and prediction.** *IEEE Trans. Circuits Syst. Video Technol.* 2009, **19**(7):1000-1013.
11. Intel Parallel Studio [Online]. Available: software.intel.com/en-us/intel-parallel-studio-xe. Accessed 8 February 2012.
12. Akyol E, van der Schaar M: **Complexity model based proactive dynamic voltage scaling for video decoding systems.** *IEEE Trans. Multimedia* 2007, **9**(7):1475-1492.
13. Andreopoulos Y, van der Schaar M: **Complexity-constrained video bitstream shaping.** *IEEE Trans. Signal Process.* 2007, **55**(5):1967-1974.
14. Nortel Networks: *QoS performance requirements for UMTS, 3GPP TSG SA S1*. Copenhagen; 1999.
15. Wee S, Tan W: **Optimized video streaming for networks with varying delay.** *Proc. ICME*, Lausanne, August 2002.
16. Jejurikar R, Pereira C: **Leakage aware dynamic voltage scaling for real-time embedded systems.** *Proc. DAC*, San Diego, July 2004.
17. Martin S, Flautner K: **Combined dynamic voltage scaling and adaptive body biasing for optimal power consumption in microprocessors under dynamic workloads.** *Proc. ICCAD*, San Jose, November 2002.
18. Intel Pentium M Mobile Processor [Online]. Available: http://www.intel.com/content/www/us/en/intelligent-systems/previous-generation/embedded-pentium-m.html. Accessed 15 January 2012.
19. Warner Bros. Entertainment: *The Matrix*. United States; 1999.
20. Dolby Laboratories Inc., Fraunhofer-Institute HHI, Microsoft Corporation: **H.264/MPEG-4 AVC Reference Software Manual.** ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-X072, Geneva, 29 June–5 July 2007.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.