# A Precise High-Level Power Consumption Model for Embedded Systems Software

- Mostafa E. A. Ibrahim^{1, 2} (corresponding author)
- Markus Rupp^{2}
- Hossam A. H. Fahmy^{3}

**2011**:480805

https://doi.org/10.1155/2011/480805

© Mostafa E. A. Ibrahim et al. 2011

**Received:** 26 February 2010

**Accepted:** 11 August 2010

**Published:** 19 August 2010

## Abstract

The increasing demand for portable computing has made power consumption one of the most critical design parameters of embedded systems. In this paper, we present a precise high-level power estimation methodology, based on a functional level power model, for software running on a VLIW processor. The targeted processor is the TMS320C6416T DSP from Texas Instruments. Our model accounts for several important effects, such as pipeline stalls, inter-instruction effects, and cache misses. The contributions are the following. First, we propose a precise model to estimate the power consumption of the targeted DSP while it runs a software algorithm. Second, we validate the precision of our model on many typical algorithms from signal and image processing. Third, we further validate the precision of our model on a real application from the video processing field. The power consumption estimated by our model is compared to the physically measured power consumption, achieving a very low average absolute estimation error of 1.65% and a maximum absolute estimation error of only 3.3%.

## 1. Introduction

Many applications in special areas such as hand-held computation, tiny robots, and guidance systems in automated vehicles are powered by batteries of low rating. To avoid frequent recharging or replacement of the batteries, there is significant interest in low-power system design. Very Long Instruction Word (VLIW) Digital Signal Processors (DSPs) are a well-suited choice for this application domain because of their high performance at low power. The importance of power constraints in the design of embedded systems has continuously increased in the past years, due to technological trends toward high-level integration and increasing operating frequencies, combined with the growing demand for portable systems. This has led to a significant research effort in power estimation and low-power design. Power simulators (profilers) allow software programmers to identify the hot spots, that is, the highly power-consuming segments of their code, as a first step towards optimizing these segments from the power perspective. Developers of power simulators have to embed a precise power consumption model in their simulators. Existing processor power simulators are available only for the lower levels of the design: at the circuit level and, to a limited extent, at the logic level. These tools are very slow and impractical for evaluating the power consumption of embedded software, since the application power consumption would only be known at the very last stage of the design process. In this paper, we present an approach for estimating the power consumption of a VLIW DSP while it runs a software application. This work aims to precisely estimate the power consumption of the processor core while running a software algorithm at an early stage in the design process. The targeted DSP is the TMS320C6416T from Texas Instruments (referred to as C6416T in the rest of the paper for brevity). This processor features the highest performance among the fixed-point DSPs of the C6000 DSP platform.

The rest of the paper is organized as follows. Section 2 presents an overview of the existing power consumption modeling abstraction levels for general purpose processors. Section 3 provides a general overview of the target architecture. Section 4 describes the methodology along with the experimental setup employed in our experiments. Section 5 describes in detail the functional level analysis and the resulting mathematical formulas constituting the model for the targeted architecture functional units. Section 6 demonstrates the validation of the power estimation methodology utilizing the developed power consumption model. Finally, Section 7 summarizes the main contributions of this paper.

## 2. Related Work

This section summarizes the most recent contributions to the problem of power modeling and estimation. Recent approaches to modeling the power consumption of software running on a processor can be separated into two main categories: *low-level models* and *high-level models*. Low-level models calculate power and energy from detailed electrical descriptions, comprising circuit level, gate level, register transfer (RT) level, or system level, while high-level models deal only with instructions and functional units from the software point of view, without electrical knowledge of the underlying architecture [1].

### 2.1. Low-Level Estimation Techniques

The level of detail in the modeling performed by the power simulator influences both the accuracy of estimation as well as the speed of the simulator. In this section we survey the models frequently used at low level as these power consumption estimation techniques cover a range of abstractions such as the circuit/transistor level, logic gate level, RT level, and architectural level.

#### 2.1.1. Transistor-Level Estimations

The representation of a microprocessor in terms of transistors and nets is extremely complex and requires going through all the steps of the design flow, including layout, routing, and parameter extraction. Furthermore, a transistor-level view of the system employs component models based on linearized differential equations and works in the continuous time domain. This implies that a simulation of more than one million transistors, even for a few clock cycles, requires run times that are usually not affordable and in any case not practical for high-level power characterization [2]. Thus, while providing very good accuracy, transistor-level power estimation is slow and impractical for analyzing the power consumption at an early design stage. Moreover, this methodology requires lower level circuit details of the targeted processor, which are not available for most commercial off-the-shelf processors.

PowerMill [3] is an early attempt to build a low-level power consumption simulator. PowerMill is a transistor-level simulator for simulating the current and power behavior in VLSI circuits. It is capable of simulating detailed current behavior in modern deep submicron CMOS circuits, including sophisticated circuitry such as sense amplifiers, with speed and capacity approaching conventional gate-level simulators. For more details about power estimation techniques in VLSI circuits, refer to [4, 5].

#### 2.1.2. Gate-Level Estimations

Methods to estimate the power consumption based on gate-level descriptions of microprocessors or microcontroller cores have been proposed in the literature. The main advantage of such methods with respect to transistor-level simulation approaches is that the simulation is event-driven and takes place in a discrete time domain, leading to a considerable reduction of the computational complexity without a significant loss of accuracy [2].

An example of a gate-level power estimator is the model presented by Chou and Roy [6], who present an accurate estimation of signal activity at the internal nodes of sequential logic circuits. Their power consumption estimation is a Monte Carlo based approach that takes spatial and temporal correlations of logic signals into consideration.

#### 2.1.3. RT-Level Estimations

A design described at RT-level can be regarded as a collection of blocks and a network of interconnections. The blocks are sometimes referred to as macros, adders, registers, multiplexers, and so on, while the interconnections are simply nets or group of nets. An assumption underlying the great majority of the approaches presented in the literature is that the power properties of a block can be derived from an analysis of the block isolated from a design, under controlled operating conditions. The main factor influencing the power consumption model of a macro is the input statistics [2].

Most of the research in RT-level power estimation is based on empirical methods that measure the power consumption of existing implementations and produce models from those measurements. This is in contrast to the approaches that rely on information-theoretic measures of activity to estimate power [7, 8]. Measurement-based approaches for estimating the power consumption of datapath functional units can be divided into two subcategories, namely, transition sensitive and activity sensitive. The first technique, introduced by Powell and Chau [9], is a fixed-activity micromodeling strategy called the Power Factor Approximation (PFA) method. The power models are parameterized in terms of complexity parameters and a PFA proportionality constant. Similar schemes were also proposed by Kumar et al. [10] and Liu and Svensson [11]. This approach assumes that the inputs do not affect the switching activity of a hardware block. To remedy this problem, activity-sensitive empirical power models were developed. These schemes are based on predictable input signal statistics; an example is the method proposed by Landman and Rabaey [12]. The overall accuracy of such models may be hampered by incorrect input statistics or by the inability to correctly model the interaction.

The second empirical technique, transition-sensitive power models, is based on input transitions rather than input statistics. The method, proposed by Mehta et al. [13], assumes that a power model is provided for each functional unit—a table containing the power consumed for each input transition. Closely related input transitions and power patterns can be concentrated in clusters, thereby reducing the size of the table. Other researchers have also proposed similar macro-model-based power estimation approaches [14, 15].

#### 2.1.4. Architectural-Level Estimations

Recently, various architectural power simulators have been designed that employ a combination of lower level of abstraction power consumption models. These simulators derive power estimates from the analysis of circuit activity induced by the application programmes during each cycle and from detailed capacitive models for the components activated. A key distinction between these different simulators is the estimation accuracy and estimation speed. For example, the *SimplePower* power simulator [16] employs a transition-sensitive power model for the datapath functional unit. The *SimplePower* core accesses a table containing the switch capacitance for each input transition of the functional unit exercised.

The use of a transition-sensitive approach raises both design challenges and performance concerns during simulation. The first concern is that the construction of these tables is time-consuming, and the size of a table grows exponentially with the size of the inputs. The table construction problem can be addressed by partitioning and clustering mechanisms. The second concern is the performance cost of the table lookup for each component access in a cycle. To overcome this cost, simulators such as *SoftWatt* [17] and *Wattch* [18] utilize a simple fixed-activity model for the functional unit. These simulators only track the number of accesses to a specific component and utilize an average capacitance value to estimate the power consumed. Even the same simulator can employ different types of power models for different components. For example, *SimplePower* estimates the power consumed in the memories utilizing analytical models [19]. In contrast to the datapath components that utilize a transition-sensitive approach, these models estimate the power consumed per access and do not accommodate the power differences found in sequences of accesses. One of the most widely used microarchitectural power simulators is *Wattch* [18]. *Wattch* is a power simulator for superscalar, out-of-order processors. It has been developed with the aid of the infrastructure offered by *SimpleScalar* [20]. The power estimation engine of *Wattch* is based on the *SimpleScalar* architecture, but in addition, it supports detailed cycle-accurate information for all models, including datapath elements, memory, control logic, and the clock distribution network [21].
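The fixed-activity idea described above can be illustrated with a minimal sketch. It assumes that each access switches an average capacitance and dissipates an energy of C·Vdd²; the component names, capacitance values, supply voltage, and function name are hypothetical and are not taken from *SoftWatt* or *Wattch*.

```python
def fixed_activity_power(access_counts, avg_capacitance_f, vdd_volts, elapsed_s):
    """Average power in watts from per-component access counts.

    Assumption for illustration: every access to a component switches its
    average capacitance once, dissipating C * Vdd^2 joules.
    """
    energy_j = sum(
        count * avg_capacitance_f[name] * vdd_volts ** 2
        for name, count in access_counts.items()
    )
    return energy_j / elapsed_s


# Hypothetical access counts and average capacitances (farads).
counts = {"alu": 1_000_000, "regfile": 2_000_000}
caps = {"alu": 20e-12, "regfile": 10e-12}
power_w = fixed_activity_power(counts, caps, vdd_volts=1.2, elapsed_s=1e-3)
```

Because only counters are updated during simulation, such a model avoids the per-access table lookup of a transition-sensitive model, at the price of ignoring input-dependent switching.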

While providing good accuracy, low-level power estimation methodologies are slow and impractical for analyzing the power consumption at an early design stage. Moreover, these methodologies require the availability of lower level circuit details or a complete Hardware Description Language (HDL) design of the targeted processor, which is not available for most of the commercial off-the-shelf processors.

### 2.2. High-Level Estimation Techniques

Recently, the demand has increased for high level power estimation simulators that allow an early design space exploration from the power consumption perspective. The existing high-level power estimation models can be classified into two main categories, Instruction Level Power Analysis (ILPA) and Functional Level Power Analysis (FLPA).

#### 2.2.1. Instruction Level Power Analysis

An instruction level power model for individual processors was first proposed by Tiwari et al. [22]. By measuring the current drawn by the processor as it repeatedly executes distinct instructions or distinct instruction sequences, it is possible to obtain most of the information that is required to evaluate the power consumption of a program for the processor under test. Tiwari et al. modeled the power consumption of the Intel DX486 processor. Power is modeled as a base cost for each instruction plus the interinstruction overheads that depend on neighboring instructions. The base cost of an instruction can be considered as the cost associated with the basic processing needed to execute the instruction. However, when sequences of instructions are considered, certain interinstruction effects come into play, which are not reflected in the cost computed solely from base cost. These effects can be summarized as the following.

(i) Circuit state: switching activity depends on the current inputs and the previous circuit state, in other words, on the difference between the bit patterns of two successive instructions.

(ii) Resource constraints: resource constraints in the CPU can lead to stalls, for example, pipeline stalls and write buffer stalls.

(iii) Cache misses: another interinstruction effect is the effect of cache misses. The instruction timings listed in manuals provide the cycle count assuming a cache hit. For a cache miss, a certain cycle penalty has to be added to the instruction execution time.
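The base cost plus interinstruction overhead structure described above can be sketched as follows. This is an illustrative sketch only: the instruction names, cost values (in arbitrary energy units), and the single per-cycle stall cost are hypothetical assumptions, not measurements from Tiwari et al.

```python
def ilpa_energy(trace, base_cost, pair_overhead, stall_cycles=0, stall_cost=0.0):
    """Energy of an instruction trace: base costs + pairwise overheads + stalls."""
    # Base cost of every executed instruction.
    total = sum(base_cost[op] for op in trace)
    # Circuit-state overhead for each pair of consecutive instructions.
    for prev, cur in zip(trace, trace[1:]):
        total += pair_overhead.get((prev, cur), 0.0)
    # Penalty for stall cycles (resource constraints, cache misses).
    return total + stall_cycles * stall_cost


# Hypothetical characterization data in arbitrary energy units.
base = {"ADD": 1.0, "MPY": 1.5, "LDW": 2.0}
overhead = {("ADD", "MPY"): 0.2, ("MPY", "LDW"): 0.3}
energy = ilpa_energy(["ADD", "MPY", "LDW", "ADD"], base, overhead,
                     stall_cycles=2, stall_cost=0.5)
```

The `pair_overhead` table is what makes the characterization effort grow quickly: it needs one entry per ordered instruction pair.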

Tiwari et al. propose an experimental method to empirically determine the base and interinstruction overhead costs. In this method, several programs are used, each containing an infinite loop consisting of several instances of the given instruction or instruction sequence. The average current drawn by the processor core during the execution of this loop is measured by a standard off-the-shelf, dual-slope integrating digital multimeter. Much more accurate measuring environments have been proposed to precisely monitor the instantaneous current drawn by the processor instead of the average current. One of these approaches employs a high-performance current mirror based on bipolar junction transistors as the current sensing circuit. The power profiler in the work of Nikolaidis et al. [23] receives as input the trace file of executed assembly instructions, generated by an appropriate processor simulator, and estimates the base and interinstruction energy cost of the executed program, taking into account the energy-sensitive factors as well as the effect of pipeline stalls and flushes. The main disadvantage of this approach is the complexity of the current measurement [24].

Another approach, which reduces the spatial complexity of instruction-level power models, is presented in [25]. Therein, interinstruction effects have been measured by considering only the additional energy consumption observed when a generic instruction is executed after a no-operation (NOP) instruction.

An attempt to modify the original ILPA to create an instruction level power model with a gate level simulator is carried out by Sama et al. [26]. In this approach, the power cost values were obtained through a power simulator rather than actual measurement; thus modeling is possible at design time and can be part of microarchitecture and/or instruction set architecture exploration. More researchers attempted to enhance the original Tiwari ILPA power consumption modeling technique as in [27–29].

The ILPA-based methods have some drawbacks. One is that the number of current measurements is directly related to the number of instructions in the Instruction Set Architecture (ISA) and to the number of parallel instructions composing the very long instruction in a VLIW processor. The complexity of instruction level power characterization of a $K$-issue VLIW processor is $O(N^{2K})$, where $N$ is the number of instructions in the ISA and $K$ is the number of parallel instructions composing the VLIW [30]. Also, these methods do not provide any insight into the instantaneous causes of power consumption within the processor core, which is treated as a black box. Moreover, the effect of varying data (as well as addresses) is ignored in the ILPA models, though this effect can be accounted for by an additive factor [31].

#### 2.2.2. Function Level Power Analysis

FLPA was first introduced by Laurent et al. in [32]. The functional level power modeling approach is applicable to all types of processor architectures. Furthermore, FLPA modeling can be applied to a processor with moderate effort, and no detailed knowledge of the processor's circuitry is needed. The basic idea behind FLPA is the division of the processor architecture into functional blocks such as the processing unit (PU), the instruction management unit (IMU), the internal memory, and others [32]. First, a functional analysis of these blocks is performed to identify and then discard the nonconsuming blocks (those with negligible impact on the power consumption). The second step is to determine the parameters that affect the power consumption of each of the power-consuming blocks. For instance, the IMU is affected by the instruction dispatching rate, which in turn is related to the degree of parallelism. In addition to these parameters, some parameters affect the power consumption of all functional blocks in the same manner, such as the operating frequency and the word length of the input data.

Laurent et al. [32, 33] presented a functional level power consumption model for the TI C62x DSP series. The C62x series has four internal memory modes, which are handled by the model of Laurent et al. The targeted architecture in our proposed model is the TI C64x. There are significant differences between the C64x and the C62x architectures. The internal program and data memories of the C62x have been replaced by two level-1 caches in the C64x. Moreover, the C64x includes a level-2 SRAM that is utilized for both data and program, with the ability to be partially configured as level-2 cache memory. In addition, the C64x can execute SIMD instructions. The number of registers is also doubled (2 × 32 in place of 2 × 16).

Unlike the model presented in [32, 33], which considers the load and store instructions as part of the processing unit submodel, we had to consider the internal memory as a separate functional block. Hence, we believe that our proposed model is substantially different from the model presented in [32, 33], but that this difference is required for accurately modeling the C64x family.

The work presented in [34] briefly points to the C64x but is lacking details. Furthermore, as of now (submission time of the paper) the model for the C64x is not included in the library of the latest SoftExplorer tool [35]. Unlike the model introduced in [36] that employs the parallelism factor as the affecting parameter for the processing units (PUs) block, the fact that the NOP does not require any PU for its execution convinced us that another parameter yields a better description of the PUs. Moreover, as we will explain later, the level-1 data cache memory submodel is different as well.

## 3. Target Architecture

In this section we briefly consider the target processor architectural features and the experimental setup used in our work.

## 4. Methodology

By means of simulations or measurements it is possible to find an arithmetic function for each block that determines its power consumption depending on a set of parameters. Hence, to determine the arithmetic function for each functional block, the average supply current of the processor core is measured in relation with the variation of the affecting parameter. These variations are achieved by a set of small programs, called scenarios. Such scenarios are short programs written in assembly language. Each program consists of an unbounded loop with a body of several hundreds of certain instructions that individually invoke each block. The power consumption rules are finally obtained by curve-fitting the measurement values [33].
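The curve-fitting step can be sketched in a few lines. The example below fits a straight line by ordinary least squares; the linear form, the parameter values, and the current readings are hypothetical and chosen only for illustration, since the actual shape of each fitted function depends on the measured block.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical scenario results: one average core current (mA) per value
# of the affecting parameter exercised by a scenario.
rates = [0.125, 0.25, 0.5, 1.0]
currents_ma = [105.0, 110.0, 120.0, 140.0]
slope, intercept = fit_line(rates, currents_ma)
```

Each scenario contributes one point, so the number of scenarios per block determines how well the fitted function is constrained.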

The parameters that affect the power consumption for each functional block can be extracted from the assembly code generated by the CCS3.1. Some parameters cannot be extracted directly from the assembly code, such as the execution time and the data cache miss rate. Therefore, the code should be run at least once to obtain these parameters with the aid of the code profiler.

## 5. C6416T Power Consumption Model

Methodology of computing algorithmic parameters.

### 5.1. Static and Clock Distribution Power Consumption Submodel

### 5.2. IMU Power Consumption Submodel

The IMU of the C6416T processor consists of two main subunits: the instruction fetching unit and the dispatching unit. The IMU fetches eight instructions per cycle as one fetch packet. The dispatch unit then subdivides this fetch packet into execution packets. Since the C6416T has eight functional units, it is capable of simultaneously executing up to eight instructions. Consequently, the dispatch unit can divide the fetch packet into one (maximum parallelism) to eight (sequential) execution packets. Thus, the dispatch rate is closely tied to the instruction parallelism, and it is the parameter that affects the power consumption of the IMU.

The quality of the fitting process is measured by the $R$-squared value ($R^2$): a number from 0 to 1, derived from the normalized sum of squared residuals of the data after the fit. This value expresses what fraction of the variance of the data is explained by the fitted trend line. It reveals how closely the estimated values for the trend line correspond to the actual data. A trend line is most reliable when its $R^2$ value is at or close to 1 [39]. Since the $R^2$ value for the arithmetic function in (1) is close to 1, (1) is an excellent fit for the curve values in Figure 7.
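As a small illustration of this quality measure, the sketch below computes the coefficient of determination using the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The observed and predicted values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Sum of squared residuals between the data and the fitted values.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the data around its mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured values and the values predicted by a fitted function.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```

A value near 1 means the fitted function explains almost all of the variation in the measurements.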

where PSR stands for pipeline stall rate which can be expressed as the number of pipeline stall cycles divided by the total cycles required for executing the code segment under investigation.
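The PSR definition translates directly into code. In this minimal sketch the function name and the cycle counts are hypothetical; in practice both counts would come from a code profiler.

```python
def pipeline_stall_rate(stall_cycles, total_cycles):
    """PSR: pipeline stall cycles divided by total execution cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return stall_cycles / total_cycles


# Hypothetical profiler output for a code segment.
psr = pipeline_stall_rate(stall_cycles=250, total_cycles=1000)
```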

### 5.3. PU Power Consumption SubModel

The data path of the C6416T consists of eight functional units. These functional units can work simultaneously if the dispatch unit succeeds in composing an execution packet with eight instructions. Unlike the model in [36], which uses the parallelism degree as the affecting parameter for the processing unit submodel, the fact that the NOP does not require any PU for its execution convinced us that another parameter yields a better description of the PUs.

More than 1000 different instructions compose the scenarios that vary the processing unit rate, in order to account for inter-instruction effects. The current measured from the DSK is the sum of the clock tree, IMU, and PU currents. To obtain only the current drawn by the PU, the IMU and clock tree currents are subtracted from the measured current.
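The subtraction step can be sketched as follows, assuming the clock tree and IMU currents for each scenario are already available from their own submodels. All current values and names below are hypothetical.

```python
def pu_current(measured_ma, clock_tree_ma, imu_ma):
    """Isolate the PU current from the total measured core current."""
    return measured_ma - clock_tree_ma - imu_ma


# Hypothetical scenarios: (measured total, clock tree, IMU) currents in mA.
scenarios = [
    (700.0, 500.0, 150.0),
    (730.0, 500.0, 160.0),
]
pu_currents_ma = [pu_current(m, c, i) for m, c, i in scenarios]
```

The isolated values are the data points to which the PU submodel is then fitted.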

The arithmetic function in (3) results in an excellent fit for the curve values in Figure 10, as indicated by its $R^2$ value. Compared to other functional blocks such as the clock tree or the IMU, it is clear that the PU does not significantly contribute to the total power consumption of the processor core. It is important to mention that the scenario for invoking the PU does not include any memory instructions. The internal memory operations are handled in a separate scenario.

### 5.4. Internal Memory Power Consumption SubModel

As mentioned in Section 5.3, the internal memory operations are handled separately because of their distinct execution characteristics. Two categories of memory operations are included in the instruction set of the C6416T DSP: load and store. A load instruction reads data from the data cache (if the operand exists in the data cache) into a specific register of the processor's register file. A store instruction writes data into the memory, according to the data cache write policy.

The C64x+ architecture is capable of performing two memory operations per cycle. The affecting parameters for the internal memory submodel are the memory read access rate and the memory write access rate. The memory access rate is defined as the number of memory references (read and write) divided by the algorithm execution time.
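A minimal sketch of how these two parameters could be computed is given below. It assumes the execution time is expressed in CPU cycles, so that each rate is a dimensionless number of accesses per cycle; the function name and the counts are hypothetical.

```python
def memory_access_rates(n_reads, n_writes, exec_cycles):
    """Return (read access rate, write access rate) in accesses per cycle."""
    if exec_cycles <= 0:
        raise ValueError("exec_cycles must be positive")
    return n_reads / exec_cycles, n_writes / exec_cycles


# Hypothetical counts of load/store instructions and profiled cycle count.
read_rate, write_rate = memory_access_rates(n_reads=300, n_writes=100,
                                            exec_cycles=1000)
```

With at most two memory operations per cycle, the sum of both rates cannot exceed 2 under this cycle-based assumption.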

### 5.5. L1 Data Cache Power Consumption SubModel

The L1 data cache functional block represents the flow of data from the L1 data cache to the L2 memory and vice versa. Different scenarios are prepared to stimulate the effect of data cache misses.

### 5.6. L1 Program Cache Power Consumption SubModel

## 6. Model Validation

Benchmarks used for our experiments.

First of all, all optimization options which are included in the CCS are turned off because these optimization options affect the speed or the code size only and are not dedicated to power optimization. The second step is to compile the benchmarks.

Unlike almost all the benchmarks of Figure 16, the physically measured power consumption for the IIR benchmark is higher than the estimated power. By investigating the assembly code generated for the IIR benchmark by the CCS3.1, we find that all the load and store operations inside the loops have an operand size of 32 bits, unlike in the other benchmarks; for example, in the FFT 16 × 16 all the load and store instructions have an operand size of 16 bits. In Section 4, we indicated that the word length of the instructions' operands affects the power consumption of almost all the functional blocks. Moreover, in Section 5 we stated that "in our proposed model we choose 16-bits word length to be the typical word length." Therefore, the estimated power for the IIR benchmark is slightly lower than the physically measured power.

The power consumption of the Elastic Graph Matching (EGM) algorithm is estimated with the aid of our proposed power consumption model described in Section 5. The estimated power consumption equals 1.0498 W while the physically measured power consumption equals 1.061 W, resulting in an estimation error of only about 1.06% relative to the measured value.
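The estimation error for the EGM application can be reproduced from the two reported power values. The sketch assumes the error is taken relative to the physically measured power; the function name is hypothetical.

```python
def estimation_error_percent(estimated_w, measured_w):
    """Absolute estimation error as a percentage of the measured power."""
    return abs(estimated_w - measured_w) / measured_w * 100.0


# Power values reported for the EGM algorithm.
egm_error = estimation_error_percent(estimated_w=1.0498, measured_w=1.061)
```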

## 7. Conclusion

In this paper, we developed a precise functional level estimation technique to estimate the power consumption of embedded software running on a programmable processor. The commercial off-the-shelf VLIW DSP C6416T from Texas Instruments is utilized as the targeted platform. Inter-instruction as well as pipeline stall effects have been taken into account in our proposed model. The precision of our model has been validated by estimating the power consumption of many typical algorithms from signal and image processing. We further validated the precision of our model on a real application from the video processing field. The power consumption estimated by our model is compared to the physically measured power consumption, achieving a very low average absolute estimation error of 1.65% and a maximum absolute estimation error of only 3.3%.

Although our methodology for modeling the power consumption of the TI C6416T VLIW processor is applicable to other VLIW processors, a different target architecture requires reconsidering the division of the processor into functional blocks shown in Figure 4. For example, if the new target architecture includes a floating-point unit, Figure 4 has to be modified by including a new functional block that represents the floating-point unit, together with the parameters affecting its power consumption, such as the access rate. Thus, as long as the power consumption of the functional blocks remains independent, our power estimation methodology is intended to work equally well on all VLIW architectures.
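The extensibility argument can be illustrated with a sketch in which each functional block contributes an independent term to the total power. The block names follow the submodels of Section 5, but every functional form, coefficient, and parameter value below is a hypothetical placeholder, not one of the fitted functions of this paper.

```python
def total_power(params, submodels):
    """Sum the power contributions (W) of independent functional blocks."""
    return sum(model(params) for model in submodels.values())


# Placeholder submodels: hypothetical linear forms and coefficients.
submodels = {
    "static_clock": lambda p: 0.5,
    "imu": lambda p: 0.2 * p["dispatch_rate"],
    "pu": lambda p: 0.4 * p["pu_rate"],
}
params = {"dispatch_rate": 0.5, "pu_rate": 0.25, "fpu_access_rate": 0.5}
power_before = total_power(params, submodels)

# Porting to an architecture with a floating-point unit adds one block.
submodels["fpu"] = lambda p: 0.1 * p["fpu_access_rate"]
power_after = total_power(params, submodels)
```

Adding a block leaves the existing entries untouched, which mirrors the independence assumption stated above.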

Finally, we have evaluated the global optimization levels of the CCS as well as two specific architectural features of the C6416T, namely, Software Pipelined Loop (SPLoop) and the Single Instruction Multi Data (SIMD) from the perspective of energy and power consumption [42, 43].

## Declarations

### Acknowledgments

This work has been funded by the Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms as well as the COMET K-Project Embedded Computer Vision (ECV) in conjunction with the Austrian Institute of Technology (AIT).

## References

- Bleakley CJ, Casas-Sanchez M, Rizo-Morente J:
**Software level power consumption models and power saving techniques for embedded DSP processors.***Journal of Low Power Electronics*2006,**2**(2):281-290.Google Scholar - Brandolese C:
*A codesign approach to software power estimation for embedded systems, Ph.D. disseration*. Politecnico di Milano, Institute of Electronics and Information; 2000.Google Scholar - Huang CX, Zhang B, Deng A, Swirski B:
**Design and implementation of PowerMill.**In*Proceedings of the International Symposium on Low Power Design (ISLPED '95), April 1995, New York, NY, USA*. ACM; 105-109.Google Scholar - Najm FN:
**Survey of power estimation techniques in VLSI circuits.***IEEE Transactions on Very Large Scale Integration (VLSI) Systems*1994,**2**(4):446-455.View ArticleGoogle Scholar - Gupta S, Najm FN:
**Power macromodeling for high level power estimation.**In*Proceedings of the 34th Design Automation Conference (DAC '97), June 1997, New York, NY, USA*. ACM; 365-370.View ArticleGoogle Scholar - Chou T, Roy K:
**Accurate power estimation of CMOS sequential circuits.***IEEE Transactions on Very Large Scale Integration (VLSI) Systems*1996,**4**(3):369-380.View ArticleGoogle Scholar - Marculescu D, Marculescu R, Pedram M:
**Information theoretic measures of energy consumption at register transfer level.**In*Proceedings of the International Symposium on Low Power Design (ISLPED '95), 1995, New York, NY, USA*. ACM; 81-86.Google Scholar - Rabaey JN, Pedram M:
*Low Power Design Methodologies, The Springer International Series in Engineering and Computer Science*. 1996.,**336:**Google Scholar - Powell S, Chau EM:
**Estimating power dissipation of VLSI signal processing chips: the PFA technique.***VLSI Signal Processing IV*1990, 250-259.Google Scholar - Kumar N, Katkoori S, Rader L, Vemuri R:
**Profile-driven behavioral synthesis for low-power VLSI systems.***IEEE Design and Test of Computers*1995,**12**(3):70-84. 10.1109/MDT.1995.466383View ArticleGoogle Scholar - Liu D, Svensson C:
**Power consumption estimation in CMOS VLSI chips.***IEEE Journal of Solid-State Circuits*1994,**29**(6):663-670. 10.1109/4.293111View ArticleGoogle Scholar - Landman PE, Rabaey JM:
**Activity-sensitive architectural power analysis for the control path.**In*Proceedings of the International Symposium on Low Power Design (ISLPED '95), April 1995, New York, NY, USA*. ACM; 93-98.Google Scholar - Mehta H, Owens RM, Irwin MJ:
**Energy characterization based on clustering.** In *Proceedings of the 33rd Annual Design Automation Conference (DAC '96), June 1996, New York, NY, USA*. ACM; 702-707.
- Wu Q, Qiu Q, Pedram M, Ding C-S:
**Cycle-accurate macro-models for RT-level power analysis.** *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 1998, **6**(4):520-528.
- Benini L, Bogliolo A, Favalli M, De Micheli G:
**Regression models for behavioral power estimation.** *Integrated Computer-Aided Engineering* 1998, **5**(2):95-106.
- Ye W, Vijaykrishnan N, Kandemir M, Irwin MJ:
**The design and use of simplepower: a cycle-accurate energy estimation tool.** In *Proceedings of the 37th conference on Design Automation (DAC '00), June 2000, New York, NY, USA*. ACM; 340-345.
- Gurumurthi S, Sivasubramaniam A, Irwin MJ, *et al*.:
**Using complete machine simulation for software power estimation: the softWatt approach.** In *Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA '02), February 2002, Washington, DC, USA*. IEEE Computer Society; 141-151.
- Brooks D, Tiwari V, Martonosi M:
**Wattch: a framework for architectural-level power analysis and optimizations.** *SIGARCH Computer Architecture News* 2000, **28**(2):83-94. 10.1145/342001.339657
- Kamble MB, Ghose K:
**Analytical energy dissipation models for low power caches.** In *Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '97), August 1997, New York, NY, USA*. ACM; 143-148.
- Burger D, Austin TM:
**The simplescalar tool set, version 2.0.** *SIGARCH Computer Architecture News* 1997, **25**(3):13-25. 10.1145/268806.268810
- Pedram M:
*Power Aware Design Methodologies, edited by J. M. Rabaey*. Norwell, Mass, USA, Kluwer Academic Publishers; 2002.
- Tiwari V, Malik S, Wolfe A:
**Power analysis of embedded software: a first step towards software power minimization.** *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 1994, **2**(4):437-445.
- Nikolaidis S, Kavvadias N, Neofotistos P, Kosmatopoulos K, Laopoulos T, Bisdounis L:
**Instrumentation set-up for instruction level power modeling.** In *Proceedings of the 12th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS '02), 2002, London, UK*. Springer; 71-80.
- Nikolaidis S, Kavvadias N, Laopoulos T, Bisdounis L, Blionas S:
**Instruction level energy modeling for pipelined processors.** *Journal of Embedded Computing* 2005, **1**(3):317-324.
- Klass B, Thomas DE, Schmit H, Nagle DF:
**Modeling inter-instruction energy effects in a digital signal processor.** *Proceedings of the Power Driven Microarchitecture Workshop in Conjunction with International Symposium on Computer Architecture, June 1998*
- Sama A, Theeuwen JFM, Balakrishnan M:
**Speeding up power estimation of embedded software.** In *Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '00), 2000, New York, NY, USA*. ACM; 191-196.
- Russell JT, Jacome MF:
**Software power estimation and optimization for high performance, 32-bit embedded processors.** In *Proceedings of the IEEE International Conference on Computer Design (ICCD '98), October 1998, Washington, DC, USA*. IEEE Computer Society; 328-333.
- Mehta H, Owens RM, Irwin MJ:
**Instruction level power profiling.** In *Proceedings of the International Conference of Acoustics, Speech, and Signal Processing (ICASSP '96), 1996, Washington, DC, USA*. IEEE Computer Society; 3326-3329.
- Steven V, Gentile R, Kaeli DR, Olivadoti G:
**Developing energy-aware strategies for the Blackfin processor.** *Proceedings of the High Performance Embedded Computing (HPEC '04), September 2004*
- Sami M, Sciuto D, Silvano C, Zaccaria V:
**An instruction-level energy model for embedded VLIW architectures.** *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 2002, **21**(9):998-1010. 10.1109/TCAD.2002.801105
- Balakrishnan M:
**Low Power Design.** Lectures, 2008, http://embedded.cse.iitd.ernet.in/homepage/course/low_power/index.shtml
- Laurent J, Senn E, Julien N, Martin E:
**High level energy estimation for DSP systems.** *Proceedings International Workshop on Power And Timing Modeling and Optimization and Simulation (PATMOS '01), September 2001* 311-316.
- Senn E, Julien N, Laurent J, Martin E:
**Power consumption estimation of a C program for data-intensive applications.** In *Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation (PATMOS '02), 2002, London, UK*. Springer; 332-341.
- Senn E, Laurent J, Julien N, Martin E:
**SoftExplorer: estimating and optimizing the power and energy consumption of a C program for DSP applications.** *EURASIP Journal on Applied Signal Processing* 2005, **2005**(16):2641-2654. 10.1155/ASP.2005.2641
- **SoftExplorer: Processors Power Estimation Tool** http://portal.acm.org/citation.cfm?id=1287311
- Schneider M, Blume H, Noll TG:
**Power estimation on functional level for programmable processors.** *Journal of Advances in Radio Science* 2005, **2**:215-219.
- Texas Instruments Inc.:
**TMS320C6416T, Fixed Point Digital Signal Processor, Datasheet.** SPRS226J, November 2003, http://www.ti.com
- Agilent Technologies Inc.:
**Agilent 34410A Digital Multimeter, Datasheet.** 5989-3738EN, October 2007, http://www.home.agilent.com/agilent/product.jspx?pn=34410A
- Draper NR, Smith H:
*Applied Regression Analysis, Wiley Series in Probability and Mathematical Statistics*. 2nd edition. John Wiley & Sons, New York, NY, USA; 1981.
- Texas Instruments Inc.:
**TMS320C6416T, Technical Overview.** SPRU395B, January 2001, http://www.ti.com
- Ibrahim MEA, Rupp M, Fahmy HAH:
**Power estimation methodology for VLIW digital signal Processor.** In *Proceedings of the Conference on Signals, Systems and Computers (SSC '08), October 2008, Asilomar, Calif, USA*. IEEE Computer Society; 1840-1844.
- Ibrahim MEA, Rupp M, Habib SE-D:
**Compiler-based optimizations impact on embedded software power consumption.** In *Proceedings of the IEEE Joint North-East Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA '09), June 2009, Toulouse, France*. IEEE; 247-250.
- Ibrahim MEA, Rupp M, Habib SE-D:
**Performance and power consumption trade-offs for a VLIW DSP.** In *Proceedings of the IEEE International Symposium on Signals, Circuits and systems (ISSCS '09), July 2009, Iasi, Romania*. IEEE; 179-200.

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.