 Research Article
 Open Access
A Precise High-Level Power Consumption Model for Embedded Systems Software
 Mostafa E. A. Ibrahim^{1, 2},
 Markus Rupp^{2} and
 Hossam A. H. Fahmy^{3}
https://doi.org/10.1155/2011/480805
© Mostafa E. A. Ibrahim et al. 2011
 Received: 26 February 2010
 Accepted: 11 August 2010
 Published: 19 August 2010
Abstract
The increasing demand for portable computing has elevated power consumption to one of the most critical design parameters of embedded systems. In this paper, we present a precise high-level power estimation methodology, based on a functional level power model, for software running on a VLIW processor. The target processor of our approach is the TMS320C6416T DSP from Texas Instruments. Our model accounts for several important effects, such as pipeline stalls, inter-instruction effects, and cache misses. The contributions are as follows. First, we propose a precise model to estimate the power consumption of the target DSP while it runs a software algorithm. Second, we validate the precision of our model on many typical signal and image processing algorithms. Third, we further validate the precision of our model on a real video processing application. The power consumption estimated by our model is compared to the physically measured power consumption, achieving a very low average absolute estimation error of 1.65% and a maximum absolute estimation error of only 3.3%.
Keywords
 Power Consumption
 Digital Signal Processor
 Functional Block
 Data Cache
 Arithmetic Function
1. Introduction
Many applications in special areas such as handheld computation, tiny robots, and guidance systems in automated vehicles are powered by low-rated batteries. To avoid frequent recharging or replacement of these batteries, there is significant interest in low-power system design. Very Long Instruction Word (VLIW) Digital Signal Processors (DSPs) are an attractive choice for this application domain because they deliver high performance at low power. The importance of power constraints in embedded system design has grown continuously in recent years, due to technological trends toward high-level integration and increasing operating frequencies, combined with the growing demand for portable systems. This has led to a significant research effort in power estimation and low-power design. Power simulators (profilers) allow software programmers to locate the hot spots of their code, that is, its most power-hungry segments, as a first step toward optimizing those segments for power. Developers of power simulators therefore have to embed a precise power consumption model in their simulators. Existing processor power simulators are available only for the lower levels of the design, at the circuit level and, to a limited extent, at the logic level. These tools are very slow and impractical for evaluating the power consumption of embedded software, since the power consumption of an application would only be known at the very last stage of the design process. In this paper, an approach for estimating the power consumption of a VLIW DSP while running a software application is presented. This work aims to precisely estimate the power consumption of the core processor while it runs a software algorithm, at an early stage of the design process. The target DSP is the TMS320C6416T from Texas Instruments (referred to as C6416T for brevity in the rest of the paper). 
This processor features the highest performance among the fixed-point DSPs of the C6000 DSP platform.
The rest of the paper is organized as follows. Section 2 presents an overview of the existing power consumption modeling abstraction levels for general purpose processors. Section 3 provides a general overview of the target architecture. Section 4 describes the methodology along with the experimental setup employed in our experiments. Section 5 describes in detail the functional level analysis and the resulting mathematical formulas constituting the model for the targeted architecture functional units. Section 6 demonstrates the validation of the power estimation methodology utilizing the developed power consumption model. Finally, Section 7 summarizes the main contributions of this paper.
2. Related Work
This section summarizes the most recent contributions to the problem of power modeling and estimation. Recent approaches to modeling the power consumption of software running on a processor can be separated into two main categories: low-level models and high-level models. Low-level models calculate power and energy from detailed electrical descriptions at the circuit, gate, register transfer (RT), or system level, while high-level models deal only with instructions and functional units from the software point of view, without electrical knowledge of the underlying architecture [1].
2.1. Low-Level Estimation Techniques
The level of detail of the modeling performed by a power simulator influences both the accuracy of the estimation and the speed of the simulator. In this section, we survey the models frequently used at low levels of abstraction, namely, the circuit/transistor level, the logic gate level, the RT level, and the architectural level.
2.1.1. Transistor-Level Estimations
Representing a microprocessor in terms of transistors and nets is extremely complex and requires going through all the steps of the design flow, including layout, routing, and parameter extraction. Furthermore, a transistor-level view of the system employs component models based on linearized differential equations and works in the continuous time domain. Consequently, simulating more than one million transistors, even for a few clock cycles, takes time that is usually unaffordable and in any case impractical for high-level power characterization [2]. Thus, while providing very good accuracy, transistor-level power estimation is slow and impractical for analyzing power consumption at an early design stage. Moreover, this methodology requires lower-level circuit details of the target processor, which are not available for most commercial off-the-shelf processors.
PowerMill [3] is an early attempt to build a low-level power consumption simulator. PowerMill is a transistor-level simulator of the current and power behavior of VLSI circuits. It can simulate detailed current behavior in modern deep-submicron CMOS circuits, including sophisticated circuitry such as sense amplifiers, with speed and capacity approaching those of conventional gate-level simulators. For more details on power estimation techniques for VLSI circuits, refer to [4, 5].
2.1.2. Gate-Level Estimations
Methods to estimate power consumption based on gate-level descriptions of microprocessor or microcontroller cores have been proposed in the literature. The main advantage of such methods over transistor-level simulation is that the simulation is event-driven and takes place in a discrete time domain, which considerably reduces the computational complexity without a significant loss of accuracy [2].
An example of a gate-level power estimator is the model presented by Chou and Roy [6], who present an accurate estimation of signal activity at the internal nodes of sequential logic circuits. Their power consumption estimation is a Monte Carlo based approach that takes spatial and temporal correlations of logic signals into consideration.
2.1.3. RT-Level Estimations
A design described at the RT level can be regarded as a collection of blocks connected by a network of interconnections. The blocks, sometimes referred to as macros, are adders, registers, multiplexers, and so on, while the interconnections are simply nets or groups of nets. An assumption underlying the great majority of the approaches presented in the literature is that the power properties of a block can be derived by analyzing the block in isolation from the design, under controlled operating conditions. The main factor influencing the power consumption model of a macro is its input statistics [2].
Most of the research on RT-level power estimation is based on empirical methods that measure the power consumption of existing implementations and build models from those measurements. This contrasts with approaches that rely on information-theoretic measures of activity to estimate power [7, 8]. Measurement-based approaches for estimating the power consumption of datapath functional units can be divided into two subcategories, namely, activity-based (fixed-activity or activity-sensitive) and transition-sensitive models. The first technique, introduced by Powell and Chau [9], is a fixed-activity micro-modeling strategy called the Power Factor Approximation (PFA) method. The power models are parameterized in terms of complexity parameters and a PFA proportionality constant. Similar schemes were also proposed by Kumar et al. [10] and Liu and Svensson [11]. This approach assumes that the inputs do not affect the switching activity of a hardware block. To remedy this problem, activity-sensitive empirical power models were developed. These schemes are based on predictable input signal statistics; an example is the method proposed by Landman and Rabaey [12]. The overall accuracy of such models may be hampered by incorrect input statistics or by the inability to correctly model the interaction.
The second empirical technique, transition-sensitive power modeling, is based on input transitions rather than input statistics. The method proposed by Mehta et al. [13] assumes that a power model is provided for each functional unit in the form of a table containing the power consumed for each input transition. Closely related input transitions and power patterns can be grouped into clusters, thereby reducing the size of the table. Other researchers have proposed similar macromodel-based power estimation approaches [14, 15].
2.1.4. Architectural-Level Estimations
Recently, various architectural power simulators have been designed that employ a combination of power consumption models at lower levels of abstraction. These simulators derive power estimates from the analysis of the circuit activity induced by the application programs during each cycle and from detailed capacitive models of the activated components. A key distinction between these simulators is their estimation accuracy and estimation speed. For example, the SimplePower power simulator [16] employs a transition-sensitive power model for the datapath functional units. The SimplePower core accesses a table containing the switched capacitance for each input transition of the exercised functional unit.
The use of a transition-sensitive approach raises both design challenges and performance concerns during simulation. The first concern is that constructing these tables is time-consuming, since the size of a table grows exponentially with the size of the inputs. The table construction problem can be addressed by partitioning and clustering mechanisms. The second concern is the performance cost of a table lookup for each component access in a cycle. To avoid this cost, simulators such as SoftWatt [17] and Wattch [18] utilize a simple fixed-activity model for the functional units. These simulators only track the number of accesses to a specific component and utilize an average capacitance value to estimate the power consumed. The same simulator can even employ different types of power models for different components. For example, SimplePower estimates the power consumed in the memories using analytical models [19]. In contrast to the transition-sensitive models used for the datapath components, these models estimate the power consumed per access and do not account for the power differences found in sequences of accesses. One of the most widely used microarchitectural power simulators is Wattch [18], a power simulator for superscalar, out-of-order processors. It has been developed with the aid of the infrastructure offered by SimpleScalar [20]. The power estimation engine of Wattch is based on the SimpleScalar architecture, but in addition it supports detailed cycle-accurate information for all models, including datapath elements, memory, control logic, and the clock distribution network [21].
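The contrast between the fixed-activity and transition-sensitive models described above can be sketched as follows. This is a simplified illustration, not code from SimplePower, SoftWatt, or Wattch: it assumes that the energy of one access is the effective switched capacitance times Vdd² (activity folded into the capacitance), uses a hypothetical supply voltage, and clusters transitions by the Hamming distance between consecutive input words, which is one possible clustering rule rather than the one used in [13] or [16].

```python
VDD = 1.2  # supply voltage in volts (hypothetical value for illustration)


def fixed_activity_energy(num_accesses, c_avg):
    """Fixed-activity model: every access costs the same average capacitance."""
    return num_accesses * c_avg * VDD ** 2


def transition_sensitive_energy(inputs, cap_by_distance):
    """Transition-sensitive model: one table lookup per input transition.

    Transitions are clustered by the Hamming distance between consecutive
    input words, so a w-bit unit needs w + 1 table entries instead of one
    entry per (previous, current) input pair.
    """
    energy = 0.0
    for prev, curr in zip(inputs, inputs[1:]):
        distance = bin(prev ^ curr).count("1")  # number of toggling input bits
        energy += cap_by_distance[distance] * VDD ** 2
    return energy


# Hypothetical 8-bit unit: capacitance grows linearly with toggling bits.
table = [bits * 1e-12 for bits in range(9)]  # farads, invented values
print(transition_sensitive_energy([0x00, 0xFF, 0xFF], table))
print(fixed_activity_energy(2, 4e-12))
```

For 8-bit inputs, the clustered table has 9 entries, whereas a table indexed by every (previous, current) pair would need 2^16 entries, which illustrates why clustering keeps table construction tractable. The fixed-activity model, by contrast, needs only an access counter, which is what makes it cheap to evaluate during simulation.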
While providing good accuracy, low-level power estimation methodologies are slow and impractical for analyzing power consumption at an early design stage. Moreover, these methodologies require lower-level circuit details or a complete Hardware Description Language (HDL) design of the target processor, which are not available for most commercial off-the-shelf processors.
2.2. High-Level Estimation Techniques
Recently, the demand has increased for high-level power estimation simulators that allow early design space exploration from the power consumption perspective. Existing high-level power estimation models can be classified into two main categories: Instruction Level Power Analysis (ILPA) and Functional Level Power Analysis (FLPA).
2.2.1. Instruction Level Power Analysis
An instruction-level power model for individual processors was first proposed by Tiwari et al. [22]. By measuring the current drawn by the processor as it repeatedly executes distinct instructions or distinct instruction sequences, it is possible to obtain most of the information required to evaluate the power consumption of a program on the processor under test. Tiwari et al. modeled the power consumption of the Intel DX486 processor. Power is modeled as a base cost for each instruction plus inter-instruction overheads that depend on neighboring instructions. The base cost of an instruction can be considered as the cost of the basic processing needed to execute it. However, when sequences of instructions are considered, certain inter-instruction effects come into play that are not reflected in the base costs alone. These effects can be summarized as follows.
(i) Circuit state: switching activity depends on the current inputs and on the previous circuit state, in other words, on the difference between the bit patterns of two successive instructions.
(ii) Resource constraints: resource constraints in the CPU can lead to stalls, for example, pipeline stalls and write buffer stalls.
(iii) Cache misses: the instruction timings listed in manuals give the cycle count assuming a cache hit. For a cache miss, a certain cycle penalty has to be added to the instruction execution time.
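The base-cost-plus-overhead model above is commonly written as E_p = Σ_i (B_i × N_i) + Σ_{i,j} (O_{i,j} × N_{i,j}) + Σ_k E_k, where B_i is the base cost of instruction i, N_i its execution count, O_{i,j} the circuit-state overhead of the instruction pair (i, j), and E_k the energy of other effects such as stalls and cache misses. A minimal sketch that evaluates this sum over an instruction trace follows; the mnemonics and cost values are hypothetical, and pairs missing from the overhead table are assumed to cost nothing.

```python
def ilpa_energy(trace, base_cost, pair_overhead, other_effects=()):
    """Instruction-level energy estimate in the style of Tiwari et al.

    trace         -- instruction mnemonics in execution order
    base_cost     -- mnemonic -> base energy cost (B_i)
    pair_overhead -- (previous, current) -> circuit-state overhead (O_ij);
                     pairs not in the table are assumed to cost 0
    other_effects -- extra energies E_k (stalls, cache misses, ...)
    """
    # Each executed instruction contributes its base cost once.
    energy = sum(base_cost[instr] for instr in trace)
    # Inter-instruction overhead for every pair of consecutive instructions.
    energy += sum(pair_overhead.get((prev, curr), 0.0)
                  for prev, curr in zip(trace, trace[1:]))
    # Penalties not tied to a single instruction pair.
    energy += sum(other_effects)
    return energy


# Hypothetical costs in nJ, for illustration only.
base = {"ADD": 1.0, "MPY": 2.0}
overhead = {("ADD", "MPY"): 0.3, ("MPY", "ADD"): 0.2}
print(ilpa_energy(["ADD", "MPY", "ADD"], base, overhead, other_effects=[0.5]))  # 5.0
```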
Tiwari et al. propose an experimental method to empirically determine the base cost and the inter-instruction overhead cost. In this method, several programs are used, each containing an infinite loop consisting of several instances of the given instruction or instruction sequence. The average current drawn by the processor core during the execution of this loop is measured with a standard off-the-shelf, dual-slope integrating digital multimeter. More accurate measuring environments have been proposed to monitor the instantaneous current drawn by the processor instead of the average current. One of these approaches employs a high-performance current mirror based on bipolar junction transistors as the current sensing circuit. The power profiler in the work of Nikolaidis et al. [23] receives as input the trace file of executed assembly instructions, generated by an appropriate processor simulator, and estimates the base and inter-instruction energy costs of the executed program, taking into account the energy-sensitive factors as well as the effect of pipeline stalls and flushes. The main disadvantage of this approach is the complexity of the current measurement [24].
Another approach to reducing the spatial complexity of instruction-level power models is presented in [25]. There, inter-instruction effects are measured by considering only the additional energy consumption observed when a generic instruction is executed after a no-operation (NOP) instruction.
Sama et al. [26] modified the original ILPA to build an instruction-level power model with a gate-level simulator. In this approach, the power cost values are obtained through a power simulator rather than actual measurements; thus, modeling is possible at design time and can be part of microarchitecture and/or instruction set architecture exploration. Other researchers have also attempted to enhance Tiwari's original ILPA power consumption modeling technique [27–29].
ILPA-based methods have several drawbacks. First, the number of current measurements is directly related to the number of instructions in the Instruction Set Architecture (ISA) and to the number of parallel instructions composing the very long instruction word of a VLIW processor. The problem of instruction-level power characterization of a K-issue VLIW processor is O(N^{2K}), where N is the number of instructions in the ISA and K is the number of parallel instructions composing the VLIW [30]. Second, these methods do not provide any insight into the instantaneous causes of power consumption within the processor core, which is treated as a black box. Moreover, ILPA models ignore the effect of varying data (as well as addresses), although this effect can be accounted for by an additive factor [31].
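To give a feel for this growth: if each of the K slots can hold any of N instructions, a K-issue VLIW has N^K distinct instruction words, and characterizing the inter-instruction effect between every ordered pair of consecutive words requires (N^K)^2 = N^{2K} cases. The sketch below evaluates this count under that simplifying assumption; the ISA size in the example is a hypothetical round number, not the actual size of the C64x ISA, while K = 8 matches the eight functional units of the C6416T.

```python
def vliw_pair_cases(n_instructions: int, k_issue: int) -> int:
    """Ordered pairs of consecutive K-wide words: (N**K) ** 2 == N**(2K)."""
    words = n_instructions ** k_issue  # distinct K-wide instruction words
    return words * words


# Hypothetical ISA size of 100 instructions on an 8-issue VLIW.
print(f"{vliw_pair_cases(100, 8):.1e}")  # 1.0e+32
```

Even with this crude count, exhaustive pairwise measurement is clearly infeasible for an 8-issue machine, which motivates the reduced characterizations discussed above and the functional level approach below.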
2.2.2. Functional Level Power Analysis
FLPA was first introduced by Laurent et al. [32]. The functional level power modeling approach is applicable to all types of processor architectures. Furthermore, FLPA modeling can be applied to a processor with moderate effort, and no detailed knowledge of the processor's circuitry is needed. The basic idea behind FLPA is the decomposition of the processor architecture into functional blocks such as the processing unit (PU), the instruction management unit (IMU), the internal memory, and others [32]. First, a functional analysis of these blocks is performed to identify and then discard the non-consuming blocks (those with negligible impact on the power consumption). The second step is to identify the parameters that affect the power consumption of each of the power-consuming blocks. For instance, the IMU is affected by the instruction dispatch rate, which in turn is related to the degree of parallelism. In addition, some parameters affect the power consumption of all functional blocks in the same manner, such as the operating frequency and the word length of the input data.
Laurent et al. [32, 33] presented a functional level power consumption model for the TI C62x DSP series. The C62x series has four internal memory modes, all of which are handled by the model of Laurent et al. The target architecture of our proposed model is the TI C64x. There are significant differences between the C64x and the C62x architectures. The internal program and data memories of the C62x have been replaced by two level-1 caches in the C64x. In addition, the C64x includes a level-2 SRAM that is used for both data and program and can be partially configured as level-2 cache memory. The C64x can also execute SIMD instructions, and its number of registers is doubled (2 × 32 instead of 2 × 16).
Unlike the model presented in [32, 33], which considers the load and store instructions as part of the processing unit submodel, we had to treat the internal memory as a separate functional block. Hence, we believe that our proposed model is substantially different from the model presented in [32, 33], and that this difference is required to accurately model the C64x family.
The work presented in [34] mentions the C64x only briefly and lacks details. Furthermore, at the time of submission of this paper, the model for the C64x is not included in the library of the latest SoftExplorer tool [35]. The model introduced in [36] employs the parallelism factor as the affecting parameter for the processing units (PUs) block; however, since the NOP instruction does not require any PU for its execution, we concluded that another parameter describes the PUs better. Moreover, as we explain later, our level-1 data cache memory submodel is different as well.
3. Target Architecture
In this section, we briefly describe the architectural features of the target processor and the experimental setup used in our work.
4. Methodology
By means of simulations or measurements, it is possible to find, for each block, an arithmetic function that determines its power consumption depending on a set of parameters. Hence, to determine the arithmetic function of each functional block, the average supply current of the processor core is measured while the affecting parameter is varied. These variations are achieved by a set of small programs, called scenarios. Such scenarios are short programs written in assembly language. Each program consists of an unbounded loop whose body contains several hundred instances of selected instructions that individually invoke each block. The power consumption rules are finally obtained by curve-fitting the measured values [33].
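The curve-fitting step can be illustrated with an ordinary least-squares line fit of the measured core current against one affecting parameter. This is a sketch under assumptions: the paper does not state that every submodel is linear (some fitted functions may well be nonlinear), and the data points in the usage example are hypothetical rather than C6416T measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~= slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the parameter values; zero means the slope is undefined.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter values must not all be equal")
    # Covariance between parameter and measured current (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (parameter value, average core current in mA) pairs, one per scenario.
rates = [0.2, 0.4, 0.6, 0.8, 1.0]
currents_ma = [610.0, 640.0, 668.0, 701.0, 729.0]
slope, intercept = fit_line(rates, currents_ma)
print(f"I(rate) ~= {slope:.1f} * rate + {intercept:.1f} mA")
```

A linear model is the simplest choice that can be checked against the R² criterion used in Section 5; a nonlinear submodel would replace `fit_line` with a fit of the appropriate functional form.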
The parameters that affect the power consumption of each functional block can be extracted from the assembly code generated by Code Composer Studio (CCS) 3.1. Some parameters, such as the execution time and the data cache miss rate, cannot be extracted directly from the assembly code. Therefore, the code has to be run at least once to obtain these parameters with the aid of the code profiler.
5. C6416T Power Consumption Model
Methodology of computing algorithmic parameters.
Parameter  Computation methodology
Dispatch rate  No. of fetch packets / No. of execution packets
Processing unit rate  (No. of executed instructions - No. of NOP instructions) / Total code cycles
Memory read access rate  (No. of L1D read hits / Total code cycles) × 100
Memory write access rate  (No. of L1D write hits / Total code cycles) × 100
L1D cache miss rate  ((No. of L1D read misses + No. of L1D write misses) / No. of L1D references) × 100
L1P cache miss rate  (No. of L1P misses / No. of L1P references) × 100
PSR  No. of CPU stall cycles / Total code cycles
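Once the profiler counts for a code segment are available, the computation rules in the table above can be applied mechanically. The sketch below is an illustration under assumptions: the field names of `ProfileCounts` and the dictionary keys are hypothetical labels, not names used by CCS or by the paper, and the counts in the usage example are invented.

```python
from dataclasses import dataclass


@dataclass
class ProfileCounts:
    """Raw counts for one code segment (hypothetical field names)."""
    fetch_packets: int
    execution_packets: int
    executed_instructions: int
    nop_instructions: int
    total_cycles: int
    l1d_read_hits: int
    l1d_write_hits: int
    l1d_read_misses: int
    l1d_write_misses: int
    l1d_references: int
    l1p_misses: int
    l1p_references: int
    stall_cycles: int


def algorithmic_parameters(c: ProfileCounts) -> dict:
    """Apply the computation rules of the parameter table to raw counts."""
    return {
        "dispatch_rate": c.fetch_packets / c.execution_packets,
        # NOPs occupy no processing unit, so they are excluded.
        "pu_rate": (c.executed_instructions - c.nop_instructions) / c.total_cycles,
        "read_access_rate": 100 * c.l1d_read_hits / c.total_cycles,
        "write_access_rate": 100 * c.l1d_write_hits / c.total_cycles,
        "l1d_miss_rate": 100 * (c.l1d_read_misses + c.l1d_write_misses) / c.l1d_references,
        "l1p_miss_rate": 100 * c.l1p_misses / c.l1p_references,
        "psr": c.stall_cycles / c.total_cycles,
    }


# Invented counts for illustration.
counts = ProfileCounts(100, 400, 2000, 400, 1000, 300, 100, 20, 5, 500, 10, 1000, 50)
print(algorithmic_parameters(counts))
```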
5.1. Static and Clock Distribution Power Consumption Submodel
5.2. IMU Power Consumption Submodel
The IMU of the C6416T processor consists of two main subunits: the instruction fetch unit and the dispatch unit. The IMU fetches eight instructions per cycle as one fetch packet. The dispatch unit then subdivides this fetch packet into execution packets. Since the C6416T has eight functional units, it can execute up to eight instructions simultaneously. Consequently, the dispatch unit can divide a fetch packet into between one (maximum parallelism) and eight (sequential execution) execution packets. The dispatch rate therefore directly reflects the instruction parallelism, which makes it the natural parameter affecting the power consumption of the IMU.
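How a fetch packet breaks into execution packets, and hence the dispatch rate (fetch packets divided by execution packets), can be sketched using the C6000 parallel bit: when the p-bit of instruction i is 1, instruction i + 1 executes in the same cycle. As a simplifying assumption, the sketch treats each fetch packet in isolation and always closes the last execution packet at the fetch packet boundary, ignoring the possibility on the C64x of an execution packet spanning two fetch packets; the function names are hypothetical.

```python
def split_execution_packets(p_bits):
    """Group the slots of one 8-instruction fetch packet into execution packets.

    p_bits[i] == 1 means slot i + 1 executes in parallel with slot i.
    Simplification: the last slot always closes the current packet.
    """
    packets, current = [], []
    for slot, p in enumerate(p_bits):
        current.append(slot)
        if p == 0 or slot == len(p_bits) - 1:
            packets.append(current)
            current = []
    return packets


def dispatch_rate(fetch_packets):
    """Number of fetch packets divided by the number of execution packets."""
    n_exec = sum(len(split_execution_packets(fp)) for fp in fetch_packets)
    return len(fetch_packets) / n_exec


fully_parallel = [1, 1, 1, 1, 1, 1, 1, 0]  # one execution packet of 8 instructions
fully_serial = [0] * 8                      # eight single-instruction packets
print(dispatch_rate([fully_parallel]), dispatch_rate([fully_serial]))  # 1.0 0.125
```

The two extremes reproduce the range stated above: one execution packet per fetch packet at maximum parallelism and eight in the fully sequential case.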
The quality of the fitting process is measured by the R-squared value (R²): a number from 0 to 1, computed as one minus the sum of squared residuals of the fit normalized by the total sum of squares of the data. This value expresses what fraction of the variance of the data is explained by the fitted trend line, that is, how closely the estimated values of the trend line correspond to the actual data. A trend line is most reliable when its R² value is at or close to 1 [39]. Since the R² value of the arithmetic function in (1) is close to 1, (1) is an excellent fit for the curve values in Figure 7.
where PSR stands for the pipeline stall rate, defined as the number of pipeline stall cycles divided by the total number of cycles required to execute the code segment under investigation.
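The R² criterion can be computed directly from measured and fitted values. The sketch below uses the standard definition R² = 1 - SS_res / SS_tot; the sample values are hypothetical. For a least-squares fit with an intercept, evaluated on the data it was fitted to, the result lies between 0 and 1 as stated above; for arbitrary predictions it can become negative.

```python
def r_squared(actual, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)  # total variation of the data
    if ss_tot == 0:
        raise ValueError("actual values must not all be equal")
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained part
    return 1.0 - ss_res / ss_tot


# Hypothetical measured currents (mA) and the values predicted by a fitted trend line.
measured = [610.0, 640.0, 668.0, 701.0, 729.0]
fitted = [610.4, 640.2, 670.0, 699.8, 729.6]
print(round(r_squared(measured, fitted), 4))
```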
5.3. PU Power Consumption Submodel
The data path of the C6416T consists of eight functional units. These functional units can work simultaneously if the dispatch unit succeeds in composing an execution packet of eight instructions. Unlike the model in [36], which uses the degree of parallelism as the affecting parameter for the processing unit submodel, we use a different parameter that describes the PUs better, because the NOP instruction does not require any PU for its execution.
The scenarios that vary the processing unit rate are composed of more than 1000 different instructions in order to account for the inter-instruction effect. The current measured from the DSK (DSP Starter Kit) is the sum of the clock tree, IMU, and PU currents. To obtain only the current drawn by the PU, the IMU and clock tree currents are subtracted from the measured current.
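The subtraction step, followed by conversion of the isolated current into power, is sketched below. The core supply voltage and all current values are hypothetical placeholders, and the sketch assumes that power is simply the core voltage times the average current.

```python
def isolate_block_current(measured_ma: float, *other_blocks_ma: float) -> float:
    """Current attributed to the block under test: measured total minus known blocks (mA)."""
    return measured_ma - sum(other_blocks_ma)


def current_to_power_mw(current_ma: float, core_voltage_v: float) -> float:
    """P = V * I; milliamperes times volts gives milliwatts."""
    return current_ma * core_voltage_v


# Hypothetical readings for one PU scenario (mA) and a hypothetical core voltage (V).
pu_ma = isolate_block_current(850.0, 600.0, 200.0)  # measured, clock tree, IMU
print(pu_ma, current_to_power_mw(pu_ma, 1.2))
```

Because the PU current is obtained as a difference, any error in the clock tree and IMU submodels propagates directly into the PU submodel, which is why those blocks are characterized first.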
The arithmetic function in (3) results in an excellent fit for the curve values in Figure 10, with an R² value close to 1. Compared to other functional blocks such as the clock tree or the IMU, the PU clearly does not contribute significantly to the total power consumption of the core processor. Note that the scenario for invoking the PU does not include any memory instructions; internal memory operations are handled in a separate scenario.
5.4. Internal Memory Power Consumption Submodel
As mentioned in Section 5.3, internal memory operations are handled separately because of their distinct execution characteristics. The instruction set of the C6416T DSP includes two categories of memory operations: load and store. Load instructions read data from the data cache (if the operand exists in the data cache) into a specific register of the processor's register file. Store instructions write data into memory according to the data cache write policy.
The C64x architecture is capable of performing two memory operations per cycle. The affecting parameters of the internal memory submodel are the memory read access rate and the memory write access rate. The memory access rate is defined as the number of memory references (reads and writes) divided by the algorithm execution time.
5.5. L1 Data Cache Power Consumption Submodel
The L1 data cache functional block represents the flow of data from the L1 data cache to the L2 memory and vice versa. Different scenarios are prepared to exercise the effect of data cache misses.
5.6. L1 Program Cache Power Consumption Submodel
Complete power consumption model for C6416T DSP.
Functional unit  Functional unit power consumption submodel
Clock distribution
IMU
Processing units
Memory read
Memory write
L1D cache
L1P cache
Total power
Complete power consumption model for C6416T DSP at MHz.
Functional unit  Functional unit power consumption submodel
Clock distribution
IMU
Processing units
Memory read
Memory write
L1D cache
L1P cache
6. Model Validation
Benchmarks used for our experiments.
Benchmark  Description
DotP128  Dot product of a vector of 128 16-bit elements
m100  Multiplication of two 100 × 100 square matrices
FIR  Computes a real FIR filter; input data and filter taps are 16-bit
Sobel 3 × 3  Applies a 3 × 3 Sobel filter to an image of 8192 pixels
Thresholding  Performs a thresholding operation on an input image of 8192 pixels
Histogram  Computes the histogram of an image of 8192 8-bit pixels
IIR  Performs an autoregressive moving-average (ARMA) filter with autoregressive filter coefficients and moving-average filter coefficients
FFT16 × 16  Performs a mixed-radix forward FFT using a special sequence of coefficients
Correlation 3 × 3  Performs a point-by-point multiplication of the 3 × 3 mask with an input image
First, all optimization options of CCS are turned off, because these options target only speed or code size and are not dedicated to power optimization. Second, the benchmarks are compiled.
Unlike almost all the other benchmarks in Figure 16, the physically measured power consumption of the IIR benchmark is higher than the estimated power. Investigating the assembly code generated by CCS 3.1 for the IIR benchmark, we find that all load and store operations inside the loops have 32-bit operands, unlike in the other benchmarks; in FFT 16 × 16, for example, all load and store instructions have 16-bit operands. As we indicated in Section 4, the word length of the instruction operands affects the power consumption of almost all functional blocks. Moreover, in Section 5 we stated that "in our proposed model we choose 16-bit word length to be the typical word length." Therefore, the estimated power for the IIR benchmark is slightly lower than the physically measured power.
The power consumption of the Elastic Graph Matching (EGM) algorithm is estimated with the aid of our proposed power consumption model described in Section 5. The estimated power consumption equals 1.0498 W, while the physically measured power consumption equals 1.061 W, resulting in an estimation error of only about 1.06%.
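The EGM estimation error, as well as the average and maximum absolute errors reported over the benchmarks, can be computed as in the sketch below. It assumes that the error is taken relative to the physically measured value; the benchmark pairs used in the second call are hypothetical.

```python
def abs_error_pct(estimated: float, measured: float) -> float:
    """Absolute estimation error relative to the measured value, in percent."""
    return abs(estimated - measured) / measured * 100.0


def summarize_errors(pairs):
    """Average and maximum absolute error (%) over (estimated, measured) pairs."""
    errors = [abs_error_pct(est, meas) for est, meas in pairs]
    return sum(errors) / len(errors), max(errors)


print(f"EGM: {abs_error_pct(1.0498, 1.061):.2f} %")  # EGM: 1.06 %

# Hypothetical (estimated W, measured W) pairs for two benchmarks.
avg_err, max_err = summarize_errors([(1.0, 1.0), (0.98, 1.0)])
print(f"average {avg_err:.2f} %, maximum {max_err:.2f} %")
```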
7. Conclusion
In this paper, we developed a precise functional level technique to estimate the power consumption of embedded software running on a programmable processor. The commercial off-the-shelf VLIW DSP C6416T from Texas Instruments was used as the target platform. Both inter-instruction and pipeline stall effects have been considered in our proposed model. The validity and precision of our model have been demonstrated by estimating the power consumption of many typical signal and image processing algorithms. We further validated the precision of our model on a real video processing application. Compared to the physically measured power consumption, the power consumption estimated by our model achieves a very low average absolute estimation error of 1.65% and a maximum absolute estimation error of only 3.3%.
Our methodology for modeling the power consumption of the TI C6416T VLIW processor is applicable to other VLIW processors; for a different target architecture, however, the decomposition of the processor architecture into functional blocks shown in Figure 4 has to be reconsidered. For example, if the new target architecture includes a floating-point unit, Figure 4 has to be extended with a new functional block that represents the floating-point unit, together with the parameters affecting its power consumption, such as its access rate. Thus, as long as the power consumption of the functional blocks remains independent, our power estimation methodology is intended to work equally well on all VLIW architectures.
Finally, we have evaluated the global optimization levels of CCS, as well as two specific architectural features of the C6416T, namely, the Software Pipelined Loop (SPLoop) and Single Instruction Multiple Data (SIMD) instructions, from the perspective of energy and power consumption [42, 43].
Declarations
Acknowledgments
This work has been funded by the Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms as well as the COMET KProject Embedded Computer Vision (ECV) in conjunction with the Austrian Institute of Technology (AIT).
References
[1] Bleakley CJ, Casas-Sanchez M, Rizo-Morente J: Software level power consumption models and power saving techniques for embedded DSP processors. Journal of Low Power Electronics 2006, 2(2):281-290.
[2] Brandolese C: A co-design approach to software power estimation for embedded systems. Ph.D. dissertation, Politecnico di Milano, Institute of Electronics and Information; 2000.
[3] Huang CX, Zhang B, Deng A, Swirski B: Design and implementation of PowerMill. In Proceedings of the International Symposium on Low Power Design (ISLPED '95), April 1995, New York, NY, USA. ACM; 105-109.
[4] Najm FN: Survey of power estimation techniques in VLSI circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 1994, 2(4):446-455.
[5] Gupta S, Najm FN: Power macromodeling for high level power estimation. In Proceedings of the 34th Design Automation Conference (DAC '97), June 1997, New York, NY, USA. ACM; 365-370.
[6] Chou T, Roy K: Accurate power estimation of CMOS sequential circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 1996, 4(3):369-380.
[7] Marculescu D, Marculescu R, Pedram M: Information theoretic measures of energy consumption at register transfer level. In Proceedings of the International Symposium on Low Power Design (ISLPED '95), 1995, New York, NY, USA. ACM; 81-86.
[8] Rabaey JM, Pedram M: Low Power Design Methodologies. The Springer International Series in Engineering and Computer Science, Volume 336; 1996.
[9] Powell S, Chau EM: Estimating power dissipation of VLSI signal processing chips: the PFA technique. VLSI Signal Processing IV 1990, 250-259.
[10] Kumar N, Katkoori S, Rader L, Vemuri R: Profile-driven behavioral synthesis for low-power VLSI systems. IEEE Design and Test of Computers 1995, 12(3):70-84. 10.1109/MDT.1995.466383
[11] Liu D, Svensson C: Power consumption estimation in CMOS VLSI chips. IEEE Journal of Solid-State Circuits 1994, 29(6):663-670. 10.1109/4.293111
[12] Landman PE, Rabaey JM: Activity-sensitive architectural power analysis for the control path. In Proceedings of the International Symposium on Low Power Design (ISLPED '95), April 1995, New York, NY, USA. ACM; 93-98.
[13] Mehta H, Owens RM, Irwin MJ: Energy characterization based on clustering. In Proceedings of the 33rd Annual Design Automation Conference (DAC '96), June 1996, New York, NY, USA. ACM; 702-707.
[14] Wu Q, Qiu Q, Pedram M, Ding CS: Cycle-accurate macromodels for RT-level power analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 1998, 6(4):520-528.
[15] Benini L, Bogliolo A, Favalli M, De Micheli G: Regression models for behavioral power estimation. Integrated Computer-Aided Engineering 1998, 5(2):95-106.
[16] Ye W, Vijaykrishnan N, Kandemir M, Irwin MJ: The design and use of SimplePower: a cycle-accurate energy estimation tool. In Proceedings of the 37th Conference on Design Automation (DAC '00), June 2000, New York, NY, USA. ACM; 340-345.
[17] Gurumurthi S, Sivasubramaniam A, Irwin MJ, et al.: Using complete machine simulation for software power estimation: the SoftWatt approach. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA '02), February 2002, Washington, DC, USA. IEEE Computer Society; 141-151.
[18] Brooks D, Tiwari V, Martonosi M: Wattch: a framework for architectural-level power analysis and optimizations. SIGARCH Computer Architecture News 2000, 28(2):83-94. 10.1145/342001.339657
[19] Kamble MB, Ghose K: Analytical energy dissipation models for low power caches. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '97), August 1997, New York, NY, USA. ACM; 143-148.
[20] Burger D, Austin TM: The SimpleScalar tool set, version 2.0. SIGARCH Computer Architecture News 1997, 25(3):13-25. 10.1145/268806.268810
[21] Pedram M: Power Aware Design Methodologies, edited by J. M. Rabaey. Norwell, Mass, USA, Kluwer Academic Publishers; 2002.
[22] Tiwari V, Malik S, Wolfe A: Power analysis of embedded software: a first step towards software power minimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 1994, 2(4):437-445.
[23] Nikolaidis S, Kavvadias N, Neofotistos P, Kosmatopoulos K, Laopoulos T, Bisdounis L: Instrumentation setup for instruction level power modeling. In Proceedings of the 12th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS '02), 2002, London, UK. Springer; 71-80.
[24] Nikolaidis S, Kavvadias N, Laopoulos T, Bisdounis L, Blionas S: Instruction level energy modeling for pipelined processors. Journal of Embedded Computing 2005, 1(3):317-324.
[25] Klass B, Thomas DE, Schmit H, Nagle DF: Modeling inter-instruction energy effects in a digital signal processor. In Proceedings of the Power-Driven Microarchitecture Workshop in Conjunction with the International Symposium on Computer Architecture, June 1998.
[26] Sama A, Theeuwen JFM, Balakrishnan M: Speeding up power estimation of embedded software. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED '00), 2000, New York, NY, USA. ACM; 191-196.
[27] Russell JT, Jacome MF: Software power estimation and optimization for high performance, 32-bit embedded processors. In Proceedings of the IEEE International Conference on Computer Design (ICCD '98), October 1998, Washington, DC, USA. IEEE Computer Society; 328-333.
[28] Mehta H, Owens RM, Irwin MJ: Instruction level power profiling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), 1996, Washington, DC, USA. IEEE Computer Society; 3326-3329.
[29] Steven V, Gentile R, Kaeli DR, Olivadoti G: Developing energy-aware strategies for the Blackfin processor. In Proceedings of High Performance Embedded Computing (HPEC '04), September 2004.
[30] Sami M, Sciuto D, Silvano C, Zaccaria V: An instruction-level energy model for embedded VLIW architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2002, 21(9):998-1010. 10.1109/TCAD.2002.801105
[31] Balakrishnan M: Low Power Design. Lectures, 2008, http://embedded.cse.iitd.ernet.in/homepage/course/low_power/index.shtml
[32] Laurent J, Senn E, Julien N, Martin E: High level energy estimation for DSP systems. In Proceedings of the International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS '01), September 2001, 311-316.
[33] Senn E, Julien N, Laurent J, Martin E: Power consumption estimation of a C program for data-intensive applications. In Proceedings of the 12th International Workshop on Integrated Circuit Design. Power and Timing Modeling, Optimization and Simulation (PATMOS '02), 2002, London, UK. Springer; 332-341.
[34] Senn E, Laurent J, Julien N, Martin E: SoftExplorer: estimating and optimizing the power and energy consumption of a C program for DSP applications. EURASIP Journal on Applied Signal Processing 2005, 2005(16):2641-2654. 10.1155/ASP.2005.2641
[35] SoftExplorer: Processors Power Estimation Tool. http://portal.acm.org/citation.cfm?id=1287311
[36] Schneider M, Blume H, Noll TG: Power estimation on functional level for programmable processors. Journal of Advances in Radio Science 2005, 2:215-219.
[37] Texas Instruments Inc.: TMS320C6416T, Fixed-Point Digital Signal Processor, Datasheet. SPRS226J, November 2003, http://www.ti.com
[38] Agilent Technologies Inc.: Agilent 34410A Digital Multimeter, Datasheet. 5989-3738EN, October 2007, http://www.home.agilent.com/agilent/product.jspx?pn=34410A
[39] Draper NR, Smith H: Applied Regression Analysis. Wiley Series in Probability and Mathematical Statistics. 2nd edition. John Wiley & Sons, New York, NY, USA; 1981.
[40] Texas Instruments Inc.: TMS320C6416T, Technical Overview. SPRU395B, January 2001, http://www.ti.com
[41] Ibrahim MEA, Rupp M, Fahmy HAH: Power estimation methodology for VLIW digital signal processor. In Proceedings of the Conference on Signals, Systems and Computers (SSC '08), October 2008, Asilomar, Calif, USA. IEEE Computer Society; 1840-1844.
[42] Ibrahim MEA, Rupp M, Habib SED: Compiler-based optimizations impact on embedded software power consumption. In Proceedings of the IEEE Joint North-East Workshop on Circuits and Systems and TAISA Conference (NEWCAS-TAISA '09), June 2009, Toulouse, France. IEEE; 247-250.
[43] Ibrahim MEA, Rupp M, Habib SED: Performance and power consumption trade-offs for a VLIW DSP. In Proceedings of the IEEE International Symposium on Signals, Circuits and Systems (ISSCS '09), July 2009, Iasi, Romania. IEEE; 179-200.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.