This section summarizes the most recent contributions to the problem of power modeling and estimation. Recent approaches to model the power consumption of the software running on a processor can be separated into two main categories low-level models and high-level models. Low-level models calculate power and energy from detailed electrical descriptions, comprising circuit level, gate level, register transfer (RT) level, or system level. while, high-level models deal only with instructions and functional units from the software point of view and without electrical knowledge of the underlying architecture [1].
2.1. Low-Level Estimation Techniques
The level of detail in the modeling performed by the power simulator influences both the accuracy of estimation as well as the speed of the simulator. In this section we survey the models frequently used at low level as these power consumption estimation techniques cover a range of abstractions such as the circuit/transistor level, logic gate level, RT level, and architectural level.
2.1.1. Transistor-Level Estimations
The representation of a microprocessor in terms of transistors and nets is extremely complex and requires undergoing all the steps of the design flow and the layout, routing, and parameter extraction inclusive. Furthermore, a transistor level view of the system employs component models based on linearized differential equations and works in the continuous time domain. This implies that a simulation of more than one million transistors, even for a few clock cycles, requires times that are usually not affordable and anyway not practical for the high-level power characterization [2]. Thus, while providing very good accuracy; transistor-level power estimation methodology is slow and impractical for analyzing the power consumption at an early design stage. Moreover, this methodology requires the availability of lower level circuit details of the targeted processor, which is not available for most of commercial off-the-shelf processors.
The PowerMil [3] is an early attempt to build a low-level power consumption simulator. PowerMil is a transistor level simulator for simulating the current and power behavior in VLSI circuits. It is capable of simulating detailed current behavior in modern deep submicron CMOS circuits, including sophisticated circuitries such as sense-amplifiers, with speed and capacity approaching conventional gate level simulators. For more details about power estimation techniques in VLSI circuits refer to [4, 5].
2.1.2. Gate-Level Estimations
Methods to estimate the power consumption based on gate level descriptions of microprocessors or micro controller cores have been proposed in literature. The main advantage of such methods with respect to transistor-level simulation approaches is that the simulation is event-driven and takes place in a discrete time domain, leading to a considerable reduction of the computational complexity, without a significant loss of accuracy [2].
An example for the gate level power estimators is the model presented by Chou [6]. present an accurate estimation of signal activity at the internal nodes of sequential logic circuits. The power consumption estimation in Chou and Roy is a Monte Carlo based approach that take spatial and temporal correlations of logic signals into consideration.
2.1.3. RT-Level Estimations
A design described at RT-level can be regarded as a collection of blocks and a network of interconnections. The blocks are sometimes referred to as macros, adders, registers, multiplexers, and so on, while the interconnections are simply nets or group of nets. An assumption underlying the great majority of the approaches presented in the literature is that the power properties of a block can be derived from an analysis of the block isolated from a design, under controlled operating conditions. The main factor influencing the power consumption model of a macro is the input statistics [2].
Most of the research in RT level power estimation is based on empirical methods that measure the power consumption of existing implementations and produce models from those measurements. This is in contrast to the approaches that rely on information-theoretic measures of activity to estimate power [7, 8]. Measurement-based approaches for estimating the power consumption of datapath functional units can be divided into two subcategories, namely; transition sensitive and activity sensitive. The first technique, introduced by Powel and Chau [9], is a fixed-activity micromodeling strategy called the Power Factor Approximation (PFA) method. The power models are parameterized in terms of complexity parameters and a PFA proportional constant. Similar schemes were also proposed by Kumar et al. [10] and Liu and Svensson [11]. This approach assumes that the inputs do not affect the switching activity of a hardware block. To remedy this problem, activity-sensitive empirical power models were developed. These schemes are based on predictable input signal statistics; an example is the method proposed by Landman and Rabaey [12]. The overall accuracy of such models may be hampered due to incorrect input statistics or the inability to correctly model the interaction.
The second empirical technique, transition-sensitive power models, is based on input transitions rather than input statistics. The method, proposed by Mehta et al. [13], assumes that a power model is provided for each functional unit—a table containing the power consumed for each input transition. Closely related input transitions and power patterns can be concentrated in clusters, thereby reducing the size of the table. Other researchers have also proposed similar macro-model-based power estimation approaches [14, 15].
2.1.4. Architectural-Level Estimations
Recently, various architectural power simulators have been designed that employ a combination of lower level of abstraction power consumption models. These simulators derive power estimates from the analysis of circuit activity induced by the application programmes during each cycle and from detailed capacitive models for the components activated. A key distinction between these different simulators is the estimation accuracy and estimation speed. For example, the SimplePower power simulator [16] employs a transition-sensitive power model for the datapath functional unit. The SimplePower core accesses a table containing the switch capacitance for each input transition of the functional unit exercised.
The use of a transition-sensitive approach has both design challenges as well as performance concerns during simulation. The first concern is that the construction of these tables is time-consuming. Unfortunately, the size of this table grows exponentially with the size of the inputs. The table construction problem can be addressed by partitioning and clustering mechanisms. The second concern is the performance cost of the table lookup for each component access in a cycle. In order to overcome this cost, simulators such as SoftWatt [17] and Wattch [18] utilize a simple fixed-activity model for the functional unit. These simulators only track the number of accesses to a specific component and utilize an average capacity value to estimate the power consumed. Even the same simulator can employ different types of power models for different components. For example, SimplePower estimates the power consumed in the memories utilizing analytical models [19]. In contrast to the datapath components that utilize a transition-sensitive approach, these models estimate the power consumed per access and do not accommodate the power differences found in sequences of accesses. One of the most widely used microarchitectural power simulators is Wattch [18]. Wattch is a power simulator for superscalar, out-of-order, processors. It has been developed with aid of the infrastructure offered by SimpleScaler [20]. The power estimation engine of Wattch is based on the SimpleScaler architecture, but in addition, it supports detailed cycle-accurate information for all models, including datapath elements, memory, control logic, and clock distribution network [21].
While providing good accuracy, low-level power estimation methodologies are slow and impractical for analyzing the power consumption at an early design stage. Moreover, these methodologies require the availability of lower level circuit details or a complete Hardware Description Language (HDL) design of the targeted processor, which is not available for most of the commercial off-the-shelf processors.
2.2. High-Level Estimation Techniques
Recently, the demand has increased for high level power estimation simulators that allow an early design space exploration from the power consumption perspective. The existing high-level power estimation models can be classified into two main categories, Instruction Level Power Analysis (ILPA) and Functional Level Power Analysis (FLPA).
2.2.1. Instruction Level Power Analysis
An instruction level power model for individual processors was first proposed by Tiwari et al. [22]. By measuring the current drawn by the processor as it repeatedly executes distinct instructions or distinct instruction sequences, it is possible to obtain most of the information that is required to evaluate the power consumption of a program for the processor under test. Tiwari et al. modeled the power consumption of the Intel DX486 processor. Power is modeled as a base cost for each instruction plus the interinstruction overheads that depend on neighboring instructions. The base cost of an instruction can be considered as the cost associated with the basic processing needed to execute the instruction. However, when sequences of instructions are considered, certain interinstruction effects come into play, which are not reflected in the cost computed solely from base cost. These effects can be summarized as the following.
(i)Circuit state: switching activity depends on the current inputs and previous circuit state, In other words the difference between the bit pattern of two successive instructions.
(ii)Resource constraints: resource constraints in the CPU can lead to stalls, for example, pipeline stalls and write buffer stalls.
(iii)Cache misses: another interinstruction effect is the effect of cache misses. The instruction timings listed in manuals provide the cycle count assuming a cache hit. For a cache miss, a certain cycle penalty has to be added to the instruction execution time.
An experimental method is proposed by Tiwari et al. to empirically determine the base and the inter-instructions overhead cost. In this experimental method, several programs containing an infinite loop consisting of several instances of the given instruction or instruction sequences are used. The average current drawn by the processor core during the execution of this loop is measured by a standard off-the-shelf, dual-slope integrating digital multimeter. Much more accurate measuring environments have been proposed to precisely monitor the instantaneous current drawn by the processor instead of the average current. One of these approaches has employed a high-performance current mirror based on bipolar junction transistors as current sensing circuit. The power profiler in the work of Nikolaidis et al. [23] receives as input the trace file of executed assembly instructions, generated by an appropriate processor simulator, and estimates the base and interinstruction energy cost of the executed program taking into account the energy sensitive factors as well as the effect of pipeline stalls and flushes. The main disadvantage of this approach is the current measuring complexity [24].
Another approach, to reduce the spatial complexity of instruction-level power models is presented in [25]. Therein, interinstruction effects have been measured by considering only the additional energy consumption observed when a generic instruction is executed after a no-operation (NOP) instruction.
An attempt to modify the original ILPA to create an instruction level power model with a gate level simulator is carried out by Sama et al. [26]. In this approach, the power cost values were obtained through a power simulator rather than actual measurement; thus modeling is possible at design time and can be part of microarchitecture and/or instruction set architecture exploration. More researchers attempted to enhance the original Tiwari ILPA power consumption modeling technique as in [27–29].
The ILPA-based methods have some drawbacks, one of these drawbacks is that the number of current measurements is directly related to the number of instructions in the Instruction Set Architecture (ISA) and also the number of parallel instructions composing the very long instruction in the VLIW processor. The problem of instruction level power characterization of
-issue VLIW processor is
where
is the number of instructions in the ISA and
is number of parallel instructions composing the VLIW [30]. Also they do not provide any insight on the instantaneous causes of power consumption within the processor core, which is seen as a black-box model. Moreover, the effect of varying data (as well as address) is ignored in the ILPA models, though this effect can be accounted by an additive factor [31].
2.2.2. Function Level Power Analysis
FLPA was first introduced by Laurent et al. in [32]. The functional level power modeling approach is applicable to all types of processor architectures. Furthermore, FLPA modeling can be applied to a processor with moderate effort, and no detailed knowledge of the processors circuitry is needed. The basic idea behind the FLPA is the distinction of the processor architecture into functional blocks like processing Unit (PU), instruction management unit (IMU), internal memory, and others [32]. First, a functional analysis of these blocks is performed to specify and then discard the nonconsuming blocks (those with negligible impact on the power-consumption). The second step is to figure out the parameters that affect the power consumption of each of the power consuming blocks. For instance, the IMU is affected by the instructions dispatching rate which in turn is related to the degree of parallelism. In addition to these parameters, there are some parameters that affect the power consumption of all functional blocks in the same manner such as operating frequency and word length of input data.
Laurent et al. [32, 33] presented a functional level power consumption model for the TI C62x DSP series. The C62x series has four internal memory modes which are handled by Laurent et al. model. The targeted architecture in our proposed model is the TI C64x. There are significant differences between the C64x and the C62x architectures. The internal program and data memories of the C62x have been replaced by two level-1 caches in the C64x. Moreover, the C64x includes a level-2 SRAM that is utilized both for the data and program with the ability to be partially configured as level-2 cache memory. Moreover, the C64x has the ability to utilize SIMD instructions. The number of registers is also doubled (2 × 32 in place of 2 × 16).
Unlike the model presented in [32, 33] that considers the load and store instructions as a part of the processing unit submodel, we had to consider the internal memory as a separate functional block. Hence, we believe that our proposed model is substantially different from the model presented in [32, 33] but required for accurately modeling the C64 family.
The work presented in [34] briefly points to the C64x but is lacking details. Furthermore, as of now (submission time of the paper) the model for the C64x is not included in the library of the latest SoftExplorer tool [35]. Unlike the model introduced in [36] that employs the parallelism factor as the affecting parameter for the processing units (PUs) block, the fact that the NOP does not require any PU for its execution convinced us that another parameter yields a better description of the PUs. Moreover, as we will explain later, the level-1 data cache memory submodel is different as well.