Accurate energy characterization of OS services in embedded systems
© Ouni et al.; licensee Springer. 2012
Received: 31 January 2012
Accepted: 5 July 2012
Published: 25 July 2012
As technology scales for increased circuit density and performance, the management of power consumption in embedded systems is becoming critical. Because the operating system (OS) is a basic component of the embedded system, characterizing and reducing its energy consumption is a major challenge for designers. In this work, a flow for characterizing the energy consumption of a low power OS is introduced, and the variation of the energy and power consumption of the embedded OS services is studied. The remainder of this article details the methods used to determine the energy and power overheads of a set of basic services of the embedded OS: scheduling, context switch and inter-process communication. The impacts of hardware and software parameters, such as processor frequency and scheduling policy, on the energy consumption are analyzed, and power and energy models and laws are extracted. Then, to quantify the energy overhead of the low power OS, the obtained models are integrated in the system level design. Our method allows estimating the energy consumption of the low power OS services when running an application on a specific hardware platform.
Nowadays, energy consumption in embedded systems has become one of the key challenges for software and hardware designers. The embedded operating system (OS) serves as an interface between the application software and the hardware. It is an important software component in many embedded system applications, since it drives the exploitation of the hardware platform by offering a wide variety of services: task management, scheduling, inter-process communication (IPC), timer services, I/O operations and memory management. The embedded OS also manages the overall power consumption of the embedded system components: it includes many power management policies aiming at keeping components in lower power states, thereby reducing energy consumption.
Under the French research program Open-PEOPLE project, we aim at characterizing and optimizing the energy consumption of the embedded OS services. In this article, the variation of the scheduling routines, IPC and context switch energy consumption as a function of hardware and software parameters is studied. The remainder of this article is organized as follows: Related works are described in Section “Related works”. The energy characterization and estimation flow is presented in Section “Energy characterization and estimation flow”. Section “Hardware platform” introduces the hardware platform and explains the setup for measuring the energy consumption. Then, Sections “Embedded OS power and energy models” and “Experimental results” explain our methodology for characterizing the energy overhead of the embedded OS services and describe the experimental results and derived models. Section “Embedded OS service’s models integration in the system level design flow” shows the integration of the low power OS services’ models in the system level design flow. Finally, Section “Conclusion” concludes the article and proposes future work.
In order to characterize energy and power overhead of embedded OSs, several studies have proposed evaluating embedded OS energy consumption at different abstraction levels.
Li and John introduced a routine level power model. According to them, the elementary unit is the OS service routine, so they consider the energy consumed by the OS services as the sum of that consumed by each routine. They proposed a power consumption model based on the correlation that they found between the power and the instructions per cycle (IPC) metric.
Acquaviva et al. proposed a new methodology to characterize the OS energy overhead. They measured the energy consumption of the eCos real time OS (RTOS) running on a prototype wearable computer, HP’s SmartBadgeIII. Then, they studied the energy impact of the RTOS both at the kernel and at the I/O driver level and determined the key parameters affecting the energy consumption. This work studied the relation between the power and performance of the OS services and the CPU clock frequency. However, Acquaviva et al. performed an analysis but did not model the energy consumption of the OS services and drivers.
Tan et al. modeled the OS energy consumption at the kernel level. They classified the energy into two groups: the explicit energy, which is related directly to the OS primitives, and the implicit energy, which results from the running of the OS engine. The authors explained their approach to measuring these classes of energy and proposed energy consumption macro models. Then, Tan et al. validated their methodology on two embedded OSs, μCOS and Linux. However, the scope of the proposed work is limited in some ways, as it targets OSs running on a single processor. Also, the authors do not consider the I/O drivers in the proposed energy consumption model.
Dick et al. analyzed the power consumption of the μCOS OS running several embedded applications on a Fujitsu SPARClite processor based embedded system. The authors demonstrated that the OS functions have an important impact on the total energy consumption, and that this impact depends on the complexity of the applications. However, the presented work represents only an analysis of OS power consumption; Dick et al. did not propose an energy consumption model.
Baynes et al. described their simulation environment, Simbed, which evaluates the performance and energy consumption of RTOSs and embedded applications. The authors compared three different RTOSs: μCOS, Echidna and NOS. They found that the OS overhead depends on the application: it is high for lightweight applications and diminishes for more compute-intensive ones. Nevertheless, since Baynes et al. presented high level (simulated) energy measurements, the extracted models are not realistic because they are not deduced from measurements on an actual hardware platform. Also, the energy consumption of the OS services compared with the total application energy consumption was not calculated. Guo et al. proposed a novel approach using a Hopfield neural network to solve the problem of RTOS power partitioning; their goal is to optimally allocate the RTOS’s behavior to the hardware/software system. They defined a new energy function for this kind of neural network and some considerations on the state updating rule. The obtained simulation results show that the proposed method can achieve energy savings of up to 60%. This work does not consider energy macro-modeling and RTOS services. Zhao et al. propose a new approach to estimate and optimize the energy consumption of the embedded OS and the applications at a fine-grained level. The work is based on a power model and a new estimation model for OS energy consumption. Zhao et al. demonstrate that their approach can characterize and optimize the energy consumption of fine-grained software components. Fournel et al. present a performance and energy consumption simulator for embedded systems executing an application code. This work allows designers to get fast performance and consumption estimations without deploying software on the target hardware, while being independent of any compilation tools or software components such as network protocols or OSs.
Fei et al. are interested in reducing the energy consumption of OS-driven multi-process embedded software programs by transforming their source code. They minimize the energy consumed in the execution of OS functions and services. The authors propose four types of transformations, namely process-level concurrency management, message vectorization, computation migration and IPC mechanism selection. Fei et al. evaluate the applicability of these techniques in the context of an embedded system containing an Intel StrongARM processor and the embedded Linux OS. They manage process-level concurrency through process merging to save context switch overhead and IPCs. They modify the process interface by vectorizing the communications between processes and selecting an energy-efficient IPC mechanism. This work attempts to relocate computations from one process to another so as to reduce the number and data volume of IPCs. These transformations provide complementary optimization strategies to traditional compiler optimizations for energy savings. Dhouib et al. propose a multi-layer approach to estimate the energy consumption of embedded OSs. The authors start by estimating the energy and power consumption of standalone tasks; then they add the energy overheads of the OS services, which are timer interrupts, IPC and peripheral device accesses. They validate the multi-layer approach by estimating the energy consumption of an M-JPEG encoder running on Linux 2.6 and deployed on a XUP Virtex-II Pro development board. These recent works do not consider low power OSs, that is, embedded OSs that bind the application to a hardware platform while adopting low power techniques. They do not mention the processor capabilities or which low power policy is used, although the energy consumption of the OS services depends on the low power policy used.
For example, we will see in this article that the context switch energy overhead is significant due to the use of a specific power reduction technique that stimulates this service. Creative chip designers have come up with a variety of methods and techniques to reduce power without impacting system performance. For instance, the embedded system used in this work, the OMAP35x EVM board, has three basic methods to reduce power consumption: dynamic voltage and frequency scaling (DVFS), adaptive voltage scaling (AVS) and dynamic power switching (DPS). This article studies the energy overhead of an embedded OS adopting the DVFS technique.
Energy characterization and estimation flow
The energy consumed by a task T_i is expressed as:

E_{T_i} = E_{intertask} + Σ_{j=1}^{p} δ_{i,j} × E_{S_j}

where E_{T_i} represents the energy consumed by the task T_i, E_{intertask} is the energy consumed by the task routines and operations, p is the number of services used by the task T_i, δ_{i,j} is the energy consumption rate of the task T_i using the service S_j and E_{S_j} is the energy consumption of the service S_j.
We consider t the total number of OS services and x_j the number of parameters that influence the energy consumption of the service S_j, 1 ≤ j ≤ t.
α_{i,j} is the energy consumption rate of the task T_i using the service BS_j; β_{i,k} is the energy consumption rate of the task T_i using the service SS_k. E_{BS_j} and E_{SS_k} represent respectively the energy consumed by the services BS_j and SS_k.
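As an illustrative sketch only (not part of the authors' tooling), the task-level energy model above can be evaluated numerically; the function name and the sample numbers below are hypothetical:

```python
def task_energy(e_intertask, rates, service_energies):
    """Energy of a task per the model above: intrinsic task energy plus
    the rate-weighted energy of each OS service the task uses."""
    return e_intertask + sum(
        rate * e_service for rate, e_service in zip(rates, service_energies)
    )

# Hypothetical numbers: a task using two services.
print(task_energy(10.0, rates=[2.0, 3.0], service_energies=[1.0, 0.5]))  # 13.5
```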
We will explain below the approach used to characterize the energy overhead of the embedded OS services and its variation with hardware and software parameters. The next section describes the hardware platform used and details the energy measurement setup.
Embedded OS power and energy models
In this section, the embedded OS services energy characterization approach is introduced; three important services are studied: the scheduling, the context switch and IPC.
The scheduling routines
Scheduling routines and operations can generate a power overhead on the processor and/or memory components. They are considered as system calls and only consist in switching the processor from the unprivileged user mode to the privileged kernel mode. To quantify the power and energy overhead of the embedded OS scheduler routines and operations, we build test programs containing threads with different priorities: we first measure the average energy consumed by the standalone tasks without scheduling routines, and then with scheduling routines.
The scheduling energy overhead is then obtained as:

E_sch = E_withsch − E_withoutsch

where E_withsch and E_withoutsch represent respectively the energy consumed by the benchmarks with and without scheduling routines.
We vary several parameters when running the test programs. The applicative parameter that we can change is the scheduling policy. We also modify the processor frequency as a hardware parameter. We are interested in studying the influence of three scheduling policies: SCHED_FIFO, SCHED_RR and SCHED_OTHER.
The SCHED_FIFO policy is used with static priorities higher than 0; it is a scheduling algorithm without time slicing. Under this policy, a process which is preempted by another process having a higher priority stays at the head of the list for its priority and resumes execution as soon as all processes of higher priority are blocked again. If two SCHED_FIFO processes have the same priority, the running process continues its execution until it decides to give the processor up. The process having the highest priority uses the processor as long as it needs it.
The SCHED_RR policy enhances SCHED_FIFO; everything described above for SCHED_FIFO also applies to SCHED_RR, except that each process is only allowed to run for a maximum time called the quantum. If a SCHED_RR process has been running for a time period equal to or greater than the time quantum, it is put at the end of the priority list. A SCHED_RR process that has been preempted by a higher priority process and subsequently resumes execution as a running process completes the unexpired portion of its round robin time quantum.
The SCHED_OTHER policy is only used at static priority 0. To ensure fair progress among the processes, the SCHED_OTHER scheduler elects the process to run from the static priority 0 list based on a dynamic priority that is determined only inside this list. The dynamic priority is based on the nice level and is increased for each time quantum during which the process is ready to run but denied to run by the scheduler.
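The three policies can be inspected programmatically on a Linux target; the sketch below, using Python's os module (not the benchmarks of this article, which are native test programs), queries the static priority range of each policy: SCHED_OTHER only admits priority 0, while the real-time policies span a nonzero range.

```python
import os

# Query the static priority range of each scheduling policy (Linux).
# Actually moving a process to SCHED_FIFO/SCHED_RR with
# os.sched_setscheduler() additionally requires root privileges.
for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
    policy = getattr(os, name)
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    print(f"{name}: static priorities {lo}..{hi}")
```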
The context switch
The context switch is a mechanism which occurs when the kernel changes the control of the processor from an executing process to another that is ready to run. The kernel saves the state of the current process, including the processor register values and other data that describe this state. Then, it loads the saved state of the new process for execution.
In the majority of the recent works presented previously, the authors do not take into account the energy and time overheads of this service when studying the energy consumption of OSs; they include it with the scheduling service, but the two services are distinct. Actually, in embedded systems, the processor has two operating modes: kernel mode and user mode. The processes running in kernel and user mode are called kernel and user processes respectively. A user process runs in a memory space which can be swapped out when necessary. When the processor needs a user process to execute kernel code, the process switches to kernel mode with administrative privileges. In this case, the processor has no restrictions while executing the instructions and can access key system resources. Once the kernel process finishes its workload, it returns to its initial state as a user process. The scheduler switches the processor from user mode to kernel mode via system calls; this mechanism is named the mode switch. Unlike the mode switch, the context switch consists in switching the processor from one process to another.
Context switching introduces direct and indirect overheads. Direct context switch overheads include saving and restoring processor registers, flushing the processor pipeline, and executing the OS scheduler. Indirect overheads involve the switch of the address translation maps used by the processor when the threads have different virtual address spaces. This switch perturbs the state of the TLB (a CPU cache that the memory management hardware uses to improve virtual address translation speed). The indirect overhead also includes the perturbation of the processor’s caches: when a thread T1 is switched out and a new thread T2 starts executing, the cache state of T1 is perturbed and some cache blocks are replaced, so when T1 resumes execution and restores its cache state, it incurs cache misses. Besides, OS memory paging represents a source of indirect overhead, since a context switch can involve a memory page moved to the disk when there is no free memory. Prior research has shown that indirect context switch overheads, mainly the cache perturbation effect, are significantly larger than direct overheads.
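A common way to exercise this mechanism, in the spirit of the benchmark-based characterization used in this article, is a pipe ping-pong between two processes. The sketch below measures time per round trip rather than energy (energy comes from the measurement setup, not from software), and the iteration count is arbitrary:

```python
import os
import time

def pingpong(iters=1000):
    """Force context switches by bouncing one byte between a parent and a
    child process over two pipes; returns seconds per round trip (each
    round trip implies at least two context switches)."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    pid = os.fork()
    if pid == 0:            # child: echo every byte back
        for _ in range(iters):
            os.read(r1, 1)
            os.write(w2, b"x")
        os._exit(0)
    start = time.perf_counter()
    for _ in range(iters):
        os.write(w1, b"x")  # wakes the child...
        os.read(r2, 1)      # ...and blocks until it answers
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed / iters

print(f"{pingpong() * 1e6:.1f} us per round trip")
```

Because the parent blocks on the read until the child answers, each iteration forces the kernel to deschedule one process and schedule the other, which is exactly the event being characterized.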
where P_step1 and P_stepn are, respectively, the average power consumption of the benchmarks in step 1 and step n.
We execute the test programs following the characterization approach. Then, we vary the scheduling policy and the frequency, record the power and performance variations, and extract energy models.
The scheduling policy impact on the context switch overhead
It is noted that the context switch energy overhead decreases as the number of context switches increases. In fact, to switch from one process to another, the state of each process must be saved in a data structure named the process control block (PCB). The energy overhead of the creation of the PCB is accounted with the context switch energy overhead and is divided among the context switches, so that if the number of context switches increases, the average Ecs per context switch decreases. Also, when the scheduling policy used is SCHED_FIFO, the context switch energy overhead is higher than for the SCHED_RR scheduling policy. In fact, under the round robin scheduling policy, the processor assigns time slices (quanta) to each process. So, before the context switches that we generate, other context switches occur automatically due to the expiration of the quantum of the process P1; consequently, the PCB is created during an automatic context switch and its creation overhead is not accounted with the energy of the context switches that we generate, Ecs. Under the FIFO scheduling policy, however, the processor does not switch automatically from the process P1 to P2; it switches only when P1 terminates its execution, so the energy overhead of the PCB creation is accounted with Ecs.
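The amortization argument above can be made concrete with a toy model; the split between a fixed PCB-creation cost and a per-switch cost, and the numbers used, are purely illustrative, not fitted to the measurements:

```python
def avg_cs_energy(e_pcb, e_switch, n_switches):
    """Average energy per context switch when a fixed PCB-creation cost
    e_pcb is amortized over n_switches switches of cost e_switch each.
    Illustrates the measured trend; not a fitted model."""
    return e_switch + e_pcb / n_switches

# The average overhead falls as the number of context switches grows.
print(avg_cs_energy(100.0, 10.0, 10))    # 20.0
print(avg_cs_energy(100.0, 10.0, 100))   # 11.0
```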
We note that SCHED_OTHER processes are non real time processes, whereas SCHED_RR and SCHED_FIFO processes are real time ones. SCHED_RR and SCHED_FIFO processes need more memory than SCHED_OTHER processes to save the processor registers, because they execute more operations and calculations in order to respect the real time constraints, so they consume more time to change the context. Hence, the context switches of SCHED_OTHER processes consume less energy than those of SCHED_RR and SCHED_FIFO processes.
The processor frequency impact on the context switch overhead
In this section, the impact of processor frequency on the context switch overhead for static and dynamic frequency cases is discussed.
Static frequency case
For this experiment, the scheduling policy and the number of context switches are fixed. The benchmarks of step 1 and step n are executed at a static frequency; the CPU frequency is then varied and the benchmarks are re-executed.
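On a Linux target, fixing and inspecting the CPU frequency typically goes through the cpufreq sysfs interface. The path below is the standard Linux location, but actually writing a frequency requires root and the userspace governor, so this sketch only reads the current value and returns None where the interface is absent:

```python
def read_cpu_freq_khz(cpu=0):
    """Read the current cpufreq frequency (in kHz) for a CPU from sysfs.
    Returns None if the cpufreq interface is not available (e.g. inside
    a container or on a non-Linux host)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except OSError:
        return None

freq = read_cpu_freq_khz()
print("cpufreq not available" if freq is None else f"{freq / 1000:.0f} MHz")
```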
where f is the CPU frequency, Pcs being expressed in mW and f in MHz.
The voltage Vdrop across the processor increases with the processor frequency, so the power consumption increases with the frequency.
Dynamic frequency case
The core frequency is dynamically changed while the benchmarks run: the test programs are executed in step 1 and step n, and the processes P1 and P2 are executed at frequencies F1 and F2 respectively. So, when the processor preempts the process P1 and executes the process P2, the core frequency changes from F1 to F2, and vice versa.
IPC power models according to processor frequency
The extracted power models, with F the CPU frequency in MHz (average error in % reported for each model):

P_IPC (mW) = 0.347×F + 6.474
P_IPC (mW) = 0.333×F + 4.968
P_IPC (mW) = 0.217×F + 8.542
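Each extracted model is linear in the frequency, so evaluating one is straightforward; the helper below plugs a frequency (in MHz) into a model's coefficients, and the 500 MHz operating point is only an example:

```python
def ipc_power_mw(freq_mhz, slope, intercept):
    """Evaluate a linear IPC power model P_IPC = slope * F + intercept,
    with F in MHz and the result in mW, as in the models above."""
    return slope * freq_mhz + intercept

# First model evaluated at an example 500 MHz operating point.
print(f"{ipc_power_mw(500, 0.347, 6.474):.3f} mW")
```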
IPC energy models according to message size
Energy model: E_IPC (nJ) | Average error (%)
Embedded OS service’s models integration in the system level design flow
The OS energy and power models are integrated in the system level design flow. The energy and power estimation targets the system design, including the software and hardware components.
Simulation outputs can be either user readable, in the form of diagrams or reports, or machine readable, intended for a subsequent analysis tool. The user interacts with STORM through a user-friendly graphical user interface composed of command and display windows. The XML file generated from the AADL model, having the extension “.aaxl”, is not recognized by the STORM simulator. For this reason, in the code transformation step, we adapt the generated file to the simulator structure by parsing the existing “.aaxl” file and extracting the data needed to generate the input file of the simulator. To extract the required data from the “.aaxl” file, we use the Java API JDOM, which allows us to manipulate and output XML data from Java code, so that we can read and write XML data without the complex and memory-consuming options that current API offerings provide. Because JDOM uses the Java Collections API to manage the tree, we transform the “.aaxl” file into a JDOM tree, and then we extract each data item by walking the tree and iterating over the document.
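The article performs this transformation in Java with JDOM; purely as an analogous sketch, the same walk-and-extract pattern is shown below with Python's xml.etree, on an invented fragment, since the real AADL “.aaxl” schema is not reproduced here:

```python
import xml.etree.ElementTree as ET

# Invented stand-in for an ".aaxl" fragment; the real AADL schema differs.
AAXL = """
<model>
  <thread name="T1"><property name="Period" value="40"/></thread>
  <thread name="T2"><property name="Period" value="25"/></thread>
</model>
"""

# Walk the tree and pull out the fields a simulator input file would need.
root = ET.fromstring(AAXL)
for thread in root.iter("thread"):
    props = {p.get("name"): p.get("value") for p in thread.iter("property")}
    print(thread.get("name"), props)
```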
Thanks to the significant evolution of processor technology over the last few years, processors with variable voltages and frequencies are now available; they adopt low power and energy techniques to minimize the energy consumption. A reduction in supply voltage requires a reduction in operating frequency. That is why, when calculating the overhead of OS services, we execute the application on the hardware platform while adopting a low power technique, dynamic voltage and frequency scaling (DVFS), which has been particularly distinguished by its efficiency in reducing CPU power consumption. It can execute the various tasks of an application at different voltage/frequency couples depending on the workload of the processor. Several strategies have been proposed to exploit certain aspects of DVFS and offer a particular method to build pseudo intermediate frequencies for use in conjunction with dynamic voltage scaling (DVS) techniques.
where Erunning[f_m] and Eidle[f_m] represent respectively the energy consumed by the processor, at a frequency f_m, when it is in running and in idle mode, n is the number of tasks, and EOS and the per-service energies are those presented previously in Equations (2) and (4).
In the next section, taking as use case the H.264 application, the energy consumption of the OS services will be determined following the approach described previously.
The H.264 video decoder application is taken as the main use case application. It is a high quality video compression algorithm relying on several efficient strategies extracting spatial (within a frame) and temporal (between frames) dependencies. This application is characterized by flexible coding, high compression and high quality resolution; moreover, it is a promising standard for embedded devices. The main steps of the H.264 decoding process are the following. First, a compressed bit stream coming from the Network Abstraction Layer (NAL), which formats the representation of the video and provides header information in a manner appropriate for conveyance by particular transport layers, is received at the input of the decoder. Then, the entropy decoding block begins with decoding the slice header, where each slice consists of one or more 16×16 macroblocks, and then decodes the other parameters. The data are entropy decoded and reordered to produce a set of quantized coefficients. These coefficients are then inverse quantized and inverse transformed. Thereafter, the data obtained are added to the predicted data from the previous frames depending upon the header information. Finally, the original block is obtained after the de-blocking filter, which compensates the block artifacts effect. The H.264 video decoder application can be broken down into various task sets corresponding to different types of parallelization. In our experiments, we use the slices version, one of the task models of H.264 proposed by Thales Group, France, in the context of the French national project Pherma.
H.264 video decoder application tasks features
Activation date (ms)
OS services energy consumption rates
OS service S_j
We have presented power and energy models of three basic services of an embedded OS adopting a low power technique: the scheduling, the context switch and IPC. These services were chosen for characterization because they are stimulated when adopting the DVFS technique. The models are based on measurements on the OMAP35x EVM hardware platform and allow the characterization of the energy overhead of the low power OS. Experiments show that these services consume a significant part of the energy. For this reason, we plan, in future work, to characterize other basic services of the OS, such as I/O operations and task management, and then to compare the overhead of the low power OS using the DVFS technique with that of OSs using other techniques, for example the DPM (dynamic power management) technique. Future work on this project will also focus on how application tasks use the OS, in order to optimize the energy consumption of embedded systems.
The authors would like to thank the French National Research Agency (ANR), which sponsors our research project OPEN-PEOPLE.
- Open-PEOPLE project: Open Power and Energy Optimization Platform and Estimator. France, 2011. http://www.open-people.fr/
- Li T, John LK: Run-time modeling and estimation of operating system power consumption. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). San Diego, California, USA; 2003. pp. 160–171
- Acquaviva A, Benini L, Riccó B: Energy characterization of embedded real-time operating systems. ACM SIGARCH Comput. Archit. News 2001, 29:13–18
- Tan TK, Raghunathan A, Jha NK: Embedded operating system energy analysis and macro-modeling. In Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD’02). Washington, DC, USA; 2002. pp. 515–520
- Dick RP, Lakshminarayana G, Raghunathan A, Jha NK: Power analysis of embedded operating systems. In Proceedings of the 37th Annual Design Automation Conference. Los Angeles, CA, USA; 2000
- Baynes K, Collins C, Fiterman E, Ganesh B, Kohout P, Smit C, Zhang T, Jacob BL: The performance and energy consumption of embedded real-time operating systems. IEEE Trans. Comput. 2003, 52:1454–1469
- Guo B, Wang D, Shen Y, Li Z: A Hopfield neural network approach for power optimization of real-time operating systems. Neur. Comput. Appl. 2008, 17:11–17
- Zhao X, Guo Y, Wang H, Chen X: Fine-grained energy estimation and optimization of embedded operating systems. In Proceedings of the 2008 International Conference on Embedded Software and Systems Symposia. Chengdu, Sichuan, China; 2008. pp. 90–95
- Fournel N, Fraboulet A, Feautrier P: eSimu: a fast and accurate energy consumption simulator for real embedded system. In IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks. Espoo, Finland; 2007. pp. 1–6
- Fei Y, Ravi S, Raghunathan A, Jha N: Energy-optimizing source code transformations for operating system-driven embedded software. ACM Trans. Embed. Comput. Syst. (TECS) 2007, 7:1–26
- Dhouib S, Senn E, Diguet J, Laurent J: Modelling and estimating the energy consumption of embedded applications and operating systems. In Proceedings of the 12th International Symposium on Integrated Circuits (ISIC’09). Singapore; 2009. pp. 457–461
- OMAP35x Evaluation Module (EVM) (2011), http://focus.ti.com/docs/toolsw/folders/print/tmdsevm3530.html
- Liu F, Guo F, Solihin Y, Kim S, Ekerm A: Characterizing and modeling the behavior of context switch misses. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques. Toronto, Ontario, Canada; 2008
- Tsafrir D: The context-switch overhead inflicted by hardware interrupts (and the enigma of do-nothing loops). In ACM Workshop on Experimental Computer Science (ExpCS). San Diego, California, USA; 2007. p. 4
- Ouni B, Belleudy C, Bilavarn S, Senn E: Embedded operating systems energy overhead. In Conference on Design and Architectures for Signal and Image Processing (DASIP). Tampere, Finland; 2011. pp. 52–57
- Park J, Shin D, Chang N, Pedram M: Accurate modeling and calculation of delay and energy overheads of dynamic voltage scaling in modern high-performance microprocessors. In Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design. Austin, Texas, USA; 2010. pp. 419–424
- Architecture Analysis and Design Language (AADL) standard (2011), http://www.aadl.info/
- STORM simulation tool (2011), http://storm.rts-software.org
- Pillai P, Shin KG: Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles. Banff, Alberta, Canada; 2001. pp. 89–102
- Bhatti MK, Belleudy C, Auguin M: An inter-task real time DVFS scheme for multiprocessor embedded systems. In Proceedings of the 2010 Conference on Design and Architectures for Signal and Image Processing (DASIP’10). Edinburgh, Scotland, United Kingdom; 2010
- Thales Group (France), 2011, http://www.thalesgroup.com
- ANR project Pherma, France (2007–2010), http://pherma.irccyn.ec-nantes.fr
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.