This section covers the methods and concepts used in RTES performance evaluation. It comprises an introduction to the design Y-chart in RTES performance evaluation, the phases of a model-based RTES performance evaluation process, a discussion on modeling language and tool development, and a short introduction to RTES timing analysis concepts. Finally, related work on UML in RTES performance evaluation is examined.
2.1. Design Y-Chart and RTES Modeling
A typical approach to RTES performance evaluation follows the design Y-chart presented in Figure 2 by separating the application description from the underlying platform description. The two are bound together in the mapping phase, in which the communication and computation of application functionalities are committed onto particular platform resources.
There are several possible abstraction levels for describing the application and platform for performance evaluation. One possibility is to utilize abstract specifications. This means that application workload and performance of the platform resources are represented symbolically without needing detailed executable descriptions.
Application workload is a quantity that indicates how much capacity is required from the underlying platform components to execute a certain functionality. In model-based performance evaluation, the workloads can be estimated based on, for example, standard specifications, prior experience from the application domain, or available processing capacity. Legacy application components, on the other hand, can be profiled, and the resulting performance models can be evaluated together with the models of components yet to be developed.
In addition to computational demands, communication demands between application parts must be considered. In practice, the communication is realized as data messages transmitted between real-time operating system (RTOS) threads or between processing elements (PEs) over an on-chip communication network. Shared buses and Network-on-Chip (NoC) links and routers schedule transmitted data packets analogously to the way PEs execute and schedule computational tasks. Moreover, inter-PE communication can alternatively be performed using a shared memory. The performance characteristics of memories as well as their utilization play a major role in the overall system performance. The impacts of computation, communication, and storage activities should all be considered in system-level analysis to enable successful performance evaluation of a modern SoC.
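As an illustration, the separation of application, platform, and mapping could be captured at a very abstract level as in the following sketch. This is only a toy example with made-up task names, workload figures, and platform capacities; it is not a description of any particular tool.

```python
# Minimal Y-chart sketch (illustrative only): abstract application workloads,
# platform capacities, and a mapping that binds the two. All names and
# numbers are assumptions made up for this example.

# Application: computational workload per task in mega-operations and
# communication workload per message in kilobytes.
tasks = {"src": 2.0, "filter": 8.0, "sink": 1.0}               # Mops per activation
messages = {("src", "filter"): 16.0, ("filter", "sink"): 4.0}  # kB per activation

# Platform: processing capacity in Mops/s and bus bandwidth in kB/s.
pes = {"cpu0": 200.0, "dsp0": 400.0}
bus_bandwidth = 50_000.0  # kB/s

# Mapping: each task is committed onto one processing element.
mapping = {"src": "cpu0", "filter": "dsp0", "sink": "cpu0"}

def activation_load(rate_hz: float) -> dict:
    """Rough PE and bus utilizations when the application runs at rate_hz."""
    pe_util = {pe: 0.0 for pe in pes}
    for task, mops in tasks.items():
        pe_util[mapping[task]] += rate_hz * mops / pes[mapping[task]]
    # Messages whose endpoints sit on different PEs go over the shared bus.
    bus_util = sum(rate_hz * kb / bus_bandwidth
                   for (src, dst), kb in messages.items()
                   if mapping[src] != mapping[dst])
    return {"pe": pe_util, "bus": bus_util}

print(activation_load(rate_hz=25.0))
```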
2.2. Model-Based RTES Performance Evaluation Process
An RTES performance evaluation process must follow disciplined steps to be effective. From an SoC designer's perspective, a generic performance evaluation process consists of the following steps. Some of the concepts of this and the next subsection have been reused and modified from the work in :
(1) selection of the evaluation techniques and tools,
(2) measuring, profiling, and estimating workload characteristics of the application and determining platform performance characteristics by benchmarking, estimation, and so forth,
(3) constructing the system performance model,
(4) measuring, executing, or simulating the system performance models,
(5) interpreting, validating, monitoring, and back-annotating the data received from the previous step.
The selection of the evaluation techniques and tools is the first and foremost step in the performance evaluation process. This phase includes considering the requirements of the performance analysis and availability of tools. It determines the modeling methods used and the effort required to perform the evaluation. It also determines the abstraction level and accuracy used. All further steps in the process are dependent on this step.
The second step is performed if the system performance model requires initial data about application task workloads or platform performance. This data is obtained from profiling, specifications, or estimation. The application as well as the platform may alternatively be described using executable behavioral models. In that case, such additional information may not be needed, as all performance data can be determined during system model execution.
The actual system model is constructed in the third step by a system architect according to the defined metamodel and model representation methods. The gathered initial performance data is annotated to the system model. The annotation of the profiling results can also be accelerated by combining the profiling and back-annotation with automation tools such as .
After system modeling, the actual analysis of the model is carried out. This may involve several model transformations, for example, from UML to SystemC. The analysis methods can be classified into dynamic and static methods . Dynamic methods are based on executing the system model with simulations. Simulations can be categorized into cycle-accurate and system-level simulations. Cycle-accurate simulation means that the timing of system behavior is defined with the precision of a single clock cycle. Cycle-accuracy guarantees that at any given clock cycle, the state of the simulated system model is identical to the state of the real system. System-level simulation uses a higher abstraction level. The system is represented at the IP-block level, consisting of coarse-grained models of processing, memory, and communication elements. Moreover, the application functionality is represented by coarse-grained models such as interacting tasks.
Static (or analytic) methods are typically used in early design-space exploration to find different corner cases. Analytical models cannot take into consideration sporadic effects in the system behavior, such as aperiodic interrupts or other aperiodic external events. Static models are suited for performance evaluation when deterministic behavior of the system is accurate enough for the analysis.
Static methods are faster and provide significantly larger coverage of the design-space than dynamic methods. However, static methods are less accurate as they cannot take into account dynamic performance aspects of a multiprocessor system. Furthermore, dynamic methods are better suited for spotting delayed task response times due to blocking of shared resources.
Analysing, measuring, and executing the system performance models usually produces a massive amount of data about the modeled system. The final step in the flow is to select, interpret, and exploit the relevant data. The selection and interpretation of the relevant data depend on the purpose of the analysis. The purpose can be, for example, early design-space exploration. In that case, the flow is usually iterative: the results are used to optimize the system models, after which the analysis is performed again for the modified models. In dynamic methods, an effective way of analysing the system behavior is to visualize the simulation results in the form of graphs. This helps the designer to efficiently spot changes in system behavior over time.
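As a simple illustration of such visualization, a Gantt-style activity graph could be drawn from a simulation trace, for example, as in the following sketch. It assumes matplotlib is available, and the trace values are invented.

```python
# Sketch of visualizing simulated resource activity over time as a
# Gantt-style graph. The trace data below is made up for illustration.
import matplotlib.pyplot as plt

# Each entry: (start time, duration) in milliseconds on a given resource.
trace = {
    "cpu0": [(0, 4), (10, 3), (20, 5)],
    "dsp0": [(2, 6), (12, 6), (22, 4)],
}

fig, ax = plt.subplots()
for row, (resource, spans) in enumerate(trace.items()):
    ax.broken_barh(spans, (row * 10, 8))   # one horizontal lane per resource
ax.set_yticks([row * 10 + 4 for row in range(len(trace))])
ax.set_yticklabels(list(trace))
ax.set_xlabel("time (ms)")
plt.show()
```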
2.3. Modeling Language and Tool Development
SoC designers typically utilize predefined modeling languages and tools to carry out the performance evaluation process. On the other hand, language and tool developers have their own steps to provide suitable evaluation techniques and tools for SoC designers. In general they are as follows:
(1) formulation of the metamodel,
(2) developing methods for model representation and capturing,
(3) developing analysis tools according to the selected modeling methods.
The formulation of the metamodel requires a very similar kind of consideration of the objectives of the performance analysis as the selection of techniques and tools does for SoC designers. The created metamodel determines the effort required to perform the evaluation as well as the abstraction level and accuracy used. In particular, it defines whether the system performance model can be executed, simulated, or statically analysed.
The second step is to define how the model is captured by a designer. This phase includes the selection or definition of the modeling language (such as UML, SystemC, or a custom domain-specific language). The selection of notations also requires that transformation rules be defined between the elements of the metamodel and the elements of the selected description language. In the case of UML2, the metamodel concepts are mapped to UML2 metaclasses, stereotyped model elements, and diagrams.
We want to emphasize the importance of performing these first two steps exactly in this order. The definition of the metamodel should be performed independently of the utilized modeling language and with full concentration on the primary objectives of the analysis. The selection of the modeling language should not alter the metamodel nor bias its definition. Instead, the modeling language and notations should be tailored to the selected metamodel, for instance, by utilizing the extension mechanisms of UML2 or by defining a completely new domain-specific language. The reason for this is that model notations contribute only to presentational features; the model semantics truly determine whether the model is usable for the analysis. Nevertheless, presentational features determine the feasibility of the model for a human designer.
The final step is the development of the tools. To provide efficient evaluation techniques, the implementation of the tools should follow the created metamodel and its original objectives. This means that the original metamodel becomes the foundation of the internal metamodel of the tools. The system modeling language and tools are linked together with model transformations. These transformations convert the notations of the system modeling language into the format understood by the tools, while the semantics of the model are preserved.
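The following sketch illustrates the idea of such a transformation at a very small scale. The element names, stereotype, tags, and the internal metamodel class are hypothetical and do not correspond to any specific UML profile or tool.

```python
# Illustrative sketch of a model transformation step: stereotyped model
# elements (as they might be exported from a UML tool) are converted into
# the analysis tool's internal metamodel objects. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class WorkloadTask:          # internal metamodel concept of the tool
    name: str
    operations: int          # PE-neutral computational load

exported_elements = [
    {"name": "videoDecode", "stereotype": "workload", "tags": {"operations": 500_000}},
    {"name": "Display", "stereotype": "device", "tags": {}},
]

def transform(elements):
    """Keep only elements whose stereotype maps to a metamodel concept."""
    return [WorkloadTask(e["name"], e["tags"]["operations"])
            for e in elements if e["stereotype"] == "workload"]

print(transform(exported_elements))
```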
2.4. RTES Timing Analysis Concepts
A typical SoC contains heterogeneous processing elements executing complex application tasks in parallel. The timing analysis of such a system requires abstraction and parameterization of the key concerns related to resulting performance.
Hansson et al. define concepts for RTES timing analysis . In the following, a short introduction to these concepts is given.
Task execution time is the time (in clock cycles or absolute time) in which a set of sequential operations is executed undisturbed on a processing element. It should be noted that the term task is here used more generally to mean a sequence of operations or actions related to single-threaded execution, communication, or data storage. The term thread is used to denote a typical schedulable object in an RTOS. Profiling the execution time does not consider background activities in the system, such as RTOS thread pre-emptions, interrupts, or delays caused by waiting for a blocked shared resource. The purpose of the execution time is to determine how much computing capacity is required to execute the task.
Task response time, on the other hand, is the actual time it takes from the beginning to the end of the task in the system. It accounts for all interference from other system parts and background activities.
Execution time and response time can be further classified into worst case (wc), best case (bc), and average case (ac) times. The worst case execution time (wcet) is the worst possible time the task can take when not interfered with by other system activities. The worst case response time (wcrt), on the other hand, is the worst possible time the task may take in the worst case scenario in which other system parts and activities interfere with its execution. In multimedia applications that require streaming data processing, the worst case and average case response times are usually the ones that need to be analysed. However, in some hard real-time systems, such as a car air bag controller, the best case response time (bcrt) may be as important as the wcrt. The average case response time is usually not as significant. Jitter is a measure of time variability. For a single task, jitter in execution time can be calculated as wcet - bcet. Respectively, jitter in response time can be calculated as wcrt - bcrt.
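These quantities can be illustrated with a small numeric sketch. The sample values below are invented, and the maxima and minima of observed samples only approximate the true worst and best cases.

```python
# Worked example of the timing quantities above, using made-up measurements.
# Execution times are undisturbed per-task times; response times include
# interference from the rest of the system.
exec_times = [1.9, 2.0, 2.4, 2.1]      # ms, profiled in isolation
resp_times = [2.2, 3.5, 5.0, 2.9]      # ms, observed in the full system

wcet, bcet, acet = max(exec_times), min(exec_times), sum(exec_times) / len(exec_times)
wcrt, bcrt, acrt = max(resp_times), min(resp_times), sum(resp_times) / len(resp_times)

exec_jitter = wcet - bcet               # jitter in execution time
resp_jitter = wcrt - bcrt               # jitter in response time
print(exec_jitter, resp_jitter)
```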
It is assumed that the execution time is constant for a given task-PE pair. It should be noted that in practice the execution time of a function may vary, for example, depending on the processed data. For such functions the constant task execution time assumption is not valid. Instead, the varying execution times should be modeled by selecting a suitable value to characterize them (e.g., the worst or average case) or by defining separate tasks for the different execution scenarios. As opposed to the execution time, the response time varies dynamically depending on the surrounding system the task is executed on. The response time analysis must be repeated if
(1) the mapping of application tasks is changed,
(2) new functionalities (tasks) are added to the application,
(3) the underlying execution platform is modified,
(4) the environment (stimuli from outside) changes.
In contrast, a single task execution time does not have to be profiled again if the implementation of the task is not changed (e.g., due to optimization), assuming that the PE on which the profiling was carried out is not changed. If the PE executing the task is changed and the profiling uses absolute time units, then reprofiling is needed. However, this can be avoided by utilizing PE-neutral parameters, such as the number of operations, to characterize the execution load of the task. Another possibility is to represent processing element performance using a relative speed factor as in .
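For example, a PE-neutral operation count combined with relative speed factors could be used roughly as in the following sketch; the PE names and all numbers are illustrative assumptions.

```python
# Sketch of characterizing a task with a PE-neutral operation count and
# deriving per-PE execution times from relative speed factors.
task_operations = 1_200_000            # PE-neutral workload of the task

reference_ops_per_s = 100e6            # throughput of the reference PE (assumed)
speed_factor = {"arm9": 1.0, "dsp": 2.5, "accelerator": 8.0}  # relative to reference

def execution_time(pe: str) -> float:
    """Execution time (s) of the task on the given PE; no reprofiling needed."""
    return task_operations / (reference_ops_per_s * speed_factor[pe])

for pe in speed_factor:
    print(pe, execution_time(pe))
```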
In multiprocessor SoC performance evaluation, simulating the profiled or estimated execution times (or number of operations) of tasks on abstract HW resource models is an effective way of observing combined effects of task execution times, mapping, scheduling, and HW platform parameters on resulting task response times, response time jitters, and processing element utilizations.
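The following toy sketch illustrates this kind of abstract simulation: tasks with fixed execution times are released periodically onto their mapped PEs and served in FIFO order, yielding response times and PE utilizations. It is not the simulator used in this work, and all parameters are invented.

```python
# Minimal system-level simulation sketch on abstract HW resource models.
# (task, pe, execution time ms, period ms) -- all values are made up.
taskset = [("audio", "cpu0", 2.0, 10.0), ("video", "cpu0", 6.0, 20.0),
           ("ctrl", "cpu1", 1.0, 5.0)]
horizon = 100.0          # simulated time span in ms

pe_free_at = {}          # time at which each PE becomes idle again
busy = {}                # accumulated busy time per PE
responses = {}           # observed response times per task

# Periodic release events ordered by release time.
events = [(r * period, name, pe, exec_t)
          for name, pe, exec_t, period in taskset
          for r in range(int(horizon // period))]
for release, name, pe, exec_t in sorted(events):
    start = max(release, pe_free_at.get(pe, 0.0))   # wait if the PE is busy
    finish = start + exec_t
    pe_free_at[pe] = finish
    busy[pe] = busy.get(pe, 0.0) + exec_t
    responses.setdefault(name, []).append(finish - release)

for name, r in responses.items():
    print(name, "worst observed response:", max(r), "ms")
for pe, b in busy.items():
    print(pe, "utilization:", b / horizon)
```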
Timing requirements of SoC functions are compared against estimated, simulated, or measured response times. It is typical that timing requirements are given as combined response times of several individual tasks. This naturally depends completely on the granularity used in identifying individual tasks. For instance, a single WLAN data transmission task could be decomposed into data processing, scheduling, and medium access tasks. Checking whether the timing requirement of a single data transmission is met then requires examining the response times of the constituent tasks in an additive manner.
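For example, with invented sub-task response times and an assumed 5 ms requirement, the additive check could look as follows.

```python
# Sketch of checking a combined timing requirement by summing the response
# times of the constituent tasks of one data transmission. The task names,
# response times, and the 5 ms requirement are made-up example values.
constituents = {"data_processing": 1.8, "scheduling": 0.4, "medium_access": 2.1}  # ms
requirement_ms = 5.0

total = sum(constituents.values())
print(f"total response {total:.1f} ms,",
      "meets requirement" if total <= requirement_ms else "misses requirement")
```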
2.5. On UML in Simulation-Based RTES Performance Evaluation
Related work includes several static and dynamic methods for performance evaluation of parallel computer systems. A comprehensive survey on methods and tools used for design-space exploration is presented in . Our focus is on dynamic methods, and some of the research closest to our work is examined in the following.
Erbas et al.  present a system-level modeling and simulation environment called Sesame, which aims at efficient design-space exploration of embedded multimedia system architectures. For the application, it uses a Kahn process network (KPN) to model the application performance with a high-level programming language. The code of each Kahn process is instrumented with annotations describing the application's computational actions, which allows the computational behavior of an application to be captured. The communication behavior of a process is represented by reading from and writing to FIFO channels. The architecture model simulates the performance consequences of the computation and communication events generated by an application model. The timing of application events is simulated by parameterizing each architecture model component with a table of operation latencies. The simulation provides performance estimates of the system under study together with statistical information such as the utilization of architecture model components. Their performance metamodel and approach have several similarities with ours. The biggest differences are in the abstraction level of HW communication modeling and in the visualization of the system models and performance results.
Balsamo and Marzolla  present how UML use case, activity, and deployment diagrams can be used to derive performance models based on multichain and multiclass Queuing Networks. The UML models are annotated according to the UML Profile for Schedulability, Performance and Time Specification . This approach has been developed for SW architectures rather than for embedded systems. No specific tool framework is presented.
Kreku et al.  propose a method for simulation-based RTES performance evaluation. The method is based on capturing application workloads using UML2 state-machine descriptions. The platform model is constructed from SystemC component models that are instantiated from a library. Simulation is enabled by automatic C++ code generation from the UML2 description, which makes the application and platform models executable in a SystemC simulator. The platform description provides dedicated abstract services through which the application projects its computational and communicational loads onto HW resources. These services are invoked from the actions of the state-machines. The utilization of UML2 state-machines enables efficient capturing of the control structures of the application. This is a clear benefit in comparison to plain data flow graphs. The platform services can be used to represent data processing and memory accesses. Their method is well suited for control-intensive applications, as UML state-machines are used as the basis of modeling. Our method targets modeling of embedded streaming data applications with less modeling effort by using UML activity diagrams.
Madl et al.  present how distributed real-time embedded systems can be represented as discrete event systems and propose an automated method for verification of dense time properties of such systems. The model of computation (MoC) is based on tasks connected with channels. Tasks are mapped onto machines that represent computational resources of embedded HW.
Our performance evaluation method is based on an executable streaming data application workload model specified with UML activity diagrams and an abstract platform performance model specified with composite structure diagrams. In comparison to the related work, this is the first proposal that defines a transformation between UML activity diagrams and streaming data application workload models and successfully adopts it for embedded RTES performance evaluation.