This section presents the results of simulating our approach on synthetic, randomly generated task sets. The simulation environment is described next.

**Considered platform** We consider a platform consisting of two or four homogeneous cores.

**Task generation** Each task \(\tau_{i}\) can be sequential or parallel. The number of tasks of each type is a by-product of the generation itself and is not controlled beforehand. Tasks are added to the set as long as its total utilization does not exceed the total platform capacity (i.e., \(U_{\tau} \leq m\)).

Tasks are created by randomly selecting a number of segments \(k \in \{1, 3, 5, 7\}\). When \(k = 1\), the task is sequential; otherwise, it is parallel. For a parallel task, the number of sub-tasks is \(n_{\text{subtsk}} \in [k, 10]\). The worst-case execution time per sub-task, \(C_{i,\text{subtsk}}\), varies in the range \([1, \text{max\_Ci\_subtsk}]\), where \(\text{max\_Ci\_subtsk} = 2\) for performance reasons. We compute the worst-case execution time of each task as \(C_{i} = \sum_{\forall\, \text{subtsk} \in \tau_{i}} C_{i,\text{subtsk}}\). Then, we derive the remaining parameters: the period \(T_{i}\) and the utilization \(U_{i}\). The period \(T_{i}\) is uniformly generated in the interval \([C_{i},\, n_{\text{subtsk}} \cdot \text{max\_Ci\_subtsk} \cdot 2]\). This interval ensures that the task utilization \(\left(\text{recall that}\ U_{i} = \frac{C_{i}}{T_{i}}\right)\) falls in the interval \([0.50, 1]\) if all sub-tasks are assigned \(\text{max\_Ci\_subtsk}\), or \([0.25, 1]\) if all sub-tasks are assigned the minimum value of \(C_{i,\text{subtsk}}\). To generate execution patterns for the migrating tasks, we first use Eq. 2 and, if no pattern is found, we follow an enumeration approach. In our experiments, \(D_{i} = T_{i}\). This procedure is repeated until 1000 task sets with migrating tasks are generated for both two and four cores.
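The generation procedure above can be sketched as follows. This is a minimal illustration, assuming that a sequential task consists of a single sub-task (the text does not state this explicitly) and omitting the execution-pattern search for migrating tasks; function and field names are ours:

```python
import random

MAX_CI_SUBTSK = 2  # max WCET per sub-task, as stated in the text

def generate_task():
    """Generate one task following the described procedure (a sketch)."""
    k = random.choice([1, 3, 5, 7])           # number of segments
    if k == 1:
        n_subtsk = 1                          # sequential task (assumption)
    else:
        n_subtsk = random.randint(k, 10)      # parallel task
    # WCET per sub-task in [1, MAX_CI_SUBTSK]; the task WCET is their sum
    c_subtsks = [random.randint(1, MAX_CI_SUBTSK) for _ in range(n_subtsk)]
    C_i = sum(c_subtsks)
    # Period drawn uniformly from [C_i, n_subtsk * MAX_CI_SUBTSK * 2],
    # which keeps U_i = C_i / T_i within [0.25, 1]
    T_i = random.uniform(C_i, n_subtsk * MAX_CI_SUBTSK * 2)
    return {"k": k, "C": C_i, "T": T_i, "U": C_i / T_i, "D": T_i}

def generate_task_set(m):
    """Add tasks while the total utilization stays within platform capacity m."""
    tasks, total_u = [], 0.0
    while True:
        t = generate_task()
        if total_u + t["U"] > m:
            break
        tasks.append(t)
        total_u += t["U"]
    return tasks
```

Since each task has utilization at most 1, the first generated task always fits, and generation stops at the first task that would push \(U_{\tau}\) beyond \(m\).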

**Selected heuristics** To evaluate the performance of FFDO, we benchmarked it against other well-known bin-packing heuristics, namely the standard first-fit decreasing (FFD), best-fit decreasing (BFD), and worst-fit decreasing (WFD). FFD assigns each task to the first core with sufficient idle time to accommodate it; BFD assigns each task to the core that, after the assignment, has the least idle time among all cores; and WFD assigns each task to the core that, after the assignment, has the most idle time among all cores. All the heuristics, except FFDO, group the tasks into sequential and parallel tasks and sort each group in decreasing order of task utilization.
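The three standard heuristics differ only in how they pick a core among those with enough idle time. A minimal sketch over task utilizations on unit-capacity cores (illustrative only: the sequential/parallel grouping and FFDO itself are not modeled here):

```python
def pack_decreasing(utils, m, choose):
    """Generic decreasing-order packing of task utilizations onto m
    unit-capacity cores. `choose` selects a core among those with enough
    idle time; tasks that fit on no core are reported as unallocated."""
    load = [0.0] * m
    unallocated = []
    for u in sorted(utils, reverse=True):
        fitting = [c for c in range(m) if load[c] + u <= 1.0]
        if fitting:
            load[choose(fitting, load)] += u
        else:
            unallocated.append(u)
    return load, unallocated

def ffd(utils, m):
    # First core with sufficient idle time
    return pack_decreasing(utils, m, lambda fit, load: fit[0])

def bfd(utils, m):
    # Core with the least idle time after the assignment (most loaded that fits)
    return pack_decreasing(utils, m,
                           lambda fit, load: max(fit, key=lambda c: load[c]))

def wfd(utils, m):
    # Core with the most idle time after the assignment (least loaded)
    return pack_decreasing(utils, m,
                           lambda fit, load: min(fit, key=lambda c: load[c]))
```

For example, `wfd([0.75, 0.75, 0.5], 2)` leaves the 0.5-utilization task unallocated, since WFD spreads the two large tasks across both cores first.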

To compare the heuristics, we measured the percentage of unallocated tasks over a large number of task sets (one million task sets in this experiment) to determine which heuristics yield more candidate migrating tasks. Figure 8 depicts the results. We clearly observe that FFDO and WFD leave the highest percentage of tasks unallocated, while BFD and FFD allocate nearly the same amount of tasks and leave fewer tasks unallocated than FFDO and WFD. These results indicate that our initial heuristic is a good candidate for our approach, as it lets the second phase attempt to re-allocate a large number of tasks as migrating tasks. Based on this result, we selected both FFDO and WFD for a direct comparison in terms of the number of schedulable task sets.

To compare these two heuristics, we randomly generated task sets until 100 of them were schedulable with FFDO; then, for all the generated task sets, we evaluated how many were schedulable using WFD. Figure 9 depicts the results of this comparison.

The task sets schedulable by using WFD can be divided into four groups: 26.85% of these task sets are schedulable by using both heuristics; 24.51% are not schedulable by using FFDO due to \(k_{i}\); 43.19% are not schedulable by using FFDO even with a \(k_{i}\) value in the range of valid values; and finally, 5.45% of the task sets are deemed not schedulable with FFDO after applying the heuristic. Overall, in a two-core setting, the total number of task sets that are schedulable by using WFD is 257, which represents an increase of 157% over FFDO for the same input. From the diagram, the majority of the task sets that are schedulable by using WFD fall in a potentially feasible region for the FFDO heuristic (43.19%): here, all task sets have migrating tasks and \(k_{i}\) values within the range of valid values, but no feasible pattern is found. These results still hold for four cores, but to a lesser extent, as only 17.9% more task sets were schedulable by using WFD over FFDO.

We conjecture that WFD behaves better than FFDO (even though FFDO has a higher percentage of unassigned tasks, as shown in Fig. 8) for a smaller number of cores because of the task-to-core assignment. Depending on the granularity of the task-set utilization, more idle time may be available globally across the cores when allocating tasks onto a small number of cores. These idle slots make it possible for our pattern-finding procedure to find enough room to fit a job of a task when computing the execution pattern for a migrating task. However, as the number of cores increases, WFD naturally balances the workload across the cores, whereas FFDO concentrates the workload on the first cores, leaving more room on the later ones. For this reason, we expect WFD to behave equally to, or even worse than, FFDO as the number of cores increases.

**Considered metrics** To evaluate the proposed approach, we measure the gain obtained in terms of the average worst-case response time for each schedulable task set. Specifically, for each task set, we generate the complete schedule for the two approaches: the approach that schedules migrating tasks without applying the work-stealing mechanism among the selected cores, denoted Approach-NS, and the approach that applies the work-stealing mechanism among the selected cores, denoted Approach-S. After generating both schedules, we compute the average response time of the jobs of each task throughout the hyperperiod by summing the response times of the individual jobs and dividing the result by the number of jobs in one hyperperiod. This process is applied to both approaches. The improvement, i.e., the gain of Approach-S over Approach-NS, is computed for each task \(\tau_{i}\) as \(AV_{\tau_{i}} = \frac{AV^{NS}_{\tau_{i}} - AV^{S}_{\tau_{i}}}{AV^{NS}_{\tau_{i}}} \cdot 100\), where \(AV^{NS}_{\tau_{i}}\) denotes the average response time of task \(\tau_{i}\) in Approach-NS and \(AV^{S}_{\tau_{i}}\) denotes its average response time in Approach-S. The average gain for a task set \(\tau\) is then obtained by averaging the per-task gains: \(AV_{\tau} = \frac{1}{|\tau|} \cdot \sum_{\tau_{i} \in \tau} AV_{\tau_{i}}\).
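The two formulas translate directly into code; a small sketch (the response-time averages themselves would come from the generated schedules):

```python
def per_task_gain(av_ns, av_s):
    """Gain (in %) of Approach-S over Approach-NS for one task:
    (AV_NS - AV_S) / AV_NS * 100."""
    return (av_ns - av_s) / av_ns * 100.0

def task_set_gain(avgs):
    """Average gain over a task set; `avgs` holds one (AV_NS, AV_S)
    pair per task."""
    return sum(per_task_gain(ns, s) for ns, s in avgs) / len(avgs)
```

For instance, a task whose average response time drops from 10 to 8.5 time units has a gain of 15%.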

Figure 10 illustrates the average gain for two and four cores, respectively, for the selected heuristics, namely FFDO and WFD.

**Interpretation of the results** The improvement in terms of average response time per task (in %) when using Approach-S over Approach-NS is grouped by utilization (see Fig. 10). For each sub-figure, the distribution of the data is depicted as a box plot. For each utilization value, the plot shows the minimum and maximum gain per task, the median and the mean (drawn as a diamond), the first and third quartiles, and the outliers (drawn as crosses). The red line depicts a linear regression on the data (computed on the mean values) to show the predicted trend of the gain per task.
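The regression line can be reproduced with ordinary least squares on the (utilization, mean gain) points. The sketch below uses hypothetical values for illustration, not the paper's data:

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical (utilization, mean gain %) points, one per box in the plot
utilization = [1.55, 1.65, 1.75, 1.85, 1.95]
mean_gain = [15.0, 11.0, 8.0, 4.0, 1.0]
slope, intercept = linear_regression(utilization, mean_gain)
```

A negative slope, as obtained here, matches the observation that the gain per task shrinks as the task-set utilization approaches the platform capacity.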

*Considering two cores:* for task sets with a high utilization (over 1.55), the gain of the proposed approach is clearly visible. In the best case, this gain reaches nearly 15% of the average response time per task for FFDO and nearly 12% for WFD, which is non-negligible. As the utilization of the task sets increases, the gain per task decreases. This is expected, due to the shrinking amount of idle time available for stealing. The trend shows that above a utilization of 1.95, the work-stealing mechanism becomes of little interest: the total workload on each core is so high that very little room is left to improve the average response time of each migrating task through work-stealing. It is important to note that task sets with utilizations below 1.55 for FFDO and 1.45 for WFD are not included in the plot, as they do not contain any migrating tasks.

*Considering four cores:* the trend is similar to the one observed for two cores. This is also shown by the linear regression line, which makes it possible to predict the average gain per task as a function of the task-set utilization. The regression shows that, for the lowest utilizations, the expected improvement starts at 2.3% for FFDO and 3.3% for WFD on two cores; on four cores, it starts at 1.4% for both heuristics. We also observe that the expected improvement decreases as the tasks' utilization increases. This behavior suggests that work-stealing is useful for task sets with migrating tasks whose utilization spans from the lowest possible utilization for such task sets up to the platform capacity; close to this upper limit, the benefits of work-stealing are limited. From the behavior observed on two and four cores, we conjecture that the proposed approach will behave similarly as the number of cores increases.

**Overheads of the approach** This work shows that it is possible to decrease the average response time of tasks and use the newly generated free time slots to execute less critical tasks (e.g., aperiodic or best-effort tasks). While such a decrease involves overhead costs, such as the number and cost of migrations or the impact of the online admission control on the overall approach, we did not explicitly measure them. Still, we provide an overview of the existing costs and their possible impact on system performance.

We assume that cores that share a migrating task keep a local copy of this task. Keeping task copies is, however, platform dependent: on some platforms, memory constraints may make copies impossible. In our approach, local copies are used for migrating tasks that might be subject to stealing; a local copy avoids fetching the task code from main memory. Whenever a stealing operation occurs, a core fetches data from another core's memory in order to help execute the task. While this is not a task migration *per se*, it has some commonalities, as data needs to be moved from one core to another, which may interfere with the execution of other tasks in the system (for instance, due to shared resources). In our approach, this overhead arises only when stealing occurs and is incurred by a core that is idle, so part of the cost is absorbed by the idle core (and is negligible due to its idleness). As for the number of data transfers, it can be bounded in our framework: in the worst case, the number of data fetches when stealing depends on the number of sub-tasks in each segment and on the number of cores that share the task.

Considering the online admission control, our test requires the current time instant and the available slack at that instant. Both can be easily obtained on any given platform by using the platform timing functions and a cumulative function that computes the slack of the current job. Therefore, we consider that this does not pose any significant overhead in our approach.
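In essence, the admission test reduces to comparing the available slack against the cost of the extra work. The sketch below uses one common definition of slack (time to the deadline minus the work still pending); this definition and all names are our assumptions, not necessarily the paper's exact function:

```python
def job_slack(t, deadline, remaining_wcet):
    """Slack of the current job at time t: time left until the deadline
    minus the work still pending (assumed definition)."""
    return (deadline - t) - remaining_wcet

def admit(t, deadline, remaining_wcet, cost):
    """Admit a less critical workload of `cost` time units only if the
    current slack covers it."""
    return job_slack(t, deadline, remaining_wcet) >= cost
```

For example, a job at time 2 with deadline 10 and 5 units of pending work has 3 units of slack, so a best-effort request of cost 3 is admitted while one of cost 4 is rejected.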