Clock refinement in imperative synchronous languages

Gemünde, Mike; Brandt, Jens; Schneider, Klaus

doi:10.1186/1687-3963-2013-3

Review
Open access
Published: 10 April 2013

EURASIP Journal on Embedded Systems volume 2013, Article number: 3 (2013)

Abstract

The synchronous model of computation divides the program execution into a sequence of logical steps. On the one hand, this view simplifies many analyses and synthesis procedures, but on the other hand, it imposes restrictions on the modeling and optimization of systems. In this article, we introduce refined clocks in imperative synchronous languages to overcome these restrictions while still preserving important properties of the basic model. We first present the idea in detail and motivate various design decisions with respect to the language extension. Then, we sketch all the adaptations needed in the design flow to support refined clocks.

1 Review

Synchronous languages [1] such as Esterel [2], Lustre [3], or Quartz [4] have been proposed for the development of safety-critical embedded systems. They are based on a convenient programming model, which allows one to generate deterministic single-threaded code from multi-threaded synchronous programs. Thus, synchronous programs can be executed directly on simple micro-controllers without the need for complex operating systems. In addition, synchronous programs can straightforwardly be translated to hardware circuits [4–6], which makes synchronous languages attractive for hardware–software co-design. Furthermore, the concise formal semantics of synchronous languages is the basis for formal verification of the correctness of programs as well as of the compilers used [7–10]. Finally, since macro steps consist of only finitely many micro steps whose number is known at compile time, one can determine tight bounds on the reaction time by a simplified worst-case execution time (WCET) analysis [11–14].

All these advantages are due to the underlying synchronous model of computation [1], which divides the execution of programs into micro and macro steps, where variables change synchronously only between macro steps and remain constant during micro steps. The partitioning into micro and macro steps is explicitly given by the programmer, and the micro steps are executed in a causal ordering so that there are no read-after-write conflicts [7, 15]. As a consequence, all threads of a program run in lockstep: they execute the micro steps of their current macro steps in the common global variable environment, and therefore automatically synchronize at the end of the macro step.

Obviously, the synchronous model of computation enforces deterministic concurrency, which has many advantages in system design, e.g., to avoid Heisenbugs [16] and to allow compile-time analyses, e.g., on WCET. At the same time, however, it imposes tight restrictions on modeling possibilities, since there is no means to express the independence of threads in certain program locations. This phenomenon—where synchronous lockstep execution of threads is enforced even though it is not necessary—is often referred to as over-synchronization. Over-synchronization occurs quite frequently, since the input signals of a system usually have different rates, and even signals of the same rate do not necessarily need to be synchronized if there are no data dependencies among them. While a static clock and data-flow analysis may be able to detect the dependencies to desynchronize such programs [17], adding an explicit notion of independence makes it possible for compilers to create desynchronized code without sophisticated and expensive analyses.

Another deficiency of the synchronous model is its inflexibility with respect to temporal changes. Modifications of the temporal behavior of a component may be problematic since they can endanger the global behavior of the entire system. For this reason, design methods such as latency-insensitive [18] design or synchronous elastic systems [19, 20] have been developed to maintain the synchronous computation between modules in case the timing of one of the modules is changed.

Another desirable feature for imperative synchronous languages, which requires further temporal abstraction layers, is function calls that are executed within a micro step. For example, assume that the greatest common divisor (GCD) of two integers is required in a program expression. As such non-primitive recursive functions require data-dependent loops, it is not possible to implement them as micro steps of a macro step, since the number of micro steps depends on the values. Executing function calls in parallel poses many problems, since lazy evaluation or other kinds of code optimization destroy the temporal behavior. Wrapping functions into module calls causes even more problems, since the function parameters should be constant during function evaluation, which must be enforced explicitly by the caller. A true function interface would guarantee this by definition.

All these problems can be solved by providing a hierarchy of clocks in the synchronous system that not only allows combining macro steps into a larger step of a slower clock, but also allows one to refine the base clock into different faster clocks. Thereby, it is possible to describe explicitly the point in time at which synchronization should happen, independent of the number of steps that have passed or that are needed for a calculation. The refinement of the base clock in a module is particularly attractive since it retains the external input/output behavior. Thereby, it is possible to replace a code segment by another one having a different temporal behavior. For instance, it becomes possible to exchange components with functionally equivalent ones running at higher clock speeds. Thus, refinements make component-based design much more flexible.

The rest of this article is structured as follows. Section 2 briefly introduces the imperative synchronous language Quartz, which serves as the starting point for our extension, which is presented in Section 3. Section 4 sketches a formal semantics for the extension. Section 5 gives details about the compilation of our extended Quartz to a new intermediate format, and from there to hardware and software. Section 6 discusses related work before we draw some conclusions in Section 7.

2 The synchronous language Quartz

This section introduces the synchronous model of computation with the example of the imperative synchronous language Quartz. The synchronous model of computation [1, 21] divides the execution of a program into a sequence of macro steps [22]. In each macro step, the system reads the inputs, performs some computation and finally produces the outputs. In theory, the semantics assumes that the outputs are computed in zero-time. In practice, the execution implicitly follows the data dependencies between the micro steps, and outputs have to be computed in bounded time for the given application. Thus, the synchronous model of computation abstracts from communication and computation delays and considers only the dependencies of the data. A consequence of this abstraction is that each variable has a designated value in each macro step.

2.1 Statements

The imperative synchronous language Quartz implements the synchronous model of computation by means of the pause statement. While all other primitive statements do not take time (in terms of macro steps), a pause marks the end of a macro step and consumes one logical unit of time. Thus, the behavior of a whole macro step is defined by all actions between two consecutive pause statements. Parallel threads run in lockstep: their macro steps are executed synchronously, and the statements in both are scheduled according to the data dependencies so that all variables have a unique well-defined value in the macro step.

We illustrate the synchronous model of computation by a simple example shown in Figure 1a. It takes two inputs i1, i2, produces two outputs o1, o2, and has one local variable x. Every pause statement is annotated with a label for better identification. Figure 1b shows an execution of the program based on some sample input values. For space reasons, the values true and false are written as T and F in the figure. In the first macro step, the program is started (st is true) and all actions before the first pause statement are executed. In the example, these are the assignments to o1 and x, which assign the values 3 and 1 based on the given input values. There is no assignment to o2, which therefore gets its default value 0. In the second macro step, the execution resumes from the pause statement with label l1. The label is set to true, and all other labels are false for this step. In this second step, the variables o1, o2 and x are assigned. Since each variable has a unique value for the entire step, the value that is assigned to o2 is used to determine the value for o1. Thus, the assignment to o2 must be executed before the assignment to o1. The resulting values are shown in the table. The next macro step starts from the pause statement with label l2. Due to the if statement, the assignment to o1 is not executed. Since o1 is not set by an assignment, it keeps its value from the last step.

Basically, all Quartz programs can be reduced to the following set of basic statements, which can be used to define further macro statements as syntactic sugar:

● nothing

This statement has no effect. It only exists for technical reasons of defining source code transformations.

● l: pause

The pause marks the end of a macro step and thus also the beginning of the following step.

● x = τ

This form of a variable assignment is called an immediate assignment. It sets the value of the variable x for the current step to the value given by the evaluation of the expression τ.

● next (x) = τ

This form of variable assignment is called a delayed assignment. Like the immediate assignment, it evaluates the expression τ with the values of the current step, but this value is assigned to the variable x only in the following step.

● do ... while(σ)

This loop statement first executes its body statement. If the body statement terminates, the condition σ is evaluated and if σ holds, the body statement is restarted in the same step. Otherwise, the loop terminates. All other loop versions can be reduced to this basic loop.

● { … } || { … }

The parallel statement executes both code blocks in parallel where in each step, one macro step of each block is executed. We call the two code sub-statements of the parallel statement threads. One can therefore also say that the threads synchronize on each pause statement that is reached. The parallel statement terminates if its last thread terminates.

● abort ... when(σ)

With the (strong) abortion statement, the execution of a code block can be aborted when the given condition holds. The abortion takes place at the beginning of a macro step: if the condition holds in a step, no action inside of the code block is executed.

● suspend ... when(σ)

With the (strong) suspension statement, the execution of a code block can be stopped when the given condition holds. In this case, the execution is stopped for the whole macro step and no action inside the block is executed. The execution resumes at the next macro step where the condition does not hold.

There are also other statements which are not considered here, because we do not define the extension with refined clocks for them.

2.2 Logical correctness and causality

In the synchronous model of computation, all micro steps in a macro step are executed synchronously. In theory, every variable assignment that complies with the execution can be considered a consistent one. A program that has, for each input assignment, exactly one consistent assignment of all variables is called logically correct. We illustrate this concept with the help of the following Quartz program.

Consider the step that starts from label l1. Assume that the variables x and y had the value false in the previous step, i.e., if no assignment sets them in the considered step, they will keep their values of the previous step. In order to be logically correct, a unique variable assignment has to be found which leads to a valid execution of the program. In principle, we can check all possible variable assignments.

It is easily seen that only the assignment (x = true, y = true) is consistent. Thus, this program (or at least the considered step) is logically correct. However, for a real execution of such a program, considering all possibilities is too inefficient. Therefore, the semantics of any synchronous language, and that of Quartz in particular, also requires a constructive execution of programs (so that the above example is not a constructive Quartz program). In Quartz, this means that an action can only be executed if all control-flow conditions contributing to its trigger can be evaluated before that action. The control-flow conditions to execute the assignments to x and y depend on the values of x and y themselves, which is not allowed. Instead, it must be possible to evaluate those control-flow conditions from already known values. Checking this property statically is known as causality analysis [7, 15, 23–28] in the context of synchronous programs.
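The exhaustive check of all variable assignments can be sketched in a few lines of Python. The one-step program used here is a hypothetical stand-in, chosen only so that (x = true, y = true) is its single consistent assignment; it is not claimed to be the program of the article, and the names PREVIOUS, ACTIONS, reaction, and consistent_assignments are assumptions of this sketch.

```python
from itertools import product

# Hypothetical one-step program (NOT taken from the article):
#   if (!x | y) x = true;
#   if (x)      y = true;
# Both variables are memorized and were false in the previous step.
PREVIOUS = {"x": False, "y": False}

# Each entry: (guard over a candidate environment, target variable, value).
ACTIONS = [
    (lambda env: (not env["x"]) or env["y"], "x", True),
    (lambda env: env["x"], "y", True),
]


def reaction(candidate):
    """Values the step produces if `candidate` is taken as the environment."""
    result = dict(PREVIOUS)  # default reaction: keep the previous value
    for guard, target, value in ACTIONS:
        if guard(candidate):
            result[target] = value
    return result


def consistent_assignments():
    """Enumerate all candidates and keep those reproduced by the step."""
    names = sorted(PREVIOUS)
    found = []
    for values in product([False, True], repeat=len(names)):
        candidate = dict(zip(names, values))
        if reaction(candidate) == candidate:
            found.append(candidate)
    return found


if __name__ == "__main__":
    print(consistent_assignments())
```

The enumeration grows exponentially with the number of variables, which is the inefficiency that motivates the constructive semantics. Note also that both guards read x and y, so this hypothetical program is logically correct but not constructive.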

2.3 Compilation and intermediate representation

Quartz is currently the language of the Averest framework (http://www.averest.org), which contains tools for simulation, compilation, verification, and synthesis for Quartz [4]. Thereby, its compiler translates the source files to the Averest Intermediate Format (AIF) [29]. AIF abstracts from the complexity of the source language: difficult interactions of preemption statements or reincarnations of local variables [7, 30, 31] are no longer an issue. Nevertheless, AIF files contain the entire behavior of the given synchronous program, and they are therefore the central part of the target-independent analyses of various back end tools.

The intermediate format describes the behavior with the help of synchronous guarded actions [31], which turned out to be well suited to eliminate the complex interaction of statements of the source language on the one hand, while preserving the synchronous semantics and allowing efficient analysis and generation of hardware and software code on the other hand. A guarded action is of the form:

$\gamma \Rightarrow A$ (1)

where γ is called the guard and A is an action, i.e., either an immediate or a delayed assignment. The intention is that the action is executed in an instant whenever its guard holds. Thus, the data-flow actions can be collected from the source code, and the compiler determines their corresponding guards.

Guarded actions do not only represent the data flow, i.e., assignments occurring in a program, but they are also used for the control flow. To this end, all program labels are encoded as Boolean events (together with an additional implicit start label st). The control flow can then be described by actions of the form 〈γ⇒next(ℓ)=true〉, where γ is a condition that is responsible for moving the control flow at the next point of time to location ℓ. For instance, the guarded actions for program P1 (see Figure 1a) are given in Figure 2.

The semantics of the intermediate format is as follows. In contrast to traditional guarded commands [32–34], guarded actions follow the synchronous model of computation. In each macro step, all actions refer to the same point of time, i.e., the evaluation of all expressions contained in the guarded actions refers to the same variable environment. If the guard γ of an immediate assignment γ⇒x=τ is true, the right-hand side τ is evaluated to determine the value of variable x in the current macro step, while a delayed assignment defers the update to the following step.

Similar to Quartz programs, the AIF description adds an implicit default reaction: if no action has determined the value of the variable in the current macro step, then a variable either gets a default value or stores its previous value, depending on the declaration of the variable (obviously, this is the case if the guards of all immediate assignments in the current step and the guards of all delayed assignments in the preceding step of a variable are evaluated to false). Thereby, event variables are reset to a default value while memorized variables store their value of the previous step.
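The following Python fragment is a minimal sketch of this semantics: guarded actions with immediate and delayed assignments plus the default reaction for event and memorized variables. It is not the AIF format or the Averest implementation. All names (GuardedAction, macro_step, run) are hypothetical, and the sketch assumes that the actions are already listed in a causally valid order, so that every variable is read only after all of its writers have been considered.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedAction:
    guard: Callable      # guard(read) -> bool
    target: str          # variable written by the action
    rhs: Callable        # rhs(read) -> value
    delayed: bool = False  # True models next(target) = rhs


def macro_step(actions, inputs, pending, previous, event_defaults):
    """Execute one macro step.

    inputs         -- input values of this step
    pending        -- values scheduled by delayed assignments one step earlier
    previous       -- values of the previous step (for memorized variables)
    event_defaults -- default values of event variables
    Returns (values of this step, delayed values for the next step).
    """
    current = dict(inputs)
    current.update(pending)
    scheduled = {}

    def read(name):
        if name in current:
            return current[name]
        if name in event_defaults:   # event variable: reset to its default
            return event_defaults[name]
        return previous[name]        # memorized variable: keep old value

    for action in actions:           # assumed causally ordered
        if action.guard(read):
            value = action.rhs(read)
            if action.delayed:
                scheduled[action.target] = value
            else:
                current[action.target] = value

    names = set(previous) | set(event_defaults) | set(current)
    return {name: read(name) for name in names}, scheduled


def run(actions, input_trace, initial, event_defaults):
    """Run several macro steps and return the list of variable environments."""
    previous, pending, trace = dict(initial), {}, []
    for inputs in input_trace:
        previous, pending = macro_step(
            actions, inputs, pending, previous, event_defaults)
        trace.append(previous)
    return trace
```

As a usage example, the two hypothetical actions i ⇒ next(count) = count + 1 and i ⇒ seen = true can be passed to run with count as a memorized variable and seen as an event variable: count changes one step after i holds, while seen falls back to its default whenever i is false.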

In addition to the description of the behavior by guarded actions and default reactions, AIF contains the declarations of variables and the input/output interface of the described synchronous system. The intermediate format also contains further information (e.g., about modularity or verification), which we skip since it is not needed in this article.

The algorithm which translates a given Quartz program to guarded actions is given in [29, 31], and we will only sketch its basic idea in this article. The whole procedure is split into two functions, which determine the surface and depth [7] of each statement:

● surface

The surface contains the guarded actions which are executed in the macro step in which the considered statement is started.

● depth

The depth contains the guarded actions which are executed in all following steps after the statement was started.

Thereby, the whole compilation slices the program into steps, i.e., the depth compilation makes use of the surface compilation, which traverses the abstract syntax tree (AST). The separation into surface and depth is essential for the correctness of the compilation algorithm.

3 Language extension

As already stated in Section 1, all threads of a Quartz program are based on the same timescale and therefore, they synchronize at each pause statement. If they do not communicate, then the synchronization is not necessary, but still enforced by the synchronous model of computation. This so-called over-synchronization is therefore an undesired side-effect of the synchronous model of computation. We present clock refinement as a solution to overcome this problem. This extension was recently proposed [35] to avoid the described effects and others. First, Section 3.1 describes the basic idea of the extension, then we illustrate the underlying time model in Section 3.2, and we finally discuss some design decisions and their consequences in Section 3.3.

3.1 Basic idea of refined clocks

The basic idea of the language extension is explained in the following with the help of two implementations of the Euclidean algorithm to compute the GCD. The first variant, which is given in Figure 3a, does not use clock refinement. The module reads its two inputs a and b in the first step and assigns them to the local variables x and y. Then, the module iteratively computes the GCD of the local variables. The computation steps are separated by the pause statement with label l1. Each variable has a unique value in a step, and the delayed assignments set a new value to the variables for the following step. Finally, the GCD is written to the output variable gcd. An obvious drawback of this implementation is that the computation is spread over a number of steps. The actual number depends on the input values, and each call to this module has to take care of the consumption of time. An example execution trace for the computation of the GCD of the numbers 7 and 3 is shown in Figure 4a. The computation takes six steps, and during this computation, the inputs a and b may change in principle. Thus, a calling module has to take care of the computation steps until the result is available.

The second variant, which is shown in Figure 3b, uses clock refinement. While the overall algorithm remains the same, the GCD computation is now hidden in the declaration of the local clock C1. The computation steps are separated by the pause statement with label l, which now belongs to the clock C1. In contrast to the first variant, the computation does not hit a pause statement of the outer clock and thus, the computation steps are not visible to the outside. As a consequence, each call to this module seems to be completed in a single step. The local variables x and y are now declared inside the local clock block and therefore, they can change their value for each step of the local clock, which is crucial for the correct execution of the algorithm in this example. An example execution trace for the computation of the GCD of the numbers 7 and 3 is shown in Figure 4b. The computation for the version with a refined clock also takes six steps, but these are steps of clock C1. The computation is finished in one step of the module’s clock. The variables a, b, and gcd, which are declared on the module’s clock, only have one value for this base step, while the variables x and y, which are declared on clock C1, change their value for each step of clock C1. Thus, the inputs remain constant during the computation and there is only one value of the output gcd.
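The difference between the two variants can be sketched in Python as follows. This is only an illustration of what a caller observes, not a translation of Figure 3. The update rule (repeated subtraction) and the function names are assumptions of the sketch, so the number of substeps it produces need not match the traces of Figure 4.

```python
def gcd_substeps(a, b):
    """Yield the (x, y) pairs of the iteration, one per computation step.

    Assumption: subtraction-based Euclidean algorithm for a > 0 and b > 0.
    """
    x, y = a, b
    yield x, y
    while x != y:
        if x > y:
            x = x - y
        else:
            y = y - x
        yield x, y


def gcd_single_clock(a, b):
    """Variant without refinement: each computation step is a macro step.

    Returns (result, number of macro steps the caller has to wait).
    """
    trace = list(gcd_substeps(a, b))
    return trace[-1][0], len(trace)


def gcd_refined_clock(a, b):
    """Variant with a refined clock: the substeps stay internal.

    Returns (result, base-clock steps seen by the caller, hidden substeps).
    """
    trace = list(gcd_substeps(a, b))
    return trace[-1][0], 1, len(trace)


if __name__ == "__main__":
    print(gcd_single_clock(12, 18))
    print(gcd_refined_clock(12, 18))
```

In the first function the caller has to account for a data-dependent number of steps, whereas in the second one the data-dependent count is confined to the local clock and the caller always observes exactly one step.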

The trace shows even more: in the synchronous model, each variable has exactly one value for each step, and this value is valid from the beginning to the end of the step. The inputs are given from the outside and thus, they are known for the whole computation. The output gcd is computed after some substeps, but in the general view, it is valid during the whole step. An additional note should be given on the term clock, because it is often used for different concepts. In this case, the clock concerns the description of the computation and the control flow of the language. It is not meant to trigger computations from the outer environment by, e.g., a periodic signal. This distinction is considered again in Section 6.

Obviously, it is not only possible to arbitrarily nest clock declarations, but also to introduce new clocks in separate scopes. This gives rise to the clock tree of a program, which can directly be obtained from the program structure. Figure 5 gives an example: the left-hand side shows the structure of nested clock declarations in source code, and the right-hand side shows the according clock tree, which can be derived from it.
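How a clock tree follows from the nesting of declarations can be sketched in Python. The program representation below (tuples tagged "clock" and "par") and the example program are hypothetical and do not reproduce Figure 5; the implicit module clock is called C0 as in the rest of this article.

```python
def clock_tree(block, parent="C0", tree=None):
    """Map every clock to the list of clocks declared directly below it.

    block -- list of statements; ("clock", name, body) declares a refined
             clock, ("par", branch1, branch2, ...) is a parallel statement,
             and every other statement is ignored.
    """
    if tree is None:
        tree = {parent: []}
    for stmt in block:
        kind = stmt[0]
        if kind == "clock":
            _, name, body = stmt
            tree[parent].append(name)
            tree[name] = []
            clock_tree(body, name, tree)
        elif kind == "par":
            # threads do not open a clock level of their own
            for branch in stmt[1:]:
                clock_tree(branch, parent, tree)
    return tree


# Hypothetical program structure: C1 and C5 refine the module clock,
# C2 is nested in C1, C3 and C4 are declared in parallel threads inside C1.
PROGRAM = [
    ("clock", "C1", [
        ("clock", "C2", []),
        ("par", [("clock", "C3", [])], [("clock", "C4", [])]),
    ]),
    ("clock", "C5", []),
]

if __name__ == "__main__":
    print(clock_tree(PROGRAM))
```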

3.2 Different views at the time model

The synchronous model abstracts time to reaction instants. The imperative synchronous language Quartz implements this model by steps, each of which ranges from one pause statement to the next pause statement in the source code. Thus, in this single-clock model, instants coincide with steps.

This section discusses two different interpretations of refined clocks and pause statements related to a particular clock. The first interpretation keeps the view that a step of the module coincides with an instant, whereas in the second interpretation new instants are introduced by the refined clocks, but these instants are not visible to the outside. We call the first one the step view and the second one the instant view. Both of them are discussed in more detail in the following. In single-clock Quartz, the following two interpretations of the pause statement are possible:

1. A step ranges from one pause statement to the next pause statement, and everything in between defines the behavior of the execution instant. Thus, the pause statement separates two steps.
2. The program execution waits at a pause statement for a clock tick. When the tick occurs, the program is executed until the next pause statement is reached, where the execution stops and waits for the next clock tick. The pause can thus be seen as a special kind of await statement that waits for clocks.

The distinction between the above two views might appear artificial and irrelevant, so one might say that both views are the same. This is mostly true for the single clock case, where both views coincide, but when refined clocks come into play, both views become different:

1. A step of a clock ranges from one pause statement of this clock to the next one. In between, pause statements of a lower clock can occur, which hierarchically divide the step into substeps.
2. The execution waits at a pause statement for the occurrence of the clock the pause belongs to. Then the execution proceeds to the next pause statement and waits again. This view introduces new instants to the execution, but there is no forced synchronization in each step, because different threads may wait for different clocks.

The difference of both interpretations for refined clocks is illustrated by a code example in Figure 6. Note that the refined clock C1 is locally declared in the first thread. Assume that the control-flow is currently at labels l1 and l4. Then, we compare both interpretations:

1. According to the first interpretation, everything between two pause statements of the same clock belongs to a step of this clock. We are interested in steps of the module’s base clock C0, which is not explicitly declared. In the first thread, this step ends at the pause statement with label l3. In the second thread, this step ends at label l5. Both threads execute one step synchronously, and thus, all parts of $A_{1}$, $A_{2}$, and $A_{3}$ referring to the module clock C0 are executed together, regardless of the separation of $A_{1}$ and $A_{2}$ due to C1.

2. According to the second interpretation, the program waits for a clock tick of the clocks given as the argument of the pause statements, and the execution proceeds to the next pause statements, waiting there for the next tick. When a tick of the module clock occurs, $A_{1}$ and $A_{3}$ are executed synchronously, and the labels l2 and l5 are reached, where the execution waits for the next tick, which can only be C1. The execution proceeds with $A_{2}$, and the first thread finally reaches label l3 while the second thread simply waits at label l5 for the next tick of C0.

In the first interpretation, $A_{2}$ and $A_{3}$ are executed synchronously and thus, $A_{3}$ can depend on $A_{2}$. In the second interpretation, both blocks are explicitly ordered, and $A_{3}$ is executed before $A_{2}$ so that $A_{3}$ cannot depend on $A_{2}$. Thus, the second interpretation is a more operational style of description, whereas the first one can be seen as a more declarative way.
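The second interpretation can be sketched as a small Python simulation. The thread layout is reconstructed from the description above (labels l1 to l5, pauses on C0 except l2 on C1) and is an assumption about Figure 6; the function name tick and the tuple encoding are hypothetical.

```python
# Each thread is a sequence of pauses ("pause", label, clock) and
# action blocks ("act", name), as assumed for the example of Figure 6.
THREAD_1 = [("pause", "l1", "C0"), ("act", "A1"), ("pause", "l2", "C1"),
            ("act", "A2"), ("pause", "l3", "C0")]
THREAD_2 = [("pause", "l4", "C0"), ("act", "A3"), ("pause", "l5", "C0")]


def tick(threads, positions, clock):
    """Advance every thread that waits at a pause of `clock`.

    positions[i] is the index of the pause where thread i currently waits.
    Returns (new positions, action blocks executed in this instant).
    """
    executed, new_positions = [], []
    for thread, pos in zip(threads, positions):
        if pos >= len(thread) or thread[pos][2] != clock:
            new_positions.append(pos)  # terminated or waiting for another clock
            continue
        pos += 1
        while pos < len(thread) and thread[pos][0] == "act":
            executed.append(thread[pos][1])  # run until the next pause
            pos += 1
        new_positions.append(pos)
    return new_positions, executed


if __name__ == "__main__":
    threads = [THREAD_1, THREAD_2]
    positions = [0, 0]                       # control flow at l1 and l4
    positions, done = tick(threads, positions, "C0")
    print(done, positions)                   # blocks run on the C0 tick
    positions, done = tick(threads, positions, "C1")
    print(done, positions)                   # blocks run on the C1 tick
```

On the tick of C0 the sketch runs $A_{1}$ and $A_{3}$ in the same instant, and only the subsequent tick of C1 runs $A_{2}$, which is the explicit ordering of $A_{3}$ before $A_{2}$ described for the second interpretation.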

The intention of this extension is to provide an (operational) executable model, which leads to the conclusion that the second view should be taken. This decision can be justified with the constructive semantics of Quartz and Esterel: not every logically correct program is considered a good one for execution. Even if some actions are executed in the same instant, they can still depend on each other through a causal order. The discussion in the following section will also confirm this choice.

3.3 Refined clocks in Quartz programs

This section discusses several design decisions for our extension and their effects, in particular on the data flow of Quartz programs. As we will see, the most general variant of the extension has to deal with many problems, which makes it too inefficient for practical examples (and probably too complex for developers). The result of our discussion is a set of restrictions that limit the set of valid programs to a reasonable subset, where the additional complexity is manageable. This approach is similar to the constructive semantics of Quartz, which does not allow all logically correct programs.

In the following, we use the notation $A_{i}$ to identify some arbitrary actions (assignments) in the source code. Direct dependencies are denoted as $A_{1} \to_{x}^{C} A_{2}$, which means that there is an action in $A_{1}$ that writes the variable x of clock C that is read by an action in $A_{2}$. Thus, $A_{2}$ cannot be executed before $A_{1}$.

3.3.1 Backward data flow

From the semantical point of view, substeps can be seen as micro steps of the higher clock level. In principle, they are executed simultaneously in a single step based on the higher clock. However, if we take a finer grained view, we notice that the substeps are actually executed sequentially. Without any additional constraints, this has the consequence that information can flow backwards in the program across substeps since a variable on the higher level does not change throughout the whole (super) step. Consider the following fragment of code as an example:

This code fragment basically contains one step of clock C0, which starts at l1 and ends at l3. This step is divided into two substeps of clock C1, where the first substep executes the actions $A_{1}$ and the second one the actions $A_{2}$. The following cases of dependencies can occur for the above example:

● $A_{1} \to_{x}^{C0} A_{2}$

The variable x of clock C0 is written by an action in $A_{1}$ and read by an action in $A_{2}$. Since x refers to clock C0, it has exactly one value for the whole step from l1 to l3. This dependency seems to be no problem, because both steps of clock C1 can be executed in the right order.

● $A_{2} \to_{x}^{C0} A_{1}$

The variable x of clock C0 is written by an action in $A_{2}$ and read by an action in $A_{1}$. Since x refers to clock C0, it has exactly one value for the whole step from l1 to l3, and the value of x is needed for the execution of $A_{1}$. However, an implicit execution order of both steps is given by their ordering in the source code. Thus, the information flows backwards due to the substeps.

It seems to be possible to resolve the dependency in the second case of the above example by a simple analysis. However, examples can be much more complex, since control flow introduces the question of whether an action is eventually reached or not.

Assume that we have the dependencies $A_{3} \to_{x}^{C0} A_{1}$, $A_{1} \to_{y}^{C0} \gamma$, and $A_{2} \to_{z}^{C1} A_{3}$ for the actions of the example above. The condition γ of the loop depends on a variable y of clock C0, but the computation of the variable in $A_{1}$ depends on another variable x, which is computed at the end of the step. In addition, the loop must terminate to reach the end of the step of clock C0. The variable z, which belongs to clock C1, is changed in the loop and finally it is used to compute x. Thus, the whole loop has to be iterated to its end to check whether the loop has to be entered or not.

The example illustrates two points: First, introducing refined clocks without additional constraints would require an expensive (reachability) analysis, and second, it seems to be unnatural to the developer since pause statements of clock C1 impose a sequential order of substeps. In consequence, backward data flow is forbidden in our approach, and a sequential execution of the substeps must be able to compute all values.
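For straight-line substeps, the restriction can be illustrated by a simple check in Python. This is a simplified sketch under stated assumptions: it ignores control flow (and hence the reachability question raised above), it only inspects variables declared on the clock whose step is being divided, and the names Dependency and backward_flows are hypothetical.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    writer: str    # action block that writes the variable
    reader: str    # action block that reads the variable
    variable: str
    clock: str     # clock on which the variable is declared


def backward_flows(substep_order, dependencies, step_clock):
    """Return the dependencies whose information would flow backwards.

    substep_order -- action blocks in the order their substeps are executed
    step_clock    -- clock whose step is divided into the substeps
    A dependency is backward if its variable belongs to `step_clock` (one
    value for the whole step) and the writer runs in a later substep than
    the reader.
    """
    index = {block: i for i, block in enumerate(substep_order)}
    return [dep for dep in dependencies
            if dep.clock == step_clock
            and index[dep.writer] > index[dep.reader]]


if __name__ == "__main__":
    deps = [Dependency("A1", "A2", "x", "C0"),   # forward: allowed
            Dependency("A2", "A1", "x", "C0")]   # backward: forbidden
    print(backward_flows(["A1", "A2"], deps, "C0"))
```

A program passes this check exactly when the returned list is empty, which corresponds to the requirement that a sequential execution of the substeps can compute all values.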

3.3.2 Scheduling parallel threads

Several refined clocks can also be declared in parallel threads so they are unrelated to each other (substeps of one thread are not visible to the other thread). Thus, there is no stepwise synchronization for these clocks. Instead, synchronization is only given by steps of a higher clock which is declared outside the parallel statement and visible to both threads. In the rest of this section, we first look at a simpler situation, where we have two threads and a refined clock in addition to the module clock.

In the first reaction, both threads are entered and the control flow stops at labels l1 and l4. The first thread starts the next step of clock C0 at l1, executes the actions $A_{1}$ and $A_{2}$ in two substeps, and ends at l3. In the second thread, this step of clock C0 starts at label l4, includes the actions $A_{3}$, and ends at label l5. Due to the synchronous model, the steps of both threads are executed synchronously. However, the step of clock C0 of the first thread is divided into two steps of the lower clock C1 (the first one executes the actions $A_{1}$, and the second one executes the actions $A_{2}$). Assume that we have the following dependencies between actions:

The sequential dependency between $A_{1}$ and $A_{2}$ is given by the source code. However, both can use the same variables of clock C1, which generally have different values in different substeps. $A_{3}$ writes a variable that is read by $A_{1}$ , and $A_{2}$ writes a variable that is used by $A_{3}$ . This is not necessarily a cycle because the variables imposing the dependencies can occur in different actions in $A_{3}$ . The model itself just means that all actions $A_{3}$ are executed until label l5 is reached. Thus, splitting $A_{3}$ into parts seems to be possible where the actions with dependencies to $A_{1}$ are executed together with $A_{1}$ and the other actions are executed with $A_{2}$ .

Consider a second example, which looks very similar at first glance:

The code is mostly the same as the previous one—only the clock C1 is declared outside the parallel statement so that it is visible in both threads (as well as variables declared on this clock). Thus, dependencies between $A_{3}$ and $A_{1}$ are now also possible on clock C1. Assume the following dependencies:

The sequential dependency between $A_{1}$ and $A_{2}$ is still present. First, consider case (a), where a dependency by variable x of clock C1 exists from $A_{3}$ to $A_{1}$. The second dependency is imposed by variable y of clock C0 from $A_{2}$ to $A_{3}$. If the variable x can be computed without the knowledge of variable y, it is still possible to split $A_{3}$ and execute one part with $A_{1}$ and the other one with $A_{2}$. However, if y is needed to determine the value of x, a cycle is present and the execution is not possible. Now, consider case (b), where the dependency goes from $A_{1}$ to $A_{3}$. Again, if x and y are not needed in the same actions, a split is possible. However, if both occur in the same guarded action, the situation becomes more complicated. This action needs to be executed when the value of y is known, i.e., when $A_{2}$ is executed. However, the value of x determined by $A_{1}$ has to be used, which is the value from the last substep. Thus, the value has to be stored so that the action in $A_{3}$ can be executed later.

Finally, consider the following example:

The step in the first thread is now divided into two substeps of clock C1 and the step in the second thread into three substeps. Due to the parallel threads, the substeps are also executed in parallel. Thus, a synchronization takes place at labels l2 and l5, and the actions $A_{1}$ and $A_{3}$ are executed together. After this first substep, the situation is very similar to the previous example: the first thread has one step to reach label l3 and the second one has to execute two substeps. In the previous example, we talked about splitting the actions of the first thread. This, however, seems very confusing, especially for the developer who has to decide when actions can be moved to different substeps and when not. Therefore, we only allow the actions within an instant to be executed synchronously.

4 Formal semantics

We formally define the semantics of our language extension in the style of Plotkin’s Structural Operational Semantics (SOS) [36, 37]. This formalism has already successfully been used in the context of synchronous languages [7, 38, 39], and the formal semantics of single-clocked Quartz [4] already exists in this format. As the name suggests, SOS rules are defined over the structure of a given program, i.e., the AST.

In sequential programming languages, a program is executed step-by-step as given in the source code. However, due to the synchronous abstraction of time, the execution of synchronous programs must follow data dependencies, which is not necessarily the order given in the source code. Hence, we cannot use SOS rules directly, but our semantics uses two sets of SOS rules: transition rules and reaction rules.

The execution of the program is based on an environment $E$ , which is an assignment of values to each variable of the program. The transition rules specify an interpreter: they take an environment and a given program and execute its first step, i.e., they transform the program according to the environment. The computation of the actual environment (which also comprises a dynamic causality analysis) is accomplished by the second set of rules, the reaction rules. In the following, we focus on the first part, the transition rules. For the reaction rules, we refer to [40, 41].

4.1 Basic definitions

This section introduces some basic notations and formalizations. First, we define the basis of our temporal model, namely clocks, and their refinements. As all refinements always refine existing clocks, they can be organized in a tree-like relation, which is defined as follows.

Definition 1. (Clocks) We write c₁ ≻ c₂ if the clock c₂ is declared in the scope of c₁, i.e., c₁ is on a higher level (slower) than c₂. The relations ≽, ≺, ≼ are used accordingly. If two clocks c₁ and c₂ are independent, i.e., neither c₁ ≽ c₂ nor c₁ ≼ c₂ holds, we write c₁ # c₂.
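The relations of Definition 1 can be illustrated with a small sketch. It assumes that the clock declarations are stored as a parent map; the map `PARENT`, its concrete shape, and all function names are hypothetical and are not taken from the article or from Figure 5.

```python
# Hypothetical clock tree: each clock maps to the clock in whose scope it is
# declared (None for the module clock). The shape is an assumption made for
# illustration only.
PARENT = {"C0": None, "C1": "C0", "C2": "C1", "C3": "C1", "C4": "C0"}

def ancestors(clock):
    """Return all clocks that are strictly higher than `clock`."""
    result = []
    current = PARENT[clock]
    while current is not None:
        result.append(current)
        current = PARENT[current]
    return result

def higher(c1, c2):
    """c1 ≻ c2: c2 is declared (directly or transitively) in the scope of c1."""
    return c1 in ancestors(c2)

def higher_eq(c1, c2):
    """c1 ≽ c2: c1 is higher than c2 or both are the same clock."""
    return c1 == c2 or higher(c1, c2)

def independent(c1, c2):
    """c1 # c2: neither c1 ≽ c2 nor c1 ≼ c2 holds."""
    return not higher_eq(c1, c2) and not higher_eq(c2, c1)
```

In this assumed tree, `higher("C0", "C2")` holds, while `C2` and `C3` are independent because they are siblings below `C1`.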

Two clocks are independent, i.e., c₁ # c₂, if they are either declared in parallel threads, or they are declared in two distinct parts of a sequence or an if statement. For example, the clock relations C5 ≺ C1 and C2 # C3 hold for the program in Figure 5. In addition to the clocks, each program uses a finite set of variables and each variable is declared inside the scope of a clock. The following definition takes care of the variables and their clocks:

Definition 2. (Variables) $V$ is the set of variables of a synchronous program. Each variable $x \in V$ stores a value of its domain dom(x), and it is declared in the scope of a clock, which is given by clock(x). Additionally, we denote with $V^{IN}$, $V^{OUT}$, $V^{LOC}$ the sets of all input, output, and local variables, respectively. For a variable x, default(x) denotes its default value.

For example, the default value of a Boolean variable is false and that of an integer variable is 0. As an example, the clock of the variable x in the program GCD2 in Figure 3 is C1 (clock(x) = C1). For assigning values to variables, we use the following actions:

Definition 3. (Action) The actions in a synchronous program are assignments of one of the following forms.

\begin{array}{ll} x = τ & \text{(immediate assignment)} \\ next (x) = τ & \text{(delayed assignment)} \end{array}

An immediate assignment assigns the value of the expression τ directly to the variable x. A delayed assignment evaluates the value of τ directly but assigns it in the next step of clock(x).

Note that a delayed assignment takes care of the clock of the variable that is assigned. In the semantics definition of synchronous programs the values of variables are determined iteratively for each step. Therefore, a notion of not yet known is needed for variables. This is covered by the following definition.

Definition 4. (Environment) An environment $E$ maps each variable $x \in V$ to a value of dom(x) ∪ {⊥}. Hence, the extended domain of a variable x additionally contains the value ⊥, which is interpreted as not known. We write $E (x)$ to retrieve the value of x in environment $E$ , and similarly $⟦ τ ⟧_{E}$ to evaluate the expression τ with respect to the values of the variables in environment $E$ . The environment which is undefined for each variable is denoted with $E^{⊥}$ .

In addition, we define operations on environments.

Definition 5. (Environment Combination) For two environments $E_{1}$ and $E_{2}$, we define the intersection and union as follows:

\begin{align} (E_{1} ⊓ E_{2}) (x) & := \begin{cases} v & \text{if } v = E_{1} (x) = E_{2} (x) \\ ⊥ & \text{otherwise} \end{cases} \\ (E_{1} \dot{⊔} E_{2}) (x) & := \begin{cases} E_{1} (x) & \text{if } E_{2} (x) = ⊥ \\ E_{2} (x) & \text{if } E_{1} (x) = ⊥ \\ v & \text{if } v = E_{1} (x) = E_{2} (x) \end{cases} \end{align}

The union is only allowed if there are no conflicting values for the same variable in both environments.
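As a minimal sketch of Definitions 4 and 5, environments can be modeled as Python dictionaries over the same variable names, with `None` standing for ⊥. The names `BOTTOM`, `meet`, and `join` are assumptions chosen for this illustration.

```python
BOTTOM = None  # stands for ⊥ ("not known"); an encoding chosen for this sketch

def meet(e1, e2):
    """E1 ⊓ E2: keep a value only where both environments agree."""
    return {x: (e1[x] if e1[x] == e2[x] else BOTTOM) for x in e1}

def join(e1, e2):
    """E1 ⊔ E2: merge both environments; conflicting values are rejected."""
    result = {}
    for x in e1:
        v1, v2 = e1[x], e2[x]
        if v2 is BOTTOM:
            result[x] = v1
        elif v1 is BOTTOM or v1 == v2:
            result[x] = v2
        else:
            # The union is only defined without conflicting values.
            raise ValueError(f"conflicting values for {x}: {v1!r} vs {v2!r}")
    return result
```

The sketch raises an exception for conflicting values, which mirrors the side condition that the union is only allowed for compatible environments.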

Definition 6. (Environment Restriction) A restriction of an environment $E$ with respect to ⊙c (where $⊙ \in \{≻, ≽, ≺, ≼, ⊁, ⪲, ⊀, ⪱\}$) is defined as follows:

{(E)}_{/_{⊙}} c (x) := \begin{cases} E (x) & \text{if } clock (x) ⊙ c \\ ⊥ & \text{otherwise} \end{cases}

Thus, ${(E)}_{/_{⪱}} c$ describes the environment where all variables with a clock lower or equal to c are set to ⊥, while the values of all other variables in $E$ are kept.
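The restriction can be sketched in the same dictionary model, with `None` for ⊥. The clock tree `PARENT`, the variable-to-clock map `CLOCK_OF`, and the function names are hypothetical; the relation is passed in as a predicate so that any of the listed relations can be used.

```python
# Hypothetical clock tree and variable declarations for illustration.
PARENT = {"C0": None, "C1": "C0", "C2": "C1"}
CLOCK_OF = {"a": "C0", "b": "C1", "c": "C2"}

def lower_eq(c1, c2):
    """c1 ≼ c2: c1 equals c2 or is declared somewhere inside the scope of c2."""
    while c1 is not None:
        if c1 == c2:
            return True
        c1 = PARENT[c1]
    return False

def not_lower_eq(c1, c2):
    """Negation of ≼, i.e. the clock is neither lower than nor equal to c2."""
    return not lower_eq(c1, c2)

def restrict(env, relation, c):
    """Keep E(x) where clock(x) is related to c; set every other variable to ⊥."""
    return {x: (v if relation(CLOCK_OF[x], c) else None) for x, v in env.items()}
```

For `env = {"a": 1, "b": 2, "c": 3}`, restricting with `not_lower_eq` and clock `C1` keeps only `a`, since `b` and `c` are declared on `C1` or below.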

Definition 7. (Partial Order of Environments) An environment $E_{1}$ is smaller than environment $E_{2}$ (greater resp.), if the following holds:

E_{1} ⊑ E_{2} : \Leftrightarrow \forall x \in V. E_{1} (x) \neq ⊥ \to E_{1} (x) = E_{2} (x)

Thus, at least the variables which are defined by $E_{1}$ are defined by $E_{2}$ with the same values.
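A short sketch of this order, again assuming dictionaries over the same variables with `None` for ⊥ (the function name `env_leq` is hypothetical):

```python
def env_leq(e1, e2):
    """E1 ⊑ E2: every variable that is known in E1 has the same value in E2."""
    return all(v is None or e2[x] == v for x, v in e1.items())
```

The environment that is undefined everywhere is smaller than every other environment, which matches its role as the starting point of the iterative computation of values.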

4.2 Transition rules

The transition rules define the execution of a single step on the source code based on an existing environment for this step. Previous sections discussed the view of the model and emphasized the characteristics of the pause(C) statement as a wait for clock C. In the transition rules, this view is pointed out by renaming pause(C) to await clock(C). Analogous to the original await statement, the transition rules also use the statement immediate await clock(C) to define the behavior. Transition rules have the form

\begin{array}{lcr} 〈 E, C_{S}, S 〉 \overset{c}{↠} 〈 S^{'}, A, C 〉 \end{array}

and describe how the statement $S$ is transformed to the residual statement $S^{'}$ when an instant of clock c is executed with the environment $E$ . Thereby, the set $A$ contains the assignments which are executed during this step and the set $C$ contains the clocks for which a corresponding pause statement is reached during the execution. Thus, $C$ collects the clocks which can be used for the next step. The statement clock $C_{S}$ is the lowest clock the statement is defined in.

In the following, we only give the rules for the new statements of our extension. All the other rules are similar to the original definition for single-clocked Quartz, and they can be found in Appendix 1: Transition rules. Additional details about their definition can be found in [35].

Now consider a simple example for the transition rules. Thereby, depending on the input i, the following statement $S$ can either be derived to $S_{1}^{'}$ or $S_{2}^{'}$ :

For an instant where input i holds, i.e., $E_{1} (i) = true$ , the if-branch is entered and the statement is reduced by the transition rules to:

\begin{array}{lcr} 〈 E_{1}, C0, S 〉 \overset{C}{↠} 〈 S_{1}^{'}, \{x = true, y = true\}, \{C0\} 〉 \end{array}

For an instant where $E_{2} (i) = false$ holds, the if-branch is not entered and the statement is transformed by the transition rules to:

\begin{array}{lcr} 〈 E_{2}, C0, S 〉 \overset{C}{↠} 〈 S_{2}^{'}, \{x = true\}, \{C0\} 〉 \end{array}

Note that the if statement is completely removed after it is reached. The condition is only checked when the statement is reached and in this instant it is substituted with the one or the other branch depending on the evaluation of the condition.

The transition rules which are used to define the semantics of single-clocked Quartz in [4] use a Boolean flag instead of the set $C$ . In the single clock case it is sufficient to indicate whether a pause statement is reached and whether the macro step terminated. For refined clocks, we collect the clocks of all pause statements which are reached to be able to determine a clock for the next step. However, the same information is still available by checking the emptiness of $C$ as it can be found in the rules.

As an example, we explain the rules for the new pause(C) statement, which is called await clock(C) in the transition rules. When this statement is reached, it is changed to immediate await clock(C). More importantly, the clock C is added to the set $C$, which indicates that a new step of clock C can be done. The rules for immediate await clock(C) only proceed with the execution when a step on the associated clock is performed.

The rules for the clock declaration (c₁ and c₂) are straightforward. Both rules update the statement clock to the current declaration. The rules differ in whether the local block is executed in an instant or not. If this is the case, the whole block is removed, otherwise it remains with the residual statement.

4.3 Program execution

Based on the transition rules, the execution of a program can be defined as a sequence of tuples.

\begin{array}{l} (E_{0}^{PRV}, E_{0}^{CUR}, E_{0}^{NXT}, E_{0}^{ASS}, S_{0}, c_{0}), \\ (E_{1}^{PRV}, E_{1}^{CUR}, E_{1}^{NXT}, E_{1}^{ASS}, S_{1}, c_{1}), \\ \dots, \\ (E_{n}^{PRV}, E_{n}^{CUR}, E_{n}^{NXT}, E_{n}^{ASS}, S_{n}, c_{n}) \end{array}

Thereby, each tuple coincides with an instant in which c_i holds. The environments $E_{i}^{CUR}$ store the current values of the variables, $E_{i}^{PRV}$ hold the values of the variables in the previous step (w.r.t. the clock c_i), and $E_{i}^{NXT}$ hold the values of delayed assignments which have to be committed in the next step (w.r.t. the clock c_i). In addition, the environments $E_{i}^{ASS}$ hold the values of the current step which have already been assigned by an immediate assignment. The module is initially started with the module clock, and thus, c₀ = C0 and $S_{0}$ is the whole program. All other instants (0 ≤ i < n) are defined by the transition rules:

\begin{array}{lcr} 〈 E_{i}, C0, S_{i} 〉 \overset{c_{i}}{↠} 〈 S_{i + 1}, A, C 〉 \end{array}

Thereby, the clock for the following instant is defined by the pause statements which are reached:

\begin{array}{lcr} c_{i + 1} \in C, ∄ c \in C. c ≻ c_{i + 1} \end{array}

With the clock c_i, a new step of this clock is started. Thus, the environment $E_{i}^{PRV}$ which holds the values of each variable from its last step has to be updated for the variables with clock c_i or lower clocks. The values of all other variables are retained:

\begin{array}{lcr} E_{i}^{PRV} = {(E_{i - 1}^{PRV})}_{/_{c_{i} ⪱}} \dot{⊔} {(E_{i - 1}^{CUR})}_{/_{c_{i} ≼}} \end{array}

The same holds for the environment $E_{i}^{CUR}$ . However, here it is only required that the values of the variables with a clock not lower or equal to c_i are kept and all variables are assigned a value:

\begin{array}{lcr} E_{i}^{CUR} ⊒ {(E_{i - 1}^{CUR})}_{/_{c_{i} ⪱}} \\ ∄ x \in V. E_{i}^{CUR} (x) = ⊥ \end{array}

The two environments $E_{i}^{NOWASS}$ and $E_{i}^{NXTASS}$ are used to capture the executed actions as environments. Thus, if there is an action in $A$ which sets the variable x to τ, the value of τ evaluated in $E_{i}^{CUR}$ is assigned to x by $E_{i}^{NOWASS}$. Accordingly, $E_{i}^{NXTASS}$ holds the values of the delayed assignments:

\begin{align} E_{i}^{NOWASS} (x) & := \begin{cases} ⟦ τ ⟧_{E_{i}^{CUR}} & \text{if } x = τ \in A \\ ⊥ & \text{otherwise} \end{cases} \\ E_{i}^{NXTASS} (x) & := \begin{cases} ⟦ τ ⟧_{E_{i}^{CUR}} & \text{if } next (x) = τ \in A \\ ⊥ & \text{otherwise} \end{cases} \end{align}

The environments $E_{i}^{NXT}$ and $E_{i}^{ASS}$ are updated according to the executed assignments. A delayed assignment from the last step is also transferred to an immediate one of the new step:

\begin{align} E_{i}^{NXT} & = {(E_{i - 1}^{NXT})}_{/_{c_{i} ⪱}} \dot{⊔} E_{i}^{NXTASS} \\ E_{i}^{ASS} & = {(E_{i - 1}^{ASS})}_{/_{c_{i} ⪱}} \dot{⊔} {(E_{i - 1}^{NXT})}_{/_{c_{i} ≼}} \dot{⊔} E_{i}^{NOWASS} \end{align}

With the clock of the next instant $c_{i + 1}$, a new step of this clock is started. It is also necessary to ensure that the variables have the correct value for the step:

\begin{align} & \forall x \in (V^{LOC} \cup V^{OUT}) . clock (x) ≼ c_{i + 1} \to \\ & E_{i}^{CUR} (x) := \begin{cases} E_{i}^{ASS} (x) & \text{if } E_{i}^{ASS} (x) \neq ⊥ \\ E_{i}^{PRV} (x) & \text{if } x \text{ is a memorized variable} \\ default (x) & \text{if } x \text{ is an event variable} \end{cases} \end{align}

Since 0 ≤ i < n, we need to initially define $E_{- 1}^{PRV} = E_{- 1}^{NXT} = E_{- 1}^{ASS} = E^{⊥}$ and $\forall x \in V. E_{- 1}^{CUR} (x) : = default (x)$ .
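The case distinction that fixes the current value of a local or output variable can be sketched as follows. Environments are modeled as dictionaries with `None` for ⊥; the function name and the parameters `memorized` and `default` are assumptions made for this illustration.

```python
def current_value(x, ass, prv, memorized, default):
    """Value of a local or output variable x whose step ends with the instant.

    ass, prv  -- dictionaries playing the roles of E^ASS and E^PRV (None is ⊥)
    memorized -- set of memorized variables; all others are event variables
    default   -- dictionary with the default value of each variable
    """
    if ass.get(x) is not None:
        return ass[x]          # an assignment determined the value
    if x in memorized:
        return prv[x]          # memorized variables keep their previous value
    return default[x]          # event variables fall back to their default
```

For example, a memorized variable without an assignment keeps its previous value, whereas an event variable without an assignment is reset to its default.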

5 Compilation

This section explains the compilation of Quartz with refined clocks. Similar to traditional Quartz, we also use guarded actions as an intermediate format. However, the intermediate format has to be extended appropriately so that it can represent systems with refined clocks. The translation to the intermediate format is presented in Section 5.1. Based on this, we discuss two possible targets: hardware synthesis in Section 5.2 and software synthesis in Section 5.3.

5.1 Translation to the intermediate format

Similar to the extension of Quartz with refined clocks, we also have to extend the intermediate format. As already shown in Section 2.3, the intermediate format represents the behavior of a system by guarded actions defined over a set of explicitly declared variables. Obviously, clocks and their relations are additionally needed to describe the data flow in the context of refined clocks, and the extended intermediate format contains all this technical information.

In the single-clock case, each guarded action is bound to at least one label which defines the control-flow location in the source code from which the action is executed. For refined clocks, the actions must now only be executed when (1) the label holds and (2) the corresponding clock ticks. Therefore, the label in the guard is strengthened with the corresponding clock, which requires introducing variables for the clocks.

Before we go into details about the compilation, we first recall the remarks from Section 3.3. There, we saw that a pause statement is only left when its clock is present (it will block until the given clock ticks). Consider the following guarded action (see also Figure 2):

\begin{array}{lcr} l2 \land i1 > 4 \Rightarrow o1 = i1 \end{array}

This action originates from a single-clock example, but it can also be extended by the module clock C0 (which is the clock of all labels in the single-clock case):

\begin{array}{lcr} (l2 \land C0) \land i1 > 4 \Rightarrow o1 = i1 \end{array}

In the single-clock case, both guarded actions do not make any difference since C0 holds in every instant. However, we will extend the idea to refined clocks, where clocks do not generally tick in every instant, and store for each variable its clock. In particular, the Boolean control flow labels are also bound to a clock. Finally, without going into technical details, since the intermediate format stores everything which is needed for further processing, also the dependencies between clocks, i.e., the clock tree of the system, are stored. A concrete example of a clock tree was given in Figure 5, which would in this case be contained in the intermediate format.
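The effect of strengthening a label with its clock can be sketched as follows. The representation of a guarded action as a tuple and the function `fire` are assumptions for illustration, not the intermediate format of the compiler.

```python
# A guarded action as (label, clock, condition, target, expression); this
# tuple layout is a hypothetical encoding chosen for the sketch.
ACTION = ("l2", "C0", lambda s: s["i1"] > 4, "o1", lambda s: s["i1"])

def fire(action, state):
    """Return the assignment made in this instant, or None if the guard fails."""
    label, clock, condition, target, expression = action
    # The guard is (label ∧ clock) ∧ condition: the label alone is not enough.
    if state[label] and state[clock] and condition(state):
        return (target, expression(state))
    return None
```

With `C0` true in every instant, the action behaves exactly like its single-clock version; for a refined clock, the action stays silent until that clock ticks, even if the label is active.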

The compilation algorithm for refined clocks to the extended intermediate format can be found in Appendix 2: Compilation algorithm. It is based on the original compilation algorithm [4]. In particular, we also use the notion of surface and depth. Thereby, the surface of a statement comprises the actions which can be executed in the first instant, and the depth comprises the actions which can be executed in the following instants.

The algorithm basically works in the same way as the original one: it traverses the AST of the program and determines with CompileSurface the actions which are executed in the instant starting from the current position. The entry point is defined by the function Compile, which calls CompileSurface and CompileDepth for the whole program. It also sets the module clock C0 as the highest one, because it is not explicitly defined. The labels of the pause statements are strengthened by the clock as described above. In addition, the abort and suspend conditions only need to be checked on the corresponding labels. On labels which are defined on a lower clock inside of an abort block, those conditions do not need to be checked. Therefore, the algorithm uses maps to store the conditions of the surrounding abort and suspend blocks related to each clock. At a pause statement, the condition for the clock just needs to be added. For a detailed description of the compilation algorithm, see [40, 42]. The guarded actions of the example program GCD2 are given in Figure 7.

5.2 Hardware synthesis

The next step of the translation to hardware circuits is an equation system. Given a library for all operators in our data flow, the equations can syntactically be translated to any hardware description language such as Verilog or VHDL. In principle, the translation can also be used to generate software but, as shown in Section 5.3, there is a more efficient translation for that purpose.

In the equation system, we use three different kinds of equations. The first type represents wires which are directly connected to some logic so that the computed value is immediately available. Such an immediate equation has the following form:

x = τ

for a variable x set to the current value of τ.

State elements such as registers are represented by the remaining equations. Each one is defined by two equations, one for the initial step and one for subsequent transitions.

\begin{array}{lcr} init (x) = τ_{1} \\ next (x) = τ_{2} \end{array}

Note that the clock which defines the steps is now the hardware clock, which is generally different to the logical clocks of the source language.

5.2.1 Control flow

For the translation of the control flow, every label is considered separately. Such a label ℓ can be written by multiple delayed guarded actions (note that the control flow does not contain immediate actions). Assume that the label ℓ is written by the following actions:

\begin{array}{lcr} γ_{1} \Rightarrow & next (ℓ) = true \\ γ_{2} \Rightarrow & next (ℓ) = true \\ \dots \\ γ_{n} \Rightarrow & next (ℓ) = true \end{array}

The label can be set by these guarded actions, and it remains active until its clock holds. Therefore, the actions are combined to define a register in the following way:

\begin{align} init (ℓ) & = false \\ next (ℓ) & = \underbrace{γ_{1} \lor γ_{2} \lor \dots \lor γ_{n}}_{\text{guards}} \lor \underbrace{(ℓ \land \neg clock (ℓ))}_{\text{default}} \end{align}

The expression to set the register is split into two parts. The first one is given by the guards of the control-flow guarded actions, which ensures that the label is set when one of the guards holds. The second part is the default value, which ensures that the label remains activated as long as no tick of its clock occurs. The special start label st is translated as follows:

\begin{array}{lcr} init (st) = true \\ next (st) = st \land \neg clock (st) \end{array}

Thus, we just set it initially, and reset it for the rest of the execution.
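The behavior of such a label register can be checked with a small simulation over hardware clock cycles. The function names and the input trace are hypothetical; each trace entry gives the guard values and whether the clock of the label ticks in that cycle.

```python
def next_label(label, guards, clock_ticks):
    """next(ℓ) = γ1 ∨ ... ∨ γn ∨ (ℓ ∧ ¬clock(ℓ))."""
    return any(guards) or (label and not clock_ticks)

def simulate(trace):
    """Return the value of the label register in each cycle of the trace."""
    label = False  # init(ℓ) = false
    history = []
    for guards, clock_ticks in trace:
        history.append(label)
        label = next_label(label, guards, clock_ticks)
    return history

# Cycle 0 sets the label, cycle 1 does not tick, cycle 2 ticks the clock.
TRACE = [([True], False), ([False], False), ([False], True), ([False], False)]
```

For `TRACE`, the register stays active through the cycle without a tick and is released by the cycle in which its clock holds.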

5.2.2 Data flow

The translation of the data flow is more sophisticated since we have to consider the following issues: (1) data-flow variables can be written by delayed and immediate assignments, and (2) delayed assignments do not necessarily take place in the next instant; instead, the value has to be kept until the next tick of the variable's clock. In general, two registers are needed for each variable (for some special cases, the following general solution can be optimized, but we will present the full solution for the sake of completeness). Assume that a variable x is written by the following guarded actions:

\begin{array}{ll} γ_{1}^{i} \Rightarrow x = τ_{1}^{i} & γ_{1}^{d} \Rightarrow next (x) = τ_{1}^{d} \\ γ_{2}^{i} \Rightarrow x = τ_{2}^{i} & γ_{2}^{d} \Rightarrow next (x) = τ_{2}^{d} \\ \dots & \dots \\ γ_{n}^{i} \Rightarrow x = τ_{n}^{i} & γ_{m}^{d} \Rightarrow next (x) = τ_{m}^{d} \end{array}

For a variable x, two new variables are introduced, each of which is converted to a register:

● x^nxt

Delayed assignments are expected to take place at the next occurrence of the clock of x. The variable x^nxt stores values from those delayed assignments, until the clock holds.

● x^prv

Since the steps of a certain clock no longer coincide with the instants, the values of variables have to be kept for the whole step. Therefore, the variable x^prv stores the value of x from the previous instant.

The equations can then be defined as follows:

\begin{align} init (x^{nxt}) & = default (x) \\ next (x^{nxt}) & = \begin{cases} τ_{1}^{d} & : γ_{1}^{d} \\ τ_{2}^{d} & : γ_{2}^{d} \\ ⋮ & ⋮ \\ τ_{m}^{d} & : γ_{m}^{d} \\ trans (x) & : \text{default} \end{cases} \\ init (x^{prv}) & = default (x) \\ next (x^{prv}) & = x \\ x & = \begin{cases} τ_{1}^{i} & : γ_{1}^{i} \\ τ_{2}^{i} & : γ_{2}^{i} \\ ⋮ & ⋮ \\ τ_{n}^{i} & : γ_{n}^{i} \\ x^{nxt} & : clock (x) \\ x^{prv} & : \text{default} \end{cases} \end{align}

where the expression trans(x) depends on the storage type of the variable x:

trans (x) := \begin{cases} default (x) & : x \text{ is an event variable} \\ x & : x \text{ is a memorized variable} \end{cases}

Optimizations are possible, e.g., if no delayed assignments exist and trans(x) is a constant value (e.g., false). In this case, the variable x^nxt can be removed completely because it always holds a constant value. Similarly, optimizations are possible if there are no immediate assignments for a variable x. In this case, the translation shown for the control flow can be used.
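The selection of the current value of x and the helper trans(x) can be sketched for a single instant as follows. This only mirrors the case distinctions of the equations and does not simulate the registers over time. The function names are hypothetical, and taking the first immediate guard that holds is an assumption of the sketch: the equations leave the choice open, and it only matters if several guards hold at once.

```python
def value_of_x(immediate, clock_ticks, x_nxt, x_prv):
    """Evaluate the immediate equation for x in one instant.

    immediate   -- list of (guard, value) pairs of the immediate assignments
    clock_ticks -- whether clock(x) holds in this instant
    x_nxt       -- content of the register for delayed assignments
    x_prv       -- content of the register with the previous value of x
    """
    for guard, value in immediate:
        if guard:
            return value       # an immediate assignment fires
    if clock_ticks:
        return x_nxt           # a new step of clock(x) commits the delayed value
    return x_prv               # inside a step, the value is kept

def trans(x_value, default_value, is_event):
    """trans(x): event variables are reset, memorized variables keep x."""
    return default_value if is_event else x_value
```

The order of the checks reflects the order of the cases in the equation for x: immediate assignments first, then the delayed value on a tick of clock(x), and the stored previous value otherwise.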

5.2.3 Scheduling

In addition to the translation of the control flow and the data flow, we have to consider the clocks for the synthesis. In single-clock Quartz, this is simple since the hardware clock coincides with the module clock of the Quartz module. Thus, in each clock cycle one instant of the module is executed. The hardware synthesis for refined clocks is based on the same idea but for independent clocks the one or the other instant can be executed. In addition, not every clock is allowed to occur in every instant due to the restrictions imposed by the clock tree and the control flow. Restrictions can also be imposed by the data flow, if one thread waits for a value which is computed by an independent step in a later instant. To sum up, scheduling the clocks according to the semantics requires some analysis.

In the following, we assume that communication between unrelated clocks is only done by delayed actions (for a generalization see [40, 41]). This means that no data dependencies exist for unrelated clocks and that communication among them is synchronized by a common higher clock. In this case, there is no need to consider data-flow dependencies, and we can describe a scheduler for the clocks only by the control flow. We call a clock C enabled if one of its labels holds:

\begin{array}{lcr} enabled (C) := \bigvee_{ℓ \in ℒ, clock (ℓ) = C} ℓ \end{array}

A clock is only allowed to tick if at least one of the related pause statements is reached. In addition, it can only tick if no lower clock is enabled, because execution should synchronize on common pause statements. Therefore, we also define the sets of all lower and all higher clocks of C by:

\begin{array}{l} lower (C) := \{ c \mid c ≺ C \} \\ higher (C) := \{ c \mid c ≻ C \} \end{array}

With these definitions, we can finally construct the equation for the clock C as follows:

C = \underbrace{enabled (C) \land \bigwedge_{c \in lower (C)} \neg enabled (c)}_{\text{tick on its own}} \lor \underbrace{\bigvee_{c \in higher (C)} c}_{\text{tick forced by higher clock}}

Thus, a clock can tick on its own if it is enabled but no lower clock is. In addition, a clock tick can be forced by a higher clock, whose tick also includes all lower ones. This is to trigger the delayed assignments also for the lower clocks.
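The clock equation can be sketched as a recursive function over a clock tree. The tree `PARENT`, the label-to-clock map `LABEL_CLOCK`, and the function names are hypothetical examples; the sketch only covers the restricted case discussed here, where the scheduler depends on the control flow alone.

```python
# Hypothetical clock tree (child -> parent) and labels with their clocks.
PARENT = {"C0": None, "C1": "C0", "C2": "C0", "C3": "C1"}
LABEL_CLOCK = {"l1": "C0", "l2": "C1", "l3": "C2", "l4": "C3"}

def higher(clock):
    """All clocks strictly above `clock` in the tree."""
    result, current = [], PARENT[clock]
    while current is not None:
        result.append(current)
        current = PARENT[current]
    return result

def lower(clock):
    """All clocks strictly below `clock` in the tree."""
    return [c for c in PARENT if clock in higher(c)]

def enabled(clock, active_labels):
    """A clock is enabled if one of its labels is active."""
    return any(LABEL_CLOCK[label] == clock for label in active_labels)

def ticks(clock, active_labels):
    """Clock equation: tick on its own, or forced by a ticking higher clock."""
    own = enabled(clock, active_labels) and not any(
        enabled(c, active_labels) for c in lower(clock))
    forced = any(ticks(c, active_labels) for c in higher(clock))
    return own or forced
```

With the active labels `{"l1", "l2"}`, the module clock `C0` has to wait because the lower clock `C1` is enabled; `C1` ticks on its own and forces `C3`, while the unrelated clock `C2` does not tick.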

5.3 Software synthesis

Synchronous languages can be used to build hardware and software from the same description. One possible solution for this is a software synthesis which simulates the hardware that is described above. However, there are more efficient solutions. One possibility is based on the extended finite state machine. Thereby, the possible combinations of labels form the states. The guarded actions are grouped by the labels which occur in their guards and are assigned to the corresponding states. Thus, in each state only the guarded actions which are possibly executed have to be evaluated.

With the introduction of refined clocks, there exists another parameter to classify the guarded actions. Therefore, the guarded actions are first divided by the clocks (of the labels) which occur in their guards. The guarded actions of each clock are combined to a task. The advantage is that local variables of a task are the local variables of the clock and do not have to be made visible to other tasks. Inputs of a task come from the higher level and outputs go back to the higher level or to a lower one.

To complete one step, a module usually has to complete several substeps. The substeps can be associated with unrelated clocks; thus, they can be executed independently. With the model of tasks, the inputs and the corresponding state can be sent to the tasks, which can then concurrently execute several substeps. The tasks can be scheduled dynamically whenever a lower clock level is entered. Thus, tasks model the parallelism which is inherent to refined clocks.
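The partitioning of guarded actions into tasks can be sketched as a grouping by the clock of the label in each guard. The labels, the clocks, and the action strings below are hypothetical placeholders rather than compiler output, and each guard is assumed to contain a single label.

```python
from collections import defaultdict

# Hypothetical labels with their clocks and guarded actions as (label, action).
LABEL_CLOCK = {"l1": "C0", "l2": "C1", "l3": "C1", "l4": "C2"}
ACTIONS = [
    ("l1", "o = a"),
    ("l2", "next(x) = x - y"),
    ("l3", "y = x"),
    ("l4", "z = 0"),
]

def group_into_tasks(actions):
    """Collect the guarded actions of each clock into one task."""
    tasks = defaultdict(list)
    for label, action in actions:
        tasks[LABEL_CLOCK[label]].append(action)
    return dict(tasks)
```

Each resulting task only needs the local variables of its own clock, which is the encapsulation property described above.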

5.4 Example

In this section, we will discuss by an example that refined clocks can be used to relax over-synchronization and that this advantage can be used for a more liberal code generation. The software realization does not need to introduce needless synchronization, and hardware implementations can use different schedulers for the refined clocks to control the trade-off between resources (space of the hardware design) and execution time.

Consider the following example which consists of two parallel threads with two unrelated clocks. In each one, the same resource is used—for illustration, assume that it is a multiplier.

Without refined clocks, synchronization of both threads would be necessary (due to the semantics) on each pause statement. The original single-clocked hardware synthesis considers each instant as a clock cycle. Therefore, using the same multiplication unit for both multiplications would require a reachability analysis to ensure that both are not executed in the same clock cycle.

Refined clocks relax the need for synchronization and only require it for the same clocks. Therefore, synchronization is not necessary for the above example. The scheduler, as described in Section 5.2.3, can (1) execute both steps containing multiplications together or (2) ensure that both steps are executed one after the other. For the first case, two multipliers are necessary, and both are used in parallel. For the second case, only one multiplication unit is necessary, since the scheduler ensures mutually exclusive access and the multiplications are executed one after the other. To summarize, both cases differ in space (of the hardware design) and (execution) time in terms of clock cycles. From this point of view, the first case can also be achieved without refined clocks. However, it is refined clocks that first introduce the possibility of choosing between space and time.

6 Related work

Using more than one clock in a system is a quite common approach to deal with timing, synchronization, and independent execution in synchronous systems, even though the term clock can be misleading since it is used for many different concepts: a hardware developer will probably understand by clock a periodic signal whose occurrence is based on a fixed physical time. In this case, the clock signal is typically fed in from the environment (clock generator) into the actual circuit and is used to drive the execution. Another interpretation is given from synchronous data-flow languages like Lustre or Signal where each signal has a clock that identifies the availability/presence of data. Thereby, the presence of data can depend on the presence of other data and also on other data values. These clocks are not necessarily all given by the environment, and can be instead computed by the system from the given ones. Finally, the imperative languages mentioned in this article, Esterel and Quartz, are single-clock synchronous languages where a clock is used to separate the execution into single reaction steps. If these languages are translated to synchronous hardware circuits, each step can be mapped to a hardware-clock cycle, but there is no reason to compile it in this way. Different approaches related to refined clocks are introduced and compared to the presented work in the following.

6.1 Esterel

The synchronous language Esterel is quite similar to the language Quartz described in Section 2. One difference worth pointing out is the interpretation of the terms signal and variable. While both are used synonymously in Quartz, Esterel makes a clear distinction between them. Esterel signals behave like Quartz signals/variables, which may have only one value per step, whereas Esterel variables can be assigned multiple times. When a variable is read, its last assigned value is used.

The example is taken from [43]. The variable X is assigned twice and used to set the values of the signals S1 and S2: S1 receives the value 0 and S2 receives the value 1. Even though Esterel variables are useful, they also have limitations: they may not be written and read in independent threads. Also, since Esterel forbids instantaneous loops, the introductory GCD2 example cannot be converted to Esterel. Moreover, Esterel variables provide only a single, simplified abstraction layer for data. In contrast, refined clocks can be nested arbitrarily deep, can be used in parallel threads, and can also interact with preemption statements.
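The difference between variables and signals can be mimicked in a few lines of Python. This is a hypothetical model, not Esterel code and not the listing from [43]: the class `Step` and its methods are invented for illustration, the statement sequence is reconstructed from the description above, and treating a second, conflicting emission as an error is a simplifying assumption.

```python
class Step:
    """Model of one reaction step with overwritable variables and
    single-valued signals (simplified, for illustration only)."""

    def __init__(self):
        self.variables = {}
        self.signals = {}

    def assign(self, name, value):
        # A later assignment in the same step overwrites the earlier one.
        self.variables[name] = value

    def read(self, name):
        # Reading a variable yields its last assigned value.
        return self.variables[name]

    def emit(self, name, value):
        # A signal carries a single value per step in this model.
        if name in self.signals and self.signals[name] != value:
            raise ValueError(f"conflicting values for signal {name}")
        self.signals[name] = value


step = Step()
step.assign("X", 0)                   # first assignment to X
step.emit("S1", step.read("X"))       # S1 receives 0
step.assign("X", step.read("X") + 1)  # second assignment to X
step.emit("S2", step.read("X"))       # S2 receives 1
```

After these four statements, `step.signals` holds 0 for S1 and 1 for S2, although everything happened within one step.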

6.2 Multiclock Esterel

Originally, Esterel also has the single-clock abstraction of steps, but it has since been enriched with two different multiclock extensions. Both are named multiclock Esterel and were introduced in [44, 45].

Berry and Sentovich [44] introduced their version of multiclock Esterel. Their work addresses the need to design systems with multiple clock domains in a modular way. Each module can run on its own clock, where each step of the module coincides with a tick of the module’s clock. The modules themselves are still single-clock modules with the possibility to call other modules on a different clock. To communicate data between clock domains, the authors defined two possible communication devices, named sampler and reclocker. Finally, a system consists of different modules, each running on its own clock and communicating through the defined communication devices. In this version of multiclock Esterel, the clocks trigger the computation of each module and have to be provided additionally by the outer environment, e.g., as a hardware clock.

The second multiclock Esterel extension was proposed by Rajan and Shyamasundar [45, 46]. Their solution introduces a new statement which allows one to override the clock locally by an expression based on known signals. The local statement tick is then based on this new clock expression. Finally, the signals with which the local clocks are defined also have to be provided by the outer environment. The difference to Berry’s extension is that no dedicated clock signal is used, but any signal can be used to define a new tick.

Both extensions basically allow one to define new (arbitrary) clocks for a module or a code block. However, they do not allow access to multiple clocks at the same time. In addition, the clocks have a different meaning here, since they are intended to be given by the environment to trigger computations. Instead, our Quartz extension refines the inner descriptions, where clocks are used to divide steps into substeps. This is only used for modeling, not for execution.

6.3 Lustre and Signal

Multi-clocked systems can also be described in the synchronous language Lustre [3, 47]. Each Lustre program basically consists of a set of equations over data streams. In addition to functions and delays, there are two operators to change the rate/clock of a stream. The clock of a stream identifies the positions where a value is present. The downsampling operator when takes a stream of arbitrary type and a Boolean stream and keeps only the events of the first one at those instants where the second one is true. The upsampling operator current undoes a previous downsampling operation by inserting the last known value in the missing locations of the stream. Each node has a so-called base clock, and at least one input of the node must run on this clock. New signal definitions always come with the definition of the clock. Hence, since upsampling only undoes the last downsampling, there is no means to refine the base clock. Therefore, the base clock is the fastest clock of a node and contains all instants at which any computation or communication may happen. Lustre specifications are completely deterministic due to their bottom-up design from the base clock.
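The effect of the two rate-changing operators can be sketched on finite lists. The Python functions below are an illustrative model and not Lustre code: streams are plain lists indexed by base-clock instants, and `None` stands for the undefined value that current yields before the first sampled instant (an assumption of this sketch).

```python
def when(stream, clock):
    """Downsampling: keep the values at instants where the clock is true."""
    return [x for x, c in zip(stream, clock) if c]


def current(sampled, clock):
    """Upsampling: at every instant of the clock's own (faster) clock,
    output the last value of the sampled stream seen so far."""
    out = []
    last = None                # undefined until the first true instant
    values = iter(sampled)
    for c in clock:
        if c:
            last = next(values)
        out.append(last)
    return out


x = [1, 2, 3, 4, 5, 6]
c = [False, True, False, True, True, False]
y = when(x, c)       # values of x at the instants where c holds
z = current(y, c)    # y brought back to the clock of c
```

Here `y` is `[2, 4, 5]` and `z` is `[None, 2, 2, 4, 5, 5]`. The result `z` has exactly as many instants as `x`, which illustrates why upsampling can restore a previously left clock but never produce one faster than the base clock.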

The polychronous language Signal [48–50] is also based on multiple clocks. While its syntax looks almost like that of Lustre, its semantics is very different because it does not assume that a base clock exists. As a consequence, Signal specifications are relational and not functional like those of Lustre: they do not describe a single behavior, but several possible ones, which differ in their clocks. Hence, Signal solves most of the problems mentioned in the introduction. However, the price one has to pay for this powerful model is that input/output determinism is generally lost. Determinism can be guaranteed if the program is shown to be endochronous [51] or weakly endochronous [52]. While endochrony proves determinism by the existence of a base clock (usually called master trigger in this context), weak endochrony also reveals some internal nondeterminism that can safely be exploited for a more efficient execution. Unfortunately, weak endochrony cannot be checked automatically in general. Moreover, Signal cannot solve all the problems mentioned in the introduction. In particular, the definition of program functions which hide a sequential computation in an instantaneous expression is not possible. For example, a basic Signal node which instantaneously computes the GCD (i.e., its result is available at the same instant when the inputs arrive) cannot be replaced by other nodes running at a higher rate (cf. the introductory GCD2 example).
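What it means to hide a sequential computation behind an instantaneous interface can be pictured in ordinary code. The Python sketch below is only an analogy and not the GCD2 program: the names `gcd_substeps` and `gcd_macro_step` are hypothetical, Euclid's algorithm with the remainder operation is an assumption, and inputs are assumed to be non-negative integers.

```python
def gcd_substeps(a, b):
    """Yield the state after each substep of Euclid's algorithm."""
    while b != 0:
        a, b = b, a % b
        yield a, b


def gcd_macro_step(a, b):
    """One step as seen from outside: inputs in, result out.

    The substeps are consumed internally; only the final value and,
    for inspection, the number of substeps are returned.
    """
    result = a
    substeps = 0
    for result, _ in gcd_substeps(a, b):
        substeps += 1
    return result, substeps


value, hidden = gcd_macro_step(12, 18)
```

For the inputs 12 and 18, the caller sees the value 6 as the result of one call, while three substeps ran internally. A caller on the outer level never observes the intermediate pairs.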

Both Lustre and Signal deal with inputs and outputs based on different clocks. Again, these clocks differ from the clocks of the Quartz extension, which refine computation steps internally. In addition, since Signal is able to solve some of the introductory problems, it is a matter of taste whether to use a (descriptive) data-flow language like Signal or a control-flow-based language like Quartz.

6.4 Discrete event

The discrete event languages Verilog and VHDL are hardware description languages used to model circuits. Their simulation semantics allows, similar to Esterel variables, multiple updates to signals in so-called delta cycles, during which the physical simulation time does not proceed. In this way, a signal can have multiple values in a clock cycle. However, besides the fact that this behavior is hard to follow, it is available only for hardware simulation, hence neither for synthesis nor for software designs.
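The delta-cycle mechanism can be illustrated with a toy simulation loop. The following Python sketch is a strongly simplified model and not the scheduling algorithm of the VHDL or Verilog standards: the function `simulate_time_point`, the representation of processes as functions from old signal values to updates, and the bound `max_deltas` are assumptions made for this example.

```python
def simulate_time_point(signals, processes, max_deltas=100):
    """Run delta cycles at one simulation time until nothing changes.

    Returns the stable signal values and the list of all intermediate
    valuations, one per delta cycle (including the initial one).
    """
    history = [dict(signals)]
    for _ in range(max_deltas):
        updates = {}
        for process in processes:
            # Every process reads the values of the previous delta cycle.
            updates.update(process(signals))
        new_signals = {**signals, **updates}
        if new_signals == signals:
            return signals, history
        signals = new_signals
        history.append(dict(signals))
    raise RuntimeError("no stable state reached")


# Two inverters in a chain: b = not a, c = not b.
processes = [
    lambda s: {"b": not s["a"]},
    lambda s: {"c": not s["b"]},
]
initial = {"a": True, "b": True, "c": True}
final, history = simulate_time_point(initial, processes)
values_of_c = [h["c"] for h in history]
```

In this run, `c` takes the values `True`, `False`, and `True` again without any progress of simulation time, which is the kind of multiple update per clock cycle described above.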

Finally, the same observation as for Esterel holds, i.e., the delta-cycle changes provide only a single abstraction for data and cannot influence the control flow of substeps. Moreover, Esterel and Quartz provide richer control-flow statements, which can be used for the whole design process including code generation. With refined clocks, the same rich control-flow statements can be used for arbitrarily many abstraction layers.

7 Conclusion

Imperative synchronous languages have so far been limited to a single-clock abstraction of time, which imposes restrictions on the programmer. We introduce refined clocks as an extension of the language Quartz. This article presents the problems introduced by this extension and shows how they can be solved in a practical way. It formally defines the semantics of the extension and gives a compilation algorithm that translates programs to a new intermediate format. In addition, synthesis to hardware and to software is presented, and it is shown how these synthesis procedures can benefit from the newly introduced features.

Appendix 1: Transition rules

The transition rules defining the semantics of the language extension are given in Figure 8 for the basic statements, in Figure 9 for parallel execution, in Figure 10 for strong abortion, and in Figure 11 for strong suspension.

Appendix 2: Compilation algorithm

The compilation algorithm that translates programs of the presented language extension to the intermediate format is given in Figures 12, 13, and 14.

References

[1] Benveniste A, Caspi P, Edwards S, Halbwachs N, Le Guernic P, de Simone R: The synchronous languages twelve years later. Proc. IEEE 2003, 91: 64-83. 10.1109/JPROC.2002.805826
[2] Berry G: The foundations of Esterel. In Proof, Language and Interaction: Essays in Honour of Robin Milner. Edited by: Plotkin G, Stirling C, Tofte M. MIT Press, Cambridge; 1998:425-454.
[3] Halbwachs N, Caspi P, Raymond P, Pilaud D: The synchronous dataflow programming language LUSTRE. Proc. IEEE 1991, 79(9): 1305-1320. 10.1109/5.97300
[4] Schneider K: The synchronous programming language Quartz. Internal Report 375, Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany; 2009.
[5] Berry G: A hardware implementation of pure Esterel. Sadhana 1992, 17: 95-130. 10.1007/BF02811340
[6] Rocheteau F, Halbwachs N: Implementing reactive programs on circuits: a hardware implementation of LUSTRE. In Real-Time: Theory in Practice, Volume 600 of LNCS. Edited by: de Bakker J, Huizing C, de Roever WP, Rozenberg G. Springer, Mook, The Netherlands; 1992:195-208.
[7] Berry G: The constructive semantics of pure Esterel. 1999. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2076]
[8] Schneider K: A verified hardware synthesis for Esterel. In Distributed and Parallel Embedded Systems (DIPES). Edited by: Rammig F. Kluwer, Schloß Ehringerfeld; 2000:205-214.
[9] Schneider K: Embedding imperative synchronous languages in interactive theorem provers. In Application of Concurrency to System Design (ACSD). IEEE Computer Society, Newcastle Upon Tyne; 2001:143-154.
[10] Schneider K, Brandt J, Schuele T: A verified compiler for synchronous programs with local declarations. Electron. Notes Theor. Comput. Sci. (ENTCS) 2006, 153(4): 71-97. 10.1016/j.entcs.2006.02.028
[11] Li YT, Malik S: Performance Analysis of Real-Time Embedded Software. Kluwer, The Netherlands; 1999.
[12] Logothetis G, Schneider K: Exact high level WCET analysis of synchronous programs by symbolic state space exploration. In Design, Automation and Test in Europe (DATE). IEEE Computer Society, Munich; 2003:10196-10203.
[13] Boldt M, Traulsen C, von Hanxleden R: Worst case reaction time analysis of concurrent reactive programs. Electron. Notes Theor. Comput. Sci. (ENTCS) 2008, 203(4): 65-79. 10.1016/j.entcs.2008.05.011
[14] Ju L, Khoa Huynh B, Roychoudhury A, Chakraborty S: Timing analysis of Esterel programs on general purpose multiprocessors. In Design Automation Conference (DAC). Edited by: Sapatnekar S. ACM, Anaheim; 2010:48-51.
[15] Schneider K, Brandt J, Schuele T: Causality analysis of synchronous programs with delayed actions. 2004.
[16] Titzer B, Palsberg J: Nonintrusive precision instrumentation of microcontroller software. In Languages, Compilers, and Tools for Embedded Systems (LCTES). Edited by: Paek Y, Gupta R. ACM, Chicago, IL; 2005:59-68.
[17] Brandt J, Schneider K: Static data-flow analysis of synchronous programs. In Formal Methods and Models for Codesign (MEMOCODE). Edited by: Bloem R, Schaumont P. IEEE Computer Society, Cambridge; 2009:161-170.
[18] Carloni L, McMillan K, Sangiovanni-Vincentelli A: Theory of latency-insensitive design. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. (T-CAD) 2001, 20(9): 1059-1076. 10.1109/43.945302
[19] Cortadella J, Kishinevsky M, Grundmann B: Synthesis of synchronous elastic architectures. In Design Automation Conference (DAC). Edited by: Sentovich E. ACM, San Francisco; 2006:657-662.
[20] Krstic S, Cortadella J, Kishinevsky M, O’Leary J: Synchronous elastic networks. In Formal Methods in Computer-Aided Design (FMCAD). Edited by: Gupta A, Manolios P. IEEE Computer Society, San Jose; 2006:19-30.
[21] Halbwachs N: Synchronous Programming of Reactive Systems. Kluwer, The Netherlands; 1993.
[22] Harel D, Naamad A: The STATEMATE semantics of Statecharts. ACM Trans. Softw. Eng. Methodol. (TOSEM) 1996, 5(4): 293-333. 10.1145/235321.235322
[23] Malik S: Analysis of cyclic combinational circuits. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. (T-CAD) 1994, 13(7): 950-956. 10.1109/43.293952
[24] Halbwachs N, Maraninchi F: On the symbolic analysis of combinational loops in circuits and synchronous programs. In Euromicro Conference. IEEE Computer Society, Como; 1995.
[25] Brzozowski J, Seger CJ: Asynchronous Circuits. Springer, New York; 1995.
[26] Shiple T, Berry G, Touati H: Constructive analysis of cyclic circuits. In European Design Automation Conference (EDAC). IEEE Computer Society, Paris; 1996:328-333.
[27] Boussinot F: SugarCubes implementation of causality. Research Report 3487, Institut National de Recherche en Informatique et en Automatique (INRIA), Sophia Antipolis, France; 1998.
[28] Schneider K, Brandt J, Schuele T, Tuerk T: Maximal causality analysis. In Application of Concurrency to System Design (ACSD). Edited by: Desel J, Watanabe Y. IEEE Computer Society, Saint-Malo; 2005:106-115.
[29] Brandt J, Schneider K: Separate translation of synchronous programs to guarded actions. Internal Report 382/11, Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany; 2011.
[30] Tardieu O, de Simone R: Curing schizophrenia by program rewriting in Esterel. In Formal Methods and Models for Codesign (MEMOCODE). IEEE Computer Society, San Diego; 2004:39-48.
[31] Brandt J, Schneider K: Separate compilation for synchronous programs. In Software and Compilers for Embedded Systems (SCOPES), Volume 320 of ACM International Conference Proceeding Series. Edited by: Falk H. ACM, Nice; 2009:1-10.
[32] Chandy K, Misra J: Parallel Program Design. Addison-Wesley, Austin; 1989.
[33] Dill D: The Murphi verification system. In Computer-Aided Verification (CAV), Volume 110 of LNCS. Edited by: Alur R, Henzinger T. Springer, New Brunswick; 1996:390-393.
[34] Lamport L: The temporal logic of actions. Technical Report 79, Digital Equipment Corporation; 1991.
[35] Gemünde M, Brandt J, Schneider K: A formal semantics of clock refinement in imperative synchronous languages. In Application of Concurrency to System Design (ACSD). Edited by: Gomes L, Khomenko V, Fernandes J. IEEE Computer Society, Braga; 2010:157-168.
[36] Plotkin G: A structural approach to operational semantics. Technical Report FN-19, DAIMI, Aarhus, Denmark; 1981.
[37] Mosses P: Formal semantics of programming languages. Electron. Notes Theor. Comput. Sci. (ENTCS) 2006, 148: 41-73. 10.1016/j.entcs.2005.12.012
[38] Berry G, Cosserat L: The Esterel synchronous programming language and its mathematical semantics. In Seminar on Concurrency (CONCUR), Volume 197 of LNCS. Edited by: Brookes S, Roscoe A, Winskel G. Springer, Pittsburgh; 1985:389-448.
[39] Tini S: Structural operational semantics for synchronous languages. PhD thesis. University of Pisa, Italy; 2000.
[40] Gemünde M, Brandt J, Schneider K: Schizophrenia and causality in the context of refined clocks. In Forum on Specification and Design Languages (FDL). Edited by: Morawiec K, Hinderscheit J, Ghenassia O. IEEE Computer Society, Oldenburg; 2011:1-8.
[41] Gemünde M, Brandt J, Schneider K: Causality analysis of synchronous programs with refined clocks. In High Level Design Validation and Test Workshop (HLDVT). IEEE Computer Society; 2011:25-32.
[42] Gemünde M, Brandt J, Schneider K: Compilation of imperative synchronous programs with refined clocks. In Formal Methods and Models for Codesign (MEMOCODE). Edited by: Carloni L, Jobstmann B. IEEE Computer Society, Grenoble; 2010:209-218.
[43] Berry G: A quick guide to Esterel. 1997. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.2222]
[44] Berry G, Sentovich E: Multiclock Esterel. In Correct Hardware Design and Verification Methods (CHARME), Volume 2144 of LNCS. Edited by: Margaria T, Melham T. Springer, Livingston; 2001:110-125.
[45] Rajan B, Shyamasundar R: Multiclock ESTEREL: a reactive framework for asynchronous design. In International Parallel and Distributed Processing Symposium (IPDPS). IEEE Computer Society, Cancún, Quintana Roo; 2000:201-209.
[46] Rajan B, Shyamasundar R: Modeling distributed embedded systems in multiclock Esterel. In Formal Description Techniques for Distributed Systems and Communication Protocols (FORTE/PSTV). Edited by: Bolognesi T, Latella D. Kluwer, Pisa; 2000:301-316.
[47] Halbwachs N: A synchronous language at work: the story of Lustre. In Formal Methods and Models for Codesign (MEMOCODE). IEEE Computer Society, Verona; 2005:3-11.
[48] Gautier T, Le Guernic P, Besnard L: SIGNAL, a declarative language for synchronous programming of real-time systems. In Functional Programming Languages and Computer Architecture, Volume 274 of LNCS. Edited by: Kahn G. Springer, Portland; 1987:257-277.
[49] Le Guernic P, Gautier T, Le Borgne M, Le Maire C: Programming real-time applications with SIGNAL. Proc. IEEE 1991, 79(9): 1321-1336. 10.1109/5.97301
[50] Le Guernic P, Talpin JP, Le Lann JC: Polychrony for system design. J. Circuits Syst. Comput. (JCSC) 2003, 12(3): 261-304. 10.1142/S0218126603000763
[51] Potop-Butucaru D, Caillaud B, Benveniste A: Concurrency in synchronous systems. In Application of Concurrency to System Design (ACSD). IEEE Computer Society, Hamilton; 2004:67-76.
[52] Potop-Butucaru D, Caillaud B: Correct-by-construction asynchronous implementation of modular synchronous specifications. In Application of Concurrency to System Design (ACSD). IEEE Computer Society, Saint-Malo; 2005:48-57.

Acknowledgments

We thank the German Research Foundation (DFG) for supporting this work.

Author information

Authors and Affiliations

Department of Computer Science, University of Kaiserslautern, Kaiserslautern, Germany
Mike Gemünde, Jens Brandt & Klaus Schneider

Corresponding author

Correspondence to Mike Gemünde.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gemünde, M., Brandt, J. & Schneider, K. Clock refinement in imperative synchronous languages. J Embedded Systems 2013, 3 (2013). https://doi.org/10.1186/1687-3963-2013-3

Received: 27 February 2012
Accepted: 31 January 2013
Published: 10 April 2013
DOI: https://doi.org/10.1186/1687-3963-2013-3
