The proposed flow for estimating the implementation effort is illustrated in Figure 1. It starts from a behavioural description of the algorithm in the C language (including library function source code), which is intended to be implemented in hardware. From this description, we use the Design-Trotter framework to generate a hierarchical control data flow graph (HCDFG), which is then measured to identify the number of independent paths. The resulting measure, combined with the experience of the developers, gives an estimate of the required implementation effort. The method is self-learning in the sense that after each successful implementation, new knowledge about the developers involved can be integrated, improving the accuracy of the estimates. The HCDFG and the approach for modelling the developers' experience are covered later in this section, but initially we investigate how the number of paths can be measured.
3.1. Cyclomatic Complexity
As described in Section 1.3, the number of independent paths is expected to correlate with the complexity that the engineers are facing when working on the implementation. Therefore, finding a method to measure the number of independent paths in an algorithm could help us investigate this issue. One such metric is the cyclomatic complexity proposed by McCabe [15], which measures the number of linearly independent paths in the algorithm.
The cyclomatic complexity was originally invented as a way to intuitively quantify the complexity of algorithms, but it has later found use for other purposes, especially in the software domain. The cyclomatic complexity has been used for evaluating the quality of code in companies [16], where quality covers aspects ranging from understandability and testability to maintainability. It has also been shown [17] that algorithms with a high cyclomatic complexity contain errors more frequently than algorithms with a lower cyclomatic complexity. The cyclomatic complexity has furthermore been used for evaluating programming languages for parallel computing [18], where languages that encapsulate control statements in instructions receive higher scores. All of these uses rest on the assumption that the complexity has a significant influence on the number of paths the developers need to inspect, that it correlates with the number of paths that need to be tested, or a combination of the two.
The cyclomatic complexity has also found use in the hardware domain, for judging readability and maintainability in the SAVE project [19]. It is worth noticing that they use a misinterpreted [20] definition of the cyclomatic complexity [21].
All these projects utilise the cyclomatic complexity's ability to measure the number of independent paths and relate it to their individual cases:

$$CC(G) = CN(G) + 1,$$

where $CN(G)$ represents the number of condition nodes in the graph $G$ representing the algorithm being analysed. Figure 2 shows two examples of graphs and the corresponding cyclomatic complexity.
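To make the counting rule concrete, the following minimal Python sketch computes the cyclomatic complexity by counting condition nodes; the list-based graph representation is our own illustration and not Design-Trotter's internal format.

```python
# Minimal sketch of the counting rule CC(G) = CN(G) + 1.
def cyclomatic_complexity(node_types):
    """Return CC(G) for a flat graph given as a list of node-type strings."""
    condition_types = {"if", "switch", "for", "while"}  # condition nodes
    cn = sum(1 for t in node_types if t in condition_types)
    return cn + 1

# Example: one if node and one for node give 3 linearly independent paths.
print(cyclomatic_complexity(["data", "if", "data", "for", "data"]))  # -> 3
```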
In this work, we propose an adapted version of the cyclomatic complexity definition to estimate, a priori, the number of independent paths on a hierarchical control data flow graph (HCDFG), defined in the following section. The cyclomatic complexity for an HCDFG is obtained by examining its subgraphs, as explained in Section 3.3.
3.2. HCDFG
For this work we use hierarchical control data flow graphs (HCDFGs), introduced in [22, 23]. HCDFGs are used to represent an algorithm with a graph-based model so that the task of examining the algorithm is eased. Control/data flow graphs (CDFGs) are well accepted by designers as a representation of an algorithm, where data flow graphs represent the data flow between different processes/operations, and the control flow layer encapsulates these data flows and adds control structures to the graphical notation. The hierarchical, layered structure is added to help represent large algorithms as well as to enable the analysis mechanism to identify functions/blocks in the graph. Such an identified block can then be seen as a single HCDFG that can be instantiated several times. Figure 3 shows an example of a hierarchical control data flow graph.
In this work the design space exploration tool "Design-Trotter" is used as an engine for analysing the algorithms. The HCDFG model is used as "Design-Trotter's" internal representation.
The hierarchy of an HCDFG is shown in Figure 3. An HCDFG can consist of other HCDFGs, control/data flow graphs (CDFGs), and data flow graphs (DFGs), as well as elementary nodes (processing, memory, and control nodes), connected via dependency edges. In this work we only explore the graph at levels above the DFGs and therefore concentrate on these levels when defining the graph types in what follows.
Let us consider the hierarchical control data flow graph $G^{H} = (N^{H}, E^{H})$, where $N^{H}$ is the set of nodes, denoted by $n^{H}_{i}$, with

$$n^{H}_{i} \in \{\mathrm{HCDFG}, \mathrm{CDFG}, \mathrm{DFG}, \mathrm{Data}\},$$

meaning that the nodes in the $G^{H}$ can be instances of its own type, encapsulated control data flow graphs, CDFGs, encapsulated data flow graphs, DFGs, or data transfer nodes, Data. The last one is introduced to avoid the duplication of data representations in the hierarchy when data is exchanged between the graphs. Thereby, data are only represented by their nodes and not by edges, as is common in many other types of DFGs.

The edges, $E^{H}$, connect the nodes such that $e_{i,j} = (n^{H}_{i}, n^{H}_{j})$, where $i$ and $j$ represent the indexes of the nodes, and where every node can have multiple input and/or output edges. For the $G^{H}$, only data dependencies, DD, are allowed, and no control dependencies, CD.

In this way the HCDFG forms a hierarchy of encapsulated HCDFGs, CDFGs, and DFGs, connected via data exchange nodes. The HCDFG can be seen as a container graph for other graph types such as the CDFG.
We can define the CDFG as $G^{C} = (N^{C}, E^{C})$, where $N^{C}$ is the set of nodes, denoted by $n^{C}_{i}$, with

$$n^{C}_{i} \in \{\mathrm{HCDFG}, \mathrm{DFG}, \mathrm{Data}, \mathrm{Ctrl}\},$$

where Ctrl denotes control nodes such as if, switch, for, and while nodes. In this way the $G^{C}$ is able to describe common control structures, where the actual data processing is encapsulated in either DFGs or HCDFGs. Again, the data exchange nodes are used to exchange data between the other nodes.

The edges, $E^{C}$, connect the nodes such that $e_{i,j} = (n^{C}_{i}, n^{C}_{j})$, where $i$ and $j$ represent the indexes of the nodes. If $n^{C}_{i}$ or $n^{C}_{j}$ is a control node, Ctrl, then $e_{i,j}$ is a control dependency, CD; else it is a data dependency, DD.

Beneath the control data flow graphs, $G^{C}$, the data flow graphs, DFGs, exist, but they are of no use in this work, so we will not define them further here.
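As an illustration of how such a hierarchy could be represented, the sketch below models HCDFG/CDFG nodes as a small Python data structure; the class and field names are our own assumptions and do not reflect Design-Trotter's internal representation.

```python
# Illustrative sketch of the graph hierarchy defined above.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    name: str
    kind: str                                                    # "HCDFG", "CDFG", "DFG", "Data", or "Ctrl"
    children: List["Node"] = field(default_factory=list)         # encapsulated sub-graphs
    edges: List[Tuple[int, int]] = field(default_factory=list)   # (i, j) dependency edges by child index

    def is_control(self) -> bool:
        return self.kind == "Ctrl"

# An HCDFG containing a CDFG that models an if construct:
if_cdfg = Node("if_construct", "CDFG",
               children=[Node("eval", "DFG"), Node("n_if", "Ctrl"),
                         Node("true_body", "DFG"), Node("false_body", "DFG")])
top = Node("algorithm", "HCDFG", children=[if_cdfg])
```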
3.3. Calculating the Cyclomatic Complexity on CDFGs
Now that the HCDFG has been defined, we explain our proposed method for measuring the cyclomatic complexity on the CDFGs.
Since the cyclomatic complexity only considers the control structure when finding the number of independent paths in the algorithm, the DFG part of the algorithm is, as mentioned earlier, of no interest for this task because it only gives a single path. What is of interest is how the cyclomatic complexity is measured on the CDFGs and HCDFGs built by the tool Design-Trotter. In the following, $P(G)$ denotes the number of independent paths measured on a graph $G$. This leaves us with the following cases, which are described in detail afterwards:
(i) If constructs,
(ii) Switch constructs,
(iii) For loops,
(iv) While/do-while loops,
(v) Functions,
(vi) HCDFGs in parallel,
(vii) HCDFGs in serial sequence.
3.3.1. If Constructs
"If constructs" case is represented as CDFGs,
, where one node is a control node of type if (see Figure 4(a)). Before arriving at the control node, a condition evaluation node
is traversed to calculate the boolean variable stored in
(to maintain simplicity, these are not shown in Figure 4(a)) that is used in the condition node. If the variable is true, the algorithm follows the path through the true body node,
. Else it goes to the false body node
. Note that in some cases, either the true body or the false body does not exist, but it still gives a path. In this case, according to the cyclomatic complexity measure, the number of independent paths is
The last part of (3),
is included in case the evaluation graph is an HCDFG node.
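A minimal sketch of rule (3), assuming the path counts of the sub-graphs have already been determined (the function name and arguments are our own):

```python
def paths_if(p_true: int, p_false: int, p_eval: int = 1) -> int:
    """Number of independent paths through an if construct, cf. (3)."""
    # The (p_eval - 1) term only contributes when the evaluation graph is
    # itself an HCDFG with more than one path.
    return p_true + p_false + (p_eval - 1)

print(paths_if(1, 1))     # plain if-else over simple bodies -> 2 paths
print(paths_if(2, 1, 2))  # nested if in the true body, complex evaluation -> 4
```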
3.3.2. Switch Constructs
"Switch constructs" case is represented as CDFGs,
, and is almost the same flow as the "if constructs" case discussed above. One node is a control node of switch type. Before arriving to the control node, a condition evaluation node
is traversed. Depending on the output, the switch node leads the algorithm flow to the selected case node:
. An example is shown in Figure 4(b). According to the cyclomatic complexity measure, the number of independent paths is as follows:
where
represents the number of cases,
the index to the corresponding node on which the paths are measured.
The same argument goes for the
part of (4); it is included in case the evaluation graph is an HCDFG node, but else it is omitted.
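The corresponding sketch for rule (4), again with the path counts of the sub-graphs assumed known:

```python
def paths_switch(case_paths, p_eval: int = 1) -> int:
    """Number of independent paths through a switch construct, cf. (4)."""
    return sum(case_paths) + (p_eval - 1)

# A switch with three simple cases gives three independent paths.
print(paths_switch([1, 1, 1]))  # -> 3
```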
3.3.3. For-Loop
"For-loop" case is the most complex of the control structures. Strictly speaking, a "for loop" consists of three different parts: the evaluation body, the evolution body, and the for body,
,
, and
, respectively. The control node nfor, determines, based on the output from the evaluation graph, whether the flow should go into the "for loop" or leave it. The evolution node updates the indexes. Since each iteration of the graph needs to pass through the evaluation and evolution nodes, the number of independent paths is calculated as
In many cases, the evaluation and evolution part of the "for loop" are quite simple indexing functions, meaning that
,
, will leave
. The "for loop" is illustrated in Figure 4(d).
3.3.4. While Loops and Do-While Loops
"While loops" and "do-while loops" cases are described jointly since it is only the entry to the loop structure that separates them and their cyclomatic complexity are equivalent. The "while loops" consist of two main parts: the while body
, and the while evaluation
. This is illustrated in Figure 4(c). Deciding whether to continue looping is decided by the control node
based on the output of the
. Similarly to the "for loop," each iteration of the graph needs to pass through the evaluation nodes, so the number of independent paths can be calculated as
In many cases, the evaluation part of the while loop is a set of simple test functions, meaning that
, which leaves the
.
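The while/do-while rule differs from the for-loop rule only by the missing evolution term; a sketch:

```python
def paths_while(p_body: int, p_eval: int = 1) -> int:
    """Number of independent paths through a while or do-while loop."""
    # While and do-while differ only in where the loop is entered,
    # so they share the same path count.
    return p_body + (p_eval - 1) + 1

print(paths_while(1))  # simple body and test -> 2 paths
```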
3.3.5. Functions
The goal is to identify the number of independent paths in the algorithm/system. For this, reuse in terms of functions/blocks of code is important. When all independent paths through a function are known, reuse of this function does not change the number of independent paths in the system. From an implementation point of view, such functions represent an entity where the paths only need to be implemented once. In HCDFGs, a function/block can be seen as an encapsulated HCDFG, $G^{\mathrm{FUNC}}$. Therefore, the number of independent paths in functions/blocks of reused code should only be counted once. The paths can be calculated as

$$P\bigl(G^{\mathrm{FUNC}}_{1}, \ldots, G^{\mathrm{FUNC}}_{m}\bigr) = P\bigl(G^{\mathrm{FUNC}}\bigr),$$

independently of the number of instantiations, $m$, of the function.
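A sketch of the count-once rule for reused functions, using simple bookkeeping of which functions have already been counted (the dictionary-based interface is our own illustration):

```python
def total_paths_with_reuse(function_paths, call_sequence):
    """Count each function's independent paths once, regardless of reuse.

    function_paths: dict mapping function name -> P(G_FUNC)
    call_sequence:  function names in the order they are instantiated
    """
    counted = set()
    total = 0
    for name in call_sequence:
        if name not in counted:
            total += function_paths[name]  # first instantiation: full path count
            counted.add(name)
        # subsequent instantiations add no new independent paths
    return total

# A filter with 4 independent paths called three times still contributes 4.
print(total_paths_with_reuse({"filter": 4}, ["filter", "filter", "filter"]))  # -> 4
```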
3.3.6. HCDFGs in Parallel and Serial
Knowing how to handle all the HCDFGs that are identified for reuse (functions), together with all the CDFGs, is not sufficient on its own; how the hierarchy of graphs should be combined is also of interest. For a parallel combination of two or more HCDFGs/CDFGs, as shown in Figure 4(e), the increase in the number of independent paths is additive. The number of paths can be calculated as

$$P(G^{\mathrm{PAR}}) = \sum_{i=1}^{m} P(G_{i}),$$

where $m$ represents the number of nodes in parallel and $i$ the index of the corresponding node on which the paths are measured.

For a serial combination of two or more HCDFGs and/or CDFGs, the number of independent paths is a combination of the independent paths of the involved HCDFGs/CDFGs. Remembering that there always needs to be one path through the system, the number of independent paths in a serial combination is given as

$$P(G^{\mathrm{SEQ}}) = \sum_{i=1}^{m} P(G_{i}) - (m - 1),$$

where $m$ represents the number of nodes in serial and $i$ the index of the corresponding node on which the paths are measured. An example of a serial combination is shown in Figure 4(f). The number of independent paths for the entire algorithm, $P(G^{\mathrm{TOP}})$, is equivalent to that of the top HCDFG node, which includes all the independent paths of its subgraphs.
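The two combination rules can be sketched as follows; the end-to-end example at the bottom combines the earlier rules for an if construct in sequence with a for loop (values are illustrative):

```python
def paths_parallel(paths):
    """Parallel combination: the path counts simply add up."""
    return sum(paths)

def paths_serial(paths):
    """Serial combination: there is always exactly one path through the system."""
    return sum(paths) - (len(paths) - 1)

# End-to-end illustration: an if-else (2 paths) in sequence with a
# for loop whose body is an if-else (3 paths).
p_if, p_for = 2, 3
print(paths_serial([p_if, p_for]))    # -> 4 independent paths
print(paths_parallel([p_if, p_for]))  # -> 5 if the two blocks were in parallel
```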
3.4. Experience Impact
The experience of the designer has an impact on the challenge that he/she is facing when developing a system. A radical example is when a beginner and a developer with ten years of experience are asked to solve the same task: they will not perceive the same difficulty in the task and will therefore not need to put the same effort into the development.
Experience is influenced by many parameters but in this work we only focus on the time the developer has worked with the implementation language and the target architecture.
The impact of experience is a factor that slowly decreases over time: consider a new developer; the experience that he/she obtains in the first months of working with the language and architecture improves his/her skills significantly. On the other hand, a developer who has worked with the language and architecture for, for example, five years will not improve her/his skills at the same rate by working an extra year. The impact of experience is therefore not linear but tends to have a negatively accelerating or inverse logarithmic nature, with a dramatic change in impact in the beginning, progressing towards little or no change as time increases.
In the literature, for example [24], many studies try to fit historical data to models. Examples of such models are a power function with negative slope or a negative exponential function. From the vast variety of models that have been proposed over the years, the only conclusion that can be drawn is that there are multiple curvatures, but they all appear to have a negatively accelerating slope, which tends to be exponential/logarithmic.
In order to get the best possible outset for predicting the implementation effort, it is of vital importance to obtain data on the developers' experience and on how they have performed in the past. The parameters involved in the experience curve can then be tuned to create the best possible fit. However, it has not been the purpose of this work to select the perfect form of the learning curve, nor to evaluate its accuracy. The learning curve will be adapted to the individual developers, and as the model is used in subsequent projects, its accuracy will progressively improve. Consequently, the experience is here only intended as an element in modelling the complexity and thereby as a means for more accurate estimates.
For the experiments in this study we have chosen to use a model of negative exponential form,

$$\mathrm{Exp}(\mathit{Dev}) = 1 + \alpha\, e^{-\beta\, W_{\mathit{Dev}}},$$

where $\alpha$ and $\beta$ are trim parameters which can be used to optimise the curve to fit reality, and $W_{\mathit{Dev}}$ is the number of weeks the developer, Dev, has worked with the language and architecture. Figure 5 depicts the shape of the experience model.
In this work, our initial experiments have shown that a fixed setting of $\alpha$ and $\beta$ makes our model sufficiently general, and therefore we have not investigated the tuning of these two parameters further.
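A sketch of the experience factor under the negative exponential form assumed above; the parameter values below are placeholders for illustration, not the settings used in the study:

```python
import math

def experience_factor(weeks: float, alpha: float = 1.0, beta: float = 0.05) -> float:
    """Experience impact factor that decays towards 1 as experience grows.

    alpha and beta are the trim parameters; the defaults are placeholders.
    """
    return 1.0 + alpha * math.exp(-beta * weeks)

# A newcomer faces roughly twice the impact of a very experienced developer.
print(round(experience_factor(0), 2))    # -> 2.0
print(round(experience_factor(260), 2))  # ~five years -> close to 1.0
```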