This chapter presents the most important implementation details of the framework. First, the messages exchanged between the host and the FI logic are explained. The two following parts describe the inner workings of FIJI’s hardware and the software that handles configuration, instrumentation, and run-time execution of the FI. Finally, a small use case is introduced that is intended to allow new users to become familiar with FIJI and experiment with its various options.
4.1 Communication protocol between host and FIC
Figure 6 depicts the fields in a fault configuration message sent from the host to the FIC. It contains six bits per FIU for the configuration of two fault patterns. A number of padding bits is prepended to the message to align the FIC configuration data (timer values, design ID, …) to byte boundaries. The exact format depends on the parametrization of the FI hardware subsystem, i.e., the timer width and the number of FIUs.
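The amount of padding follows directly from the number of FIUs. The following minimal sketch computes it under the assumption that padding only has to round the fault pattern field up to whole bytes; the function names are hypothetical and not part of FIJI.

```python
def padding_bits(num_fius: int, bits_per_fiu: int = 6) -> int:
    """Bits needed to round the fault pattern field up to a whole number of bytes."""
    return (-num_fius * bits_per_fiu) % 8


def pattern_field_bytes(num_fius: int, bits_per_fiu: int = 6) -> int:
    """Size in bytes of the padded fault pattern field (hypothetical helper)."""
    total_bits = num_fius * bits_per_fiu + padding_bits(num_fius, bits_per_fiu)
    return total_bits // 8


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} FIU(s): {padding_bits(n)} padding bits, "
              f"{pattern_field_bytes(n)} byte(s)")
```

With four FIUs the pattern field occupies exactly three bytes and needs no padding, whereas five FIUs require two padding bits to fill four bytes.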
The following bytes specify attributes that apply to all instrumented nets for the sequence of phases defined by this configuration message. This data consists of two timer reload values determining the length of the COUNT and FAULT1 phases and a configuration byte. The latter determines the reset (R: “reset enable”) and trigger behavior (TE: “trigger enable”, XT: “external trigger”) of the sequence. Setting bit U instructs the FIC to discard the fault patterns and only send a status update back to the host.
The configuration message is concluded with a 16-bit design ID. The purpose of this ID is to prevent the unintentional use of a configuration mismatching the instrumented netlist. For the configuration to be actually applied, this ID must match the one embedded into the design at instrumentation time. The design ID is generated by the Instrumentation Tool by hashing the original DUT netlist and FIJI’s parametrization via VHDL constants. Additionally, the integrity of configuration messages is protected via an 8-bit CRC (CCITT polynomial x^8 + x^2 + x + 1).
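A bitwise CRC over this polynomial (0x07 in the usual hexadecimal notation) can be sketched as follows. The initial value of zero and the MSB-first bit order are assumptions made for illustration; the values actually used by FIJI are not stated here.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8 for x^8 + x^2 + x + 1, MSB first (init value is an assumption)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


if __name__ == "__main__":
    payload = bytes([0x12, 0x34, 0x56])      # arbitrary example bytes
    checksum = crc8(payload)
    # With init = 0, a message followed by its own CRC yields remainder zero,
    # which is how a receiver can validate the message.
    assert crc8(payload + bytes([checksum])) == 0
    print(f"CRC-8 = 0x{checksum:02X}")
```

With these parameters, a receiver can validate a message by running the same computation over payload and checksum and testing the result for zero.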
The messages returned by the hardware to the host are single-byte status words as depicted in Fig. 7 that describe the current state of the system. Additionally, the values of two fault detection nets are output; these may be used to detect the propagation of an injected error through the design and to verify that fault detection within the DUT is working. The integrity of the single-byte status word is protected by a parity bit.
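On the host side, checking such a parity bit is a one-line computation. The sketch below assumes even parity by default, since the parity type is not specified here; the function names are hypothetical.

```python
def parity_bit(byte: int, odd: bool = False) -> int:
    """Parity bit for one byte; even parity by default (an assumption)."""
    ones = bin(byte & 0xFF).count("1")
    bit = ones & 1              # makes the total number of ones even
    return bit ^ 1 if odd else bit


def status_is_valid(byte: int, received_parity: int, odd: bool = False) -> bool:
    """True if the received parity bit matches the status byte."""
    return parity_bit(byte, odd) == received_parity


if __name__ == "__main__":
    status = 0b10110000         # arbitrary example value, not a real status word
    print(parity_bit(status), status_is_valid(status, 1))
```

A single flipped bit in the status byte changes the number of ones by one and is therefore always detected, whereas two flipped bits cancel out.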
4.2 Fault injection hardware
The HDL wrapper generated by the instrumentation tool instantiates the modified netlist. This netlist has the same input and output ports as the original netlist, but in addition has the broken-up nets exposed as ports, with the original driver now connected to an output, and all the driven cells’ inputs connected to an input port.
By parameterizing the wrapper, the entire FI capability can be turned on and off. Either the FI hardware subsystem is instantiated for the instrumented nets to pass through, or each instrumented output is directly connected to the corresponding input effectively reconnecting the split nets (just outside the DUT). This allows the use of the same instrumented netlist of the DUT for FI tests as well as in the final design (i.e., without any FI capabilities).
Additionally, separation constraints can be set to produce a bitstream that contains the DUT and the FIJI logic in distinct physical blocks of the FPGA device and prohibit optimizations beyond DUT boundaries. That way any unintended influences between FIJI and the DUT are avoided. Figure 8 shows an exemplary floorplan with physical separation between the DUT’s logic and FIJI.
Details of the FI hardware subsystem can be seen in Fig. 9. In the remainder of this subsection, the functionality and the specifics of the major blocks will be described. The three major components are the UART that establishes the communication to the host PC, the FIUs that directly tap into the instrumented nets, and the FIC that reacts to configuration messages sent by the host and orchestrates the whole FI.
4.2.1 Fault injection controller
At runtime, the behavior of the FIC can be controlled via a serial interface. This interface consists of a data signal, a shift-enable signal indicating that the data line is valid, and a framing signal that indicates the start of a new byte. Additionally, an error signal is used to inform the FIC of detected transmission errors.
A finite state machine (FSM) in the FIC forwards the received data to its internal registers corresponding to the fields in the protocol for the general configuration information and to the FIUs for the fault pattern information. In addition to directing the received data, the FIC also handles the activation of the FIUs, and, if configured, the activation of a reset signal for the DUT according to the trigger and timer values.
In particular, the FIC can be instructed via the configuration message to defer execution of the fault patterns until an edge is registered on either a trigger signal coming from the DUT or an external trigger. The polarity of these edges can be configured at instrumentation time. Timing for the application of the fault patterns is controlled by a timer unit in the FIC. The width of this timer unit can be configured at instrumentation time and is loaded by the FIC with the transmitted timer values at runtime.
Furthermore, the FIC can be configured to generate a reset pulse of adjustable duration if the FIC-to-DUT reset is enabled at instrumentation time. The entire FI hardware subsystem can either be reset by a signal from an external port or from a net of the DUT (the latter, of course, only if the FIC-to-DUT reset capability is disabled).
The internal state of the FIC provides the majority of the status bits transmitted back to the host via the UART (cf. Fig. 7).
Finally, the FIC module also hosts the LFSR that is used by the FIUs to emulate a floating net caused by a stuck-open error. Its width and generator polynomial are selectable at instrumentation time. Each FIU uses a subset of its output bits to create the signal of the “floating” net.
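The behavior of such an LFSR can be illustrated with a small software model. The sketch below uses a Galois-style register with a hypothetical width of 4 bits and tap mask 0b1100; neither the structure nor these parameters are claimed to match the FIJI hardware, where both width and polynomial are chosen at instrumentation time.

```python
def lfsr_step(state: int, taps: int) -> int:
    """Advance a right-shifting Galois LFSR by one clock cycle."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= taps
    return state


def lfsr_period(seed: int, taps: int) -> int:
    """Number of steps until the register returns to its seed value."""
    state = lfsr_step(seed, taps)
    steps = 1
    while state != seed:
        state = lfsr_step(state, taps)
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical 4-bit example: visits all 15 nonzero states.
    print(lfsr_period(0b0001, 0b1100))
```

A maximal-length configuration such as this one cycles through every nonzero state before repeating, which keeps the pseudo-random bits handed to the FIUs well distributed.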
4.2.2 External communication interface
The intended use case for the FI hardware subsystem is to be controlled via a host external to the FPGA. An asynchronous serial interface allows bidirectional communication while requiring just two additional IO pins of the FPGA device and a justifiable amount of FPGA hardware resources.
The UART module handles the synchronization of the incoming data signal as well as the detection of the data framing (start and stop bits). Data is forwarded to the FIC via its serial interface; detected framing errors are signaled using the error line. Furthermore, the parallel outgoing data from the FIC is serialized and a parity bit appended before it is sent to the host, as a CRC for this direction of data transfer would be inefficient.
During normal operation, a configuration message that is sent by the host to the FIC results in an immediate CONF_DONE message by the FIC, followed by a READY message at the end of the fault phase, i.e., after the timer duration t1 has been counted down (cf. Section 3.4). Furthermore, an UNDERRUN message is sent if the host did not send a new configuration during the first injection phase as specified by t2.
As both t1 and t2 can be as short as a single clock cycle, the READY and UNDERRUN messages need to be buffered before being sent over the serial link. For this purpose, FIJI instantiates a small FIFO buffer that can hold three messages to be transmitted.
4.2.3 Fault injection units
For each instrumented net, the generated FI wrapper instantiates an FIU. The purpose of each FIU is to inject a fault into its corresponding net when instructed by the FIC. Each FIU can be configured to support either all fault models at runtime (stuck-at-0/1, delay, SEU, stuck-open) or only a single one. Restricting an FIU to a single fault model reduces the required hardware resources and avoids the path delay introduced by the larger multiplexer, which helps to maintain clock frequencies when instrumenting critical nets.
Each FIU contains two sets of six-bit registers: a shift register that is used to shift in the configuration issued by the FIC, and a pattern register holding the currently active fault patterns. The shift registers in all the FIUs are daisy-chained and filled by the FIC as it receives configuration data (cf. Fig. 6). A separate signal activated by the FIC instructs the FIUs to update the pattern register with the newly shifted-in data. This allows the current configuration to stay active while shifting in new data, as well as discarding an invalid configuration (e.g., if the message’s CRC is incorrect) before it affects the DUT.
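The interplay of shift and pattern registers can be modeled in a few lines. The following behavioral sketch assumes that bits enter each shift register at the LSB and leave at the MSB; the class and function names are hypothetical and the actual bit order in FIJI may differ.

```python
class FIU:
    """Behavioral model of one FIU's shift and pattern registers (6 bits each)."""

    def __init__(self) -> None:
        self.shift = 0      # receives new configuration data
        self.pattern = 0    # currently active fault patterns

    def shift_in(self, bit: int) -> int:
        """Shift one bit in; return the bit that falls out towards the next FIU."""
        carry = (self.shift >> 5) & 1
        self.shift = ((self.shift << 1) | (bit & 1)) & 0x3F
        return carry

    def update(self) -> None:
        """Copy the shifted-in data into the active pattern register."""
        self.pattern = self.shift


def shift_chain(fius: list, bits: list) -> None:
    """Clock a bit stream through the daisy-chained shift registers."""
    for bit in bits:
        carry = bit
        for fiu in fius:
            carry = fiu.shift_in(carry)


if __name__ == "__main__":
    chain = [FIU(), FIU()]
    shift_chain(chain, [1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1])
    print([f"{f.pattern:06b}" for f in chain])   # still the old patterns
    for f in chain:                              # only done for a valid message
        f.update()
    print([f"{f.pattern:06b}" for f in chain])
```

Skipping the `update()` calls models a discarded configuration: the shift registers hold the rejected data, but the active patterns remain untouched.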
A multiplexer that is controlled by the currently applied fault pattern bits selects which signal is fed into the DUT. This can be the value of the original net or any faulty version of the signal if a fault is to be injected. The FIU realizes each of the supported fault models as configured. To that end, the multiplexer can select between various flawed inputs, such as constant lows and highs (for stuck-at-0/1 errors), a register with the previous value of the original net (to emulate delay faults), and parts of the LFSR. A user-defined mask is used to customize the frequency and sequence for the simulated stuck-open outputs of the FIUs. This mask specifies which of the LFSR bits to AND together to form the faulty signal.
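The selection logic can be summarized as a small behavioral model. In the sketch below, the enumeration values are hypothetical (they are not the pattern encoding used by FIJI), and the SEU is modeled as an inversion of the original value, which is an assumption.

```python
from enum import Enum


class Fault(Enum):
    NONE = 0
    STUCK_AT_0 = 1
    STUCK_AT_1 = 2
    DELAY = 3
    SEU = 4
    STUCK_OPEN = 5


def fiu_output(fault: Fault, original: int, previous: int,
               lfsr_bits: int, mask: int) -> int:
    """Value fed into the DUT for one clock cycle.

    original  -- current value of the instrumented net
    previous  -- registered value of the net from the preceding cycle
    lfsr_bits -- LFSR bits visible to this FIU
    mask      -- selects which LFSR bits are ANDed together
    """
    if fault is Fault.STUCK_AT_0:
        return 0
    if fault is Fault.STUCK_AT_1:
        return 1
    if fault is Fault.DELAY:
        return previous
    if fault is Fault.SEU:
        return original ^ 1          # assumption: SEU modeled as a bit flip
    if fault is Fault.STUCK_OPEN:
        # AND of all selected LFSR bits: 1 only if every masked bit is set.
        return int((lfsr_bits & mask) == mask)
    return original


if __name__ == "__main__":
    print(fiu_output(Fault.STUCK_OPEN, 1, 0, lfsr_bits=0b1011, mask=0b0011))
```

Selecting more LFSR bits in the mask makes a high level on the "floating" net rarer, since all selected bits must be 1 at the same time.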
4.3 FIJI software components
The three main software parts of FIJI are written in Perl 5. For graphical user interfaces, they rely on Perl/Tk, while netlists are parsed and manipulated with the help of Verilog-Perl [27], a library written in Perl and C++ for parsing Verilog and SystemVerilog. Because Verilog-Perl could neither distinguish nor manipulate individual signals of concatenations or vectored nets, we added support for these features and contributed it back to the community in an iterative process with helpful reviews from the upstream maintainer Wilson Snyder. The first upstream release including these changes is version 3.440.
The four main Perl programs utilized in FIJI’s flow are depicted as reddish blocks with round corners in Fig. 3 and will be described in the following sections.
4.3.1 Setup tool
FIJI Setup (fiji_setup.pl, see Fig. 3) is a graphical configuration tool that provides the user with a convenient way to specify the various options of a FIJI project and to save it to, or restore it from, a text-based configuration file. Figure 10 presents a typical instance of FIJI Setup showing the tab used to configure the individual FIUs. At the top and bottom, there are some widgets that are shown for all tabs. Among other things, the status bar at the bottom shows the deviation of FPGA resources relative to a simple default configuration of FIJI (virtual resource factors). This allows the engineer to estimate the resource usage of changed settings without having to synthesize the whole design. The graphical representation of the system shown on the left of the window is also always visible and is dynamically rearranged to reflect the current configuration data. This diagram, like many other widgets in FIJI Setup, provides more detailed information by way of tooltips when the mouse hovers over the respective elements. Additionally, all input widgets validate user inputs and signal possible errors or inconsistencies, including their causes.
In the FIU tab, which is active in the screenshot, the engineer can create new FIUs or edit, rearrange, and delete existing ones. The select buttons open dialogs that allow searching for elements of a loaded netlist by simple substrings, typical globbing, or regular expressions. They are used to define the instrumented net of the respective FIU (i.e., the location where faults are to be injected) and to select its driver in case of ambiguity. The Model and LFSR mask settings refer to the fault model and selection of random bits as explained at the end of Section 4.2.3.
Additionally, the FIUs can be named individually so that they are easier to tell apart in the configuration file(s) and in later steps.
4.3.2 Instrumentation tool
The main task of the instrumentation tool (fiji_instrument.pl, see Fig. 3) is to actually perform the modifications of the DUT netlist according to the information in the FIJI Settings. In particular, it breaks up each selected net and inserts a saboteur in between as follows. User interaction is not required during this process; therefore, the tool was implemented as a command-line program.
For each [FIUn] entry in the settings file, it splits the corresponding net into an original and a modified net as shown in Fig. 11. The driver attached to the original net is then connected to a newly created output port of the module where it is instantiated, while the driven pins are connected via the modified net to a new input port. Both are passed on to the existing top-level entity, where they are further exported by adding them to the entity’s external interface. This modified netlist is written to a new file, along with a wrapper that instantiates and connects both the modified netlist and the parametrized FI hardware subsystem (which includes the FIC and FIUs) as shown in Fig. 2. The wrapper has essentially the same input and output ports as the original netlist, which allows existing pin constraints to be reused. It does, however, introduce a small number of additional pins needed by the FI subsystem (e.g., to communicate with the host). Additionally, the instrumentation tool produces a set of template constraint files for the chosen P&R tool to logically and/or physically separate the FI logic and the DUT netlist.
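The net-splitting step can be illustrated on a toy data model. The sketch below is written in Python rather than with Verilog-Perl, uses a hypothetical dictionary representation of a flat netlist, and borrows the `_ori_o` and `_inj_i` suffixes from the signal names of the demo design; the actual naming rules of the instrumentation tool are an assumption here.

```python
def split_net(netlist: dict, net: str) -> tuple:
    """Split `net` into an exported original net and an imported modified net.

    netlist -- {'ports': {name: direction},
                'nets': {name: {'driver': pin, 'loads': [pins]}}}
    """
    entry = netlist["nets"].pop(net)
    out_port = f"fiji_{net}_ori_o"   # carries the value of the original driver
    in_port = f"fiji_{net}_inj_i"    # feeds the formerly driven pins

    netlist["nets"][out_port] = {"driver": entry["driver"], "loads": []}
    netlist["nets"][in_port] = {"driver": None, "loads": list(entry["loads"])}
    netlist["ports"][out_port] = "output"
    netlist["ports"][in_port] = "input"
    return out_port, in_port


if __name__ == "__main__":
    toy = {
        "ports": {"clk": "input", "q": "output"},
        "nets": {"s_data": {"driver": "u1.Q", "loads": ["u2.D", "u3.A"]}},
    }
    print(split_net(toy, "s_data"))
    print(toy["ports"])
```

After the split, the original driver no longer reaches its loads inside the netlist; the connection exists only through whatever is attached between the two new ports on the outside.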
All steps of the instrumentation need to take busses into account. Moreover, FIJI supports instrumentation of multiple bits of a single bus (with one dedicated FIU per bit). To test our Verilog-Perl changes required to accomplish that as well as the instrumentation code itself, several unit tests were set up consisting of minimal netlists and associated FIJI Settings. These tests cover many different combinations of instrumentation targets (e.g., pins, internal nets) and drivers of these targets which helped to find many bugs in corner cases.
First, the unit tests instrument the netlists with the instrumentation tool. The instrumented netlists are quickly checked for syntax errors and the like by synthesizing them with Synplify and filtering out benign warnings. In addition to the syntax check, a behavioral simulation tries to find discrepancies between an instrumented and an untouched entity of the respective netlist. To that end, the test instantiates both in a testbench and compares their outputs when fed auto-generated input. The tests are not exhaustive but very effective, since instrumentation bugs usually affect the signal path in a very direct manner; more formal techniques are therefore unnecessary in this particular use case.
4.3.3 Execution engine
While the exact timing of the individual FI phases is handled by the hardware, the FIJI software on the host PC controls the broader aspects of the execution by providing FI configurations to the hardware (cf. Section 3.4). It is therefore responsible for supplying the FIC with new patterns to sustain a cycle-accurate FI if this is desired. This will be discussed in more detail in Section 5.5.
There are two options available to control the FIJI logic at run time (as can be seen in Fig. 3). The FIJI Execution Engine (FIJIEE) tool is a command-line tool which facilitates downloading pre-defined or random test patterns but can also be controlled interactively. The FIJIEE GUI tool (shown in Fig. 12) provides a graphical user interface for roughly the same functionality. Both communicate with the FIJI logic in the target FPGA via serial connection (UART) to download test patterns and read back status information.
The FIJIEE GUI can create a sequence of fault patterns to be downloaded one after another. For each test pattern, the timer values can be changed, the reset and trigger options enabled, and the individual FIUs configured. Such a sequence of patterns, together with settings applying to the whole sequence (e.g., conditions when to halt the test execution), can be saved to a file and reloaded later or executed by the command-line tool (e.g., in automated tests).
Both applications support a manual mode where the parameters of a single fault configuration can be entered and executed. Additionally, random tests can be set up where the user determines the global probability of the various fault models as well as lower and upper bounds on t1 and t2. The probabilities are then used to determine the state of the FIUs in each message sent in two possible ways: Either a single FIU is selected at random and armed according to the global probabilities, or each FIU is subject to possible faults according to these probabilities, which potentially leads to multiple injected faults in a single phase.
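The two selection modes can be sketched as follows. This is not the FIJIEE implementation: the function name, the dictionary layout, and the simplification to one fault model per FIU and message are assumptions made for illustration.

```python
import random


def random_config(num_fius, probs, t1_range, t2_range, single=True, rng=random):
    """Draw one random fault configuration.

    probs  -- {fault model name: probability}; the remaining probability
              mass means "no fault"
    single -- True: arm one randomly chosen FIU;
              False: draw independently for every FIU
    """
    def draw():
        r = rng.random()
        acc = 0.0
        for model, p in probs.items():
            acc += p
            if r < acc:
                return model
        return "none"

    if single:
        fius = ["none"] * num_fius
        fius[rng.randrange(num_fius)] = draw()
    else:
        fius = [draw() for _ in range(num_fius)]

    return {
        "t1": rng.randint(*t1_range),   # bounds are inclusive
        "t2": rng.randint(*t2_range),
        "fius": fius,
    }


if __name__ == "__main__":
    probs = {"stuck_at_0": 0.2, "stuck_at_1": 0.2, "seu": 0.1}
    print(random_config(4, probs, (10, 100), (1, 50), single=True))
    print(random_config(4, probs, (10, 100), (1, 50), single=False))
```

In the first mode at most one FIU is armed per message, while the second mode may arm several FIUs at once, with the expected number of simultaneous faults growing with the number of FIUs.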
Once an FI test has been completed (either a complete sequence has been downloaded, a fault detect line reported an error, the user aborted the sequence or manual downloading, or a transmission error occurred), the current test run can be exported by the FIJIEE GUI for reproducing it at a later time in hardware or in simulation.
For simulations, two ways are supported: For RTL simulation, FI logic templates can be created that have to be integrated into the various RTL modules manually. Additionally, a scheduling process that controls these templates using hierarchical identifiers is exported.
Alternatively, FIJIEE can also prepare a gate-level simulation for re-executing the test patterns that ran in actual hardware. To that end, FIJIEE tools are able to export the executed tests as a VHDL architecture for the top-level entity of the FI logic. This architecture replaces the FIC and the FIUs with a simulation-only description that sets the modified net outputs according to the timing of the test run previously executed in hardware.
4.4 Demo design
FIJI was put to practical use in an FPGA-based safety demonstrator containing a triplicated MC8051 soft CPU core that allows for limited N-version programming [28]. As this demonstrator is quite complex and rather heavyweight concerning synthesis tool runtimes, the first public release of FIJI (cf. “Availability of data and materials” section) also contains a reduced version of this demonstrator as a demo design. This design is intended to guide the user through the first steps with the FIJI framework.
The demo design consists of a simple VGA controller that moves a small airplane sprite across the screen. Although initially designed for a few low-cost FPGA boards only (i.e., Terasic DE0 (Altera/Intel Cyclone III), Digilent Basys 3 (Xilinx Artix-7) and Zybo (Xilinx Zynq-7010)), it is easily portable to other development boards that provide a DIP switch, at least two buttons, and a VGA connector.
An overview of the design is shown in Fig. 13. The design contains a module that is responsible for generating the VGA timing signals, as well as row and column information for the sprite engine that moves the airplane image across the screen. The outputs of this sprite engine are the red, green, and blue color signals for the VGA interface. Triple modular redundancy (TMR) has been applied to the sprite engine, with majority voting over each color signal. The voter can be disabled using a DIP switch.
By instrumenting nets in the netlist of this demo design, the user can explore the effects of failures injected into different portions of the design. For example, errors in the VGA timing controller cannot be masked by TMR, making this design unit a single point of failure. Faults in one of the three sprite engine instances can be tolerated without effect if the majority voter is enabled.
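The masking effect of the voter can be reproduced with a one-bit model. The sketch below assumes that a disabled voter simply passes the first replica through; how the demo design behaves with the voter switched off is not specified here, so this choice is hypothetical.

```python
def vote(a: int, b: int, c: int, enabled: bool = True) -> int:
    """Bitwise 2-out-of-3 majority vote over three replica outputs."""
    if not enabled:
        return a            # assumption: replica 0 is passed through
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 1
    faulty = good ^ 1       # a fault in one sprite engine flips its output
    print(vote(faulty, good, good))                  # fault is masked
    print(vote(faulty, good, good, enabled=False))   # fault is visible
```

A single faulty replica is outvoted by the other two, whereas two simultaneously faulty replicas would determine the output.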
Figure 14 shows the behavior of the demo design in simulation under the injection of each supported fault type. The respective faults are injected into the MSB output signal of the sprite engine’s shift register that holds the current line of the airplane image. The original value of this signal is denoted as fiji_s_sprite_line_31_ori_o in Fig. 14, while the faulty versions of this signal are denoted as fiji_s_sprite_line_31_inj_i. In the demo design (see Fig. 13), this shift register output is used to generate the values of the RGB output registers at each VGA pixel clock cycle. The blue output register is displayed in Fig. 14 as s_blue_tmr_partitions_1. In this example, the injected stuck-at and stuck-open faults propagate to the design’s output, i.e., they affect the VGA signal. Moreover, the example illustrates that in some cases (here, when a delay fault or an SEU fault is injected) a fault is masked by the state of the DUT. In these cases, the fault effect is visible on the shift register output but does not propagate to the design’s boundary.