This Special Issue of the EURASIP Journal on Embedded Systems is intended to present innovative methods, tools, design methodologies, and frameworks for the algorithm-architecture matching approach in the design flow, including system-level design and hardware/software codesign, RTOS, system modeling and rapid prototyping, system synthesis, design verification, and performance analysis and estimation.
Today's typical sequential design flows are reaching their limits due to:
(i) The complexity of today's systems designed with the emerging submicron technologies for integrated circuit manufacturing
(ii) The intense pressure on the design cycle time in order to reach shorter time-to-market and reduce development and production costs
(iii) The strict performance constraints that have to be reached in the end, typically low and/or guaranteed application execution time, integrated circuit area, and overall system power dissipation.
Because such a design methodology considers the system as a whole, this special issue also covers the following topics:
(i) New and emerging architectures: SoC, MPSoC, configurable computing (ASIPs), (dynamically) reconfigurable systems using FPGAs
(ii) Smart sensors: audio and image sensors for high performance and energy efficiency
(iv) Resource management techniques for real-time operating systems in a codesign framework
(v) Systems and architectures for real-time image processing
(vi) Formal models, transformations, and architectures for reliable embedded system design.
We received 30 submissions of which we eventually accepted 17 for publication.
The paper entitled "Multicore software defined radio architecture for GNSS receiver signal processing" by H. Hurskainen et al. describes a multicore Software Defined Radio (SDR) architecture for Global Navigation Satellite System (GNSS) receiver implementation. Three GNSS SDR architectures are discussed: a hardware-based SDR that is feasible for embedded devices but relatively expensive; a pure SDR approach that has a high level of flexibility and a low bill of materials but is not yet suited for handheld applications; and a novel architecture that uses a programmable array of multiple processing cores and exhibits both flexibility and potential for mobile devices.
The paper entitled "An open framework for rapid prototyping of signal processing applications" by M. Pelcat et al. presents an open source Eclipse-based framework which aims to facilitate the exploration and development of signal processing applications. The framework includes a generic graph editor (Graphiti), a graph transformation library (SDF4J), and an automatic mapper/scheduler tool with simulation and code generation capabilities (PREESM). The input of the framework is composed of a scenario description and two graphs: one graph describes an algorithm and the second graph describes an architecture. As an example, a prototype of a 3GPP Long-Term Evolution (LTE) algorithm on a multicore digital signal processor is built, illustrating both the features and the capabilities of this framework.
The paper entitled "Run-time HW/SW scheduling of data flow applications on reconfigurable architectures" by F. Ghaffari et al. presents an efficient dynamic, run-time hardware/software scheduling approach. This scheduling heuristic consists of mapping online the different tasks of a highly dynamic application in such a way that the total execution time is minimized. The scheduling method is applied to several image processing applications. The presented experiments include simulation and synthesis results on a Virtex V-based platform. These results show better performance than existing methods.
The paper entitled "Techniques and architectures for hazard-free semiparallel decoding of LDPC codes" by M. Rovini et al. describes three different techniques to properly reschedule the decoding updates, based on the careful insertion of "idle" cycles, to prevent the hazards of the pipeline mechanism in LDPC decoding. Along with these techniques, different semiparallel architectures of a layered LDPC decoder suitable for use with them are analyzed. Taking the LDPC codes for the wireless local area network (IEEE 802.11n) as a case study, a detailed analysis of the performance attained with the proposed techniques and architectures is reported, and results of the logic synthesis on a 65 nm low-power CMOS technology are shown.
The paper entitled "OLLAF: a fine grained dynamically reconfigurable architecture for OS support" by S. Garcia and B. Granado presents OLLAF, a fine-grained dynamically reconfigurable architecture (FGDRA) specially designed to efficiently support an OS. The studies presented show the contribution of this architecture in terms of hardware context management and preemption support, as well as the reduction in context management and preemption overhead that can be obtained by using OLLAF instead of a classical FPGA.
The paper entitled "Trade-off exploration for target tracking application in a customized multiprocessor architecture" by J. Khan et al. presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for multiple target tracking (MTT) in automotive applications. The paper explains how the MTT application is designed and profiled to partition it among different processors. It also explains how different optimizations were applied to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization, resulting in a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the real-time constraints of the given application.
The paper entitled "A prototyping virtual socket system-on-platform architecture with a novel ACQPPS motion estimator for H.264 video encoding applications" by Y. Qiu and W. M. Badawy presents a novel adaptive crossed quarter polar pattern search (ACQPPS) algorithm that realizes an enhanced inter prediction for H.264. Moreover, an efficient prototyping system-on-platform architecture is presented, which can be used to realize an H.264 baseline profile encoder with the support of an integrated ACQPPS motion estimator and related video IP accelerators. The implementation results show that the ACQPPS motion estimator can achieve very high estimated image quality, comparable to that of the full search method in terms of peak signal-to-noise ratio (PSNR), while keeping the complexity at an extremely low level.
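For readers unfamiliar with the quality metric used in this and the following paper, PSNR is derived from the mean squared error between an original and a reconstructed frame. The following is a minimal sketch only, assuming 8-bit samples (peak value 255) and frames flattened into equal-length lists; the function name `psnr` is illustrative and is not taken from either paper.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the motion-compensated prediction is closer to the original frame, which is why a fast search is typically judged by how little PSNR it loses relative to full search.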
The paper entitled "FPSoC-based architecture for a fast motion estimation algorithm in H.264/AVC" by O. Ndili and T. Ogunfunmi presents an architecture based on a modified hybrid fast motion estimation (FME) algorithm. The presented results show that the modified hybrid FME algorithm outperforms previous state-of-the-art FME algorithms, while its losses in PSNR performance and computation time, when compared with full search motion estimation (FSME), are insignificant.
The paper entitled "FPGA accelerator for wavelet-based automated global image registration" by B. Li et al. presents an architecture for wavelet-based automated global image registration (WAGIR), which is fundamental for most remote sensing image processing algorithms and extremely computation-intensive. The authors propose a block wavelet-based automated global image registration (BWAGIR) architecture based on a block resampling scheme. The architecture with 1 processing unit outperforms the CL cluster system with 1 node by at least 7.4X, and the MPM massively parallel machine with 1 node by at least 3.4X. With 5 units, BWAGIR achieves a speedup of about 3X against the CL with 16 nodes and a speed comparable to that of the MPM with 30 nodes.
The paper entitled "A system for an accurate 3D reconstruction in video endoscopy capsule" by A. Kolar et al. presents the hardware and software development of a wireless multispectral vision sensor which allows transmitting a 3D reconstruction of a scene in real time. The paper also presents a method to acquire the images at a 25 frames/s video rate with a discrimination between the texture and the projected pattern. This method uses an energetic approach, a pulsed projector, and an original 64 × 64 CMOS image sensor with programmable integration time. Multiple images are taken with different integration times to obtain an image of the pattern which is more energetic than the background texture. Also presented is a 3D reconstruction processing that allows a precise and real-time reconstruction. This processing, which is specifically designed for an integrated sensor and its integration in an FPGA-like device, has a low power consumption compatible with a VCE examination. The paper presents experimental results with the realization of a large-scale demonstrator using an SOPC prototyping board.
The paper entitled "Performance evaluation of UML2-modeled embedded streaming applications with system-level simulation" by T. Arpinen et al. presents an efficient method to capture an abstract performance model of a streaming data real-time embedded system (RTES). This method uses a model driven architecture (MDA) approach. The goal of the performance modeling and simulation is to achieve early estimates of processing element (PE), memory, and on-chip network utilization, task response times, and other information that is used for design-space exploration. UML2 is used for performance model specification. The application workload modeling is carried out using UML2 activity diagrams. The platform is described with structural UML2 diagrams and model elements annotated with performance values. The focus here is on modeling streaming data applications. It is characteristic of streaming applications that a long sequence of data items flows through a stable set of computation steps (tasks) with only occasional control messaging and branching.
The paper entitled "Cascade boosting-based object detection from high-level description to hardware implementation" by K. Khattab et al. presents an implementation of boosting-based object detection algorithms, which are considered the fastest accurate object detection algorithms today but remain challenging to implement in a real-time solution. A new parallel architecture, which exploits the parallelism and the pipelining in these algorithms, is proposed. The method used to develop this architecture is based on a high-level SystemC description. SystemC enables PC simulation that allows simple and fast testing, and it leaves the structure open to any kind of hardware or software implementation since SystemC is platform independent.
The paper entitled "Very low memory wavelet compression architecture using strip-based processing for implementation in wireless sensor networks" by L. W. Chew et al. presents a hardware architecture for strip-based image compression using the SPIHT algorithm. The lifting-based 5/3 DWT, which supports a lossless transformation, is used in the proposed work. The wavelet coefficients output from the DWT module are stored in a strip buffer at predefined locations using a new 1D addressing method for SPIHT coding. In addition, a modification of the traditional SPIHT algorithm is presented. In order to improve the coding performance, a degree-0 zerotree coding methodology is applied during the implementation of SPIHT coding. To facilitate the hardware implementation, the proposed SPIHT coding eliminates the use of lists in its set-partitioning approach and is implemented in two passes. The proposed modification reduces both the memory requirement and the complexity of the hardware coder.
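To illustrate why a lifting-based 5/3 DWT can be lossless, the sketch below implements one level of the reversible integer 5/3 lifting transform (the form standardized in JPEG 2000) on a 1D signal, together with its inverse. It is a generic illustration and not the authors' hardware design; the function names, the restriction to even-length inputs, and the symmetric boundary handling are assumptions made for this example.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i into the range [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i

def forward_53(x):
    """One level of the reversible 5/3 lifting DWT on an even-length integer list."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even length of at least 2")
    half = n // 2
    # Predict step: detail (high-pass) coefficients from the odd samples.
    d = [x[2 * k + 1] - ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
         for k in range(half)]
    # Update step: approximation (low-pass) coefficients from the even samples.
    s = [x[2 * k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2) for k in range(half)]
    return s, d

def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the two lifting steps."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    # Undo the update step to recover the even samples.
    for k in range(half):
        x[2 * k] = s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
    # Undo the predict step to recover the odd samples.
    for k in range(half):
        x[2 * k + 1] = d[k] + ((x[2 * k] + x[_mirror(2 * k + 2, n)]) >> 1)
    return x
```

Because each lifting step adds an integer quantity that the inverse can recompute and subtract, the rounding introduced by the shifts never loses information, and only additions and shifts are needed, which suits a low-memory hardware coder.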
The paper entitled "Data cache-energy and throughput models: design exploration for embedded processors" by M. Y. Qadri and K. D. McDonald-Maier proposes cache-energy models. These models strive to provide a complete application-based analysis. As a result, they could facilitate the tuning of a cache and an application according to a given power budget. The models presented in this paper are an improved extension of energy and throughput models for a data cache, in terms of the leakage energy, which is indicated for the entire processor rather than simply the cache on its own. The energy model covers the per-cycle energy consumption of the processor. The leakage energy statistics of the processor in the data sheet cover the cache and all peripherals of the chip. The model is also improved by refining the miss rate, which has been split into two terms: a read miss rate and a write miss rate. This was done because the read energy and write energy components correspond to the respective miss rate contributions of the cache. The model-based approach presented was used to predict the processor's performance with sufficient accuracy. An example application for design exploration, which could facilitate the identification of an optimal cache configuration and code profile for a target application, is discussed.
The paper entitled "Hardware architecture for pattern recognition in gamma-ray experiment" by S. Khatchadourian et al. presents an intelligent way of triggering data in the HESS (high energy stereoscopic system) phase II experiment. The system relies on the utilization of image processing algorithms in order to increase the trigger efficiency. The proposed trigger scheme is based on a neural system that extracts the interesting features of the incoming images and rejects the background more efficiently than classical solutions. The paper presents the basic principles of the algorithms as well as their hardware implementation in FPGAs.
The paper entitled "Evaluation and design space exploration of a time-division multiplexed NoC on FPGA for image analysis applications" by L. Zhang et al. presents an adaptable fat tree NoC architecture for field programmable gate arrays (FPGAs) designed for image analysis applications. The authors propose a dedicated communication architecture for image analysis algorithms. This communication mechanism is a generic NoC infrastructure dedicated to dataflow image processing applications, mixing circuit-switching and packet-switching communications. The complete architecture integrates two dedicated communication architectures and reusable IP blocks. Communications are based on the NoC concept to support the high bandwidth required for a large number and variety of data. For data communication inside the architecture, an efficient time-division multiplexed (TDM) architecture is proposed. This NoC uses a fat tree (FT) topology with virtual channels (VC) and flit packet-switching with fixed routes. Two versions of the NoC are presented in this paper. The results of their implementations and their design space exploration (DSE) on Altera Stratix II are analyzed, compared with a point-to-point communication, and illustrated with a multispectral image application.
The paper entitled "Efficient processing of a rainfall simulation watershed on an FPGA-based architecture with fast access to neighborhood pixels" by L. S. Yeong et al. describes a hardware architecture to implement the watershed algorithm using rainfall simulation. The speed of the architecture is increased by utilizing a multiple memory bank approach to allow parallel access to the neighborhood pixel values. In a single read cycle, the architecture is able to obtain all five values of the center and four neighbors for a 4-connectivity watershed transform. The proposed rainfall watershed architecture consists of two parts. The first part performs the arrowing operation and the second part assigns each pixel to its associated catchment basin.
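Reading a pixel and its four neighbors in one cycle requires that the five values never reside in the same memory bank. The paper's actual memory mapping is not reproduced here; the sketch below shows one well-known skewed assignment, bank = (x + 2y) mod 5, which has this property. The function names and the choice of five banks are assumptions made for illustration.

```python
def bank_of(x, y, num_banks=5):
    """Bank index for pixel (x, y) under a skewed (x + 2y) mod num_banks mapping."""
    return (x + 2 * y) % num_banks

def neighborhood_banks(x, y):
    """Banks holding the center pixel and its 4-connected neighbors."""
    coords = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [bank_of(cx, cy) for cx, cy in coords]
```

Relative to the center bank b, the left and right neighbors fall in banks b - 1 and b + 1, and the upper and lower neighbors in banks b - 2 and b + 2 (all modulo 5), so the five accesses never conflict and can be issued in parallel.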
Markus Rupp, Ahmet T. Erdogan, Bertrand Granado
Authors and Affiliations
Institute of Communications and Radio-Frequency Engineering (INTHFT), Vienna University of Technology, 1040, Vienna, Austria
The School of Engineering and Electronics, The University of Edinburgh, Edinburgh, EH9 3JL, UK
ENSEA, Cergy-Pontoise University, boulevard du Port-95011 Cergy-Pontoise Cedex, France
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.