FPGA Supercomputing Platforms, Architectures, and Techniques for Accelerating Computationally Complex Algorithms
© V. Sriram and M. Leeser. 2009
Received: 6 May 2009
Accepted: 6 May 2009
Published: 30 July 2009
This is a special issue on FPGA supercomputing platforms, architectures, and techniques for accelerating computationally complex algorithms. The issue covers a broad range of applications in which field programmable gate arrays (FPGAs) are successfully used to accelerate processing. It also provides researchers' insights into the challenges of using FPGAs successfully. The applications discussed include motor control, radar processing, face recognition, seismic data processing, and random number generation. The techniques discussed by the authors include partitioning between a CPU and FPGA hardware, reducing bit-width to improve performance, interfacing to analog signals, and using high-level tools to develop applications.
Two challenges that face many users of reconfigurable hardware are interfacing to the analog domain and easing the job of developing applications. In the paper entitled "Prototyping Advanced Control Systems on FPGA," the authors present a rapid prototyping platform and design flow suitable for the design of on-chip motion controllers and other SoCs that need analog interfacing. The target hardware platform consists of a customized FPGA design for the Amirix AP1000 PCI FPGA board coupled with a multichannel analog I/O daughter card. The design flow uses Xilinx System Generator in MATLAB/Simulink for system design and test, and Xilinx Platform Studio for SoC integration. This approach has been applied to the analysis, design, and hardware implementation of a vector controller for 3-phase AC induction motors.
Image processing is an application area that exhibits a great deal of parallelism. In the work entitled "Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing," the authors investigate the use of a high-performance reconfigurable supercomputer built from both general-purpose processors and FPGAs. These architectures allow a designer to exploit both fine-grained and coarse-grained parallelism, achieving high degrees of speedup. The authors describe how backprojection, used to reconstruct Synthetic Aperture Radar (SAR) images, is implemented on a high-performance reconfigurable computer system. The results show an overall application speedup of 50 times.
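As a rough illustration of why backprojection parallelizes so well, the following is a minimal sketch, not the authors' implementation. It assumes a heavily simplified, real-valued form of the algorithm with nearest-bin lookup and no phase correction; the function name, arguments, and geometry are hypothetical. Each pixel is computed independently of every other pixel (coarse-grained parallelism), and the contributions of the individual pulses to one pixel can be accumulated concurrently (fine-grained parallelism).

```python
import math

def backproject(profiles, antenna_positions, grid, bin_size):
    """Simplified backprojection (illustrative assumption, not the paper's design).

    profiles[k][b]        sample of pulse k at range bin b
    antenna_positions[k]  (x, y) position of the antenna for pulse k
    grid                  list of (x, y) pixel positions to reconstruct
    bin_size              range covered by one bin, same units as positions
    """
    image = []
    for px, py in grid:                      # pixels are mutually independent
        acc = 0.0
        for profile, (ax, ay) in zip(profiles, antenna_positions):
            distance = math.hypot(px - ax, py - ay)
            b = round(distance / bin_size)   # nearest range bin for this pixel
            if 0 <= b < len(profile):
                acc += profile[b]            # accumulate this pulse's contribution
        image.append(acc)
    return image
```

With a single point target, every pulse contributes to the pixel at the target position, so that pixel receives the largest accumulated value.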
Neural networks have been used successfully to detect faces in video images. In the paper entitled "Performance Analysis of Bit-Width Reduced Floating-Point Arithmetic Units in FPGAs: Case Study of Neural Network-based Face Detector," the authors describe the implementation of an FPGA-based face detector using a neural network and bit-width reduced floating-point arithmetic units (FPUs). The FPUs and neural network are designed using MATLAB and VHDL, and the two implementations are compared. The authors demonstrate that reducing the number of bits used in arithmetic computation can yield significant improvements in area, speed, and power with a small sacrifice in accuracy.
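The effect of reducing bit-width can be explored in software before committing to hardware. The sketch below is an assumption-laden illustration rather than the authors' FPU design: it models only a shortened mantissa (the exponent range is left unrestricted), and the helper names quantize_mantissa and neuron are hypothetical. Every intermediate result of a single neuron's multiply-accumulate is rounded to the chosen number of significant bits.

```python
import math

def quantize_mantissa(x, mantissa_bits):
    """Round x to `mantissa_bits` significant bits (leading bit included)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def neuron(inputs, weights, bias, mantissa_bits):
    """Weighted sum with every intermediate rounded to the reduced precision."""
    q = lambda v: quantize_mantissa(v, mantissa_bits)
    acc = q(bias)
    for x, w in zip(inputs, weights):
        product = q(q(x) * q(w))         # reduced-precision multiply
        acc = q(acc + product)           # reduced-precision accumulate
    return acc
```

Comparing neuron(..., mantissa_bits=53) against smaller settings gives a quick estimate of how much accuracy a given bit-width costs for a particular set of weights.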
The oil and gas industry has a huge demand for high-performance computing on extremely large volumes of data, a task to which FPGAs are well matched. Reduced-precision arithmetic operations can greatly decrease the area cost and I/O bandwidth of an FPGA-based design, supporting increased parallelism and achieving high performance. In the work entitled "Accelerating Seismic Computations Using Customized Number Representations on FPGAs," the authors present a tool to determine the minimum numeric precision that still provides acceptable accuracy for seismic applications. By using the minimized number format, the authors demonstrate speedups ranging from 5 to 7 times, including overhead costs such as the transfer time to and from the general-purpose processors. With improved bandwidth between the CPU and FPGA, the authors show that a speedup of 48 times is possible.
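The idea of searching for the smallest acceptable precision can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' tool: it uses a fixed-point format with a variable number of fractional bits, a small filter kernel as a stand-in for a seismic computation, and a maximum absolute error threshold as the accuracy criterion. The names to_fixed, kernel, and min_frac_bits are hypothetical.

```python
def to_fixed(x, frac_bits):
    """Round x to a fixed-point value with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def kernel(samples, coeffs, frac_bits=None):
    """Sliding weighted sum; quantizes every step when frac_bits is given."""
    q = (lambda v: v) if frac_bits is None else (lambda v: to_fixed(v, frac_bits))
    out = []
    for i in range(len(samples) - len(coeffs) + 1):
        acc = 0.0
        for j, c in enumerate(coeffs):
            acc = q(acc + q(q(samples[i + j]) * q(c)))
        out.append(acc)
    return out

def min_frac_bits(samples, coeffs, tolerance, max_bits=32):
    """Smallest fractional bit count whose max error stays within tolerance."""
    reference = kernel(samples, coeffs)          # double-precision baseline
    for bits in range(1, max_bits + 1):
        approx = kernel(samples, coeffs, bits)
        error = max(abs(a - b) for a, b in zip(reference, approx))
        if error <= tolerance:
            return bits
    return None                                  # no width up to max_bits suffices
```

A linear scan is used instead of a binary search because the error is not guaranteed to decrease monotonically as bits are added for a particular data set.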
Many applications require large quantities of uncorrelated random numbers. In the paper entitled "An FPGA Implementation of a Parallelized MT19937 Uniform Random Number Generator," Vinay Sriram and David Kearney present a fast uniform random number generator implemented in reconfigurable hardware that achieves higher throughput and better area efficiency than previous implementations. The design presented, which generates up to 624 random numbers in parallel, has a throughput more than 15 times that of previously published results.
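For readers unfamiliar with MT19937, the figure of 624 corresponds to the number of 32-bit words in the generator's state. The sketch below follows the standard software definition of MT19937 and is not the parallel hardware architecture of the paper; the function names are chosen here for illustration. It shows that once the state has been regenerated (the "twist"), each of the 624 words is tempered independently of the others, which is the step that lends itself most naturally to parallel output.

```python
N, M = 624, 397
MATRIX_A = 0x9908B0DF
UPPER_MASK, LOWER_MASK = 0x80000000, 0x7FFFFFFF

def seed_state(seed):
    """Standard MT19937 initialization from a single 32-bit seed."""
    state = [seed & 0xFFFFFFFF]
    for i in range(1, N):
        prev = state[-1]
        state.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
    return state

def twist(state):
    """Regenerate all 624 state words, updating a copy in index order."""
    new = list(state)
    for i in range(N):
        y = (new[i] & UPPER_MASK) | (new[(i + 1) % N] & LOWER_MASK)
        new[i] = new[(i + M) % N] ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)
    return new

def temper(y):
    """Output transformation; depends only on the single word y."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & 0xFFFFFFFF

def next_block(state):
    """Return the new state and the 624 outputs derived from it."""
    state = twist(state)
    return state, [temper(word) for word in state]
```

Calling next_block(seed_state(5489)) yields a block whose first value is 3499211612, the first output of the reference generator with its default seed.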
This collection of papers represents an overview of active research in the field of reconfigurable hardware applications and techniques.
Vinay Sriram and Miriam Leeser
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.