
Design of a Real-Time Face Detection Parallel Architecture Using High-Level Synthesis


We describe a high-level synthesis (HLS) implementation of a parallel architecture for face detection. The chosen face detection method is the well-known Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution operations. We rely on dataflow modelling of the algorithm and use a high-level synthesis tool to specify the local dataflows of our Processing Element (PE): inter-PE communication, fine scheduling of the successive convolutions, and memory distribution and bandwidth are all described in C. Using this approach, we explore several implementation alternatives to find a compromise between the processing speed and the area of the PE. We then build a parallel architecture composed of a ring of PEs and a FIFO memory, which constitutes a generic architecture capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz processes 127 QVGA images per second or 35 VGA images per second.
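The throughput figures quoted above can be turned into a rough per-pixel cycle budget. The sketch below is only back-of-the-envelope arithmetic, not part of the authors' design: it assumes the standard frame sizes QVGA = 320x240 and VGA = 640x480, treats the 80 MHz clock as shared by the whole ring, and ignores any overhead such as image loading or multi-scale processing. The function names `pixels_per_second` and `cycles_per_pixel` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Input pixels that must be consumed per second for a given frame size
 * and frame rate. The frame dimensions are assumptions (standard QVGA
 * and VGA sizes), not values stated in the abstract. */
uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Clock cycles available per input pixel at the given clock frequency.
 * This is a budget for the ring as a whole, not for a single PE. */
double cycles_per_pixel(double clock_hz, uint32_t width, uint32_t height,
                        uint32_t fps)
{
    uint64_t pps = pixels_per_second(width, height, fps);
    assert(pps > 0); /* guard against division by zero */
    return clock_hz / (double)pps;
}
```

Under these assumptions, `cycles_per_pixel(80e6, 320, 240, 127)` gives about 8.2 cycles per pixel and `cycles_per_pixel(80e6, 640, 480, 35)` about 7.4, so the two reported frame rates correspond to a similar pixel throughput of roughly 10 million input pixels per second.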


Author information



Corresponding author

Correspondence to Franck Mamalet.


About this article

Cite this article

Farrugia, N., Mamalet, F., Roux, S. et al. Design of a Real-Time Face Detection Parallel Architecture Using High-Level Synthesis. J Embedded Systems 2008, 938256 (2008).



Keywords

  • Convolution
  • Processing Element
  • Face Detection
  • Parallel Architecture
  • Convolution Operation