From: Performance Analysis of Bit-Width Reduced Floating-Point Arithmetic Units in FPGAs: A Case Study of Neural Network-Based Face Detector
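| Bit-width | Calculation: MRRE | Calculation: ARRE | Experiment: max |
|-----------|-------------------|-------------------|-----------------|
| FPU32     | 4E-05             | 2.89E-05          | 1.93E-05        |
| FPU24     | 0.0026            | 0.0018            | 0.0012          |
| FPU20     | 0.0410            | 0.0296            | 0.0192          |
| FPU18     | 0.1641            | 0.1184            | 0.0766          |
| FPU16     | 0.6560            | 0.4733            | 0.2816          |
| FPU14     | 2.62              | 1.891             | 0.9872          |
| FPU12     | 10.4              | 7.5256            | 1.0741          |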