# Embedded tracking algorithm based on multi-feature crowd fusion and visual object compression

- Zheng Wenyi^{1, 2} (Email author)
- Dong Decun^{1}

**2016**:16

https://doi.org/10.1186/s13639-016-0053-7

© The Author(s). 2016

**Received:** 9 June 2016

**Accepted:** 7 September 2016

**Published:** 21 September 2016

## Abstract

Low accuracy and poor real-time performance when tracking moving objects in dynamic and complex environments have become the bottleneck of target location and tracking. To improve the positioning accuracy and the quality of the tracking service, we propose an embedded tracking algorithm based on multi-feature crowd fusion and visual object compression. On the one hand, the optimal feature matching method is selected according to the features of the target, and a multi-feature crowd fusion location model is proposed. On the other hand, the embedded tracking algorithm is established by reducing the dimension of the multidimensional space composed of the visual frames of the moving object and by compressing the visual object. Experimental results show that the proposed tracking algorithm achieves high precision, low energy consumption, and low delay.

## 1 Introduction

Moving target tracking is one of the most active areas in the development of science and technology. Target tracking algorithms have attracted wide attention around the world [1]. As their performance continues to improve and their scope expands, location and tracking algorithms have been applied successfully in industry, agriculture, health care, and the service industry [2], and they also play an important role in dangerous situations in urban security, defense, and space exploration [3].

A low-complexity, high-accuracy algorithm was presented in [4] to reduce the computational load of the traditional data-fusion algorithm with heterogeneous observations for location tracking. Trogh et al. [5] presented a radio frequency-based location tracking system that improves performance by compensating for body shadowing. In [6], Liang and Krause proposed a proof-of-concept system based on a sensor fusion approach, built with lower cost and higher mobility, deployability, and portability in mind. By combining the drift velocities of anchor nodes, the scheme of [7] estimates the drift velocity of the tracked node using the spatial correlation of ocean current. A distributed multi-human location algorithm was studied by Yang et al. [8] for a binary pyroelectric infrared sensor tracking system.

A model-based approach was presented in [9] to predict the geometric structure of an object from its visual hull. A task-dependent codebook compression framework was proposed in [10] to learn a compression function and adapt the codebook compression. Ji et al. [11] proposed a novel compact bag-of-patterns descriptor with an application to low bit rate mobile landmark search. In [12], blocks lying in object regions were flagged, and an object tree was added to each coding tree unit to describe the object's shape.

However, guaranteeing high-precision tracking of moving objects in dynamic and complex environments while keeping the algorithm complexity low remains one of the most difficult problems. Building on the research above, we propose an embedded tracking algorithm based on multi-feature crowd fusion and visual object compression for mobile object tracking.

The rest of the paper is organized as follows. Section 2 describes the location model based on multi-feature crowd fusion. Section 3 presents the embedded tracking algorithm for visual object compression. Section 4 analyzes and evaluates the proposed scheme. Finally, Section 5 concludes the paper.

## 2 Location model based on multi-feature crowd fusion

Generally, target location is divided into two stages. The first stage consists of the following steps.

- (1)
Capture moving target image frame sequence.

- (2)
The characteristics of real-time target image frames would be extracted.

- (3)
Between the current image frame and the still image frame of the target, search for the image frame whose extracted features are most similar to the motion characteristics of the target.
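The three steps above can be illustrated with a minimal sketch. The paper does not specify its feature extractor or similarity measure, so the intensity histogram, the histogram-intersection score, and the names `histogram_feature`, `similarity`, and `best_match` are assumptions chosen only for illustration.

```python
from typing import List, Sequence

Frame = List[List[int]]  # grayscale frame, pixel values in 0..255 (assumed format)


def histogram_feature(frame: Frame, bins: int = 4) -> List[float]:
    """Step 2: extract a normalized intensity histogram (stand-in feature)."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        raise ValueError("frame must contain at least one pixel")
    return [c / total for c in counts]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


def best_match(target_feature: Sequence[float], frames: Sequence[Frame]) -> int:
    """Step 3: index of the frame whose feature is most similar to the target."""
    scores = [similarity(target_feature, histogram_feature(f)) for f in frames]
    return max(range(len(frames)), key=scores.__getitem__)


# Step 1 is represented here by a tiny hand-written frame sequence.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
mixed = [[10, 220], [30, 240]]
target = histogram_feature([[15, 25], [35, 45]])
print(best_match(target, [bright, mixed, dark]))  # prints 2 (the dark frame)
```

A single histogram is exactly the kind of single feature that the following discussion argues is fragile, which is why the model later fuses several features.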

The second stage matches the characteristics of the moving target.

Different features are chosen according to the characteristics of the target in order to select the best feature matching scheme. Matching on a single feature, however, has the following problems.

- (1)
Only a single feature is extracted. With such feature extraction, it is difficult to locate complex moving objects that have multiple states.

- (2)
In complex scenes, the features of the moving object change, for example through deformation (*C*_{def}), light (*C*_{lgf}), size (*C*_{siz}), and color (*C*_{col}), which makes the single-feature matching success rate SR_{FM} very low, as shown in formula (1).

*G* is the vector representation of the image frame matrix.

*H* is the function used to solve the image frame characteristics and frame similarity.

*N* represents the length of the captured moving target image frame sequence.

*M* represents the number of frames used for image feature matching. From formula (1), the upper bound of the matching success rate is \( \frac{f\left({\mathrm{if}}_M,{h}_M\right)}{N} \). However, the success rate is inversely proportional to the number of captured image frames, so the number of captured image frames limits single-feature matching.

- (3)
The accuracy and robustness of real-time target motion tracking are poor, as shown in formula (2). To improve the accuracy and robustness, a set of single features can be tracked as a feature series, but the complexity of the transition algorithm is then too high, as shown in formula (3).

*A*_{TR} indicates the positioning accuracy.

RUS_{TR} indicates the location robustness.

*β* is the included angle between adjacent image frames.

*ρ* denotes the expressed error vector.

Here, CLE_{TSA} represents the complexity of the transition algorithm. The function *g*(*h*_{i}, *α*) represents the transition algorithm. It can be seen that the complexity grows with the number of image frames and the degree of feature matching. This shows that more space, time, and computation must be spent to match more features across the image frames.

*v*_{mot} is the target motion trajectory fitting function.

*K* is the representation of time series features.

*L* is the representation of spatial sequence features.

Here, rank{ML_{F}} is the rank of the multi-feature vector. From formula (5), a high matching success rate can be guaranteed as long as the multiple feature vectors are solved correctly.
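As a rough illustration of rank{ML_{F}}, the sketch below computes the rank of a small multi-feature matrix by Gaussian elimination. It assumes that ML_{F} can be laid out with one row per feature vector (for example deformation, light, size, and color); this layout, the sample numbers, and the name `matrix_rank` are assumptions, not taken from the paper.

```python
from fractions import Fraction
from typing import List, Sequence


def matrix_rank(rows: Sequence[Sequence[float]]) -> int:
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m: List[List[Fraction]] = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical multi-feature matrix: rows 3 and 4 are combinations of rows 1 and 2.
ml_f = [
    [1, 0, 2],
    [0, 1, 1],
    [1, 1, 3],
    [2, 0, 4],
]
print(matrix_rank(ml_f))  # prints 2
```

In this toy case the four feature rows carry only two independent directions, so adding redundant features does not raise the rank.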

## 3 Embedded tracking algorithm for visual object compression

*x*^{Q}. The visual frame of the internal elements is used to form a visual matrix VM. The elements of the visual matrix are subject to interference from the multidimensional space, so the information is easily distorted. To solve this problem, the visual matrix VM can be compressed. The visual frame of the L-dimensional space tracking system is shown in Fig. 3. Here, the VM is selected as the object of the center visual frame VF, perpendicular to the coordinate axes of the L-dimensional space. The angles to these perpendicular lines are denoted as *θ*_{1}, …, *θ*_{Q}. MO represents a moving target. As the object moves, *φ* is the angle between the VF point and the motion direction. The VF point can collect the visual targets with different degrees. These targets are used to update the elements of the visual matrix VM. The fusion results of VF and VM are mapped into the MO plane. The compression matrix must satisfy the omni-directional low-dimensional characteristics, as shown in formula (7). After the matrix is compressed, the characteristics of the distribution of the visual frame must be satisfied, as shown in formula (8).

*F*(*θ*, *φ*) represents the direction function of visual frames in L-dimensional space.

*D* represents the distance between the VF point and the origin of the L-dimensional space.

*D*_{eff} represents the effective dimension of the visual tracking space. Its value is clearly less than L, which effectively reduces the dimension and improves the compression efficiency of the visual frame.
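To make the idea of working in *D*_{eff} < L dimensions concrete, here is a minimal sketch that keeps only the most informative coordinates of a visual matrix. Selecting coordinates by variance is a simple stand-in technique and is not the compression defined by formulas (7) and (8); the function name `reduce_dimensions` and the sample matrix are assumptions.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def reduce_dimensions(
    visual_matrix: Sequence[Sequence[float]], d_eff: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the d_eff coordinates (columns) with the largest variance.

    Each row is one visual frame point in L-dimensional space (assumed layout).
    Returns the kept column indices and the reduced matrix.
    """
    n_dims = len(visual_matrix[0])
    if not 0 < d_eff <= n_dims:
        raise ValueError("d_eff must be between 1 and the number of dimensions")
    variances = [pvariance([row[k] for row in visual_matrix]) for k in range(n_dims)]
    ranked = sorted(range(n_dims), key=lambda k: variances[k], reverse=True)
    keep = sorted(ranked[:d_eff])  # preserve the original coordinate order
    return keep, [[row[k] for k in keep] for row in visual_matrix]


# Hypothetical visual matrix with L = 4; columns 0 and 2 carry almost no information.
vm = [
    [1.0, 5.0, 0.0, 2.0],
    [1.0, 7.0, 0.1, 8.0],
    [1.0, 9.0, 0.0, 5.0],
]
kept, reduced = reduce_dimensions(vm, d_eff=2)
print(kept)     # prints [1, 3]
print(reduced)  # prints [[5.0, 2.0], [7.0, 8.0], [9.0, 5.0]]
```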

Here, *f*(VF) is the visual frame distribution density function. \( {\left\{{D}_{\mathrm{eff}}\right\}}_{\max}^{i=1,\dots, {\left|\mathrm{V}\mathrm{M}\right|}^H} \) represents the largest spatial dimension in the VM rank of the visual matrix.

*D*_{eff}-dimensional space by choosing the \( {\left\{{D}_{\mathrm{eff}}\right\}}_{\min}^{i=1,\dots, {\left|\mathrm{V}\mathrm{M}\right|}^H} \) and realize the target motion prediction. The specific steps of the visual target compression algorithm are as follows:

- (1)
The visual matrix of the core visual frame-oriented migration: The VFC state of the visual frame is {*θ*_{C}, *φ*_{C}, *D*_{eff_C}}, which is captured by the current moving target. The visual frames propagate along the direction of *F*(*θ*_{C}, *φ*_{C}). The new state {*θ*_{U}, *φ*_{U}, *D*_{eff_U}} of the visual frame is obtained after spreading on the dimensional space *D*_{eff_C}.

- (2)
The moving object, the current state of the visual frame, and the diffusion of the visual frame form a compressed plane PCV: The compressed point set PT is formed in the plane. Any two points PT_{j} and PT_{i} in the plane form a visual line. The normal vector of PT_{i} is NV_{i} = sin *θ*_{C} cos *φ*_{U} ‖PT_{j}‖. The normal vector of any point PT_{j} on the plane is NV_{j} = sin *θ*_{C} cos *φ*_{U} ‖PT_{i}‖.

- (3)
The included angle between the normal plane of the normal vector NV_{i} and the normal plane of the normal vector NV_{j}: The relation between the plane angle and the direction arc is \( \sin \gamma ={\mathrm{NV}}_i\cdot {\mathrm{NV}}_j\left\Vert \mathrm{PT}\right\Vert \arctan \left(\frac{\left|{\theta}_C-{\theta}_U\right|}{\left|{\varphi}_C-{\varphi}_U\right|}\right) \). The vector mapping relation between the plane angle and the direction field of the moving object is shown in formula (10).

*M*_{γ} is divided into 4 submatrices. Each submatrix *M*_{(i,j)} is obtained by solving the direction function, compressing the visual frame point and the signal strength.

- (4)
The 4 submatrices of the moving object multi-feature fusion mapping matrix are obtained by the visual frame analysis. The tracking matrix *M*_{T} is obtained through the target compression: \( {M}_T=\left[\begin{array}{cc} {\mathrm{ML}}_{F\left(i,j\right)}{\mathrm{fu}}_{\mathrm{comp}} & {\mathrm{SR}}_{\mathrm{FM}}\cdot \mathrm{rank}\left\{{\mathrm{ML}}_F\right\} \tan \gamma \\ M\ast {\mathrm{if}}_M\left({C}_{\mathrm{def}},{C}_{\mathrm{lgf}},{C}_{\mathrm{siz}},{C}_{\mathrm{col}}\right) & {\sum}_{i=1}^M\left( \sin \gamma -{\alpha}^i\right)\frac{1}{N} \end{array}\right] \)

- (5)
*M*_{γ} and *M*_{T} are combined through the operational matrix of integration to predict the latest spatial state of the moving object.
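The quantities in steps (2) to (4) can be evaluated numerically as in the hedged sketch below. It follows the expressions for NV, sin *γ*, and *M*_{T} as they are written in the text, with several assumptions: NV is treated as the scalar value given by its formula, ‖PT‖ and the terms ML_{F(i,j)}, fu_{comp}, SR_{FM}, and if_{M} are supplied by the caller as plain numbers, and `∗` in the matrix is read as ordinary multiplication. All function and parameter names are hypothetical.

```python
import math
from typing import List


def normal_value(theta_c: float, phi_u: float, other_norm: float) -> float:
    """NV = sin(theta_C) * cos(phi_U) * ||PT_other||, as written in step (2)."""
    return math.sin(theta_c) * math.cos(phi_u) * other_norm


def sin_gamma(nv_i: float, nv_j: float, pt_norm: float,
              theta_c: float, theta_u: float,
              phi_c: float, phi_u: float) -> float:
    """sin(gamma) from step (3); atan2 avoids division by zero when phi_C == phi_U."""
    arc = math.atan2(abs(theta_c - theta_u), abs(phi_c - phi_u))
    return nv_i * nv_j * pt_norm * arc


def tracking_matrix(ml_f_ij: float, fu_comp: float, sr_fm: float, rank_ml_f: int,
                    if_m: float, m: int, n: int, alpha: float,
                    sin_g: float) -> List[List[float]]:
    """Assemble the 2x2 tracking matrix M_T from step (4)."""
    gamma = math.asin(sin_g)  # raises ValueError if |sin_g| > 1
    bottom_right = sum(sin_g - alpha ** i for i in range(1, m + 1)) / n
    return [
        [ml_f_ij * fu_comp, sr_fm * rank_ml_f * math.tan(gamma)],
        [m * if_m, bottom_right],
    ]


# Illustrative values only; none of them come from the paper's experiments.
nv_i = normal_value(math.pi / 2, 0.0, 0.5)
nv_j = normal_value(math.pi / 2, 0.0, 0.5)
sg = sin_gamma(nv_i, nv_j, 1.0, theta_c=math.pi / 2, theta_u=math.pi / 2 + 1.0,
               phi_c=1.0, phi_u=0.0)
m_t = tracking_matrix(ml_f_ij=2.0, fu_comp=0.5, sr_fm=0.9, rank_ml_f=2,
                      if_m=3.0, m=2, n=5, alpha=0.5, sin_g=0.5)
print(sg)
print(m_t)
```

Because sin *γ* is a product of several terms, nothing in the formula itself keeps it within [-1, 1]; the sketch simply lets `math.asin` reject out-of-range values rather than guessing how the original implementation handles them.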

## 4 Performance analysis of embedded tracking algorithm

We tested the proposed algorithm in a library environment and a playground environment. The server of the experimental platform has an Intel Core i5 processor, 8 GB of physical memory, and 4 GB of virtual memory, and the algorithm is implemented in C. In the library environment, digital cameras were used to obtain the actual pedestrian trajectory over a measurement range of 120 m^{2}. During the experiment, each time the pedestrian moves, a visual frame image is captured and uploaded to the server over Wi-Fi. In the playground, the experimental scene is larger: the maximum circumference of the playground is 700 m. The pedestrian trajectory is captured by HD industrial cameras mounted on an outdoor electric car.

Delay with 1000 visual frame samples

Normal plane radian | Location delay with single feature (ms) | Location delay with multiple features and fusion (ms) |
---|---|---|
30 | 25.7 | 1.9 |
50 | 89.4 | 2.0 |
110 | 345.2 | 1.8 |
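The gap between the two columns is easier to read as a ratio. The short sketch below recomputes it from the table values; the variable and function names are illustrative only.

```python
# Delays in milliseconds copied from the table: radian -> (single feature, multi-feature fusion).
delays_ms = {30: (25.7, 1.9), 50: (89.4, 2.0), 110: (345.2, 1.8)}


def speedup(single_ms: float, fused_ms: float) -> float:
    """How many times lower the fused delay is than the single-feature delay."""
    return single_ms / fused_ms


for radian, (single, fused) in delays_ms.items():
    print(f"{radian}: {speedup(single, fused):.1f}x")
# prints:
# 30: 13.5x
# 50: 44.7x
# 110: 191.8x
```

The single-feature delay grows by more than a factor of 13 across the three rows, while the fused delay stays close to 2 ms.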

## 5 Conclusions

Positioning accuracy, real-time performance, and robustness are important performance indexes of moving target location and tracking. To improve the positioning accuracy and the quality of the tracking service, we propose an embedded tracking algorithm based on multi-feature crowd fusion and visual object compression. First, we analyze the dynamic motion of the target, the moving track, and the structure parameters of the image frame. By capturing the state characteristics of different targets, the multiple feature vectors are formed. Second, the visual matrix of the core visual frame-oriented migration is obtained. The moving object, the current state of the visual frame, and the diffusion of the visual frame form a compression plane, on which embedded tracking is implemented. The experimental results show that the multi-feature fusion can effectively reduce the positioning error and shorten the delay. Compared with the target object tracking algorithm, the embedded tracking algorithm based on visual object compression has a significant advantage in the reconstruction of the moving target.

## Declarations

### Acknowledgements

This work is supported in part by the National Youth Fund Project No. 61300228.

### Competing interests

The authors declare that they have no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- PH Tseng, KT Feng, YC Lin et al., Wireless location tracking algorithms for environments with insufficient signal sources. IEEE Transactions on Mobile Computing 8(12), 1676–1689 (2009)
- CT Chiang, PH Tseng, KT Feng, Hybrid unified Kalman tracking algorithms for heterogeneous wireless location systems. IEEE Transactions on Vehicular Technology 61(61), 702–715 (2012)
- YC Lai, JW Lin, YH Yeh et al., A tracking system using location prediction and dynamic threshold for minimizing SMS delivery. Journal of Communications & Networks 15(15), 54–60 (2013)
- YS Chiou, F Tsai, A reduced-complexity data-fusion algorithm using belief propagation for location tracking in heterogeneous observations. IEEE Transactions on Cybernetics 44(6), 922–935 (2014)
- J Trogh, D Plets, A Thielens et al., Enhanced indoor location tracking through body shadowing compensation. IEEE Sensors Journal 16(7), 2105–2114 (2016)
- P-C Liang, P Krause, Smartphone-based real-time indoor location tracking with 1-m precision. IEEE Journal of Biomedical and Health Informatics 20(3), 756–762 (2016)
- R Diamant, LM Wolff, L Lampe, Location tracking of ocean-current-related underwater drifting nodes using Doppler shift measurements. IEEE Journal of Oceanic Engineering 40(4), 887–902 (2015)
- B Yang, Y Lei, B Yan, Distributed multi-human location algorithm using naive Bayes classifier for a binary pyroelectric infrared sensor tracking system. IEEE Sensors Journal 16(1), 1–1 (2015)
- SS Hwang, WJ Kim, J Yoo et al., Visual hull-based geometric data compression of a 3-D object. IEEE Transactions on Circuits & Systems for Video Technology 25(7), 1151–1160 (2015)
- R Ji, H Yao, W Liu et al., Task-dependent visual-codebook compression. IEEE Transactions on Image Processing 21(4), 2282–2293 (2012)
- R Ji, LY Duan, J Chen et al., Mining compact bag-of-patterns for low bit rate mobile visual search. IEEE Transactions on Image Processing 23(7), 3099–113 (2014)
- T Huang, S Dong, Y Tian, Representing visual objects in HEVC coding loop. IEEE Journal on Emerging & Selected Topics in Circuits & Systems 4(1), 5–16 (2014)