Generally, target localization is divided into two stages.
In the first stage, the features of the moving target are extracted. Feature extraction is completed in the following steps.

(1)
Capture the image frame sequence of the moving target.

(2)
Extract the features of the real-time target image frames.

(3)
Among the image frames from the current frame to the still frame of the target, search for the frame whose extracted features are most similar to the motion characteristics of the target.
The second stage involves matching the characteristics of the moving target: different features are chosen according to the characteristics of the target, and the best feature-matching scheme is selected.
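The two-stage scheme above can be sketched as a minimal pipeline. This is an illustrative stand-in, not the paper's implementation: `extract_features` and the similarity function are hypothetical placeholders for whatever feature extractor and matcher are chosen in stage two.

```python
# Hypothetical sketch of the two-stage localization pipeline described above.
# extract_features and the similarity function are illustrative stand-ins.

def extract_features(frame):
    # Stand-in single feature: mean intensity of the frame's pixels.
    return sum(frame) / len(frame)

def locate_target(frames, target_feature, similarity):
    """Stage 1: extract a feature per frame; stage 2: match against the target."""
    features = [extract_features(f) for f in frames]              # step (2)
    scores = [similarity(f, target_feature) for f in features]    # step (3)
    best = max(range(len(frames)), key=lambda i: scores[i])       # most similar frame
    return best, scores[best]

# Usage: three toy "frames" (pixel lists) and a target feature value of 5.0.
frames = [[1, 2, 3], [4, 5, 6], [9, 9, 9]]
idx, score = locate_target(frames, 5.0, lambda a, b: -abs(a - b))
```

Here the middle frame (mean 5.0) matches the target feature exactly, so it is selected.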
The above localization scheme has the following defects:

(1)
The extracted features are single. Such feature extraction makes it difficult to locate complex moving objects with multiple states.

(2)
The features of the moving object change in complex scenes, such as C_{def} (deformation), C_{lgf} (light), C_{siz} (size), and C_{col} (color), making the single-feature matching success rate SR_{FM} very low, as shown in formula (1).
$$ \left\{\begin{array}{l}f\left({\mathrm{IF}}_i=\mathrm{if}\right)={\displaystyle \sum_{i=1}^N\left({G}_i,h\left(\mathrm{if},{h}_{i1},{\displaystyle \sum_{j=1}^i{h}_j}\right)\right)}\\ {}\mathrm{if}\left({C}_{\mathrm{def}},{C}_{\mathrm{lgf}}\right)h\left(\mathrm{if},{h}_{i1},{\displaystyle \sum_{j=1}^i{h}_j}\right)=\left\{{\mathrm{if}}_i\left({C}_{\mathrm{siz}},{C}_{\mathrm{col}}\right)\alpha \right\}{h}_{i1}\left(\alpha, {\displaystyle \sum_{j=1}^i{h}_j}\right)\\ {}{\mathrm{SR}}_{\mathrm{FM}}={\displaystyle \sum_{i=1}^Mf\left({\mathrm{if}}_i,{h}_i\right)}\frac{1}{{\displaystyle \sum_{j=1}^Nf\left({\mathrm{if}}_j\right)}}\le \frac{f\left({\mathrm{if}}_M,{h}_M\right)}{N}\end{array}\right. $$
(1)
Here, if represents an image frame, and IF represents the image frame sequence. G is the vector representation of the image frame matrix. h is the function used to compute the image frame features and the similarity between frames. N is the length of the captured moving-target image frame sequence, and M is the number of image frames whose features are matched. From formula (1), the upper bound of the matching success rate is \( \frac{f\left({\mathrm{if}}_M,{h}_M\right)}{N} \), so the matching success rate is inversely proportional to the number of captured image frames. This shows that the number of captured image frames restricts single-feature matching.
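The shrinking upper bound can be illustrated numerically. This is a toy check of the bound in formula (1), with an assumed per-frame match score of 0.9; it is not drawn from the paper's experiments.

```python
# Illustrative check of the bound SR_FM <= f(if_M, h_M) / N from formula (1):
# as the number of captured frames N grows, the upper bound on the
# single-feature matching success rate shrinks. f_match_M = 0.9 is assumed.

def sr_fm_upper_bound(f_match_M, N):
    # Upper bound on the single-feature matching success rate, per formula (1).
    return f_match_M / N

bounds = [sr_fm_upper_bound(0.9, n) for n in (10, 100, 1000)]
# The bound strictly decreases as N increases, matching the text's conclusion
# that the number of captured frames restricts single-feature matching.
```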

(3)
The accuracy and robustness of real-time target motion localization are poor, as shown in formula (2). To improve accuracy and robustness, a single feature set can track a series of features, but the complexity of the resulting transition algorithm is too high, as shown in formula (3).
$$ \left\{\begin{array}{l}{A}_{\mathrm{TR}}={\displaystyle \sum_{i=1}^M\left( \sin \beta {\alpha}^i\right)}\frac{1}{N}+{\displaystyle \sum_{i=1}^M{h}_i\sqrt{\alpha }}\\ {}{\mathrm{RUS}}_{\mathrm{TR}}=\rho f\left({\mathrm{IF}}_M\right){\mathrm{SR}}_{\mathrm{FM}}\end{array}\right. $$
(2)
Here, A_{TR} indicates the positioning accuracy, and RUS_{TR} indicates the localization robustness. β is the included angle between adjacent image frames, and ρ denotes the error vector.
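Formula (2) can be evaluated directly once its symbols are given values. The sketch below does this term by term; all numeric inputs are assumed for illustration, since the paper supplies no concrete parameter values.

```python
import math

# Toy evaluation of formula (2). M, N, alpha, beta, rho, and the h_i values
# are assumed inputs, not values from the paper.

def a_tr(M, N, alpha, beta, h):
    # A_TR = (1/N) * sum_{i=1}^M sin(beta) * alpha^i + sum_{i=1}^M h_i * sqrt(alpha)
    first = sum(math.sin(beta) * alpha ** i for i in range(1, M + 1)) / N
    second = sum(h[i - 1] * math.sqrt(alpha) for i in range(1, M + 1))
    return first + second

def rus_tr(rho, f_if_m, sr_fm):
    # RUS_TR = rho * f(IF_M) * SR_FM
    return rho * f_if_m * sr_fm

acc = a_tr(M=2, N=10, alpha=1.0, beta=math.pi / 2, h=[1.0, 1.0])
rob = rus_tr(rho=0.5, f_if_m=0.8, sr_fm=0.9)
```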
$$ \left\{\begin{array}{l}{\mathrm{CLE}}_{\mathrm{TSA}}=\left\{{\mathrm{IF}}_i\rho \sin \beta \right\}\le M\ast g\left({h}_i,\alpha \right)\\ {}g\left({h}_i,\alpha \right)\approx {\displaystyle \sum_{i=1}^M\left({h}_i\,\mathrm{if}\left({C}_{\mathrm{def}},{C}_{\mathrm{lgf}},{C}_{\mathrm{siz}},{C}_{\mathrm{col}}\right)\right)}\ge M\ast {\mathrm{if}}_M\left({C}_{\mathrm{def}},{C}_{\mathrm{lgf}},{C}_{\mathrm{siz}},{C}_{\mathrm{col}}\right) \end{array}\right. $$
(3)
Here, CLE_{TSA} represents the complexity of the transition algorithm, and the function g(h_{i}, α) represents the transition algorithm itself. It can be found that the complexity grows in proportion to the number of image frame features to be matched. This shows that more space, time, and computation must be spent in order to match image frames against more features.
To solve the above problems, we propose a multifeature crowd fusion location model. The model analyzes the dynamic motion of the target, its moving track, and the structural parameters of the image frames. The state characteristics of different targets are captured, and a vector composed of multiple features is constructed as in formula (4). This vector integrates the characteristics of motion state with deformation, light, size, and color, and can effectively improve the low matching success rate of single-feature extraction, as in formula (5).
$$ \left\{\begin{array}{l}{\mathrm{ML}}_F=\left[\begin{array}{ccc}\hfill {v}_{\mathrm{mot}}{f}_{11}\hfill & \hfill \cdots \hfill & \hfill {v}_{\mathrm{mot}}{f}_{1L}\hfill \\ {}\hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill {v_{\mathrm{mot}}}^K{f}_{K1}\hfill & \hfill \cdots \hfill & \hfill {v_{\mathrm{mot}}}^K{f}_{KL}\hfill \end{array}\right]\\ {}{v}_{\mathrm{mot}}=\frac{M}{N}{\displaystyle \sum_{i=1}^{NM}f\left({\mathrm{if}}_i,{h}_i\right)}\end{array}\right. $$
(4)
Here, v_{mot} is the target motion trajectory fitting function. K is the representation of the time-series features, and L is the representation of the spatial-sequence features.
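The construction of the multifeature matrix in formula (4) can be sketched as follows. The feature grid `features`, the frame scores, and the parameter values are assumed toy inputs; only the scaling structure (row k weighted by v_{mot}^k) comes from the formula.

```python
# Sketch of building ML_F from formula (4): row k of the K x L feature grid
# f[k][l] is scaled by v_mot**(k+1). All numeric inputs are assumed examples.

def v_mot(M, N, frame_scores):
    # Trajectory-fitting value: (M/N) * sum of per-frame match scores f(if_i, h_i).
    return (M / N) * sum(frame_scores)

def build_ml_f(features, v):
    # features: K x L grid; entry (k, l) of ML_F is v**(k+1) * f_{kl}.
    return [[(v ** (k + 1)) * f for f in row] for k, row in enumerate(features)]

features = [[1.0, 2.0], [3.0, 4.0]]   # K = 2 time features, L = 2 spatial features
v = v_mot(M=2, N=4, frame_scores=[0.5, 0.5, 0.5, 0.5])
ml_f = build_ml_f(features, v)
```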
$$ {\mathrm{SR}}_{\mathrm{FM}}=\left\{\begin{array}{l}\alpha \cdot \mathrm{rank}\left\{{\mathrm{ML}}_F\right\} \tan \beta, \quad \beta <\alpha \ \mathrm{or}\ \beta >\rho \\ {}1, \quad \alpha \le \beta \le \rho \end{array}\right. $$
(5)
Here, rank{ML_{F}} is the rank of the multifeature vector. From formula (5), a high matching success rate can be guaranteed as long as the multiple feature vectors are solved correctly.
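The piecewise rule of formula (5) is straightforward to implement. The rank helper below uses plain Gaussian elimination as an assumed stand-in; the paper does not specify how the rank is computed.

```python
import math

# Piecewise matching success rate from formula (5). matrix_rank is an assumed
# helper (Gaussian elimination); the paper does not prescribe one.

def matrix_rank(m, eps=1e-9):
    m = [row[:] for row in m]
    rank, rows, cols = 0, len(m), len(m[0])
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if abs(m[r][c]) > eps), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rows):
            if r != rank and abs(m[r][c]) > eps:
                factor = m[r][c] / m[rank][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def sr_fm(ml_f, alpha, beta, rho):
    # SR_FM = 1 when alpha <= beta <= rho; otherwise the rank-weighted branch.
    if alpha <= beta <= rho:
        return 1.0
    return alpha * matrix_rank(ml_f) * math.tan(beta)

sr_in_band = sr_fm([[1.0, 0.0], [0.0, 1.0]], alpha=0.2, beta=0.5, rho=1.0)
sr_out = sr_fm([[1.0, 0.0], [0.0, 1.0]], alpha=1.0, beta=math.pi / 4, rho=0.5)
```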
To further reduce complexity and improve the accuracy and reliability of multifeature matching, the model incorporates a multifeature fusion mechanism based on crowd feature analysis. The combination of the multifeature vector with knowledge of the target motion curve is shown in Fig. 1. In the crowd analysis of the characteristics, the curve and the multiple features are relatively independent, as are the arc and the multiple features. Through crowd analysis, the multifeature vectors are optimized: the features in the vector are not mutually exclusive, which improves the performance of multifeature fusion and reduces the amount of fusion computation, as shown in formula (6).
$$ \left\{\begin{array}{l}{\mathrm{fu}}_{\mathrm{comp}}=\frac{1}{M^2}{\displaystyle \sum_{i=1}^K{\displaystyle \sum_{j=1}^LG\left(i,j\right){\mathrm{ML}}_{F\left(i,j\right)}}}\\ {}G\left(i,j\right)=\frac{f\left({\mathrm{IF}}_i\right)f\left({\mathrm{IF}}_j\right)}{h\left(f,{\mathrm{ML}}_{Fi},{\mathrm{ML}}_{Fj}\right)}\end{array}\right. $$
(6)
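The double sum of formula (6) can be evaluated directly. In the sketch below, the frame scores `f_scores` and the similarity function `h` are assumed toy inputs; only the 1/M² prefactor and the G(i, j)-weighted sum come from the formula.

```python
# Sketch of the fusion-cost computation in formula (6). G(i, j) weights each
# multifeature entry by the product of frame scores over a similarity term h.
# f_scores and h are assumed illustrative inputs, not the paper's.

def fusion_cost(ml_f, f_scores, h, M):
    K, L = len(ml_f), len(ml_f[0])
    total = 0.0
    for i in range(K):
        for j in range(L):
            g = f_scores[i] * f_scores[j] / h(i, j)   # G(i, j)
            total += g * ml_f[i][j]
    return total / (M * M)                            # 1/M^2 prefactor

cost = fusion_cost([[1.0, 2.0], [3.0, 4.0]],
                   f_scores=[1.0, 1.0], h=lambda i, j: 1.0, M=2)
```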
In summary, the multifeature crowd fusion algorithm is shown in Fig. 2.