CN105894008A - Target motion tracking method combining feature point matching and deep neural network detection - Google Patents

Target motion tracking method combining feature point matching and deep neural network detection

Info

Publication number
CN105894008A
CN105894008A · CN201410767363.8A · CN201410767363A
Authority
CN
China
Prior art keywords
target
tracking
feature point
point matching
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410767363.8A
Other languages
Chinese (zh)
Inventor
陈姝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Kasite Cartoon Co Ltd
Original Assignee
Guangxi Kasite Cartoon Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Kasite Cartoon Co Ltd filed Critical Guangxi Kasite Cartoon Co Ltd
Priority to CN201410767363.8A priority Critical patent/CN105894008A/en
Publication of CN105894008A publication Critical patent/CN105894008A/en
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a target motion tracking method combining feature point matching and deep neural network detection. The method first learns a visual prior of the target from samples with a deep neural network, then performs tracking under a Bayesian inference framework: the visual prior serves as the target appearance representation during tracking, and the tracking result is obtained by sequential particle filtering. To avoid tracking drift, the system state model is built from feature point matching, and the target is divided into sub-targets for similarity measurement to improve the algorithm's resistance to partial occlusion. The method can accurately track a moving target in video and is widely applicable to fields such as human-computer interaction, interactive entertainment, intelligent surveillance, and medical diagnosis.

Description

Target motion tracking method combining feature point matching and deep neural network detection
[technical field]
The present invention relates to the fields of computer vision and video processing, and in particular to a video-based moving target tracking method.
[background technology]
Video-based moving target tracking is widely applicable to many fields, such as human-computer interaction, interactive entertainment, and intelligent surveillance. At present, conventional moving target tracking methods mainly use either discriminative or generative tracking algorithms. Each class has its strengths and weaknesses: discriminative algorithms, because they employ an effective visual representation of the target, achieve good tracking results in simple environments but perform poorly against complex backgrounds; generative algorithms track well in complex environments with occlusion and the like, but perform poorly when the target's appearance changes drastically.
[summary of the invention]
In view of this, the object of the present invention is to exploit the advantages of both classes of tracking algorithm and to propose a method for accurately tracking target motion in complex environments.
To achieve the above object, the present invention adopts the following technical scheme:
1. Train a stacked sparse autoencoder neural network on public data sets to obtain a prior visual representation of the target.
2. Build an objective function on the extracted SIFT matched point pairs, and optimize it to obtain the target's motion model between consecutive frames.
3. On the basis of the above two steps, perform moving target tracking with a particle filter.
Compared with the prior art, the present invention has the following significant advantages:
1. The invention uses a deep neural network to learn the target's prior visual representation, which improves tracking precision.
2. The invention uses feature point matching and deep neural network detection to prevent tracking drift and tracking loss.
[brief description of the drawings]
Fig. 1 is the target tracking flow chart;
Fig. 2 is the structure of the stacked sparse autoencoder neural network;
Fig. 3 is the sparse autoencoder structure of feature layer 1;
Fig. 4 shows the results of applying the present invention to video tracking.
[detailed description of the invention]
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The target motion tracking algorithm proposed by the invention is shown in Fig. 1; the implementation details of each step are now described:
1. Offline training. A stacked autoencoder neural network is trained on the VOC2010 and Caltech101 data sets to obtain the target's visual prior. The stacked autoencoder network structure used is shown in Fig. 2: it has 5 layers in total, where layer 1 is the input layer, the last layer is a softmax classifier, and the middle three layers are sparse autoencoders. Training uses the greedy layer-wise method and proceeds in two steps: the first step trains each layer of the network in turn; the second step fine-tunes the whole network with the back-propagation algorithm, starting from the initial weights computed for each layer in the first step.
First step (pre-training): a sparse autoencoder tries to approximate an identity function, so that the output $\hat{x}$ is close to the input x. The structure (see Fig. 3) and training process of the sparse autoencoder are described using feature layer 1 as an example.
Let the i-th sample be $x_i$, and let W, W', b, b' be the weight matrices and bias vectors between the input and hidden layers and between the hidden and output layers, respectively. The input layer $x_i = [x_1^{(i)}, x_2^{(i)}, \ldots, x_{1024}^{(i)}, 1]^T$, the hidden layer activations $a_j^{(2)}$ (the output of hidden neuron j for the i-th sample), and the output layer $\hat{x}_i$ satisfy the relation

$$a^{(2)} = f(W x_i + b), \qquad \hat{x}_i = f(W' a^{(2)} + b') \qquad (1)$$

where f(·) is the logistic sigmoid function.
The objective function of the sparse autoencoder is

$$J_{sparse}(W, W', b, b') = J(W, W', b, b') + \beta \sum_{j=1}^{m} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \qquad (2)$$

where β controls the weight of the sparsity penalty term, and J(W, W', b, b') is the cost function, defined as the cross-entropy between $x_i$ and $\hat{x}_i$:

$$J(W, W', b, b') = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k} \left[ x_k^{(i)} \log \hat{x}_k^{(i)} + (1 - x_k^{(i)}) \log\!\left(1 - \hat{x}_k^{(i)}\right) \right] \qquad (3)$$

The term $\sum_{j=1}^{m} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ is the sparsity constraint, where m is the number of hidden layer neurons, ρ is the sparsity parameter (usually a small value close to 0), and $\hat{\rho}_j$ is the average activation of hidden neuron j over the training samples.
Training the sparse autoencoder amounts to solving for the parameters that minimize formula (2), i.e.

$$(W, W', b, b') = \arg\min_{W, W', b, b'} J_{sparse}(W, W', b, b') \qquad (4)$$
Second step (fine-tuning): during the first step, when each layer's parameters are trained, the parameters of all other layers are held fixed. To obtain better results, therefore, after the pre-training process completes, the parameters of all layers are adjusted simultaneously by the back-propagation algorithm.
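As a concrete illustration, the objective of formulas (2) and (3) can be sketched in NumPy. This is a minimal sketch, not the patent's implementation; the function name, the default values of ρ and β, and the vectorized layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_cost(X, W, Wp, b, bp, rho=0.05, beta=3.0):
    """Objective of formula (2): cross-entropy reconstruction cost (3)
    plus a KL-divergence sparsity penalty on the hidden activations."""
    A = sigmoid(X @ W.T + b)        # hidden activations, shape (n_samples, m_hidden)
    Xhat = sigmoid(A @ Wp.T + bp)   # reconstruction of the input
    eps = 1e-12                     # numerical guard inside the logs
    # cross-entropy between x and x_hat, averaged over samples -- formula (3)
    J = -np.mean(np.sum(X * np.log(Xhat + eps)
                        + (1 - X) * np.log(1 - Xhat + eps), axis=1))
    rho_hat = A.mean(axis=0)        # average activation of each hidden neuron
    kl = np.sum(rho * np.log(rho / (rho_hat + eps))
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat + eps)))
    return J + beta * kl
```

Minimizing this quantity over (W, W', b, b'), as in formula (4), would be done with gradient descent plus back-propagation; only the cost itself is sketched here.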
To improve the algorithm's resistance to partial occlusion, in addition to training one autoencoder deep neural network for the whole target, the target is divided into four non-overlapping sub-targets (top half, bottom half, left half, and right half), and a separate autoencoder deep neural network is trained for each. Since each sub-target is half the size of the target, the input layer size of each of the four sub-target networks is 512. The benefit is that under partial occlusion part of the target region remains visible, and measuring target similarity on the partially visible regions improves the reliability of the algorithm.
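The four-way split described above can be sketched as follows; with a 32×32 target patch, each half contains 512 pixels, matching the stated sub-target input size. The function name and the dictionary layout are illustrative assumptions.

```python
import numpy as np

def split_subtargets(patch):
    """Split a target patch into the four sub-targets used for the
    sub-target networks: top half, bottom half, left half, right half."""
    h, w = patch.shape[:2]
    return {
        "top":    patch[: h // 2, :],
        "bottom": patch[h // 2 :, :],
        "left":   patch[:, : w // 2],
        "right":  patch[:, w // 2 :],
    }
```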
2. Target tracking. The present invention uses a particle filter as the basic tracking method. The precision of particle filter tracking depends on the reliability of the state model and the observation model; the construction of these two models is described below.
2.1 State model construction
We represent the target state with the vector $x_t = (x_t^1, y_t^1, x_t^2, y_t^2, x_t^3, y_t^3, x_t^4, y_t^4)^T$, whose components are the image coordinates of the target's top-left, bottom-left, top-right, and bottom-right corners. The state model used is

$$x_t = g(x_{t-1}) + v_{t-1}, \qquad v_{t-1} \sim N(0, \Sigma) \qquad (5)$$
where N(0, Σ) is a zero-mean multivariate Gaussian and Σ is a diagonal matrix. $g(x_{t-1})$ is the motion model, defined by the six-parameter affine transform

$$x_t = a_1 x_{t-1} + a_2 y_{t-1} + a_0, \qquad y_t = a_4 x_{t-1} + a_5 y_{t-1} + a_3 \qquad (6)$$
where $a_0, a_1, a_2, a_3, a_4, a_5$ are the motion model parameters, computed as follows. From the above, the error function is defined as

$$E(p; x) = \begin{pmatrix} x_t - a_1 x_{t-1} - a_2 y_{t-1} - a_0 \\ y_t - a_4 x_{t-1} - a_5 y_{t-1} - a_3 \end{pmatrix} \qquad (7)$$

where $p = (a_0, a_1, a_2, a_3, a_4, a_5)^T$ and $x = (x_{t-1}, y_{t-1}, x_t, y_t)$ are the image coordinates of a pair of matched SIFT feature points. The motion model parameters are obtained by minimizing the sum of squared errors over the matched pairs:

$$\hat{p} = \arg\min_{p} \sum_{i} \| E(p; x_i) \|^2 \qquad (8)$$
The above formula is optimized using the Gauss–Newton iterative method.
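Because the error function (7) is linear in p, Gauss–Newton applied to the sum of squared errors (8) converges in a single step, i.e. it reduces to one linear least-squares solve over all matched SIFT pairs. The sketch below illustrates that estimate; the function and variable names are assumptions, not taken from the patent.

```python
import numpy as np

def estimate_affine(prev_pts, cur_pts):
    """Least-squares fit of the six affine parameters of formula (6)
    from matched point pairs (previous frame -> current frame).
    Returns p = (a0, a1, a2, a3, a4, a5)."""
    n = len(prev_pts)
    A = np.zeros((2 * n, 6))
    bvec = np.zeros(2 * n)
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(prev_pts, cur_pts)):
        A[2 * i]     = [1, x0, y0, 0, 0, 0]   # x_t = a0 + a1*x + a2*y
        A[2 * i + 1] = [0, 0, 0, 1, x0, y0]   # y_t = a3 + a4*x + a5*y
        bvec[2 * i], bvec[2 * i + 1] = x1, y1
    p, *_ = np.linalg.lstsq(A, bvec, rcond=None)
    return p
```

In practice outlier SIFT matches would need to be rejected (e.g. by RANSAC) before this fit; that step is omitted here.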
2.2 Observation model construction
The output of the deep neural network is used as the particle similarity. Since there are 5 deep neural networks, the particle similarity combines the outputs of the full-target network and the sub-target networks, defined as

$$p(z_t \mid x_t) \propto \theta_1 \cdot c_t^f + \theta_2 \cdot \max\!\left(c_t^t, c_t^b, c_t^l, c_t^r\right) \qquad (9)$$

where $\theta_1, \theta_2$ are similarity coefficients satisfying $\theta_1 + \theta_2 = 1$, with values tuned experimentally; $c_t^f$ is the output of the full-target deep neural network, and $c_t^t, c_t^b, c_t^l, c_t^r$ are the outputs of the four sub-target deep neural networks.
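Formula (9) can be sketched directly; the coefficient values θ1 = 0.6 and θ2 = 0.4 below are illustrative assumptions only, since the patent states they are tuned experimentally.

```python
def particle_likelihood(c_full, c_subs, theta1=0.6, theta2=0.4):
    """Formula (9): combine the full-target network output c_full with
    the best of the four sub-target outputs c_subs; theta1 + theta2 = 1."""
    assert abs(theta1 + theta2 - 1.0) < 1e-9
    return theta1 * c_full + theta2 * max(c_subs)
```

Taking the max over sub-target outputs is what gives the occlusion resistance: a particle still scores well if any one half of the target remains visible.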
2.3 Online target tracking
The target to be tracked is marked by the user in the first frame, and SIFT features are extracted from the target region to be tracked. In the first frame, positive and negative target samples are used to retrain the offline-trained stacked sparse autoencoder deep neural network, yielding the target's visual representation model. The positive sample set is generated from the selected target region according to the following transformation:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & s \\ s & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (10)$$

where $(x, y)^T$ is a pixel coordinate in the selected target region, $(x', y')^T$ is the transformed coordinate, and θ, s are transformation parameters: θ is the rotation angle, with range θ ∈ [-π/4, π/4], and s is the transformation scale, with range s ∈ [0.8, 1.2]. A group of (θ, s) values is randomly chosen and the target template is transformed by formula (10) to produce one positive training sample. The negative sample set uses the image regions outside the selected target region as its data source, extracting negative samples at different scales and positions.
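The positive-sample transform of formula (10) can be sketched as follows. The function name is an assumption, and the matrix product follows the formula as written: a rotation by θ composed with the s-parameterized matrix [[1, s], [s, 1]].

```python
import numpy as np

def warp_sample(points, theta, s):
    """Apply the transform of formula (10) to an (n, 2) array of pixel
    coordinates: rotation by theta, composed with [[1, s], [s, 1]]."""
    R = np.array([[np.cos(theta),  np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])
    S = np.array([[1.0, s],
                  [s, 1.0]])
    # (M p)^T = p^T M^T for each row p of `points`
    return points @ (R @ S).T
```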
The online tracking flow is shown in Fig. 1. The algorithm also sets two thresholds $a_1, a_2$ with $a_1 > a_2$. $a_1$ is used to judge whether the current tracking result is reliable: if the confidence exceeds this threshold, the tracking result is taken as a new positive sample, negative samples are selected from the image region outside it, and the deep neural network is retrained. $a_2$ is used to judge whether particle filter tracking has drifted: if the similarities of all particles fall below this threshold, the tracker is deemed to have failed; the deep neural network then scans the whole image to perform target detection, and the particle filter is reinitialized from the detection result.
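The two-threshold decision logic described above can be sketched as follows; the concrete threshold values a1 = 0.8 and a2 = 0.3 are illustrative assumptions, since the patent does not specify them.

```python
def tracking_decision(similarities, a1=0.8, a2=0.3):
    """Per-frame decision of the online stage (requires a1 > a2):
    'update'   -> best particle exceeds a1: result is reliable, add it as a
                  positive sample and retrain the deep network;
    'redetect' -> all particles fall below a2: tracker failed, rescan the
                  whole frame with the detector and reinitialize the filter;
    'track'    -> otherwise, continue tracking as-is."""
    assert a1 > a2
    best = max(similarities)
    if best > a1:
        return "update"
    if best < a2:          # best < a2 implies every particle is below a2
        return "redetect"
    return "track"
```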
The results of applying this method to video tracking are shown in Fig. 4.

Claims (2)

1. A target motion tracking method combining feature point matching and deep neural network detection, characterized by comprising the following steps:
a) training a stacked autoencoder neural network on the VOC2010 and Caltech101 data sets to obtain the target's visual prior;
b) building an objective function on the extracted SIFT matched point pairs, and optimizing this objective function to obtain the target's motion model between consecutive frames;
c) on the basis of the above two steps, performing moving target tracking with a particle filter.
2. The target motion tracking method combining feature point matching and deep neural network detection according to claim 1, characterized in that step b) establishes the following error function:

$$E(p; x) = \begin{pmatrix} x_t - a_1 x_{t-1} - a_2 y_{t-1} - a_0 \\ y_t - a_4 x_{t-1} - a_5 y_{t-1} - a_3 \end{pmatrix}$$
CN201410767363.8A 2015-01-16 2015-01-16 Target motion track method through combination of feature point matching and deep nerve network detection Pending CN105894008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410767363.8A CN105894008A (en) 2015-01-16 2015-01-16 Target motion track method through combination of feature point matching and deep nerve network detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410767363.8A CN105894008A (en) 2015-01-16 2015-01-16 Target motion track method through combination of feature point matching and deep nerve network detection

Publications (1)

Publication Number Publication Date
CN105894008A true CN105894008A (en) 2016-08-24

Family

ID=56701463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410767363.8A Pending CN105894008A (en) 2015-01-16 2015-01-16 Target motion track method through combination of feature point matching and deep nerve network detection

Country Status (1)

Country Link
CN (1) CN105894008A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651917A (en) * 2016-12-30 2017-05-10 天津大学 Image target tracking algorithm based on neural network
CN107169117A (en) * 2017-05-25 2017-09-15 西安工业大学 A kind of manual draw human motion search method based on autocoder and DTW
CN107169117B (en) * 2017-05-25 2020-11-10 西安工业大学 Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN108257156A (en) * 2018-01-24 2018-07-06 清华大学深圳研究生院 A kind of method of the automatic tracing target object from video
CN108257156B (en) * 2018-01-24 2021-05-04 清华大学深圳研究生院 Method for automatically tracking target object from video
CN108447080A (en) * 2018-03-02 2018-08-24 哈尔滨工业大学深圳研究生院 Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
CN108447080B (en) * 2018-03-02 2023-05-23 哈尔滨工业大学深圳研究生院 Target tracking method, system and storage medium based on hierarchical data association and convolutional neural network
CN109559329A (en) * 2018-11-28 2019-04-02 陕西师范大学 A kind of particle filter tracking method based on depth denoising autocoder
CN109684953A (en) * 2018-12-13 2019-04-26 北京小龙潜行科技有限公司 The method and device of pig tracking is carried out based on target detection and particle filter algorithm


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160824
