CN109559329A

CN109559329A - A kind of particle filter tracking method based on depth denoising autocoder

Info

Publication number: CN109559329A
Application number: CN201811433093.1A
Authority: CN
Inventors: 李良福; 宋睿
Original assignee: Shaanxi Normal University
Current assignee: Shaanxi Normal University
Priority date: 2018-11-28
Filing date: 2018-11-28
Publication date: 2019-04-02
Anticipated expiration: 2038-11-28
Also published as: CN109559329B

Abstract

The invention belongs to computer vision analysis technical fields, more particularly to a kind of particle filter tracking method based on depth denoising autocoder, including initial phase, using the manual spotting position of video first frame, it is tracked in process and tracking processing in first frame, it needs respectively in target background and prospect a certain number of positive negative samples selected around, initialize trained network model, second step, carry out importance sampling, third step, calculate observation probability, 4th step, update weight, 5th step, judge and selects the maximum particle of weight in weight information, think the particle be we next the target to be tracked, tracking new samples are updated for next frame, process of the circulation second step to the 5th step, until video playing finishes, this method can efficiently differentiate target signature and background, improve track algorithm Precision.

Description

Particle filter tracking method based on depth denoising automatic encoder

Technical Field

The invention belongs to the technical field of computer vision analysis, and particularly relates to a particle filter tracking method based on a depth denoising automatic encoder.

Background

Visual target tracking is an important research direction for computer vision and visual analysis. Typical visual analysis requires consistent and stable tracking of the object of interest. For monocular visual target tracking, numerous scholars propose theories and algorithms worthy of reference. In practical applications, the problem still faces a huge challenge due to the influence of factors such as complex background, target occlusion, rapid target motion, illumination change and the like.

The deep neural network has strong learning ability in the aspects of target detection and target classification. The deep learning architecture is more suitable for learning classification features than specific targets. In addition, the deep neural network algorithm usually requires a long iterative training process to converge, and the real-time requirement of online learning is difficult to meet. Therefore, it is difficult to extend the current deep learning network architecture to the field of target tracking.

Disclosure of Invention

The invention provides a particle filter tracking method based on a depth denoising automatic encoder, aiming at solving the problems of interference such as complex background, light change, target shielding and the like in the target tracking process and the problem of poor anti-jamming capability of the existing tracking algorithm. The technical problem to be solved by the invention is realized by the following technical scheme:

a particle filter tracking method based on a depth denoising automatic encoder comprises the following steps: step 1: training a deep network model, carrying out unsupervised greedy training on each layer of network layer by layer, adding noise into training data to obtain more stable feature expression, and carrying out supervised learning on the features through a classification neural network to further optimize parameters of the network;

step 2: manually calibrating the target position by adopting a first frame of a video, selecting positive and negative training samples from the sequence, and initializing the deep network model in the step 1;

and step 3: sampling particle sets by adopting importance, then propagating each particle forwards through a trained network model, and calculating the confidence coefficient of each particle in the online tracking process through a classification neural network;

and 4, step 4: calculating the observation probability of the particles according to the confidence coefficient of the particles in the step 3;

and 5: and (5) updating the weight of the particles according to the observation probability in the step (4) to determine the target position, updating and tracking a new sample for the next frame, and circulating the processes from the step (3) to the step (5) until the video is played.

Further, the depth network model in the step 1 is overlapped by an automatic noise reduction coder, and the output of the next layer is used as the input of the upper layer; the automatic noise reduction encoder comprises an encoder, a decoder and an implicit layer, wherein the decoder needs to predict original undamaged data according to noise characteristics and finally outputs the closest original input, Gaussian noise is usually used as an attenuation vector, and the expression of the Gaussian noise is as follows:

where x is the original input without noise interference,is data contaminated by noise, and σ represents the degree of regularization of the auto-encoder.

Further, the training process in step 1 is as follows: assume a training sample set x ∈ R for unlabeled classes^dMapping the input x to the hidden layer by the activation function f to obtain z ∈ R^d

z∈f_θ(x)＝σ(Wx+b) (1)

Where θ is { W, b }, W is a weight matrix, b is a coding layer vector, σ is a non-linear activation function, and the decoder remaps the input coded representation to form reconstructed y

y∈f_θ′(h)＝σ(W′h+b′) (2)

Where θ '{ W', b '}, W' is the transpose of the weight matrix W, and σ is the activation function of decoding; the automatic noise reduction encoder enables y to be approximately equal to x through the process;

hypothesis training set { (x)⁽¹⁾，y⁽¹⁾)，...，(x^(m)，y^(m)) Contains m training samples, x represents a single sample feature, y represents the input to which the sample corresponds, and a single sample (x, y) is taken to define its cost function;

wherein h is_W，b(x) Is corresponding to a networkThe output value of sample x, and thus the cost function of the training set of m samples, is:

λ is the weight loss coefficient, controlling the relative importance of the two parts; the process of training an automatic noise reduction encoder is to adjust the minimum reconstruction error J (W, b) of the parameters { theta, theta' } in the training sample set, where J (W, b) is a convex function, usually optimized by an iterative method.

Further, the classification neural network comprises an automatic noise reduction encoder coding part and k classification layers connected with sparse constraints.

Further, the classification neural network learning method in the step 1 is as follows, and Z is an activation function of a hidden layer of the self-encoder. In the forward propagation phase, the activation function Z is:

z＝W^Tx+b (6)

wherein x is an input vector; w is a weight; b is a bias (bias).

The first K maxima of the activation function are maintained and all others are set to zero:

wherein, (gamma)^cIs complementary to z, (Γ)^c＝sup p_k(z). The sparsity z is used to calculate the network reconstruction error:

where x is the training sample set, W represents the weight, b' represents the transpose of the bias (bias), the weight is back-propagated by the first K maxima of the activation function output to reconstruct the error iteration adjustment.

Further, the confidence coefficient algorithm in step 3 is as follows: let o_iIs corresponding to class k_iThe output of the neural network, the expectation of the output value is the posterior probability.

E{o_i}＝P(k_i|x) (9)

Where x is the network input. Typically, the corresponding class of the largest output is taken as the decision, so the confidence can be obtained from the posterior probability of the neural network, and the largest output of the classified neural network is taken as the confidence:

c(x)＝E{max o_i} (10)

further, the importance sampling method in step 3 is as follows:

when a new frame image arrives, q(s) is distributed according to the importance_t|s_t-1，y_1：t) And a motion model, from the set of particles at time t-1Obtaining n particles at time tWherein the importance weight corresponding to the set of particlesSum of (S)_tIs 1; target state s_tS is represented by six affine parameters, horizontal translation, vertical translation, scaling, width/height ratio, rotation and skew_t(tx, ty, sxy, ra, ar, sa); distribution per dimension of state transitionsIn the motion model, the model is an independent zero-mean normal distribution model.

Further, the observation probability calculation method in step 4 is as follows:

each particle is propagated forward through the classification neural network to obtain its confidenceAnd will maximize confidenceComparing with the set threshold value tau ifReselecting the positive and negative training samples, and initializing a classification neural network; if it is notThe probability of observation of the particle is calculated as follows:

wherein y is_tIt means that the sample at time t corresponds to the input,refers to the ith particle at time t.

Further, the method for updating the weight of the particle in step 5 includes:

wherein,each dimension representing a state transition in the importance sample is distributed,i.e. the calculated probability distribution of the particles, the distribution q(s) in general_t|s_t-1，y_1：t) Using a first order Markov process q(s)_t|s_t-1) I.e. the state transition is independent of the model observations, then the weights are updated as:

whereinIndicating the weight at the time immediately before the update,representing the observation probability of the particles obtained by the previous step, wherein for each frame, the particle with the maximum weight is a tracking result; updating a positive sample for each tracking frame, and then tracking the next positive sample; determining a state corresponding to the largest particle as a frame target position outside the current vehicle

The invention has the beneficial effects that:

the automatic noise reduction encoder obtains distributed characteristic representation of high-dimensional complex input through unsupervised greedy training layer by layer and parameter optimization multi-layer network structure, and only needs to adjust network parameters for different tasks; the method can effectively distinguish the target characteristics from the background through the automatic noise reduction encoder for deep noise reduction; and a classification neural network is introduced, so that the classification capability of the network is improved, the precision of a tracking algorithm is improved, and finally, particle filtering is used for tracking the target.

Drawings

FIG. 1 is a schematic diagram of an automatic noise reduction encoder.

Fig. 2 is a schematic diagram of a classification neural network structure.

Fig. 3 is a diagram illustrating the tracking result of the indoor occlusion phenomenon.

Fig. 4 is a diagram illustrating the tracking result of the outdoor occlusion phenomenon.

Fig. 5 is a diagram illustrating the tracking result of the illumination change target.

Fig. 6 is a schematic diagram of a target fuzzy target tracking result.

Detailed Description

The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.

A particle filter tracking method based on a depth denoising automatic encoder comprises the following steps:

step 1: training a deep network model, carrying out unsupervised greedy training on each layer of network layer by layer, adding noise into training data to obtain more stable feature expression, and carrying out supervised learning on the features through a classification neural network to further optimize parameters of the network;

As shown in fig. 1, in step 1, the depth network model is overlapped by an automatic noise reduction encoder, the depth automatic encoder is a typical unsupervised learning network, which is a depth network model, overlapped by a self-encoder, and uses the output of the next layer as the input of the upper layer, the essence of the automatic encoder is to learn the same function, i.e. the input of the network is equal to the reconstructed output, and the training and parameter optimization process is to realize output and reproduction input; the automatic noise reduction encoder comprises an encoder, a decoder and an implicit layer; the auto noise reduction encoder accepts the corrupted data as input and predicts the original uncorrupted data as output by training. The purpose of the noise reduction automatic encoder is to allow the use of a very large encoder and simultaneously prevent useless constant functions between the encoder and the decoder, and based on a statistical theory, the core idea of the automatic noise reduction encoder is to disturb original input and noise according to a certain rule, so that the original input is damaged, damaged data is input into a network, and the representation of a hidden layer is obtained. The decoder needs to predict the original uncorrupted data according to the noise characteristics, and finally outputs the closest original input, which is the effect of removing interference, and gaussian noise is usually used as an attenuation vector, and its expression is:

where x is the original input without noise interference,which is noise contaminated data, and sigma represents the degree of regularization of the auto-encoder, for the problem of generation of impairment data, the binomial random numbers are not only simple but also easy to calculate, i.e., binomial distribution random numbers are generated using the same input shape and then multiplied by the input. We use the squared error function as the reconstruction error and train it in exactly the same way as other feed forward networks.

The training process in step 1 is as follows: assume a training sample set x ∈ R for unlabeled classes^dMapping the input x to the hidden layer by the activation function f to obtain z ∈ R^d

z∈f_θ(x)＝σ(Wx+b) (1)

y∈f_θ′(h)＝σ(W′h+b′) (2)

hypothesis training set { (x)⁽¹⁾，y⁽¹⁾，...，(x^(m)，y^(m)) Contains m training samples, x represents a single sample feature, y represents the input to which the sample corresponds, and a single sample (x, y) is taken to define its cost function;

wherein h is_W，b(x) Is the output value of sample x corresponding to the network, so the cost function of the training set of m samples is:

it can be seen that the first part of the equation is the mean variance term of the cost function. The second part is a weight attenuation term which can prevent the weight from changing too much so as to prevent overfitting, and lambda is a weight reduction coefficient which controls the relative importance of the two parts; the process of training an automatic noise reduction encoder is to adjust the minimum reconstruction error J (W, b) of the parameters { theta, theta' } in the training sample set, where J (W, b) is a convex function, usually optimized by an iterative method.

The classification neural network comprises an automatic noise reduction encoder coding part and classification layers connected with k sparse constraints.

The purpose of constructing a classification neural network is to calculate the confidence of each particle in the online tracking process. The automatic noise reduction encoder consists of an encoding part of the automatic noise reduction encoder and k sparse constraint connected classification layers, and a schematic diagram of a classification neural network structure is shown in FIG. 2; the invariant characteristic of the target can be effectively learned by introducing k sparse constraint, the linear discrimination capability of the classification neural network is improved, and the overfitting problem is solved to a certain extent. Neuroscience research shows that the response of visual signals in cerebral cortex is sparse, so that the expression of original signals can be more meaningful by introducing sparse restriction in a deep neural network, particularly for a classification task, the idea is verified in principal component analysis and sparse coding, K sparse restriction keeps K maximum activation functions of hidden layers, and the rest are set to be zero.

The classified neural network learning method in the step 1 is as follows, and Z is an activation function of a hidden layer of a self-encoder. In the forward propagation phase, the activation function Z is:

z＝W^Tx+b (6)

wherein x is an input vector; w is a weight; b is a bias (bias).

where x is the training sample set, W represents the weight, b' represents the transpose of the bias (bias), the weight is back-propagated by the first K maxima of the activation function output to reconstruct the error iteration adjustment. The confidence level of the classification neural network output is the confidence level, which reflects its decision confidence at some point in the feature vector space.

The confidence coefficient algorithm in the step 3 is as follows: let o_iIs corresponding to class k_iThe output of the neural network, the expectation of the output value is the posterior probability.

E{o_i}＝P(k_i|x) (9)

c(x)＝E{max o_i} (10)

the importance sampling method in step 3 is as follows:

when a new frame image arrives, q(s) is distributed according to the importance_t|s_t-1，y_1：t) And a motion model, from the set of particles at time t-1Obtaining n particles at time tWherein the importance weight corresponding to the set of particlesSum of (S)_tIs 1; target state s_tS is represented by six affine parameters, horizontal translation, vertical translation, scaling, width/height ratio, rotation and skew_t(tx, ty, sxy, ra, ar, sa); distribution per dimension of state transitionsIs an independent zero-mean normal score in the motion modelAnd (4) cloth modeling.

The observation probability calculation method in step 4 is as follows:

The method for updating the weight of the particles in the step 5 comprises the following steps:

whereinIndicating the weight at the time immediately before the update,representing the observation probability of the particles obtained by the previous step, wherein for each frame, the particle with the maximum weight is a tracking result; updating a positive sample for each tracking frame, and then tracking the next positive sample; the state corresponding to the largest particle is determined as the frame target position outside the current vehicle.

The experimental operating environment is as follows: 3.8GHz, quad-core AMD processor, 8GB memory. Video sequences in various environments are used for verification, including illumination change, target occlusion and target rapid motion.

Fig. 3 and 4 show occlusion of an object. The shielding phenomenon is the phenomenon that a target is partially shielded due to the complexity of the surrounding environment and the interference of other surrounding objects, and the target cannot be lost in the whole tracking process of the tracker; outdoor photography often produces strong light changes. When the light changes greatly, the performance of target tracking is affected, and as can be seen from fig. 5, when a target enters a tunnel, huge illumination changes exist in an image, but from the tracking result, the algorithm accurately completes the tracking task.

The problem of target blurring appears in fig. 6, the target blurring is caused by too fast speed of the target in the moving process or unstable photography, the image of the target in the image is not clear, the target is blurred, the tracking effect is affected, the tracking algorithm in the text accurately completes tracking, and no target is lost.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A particle filter tracking method based on a depth denoising automatic encoder is characterized in that:

2. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: in the step 1, the deep network models are overlapped by an automatic noise reduction coder, and the output of the next layer is used as the input of the upper layer; the automatic noise reduction encoder comprises an encoder, a decoder and an implicit layer, wherein the decoder needs to predict original undamaged data according to noise characteristics and finally outputs the closest original input, Gaussian noise is usually used as an attenuation vector, and the expression of the Gaussian noise is as follows:

3. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the training process in the step 1 is as follows: assume a training sample set x ∈ R for unlabeled classes^dMapping the input x to the hidden layer by the activation function f to obtain z ∈ R^d

z∈f_θ(x)＝σ(Wx+b) (1)

y∈f_θ′(h)＝σ(W′h+b′) (2)

4. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the classification neural network comprises an automatic noise reduction encoder coding part and k classification layers connected with sparse constraints.

5. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the classified neural network learning method in the step 1 is as follows, and Z is an activation function of a hidden layer of a self-encoder. In the forward propagation phase, the activation function Z is:

z＝W^Tx+b (6)

wherein x is an input vector; w is a weight; b is a bias (bias).

6. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the confidence coefficient algorithm in the step 3 is as follows: let o_iIs corresponding to class k_iThe output of the neural network, the expectation of the output value is the posterior probability.

E{o_i}＝P(k_i|x) (9)

c(x)＝E{maxo_i} (10)

7. the particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the importance sampling method in the step 3 is as follows:

when a new frame image arrives, q(s) is distributed according to the importance_t|s_t-1，y_1∶t) And a motion model, from the set of particles at time t-1Obtaining n particles at time tWherein the importance weight corresponding to the set of particlesSum of (S)_tIs 1; target state S_tS is represented by six affine parameters, horizontal translation, vertical translation, scaling, width/height ratio, rotation and skew_t(tx, ty, sxy, ra, ar, sa); distribution per dimension of state transitionsIn the motion model, the model is an independent zero-mean normal distribution model.

8. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the observation probability calculation method in the step 4 is as follows:

9. The particle filter tracking method based on the automatic deep denoising encoder as claimed in claim 1, wherein: the method for updating the weight of the particles in the step 5 comprises the following steps: