Summary of the invention
The present invention addresses the problem stated above by developing an RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine.
The technical solution of the present invention is as follows:
An RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine, comprising the following steps:
Step 1: build a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: collect a large number of random unlabeled samples from an RGB-D video dataset; the unlabeled samples of the RGB-D video dataset include unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: using the unlabeled samples of the RGB-D video dataset, perform unsupervised training of the cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and extract the cross-modal features of the samples;
Step 4: for an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the tracked target in the initial m frames as the original templates of the target sample library; the original templates include the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames;
Step 5: input the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal features of the target samples;
Step 6: feed the cross-modal features of the positive and negative target samples into a logistic regression classifier, and perform supervised training of the logistic regression classifier;
Step 7: build a state transition model, and sample around the target tracking result of the previous frame of the RGB-D video sequence according to the constructed state transition model to obtain a sample set; the sample set consists of multiple target tracking candidate blocks;
Step 8: input the samples into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal feature of each sample;
Step 9: feed the cross-modal feature of each sample into the trained logistic regression classifier, and use the result obtained as the observation likelihood model;
Step 10: obtain the target tracking result of frame t of the RGB-D video sequence according to the formula X̂t = arg max over Xt^l of p(Xt^l|Zt), where p(Zt|Xt) denotes the observation likelihood model, p(Xt|Xt-1) denotes the state transition model between the two consecutive frames t-1 and t, and the posterior probability satisfies p(Xt|Zt) ∝ p(Zt|Xt) ∫ p(Xt|Xt-1) p(Xt-1|Zt-1) dXt-1.
Further, during target tracking the target sample library is updated: the target tracking results obtained by tracking replace the original templates in the target sample library.
Further, the state transition model is p(Xt|Xt-1) = N(Xt; Xt-1, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
Further, the observation likelihood model is p(Zt|Xt) = 1/(1 + exp(-w^T zt)), where zt is the cross-modal feature of the target candidate and w is the parameter to be optimized of the logistic regression classifier.
By adopting the above technical solution, the RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine provided by the present invention can track a moving target more stably when the target undergoes severe occlusion, rotation, illumination changes and the like, with higher accuracy and stronger robustness.
Detailed description of the invention
An RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine, comprising the following steps:
Step 1: build a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: collect a large number of random unlabeled samples from an RGB-D video dataset; the unlabeled samples of the RGB-D video dataset include unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: using the unlabeled samples of the RGB-D video dataset, perform unsupervised training of the cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and extract the cross-modal features of the samples;
Step 4: for an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the tracked target in the initial m frames as the original templates of the target sample library; the original templates include the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames;
Step 5: input the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal features of the target samples;
Step 6: feed the cross-modal features of the positive and negative target samples into a logistic regression classifier, and perform supervised training of the logistic regression classifier;
Step 7: build a state transition model, and sample around the target tracking result of the previous frame of the RGB-D video sequence according to the constructed state transition model to obtain a sample set; the sample set consists of multiple target tracking candidate blocks;
Step 8: input the samples into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal feature of each sample;
Step 9: feed the cross-modal feature of each sample into the trained logistic regression classifier, and use the result obtained as the observation likelihood model;
Step 10: obtain the target tracking result of frame t of the RGB-D video sequence according to the formula X̂t = arg max over Xt^l of p(Xt^l|Zt), where p(Zt|Xt) denotes the observation likelihood model, p(Xt|Xt-1) denotes the state transition model between the two consecutive frames t-1 and t, and the posterior probability satisfies p(Xt|Zt) ∝ p(Zt|Xt) ∫ p(Xt|Xt-1) p(Xt-1|Zt-1) dXt-1.
Further, during target tracking the target sample library is updated: the target tracking results obtained by tracking replace the original templates in the target sample library. Further, the state transition model is p(Xt|Xt-1) = N(Xt; Xt-1, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
Further, the observation likelihood model is p(Zt|Xt) = 1/(1 + exp(-w^T zt)), where zt is the cross-modal feature of the target candidate and w is the parameter to be optimized of the logistic regression classifier.
The Boltzmann machine (Boltzmann Machine, BM) was proposed by Hinton and Sejnowski in 1983. It is a fully connected feedback neural network composed of stochastic neurons; the connections between neurons are symmetric and there is no self-feedback. The output of each neuron has only two states (inactive and active), represented by the binary values 0 and 1. Fig. 1 shows the structure of the Boltzmann machine. As shown in Fig. 1, the neurons of the Boltzmann machine comprise visible units v ∈ {0,1}^D and hidden units h ∈ {0,1}^F; the visible units and hidden units consist of visible nodes and hidden nodes respectively, D and F being the numbers of visible and hidden nodes. The visible units v ∈ {0,1}^D represent the observable data, and the hidden units h ∈ {0,1}^F represent the features extracted from the data. The energy function between the visible-layer nodes and hidden-layer nodes (v, h) is defined as:
E(v,h;θ) = -v'Wh - (1/2)v'Lv - (1/2)h'Jh - v'B - h'A (1)
In the formula, θ = {W, L, J, B, A} are the model parameters; W, L, and J denote the symmetric connection weights from the visible layer to the hidden layer, within the visible layer, and within the hidden layer, respectively; the diagonal entries of L and J are 0; and B and A are the thresholds of the visible-layer nodes and hidden-layer nodes, respectively.
The goal of Boltzmann machine learning is to obtain the connection weights between the neurons and to find the configuration with minimum global energy of the system. By exponentiating and normalizing the energy function, the joint probability distribution over the states of the visible units v and hidden units h is obtained:
P(v,h;θ) = P*(v,h;θ)/Z(θ) = exp(-E(v,h;θ))/Z(θ) (2)
In the formula, P* denotes the unnormalized probability, and Z(θ) = Σ over (v,h) of exp(-E(v,h;θ)) denotes the partition function, which is the normalization term.
The Boltzmann machine has a very powerful unsupervised learning ability and can learn complex regularities in data, but its training time is very long; therefore, the restricted Boltzmann machine is introduced. Let L = 0 and J = 0; that is, in the restricted Boltzmann machine there are no connections between nodes within the visible layer or within the hidden layer, and only the nodes between the visible layer and the hidden layer have connection weights, which greatly improves the training efficiency of the deep learning network. Fig. 2 shows a schematic diagram of the restricted Boltzmann machine. As shown in Fig. 2, the energy function between the visible-layer nodes and hidden-layer nodes (v, h) of the restricted Boltzmann machine is:
E(v,h;θ) = -v'Wh - v'B - h'A (3)
The joint probability distribution over the states of the visible units v and hidden units h is:
P(v,h;θ) = exp(-E(v,h;θ))/Z(θ) (4)
where Z(θ) again denotes the partition function, the normalization term.
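As an illustration of equation (3) and the factorial conditionals it induces, the following is a minimal numerical sketch; the layer sizes, weights, and data below are made-up stand-ins, not the parameters of the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D, F = 6, 4                             # visible / hidden node counts (illustrative)
W = 0.01 * rng.standard_normal((D, F))  # visible-to-hidden connection weights
B = np.zeros(D)                         # visible-layer thresholds
A = np.zeros(F)                         # hidden-layer thresholds

def energy(v, h):
    # E(v, h; theta) = -v'Wh - v'B - h'A, as in equation (3)
    return -v @ W @ h - v @ B - h @ A

def p_h_given_v(v):
    # because L = J = 0, the hidden units are conditionally independent given v
    return sigmoid(v @ W + A)

v = rng.integers(0, 2, size=D).astype(float)
h_prob = p_h_given_v(v)
```

The absence of within-layer connections is what makes `p_h_given_v` a simple elementwise sigmoid, which is the source of the training-efficiency gain mentioned above.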
When the input signal is a real-valued image, the visible-layer nodes of the restricted Boltzmann machine model are real numbers v ∈ R^D while the hidden-layer nodes h ∈ {0,1}^F remain binary random variables, and the original model fails. The Gaussian-Bernoulli restricted Boltzmann machine (Gaussian-Bernoulli RBM, GRBM) model is therefore defined, with energy function:
E(v,h;θ) = Σi (vi - bi)²/(2σi²) - Σi Σj (vi/σi) Wij hj - Σj aj hj (5)
In the formula, θ = {a, b, W, σ} are the model parameters.
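A small sketch of the Gaussian-Bernoulli energy of equation (5); all sizes and parameter values below are illustrative assumptions:

```python
import numpy as np

def grbm_energy(v, h, a, b, W, sigma):
    """Gaussian-Bernoulli RBM energy, equation (5):
    E = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
        - sum_{i,j} (v_i / sigma_i) W_ij h_j
        - sum_j a_j h_j
    """
    return (((v - b) ** 2) / (2.0 * sigma ** 2)).sum() - (v / sigma) @ W @ h - a @ h

rng = np.random.default_rng(1)
D, F = 5, 3
v = rng.standard_normal(D)                    # real-valued visible units (e.g. pixel values)
h = rng.integers(0, 2, size=F).astype(float)  # binary hidden units
E = grbm_energy(v, h,
                a=np.zeros(F), b=np.zeros(D),
                W=np.zeros((D, F)), sigma=np.ones(D))
```

With zero weights and biases the energy reduces to the quadratic visible term, which is how the Gaussian visible units replace the binary ones of equation (3).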
Through in-depth study and summary of existing RGB-D-based target tracking methods, we found that the existing literature computes target features independently under the two modalities of RGB information and depth information, and then simply merges the features of the two modalities by methods such as weighting. Such a feature fusion approach ignores the complex correlations between the two modalities, and the resulting features do not significantly improve target tracking performance. The present invention uses a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines to extract the cross-modal features of the target under the RGB and Depth modalities. Fig. 3 is a schematic diagram of the cross-modal deep feature learning process of the present invention. As shown in Fig. 3, deep learning networks are built for the two modalities (RGB modality, Depth modality) respectively, as shown in Fig. 3(a) and 3(b); a hidden layer of cross-modal joint representation is added on top of the second hidden layers h^(2RGB) and h^(2Depth) of the two modalities, as shown in Fig. 3(c), and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. The hidden layer of the cross-modal joint representation behaves like the aforementioned second hidden layers, but its input signal consists of the output of the second hidden layer of the Gaussian-Bernoulli restricted Boltzmann machine for the RGB modality and the output of the Gaussian-Bernoulli restricted Boltzmann machine for the Depth modality. The concrete construction process is as follows:
The deep Boltzmann machine is a network of symmetrically coupled stochastic binary units. Unlike the RBM, which has only one hidden layer, the deep Boltzmann machine comprises a set of visible units v ∈ {0,1}^D and multiple layers of hidden units h^(1), h^(2), .... Herein, a cross-modal deep Boltzmann machine over RGB and Depth is built to extract the cross-modal features of RGB-D data. First, a Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is built to extract RGB sequence features, as shown in Fig. 3(a). Let v^RGB denote the visible units, here the real-valued RGB image input, and let h^(1RGB) and h^(2RGB) denote the hidden units of the two hidden layers. The energy function of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
E(v^RGB, h^(1RGB), h^(2RGB); θRGB) = Σi (vi^RGB - bi)²/(2σi²) - Σi Σj (vi^RGB/σi) Wij^(1) hj^(1RGB) - Σj Σl hj^(1RGB) Wjl^(2) hl^(2RGB) (6)
In the formula, σi is the standard deviation of the Gaussian model and θRGB is the model parameter vector of the deep Boltzmann machine. The joint probability distribution of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
P(v^RGB; θRGB) = (1/Z(θRGB)) Σ over h^(1RGB), h^(2RGB) of exp(-E(v^RGB, h^(1RGB), h^(2RGB); θRGB)) (7)
Similarly, let v^Depth denote the visible units, here the real-valued depth image input, and let h^(1Depth) and h^(2Depth) denote the hidden units of the two hidden layers; a Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is likewise built to extract Depth sequence features, as shown in Fig. 3(b). Its energy function and joint probability distribution are, respectively:
E(v^Depth, h^(1Depth), h^(2Depth); θDepth) = Σi (vi^Depth - bi)²/(2σi²) - Σi Σj (vi^Depth/σi) Wij^(1) hj^(1Depth) - Σj Σl hj^(1Depth) Wjl^(2) hl^(2Depth) (8)
P(v^Depth; θDepth) = (1/Z(θDepth)) Σ over h^(1Depth), h^(2Depth) of exp(-E(v^Depth, h^(1Depth), h^(2Depth); θDepth)) (9)
Herein, deep Boltzmann machines are used to extract the target features under the two modalities RGB and Depth. A hidden layer of cross-modal joint representation is added on top of the second hidden layers of the two modalities, building a cross-modal Gaussian-Bernoulli deep Boltzmann machine over RGB and Depth with three hidden layers, and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. A schematic diagram of the cross-modal Gaussian-Bernoulli deep Boltzmann machine is shown in Fig. 3(c). Let {v^RGB, v^Depth} denote the real-valued Gaussian variables and {h^(1RGB), h^(2RGB), h^(1Depth), h^(2Depth), h^(3)} denote the hidden units of the three hidden layers; then the energy function and joint probability distribution of the cross-modal Gaussian-Bernoulli deep Boltzmann machine are, respectively:
E(v^RGB, v^Depth, h; θ) = E(v^RGB, h^(1RGB), h^(2RGB); θRGB) + E(v^Depth, h^(1Depth), h^(2Depth); θDepth) - Σl Σk hl^(2RGB) Wlk^(3RGB) hk^(3) - Σl Σk hl^(2Depth) Wlk^(3Depth) hk^(3) (10)
P(v^RGB, v^Depth; θ) = (1/Z(θ)) Σ over h of exp(-E(v^RGB, v^Depth, h; θ)) (11)
The learning task of the cross-modal Gaussian-Bernoulli deep Boltzmann machine for RGB-Depth cross-modal features is to maximize the likelihood of formula (11) and obtain the corresponding model parameters. However, computing the maximum likelihood of formula (11) exactly is very difficult; mean-field inference and MCMC based on stochastic approximation can be used to realize stochastic approximation learning.
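Mean-field inference of the kind mentioned above can be sketched as a simple fixed-point iteration over the hidden-unit expectations; the two-hidden-layer network below is a made-up stand-in, not the trained model of the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iter=20):
    """Mean-field inference for a DBM with two hidden layers (a sketch).

    Each hidden unit's expectation is updated from its neighbors' current
    expectations: layer 1 sees the visible units below and layer 2 above,
    layer 2 sees only layer 1. Iterating drives the updates to a fixed
    point of the variational lower bound.
    """
    mu1 = sigmoid(v @ W1)        # bottom-up initialization of layer-1 expectations
    mu2 = sigmoid(mu1 @ W2)      # bottom-up initialization of layer-2 expectations
    for _ in range(n_iter):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

rng = np.random.default_rng(2)
v = rng.standard_normal(8)               # toy real-valued visible input
W1 = 0.1 * rng.standard_normal((8, 6))   # visible-to-layer-1 weights (illustrative)
W2 = 0.1 * rng.standard_normal((6, 4))   # layer-1-to-layer-2 weights (illustrative)
mu1, mu2 = mean_field(v, W1, W2)
```

In full training these expectations supply the data-dependent statistics, while a persistent Markov chain (the stochastic-approximation MCMC mentioned above) supplies the model statistics.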
The video target tracking method of the present invention is based on Bayesian maximum a posteriori (MAP) probability; the tracking problem can be regarded as the Bayesian MAP estimation of the hidden state variable in a hidden Markov model. That is: at frame t a series of observed images Yt = {y1, y2, ..., yt} is obtained, and Bayesian MAP theory is used to estimate the hidden state variable xt. From Bayesian theory:
p(xt|Yt) ∝ p(yt|xt) ∫ p(xt|xt-1) p(xt-1|Yt-1) dxt-1 (12)
In the formula, p(xt|xt-1) denotes the state transition model between two consecutive frames, and p(yt|xt) denotes the observation likelihood model. The optimal state value of the target at frame t can be obtained by maximum a posteriori estimation, i.e.:
x̂t = arg max over xt^l of p(xt^l|Yt) (13)
In the formula, xt^l denotes the l-th sample of the state variable xt at frame t. Without loss of generality, the present invention assumes that the state transition model follows a normal distribution, i.e.:
p(xt|xt-1) = N(xt; xt-1, Ψ) (14)
In the formula, Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
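Sampling candidate states from the Gaussian state transition model of equation (14) can be sketched as follows; the six-parameter affine state layout and the variance values are illustrative assumptions, since the invention only specifies that Ψ is diagonal:

```python
import numpy as np

rng = np.random.default_rng(3)

# Previous-frame state: [x, y, scale, rotation, aspect, skew] (an assumed
# affine parameterization for illustration).
x_prev = np.array([120.0, 80.0, 1.0, 0.0, 1.0, 0.0])
# Variances on the diagonal of Psi, one per affine parameter (illustrative).
psi_diag = np.array([4.0, 4.0, 0.01, 0.005, 0.001, 0.001])

n_particles = 400
# Draw candidates from N(x_prev, Psi): because Psi is diagonal, an
# elementwise Gaussian perturbation of the previous state suffices.
particles = x_prev + rng.standard_normal((n_particles, 6)) * np.sqrt(psi_diag)
```

Each row of `particles` is one target tracking candidate of the sample set produced in step 7.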
Observation likelihood models can be divided into two broad classes, discriminative models and generative models. The present invention uses a discriminative model, employing a logistic regression classifier to classify the sample features. After the RGB-D cross-modal feature extraction of the positive and negative samples of the tracked target is completed, the cross-modal features are input into the logistic regression classifier to obtain confidence scores (Confidence Score). The principle of the logistic regression classifier is as follows: let zi denote the cross-modal feature representation of the i-th training sample, and let the label of the i-th training sample be yi ∈ {-1, +1}. The positive sample set then consists of the samples with yi = +1, and likewise the negative sample set consists of the samples with yi = -1. The logistic regression classifier can be trained by minimizing the cost function, i.e.:
min over w of (λ+/D+) Σ over {i: yi = +1} of log(1 + exp(-w^T zi)) + (λ-/D-) Σ over {i: yi = -1} of log(1 + exp(w^T zi)) (15)
In the formula, λ+ and λ- are the weight parameters of the positive-class and negative-class costs respectively; w is the parameter to be optimized of the logistic regression classifier; w^T is the transpose of w; D+ denotes the number of positive samples and D- the number of negative samples. The present invention uses the above confidence score as the observation likelihood model:
p(zi|xi) = 1/(1 + exp(-w^T zi)) (16)
In the formula, zi denotes the observation of a target candidate.
The present invention first builds a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines, and uses this network to extract the cross-modal features of samples in the RGB-D video data. Then the cross-modal features of the samples are input into the logistic regression classifier, the result of the logistic regression classifier is used as the observation likelihood model of the Bayesian algorithm, and a reasonable state transition model is built at the same time. Finally, target tracking in the RGB-D video data is realized by the Bayesian algorithm. The video target tracking method of the present invention is further explained below through a concrete experimental process:
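One tracking iteration (steps 7 to 10) can be sketched end to end as follows; the feature extractor and classifier weights below are placeholders standing in for the trained cross-modal network and logistic regression classifier, not the invention's actual models:

```python
import numpy as np

rng = np.random.default_rng(5)

def extract_cross_modal_feature(state):
    # Placeholder for the forward pass of the trained cross-modal deep
    # Boltzmann machine (step 8); here just a toy 2-D feature of position.
    return np.array([state[0] / 100.0, state[1] / 100.0])

def confidence(z, w):
    # Observation likelihood from the logistic regression classifier (step 9).
    return 1.0 / (1.0 + np.exp(-w @ z))

def track_one_frame(x_prev, psi_diag, w, n_particles=300):
    # Step 7: sample candidate states from N(x_prev, Psi).
    cand = x_prev + rng.standard_normal((n_particles, len(x_prev))) * np.sqrt(psi_diag)
    # Steps 8-9: extract features and score each candidate.
    scores = np.array([confidence(extract_cross_modal_feature(c), w) for c in cand])
    # Step 10: the MAP estimate is the highest-scoring candidate.
    return cand[np.argmax(scores)]

x_prev = np.array([50.0, 50.0])
x_t = track_one_frame(x_prev, psi_diag=np.array([9.0, 9.0]), w=np.array([1.0, -1.0]))
```

Repeating `track_one_frame` over the sequence, and occasionally refreshing the target sample library with the returned results, yields the full tracker.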
The video target tracking method of the present invention was implemented on the Windows 10 operating system, using MATLAB R2014a as the software platform, with a computer configured with a 3.40 GHz CPU and a TITAN GPU. 100 groups of common test video data captured with a Kinect under various conditions were used for testing; in view of the length, only the target tracking results of 3 groups of test videos are listed below. To assess the performance of the present invention, representative target tracking methods of the prior art were selected for comparison of experimental results, namely: TLD Tracker, Struck Tracker, CT Tracker, VTD Tracker, MIL Tracker, RGB-D Tracker, and DAE Tracker. Among them, DAE Tracker is a video target tracking method that uses a deep denoising auto-encoder (Denoising Auto-Encoder, DAE) for cross-modal feature learning. The present invention uses two evaluation criteria to assess the performance of each tracking algorithm, namely Average Center Position Error (ACPE) and Success Rate. The center position error is defined as the Euclidean distance between the center of the tracking box and the center of the real target, and the success rate is defined as:
score = area(B_object ∩ B_ground) / area(B_object ∪ B_ground)
where B_object is the rectangular box of the target tracking result and B_ground is the real target box; a score greater than 0.5 indicates successful tracking. Figs. 5 to 7 show in turn the average center position error comparison results for the RGB-D test videos face_occ2, express2_occ, and bear_change. As shown in Figs. 5, 6, and 7, the video target tracking method of the present invention has a relatively low average center position error and relatively high accuracy. To illustrate the overall performance of the present invention on the 100 groups of common test data, the success rate is further selected as the evaluation criterion of the tracking algorithms; the success rate comparison results of each algorithm are shown in Table 1.
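The success-rate criterion above can be sketched as a small overlap computation; the (x, y, w, h) box format is an assumption for illustration:

```python
def overlap_score(box_a, box_b):
    """Boxes as (x, y, w, h); returns area(A ∩ B) / area(A ∪ B)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

s = overlap_score((0, 0, 10, 10), (5, 0, 10, 10))  # two half-overlapping boxes
```

A frame counts as tracked successfully when this score exceeds 0.5, and the success rate is the fraction of such frames.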
The present invention uses the cross-modal deep feature learning network to extract the cross-modal features of samples in RGB-D video data; these features can describe the complex correlations between the two modalities (RGB modality, Depth modality). The cross-modal features are input into the logistic regression classifier, the obtained result is used as the observation likelihood model of the Bayesian algorithm, and target tracking in the RGB-D video data is realized by the Bayesian algorithm. The experimental results show that the video target tracking method of the present invention can track a moving target more stably when the target undergoes severe occlusion, rotation, illumination changes and the like, and has higher accuracy and stronger robustness.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field who, within the technical scope disclosed by the invention, makes equivalent substitutions or changes according to the technical solution and inventive concept of the present invention shall be covered within the protection scope of the present invention.