Summary of the invention
The present invention addresses the problem stated above by developing an RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine.
The technical solution of the present invention is as follows:
An RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine, comprising the following steps:
Step 1: build a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: collect a large number of random unlabeled samples from an RGB-D video dataset; the unlabeled samples of the RGB-D video dataset include unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: using the unlabeled samples of the RGB-D video dataset, perform unsupervised training of the cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and extract the cross-modal features of the samples;
Step 4: for an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the tracked target in the initial m frames as the original templates of the target sample library; the original templates include the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames;
Step 5: input the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal features of the target samples;
Step 6: feed the cross-modal features of the positive and negative target samples into a logistic regression classifier, and perform supervised training of the logistic regression classifier;
Step 7: build a state transition model, and sample around the target tracking result of the previous frame of the RGB-D video sequence according to the constructed state transition model to obtain a sample set; the sample set consists of multiple target tracking candidate blocks;
Step 8: input the samples into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal feature of each sample;
Step 9: feed the cross-modal feature of each sample into the trained logistic regression classifier, and use the result obtained as the observation likelihood model;
Step 10: obtain the target tracking result of frame t of the RGB-D video sequence according to the formula X̂t = arg max over Xt^l of p(Xt^l|Zt), where p(Zt|Xt) denotes the observation likelihood model, p(Xt|Xt-1) denotes the state transition model between the two consecutive frames t-1 and t, and the posterior probability satisfies p(Xt|Zt) ∝ p(Zt|Xt) ∫ p(Xt|Xt-1) p(Xt-1|Zt-1) dXt-1.
Further, during target tracking the target sample library is updated: the target tracking results obtained by tracking replace the original templates in the target sample library.
Further, the state transition model is p(Xt|Xt-1) = N(Xt; Xt-1, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
Further, the observation likelihood model is p(Zt|Xt) = 1/(1 + exp(-w^T zt)), where zt is the cross-modal feature of the target candidate and w is the parameter to be optimized of the logistic regression classifier.
By adopting the above technical solution, the RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine provided by the present invention can track a moving target more stably when the target undergoes severe occlusion, rotation, illumination changes and the like, with higher accuracy and stronger robustness.
Detailed description of the invention
An RGB-D video target tracking method based on cross-modal feature learning with a deep Boltzmann machine, comprising the following steps:
Step 1: build a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: collect a large number of random unlabeled samples from an RGB-D video dataset; the unlabeled samples of the RGB-D video dataset include unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: using the unlabeled samples of the RGB-D video dataset, perform unsupervised training of the cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and extract the cross-modal features of the samples;
Step 4: for an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the tracked target in the initial m frames as the original templates of the target sample library; the original templates include the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames;
Step 5: input the positive and negative RGB image samples and the positive and negative Depth image samples of the tracked target in the initial m frames into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal features of the target samples;
Step 6: feed the cross-modal features of the positive and negative target samples into a logistic regression classifier, and perform supervised training of the logistic regression classifier;
Step 7: build a state transition model, and sample around the target tracking result of the previous frame of the RGB-D video sequence according to the constructed state transition model to obtain a sample set; the sample set consists of multiple target tracking candidate blocks;
Step 8: input the samples into the trained cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines, and obtain the cross-modal feature of each sample;
Step 9: feed the cross-modal feature of each sample into the trained logistic regression classifier, and use the result obtained as the observation likelihood model;
Step 10: obtain the target tracking result of frame t of the RGB-D video sequence according to the formula X̂t = arg max over Xt^l of p(Xt^l|Zt), where p(Zt|Xt) denotes the observation likelihood model, p(Xt|Xt-1) denotes the state transition model between the two consecutive frames t-1 and t, and the posterior probability satisfies p(Xt|Zt) ∝ p(Zt|Xt) ∫ p(Xt|Xt-1) p(Xt-1|Zt-1) dXt-1.
Further, during target tracking the target sample library is updated: the target tracking results obtained by tracking replace the original templates in the target sample library. Further, the state transition model is p(Xt|Xt-1) = N(Xt; Xt-1, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
Further, the observation likelihood model is p(Zt|Xt) = 1/(1 + exp(-w^T zt)), where zt is the cross-modal feature of the target candidate and w is the parameter to be optimized of the logistic regression classifier.
The Boltzmann machine (Boltzmann Machine, BM) was proposed by Hinton and Sejnowski in 1983. It is a fully connected feedback neural network composed of stochastic neurons; the connections between neurons are symmetric and there is no self-feedback. The output of each neuron has only two states (inactive and active), represented by the binary values 0 and 1. Fig. 1 shows the structure of the Boltzmann machine. As shown in Fig. 1, the neurons of the Boltzmann machine comprise visible units v ∈ {0,1}^D and hidden units h ∈ {0,1}^F; the visible units and hidden units consist of visible nodes and hidden nodes respectively, D and F being the numbers of visible and hidden nodes. The visible units v ∈ {0,1}^D represent the observable data, and the hidden units h ∈ {0,1}^F represent the features extracted from the data. The energy function between the visible-layer nodes and hidden-layer nodes (v, h) is defined as:
E(v,h;θ) = -v'Wh - (1/2)v'Lv - (1/2)h'Jh - v'B - h'A (1)
In the formula, θ = {W, L, J, B, A} are the model parameters; W, L, and J denote the symmetric connection weights from the visible layer to the hidden layer, within the visible layer, and within the hidden layer, respectively; the diagonal entries of L and J are 0; and B and A are the thresholds of the visible-layer nodes and hidden-layer nodes, respectively.
The goal of Boltzmann machine learning is to obtain the connection weights between the neurons and to find the configuration with minimum global energy of the system. By exponentiating and normalizing the energy function, the joint probability distribution over the states of the visible units v and hidden units h is obtained:
P(v,h;θ) = P*(v,h;θ)/Z(θ) = exp(-E(v,h;θ))/Z(θ) (2)
In the formula, P* denotes the unnormalized probability, and Z(θ) = Σ over (v,h) of exp(-E(v,h;θ)) denotes the partition function, which is the normalization term.
The Boltzmann machine has a very powerful unsupervised learning ability and can learn complex regularities in data, but its training time is very long; therefore, the restricted Boltzmann machine is introduced. Let L = 0 and J = 0; that is, in the restricted Boltzmann machine there are no connections between nodes within the visible layer or within the hidden layer, and only the nodes between the visible layer and the hidden layer have connection weights, which greatly improves the training efficiency of the deep learning network. Fig. 2 shows a schematic diagram of the restricted Boltzmann machine. As shown in Fig. 2, the energy function between the visible-layer nodes and hidden-layer nodes (v, h) of the restricted Boltzmann machine is:
E(v,h;θ) = -v'Wh - v'B - h'A (3)
The joint probability distribution over the states of the visible units v and hidden units h is:
P(v,h;θ) = exp(-E(v,h;θ))/Z(θ) (4)
where Z(θ) again denotes the partition function, the normalization term.
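As an illustration of equation (3) and the factorial conditionals it induces, the following is a minimal numerical sketch; the layer sizes, weights, and data below are made-up stand-ins, not the parameters of the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
D, F = 6, 4                             # visible / hidden node counts (illustrative)
W = 0.01 * rng.standard_normal((D, F))  # visible-to-hidden connection weights
B = np.zeros(D)                         # visible-layer thresholds
A = np.zeros(F)                         # hidden-layer thresholds

def energy(v, h):
    # E(v, h; theta) = -v'Wh - v'B - h'A, as in equation (3)
    return -v @ W @ h - v @ B - h @ A

def p_h_given_v(v):
    # because L = J = 0, the hidden units are conditionally independent given v
    return sigmoid(v @ W + A)

v = rng.integers(0, 2, size=D).astype(float)
h_prob = p_h_given_v(v)
```

The absence of within-layer connections is what makes `p_h_given_v` a simple elementwise sigmoid, which is the source of the training-efficiency gain mentioned above.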
When the input signal is a real-valued image, the visible-layer nodes of the restricted Boltzmann machine model are real numbers v ∈ R^D while the hidden-layer nodes h ∈ {0,1}^F remain binary random variables, and the original model fails. The Gaussian-Bernoulli restricted Boltzmann machine (Gaussian-Bernoulli RBM, GRBM) model is therefore defined, with energy function:
E(v,h;θ) = Σi (vi - bi)²/(2σi²) - Σi Σj (vi/σi) Wij hj - Σj aj hj (5)
In the formula, θ = {a, b, W, σ} are the model parameters.
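A small sketch of the Gaussian-Bernoulli energy of equation (5); all sizes and parameter values below are illustrative assumptions:

```python
import numpy as np

def grbm_energy(v, h, a, b, W, sigma):
    """Gaussian-Bernoulli RBM energy, equation (5):
    E = sum_i (v_i - b_i)^2 / (2 sigma_i^2)
        - sum_{i,j} (v_i / sigma_i) W_ij h_j
        - sum_j a_j h_j
    """
    return (((v - b) ** 2) / (2.0 * sigma ** 2)).sum() - (v / sigma) @ W @ h - a @ h

rng = np.random.default_rng(1)
D, F = 5, 3
v = rng.standard_normal(D)                    # real-valued visible units (e.g. pixel values)
h = rng.integers(0, 2, size=F).astype(float)  # binary hidden units
E = grbm_energy(v, h,
                a=np.zeros(F), b=np.zeros(D),
                W=np.zeros((D, F)), sigma=np.ones(D))
```

With zero weights and biases the energy reduces to the quadratic visible term, which is how the Gaussian visible units replace the binary ones of equation (3).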
Through in-depth study and summary of existing RGB-D-based target tracking methods, we found that the existing literature computes target features independently under the two modalities of RGB information and depth information, and then simply merges the features of the two modalities by methods such as weighting. Such a feature fusion approach ignores the complex correlations between the two modalities, and the resulting features do not significantly improve target tracking performance. The present invention uses a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines to extract the cross-modal features of the target under the RGB and Depth modalities. Fig. 3 is a schematic diagram of the cross-modal deep feature learning process of the present invention. As shown in Fig. 3, deep learning networks are built for the two modalities (RGB modality, Depth modality) respectively, as shown in Fig. 3(a) and 3(b); a hidden layer of cross-modal joint representation is added on top of the second hidden layers h^(2RGB) and h^(2Depth) of the two modalities, as shown in Fig. 3(c), and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. The hidden layer of the cross-modal joint representation behaves like the aforementioned second hidden layers, but its input signal consists of the output of the second hidden layer of the Gaussian-Bernoulli restricted Boltzmann machine for the RGB modality and the output of the Gaussian-Bernoulli restricted Boltzmann machine for the Depth modality. The concrete construction process is as follows:
The deep Boltzmann machine is a network of symmetrically coupled stochastic binary units. Unlike the RBM, which has only one hidden layer, the deep Boltzmann machine comprises a set of visible units v ∈ {0,1}^D and multiple layers of hidden units h^(1), h^(2), .... Herein, a cross-modal deep Boltzmann machine over RGB and Depth is built to extract the cross-modal features of RGB-D data. First, a Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is built to extract RGB sequence features, as shown in Fig. 3(a). Let v^RGB denote the visible units, here the real-valued RGB image input, and let h^(1RGB) and h^(2RGB) denote the hidden units of the two hidden layers. The energy function of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
E(v^RGB, h^(1RGB), h^(2RGB); θRGB) = Σi (vi^RGB - bi)²/(2σi²) - Σi Σj (vi^RGB/σi) Wij^(1) hj^(1RGB) - Σj Σl hj^(1RGB) Wjl^(2) hl^(2RGB) (6)
In the formula, σi is the standard deviation of the Gaussian model and θRGB is the model parameter vector of the deep Boltzmann machine. The joint probability distribution of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
P(v^RGB; θRGB) = (1/Z(θRGB)) Σ over h^(1RGB), h^(2RGB) of exp(-E(v^RGB, h^(1RGB), h^(2RGB); θRGB)) (7)
Similarly, let v^Depth denote the visible units, here the real-valued depth image input, and let h^(1Depth) and h^(2Depth) denote the hidden units of the two hidden layers; a Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is likewise built to extract Depth sequence features, as shown in Fig. 3(b). Its energy function and joint probability distribution are, respectively:
E(v^Depth, h^(1Depth), h^(2Depth); θDepth) = Σi (vi^Depth - bi)²/(2σi²) - Σi Σj (vi^Depth/σi) Wij^(1) hj^(1Depth) - Σj Σl hj^(1Depth) Wjl^(2) hl^(2Depth) (8)
P(v^Depth; θDepth) = (1/Z(θDepth)) Σ over h^(1Depth), h^(2Depth) of exp(-E(v^Depth, h^(1Depth), h^(2Depth); θDepth)) (9)
Herein, deep Boltzmann machines are used to extract the target features under the two modalities RGB and Depth. A hidden layer of cross-modal joint representation is added on top of the second hidden layers of the two modalities, building a cross-modal Gaussian-Bernoulli deep Boltzmann machine over RGB and Depth with three hidden layers, and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. A schematic diagram of the cross-modal Gaussian-Bernoulli deep Boltzmann machine is shown in Fig. 3(c). Let {v^RGB, v^Depth} denote the real-valued Gaussian variables and {h^(1RGB), h^(2RGB), h^(1Depth), h^(2Depth), h^(3)} denote the hidden units of the three hidden layers; then the energy function and joint probability distribution of the cross-modal Gaussian-Bernoulli deep Boltzmann machine are, respectively:
E(v^RGB, v^Depth, h; θ) = E(v^RGB, h^(1RGB), h^(2RGB); θRGB) + E(v^Depth, h^(1Depth), h^(2Depth); θDepth) - Σl Σk hl^(2RGB) Wlk^(3RGB) hk^(3) - Σl Σk hl^(2Depth) Wlk^(3Depth) hk^(3) (10)
P(v^RGB, v^Depth; θ) = (1/Z(θ)) Σ over h of exp(-E(v^RGB, v^Depth, h; θ)) (11)
The learning task of the cross-modal Gaussian-Bernoulli deep Boltzmann machine for RGB-Depth cross-modal features is to maximize the likelihood of formula (11) and obtain the corresponding model parameters. However, computing the maximum likelihood of formula (11) exactly is very difficult; mean-field inference and MCMC based on stochastic approximation can be used to realize stochastic approximation learning.
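Mean-field inference of the kind mentioned above can be sketched as a simple fixed-point iteration over the hidden-unit expectations; the two-hidden-layer network below is a made-up stand-in, not the trained model of the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iter=20):
    """Mean-field inference for a DBM with two hidden layers (a sketch).

    Each hidden unit's expectation is updated from its neighbors' current
    expectations: layer 1 sees the visible units below and layer 2 above,
    layer 2 sees only layer 1. Iterating drives the updates to a fixed
    point of the variational lower bound.
    """
    mu1 = sigmoid(v @ W1)        # bottom-up initialization of layer-1 expectations
    mu2 = sigmoid(mu1 @ W2)      # bottom-up initialization of layer-2 expectations
    for _ in range(n_iter):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2

rng = np.random.default_rng(2)
v = rng.standard_normal(8)               # toy real-valued visible input
W1 = 0.1 * rng.standard_normal((8, 6))   # visible-to-layer-1 weights (illustrative)
W2 = 0.1 * rng.standard_normal((6, 4))   # layer-1-to-layer-2 weights (illustrative)
mu1, mu2 = mean_field(v, W1, W2)
```

In full training these expectations supply the data-dependent statistics, while a persistent Markov chain (the stochastic-approximation MCMC mentioned above) supplies the model statistics.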
The video target tracking method of the present invention is based on Bayesian maximum a posteriori (MAP) probability; the tracking problem can be regarded as the Bayesian MAP estimation of the hidden state variable in a hidden Markov model. That is: at frame t a series of observed images Yt = {y1, y2, ..., yt} is obtained, and Bayesian MAP theory is used to estimate the hidden state variable xt. From Bayesian theory:
p(xt|Yt) ∝ p(yt|xt) ∫ p(xt|xt-1) p(xt-1|Yt-1) dxt-1 (12)
In the formula, p(xt|xt-1) denotes the state transition model between two consecutive frames, and p(yt|xt) denotes the observation likelihood model. The optimal state value of the target at frame t can be obtained by maximum a posteriori estimation, i.e.:
x̂t = arg max over xt^l of p(xt^l|Yt) (13)
In the formula, xt^l denotes the l-th sample of the state variable xt at frame t. Without loss of generality, the present invention assumes that the state transition model follows a normal distribution, i.e.:
p(xt|xt-1) = N(xt; xt-1, Ψ) (14)
In the formula, Ψ is a diagonal matrix whose elements are the variances of the motion affine transformation parameters of the target.
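Sampling candidate states from the Gaussian state transition model of equation (14) can be sketched as follows; the six-parameter affine state layout and the variance values are illustrative assumptions, since the invention only specifies that Ψ is diagonal:

```python
import numpy as np

rng = np.random.default_rng(3)

# Previous-frame state: [x, y, scale, rotation, aspect, skew] (an assumed
# affine parameterization for illustration).
x_prev = np.array([120.0, 80.0, 1.0, 0.0, 1.0, 0.0])
# Variances on the diagonal of Psi, one per affine parameter (illustrative).
psi_diag = np.array([4.0, 4.0, 0.01, 0.005, 0.001, 0.001])

n_particles = 400
# Draw candidates from N(x_prev, Psi): because Psi is diagonal, an
# elementwise Gaussian perturbation of the previous state suffices.
particles = x_prev + rng.standard_normal((n_particles, 6)) * np.sqrt(psi_diag)
```

Each row of `particles` is one target tracking candidate of the sample set produced in step 7.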
Observation likelihood models can be divided into two broad classes, discriminative models and generative models. The present invention uses a discriminative model, employing a logistic regression classifier to classify the sample features. After the RGB-D cross-modal feature extraction of the positive and negative samples of the tracked target is completed, the cross-modal features are input into the logistic regression classifier to obtain confidence scores (Confidence Score). The principle of the logistic regression classifier is as follows: let zi denote the cross-modal feature representation of the i-th training sample, and let the label of the i-th training sample be yi ∈ {-1, +1}. The positive sample set then consists of the samples with yi = +1, and likewise the negative sample set consists of the samples with yi = -1. The logistic regression classifier can be trained by minimizing the cost function, i.e.:
min over w of (λ+/D+) Σ over {i: yi = +1} of log(1 + exp(-w^T zi)) + (λ-/D-) Σ over {i: yi = -1} of log(1 + exp(w^T zi)) (15)
In the formula, λ+ and λ- are the weight parameters of the positive-class and negative-class costs respectively; w is the parameter to be optimized of the logistic regression classifier; w^T is the transpose of w; D+ denotes the number of positive samples and D- the number of negative samples. The present invention uses the above confidence score as the observation likelihood model:
p(zi|xi) = 1/(1 + exp(-w^T zi)) (16)
In the formula, zi denotes the observation of a target candidate.
The present invention first builds a cross-modal deep feature learning network over the RGB modality and the Depth modality based on Gaussian-Bernoulli restricted Boltzmann machines, and uses this network to extract the cross-modal features of samples in the RGB-D video data. Then the cross-modal features of the samples are input into the logistic regression classifier, the result of the logistic regression classifier is used as the observation likelihood model of the Bayesian algorithm, and a reasonable state transition model is built at the same time. Finally, target tracking in the RGB-D video data is realized by the Bayesian algorithm. The video target tracking method of the present invention is further explained below through a concrete experimental process:
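One tracking iteration (steps 7 to 10) can be sketched end to end as follows; the feature extractor and classifier weights below are placeholders standing in for the trained cross-modal network and logistic regression classifier, not the invention's actual models:

```python
import numpy as np

rng = np.random.default_rng(5)

def extract_cross_modal_feature(state):
    # Placeholder for the forward pass of the trained cross-modal deep
    # Boltzmann machine (step 8); here just a toy 2-D feature of position.
    return np.array([state[0] / 100.0, state[1] / 100.0])

def confidence(z, w):
    # Observation likelihood from the logistic regression classifier (step 9).
    return 1.0 / (1.0 + np.exp(-w @ z))

def track_one_frame(x_prev, psi_diag, w, n_particles=300):
    # Step 7: sample candidate states from N(x_prev, Psi).
    cand = x_prev + rng.standard_normal((n_particles, len(x_prev))) * np.sqrt(psi_diag)
    # Steps 8-9: extract features and score each candidate.
    scores = np.array([confidence(extract_cross_modal_feature(c), w) for c in cand])
    # Step 10: the MAP estimate is the highest-scoring candidate.
    return cand[np.argmax(scores)]

x_prev = np.array([50.0, 50.0])
x_t = track_one_frame(x_prev, psi_diag=np.array([9.0, 9.0]), w=np.array([1.0, -1.0]))
```

Repeating `track_one_frame` over the sequence, and occasionally refreshing the target sample library with the returned results, yields the full tracker.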
The video target tracking method of the present invention was implemented on the Windows 10 operating system, using MATLAB R2014a as the software platform, with a computer configured with a 3.40 GHz CPU and a TITAN GPU. 100 groups of common test video data captured with a Kinect under various conditions were used for testing; in view of the length, only the target tracking results of 3 groups of test videos are listed below. To assess the performance of the present invention, representative target tracking methods of the prior art were selected for comparison of experimental results, namely: TLD Tracker, Struck Tracker, CT Tracker, VTD Tracker, MIL Tracker, RGB-D Tracker, and DAE Tracker. Among them, DAE Tracker is a video target tracking method that uses a deep denoising auto-encoder (Denoising Auto-Encoder, DAE) for cross-modal feature learning. The present invention uses two evaluation criteria to assess the performance of each tracking algorithm, namely Average Center Position Error (ACPE) and Success Rate. The center position error is defined as the Euclidean distance between the center of the tracking box and the center of the real target, and the success rate is defined as:
score = area(B_object ∩ B_ground) / area(B_object ∪ B_ground)
where B_object is the rectangular box of the target tracking result and B_ground is the real target box; a score greater than 0.5 indicates successful tracking. Figs. 5 to 7 show in turn the average center position error comparison results for the RGB-D test videos face_occ2, express2_occ, and bear_change. As shown in Figs. 5, 6, and 7, the video target tracking method of the present invention has a relatively low average center position error and relatively high accuracy. To illustrate the overall performance of the present invention on the 100 groups of common test data, the success rate is further selected as the evaluation criterion of the tracking algorithms; the success rate comparison results of each algorithm are shown in Table 1.
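The success-rate criterion above can be sketched as a small overlap computation; the (x, y, w, h) box format is an assumption for illustration:

```python
def overlap_score(box_a, box_b):
    """Boxes as (x, y, w, h); returns area(A ∩ B) / area(A ∪ B)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

s = overlap_score((0, 0, 10, 10), (5, 0, 10, 10))  # two half-overlapping boxes
```

A frame counts as tracked successfully when this score exceeds 0.5, and the success rate is the fraction of such frames.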
The present invention uses the cross-modal deep feature learning network to extract the cross-modal features of samples in RGB-D video data; these features can describe the complex correlations between the two modalities (RGB modality, Depth modality). The cross-modal features are input into the logistic regression classifier, the obtained result is used as the observation likelihood model of the Bayesian algorithm, and target tracking in the RGB-D video data is realized by the Bayesian algorithm. The experimental results show that the video target tracking method of the present invention can track a moving target more stably when the target undergoes severe occlusion, rotation, illumination changes and the like, and has higher accuracy and stronger robustness.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field who, within the technical scope disclosed by the invention, makes equivalent substitutions or changes according to the technical solution and inventive concept of the present invention shall be covered within the protection scope of the present invention.