CN106127806A - RGB-D target tracking method based on deep Boltzmann machine cross-modal feature learning - Google Patents

RGB-D target tracking method based on deep Boltzmann machine cross-modal feature learning Download PDF

Info

Publication number
CN106127806A
Authority
CN
China
Prior art keywords
rgb
target
sample
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610440598.5A
Other languages
Chinese (zh)
Other versions
CN106127806B (en)
Inventor
姜明新 (Jiang Mingxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN201610440598.5A priority Critical patent/CN106127806B/en
Publication of CN106127806A publication Critical patent/CN106127806A/en
Application granted granted Critical
Publication of CN106127806B publication Critical patent/CN106127806B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses an RGB-D target tracking method based on deep Boltzmann machine cross-modal feature learning, comprising the following steps: building a cross-modal deep feature learning network based on Gaussian-Bernoulli restricted Boltzmann machines; training the deep learning network without supervision; using the trained deep learning network to extract the cross-modal features of initial samples of the target to be tracked; feeding the cross-modal features of the initial samples into a logistic regression classifier and training the classifier with supervision; sampling at frame t of the RGB-D video sequence according to a constructed state transition model; using the deep learning network to extract the cross-modal feature of each sample; feeding the cross-modal feature of each sample into the trained logistic regression classifier and taking the output as the observation likelihood model; and using a Bayesian algorithm to obtain the video target tracking result of frame t. The present invention achieves a higher success rate and a lower center position error.

Description

RGB-D target tracking method based on deep Boltzmann machine cross-modal feature learning
Technical field
The present invention relates to a video target tracking method, specifically an RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning.
Background art
Video-based target tracking is one of the key problems in the field of computer vision, with wide applications in video surveillance, robot navigation, intelligent transportation, virtual reality and other fields. Many researchers at home and abroad have worked on this problem and made progress. Video target tracking methods in the prior art mainly locate the target in two-dimensional image sequences and lack three-dimensional information; tracking therefore tends to fail when the target is occluded, rotates or changes pose. With the release of RGB-D sensors, color images and depth images can now be acquired simultaneously, and depth information can improve the performance of video target tracking. How best to fuse RGB information and depth information effectively, and thereby improve tracking performance, is a research focus of the video target tracking field. Conventional related techniques generally suffer from the following problems: hand-designed features are used as the observation likelihood model, and hand-designed features have inherent limitations; the hand-designed features under the RGB modality and the Depth modality are computed separately and then simply fused by weighting or similar methods, a fusion scheme that ignores the complex correlations between the RGB and Depth modalities.
Summary of the invention
The present invention addresses the above problems by developing an RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning.
The technical means of the present invention are as follows:
An RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning, comprising the following steps:
Step 1: Build a cross-modal deep feature learning network for the RGB and Depth modalities based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: Collect a large number of unlabeled samples from random RGB-D video datasets; the unlabeled samples of the RGB-D video datasets comprise unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: Use the unlabeled samples of the RGB-D video datasets to train the Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network without supervision, so that it extracts the cross-modal features of samples;
Step 4: For an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the target in the initial m frames as the original templates of the target sample library; the original templates comprise the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames;
Step 5: Input the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal features of the target samples;
Step 6: Feed the cross-modal features of the positive and negative target samples into a logistic regression classifier and train the classifier with supervision;
Step 7: Build a state transition model and, according to the constructed state transition model, sample around the target tracking result of the previous frame of the RGB-D video sequence to obtain a sample set; the sample set consists of multiple target tracking blocks;
Step 8: Input the samples into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal feature of each sample;
Step 9: Feed the cross-modal feature of each sample into the trained logistic regression classifier and take the output as the observation likelihood model;
Step 10: According to the formula X̂_t = argmax_{X_t^l} p(X_t^l | Z_t), obtain the target tracking result X̂_t of frame t of the RGB-D video sequence, where p(Z_t|X_t) is the observation likelihood model, p(X_t|X_{t−1}) is the state transition model between the consecutive frames t−1 and t, and the posterior probability satisfies p(X_t|Z_t) ∝ p(Z_t|X_t) ∫ p(X_t|X_{t−1}) p(X_{t−1}|Z_{t−1}) dX_{t−1}.
Further, the target sample library is updated during tracking of the target: the target tracking results obtained are used to replace the original templates in the target sample library.
Further, the state transition model is p(X_t|X_{t−1}) = N(X_t; X_{t−1}, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the target's affine motion parameters.
Further, the observation likelihood model is p(Z_t|X_t^i) = 1/(1 + exp(−w^T z_i)), where w is the parameter of the logistic regression classifier to be optimized.
Owing to the above technical scheme, the RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning provided by the present invention can track a moving target stably even when targets occlude each other severely, rotate or change pose, and it achieves higher accuracy and stronger robustness.
Description of the drawings
Fig. 1 is a structural schematic of the Boltzmann machine used in the present invention;
Fig. 2 is a structural schematic of the restricted Boltzmann machine used in the present invention;
Fig. 3 is a schematic of the cross-modal deep-feature Boltzmann machine for RGB-D data of the present invention;
Fig. 4 is a flow diagram of the video target tracking method of the present invention;
Fig. 5 to Fig. 7 show, in order, the average center position error comparison results for the RGB-D test videos face_occ2, express2_occ and bear_change.
Detailed description of the invention
An RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning comprises the following steps:
Step 1: Build a cross-modal deep feature learning network for the RGB and Depth modalities based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: Collect a large number of unlabeled samples from random RGB-D video datasets; the unlabeled samples of the RGB-D video datasets comprise unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: Use the unlabeled samples of the RGB-D video datasets to train the Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network without supervision, so that it extracts the cross-modal features of samples;
Step 4: For an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the target in the initial m frames as the original templates of the target sample library; the original templates comprise the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames;
Step 5: Input the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal features of the target samples;
Step 6: Feed the cross-modal features of the positive and negative target samples into a logistic regression classifier and train the classifier with supervision;
Step 7: Build a state transition model and, according to the constructed state transition model, sample around the target tracking result of the previous frame of the RGB-D video sequence to obtain a sample set; the sample set consists of multiple target tracking blocks;
Step 8: Input the samples into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal feature of each sample;
Step 9: Feed the cross-modal feature of each sample into the trained logistic regression classifier and take the output as the observation likelihood model;
Step 10: According to the formula X̂_t = argmax_{X_t^l} p(X_t^l | Z_t), obtain the target tracking result X̂_t of frame t of the RGB-D video sequence, where p(Z_t|X_t) is the observation likelihood model, p(X_t|X_{t−1}) is the state transition model between the consecutive frames t−1 and t, and the posterior probability satisfies p(X_t|Z_t) ∝ p(Z_t|X_t) ∫ p(X_t|X_{t−1}) p(X_{t−1}|Z_{t−1}) dX_{t−1}.
Further, the target sample library is updated during tracking of the target: the target tracking results obtained are used to replace the original templates in the target sample library. Further, the state transition model is p(X_t|X_{t−1}) = N(X_t; X_{t−1}, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the target's affine motion parameters.
Further, the observation likelihood model is p(Z_t|X_t^i) = 1/(1 + exp(−w^T z_i)), where w is the parameter of the logistic regression classifier to be optimized. A schematic per-frame sketch of this tracking loop is given below.
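To make the flow of steps 7 to 10 concrete, the following is a minimal schematic sketch of one per-frame tracking step, not the patent's actual implementation: extract_features stands in for the trained cross-modal deep Boltzmann machine, score for the trained logistic regression classifier, and all names and parameter values are hypothetical placeholders.

```python
import numpy as np

def track_frame(rgb, depth, x_prev, psi_diag, extract_features, score,
                n_samples=400, rng=None):
    """One Bayesian tracking step (steps 7-10): sample candidate states around
    the previous result, score each candidate's cross-modal feature with the
    classifier as the observation likelihood, and return the MAP candidate."""
    rng = rng or np.random.default_rng()
    # Step 7: candidates X_t^l ~ N(X_{t-1}, Psi) with diagonal covariance Psi.
    candidates = x_prev + rng.normal(size=(n_samples, len(x_prev))) * np.sqrt(psi_diag)
    # Steps 8-9: cross-modal feature and observation likelihood per candidate.
    likelihoods = np.array([score(extract_features(rgb, depth, x)) for x in candidates])
    # Step 10: maximum a posteriori estimate over the sampled states.
    return candidates[np.argmax(likelihoods)]

# Toy call with stub callables in place of the trained networks.
x_hat = track_frame(None, None, np.zeros(6), np.ones(6),
                    extract_features=lambda rgb, depth, x: x,
                    score=lambda f: float(-np.sum(f ** 2)))
```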
The Boltzmann Machine (BM) was proposed by Hinton and Sejnowski in 1983. It is a fully connected feedback neural network composed of stochastic neurons: the connections between neurons are symmetric, there is no self-feedback, and each neuron's output takes only two states (inactive and active), represented by the binary values 0 and 1. Fig. 1 shows the structure of the Boltzmann machine. As shown in Fig. 1, the neurons of the Boltzmann machine comprise visible units v ∈ {0,1}^D and hidden units h ∈ {0,1}^F, made up of visible nodes and hidden nodes respectively, where D and F are the numbers of visible and hidden nodes. The visible units v ∈ {0,1}^D represent the observable data, and the hidden units h ∈ {0,1}^F represent features extracted from the data. The energy function between the visible-layer nodes and hidden-layer nodes (v, h) is defined as:
E(v, h; θ) = −(1/2)v′Lv − (1/2)h′Jh − v′Wh − v′B − h′A    (1)
In the formula, θ = {W, L, J, B, A} are the model parameters: W, L and J are the symmetric connection weights from the visible layer to the hidden layer, within the visible layer, and within the hidden layer respectively; the diagonal elements of L and J are 0; and B and A are the thresholds of the visible-layer and hidden-layer nodes respectively.
The goal of Boltzmann machine learning is to obtain the connection weights between the neurons and find the configuration of minimum global energy of the system. By exponentiating and normalizing the energy function, the joint probability distribution of the visible units v and hidden units h can be obtained:
P(v, h; θ) = P*(v, h; θ) / Z(θ) = (1/Z(θ)) exp(−E(v, h; θ))    (2)
In the formula, P* denotes the unnormalized probability and Z(θ) = Σ_{v,h} exp(−E(v, h; θ)) denotes the partition function, i.e. the normalization term.
The Boltzmann machine has a very powerful unsupervised learning ability and can learn complex regularities in data, but its training time is very long; the restricted Boltzmann machine is therefore introduced. Setting L = 0 and J = 0, i.e. removing the links among nodes within the visible layer and within the hidden layer so that only nodes between the visible layer and the hidden layer carry connection weights, greatly improves the training efficiency of the deep learning network. Fig. 2 shows a schematic of the restricted Boltzmann machine. As shown in Fig. 2, the energy function between the visible-layer and hidden-layer nodes (v, h) of the restricted Boltzmann machine is:
E(v, h; θ) = −v′Wh − v′B − h′A    (3)
The joint probability distribution of the visible units v and hidden units h is:
P(v, h; θ) = (1/Z(θ)) exp(−E(v, h; θ))    (4)
As before, Z(θ) = Σ_{v,h} exp(−E(v, h; θ)) in the formula denotes the partition function, i.e. the normalization term.
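For illustration, a minimal numpy sketch of the conditional probabilities implied by equations (3) and (4): because the restricted connectivity makes the units of one layer conditionally independent given the other, p(h_j = 1 | v) = σ(Σ_i v_i W_ij + A_j) and p(v_i = 1 | h) = σ(Σ_j W_ij h_j + B_i). All array names and sizes are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden_given_visible(v, W, A):
    """p(h_j = 1 | v) for an RBM with energy E = -v'Wh - v'B - h'A, eq. (3)."""
    return sigmoid(v @ W + A)

def rbm_visible_given_hidden(h, W, B):
    """p(v_i = 1 | h), by the symmetry of the bipartite connections."""
    return sigmoid(h @ W.T + B)

# Toy example: D = 4 visible nodes, F = 3 hidden nodes.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))
A, B = np.zeros(3), np.zeros(4)
v = np.array([1.0, 0.0, 1.0, 1.0])
print(rbm_hidden_given_visible(v, W, A))
```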
When the input signal is a real-valued image, the visible-layer nodes in the restricted Boltzmann machine model are real numbers v ∈ ℝ^D while the hidden-layer nodes h ∈ {0,1}^F remain binary random variables, and the original model fails. The Gaussian-Bernoulli restricted Boltzmann machine (Gaussian-Bernoulli RBM, GRBM) is therefore defined, with energy function:
E(v, h; θ) = Σ_{i=1}^{D} (v_i − b_i)² / (2σ_i²) − Σ_{i=1}^{D} Σ_{j=1}^{F} (v_i/σ_i) W_ij h_j − Σ_{j=1}^{F} a_j h_j    (5)
In the formula, θ = {a, b, W, σ} are the model parameters.
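As a concrete reading of equation (5), the following sketch evaluates the Gaussian-Bernoulli RBM energy for a real-valued visible vector and a binary hidden vector; the dimensions and parameter values are hypothetical.

```python
import numpy as np

def grbm_energy(v, h, W, a, b, sigma):
    """Gaussian-Bernoulli RBM energy, equation (5):
    sum_i (v_i - b_i)^2 / (2 sigma_i^2)
    - sum_ij (v_i / sigma_i) W_ij h_j - sum_j a_j h_j."""
    quadratic = np.sum((v - b) ** 2 / (2.0 * sigma ** 2))
    interaction = (v / sigma) @ W @ h
    hidden_bias = a @ h
    return quadratic - interaction - hidden_bias

rng = np.random.default_rng(1)
D, F = 5, 3
v = rng.normal(size=D)           # real-valued visible units (e.g. pixel values)
h = rng.integers(0, 2, size=F)   # binary hidden units
W = rng.normal(scale=0.1, size=(D, F))
a, b, sigma = np.zeros(F), np.zeros(D), np.ones(D)
print(grbm_energy(v, h, W, a, b, sigma))
```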
Through in-depth study and summary of existing RGB-D target tracking methods, we found that the existing literature computes the target features independently under the RGB and depth modalities and then simply fuses the features of the two modalities by weighting or similar methods; such a fusion scheme ignores the complex correlations between the two modalities, so the resulting features do not significantly improve tracking performance. The present invention uses a cross-modal deep feature learning network for the RGB and Depth modalities based on Gaussian-Bernoulli restricted Boltzmann machines to extract the target's cross-modal features under the RGB and Depth modalities. Fig. 3 is a schematic of the cross-modal deep feature learning process of the present invention. As shown in Fig. 3, a deep learning network is first established for each modality (RGB and Depth), as shown in Fig. 3(a) and 3(b); a cross-modal joint-representation hidden layer is then added on top of the second hidden layers h^(2RGB) and h^(2Depth) of the two modalities, as shown in Fig. 3(c), and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. The cross-modal joint-representation hidden layer behaves like the aforementioned second hidden layers, but its input signal is the output of the second hidden layer of the Gaussian-Bernoulli restricted Boltzmann machine for the RGB modality together with the output of the second hidden layer of the Gaussian-Bernoulli restricted Boltzmann machine for the Depth modality. The concrete construction process is as follows:
The deep Boltzmann machine is a network of symmetrically coupled stochastic binary units. Unlike the RBM, which has only one hidden layer, a deep Boltzmann machine comprises a set of visible units v ∈ {0,1}^D and multiple layers of hidden units h^(1) ∈ {0,1}^{F1}, h^(2) ∈ {0,1}^{F2}, and so on. Here, a cross-modal deep Boltzmann machine over RGB and Depth is built to extract the cross-modal features of RGB-D data. First, a Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is established to extract RGB sequence features, as shown in Fig. 3(a). Let v^RGB denote the visible units, here the real-valued RGB image input, and let h^(1RGB) and h^(2RGB) denote the hidden units of the two hidden layers. The energy function of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
E(v^RGB, h^(1RGB), h^(2RGB); θ^RGB) = Σ_{i=1}^{D} (v_i^(RGB) − b_i^(RGB))² / (2(σ_i^(RGB))²) − Σ_{i=1}^{D} Σ_{j=1}^{F_1RGB} (v_i^(RGB)/σ_i^(RGB)) W_ij^(1RGB) h_j^(1RGB) − Σ_{j=1}^{F_1RGB} Σ_{l=1}^{F_2RGB} W_jl^(2RGB) h_j^(1RGB) h_l^(2RGB) − Σ_{j=1}^{F_1RGB} a_j^(1RGB) h_j^(1RGB) − Σ_{l=1}^{F_2RGB} a_l^(2RGB) h_l^(2RGB)    (6)
In the formula, σ_i^(RGB) is the standard deviation of the Gaussian model and θ^RGB is the model parameter vector of the deep Boltzmann machine.
The joint probability distribution of the two-layer Gaussian-Bernoulli deep Boltzmann machine for RGB is:
P(v^RGB; θ^RGB) = (1/Z(θ^RGB)) Σ_{h^(1RGB), h^(2RGB)} exp(−E(v^RGB, h^(1RGB), h^(2RGB); θ^RGB))    (7)
Similarly, let v^Depth denote the visible units, here the real-valued depth image input, and let h^(1Depth) and h^(2Depth) denote the hidden units of the two hidden layers. A Gaussian-Bernoulli deep Boltzmann machine with two hidden layers is likewise built to extract Depth sequence features, as shown in Fig. 3(b); its energy function and joint probability distribution are respectively:
E(v^Depth, h^(1Depth), h^(2Depth); θ^Depth) = Σ_{i=1}^{K} (v_i^(Depth) − b_i^(Depth))² / (2(σ_i^(Depth))²) − Σ_{i=1}^{K} Σ_{j=1}^{F_1Depth} (v_i^(Depth)/σ_i^(Depth)) W_ij^(1Depth) h_j^(1Depth) − Σ_{j=1}^{F_1Depth} Σ_{l=1}^{F_2Depth} W_jl^(2Depth) h_j^(1Depth) h_l^(2Depth) − Σ_{j=1}^{F_1Depth} a_j^(1Depth) h_j^(1Depth) − Σ_{l=1}^{F_2Depth} a_l^(2Depth) h_l^(2Depth)    (8)
P(v^Depth; θ^Depth) = (1/Z(θ^Depth)) Σ_{h^(1Depth), h^(2Depth)} exp(−E(v^Depth, h^(1Depth), h^(2Depth); θ^Depth))    (9)
Deep Boltzmann machines are thus used here to extract the target features under the RGB and Depth modalities. A cross-modal joint-representation hidden layer is added on top of the second hidden layers of the two modalities, building a cross-modal Gaussian-Bernoulli deep Boltzmann machine over RGB and Depth with three hidden layers, and the cross-modal feature h^(3) of the RGB-D data is obtained by layer-wise learning. A schematic of the cross-modal-feature Gaussian-Bernoulli deep Boltzmann machine is shown in Fig. 3(c).
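One way to read the bottom-up pass through the network of Fig. 3 is the following sketch, which propagates expectations layer by layer with deterministic sigmoid units and feeds both second-layer representations into the joint third layer. This feed-forward approximation, and every weight name in it, is an illustrative assumption; a full deep Boltzmann machine would refine these activations with mean-field iterations as discussed below.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_feature(v_rgb, v_depth, p):
    """Approximate bottom-up pass of the cross-modal DBM of Fig. 3:
    two modality-specific Gaussian-Bernoulli stacks whose second hidden
    layers jointly drive the cross-modal layer h(3)."""
    h1_rgb = sigmoid((v_rgb / p["sigma_rgb"]) @ p["W1_rgb"] + p["a1_rgb"])
    h2_rgb = sigmoid(h1_rgb @ p["W2_rgb"] + p["a2_rgb"])
    h1_d = sigmoid((v_depth / p["sigma_d"]) @ p["W1_d"] + p["a1_d"])
    h2_d = sigmoid(h1_d @ p["W2_d"] + p["a2_d"])
    # The joint layer h(3) receives both second-layer representations.
    return sigmoid(h2_rgb @ p["W3_rgb"] + h2_d @ p["W3_d"] + p["a3"])

rng = np.random.default_rng(2)
p = {"W1_rgb": rng.normal(scale=0.1, size=(16, 12)), "a1_rgb": np.zeros(12),
     "W2_rgb": rng.normal(scale=0.1, size=(12, 8)),  "a2_rgb": np.zeros(8),
     "W1_d":   rng.normal(scale=0.1, size=(16, 12)), "a1_d":   np.zeros(12),
     "W2_d":   rng.normal(scale=0.1, size=(12, 8)),  "a2_d":   np.zeros(8),
     "W3_rgb": rng.normal(scale=0.1, size=(8, 6)),
     "W3_d":   rng.normal(scale=0.1, size=(8, 6)),   "a3":     np.zeros(6),
     "sigma_rgb": np.ones(16), "sigma_d": np.ones(16)}
feature = cross_modal_feature(rng.normal(size=16), rng.normal(size=16), p)
```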
Let {v^RGB, v^Depth} denote the real-valued Gaussian variables and {h^(1RGB), h^(2RGB), h^(1Depth), h^(2Depth), h^(3)} denote the hidden units of the three hidden layers. The joint probability distribution and the energy function of the cross-modal Gaussian-Bernoulli deep Boltzmann machine are then respectively:
P(v^RGB, v^Depth; θ) = Σ_{h^(2RGB), h^(2Depth), h^(3)} P(h^(2RGB), h^(2Depth), h^(3)) (Σ_{h^(1RGB)} P(v^RGB, h^(1RGB), h^(2RGB))) (Σ_{h^(1Depth)} P(v^Depth, h^(1Depth), h^(2Depth)))    (10)
E(v, h; θ) = Σ_{i=1}^{D} (v_i^(RGB) − b_i^(RGB))² / (2(σ_i^(RGB))²) − Σ_{i=1}^{D} Σ_{j=1}^{F_1RGB} (v_i^(RGB)/σ_i^(RGB)) W_ij^(1RGB) h_j^(1RGB) − Σ_{j=1}^{F_1RGB} Σ_{l=1}^{F_2RGB} h_j^(1RGB) W_jl^(2RGB) h_l^(2RGB) − Σ_{l=1}^{F_2RGB} Σ_{p=1}^{F_3} h_l^(2RGB) W_lp^(3RGB) h_p^(3) − Σ_{j=1}^{F_1RGB} a_j^(1RGB) h_j^(1RGB) − Σ_{l=1}^{F_2RGB} a_l^(2RGB) h_l^(2RGB) + Σ_{i=1}^{K} (v_i^(Depth) − b_i^(Depth))² / (2(σ_i^(Depth))²) − Σ_{i=1}^{K} Σ_{j=1}^{F_1Depth} (v_i^(Depth)/σ_i^(Depth)) W_ij^(1Depth) h_j^(1Depth) − Σ_{j=1}^{F_1Depth} Σ_{l=1}^{F_2Depth} h_j^(1Depth) W_jl^(2Depth) h_l^(2Depth) − Σ_{l=1}^{F_2Depth} Σ_{p=1}^{F_3} h_l^(2Depth) W_lp^(3Depth) h_p^(3) − Σ_{j=1}^{F_1Depth} a_j^(1Depth) h_j^(1Depth) − Σ_{l=1}^{F_2Depth} a_l^(2Depth) h_l^(2Depth) − Σ_{p=1}^{F_3} a_p^(3) h_p^(3)    (11)
The learning task of the cross-modal-feature Gaussian-Bernoulli deep Boltzmann machine is to maximize the likelihood of equation (11) and obtain the corresponding model parameters. Computing this maximum likelihood exactly is very difficult, but mean-field inference combined with stochastic-approximation MCMC can be used to carry out approximate learning.
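As a minimal sketch of the mean-field part of that procedure, the fixed-point updates below infer the hidden-layer posteriors of one modality's two-layer stack with the visible layer clamped; the stochastic-approximation (MCMC) gradient step around these posteriors, which completes learning, is omitted. All names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_two_layer(v, W1, W2, a1, a2, sigma, n_iter=20):
    """Mean-field posterior q(h1)q(h2) for a two-layer Gaussian-Bernoulli DBM
    with v clamped: each layer's expectation is refreshed from its neighbours
    until the variational fixed point is (approximately) reached."""
    mu2 = np.full(W2.shape[1], 0.5)                         # init q(h2 = 1)
    for _ in range(n_iter):
        mu1 = sigmoid((v / sigma) @ W1 + mu2 @ W2.T + a1)   # h1 sees v and h2
        mu2 = sigmoid(mu1 @ W2 + a2)                        # h2 sees h1
    return mu1, mu2

rng = np.random.default_rng(3)
mu1, mu2 = mean_field_two_layer(rng.normal(size=8),
                                rng.normal(scale=0.1, size=(8, 6)),
                                rng.normal(scale=0.1, size=(6, 4)),
                                np.zeros(6), np.zeros(4), np.ones(8))
```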
The video target tracking method of the present invention is based on Bayesian maximum a posteriori (MAP) probability; the problem can be viewed as MAP estimation of the hidden state variable of a hidden Markov model. That is, given the series of observed images Y_t = {y_1, y_2, …, y_t} obtained up to frame t, Bayesian MAP theory is used to estimate the hidden state variable x_t. From Bayesian theory:
p(x_t | Y_t) ∝ p(y_t | x_t) ∫ p(x_t | x_{t−1}) p(x_{t−1} | Y_{t−1}) dx_{t−1}    (12)
In the formula, p(x_t | x_{t−1}) is the state transition model between two consecutive frames and p(y_t | x_t) is the observation likelihood model. The optimal state value of the target at frame t can be obtained by maximum a posteriori estimation, i.e.:
x̂_t = argmax_{x_t^l} p(x_t^l | Y_t), l = 1, 2, …, N    (13)
In the formula, x_t^l denotes the l-th sample of the state variable x_t at frame t. Without loss of generality, the present invention assumes that the state transition model follows a normal distribution, i.e.:
p(X_t | X_{t−1}) = N(X_t; X_{t−1}, Ψ)    (14)
In the formula, Ψ is a diagonal matrix whose elements are the variances of the target's affine motion parameters.
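A minimal sketch of drawing candidates from equation (14), assuming the common six-parameter affine state (position, scale, rotation, aspect ratio, skew); the particle count and variance values below are hypothetical.

```python
import numpy as np

def sample_candidates(x_prev, psi_diag, n_samples=400, rng=None):
    """Draw candidate states X_t^l ~ N(X_{t-1}, Psi), equation (14),
    where Psi is diagonal with the affine-parameter variances psi_diag."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(size=(n_samples, len(x_prev))) * np.sqrt(psi_diag)
    return x_prev + noise

x_prev = np.array([120.0, 80.0, 1.0, 0.0, 1.0, 0.0])        # hypothetical state
psi_diag = np.array([4.0, 4.0, 0.01, 0.01, 0.005, 0.001])   # diagonal of Psi
candidates = sample_candidates(x_prev, psi_diag)
```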
Observation likelihood models fall into two broad classes, discriminative models and generative models. The present invention uses a discriminative model: a logistic regression classifier is used to classify the sample features. After the RGB-D cross-modal features of the positive and negative samples of the tracked target have been extracted, the cross-modal features are input into the logistic regression classifier to obtain a confidence score. The principle of the logistic regression classifier is as follows: let the cross-modal feature representation of the i-th training sample be h_i^(3) and its label be y_i ∈ {−1, +1}; the positive sample set is then {h_{i+}^(3)} with corresponding labels y_{i+} = +1, and likewise the negative sample set is {h_{i−}^(3)} with corresponding labels y_{i−} = −1. The logistic regression classifier is trained by minimizing the cost function, i.e.:
min_w [ C_+ Σ_{i+=1}^{D+} log(1 + exp(−y_{i+} w^T h_{i+}^(3))) + C_− Σ_{i−=1}^{D−} log(1 + exp(−y_{i−} w^T h_{i−}^(3))) ]    (15)
In the formula, C_+ and C_− are the weight parameters of the positive-class and negative-class costs respectively, w is the parameter of the logistic regression classifier to be optimized, w^T is the transpose of w, and D+ and D− are the numbers of positive and negative samples. The present invention takes the above confidence score as the observation likelihood model:
p(Z_t | X_t^i) = 1 / (1 + exp(−w^T z_i))    (16)
In the formula, z_i denotes the observation (cross-modal feature) of a target candidate.
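For illustration, a small sketch of obtaining the confidence score of equation (16) after fitting a logistic regression classifier; scikit-learn's LogisticRegression minimizes a regularized logistic loss of the same form as (15), with class_weight playing the role of the C_+ / C_− cost weights. The feature data here are random stand-ins, and the fitted model includes a bias term that the patent's formula (16) omits.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random stand-ins for cross-modal features h(3) of positive/negative samples.
rng = np.random.default_rng(4)
H_train = np.vstack([rng.normal(loc=+1.0, size=(50, 6)),
                     rng.normal(loc=-1.0, size=(50, 6))])
y_train = np.array([+1] * 50 + [-1] * 50)

clf = LogisticRegression(class_weight="balanced").fit(H_train, y_train)

def observation_likelihood(z, w, b=0.0):
    """Equation (16): p(Z_t | X_t^i) = 1 / (1 + exp(-(w^T z_i + b)))."""
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))

z = rng.normal(size=6)                       # feature of one target candidate
p = observation_likelihood(z, clf.coef_[0], clf.intercept_[0])
```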
The present invention first builds the cross-modal deep feature learning network for the RGB and Depth modalities based on Gaussian-Bernoulli restricted Boltzmann machines and uses this network to extract the cross-modal features of samples in the RGB-D video data; the cross-modal features are then input into the logistic regression classifier, whose output serves as the observation likelihood model of the Bayesian algorithm, while a reasonable state transition model is built; finally, target tracking in the RGB-D video data is realized through the Bayesian algorithm. The video target tracking method of the present invention is further illustrated below through a concrete experiment:
The video target tracking method of the present invention was implemented on the Windows 10 operating system with MATLAB R2014a as the software platform, on a computer configured with a 3.40 GHz CPU and a TITAN GPU. One hundred groups of public test video data captured with a Kinect under various conditions were used for testing; owing to space limitations, only the target tracking results of 3 groups of test videos are presented here. To assess the performance of the present invention, the following representative target tracking methods of the prior art were selected for experimental comparison: TLD Tracker, Struck Tracker, CT Tracker, VTD Tracker, MIL Tracker, RGB-D Tracker and DAE Tracker, where DAE Tracker is a video target tracking method that performs cross-modal feature learning with a deep denoising autoencoder (Denoising Auto-Encoder, DAE). Two evaluation criteria were used to assess the performance of each tracking algorithm: Average Center Position Error (ACPE) and Success Rate. The center position error is defined as the Euclidean distance between the center of the tracking box and the center of the real target, and the success rate is defined as score = area(B_object ∩ B_ground) / area(B_object ∪ B_ground), where B_object is the target tracking rectangle and B_ground is the real target box; a score greater than 0.5 indicates successful tracking. Fig. 5 to Fig. 7 show in order the average center position error comparison results on the RGB-D test videos face_occ2, express2_occ and bear_change. As shown in Fig. 5, Fig. 6 and Fig. 7, the video target tracking method of the present invention has a relatively low average center position error and relatively high accuracy. To illustrate the overall performance of the present invention on the 100 groups of public test data, the success rate is further selected as the evaluation criterion; the success rate comparison results of the algorithms are given in Table 1.
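For reference, the two evaluation criteria reduce to a few lines of numpy; boxes are assumed to be (x, y, width, height) tuples, which is an illustrative convention rather than the patent's stated format.

```python
import numpy as np

def center_position_error(box_track, box_gt):
    """Euclidean distance between tracking-box and ground-truth centers;
    ACPE is this value averaged over the frames of a sequence."""
    cx_t, cy_t = box_track[0] + box_track[2] / 2, box_track[1] + box_track[3] / 2
    cx_g, cy_g = box_gt[0] + box_gt[2] / 2, box_gt[1] + box_gt[3] / 2
    return np.hypot(cx_t - cx_g, cy_t - cy_g)

def success_score(box_track, box_gt):
    """area(B_object ∩ B_ground) / area(B_object ∪ B_ground);
    a frame counts as a tracking success when the score exceeds 0.5."""
    x1, y1 = max(box_track[0], box_gt[0]), max(box_track[1], box_gt[1])
    x2 = min(box_track[0] + box_track[2], box_gt[0] + box_gt[2])
    y2 = min(box_track[1] + box_track[3], box_gt[1] + box_gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_track[2] * box_track[3] + box_gt[2] * box_gt[3] - inter
    return inter / union

print(success_score((10, 10, 40, 60), (15, 12, 40, 60)) > 0.5)  # True
```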
The present invention uses the cross-modal deep Boltzmann machine feature learning network to extract the cross-modal features of samples in RGB-D video data; these features describe the complex correlations between the two modalities (RGB and Depth). The cross-modal features are input into the logistic regression classifier, the obtained output serves as the observation likelihood model of the Bayesian algorithm, and target tracking in the RGB-D video data is realized by the Bayesian algorithm. The experimental results show that the video target tracking method of the present invention can track a moving target stably even when targets occlude each other severely, rotate or change pose, with higher accuracy and stronger robustness.
The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent replacement or change made, within the technical scope disclosed by the present invention and according to the technical scheme and inventive concept of the present invention, by a person familiar with the technical field, shall be covered by the protection scope of the present invention.

Claims (5)

1. An RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning, characterized by comprising the following steps:
Step 1: Build a cross-modal deep feature learning network for the RGB and Depth modalities based on Gaussian-Bernoulli restricted Boltzmann machines;
Step 2: Collect a large number of unlabeled samples from random RGB-D video datasets; the unlabeled samples of the RGB-D video datasets comprise unlabeled RGB image samples and unlabeled Depth image samples;
Step 3: Use the unlabeled samples of the RGB-D video datasets to train the Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network without supervision, so that it extracts the cross-modal features of samples;
Step 4: For an RGB-D video sequence containing a target to be tracked, select positive and negative samples of the target in the initial m frames as the original templates of the target sample library; the original templates comprise the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames;
Step 5: Input the positive and negative RGB image samples and the positive and negative Depth image samples of the target in the initial m frames into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal features of the target samples;
Step 6: Feed the cross-modal features of the positive and negative target samples into a logistic regression classifier and train the classifier with supervision;
Step 7: Build a state transition model and, according to the constructed state transition model, sample around the target tracking result of the previous frame of the RGB-D video sequence to obtain a sample set; the sample set consists of multiple target tracking blocks;
Step 8: Input the samples into the trained Gaussian-Bernoulli restricted Boltzmann machine cross-modal deep feature learning network to obtain the cross-modal feature of each sample;
Step 9: Feed the cross-modal feature of each sample into the trained logistic regression classifier and take the output as the observation likelihood model;
Step 10: According to the formula X̂_t = argmax_{X_t^l} p(X_t^l | Z_t), obtain the target tracking result X̂_t of frame t of the RGB-D video sequence, where p(Z_t|X_t) is the observation likelihood model, p(X_t|X_{t−1}) is the state transition model between the consecutive frames t−1 and t, and the posterior probability satisfies p(X_t|Z_t) ∝ p(Z_t|X_t) ∫ p(X_t|X_{t−1}) p(X_{t−1}|Z_{t−1}) dX_{t−1}.
2. The target tracking method based on Gaussian-Bernoulli restricted Boltzmann machine cross-modal feature learning for RGB-D data according to claim 1, characterized in that
the cross-modal deep feature learning network for RGB-D data based on Gaussian-Bernoulli restricted Boltzmann machines comprises an RGB-modality feature deep learning network with Gaussian-Bernoulli restricted Boltzmann machines and a Depth-modality feature deep learning network with Gaussian-Bernoulli restricted Boltzmann machines, with a third hidden layer, namely the cross-modal feature representation hidden layer, added on top of the second hidden layers of the deep learning networks of the two modalities;
the input signal of the third hidden layer is the output signal h^(2RGB) of the second hidden layer of the RGB-modality feature deep learning network together with the output signal h^(2Depth) of the second hidden layer of the Depth-modality feature deep learning network, which yields the output signal h^(3) of the third hidden layer; the output signal h^(2RGB) of the second hidden layer of the RGB-modality feature deep learning network is the tracked-target sample feature under the RGB modality; the output signal h^(2Depth) of the second hidden layer of the Depth-modality feature deep learning network is the tracked-target sample feature under the Depth modality; and the output signal h^(3) of the third hidden layer is the cross-modal feature of the tracked-target sample.
3. The RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning according to claim 1, characterized in that the target sample library is updated during tracking of the target: the target tracking results obtained are used to replace the original templates in the target sample library.
4. The RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning according to claim 1, characterized in that the state transition model is p(X_t|X_{t−1}) = N(X_t; X_{t−1}, Ψ), where Ψ is a diagonal matrix whose elements are the variances of the target's affine motion parameters.
5. The RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning according to claim 1, characterized in that the observation likelihood model is p(Z_t|X_t^i) = 1/(1 + exp(−w^T z_i)), where w is the parameter of the logistic regression classifier to be optimized.
CN201610440598.5A 2016-06-17 2016-06-17 RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning Expired - Fee Related CN106127806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610440598.5A CN106127806B (en) 2016-06-17 2016-06-17 RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610440598.5A CN106127806B (en) 2016-06-17 2016-06-17 RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning

Publications (2)

Publication Number Publication Date
CN106127806A true CN106127806A (en) 2016-11-16
CN106127806B CN106127806B (en) 2018-10-02

Family

ID=57470434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610440598.5A Expired - Fee Related CN106127806B (en) 2016-06-17 2016-06-17 RGB-D video target tracking method based on deep Boltzmann machine cross-modal feature learning

Country Status (1)

Country Link
CN (1) CN106127806B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018130890A1 (en) * 2017-01-11 2018-07-19 International Business Machines Corporation Learning apparatus and method for bidirectional learning of predictive model based on data sequence
CN108829987A (en) * 2018-06-22 2018-11-16 中国核动力研究设计院 (Nuclear Power Institute of China) Data-driven probabilistic failure assessment method
CN109389621A (en) * 2018-09-11 2019-02-26 淮阴工学院 (Huaiyin Institute of Technology) RGB-D target tracking method based on multi-modal depth feature fusion
CN111091078A (en) * 2019-12-03 2020-05-01 北京华捷艾米科技有限公司 Object tracking method and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971137A (en) * 2014-05-07 2014-08-06 上海电力学院 (Shanghai University of Electric Power) Three-dimensional dynamic facial expression recognition method based on structured sparse feature learning
CN105302873A (en) * 2015-10-08 2016-02-03 北京航空航天大学 (Beihang University) Collaborative filtering optimization method based on conditional restricted Boltzmann machine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971137A (en) * 2014-05-07 2014-08-06 上海电力学院 (Shanghai University of Electric Power) Three-dimensional dynamic facial expression recognition method based on structured sparse feature learning
CN105302873A (en) * 2015-10-08 2016-02-03 北京航空航天大学 (Beihang University) Collaborative filtering optimization method based on conditional restricted Boltzmann machine

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MING-XIN JIANG ET AL.: "Visual Object Tracking Based on 2DPCA and ML", MATHEMATICAL PROBLEMS IN ENGINEERING *
NGIAM, J. ET AL.: "Multimodal Deep Learning", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML) *
SRIVASTAVA, N. ET AL.: "Multimodal Learning with Deep Boltzmann Machines", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE AND WORKSHOP ON NEURAL INFORMATION PROCESSING SYSTEMS (NIPS) *
姜明新 ET AL.: "基于颜色与深度信息特征融合的一种多目标跟踪算法" [A multi-object tracking algorithm based on feature fusion of color and depth information], 《光电子·激光》 [JOURNAL OF OPTOELECTRONICS·LASER] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018130890A1 (en) * 2017-01-11 2018-07-19 International Business Machines Corporation Learning apparatus and method for bidirectional learning of predictive model based on data sequence
CN108829987A (en) * 2018-06-22 2018-11-16 中国核动力研究设计院 (Nuclear Power Institute of China) Data-driven probabilistic failure assessment method
CN109389621A (en) * 2018-09-11 2019-02-26 淮阴工学院 (Huaiyin Institute of Technology) RGB-D target tracking method based on multi-modal depth feature fusion
CN109389621B (en) * 2018-09-11 2021-04-06 淮阴工学院 (Huaiyin Institute of Technology) RGB-D target tracking method based on multi-modal depth feature fusion
CN111091078A (en) * 2019-12-03 2020-05-01 北京华捷艾米科技有限公司 Object tracking method and related equipment
CN111091078B (en) * 2019-12-03 2023-10-24 北京华捷艾米科技有限公司 Object tracking method and related equipment

Also Published As

Publication number Publication date
CN106127806B (en) 2018-10-02

Similar Documents

Publication Publication Date Title
CN108229444B (en) Pedestrian re-identification method based on integral and local depth feature fusion
CN109858390B (en) Human skeleton behavior identification method based on end-to-end space-time diagram learning neural network
CN106127804B (en) The method for tracking target of RGB-D data cross-module formula feature learnings based on sparse depth denoising self-encoding encoder
Quattoni et al. Hidden-state conditional random fields
Gao et al. A segmentation-aware object detection model with occlusion handling
Liu et al. A survey on deep-learning approaches for vehicle trajectory prediction in autonomous driving
CN106203283A (en) Based on Three dimensional convolution deep neural network and the action identification method of deep video
US10964033B2 (en) Decoupled motion models for object tracking
CN105069413A (en) Human body gesture identification method based on depth convolution neural network
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN106127806A (en) A kind of RGB D method for tracking target based on degree of depth Boltzmann machine cross-module formula feature learning
CN102682452A (en) Human movement tracking method based on combination of production and discriminant
US11270425B2 (en) Coordinate estimation on n-spheres with spherical regression
WO2018058419A1 (en) Two-dimensional image based human body joint point positioning model construction method, and positioning method
CN104616324A (en) Target tracking method based on adaptive appearance model and point-set distance metric learning
CN108875586A (en) A kind of functional limb rehabilitation training detection method based on depth image Yu skeleton data multiple features fusion
CN114937066A (en) Point cloud registration system and method based on cross offset features and space consistency
CN111027586A (en) Target tracking method based on novel response map fusion
Iosifidis et al. Neural representation and learning for multi-view human action recognition
CN105469050A (en) Video behavior identification method based on local space-time characteristic description and pyramid vocabulary tree
CN105894008A (en) Target motion track method through combination of feature point matching and deep nerve network detection
Batool et al. Telemonitoring of daily activities based on multi-sensors data fusion
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN106056627A (en) Robustness object tracking method based on local identification sparse representation
Yu et al. Accurate and robust visual localization system in large-scale appearance-changing environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20161116

Assignee: Huai'an Zhi Shanghai Science and Technology Co., Ltd.

Assignor: Huaiyin Institute of Technology

Contract record no.: 2019320000031

Denomination of invention: Boltzmann machine cross-modal feature deep learning-based RGB-D video target tracking method

Granted publication date: 20181002

License type: Common License

Record date: 20190319

EE01 Entry into force of recordation of patent licensing contract
EC01 Cancellation of recordation of patent licensing contract

Assignee: Huai'an Zhi Shanghai Science and Technology Co., Ltd.

Assignor: Huaiyin Institute of Technology

Contract record no.: 2019320000031

Date of cancellation: 20190709

EC01 Cancellation of recordation of patent licensing contract
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20181002

Termination date: 20200617

CF01 Termination of patent right due to non-payment of annual fee