CN103237155B - Tracking and localization method for a target occluded in a single view - Google Patents

Tracking and localization method for a target occluded in a single view

Info

Publication number
CN103237155B
CN103237155B (application CN201310110181.9A)
Authority
CN
China
Prior art keywords
camera
target
formula
particle
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310110181.9A
Other languages
Chinese (zh)
Other versions
CN103237155A (en)
Inventor
胡永利
孙艳丰
马俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201310110181.9A priority Critical patent/CN103237155B/en
Publication of CN103237155A publication Critical patent/CN103237155A/en
Application granted granted Critical
Publication of CN103237155B publication Critical patent/CN103237155B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention discloses a tracking and localization method for a target occluded in a single view that accurately obtains the appearance of the target from images with the occlusion removed, comprising the steps of: (1) registering multiple RGB-D cameras; (2) building feature-model representations from the color and depth data acquired by the multiple RGB-D cameras; (3) tracking and localizing the target using the feature models of step (2) within a particle-filter framework.

Description

Tracking and localization method for a target occluded in a single view
Technical field
The invention belongs to the technical field of target tracking and localization, and more particularly relates to a tracking and localization method for a target occluded in a single view; a target occluded in a single view refers to a target that is occluded in one viewing angle while remaining visible in another.
Background technology
Human body tracking is an important research branch in the field of computer vision, widely applied in intelligent surveillance, video conferencing, human-computer interaction, and other fields. In the target tracking domain, the Kalman filter (KF) and the extended Kalman filter (EKF) are often used; the former suits linear systems, the latter nonlinear systems. However, when non-Gaussian noise appears in the system, the performance of both filtering methods degrades, and divergence may even occur. Because vision systems are inherently highly nonlinear and non-Gaussian, research on the theory and processing techniques of nonlinear, non-Gaussian filtering has great practical significance and has become an important trend of research and development in this field. The particle filter is a nonlinear, non-Gaussian filtering method that has risen in recent decades; it imposes no special assumptions on the system and performs well in target tracking. Particle filters still suffer from problems such as sample degeneracy, and occlusion remains difficult to handle; therefore, studying and designing particle-filter-based pedestrian tracking systems is of great significance.
To obtain stable tracking, a series of problems must be solved, including target detection and segmentation, feature representation, and dynamic tracking. For these problems, researchers have proposed many human body tracking methods. Traditional methods track using color information as the feature, such as color histogram features. These methods are mostly based on the video sequence of a single camera; since they cannot obtain the target's 3-D spatial information, they can hardly handle target occlusion or distinguish similar targets. Multiple video cameras can capture multi-view video sequences, providing more information for tracking; researchers have therefore proposed many tracking methods based on overlapping multiple cameras to handle occluded targets. Although multi-view methods can segment the target from different viewing angles, registering the same target across views and computing its spatial position remain highly difficult.
Recently, with the appearance of depth cameras such as TOF (Time-of-Flight) cameras and Microsoft's Kinect, color and depth information can be acquired jointly, helping to solve target occlusion and other problems. Researchers have recently attempted to use stereo and depth cameras to solve more complex problems; for example, stereo cameras have been used to reconstruct 3-D space for human body tracking, and a method fusing target appearance and depth features has been used for tracking and localization. Although tracking improves greatly with stereo cameras, occlusion is still not fully solved, particularly the case where the target is completely occluded.
Summary of the invention
The problem solved by the present invention is: to overcome the deficiencies of the prior art and provide a tracking and localization method for a target occluded in a single view that accurately obtains the appearance of the target from images with the occlusion removed.
The technical solution of the present invention is: the tracking and localization method for a target occluded in a single view comprises the following steps:
(1) registering multiple RGB-D cameras (cameras that simultaneously capture color images and depth information);
(2) building feature-model representations from the color and depth data acquired by the multiple RGB-D cameras;
(3) tracking and localizing the target using the feature models of step (2) within a particle-filter framework.
Because the target becomes occluded, a single camera can hardly sustain tracking through the occlusion. The method therefore employs multiple RGB-D cameras working cooperatively: when the target is occluded in one camera, it remains visible in another, so the target's position under the occluded camera's coordinate system can be computed from the target's spatial position obtained in the other camera, combined with the spatial transformation between the two cameras, thereby achieving persistent tracking while the target is occluded.
Detailed description of the invention
The tracking and localization method for a target occluded in a single view comprises the following steps:
(1) registering multiple RGB-D cameras;
(2) building feature-model representations from the color and depth data acquired by the multiple RGB-D cameras;
(3) tracking and localizing the target using the feature models of step (2) within a particle-filter framework.
Because the target becomes occluded, a single camera can hardly sustain tracking through the occlusion. The method therefore employs multiple RGB-D cameras working cooperatively: when the target is occluded in one camera, it remains visible in another, so the target's position under the occluded camera's coordinate system can be computed from the target's spatial position obtained in the other camera, combined with the spatial transformation between the two cameras, thereby achieving persistent tracking while the target is occluded.
Preferably, in step (1) the camera transformation model is formula (1):

$p_1 = R \cdot p_2 + T$ (1)

where $p_1 = [x_1, y_1, z_1]^T$ and $p_2 = [x_2, y_2, z_2]^T$ are the spatial coordinates of the same point in the first and second camera coordinate systems, respectively, R is a 3 × 3 rotation matrix, and $T = [x_0, y_0, z_0]^T$ is the translation vector. From the depth data collected by the RGB-D cameras, pairs of 3-D registration points are obtained, expressed as formula (2):

$P_1 = \{p_{1i} = [x_{1i}, y_{1i}, z_{1i}]^T \mid i = 1, \dots, N\}$ and $P_2 = \{p_{2i} = [x_{2i}, y_{2i}, z_{2i}]^T \mid i = 1, \dots, N\}$ (2)

where $p_{1i}$ and $p_{2i}$ are coordinate values in the world coordinate system;
the optimal transformation parameters $(R^*, T^*)$ are computed by formula (3):

$(R^*, T^*) = \arg\min_{R, T} \sum_{i=1}^{N} \left\| p_{1i} - (R \cdot p_{2i} + T) \right\|$ (3)
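The patent states only the objective of formula (3), not how the minimum is found. The following is a minimal sketch, assuming the corresponding 3-D point pairs of formula (2) are already collected, using the standard closed-form SVD (Kabsch) solution for the squared-error variant of formula (3); all function and variable names are illustrative:

```python
import numpy as np

def register_cameras(P1, P2):
    """Estimate (R*, T*) of formula (3): the rigid transform mapping
    camera-2 points onto camera-1 points, from N corresponding 3-D
    registration points P1, P2 of shape (N, 3)."""
    c1, c2 = P1.mean(axis=0), P2.mean(axis=0)   # centroids
    H = (P2 - c2).T @ (P1 - c1)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # rotation, det(R) = +1
    T = c1 - R @ c2                             # translation
    return R, T
```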
Preferably, in step (2) the color feature model converts the original image from RGB to HSV space and then computes, in HSV space, the color histogram of the rectangular region of the image containing the target; the color feature model is established by formula (4),

$H = [h_1, \dots, h_i, \dots, h_n]^T$ (4)

where n is the number of histogram bins and $h_i$ is the frequency with which the counted colors fall in the i-th bin;
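As a concrete illustration of formula (4), here is a minimal sketch using OpenCV; the patent does not specify which HSV channels or how many bins n to use, so the hue-only histogram and the bin count below are assumptions:

```python
import cv2
import numpy as np

def color_feature(image_bgr, rect, n_bins=16):
    """Formula (4): normalized color histogram H = [h_1, ..., h_n]^T
    of the rectangular target region, computed in HSV space."""
    x, y, w, h = rect
    roi = image_bgr[y:y + h, x:x + w]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [n_bins], [0, 180])  # hue channel
    return (hist / hist.sum()).ravel()          # h_i: frequency in bin i
```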
The depth data in the target region are binarized by formula (5), where 1 denotes the target body region and 0 denotes background or other objects,

$B(x, y) = \begin{cases} 1 & \text{if } \left\| Dp(x, y) - \overline{Dp} \right\| \le \epsilon \\ 0 & \text{otherwise} \end{cases}$ (5)

where $Dp(x, y)$ is the depth value at point $(x, y)$, $\overline{Dp}$ is the average depth value of the target, and ε is a threshold obtained experimentally.
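A minimal sketch of formula (5), taking the mean depth of the region as a stand-in for the target's average depth $\overline{Dp}$ (in practice it would be estimated from target pixels only); names are illustrative:

```python
import numpy as np

def depth_mask(depth_roi, eps):
    """Formula (5): binarize depth data in the target region.
    1 marks the target body region, 0 background or other objects."""
    mean_depth = depth_roi.mean()               # approximates mean target depth
    return (np.abs(depth_roi - mean_depth) <= eps).astype(np.uint8)
```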
Preferably, the target region is down-sampled before the depth feature is generated.
Preferably, step (3) comprises the following sub-steps:
(3.1) For an RGB-D video sequence, the state variable is represented by formula (6),

$S_t = \lambda_1 (S_{t-1} - S_0) + \lambda_2 (S_{t-2} - S_0) + G_t$ (6)

where $S_0$ denotes the initial state, $S_t$ is the predicted current state (the scale of the region), $S_{t-1}$ and $S_{t-2}$ are the states at the previous two time instants, $\lambda_1$ and $\lambda_2$ are predetermined weights, and $G_t$ is a zero-mean Gaussian random process vector;
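A minimal sketch of the propagation step of formula (6); the weight values and noise scale below are illustrative, since the patent says only that λ1 and λ2 are predetermined:

```python
import numpy as np

def propagate_state(S0, S_t1, S_t2, lam1=0.8, lam2=0.2, sigma=2.0):
    """Formula (6): S_t = lam1*(S_{t-1} - S0) + lam2*(S_{t-2} - S0) + G_t,
    with G_t a zero-mean Gaussian random vector."""
    G_t = np.random.normal(0.0, sigma, size=np.shape(S0))
    return lam1 * (S_t1 - S0) + lam2 * (S_t2 - S0) + G_t
```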
(3.2) Candidate particles are matched against template particles in the current frame, with the initial target region serving as the template; the optimal particle region is detected by similarity and taken as the target region. The similarities of the color and depth features to the template are first computed separately, and a fusion similarity is then defined. Color similarity is computed by formula (7),

$M_v(H_O, H_T) = e^{-\lambda (1 - B_d(H_O, H_T))}$ (7)

where $B_d(H_O, H_T)$ is the Bhattacharyya coefficient between the candidate-particle histogram $H_O$ and the template histogram $H_T$, and λ is a preset parameter adjusting the rate of change; it ensures $M_v(\cdot) \in [0, 1]$, with the maximum indicating that the candidate region is most similar to the template;
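A minimal sketch of formula (7), assuming (as read here) that $B_d$ is the Bhattacharyya coefficient of the two normalized histograms; the value of λ is illustrative:

```python
import numpy as np

def color_similarity(H_O, H_T, lam=20.0):
    """Formula (7): M_v(H_O, H_T) = exp(-lam * (1 - B_d(H_O, H_T)))."""
    B_d = np.sum(np.sqrt(H_O * H_T))            # Bhattacharyya coefficient in [0, 1]
    return np.exp(-lam * (1.0 - B_d))
```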
Depth similarity $M_d(D_O, D_T)$ is represented by formula (8), where $D_O$ denotes the candidate particle's binary depth mask, $D_T$ the template's, $\|\cdot\|_{l_0}$ denotes the $l_0$ norm, i.e. the number of non-zero values, ∧ is the bitwise AND operation, and ∼ is the bitwise NOT operation;
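Formula (8) itself is not reproduced in this text, so the sketch below is only a plausible reconstruction: a Jaccard-style overlap score built from the operators the patent names ($l_0$ norm, bitwise AND, bitwise NOT):

```python
import numpy as np

def depth_similarity(D_O, D_T):
    """Assumed form of formula (8): overlap of the candidate and template
    binary depth masks, penalized by their symmetric difference."""
    D_O, D_T = D_O.astype(bool), D_T.astype(bool)
    overlap  = np.count_nonzero(D_O & D_T)          # ||D_O AND D_T||_l0
    mismatch = (np.count_nonzero(D_O & ~D_T)        # ||D_O AND NOT D_T||_l0
                + np.count_nonzero(~D_O & D_T))     # ||NOT D_O AND D_T||_l0
    total = overlap + mismatch
    return overlap / float(total) if total else 0.0
```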
The fusion similarity of the color and depth features is computed by formula (9):

$M(HD_O, HD_T) = M_v(H_O, H_T) \cdot M_d(D_O, D_T)$ (9)

where $HD_O$ and $HD_T$ denote the candidate and template fusion features; the position of the target in a single RGB-D video sequence is thereby obtained;
Particle matching across the multiple RGB-D video sequences is obtained by formula (10):

$C(HD_{O1}, HD_{O2}) = \dfrac{M(HD_{O1}, HD_{T1}) + M(HD_{O2}, HD_{T2})}{\omega + \left\| P_1 - (R^* \cdot P_2 + T^*) \right\|}$ (10)

where $HD_{O1}, HD_{T1}$ and $HD_{O2}, HD_{T2}$ denote the fusion features of the candidate particles and templates of the first and second cameras, respectively, $P_1$ and $P_2$ are the 3-D coordinates of the centers of the observation particles $HD_{O1}$ and $HD_{O2}$ of the first and second cameras, $R^*, T^*$ is the transformation from the second camera to the first camera, and ω is a weight preventing the denominator from being zero. The optimal candidate target position is accordingly obtained by formula (11):

$(HD_{O1}^*, HD_{O2}^*) = \arg\max_{HD_{O1} \in Pt_1,\; HD_{O2} \in Pt_2} C(HD_{O1}, HD_{O2})$ (11)
where $Pt_1$ and $Pt_2$ denote the random particle sets of the particle filters in the first and second cameras, respectively.
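A minimal sketch of formulas (10) and (11), assuming per-particle fusion similarities and particle-center coordinates have already been computed; ω enters additively here, per its stated role of keeping the denominator non-zero, and its value and all names are illustrative:

```python
import numpy as np

def best_particle_pair(M1, M2, P1, P2, R, T, omega=1e-3):
    """Formulas (10)-(11): score all candidate-particle pairs from the two
    cameras and return the best pair's indices and score.
    M1, M2: fusion similarities of each particle to its template, shapes (n1,), (n2,).
    P1, P2: 3-D centre coordinates of each particle, shapes (n1, 3), (n2, 3)."""
    best_score, best_pair = -np.inf, None
    for i in range(len(M1)):
        for j in range(len(M2)):
            dist = np.linalg.norm(P1[i] - (R @ P2[j] + T))
            score = (M1[i] + M2[j]) / (omega + dist)   # formula (10)
            if score > best_score:                     # formula (11): argmax
                best_score, best_pair = score, (i, j)
    return best_pair, best_score
```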
(3.3) Persistent cooperative tracking is carried out by formula (12), yielding the three-dimensional position of the target:

$p^* = \eta_1 \cdot p_1^* + \eta_2 \cdot (R^* \cdot p_2^* + T^*)$ (12)

where $p_1^*$ and $p_2^*$ denote the coordinates of the centers of the optimal target particles in the first and second cameras, each in its own coordinate system; the combination coefficients $\eta_1$ and $\eta_2$ are defined by formula (13):

$\eta_1 = \dfrac{M(HD_{O1}^*, HD_{T1})}{M(HD_{O1}^*, HD_{T1}) + M(HD_{O2}^*, HD_{T2})}, \quad \eta_2 = \dfrac{M(HD_{O2}^*, HD_{T2})}{M(HD_{O1}^*, HD_{T1}) + M(HD_{O2}^*, HD_{T2})}$ (13)
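A minimal sketch of the final fusion of formulas (12) and (13), assuming the optimal particles' similarities and centres come from the previous step; names are illustrative:

```python
import numpy as np

def fused_position(p1, p2, M1, M2, R, T):
    """Formulas (12)-(13): weight each camera's optimal-particle centre by
    its template similarity and fuse in the first camera's coordinates."""
    eta1 = M1 / (M1 + M2)                       # formula (13)
    eta2 = M2 / (M1 + M2)
    return eta1 * p1 + eta2 * (R @ p2 + T)      # formula (12)
```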
The above are only preferred embodiments of the present invention and do not limit the present invention in any form; any simple modification, equivalent variation, or adaptation made to the above embodiments according to the technical spirit of the present invention still belongs to the protection scope of the technical solution of the present invention.

Claims (3)

1. A tracking and localization method for a target occluded in a single view, characterized in that it comprises the following steps:
(1) registering multiple RGB-D cameras to realize the transformation between different cameras;
The transformation model between different cameras is formula (1):

$p_1 = R \cdot p_2 + T$ (1)

where $p_1 = [x_1, y_1, z_1]^T$ and $p_2 = [x_2, y_2, z_2]^T$ are the spatial coordinates of the same point in the first and second camera coordinate systems, respectively, R is a 3 × 3 rotation matrix, and $T = [x_0, y_0, z_0]^T$ is the translation vector; from the depth data collected by the RGB-D cameras, pairs of 3-D registration points are obtained, expressed as formula (2):

$P_1 = \{p_{1i} = [x_{1i}, y_{1i}, z_{1i}]^T \mid i = 1, \dots, N\}$ and $P_2 = \{p_{2i} = [x_{2i}, y_{2i}, z_{2i}]^T \mid i = 1, \dots, N\}$ (2)

where $p_{1i}$ and $p_{2i}$ are coordinate values in the world coordinate system;

the optimal transformation parameters $(R^*, T^*)$ are computed by formula (3):

$(R^*, T^*) = \arg\min_{R, T} \sum_{i=1}^{N} \left\| p_{1i} - (R \cdot p_{2i} + T) \right\|$ (3)
(2) building feature-model representations from the color and depth data acquired by the multiple RGB-D cameras;
the color feature model converts the original image from RGB to HSV space and then computes, in HSV space, the color histogram of the rectangular region of the image containing the target; the color feature model is established by formula (4),

$H = [h_1, \dots, h_i, \dots, h_n]^T$ (4)

where n is the number of histogram bins and $h_i$ is the frequency with which the counted colors fall in the i-th bin;

the depth feature binarizes the depth data in the target region by formula (5), where 1 denotes the target body region and 0 denotes background or other objects,

$B(x, y) = \begin{cases} 1 & \text{if } \left\| Dp(x, y) - \overline{Dp} \right\| \le \epsilon \\ 0 & \text{otherwise} \end{cases}$ (5)

where $Dp(x, y)$ is the depth value at point $(x, y)$, $\overline{Dp}$ is the average depth value of the target region, and ε is a threshold obtained experimentally;
(3) tracking and localizing the target using the feature models of step (2) within a particle-filter framework.
2. The tracking and localization method for a target occluded in a single view according to claim 1, characterized in that the target region is down-sampled before the depth feature is generated.
3. The tracking and localization method for a target occluded in a single view according to claim 1 or 2, characterized in that step (3) comprises the following sub-steps:
(3.1) for an RGB-D video sequence, the state variable is represented by formula (6),

$S_t = \lambda_1 (S_{t-1} - S_0) + \lambda_2 (S_{t-2} - S_0) + G_t$ (6)

where $S_0$ denotes the initial state, $S_t$ is the predicted current state, $S_{t-1}$ and $S_{t-2}$ are the states at the previous two time instants, $\lambda_1$ and $\lambda_2$ are predetermined weights, and $G_t$ is a zero-mean Gaussian random process vector;
(3.2) candidate particles are matched against template particles in the current frame, with the initial target region serving as the template; the optimal particle region is detected by similarity and taken as the target region: the similarities of the color and depth features to the template are first computed separately, and a fusion similarity is then defined; color similarity is computed by formula (7),

$M_v(H_O, H_T) = e^{-\lambda (1 - B_d(H_O, H_T))}$ (7)

where $B_d(H_O, H_T)$ is the Bhattacharyya coefficient between the candidate-particle histogram $H_O$ and the template histogram $H_T$, and λ is a preset parameter adjusting the rate of change, ensuring $M_v(\cdot) \in [0, 1]$, with the maximum indicating that the candidate region is most similar to the template;
depth similarity $M_d(D_O, D_T)$ is represented by formula (8), where $D_O$ denotes the candidate particle's binary depth mask, $D_T$ the template's, $\|\cdot\|_{l_0}$ denotes the $l_0$ norm, i.e. the number of non-zero values, ∧ is the bitwise AND operation, and ∼ is the bitwise NOT operation;
the fusion similarity of the color and depth features is computed by formula (9):

$M(HD_O, HD_T) = M_v(H_O, H_T) \cdot M_d(D_O, D_T)$ (9)

where $HD_O$ and $HD_T$ denote the candidate and template fusion features; the position of the target in a single RGB-D video sequence is thereby obtained;
particle matching across the multiple RGB-D video sequences is obtained by formula (10):

$C(HD_{O1}, HD_{O2}) = \dfrac{M(HD_{O1}, HD_{T1}) + M(HD_{O2}, HD_{T2})}{\omega + \left\| P_1 - (R^* \cdot P_2 + T^*) \right\|}$ (10)

where $HD_{O1}, HD_{T1}$ and $HD_{O2}, HD_{T2}$ denote the fusion features of the candidate particles and templates of the first and second cameras, respectively, $P_1$ and $P_2$ are the 3-D coordinates of the centers of the observation particles $HD_{O1}$ and $HD_{O2}$ of the first and second cameras, $R^*, T^*$ is the transformation from the second camera to the first camera, and ω is a weight preventing the denominator from being zero; the optimal candidate target position is accordingly obtained by formula (11):

$(HD_{O1}^*, HD_{O2}^*) = \arg\max_{HD_{O1} \in Pt_1,\; HD_{O2} \in Pt_2} C(HD_{O1}, HD_{O2})$ (11)

where $Pt_1$ and $Pt_2$ denote the random particle sets of the particle filters in the first and second cameras, respectively;
(3.3) persistent cooperative tracking is carried out by formula (12), yielding the three-dimensional position of the target:

$p^* = \eta_1 \cdot p_1^* + \eta_2 \cdot (R^* \cdot p_2^* + T^*)$ (12)

where $p_1^*$ and $p_2^*$ denote the coordinates of the centers of the optimal target particles in the first and second cameras, each in its own coordinate system; the combination coefficients $\eta_1$ and $\eta_2$ are defined by formula (13):

$\eta_1 = \dfrac{M(HD_{O1}^*, HD_{T1})}{M(HD_{O1}^*, HD_{T1}) + M(HD_{O2}^*, HD_{T2})}, \quad \eta_2 = \dfrac{M(HD_{O2}^*, HD_{T2})}{M(HD_{O1}^*, HD_{T1}) + M(HD_{O2}^*, HD_{T2})}$ (13)
CN201310110181.9A 2013-04-01 2013-04-01 Tracking and localization method for a target occluded in a single view Active CN103237155B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310110181.9A CN103237155B (en) 2013-04-01 2013-04-01 Tracking and localization method for a target occluded in a single view

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310110181.9A CN103237155B (en) 2013-04-01 2013-04-01 Tracking and localization method for a target occluded in a single view

Publications (2)

Publication Number Publication Date
CN103237155A CN103237155A (en) 2013-08-07
CN103237155B true CN103237155B (en) 2016-12-28

Family

ID=48885169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310110181.9A Active CN103237155B (en) 2013-04-01 2013-04-01 Tracking and localization method for a target occluded in a single view

Country Status (1)

Country Link
CN (1) CN103237155B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794733B (en) * 2014-01-20 2018-05-08 株式会社理光 Method for tracing object and device
CN103994765B (en) * 2014-02-27 2017-01-11 北京工业大学 Positioning method of inertial sensor
CN104794737B (en) * 2015-04-10 2017-12-15 电子科技大学 A kind of depth information Auxiliary Particle Filter tracking
CN106097388B (en) * 2016-06-07 2018-12-18 大连理工大学 The method that target prodiction, searching scope adaptive adjustment and Dual Matching merge in video frequency object tracking
CN108182447B (en) * 2017-12-14 2020-04-21 南京航空航天大学 Adaptive particle filter target tracking method based on deep learning
CN108197571B (en) * 2018-01-02 2021-09-14 联想(北京)有限公司 Mask shielding detection method and electronic equipment
WO2019201355A1 (en) * 2018-04-17 2019-10-24 Shanghaitech University Light field system occlusion removal
CN111724419A (en) * 2019-03-19 2020-09-29 长春工业大学 TOF camera depth data spatial registration algorithm research of improved extreme learning machine
CN110738685B (en) * 2019-09-09 2023-05-05 桂林理工大学 Space-time context tracking method integrating color histogram response

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101142593A (en) * 2005-03-17 2008-03-12 英国电讯有限公司 Method of tracking objects in a video sequence
CN101320477A (en) * 2008-07-10 2008-12-10 北京中星微电子有限公司 Human body tracing method and equipment thereof
CN102800103A (en) * 2012-06-18 2012-11-28 清华大学 Unmarked motion capturing method and device based on multi-visual angle depth camera

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101142593A (en) * 2005-03-17 2008-03-12 英国电讯有限公司 Method of tracking objects in a video sequence
CN101320477A (en) * 2008-07-10 2008-12-10 北京中星微电子有限公司 Human body tracing method and equipment thereof
CN102800103A (en) * 2012-06-18 2012-11-28 清华大学 Unmarked motion capturing method and device based on multi-visual angle depth camera

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xin Yanzhe, "Research on Multi-View Tracking Technology Based on Feature Space" (《基于特征空间的多视角跟踪技术研究》), China Master's Theses Full-text Database, Information Science and Technology, 2013-03-30, pp. I138-1050 *

Also Published As

Publication number Publication date
CN103237155A (en) 2013-08-07

Similar Documents

Publication Publication Date Title
CN103237155B (en) Tracking and localization method for a target occluded in a single view
CN103824070B (en) Rapid pedestrian detection method based on computer vision
CN104715493B (en) Method for estimating the pose of a moving human body
CN107093205B (en) Three-dimensional building window detection and reconstruction method based on unmanned aerial vehicle imagery
CN102243765A (en) Multi-camera-based multi-target positioning and tracking method and system
CN105069751B (en) Interpolation method for missing data in depth images
CN107909604A (en) Dynamic object motion trajectory recognition method based on binocular vision
CN106503170B (en) Image library construction method based on occlusion dimension
CN110751730B (en) Dressed human body shape estimation method based on a deep neural network
CN104484890A (en) Video target tracking method based on a compound sparse model
CN104517317A (en) Three-dimensional reconstruction method for vehicle-borne infrared images
CN104102904A (en) Static gesture recognition method
CN105741326B (en) Target tracking method for video sequences based on cluster fusion
CN103095996A (en) Multi-sensor video fusion method based on space-time saliency detection
CN102968615B (en) Three-dimensional human body data identification method with anti-interference capability in dense people flow
Rasouli et al. Visual saliency improves autonomous visual search
CN103914871B (en) Method for interactively selecting coordinate points on an object surface based on point cloud data
CN103530601A (en) Method for inferring crowd state in surveillance blind areas based on a Bayesian network
CN105118071B (en) Video tracking method based on adaptive block partitioning
Jung et al. Real-time estimation of 3D scene geometry from a single image
CN104240268B (en) Pedestrian tracking method based on manifold learning and sparse representation
Zhou et al. Moving human path tracking based on video surveillance in 3D indoor scenarios
Neverova et al. 2 1/2 D scene reconstruction of indoor scenes from single RGB-D images
Li et al. Feature-based SLAM for dense mapping
Abdullah et al. LiDAR segmentation using suitable seed points for 3D building extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant