CN106709997A

CN106709997A - Three-dimensional key point detection method based on deep neural network and sparse auto-encoder

Info

Publication number: CN106709997A
Application number: CN201610279232.4A
Authority: CN
Inventors: 朱策; 林薪雨; 张倩; 王征韬; 刘翼鹏; 夏志强; 虢齐
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2017-05-24
Anticipated expiration: 2036-04-29
Also published as: CN106709997B

Abstract

The invention belongs to the technical field of three-dimensional computer vision, and specifically relates to a three-dimensional key point detection method based on a deep neural network and a sparse auto-encoder. The method comprises a training phase, in which the sparse auto-encoders and the deep neural network are trained, and a detection phase, in which the trained deep neural network is used as a regression model to detect three-dimensional key points. Local and global information of the three-dimensional mesh model in a multi-scale space is fully exploited to decide whether a candidate point is a key point. Introducing multi-layer sparse auto-encoders allows the correlation between the local and global information to be discovered and a high-level feature representation of that information to be formed for regression. As a result, key points in a three-dimensional mesh model can be detected effectively, robustly and stably.

Description

Three-dimensional key point detection method based on a deep neural network and a sparse auto-encoder

Technical field

The invention belongs to the technical field of 3D computer vision, and in particular relates to a three-dimensional key point detection method based on a deep neural network and a sparse auto-encoder.

Background art

Three-dimensional key point detection is an important topic in 3D computer vision and is widely used in applications such as object registration and matching, 3D shape retrieval, and mesh segmentation and simplification. Over the past few decades researchers have proposed many methods for detecting 3D key points, most of them geometry-based. Godil and Wagan extended the two-dimensional Scale-Invariant Feature Transform (SIFT) algorithm and proposed the 3D-SIFT key point detection algorithm. Holte used the Difference-of-Normals (DoN) operator for 3D key point detection. Castellani proposed a robust 3D key point detection algorithm based on the visual saliency principle of 3D mesh models. In addition, some algorithms use Laplacian spectral methods and perform 3D key point detection in the Laplacian spectral domain rather than in the spatial domain.

Geometry-based 3D key point detection methods lack sufficient flexibility and therefore struggle to meet the needs of many applications. Such methods usually define 3D key points as points on the mesh surface where the surface varies sharply in all directions; in some scenarios, however, these points may be noise or unimportant small details of the mesh model. Moreover, when the semantic information of the mesh model has to be taken into account, geometry-based methods can hardly handle the problem. For these reasons, a growing number of researchers have turned to finding a new framework for 3D key point detection.

In recent years, some researchers have proposed using machine learning for 3D key point detection, which can to some extent overcome the shortcomings of geometry-based methods. Teran and Mordohai use a random forest as a classifier for 3D key point detection (Teran, L., Mordohai, P.: 3d interest point detection via discriminative learning. In: Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland (Sept 2014)). In that method, several geometry-based 3D key point detection methods are used to produce the feature attributes of the training and test samples. Creusot uses both linear discriminant analysis (LDA) and AdaBoost to detect 3D key points on 3D face models. Salti and Tombari cast 3D key point detection as a binary classification problem, where the classification criterion is whether a point can be correctly matched using a predefined 3D descriptor (Salti, S., Tombari, F., Spezialetti, R., Stefano, L.D.: Learning a descriptor-specific 3d keypoint detector. In: Computer Vision (ICCV), 2015 IEEE International Conference on, Dec 2015, 2318-2326).

However, most of the above algorithms use only local information to produce feature attributes and lack global information such as the Laplacian spectrum.

Summary of the invention

In view of the above problems and shortcomings, and in order to detect key points in 3D mesh models more effectively, robustly and stably, the invention provides a 3D key point detection method based on a deep neural network and sparse auto-encoders.

The specific technical scheme is as follows:

Step 1: Select a training set and a test set from a 3D mesh model database, and choose positive and negative sample points from the training set:

The selected training set and test set do not overlap, and every 3D mesh model has 3D key points generated by manual annotation. For each 3D mesh model in the training set, all manually annotated 3D key points are chosen as positive sample points, and the remaining points are negative sample points.

Step 2: Form feature attributes for the positive and negative sample points and build the feature attribute set:

The local and global information of the 3D mesh model in a multi-scale space is used to form the feature attributes of the sample points.

The local information comprises three parts: 1) the Euclidean distances f_d from the neighboring points of the point under test to the tangent plane at that point; 2) the angles f_θ between the normal vector of the point under test and the normal vectors of its neighboring points; 3) four surface curvatures f_c: the maximum principal curvature, the minimum principal curvature, the Gaussian curvature and the mean curvature. The global information is the Laplacian spectral information f_ls.

For any point v in the 3D mesh model M(x, y, z), let f be its feature attribute; then:

f=[f₀,f₁,f₂,...,f_Ω]^T (1)

f_i=[f_d,f_θ,f_c,f_ls], i=0,1,2,...,Ω (2)

where f_i, i=0,1,2,...,Ω, denotes the feature attribute information of the 3D mesh model M(x, y, z) at scale i; each f_i contains the three kinds of local information f_d, f_θ, f_c and the global information f_ls. The evolution of the 3D mesh model M(x, y, z) in scale space is expressed as:

M_δ(x, y, z) = M(x, y, z) * G(x, y, z, δ) (3)

G(x, y, z, δ) = \frac{1}{(\sqrt{2π} δ)^{3}} e^{-\frac{x^{2} + y^{2} + z^{2}}{2 δ^{2}}} (4)

where δ ∈ {0, ε, 2ε, ..., Ωε} is the standard deviation of the three-dimensional Gaussian filter, ε is 0.3% of the main diagonal length of the smallest cube that completely encloses the 3D mesh model, δ=0 means that the evolved model is the original 3D mesh model M(x, y, z), and * is the convolution operator.
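The scale values δ described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `scale_sigmas` is hypothetical, and the enclosing cube is assumed to be axis-aligned, which the text does not state.

```python
import numpy as np

def scale_sigmas(vertices, omega=6, ratio=0.003):
    """Return [0, eps, 2*eps, ..., omega*eps] for an (N, 3) vertex array.

    Assumption: the "smallest enclosing cube" is taken to be axis-aligned,
    so its side equals the largest coordinate extent of the mesh.
    """
    vertices = np.asarray(vertices, dtype=float)
    extent = vertices.max(axis=0) - vertices.min(axis=0)
    side = extent.max()                 # side of the assumed enclosing cube
    diagonal = np.sqrt(3.0) * side      # main diagonal of a cube with that side
    eps = ratio * diagonal              # 0.3% of the diagonal length
    return eps * np.arange(omega + 1)   # delta = 0 is the original mesh
```

With Ω = 6 this yields seven scales, the first of which (δ = 0) corresponds to the unsmoothed model.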

A 3D mesh model consists of a set of points and their connectivity. As shown in Fig. 3, for any point v in the 3D mesh model, let V_k(v), k=1,2,3,4,5, be the k-ring neighborhood points around it, and let n be its normal vector. Let v_kj be the j-th point in V_k(v) and n_kj its normal vector. The Euclidean distance d_kj from point v_kj to the tangent plane at point v is:

d_{kj} = \frac{|n^{T} [x_{kj}, y_{kj}, z_{kj}]^{T} - n^{T} [x_{v}, y_{v}, z_{v}]^{T}|}{\|n\|_{2}} (5)

The angle between the normal vectors of point v_kj and point v is:

θ_{kj} = \min(\arccos(\frac{n^{T} n_{kj}}{\|n\|_{2} \|n_{kj}\|_{2}})) (6)

where (x_v, y_v, z_v) are the coordinates of point v and (x_kj, y_kj, z_kj) are the coordinates of point v_kj. Let d_k = [d_{k1}, d_{k2}, ..., d_{kN_k}] and θ_k = [θ_{k1}, θ_{k2}, ..., θ_{kN_k}], where N_k is the number of points in V_k(v). Then f_d and f_θ are:

f_d=[max (d_k),min(d_k),max(d_k)-min(d_k),mean(d_k),var(d_k),harmmean(d_k)] (7)

f_θ=[max (θ_k),min(θ_k),max(θ_k)-min(θ_k),mean(θ_k),var(θ_k),harmmean(θ_k)] (8)

where mean(·), var(·) and harmmean(·) denote the arithmetic mean, the variance and the harmonic mean, respectively. f_c consists of four curvatures:

f_{c} = [c_{1}, c_{2}, \frac{c_{1} + c_{2}}{2}, c_{1} c_{2}] (9)

where c₁ is the minimum principal curvature, c₂ is the maximum principal curvature, (c₁+c₂)/2 is the mean curvature, and c₁c₂ is the Gaussian curvature.
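The local descriptors can be illustrated with a short sketch. All function names below are hypothetical. Two points are assumptions rather than statements of the text: the variance is taken as the population variance, and zero entries are clamped to a tiny positive value before the harmonic mean is taken. The angle function computes only the arccos term of Equation (6).

```python
import numpy as np

def harmonic_mean(x, tiny=1e-12):
    # Assumption: clamp to a tiny positive value so zero distances do not divide by zero.
    x = np.maximum(np.asarray(x, dtype=float), tiny)
    return x.size / np.sum(1.0 / x)

def six_stats(x):
    """[max, min, max - min, mean, var, harmonic mean], as in Eqs. (7) and (8)."""
    x = np.asarray(x, dtype=float)
    return np.array([x.max(), x.min(), x.max() - x.min(),
                     x.mean(), x.var(), harmonic_mean(x)])

def plane_distances(v, n, ring_points):
    """Eq. (5): distance from each ring point to the tangent plane at v."""
    n = np.asarray(n, dtype=float)
    offsets = np.asarray(ring_points, dtype=float) - np.asarray(v, dtype=float)
    return np.abs(offsets @ n) / np.linalg.norm(n)

def normal_angles(n, ring_normals):
    """Angle between n and each ring normal (the arccos term of Eq. (6))."""
    n = np.asarray(n, dtype=float)
    rn = np.asarray(ring_normals, dtype=float)
    cos = (rn @ n) / (np.linalg.norm(rn, axis=1) * np.linalg.norm(n))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding error

def curvature_vector(c1, c2):
    """Eq. (9): [min principal, max principal, mean, Gaussian] curvature."""
    return np.array([c1, c2, 0.5 * (c1 + c2), c1 * c2])
```

For one ring k, `six_stats(plane_distances(v, n, ring_points))` gives the six distance statistics and `six_stats(normal_angles(n, ring_normals))` the six angle statistics.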

The Laplacian matrix of a 3D mesh model is a symmetric matrix and can be decomposed as:

L=B Λ B^T (10)

where Λ=Diag{λ_f, 1≤f≤Ψ} is a diagonal matrix whose entries are arranged in ascending order, and λ_f are the eigenvalues of the Laplacian matrix of the 3D mesh model. The column vectors of the orthogonal matrix B are the corresponding eigenvectors, and Ψ is the total number of points in the 3D mesh model. The Laplacian spectrum is defined as:

H (f)={ λ_f,1≤f≤Ψ} (11)

The global information is obtained from the log-Laplacian spectrum. The log-Laplacian spectrum L(f) is defined as:

L (f)=log (H (f)) (12)

The irregularity R(f) of the spectrum is used to obtain the mesh saliency:

R (f)=| L (f)-J_Γ(f)*L(f)| (13)

where J_Γ(f) is a 1 × Γ vector. The irregularity of the spectrum is then transformed from the spectral domain to the spatial domain by Equation (14), in which R₁=Diag{exp(R(f)): 1≤f≤Ψ} is a diagonal matrix, the element-wise operator is the Hadamard product, and W is a weight matrix with

W(i, j) = \frac{1}{\|v_{i} - v_{j}\|^{2}} A(i, j) (15)

Let s be an element of S; then:

f_ls=[max (s_k),min(s_k),max(s_k)-min(s_k),mean(s_k),var(s_k),harmmean(s_k)] (17)
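The spectral part of the computation, Equations (10) to (13), can be sketched as below. Several choices here are assumptions, not statements of the text: the Laplacian is taken to be the combinatorial graph Laplacian D - A, J_Γ is taken to be a uniform averaging kernel with entries 1/Γ, and a small constant is added before the logarithm because the smallest Laplacian eigenvalue is zero. The function names are hypothetical, and the sketch stops at R(f) without attempting the transform back to the spatial domain.

```python
import numpy as np

def graph_laplacian(n_vertices, edges):
    """Combinatorial Laplacian D - A of the mesh edge graph (an assumed choice)."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_irregularity(L, gamma=9, tiny=1e-8):
    """Return eigenvalues, eigenvectors and the irregularity R(f) of Eq. (13)."""
    lam, B = np.linalg.eigh(L)                # ascending eigenvalues, L = B diag(lam) B^T
    if gamma > lam.size:
        raise ValueError("gamma must not exceed the number of vertices")
    log_spec = np.log(np.abs(lam) + tiny)     # Eq. (12); tiny avoids log(0)
    kernel = np.full(gamma, 1.0 / gamma)      # ASSUMED uniform 1 x gamma averaging kernel
    smooth = np.convolve(log_spec, kernel, mode="same")
    return lam, B, np.abs(log_spec - smooth)  # Eq. (13)
```

For a mesh with vertex count Ψ, `spectral_irregularity(graph_laplacian(Ψ, edges))` returns a length-Ψ irregularity vector. A dense eigendecomposition is used for clarity; large meshes would normally need a sparse solver.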

Step 3: Train the sparse auto-encoders and the deep neural network using the feature attribute set built in step 2 and the corresponding label set:

A sparse auto-encoder is a variant of the auto-encoder, obtained by adding a sparsity constraint to the hidden layer of the auto-encoder. Fig. 2(a) shows the basic structure of an auto-encoder.

Three sparse auto-encoders are trained first. The encoder parts of these three sparse auto-encoders are then extracted and cascaded to form a deep sparse auto-encoder, and a logistic regression layer is trained to process the features output by the deep sparse auto-encoder.

The deep neural network regression model is formed by cascading the deep sparse auto-encoder with the above logistic regression layer. Fig. 2(b) shows the basic structure of the deep neural network regression model in the method of the invention.

Finally, the back-propagation algorithm is applied to the deep neural network regression model to fine-tune it.
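The cascade of three encoders and a logistic regression layer can be pictured with the forward pass below. This is only a structural sketch with untrained random weights. The hidden layer widths (200, 100, 50) and the sigmoid activation are hypothetical choices, since the text does not give them here; the input width 665 is the sample dimension stated in the embodiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (placeholder for trained parameters)."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

def predict(x, encoders, logistic):
    """Pass features through the cascaded encoders, then the logistic layer."""
    h = np.asarray(x, dtype=float)
    for W, b in encoders:              # encoder parts of the three sparse auto-encoders
        h = sigmoid(h @ W + b)
    W_out, b_out = logistic
    return sigmoid(h @ W_out + b_out).ravel()   # one regression value in (0, 1) per sample

rng = np.random.default_rng(0)
sizes = [665, 200, 100, 50]            # 665 from the text; hidden widths are hypothetical
encoders = [init_layer(rng, a, b) for a, b in zip(sizes[:-1], sizes[1:])]
logistic = init_layer(rng, sizes[-1], 1)
scores = predict(rng.normal(size=(4, 665)), encoders, logistic)
```

In the method described above, each encoder would first be trained as part of its own sparse auto-encoder, and the whole stack would then be fine-tuned with back-propagation; neither training step is shown here.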

Step 4: Use the deep neural network regression model obtained in step 3 to predict on a 3D mesh model and obtain the corresponding saliency response map:

Feature attributes are formed for each point of a 3D mesh model in the test set using the same method as in step 2, and the deep neural network regression model obtained in step 3 is used to predict on that point, yielding a regression value. The regression values of all points in the 3D mesh model are obtained in this way and together constitute the saliency response map of the model.

Step 5: Obtain the 3D key points from the saliency response map obtained in step 4:

Points that are local maxima of the saliency response map are chosen as 3D key points. For each point in the 3D mesh model, if its saliency response is greater than the saliency responses of all points in its surrounding 5-ring neighborhood, the point is a 3D key point; otherwise it is not.
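The local-maximum selection of step 5 can be sketched with a breadth-first search over the mesh edge graph. The function names and the adjacency-list representation are hypothetical. The 5-ring neighborhood is assumed here to mean every vertex reachable within five edge hops, which is one reading of the text.

```python
from collections import deque

def k_ring(adjacency, start, k=5):
    """Vertices within k edge hops of `start`, excluding `start` itself."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if depth[u] == k:              # do not expand beyond the k-th ring
            continue
        for w in adjacency[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    del depth[start]
    return set(depth)

def detect_keypoints(adjacency, saliency, k=5):
    """Indices whose saliency strictly exceeds every value in their k-ring neighborhood."""
    return [v for v in range(len(saliency))
            if all(saliency[v] > saliency[u] for u in k_ring(adjacency, v, k))]
```

Here `adjacency[v]` lists the vertices that share an edge with `v`, and `saliency[v]` is the regression value predicted for `v`. The strict inequality follows the wording of the text, so two equal neighboring maxima would both be rejected.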

The invention: 1. uses a deep neural network combined with sparse auto-encoders as a regression model for 3D key point detection; 2. forms feature attributes from the local and global information of the 3D mesh model in a multi-scale space, so that as much information as possible is used to detect 3D key points; 3. introduces multi-layer sparse auto-encoders, which can effectively discover the correlation between the local and global information and form a high-level feature representation of that information for regression. Key points in a 3D mesh model can thereby be detected effectively, robustly and stably.

In summary, compared with existing 3D key point detection methods, the method of the invention can detect key points in 3D mesh models effectively, robustly and stably.

Brief description of the drawings

Fig. 1 is a flow chart of the 3D key point detection method of the invention;

Fig. 2(a) is a structural diagram of an auto-encoder, and Fig. 2(b) shows the deep neural network regression model of the invention;

Fig. 3 shows an airplane 3D mesh model and a partially enlarged view of it;

Fig. 4 shows the 3D key points detected on a chair 3D mesh model using the method of the invention;

Fig. 5 shows the key points detected by the invention in different frames of a 3D video sequence;

Fig. 6 compares the performance of the invention with five other 3D key point detection methods; panel (a) shows the performance curves under the IOU evaluation criterion on the database A test set, and panel (b) shows the performance curves under the IOU evaluation criterion on the database B test set.

Reference signs: point under test v; 1-ring neighborhood points V₁; 2-ring neighborhood points V₂; 3-ring neighborhood points V₃; 4-ring neighborhood points V₄; 5-ring neighborhood points V₅.

Detailed description of the embodiments

The method of the invention is described in further detail below with reference to the drawings and a specific example. The purpose of the example is to verify the effectiveness of the method through key point detection results on 3D mesh models.

In the implementation, the 3D mesh model database from (Dutagaci, H., Cheung, C.P., Godil, A.: Evaluation of 3d interest point detection techniques via human-generated ground truth. The Visual Computer 28(9) (2012) 901-917) is used as the training and test data set.

Specific embodiment of the deep neural network training phase:

The 3D mesh model database is divided into two parts, database A and database B. Database A contains 24 3D mesh models, annotated by 23 people to generate the 3D key point ground truth. Database B contains 43 3D mesh models, annotated by 16 people to generate the ground truth. Two thirds of database A and two thirds of database B are selected as training sets, and the rest are used as test sets.

For each 3D mesh model in the database A training set, the ground truth corresponding to parameter σ ∈ {0.01, 0.02, ..., 0.1} and parameter n ∈ {11, 12, ..., 22} is selected as positive samples, and the remaining points are negative samples. The total number of positive samples is 17115 and the total number of negative samples is 148565. For each 3D mesh model in the database B training set, the ground truth corresponding to parameter σ ∈ {0.01, 0.02, ..., 0.1} and parameter n ∈ {8, 9, ..., 15} is selected as positive samples, and the remaining points are negative samples. The total number of positive samples is 18427 and the total number of negative samples is 222034.

Step 2: Form feature attributes for the positive and negative sample points:

For each 3D mesh model in the training set, the representation of the model in scale space is computed first, and then the information of each sample point in the model is computed at scales i=0,1,2,...,Ω; in the present invention Ω is 6. The local information f_d, f_θ and f_c can be computed by the following formulas, respectively:

f_d=[max (d_k),min(d_k),max(d_k)-min(d_k),mean(d_k),var(d_k),harmmean(d_k)]

f_θ=[max (θ_k),min(θ_k),max(θ_k)-min(θ_k),mean(θ_k),var(θ_k),harmmean(θ_k)]

f_c=[c₁,c₂,(c₁+c₂)/2,c₁c₂]

The global information f_ls is obtained by computing the irregularity of the log-Laplacian spectrum in the spectral domain:

R (f)=| L (f)-J_Γ(f)*L(f)|

where Γ in J_Γ(f) is 9. The global information f_ls is:

f_ls=[max (s_k),min(s_k),max(s_k)-min(s_k),mean(s_k),var(s_k),harmmean(s_k)]

Finally, the feature attribute of each sample point in the 3D mesh model is:

f=[f₀,f₁,f₂,...,f₆]^T

f_i=[f_d,f_θ,f_c,f_ls], i=0,1,2,...,6

The dimension of each sample is 665.

Step 3: Train the sparse auto-encoders and the deep neural network using the constructed feature attribute set and the corresponding label set:

The sparse auto-encoders and the deep neural network are trained using the feature attribute set and the corresponding label set obtained in steps 1 and 2; the relevant parameter settings are shown in the following table:

where ρ is the sparsity parameter of the sparse auto-encoder, and β controls the weight of the sparsity penalty term in the cost function of the sparse auto-encoder.
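The role of ρ and β can be illustrated with the Kullback-Leibler sparsity penalty commonly used for sparse auto-encoders. The text does not state which penalty the invention uses, so the KL form is an assumption, as are the function name and the default values of `rho` and `beta` (the parameter table is not reproduced here).

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, beta=3.0, tiny=1e-8):
    """beta * sum_j KL(rho || rho_hat_j) for hidden activations in (0, 1).

    `hidden` has shape (n_samples, n_hidden); rho_hat_j is the mean
    activation of hidden unit j over the batch. The KL form and the
    default rho and beta are assumptions, not values from the source.
    """
    rho_hat = np.clip(np.mean(hidden, axis=0), tiny, 1.0 - tiny)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return beta * np.sum(kl)
```

Under this form the penalty vanishes when every hidden unit has mean activation ρ and grows as the mean activations move away from ρ, with β scaling its contribution to the total cost.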

3D key point detection phase:

Step 4: Use the trained deep neural network regression model to predict on a 3D mesh model and obtain its saliency response map:

The procedure is the same for each 3D mesh model in the test set; the chair 3D mesh model is taken as an example. Its representation in scale space is first obtained according to the method in step 2, and feature attributes are then computed for each point. The trained deep neural network regression model is used to predict on each point of the chair 3D mesh model. For each point, the output of the deep neural network is a regression value between 0 and 1; the closer the output is to 1, the more likely the point is a 3D key point, and vice versa. All these output values together constitute the saliency response map of the chair 3D mesh model.

Step 5: Obtain the 3D key points from the saliency response map:

Points that are local maxima of the saliency response map of the chair 3D mesh model are chosen as its 3D key points. For each point in the chair 3D mesh model, if its saliency response is greater than the saliency responses of all points in its surrounding 5-ring neighborhood, the point is a 3D key point; otherwise it is not. Fig. 4 shows the 3D key points of the chair 3D mesh model detected using the method of the invention.

Fig. 5 shows the distribution of 3D key points detected by the method of the invention in different frames of a pedestrian 3D test sequence.

Fig. 6 compares the performance of the method of the invention with five other 3D key point detection methods. The evaluation criterion is the IOU criterion (L. Teran and P. Mordohai, "3d interest point detection via discriminative learning," in European Conference on Computer Vision. Zurich, Switzerland, Sept 2014). Fig. 6(a) shows the performance curves of the six algorithms on the database A test set, and Fig. 6(b) shows the performance curves of the six algorithms on the database B test set.

Claims

1. A three-dimensional key point detection method based on a deep neural network and a sparse auto-encoder, comprising the following steps:

Step 1: Select a training set and a test set from a 3D mesh model database, and choose positive and negative sample points from the training set:

The selected training set and test set do not overlap, and every 3D mesh model has 3D key points generated by manual annotation; for each 3D mesh model in the training set, all manually annotated 3D key points are chosen as positive sample points, and the remaining points are negative sample points;

Step 2: Form feature attributes for the positive and negative sample points and build the feature attribute set:

The local and global information of the 3D mesh model in a multi-scale space is used to form the feature attributes of the sample points;

The local information comprises three parts: 1) the Euclidean distances f_d from the neighboring points of the point under test to the tangent plane at that point; 2) the angles f_θ between the normal vector of the point under test and the normal vectors of its neighboring points; 3) four surface curvatures f_c: the maximum principal curvature, the minimum principal curvature, the Gaussian curvature and the mean curvature; the global information is the Laplacian spectral information f_ls;

f=[f₀,f₁,f₂,...,f_Ω]^T (1)

f_i=[f_d,f_θ,f_c,f_ls], i=0,1,2,...,Ω (2)

where f_i, i=0,1,2,...,Ω, denotes the feature attribute information of the 3D mesh model M(x, y, z) at scale i, and each f_i contains f_d, f_θ, f_c and f_ls; the evolution of the 3D mesh model M(x, y, z) in scale space is expressed as:

M_δ(x, y, z)=M (x, y, z) * G (x, y, z, δ) (3)

G(x, y, z, δ) = \frac{1}{(\sqrt{2π} δ)^{3}} e^{-\frac{x^{2} + y^{2} + z^{2}}{2 δ^{2}}} (4)

where δ ∈ {0, ε, 2ε, ..., Ωε} is the standard deviation of the three-dimensional Gaussian filter, ε is 0.3% of the main diagonal length of the smallest cube that completely encloses the 3D mesh model, δ=0 means that the evolved model is the original 3D mesh model M(x, y, z), and * is the convolution operator;

A 3D mesh model consists of a set of points and their connectivity; for any point v in the 3D mesh model, let V_k(v), k=1,2,3,4,5, be the k-ring neighborhood points around it, and let n be its normal vector; let v_kj be the j-th point in V_k(v) and n_kj its normal vector; then the Euclidean distance d_kj from point v_kj to the tangent plane at point v is:

d_{kj} = \frac{|n^{T} [x_{kj}, y_{kj}, z_{kj}]^{T} - n^{T} [x_{v}, y_{v}, z_{v}]^{T}|}{\|n\|_{2}} (5)

The angle between the normal vectors of point v_kj and point v is:

θ_{kj} = \min(\arccos(\frac{n^{T} n_{kj}}{\|n\|_{2} \|n_{kj}\|_{2}})) (6)

where (x_v, y_v, z_v) are the coordinates of point v and (x_kj, y_kj, z_kj) are the coordinates of point v_kj; let d_k = [d_{k1}, d_{k2}, ..., d_{kN_k}] and θ_k = [θ_{k1}, θ_{k2}, ..., θ_{kN_k}], where N_k is the number of points in V_k(v); then f_d and f_θ are:

f_d=[max (d_k),min(d_k),max(d_k)-min(d_k),mean(d_k),var(d_k),harmmean(d_k)] (7)

f_θ=[max (θ_k),min(θ_k),max(θ_k)-min(θ_k),mean(θ_k),var(θ_k),harmmean(θ_k)] (8)

where mean(·), var(·) and harmmean(·) denote the arithmetic mean, the variance and the harmonic mean, respectively; f_c consists of four curvatures:

f_{c} = [c_{1}, c_{2}, \frac{c_{1} + c_{2}}{2}, c_{1} c_{2}] (9)

where c₁ is the minimum principal curvature, c₂ is the maximum principal curvature, (c₁+c₂)/2 is the mean curvature, and c₁c₂ is the Gaussian curvature;

The Laplacian matrix of the 3D mesh model is a symmetric matrix and can be decomposed as:

L=B Λ B^T (10)

where Λ=Diag{λ_f, 1≤f≤Ψ} is a diagonal matrix whose entries are arranged in ascending order, and λ_f are the eigenvalues of the Laplacian matrix of the 3D mesh model; the column vectors of the orthogonal matrix B are the corresponding eigenvectors, and Ψ is the total number of points in the 3D mesh model; the Laplacian spectrum is defined as:

H (f)={ λ_f,1≤f≤Ψ} (11)

The global information is obtained from the log-Laplacian spectrum; the log-Laplacian spectrum L(f) is defined as:

L (f)=log (H (f)) (12)

The irregularity R(f) of the spectrum is used to obtain the mesh saliency:

R (f)=| L (f)-J_Γ(f)*L(f)| (13)

where J_Γ(f) is a 1 × Γ vector; the irregularity of the spectrum is transformed from the spectral domain to the spatial domain by Equation (14), in which R₁=Diag{exp(R(f)): 1≤f≤Ψ} is a diagonal matrix, the element-wise operator is the Hadamard product, and W is a weight matrix, where

W(i, j) = \frac{1}{\|v_{i} - v_{j}\|^{2}} A(i, j) (15)

Let s be an element of S; then:

f_ls=[max (s_k),min(s_k),max(s_k)-min(s_k),mean(s_k),var(s_k),harmmean(s_k)] (17)

Step 3: Train the sparse auto-encoders and the deep neural network using the feature attribute set built in step 2 and the corresponding label set:

A sparse auto-encoder is a variant of the auto-encoder, obtained by adding a sparsity constraint to the hidden layer of the auto-encoder; three sparse auto-encoders are trained first, the encoder parts of these three sparse auto-encoders are then extracted and cascaded to form a deep sparse auto-encoder, and a logistic regression layer is trained to process the features output by the deep sparse auto-encoder;

The deep neural network regression model is formed by cascading the deep sparse auto-encoder with the above logistic regression layer;

Finally, the back-propagation algorithm is applied to the deep neural network regression model to fine-tune it;

Step 4: Use the deep neural network regression model obtained in step 3 to predict on a 3D mesh model and obtain the corresponding saliency response map:

Feature attributes are formed for each point of a 3D mesh model in the test set using the same method as in step 2, and the deep neural network regression model obtained in step 3 is used to predict on that point, yielding a regression value; the regression values of all points in the 3D mesh model are obtained in this way and together constitute the saliency response map of the model;

Step 5: Obtain the 3D key points from the saliency response map obtained in step 4:

Points that are local maxima of the saliency response map are chosen as 3D key points; for each point in the 3D mesh model, if its saliency response is greater than the saliency responses of all points in its surrounding 5-ring neighborhood, the point is a 3D key point; otherwise, the point is not a 3D key point.