CN106709997B

CN106709997B - Three-dimensional keypoint detection method based on deep neural network and sparse autoencoder

Info

Publication number: CN106709997B
Application number: CN201610279232.4A
Authority: CN
Inventors: 朱策; 林薪雨; 张倩; 王征韬; 刘翼鹏; 夏志强; 虢齐
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2019-07-19
Anticipated expiration: 2036-04-29
Also published as: CN106709997A

Abstract

The invention belongs to the field of 3D computer vision, and in particular relates to a three-dimensional (3D) keypoint detection method based on a deep neural network and sparse autoencoders. The method comprises a stage in which sparse autoencoders and a deep neural network are trained, and a stage in which 3D keypoints are detected using the trained deep neural network as a regression model. Local and global information of the 3D mesh model in a multiscale space is fully exploited to decide whether a query point is a keypoint. Multilayer sparse autoencoders are introduced to discover the correlations between these local and global cues and to form a high-level feature representation of them, on which regression is then performed. As a result, keypoints on 3D mesh models can be detected effectively, robustly and stably.

Description

Three-dimensional keypoint detection method based on deep neural network and sparse autoencoder

Technical field

The invention belongs to the field of 3D computer vision, and in particular relates to a 3D keypoint detection method based on a deep neural network and sparse autoencoders.

Background art

3D keypoint detection is an important topic in 3D computer vision and is widely used in applications such as object registration and matching, 3D shape retrieval, and mesh segmentation and simplification. Over the past few decades, researchers have proposed many methods for detecting 3D keypoints, most of them geometry-based. Godil and Wagan extended the two-dimensional Scale Invariant Feature Transform (SIFT) algorithm and proposed a 3D SIFT keypoint detector. Holte performed 3D keypoint detection with the Difference-of-Normals (DoN) operator. Castellani proposed a robust 3D keypoint detector based on the visual saliency of 3D mesh models. In addition, some algorithms use the Laplacian spectrum to detect 3D keypoints in the Laplacian spectral domain rather than in the spatial domain.

Geometry-based 3D keypoint detection methods lack flexibility and therefore struggle to meet the needs of a wide range of applications. Such methods usually define as keypoints the points on the mesh surface that vary sharply in all directions, but in some scenarios these points may be noise or unimportant small details of the mesh. Moreover, when the semantic information of the 3D mesh model must be taken into account, geometry-based methods can hardly handle the problem. For these reasons, more and more researchers have begun to look for new frameworks for 3D keypoint detection.

In recent years, some researchers have proposed machine-learning methods for 3D keypoint detection, which can overcome the shortcomings of geometry-based methods to some extent. Teran and Mordohai used a random forest as the classifier for 3D keypoint detection (Teran, L., Mordohai, P.: 3D interest point detection via discriminative learning. In: Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland (Sept 2014)). In that method, several geometry-based 3D keypoint detectors are used to generate the feature attributes of the training and test samples. Creusot detected 3D keypoints on 3D face models using linear discriminant analysis (LDA) and AdaBoost. Salti and Tombari cast 3D keypoint detection as a binary classification problem, where the criterion is whether a point can be correctly matched using a predefined 3D descriptor (Salti, S., Tombari, F., Spezialetti, R., Di Stefano, L.: Learning a descriptor-specific 3D keypoint detector. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), Dec 2015, pp. 2318-2326).

However, most of the above algorithms generate feature attributes from local information only and lack global information such as the Laplacian spectrum.

Summary of the invention

To address the above problems and to detect keypoints on 3D mesh models more effectively, robustly and stably, the present invention provides a 3D keypoint detection method based on a deep neural network and sparse autoencoders.

The specific technical solution is as follows:

Step 1: select a training set and a test set from a 3D mesh model database, and select positive and negative sample points from the training set:

The selected training and test sets do not overlap, and every 3D mesh model carries 3D keypoints generated by manual annotation. For each 3D mesh model in the training set, all manually annotated 3D keypoints are taken as positive sample points and all remaining points as negative sample points.

Step 2: form feature attributes for the positive and negative sample points and construct the feature attribute set:

The feature attributes of a sample point are formed from local and global information of the 3D mesh model in a multiscale space.

The local information consists of three parts: 1) f_d, the Euclidean distances from the neighborhood points of the query point to the tangent plane at the query point; 2) f_θ, the angles between the normal vector of the query point and the normal vectors of its neighborhood points; 3) f_c, four curvatures of the surface: the maximum principal curvature, the minimum principal curvature, the Gaussian curvature and the mean curvature. The global information is the Laplacian spectral information f_ls.

For any point v on a 3D mesh model M(x, y, z), let f be its feature attribute; then:

$$ f = [f_0, f_1, f_2, \ldots, f_\Omega]^T \tag{1} $$

$$ f_i = [f_d, f_\theta, f_c, f_{ls}], \quad i = 0, 1, 2, \ldots, \Omega \tag{2} $$

where f_i (i = 0, 1, 2, ..., Ω) is the feature attribute information of the 3D mesh model M(x, y, z) at scale i, and contains the three kinds of local information f_d, f_θ, f_c and the global information f_ls. The evolution of M(x, y, z) in scale space is expressed as:

$$ M_\delta(x, y, z) = M(x, y, z) * G(x, y, z, \delta) \tag{3} $$

where δ ∈ {0, ε, 2ε, ..., Ωε} is the standard deviation of the 3D Gaussian filter G, ε is 0.3% of the length of the main diagonal of the smallest cube that completely encloses the 3D mesh model, δ = 0 means that the evolved model is the original mesh M(x, y, z), and * is the convolution operator.
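The scale-space construction of Eq. (3) can be sketched as below. This is a minimal NumPy sketch under assumptions not stated in the source: the enclosing cube is taken to be axis-aligned, so its main diagonal is √3 times the largest bounding-box extent, and the convolution M * G is approximated by replacing each vertex with the Gaussian-weighted average of all vertices within `cutoff` × δ (`cutoff` is a hypothetical parameter). All function names are illustrative.

```python
import numpy as np


def scale_unit(vertices):
    """epsilon = 0.3% of the main diagonal of the smallest enclosing cube.

    Assumption: the cube is axis-aligned, so its side is the largest
    bounding-box extent and its main diagonal is side * sqrt(3).
    """
    extent = vertices.max(axis=0) - vertices.min(axis=0)
    side = extent.max()
    return 0.003 * side * np.sqrt(3.0)


def gaussian_smooth(vertices, delta, cutoff=2.0):
    """Approximate M * G(delta) by Gaussian-weighted averaging of vertex positions (assumption)."""
    if delta == 0:
        return vertices.copy()                      # delta = 0: the original mesh M
    diff = vertices[:, None, :] - vertices[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)    # squared pairwise distances
    w = np.exp(-dist2 / (2.0 * delta ** 2))
    w[dist2 > (cutoff * delta) ** 2] = 0.0          # ignore vertices beyond the cutoff radius
    return (w @ vertices) / w.sum(axis=1, keepdims=True)


def scale_space(vertices, omega=6):
    """Return [M_0, M_eps, ..., M_{omega*eps}] as smoothed vertex arrays."""
    eps = scale_unit(vertices)
    return [gaussian_smooth(vertices, i * eps) for i in range(omega + 1)]
```

The dense pairwise-distance matrix keeps the sketch short but needs O(Ψ²) memory; a mesh with many vertices would call for a spatial index instead.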

A 3D mesh model consists of a set of points and their connectivity. As shown in Fig. 3, for any point v on the mesh, let V_k(v), k = 1, 2, 3, 4, 5, be its k-ring neighborhood points and n = (n_x, n_y, n_z) its normal vector. Let v_kj be the j-th point of V_k(v) and n_kj its normal vector. The Euclidean distance d_kj from point v_kj to the tangent plane at point v is then:

$$ d_{kj} = \frac{\left| n_x (x_{kj} - x_v) + n_y (y_{kj} - y_v) + n_z (z_{kj} - z_v) \right|}{\sqrt{n_x^2 + n_y^2 + n_z^2}} \tag{4} $$

The angle between the normal vectors of points v_kj and v is:

$$ \theta_{kj} = \arccos \frac{n \cdot n_{kj}}{\|n\| \, \|n_{kj}\|} \tag{5} $$

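where (x_v, y_v, z_v) are the coordinates of point v and (x_kj, y_kj, z_kj) are those of point v_kj. Let

$$ d_k = \{ d_{kj} \mid 1 \le j \le N_k \}, \qquad \theta_k = \{ \theta_{kj} \mid 1 \le j \le N_k \} \tag{6} $$

where N_k is the number of points in V_k(v). Then f_d and f_θ are: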

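$$ f_d = [\max(d_k), \min(d_k), \max(d_k) - \min(d_k), \operatorname{mean}(d_k), \operatorname{var}(d_k), \operatorname{harmmean}(d_k)] \tag{7} $$

$$ f_\theta = [\max(\theta_k), \min(\theta_k), \max(\theta_k) - \min(\theta_k), \operatorname{mean}(\theta_k), \operatorname{var}(\theta_k), \operatorname{harmmean}(\theta_k)] \tag{8} $$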

where mean(·), var(·) and harmmean(·) denote the arithmetic mean, the variance and the harmonic mean, respectively. f_c consists of four curvatures:

$$ f_c = \left[ c_1, \; c_2, \; \frac{c_1 + c_2}{2}, \; c_1 c_2 \right] \tag{9} $$

where c_1 is the minimum principal curvature, c_2 the maximum principal curvature, (c_1 + c_2)/2 the mean curvature, and c_1c_2 the Gaussian curvature.
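The local terms f_d and f_θ of Eqs. (7) and (8), together with f_c, can be computed for one point as in the following sketch. It assumes the k-ring neighbor coordinates and normals for k = 1 to 5 are already available, concatenates the six statistics ring by ring (the source does not say how the five rings are combined), uses the population variance for var(·), and floors values at a tiny constant before the harmonic mean, since harmmean(·) is undefined when a distance or angle is exactly zero. The ordering of the four curvatures and all function names are illustrative choices.

```python
import numpy as np


def six_stats(x, tiny=1e-12):
    """[max, min, max - min, mean, var, harmmean] as in Eqs. (7), (8) and (17)."""
    x = np.asarray(x, dtype=float)
    harm = x.size / np.sum(1.0 / np.maximum(x, tiny))   # floor avoids division by zero (assumption)
    return np.array([x.max(), x.min(), x.max() - x.min(), x.mean(), x.var(), harm])


def distances_and_angles(v, n, ring_pts, ring_normals):
    """d_kj: distance of each ring point to the tangent plane at v; theta_kj: angle between normals."""
    n_unit = n / np.linalg.norm(n)
    d = np.abs((ring_pts - v) @ n_unit)
    nn = ring_normals / np.linalg.norm(ring_normals, axis=1, keepdims=True)
    theta = np.arccos(np.clip(nn @ n_unit, -1.0, 1.0))  # clip guards against rounding outside [-1, 1]
    return d, theta


def local_features(v, n, rings, c1, c2):
    """rings: list of (points, normals) arrays for k = 1..5; c1 <= c2 are the principal curvatures."""
    f_d, f_theta = [], []
    for pts, normals in rings:
        d, theta = distances_and_angles(v, n, pts, normals)
        f_d.append(six_stats(d))
        f_theta.append(six_stats(theta))
    f_c = np.array([c1, c2, (c1 + c2) / 2.0, c1 * c2])  # min, max, mean, Gaussian curvature
    return np.concatenate(f_d), np.concatenate(f_theta), f_c
```

With five rings, this sketch yields 30 values each for f_d and f_θ plus 4 for f_c.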

The Laplacian matrix of a 3D mesh model is symmetric and can be decomposed as:

$$ L = B \Lambda B^T \tag{10} $$

where Λ = Diag{λ_f, 1 ≤ f ≤ Ψ} is a diagonal matrix whose elements are sorted in ascending order, and λ_f are the eigenvalues of the Laplacian matrix of the 3D mesh model. The columns of the orthogonal matrix B are the corresponding eigenvectors, and Ψ is the total number of points in the mesh. The Laplacian spectrum is defined as:

$$ H(f) = \{ \lambda_f, \; 1 \le f \le \Psi \} \tag{11} $$

The global information is obtained from the log-Laplacian spectrum L(f), defined as:

$$ L(f) = \log\big(H(f)\big) \tag{12} $$

The spectral irregularity R(f) is used to capture mesh saliency:

$$ R(f) = \left| L(f) - J_\Gamma(f) * L(f) \right| \tag{13} $$

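where J_Γ(f) is a 1 × Γ vector. The spectral irregularity is transformed from the spectral domain back to the spatial domain by:

$$ S = \left( B R_1 B^T \right) \circ W \tag{14} $$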

where R_1 = Diag{exp(R(f)) : 1 ≤ f ≤ Ψ} is a diagonal matrix, ∘ denotes the Hadamard product, and W is a weight matrix.

Let s_k be formed from the elements s of S over the neighborhood V_k(v), in the same way as d_k and θ_k; then:

$$ f_{ls} = [\max(s_k), \min(s_k), \max(s_k) - \min(s_k), \operatorname{mean}(s_k), \operatorname{var}(s_k), \operatorname{harmmean}(s_k)] \tag{17} $$
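Eqs. (10) to (13) and the spectral-to-spatial transform can be sketched as below. Several details are assumptions rather than statements of the source: the mesh Laplacian is taken to be the combinatorial graph Laplacian D − A; J_Γ is taken to be a length-Γ averaging filter, so that J_Γ * L(f) is a moving average of the log spectrum; the zero eigenvalue is floored before the logarithm; and the transform is assumed to take the form S = (B R₁ B^T) ∘ W, which is consistent with the definitions of B, R₁, ∘ and W given in the text. W is passed in by the caller because its definition is not reproduced here. Function names are illustrative.

```python
import numpy as np


def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A (assumed form of the mesh Laplacian)."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_irregularity(lap, gamma=9, floor=1e-8):
    """Eigen-decompose L (Eq. 10), take the log spectrum (Eq. 12) and its irregularity (Eq. 13)."""
    lam, B = np.linalg.eigh(lap)                  # ascending eigenvalues, orthonormal eigenvectors
    log_spec = np.log(np.maximum(lam, floor))     # flooring the ~0 eigenvalue is an assumption
    gamma = min(gamma, lam.size)                  # keep 'same'-mode output the length of the spectrum
    kernel = np.ones(gamma) / gamma               # assumed averaging filter J_Gamma
    smoothed = np.convolve(log_spec, kernel, mode="same")
    R = np.abs(log_spec - smoothed)
    return lam, B, R


def spatial_saliency(B, R, W):
    """Map the irregularity back to the spatial domain: S = (B R1 B^T) o W (assumed form)."""
    R1 = np.diag(np.exp(R))
    return (B @ R1 @ B.T) * W                     # '*' on arrays is the Hadamard product
```

`numpy.linalg.eigh` already returns the eigenvalues in ascending order, which matches the ordering required for Λ. A dense eigendecomposition is only practical for small meshes.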

Step 3: train the sparse autoencoders and the deep neural network using the feature attribute set built in Step 2 and the corresponding label set:

A sparse autoencoder is a variant of the autoencoder obtained by imposing a sparsity constraint on its hidden layer. Fig. 2(a) shows the basic structure of an autoencoder.

First, three sparse autoencoders are trained; the encoding parts of these three sparse autoencoders are then extracted and cascaded to form a deep sparse autoencoder. A logistic regression layer is then trained to process the features output by the deep sparse autoencoder.
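Training one sparse autoencoder and stacking three of them greedily can be sketched as follows. This assumes the common formulation with sigmoid units, a squared reconstruction error, an L2 weight decay λ and a KL-divergence sparsity penalty weighted by β toward a target mean activation ρ. The source names ρ and β, but its parameter table is not reproduced, so the default values and any layer sizes are placeholders; inputs are assumed scaled to [0, 1], and plain full-batch gradient descent stands in for whatever optimizer was actually used. Function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, rng):
    # Uniform initialisation scaled by fan-in/fan-out (a common heuristic).
    r = np.sqrt(6.0 / (n_in + n_hidden + 1))
    return {"W1": rng.uniform(-r, r, (n_hidden, n_in)), "b1": np.zeros((n_hidden, 1)),
            "W2": rng.uniform(-r, r, (n_in, n_hidden)), "b2": np.zeros((n_in, 1))}


def sae_cost_grad(p, X, rho=0.05, beta=3.0, lam=1e-4):
    """Cost and gradients of one sparse autoencoder; X has shape (n_features, m), values in [0, 1]."""
    m = X.shape[1]
    a2 = sigmoid(p["W1"] @ X + p["b1"])             # hidden activations
    h = sigmoid(p["W2"] @ a2 + p["b2"])             # reconstruction of X
    rho_hat = a2.mean(axis=1, keepdims=True)        # mean activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = ((0.5 / m) * np.sum((h - X) ** 2)
            + 0.5 * lam * (np.sum(p["W1"] ** 2) + np.sum(p["W2"] ** 2)) + beta * kl)
    d3 = (h - X) * h * (1 - h)                      # output-layer error
    sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (p["W2"].T @ d3 + sparse) * a2 * (1 - a2)  # hidden-layer error, incl. sparsity term
    grads = {"W1": d2 @ X.T / m + lam * p["W1"], "b1": d2.mean(axis=1, keepdims=True),
             "W2": d3 @ a2.T / m + lam * p["W2"], "b2": d3.mean(axis=1, keepdims=True)}
    return cost, grads


def train_sae(X, n_hidden, steps=200, lr=0.1, seed=0, **hyper):
    p = init_params(X.shape[0], n_hidden, np.random.default_rng(seed))
    for _ in range(steps):
        _, g = sae_cost_grad(p, X, **hyper)
        for k in p:
            p[k] -= lr * g[k]                       # plain gradient descent
    return p


def stack_sparse_autoencoders(X, hidden_sizes, **train_kw):
    """Greedy layer-wise training; keeps only each autoencoder's encoder (W1, b1)."""
    layers, H = [], X
    for n_hidden in hidden_sizes:
        p = train_sae(H, n_hidden, **train_kw)
        layers.append((p["W1"], p["b1"]))
        H = sigmoid(p["W1"] @ H + p["b1"])          # codes become the next autoencoder's input
    return layers, H
```

Each autoencoder's decoder (W2, b2) is discarded after training; only the encoders are cascaded, which mirrors the extraction step described above.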

The deep neural network regression model is formed by cascading the deep sparse autoencoder with the above logistic regression layer; Fig. 2(b) shows the basic structure of the deep neural network regression model in the method of the invention.

Finally, the back-propagation algorithm is applied to the whole deep neural network regression model to fine-tune it.
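The cascaded regression model and its back-propagation fine-tuning can be sketched as below, taking the encoder layers as a list of (W, b) pairs such as a greedy stacking step would produce. The sketch assumes a sigmoid logistic-regression output trained with binary cross-entropy on the 0/1 sample labels and plain full-batch gradient descent; the loss, the learning rate and the function names are assumptions, not details from the source. The forward pass also yields the per-point regression values in (0, 1) that Step 4 collects into a saliency response map.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(layers, head, X):
    """Run the stacked encoder, then the logistic-regression head."""
    acts = [X]
    for W, b in layers:
        acts.append(sigmoid(W @ acts[-1] + b))
    w, c = head
    return acts, sigmoid(w @ acts[-1] + c)          # shape (1, m): one regression value per point


def bce(y_hat, y, tiny=1e-12):
    """Binary cross-entropy (assumed fine-tuning loss)."""
    return -np.mean(y * np.log(y_hat + tiny) + (1 - y) * np.log(1 - y_hat + tiny))


def finetune_step(layers, head, X, y, lr=0.5):
    """One full-batch back-propagation step through the head and every encoder layer."""
    m = X.shape[1]
    acts, y_hat = forward(layers, head, X)
    w, c = head
    delta = (y_hat - y) / m                         # dLoss/dz at the sigmoid output
    new_head = (w - lr * delta @ acts[-1].T, c - lr * delta.sum(axis=1, keepdims=True))
    delta = (w.T @ delta) * acts[-1] * (1 - acts[-1])   # error at the last encoder layer
    new_layers = list(layers)
    for i in range(len(layers) - 1, -1, -1):
        W, b = layers[i]
        grad_W, grad_b = delta @ acts[i].T, delta.sum(axis=1, keepdims=True)
        if i > 0:                                   # propagate using the pre-update weights
            delta = (W.T @ delta) * acts[i] * (1 - acts[i])
        new_layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return new_layers, new_head
```

Usage: repeatedly call `layers, head = finetune_step(layers, head, X, y)` on the training features, then `forward(layers, head, X_test)[1]` gives the regression values for test points.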

Step 4: use the deep neural network regression model obtained in Step 3 to perform prediction on a 3D mesh model and obtain its saliency response map:

Feature attributes are formed for each point of a 3D mesh model in the test set using the same method as in Step 2, and the deep neural network regression model obtained in Step 3 predicts a regression value for the point. The regression values of all points of the mesh then constitute the saliency response map of that 3D mesh model.

Step 5: obtain the 3D keypoints from the saliency response map obtained in Step 4:

Points that are local maxima of the saliency response map are selected as 3D keypoints. For each point of the 3D mesh model, if its saliency response is larger than the saliency responses of all points in its 5-ring neighborhood, the point is a 3D keypoint; otherwise, it is not.
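The local-maximum test of Step 5 can be sketched as follows: k-ring neighborhoods are found by breadth-first search over the mesh edge graph, and a vertex is kept only if its response is strictly greater than that of every vertex within k = 5 rings. Building the adjacency from triangle faces is an assumption about the mesh representation, and the function names are illustrative.

```python
from collections import deque

import numpy as np


def adjacency_from_faces(n_vertices, faces):
    """Neighbor sets of each vertex, built from triangle faces (a, b, c)."""
    nbrs = [set() for _ in range(n_vertices)]
    for a, b, c in faces:
        nbrs[a].update((b, c))
        nbrs[b].update((a, c))
        nbrs[c].update((a, b))
    return nbrs


def k_ring(nbrs, v, k):
    """Vertices within k edges of v (excluding v itself), by breadth-first search."""
    depth = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if depth[u] == k:
            continue                      # do not expand beyond the k-th ring
        for w in nbrs[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    depth.pop(v)
    return set(depth)


def detect_keypoints(saliency, nbrs, k=5):
    """Indices whose saliency is strictly larger than every vertex in their k-ring."""
    saliency = np.asarray(saliency)
    return [v for v in range(len(saliency))
            if all(saliency[v] > saliency[u] for u in k_ring(nbrs, v, k))]
```

Because the comparison is strict, a plateau of equal maxima yields no keypoint, matching the "larger than" wording; each search is local, so its cost grows with the size of the 5-ring rather than with the whole mesh.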

The present invention (1) uses a deep neural network combined with sparse autoencoders as a regression model for 3D keypoint detection; (2) forms feature attributes from local and global information of the 3D mesh model in multiscale space, so that as much information as possible is used to detect 3D keypoints; and (3) introduces multilayer sparse autoencoders, which effectively discover the correlations between these local and global cues and form a high-level feature representation of them for regression. Keypoints on 3D mesh models can thus be detected effectively, robustly and stably.

In conclusion, compared with existing 3D keypoint detection methods, the method of the invention can detect keypoints on 3D mesh models effectively, robustly and stably.

Brief description of the drawings

Fig. 1 is a flow chart of the 3D keypoint detection method of the invention;

Fig. 2(a) is a structural diagram of an autoencoder, and Fig. 2(b) is a structural diagram of the deep neural network regression model of the invention;

Fig. 3 shows an airplane 3D mesh model and a partial enlargement of it;

Fig. 4 shows the 3D keypoints detected on a chair 3D mesh model using the method of the invention;

Fig. 5 shows the keypoints detected by the invention in different frames of a 3D video sequence;

Fig. 6 compares the performance of the invention with five other 3D keypoint detection methods: (a) performance curves under the IOU evaluation metric on the test set of database A; (b) performance curves under the IOU evaluation metric on the test set of database B.

Reference signs: v, point under test; V₁, 1-ring neighborhood points; V₂, 2-ring neighborhood points; V₃, 3-ring neighborhood points; V₄, 4-ring neighborhood points; V₅, 5-ring neighborhood points.

Detailed embodiments

The method of the invention is described in further detail below with reference to the drawings and a specific example. The purpose of the example is to verify the effectiveness of the method through 3D mesh keypoint detection results.

In the implementation, the 3D mesh model database from (Dutagaci, H., Cheung, C.P., Godil, A.: Evaluation of 3D interest point detection techniques via human-generated ground truth. The Visual Computer 28(9) (2012) 901-917) is used as the training and test data set.

Embodiment of the deep neural network training stage:

The 3D mesh model database is divided into two parts, database A and database B. Database A contains 24 3D mesh models whose ground-truth 3D keypoints were generated from annotations by 23 people; database B contains 43 3D mesh models whose ground truth was generated from annotations by 16 people. Two thirds of database A and two thirds of database B are selected as training sets, and the remainder as test sets.

For each training mesh in database A, the ground-truth points corresponding to the parameters σ ∈ {0.01, 0.02, ..., 0.1} and n ∈ {11, 12, ..., 22} (σ and n being the parameters with which the benchmark generates its ground truth) are taken as positive samples, and the remaining points as negative samples; there are 17115 positive and 148565 negative samples in total. For each training mesh in database B, the ground-truth points corresponding to σ ∈ {0.01, 0.02, ..., 0.1} and n ∈ {8, 9, ..., 15} are taken as positive samples, and the remaining points as negative samples; there are 18427 positive and 222034 negative samples in total.
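Sample selection can be sketched as below. The source states only that two thirds of each database are used for training and that the ground truth for the listed σ and n values provides the positive samples; the random shuffle, the union over all (σ, n) pairs, and the `gt` dictionary layout are hypothetical choices made for illustration.

```python
import numpy as np


def split_models(model_ids, train_fraction=2 / 3, seed=0):
    """Disjoint train/test split of model ids (random shuffle is an assumption)."""
    ids = np.array(sorted(model_ids))
    np.random.default_rng(seed).shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return sorted(ids[:n_train].tolist()), sorted(ids[n_train:].tolist())


def vertex_labels(n_vertices, gt, sigmas, ns):
    """1 for vertices in the ground truth of any (sigma, n) pair, else 0.

    gt is a hypothetical mapping (sigma, n) -> iterable of vertex indices.
    """
    labels = np.zeros(n_vertices, dtype=np.int8)
    for s in sigmas:
        for n in ns:
            idx = np.asarray(list(gt.get((s, n), ())), dtype=np.intp)
            labels[idx] = 1
    return labels
```

For database A's 24 models, this split gives 16 training and 8 test models.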

Step 2: form feature attributes for the positive and negative sample points:

For each 3D mesh model in the training set, its representation in scale space is computed first, and then the information of every sample point of the model is computed at scales i = 0, 1, 2, ..., Ω; in the invention, Ω = 6. The local information f_d and f_θ is computed by the following formulas, and f_c consists of the four curvatures described above:

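$$ f_d = [\max(d_k), \min(d_k), \max(d_k) - \min(d_k), \operatorname{mean}(d_k), \operatorname{var}(d_k), \operatorname{harmmean}(d_k)] $$

$$ f_\theta = [\max(\theta_k), \min(\theta_k), \max(\theta_k) - \min(\theta_k), \operatorname{mean}(\theta_k), \operatorname{var}(\theta_k), \operatorname{harmmean}(\theta_k)] $$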

The global information f_ls is obtained from the irregularity of the log-Laplacian spectrum in the spectral domain:

$$ R(f) = \left| L(f) - J_\Gamma(f) * L(f) \right| $$

where Γ = 9 in J_Γ(f). The global information f_ls is:

$$ f_{ls} = [\max(s_k), \min(s_k), \max(s_k) - \min(s_k), \operatorname{mean}(s_k), \operatorname{var}(s_k), \operatorname{harmmean}(s_k)] $$

The final feature attribute of each sample point of a 3D mesh model is:

$$ f = [f_0, f_1, f_2, \ldots, f_6]^T $$

$$ f_i = [f_d, f_\theta, f_c, f_{ls}], \quad i = 0, 1, 2, \ldots, 6 $$

The dimension of each sample is 665.
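Eqs. (1) and (2) amount to concatenating the per-scale vectors, as in this small sketch (the function name is illustrative). With Ω = 6 there are seven scales, so 665 dimensions correspond to 665 / 7 = 95 values per scale; the source does not spell out how those 95 values are divided among f_d, f_θ, f_c and f_ls.

```python
import numpy as np


def assemble_feature(per_scale):
    """per_scale[i] = (f_d, f_theta, f_c, f_ls) at scale i; returns f = [f_0, ..., f_Omega] flattened."""
    return np.concatenate([np.concatenate([np.ravel(part) for part in parts]) for parts in per_scale])
```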

Step 3: train the sparse autoencoders and the deep neural network using the constructed feature attribute set and the corresponding label set:

The sparse autoencoders and the deep neural network are trained with the feature attribute set obtained in Steps 1 and 2 and the corresponding label set; the relevant parameter settings are shown in the following table:

where ρ is the sparsity parameter of the sparse autoencoder, and β controls the weight of the sparsity penalty term in the cost function of the sparse autoencoder.

Embodiment of the 3D keypoint detection stage:

Step 4: use the trained deep neural network regression model to perform prediction on the 3D mesh model and obtain its saliency response map:

Every 3D mesh model in the test set is processed in the same way; the chair mesh is taken as an example. First, its representation in scale space is obtained with the method of Step 2, and the feature attributes of every point are computed. The trained deep neural network regression model then makes a prediction for every point of the chair mesh. For each point, the output of the deep neural network is a regression value between 0 and 1; the closer the value is to 1, the more likely the point is a 3D keypoint, and vice versa. All these output values together constitute the saliency response map of the chair mesh.

Step 5: obtain the 3D keypoints from the saliency response map:

Points that are local maxima of the saliency response map of the chair mesh are selected as its 3D keypoints. For each point of the chair mesh, if its saliency response is larger than that of every point in its 5-ring neighborhood, the point is a 3D keypoint; otherwise, it is not. Fig. 4 shows the 3D keypoints of the chair mesh detected with the method of the invention.

Fig. 5 shows the distribution of the 3D keypoints detected with the method of the invention in different frames of a 3D pedestrian test sequence.

Fig. 6 compares the performance of the method of the invention with five other 3D keypoint detection methods. The evaluation metric is the IOU criterion (L. Teran and P. Mordohai, "3D interest point detection via discriminative learning," in European Conference on Computer Vision, Zurich, Switzerland, Sept 2014). Fig. 6(a) shows the performance curves of the six algorithms on the test set of database A, and Fig. 6(b) shows those on the test set of database B.

Claims

1. A three-dimensional keypoint detection method based on a deep neural network and sparse autoencoders, comprising the following steps:

Step 1: select a training set and a test set from a 3D mesh model database, and select positive and negative sample points from the training set:

The selected training and test sets do not overlap, and the 3D mesh models carry 3D keypoints generated by manual annotation; for each 3D mesh model in the training set, all manually annotated 3D keypoints are selected as positive sample points and the remaining points as negative sample points;

The feature attributes of the sample points are formed from local and global information of the 3D mesh model in multiscale space;

The local information consists of three parts: 1) f_d, the Euclidean distances from the neighborhood points of the query point to the tangent plane at the query point; 2) f_θ, the angles between the normal vector of the query point and the normal vectors of its neighborhood points; 3) f_c, four curvatures of the surface: the maximum principal curvature, the minimum principal curvature, the Gaussian curvature and the mean curvature; the global information is the Laplacian spectral information f_ls;

$$ f = [f_0, f_1, f_2, \ldots, f_\Omega]^T \tag{1} $$

$$ f_i = [f_d, f_\theta, f_c, f_{ls}], \quad i = 0, 1, 2, \ldots, \Omega \tag{2} $$

where f_i (i = 0, 1, 2, ..., Ω) is the feature attribute information of the 3D mesh model M(x, y, z) at scale i and contains f_d, f_θ, f_c and f_ls; the evolution of M(x, y, z) in scale space is expressed as:

$$ M_\delta(x, y, z) = M(x, y, z) * G(x, y, z, \delta) \tag{3} $$

where δ ∈ {0, ε, 2ε, ..., Ωε} is the standard deviation of the 3D Gaussian filter G, ε is 0.3% of the length of the main diagonal of the smallest cube that completely encloses the 3D mesh model, δ = 0 means that the evolved model is the original mesh M(x, y, z), and * is the convolution operator;

A 3D mesh model consists of a set of points and their connectivity; for any point v on the mesh, let V_k(v), k = 1, 2, 3, 4, 5, be its k-ring neighborhood points and n = (n_x, n_y, n_z) its normal vector; let v_kj be the j-th point of V_k(v) and n_kj its normal vector; the Euclidean distance d_kj from point v_kj to the tangent plane at point v is then:

$$ d_{kj} = \frac{\left| n_x (x_{kj} - x_v) + n_y (y_{kj} - y_v) + n_z (z_{kj} - z_v) \right|}{\sqrt{n_x^2 + n_y^2 + n_z^2}} \tag{4} $$

The angle between the normal vectors of points v_kj and v is:

$$ \theta_{kj} = \arccos \frac{n \cdot n_{kj}}{\|n\| \, \|n_{kj}\|} \tag{5} $$

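where (x_v, y_v, z_v) are the coordinates of point v and (x_kj, y_kj, z_kj) are those of point v_kj; let

$$ d_k = \{ d_{kj} \mid 1 \le j \le N_k \}, \qquad \theta_k = \{ \theta_{kj} \mid 1 \le j \le N_k \} \tag{6} $$

where N_k is the number of points in V_k(v); then f_d and f_θ are: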

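$$ f_d = [\max(d_k), \min(d_k), \max(d_k) - \min(d_k), \operatorname{mean}(d_k), \operatorname{var}(d_k), \operatorname{harmmean}(d_k)] \tag{7} $$

$$ f_\theta = [\max(\theta_k), \min(\theta_k), \max(\theta_k) - \min(\theta_k), \operatorname{mean}(\theta_k), \operatorname{var}(\theta_k), \operatorname{harmmean}(\theta_k)] \tag{8} $$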

where mean(·), var(·) and harmmean(·) denote the arithmetic mean, the variance and the harmonic mean, respectively; f_c consists of four curvatures:

$$ f_c = \left[ c_1, \; c_2, \; \frac{c_1 + c_2}{2}, \; c_1 c_2 \right] \tag{9} $$

where c_1 is the minimum principal curvature, c_2 the maximum principal curvature, (c_1 + c_2)/2 the mean curvature, and c_1c_2 the Gaussian curvature;

$$ L = B \Lambda B^T \tag{10} $$

where Λ = Diag{λ_f, 1 ≤ f ≤ Ψ} is a diagonal matrix whose elements are sorted in ascending order, and λ_f are the eigenvalues of the Laplacian matrix of the 3D mesh model; the columns of the orthogonal matrix B are the corresponding eigenvectors, Ψ is the total number of points in the mesh, and the Laplacian spectrum is defined as:

$$ H(f) = \{ \lambda_f, \; 1 \le f \le \Psi \} \tag{11} $$

The global information is obtained using the log-Laplacian spectrum; the log-Laplacian spectrum L(f) is defined as:

$$ L(f) = \log\big(H(f)\big) \tag{12} $$

The spectral irregularity R(f) is used to capture mesh saliency:

$$ R(f) = \left| L(f) - J_\Gamma(f) * L(f) \right| \tag{13} $$

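where J_Γ(f) is a 1 × Γ vector; the spectral irregularity is transformed from the spectral domain back to the spatial domain by:

$$ S = \left( B R_1 B^T \right) \circ W \tag{14} $$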

where R_1 = Diag{exp(R(f)) : 1 ≤ f ≤ Ψ} is a diagonal matrix, ∘ denotes the Hadamard product, and W is a weight matrix;

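let s_k be formed from the elements s of S over the neighborhood V_k(v), in the same way as d_k and θ_k; then:

$$ f_{ls} = [\max(s_k), \min(s_k), \max(s_k) - \min(s_k), \operatorname{mean}(s_k), \operatorname{var}(s_k), \operatorname{harmmean}(s_k)] \tag{17} $$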

Step 3: train the sparse autoencoders and the deep neural network using the feature attribute set built in Step 2 and the corresponding label set:

A sparse autoencoder is a variant of the autoencoder obtained by imposing a sparsity constraint on the hidden layer of the autoencoder; three sparse autoencoders are first trained, the encoding parts of these three sparse autoencoders are then extracted and cascaded to form a deep sparse autoencoder, and a logistic regression layer is then trained to process the features output by the deep sparse autoencoder;

The deep neural network regression model is formed by cascading the deep sparse autoencoder with the above logistic regression layer;

Finally, the back-propagation algorithm is applied to the whole deep neural network regression model to fine-tune it;

Step 4: use the deep neural network regression model obtained in Step 3 to perform prediction on the 3D mesh model and obtain the corresponding saliency response map:

Feature attributes are formed for each point of a 3D mesh model in the test set using the same method as in Step 2, and the deep neural network regression model obtained in Step 3 predicts a regression value for the point; the regression values of all points of the 3D mesh model then constitute its saliency response map;

Points that are local maxima of the saliency response map are selected as 3D keypoints; for each point of the 3D mesh model, if its saliency response is larger than the saliency responses of all points in its 5-ring neighborhood, the point is a 3D keypoint; otherwise, it is not.