CN107944459A - An RGB-D object recognition method - Google Patents

An RGB-D object recognition method Download PDF

Info

Publication number
CN107944459A
CN107944459A (application CN201711315171.3A)
Authority
CN
China
Prior art keywords
image
feature
rgb
surface normal
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711315171.3A
Other languages
Chinese (zh)
Inventor
雷建军
倪敏
丛润民
侯春萍
陈越
牛力杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201711315171.3A priority Critical patent/CN107944459A/en
Publication of CN107944459A publication Critical patent/CN107944459A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an RGB-D object recognition method comprising the following steps: obtain a grayscale image generated from the color image and surface normals generated from the depth image, and take the color image, grayscale image, depth image, and surface normals together as multi-modal data; extract high-level features from the color image, grayscale image, and surface normals with a convolutional-recursive neural network; extract high-level features of the depth image with a convolutional-Fisher vector-recursive neural network; fuse the above high-level features to obtain the total feature of the object, and feed the total feature of the object into a feature classifier to perform object recognition. The invention fuses multiple data modalities to extract more accurate RGB-D object features and thereby improves the accuracy of object recognition.

Description

An RGB-D object recognition method
Technical field
The present invention relates to the technical fields of deep learning and stereoscopic vision, and in particular to an RGB-D object recognition method.
Background technology
Object recognition is one of the key technical problems in computer vision, with important research value and broad application prospects. With the further development and application of sensing technology, devices such as the Kinect camera, which can capture color and depth images simultaneously, are becoming the mainstream imaging equipment of a new generation. In general, a color image provides information such as the texture and color of a target, while a depth image provides effective depth and shape information; the two kinds of information complement each other and further enhance the performance of various visual tasks. How to fully exploit the depth information in RGB-D data, explore the relationship between depth and color data, and further improve object recognition rates is a focus and difficulty of current research. Therefore, research on object recognition techniques for RGB-D images has very important theoretical and application value.
From the perspective of feature generation, object recognition methods fall broadly into two classes: methods based on hand-crafted features and methods based on learned features. The main distinction between the two is how features are obtained: the former obtains features manually, while the latter extracts target features through learning. The obtained features are fed into a classifier (such as a support vector machine or random forest) for classification, thereby accomplishing the recognition task.
Among methods based on hand-crafted features, commonly used features include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and textons. These features can effectively describe the color and texture information of color images and the three-dimensional geometric information of depth images. However, the features extracted by such methods often have certain limitations: they capture only part of the available cues and do not extend easily to different datasets or modalities.
In contrast, learning-based methods can obtain features directly from raw data through learning, which is more flexible and reliable. Representative methods include hierarchical matching pursuit, convolutional k-means descriptors, hierarchical sparse coding, and local coordinate coding.
The above methods, however, usually process only the color and depth images, ignoring the useful contribution of other data modalities (such as grayscale images and surface normals) to recognition.
Summary of the invention
To address the low recognition accuracy and incomplete feature description of current RGB-D object recognition techniques, the present invention proposes an RGB-D object recognition method that fuses multiple data modalities to extract more accurate RGB-D object features and thereby improve object recognition accuracy, as described below:
An RGB-D object recognition method, the method comprising the following steps:
obtaining a grayscale image generated from the color image and surface normals generated from the depth image, and taking the color image, grayscale image, depth image, and surface normals together as multi-modal data;
extracting high-level features from the color image, grayscale image, and surface normals with a convolutional-recursive neural network;
extracting high-level features of the depth image with a convolutional-Fisher vector-recursive neural network;
fusing the above high-level features to obtain the total feature of the object, and feeding the total feature of the object into a feature classifier to perform object recognition.
The step of fusing the above high-level features is specifically:
F = [F_d; F_c; F_n; F_g]
where F is the total feature of the object, F_d is the depth-image feature, F_c is the color-image feature, F_n is the surface-normal feature, and F_g is the grayscale-image feature.
The feature classifier is specifically a softmax classifier.
The beneficial effects of the technical solution provided by the invention are:
1. The invention introduces a Fisher vector module on the basis of the convolutional-recursive neural network to obtain a denser, more complete depth feature representation;
2. The invention fuses multiple data modalities, effectively solving the problem of poor RGB-D object recognition caused by incomplete feature learning.
Brief description of the drawings
Fig. 1 is a flowchart of the RGB-D object recognition method;
Fig. 2 shows qualitative results.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
Embodiment 1
The embodiment combines multiple data modalities such as color images, depth images, grayscale images, and surface normals, and proposes an RGB-D object recognition method whose concrete steps are as follows:
101: obtain a grayscale image generated from the color image and surface normals generated from the depth image, and take the color image, grayscale image, depth image, and surface normals together as multi-modal data;
102: extract high-level features from the color image, grayscale image, and surface normals with a convolutional-recursive neural network;
103: extract high-level features of the depth image with a convolutional-Fisher vector-recursive neural network;
104: fuse the above high-level features to obtain the total feature of the object, and feed the total feature of the object into a feature classifier to perform object recognition.
In summary, through steps 101-104 the embodiment introduces a Fisher vector module on the basis of the convolutional-recursive neural network to obtain a denser, more complete depth feature representation, and fuses multiple data modalities, effectively solving the problem of poor RGB-D object recognition caused by incomplete feature learning.
Embodiment 2
The scheme of Embodiment 1 is described in further detail below with specific formulas and examples:
201: obtain multi-modal data;
To exploit the color and depth image information more fully, the embodiment adds two data modalities, namely a grayscale image generated from the color image and surface normals generated from the depth image, providing more useful information for object recognition. Specifically, the depth image and surface normals provide the geometric information of the object, while the color and grayscale images provide its texture information.
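The two added modalities can be derived with standard operations: the grayscale image as a luminance-weighted sum of the RGB channels, and the surface normal at each pixel from the local depth gradients. The following pure-Python sketch is illustrative only — the BT.601 luminance weights and the central-difference gradient estimate are common conventions, not choices specified by the patent:

```python
import math

def to_grayscale(rgb):
    # Grayscale from RGB via ITU-R BT.601 luminance weights (a common choice)
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def surface_normals(depth):
    # Normal at (y, x) from depth gradients: n ∝ (-dz/dx, -dz/dy, 1), normalized
    h, w = len(depth), len(depth[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dzdx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            dzdy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            n = (-dzdx, -dzdy, 1.0)
            norm = math.sqrt(sum(c * c for c in n))
            out[y][x] = tuple(c / norm for c in n)
    return out

gray = to_grayscale([[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]])
normals = surface_normals([[1.0, 1.0], [1.0, 1.0]])
print(gray[0][1])        # 0.0 — a black pixel maps to zero luminance
print(normals[0][0][2])  # 1.0 — a flat depth map yields upward normals
```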
202: extract high-level features of the color image, grayscale image, and surface normals with the convolutional-recursive neural network;
The convolutional-recursive neural network (CNN-RNN) model consists of two parts: a convolutional neural network and a recursive neural network. The raw image data is divided into blocks and fed into the convolutional neural network for feature extraction, yielding low-dimensional translation-invariant features.
To further improve recognition accuracy, the low-dimensional translation-invariant features output by the convolutional neural network are fed into the recursive neural network for further feature extraction. The recursive neural network effectively captures the hierarchical features of the input data and the structural information of the object, yielding a more accurate feature representation.
1) Convolutional neural network
A convolutional neural network is an effective feature extraction structure that mainly extracts features with translation-invariant characteristics by convolving the input image with filters.
The convolutional neural network used in the embodiment mainly comprises convolution and pooling operations. Convolution produces multiple feature maps that represent information about the object at different levels; pooling sub-samples the feature maps based on the principle of local correlation, retaining useful information while reducing the amount of data.
Let the input image be of size d_I × d_I and let K filters of size d_P × d_P be used for convolution; the convolution then yields K filter responses of size (d_I − d_P + 1) per side. Average pooling with a window of size d_l × d_l and stride s is then applied to the filter responses, producing pooled responses of size r × r, where r = (d_I − d_P − d_l + 1)/s + 1. Applying the convolutional neural network to an image therefore yields a three-dimensional matrix of size K × r × r.
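The size arithmetic above can be checked with a few lines of code; the concrete numbers below are illustrative, not taken from the patent:

```python
def cnn_output_sizes(d_i, d_p, d_l, s):
    # Valid convolution: each filter response is (d_i - d_p + 1) per side
    conv = d_i - d_p + 1
    # Average pooling with window d_l and stride s over the response
    r = (conv - d_l) // s + 1
    # Equivalent to the closed form in the text: r = (d_i - d_p - d_l + 1)/s + 1
    assert r == (d_i - d_p - d_l + 1) // s + 1
    return conv, r

# e.g. a 148x148 input, 9x9 filters, 10x10 pooling window, stride 5
conv, r = cnn_output_sizes(148, 9, 10, 5)
print(conv, r)  # 140 27 — with K filters this gives a K x 27 x 27 matrix
```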
2) Recursive neural network
The idea of the recursive neural network is to learn hierarchical features in a tree structure by recursively applying the same neural network, thereby obtaining more detailed structural information. In earlier research on recursive neural networks, the construction of the tree was more flexible, but this flexibility came at the cost of computation speed, making parallel search and large-scale parallel matrix operations difficult. The embodiment therefore adopts a recursive neural network with a fixed tree structure and extends the original network structure so that each layer merges adjacent vector blocks rather than only pairs of vectors. The recursive neural network structure used in the embodiment thereby obtains more neighborhood information and a more accurate object feature representation.
The recursive neural network takes the output of the convolutional neural network as input: every group of adjacent column vectors in the three-dimensional matrix output by the convolutional neural network is defined as a block, and multiple blocks are merged into a parent vector p. For convenience, the embodiment uses square blocks, whose size is denoted K × b × b; the parent vector can then be expressed as
p = f(W [m_1; m_2; …; m_{b²}])
where W is the coefficient matrix, f denotes a nonlinear operation (a common choice is tanh), and m_i (i = 1, 2, …, b²) are the child vectors.
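A minimal sketch of this merge step, assuming the tanh nonlinearity named above; the child vectors and the coefficient matrix W below are toy values, not learned parameters:

```python
import math

def parent_vector(children, W):
    # p = f(W [m_1; m_2; ...; m_{b^2}]) with f = tanh
    stacked = [v for m in children for v in m]  # concatenate the b^2 children
    return [math.tanh(sum(w * x for w, x in zip(row, stacked))) for row in W]

K, b = 2, 2  # K-dimensional child vectors, b x b block (b^2 = 4 children)
children = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
W = [[0.10] * (b * b * K),  # K x (b^2 * K) toy coefficient matrix
     [0.05] * (b * b * K)]
p = parent_vector(children, W)
print(len(p))  # 2 — the parent keeps the same dimension K as each child,
               # so the merge can be applied recursively up the fixed tree
```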
203: extract high-level features of the depth image with the convolutional-Fisher vector-recursive neural network;
The Fisher vector is an advanced feature coding method: a parametric model (such as a Gaussian mixture model) estimated from data samples is used to encode a set of local feature vectors into a higher-dimensional feature representation.
Let X = {x_k}, k = 1, …, K be the feature descriptors obtained from the convolutional neural network. A Gaussian mixture model with diagonal covariance matrices is trained:
p(x) = Σ_{i=1}^{N} ω_i N(x; μ_i, σ_i)
where K is the dimension of the feature mapping, {ω_i, μ_i, σ_i}, i = 1, 2, …, N denote the mixture weights, means, and diagonal covariances of the Gaussian mixture model, and N is the number of Gaussian components. γ_k(i) denotes the posterior (soft assignment) of descriptor x_k to the i-th Gaussian. The depth-image feature extracted via the Fisher vector is therefore composed of the gradient statistics with respect to the means and the covariances, i.e., F_d = [G_μ; G_σ].
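For a one-dimensional descriptor set, the Fisher vector can be sketched directly from the standard gradient formulas with respect to the means and standard deviations; the GMM parameters below are toy values, and the 1/(T·sqrt(ω_i)) normalization follows the common derivation rather than anything stated in the patent:

```python
import math

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fisher_vector(X, weights, mus, sigmas):
    # Gradient statistics of a diagonal GMM, accumulated with posteriors gamma_k(i)
    T, N = len(X), len(weights)
    G_mu, G_sigma = [0.0] * N, [0.0] * N
    for x in X:
        likes = [w * gaussian(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        total = sum(likes)
        for i in range(N):
            gamma = likes[i] / total             # posterior gamma_k(i)
            u = (x - mus[i]) / sigmas[i]
            G_mu[i] += gamma * u                 # gradient w.r.t. the mean
            G_sigma[i] += gamma * (u * u - 1.0)  # gradient w.r.t. the std. dev.
    G_mu = [g / (T * math.sqrt(w)) for g, w in zip(G_mu, weights)]
    G_sigma = [g / (T * math.sqrt(2.0 * w)) for g, w in zip(G_sigma, weights)]
    return G_mu + G_sigma                        # F = [G_mu; G_sigma]

X = [0.1, -0.2, 1.9, 2.1]
fv = fisher_vector(X, weights=[0.5, 0.5], mus=[0.0, 2.0], sigmas=[1.0, 1.0])
print(len(fv))  # 4 — two Gaussians, two statistics each
```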
204: multi-feature fusion;
To exploit the color and depth image information more fully, the embodiment adds two data modalities, the grayscale image and the surface normals. As supplements to the color image and the depth image respectively, the grayscale image and surface normals provide more useful information for object recognition. After feature extraction with the convolutional-recursive neural network on the three modalities of color, grayscale, and surface normals, the results are fused with the depth feature extracted by the convolutional-Fisher vector-recursive neural network to obtain the final object feature, denoted:
F = [F_d; F_c; F_n; F_g]
where F is the total feature of the object, F_d is the depth-image feature, F_c is the color-image feature, F_n is the surface-normal feature, and F_g is the grayscale-image feature.
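The fusion is a plain concatenation of the four per-modality features; a trivial sketch with illustrative toy vectors (the dimensions are not from the patent):

```python
def fuse(F_d, F_c, F_n, F_g):
    # F = [F_d; F_c; F_n; F_g] — concatenation in the order given in the text
    return F_d + F_c + F_n + F_g

F_d = [0.1, 0.2, 0.3]  # depth feature (CNN-Fisher vector-RNN)
F_c = [0.4, 0.5]       # color feature (CNN-RNN)
F_n = [0.6]            # surface-normal feature (CNN-RNN)
F_g = [0.7, 0.8]       # grayscale feature (CNN-RNN)
F = fuse(F_d, F_c, F_n, F_g)
print(F)  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
```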
205: feed the object feature F into the feature classifier to perform object recognition.
Considering the computational simplicity of the softmax classifier, the embodiment uses a softmax classifier to perform object recognition. Softmax classifiers are well known to those skilled in the art and are not described further here.
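A softmax classifier over a linear scoring layer can be sketched as follows; the weights are toy values, not trained parameters:

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the maximum before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weight_rows):
    # One linear score per class, then softmax to obtain class probabilities
    scores = [sum(w * x for w, x in zip(row, feature)) for row in weight_rows]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs

label, probs = classify([1.0, 2.0], [[0.5, 0.1], [0.2, 0.9]])
print(label)  # 1 — the class with the higher score (2.0 vs 0.7)
```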
In summary, through steps 201-205 the embodiment introduces a Fisher vector module on the basis of the convolutional-recursive neural network to obtain a denser, more complete depth feature representation, and fuses multiple data modalities, effectively solving the problem of poor RGB-D object recognition caused by incomplete feature learning.
Embodiment 3
The feasibility of the schemes in Embodiments 1 and 2 is verified below with reference to Fig. 2:
Fig. 2 shows the visualization of the method's recognition results as a confusion matrix. The horizontal axis of the confusion matrix represents the predicted object category (51 classes in total, such as apple, bowl, and cereal_box), and the vertical axis represents the true object category in the dataset.
The values of the diagonal elements of the confusion matrix represent the recognition accuracy of the method for each category, while the element in row a, column b gives the percentage of class-a objects misidentified as class b. As can be seen from Fig. 2, the method achieves good recognition results, with high recognition rates for most categories.
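Per-class accuracy is read off the diagonal of such a confusion matrix; a small sketch with a toy 3-class matrix (the 51-class matrix of Fig. 2 is not reproduced here):

```python
def per_class_accuracy(confusion):
    # Rows: true class, columns: predicted class, entries in percent.
    # The diagonal entry of each row is that class's recognition accuracy.
    return [row[i] for i, row in enumerate(confusion)]

cm = [[90.0, 5.0, 5.0],    # e.g. 5% of class 0 misidentified as class 1
      [10.0, 85.0, 5.0],
      [0.0, 12.0, 88.0]]
acc = per_class_accuracy(cm)
print(acc)                  # [90.0, 85.0, 88.0]
print(sum(acc) / len(acc))  # mean per-class accuracy
```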
The embodiment compares the method against baseline methods on the RGB-D database to obtain object recognition accuracy. The RGB-D database consists of video sequences collected with a Kinect camera in indoor environments such as kitchens and offices, covering 300 different objects in 51 classes. Each object instance was captured from three viewing angles (30°, 45°, and 60° above the horizontal). The experimental results show that the recognition accuracy of the method reaches 87.60%, while the Random Forest method achieves 79.60% and the Linear SVM method achieves 81.90%. The method thus obtains a 10.05% performance gain over Random Forest and a 5.70% gain over Linear SVM, demonstrating excellent algorithm performance.
The Random Forest and Linear SVM methods are algorithms well known to those skilled in the art and are not described further here.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments are for description only and do not indicate the relative merits of the embodiments.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (3)

1. An RGB-D object recognition method, characterized in that the method comprises the following steps:
obtaining a grayscale image generated from the color image and surface normals generated from the depth image, and taking the color image, grayscale image, depth image, and surface normals together as multi-modal data;
extracting high-level features from the color image, grayscale image, and surface normals with a convolutional-recursive neural network;
extracting high-level features of the depth image with a convolutional-Fisher vector-recursive neural network;
fusing the above high-level features to obtain the total feature of the object, and feeding the total feature of the object into a feature classifier to perform object recognition.
2. The RGB-D object recognition method according to claim 1, characterized in that the step of fusing the above high-level features is specifically:
F = [F_d; F_c; F_n; F_g]
where F is the total feature of the object, F_d is the depth-image feature, F_c is the color-image feature, F_n is the surface-normal feature, and F_g is the grayscale-image feature.
3. The RGB-D object recognition method according to claim 1, characterized in that the feature classifier is specifically a softmax classifier.
CN201711315171.3A 2017-12-09 2017-12-09 An RGB-D object recognition method Pending CN107944459A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711315171.3A CN107944459A (en) 2017-12-09 2017-12-09 An RGB-D object recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711315171.3A CN107944459A (en) 2017-12-09 2017-12-09 An RGB-D object recognition method

Publications (1)

Publication Number Publication Date
CN107944459A true CN107944459A (en) 2018-04-20

Family

ID=61943773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711315171.3A Pending CN107944459A (en) 2017-12-09 2017-12-09 An RGB-D object recognition method

Country Status (1)

Country Link
CN (1) CN107944459A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389621A * 2018-09-11 2019-02-26 淮阴工学院 RGB-D target tracking method based on multi-modal depth feature fusion
CN109685842A * 2018-12-14 2019-04-26 电子科技大学 Sparse depth densification method based on multi-scale networks
CN109886102A * 2019-01-14 2019-06-14 华中科技大学 Fall behavior spatio-temporal detection method based on depth images
CN110119710A * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 Cell classification method, device, computer equipment, and storage medium
CN110807798A * 2018-08-03 2020-02-18 华为技术有限公司 Image recognition method, system, related device, and computer-readable storage medium
CN111222468A * 2020-01-08 2020-06-02 浙江光珀智能科技有限公司 People flow detection method and system based on deep learning
CN111476816A * 2019-09-29 2020-07-31 深圳市捷高电子科技有限公司 Intelligent and efficient simultaneous recognition method for multiple objects
CN113065521A * 2021-04-26 2021-07-02 北京航空航天大学杭州创新研究院 Object recognition method, device, apparatus, and medium
CN113240653A * 2021-05-19 2021-08-10 中国联合网络通信集团有限公司 Rice quality detection method, device, server, and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778810A * 2016-11-23 2017-05-31 北京联合大学 Original-image-level fusion method and system based on RGB features and depth features
CN106826815A * 2016-12-21 2017-06-13 江苏物联网研究发展中心 Target object recognition and localization method based on color images and depth images

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807798A * 2018-08-03 2020-02-18 华为技术有限公司 Image recognition method, system, related device, and computer-readable storage medium
CN110807798B * 2018-08-03 2022-04-12 华为技术有限公司 Image recognition method, system, related device, and computer-readable storage medium
CN109389621A * 2018-09-11 2019-02-26 淮阴工学院 RGB-D target tracking method based on multi-modal depth feature fusion
CN109389621B * 2018-09-11 2021-04-06 淮阴工学院 RGB-D target tracking method based on multi-modal depth feature fusion
CN109685842A * 2018-12-14 2019-04-26 电子科技大学 Sparse depth densification method based on multi-scale networks
CN109886102A * 2019-01-14 2019-06-14 华中科技大学 Fall behavior spatio-temporal detection method based on depth images
CN110119710A * 2019-05-13 2019-08-13 广州锟元方青医疗科技有限公司 Cell classification method, device, computer equipment, and storage medium
CN111476816A * 2019-09-29 2020-07-31 深圳市捷高电子科技有限公司 Intelligent and efficient simultaneous recognition method for multiple objects
CN111222468A * 2020-01-08 2020-06-02 浙江光珀智能科技有限公司 People flow detection method and system based on deep learning
CN113065521A * 2021-04-26 2021-07-02 北京航空航天大学杭州创新研究院 Object recognition method, device, apparatus, and medium
CN113065521B * 2021-04-26 2024-01-26 北京航空航天大学杭州创新研究院 Object recognition method, device, apparatus, and medium
CN113240653A * 2021-05-19 2021-08-10 中国联合网络通信集团有限公司 Rice quality detection method, device, server, and system

Similar Documents

Publication Publication Date Title
CN107944459A (en) An RGB-D object recognition method
CN109543606B (en) Face recognition method with attention mechanism
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
Li et al. Building-a-nets: Robust building extraction from high-resolution remote sensing images with adversarial networks
CN107844795B (en) Convolutional neural network feature extraction method based on principal component analysis
CN104573731B (en) Fast target detection method based on convolutional neural networks
CN111243093B (en) Three-dimensional face mesh generation method, device, equipment, and storage medium
CN107808129A (en) Facial multi-feature-point localization method based on a single convolutional neural network
CN110738207A (en) Character detection method fusing character region edge information in character images
CN103810504B (en) Image processing method and device
CN106981080A (en) Scene depth estimation method for night unmanned vehicles based on infrared images and radar data
CN110246181B (en) Anchor-based pose estimation model training method, pose estimation method, and system
CN108334830A (en) Scene recognition method based on deep feature fusion of target semantics and appearance
Li et al. LPSNet: a novel log path signature feature based hand gesture recognition framework
CN106650630A (en) Target tracking method and electronic equipment
CN105205453B (en) Human eye detection and localization method based on a deep autoencoder
CN107103613A (en) Three-dimensional gesture pose estimation method
CN104318570A (en) Adaptive camouflage design method based on background
CN104298974A (en) Human behavior recognition method based on depth video sequences
CN109712127A (en) Power transmission line fault detection method for machine inspection video streams
CN115330940B (en) Three-dimensional reconstruction method, device, equipment, and medium
CN101794459A (en) Seamless integration method of stereoscopic vision images and three-dimensional virtual objects
CN113537180B (en) Tree obstacle recognition method and device, computer equipment, and storage medium
CN112329771B (en) Building material sample recognition method based on deep learning
CN107066979A (en) Human motion recognition method based on depth information and multi-dimensional convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180420