CN107092859A - Depth feature extraction method for a three-dimensional model - Google Patents
Depth feature extraction method for a three-dimensional model
- Publication number
- CN107092859A (application number CN201710148547.XA)
- Authority
- CN
- China
- Prior art keywords
- layer
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a depth feature extraction method for three-dimensional models. First, the polar view of the three-dimensional model is extracted as training input data for a deep convolutional neural network. Second, the deep convolutional neural network is constructed and trained on the polar views. Third, the polar views are input into the deep convolutional neural network for training until the network converges, so that the internal weights of the network are determined once training completes. Finally, the polar view of the three-dimensional model whose features are to be extracted is input into the trained network, and the feature vector of the fully connected layer is computed as the depth feature of that model. The invention builds a deep convolutional neural network whose weights are iteratively corrected to shrink the residual error until the network converges; after training, the fully connected layer of the convolutional neural network is extracted as the depth feature of the three-dimensional model's polar view.
Description
Technical Field
The present invention relates to the technical field of three-dimensional model processing, and more specifically to a depth feature extraction method for three-dimensional models.
Background Art
With the rapid development of three-dimensional model processing technology, computer hardware and software, multimedia, and Internet technology, large numbers of three-dimensional models are used across many fields, and the demand for their application keeps growing. Three-dimensional models play an important role in e-commerce, architectural design, industrial design, advertising, film and television, and 3D games, among other fields. Designs and models from large-scale datasets are reused and retrieved in many aspects of production and daily life, so quickly and accurately retrieving a target three-dimensional model from the various types of existing three-dimensional model datasets has become a key problem demanding a solution.
In recent years, three-dimensional model analysis based on deep learning has become a research hotspot. Combining computer vision, artificial intelligence, and intelligent computing, it can address the visual tasks of three-dimensional models, including feature extraction, classification, recognition, detection, and prediction. Deep learning can automatically learn the latent features of three-dimensional models and can be trained on large-scale datasets, strengthening the generalization ability of the learned model.
Current deep-learning-based feature extraction methods suffer from the following problems: the features extracted by deep learning frameworks cannot fully express three-dimensional model information; deep network hierarchies bring high computational complexity and overfitting; and training times are long and memory requirements large. As deep learning technology matures and the demand for expressive three-dimensional model features grows, using deep learning to extract features promises new breakthroughs for the classification, retrieval, detection, and recognition of three-dimensional models.
Summary of the Invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a depth feature extraction method for three-dimensional models. The method constructs a deep convolutional neural network and trains it on the polar views of three-dimensional models, iteratively correcting the weights to shrink the residual error until the network converges. After training, the fully connected layer of the convolutional neural network is extracted as the depth feature of the three-dimensional model's polar view, so that the depth feature can be used for visual tasks such as classification, retrieval, and recognition of three-dimensional models. The constructed deep convolutional neural network has rich layers, which speeds up network training and improves the accuracy of the deep network's fit.
To achieve the above object, the present invention is realized through the following technical solution: a depth feature extraction method for three-dimensional models, characterized in that:
First, the polar view of the three-dimensional model is extracted as training input data for a deep convolutional neural network;
Second, the deep convolutional neural network is constructed and trained on the polar views; the network comprises an input layer that takes the polar view as training input data, convolutional layers that learn the features of the polar view and produce two-dimensional feature maps, a pooling layer that aggregates the feature maps over different positions and reduces the feature dimension, fully connected layers that arrange and link the two-dimensional feature maps into a one-dimensional vector, and an output layer that outputs the category prediction;
Third, the polar views are input into the deep convolutional neural network for training until the network converges, so that the internal weights of the network are determined once training completes;
Finally, the polar view of the three-dimensional model whose features are to be extracted is input into the trained deep convolutional neural network, and the feature vector of the fully connected layer is computed as the depth feature of that model.
In the above scheme, the depth feature extraction method of the present invention trains a deep convolutional neural network on the polar views of three-dimensional models, iteratively correcting the weights to shrink the residual error so that the network converges. After training, the fully connected layer of the network is extracted as the depth feature of the three-dimensional model's polar view. The polar view expresses the global spatial geometric structure of the three-dimensional model, and using it simplifies and lightens the training computation of the deep convolutional neural network. The constructed network has rich layers, which speeds up training and improves fitting accuracy. Here, a polar view is a two-dimensional sampling map obtained by emitting a set of sampling rays outward from the centroid of the three-dimensional model and arranging the distances from the ray-model intersections to the centroid.
Specifically, the method comprises the following steps:
Step s101: extract the polar views of the three-dimensional models as training input data for the deep convolutional neural network, where the training input data are $x^{(i)} \in \chi$ and $\chi$ is the set of polar views of all N three-dimensional models; the category label of the i-th model is $y^{(i)} \in \{1, 2, \ldots, K\}$, where K is the number of three-dimensional model categories;
Step s102: construct the deep convolutional neural network, which comprises: an input layer I taking the polar view $x^{(i)}$ as training input data; four convolutional layers C(t), t = 1, 2, 3, 4; one pooling layer P; two fully connected layers FC(1) and FC(2); and an output layer O. Every convolutional layer and every fully connected layer uses the rectified linear activation function $f(a) = \max(0, a)$ in place of the sigmoid function $f(a) = 1/(1 + e^{-a})$ traditionally used in deep convolutional neural networks;
Step s103: set the parameters of the deep convolutional neural network, i.e. initialize the weights of each layer (a code sketch of the resulting architecture follows the output layer specification below):
Input layer I: the input data is one polar view $x^{(i)}$ of size (32×32);
Convolutional layers C: the four convolutional layers learn the features of the polar view in sequence, and the numbers of two-dimensional feature maps they produce are $F_t = (6, 8, 10, 12)$; the feature maps of each convolutional layer are obtained from

$$h_q^{(t)} = f\Big(\sum_{p \in M} h_p^{(t-1)} * k_{pq}^{(t)} + bias\Big)$$

where $h_q^{(t)}$ denotes the q-th two-dimensional feature map of C(t); M denotes the set of feature maps of layer t−1 (when t−1 is 0, the feature map is the input polar view); $k_{pq}^{(t)}$ denotes the convolution kernel from the p-th feature map of layer t−1 to the q-th two-dimensional feature map of convolutional layer t, initialized as a (5×5) matrix of random numbers in [−1, 1]; bias is the bias term, initialized to 0; (*) denotes the convolution operation; and f(·) is the rectified linear activation function. The formula yields:
Convolutional layer C(1) computes 6 two-dimensional feature maps of size (28×28);
Convolutional layer C(2) computes 8 two-dimensional feature maps of size (24×24);
Convolutional layer C(3) computes 10 two-dimensional feature maps of size (20×20);
Convolutional layer C(4) computes 12 two-dimensional feature maps of size (16×16);
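Since every convolutional layer applies (5×5) kernels without zero padding, each convolution shrinks each spatial dimension by 4, $n_{\text{out}} = n_{\text{in}} - 5 + 1$, which accounts for the progression 32 → 28 → 24 → 20 → 16 listed above.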
Pooling layer P: the two-dimensional feature maps computed by convolutional layer C(4) are max-pooled according to

$$PM_p(u_0, v_0) = \max_{0 \le i \le 1,\ 0 \le j \le 1} h_p^{(4)}(2u_0 + i,\ 2v_0 + j),$$

yielding 12 feature matrices of size (8×8), where $PM_p(u_0, v_0)$ is the entry of the max-pooled feature matrix at coordinate $(u_0, v_0)$, $h_p^{(4)}$ is a two-dimensional feature map computed by convolutional layer C(4), and max() takes the maximum of the matrix elements in the pooling window;
Fully connected layer FC(1): each $PM_p$ is arranged into a (64×1) column vector, and fully linking the column vectors of all 12 matrices gives the one-dimensional vector $L_0$ (768×1), which serves as the input vector of fully connected layer FC(1);
Fully connected layer FC(2): fully connected layer FC(1) is set to 512 neurons and FC(2) to 128 neurons; the output vector $L_1$ of FC(1), which is the input vector of FC(2), and the output vector $L_2$ of FC(2), which is the input vector of output layer O, are computed with the fully connected propagation formula

$$L_\ell = f(W_\ell L_{\ell-1} + b_\ell), \qquad \ell = 1, 2,$$

where $W_1$ and $W_2$ are the network weights, initialized as a (512×768) matrix and a (128×512) matrix of random numbers in [−1, 1], respectively; $b_\ell$ is the bias of layer $\ell$, with initial value 0; and f(·) is the rectified linear activation function;
Output layer O: the output layer is set to K neurons, computed as

$$y' = f(W_3 L_2 + b_3);$$

where $y'^{(i)}$ is the final output of the deep convolutional neural network for the i-th model; $W_3$ is initialized as a (K×128) matrix of random numbers in [−1, 1]; $b_3$ is the output bias, initialized to 0; and $L_2$ is the output vector of fully connected layer FC(2);
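The layer specification of steps s102 and s103 can be summarized in code. The following PyTorch sketch is illustrative only — PyTorch itself, the class name, and the `return_feature` flag are assumptions, not part of the patent; it reproduces the layer sizes and the uniform [−1, 1] initialization above, but leaves the output layer linear, folding the final activation into the training loss:

```python
import torch.nn as nn

class PolarViewCNN(nn.Module):
    """Sketch of the network of steps s102-s103: 4 conv layers, 1 max pool,
    2 fully connected layers, and a K-way output layer, ReLU throughout."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), nn.ReLU(),    # C(1): 1x32x32 -> 6x28x28
            nn.Conv2d(6, 8, 5), nn.ReLU(),    # C(2): -> 8x24x24
            nn.Conv2d(8, 10, 5), nn.ReLU(),   # C(3): -> 10x20x20
            nn.Conv2d(10, 12, 5), nn.ReLU(),  # C(4): -> 12x16x16
            nn.MaxPool2d(2),                  # P: -> 12x8x8
        )
        self.fc1 = nn.Sequential(nn.Linear(12 * 8 * 8, 512), nn.ReLU())  # FC(1)
        self.fc2 = nn.Sequential(nn.Linear(512, 128), nn.ReLU())         # FC(2)
        self.out = nn.Linear(128, num_classes)                           # output layer O
        self.apply(self._init)

    @staticmethod
    def _init(m):
        # Patent initialization: weights uniform in [-1, 1], biases 0
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.uniform_(m.weight, -1.0, 1.0)
            nn.init.zeros_(m.bias)

    def forward(self, x, return_feature=False):
        l0 = self.features(x).flatten(1)   # L0: 768-dimensional vector
        l2 = self.fc2(self.fc1(l0))        # L2: 128-dimensional depth feature
        return l2 if return_feature else self.out(l2)
```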
Step s104: train on the polar-view dataset χ; set the learning rate η to 1 and use stochastic gradient descent, backpropagating the error between the predicted output $y'^{(i)}$ of the deep convolutional neural network and the true category label $y^{(i)} \in \{1, 2, \ldots, K\}$; the algorithm converges within 20 iterations, so that the internal weights of the deep convolutional neural network are determined once training completes;
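A minimal training loop matching step s104 might look as follows; the cross-entropy loss and the `loader` object are assumptions (the patent only specifies backpropagating the output error), while the learning rate of 1 and the 20 iterations come from the step itself:

```python
import torch

model = PolarViewCNN(num_classes=K)                      # K: number of categories
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)  # learning rate eta = 1
criterion = torch.nn.CrossEntropyLoss()                  # assumed loss function

for epoch in range(20):            # the patent reports convergence in 20 iterations
    for x, y in loader:            # hypothetical DataLoader of (polar view, label)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```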
Step s105: input the polar view $x^{(i)}$ of the three-dimensional model whose features are to be extracted into the trained deep convolutional neural network, and compute the feature vector $L_2$ output by the second fully connected layer FC(2); this vector is the depth feature of the polar view of that three-dimensional model.
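With the sketch above, step s105 then reduces to a single forward pass that returns the FC(2) activation, e.g. `feature = model(polar_view, return_feature=True)`, yielding the 128-dimensional depth feature $L_2$.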
Extracting the polar view of the three-dimensional model proceeds as follows:
First, the three-dimensional point cloud model is preprocessed: its centroid and scale are computed, and the model is translated into a Cartesian coordinate system and scaled, normalizing it in that coordinate system;
Second, the scaled three-dimensional point cloud model is converted from Cartesian to spherical coordinates, yielding the direction and distance attributes of each point of the model;
Third, the spherical coordinates of the point set are mapped onto the pixel positions of the polar view, and the maximum distance in each pixel's sampling distance set is computed as the ray sampling value for that direction interval;
Finally, the maximum distances of all pixel sampling distance sets are arranged into a two-dimensional sampling map, which is the extracted polar view.
Extracting the polar view of the three-dimensional model specifically comprises the following steps (a code sketch follows step s206 below):
Step S201: input a three-dimensional point cloud model given by the point set $P = \{p_i(x_i, y_i, z_i) \mid i = 1, 2, \ldots, N\}$;
Step S202: compute the centroid $g(g_x, g_y, g_z)$ of the three-dimensional point cloud model as

$$g = \frac{1}{N} \sum_{i=1}^{N} p_i,$$

and translate the model by the centroid into the Cartesian coordinate system, so that the model becomes $p_i' = p_i - g$, $i = 1, 2, \ldots, N$; after the translation, the centroid of the point cloud model $p_i'$ lies at the origin of the Cartesian coordinate system;
Step s203: compute the scaling factor s of the three-dimensional point cloud model and scale the model to unit scale, $p_i'' = s\, p_i'$, where the scaling factor is $s = 1 / \max_i \lVert p_i' \rVert_2$;
Step s204: convert the scaled three-dimensional point cloud model from the Cartesian coordinate system to spherical coordinates Q, where the model becomes $Q = \{q_i(\theta_i, \varphi_i, r_i) \mid i = 1, 2, \ldots, N\}$; the conversion formulas are

$$r_i = \lVert p_i'' \rVert_2, \qquad \theta_i = \arccos\!\big(-z_i'' / r_i\big), \qquad \varphi_i = \operatorname{atan2}(y_i'', x_i'') \in [0, 2\pi),$$

where $\theta \in [0, \pi]$ and the elevation angle is 0 on the negative half of the Z axis;
Step s205: map the spherical coordinates Q onto the pixel positions (u, v) of the polar view; the mapping relation is

$$u = \left\lceil \frac{\varphi}{2\pi}\, n_u \right\rceil, \qquad v = \left\lceil \frac{\theta}{\pi}\, n_v \right\rceil,$$

where $n_u$ and $n_v$ are the width and height of the polar view; every point of the three-dimensional point cloud model is thus assigned to a pixel position (u, v) of the polar view, and a given pixel position may contain one spherical coordinate point, several spherical coordinate points, or none;
Step s206: the sampling distance set of each pixel (u, v) is $D(u, v) = \{r_i \mid (u_i, v_i) = (u, v)\}$; the maximum of each pixel's sampling distance set is taken as the pixel sampling value of the polar view,

$$I(u, v) = \begin{cases} \max D(u, v), & D(u, v) \neq \emptyset \\ 0, & D(u, v) = \emptyset, \end{cases}$$

and these values are arranged into a two-dimensional sampling map, the polar view I;
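A compact NumPy sketch of steps s201 to s206 is given below. It is illustrative only: the function name is hypothetical, and the floor-based pixel binning replaces the ceiling in the mapping above so as to produce zero-based array indices.

```python
import numpy as np

def extract_polar_view(points, nu=32, nv=32):
    """Steps s201-s206: map an (N, 3) point cloud to an (nv, nu) polar view."""
    p = points - points.mean(axis=0)             # s202: move the centroid to the origin
    p = p / np.linalg.norm(p, axis=1).max()      # s203: scale to unit size
    r = np.linalg.norm(p, axis=1)                # s204: spherical coordinates;
    # theta in [0, pi], zero on the negative Z half-axis
    theta = np.arccos(np.clip(-p[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0))
    phi = np.mod(np.arctan2(p[:, 1], p[:, 0]), 2 * np.pi)
    u = np.minimum((phi / (2 * np.pi) * nu).astype(int), nu - 1)  # s205: pixel columns
    v = np.minimum((theta / np.pi * nv).astype(int), nv - 1)      # s205: pixel rows
    view = np.zeros((nv, nu))
    np.maximum.at(view, (v, u), r)               # s206: max sampling distance per pixel
    return view
```

The (32×32) default matches the input layer of step s103; pixels on which no point falls keep the value 0, as in the formula for I(u, v).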
The depth feature extraction method above extracts features directly from the polar view of the three-dimensional model; the convolution operation over the two-dimensional polar view perceives the information at every position, and from the feature maps obtained through the stacked convolutional layers, highly discriminative depth features can be extracted in the fully connected layer.
When extracting the polar view, the preprocessing stage translates and scales the three-dimensional model, ensuring that the model is normalized and standardized at a standard scale. Converting the model's point cloud into a spherical coordinate system makes it convenient to map spherical coordinate points onto the corresponding pixel positions of the two-dimensional polar view; through this mapping, the maximum of the distance set of the points at each pixel position is computed, and the maximum sampling values form a two-dimensional sampling map: the novel polar view of the three-dimensional model.
Compared with the prior art, the present invention has the following advantages and beneficial effects: the depth feature extraction method constructs a deep convolutional neural network to train on the polar views of three-dimensional models, iteratively correcting the weights to shrink the residual error so that the network converges. After training, the fully connected layer of the convolutional neural network is extracted as the depth feature of the polar view, so that the depth feature can be used for visual tasks such as classification, retrieval, and recognition of three-dimensional models. The constructed deep convolutional neural network has rich layers, which speeds up network training and improves the accuracy of the deep network's fit.
Brief Description of the Drawings
Figure 1 is a flowchart of the depth feature extraction method for three-dimensional models of the present invention;
Figure 2 is a schematic diagram of the deep convolutional neural network in the method of the present invention;
Figure 3 is a flowchart of extracting the polar view of a three-dimensional model in the method of the present invention;
Figure 4 is a schematic diagram of a polar view extracted from a three-dimensional model in the method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment
As shown in Figures 1 to 4, the embodiment carries out the depth feature extraction method exactly as described above: the polar views of the three-dimensional models are extracted according to steps s201 to s206, and the deep convolutional neural network is constructed, trained, and used to extract the fully connected layer feature vector $L_2$ as the depth feature according to steps s101 to s105.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited by the above embodiment. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710148547.XA CN107092859A (en) | 2017-03-14 | 2017-03-14 | Depth feature extraction method for a three-dimensional model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710148547.XA CN107092859A (en) | 2017-03-14 | 2017-03-14 | Depth feature extraction method for a three-dimensional model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107092859A true CN107092859A (en) | 2017-08-25 |
Family
ID=59648576
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710148547.XA Pending CN107092859A (en) | 2017-03-14 | Depth feature extraction method for a three-dimensional model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107092859A (en) |
-
2017
- 2017-03-14 CN CN201710148547.XA patent/CN107092859A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105868706A (en) * | 2016-03-28 | 2016-08-17 | 天津大学 | Method for identifying 3D model based on sparse coding |
Non-Patent Citations (5)
Title |
---|
BAOGUANG SHI et al.: "DeepPano: Deep Panoramic Representation for 3-D Shape Recognition", IEEE Signal Processing Letters *
FENG YIPAN: "Research on View-Based 3D Model Retrieval", China Master's Theses Full-text Database, Information Science and Technology *
ZENG XIANGYANG: "Intelligent Underwater Target Recognition", 31 March 2016, National Defense Industry Press *
DU ZHUOMING et al.: "Weighted feature-point curvature spherical harmonic representation of 3D models", Computer Engineering and Applications *
CHEN WENBAI: "Principles and Practice of Artificial Neural Networks", 31 January 2016, Xidian University Press *
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108345831A (en) * | 2017-12-28 | 2018-07-31 | 新智数字科技有限公司 | The method, apparatus and electronic equipment of Road image segmentation based on point cloud data |
CN108171217A (en) * | 2018-01-29 | 2018-06-15 | 深圳市唯特视科技有限公司 | A kind of three-dimension object detection method based on converged network |
CN108427958A (en) * | 2018-02-02 | 2018-08-21 | 哈尔滨工程大学 | Adaptive weight convolutional neural networks underwater sonar image classification method based on deep learning |
CN110321910B (en) * | 2018-03-29 | 2021-05-28 | 中国科学院深圳先进技术研究院 | Point cloud-oriented feature extraction method, device and device |
CN110321910A (en) * | 2018-03-29 | 2019-10-11 | 中国科学院深圳先进技术研究院 | Feature extracting method, device and equipment towards cloud |
CN108986159A (en) * | 2018-04-25 | 2018-12-11 | 浙江森马服饰股份有限公司 | A kind of method and apparatus that three-dimensional (3 D) manikin is rebuild and measured |
CN108986159B (en) * | 2018-04-25 | 2021-10-22 | 浙江森马服饰股份有限公司 | Method and equipment for reconstructing and measuring three-dimensional human body model |
CN109064549A (en) * | 2018-07-16 | 2018-12-21 | 中南大学 | Index point detection model generation method and mark point detecting method |
CN109063753A (en) * | 2018-07-18 | 2018-12-21 | 北方民族大学 | A kind of three-dimensional point cloud model classification method based on convolutional neural networks |
CN109063753B (en) * | 2018-07-18 | 2021-09-14 | 北方民族大学 | Three-dimensional point cloud model classification method based on convolutional neural network |
CN109291657B (en) * | 2018-09-11 | 2020-10-30 | 东华大学 | Laser coding system for industrial IoT identification of aerospace structural parts based on convolutional neural network |
CN109291657A (en) * | 2018-09-11 | 2019-02-01 | 东华大学 | Laser coding system for industrial IoT identification of aerospace structural parts based on convolutional neural network |
CN109410321A (en) * | 2018-10-17 | 2019-03-01 | 大连理工大学 | Three-dimensional rebuilding method based on convolutional neural networks |
CN109410321B (en) * | 2018-10-17 | 2022-09-20 | 大连理工大学 | Three-dimensional reconstruction method based on convolutional neural network |
CN109685848A (en) * | 2018-12-14 | 2019-04-26 | 上海交通大学 | A kind of neural network coordinate transformation method of three-dimensional point cloud and three-dimension sensor |
CN109685848B (en) * | 2018-12-14 | 2023-06-09 | 上海交通大学 | A neural network coordinate transformation method for 3D point cloud and 3D sensor |
CN110097077A (en) * | 2019-03-26 | 2019-08-06 | 深圳市速腾聚创科技有限公司 | Point cloud data classification method, device, computer equipment and storage medium |
US11915501B2 (en) | 2019-04-11 | 2024-02-27 | Tencent Technology (Shenzhen) Company Limited | Object detection method and apparatus, electronic device, and storage medium |
CN110059608A (en) * | 2019-04-11 | 2019-07-26 | 腾讯科技(深圳)有限公司 | A kind of object detecting method, device, electronic equipment and storage medium |
US11967873B2 (en) | 2019-09-23 | 2024-04-23 | Canoo Technologies Inc. | Fractional slot electric motors with coil elements having rectangular cross-sections |
CN111709983A (en) * | 2020-06-16 | 2020-09-25 | 天津工业大学 | A 3D Reconstruction Method of Bubble Flow Field Based on Convolutional Neural Network and Light Field Image |
CN111782879A (en) * | 2020-07-06 | 2020-10-16 | Oppo(重庆)智能科技有限公司 | Model training method and device |
CN113313831B (en) * | 2021-05-24 | 2022-12-16 | 华南理工大学 | Three-dimensional model feature extraction method based on polar coordinate graph convolution neural network |
CN113313830B (en) * | 2021-05-24 | 2022-12-16 | 华南理工大学 | Feature extraction method of coded point cloud based on multi-branch graph convolutional neural network |
CN113313830A (en) * | 2021-05-24 | 2021-08-27 | 华南理工大学 | Encoding point cloud feature extraction method based on multi-branch graph convolutional neural network |
CN113313831A (en) * | 2021-05-24 | 2021-08-27 | 华南理工大学 | Three-dimensional model feature extraction method based on polar coordinate graph convolutional neural network |
CN113643336A (en) * | 2021-07-26 | 2021-11-12 | 之江实验室 | 3D image rigid matching method based on spherical polar coordinate deep neural network |
CN113643336B (en) * | 2021-07-26 | 2024-03-15 | 之江实验室 | Three-dimensional image rigid matching method based on spherical polar coordinate system depth neural network |
CN113777571A (en) * | 2021-08-04 | 2021-12-10 | 中山大学 | A deep learning-based dynamic pattern synthesis method for UAV swarms |
CN113777571B (en) * | 2021-08-04 | 2023-08-11 | 中山大学 | A deep learning-based synthesis method for UAV swarm dynamic pattern |
CN113744237A (en) * | 2021-08-31 | 2021-12-03 | 华中科技大学 | Deep learning-based automatic detection method and system for muck fluidity |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107092859A (en) | A kind of depth characteristic extracting method of threedimensional model | |
Huang et al. | Dual-graph attention convolution network for 3-D point cloud classification | |
Han et al. | SeqViews2SeqLabels: Learning 3D global features via aggregating sequential views by RNN with attention | |
Lu et al. | 3DCTN: 3D convolution-transformer network for point cloud classification | |
CN116152267B (en) | Point cloud instance segmentation method based on contrast language image pre-training technology | |
CN108564097B (en) | Multi-scale target detection method based on deep convolutional neural network | |
CN109949368B (en) | A 3D Pose Estimation Method of Human Body Based on Image Retrieval | |
CN112347987B (en) | A 3D object detection method based on multi-modal data fusion | |
CN112819080B (en) | High-precision universal three-dimensional point cloud identification method | |
CN107330405A (en) | Remote sensing images Aircraft Target Recognition based on convolutional neural networks | |
CN108334830A (en) | A kind of scene recognition method based on target semanteme and appearance of depth Fusion Features | |
CN107832335B (en) | An Image Retrieval Method Based on Context Depth Semantic Information | |
CN107967463A (en) | A kind of conjecture face recognition methods based on composograph and deep learning | |
CN110197255A (en) | A kind of deformable convolutional network based on deep learning | |
Song et al. | LSLPCT: An enhanced local semantic learning transformer for 3-D point cloud analysis | |
CN102651072A (en) | Classification method for three-dimensional human motion data | |
Zhang et al. | A lightweight vehicle-pedestrian detection algorithm based on attention mechanism in traffic scenarios | |
CN115272599A (en) | A 3D Semantic Map Construction Method Oriented to Urban Information Model | |
Li et al. | Laplacian mesh transformer: Dual attention and topology aware network for 3D mesh classification and segmentation | |
Cheng et al. | A survey on image semantic segmentation using deep learning techniques | |
CN117671666A (en) | A target recognition method based on adaptive graph convolutional neural network | |
CN117115855A (en) | Human body posture estimation method and system based on multi-scale transducer learning rich visual features | |
Yu et al. | DLD-SLAM: RGB-D Visual Simultaneous Localisation and Mapping in Indoor Dynamic Environments Based on Deep Learning | |
Wei et al. | Oriented Object Detection in Aerial Images Based on the Scaled Smooth L1 Loss Function | |
An et al. | PointTr: Low-Overlap Point Cloud Registration with Transformer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170825 |