CN108389251A

CN108389251A - The full convolutional network threedimensional model dividing method of projection based on fusion various visual angles feature

Info

Publication number: CN108389251A
Application number: CN201810235912.5A
Authority: CN
Inventors: 张岩; 水盼盼; 王鹏宇; 胡炳扬; 甘渊; 余锋根; 刘琨; 孙正兴
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2018-03-21
Filing date: 2018-03-21
Publication date: 2018-08-10
Anticipated expiration: 2038-03-21
Also published as: CN108389251B

Abstract

The invention discloses the full convolutional network threedimensional model dividing methods of projection based on fusion various visual angles feature, including：Step 1, to inputting three-dimensional grid model dataset acquisition data；Step 2, figure is rendered to model projection with the FCN full convolutional networks of fusion various visual angles feature and carries out semantic segmentation, obtain model and project the pixel of rendering figure under each viewpoint direction being predicted to be the probability of each label；Step 3, rendering figure semantic segmentation probability graph is projected under each viewpoint direction to model instead throw and using maximum visual angle pond, obtain the probability that model dough sheet is predicted to be each label；Step 4, it cuts algorithm using Graph Cut figures to optimize, obtains the final prediction label of model dough sheet.

Description

The full convolutional network threedimensional model dividing method of projection based on fusion various visual angles feature

Technical field

The invention belongs to Computer Image Processing and field of Computer Graphics, more particularly to based on fusion various visual angles feature The full convolutional network threedimensional model dividing method of projection.

Background technology

In recent years, the appearance with more and more 3D modeling softwares and depth transducer, such as Kinect, it is wide On the general platform applied to sampling depth data, there is explosive growth, 3D models on the internet in three-dimensional modeling data There is a large amount of form of expression, such as point cloud, voxel, dough sheet.This trend makes being parsed into for hot spot of 3D models Research field.Currently, the research of art of image analysis has been achieved for great successes, the introducing of deep learning frame is even more Further improve effect.However, the convolution operation on 2D images can not directly apply on 3D models so that by depth The method of habit becomes difficult heavy applied to the analysis of 3D models.Therefore, a large amount of 3D model analysis method is dependent on hand adjustment Description son extraction feature.3D model datas are organized into intermediate representation, the data knot such as Ru Shu, figure although having recently emerged Structure so that convolution operation becomes feasible, however this structure is difficult the adjacent pass completely kept between original dough sheet or point System.In addition, requirement of these methods to model watertightness, alignment etc. then further constrains the universality of method.

Although the semantic segmentation problem of 3D models is very basic, it is but very challenging, there is following reason：

1, same semantic label must be correctly labeled as by belonging to the various ambiguous model parts of the same part；

2, accurately the edge of detection model component is frequently necessary to subtleer geological information；

3, part and global characteristics, which must be combined analysis, could realize better segmentation result；

4, analysis method must have robustness to noise, down-sampled and diversity with class model.

In recent years, the semantic segmentation field of 3D models was flourished, and unsupervised conventional method occurs and has supervision The method two major classes of data-driven are other.

Unsupervised method, such as document 1Huang, Qixing, V.Koltun, and L.Guibas.Joint shape segmentation with linear programming.ACM,2011:1-12.；R.Hu,L.Fan,and L.Liu. Co- segmentation of 3d shapes via subspace clustering.Computer Graphics Forum, 31 (5):1703-1713,2012., document 2M.Meng, J.Xia, J.Luo, and Y.He.Unsupervised co- segmentation for 3d shapes using iterative multi-label optimization.Computer- Aided Design,45(2):312-320,2013., document 3O.Sidi, O.van Kaick, Y.Kleiman, H.Zhang, and D. Cohen-Or.Unsupervised co-segmentation of a set of shapes via descriptor-space spectral clustering.In SIGGRAPH Asia Conference,page 1, 2011., document 4K.Xu, H.Li, H.Zhang, D. Cohen-Or, Y.Xiong, and Z.-Q.Cheng.Style-content separation by anisotropic part scales. Acm Transactions on Graphics,29(6): 184,2010. etc. are divided by means of the priori of existing Models Sets to carry out joint segmentation or collaboration.Document 5V.Kreavoy,D.Julius,and A.Sheffer.Model composition from interchangeable components.In Conference on Computer Graphics&Applications, pages 129-138, 2007. have the method for Models Sets using matching.And document 6K.Xu, H.Li, H.Zhang, D. Cohen-Or, Y.Xiong, and Z.-Q.Cheng.Style-content separation by anisotropic part scales. Acm Transactions on Graphics,29(6):184,2010. then using the method for cluster.But these unsupervised sides Method is only effective to the more similar Models Sets of model, does not have good generalization ability.

There is the method for supervision then to concentrate extraction feature from the training data of tape label, then with trained model to test Data set carries out semantic segmentation.Document 7E.Kalogerakis, A.Hertzmann, and K.Singh.Learning 3d mesh segmentation and labeling.Acm Transactions on Graphics,29(4):102,2010. instruction Conditional Random Field (CRF) grader is practiced to carry out semantic segmentation.Document 8Z.Xie, K.Xu, L.Liu,and Y.Xiong.3d shape segmentation and labeling via extreme learning machine.Computer Graphics Forum,33(5):85-95,2014. passes through training Extreme Learning Machine (ELM) carries out collaboration segmentation to Unknown Model.Document 9K.Guo, D.Zou, and X.Chen.3d mesh labeling via deep convolutional neural networks.ACM Transactions on Graphics, 35(1):3, Dec.2015. by the geometric properties of each tri patch of extraction model, then by Convolutional Networks Neural (CNN) are applied to these features, cannot directly be applied to successfully solve convolutional neural networks In the threedimensional model the problem of, however it must be flow pattern that this method, which requires model structure, and some hand adjustments for extracting Description of the aspect of model then constrains the further promotion of effect.Document 10B.Graham and L.van der Maaten.Submanifold sparse convolutional networks.2017., document 11C.R.Qi, H.Su, K.Mo,and L.J.Guibas.Pointnet: Deep learning on point sets for 3d Classification and segmentation.2016., document 12P.S.Wang, Y.Liu, Y.X.Guo, C.Y.Sun, and X.Tong.O-cnn:Octree-based convolutional neural networks for 3d shape analysis.Acm Transactions on Graphics,36(4):72,2017., document 13L. Yi, H.Su, X.Guo, and L.Guibas.Syncspeccnn:Synchronized spectral cnn for 3d shape The methods of segmentation.2016. by converting threedimensional model to the intermediate structures such as voxel, tree, figure, spectral space, then answer With deep learning frame prediction model dough sheet or the label of point cloud, good effect is achieved, however still without in solution Between form can not completely between reserving model element syntople, and the representation of voxel and point cloud is not practical enough. Document 14Xu, Haotian, M.Dong, and Z.Zhong. " Directionally Convolutional Networks for 3D Shape Segmentation. " International Conference on Computer Vision 2017. are proposed Double-current neural network framework completes the segmentation task of 3D models jointly, and CNN is used using model dough sheet normal as input In extraction lower level feature, another CNN using the distance between model dough sheet histogram as input, for extraction compared with High-level feature, in addition, also creatively proposing searching based on the neighborhood of the different scale of syntople between dough sheet Method solves the problems, such as that convolution is carried out directly on 3D models also can but also the frame both can be suitably used for grid model Applied to point cloud model, the experiment in PSB small data sets, which also turns out, achieves good effect, however this method requires model It is watertight, therefore constrains the universality of this method.

Since deep learning is highly successful in art of image analysis, and the scale of image data set is also far above at present Some 3D model datas collection, such as PSB, ShapeNet, based on various visual angles projected image semantic segmentation and the anti-3D for throwing method Model semantics dividing method has become one of the approach of the above problem of solution.Document 15Y.Wang, M.Gong, T. Wang, D.Cohen-Or,H.Zhang,and B.Chen.Projective analysis for 3d shape segmentation. Acm Transactions on Graphics,32(6):192,2013. use for the first time by 3D model projections to 2D picture spaces Middle analysis is completed to mark the semantic segmentation of projected image, then instead throws back 3D moulds again by traditional collaboration dividing method In type, a kind of thinking solving completely new 3D model semantics segmentation problem is provided, accuracy rate by the tradition of small data set because being assisted Satisfied level is unable to reach with the restriction of dividing method.Document 16E.Kalogerakis, M.Averkiou, S. Maji, and S.Chaudhuri.3d shape segmentation with projective convolutional Networks.2016. well-designed visual angle selection, improves the coverage rate of the visible dough sheet in visual angle, and will be in image, semantic Effect good Fully Convolutional Networks (FCNs) in segmentation field are applied to the semanteme point that projection renders figure Mark is cut, is then optimized again with CRF and further promotes accuracy rate, realize the prediction end to end of 3D model patch-levels.But The visual angle selection of this method excessively takes, ending up then extends the training time of entire frame, practicability with CRF training optimizations It is relatively low.

Invention content

Goal of the invention：The technical problem to be solved by the present invention is to, provide a kind of new to have in view of the deficiencies of the prior art The 3D model semantics of effect divide mask method.

Technical solution：The invention discloses a kind of threedimensional models of the full convolutional network of projection based on fusion various visual angles feature Dividing method, this method are used to carry out semantic segmentation mark to all parts of 3D models, include the following steps：

Step 1, to the three-dimensional grid model dataset acquisition data of input；

Step 2, with the full convolutional network FCN (Fully Connected Network) of fusion various visual angles feature to three-dimensional Grid model projection renders figure and carries out semantic segmentation, obtains three-dimensional grid model and projects the rendering graphic language under each viewpoint direction Justice segmentation probability graph；

Step 3, project that rendering figure semantic segmentation probability graph is counter to be thrown under each viewpoint direction to three-dimensional grid model And using maximum visual angle pond, obtain the probability that three-dimensional grid model dough sheet is predicted to be each label；

Step 4, it cuts algorithm using Graph Cut figures to optimize, obtains the final pre- mark of three-dimensional grid model dough sheet Label.

Step 1 includes the following steps：

Step 1-1, it is assumed that (format .off has recorded all vertex of threedimensional model to input single 3 D grid model s The call number on three vertex of coordinate and each tri patch, threedimensional model s, which is derived from, contains 16 type 3D models ShapeNetCore standard 3D model semantics partitioned data set) and the tally set l of all dough sheet associated components (format is .seg, the label of each tri patch associated components type of model is had recorded), 14 viewpoints are selected from 42 fixed views, So that the dough sheet coverage rate of three-dimensional grid model s is maximum, find that 14 viewpoints had both made model dough sheet coverage rate exist in experiment 90% or more, and hardware GPU volume performances can be taken into account；

Step 1-2, acquisition step 1-1 is obtained under Lambert (Lambert diffusing reflection illumination model) illumination model 14 The projection of a viewpoint direction drag s renders atlas P={ p₁,p₂,…p_i,…,p₁₄, wherein p_iRefer in i-th of viewpoint direction Under collected to model s projection render figure；

Step 1-3 acquires the dough sheet label color-patch map G={ g of three-dimensional grid model s under 14 viewpoint directions₁,g₂,… g_i,…,g₁₄, wherein g_iRefer under i-th of viewpoint direction to the collected dough sheet true tag color-patch maps of model s, model Different piece corresponds to different labels, the identical component for indicating these dough sheets and belonging to model of the label of dough sheet, by the model Each label mapping in tally set l is a kind of specific color, and to carry out coloring rendering to model s, G is for supervising The training of neural network and compares with the label finally predicted and calculate accuracy rate and (needed during neural metwork training P and G is inputted, wherein G is used for supervised training process, need to only input P during the test without G is input to nerve net In network)；

The dough sheet number of step 1-4, acquisition three-dimensional grid model s are projected to pixel in image with it under 14 viewpoints Mapping relations between position establish a mapping relations concordance list t for three-dimensional grid model s.

Wherein, step 1-1 includes the following steps：

Step 1-1-1, to three-dimensional grid model s, dough sheet collection is combined into F, calculate separately 42 viewpoints it can be seen that face Piece set, selection is it can be seen that the most viewpoint v of dough sheet number in F is added in viewpoint set V, while viewpoint v can be seen To all dough sheets number be added to the dough sheet collection that in dough sheet set M that viewpoint in V is seen, can will be seen from v viewpoint directions Conjunction is rejected from F；

Step 1-1-2, calculate each viewpoint other than viewpoint set V it can be seen that dough sheet set, select energy Enough see that the most viewpoint μ of the dough sheet number in F is added in viewpoint set V, at the same by viewpoint μ it can be seen that all dough sheets It number is added in M, the dough sheet set that will be seen from v viewpoint directions is rejected from F；

Step 1-1-3 repeats step 1-1-2, until the viewpoint number in V terminates when being 14.

In step 1-4, mapped according to the dough sheet journal in three-dimensional grid model s files in the concordance list t Relationship, including each dough sheet number can be seen respectively by how many a viewpoints and corresponding viewpoint number, can each see under viewpoint should Dough sheet is projected in how many a pixels and abscissa and ordinate of these pixels in picture, these in concordance list t Data will be used for subsequent anti-throwing process.

Step 2 includes the following steps：

Step 2-1, by the three-dimensional grid model data set S={ S of input_Train,S_TestEtc. quantity be randomly divided into instruction Practice collection S_Train={ s₁,s₂,…s_i,…,s_nAnd test set S_Test={ s_n+1,s_n+2,…,s_n+j,…,s_n+m, wherein s_iIndicate instruction Practice and concentrates i-th of model, s_n+jIndicate j-th of model in test set；

Step 2-2, for training set S_Train, acquire the projection rendering figure P under its each viewpoint direction_Train={ P₁, P₂,…P_i,…,P_nAnd dough sheet true tag color-patch map G_Train={ G₁,G₂,…G_i,…,g_nBe input in full convolutional network It is trained, obtains the trained full convolutional network for having merged various visual angles feature, wherein P_iRefer to training set S_TrainIn I-th of model s_i14 visual angles under projection render atlas, G_iRefer to training set S_TrainIn i-th model s_i14 Dough sheet true tag under visual angle colours atlas；

Step 2-3, for test set S_Test, acquire the projection rendering figure under its each viewpoint direction and be input to and train Full convolutional neural networks in, obtain three-dimensional grid model and project the pixel of rendering figure under each viewpoint direction being noted as The probability of each label, to obtain the probability graph of the projection rendering figure semantic segmentation under each viewpoint direction.

Wherein, step 2-2 includes the following steps：

Step 2-2-1, the projection inputted under each viewpoint direction of training set render figure P_Train, it is used in combination corresponding dough sheet true Label color-patch map G_TrainSupervised training is done, after the convolution sum pondization operation by forward-propagating, the projection under each viewpoint renders Figure is all extracted as the feature vector of 128 dimensions；

Step 2-2-2, in the preoperative full articulamentum (Fully Connected Layer) of deconvolution, to each visual angle The feature vector for 128 dimensions that lower projection rendering figure is extracted carries out maximum visual angle pond, selects the maximum value group under each dimension The feature vector of 128 dimensions of each visual angle characteristic has been merged in synthesis one, and this feature vector is obtained 40 by the method for stacking The eigenmatrixes of × 40 × 128 dimensions, and this feature matrix is spliced to full articulamentum preceding layer each visual angle under 40 × After the eigenmatrix of 40 × 512 dimensions, the eigenmatrix of 40 × 40 × 640 dimensions under each visual angle is formed；

Step 2-2-3 carries out deconvolution operation to the eigenmatrix of 40 × 40 × 640 dimensions under each visual angle, final to pass through It crosses the common Softmax multi-categorizers in machine learning field and multi-tag prediction is carried out to the multidimensional characteristic vectors of input, obtain pair Projection under each viewpoint direction renders the probability graph of figure semantic segmentation；

Step 2-2-4, using to the label of pixel prediction maximum probability as to the pixel prediction label and corresponding face Piece true tag color-patch map compares, and calculates Loss loss functions, carries out backpropagation, finally obtains trained fusion and regard more The full convolutional network of corner characteristics.

Step 3 includes the following steps：

Step 3-1, the mapping relations concordance list T obtained according to step 1_Test={ t_n+1,t_n+2,…t_n+j,…,t_n+m, Middle t_n+jIt refers to having recorded test set S_TestIn j-th of model s_n+jDough sheet number be projected under 14 visual angles with dough sheet Projection renders the concordance list of the relationship between the location of pixels in figure, is regarded each in conjunction with the three-dimensional grid model that step 2 obtains Projection renders figure semantic segmentation probability graph under point direction, and inversely three-dimensional grid model is derived by each visual angle by counter throw Lower panel is predicted to be the probability of each label, and the anti-detailed process thrown refers to specific embodiment part；

Step 3-2 is predicted to be each label to the three-dimensional grid model that step 3-1 is obtained in each visual angle lower panel Probability results carry out maximum visual angle pond, i.e., dough sheet are predicted to be all probability values of a label most under each visual angle Big value is predicted as the probability value of label as this, so that each label has unique prediction probability value to each dough sheet.

Step 4 comprises the steps of：

Step 4-1, according to side whether is total between dough sheet to determine whether adjacent, be calculated three-dimensional grid model each Adjacent dough sheet set around dough sheet；

Step 4-2, calculate Euclidean distance and dough sheet of each dough sheet respectively between adjacent dough sheet geometric center it Between dihedral angle, the i.e. angle of dough sheet normal；

Step 4-3 cuts the final pre- mark that all dough sheets of three-dimensional grid model are calculated in algorithm according to Graph Cut figures Label.Step 4-3 includes：

If in three-dimensional grid model s, F is the tri patch set of model s, v and the tri patch that f is model s, l_fFor The label of dough sheet f, p_f(l_f) it is that dough sheet f is predicted as l_fProbability value, dough sheet v ∈ N_f, N_fFor the dough sheet set adjacent with dough sheet f, θ_fvFor the dihedral angle between dough sheet f and dough sheet v, d_fvFor the distance between dough sheet f central points and dough sheet v central points, then：

Wherein,

λ is a non-negative constant, for balancingWithλ is rule of thumb set in the present invention =50；If by a label l_fAssign dough sheet f, and l_fCorresponding p_f(l_f) if value very little,It will obtain one A larger punishment；AndThe flatness between adjacent tri patch label can be then punished, when adjacent two A tri patch angle very little or when apart from very little and inconsistent label, can obtain a larger penalty term, to complete All dough sheets of pairs of model_f, f ∈ F } semantic segmentation mark.

The method of the present invention is dedicated to the semantic component for solving for 3D models to be divided into tape label.Composition portion based on model Point come to model carry out analysis and reasoning be widely applied in fields such as computer vision, robot and virtual realities, such as mix Model analysis, object detecting and tracking, 3D reconstructions, Style Transfer, robot roaming and crawl etc., this is but also this works Become very significant.

Advantageous effect：The method of the present invention is inspired carries out language using full convolutional network in first to model two-dimensional projection rendering figure Justice segmentation, the method that the semantic label of pixel is then mapped back by threedimensional model tri patch by back projection method.Entire In the process, this method improve model various visual angles projection rendering figure, dough sheet true tag color-patch map and dough sheet and its in each viewpoint Under between the location of pixels that is projected mapping relations collecting efficiency, and full convolutional neural networks are had modified, in full articulamentum The feature of various visual angles is merged, to further improve the effect of perspective view semantic segmentation mark.Entire method system is efficient And it is practical.The method of the present invention optimizes the viewpoint selection during acquisition 3D model projection rendering figures, and both having taken into account viewpoint can Piece coverage rate and efficiency are met, and devises a kind of compact data structure for storing 3D models dough sheet and it is respectively being regarded The mapping relations between the position of pixel in projection rendering figure are projected under point direction.In addition, being done to full convolutional network FCN Modification, ensures that entire frame can extract local feature, can also extract the global characteristics for having merged various visual angles, to further Improve the effect of 3D model semantics segmentation mark.

Description of the drawings

The present invention is done with reference to the accompanying drawings and detailed description and is further illustrated, it is of the invention above-mentioned or Otherwise advantage will become apparent.

Fig. 1 a are undivided archetype.

Fig. 1 b are that the label after semantic segmentation colours rendering result.

Fig. 2 is the System Framework figure of the method for the present invention.

Fig. 3 is the position setting figure of 42 fixed views during viewpoint selection in the present invention.

Fig. 4 a illustrate the viewpoint selection figure during model data collecting in the present invention by taking model aircraft as an example.

Fig. 4 b illustrate the projection rendering figure during model data collecting in the present invention by taking model aircraft as an example.

Fig. 4 c illustrate the dough sheet true tag coloring during model data collecting in the present invention by taking model aircraft as an example Figure.

Fig. 5 is that each dough sheet number of record cast projects in projection rendering figure with it under each viewpoint direction in the present invention Location of pixels between mapping relations data structure show figure.

Fig. 6 is the semantic segmentation effect rendering figure comparison of the method for the present invention and other methods.

Fig. 7 is flow chart of the present invention.

Specific implementation mode

The present invention will be further described with reference to the accompanying drawings and embodiments.

As shown in fig. 7, the invention discloses the full convolutional network threedimensional model segmentations of projection based on fusion various visual angles feature Method, present invention acquisition select 14 viewpoints to threedimensional model iteration to be split, make visible dough sheet under these viewpoint directions Coverage rate maximize；Projection rendering figure and dough sheet true tag color-patch map under each viewpoint direction are acquired to the model, and Record cast dough sheet number and its project after mapping relations in picture between location of pixels；By the various visual angles of model training collection Projection rendering figure and dough sheet true tag color-patch map are input in full convolutional network and are trained, by more regarding for model measurement collection Angular projection renders figure and inputs trained full convolutional network, and the pixel tag for obtaining the projection rendering figure under each viewpoint direction is pre- Survey probability graph；It is right according to the mapping relations between the dough sheet number recorded before and its location of pixels being projected under each viewpoint The pixel tag prediction probability figure of full convolutional network output uses the anti-method thrown, and determines model dough sheet in each viewpoint direction Under be predicted to be the probability of each label, carry out maximum visual angle pond, i.e., being predicted to be owning for a certain label under each visual angle Maximum probability in probability value is predicted to be the probability of the label as dough sheet, obtains each dough sheet and is predicted to be each label Probability value；The final prediction label that optimization algorithm determines all dough sheets of model is cut using figure again.

For given certain one kind 3D Models Sets S={ S_Train,S_Test, etc. quantity be randomly divided into training set S_Train= {s₁,s₂,…s_i,…,s_nAnd test set S_Test={ s_n+1,s_n+2,…,s_n+j,…,s_n+m, wherein S_iIt indicates i-th in training set A model, s_n+jIt indicates that j-th of model in test set, the present invention pass through following steps, completes to test set S_TestInterior model Semantic segmentation mark, goal task is as shown in Figure 1a, and flow chart is as shown in Figure 2：

Specifically include following steps：

Step 1 includes the following steps：

Step 1-1, it is assumed that (format .off has recorded all vertex of threedimensional model to input single 3 D grid model s The call number on three vertex of coordinate and each tri patch, threedimensional model s, which is derived from, contains 16 type 3D models ShapeNetCore standard 3D model semantics partitioned data set) and the tally set l of all dough sheet associated components (format is .seg, the label of each tri patch associated components type of model is had recorded), from 42 fixed views (position is as shown in Figure 3) 14 viewpoints of middle selection so that the dough sheet coverage rate of three-dimensional grid model s is maximum, finds that 14 viewpoints both make mould in experiment Type dough sheet coverage rate can take into account hardware GPU volume performances 90% or more；

The dough sheet number of step 1-4, acquisition three-dimensional grid model s are projected to pixel in image with it under 14 viewpoints Mapping relations between position, establishing a mapping relations concordance list t for three-dimensional grid model s, (data structure of t refers to figure 5).The process is as shown in Fig. 4 a, Fig. 4 b and Fig. 4 c.

Wherein, step 1-1 includes the following steps：

Step 2 includes the following steps：

Wherein, step 2-2 includes the following steps：

Step 2-2-2, as shown in Fig. 2, in the preoperative full articulamentum (Fully Connected Layer) of deconvolution, Feature vector to projecting 128 dimensions that rendering figure is extracted under each visual angle carries out maximum visual angle pond, selects under each dimension Maximum value be combined into one merged each visual angle characteristic 128 dimension feature vectors, this feature vector is passed through into stacking Method obtains the eigenmatrixes of 40 × 40 × 128 dimensions, and each of preceding layer that this feature matrix is spliced to full articulamentum regards After the eigenmatrix of 40 × 40 × 512 dimensions under angle, the eigenmatrix of 40 × 40 × 640 dimensions under each visual angle is formed；

Step 3 includes the following steps：

Step 4 comprises the steps of：

Wherein,

λ is a non-negative constant, for balancingWithλ is rule of thumb set in the present invention =50；If by a label l_fAssign dough sheet f, and l_fCorresponding p_f(l_f) if value very little,It will obtain one A larger punishment；AndThe flatness between adjacent tri patch label can be then punished, when adjacent two A tri patch angle very little or when apart from very little and inconsistent label, can obtain a larger penalty term, to complete All dough sheet { l of pairs of model_f, f ∈ F } semantic segmentation mark.

Embodiment

As illustrated in figs. 1A and ib, Fig. 1 a are undivided archetype to the goal task of the present invention, and Fig. 1 b are semantic point Label after cutting colours rendering result, and the structural system of entire method is as shown in Figure 2.Illustrate the present invention below according to embodiment Each step.

Step (1), to the three-dimensional grid model data set S gathered datas of input.It is specifically divided by taking model s as an example following Several steps：

Step (1.1), selection select 14 viewpoints from 42 fixed views so that model dough sheet coverage rate is maximum；

42 fixed views are arranged, as shown in figure 3, the distance of view distance coordinate origin depends in step (1.1.1) Perspective view under all viewpoint directions of model can be filled as much as possible and render window, experiment herein renders window size and sets 320 × 320 are set to, unit is pixel.Viewpoint straight up and straight down in both direction it is each there are one viewpoint, remaining According to vertical covering of the fan direction every 30 degree, every 45 degree of each viewpoints in horizontal direction, amount to 42 viewpoints.For mould Type s, its dough sheet collection are combined into F, calculate separately the dough sheet set that 42 viewpoints can be seen, select the dough sheet number it can be seen that in F Most viewpoint v is added in viewpoint set V, at the same by v it can be seen that all dough sheets number be added to and can be seen by viewpoint in V To dough sheet set M in, while the dough sheet set that will be seen from the directions viewpoint v is rejected from F；

Step (1.1.2) calculates the dough sheet set that each viewpoint other than viewpoint set V can be seen, selects energy Enough see that the most viewpoint μ of the dough sheet number in F is added in viewpoint set V, at the same by μ it can be seen that all dough sheets number plus Enter into M, while the dough sheet set that will be seen from the directions viewpoint μ is rejected from F；

Step (1.1.3) repeats step (1.1.2), until the viewpoint number in V is 14 end.

Step (1.2), acquisition step 1-1 is obtained under Lambert (Lambert diffusing reflection illumination model) illumination model The projection of 14 viewpoint direction drag s renders atlas P={ p₁,p₂,…p_i,…,p₁₄, wherein p_iRefer to i-th of viewpoint side Projection collected to model s renders figure downwards, and the wide and high of picture is 320 pixels, format jpg, under some visual angle Rendering result for, as shown in Figure 4 b.

Step (1.3) acquires the dough sheet label color-patch map G={ g of three-dimensional grid model s under 14 viewpoint directions₁,g₂,… g_i,…,g₁₄, wherein g_iRefer under i-th of viewpoint direction to the collected dough sheet true tag color-patch maps of model s, picture It is wide and it is high be 320 pixels, format png, by taking some visual angle as an example, as a result as illustrated in fig. 4 c, the different piece of model corresponds to Different label, the identical component for indicating these dough sheets and belonging to model of the label of dough sheet, will be in model tally set l Each label mapping is a kind of specific color (such as red), and to carry out coloring rendering to model s, G is for supervising god Training through network and compares with the label finally predicted and calculate accuracy rate and (needed during neural metwork training defeated Enter P and G, wherein G is used for supervised training process, need to only input P during the test without G is input to neural network In).

Step (1.4) is projected to pixel in image with it to model s collection models dough sheet number under 14 viewpoints Mapping relations between position establish a mapping relations concordance list t, as shown in Figure 5：Trianglei indicates the i-th of model A dough sheet, VisualNumi indicate that the dough sheet can be by seen in several visual angles, and VisualNumj indicate that the dough sheet can be seen regards Point number (value range 1-42), PixelNum indicate that the dough sheet is projected under the viewpoint direction in how many a pixels, Xk indicates that the abscissa of pixel, yk indicate the ordinate of pixel.Therefore, all dough sheets of model s quilt under each viewpoint direction Projecting in which pixel can be recorded, and be in a manner of compressing and store.

Step (2) renders figure to model projection with the full convolutional network FCN frames of fusion various visual angles feature and carries out semanteme Segmentation, obtains model and projects the pixel of rendering figure under each viewpoint direction being predicted to be the probability of each label, such as Fig. 2 institutes Show.

Step (2.1), by the three-dimensional grid model data set S={ S of input_Train,S_TestEtc. quantity be randomly divided into Training set S_Train={ s₁,s₂,…s_i,…,s_nAnd test set S_Test={ s_n+1,s_n+2,…,s_n+j,…,s_n+m, wherein s_iIt indicates I-th of model in training set, s_n+jIndicate j-th of model in test set；

Step (2.2), for training set S_Train, acquire the projection rendering figure P under its each viewpoint direction_Train={ P₁, P₂,…P_i,…,P_nAnd dough sheet true tag color-patch map G_Train={ G₁,G₂,…G_i,…,g_nBe input in full convolutional network It is trained, obtains the trained full convolutional network for having merged various visual angles feature, wherein P_iRefer to training set S_TrainIn I-th of model s_i14 visual angles under projection render atlas, G_iRefer to training set S_TrainIn i-th model s_i14 Dough sheet true tag under visual angle colours atlas, and the step is specific can be divided into following steps again：

Step (2.2.1), the projection inputted under each viewpoint direction of training set render figure P_Train, it is used in combination corresponding dough sheet true Real label color-patch map G_TrainSupervised training is done, after the convolution sum pondization operation by forward-propagating, the projection wash with watercolours under each viewpoint Dye figure is all extracted as the feature vector of 128 dimensions；

Step (2.2.2), as shown in Fig. 2, in preoperative full articulamentum (the Fully Connected of deconvolution Layer), maximum visual angle pond is carried out to the feature vector for projecting 128 dimensions that rendering figure is extracted under each visual angle, selection is each Maximum value under a dimension is combined into the feature vector for 128 dimensions that one has merged each visual angle characteristic, and this feature vector is led to It crosses the method stacked and obtains the eigenmatrix of 40 × 40 × 128 dimensions, and this feature matrix is spliced to the preceding layer of full articulamentum Each visual angle under 40 × 40 × 512 dimension eigenmatrixes after, formed under each visual angle 40 × 40 × 640 dimension spies Levy matrix；

Step (2.2.3) carries out deconvolution operation, finally to the eigenmatrix of 40 × 40 × 640 dimensions under each visual angle Multi-tag prediction is carried out to the multidimensional characteristic vectors of input by the common Softmax multi-categorizers in machine learning field, is obtained The probability graph of figure semantic segmentation is rendered to the projection under each viewpoint direction.

Step (2.2.4), using to the label of pixel prediction maximum probability as prediction label to the pixel and corresponding Dough sheet true tag color-patch map compares, and calculates Loss loss functions, carries out backpropagation, it is more to finally obtain trained fusion The full convolutional network of visual angle characteristic.

Step (2.3), for test set S_Test, acquire the projection rendering figure under its each viewpoint direction and be input to training In good full convolutional neural networks, obtain three-dimensional grid model and project the pixel of rendering figure under each viewpoint direction being marked For the probability of each label, to obtain the probability graph of the projection rendering figure semantic segmentation under each viewpoint direction, test process Mainly comprise the steps of：

Projection of the test model collection under each viewpoint direction is rendered atlas P by step (2.3.1)_TestIt is input to and has instructed The fusion the perfected full convolutional network of various visual angles features；

Step (2.3.2), projection of the output test model collection under each viewpoint direction render atlas P_TestSemantic segmentation Probability graph.

Step (3) projects the model that test model is concentrated under each viewpoint direction and renders figure semantic segmentation probability graph Instead throw and using average viewing angle pond, obtains the probability that model dough sheet is predicted to be each label.

Step (3.1), with the test set P of test process_TestFor, the mapping relations concordance list that is obtained according to step 1 T_Test={ t_n+1,t_n+2,…t_n+j,…,t_n+m, wherein t_n+jIt refers to having recorded test set S_TestIn j-th of model s_n+jFace Piece number and dough sheet are projected to the concordance list of the relationship between the location of pixels in projection rendering figure under 14 visual angles, in conjunction with The three-dimensional grid model that step (2) obtains projects rendering figure semantic segmentation probability graph under each viewpoint direction, by it is counter throw it is inverse To the probability for being derived by three-dimensional grid model in each visual angle lower panel and being predicted to be each label, the anti-detailed process thrown Refer to specific embodiment part；

Step (3.2), the probability that each label is predicted to be in each viewpoint drag dough sheet that step (3.1) is obtained As a result maximum visual angle pond is carried out, i.e., dough sheet is predicted to be under each visual angle the maximum of all probability values of a certain label Value is predicted as the probability value of label as this, so that each label has unique prediction probability value to each dough sheet.Such as Fruit is using the corresponding label of maximum probability as the label of the dough sheet, and the coarse segmentation before being optimized is as a result, rendering result such as Fig. 2 In Raw Labled Shape shown in, occur accidentally dividing in edge-of-part and components interior or the case where wrong segmentation, because This needs to be optimized the accuracy rate for promoting dough sheet mark.

Step (4) is cut algorithm using Graph Cut figures and is optimized, obtains the final prediction label of model dough sheet.

Each dough sheet of model f is calculated according to side whether is total between dough sheet to determine whether adjacent in step (4.1) The adjacent dough sheet set N of surrounding_f；

Step (4.2) calculates Euclidean distance and dough sheet of each dough sheet respectively between adjacent dough sheet geometric center Between the angle of dihedral angle namely dough sheet normal；

Step (4.3) cuts the final prediction label that all dough sheets of model are calculated in algorithm using Graph Cut figures：

If the tri patch collection of three-dimensional grid model s is combined into the tri patch that F, v and f are model s, l_fFor the mark of dough sheet f Label, p_f(l_f) it is that dough sheet f is predicted as l_fProbability value, dough sheet v ∈ N_f, N_fFor the dough sheet set adjacent with dough sheet f, θ_fvFor dough sheet Dihedral angle between f and dough sheet v, d_fvFor the distance between dough sheet f central points and dough sheet v central points, then：

Wherein,

λ is a non-negative constant, for balancingWithλ is rule of thumb set in this experiment =50；If by a label l_fAssign dough sheet f, and l_fCorresponding p_f(l_f) if value very little,It will obtain one A larger punishment；AndThe flatness between adjacent tri patch label can be then punished, when adjacent two A tri patch angle very little or when apart from very little and inconsistent label, can obtain a larger penalty term, to complete All dough sheet { l of pairs of model_f, f ∈ F } semantic segmentation mark.

Interpretation of result

The experimental situation parameter of the method for the present invention is as follows：

1) it is Windows10 64 to carry out data acquisition and the anti-experiment porch parameter for throwing simultaneously render process to model Operating system, Intel (R) Core (TM) i5-3470 CPU 3.20GHz, memory 10GB, using C++ programming languages, and combine OpenGL and OpenCV third parties increase income library to realize, programming development environment is Visual Studio 2015；

2) to merged various visual angles feature the training of the full convolutional networks of FCN and the experiment porch parameter of test process be 64 bit manipulation systems of Windows10, Intel (R) Core (TM) i7-5820K CPU 3.30GHz, memory 64GB, video card are Titan X GPU 12GB use Python programming languages, and use Caffe third party and increase income library to realize.

The method of the present invention and the method (abbreviation Guo et al.) in conventional method ShapeBoost, document 9, document 16 In the contrast and experiment (as shown in Table 1 and Table 2) of method (abbreviation ShapePFCN) be analyzed as follows：

On the Models Sets of 16 class classifications of generally acknowledged threedimensional model semantic segmentation standard data set ShapeNetCore It is tested, per the item name of a kind of data set as shown in 1 first row of table, wherein title meaning of all categories is Airplane (aircraft), Bag (packet), Cap (cap), Car (automobile), Chair (chair), Earphone (earphone), Guitar (guitar), Knife (knife), Lamp (lamp), Laptop (portable computer), Motorbike (motorcycle), Mug (mug), Pistol (pistol), Rocket (rocket), Skateboard (skis), Table (desk)；Stroke of training set and test set Divide as shown in 1 secondary series of table；Semantic segmentation marks effect and renders figure comparison as shown in Figure 6；Semantic segmentation marks accuracy rate comparison As shown in Table 1 and Table 2.

As shown in fig. 6, the method and ShapePFCN of the present invention respectively have quality.As the accuracy rate of Tables 1 and 2 compares (table 1 Illustrating the method for the present invention, the accuracy rate comparison of semantic segmentation mark, table 2 on ShapeNetCore data sets are opened up with other methods Show that the semantic segmentation on ShapeNetCore data sets marks accuracy rate Statistical Comparison to the method for the present invention with other methods) institute Show, ShapePFCN methods are led in method part of the invention, and Category Avg (being averaged for classification accuracy rate) Be more than ShapePFCN methods on Dataset Avg (Average Accuracy of entire data set), in more than 3 tag class In other category of model result, the method for the present invention is also in no way inferior.

Table 1

Table 2

	ShapeBoost	Guo et al.	ShapePFCN	The method of the present invention
					Category Avg.	83.10	78.7	88.7	88.8
Category Avg.(>3labels)	74.8	69.6	84.9	86.4
					Dataset Avg.	80.4	74.7	88.0	89.0
Dataset Avg.(>3labels)	74.2	68.7	84.5	86.2

Fusion Features in from contrast experiment, removing full convolutional network FCN frames respectively and Graph Cut optimizations, It is as shown in Table 3 with final experimental result accuracy rate comparison, indicate that Fusion Features and Graph Cut optimizations can be obviously improved The final semantic segmentation of model marks accuracy rate.

In addition, 22 or so of visual angle number compared to ShapePFCN of the method for the present invention are reduced to 14, the image of acquisition Size is also down to 320 × 320 by 768 × 768, and the more efficient quick and operability of viewpoint selection method is stronger.Finally Graph Cut figures cut optimization algorithm also need not be as the condition random field CRF (Conditional in ShapePFCN methods Random Field) training like that.More importantly method of the invention does not need the depth information of collection model, therefore more With universality and operability.Various visual angles Fusion Features this technologies is exactly introduced in full convolutional network FCN frames, So that the method for the present invention can obtain preferable effect under these relatively low constraintss.Table 3 illustrates the method for the present invention most Termination fruit and the comparison for removing result after the various visual angles Fusion Features and Graph Cut in full convolutional network FCN optimize respectively.

Table 3

It is specific real the present invention provides the full convolutional network threedimensional model dividing method of projection based on fusion various visual angles feature Now there are many method of the technical solution and approach, the above is only a preferred embodiment of the present invention, it is noted that for For those skilled in the art, without departing from the principle of the present invention, can also make it is several improvement and Retouching, these improvements and modifications also should be regarded as protection scope of the present invention.Each component part being not known in the present embodiment It is realized with the prior art.

Claims

1. the full convolutional network threedimensional model dividing method of projection based on fusion various visual angles feature, which is characterized in that including following Step：

Step 2, three-dimensional grid model is projected to render with the full convolutional network FCN of fusion various visual angles feature and schemes to carry out semantic point It cuts, obtains three-dimensional grid model and project rendering figure semantic segmentation probability graph under each viewpoint direction；

Step 3, project that rendering figure semantic segmentation probability graph is counter to be thrown and adopted under each viewpoint direction to three-dimensional grid model With maximum visual angle pond, the probability that three-dimensional grid model dough sheet is predicted to be each label is obtained；

Step 4, it cuts algorithm using Graph Cut figures to optimize, obtains the final prediction label of three-dimensional grid model dough sheet.

2. according to the method described in claim 1, it is characterized in that, step 1 includes the following steps：

Step 1-1, it is assumed that the tally set l of input single 3 D grid model s and all dough sheet associated components, from 42 fixations 14 viewpoints are selected in viewpoint so that the dough sheet coverage rate of three-dimensional grid model s is maximum；

The projection of step 1-2,14 viewpoint direction drag s that acquisition step 1-1 is obtained under Lambert illumination models render Atlas P={ p₁, p₂... p_i..., p₁₄, wherein p_iRefer to the projection collected to model s under i-th of viewpoint direction to render Figure；

Step 1-3 acquires the dough sheet label color-patch map G={ g of three-dimensional grid model s under 14 viewpoint directions₁, g₂, ...g_i..., g₁₄, wherein g_iRefer under i-th of viewpoint direction to the collected dough sheet true tag color-patch maps of model s, model Different piece correspond to different labels, the identical component for indicating these dough sheets and belonging to model of the label of dough sheet, by the mould Each label mapping in type tally set l is a kind of specific color, to carry out coloring rendering to model s；

The dough sheet number of step 1-4, acquisition three-dimensional grid model s are projected to the position of pixel in image with it under 14 viewpoints Between mapping relations, establish a mapping relations concordance list for three-dimensional grid model s.

3. according to the method described in claim 2, it is characterized in that, step 1-1 includes the following steps：

Step 1-1-1, to three-dimensional grid model s, dough sheet collection is combined into F, calculate separately 42 viewpoints it can be seen that dough sheet collection Close, selection it can be seen that the most viewpoint v of dough sheet number in F is added in viewpoint set V, while by viewpoint v it can be seen that All dough sheets number are added to the dough sheet set that in dough sheet set M that viewpoint in V is seen, can will be seen from v viewpoint directions from F Middle rejecting；

Step 1-1-2, calculate each viewpoint other than viewpoint set V it can be seen that dough sheet set, selection can see The viewpoint μs most to the dough sheet number in F are added in viewpoint set V, at the same by viewpoint μ it can be seen that all dough sheets number be added Into M, the dough sheet set that will be seen from v viewpoint directions is rejected from F；

4. according to the method described in claim 3, it is characterized in that, in step 1-4, according to three-dimensional grid in the concordance list t Dough sheet journal mapping relations in model s files, including each dough sheet number can be seen respectively by how many a viewpoints and Corresponding viewpoint number, can each be shown in that the dough sheet is projected in how many a pixels under viewpoint and these pixels are in picture Abscissa and ordinate, these data in concordance list t will be used for subsequent anti-throwing process.

5. according to the method described in claim 4, it is characterized in that, step 2 includes the following steps：

Step 2-1, by the three-dimensional grid model data set S={ S of input_Train, S_TestEtc. quantity be randomly divided into training set S_Train={ s₁, s₂... s_i..., s_nAnd test set S_Test={ s_n+1, s_n+2..., s_n+j..., s_n+m, wherein s_iIndicate instruction Practice and concentrates i-th of model, s_n+jIndicate j-th of model in test set；

Step 2-2, for training set S_Train, acquire the projection rendering figure P under its each viewpoint direction_Train={ P₁, P₂, ...P_i..., p_nAnd dough sheet true tag color-patch map G_Train={ G₁, G₂... G_i..., g_nBe input in full convolutional network It is trained, obtains the trained full convolutional network for having merged various visual angles feature, wherein P_iRefer to training set S_TrainIn I-th of model s_i14 visual angles under projection render atlas, G_iRefer to training set S_TrainIn i-th of model s_i14 Dough sheet true tag under visual angle colours atlas；

Step 2-3, for test set S_Test, acquire the projection rendering figure under its each viewpoint direction and be input to trained complete In convolutional neural networks, obtain three-dimensional grid model and project the pixel of rendering figure under each viewpoint direction being noted as each mark The probability of label, to obtain the probability graph of the projection rendering figure semantic segmentation under each viewpoint direction.

6. method according to claim 5, which is characterized in that step 2-2 includes the following steps：

Step 2-2-1, the projection inputted under each viewpoint direction of training set render figure P_Train, corresponding dough sheet true tag is used in combination Color-patch map G_TrainSupervised training is done, after the convolution sum pondization operation by forward-propagating, the projection under each viewpoint renders figure It is extracted as the feature vector of 128 dimensions；

Step 2-2-2, in the preoperative full articulamentum of deconvolution, to projecting 128 dimensions that rendering figure is extracted under each visual angle Feature vector carries out maximum visual angle pond, selects the maximum value under each dimension to be combined into one and has merged each visual angle characteristic This feature vector is obtained the eigenmatrix of 40 × 40 × 128 dimensions by the feature vector of 128 dimensions by the method for stacking, and should Eigenmatrix is spliced to after the eigenmatrix of 40 × 40 × 512 dimensions under each visual angle of the preceding layer of full articulamentum, is formed The eigenmatrix of 40 × 40 × 640 dimensions under each visual angle；

Step 2-2-3 carries out deconvolution operation to the eigenmatrix of 40 × 40 × 640 dimensions under each visual angle, eventually passes through Softmax multi-categorizers carry out multi-tag prediction to the multidimensional characteristic vectors of input, obtain to the projection under each viewpoint direction Render the probability graph of figure semantic segmentation；

Step 2-2-4, using to the label of pixel prediction maximum probability as to the pixel prediction label and corresponding dough sheet it is true Real label color-patch map comparison, calculates Loss loss functions, carries out backpropagation, finally obtain trained fusion various visual angles feature Full convolutional network.

7. according to the method described in claim 6, it is characterized in that, step 3 includes the following steps：

Step 3-1, the mapping relations concordance list T obtained according to step 1_Test={ t_n+1, t_n+2... t_n+j..., t_n+m, wherein t_n+jIt refers to having recorded test set S_TestIn j-th of model s_n+jDough sheet number and dough sheet projection is projected under 14 visual angles The concordance list for rendering the relationship between the location of pixels in figure, the three-dimensional grid model obtained in conjunction with step 2 is in each viewpoint side Projection renders figure semantic segmentation probability graph downwards, and inversely three-dimensional grid model is derived by each visual angle lower panel by counter throw It is predicted to be the probability of each label, the anti-detailed process thrown refers to specific embodiment part；

Step 3-2, the three-dimensional grid model obtained to step 3-1 are predicted to be the probability of each label in each visual angle lower panel As a result maximum visual angle pond is carried out, i.e., the maximum value for dough sheet being predicted to be under each visual angle all probability values of a label is made The probability value of label is predicted as this, so that each label has unique prediction probability value to each dough sheet.

8. the method according to the description of claim 7 is characterized in that step 4 comprises the steps of：

Three-dimensional grid model each dough sheet is calculated according to side whether is total between dough sheet to determine whether adjacent in step 4-1 The adjacent dough sheet set of surrounding；

Step 4-2 is calculated two between Euclidean distance and dough sheet of each dough sheet respectively between adjacent dough sheet geometric center Face angle, the i.e. angle of dough sheet normal；

Step 4-3 cuts the final prediction label that all dough sheets of three-dimensional grid model are calculated in algorithm according to Graph Cut figures.

9. according to the method described in claim 8, it is characterized in that, step 4-3 includes：

If in three-dimensional grid model s, F is the tri patch set of model s, v and the tri patch that f is model s, l_fFor dough sheet f's Label, p_f(l_f) it is that dough sheet f is predicted as l_fProbability value, dough sheet v ∈ N_f, N_fFor the dough sheet set adjacent with dough sheet f, θ_fvFor face Dihedral angle between piece f and dough sheet v, d_fvFor the distance between dough sheet f central points and dough sheet v central points, then：

Wherein,

λ is a non-negative constant, for balancingWith