Detailed Description of the Embodiments
Through study, the inventors found that for modeling the global context information of a scene point cloud, existing solutions usually exploit the expressive power of graphical models. A relatively common approach, for example, combines a classifier with a Conditional Random Field (CRF) to estimate the semantic label of each data point. However, the classification and recognition stage of the classifier and the CRF optimization stage typically operate as independent modules with no interaction between them, which limits the exchange of information between the modules.
Among classifiers, the three-dimensional voxel convolutional neural network is a preferable choice. Three-dimensional voxel convolutional neural networks are extended from two-dimensional convolutional neural networks and have achieved good performance in object classification and recognition tasks; compared with deep neural networks that operate directly on point clouds, they also have the advantages of a clear network structure and being easy to accelerate. However, a voxel neural network requires regularized data as input, and its labeling result is only a coarse, voxel-level label.
To address the above problems in the prior art, the embodiments of the present invention provide a three-dimensional point cloud labeling method and device based on fused voxels. A multi-scale space is constructed from the regularized voxel model with a voxel convolutional neural network to extract multi-scale voxel features; the voxel features are then expanded into point features by feature interpolation, thereby achieving finer point-by-point classification and recognition and further improving labeling performance. To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the claimed scope of the present invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.
Fig. 1 is a schematic diagram of an application scenario of the fused-voxel-based three-dimensional point cloud labeling device 100 provided in an embodiment of the present invention. The electronic terminal 10 includes the fused-voxel-based three-dimensional point cloud labeling device 100, a memory 200, a storage controller 300, and a processor 400. The electronic terminal 10 may be, but is not limited to, an electronic device with processing capability such as a computer or a mobile Internet device (MID), and may also be a server or the like.
Optionally, the memory 200, the storage controller 300, and the processor 400 are electrically connected to one another, directly or indirectly, to enable the transmission or interaction of data; for example, these elements may be electrically connected through one or more communication buses or signal lines. The fused-voxel-based three-dimensional point cloud labeling device 100 includes at least one software function module that may be stored in the memory 200 in the form of software or firmware, or solidified in the operating system of the electronic terminal 10. The processor 400 accesses the memory 200 under the control of the storage controller 300 to execute the executable modules stored in the memory 200, such as the software function modules and computer programs included in the fused-voxel-based three-dimensional point cloud labeling device 100.
It can be understood that the structure shown in Fig. 1 is merely illustrative; the electronic terminal 10 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Each component shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
Further, referring to Fig. 2, an embodiment of the present invention also provides a fused-voxel-based three-dimensional point cloud labeling method, which is introduced below with reference to Fig. 2.
Step S11: perform voxelization on the three-dimensional point cloud data set, and extract voxel features within each voxel based on the processing result to form a first voxel feature matrix;
Step S12: use the first voxel feature matrix as the input of a three-dimensional convolutional neural network to compute multi-scale features of the voxels, and concatenate and fuse the multi-scale features to obtain a second voxel feature matrix;
Step S13: expand the voxel features in the second voxel feature matrix to each point in the three-dimensional point cloud data set based on a feature interpolation algorithm to obtain a point cloud feature matrix;
Step S14: input the point cloud feature matrix into a multilayer perceptron to label the attributes of the three-dimensional point cloud.
In this embodiment, the three-dimensional point cloud is first voxelized, and features are extracted from the points within each voxel. The voxel model, with voxel features as its elements, is then fed into a three-dimensional convolutional neural network for multi-scale feature extraction and fusion. A feature interpolation algorithm then expands the voxel features into point cloud features, thereby realizing the labeling of the three-dimensional point cloud and effectively improving point cloud labeling precision.
In detail, referring to Fig. 3, the voxelization of the point cloud in step S11 may be realized through the following steps S111 to S113:
Step S111: divide the point cloud coordinate space into multiple voxels according to a preset voxel size;
Step S112: sort each point in the three-dimensional point cloud data set into its corresponding voxel according to the grid parameters of the voxels;
Step S113: sample the points in each voxel after sorting so that the number of points in each voxel reaches a first preset value.
Steps S111 to S113 above describe how the point cloud voxelization model of this embodiment voxelizes the point cloud. Specifically, as shown in Fig. 4, point cloud voxelization divides the point cloud coordinate space into multiple voxels according to a given voxel size. Suppose the extents of the input point cloud along the three coordinate axes (the X, Y, and Z directions) are W, H, and E respectively, and the size of each voxel is λ_W, λ_H, λ_E; the size of the voxelized model is then W' = W/λ_W, H' = H/λ_H, E' = E/λ_E. In this embodiment, to facilitate subsequent convolution operations, W', H', and E' may be taken as integers that are powers of 2.
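The grid-size computation above can be sketched as follows. The rounding of W', H', E' up to the next power of 2 is one plausible reading of the embodiment's requirement; the scene extents and voxel size below are hypothetical values for illustration.

```python
import math

def grid_dims(extent, voxel_size):
    """Number of voxels along one axis, rounded up to the next power of two
    so that repeated stride-2 convolutions divide the grid evenly."""
    n = max(1, math.ceil(extent / voxel_size))
    return 1 << (n - 1).bit_length()  # next power of two >= n

# hypothetical scene of 100 m x 80 m x 10 m with 0.5 m voxels
W_, H_, E_ = (grid_dims(e, 0.5) for e in (100.0, 80.0, 10.0))
print(W_, H_, E_)  # -> 256 256 32
```

With these dimensions, the voxel grid can be halved along each axis several times without producing fractional sizes.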
After the point cloud coordinate space has been divided into a voxel grid in step S111, each point in the point cloud can be sorted according to the grid parameters of each voxel, so that every point is assigned to a voxel. However, because three-dimensional point cloud acquisition is affected by measurement error, distance, occlusion, and other factors, the collected point cloud is often non-uniform: the points are dense in some regions and sparse in others. In addition, point cloud acquisition amounts to sampling the target's surface, so the target's interior is empty and contains no point cloud data. As a result, after the point cloud space is voxelized, the points are unevenly distributed among the voxels, as shown in Fig. 4, where the voxel in the lower-left corner contains no points and the voxel in the upper-right corner contains only a few. Therefore, to facilitate subsequent unified voxel feature extraction, after the point cloud is partitioned, the same number of points must be sampled from each voxel, e.g. a first preset value T (where T is determined by the point cloud resolution and the storage capacity).
It should be noted that, during sampling, if the number of points contained in a voxel is greater than the first preset value, the first preset value of points are randomly sampled from the current voxel so that the number of points in the voxel equals the first preset value; if the number of points contained in a voxel is less than the first preset value, one or more points are randomly selected from the current voxel and duplicated so that the number of points in the voxel reaches the first preset value. For example, suppose the first preset value is T: for a voxel containing more than T points, T points are randomly sampled; for a voxel containing fewer than T points, the corresponding number of points are randomly duplicated to obtain a set of T points. After point cloud partitioning and sampling, no more than W' × H' × E' voxel sets each containing T points are obtained, and feature learning can then be performed on the point cloud data within each voxel to obtain an effective feature expression for every voxel that contains points.
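The per-voxel sampling rule just described can be sketched as follows (a minimal illustration; the function name is our own):

```python
import numpy as np

def sample_voxel_points(points, T):
    """Return exactly T points from a non-empty voxel: random subsampling
    when the voxel holds more than T points, random duplication when fewer."""
    n = len(points)
    if n >= T:
        idx = np.random.choice(n, T, replace=False)
    else:
        # keep every original point, duplicate randomly chosen ones to reach T
        idx = np.concatenate([np.arange(n), np.random.choice(n, T - n, replace=True)])
    return points[idx]

pts = np.random.rand(5, 3)          # a voxel with 5 points
assert sample_voxel_points(pts, 8).shape == (8, 3)   # fewer than T: duplicate
assert sample_voxel_points(pts, 3).shape == (3, 3)   # more than T: subsample
```

Either branch yields a fixed T × 3 array, which is what makes unified per-voxel feature extraction possible.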
Further, in step S11, extracting point cloud features within each voxel based on the processing result to form the first voxel feature matrix includes: computing the center coordinates of the point cloud in each voxel, and center-normalizing the point cloud data in the voxel based on the center coordinates to obtain an initial data matrix; inputting the initial data matrix into LGAB modules to obtain point-by-point local feature descriptions; and applying a point-by-point max-pooling operation to the set of local features in the voxel to obtain the global feature of the voxel, which serves as the first voxel feature matrix.
Specifically, in the embodiment of the present invention, the local and global feature fusion modules (LGAB) shown in Fig. 5 are stacked to build the feature learning network that extracts voxel features. As shown in Fig. 6, suppose V_x is a non-empty voxel containing T points, i.e. V_x = {p_i = (x_i, y_i, z_i)}, i = 1, 2, 3, ..., T. Before the point cloud data (the initial data matrix) are input into the LGAB module, they are center-normalized: the center coordinates (c_x, c_y, c_z) of the point cloud in the voxel are computed first, and then used to center-normalize the point cloud data, yielding the initial data matrix that serves as the final input. The input of the voxel feature extraction module is thus a T × 6 initial data matrix.
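The exact layout of the T × 6 matrix is not spelled out here; a common convention, assumed in the sketch below, is to concatenate each point's raw coordinates with its centered coordinates:

```python
import numpy as np

def voxel_input(points):
    """Build the T x 6 input matrix for one voxel. Each row is assumed to be
    (x, y, z, x - cx, y - cy, z - cz), with (cx, cy, cz) the point centroid."""
    c = points.mean(axis=0)                 # voxel center coordinates
    return np.hstack([points, points - c])  # (T, 3) + (T, 3) -> (T, 6)

pts = np.random.rand(16, 3)
m = voxel_input(pts)
assert m.shape == (16, 6)
# the centered channels are zero-mean by construction
assert np.allclose(m[:, 3:].mean(axis=0), 0.0)
```

The centered channels make the per-voxel features invariant to where the voxel sits in the scene, while the raw channels retain absolute position.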
Further, point-by-point local feature descriptions can be obtained with the stacked LGAB modules, and applying max-pooling (MP) to the point-by-point feature set in a voxel yields the voxel's global feature. Fig. 6 gives a feature extraction example for a non-empty voxel; to reduce the number of parameters, the remaining non-empty voxels share the same network parameters during feature extraction. In practice, since the LGAB module fuses both the shared information of a point's local neighborhood and the distinctive information of individual points, cascading multiple LGAB modules in this embodiment effectively extracts the point cloud information inside each voxel.
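The internal structure of the LGAB module is given only at the block-diagram level; the following numpy sketch assumes a VFE-style design (a shared per-point layer followed by concatenation with the voxel-wise max-pooled feature), which matches the "local + global fusion" description but is not guaranteed to match Fig. 5 exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def lgab(feats, W):
    """One local/global fusion block (assumed structure): a shared per-point
    linear layer + ReLU, then concatenation of each point-wise feature with
    the voxel-wise max-pooled (global) feature."""
    h = relu(feats @ W)                      # (T, d); weights shared across points
    g = h.max(axis=0, keepdims=True)         # (1, d) voxel-level context
    return np.hstack([h, np.repeat(g, len(h), axis=0)])  # (T, 2d)

x = rng.normal(size=(32, 6))                 # T x 6 voxel input matrix
h1 = lgab(x, rng.normal(size=(6, 16)))       # (32, 32)
h2 = lgab(h1, rng.normal(size=(32, 32)))     # (32, 64): stacked LGAB modules
voxel_feature = relu(h2 @ rng.normal(size=(64, 64))).max(axis=0)  # final MP
assert voxel_feature.shape == (64,)          # one D-dimensional voxel feature
```

Because the same weight matrices are applied to every point and every voxel, the parameter count is independent of both T and the number of voxels.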
Further, as shown in Fig. 7, in step S12 the process of using the first voxel feature matrix as the input of the three-dimensional convolutional neural network to compute the multi-scale features of the voxels may be realized through the following steps.
Step S120: convert the voxel feature matrix into a 4-dimensional tensor, and input the 4-dimensional tensor into three-dimensional convolutional neural networks with convolution kernels of different sizes to compute voxel features at different scales;
Step S121: input the voxel features at each scale into three-dimensional deconvolutional neural networks with convolution kernels of different sizes to obtain voxel features at multiple scales, where the convolution kernel size of each three-dimensional convolutional neural network matches the convolution kernel size of the corresponding three-dimensional deconvolutional neural network.
In detail, since spatial geometric information is important information about a target, processing three-dimensional data directly can extract effective feature descriptions of the target. The present invention therefore draws on the great success of two-dimensional convolutional neural networks in image processing and extends them to three-dimensional convolutional neural networks. That is, when processing three-dimensional data with a three-dimensional convolutional neural network, the three-dimensional data must first be regularized (voxelized), and the two-dimensional convolution, pooling, and related operations must be extended to three-dimensional voxel data. The three-dimensional convolution formula is as follows:
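The image of formula (1) did not survive extraction. A standard form consistent with the surrounding description (the symbols f, h, g for input, kernel, and output are our own labels) would be:

```latex
g(x,y,z) \;=\; (f * h)(x,y,z) \;=\; \sum_{i}\sum_{j}\sum_{k} f(x-i,\;y-j,\;z-k)\,h(i,j,k)
\tag{1}
```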
In formula (1), the three terms are the input three-dimensional voxel data, the three-dimensional convolution kernel template, and the output response, respectively. Similar to the two-dimensional case, the three-dimensional convolution operation simply extends the two-dimensional kernel to a three-dimensional kernel with length, width, and height dimensions; correspondingly, as shown in Fig. 8, its local receptive field changes from a local neighborhood on a two-dimensional plane to a local neighborhood in three-dimensional space. In practice, the three-dimensional convolution operation reduces the spatial size of the three-dimensional data, i.e. the spatial size of the three-dimensional feature map is smaller than that of the input voxel data. However, labeling three-dimensional data requires the feature information of every data point, so the feature of every voxel must be obtained during voxel feature extraction, which requires mapping the post-convolution feature map back onto the original input voxels. To solve this problem, this embodiment uses the deconvolution operation to process the obtained three-dimensional voxel feature maps into feature maps of the same size as the input voxel data. The core of the deconvolution operation is still convolution; the input feature data are merely zero-padded at the edges before the convolution operation to guarantee the required output feature map size. For example, Fig. 9 gives a two-dimensional example of the convolution and deconvolution operations, where the blue-marked data are the inputs of the two operations, the grey-marked data are the convolution kernels (identical in size for both), and the green-marked data are their output responses. It is easy to see that the input of the deconvolution layer is the output of the convolution layer, and the output of the deconvolution layer has the same size as the input of the convolution layer.
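The size relationship between a convolution and its matching deconvolution can be checked with the standard output-size formulas (a small arithmetic sketch, not part of the claimed method):

```python
def conv3d_out(size, ker, st, pad):
    """Output extent of one axis after a 3D convolution."""
    return (size + 2 * pad - ker) // st + 1

def deconv3d_out(size, ker, st, pad):
    """Output extent after the transposed (de)convolution: the inverse relation."""
    return (size - 1) * st - 2 * pad + ker

# a stride-1, unpadded 3x3x3 convolution shrinks each axis by 2 ...
s = conv3d_out(32, 3, 1, 0)
assert s == 30
# ... and the matching deconvolution (same kernel and stride) restores it
assert deconv3d_out(s, 3, 1, 0) == 32
```

This is exactly the property used above: pairing each convolution with a deconvolution of the same kernel size returns the feature map to the input voxel resolution.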
Further, after all non-empty voxels are processed by the foregoing operations, a series of D-dimensional voxel features is obtained. Since each voxel feature corresponds uniquely to a voxel coordinate in three-dimensional space, the acquired features can be represented as a 4-dimensional tensor of size W' × H' × E' × D (for an empty voxel, a D-dimensional zero vector is used as its feature). After the feature representation is converted into this 4-dimensional tensor, a three-dimensional convolutional neural network (3D CNN) can be used for further feature optimization. Considering that feature extraction at a fixed scale (a single convolution kernel size) is insufficient to completely express the local context information of a voxel, the present invention uses multi-scale feature extraction and fusion to extract richer local neighborhood information.
As shown in Fig. 10, the specific three-dimensional convolutional neural network structure, i.e. the multi-scale feature extraction based on the 3D CNN, mainly comprises three-dimensional convolution (Conv3D), three-dimensional deconvolution (DeConv3D), and feature concatenation (Concat) operations. For the input W' × H' × E' × D feature, convolution and deconvolution are performed with three different convolution kernels, denoted Conv3D(f_in; f_out; ker; st; pad) and DeConv3D(f_in; f_out; ker; st; pad), where f_in and f_out denote the dimensions of the input and output feature matrices, and ker, st, and pad denote the convolution kernel size, the kernel stride, and the data padding size, each a three-dimensional vector. To obtain features at different scales, the three convolution kernels may be, but are not limited to, (1; 2; 2), (2; 1; 2), and (2; 2; 1); identical convolution kernels are used in each paired convolution and deconvolution operation, and the corresponding edge zero-padding is performed in the deconvolution operation. In addition, the convolution and deconvolution layers contain more than the convolution and deconvolution operations: each convolution operation is also followed by a Batch Normalization (BN) layer and a ReLU activation.
The multi-scale feature extraction based on the 3D CNN performs voxel feature extraction and fusion with different convolution kernels along the three mutually orthogonal directions of three-dimensional space (i.e. the X, Y, and Z directions), enabling the learned features to contain more local structure information and achieving a more complete expression of the point cloud.
Further, in step S13, since realizing three-dimensional point cloud labeling requires a feature description for every point, while the foregoing three-dimensional convolutional neural network can only obtain a feature description for every voxel, the present invention interpolates the voxel features to obtain the feature description of every point in the input point cloud. As shown in Fig. 11, for a given target point p, the nearest preset number (e.g. 8) of neighboring voxels is found in the voxel space formed by the second voxel feature matrix, each neighboring voxel having a corresponding feature description, j = 1, 2, ..., 8; the feature description of the target point p is then:
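The image of formula (2) did not survive extraction. A plausible form consistent with the surrounding description, assuming normalized inverse-distance weights (the text only states that the weights are derived from Euclidean distances to the voxel centers), would be:

```latex
F(p) \;=\; \sum_{j=1}^{8} \omega(p, c_j)\, F_{v_j},
\qquad
\omega(p, c_j) \;=\; \frac{1/\lVert p - c_j \rVert}{\sum_{k=1}^{8} 1/\lVert p - c_k \rVert}
\tag{2}
```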
In formula (2), the weight parameter is obtained from the Euclidean distance between the target point p and the center point c_j of the j-th neighboring voxel, and is applied to the voxel feature of the j-th neighboring voxel. Repeating the above process for every point in the three-dimensional point cloud expands the voxel features in the second voxel feature matrix to every point in the three-dimensional point cloud data set, yielding the point cloud feature matrix.
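The voxel-to-point interpolation step can be sketched as follows. The normalized inverse-distance weighting is an assumption; the embodiment only states that the weights come from Euclidean distances to the neighboring voxel centers:

```python
import numpy as np

def interpolate_point_feature(p, centers, feats, eps=1e-8):
    """Expand voxel features to one point: an inverse-distance-weighted sum of
    the features of its (here, 8) nearest voxel centers."""
    d = np.linalg.norm(centers - p, axis=1)   # distances to the 8 voxel centers
    w = 1.0 / (d + eps)
    w /= w.sum()                              # weights sum to 1
    return w @ feats                          # (D,) interpolated point feature

# 8 voxel centers at the corners of a unit cube, each with a distinct feature
centers = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], float)
feats = np.eye(8)
f = interpolate_point_feature(np.array([0.05, 0.05, 0.05]), centers, feats)
assert f.argmax() == 0               # dominated by the nearest voxel (0,0,0)
assert abs(f.sum() - 1.0) < 1e-9     # weights are normalized
```

Applying this to every point in the scene turns the W' × H' × E' × D voxel feature tensor into an N × D point cloud feature matrix.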
Further, in step S14, inputting the point cloud feature matrix of each point obtained in step S13 into a multilayer perceptron (MLP) realizes the point-by-point classification and recognition, i.e. the three-dimensional point cloud labeling; for the specific network structure, refer to Fig. 4.
According to actual needs, to further optimize the above point cloud labeling results obtained with the convolutional neural network and improve labeling precision, the present invention also includes step S15.
Step S15: input the attribute-labeled three-dimensional point cloud into a CRF-RNN network to optimize the point cloud attribute labels.
Specifically, this embodiment embeds a fully-connected CRF (FC-CRF), realized with basic CNN operations, into the aforementioned point cloud labeling network based on the three-dimensional convolutional neural network, realizing an end-to-end fine labeling network for three-dimensional point clouds that fuses coarse labeling with back-end optimization, and further improving the accuracy of the point cloud labels, especially the smoothness of object boundaries and contours.
In this embodiment, the CRF for three-dimensional point cloud labeling is modeled first, the CRF is then approximately realized with CNN operations, and finally the CRF-optimized three-dimensional point cloud labels are fused.
Traditional semantic labeling is modeled as point-by-point classification and recognition, whether performed with local features or with deep neural networks. However, point-by-point classification usually produces some obviously unacceptable labeling errors; for example, some points inside a target may be recognized as other categories. This is because point-by-point classification does not consider the adjacency between points and uses only the local, small-scale neighborhood information of the point to be labeled. If the structural information of objects can be modeled in advance (e.g. all targets are continuous, and adjacent points with similar characteristics should be labeled as the same class of target) and the labeling results are optimized and constrained based on that model, some obvious mistakes can be effectively rejected, yielding a high-precision labeling result. The Conditional Random Field (CRF) is an effective method for modeling the continuity of targets and their context information, and is widely used in two-dimensional image labeling. A conditional random field is a model that computes the conditional probability distribution of one set of output random variables given another set of random variables; its main characteristic is the assumption that the output variables constitute a Markov Random Field (MRF).
In detail, the CRF is a discriminative probabilistic undirected graph model that can model global context information and crossing features in the data, and is a probability graph model well suited to segmenting and labeling sequential data. Suppose random variable sets X = {X_1, X_2, ..., X_N} and P = {P_1, P_2, ..., P_N} are given, where X_i ∈ L = {l_1, l_2, ..., l_M}. For three-dimensional point cloud labeling, P is the input point cloud containing N points, P_j is the measurement vector of the j-th point, X is the semantic labeling result of the input point cloud, and X_i is the semantic label of the i-th point, taking one of the M semantic labels. The corresponding CRF model can then be represented by a Gibbs probability distribution, namely:
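The image of formula (3) did not survive extraction. The standard Gibbs-distribution form consistent with the clique and potential-function definitions that follow would be:

```latex
P(X = x \mid P) \;=\; \frac{1}{Z(P)} \exp\!\Big(-\!\!\sum_{o \in O_G}\!\Lambda(x_o \mid P)\Big)
\tag{3}
```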
In formula (3), G is the probabilistic undirected graph constructed on the random variable set X, o is a clique in G in which every pair of nodes is adjacent, O_G is the set of all cliques in G, Z(P) is the normalization function, and Λ(x_o | P) is the energy function, also called the potential function, on the clique.
For any labeling result x ∈ L^N, the overall potential function is:
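The image of formula (4) did not survive extraction. Consistent with the definitions above, the overall potential function would be the sum of the clique potentials:

```latex
\Lambda(x \mid P) \;=\; \sum_{o \in O_G} \Lambda(x_o \mid P)
\tag{4}
```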
Solving based on the maximum a posteriori probability yields the optimal labeling result:
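The image of formula (5) did not survive extraction. Consistent with formulas (3) and (4), the maximum a posteriori solution would be:

```latex
x^{*} \;=\; \arg\max_{x \in L^N} P(X = x \mid P) \;=\; \arg\min_{x \in L^N} \Lambda(x \mid P)
\tag{5}
```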
As can be seen from the above solution process, maximizing the posterior probability of the labeling result is equivalent to minimizing the overall potential function. It is easy to see that the conditional random field first models local context information through the clique potential functions, then propagates context information through the graph structure, thereby modeling context information over a wide range.
In a fully-connected CRF model, each node is connected to every other node in the graph G, as shown in Fig. 12; the corresponding cliques o are cliques containing a single node or a pair of nodes, so the overall potential function corresponding to x can be expressed as:
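The image of formula (6) did not survive extraction. Given the single-node and paired-node cliques just described, the standard decomposition would be:

```latex
\Lambda(x) \;=\; \sum_{i} \Lambda_i(x_i) \;+\; \sum_{i < j} \Lambda_{ij}(x_i, x_j)
\tag{6}
```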
In formula (6), for ease of description, the conditioning part P of the conditional posterior probability is omitted, i.e. Λ(x) = Λ(x | P); the first sum is over unary potential functions and the second over pairwise binary potential functions, with i, j = 1, 2, ..., N. The unary potential function expresses the cost of labeling the i-th node in graph G as x_i; this function is usually defined by the probability output of some discriminative classifier, whose estimates usually contain considerable noise, so the segmentation result is often discontinuous at object edges. The pairwise binary potential function gives the cost of simultaneously labeling the i-th and j-th observation points as x_i and x_j; it preserves labeling consistency between adjacent observation points and reduces inconsistency, which can improve the smoothness of the labeling result.
Using the idea of Gaussian weighting, the pairwise binary potential function is defined as:
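The image of formula (7) did not survive extraction. The standard Gaussian-weighted form consistent with the symbols described next (ψ, w^{(m)}, the M_G Gaussian kernels over feature vectors f_i, f_j, and the matrices Λ_m) would be:

```latex
\Lambda_{ij}(x_i, x_j) \;=\; \psi(x_i, x_j) \sum_{m=1}^{M_G} w^{(m)} k_G^{(m)}(f_i, f_j),
\qquad
k_G^{(m)}(f_i, f_j) \;=\; \exp\!\Big(-\tfrac{1}{2}\,(f_i - f_j)^{\!\top} \Lambda_m\, (f_i - f_j)\Big)
\tag{7}
```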
In formula (7), ψ(x_i, x_j) is the consistency (compatibility) function between different labels, w^{(m)} is a weight, and each of the M_G Gaussian kernel functions is a Gaussian-kernel smoothing filter over f_i and f_j, the feature vectors of observation points i and j; each Gaussian kernel function can be defined through a symmetric, positive-definite matrix Λ_m. This completes the fully-connected CRF modeling for three-dimensional point cloud labeling; the next step is to solve for the optimal labeling result.
The optimization of three-dimensional point cloud labels based on the fully-connected CRF is the process of maximizing the posterior probability Φ(X) given the input point cloud data. Solving the exact posterior probability is difficult and computationally enormous; the mean-field approximation converts the posterior probability Φ(X) into a product of mutually independent marginal probabilities, i.e. Φ(X) ≈ Θ(X) = ∏_i Θ_i(X_i). Combining formulas (3) to (7), the marginal probability Θ_i of any labeling result x_i is:
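The image of formula (8) did not survive extraction. The standard mean-field update consistent with formulas (6) and (7) would be:

```latex
\Theta_i(x_i = l) \;=\; \frac{1}{Z_i}\exp\!\Big(-\Lambda_i(l)
\;-\; \sum_{l' \in L}\psi(l, l')\sum_{m=1}^{M_G} w^{(m)} \sum_{j \neq i} k_G^{(m)}(f_i, f_j)\,\Theta_j(l')\Big)
\tag{8}
```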
An iterative inference algorithm for the CRF can be constructed from formula (8), as shown in Algorithm 1. The convergence of this iterative algorithm is mainly measured by the difference between the estimated Q and P; evaluating the algorithm's convergence shows that the estimation error is very small after 10 iterations, demonstrating that the algorithm converges well.
Algorithm 1: mean-field-approximation-based CRF iterative inference algorithm
1. Initialization: initialize the marginal distribution of every node (from the unary potentials);
2. while not converged do
3. Message passing: compute all Gaussian filtering results;
4. Weighted filtering: weight and sum the filter outputs;
5. Consistency detection: apply the label compatibility function;
6. Adding unary potentials: combine with the unary potential function;
7. Normalization: normalize the marginal distribution of every node;
8. end while
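The steps of Algorithm 1 can be sketched on a small point set. This is a deliberately simplified stand-in: a single Gaussian kernel, a fixed Potts compatibility, and brute-force O(N²) filtering in place of the permutohedral-lattice filtering described below.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_field(unary, feats, n_iter=10, bandwidth=1.0, w=1.0):
    """Simplified dense-CRF mean-field inference (Algorithm 1) on N points
    with L labels: one Gaussian kernel, Potts compatibility."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / bandwidth**2)      # pairwise Gaussian kernel
    np.fill_diagonal(K, 0.0)                  # message passing excludes j == i
    Q = softmax(unary)                        # step 1: softmax of the unaries
    for _ in range(n_iter):
        msg = K @ Q                           # step 3: Gaussian filtering
        # steps 4-5: Potts compatibility penalizes mass on *other* labels
        pairwise = w * (msg.sum(1, keepdims=True) - msg)
        Q = softmax(unary - pairwise)         # steps 6-7: add unaries, normalize
    return Q

# two nearby points: the confident one pulls the ambiguous one to its label
unary = np.array([[4.0, 0.0], [0.1, 0.0]])
feats = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
Q = mean_field(unary, feats)
assert Q.argmax(axis=1).tolist() == [0, 0]
```

The effect is exactly the smoothing argued for above: the weakly-classified point inherits the label of its strongly-classified neighbor.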
The following describes how the relevant operations in a CNN are used to realize the above CRF iterative inference algorithm. The biggest problem in reconstructing the algorithm with CNN operations is realizing the backpropagation of errors, i.e. enabling parameter learning and training with the BP algorithm.
(1) Initialization
The initialization operation in Algorithm 1 is:
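The image of this initialization formula did not survive extraction. Consistent with the definition of Z_i in the next paragraph, it would be:

```latex
\Theta_i(x_i = l) \;=\; \frac{1}{Z_i}\exp\!\big(U_i(l)\big),
\qquad
Z_i \;=\; \sum_{l \in L} \exp\!\big(U_i(l)\big)
```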
Here the sum runs over all possible label values; writing Z_i = ∑_l exp(U_i(l)), it can be seen that this operation is equivalent to applying a Softmax activation to the U_i(l) of all possible labeling results at each scene point. The Softmax function is a common activation function in CNN networks; it contains no parameters and its error derivative can be backpropagated, so it can also be trained with the Back-Propagation (BP) algorithm.
(2) Message passing
As shown in Algorithm 1, the message passing in the CRF uses M_G Gaussian filters to smooth Θ_j. The kernel functions of the Gaussian filters are constructed from point cloud features, such as each point's coordinate information or its color and intensity information, and express the relationship between scene points. In the fully-connected CRF model, each filter must cover all points in the point cloud, so the data volume and computation are enormous and a direct implementation is infeasible. Here, fast Gaussian convolution is realized with the permutohedral lattice method, whose computation is O(N), with N the number of points being filtered; compared with traditional Gaussian convolution it is faster and filters better. The fast Gaussian convolution based on the permutohedral lattice comprises four stages: lattice construction, the splat mapping, the slice mapping, and the blur stage.
In backpropagation, the input of the current convolution layer (the error derivative) is the output of the previous layer's filters passed through the M_G Gaussian filters in the reverse direction. In the permutohedral-lattice Gaussian convolution, this backpropagation can be realized by keeping the same lattice construction, splat mapping, and slice mapping as in the forward pass while reversing the order of the filters in the blur stage. The computation of this implementation remains O(N), which markedly reduces the amount of computation and improves efficiency.
(3) Weighted filtering
The next computation is the weighted summation of the aforementioned M_G output results for each semantic label l. In point cloud labeling, the semantic labels are independent of each other, so this weighted filtering operation can be realized by a 1 × 1 convolution over the M_G filter outputs, where the input of the convolution is a feature matrix with M_G channels and the output is a feature matrix with one channel per label. In backpropagation, since the input and output of this single operation are known and the convolution kernels are mutually independent, the error derivatives with respect to the kernel parameters and with respect to the input data can both be computed, so the kernel parameters can be trained with the BP algorithm.
(4) Consistency detection
In the consistency detection, the Potts model computes the compatibility of the different labels output in the previous step. The compatibility computation mainly compares whether the labels of two similar observation points are identical: when the semantic labels of the two points are identical, the consistency detection is 0; when the semantic labels of the two points differ, a penalty term σ is introduced. The computation is as follows:
Compared with using a fixed penalty term σ, the present invention considers penalty values learned from data, because the degree of association between different labels differs: assigning different label pairs to adjacent points affects the overall labeling result to different extents. Consistency detection can therefore also be regarded as a convolutional layer whose input and output channel numbers are both M (the number of labels) and whose kernel size is 1 × 1; the learned neuron connection weight parameters are the values of the transfer function ψ(l, l'). Since it is realized with a basic convolution operation, this step also supports backpropagation.
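The difference between the fixed Potts penalty and the learned compatibility transform can be sketched as follows; the numerical values are illustrative assumptions, and the 1 × 1 convolution is written as a matrix product over the label channels:

```python
import numpy as np

L_labels = 4
# Fixed Potts compatibility: 0 on the diagonal, penalty sigma elsewhere
sigma = 1.0
psi_potts = sigma * (1.0 - np.eye(L_labels))

# Learned compatibility (hypothetical values): initialized from the Potts
# model and then trained by BP, so strongly associated label pairs can end
# up with smaller penalties than unrelated ones
psi_learned = psi_potts.copy()
psi_learned[0, 1] = psi_learned[1, 0] = 0.2   # e.g. two related labels

# Q[n, l]: per-point label marginal after weighted filtering
rng = np.random.default_rng(0)
Q = rng.random((10, L_labels))

# Compatibility transform = 1x1 convolution with L input / L output channels
out = Q @ psi_learned.T
```

Because the transform is an ordinary linear layer over the label channels, its parameters are learned jointly with the rest of the network.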
(5) Adding the unary potential function
The unary potential function U_i(l) is combined element-wise with the output result of consistency detection to obtain the complete potential function result. This step of adding the unary potential function contains no parameters, so backpropagation can be realized simply by copying the error at the output to the input.
(6) Normalization
Similar to the initialization process, the normalization step can be realized by an activation operation based on the Softmax function; its backpropagation is consistent with Softmax backpropagation in a CNN. At this point, every step of a single iteration in Algorithm 1 has been realized with basic operations of a CNN network, and stacking the above steps realizes the multi-iteration solving algorithm.
Based on the foregoing description, in this embodiment the CRF model is first approximately modeled with the mean-field approximation method, and each step of the mean-field approximation method is then equivalently realized with basic CNN operations, i.e., a single iteration of the mean-field approximation algorithm is realized. The iterated mean-field approximation method only requires stacking the related steps, i.e., iterative mean-field approximation computation can be realized with a recurrent CNN structure (RNN), whose structure is shown in Figure 13. Given an input point cloud P, a point-wise unary potential function U = U_i(l), the marginal probability obtained by the previous iteration H1, and the marginal probability obtained by the current iteration H2, a single mean-field approximation estimate is denoted f_Ω(U, P, H1), where Ω is its parameter set (containing all parameters in weighted filtering and consistency detection), Ω = {w^(m), ψ(l, l')}. H1 is initialized at the start of iteration as the softmax output with U as input, and in later iterations equals the previous H2 output, that is:

H1(t') = softmax(U) for t' = 1, and H1(t') = H2(t' − 1) for 1 < t' ≤ T'

where T' is the number of iterations.
After obtaining H1, H2 is estimated based on the mean-field approximation algorithm, that is:

H2(t') = f_Ω(U, P, H1(t')), 0 < t' ≤ T'

For the output Y, only the estimate of the last iteration is output, Y = H2(T').
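The recurrent structure above can be unrolled as a short loop; the step function f_Ω is passed in as a callable, and the toy step used in the usage example is a hypothetical stand-in, not the patent's actual mean-field update:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def crf_rnn(U, f_omega, T_prime):
    """Unrolled mean-field inference as a recurrent structure.
    H1 is initialized as softmax(U) at the first iteration and is the
    previous iteration's H2 afterwards; the output Y is H2 at t' = T'."""
    H2 = None
    for t in range(T_prime):
        H1 = softmax(U) if t == 0 else H2
        H2 = f_omega(U, H1)
    return H2

# Toy usage with a hypothetical single-iteration estimate:
rng = np.random.default_rng(0)
U = rng.standard_normal((5, 4))
f = lambda U_, H1: softmax(-U_ + 0.5 * H1)
Y = crf_rnn(U, f, T_prime=5)
```

Since the loop only composes differentiable operations, the unrolled network can be trained with the standard BP algorithm, as stated below.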
Based on the above analysis, it can be seen that the error derivatives with respect to the parameters in the whole network structure (denoted CRF-RNN) are computable, so it can be solved with the standard BP algorithm and can also be embedded into other neural networks for learning and training.
Further, based on the above point cloud labeling network Point-VoxelNet, a three-dimensional point cloud labeling network fusing the three-dimensional voxel convolutional neural network with CRF back-end optimization (Point-VoxelNet + CRF-RNN, PVCRF) is constructed; its specific structure is shown in Figure 14. For the input point cloud, voxelization is first performed according to the scene size of the input point cloud, and a fixed number of points is randomly selected in each voxel for subsequent feature extraction. Feature extraction is then performed within each voxel based on the LGAB module to obtain simple voxel features (the features of empty voxels are padded with zeros). After all non-empty voxel features are obtained, multi-scale voxel feature extraction and fusion are carried out based on a three-dimensional convolutional neural network (Conv3D, DeConv3D). The multi-scale voxel features are then expanded to all points by the method of interpolation to obtain point-wise point features, which are fed into a multilayer perceptron to obtain the initial point cloud labeling result; finally, back-end optimization is performed based on the CRF-RNN network structure. This network structure realizes well the information exchange between the classification-recognition stage and the CRF optimization stage in point cloud labeling, and has an obvious effect on improving point cloud labeling precision.
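The step of expanding voxel features to point features can be sketched as follows. The patent expands multi-scale voxel features to all points by feature interpolation; this sketch uses the simplest variant (nearest-voxel lookup) as an assumption, and the actual interpolation scheme may differ:

```python
import numpy as np

def nearest_voxel_feature(points, voxel_feats, voxel_size, origin):
    """Expand voxel features to per-point features by nearest-voxel lookup.
    points: (N, 3) point coordinates; voxel_feats: (E', H', W', C) voxel
    feature volume; voxel_size: (3,) edge lengths; origin: (3,) grid origin.
    Returns an (N, C) point feature matrix."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Clamp indices so points on the grid boundary stay inside the volume
    idx = np.clip(idx, 0, np.array(voxel_feats.shape[:3]) - 1)
    return voxel_feats[idx[:, 0], idx[:, 1], idx[:, 2]]

# Minimal usage: a 2x2x2 grid of 3-channel voxel features
feats = np.arange(2 * 2 * 2 * 3, dtype=float).reshape(2, 2, 2, 3)
pts = np.array([[0.1, 0.1, 0.1], [1.9, 1.9, 1.9]])
pt_feats = nearest_voxel_feature(pts, feats,
                                 voxel_size=np.ones(3), origin=np.zeros(3))
```

A smoother choice would blend the eight surrounding voxels (trilinear weights), but the lookup above already shows how a voxel-level feature volume becomes a point-wise feature matrix.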
Based on the above description of the fusion-voxel-based three-dimensional point cloud labeling method, the inventors also verified the performance of this method, using evaluation indices such as point-set Intersection over Union (IoU) and Overall Accuracy (OA) to evaluate point cloud labeling performance.
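The two evaluation indices can be computed directly from point-wise labels; this is a minimal numpy sketch with illustrative function names:

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """Per-class Intersection over Union from point-wise labels:
    |pred == c AND gt == c| / |pred == c OR gt == c| for each class c."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        ious.append(inter / union if union > 0 else np.nan)
    return np.array(ious)

def overall_accuracy(pred, gt):
    """Overall Accuracy (OA): fraction of correctly labeled points."""
    return np.mean(pred == gt)
```

The average IoU reported in the tables is then the mean of the per-class IoU values over the classes present in the data.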
(1) Network implementation and parameter settings
In the point cloud rasterization and sampling stage, different data sets require different processing.
S3DIS: for the S3DIS data set, the maximum scene extents along the Z, Y, and X directions are E = 8 m, H = 16 m, and W = 50 m, respectively. To cover the whole scene, the overall grid size is 8 × 16 × 50 and the size of each voxel is λ_E = 0.5 m, λ_H = 0.25 m, λ_W = 0.2 m; the constructed voxel model has dimensions E' = 16, H' = 64, W' = 256, and the surplus voxels are emptied. T = 32 points are chosen in each voxel.
vKITTI: for the vKITTI data set, the maximum scene extents along the Z, Y, and X directions are E = 33 m, H = 193 m, and W = 148 m, respectively. Here the size of each voxel is set to λ_E = 2 m, λ_H = 1.6 m, λ_W = 1.2 m, and the constructed voxel model has dimensions E' = 16, H' = 128, W' = 128. Likewise, T = 32 points are chosen in each voxel.
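The reported voxel model dimensions for both data sets are reproduced by one simple rule: divide the scene extent by the voxel size and round to the nearest power of two (an assumption inferred from the numbers above, presumably so the grid suits the 3-D convolutional network; surplus voxels are emptied):

```python
import numpy as np

def voxel_grid_dims(extent, voxel_size):
    """Voxels per axis: extent / voxel_size rounded to the nearest power of
    two. This is an inferred rule that matches all six reported dimensions
    (S3DIS: 16, 64, 256; vKITTI: 16, 128, 128), not a formula stated in
    the text."""
    return int(2 ** round(np.log2(extent / voxel_size)))
```

For example, 50 m / 0.2 m = 250 voxels rounds up to W' = 256, while 33 m / 2 m = 16.5 voxels rounds down to E' = 16.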
For the CRF-RNN network, to prevent over-fitting and gradient vanishing, the number of iterations is set to T' = 5 in the training stage and T' = 10 in the test stage. The Gaussian filter size is consistent with the point cloud data size.
The present invention uses a two-step training strategy: the first step trains Point-VoxelNet alone, and the second step fine-tunes the joint Point-VoxelNet and CRF-RNN network PVCRF. The Point-VoxelNet network is optimized with the Adam optimization algorithm with a momentum value of 0.9, an initial learning rate of 0.001, and a training batch size of 16. The PVCRF network is optimized with the Adam optimization algorithm with a momentum value of 0.6, an initial learning rate of 0.0001, and a training batch size of 16. During training, an early-stopping strategy is likewise used to obtain the optimal network parameters: the maximum number of training epochs is 100, and training stops if the network parameters have not been updated after 10 consecutive epochs. In the test stage, the algorithm is likewise verified with 6-fold cross validation; the grouping of training data and test data is shown in Table 1.
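The early-stopping rule described above (at most 100 epochs, stop after 10 consecutive epochs without improvement) can be sketched as a plain training loop; `train_one_epoch` and `validate` are placeholders for the actual Point-VoxelNet / PVCRF routines:

```python
def early_stopping_training(train_one_epoch, validate,
                            max_epochs=100, patience=10):
    """Train for at most max_epochs epochs, stopping once the validation
    score has failed to improve for `patience` consecutive epochs.
    Returns the best validation score seen."""
    best, stale = float('-inf'), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        score = validate()
        if score > best:
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best
```

With patience 10, training that plateaus after epoch 2 runs exactly 12 epochs before stopping.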
The proposed three-dimensional voxel neural network structure and the CRF-RNN network structure are implemented in Python using the TensorFlow deep learning framework. The experimental hardware environment is: Intel Core i7-6700K CPU, 48 GB memory, GTX 1080Ti graphics card (supporting CUDA 8.0 and cuDNN 5.1).
(2) Analysis of quantitative results
Based on the above two data sets, the application effects of the two deep neural network models (Point-VoxelNet and PVCRF) in three-dimensional point cloud labeling are compared, together with a comparative analysis against the current better labeling algorithms PointNet, MS+CU (2), SEGCloud, and 3DContextNet. Only the rectangular coordinate information of the point cloud, i.e., the XYZ coordinates, is used for processing.
Table 1
Table 2
Tables 1 and 2 report the statistical labeling results of the different network models on the two data sets. The PVCRF model proposed in this embodiment achieves the better average IoU on the S3DIS and vKITTI data sets, 51.8% and 39.1% respectively, and also yields good results on overall labeling accuracy OA, at 81.2% and 82.6% respectively. This shows that the PVCRF model can better obtain a complete feature representation of the point cloud, demonstrating that multi-scale feature extraction based on the three-dimensional voxel space is functionally comparable to multi-scale feature extraction based on Euclidean space and can provide detailed information of the point cloud scene for three-dimensional point cloud labeling.
Comparing PVCRF and SEGCloud: SEGCloud directly adopts a binary voxel model for voxel feature learning, whereas PVCRF uses a rasterized point cloud voxel model and learns voxel features from the point cloud within each voxel. Both add back-end optimization based on the fully connected CRF model. In addition, PVCRF contains the multi-scale feature extraction module based on the three-dimensional voxel space, so PVCRF achieves higher average IoU, fully demonstrating that processing the point cloud data within voxels together with the multi-scale feature extraction module can extract feature descriptions with stronger characterization ability.
Comparing the Point-VoxelNet and PVCRF models, PVCRF achieves better performance in both average IoU and overall accuracy OA, because the fully connected CRF model can model contextual information over a larger range and thus has stronger characterization ability for the adjacency relations of the point cloud.
On the different data sets, with PointNet as the benchmark: since the point clouds in the S3DIS data set are relatively concentrated and the point cloud density is generally higher, the network model PVCRF proposed in this embodiment achieves a larger performance improvement, while on the vKITTI data set, since its point cloud distribution is generally sparser, the performance improvement is smaller.
(3) Analysis of qualitative results
The labeling results obtained by the different algorithms on the S3DIS and vKITTI data sets and the ground-truth labeling results are shown in Figure 15 and Figure 16. From left to right: the input colored point cloud, the labeling results based on PointNet, Point-VoxelNet, and PVCRF, and the ground-truth labeling result. It can be seen from the labeling results in Figures 15 and 16 that the results obtained by PVCRF are better than those of PointNet and Point-VoxelNet, with labeling results closer to the ground truth, verifying the validity of the multi-scale feature learning and the CRF back-end optimization in PVCRF. Comparing PointNet and Point-VoxelNet, the labeling results obtained by Point-VoxelNet are better than those of PointNet, mainly because the introduction of multi-scale feature learning in Point-VoxelNet improves the feature learning ability of the network model.
In indoor and outdoor scenes, targets with relatively high overlap or tight connection are still difficult for the Point-VoxelNet and PVCRF models to separate, such as the board and window in Figure 15 and the road and terrain in Figure 16. Since multi-scale feature extraction is considered in the Point-VoxelNet and PVCRF models, targets of relatively large size in the scene can be labeled well, such as the table in Figure 15 and the building in Figure 16.
Comparing Point-VoxelNet and PVCRF, since the fully connected CRF can extract large-scale contextual information in the scene point cloud, the details in the labeling results are more prominent, as shown by the chair and sofa in the first row of Figure 15 and the segmentation results of the road and terrain in Figure 16.
(4) Analysis of classification confusion matrices
Figures 17 and 18 show the classification confusion matrices obtained by the two network models Point-VoxelNet and PVCRF on the S3DIS and vKITTI data sets, respectively. The values in the matrix grid are category labeling accuracies, and the grid colors also represent the magnitude of the accuracy. Comparing the results of the two, it can be found that for the indoor data set S3DIS, the Point-VoxelNet and PVCRF models have comparable segmentation precision on all classes of targets, and the introduction of the CRF model mainly reduces the confusion between table and floor, among others. For the outdoor data set vKITTI, the introduction of the CRF model mainly reduces the degree of confusion between building and van, among others.
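The per-class accuracies on the matrix diagonal come from row-normalizing a raw confusion matrix; a minimal numpy sketch (illustrative names only):

```python
import numpy as np

def confusion_matrix_acc(pred, gt, num_classes):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    points of true class i that were labeled as class j, so the diagonal
    holds the per-class labeling accuracy shown in the figures."""
    cm = np.zeros((num_classes, num_classes))
    for p, g in zip(pred, gt):
        cm[g, p] += 1
    row = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row, 1)   # avoid division by zero for empty rows
```

Off-diagonal mass then directly measures the confusion between class pairs, such as table versus floor on S3DIS.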
From Figure 17(a) it can be seen that on the S3DIS data set the Point-VoxelNet network model achieves recognition accuracies above 52% for ceiling, floor, door, column, beam, window, bookcase, board, and chair. Lower labeling accuracies, between 35% and 46%, are obtained for sofa, wall, and clutter; the worst precision among the 13 target classes is for table, at 22%. The distribution of per-class average accuracies obtained by PVCRF on the S3DIS data set is similar to that of the Point-VoxelNet network model, but the precision generally increases, with the accuracy of table rising to 30%, as shown in Figure 18(a).
A similar comparison result also appears on the vKITTI data set, as shown in Figures 17(b) and 18(b). This comparison verifies the accurate grasp and modeling of target details in the scene by the CRF model. Comparing the results on the two data sets in Figures 17 and 18, the labeling precision on the vKITTI data set is generally lower, mainly because the point clouds in the vKITTI data set are generally sparser and more uneven than those in the S3DIS data set, so the neighborhood structure of points is less obvious, which leads to insufficient information extraction and is unfavorable for realizing high-precision point cloud labeling.
(5) Statistical analysis of computation
The computational efficiency of the different algorithms is analyzed experimentally on the S3DIS data set. The S3DIS data set contains 272 independent point clouds in total; each point cloud is labeled with the different algorithms and the mean test time is counted (only the neural network computation time is counted, not the data preparation stage). The statistical results are shown in Table 3. From Table 3 it can be seen that the computation time of PointNet is about 1.8 s. The network model PVCRF provided in this embodiment, which fuses the three-dimensional voxel convolutional neural network with CRF back-end optimization, takes the longest computation time, 4.52 s, because the introduction of the CRF significantly increases the amount of computation.
Table 3
The FC-CRF realized based on convolutional neural networks is used for further optimization to finally obtain a fine labeling result.
Further, referring to Figure 19, the three-dimensional point cloud labeling device 100 provided in an embodiment of the present invention includes a voxel processing and feature extraction module 110, a multi-scale voxel feature calculation module 120, a feature expansion module 130, and a point cloud labeling module 140.
The voxel processing and feature extraction module 110 is used to perform voxelization processing on the three-dimensional point cloud data set and, based on the processing result, perform voxel feature extraction within each voxel to form a first voxel feature matrix. In this embodiment, the above step S11 may be executed by the voxel processing and feature extraction module 110; that is, the specific description of the voxel processing and feature extraction module 110 may refer to step S11, which is not repeated here. Optionally, as shown in Figure 19, in this embodiment the voxel processing and feature extraction module 110 includes a voxel division unit 111, a point cloud classification unit 112, and a point cloud sampling unit 113.
The voxel division unit 111 is used to divide the point cloud coordinate space into multiple voxels according to a preset voxel size. In this embodiment, the above step S111 may be executed by the voxel division unit 111; that is, the specific description of the voxel division unit 111 may refer to step S111, which is not repeated here.
The point cloud classification unit 112 is used to classify each point in the three-dimensional point cloud data set into the corresponding voxel according to the grid parameters of the voxels. In this embodiment, the above step S112 may be executed by the point cloud classification unit 112; that is, the specific description of the point cloud classification unit 112 may refer to step S112, which is not repeated here.
The point cloud sampling unit 113 is used to sample the points in each voxel after classification so that the number of points in each voxel reaches a first preset value. In this embodiment, the above step S113 may be executed by the point cloud sampling unit 113; that is, the specific description of the point cloud sampling unit 113 may refer to step S113, which is not repeated here.
The multi-scale voxel feature calculation module 120 is used to take the first voxel feature matrix as the input of the three-dimensional convolutional neural network to calculate the multi-scale features of the voxels, and to perform feature concatenation and fusion on the multi-scale features to obtain a second voxel feature matrix. In this embodiment, the above step S12 may be executed by the multi-scale voxel feature calculation module 120; that is, the specific description of the multi-scale voxel feature calculation module 120 may refer to step S12, which is not repeated here.
The feature expansion module 130 is used to expand the voxel features in the second voxel feature matrix to each point in the three-dimensional point cloud data set based on the feature interpolation algorithm to obtain a point cloud feature matrix. In this embodiment, the above step S13 may be executed by the feature expansion module 130; that is, the specific description of the feature expansion module 130 may refer to step S13, which is not repeated here.
The point cloud labeling module 140 is used to input the point cloud feature matrix into the multilayer perceptron to realize the attribute labeling of the three-dimensional point cloud. In this embodiment, the above step S14 may be executed by the point cloud labeling module 140; that is, the specific description of the point cloud labeling module 140 may refer to step S14, which is not repeated here.
In summary, the embodiments of the present invention provide a three-dimensional point cloud labeling method and device based on fused voxels, which construct a multi-scale space on a regularized voxel model based on a voxel convolutional neural network to extract multi-scale voxel features, then expand the voxel features into point features by means of feature interpolation, thereby realizing finer-grained point-wise category recognition and point cloud labeling.
In the description of the present invention, the terms "setting", "connected", and "connection" shall be understood in a broad sense; for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection, an indirect connection through an intermediary, or an internal connection between two elements. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific circumstances. In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can also be realized in other ways. The device and method embodiments described above are only schematic; for example, the flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for realizing the specified logical functions.
It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be realized by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.