CN111428855A - End-to-end point cloud deep learning network model and training method - Google Patents
End-to-end point cloud deep learning network model and training method
- Publication number
- CN111428855A (application CN202010116881.9A)
- Authority
- CN
- China
- Prior art keywords
- point
- points
- monitoring
- sampling
- identification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
Abstract
An end-to-end point cloud deep learning network model and a training method can simultaneously locate identification points on human faces of different scales, with good positioning precision and high positioning speed. The network model is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps: (1) the network down-samples the input point cloud step by step to obtain a series of sampling point sets, and uses a point distribution feature extractor to extract, step by step, the point distribution features of the neighborhood point clouds of the sampling points in each sampling point set; the point distribution features are abstracted step by step and the spatial receptive field is enlarged step by step; (2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points; (3) for each monitoring point, the probability of lying in the neighborhood of each identification point and the offset from each identification point are predicted.
Description
Technical Field
The invention relates to the technical field of point cloud image processing and deep learning, in particular to an end-to-end point cloud deep learning network model and an end-to-end point cloud deep learning training method.
Background
A three-dimensional image is a special form of information expression, characterized by expressing the three-dimensional data of the space concerned. Its expression forms include: depth maps (expressing the distance from the object to the camera as grey values), geometric models (built with CAD software), and point cloud models (reverse engineering equipment samples objects as point clouds). Compared with a two-dimensional image, a three-dimensional image can achieve natural object-background decoupling by means of the information of the third dimension. Point cloud data is the most common and basic three-dimensional model. A point cloud model is usually obtained directly by measurement, each point corresponding to one measurement point, without other processing, so the point cloud model contains the maximum amount of information. That information is hidden in the point cloud and must be extracted by other means; the process of extracting information from the point cloud is three-dimensional image processing.
A point cloud is a massive set of points expressing the spatial distribution and surface characteristics of a target under the same spatial reference system; once the spatial coordinates of each sampling point on the object surface are obtained, the resulting set of points is called a point cloud (Point Cloud).
Rapid and accurate positioning of identification points in point clouds has important applications in identity recognition, 3D model segmentation, 3D model retrieval and other fields; in particular, automatic positioning of identification points in 3D face point clouds is important for face recognition, expression recognition, head pose recognition, head motion estimation, dense matching of head point clouds, lip shape analysis, head surgery, disease diagnosis, and so on.
However, current technology cannot guarantee both the accuracy and the speed of the algorithm: faster algorithms have lower accuracy, and more accurate algorithms are slower, so applications with high demands on both accuracy and speed cannot be met.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an end-to-end point cloud deep learning network model, which can simultaneously position identification points on human faces with different scales, and has high positioning precision and high positioning speed.
The technical scheme of the invention is as follows: the end-to-end point cloud deep learning network model is a deep learning network structure of a Convolutional Neural Network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud step by step to obtain a series of sampling point sets, and uses a point distribution characteristic extractor to extract the point distribution characteristics of the neighborhood point clouds of the sampling points in each sampling point set step by step, the point distribution characteristics of the neighborhood point clouds of the sampling points are abstracted step by step, and the spatial receptive field is enlarged step by step;
(2) selecting a part of point sets from the sampling point sets, calling all the sampling points in the sampling point sets as monitoring points, and positioning the identification points by using the monitoring points;
(3) and predicting the probability of each monitoring point in the neighborhood of different identification points and the offset of each monitoring point with different identification points.
The invention uses a point distribution feature extractor to extract the neighborhood point cloud distribution features of the sampling points; the neighborhood point distribution features are abstracted step by step and the spatial receptive field is enlarged step by step, thereby expressing the distribution features of points over different spatial ranges. The network uses an end-to-end training mechanism, so it can achieve high positioning accuracy; and because the time cost of the algorithm is only the forward propagation of the point cloud through the network, and the network is of lightweight design, the algorithm is fast and stable.
The method matches each monitoring point with a plurality of identification points, matches the identification points with the monitoring point as long as the monitoring point is adjacent to one identification point, predicts the position of the identification point matched with the monitoring point by using the characteristics of each monitoring point, and converts the positioning problem of the identification point in the point cloud into a multi-label prediction and regression problem.
Drawings
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
FIG. 2 is a diagram illustrating a simple matching result between a monitoring point and a target identification point.
FIG. 3 is a flow chart of an end-to-end point cloud deep learning network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to make the description of the present disclosure more complete and complete, the following description is given for illustrative purposes with respect to the embodiments and examples of the present invention; it is not intended to be the only form in which the embodiments of the invention may be practiced or utilized. The embodiments are intended to cover the features of the various embodiments as well as the method steps and sequences for constructing and operating the embodiments. However, other embodiments may be utilized to achieve the same or equivalent functions and step sequences.
As shown in fig. 3, the end-to-end point cloud deep learning network model is a deep learning network structure of a convolutional neural network CNN, and includes the following steps:
(1) the network down-samples the input point cloud step by step to obtain a series of sampling point sets, and uses a point distribution characteristic extractor to extract the point distribution characteristics of the neighborhood point clouds of the sampling points in each sampling point set step by step, the neighborhood point cloud point distribution characteristics of the sampling points are abstracted step by step, and the spatial receptive field is enlarged step by step;
(2) selecting a part of point sets from the sampling point sets, and using all the sampling points in the sampling point sets as monitoring points to position the identification points;
(3) and predicting the probability of each monitoring point in the neighborhood of different identification points and the offset of each monitoring point with different identification points.
The invention uses a point distribution feature extractor to extract the neighborhood point cloud distribution features of the sampling points; the neighborhood point distribution features are abstracted step by step and the spatial receptive field is enlarged step by step, thereby expressing the distribution features of points over different spatial ranges. The network uses an end-to-end training mechanism, so it can achieve high positioning accuracy; and because the time cost of the algorithm is only the forward propagation of the point cloud through the network, and the network is of lightweight design, the algorithm is fast and stable.
Preferably, in the step (1), any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud P_0 with point cloud density D; according to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is down-sampled step by step to obtain the sampling point sets {P_1, P_2, …, P_n};
Starting from the first sampling point set P_1, a feature abstraction operation is used to extract, step by step, the abstract features of the sampling points in {P_1, P_2, …, P_n}. The feature abstraction operation computes the features of point set P_i from point set P_{i-1}: for the k-th sampling point p_k^i in the sampling point set P_i, the neighborhood subset N_k^i lying inside the sphere of radius r_i centered at p_k^i is found in the sampling point set P_{i-1}; the point distribution feature extractor is applied to the n_i points of N_k^i and their feature vectors to obtain the abstract feature vector f_k^i of p_k^i, where n_i is positively correlated with the point cloud density D. The features f_k^i of all sampling points in the sampling point set P_i form the abstract feature set F_i of P_i; the feature sets {F_1, F_2, …, F_n} of the sampling point sets {P_1, P_2, …, P_n} are built with step-by-step enlarged spatial receptive fields and step-by-step abstraction. Finally, applying the point distribution feature extractor to all points of P_n produces a feature vector that expresses global features.
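The sphere-neighborhood gathering used by the feature abstraction operation can be sketched as follows. This is a minimal brute-force illustration under our own naming (`ball_query`); it is not code from the patent:

```python
import math

def ball_query(center, points, radius, max_points):
    """Collect up to max_points indices of `points` lying within `radius` of `center`."""
    idx = []
    for i, p in enumerate(points):
        if math.dist(center, p) <= radius:  # 3D Euclidean distance
            idx.append(i)
            if len(idx) == max_points:
                break
    return idx

# toy upper-level point set and one sampling point at the origin
prev_level = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (0.5, 0.5, 0.0)]
print(ball_query((0.0, 0.0, 0.0), prev_level, radius=2.0, max_points=8))
```

In practice such queries are accelerated with spatial indexing (k-d trees or voxel hashing); the linear scan above only shows the selection rule.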
Then, starting from the last sampling point set P_n, the propagation features of the sampling points in {P_n, P_{n-1}, …, P_1} are obtained step by step; the propagation features of all sampling points in each set form its propagation feature set. The feature propagation operation computes the propagation features of point set P_i from point set P_{i+1}: for the k-th sampling point p_k^i in each sampling point set P_i, the abstract features of the 3 points in P_{i+1} nearest to p_k^i are averaged, weighted by the inverse of their distances to p_k^i; the weighted-average result is concatenated with the abstract feature f_k^i of p_k^i, and several multi-layer perceptrons (MLP) and nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature of p_k^i. The propagation features of all sampling points in P_i form the propagation feature set of P_i. Since the stage after the sampling point set P_n is a single feature vector, that vector is taken as the weighted-average result and concatenated with the abstract feature of each point in P_n to obtain the propagation features of the sampling points in P_n.
Preferably, in step (1), the Voxel Grid filter first voxelizes the space, and the barycenter of the points located in each voxel constitutes the output point cloud.
Preferably, in the step (2), from the sampling point sets {P_1, P_2, …, P_n}, several point sets are selected, called monitoring point sets MPS, and all sampling points in them are called monitoring points; for the k-th monitoring point p_k^i of the i-th monitoring point set P_i, its abstract feature and its propagation feature are each batch-normalized and then concatenated, and the concatenation is taken as the feature of the monitoring point; the feature of each monitoring point reflects the point distribution characteristics in its neighborhood, the features of monitoring points in different areas are discriminative, the neighborhood of which target identification point the monitoring point belongs to is judged from the feature of each monitoring point, and the position of the adjacent target identification point is predicted.
Preferably, in the step (3), if the number of target identification points is L, then for the k-th monitoring point p_k^i of the i-th monitoring point set P_i, 1 single fully-connected layer with output dimension L is applied to its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offset (Δx, Δy, Δz) of the monitoring point from each identification point; the j-th such layer predicts the offset of the monitoring point from the j-th identification point.
Preferably, in the step (3), the parameters of these fully-connected layers are shared within each sampling point set.
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
The network consists of a number of feature abstraction operations and feature propagation operations. Any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud P_0 with point cloud density D. The Voxel Grid filter first voxelizes the space, and the barycenter of the points located within each voxel constitutes the output point cloud. According to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is down-sampled step by step to obtain the sampling point sets {P_1, P_2, …, P_n}. Starting from the first sampling point set P_1, a feature abstraction operation is used to extract, step by step, the abstract features of the sampling points in {P_1, P_2, …, P_n}. The feature abstraction operation computes the features of point set P_i from point set P_{i-1}: for the k-th sampling point p_k^i in the sampling point set P_i, the neighborhood subset N_k^i lying inside the sphere of radius r_i centered at p_k^i is found in the sampling point set P_{i-1}; a point distribution feature extractor (e.g. PointNet, RS-CNN, etc.) is applied to the n_i points of N_k^i and their feature vectors to obtain the abstract feature vector f_k^i of p_k^i, where n_i is positively correlated with the point cloud density D. The features of all sampling points in each sampling point set P_i form its abstract feature set F_i; the feature sets {F_1, F_2, …, F_n} of {P_1, P_2, …, P_n} are built with step-by-step enlarged spatial receptive fields and step-by-step abstraction. Finally, applying the point distribution feature extractor to all points of P_n produces a feature vector that expresses global features.
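The Voxel Grid down-sampling step described above (voxelize the space, then emit the barycenter of the points in each voxel) can be sketched as below. This is a minimal pure-Python sketch with our own function name, not the patent's implementation:

```python
import math

def voxel_grid_filter(points, voxel_size):
    """Voxelize space and replace the points inside each voxel by their barycenter."""
    voxels = {}
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels.setdefault(key, []).append((x, y, z))
    # barycenter of each voxel's points
    return [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in voxels.values()]

# two nearby points fall into the same 5 mm voxel and merge; the far point survives
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (9.0, 9.0, 9.0)]
print(voxel_grid_filter(cloud, voxel_size=5.0))
```

The resulting point cloud has an approximately uniform density controlled by `voxel_size`, which is what the patent's density parameter D refers to.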
Then, starting from the last sampling point set P_n, the propagation features of the sampling points in {P_n, P_{n-1}, …, P_1} are obtained step by step, and the propagation features of all sampling points in each set form its propagation feature set. The feature propagation operation computes the propagation features of point set P_i from point set P_{i+1}: for the k-th sampling point p_k^i in each sampling point set P_i, the abstract features of the 3 points in P_{i+1} nearest to p_k^i are averaged, weighted by the inverse of their distances to p_k^i; the weighted-average result is concatenated with the abstract feature f_k^i of p_k^i, and several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature of p_k^i. Since the stage after the sampling point set P_n is a single feature vector, that feature vector is taken as the weighted-average result and concatenated with the abstract feature of each point in P_n; the propagation features of the sampling points in P_n are then obtained through several multi-layer perceptrons (MLP) and nonlinear activation functions (ReLU).
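The interpolation at the heart of the feature propagation operation — an inverse-distance weighted average of the 3 nearest coarser-level features, concatenated with the point's own abstract feature — can be sketched as follows. The MLP/ReLU stage is omitted, and the names are ours:

```python
import math

def propagate_feature(query, coarse_pts, coarse_feats, own_feat, k=3, eps=1e-8):
    """Inverse-distance weighted average of the k nearest coarse-level features,
    concatenated with the query point's own abstract feature."""
    nearest = sorted((math.dist(query, p), j) for j, p in enumerate(coarse_pts))[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]   # inverse distance as weight
    total = sum(weights)
    dim = len(coarse_feats[0])
    interp = [sum(w * coarse_feats[j][c] for w, (_, j) in zip(weights, nearest)) / total
              for c in range(dim)]
    return interp + list(own_feat)   # this concatenation then goes through the MLPs

coarse_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
coarse_feats = [[1.0], [2.0], [3.0]]
feat = propagate_feature((0.0, 0.0, 0.0), coarse_pts, coarse_feats, own_feat=[5.0])
print(feat)  # interpolated value is dominated by the coincident point's feature
```

When the query point coincides with a coarse point, its weight 1/(0+eps) dominates and the interpolated feature collapses to that point's feature, which is the intended behavior.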
From the sampling point sets {P_1, P_2, …, P_n}, several point sets are selected; these are called Monitoring Point Sets (MPS), and all sampling points in them are called monitoring points. For the k-th monitoring point p_k^i of the i-th monitoring point set P_i, its abstract feature and its propagation feature are each batch-normalized and then concatenated, and the concatenation is taken as the feature of the monitoring point. Because the feature of each monitoring point reflects the point distribution characteristics in its neighborhood, the features of monitoring points in different areas are discriminative: from the feature of each monitoring point it can be judged in the neighborhood of which target identification point the monitoring point lies, and the position of the adjacent target identification point can be predicted.
If the number of target identification points is L, then for the k-th monitoring point p_k^i of the i-th monitoring point set P_i, 1 single fully-connected layer with output dimension L is applied to its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offset (Δx, Δy, Δz) of the monitoring point from each identification point; different layers (for example, the j-th) predict the offset of the monitoring point from different identification points (for example, the j-th identification point). The parameters of these fully-connected layers are shared within each sampling point set.
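The two prediction heads — one fully-connected layer of output dimension L for neighborhood probabilities, and L fully-connected layers of output dimension 3 for offsets — can be sketched with toy sizes and random weights (everything here is illustrative; the real network learns these weights during training):

```python
import math, random

def fc_layer(weights, bias, feat):
    """Single fully connected layer: out[j] = sum_k weights[j][k] * feat[k] + bias[j]."""
    return [sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

L_pts, d = 4, 8                     # L identification points, feature dimension d (toy)
rng = random.Random(0)
feat = [rng.gauss(0.0, 1.0) for _ in range(d)]   # one monitoring point's feature

# one FC layer of output dimension L -> neighborhood probabilities (via sigmoid)
W_cls = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(L_pts)]
probs = sigmoid(fc_layer(W_cls, [0.0] * L_pts, feat))

# L FC layers of output dimension 3 -> one (dx, dy, dz) offset per identification point
offsets = [fc_layer([[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(3)],
                    [0.0] * 3, feat) for _ in range(L_pts)]
print(len(probs), len(offsets), len(offsets[0]))
```

The sketch makes the output shapes concrete: L probabilities and L three-dimensional offsets per monitoring point, matching the multi-label prediction and regression formulation.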
Features with larger spatial receptive fields express the distribution of points over a larger spatial range and can be used to locate identification points on larger-scale faces, and vice versa. Using several monitoring point sets with different spatial receptive fields enables the network to locate identification points on faces of different scales simultaneously. Because the relative topology of the identification points and their positions relative to the feature areas of the human face are relatively fixed, global information helps to locate the identification points; since the propagation features of the points contain global information, the propagation features of the monitoring points are incorporated into the monitoring point features in addition to their abstract features, which improves the positioning stability of the network.
The method matches each monitoring point with a plurality of identification points, matches the identification points with the monitoring point as long as the monitoring point is adjacent to one identification point, predicts the position of the identification point matched with the monitoring point by using the characteristics of each monitoring point, and converts the positioning problem of the identification point in the point cloud into a multi-label prediction and regression problem.
Preferably, when the network is used for positioning the identification points in the point set with multiple scales, the identification points in the point set with a specific scale are matched with the monitoring points with the corresponding spatial receptive fields, and a series of boxes, namely a target box TBX and a detection box MBX, are arranged by taking the gold standard identification points and the monitoring points as centers respectively.
Preferably, the side lengths (l_x^t, l_y^t, l_z^t) of TBX are set from the gold standard of the training data according to equation (1) (the formula is not reproduced in this text), where the reference landmarks are the left outer canthus, the right outer canthus, the glabella, and the chin base. According to the radius r_i of the sphere used at each monitoring point to gather the neighborhood subset from the upper-level point set, the side lengths (l_x^m, l_y^m, l_z^m) of MBX are set as in equation (2):

l_x^m = l_y^m = l_z^m = 2·r_i    (2)

If the Jaccard value of the TBX of the j-th gold standard identification point and the MBX of a monitoring point exceeds the threshold th_m, they are matched according to equation (3).
preferably, all parameters of the network are trained simultaneously using the loss functions of equation (4), including classification loss functions and regression loss functions
loss=lossc+λlossr(4)
The classification loss function is formula (5)
Wherein i and k are indexes of the monitoring point set and indexes of monitoring points in the monitoring point set respectively;
lossi,kas a monitoring pointThe loss of classification of (a) is,is acted on using sigmoid functionMonitoring point of network prediction obtained by j-th dimension calculation of outputThe probability in the adjacent area of the jth gold standard identification point defines the monitoring point matched with at least one gold standard identification point as a positive sample, the monitoring point not matched with any gold standard identification point as a negative sample, NpIs the number of positive samples, NeThe number of negative samples;
according to lossi,kSorting the negative samples and selecting the lossi,kThe largest first few negative samples calculate the classification loss and ensure that the number of negative samples participating in the calculation is not more than three times the number of positive samples.
The regression loss function is formula (6):
is a monitoring point predicted by the networkOffset from the jth target mark point ofAn output of (d);corresponding gold standard.
Fig. 2 is a schematic diagram of a simple matching result between the monitoring point and the target identification point. The training method is described in detail below.
In the network training stage, the monitoring points need to be matched with the gold standard in the training data, and the network is trained according to the matching result.
To solve the above problems, a multi-label matching strategy (MLM) is proposed: each monitoring point is matched with several identification points; an identification point is matched with a monitoring point whenever the monitoring point is adjacent to it; the features of each monitoring point are used to predict the positions of the identification points matched with it; and the problem of locating identification points in a point cloud is thereby converted into a multi-label prediction and regression problem.
When the network is used to locate identification points in point sets of multiple scales, identification points in a point set of a specific scale must be matched with monitoring points whose spatial receptive fields are of the corresponding size; therefore, a series of boxes, a Target Box (TBX) and a detection box (MBX), are arranged centered on the gold standard identification points and the monitoring points, respectively. As shown in fig. 2, two solid black dots and two bold-line boxes represent two target identification points and their TBXs, respectively; three slash-filled black dots and three thin-line boxes are three monitoring points and their MBXs, respectively.
In order to make the size of TBX reflect the size of the face in the training data, the side lengths (l_x^t, l_y^t, l_z^t) of TBX are set from the gold standard of the training data (equation (1), not reproduced in this text), where the reference landmarks are the left outer canthus, the right outer canthus, the glabella, and the tip of the chin. According to the radius r_i of the sphere used at each monitoring point to gather the neighborhood subset from the upper-level point set, the side lengths (l_x^m, l_y^m, l_z^m) of MBX are set as:

l_x^m = l_y^m = l_z^m = 2·r_i
if the TBX and the monitoring point of the jth golden standard identification pointIs/are as followsHas a jaccard value exceeding a threshold value thmThen match them
Loss function: all parameters of the network are trained synchronously using a loss function comprising a classification loss and a regression loss:

loss = loss_c + λ·loss_r
The classification loss function is as follows (formula (5), not reproduced in this text), where i and k are the index of the monitoring point set and the index of the monitoring point within it, respectively; loss_{i,k} is the classification loss of monitoring point p_k^i; the network-predicted probability that p_k^i lies in the neighborhood of the j-th gold standard identification point is obtained by applying the sigmoid function to the j-th dimension of the output. The matching result of equation (3) defines monitoring points matched with at least one gold standard identification point as positive samples and monitoring points matched with none as negative samples; N_p is the number of positive samples and N_e the number of negative samples. Since the number of negative samples is much larger than the number of positive samples, the negative samples are sorted by loss_{i,k} and only the negatives with the largest loss_{i,k} are used to calculate the classification loss, their number being kept to at most three times the number of positive samples.
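The hard-negative selection described above (sort negatives by loss, cap their count at three times the number of positives) can be sketched as (function name ours):

```python
def select_hard_negatives(neg_losses, num_positives, ratio=3):
    """Sort negatives by classification loss and keep at most ratio * positives of them."""
    keep = min(len(neg_losses), ratio * num_positives)
    return sorted(neg_losses, reverse=True)[:keep]

# seven negatives, two positives: at most 3 * 2 = 6 hardest negatives survive
neg = [0.1, 0.9, 0.05, 0.7, 0.3, 0.2, 0.8]
print(select_hard_negatives(neg, num_positives=2))
```

This keeps the positive/negative contribution to the classification loss balanced, which is the standard motivation for online hard negative mining.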
The regression loss function is defined as follows:
The matching is obtained by formula (3); the predicted offset (Δx, Δy, Δz) of the monitoring point from the jth target identification point is the output of the corresponding fully-connected layer, and the corresponding gold standard offset serves as the regression target.
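A sketch of such a masked offset-regression loss is shown below. The smooth-L1 penalty is an assumption for illustration (formula (6) itself is not reproduced here); restricting the loss to the pairs matched by formula (3) follows the description.

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1 penalty (assumed form; the patent's formula (6) may differ)."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x * x, ax - 0.5)

def regression_loss(pred_offsets, gold_offsets, match_mask):
    """Average regression penalty over matched (positive) monitoring-point/landmark pairs.
    pred_offsets, gold_offsets: (N, L, 3); match_mask: (N, L) boolean from formula (3)."""
    per_pair = smooth_l1(pred_offsets - gold_offsets).sum(axis=-1)  # (N, L)
    n_matched = max(match_mask.sum(), 1)
    return (per_pair * match_mask).sum() / n_matched

# Tiny example: one monitoring point, one landmark, matched.
pred = np.zeros((1, 1, 3))
gold = np.array([[[0.5, 0.0, 0.0]]])
mask = np.ones((1, 1), dtype=bool)
loss_r = regression_loss(pred, gold, mask)
```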
In more detail, RS-Conv is used as the point distribution feature extractor in the network, with the 3D Euclidean distance and coordinate difference (3D-Ed, x_i − x_j) as the low-level distribution relation information h of the point cloud. The network comprises 8 feature abstraction operations and feature propagation operations in total. The sampling ratios {τ_1, τ_2, …, τ_7} are {7/20, 8/10, 10/15, 15/20, 20/25, 25/60, 60/120}, and the sampling radii {r_1, r_2, …, r_7} used to generate the local subset of each sampling point are {8, 10, 15, 20, 25, 60, 120} (mm), respectively; the last feature abstraction operation acts on the whole point set P_7. The local point cloud subset of each sampling point is collected from the upper-level sampling point set using the farthest point collection method; the numbers of sampling points {s_1, s_2, …, s_7} in the local subsets are {75/V, 100/V, 50/V, 75/V, 200/V, 100/V}, respectively, where V is the grid size of the Voxel Grid filter used to down-sample the input point set, with V = 5 mm. In addition, λ = 1, th_m = 0.2, th_p = 0.9, th_d = 3 mm, th_e = 5 mm.
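The farthest point collection and sphere-neighborhood gathering referred to above can be sketched as follows. These are illustrative NumPy versions (function names are ours); a production implementation would use a spatial index rather than brute-force distances.

```python
import numpy as np

def farthest_point_sample(points, n_sample):
    """Greedy farthest point sampling: return indices of n_sample well-spread points."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_sample - 1):
        nxt = int(dist.argmax())          # point farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def ball_query(points, center, radius):
    """Indices of the points inside the sphere of the given radius around center."""
    return np.where(np.linalg.norm(points - center, axis=1) <= radius)[0]

pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [5.0, 5.0, 0.0]])
sel = farthest_point_sample(pts, 2)                  # two mutually far points
nbr = ball_query(pts, np.array([0.0, 0.0, 0.0]), 6.0)  # sphere neighborhood of the origin
```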
A covariance matrix cov(X) is calculated from the gold standard of the training set and used to predict missing identification points; the missing gold standard identification points in the training data are supplemented so that the matching of gold standard points to monitoring points can be completed.
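One plausible reading of this covariance-based completion step is conditional-expectation imputation under a joint Gaussian model of the landmark coordinates; the sketch below assumes that reading (the patent does not spell out the exact rule).

```python
import numpy as np

def impute_missing(x_obs, obs_idx, mis_idx, mean, cov):
    """Predict missing landmark coordinates from observed ones, assuming the
    coordinates are jointly Gaussian with the given mean and covariance
    (estimated from the gold standard of the training set).
    E[x_mis | x_obs] = mu_mis + C_mo C_oo^{-1} (x_obs - mu_obs)."""
    c_oo = cov[np.ix_(obs_idx, obs_idx)]
    c_mo = cov[np.ix_(mis_idx, obs_idx)]
    return mean[mis_idx] + c_mo @ np.linalg.solve(c_oo, x_obs - mean[obs_idx])

# Toy 1D example: two correlated coordinates, the first observed at 2.0.
mean = np.zeros(2)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
pred = impute_missing(np.array([2.0]), [0], [1], mean, cov)
```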
Data augmentation: the training data are rotated in turn around the x, y and z axes by angles randomly selected from −2.5° to +2.5°, and random jitter with mean 0 and standard deviation 0.25 mm is added to each point. Random rotation and jitter make the training data differ every time the network is trained, which stabilizes training and is therefore important.
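The described augmentation can be sketched as follows. The sequential x, y, z rotations and the angle/jitter parameters follow the text; everything else is illustrative.

```python
import numpy as np

def augment(points, rng, max_angle_deg=2.5, jitter_std=0.25):
    """Rotate around x, y, z in turn by random angles in [-2.5, +2.5] degrees,
    then add zero-mean Gaussian jitter (std 0.25 mm) to every point."""
    out = points.copy()
    for axis in range(3):
        a = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(a), np.sin(a)
        i, j = [(1, 2), (0, 2), (0, 1)][axis]   # plane rotated by this axis
        rot = np.eye(3)
        rot[i, i] = c; rot[i, j] = -s
        rot[j, i] = s; rot[j, j] = c
        out = out @ rot.T
    return out + rng.normal(0.0, jitter_std, out.shape)

rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3)) * 10.0
aug = augment(pts, rng)
```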
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.
Claims (10)
1. An end-to-end point cloud deep learning network model, characterized in that: its structure is similar to that of a convolutional neural network (CNN) deep learning network, and it comprises the following:
(1) the network down-samples the input point cloud step by step to obtain a series of sampling point sets, and uses a point distribution feature extractor to extract, step by step, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; the point distribution features are abstracted step by step and the spatial receptive field is enlarged step by step;
(2) a subset of the sampling point sets is selected; all sampling points in the selected point sets are called monitoring points, and the monitoring points are used to locate the identification points;
(3) for each monitoring point, the probability that it lies in the neighborhood of each identification point and its offset from each identification point are predicted.
2. The end-to-end point cloud deep learning network model of claim 1, wherein: in step (1), for any input point cloud P, a Voxel Grid filter is first used to down-sample P into a point cloud P_0 with point cloud density D; according to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is down-sampled step by step to obtain the sampling point sets {P_1, P_2, …, P_n};
Starting from the first sampling point set P_1, abstract features of the sampling points in {P_1, P_2, …, P_n} are extracted step by step using feature abstraction operations. A feature abstraction operation acts on point set P_{i-1} to compute the features of point set P_i: for the kth sampling point p_k^i in P_i, the neighborhood subset N_k^i inside the sphere of radius r_i centered at p_k^i is found within P_{i-1}; the point distribution feature extractor is applied to n_i points of N_k^i and their feature vectors to obtain the abstract feature vector f_k^i of p_k^i, where n_i is positively correlated with the point cloud density D. The features of all sampling points in each sampling point set P_i form its abstract feature set F_i; the feature sets {F_1, F_2, …, F_n} of {P_1, P_2, …, P_n} have progressively enlarged spatial receptive fields and are progressively more abstract. Finally, a point cloud feature extractor acts on all points of P_n to generate a feature vector expressing global features;
then, from the last layer sampling point set PnInitially, a set of sample points { P } will be obtained step by stepn,Pn-1,…,P1The propagation characteristics of all sampling points in the set will constitute a propagation characteristic setFeature propagation operations on a set of points Pi+1Computing a set of points PiFor a set of sampling points PiInner kth sample pointSet of points Pi+1Neutralization ofAbstract features of the nearest 3 points andthe reciprocal of the distance of (a) is weighted average, and the result of the weighted average is compared with the pointAbstract features ofAnd performing splicing, namely using a plurality of multi-layer perceptron M L P and nonlinear activation function Ru L u functions to act on a splicing result to obtain pointsCharacteristic of propagation ofDue to the sampling point set PnIs a feature vector, then this feature vector is taken as the weighted average result and the set of sample points PnSplicing the abstract features of each point to obtain a point set PnPropagation characteristics of each sampling point
3. The end-to-end point cloud deep learning network model of claim 2, wherein: in step (1), the Voxel Grid filter first voxelizes the space, and the centroids of the points located in each voxel form the output point cloud.
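The Voxel Grid filter of claim 3 can be sketched as follows (a NumPy illustration, with cell size v = 5 mm as in the description):

```python
import numpy as np

def voxel_grid_filter(points, v=5.0):
    """Voxel Grid down-sampling: voxelize space with cell size v (mm) and
    output the centroid of the points falling in each voxel."""
    keys = np.floor(points / v).astype(np.int64)       # voxel index of each point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    n_vox = inv.max() + 1
    sums = np.zeros((n_vox, 3))
    counts = np.zeros(n_vox)
    np.add.at(sums, inv, points)                       # accumulate per-voxel sums
    np.add.at(counts, inv, 1.0)
    return sums / counts[:, None]                      # per-voxel centroids

pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [6.0, 6.0, 6.0]])
down = voxel_grid_filter(pts, v=5.0)  # first two points share a voxel
```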
4. The end-to-end point cloud deep learning network model of claim 3, wherein: in step (2), a number of point sets are selected from the sampling point sets {P_1, P_2, …, P_n}; these are called monitoring point sets (MPS), and all sampling points in them are called monitoring points. For the kth monitoring point of the ith monitoring point set P_i, its abstract feature and propagation feature are each batch-normalized and then concatenated, and the concatenation result is taken as the feature of the monitoring point. The feature of each monitoring point reflects the point distribution in its neighborhood, and the features of monitoring points in different regions are discriminative; from the feature of each monitoring point, the network judges which identification point's neighborhood the monitoring point lies in and predicts the position of the adjacent identification point.
5. The end-to-end point cloud deep learning network model of claim 4, wherein: in step (3), if the number of target identification points is L, then for the kth monitoring point of the ith monitoring point set P_i, one single fully-connected layer with output dimension L acts on its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 (j = 0, 1, …, L−1) act on its feature to predict the offsets (Δx, Δy, Δz) of the monitoring point from each identification point; the jth such layer predicts the offset of the monitoring point from the jth identification point.
6. The end-to-end point cloud deep learning network model of claim 5, wherein: in step (3), the parameters of these fully-connected layers are shared within each sampling point set.
7. The method for training the end-to-end point cloud deep learning network model according to claim 6, wherein: each monitoring point may be matched with multiple identification points; as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point. The positions of the identification points matched with a monitoring point are predicted from the monitoring point's feature, converting the problem of locating identification points in a point cloud into a multi-label prediction and regression problem.
8. The method for training the end-to-end point cloud deep learning network model according to claim 7, wherein: when the network locates identification points using point sets of multiple scales, identification points of a particular scale are matched with monitoring points whose spatial receptive fields have the corresponding size; a series of boxes, namely target boxes (TBX) and detection boxes (MBX), are set centered on the gold standard identification points and the monitoring points, respectively.
9. The method for training the end-to-end point cloud deep learning network model according to claim 8, wherein: the side lengths (l_x^t, l_y^t, l_z^t) of the TBX are set according to the gold standard of the training data, as in formula (1):
wherein the reference landmarks used are the left external canthus, the right external canthus, the glabella, and the tip of the chin;
the side lengths (l_x^m, l_y^m, l_z^m) of the MBX of each monitoring point are set according to the radius r_i of the sphere used to collect the monitoring point's local subset from the upper-level point set, as in formula (2):
l_x^m = l_y^m = l_z^m = 2 r_i (2)
if the Jaccard value between the TBX of the jth gold standard identification point and the MBX of a monitoring point exceeds the threshold th_m, they are matched according to formula (3):
10. The method for training the end-to-end point cloud deep learning network model according to claim 9, wherein: all parameters of the network are learned simultaneously using the loss function of formula (4), which comprises a classification loss function and a regression loss function:
loss = loss_c + λ · loss_r (4)
The classification loss function is formula (5):
wherein i and k index the monitoring point sets and the monitoring points within each set, respectively; loss_{i,k} is the classification loss of the kth monitoring point of the ith set; the probability that the monitoring point lies in the neighborhood of the jth gold standard identification point is obtained by applying the sigmoid function to the jth dimension of the network output; monitoring points matched with at least one gold standard identification point are defined as positive samples, monitoring points matched with no gold standard identification point are negative samples, N_p is the number of positive samples and N_e the number of negative samples;
the negative samples are sorted by loss_{i,k}, and the classification loss is computed over the negative samples with the largest loss_{i,k}, ensuring that the number of negative samples participating in the calculation is at most three times the number of positive samples;
the regression loss function is formula (6):
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010116881.9A CN111428855B (en) | 2020-02-25 | 2020-02-25 | End-to-end point cloud deep learning network model and training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111428855A true CN111428855A (en) | 2020-07-17 |
CN111428855B CN111428855B (en) | 2023-11-14 |
Family
ID=71551571
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085123A (en) * | 2020-09-25 | 2020-12-15 | 北方民族大学 | Point cloud data classification and segmentation method based on salient point sampling |
CN116045833A (en) * | 2023-01-03 | 2023-05-02 | 中铁十九局集团有限公司 | Bridge construction deformation monitoring system based on big data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180268256A1 (en) * | 2017-03-16 | 2018-09-20 | Aquifi, Inc. | Systems and methods for keypoint detection with convolutional neural networks |
CN109544700A (en) * | 2018-10-12 | 2019-03-29 | 深圳大学 | Processing method, device and the equipment of point cloud data neural network based |
CN110197223A (en) * | 2019-05-29 | 2019-09-03 | 北方民族大学 | Point cloud data classification method based on deep learning |
CN110321910A (en) * | 2018-03-29 | 2019-10-11 | 中国科学院深圳先进技术研究院 | Feature extracting method, device and equipment towards cloud |
Non-Patent Citations (1)
Title |
---|
Shaoshuai Shi et al.: "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection", arXiv:1912.13192v1
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||