CN111428855B - End-to-end point cloud deep learning network model and training method - Google Patents

Publication number
CN111428855B
Authority
CN
China
Prior art keywords
point
points
monitoring
sampling
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116881.9A
Other languages
Chinese (zh)
Other versions
CN111428855A (en)
Inventor
杨健
范敬凡
艾丹妮
郭龙腾
王涌天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The end-to-end point cloud deep learning network model and its training method can simultaneously locate identification points (landmarks) on faces of different scales, with high positioning accuracy and high positioning speed. The network model is a deep learning network structure similar to a convolutional neural network (CNN) and operates as follows: (1) the network progressively downsamples the input point cloud to obtain a series of sampling point sets and, using a point distribution feature extractor, progressively extracts the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive field progressively larger; (2) some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points; (3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.

Description

End-to-end point cloud deep learning network model and training method
Technical Field
The invention relates to the technical field of point cloud image processing and deep learning, and in particular to an end-to-end point cloud deep learning network model and a training method for it.
Background
A three-dimensional image is a special form of information representation characterized by three-dimensional data in space. Its forms include depth maps (which encode object-to-camera distance in grayscale), geometric models (built with CAD software), and point cloud models (all reverse-engineering devices sample objects as point clouds). Compared with a two-dimensional image, a three-dimensional image can use the information of the third dimension to separate an object from its background. Point cloud data is the most common and most fundamental three-dimensional model. A point cloud model is usually obtained directly by measurement: each point corresponds to one measurement point and no other processing is applied, so the model retains the maximum amount of information. This information is implicit in the point cloud and must be extracted by other means; the process of extracting it is three-dimensional image processing.
A point cloud is a massive set of points that expresses the spatial distribution and surface characteristics of a target in a common spatial reference system; it is obtained by acquiring the spatial coordinates of each sampling point on the object surface.
Fast and accurate positioning of identification points in a point cloud is important in fields such as identity recognition, 3D model segmentation, and 3D model retrieval. In particular, automatic positioning of identification points in 3D face point clouds is important for face recognition, expression recognition, head pose recognition, head motion estimation, dense matching of head point clouds, lip shape analysis, head surgery, disease diagnosis, and the like.
However, existing techniques cannot guarantee both accuracy and speed: faster algorithms are less accurate and more accurate algorithms are slower, so applications that demand both cannot be served.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide an end-to-end point cloud deep learning network model that can simultaneously locate identification points on faces of different scales, with high positioning accuracy and high positioning speed.
The technical scheme of the invention is as follows: the end-to-end point cloud deep learning network model is a deep learning network structure similar to a convolutional neural network (CNN), and comprises the following steps:
(1) The network progressively downsamples the input point cloud to obtain a series of sampling point sets and, using a point distribution feature extractor, progressively extracts the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive field progressively larger;
(2) Some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points;
(3) For each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point. These features become progressively more abstract and their spatial receptive field progressively larger, so they can express the distribution of points over different spatial ranges. By using several monitoring point sets with different spatial receptive fields, the network can simultaneously locate identification points on faces of different scales. The network is trained end to end, which allows it to reach higher positioning accuracy. The running time of the algorithm is the time needed for one forward pass of the point cloud through the network, and the lightweight design keeps it short and stable.
The invention also provides a training method for the end-to-end point cloud deep learning network model. Each monitoring point is matched with multiple identification points: whenever a monitoring point is adjacent to an identification point, the two are matched. The features of each monitoring point are used to predict the positions of the identification points matched with it, which converts the problem of locating identification points in a point cloud into a multi-label prediction and regression problem.
Drawings
Fig. 1 is a flow chart of the structure of Landmark Net and of its application to a face point set of normal scale.
Fig. 2 is a schematic diagram of a simple matching result between monitoring points and target identification points.
Fig. 3 is a flow chart of an end-to-end point cloud deep learning network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order that the present disclosure may be more fully described and fully understood, the following description is provided by way of illustration of embodiments and specific examples of the present invention; this is not the only form of practicing or implementing the invention as embodied. The description covers the features of the embodiments and the method steps and sequences for constructing and operating the embodiments. However, other embodiments may be utilized to achieve the same or equivalent functions and sequences of steps.
As shown in fig. 3, the end-to-end point cloud deep learning network model is a deep learning network structure similar to a convolutional neural network (CNN), and comprises the following steps:
(1) The network progressively downsamples the input point cloud to obtain a series of sampling point sets and, using a point distribution feature extractor, progressively extracts the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive field progressively larger;
(2) Some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points;
(3) For each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point. These features become progressively more abstract and their spatial receptive field progressively larger, so they can express the distribution of points over different spatial ranges. By using several monitoring point sets with different spatial receptive fields, the network can simultaneously locate identification points on faces of different scales. The network is trained end to end, which allows it to reach higher positioning accuracy. The running time of the algorithm is the time needed for one forward pass of the point cloud through the network, and the lightweight design keeps it short and stable.
Preferably, in step (1), any input point cloud $P$ is first downsampled with a Voxel Grid filter into a point cloud $P_0$ of point cloud density $D$. According to fixed sampling ratios $\{\tau_1, \tau_2, \ldots, \tau_n\}$, $P_0$ is downsampled step by step to obtain the sampling point sets $\{P_1, P_2, \ldots, P_n\}$.
Starting from the first sampling point set $P_1$, feature abstraction operations extract, step by step, the abstract features of the sampling points in $\{P_1, P_2, \ldots, P_n\}$. A feature abstraction operation acts on the point set $P_{i-1}$ to compute the abstract feature of each sampling point in $P_i$: for the $k$-th sampling point of $P_i$, it finds in $P_{i-1}$ the neighborhood subset lying inside the sphere of radius $r_i$ centered at that point, and applies the point distribution feature extractor to $n_i$ points of this subset and their feature vectors, yielding the abstract feature of the sampling point. Here $n_i$ is positively correlated with the point cloud density $D$. The abstract features of all sampling points in $P_i$ form the abstract feature set $F_i$ of $P_i$. The spatial receptive fields of the feature sets $\{F_1, F_2, \ldots, F_n\}$ of the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ grow step by step, and the features become progressively more abstract. Finally, applying the point distribution feature extractor to $P_n$ produces a feature vector that expresses the global feature.
Next, starting from the last sampling point set $P_n$, the propagation features of the sampling points in $\{P_n, P_{n-1}, \ldots, P_1\}$ are obtained step by step; the propagation features of all sampling points in a set form the propagation feature set of that set. A feature propagation operation acts on the point set $P_{i+1}$ to compute the propagation feature of each sampling point in $P_i$: for the $k$-th sampling point of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to it are averaged, weighted by the reciprocal of their distance to the sampling point; the weighted average is concatenated with the abstract feature of the sampling point; and several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature of the sampling point. The propagation features of all sampling points in $P_i$ form the propagation feature set of $P_i$. Because the level following $P_n$ is the single global feature vector, that vector is used in place of the weighted average and is concatenated with the abstract feature of each point in $P_n$ to obtain the propagation feature of each sampling point in $P_n$.
Preferably, in step (1), the Voxel Grid filter first partitions the space into voxels; the centroids of the points located in each voxel form the output point cloud.
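The voxel-based downsampling described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the patented implementation: the function name `voxel_grid_filter` and the sample coordinates are assumptions introduced here, and voxels are taken to be axis-aligned cubes anchored at the origin.

```python
from collections import defaultdict
import math


def voxel_grid_filter(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid."""
    buckets = defaultdict(list)
    for x, y, z in points:
        # Integer voxel index along each axis (assumed grid anchored at the origin).
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        buckets[key].append((x, y, z))
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(p[a] for p in pts) / n for a in range(3)))
    return centroids


# Hypothetical input: two points share a 5 mm voxel, the third lies in another voxel.
cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (6.0, 0.0, 0.0)]
downsampled = voxel_grid_filter(cloud, voxel_size=5.0)
```

A larger voxel size gives a sparser output cloud, which is how the voxel size controls the point cloud density of the first level.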
Preferably, in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ and called monitoring point sets (MPS); all sampling points in them are called monitoring points. For the $k$-th monitoring point of the $i$-th monitoring point set $P_i$, its abstract feature and its propagation feature are batch-normalized separately and then concatenated, and the result is taken as the feature of the monitoring point. The feature of each monitoring point reflects the distribution of the points in its neighborhood, and the features of monitoring points in different regions are distinguishable; from the feature of each monitoring point, the network judges which target identification points' neighborhoods the monitoring point lies in and predicts the positions of those adjacent target identification points.
Preferably, in step (3), if the number of target identification points is $L$, then for the $k$-th monitoring point of the $i$-th monitoring point set $P_i$, one single-layer fully connected layer with output dimension $L$ acts on its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and $L$ single-layer fully connected layers with output dimension 3 act on its feature to predict the offset $(\Delta x, \Delta y, \Delta z)$ between the monitoring point and each identification point; the $j$-th of these layers predicts the offset between the monitoring point and the $j$-th identification point.
Preferably, in step (3), the parameters of these fully connected layers are shared within each sampling point set.
Fig. 1 is a flow chart of the structure of Landmark Net and of its application to a face point set of normal scale. A detailed description follows:
The network consists of a number of feature abstraction operations and feature propagation operations. Any input point cloud $P$ is first downsampled with a Voxel Grid filter into a point cloud $P_0$ of point cloud density $D$. The Voxel Grid filter first partitions the space into voxels; the centroids of the points within each voxel form the output point cloud. According to fixed sampling ratios $\{\tau_1, \tau_2, \ldots, \tau_n\}$, $P_0$ is downsampled step by step to obtain the sampling point sets $\{P_1, P_2, \ldots, P_n\}$. Starting from the first sampling point set $P_1$, feature abstraction operations extract, step by step, the abstract features of the sampling points in $\{P_1, P_2, \ldots, P_n\}$. A feature abstraction operation acts on the point set $P_{i-1}$ to compute the abstract feature of each sampling point in $P_i$: for the $k$-th sampling point of $P_i$, it finds in $P_{i-1}$ the neighborhood subset lying inside the sphere of radius $r_i$ centered at that point, and applies a point distribution feature extractor (e.g., PointNet or RS-CNN) to $n_i$ points of this subset and their feature vectors, yielding the abstract feature of the sampling point. Here $n_i$ is positively correlated with the point cloud density $D$. The abstract features of all sampling points in each sampling point set $P_i$ form its abstract feature set $F_i$. The spatial receptive fields of the feature sets $\{F_1, F_2, \ldots, F_n\}$ of the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ grow step by step, and the features become progressively more abstract. Finally, applying the point distribution feature extractor to $P_n$ produces a feature vector that expresses the global feature.
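One feature abstraction step can be illustrated as follows. This is a hedged sketch: the source names PointNet and RS-CNN as possible extractors, whereas the code below substitutes a much simpler stand-in (a channel-wise maximum over relative coordinates and incoming features). The function names `ball_query` and `abstract_features` and all sample values are assumptions made for the example.

```python
import math


def ball_query(center, points, radius, max_points):
    """Indices of at most max_points points within `radius` of `center`."""
    idx = [j for j, p in enumerate(points) if math.dist(center, p) <= radius]
    return idx[:max_points]


def abstract_features(samples, prev_points, prev_feats, radius, max_points):
    """Toy feature abstraction: max-pool [relative xyz, previous feature] per neighborhood.

    The max-pool is a simplified stand-in for the learned extractor; it assumes
    every sample has at least one neighbor (true when samples come from prev_points).
    """
    features = []
    for c in samples:
        idx = ball_query(c, prev_points, radius, max_points)
        rows = [[prev_points[j][a] - c[a] for a in range(3)] + list(prev_feats[j])
                for j in idx]
        features.append([max(col) for col in zip(*rows)])
    return features


# Hypothetical previous level P_{i-1} with one scalar feature per point.
prev_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 0.0, 0.0)]
prev_feats = [[1.0], [2.0], [3.0], [4.0]]
samples = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # current level P_i
feats = abstract_features(samples, prev_points, prev_feats, radius=2.5, max_points=8)
```

Because each level queries the previous level with a larger radius, stacking such steps enlarges the spatial receptive field of the resulting features.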
Next, starting from the last sampling point set $P_n$, the propagation features of the sampling points in $\{P_n, P_{n-1}, \ldots, P_1\}$ are obtained step by step; the propagation features of all sampling points in a set form the propagation feature set of that set. A feature propagation operation acts on the point set $P_{i+1}$ to compute the propagation feature of each sampling point in $P_i$: for the $k$-th sampling point of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to it are averaged, weighted by the reciprocal of their distance to the sampling point; the weighted average is concatenated with the abstract feature of the sampling point; and several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature of the sampling point. Because the level following $P_n$ is the single global feature vector, that vector is used in place of the weighted average and is concatenated with the abstract feature of each point in $P_n$; several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) then yield the propagation feature of each sampling point in $P_n$.
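The interpolation at the heart of feature propagation can be sketched as inverse-distance weighting over the 3 nearest coarser points, followed by concatenation. The learned MLP and ReLU stages are omitted, and the names `interpolate_feature` and `propagate`, the `eps` guard against division by zero, and the sample values are assumptions introduced for this illustration.

```python
import math


def interpolate_feature(point, coarse_points, coarse_feats, k=3, eps=1e-8):
    """Average the features of the k nearest coarse points, weighted by 1/distance."""
    order = sorted(range(len(coarse_points)),
                   key=lambda j: math.dist(point, coarse_points[j]))[:k]
    weights = [1.0 / (math.dist(point, coarse_points[j]) + eps) for j in order]
    total = sum(weights)
    dim = len(coarse_feats[0])
    return [sum(w * coarse_feats[j][d] for w, j in zip(weights, order)) / total
            for d in range(dim)]


def propagate(point, own_abstract_feat, coarse_points, coarse_feats):
    """Concatenate the interpolated coarse feature with the point's own abstract feature."""
    return interpolate_feature(point, coarse_points, coarse_feats) + list(own_abstract_feat)


# Hypothetical coarser level P_{i+1}: three equidistant points and one far away.
coarse_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
coarse_feats = [[1.0], [2.0], [3.0], [100.0]]
out = propagate((0.0, 0.0, 0.0), [7.0], coarse_points, coarse_feats)
```

In this example the three nearest points are equidistant, so the interpolated value is their plain mean, and the distant point does not contribute.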
Several point sets are selected from the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ and called monitoring point sets (MPS); all sampling points in them are called monitoring points. For the $k$-th monitoring point of the $i$-th monitoring point set $P_i$, its abstract feature and its propagation feature are batch-normalized separately and then concatenated, and the result is taken as the feature of the monitoring point. Because the feature of each monitoring point reflects the distribution of the points in its neighborhood, and the features of monitoring points in different regions are distinguishable, the feature of each monitoring point can be used to judge which target identification points' neighborhoods the monitoring point lies in and to predict the positions of those adjacent target identification points.
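The construction of a monitoring-point feature (normalize the abstract and propagation features separately, then concatenate) might look like the sketch below. It assumes training-mode batch normalization across the monitoring points of one set, without the learned scale and shift or running statistics that a real layer would carry; `batch_norm`, `monitoring_features`, and the sample values are hypothetical.

```python
import math


def batch_norm(rows, eps=1e-5):
    """Normalize each feature channel to zero mean and unit variance over the batch."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / len(c) for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(s + eps) for v, m, s in zip(row, means, variances)]
            for row in rows]


def monitoring_features(abstract_feats, propagation_feats):
    """Normalize both feature sets separately, then concatenate them per point."""
    return [a + p for a, p in zip(batch_norm(abstract_feats),
                                  batch_norm(propagation_feats))]


# Hypothetical features for two monitoring points of one monitoring point set.
abstract_feats = [[1.0], [3.0]]
propagation_feats = [[10.0, 0.0], [20.0, 0.0]]
features = monitoring_features(abstract_feats, propagation_feats)
```

Normalizing the two parts separately keeps either one from dominating the concatenated vector merely because of its numeric range.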
If the number of target identification points is $L$, then for the $k$-th monitoring point of the $i$-th monitoring point set $P_i$, one single-layer fully connected layer with output dimension $L$ acts on its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and $L$ single-layer fully connected layers with output dimension 3 act on its feature to predict the offset $(\Delta x, \Delta y, \Delta z)$ between the monitoring point and each identification point. Different layers predict the offsets to different identification points (for example, the $j$-th layer predicts the offset to the $j$-th identification point). The parameters of these fully connected layers are shared within each sampling point set.
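The two kinds of prediction heads can be sketched with hand-set weights. This is an illustration under assumptions: the weights are arbitrary rather than trained, a candidate position is taken to be the monitoring point plus the predicted offset (the source does not state the sign convention), and `linear`, `sigmoid`, and `predict` are names introduced here.

```python
import math


def linear(weight, bias, x):
    """Single fully connected layer: weight is a list of rows, one per output."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict(feature, point, cls_w, cls_b, reg_ws, reg_bs):
    """One L-output classification head and L three-output regression heads."""
    probs = [sigmoid(t) for t in linear(cls_w, cls_b, feature)]
    offsets = [linear(w, b, feature) for w, b in zip(reg_ws, reg_bs)]
    # Assumed convention: identification point = monitoring point + offset.
    candidates = [[p + d for p, d in zip(point, off)] for off in offsets]
    return probs, offsets, candidates


# Hypothetical setup with L = 2 identification points and a 2-dimensional feature.
feature = [1.0, 2.0]
point = (10.0, 20.0, 30.0)
cls_w, cls_b = [[1.0, -0.5], [0.0, 0.0]], [0.0, 0.0]
reg_ws = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]
reg_bs = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
probs, offsets, candidates = predict(feature, point, cls_w, cls_b, reg_ws, reg_bs)
```

Using an independent sigmoid per identification point, rather than a softmax over all of them, is what allows one monitoring point to belong to several neighborhoods at once.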
Features with a larger spatial receptive field express the distribution of points over a larger spatial range and can be used to locate identification points on faces of larger scale, and vice versa. Using several monitoring point sets with different spatial receptive fields therefore lets the network locate identification points on faces of different scales at the same time. Because the relative topology of the identification points, and their positions relative to the characteristic regions of the face, are relatively fixed, global information helps to locate the identification points. The propagation features of the points contain this global information, so the propagation feature of each monitoring point is combined with its abstract feature to form the feature of the monitoring point, which improves the positioning stability of the network.
The invention also provides a training method for the end-to-end point cloud deep learning network model. Each monitoring point is matched with multiple identification points: whenever a monitoring point is adjacent to an identification point, the two are matched. The features of each monitoring point are used to predict the positions of the identification points matched with it, which converts the problem of locating identification points in a point cloud into a multi-label prediction and regression problem.
Preferably, when the network is used to locate identification points in point sets of multiple scales, the identification points in a point set of a given scale are matched with the monitoring points whose spatial receptive field has the corresponding size. To this end, a series of boxes is set up, centered on the gold-standard identification points and on the monitoring points and called target boxes (TBX) and detection boxes (MBX), respectively.
Preferably, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold-standard identification points of the training data, according to formula (1):
where the symbols in formula (1) denote the gold-standard positions of the left outer eye corner, the right outer eye corner, the eyebrow point and the chin, respectively. The side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX of each monitoring point are set from the radius $r_i$ of the sphere used to generate the neighborhood subset of that monitoring point from the upper-level point set, according to formula (2):
$l_x^m = l_y^m = l_z^m = 2 r_i$ (2)
If the overlap between the TBX of the $j$-th gold-standard identification point and the MBX of a monitoring point is greater than a threshold $th_m$, the two are matched according to formula (3):
Preferably, all parameters of the network are trained simultaneously using the loss function of formula (4), which comprises a classification loss and a regression loss:
$loss = loss_c + \lambda \, loss_r$ (4)
The classification loss function is formula (5):
where $i$ and $k$ are the index of the monitoring point set and the index of the monitoring point within that set, respectively;
$loss_{i,k}$ is the classification loss of the $k$-th monitoring point of the $i$-th monitoring point set; the probability, predicted by the network, that this monitoring point lies in the neighborhood of the $j$-th gold-standard identification point is computed by applying the sigmoid function to the $j$-th dimension of the output of the classification layer. A monitoring point matched with at least one gold-standard identification point is defined as a positive sample, and a monitoring point matched with no gold-standard identification point is defined as a negative sample; $N_p$ is the number of positive samples and $N_e$ the number of negative samples;
the negative samples are sorted by $loss_{i,k}$, and only those with the largest $loss_{i,k}$ are used to compute the classification loss, such that the number of negative samples taking part in the computation is no more than three times the number of positive samples.
The regression loss function is formula (6):
where the offset between the monitoring point and the $j$-th target identification point, as predicted by the network, is the output of the $j$-th regression layer, and its target is the corresponding gold standard.
Fig. 2 is a schematic diagram of a simple matching result of a monitoring point and a target identification point. The training method is described in detail below.
In the network training stage, the monitoring points must be matched with the gold standard in the training data, and the network is trained according to the matching result.
To solve these two problems, a multi-label matching strategy (MLM) is proposed. Each monitoring point is matched with multiple identification points: whenever a monitoring point is adjacent to an identification point, the two are matched. The features of each monitoring point are used to predict the positions of the identification points matched with it, which converts the problem of locating identification points in a point cloud into a multi-label prediction and regression problem.
When the network is used to locate identification points in point sets of multiple scales, the identification points in a point set of a given scale must be matched with the monitoring points whose spatial receptive field has the corresponding size. To this end, a series of boxes is set up, centered on the gold-standard identification points and on the monitoring points and called target boxes (TBX) and detection boxes (MBX), respectively. As shown in fig. 2, the two solid black dots and the two bold-line boxes represent two target identification points and their TBX. The three diagonally filled dots and the three thin-line boxes represent three monitoring points and their MBX.
So that the size of the TBX reflects the scale of the faces in the training data, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold-standard identification points of the training data as follows:
where the symbols denote the gold-standard positions of the left outer eye corner, the right outer eye corner, the eyebrow point and the chin tip, respectively.
The side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX of each monitoring point are set from the radius $r_i$ of the sphere used to generate the neighborhood subset of that monitoring point from the upper-level point set, as follows:
$l_x^m = l_y^m = l_z^m = 2 r_i$
If the overlap between the TBX of the $j$-th gold-standard identification point and the MBX of a monitoring point is greater than a threshold $th_m$, the two are matched.
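The box-based multi-label matching can be sketched as below. The quantity compared with the threshold is not legible in this text, so the sketch assumes it is the intersection-over-union (IoU) of two axis-aligned boxes; that choice, the function names `box_iou` and `match`, and the sample coordinates are assumptions. The MBX side length of twice the radius and the default threshold of 0.2 follow the values given in the description.

```python
def box_iou(center_a, sides_a, center_b, sides_b):
    """Intersection-over-union of two axis-aligned 3D boxes (assumed criterion)."""
    inter = 1.0
    for ca, sa, cb, sb in zip(center_a, sides_a, center_b, sides_b):
        lo = max(ca - sa / 2, cb - sb / 2)
        hi = min(ca + sa / 2, cb + sb / 2)
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = sides_a[0] * sides_a[1] * sides_a[2]
    vol_b = sides_b[0] * sides_b[1] * sides_b[2]
    return inter / (vol_a + vol_b - inter)


def match(monitor_points, radius, landmarks, tbx_sides, th_m=0.2):
    """Return a 0/1 label per (monitoring point, identification point) pair."""
    mbx_sides = (2 * radius,) * 3  # MBX side length is twice the sphere radius
    return [[1 if box_iou(l, tbx_sides, m, mbx_sides) > th_m else 0 for l in landmarks]
            for m in monitor_points]


# Hypothetical data: two nearby identification points and three monitoring points.
landmarks = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
monitor_points = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
labels = match(monitor_points, radius=10.0, landmarks=landmarks,
               tbx_sides=(20.0, 20.0, 20.0))
```

The first two monitoring points each receive two positive labels, which is the multi-label case, while the distant third point is a negative sample.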
Loss function: all parameters of the network are trained simultaneously using the following loss function, which comprises a classification loss and a regression loss.
$loss = loss_c + \lambda \, loss_r$
The classification loss function is as follows:
where $i$ and $k$ are the index of the monitoring point set and the index of the monitoring point within that set, respectively; $loss_{i,k}$ is the classification loss of the $k$-th monitoring point of the $i$-th monitoring point set; the probability, predicted by the network, that this monitoring point lies in the neighborhood of the $j$-th gold-standard identification point is computed by applying the sigmoid function to the $j$-th dimension of the output of the classification layer, and the corresponding matching result is obtained from formula (3). A monitoring point matched with at least one gold-standard identification point is defined as a positive sample, and a monitoring point matched with no gold-standard identification point is defined as a negative sample; $N_p$ is the number of positive samples and $N_e$ the number of negative samples.
Since the number of negative samples is much larger than the number of positive samples, the negative samples are sorted by $loss_{i,k}$ and only those with the largest $loss_{i,k}$ are used to compute the classification loss, such that the number of negative samples taking part in the computation is no more than three times the number of positive samples.
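The selection of hard negative samples can be sketched as follows. The per-point loss is assumed here to be a binary cross-entropy summed over the identification points, since the formula itself is not reproduced in this text; `bce`, `select_hard_negatives`, and the sample losses are hypothetical. Only the sorting rule and the three-to-one cap come from the description.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy summed over identification points (assumed loss form)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def select_hard_negatives(losses, is_positive, ratio=3):
    """Keep all positives and the highest-loss negatives, at most ratio * positives."""
    positives = [i for i, flag in enumerate(is_positive) if flag]
    negatives = [i for i, flag in enumerate(is_positive) if not flag]
    negatives.sort(key=lambda i: losses[i], reverse=True)
    return positives, negatives[: ratio * len(positives)]


# Hypothetical per-monitoring-point losses: one positive and five negatives.
losses = [0.9, 0.1, 0.8, 0.05, 0.7, 0.6]
is_positive = [True, False, False, False, False, False]
positives, hard_negatives = select_hard_negatives(losses, is_positive)
```

Capping the negatives keeps the many easy background points from dominating the gradient.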
The regression loss function is defined as follows:
where the matching result is obtained from formula (3); the offset between the monitoring point and the $j$-th target identification point, as predicted by the network, is the output of the $j$-th regression layer, and its target is the corresponding gold standard.
In more detail, RS-Conv is used as the point distribution feature extractor in the network, and the 3D Euclidean distance and the coordinate difference (3D-Ed, $x_i - x_j$) are used as the low-level relation information $h$ of the point cloud. The network contains 8 feature abstraction operations and feature propagation operations. The sampling ratios $\{\tau_1, \tau_2, \ldots, \tau_7\}$ are $\{7/20, 8/10, 10/15, 15/20, 20/25, 25/60, 60/120\}$, and the radii $\{r_1, r_2, \ldots, r_7\}$ of the spheres used to generate the local sampling subset of each sampling point are $\{8, 10, 15, 20, 25, 60, 120\}$ mm. The last feature abstraction operation acts on the point set $P_7$. Farthest point sampling is used to collect the local point cloud subset of each sampling point from the upper-level sampling point set; the numbers of sampling points $\{s_1, s_2, \ldots, s_7\}$ in these local subsets are $\{75/V, 100/V, 50/V, 75/V, 75/V, 200/V, 100/V\}$, where $V$ is the voxel size of the Voxel Grid filter used to downsample the input point set, $V = 5$ mm. In addition, $\lambda = 1$, $th_m = 0.2$, $th_p = 0.9$, $th_d = 3$ mm, and $th_e = 5$ mm.
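Farthest point sampling, used above to pick the local subsets, can be sketched in plain Python. The function name, the choice of the first point as the seed, and the sample coordinates are assumptions; the last lines evaluate the listed subset sizes under the further assumption that expressions such as 75/V are plain divisions with V expressed in millimetres.

```python
import math


def farthest_point_sampling(points, n_samples, start=0):
    """Greedily pick points that maximize the distance to those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(n_samples, len(points)):
        nxt = max(range(len(points)), key=lambda j: min_dist[j])
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return chosen


# Hypothetical neighborhood: four points on a line.
neighborhood = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
picked = farthest_point_sampling(neighborhood, n_samples=3)

# Subset sizes for V = 5 (mm), assuming the listed ratios are plain divisions.
V = 5
subset_sizes = [s // V for s in (75, 100, 50, 75, 75, 200, 100)]
```

Compared with taking the first few neighbors, farthest point sampling spreads the selected points evenly over the neighborhood.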
The covariance matrix Cov(X) used to predict missing identification points is computed from the gold standard in the training set. It is used to supplement the gold standard identification points that are missing from the training data, so that the matching between gold standard points and monitoring points can be computed.
Data augmentation: the training data are rotated about the x, y, and z axes in turn, each time by a randomly selected angle between -2.5° and +2.5°, and random jitter with mean 0 and standard deviation 0.25 mm is added to every point of the training data. Because of the random rotation and random jitter, the training data presented to the network differ at every pass, which stabilizes network training and is therefore very important.
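A minimal sketch of this augmentation is given below. The text fixes the angle range and the jitter statistics; the uniform distribution of the angles, the Gaussian form of the jitter, rotation about the coordinate origin, and the function names `rotation_matrix` and `augment` are assumptions of this example.

```python
import numpy as np

def rotation_matrix(axis, angle_deg):
    """3x3 rotation matrix about the x, y, or z axis (angle in degrees)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    if axis == "z":
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    raise ValueError(f"unknown axis: {axis!r}")

def augment(points, rng, max_angle_deg=2.5, jitter_std_mm=0.25):
    """Rotate an (N, 3) cloud about x, y, z in turn, then add per-point jitter.

    Assumptions: angles are drawn uniformly from [-max_angle_deg, +max_angle_deg],
    the jitter is zero-mean Gaussian, and rotation is about the origin.
    """
    out = np.asarray(points, dtype=float)
    for axis in ("x", "y", "z"):
        angle = rng.uniform(-max_angle_deg, max_angle_deg)
        out = out @ rotation_matrix(axis, angle).T
    return out + rng.normal(0.0, jitter_std_mm, size=out.shape)

rng = np.random.default_rng(0)
example_points = rng.normal(scale=50.0, size=(100, 3))
example_augmented = augment(example_points, rng)
```

In practice the same rotation would also have to be applied to the gold standard identification points so that they stay aligned with the rotated cloud; the sketch leaves that step out.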
The present invention is not limited to the preferred embodiments described above. Any simple modification, equivalent variation, or refinement made to these embodiments according to the technical principles of the present invention falls within the scope of protection of the present invention.

Claims (9)

1. A prediction method for an end-to-end point cloud deep learning network model, characterized in that the model is a deep learning network structure similar to a convolutional neural network (CNN), and the method comprises the following steps:
(1) The network progressively downsamples the input point cloud to obtain a series of sampling point sets, and uses a point distribution feature extractor to progressively extract the point distribution features of the neighborhood point clouds of the sampling points in each sampling point set, wherein these features become progressively more abstract and the spatial receptive field is progressively enlarged;
(2) Some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points;
(3) Predicting the probability that each monitoring point is located in the neighborhood of different identification points and the offset of each monitoring point and different identification points;
in step (1), for any input point cloud P, a Voxel Grid filter is first used to downsample P to a point cloud $P_0$ with point cloud density D; $P_0$ is then downsampled step by step according to fixed sampling proportions $\{\tau_1, \tau_2, \ldots, \tau_n\}$ to obtain the sampling point sets $\{P_1, P_2, \ldots, P_n\}$; starting from the first sampling point set $P_1$, feature abstraction operations extract the abstract features of the sampling points in $\{P_1, P_2, \ldots, P_n\}$ step by step; a feature abstraction operation acts on point set $P_{i-1}$ to compute the abstract feature of each sampling point in point set $P_i$: for the k-th sampling point in $P_i$, the neighborhood subset of points of $P_{i-1}$ lying inside the sphere of radius $r_i$ centered at that sampling point is found, and $n_i$ points of this subset together with their feature vectors are used to extract the abstract feature vector $f_i^k$ of the sampling point, where $n_i$ is positively correlated with the point cloud density D; the features $f_i^k$ of all sampling points in a sampling point set $P_i$ form its abstract feature set $F_i$; the spatial receptive fields of the feature sets $\{F_1, F_2, \ldots, F_n\}$ of the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ expand step by step and the features become progressively more abstract; finally, a point cloud feature extractor applied to $P_n$ produces a feature vector that expresses the global feature;
next, starting from the last sampling point set $P_n$, the propagation features of the sampling points in $\{P_n, P_{n-1}, \ldots, P_1\}$ are obtained step by step, and the propagation features of all sampling points within a set constitute the propagation feature set of that set; a feature propagation operation acts on point set $P_{i+1}$ to compute the propagation feature of each sampling point in point set $P_i$: for the k-th sampling point in $P_i$, the abstract features of the 3 points in $P_{i+1}$ nearest to it are averaged with weights equal to the inverse of their distances to it, the weighted average is concatenated with the abstract feature $f_i^k$ of the sampling point, and several multilayer perceptron (MLP) layers with ReLU nonlinear activation functions are applied to the concatenation to obtain the propagation feature of the sampling point; since the level following the sampling point set $P_n$ is a single feature vector, this feature vector is treated as the weighted average result and concatenated with the abstract feature of each point in $P_n$ to obtain the propagation feature of each sampling point of $P_n$.
2. The method for predicting an end-to-end point cloud deep learning network model according to claim 1, wherein: in step (1), the Voxel Grid filter first voxelizes the space, and the centroids of the points located in each voxel form the output point cloud.
3. The method for predicting an end-to-end point cloud deep learning network model according to claim 2, wherein: in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \ldots, P_n\}$ and are called monitoring point sets (MPS), and all sampling points in the monitoring point sets are called monitoring points; for the k-th monitoring point in the i-th monitoring point set $P_i$, its abstract feature $f_i^k$ and its propagation feature are each batch normalized and then concatenated, and the concatenation is taken as the feature of the monitoring point; the feature of each monitoring point reflects the point distribution in the neighborhood of the monitoring point and distinguishes monitoring points in different regions; from the feature of each monitoring point, the target identification point in whose neighborhood the monitoring point lies is determined, and the position of that adjacent target identification point is predicted.
4. The method for predicting an end-to-end point cloud deep learning network model of claim 3, wherein: in step (3), if the number of target identification points is L, then for the k-th monitoring point in the i-th monitoring point set $P_i$, one single-layer fully connected layer with output dimension L acts on its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single-layer fully connected layers with output dimension 3 act on its feature to predict the offset $(\Delta x, \Delta y, \Delta z)$ between the monitoring point and each identification point; the j-th of these layers predicts the offset between the monitoring point and the j-th identification point.
5. The method for predicting an end-to-end point cloud deep learning network model according to claim 4, wherein: in step (3), the parameters of these fully connected layers are shared within each sampling point set.
6. The method for predicting an end-to-end point cloud deep learning network model according to claim 5, wherein: each monitoring point may be matched with several identification points; whenever a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point; the position of the identification points matched with each monitoring point is predicted from the feature of that monitoring point, so that the problem of locating identification points in the point cloud is converted into a multi-label prediction and regression problem.
7. The method for predicting an end-to-end point cloud deep learning network model of claim 6, wherein: when the network is used to locate identification points in point sets of multiple scales, the identification points in a point set of a given scale are matched with monitoring points whose spatial receptive fields have the corresponding size, and a series of boxes centered on the gold standard identification points and on the monitoring points are set, called target boxes (TBX) and monitoring boxes (MBX), respectively.
8. The method for predicting an end-to-end point cloud deep learning network model of claim 7, wherein: according to the gold standard of the training data, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set according to formula (1):
where the symbols in formula (1) denote, respectively, the left outer eye corner, the right outer eye corner, the eyebrow center, and the chin;
according to the radius $r_i$ of the sphere used to generate the neighborhood subset of each monitoring point from the upper-level point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX of the monitoring point are set according to formula (2):
$l_x^m = l_y^m = l_z^m = 2r_i$ (2)
if the overlap between the TBX of the j-th gold standard identification point and the MBX of the monitoring point is greater than a threshold $th_m$, the two are matched, as expressed by formula (3):
9. The method for predicting an end-to-end point cloud deep learning network model of claim 8, wherein: all parameters of the network are learned simultaneously using the loss function of formula (4), which comprises a classification loss function and a regression loss function:
$loss = loss_c + \lambda\, loss_r$ (4)
the classification loss function is formula (5):
where i and k index the monitoring point set and the monitoring point within that set, respectively;
$loss_{i,k}$ is the classification loss of the k-th monitoring point in the i-th monitoring point set; the predicted probability that this monitoring point lies inside the neighborhood of the j-th gold standard identification point is computed by applying a sigmoid function to the j-th dimension of the output of the classification fully connected layer acting on the monitoring point's feature; a monitoring point matched with at least one gold standard identification point is defined as a positive sample, and a monitoring point not matched with any gold standard identification point is defined as a negative sample; $N_p$ is the number of positive samples and $N_e$ is the number of negative samples;
the negative samples are sorted by $loss_{i,k}$ and only those with the largest $loss_{i,k}$ are used to compute the classification loss, such that the number of negative samples taking part in the computation is at most three times the number of positive samples;
the regression loss function is formula (6):
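where the offset between the monitoring point and the j-th target identification point predicted by the network is the output of the j-th offset-regression fully connected layer, and the regression target is the corresponding gold standard offset.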
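The interpolation step of the feature propagation operation in claim 1 (weighted average over the 3 nearest points of the coarser set, with inverse-distance weights, followed by concatenation) can be sketched as below. This is an illustrative example only: the function name `propagate_features`, the small constant `eps` that guards against division by zero, the order of concatenation, and the omission of the subsequent MLP and ReLU layers are assumptions, not details taken from the patent.

```python
import numpy as np

def propagate_features(fine_xyz, fine_feat, coarse_xyz, coarse_feat, k=3, eps=1e-8):
    """Interpolate coarse-level features onto fine-level points and concatenate.

    fine_xyz    : (Nf, 3) coordinates of the sampling points in P_i
    fine_feat   : (Nf, Cf) abstract features f_i^k of those points
    coarse_xyz  : (Nc, 3) coordinates of the sampling points in P_{i+1}
    coarse_feat : (Nc, Cc) features of the points in P_{i+1}
    Returns an (Nf, Cc + Cf) array: [interpolated feature, abstract feature].
    """
    fine_xyz = np.asarray(fine_xyz, dtype=float)
    fine_feat = np.asarray(fine_feat, dtype=float)
    coarse_xyz = np.asarray(coarse_xyz, dtype=float)
    coarse_feat = np.asarray(coarse_feat, dtype=float)

    # Pairwise distances between every fine point and every coarse point.
    dist = np.linalg.norm(fine_xyz[:, None, :] - coarse_xyz[None, :, :], axis=-1)

    # Indices and distances of the k nearest coarse points for each fine point.
    k = min(k, coarse_xyz.shape[0])
    nearest = np.argsort(dist, axis=1)[:, :k]
    nearest_dist = np.take_along_axis(dist, nearest, axis=1)

    # Inverse-distance weights, normalised to sum to one per fine point.
    weights = 1.0 / (nearest_dist + eps)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted average of the neighbours' features, then concatenation.
    interpolated = (weights[:, :, None] * coarse_feat[nearest]).sum(axis=1)
    return np.concatenate([interpolated, fine_feat], axis=1)

example_coarse_xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]])
example_coarse_feat = np.array([[1.0], [2.0], [3.0], [4.0]])
example_fine_xyz = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
example_fine_feat = np.array([[10.0, 11.0], [12.0, 13.0]])
example_out = propagate_features(example_fine_xyz, example_fine_feat,
                                 example_coarse_xyz, example_coarse_feat)
```

A fine point that coincides with a coarse point receives essentially that coarse point's feature, because its inverse-distance weight dominates the other two neighbours.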
CN202010116881.9A 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method Active CN111428855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Publications (2)

Publication Number Publication Date
CN111428855A CN111428855A (en) 2020-07-17
CN111428855B true CN111428855B (en) 2023-11-14

Family

ID=71551571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116881.9A Active CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Country Status (1)

Country Link
CN (1) CN111428855B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085123B (en) * 2020-09-25 2022-04-12 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN116045833B (en) * 2023-01-03 2023-12-22 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544700A (en) * 2018-10-12 2019-03-29 深圳大学 Processing method, device and the equipment of point cloud data neural network based
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning
CN110321910A (en) * 2018-03-29 2019-10-11 中国科学院深圳先进技术研究院 Feature extracting method, device and equipment towards cloud

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11379688B2 (en) * 2017-03-16 2022-07-05 Packsize Llc Systems and methods for keypoint detection with convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection; Shaoshuai Shi et al.; arXiv:1912.13192v1; Sections 1 and 3, Figures 1-2 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant