CN111428855A - End-to-end point cloud deep learning network model and training method - Google Patents


Info

Publication number
CN111428855A
CN111428855A (application CN202010116881.9A)
Authority
CN
China
Prior art keywords: point, points, monitoring, sampling, identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116881.9A
Other languages
Chinese (zh)
Other versions
CN111428855B (en)
Inventor
杨健
范敬凡
艾丹妮
郭龙腾
王涌天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010116881.9A priority Critical patent/CN111428855B/en
Publication of CN111428855A publication Critical patent/CN111428855A/en
Application granted granted Critical
Publication of CN111428855B publication Critical patent/CN111428855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An end-to-end point cloud deep learning network model and a training method can locate identification points on faces of different scales simultaneously, with high positioning accuracy and high positioning speed. The network model is a deep learning network structure of a convolutional neural network (CNN) in which: (1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive fields progressively larger; (2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points; (3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.

Description

End-to-end point cloud deep learning network model and training method
Technical Field
The invention relates to the technical field of point cloud image processing and deep learning, and in particular to an end-to-end point cloud deep learning network model and a training method for it.
Background
A three-dimensional image is a special form of information expression, characterized by three-dimensional data in the space to be expressed. Its expression forms include: depth maps (expressing the distance from an object to the camera as grey levels), geometric models (built with CAD software), and point cloud models (reverse engineering equipment samples objects as point clouds). Compared with a two-dimensional image, a three-dimensional image can achieve natural object-background decoupling by means of the information of the third dimension. Point cloud data is the most common and basic three-dimensional model. A point cloud model is usually obtained directly by measurement, each point corresponding to one measurement point, without any other processing, so it contains the largest amount of information. That information is hidden in the point cloud and must be extracted by other means; the process of extracting information from a point cloud is three-dimensional image processing.
A point cloud is a massive set of points expressing the spatial distribution and surface characteristics of a target under one spatial reference system; once the spatial coordinates of each sampling point on the object surface are obtained, the resulting set of points is called a point cloud (Point Cloud).
Fast and accurate localization of identification points in a point cloud has important applications in identity recognition, 3D model segmentation, 3D model retrieval and other fields; in particular, automatic localization of identification points in 3D face point clouds has important applications in face recognition, expression recognition, head pose recognition, head motion estimation, dense matching of head point clouds, lip shape analysis, head surgery and disease diagnosis.
However, current techniques cannot guarantee accuracy and speed at the same time: faster algorithms have lower accuracy and more accurate algorithms are slower, so applications with demanding accuracy and speed requirements cannot be satisfied.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an end-to-end point cloud deep learning network model that can simultaneously locate identification points on faces of different scales, with high positioning accuracy and high positioning speed.
The technical scheme of the invention is as follows: the end-to-end point cloud deep learning network model is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point; these neighborhood distribution features are abstracted stage by stage while the spatial receptive field grows stage by stage, so the features express point distributions over different spatial ranges. The network is trained end to end, which yields high positioning accuracy; and because the algorithm's run time is just the forward propagation of the point cloud through the network and the design is lightweight, the algorithm is fast and stable.
The method matches each monitoring point with multiple identification points: as long as a monitoring point is adjacent to an identification point, that identification point is matched to it. The features of each monitoring point are then used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
Drawings
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
FIG. 2 is a diagram illustrating a simple matching result between a monitoring point and a target identification point.
FIG. 3 is a flow chart of an end-to-end point cloud deep learning network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to make the description of the present disclosure more complete and thorough, the following is given for illustrative purposes with respect to the embodiments and examples of the present invention; it is not the only form in which the embodiments of the invention may be practiced or utilized. The embodiments are intended to cover the features of the various embodiments as well as the method steps and sequences for constructing and operating them. However, other embodiments may be utilized to achieve the same or equivalent functions and step sequences.
As shown in FIG. 3, the end-to-end point cloud deep learning network model is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, and all sampling points in them are used as monitoring points to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point; these neighborhood distribution features are abstracted stage by stage while the spatial receptive field grows stage by stage, so the features express point distributions over different spatial ranges. The network is trained end to end, which yields high positioning accuracy; and because the algorithm's run time is just the forward propagation of the point cloud through the network and the design is lightweight, the algorithm is fast and stable.
Preferably, in step (1), any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D. According to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is then down-sampled stage by stage to obtain the sampling point sets $\{P_1, P_2, \dots, P_n\}$.
Starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations. A feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D. The features $f_k^i$ of all sampling points of $P_i$ form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract. Finally, the point distribution feature extractor applied to all points of $P_n$ produces one feature vector expressing global features.
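The feature abstraction step can be pictured with a short sketch. The following Python is illustrative only: ball_query, feature_abstraction and the stand-in extractor are hypothetical names, and a max-pooled shared projection stands in for the point distribution feature extractor (the patent uses RS-Conv; PointNet would also fit):

```python
import numpy as np

def ball_query(center, points, radius, n_samples):
    """Neighborhood subset N_k^i: up to n_samples points of `points`
    inside the sphere of radius `radius` centered at `center`."""
    d = np.linalg.norm(points - center, axis=1)
    idx = np.where(d <= radius)[0]
    if len(idx) > n_samples:                       # n_i rises with density D
        idx = np.random.choice(idx, n_samples, replace=False)
    return idx

def feature_abstraction(P_prev, F_prev, P_i, radius, n_samples, extractor):
    """Compute the abstract feature set F_i of sampling set P_i from the
    previous stage (P_{i-1}, F_{i-1}). `extractor` maps the local
    (m, 3 + c_in) matrix of centered coordinates plus features to a
    single feature vector."""
    F_i = []
    for p in P_i:
        idx = ball_query(p, P_prev, radius, n_samples)
        local = np.hstack([P_prev[idx] - p, F_prev[idx]])
        F_i.append(extractor(local))
    return np.stack(F_i)

# Stand-in extractor: a shared linear map + ReLU + max pool over the
# neighborhood (PointNet-like); assumes c_in = 16 input feature channels.
W = np.random.randn(3 + 16, 64)
extractor = lambda local: np.maximum(local @ W, 0).max(axis=0)
```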
Then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \tilde F_{n-1}, \dots, \tilde F_1\}$. A feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are combined by a weighted average whose weights are the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptrons (MLPs) and nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$. The propagation features $\tilde f_k^i$ of all sampling points of $P_i$ form the propagation feature set $\tilde F_i$ of $P_i$. Since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$ to obtain the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
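A minimal sketch of the feature propagation step, under the same caveats (names are illustrative; `mlp` stands in for the stack of MLPs and ReLU activations described above):

```python
import numpy as np

def feature_propagation(P_i, F_i_abs, P_next, F_next, mlp):
    """Compute propagation features of P_i from the coarser stage P_{i+1}:
    inverse-distance weighted average of the features of the 3 nearest
    points of P_{i+1}, concatenated with the point's own abstract
    feature, then passed through `mlp` (the MLP + ReLU stack)."""
    out = []
    for p, f_abs in zip(P_i, F_i_abs):
        d = np.linalg.norm(P_next - p, axis=1)
        nn = np.argsort(d)[:3]                 # 3 nearest coarse points
        w = 1.0 / (d[nn] + 1e-8)               # reciprocal-distance weights
        w /= w.sum()
        interp = (w[:, None] * F_next[nn]).sum(axis=0)
        out.append(mlp(np.concatenate([interp, f_abs])))
    return np.stack(out)
```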
Preferably, in step (1), the Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud.
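A minimal sketch of such a filter, assuming axis-aligned cubic voxels of edge length `voxel_size` (the function name and signature are illustrative):

```python
import numpy as np

def voxel_grid_filter(points, voxel_size):
    """Voxelize space on a regular grid and replace the points falling in
    each voxel by their barycenter (center of gravity)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    acc = {}
    for key, p in zip(map(tuple, keys), points):
        s, n = acc.get(key, (np.zeros(3), 0))
        acc[key] = (s + p, n + 1)
    return np.stack([s / n for s, n in acc.values()])
```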
Preferably, in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called monitoring point sets MPS, and all sampling points in them are called monitoring points. For the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature. The feature of each monitoring point reflects the point distribution characteristics within its neighborhood, so the features of monitoring points in different regions are discriminative; from each monitoring point's feature, the network judges in which target identification point's neighborhood the monitoring point lies and predicts the position of the adjacent target identification point.
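As a sketch, this feature construction might look as follows in PyTorch (the class name and channel arguments are assumptions, not from the patent):

```python
import torch
import torch.nn as nn

class MonitorFeature(nn.Module):
    """Batch-normalize the abstract feature and the propagation feature
    of each monitoring point separately, then concatenate them."""
    def __init__(self, c_abs, c_prop):
        super().__init__()
        self.bn_abs = nn.BatchNorm1d(c_abs)
        self.bn_prop = nn.BatchNorm1d(c_prop)

    def forward(self, f_abs, f_prop):          # (N, c_abs), (N, c_prop)
        return torch.cat([self.bn_abs(f_abs), self.bn_prop(f_prop)], dim=1)
```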
Preferably, in step (3), if the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point; the j-th of these layers predicts the offset of the monitoring point from the j-th identification point.
Preferably, in step (3), the parameters of these fully-connected layers are shared among the monitoring points of each sampling point set.
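A hedged PyTorch sketch of the two prediction heads described above (class and attribute names are illustrative; the patent does not name them):

```python
import torch
import torch.nn as nn

class LandmarkHead(nn.Module):
    """One L-dim fully-connected layer for the neighborhood probabilities
    and L separate 3-dim fully-connected layers for the offsets; the
    layers are shared by all monitoring points of one set."""
    def __init__(self, c_feat, n_landmarks):
        super().__init__()
        self.cls = nn.Linear(c_feat, n_landmarks)
        self.reg = nn.ModuleList([nn.Linear(c_feat, 3)
                                  for _ in range(n_landmarks)])

    def forward(self, feat):                        # feat: (N, c_feat)
        prob = torch.sigmoid(self.cls(feat))        # (N, L) probabilities
        offs = torch.stack([fc(feat) for fc in self.reg], dim=1)  # (N, L, 3)
        return prob, offs
```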
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
The network consists of a number of feature abstraction operations and feature propagation operations. Any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D. The Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud. According to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is down-sampled stage by stage into the sampling point sets $\{P_1, P_2, \dots, P_n\}$. Starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations. A feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor (e.g. PointNet, RS-CNN) is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D. The features $f_k^i$ of all sampling points of each set form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract. Finally, the point distribution feature extractor applied to all points of $P_n$ produces one feature vector expressing global features.
Then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \dots, \tilde F_1\}$. A feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are averaged with weights the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptrons (MLPs) and nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$. Since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$; several MLPs and ReLU activations then yield the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
Several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called Monitoring Point Sets (MPS), and all sampling points in them are called monitoring points. For the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature. Because the feature of each monitoring point reflects the point distribution characteristics within its neighborhood, the features of monitoring points in different regions are discriminative; from each monitoring point's feature, the network can judge in which target identification point's neighborhood the monitoring point lies and can predict the position of the adjacent target identification point.
If the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point. Different offset layers predict the offsets of the monitoring point from different identification points (for example, the j-th layer predicts the offset from the j-th identification point). The parameters of these fully-connected layers are shared among the monitoring points of each sampling point set.
Features with larger spatial receptive fields express the distribution of points over a larger spatial range and can be used to locate identification points on faces of larger scale, and vice versa. Using several monitoring point sets with different spatial receptive fields therefore lets the network locate identification points on faces of different scales simultaneously. Because the relative topology of the identification points and their positions relative to the feature regions of the face are relatively fixed, global information helps locate the identification points; and because the propagation features of the points contain global information, the propagation feature of each monitoring point is integrated into its feature in addition to its abstract feature, which improves the positioning stability of the network.
The method matches each monitoring point with multiple identification points: as long as a monitoring point is adjacent to an identification point, that identification point is matched to it. The features of each monitoring point are then used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
Preferably, when the network is used to locate identification points in point sets of multiple scales, the identification points of a given scale are matched with monitoring points whose spatial receptive fields have the corresponding size; to this end a series of boxes, target boxes (TBX) and detection boxes (MBX), are placed centered on the gold-standard identification points and the monitoring points, respectively.
Preferably, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image), which computes the side lengths from the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the chin tip.
According to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX are set according to equation (2):
$l_x^m = l_y^m = l_z^m = 2 r_i$   (2)
if the TBX and the monitoring point of the jth golden standard identification point
Figure BDA00023917590400000810
Is/are as follows
Figure BDA00023917590400000811
Has a jaccard value exceeding a threshold value thmThen matching is performed according to equation (3):
Figure BDA00023917590400000812
preferably, all parameters of the network are trained simultaneously using the loss functions of equation (4), including classification loss functions and regression loss functions
loss=lossc+λlossr(4)
The classification loss function is formula (5)
Figure BDA0002391759040000091
Figure BDA0002391759040000092
Wherein i and k are indexes of the monitoring point set and indexes of monitoring points in the monitoring point set respectively;
lossi,kas a monitoring point
Figure BDA0002391759040000093
The loss of classification of (a) is,
Figure BDA0002391759040000094
is acted on using sigmoid function
Figure BDA0002391759040000095
Monitoring point of network prediction obtained by j-th dimension calculation of output
Figure BDA0002391759040000096
The probability in the adjacent area of the jth gold standard identification point defines the monitoring point matched with at least one gold standard identification point as a positive sample, the monitoring point not matched with any gold standard identification point as a negative sample, NpIs the number of positive samples, NeThe number of negative samples;
according to lossi,kSorting the negative samples and selecting the lossi,kThe largest first few negative samples calculate the classification loss and ensure that the number of negative samples participating in the calculation is not more than three times the number of positive samples.
The regression loss function is equation (6) (its full form is given in the original as an image); it compares the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, with the corresponding gold standard.
Fig. 2 is a schematic diagram of a simple matching result between the monitoring point and the target identification point. The training method is described in detail below.
In the network training stage, the monitoring points need to be matched with the gold standard in the training data, and the network is trained according to the matching result.
To solve these two problems, a multi-label matching strategy (MLM) is proposed: each monitoring point is matched with multiple identification points, and as long as a monitoring point is adjacent to some identification point, that identification point is matched to it; the features of each monitoring point are used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
When the network is used to locate identification points in point sets of multiple scales, the identification points of a given scale must be matched with monitoring points whose spatial receptive fields have the corresponding size; therefore a series of boxes, Target Boxes (TBX) and detection boxes (MBX), are placed centered on the gold-standard identification points and the monitoring points, respectively. As shown in FIG. 2, the two solid black dots and the two bold-line boxes represent two target identification points and their TBXs, and the three slash-filled black dots and the three thin-line boxes are three monitoring points and their MBXs.
So that the size of the TBX reflects the size of the face in the training data, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image); the quantities entering it are the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the tip of the chin.
According to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX of $p_k^i$ are set as:
$l_x^m = l_y^m = l_z^m = 2 r_i$
If the Jaccard value of the TBX of the j-th gold-standard identification point and the MBX of monitoring point $p_k^i$ exceeds the threshold $th_m$, they are matched (equation (3), given in the original as an image, defines the resulting match indicator).
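Since equation (3) survives only as an image, the following sketch shows one plausible reading of the matching rule: axis-aligned box IoU (the Jaccard value) thresholded at $th_m$. All names are illustrative:

```python
import numpy as np

def jaccard_3d(c_a, side_a, c_b, side_b):
    """Jaccard value (IoU) of two axis-aligned boxes given their
    centers and per-axis side lengths."""
    lo = np.maximum(c_a - side_a / 2, c_b - side_b / 2)
    hi = np.minimum(c_a + side_a / 2, c_b + side_b / 2)
    inter = np.prod(np.clip(hi - lo, 0, None))
    union = np.prod(side_a) + np.prod(side_b) - inter
    return inter / union

def match_monitor_points(landmarks, tbx_sides, monitors, mbx_sides, th_m=0.2):
    """Multi-label matching: monitoring point k is matched to landmark j
    whenever the Jaccard value of TBX_j and MBX_k exceeds th_m."""
    match = np.zeros((len(monitors), len(landmarks)), dtype=bool)
    for k, (m, ms) in enumerate(zip(monitors, mbx_sides)):
        for j, (t, ts) in enumerate(zip(landmarks, tbx_sides)):
            match[k, j] = jaccard_3d(t, ts, m, ms) > th_m
    return match
```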
Loss function: all parameters of the network are trained synchronously using a loss function comprising a classification loss function and a regression loss function:
$loss = loss_c + \lambda\, loss_r$
The classification loss function (equation (5); its full form is given in the original as an image) is defined as follows: i and k are the index of the monitoring point set and the index of the monitoring point within it; $loss_{i,k}$ is the classification loss of monitoring point $p_k^i$; the probability, predicted by the network, that $p_k^i$ lies in the neighborhood of the j-th gold-standard identification point is obtained by applying the sigmoid function to the j-th dimension of the classification layer's output; the match indicators come from equation (3). Monitoring points matched with at least one gold-standard identification point are defined as positive samples, and monitoring points matched with no gold-standard identification point as negative samples; $N_p$ is the number of positive samples and $N_e$ the number of negative samples.
Since the number of negative samples is much larger than the number of positive samples, the negative samples are sorted by $loss_{i,k}$ and the negatives with the largest $loss_{i,k}$ are selected to compute the classification loss, ensuring that the number of negatives taking part in the computation is no more than three times the number of positives.
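A sketch of this hard-negative selection in PyTorch, assuming a binary cross-entropy per (monitoring point, identification point) pair; the exact form of equation (5) is an image in the original, so the per-pair loss here is an assumption:

```python
import torch
import torch.nn.functional as F

def classification_loss(logits, match, neg_pos_ratio=3):
    """Classification loss with hard negative mining: keep all positive
    monitoring points, sort the negatives by their loss and keep at most
    neg_pos_ratio times as many as there are positives."""
    target = match.float()                     # (N, L) match matrix
    per_pair = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none")      # (N, L)
    pos = match.any(dim=1)                     # matched monitoring points
    n_pos = int(pos.sum())
    neg = per_pair[~pos].sum(dim=1)            # loss of each negative point
    n_keep = min(neg.numel(), neg_pos_ratio * n_pos)
    hard_neg, _ = neg.topk(n_keep)             # hardest negatives
    return (per_pair[pos].sum() + hard_neg.sum()) / max(n_pos, 1)
```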
The regression loss function (equation (6); its full form is given in the original as an image) uses the match indicators obtained from equation (3); the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, is compared with the corresponding gold standard.
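Putting the two terms together as in equation (4), reusing classification_loss from the previous sketch; smooth L1 for the regression term is an assumption, since equation (6) survives only as an image:

```python
import torch.nn.functional as F

def total_loss(logits, offsets, match, gt_offsets, lam=1.0):
    """loss = loss_c + lambda * loss_r, with the regression term computed
    only over matched (monitoring point, identification point) pairs."""
    loss_c = classification_loss(logits, match)
    if match.any():
        loss_r = F.smooth_l1_loss(offsets[match], gt_offsets[match])
    else:
        loss_r = offsets.sum() * 0.0           # keep the graph connected
    return loss_c + lam * loss_r
```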
In more detail, RS-Conv is used as the point distribution feature extractor in the network, and the 3D Euclidean distance and the coordinate difference (3D-Ed, $x_i - x_j$) are used as the low-level distribution relation information h of the point cloud. The network comprises 8 feature abstraction operations and feature propagation operations in total. The sampling ratios $\{\tau_1, \tau_2, \dots, \tau_7\}$ are {7/20, 8/10, 10/15, 15/20, 20/25, 25/60, 60/120}; the sampling radii $\{r_1, r_2, \dots, r_7\}$ used to generate the local sample subset $N_k^i$ of each sampling point are {8, 10, 15, 20, 25, 60, 120} (mm); and the last feature abstraction operation acts on all points of the point set $P_7$. The local point cloud subsets $N_k^i$ of each sampling point are collected from the previous-stage sampling point set using farthest point sampling; the numbers of sampling points $\{s_1, s_2, \dots, s_7\}$ in the local point cloud subsets are {75/V, 100/V, 50/V, 75/V, 200/V, 100/V}, where V is the size of the grid in the Voxel Grid filter used to down-sample the input point set, and V = 5 mm. In addition, $\lambda$ = 1, $th_m$ = 0.2, $th_p$ = 0.9, $th_d$ = 3 mm, $th_e$ = 5 mm.
In addition, a covariance matrix cov(X) for predicting missing identification points is computed from the gold standard of the training set, and the missing gold-standard identification points in the training data are filled in, so that the matching between the gold standard and the monitoring points can be computed completely.
Data augmentation: the training data are rotated in turn around the x, y and z axes by randomly selected angles ranging from -2.5° to +2.5°, and random jitter with mean 0 and standard deviation 0.25 mm is added to each point. Random rotation and random jitter make the training data differ each time the network is trained, which stabilizes network training and is therefore very important.
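A minimal sketch of this augmentation (the function name and signature are illustrative):

```python
import numpy as np

def augment(points, max_angle_deg=2.5, jitter_std=0.25):
    """Rotate the cloud around x, y, z in turn by random angles in
    [-2.5, +2.5] degrees, then add zero-mean Gaussian jitter
    (standard deviation 0.25 mm) to every point."""
    for axis in range(3):
        a = np.deg2rad(np.random.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(a), np.sin(a)
        i, j = [(1, 2), (0, 2), (0, 1)][axis]    # plane rotated for this axis
        R = np.eye(3)
        R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
        points = points @ R.T
    return points + np.random.normal(0.0, jitter_std, points.shape)
```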
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical essence of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. An end-to-end point cloud deep learning network model, characterized in that: it is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
2. The end-to-end point cloud deep learning network model of claim 1, wherein: in step (1), any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D; according to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is down-sampled stage by stage into the sampling point sets $\{P_1, P_2, \dots, P_n\}$;
starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations; a feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D; the features $f_k^i$ of all sampling points of each sampling point set form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract; finally, the point cloud feature extractor is applied to all points of $P_n$ to produce one feature vector expressing global features;
then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \dots, \tilde F_1\}$; a feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are averaged with weights the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptron (MLP) and nonlinear activation function (ReLU) layers are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$; since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$ to obtain the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
3. The end-to-end point cloud deep learning network model of claim 2, wherein: in step (1), the Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud.
4. The end-to-end point cloud deep learning network model of claim 3, characterized in that: in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called monitoring point sets MPS, and all sampling points in them are called monitoring points; for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature; the feature of each monitoring point reflects the point distribution characteristics within its neighborhood, the features of monitoring points in different regions are discriminative, and from each monitoring point's feature it is judged in which target identification point's neighborhood the monitoring point lies and the position of the adjacent target identification point is predicted.
5. The end-to-end point cloud deep learning network model of claim 4, wherein: in step (3), if the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 (one for each j = 0, 1, …, L-1) are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point; the j-th of these layers predicts the offset of the monitoring point from the j-th identification point.
6. The end-to-end point cloud deep learning network model of claim 5, wherein: in step (3), the parameters at these fully-connected layers are shared among each set of sample points.
7. The method for training the end-to-end point cloud deep learning network model according to claim 6, wherein: each monitoring point is matched with multiple identification points; as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point; the features of each monitoring point are used to predict the positions of its matched identification points, converting the localization of identification points in the point cloud into a multi-label prediction and regression problem.
8. The method for training the end-to-end point cloud deep learning network model according to claim 7, wherein: when the network is used to locate identification points in point sets of multiple scales, the identification points of a specific scale are matched with monitoring points whose spatial receptive fields have the corresponding size, and a series of boxes, target boxes TBX and detection boxes MBX, are placed centered on the gold-standard identification points and the monitoring points, respectively.
9. The method for training the end-to-end point cloud deep learning network model according to claim 8, wherein: the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image), whose inputs are the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the chin tip;
according to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX are set according to equation (2):
$l_x^m = l_y^m = l_z^m = 2 r_i$   (2)
if the Jaccard value of the TBX of the j-th gold-standard identification point and the MBX of monitoring point $p_k^i$ exceeds a threshold $th_m$, they are matched according to equation (3) (given in the original as an image).
10. The method for training the end-to-end point cloud deep learning network model according to claim 9, wherein: all parameters of the network are trained simultaneously using the loss function of equation (4), comprising a classification loss function and a regression loss function:
$loss = loss_c + \lambda\, loss_r$   (4)
the classification loss function is equation (5) (its full form is given in the original as an image), wherein i and k are the index of the monitoring point set and the index of the monitoring point within it, respectively; $loss_{i,k}$ is the classification loss of monitoring point $p_k^i$; the probability, predicted by the network, that monitoring point $p_k^i$ lies in the neighborhood of the j-th gold-standard identification point is obtained by applying the sigmoid function to the j-th dimension of the classification layer's output; monitoring points matched with at least one gold-standard identification point are defined as positive samples and monitoring points matched with no gold-standard identification point as negative samples, $N_p$ being the number of positive samples and $N_e$ the number of negative samples;
the negative samples are sorted by $loss_{i,k}$ and the negatives with the largest $loss_{i,k}$ are selected to compute the classification loss, ensuring that the number of negative samples taking part in the computation is no more than three times the number of positive samples;
the regression loss function is equation (6) (its full form is given in the original as an image); it compares the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, with the corresponding gold standard.
CN202010116881.9A 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method Active CN111428855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Publications (2)

Publication Number Publication Date
CN111428855A true CN111428855A (en) 2020-07-17
CN111428855B CN111428855B (en) 2023-11-14

Family

ID=71551571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116881.9A Active CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Country Status (1)

Country Link
CN (1) CN111428855B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268256A1 (en) * 2017-03-16 2018-09-20 Aquifi, Inc. Systems and methods for keypoint detection with convolutional neural networks
CN110321910A (en) * 2018-03-29 2019-10-11 中国科学院深圳先进技术研究院 Feature extracting method, device and equipment towards cloud
CN109544700A (en) * 2018-10-12 2019-03-29 深圳大学 Processing method, device and the equipment of point cloud data neural network based
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shaoshuai Shi et al.: "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection", arXiv:1912.13192v1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085123A (en) * 2020-09-25 2020-12-15 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN112085123B (en) * 2020-09-25 2022-04-12 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN116045833A (en) * 2023-01-03 2023-05-02 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data
CN116045833B (en) * 2023-01-03 2023-12-22 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data

Also Published As

Publication number Publication date
CN111428855B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Zhu Research on road traffic situation awareness system based on image big data
Dong et al. Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells
CN108182441B (en) Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN106951923B (en) Robot three-dimensional shape recognition method based on multi-view information fusion
CN109671102B (en) Comprehensive target tracking method based on depth feature fusion convolutional neural network
CN110956111A (en) Artificial intelligence CNN, LSTM neural network gait recognition system
Sun et al. Deep learning‐based single‐cell optical image studies
CN111797683A (en) Video expression recognition method based on depth residual error attention network
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN105160400A (en) L21 norm based method for improving convolutional neural network generalization capability
Abas et al. A YOLO and convolutional neural network for the detection and classification of leukocytes in leukemia
CN113313123B (en) Glance path prediction method based on semantic inference
CN114299150A (en) Depth 6D pose estimation network model and workpiece pose estimation method
CN111414875B (en) Three-dimensional point cloud head posture estimation system based on depth regression forest
Steinberg et al. A Bayesian nonparametric approach to clustering data from underwater robotic surveys
Kate et al. Breast cancer image multi-classification using random patch aggregation and depth-wise convolution based deep-net model
CN111428855A (en) End-to-end point cloud deep learning network model and training method
Wiesner et al. On generative modeling of cell shape using 3D GANs
CN114937182A (en) Image emotion distribution prediction method based on emotion wheel and convolutional neural network
CN110210380A (en) The analysis method of personality is generated based on Expression Recognition and psychology test
Özbay et al. 3D Human Activity Classification with 3D Zernike Moment Based Convolutional, LSTM-Deep Neural Networks.
Cao et al. 3D convolutional neural networks fusion model for lung nodule detection onclinical CT scans
CN115616570A (en) SAR target recognition method based on semi-supervised generation countermeasure network
Oztel et al. Deep learning approaches in electron microscopy imaging for mitochondria segmentation
CN113988163A (en) Radar high-resolution range profile identification method based on multi-scale grouping fusion convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant