CN111428855A - End-to-end point cloud deep learning network model and training method - Google Patents


Info

Publication number
CN111428855A
CN111428855A (application CN202010116881.9A)
Authority
CN
China
Prior art keywords: point, points, monitoring, sampling, identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116881.9A
Other languages
Chinese (zh)
Other versions
CN111428855B (en)
Inventor
杨健
范敬凡
艾丹妮
郭龙腾
王涌天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202010116881.9A priority Critical patent/CN111428855B/en
Publication of CN111428855A publication Critical patent/CN111428855A/en
Application granted granted Critical
Publication of CN111428855B publication Critical patent/CN111428855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 - Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An end-to-end point cloud deep learning network model and a training method can locate identification points on faces of different scales simultaneously, with high positioning accuracy and high positioning speed. The network model is a deep learning network structure of a convolutional neural network (CNN) in which: (1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive fields progressively larger; (2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points; (3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.

Description

End-to-end point cloud deep learning network model and training method
Technical Field
The invention relates to the technical field of point cloud image processing and deep learning, and in particular to an end-to-end point cloud deep learning network model and a training method for it.
Background
A three-dimensional image is a special form of information expression, characterized by three-dimensional data in the space to be expressed. Its expression forms include: depth maps (expressing the distance from an object to the camera as grey levels), geometric models (built with CAD software), and point cloud models (reverse engineering equipment samples objects as point clouds). Compared with a two-dimensional image, a three-dimensional image can achieve natural object-background decoupling by means of the information of the third dimension. Point cloud data is the most common and basic three-dimensional model. A point cloud model is usually obtained directly by measurement, each point corresponding to one measurement point, without any other processing, so it contains the largest amount of information. That information is hidden in the point cloud and must be extracted by other means; the process of extracting information from a point cloud is three-dimensional image processing.
A point cloud is a massive set of points expressing the spatial distribution and surface characteristics of a target under one spatial reference system; once the spatial coordinates of each sampling point on the object surface are obtained, the resulting set of points is called a point cloud (Point Cloud).
Fast and accurate localization of identification points in a point cloud has important applications in identity recognition, 3D model segmentation, 3D model retrieval and other fields; in particular, automatic localization of identification points in 3D face point clouds has important applications in face recognition, expression recognition, head pose recognition, head motion estimation, dense matching of head point clouds, lip shape analysis, head surgery and disease diagnosis.
However, current techniques cannot guarantee accuracy and speed at the same time: faster algorithms have lower accuracy and more accurate algorithms are slower, so applications with demanding accuracy and speed requirements cannot be satisfied.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an end-to-end point cloud deep learning network model that can simultaneously locate identification points on faces of different scales, with high positioning accuracy and high positioning speed.
The technical scheme of the invention is as follows: the end-to-end point cloud deep learning network model is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point; these neighborhood distribution features are abstracted stage by stage while the spatial receptive field grows stage by stage, so the features express point distributions over different spatial ranges. The network is trained end to end, which yields high positioning accuracy; and because the algorithm's run time is just the forward propagation of the point cloud through the network and the design is lightweight, the algorithm is fast and stable.
The method matches each monitoring point with multiple identification points: as long as a monitoring point is adjacent to an identification point, that identification point is matched to it. The features of each monitoring point are then used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
Drawings
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
FIG. 2 is a diagram illustrating a simple matching result between a monitoring point and a target identification point.
FIG. 3 is a flow chart of an end-to-end point cloud deep learning network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to make the description of the present disclosure more complete and thorough, the following is given for illustrative purposes with respect to the embodiments and examples of the present invention; it is not the only form in which the embodiments of the invention may be practiced or utilized. The embodiments are intended to cover the features of the various embodiments as well as the method steps and sequences for constructing and operating them. However, other embodiments may be utilized to achieve the same or equivalent functions and step sequences.
As shown in FIG. 3, the end-to-end point cloud deep learning network model is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, and all sampling points in them are used as monitoring points to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point; these neighborhood distribution features are abstracted stage by stage while the spatial receptive field grows stage by stage, so the features express point distributions over different spatial ranges. The network is trained end to end, which yields high positioning accuracy; and because the algorithm's run time is just the forward propagation of the point cloud through the network and the design is lightweight, the algorithm is fast and stable.
Preferably, in step (1), any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D. According to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is then down-sampled stage by stage to obtain the sampling point sets $\{P_1, P_2, \dots, P_n\}$.
Starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations. A feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D. The features $f_k^i$ of all sampling points of $P_i$ form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract. Finally, the point distribution feature extractor applied to all points of $P_n$ produces one feature vector expressing global features.
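The feature abstraction step can be pictured with a short sketch. The following Python is illustrative only: ball_query, feature_abstraction and the stand-in extractor are hypothetical names, and a max-pooled shared projection stands in for the point distribution feature extractor (the patent uses RS-Conv; PointNet would also fit):

```python
import numpy as np

def ball_query(center, points, radius, n_samples):
    """Neighborhood subset N_k^i: up to n_samples points of `points`
    inside the sphere of radius `radius` centered at `center`."""
    d = np.linalg.norm(points - center, axis=1)
    idx = np.where(d <= radius)[0]
    if len(idx) > n_samples:                       # n_i rises with density D
        idx = np.random.choice(idx, n_samples, replace=False)
    return idx

def feature_abstraction(P_prev, F_prev, P_i, radius, n_samples, extractor):
    """Compute the abstract feature set F_i of sampling set P_i from the
    previous stage (P_{i-1}, F_{i-1}). `extractor` maps the local
    (m, 3 + c_in) matrix of centered coordinates plus features to a
    single feature vector."""
    F_i = []
    for p in P_i:
        idx = ball_query(p, P_prev, radius, n_samples)
        local = np.hstack([P_prev[idx] - p, F_prev[idx]])
        F_i.append(extractor(local))
    return np.stack(F_i)

# Stand-in extractor: a shared linear map + ReLU + max pool over the
# neighborhood (PointNet-like); assumes c_in = 16 input feature channels.
W = np.random.randn(3 + 16, 64)
extractor = lambda local: np.maximum(local @ W, 0).max(axis=0)
```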
Then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \tilde F_{n-1}, \dots, \tilde F_1\}$. A feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are combined by a weighted average whose weights are the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptrons (MLPs) and nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$. The propagation features $\tilde f_k^i$ of all sampling points of $P_i$ form the propagation feature set $\tilde F_i$ of $P_i$. Since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$ to obtain the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
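A minimal sketch of the feature propagation step, under the same caveats (names are illustrative; `mlp` stands in for the stack of MLPs and ReLU activations described above):

```python
import numpy as np

def feature_propagation(P_i, F_i_abs, P_next, F_next, mlp):
    """Compute propagation features of P_i from the coarser stage P_{i+1}:
    inverse-distance weighted average of the features of the 3 nearest
    points of P_{i+1}, concatenated with the point's own abstract
    feature, then passed through `mlp` (the MLP + ReLU stack)."""
    out = []
    for p, f_abs in zip(P_i, F_i_abs):
        d = np.linalg.norm(P_next - p, axis=1)
        nn = np.argsort(d)[:3]                 # 3 nearest coarse points
        w = 1.0 / (d[nn] + 1e-8)               # reciprocal-distance weights
        w /= w.sum()
        interp = (w[:, None] * F_next[nn]).sum(axis=0)
        out.append(mlp(np.concatenate([interp, f_abs])))
    return np.stack(out)
```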
Preferably, in step (1), the Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud.
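A minimal sketch of such a filter, assuming axis-aligned cubic voxels of edge length `voxel_size` (the function name and signature are illustrative):

```python
import numpy as np

def voxel_grid_filter(points, voxel_size):
    """Voxelize space on a regular grid and replace the points falling in
    each voxel by their barycenter (center of gravity)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    acc = {}
    for key, p in zip(map(tuple, keys), points):
        s, n = acc.get(key, (np.zeros(3), 0))
        acc[key] = (s + p, n + 1)
    return np.stack([s / n for s, n in acc.values()])
```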
Preferably, in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called monitoring point sets MPS, and all sampling points in them are called monitoring points. For the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature. The feature of each monitoring point reflects the point distribution characteristics within its neighborhood, so the features of monitoring points in different regions are discriminative; from each monitoring point's feature, the network judges in which target identification point's neighborhood the monitoring point lies and predicts the position of the adjacent target identification point.
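As a sketch, this feature construction might look as follows in PyTorch (the class name and channel arguments are assumptions, not from the patent):

```python
import torch
import torch.nn as nn

class MonitorFeature(nn.Module):
    """Batch-normalize the abstract feature and the propagation feature
    of each monitoring point separately, then concatenate them."""
    def __init__(self, c_abs, c_prop):
        super().__init__()
        self.bn_abs = nn.BatchNorm1d(c_abs)
        self.bn_prop = nn.BatchNorm1d(c_prop)

    def forward(self, f_abs, f_prop):          # (N, c_abs), (N, c_prop)
        return torch.cat([self.bn_abs(f_abs), self.bn_prop(f_prop)], dim=1)
```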
Preferably, in step (3), if the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point; the j-th of these layers predicts the offset of the monitoring point from the j-th identification point.
Preferably, in step (3), the parameters of these fully-connected layers are shared among the monitoring points of each sampling point set.
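A hedged PyTorch sketch of the two prediction heads described above (class and attribute names are illustrative; the patent does not name them):

```python
import torch
import torch.nn as nn

class LandmarkHead(nn.Module):
    """One L-dim fully-connected layer for the neighborhood probabilities
    and L separate 3-dim fully-connected layers for the offsets; the
    layers are shared by all monitoring points of one set."""
    def __init__(self, c_feat, n_landmarks):
        super().__init__()
        self.cls = nn.Linear(c_feat, n_landmarks)
        self.reg = nn.ModuleList([nn.Linear(c_feat, 3)
                                  for _ in range(n_landmarks)])

    def forward(self, feat):                        # feat: (N, c_feat)
        prob = torch.sigmoid(self.cls(feat))        # (N, L) probabilities
        offs = torch.stack([fc(feat) for fc in self.reg], dim=1)  # (N, L, 3)
        return prob, offs
```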
FIG. 1 is a flow chart of the structure of LandmarkNet and its application to a normal-scale face point set.
The network consists of a number of feature abstraction operations and feature propagation operations. Any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D. The Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud. According to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is down-sampled stage by stage into the sampling point sets $\{P_1, P_2, \dots, P_n\}$. Starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations. A feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor (e.g. PointNet, RS-CNN) is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D. The features $f_k^i$ of all sampling points of each set form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract. Finally, the point distribution feature extractor applied to all points of $P_n$ produces one feature vector expressing global features.
Then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \dots, \tilde F_1\}$. A feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are averaged with weights the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptrons (MLPs) and nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$. Since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$; several MLPs and ReLU activations then yield the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
Several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called Monitoring Point Sets (MPS), and all sampling points in them are called monitoring points. For the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature. Because the feature of each monitoring point reflects the point distribution characteristics within its neighborhood, the features of monitoring points in different regions are discriminative; from each monitoring point's feature, the network can judge in which target identification point's neighborhood the monitoring point lies and can predict the position of the adjacent target identification point.
If the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point. Different offset layers predict the offsets of the monitoring point from different identification points (for example, the j-th layer predicts the offset from the j-th identification point). The parameters of these fully-connected layers are shared among the monitoring points of each sampling point set.
Features with larger spatial receptive fields express the distribution of points over a larger spatial range and can be used to locate identification points on faces of larger scale, and vice versa. Using several monitoring point sets with different spatial receptive fields therefore lets the network locate identification points on faces of different scales simultaneously. Because the relative topology of the identification points and their positions relative to the feature regions of the face are relatively fixed, global information helps locate the identification points; and because the propagation features of the points contain global information, the propagation feature of each monitoring point is integrated into its feature in addition to its abstract feature, which improves the positioning stability of the network.
The method matches each monitoring point with multiple identification points: as long as a monitoring point is adjacent to an identification point, that identification point is matched to it. The features of each monitoring point are then used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
Preferably, when the network is used to locate identification points in point sets of multiple scales, the identification points of a given scale are matched with monitoring points whose spatial receptive fields have the corresponding size; to this end a series of boxes, target boxes (TBX) and detection boxes (MBX), are placed centered on the gold-standard identification points and the monitoring points, respectively.
Preferably, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image), which computes the side lengths from the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the chin tip.
According to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX are set according to equation (2):
$l_x^m = l_y^m = l_z^m = 2 r_i$   (2)
if the TBX and the monitoring point of the jth golden standard identification point
Figure BDA00023917590400000810
Is/are as follows
Figure BDA00023917590400000811
Has a jaccard value exceeding a threshold value thmThen matching is performed according to equation (3):
Figure BDA00023917590400000812
preferably, all parameters of the network are trained simultaneously using the loss functions of equation (4), including classification loss functions and regression loss functions
loss=lossc+λlossr(4)
The classification loss function is formula (5)
Figure BDA0002391759040000091
Figure BDA0002391759040000092
Wherein i and k are indexes of the monitoring point set and indexes of monitoring points in the monitoring point set respectively;
lossi,kas a monitoring point
Figure BDA0002391759040000093
The loss of classification of (a) is,
Figure BDA0002391759040000094
is acted on using sigmoid function
Figure BDA0002391759040000095
Monitoring point of network prediction obtained by j-th dimension calculation of output
Figure BDA0002391759040000096
The probability in the adjacent area of the jth gold standard identification point defines the monitoring point matched with at least one gold standard identification point as a positive sample, the monitoring point not matched with any gold standard identification point as a negative sample, NpIs the number of positive samples, NeThe number of negative samples;
according to lossi,kSorting the negative samples and selecting the lossi,kThe largest first few negative samples calculate the classification loss and ensure that the number of negative samples participating in the calculation is not more than three times the number of positive samples.
The regression loss function is equation (6) (its full form is given in the original as an image); it compares the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, with the corresponding gold standard.
Fig. 2 is a schematic diagram of a simple matching result between the monitoring point and the target identification point. The training method is described in detail below.
In the network training stage, the monitoring points need to be matched with the gold standard in the training data, and the network is trained according to the matching result.
To solve these two problems, a multi-label matching strategy (MLM) is proposed: each monitoring point is matched with multiple identification points, and as long as a monitoring point is adjacent to some identification point, that identification point is matched to it; the features of each monitoring point are used to predict the positions of its matched identification points, converting the localization of identification points in a point cloud into a multi-label prediction and regression problem.
When the network is used to locate identification points in point sets of multiple scales, the identification points of a given scale must be matched with monitoring points whose spatial receptive fields have the corresponding size; therefore a series of boxes, Target Boxes (TBX) and detection boxes (MBX), are placed centered on the gold-standard identification points and the monitoring points, respectively. As shown in FIG. 2, the two solid black dots and the two bold-line boxes represent two target identification points and their TBXs, and the three slash-filled black dots and the three thin-line boxes are three monitoring points and their MBXs.
So that the size of the TBX reflects the size of the face in the training data, the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image); the quantities entering it are the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the tip of the chin.
According to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX of $p_k^i$ are set as:
$l_x^m = l_y^m = l_z^m = 2 r_i$
If the Jaccard value of the TBX of the j-th gold-standard identification point and the MBX of monitoring point $p_k^i$ exceeds the threshold $th_m$, they are matched (equation (3), given in the original as an image, defines the resulting match indicator).
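Since equation (3) survives only as an image, the following sketch shows one plausible reading of the matching rule: axis-aligned box IoU (the Jaccard value) thresholded at $th_m$. All names are illustrative:

```python
import numpy as np

def jaccard_3d(c_a, side_a, c_b, side_b):
    """Jaccard value (IoU) of two axis-aligned boxes given their
    centers and per-axis side lengths."""
    lo = np.maximum(c_a - side_a / 2, c_b - side_b / 2)
    hi = np.minimum(c_a + side_a / 2, c_b + side_b / 2)
    inter = np.prod(np.clip(hi - lo, 0, None))
    union = np.prod(side_a) + np.prod(side_b) - inter
    return inter / union

def match_monitor_points(landmarks, tbx_sides, monitors, mbx_sides, th_m=0.2):
    """Multi-label matching: monitoring point k is matched to landmark j
    whenever the Jaccard value of TBX_j and MBX_k exceeds th_m."""
    match = np.zeros((len(monitors), len(landmarks)), dtype=bool)
    for k, (m, ms) in enumerate(zip(monitors, mbx_sides)):
        for j, (t, ts) in enumerate(zip(landmarks, tbx_sides)):
            match[k, j] = jaccard_3d(t, ts, m, ms) > th_m
    return match
```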
Loss function: all parameters of the network are trained synchronously using a loss function comprising a classification loss function and a regression loss function:
$loss = loss_c + \lambda\, loss_r$
The classification loss function (equation (5); its full form is given in the original as an image) is defined as follows: i and k are the index of the monitoring point set and the index of the monitoring point within it; $loss_{i,k}$ is the classification loss of monitoring point $p_k^i$; the probability, predicted by the network, that $p_k^i$ lies in the neighborhood of the j-th gold-standard identification point is obtained by applying the sigmoid function to the j-th dimension of the classification layer's output; the match indicators come from equation (3). Monitoring points matched with at least one gold-standard identification point are defined as positive samples, and monitoring points matched with no gold-standard identification point as negative samples; $N_p$ is the number of positive samples and $N_e$ the number of negative samples.
Since the number of negative samples is much larger than the number of positive samples, the negative samples are sorted by $loss_{i,k}$ and the negatives with the largest $loss_{i,k}$ are selected to compute the classification loss, ensuring that the number of negatives taking part in the computation is no more than three times the number of positives.
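A sketch of this hard-negative selection in PyTorch, assuming a binary cross-entropy per (monitoring point, identification point) pair; the exact form of equation (5) is an image in the original, so the per-pair loss here is an assumption:

```python
import torch
import torch.nn.functional as F

def classification_loss(logits, match, neg_pos_ratio=3):
    """Classification loss with hard negative mining: keep all positive
    monitoring points, sort the negatives by their loss and keep at most
    neg_pos_ratio times as many as there are positives."""
    target = match.float()                     # (N, L) match matrix
    per_pair = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none")      # (N, L)
    pos = match.any(dim=1)                     # matched monitoring points
    n_pos = int(pos.sum())
    neg = per_pair[~pos].sum(dim=1)            # loss of each negative point
    n_keep = min(neg.numel(), neg_pos_ratio * n_pos)
    hard_neg, _ = neg.topk(n_keep)             # hardest negatives
    return (per_pair[pos].sum() + hard_neg.sum()) / max(n_pos, 1)
```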
The regression loss function (equation (6); its full form is given in the original as an image) uses the match indicators obtained from equation (3); the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, is compared with the corresponding gold standard.
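Putting the two terms together as in equation (4), reusing classification_loss from the previous sketch; smooth L1 for the regression term is an assumption, since equation (6) survives only as an image:

```python
import torch.nn.functional as F

def total_loss(logits, offsets, match, gt_offsets, lam=1.0):
    """loss = loss_c + lambda * loss_r, with the regression term computed
    only over matched (monitoring point, identification point) pairs."""
    loss_c = classification_loss(logits, match)
    if match.any():
        loss_r = F.smooth_l1_loss(offsets[match], gt_offsets[match])
    else:
        loss_r = offsets.sum() * 0.0           # keep the graph connected
    return loss_c + lam * loss_r
```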
In more detail, RS-Conv is used as the point distribution feature extractor in the network, and the 3D Euclidean distance and the coordinate difference (3D-Ed, $x_i - x_j$) are used as the low-level distribution relation information h of the point cloud. The network comprises 8 feature abstraction operations and feature propagation operations in total. The sampling ratios $\{\tau_1, \tau_2, \dots, \tau_7\}$ are {7/20, 8/10, 10/15, 15/20, 20/25, 25/60, 60/120}; the sampling radii $\{r_1, r_2, \dots, r_7\}$ used to generate the local sample subset $N_k^i$ of each sampling point are {8, 10, 15, 20, 25, 60, 120} (mm); and the last feature abstraction operation acts on all points of the point set $P_7$. The local point cloud subsets $N_k^i$ of each sampling point are collected from the previous-stage sampling point set using farthest point sampling; the numbers of sampling points $\{s_1, s_2, \dots, s_7\}$ in the local point cloud subsets are {75/V, 100/V, 50/V, 75/V, 200/V, 100/V}, where V is the size of the grid in the Voxel Grid filter used to down-sample the input point set, and V = 5 mm. In addition, $\lambda$ = 1, $th_m$ = 0.2, $th_p$ = 0.9, $th_d$ = 3 mm, $th_e$ = 5 mm.
In addition, a covariance matrix cov(X) for predicting missing identification points is computed from the gold standard of the training set, and the missing gold-standard identification points in the training data are filled in, so that the matching between the gold standard and the monitoring points can be computed completely.
Data augmentation: the training data are rotated in turn around the x, y and z axes by randomly selected angles ranging from -2.5° to +2.5°, and random jitter with mean 0 and standard deviation 0.25 mm is added to each point. Random rotation and random jitter make the training data differ each time the network is trained, which stabilizes network training and is therefore very important.
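A minimal sketch of this augmentation (the function name and signature are illustrative):

```python
import numpy as np

def augment(points, max_angle_deg=2.5, jitter_std=0.25):
    """Rotate the cloud around x, y, z in turn by random angles in
    [-2.5, +2.5] degrees, then add zero-mean Gaussian jitter
    (standard deviation 0.25 mm) to every point."""
    for axis in range(3):
        a = np.deg2rad(np.random.uniform(-max_angle_deg, max_angle_deg))
        c, s = np.cos(a), np.sin(a)
        i, j = [(1, 2), (0, 2), (0, 1)][axis]    # plane rotated for this axis
        R = np.eye(3)
        R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
        points = points @ R.T
    return points + np.random.normal(0.0, jitter_std, points.shape)
```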
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical essence of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. An end-to-end point cloud deep learning network model, characterized in that: it is a deep learning network structure of a convolutional neural network (CNN) and comprises the following steps:
(1) the network down-samples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor extracts, stage by stage, the point distribution features of the neighborhood point cloud of each sampling point in each sampling point set; these features become progressively more abstract and their spatial receptive fields progressively larger;
(2) some of the sampling point sets are selected, all sampling points in them are called monitoring points, and the monitoring points are used to locate the identification points;
(3) for each monitoring point, the network predicts the probability that it lies in the neighborhood of each identification point and its offset from each identification point.
2. The end-to-end point cloud deep learning network model of claim 1, wherein: in step (1), any input point cloud P is first down-sampled with a Voxel Grid filter into a point cloud $P_0$ with point cloud density D; according to fixed sampling ratios $\{\tau_1, \tau_2, \dots, \tau_n\}$, $P_0$ is down-sampled stage by stage into the sampling point sets $\{P_1, P_2, \dots, P_n\}$;
starting from the first sampling point set $P_1$, the abstract features of the sampling points in $\{P_1, P_2, \dots, P_n\}$ are extracted stage by stage using feature abstraction operations; a feature abstraction operation computes the features of point set $P_i$ from point set $P_{i-1}$: for the k-th sampling point $p_k^i$ of $P_i$, the neighborhood subset $N_k^i$ lying inside the sphere of radius $r_i$ centered at $p_k^i$ is found within $P_{i-1}$; a point distribution feature extractor is applied to the $n_i$ points of $N_k^i$ and their feature vectors to obtain the abstract feature vector $f_k^i$ of $p_k^i$, where $n_i$ is positively correlated with the point cloud density D; the features $f_k^i$ of all sampling points of each sampling point set form the abstract feature set $F_i$ of $P_i$; the feature sets $\{F_1, F_2, \dots, F_n\}$ of $\{P_1, P_2, \dots, P_n\}$ have progressively larger spatial receptive fields and become progressively more abstract; finally, the point cloud feature extractor is applied to all points of $P_n$ to produce one feature vector expressing global features;
then, starting from the last sampling point set $P_n$, the propagation features of the sampling point sets $\{P_n, P_{n-1}, \dots, P_1\}$ are obtained stage by stage; the propagation features of all sampling points form the propagation feature sets $\{\tilde F_n, \dots, \tilde F_1\}$; a feature propagation operation computes the features of point set $P_i$ from point set $P_{i+1}$: for the k-th sampling point $p_k^i$ of $P_i$, the abstract features of the 3 points of $P_{i+1}$ nearest to $p_k^i$ are averaged with weights the reciprocals of their distances to $p_k^i$; the weighted average is concatenated with the abstract feature $f_k^i$ of $p_k^i$, and several multilayer perceptron (MLP) and nonlinear activation function (ReLU) layers are applied to the concatenation to obtain the propagation feature $\tilde f_k^i$ of $p_k^i$; since the stage after the sampling point set $P_n$ is a single feature vector, that feature vector is used as the weighted-average result and concatenated with the abstract feature of each point of $P_n$ to obtain the propagation feature $\tilde f_k^n$ of each sampling point of $P_n$.
3. The end-to-end point cloud deep learning network model of claim 2, wherein: in step (1), the Voxel Grid filter first voxelizes space, and the barycenters of the points lying within each voxel constitute the output point cloud.
4. The end-to-end point cloud deep learning network model of claim 3, characterized in that: in step (2), several point sets are selected from the sampling point sets $\{P_1, P_2, \dots, P_n\}$; they are called monitoring point sets MPS, and all sampling points in them are called monitoring points; for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, its abstract feature $f_k^i$ and propagation feature $\tilde f_k^i$ are batch-normalized separately and concatenated, and the concatenation is taken as the monitoring point's feature; the feature of each monitoring point reflects the point distribution characteristics within its neighborhood, the features of monitoring points in different regions are discriminative, and from each monitoring point's feature it is judged in which target identification point's neighborhood the monitoring point lies and the position of the adjacent target identification point is predicted.
5. The end-to-end point cloud deep learning network model of claim 4, wherein: in step (3), if the number of target identification points is L, then for the k-th monitoring point $p_k^i$ of the i-th monitoring point set $P_i$, one single fully-connected layer with output dimension L is applied to its feature to predict the probabilities that the monitoring point lies in the neighborhood of each identification point, and L single fully-connected layers with output dimension 3 (one for each j = 0, 1, …, L-1) are applied to its feature to predict the offsets $(\Delta x, \Delta y, \Delta z)$ of the monitoring point from each identification point; the j-th of these layers predicts the offset of the monitoring point from the j-th identification point.
6. The end-to-end point cloud deep learning network model of claim 5, wherein: in step (3), the parameters at these fully-connected layers are shared among each set of sample points.
7. The method for training the end-to-end point cloud deep learning network model according to claim 6, wherein: each monitoring point is matched with multiple identification points; as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point; the features of each monitoring point are used to predict the positions of its matched identification points, converting the localization of identification points in the point cloud into a multi-label prediction and regression problem.
8. The method for training the end-to-end point cloud deep learning network model according to claim 7, wherein: when the network is used to locate identification points in point sets of multiple scales, the identification points of a specific scale are matched with monitoring points whose spatial receptive fields have the corresponding size, and a series of boxes, target boxes TBX and detection boxes MBX, are placed centered on the gold-standard identification points and the monitoring points, respectively.
9. The method for training the end-to-end point cloud deep learning network model according to claim 8, wherein: the side lengths $(l_x^t, l_y^t, l_z^t)$ of the TBX are set from the gold standard of the training data according to equation (1) (given in the original as an image), whose inputs are the gold-standard positions of the left external canthus, the right external canthus, the eyebrow center and the chin tip;
according to the radius $r_i$ of the sphere used to gather the neighborhood subset $N_k^i$ of each monitoring point $p_k^i$ from the previous-stage point set, the side lengths $(l_x^m, l_y^m, l_z^m)$ of the MBX are set according to equation (2):
$l_x^m = l_y^m = l_z^m = 2 r_i$   (2)
if the Jaccard value of the TBX of the j-th gold-standard identification point and the MBX of monitoring point $p_k^i$ exceeds a threshold $th_m$, they are matched according to equation (3) (given in the original as an image).
10. The method for training the end-to-end point cloud deep learning network model according to claim 9, wherein: all parameters of the network are trained simultaneously using the loss function of equation (4), comprising a classification loss function and a regression loss function:
$loss = loss_c + \lambda\, loss_r$   (4)
the classification loss function is equation (5) (its full form is given in the original as an image), wherein i and k are the index of the monitoring point set and the index of the monitoring point within it, respectively; $loss_{i,k}$ is the classification loss of monitoring point $p_k^i$; the probability, predicted by the network, that monitoring point $p_k^i$ lies in the neighborhood of the j-th gold-standard identification point is obtained by applying the sigmoid function to the j-th dimension of the classification layer's output; monitoring points matched with at least one gold-standard identification point are defined as positive samples and monitoring points matched with no gold-standard identification point as negative samples, $N_p$ being the number of positive samples and $N_e$ the number of negative samples;
the negative samples are sorted by $loss_{i,k}$ and the negatives with the largest $loss_{i,k}$ are selected to compute the classification loss, ensuring that the number of negative samples taking part in the computation is no more than three times the number of positive samples;
the regression loss function is equation (6) (its full form is given in the original as an image); it compares the network-predicted offset of monitoring point $p_k^i$ from the j-th target identification point, i.e. the output of the j-th offset layer, with the corresponding gold standard.
CN202010116881.9A 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method Active CN111428855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Publications (2)

Publication Number Publication Date
CN111428855A true CN111428855A (en) 2020-07-17
CN111428855B CN111428855B (en) 2023-11-14

Family

ID=71551571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116881.9A Active CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Country Status (1)

Country Link
CN (1) CN111428855B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268256A1 (en) * 2017-03-16 2018-09-20 Aquifi, Inc. Systems and methods for keypoint detection with convolutional neural networks
CN110321910A (en) * 2018-03-29 2019-10-11 中国科学院深圳先进技术研究院 Feature extracting method, device and equipment towards cloud
CN109544700A (en) * 2018-10-12 2019-03-29 深圳大学 Processing method, device and the equipment of point cloud data neural network based
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shaoshuai Shi et al.: "PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection", arXiv:1912.13192v1

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085123A (en) * 2020-09-25 2020-12-15 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN112085123B (en) * 2020-09-25 2022-04-12 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN116045833A (en) * 2023-01-03 2023-05-02 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data
CN116045833B (en) * 2023-01-03 2023-12-22 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data

Also Published As

Publication number Publication date
CN111428855B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Zhu Research on road traffic situation awareness system based on image big data
Dong et al. Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells
CN108182441B (en) Parallel multichannel convolutional neural network, construction method and image feature extraction method
CN106951923B (en) Robot three-dimensional shape recognition method based on multi-view information fusion
CN109671102B (en) Comprehensive target tracking method based on depth feature fusion convolutional neural network
CN110956111A (en) Artificial intelligence CNN, LSTM neural network gait recognition system
Sun et al. Deep learning‐based single‐cell optical image studies
CN111797683A (en) Video expression recognition method based on depth residual error attention network
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN105160400A (en) L21 norm based method for improving convolutional neural network generalization capability
Abas et al. A YOLO and convolutional neural network for the detection and classification of leukocytes in leukemia
CN113313123B (en) Glance path prediction method based on semantic inference
CN114299150A (en) Depth 6D pose estimation network model and workpiece pose estimation method
CN111414875B (en) Three-dimensional point cloud head posture estimation system based on depth regression forest
Steinberg et al. A Bayesian nonparametric approach to clustering data from underwater robotic surveys
Kate et al. Breast cancer image multi-classification using random patch aggregation and depth-wise convolution based deep-net model
CN111428855A (en) End-to-end point cloud deep learning network model and training method
Wiesner et al. On generative modeling of cell shape using 3D GANs
CN114937182A (en) Image emotion distribution prediction method based on emotion wheel and convolutional neural network
CN110210380A (en) The analysis method of personality is generated based on Expression Recognition and psychology test
Özbay et al. 3D Human Activity Classification with 3D Zernike Moment Based Convolutional, LSTM-Deep Neural Networks.
Cao et al. 3D convolutional neural networks fusion model for lung nodule detection onclinical CT scans
CN115616570A (en) SAR target recognition method based on semi-supervised generation countermeasure network
Oztel et al. Deep learning approaches in electron microscopy imaging for mitochondria segmentation
CN113988163A (en) Radar high-resolution range profile identification method based on multi-scale grouping fusion convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant