CN114067380A - Model training method, detection method and device for face key point detection, and terminal equipment - Google Patents

Model training method, detection method and device for face key point detection, and terminal equipment

Info

Publication number
CN114067380A
CN114067380A (application CN202010764497.XA)
Authority
CN
China
Prior art keywords
key point
face
max
detection
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010764497.XA
Other languages
Chinese (zh)
Other versions
CN114067380B (en)
Inventor
林坚
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority to CN202010764497.XA
Publication of CN114067380A
Application granted
Publication of CN114067380B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model training method, a detection method, a device and terminal equipment for face key point detection. The model training method comprises the following steps: step 1, collecting original images containing human faces, labeling the coordinates of N face key points, and constructing an original training set; step 2, acquiring face region frames and randomly enlarging them to construct an augmented training set; step 3, performing image preprocessing on the augmented data set; step 4, acquiring Euler angle labels for the augmented data set images; and step 5, training the face key point detection model. The face key points are detected with the network obtained by the model training method, which improves the detection speed while preserving accuracy.

Description

Model training method, detection method and device for face key point detection, and terminal equipment
Technical Field
The invention relates to the technical fields of computer vision and face recognition, and in particular to a model training method, detection method and device for face key point detection, and terminal equipment.
Background
Face key point detection aims to locate key points in a given face image; the key points are artificially defined to represent points on the facial features and the face contour. Locating the key points accurately and efficiently is an important prerequisite for face analysis and recognition. At present, face key point detection methods mainly include traditional methods based on the ASM (Active Shape Model) and AAM (Active Appearance Model), methods based on CSR (Cascaded Shape Regression), and methods based on deep learning, but existing methods have difficulty achieving fast detection while ensuring accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiment of the disclosure provides a model training method, a detection method and a detection device for face key point detection, and a terminal device, which can improve the detection speed under the condition of ensuring the precision. The technical scheme is as follows:
in a first aspect, a model training method for face keypoint detection is provided, which includes the following steps:
Step 1, collecting original images containing a human face and labeling the coordinates of N face key points, wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height being height, and constructing an original training set.
Step 2: acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set;
Calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form the original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing the augmented training set.
Step 3, carrying out image preprocessing on the augmented data set.
Step 4, acquiring the Euler angle labels of the augmented data set images.
Obtaining coordinates of six points, namely a left canthus, a right canthus, a nose tip, a left mouth corner, a right mouth corner and the lower jaw, from the face key points, calculating a rotation matrix, and converting the rotation matrix into Euler angles; the Euler angles comprise a pitch angle pitch, a yaw angle yaw and a roll angle roll.
Step 5, training the face key point detection model,
the method comprises the steps of adopting an augmented training set, taking images in the training set, key points of the images and Euler angle labels of the images as input, using a basic feature extraction module on the input images to obtain basic feature parts, and respectively predicting the basic feature parts by adopting key point network main branches and angle network auxiliary branches.
The key point network main branch comprises a first scale feature extraction module, a second scale feature extraction module, a third scale feature extraction module and a key point regression prediction module. First scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points.
The angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle.
Wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles. The total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
Preferably, in step 2, the frame of the face region is randomly enlarged, and the specific method is as follows:
the width magnification factor of the target frame is Sw, the height magnification factor is Sh, and the maximum threshold eta of Sw and Sh is set, namely that Sw is more than or equal to 1 and eta is less than or equal to 1, and Sh is more than or equal to 1 and eta is less than or equal to eta.
Let dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1. That is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1.
The maximum outward expansion of the target frame is:
Maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
Maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
Image border-crossing processing is then performed. Taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
This yields the region frame (x_min', y_min', x_max', y_max') after one random enlargement.
Preferably, the method of image pre-processing the augmented data set comprises one or more of the following: scaling to fixed input size, random rotation, random graying, random color and saturation transformation, pixel value normalization, random erasure.
Preferably, in step 5, the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size. The first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1. The feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
Further, the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
In a second aspect, a method for detecting key points of a human face is provided, and the method specifically includes:
A face key point detection network trained by the model training method in any one of all possible implementation manners receives an image to be detected as input and outputs the key point detection result.
Meanwhile, another method for detecting key points of the human face is provided, which specifically comprises the following steps:
A face key point detection network obtained by training according to any one of all possible implementation manners adopts the basic feature extraction module and the key point network main branch as the detection network; the image to be detected is input to obtain the key point detection result.
Preferably, the method further comprises processing the image to be detected as follows: the position of the face detection frame is obtained with a face detection algorithm; a threshold is set on the size of the face detection frame, and detection frames smaller than the threshold are removed; when the size of the face detection frame is larger than the threshold, the face frame is enlarged by a scale factor, the enlarged face detection frame is cropped out as the detection input to obtain key point coordinates, and the face key point coordinates relative to the image to be detected are calculated by combining the key point coordinates with the face detection frame coordinates.
In a third aspect, an apparatus for detecting key points of a human face is provided, the apparatus includes a training module and a detection module;
the training module is used for executing the steps of the model training method for detecting the key points of the human face in any one of all possible implementation modes;
the detection module is configured to execute the steps of the method for detecting key points of a human face according to any one of all possible implementation manners.
In a fourth aspect, a mobile terminal device is provided, where the mobile terminal device includes a device for detecting key points of a human face in any one of all possible implementation manners.
Compared with the prior art, one of the technical schemes has the following beneficial effects:
1. During data augmentation, a new scheme of randomly enlarging the target frame generates more diverse training data, improving the accuracy and generalization ability of the model;
2. using a lightweight network as the backbone reduces the computation of the model and increases the detection speed;
3. the angle network auxiliary branch is used to strengthen training only in the training stage, improving detection accuracy without adding any computation at test time;
4. the model is trained with a combination of the weighted key point coordinate regression loss and the wing loss;
5. the position coordinates are regressed directly rather than post-processed from heatmaps, eliminating the time-consuming post-processing step.
Drawings
Fig. 1 is a schematic diagram of constructing an augmented training set according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating random enlargement of a face area according to an embodiment of the disclosure.
Fig. 3 is a frame diagram of a face key point model training network provided in the embodiment of the present disclosure.
Fig. 4 is a structure diagram of a face key point model training network provided in the embodiment of the present disclosure.
Fig. 5 is a block diagram of four G-bneck modules according to an embodiment of the present disclosure.
Fig. 6 is a network framework diagram of face keypoint detection provided by the embodiment of the present disclosure.
Fig. 7 is a network structure diagram of face key point detection provided in the embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may, for example, be implemented in an order other than those described herein.
In a first aspect: the embodiment of the disclosure provides a model training method for face key point detection, which comprises the following steps:
Step 1, collecting original images containing a human face and labeling the coordinates of N face key points (preferably 106 key points), wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height being height, and constructing an original training set.
Step 2: and acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set.
Fig. 1 is a schematic diagram of constructing the augmented training set according to an embodiment of the present disclosure. With reference to the figure, the specific method of step 2 includes: calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form the original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num (default 20) groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing the augmented training set.
Preferably, in step 2, the frame of the face region is randomly enlarged, and the specific method is as follows, as shown in fig. 2:
1) The width magnification factor of the target frame (i.e. the face region frame) is set to Sw and the height magnification factor to Sh, and a maximum threshold η (default 2.0) is set for Sw and Sh, i.e. 1 ≤ Sw ≤ η and 1 ≤ Sh ≤ η.
2) Let dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1. That is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1.
3) The maximum outward expansion of the target frame is:
Maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
Maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
Image border-crossing processing is then performed. Taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
This yields the region frame (x_min', y_min', x_max', y_max') after one random enlargement.
This random enlargement mechanism guarantees that the new region frame contains the original face region frame; the original frame does not sit fixed at the center of the enlarge_box but falls with equal probability at any position within the new frame, and the size and shape of the new frame are not fixed. One initial picture therefore yields many different samples, which can further improve the final accuracy and generalization ability of model training. Moreover, when the final trained model is tested, it can still locate the key points well even if the extracted face target frame is not accurate; and when the face occupies a large area of the image, an accurate result can be obtained by applying key point detection directly, without using a target detection module.
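As a concrete illustration, the following minimal Python sketch performs one random enlargement under the rules above, assuming floating-point box coordinates; the function names enlarge_box and shift_keypoints are illustrative, not from the patent.

import random

def enlarge_box(src_box, width, height, eta=2.0):
    # Randomly enlarge (x_min, y_min, x_max, y_max) within the image bounds.
    x_min, y_min, x_max, y_max = src_box
    sw = random.uniform(1.0, eta)          # width magnification, 1 <= Sw <= eta
    sh = random.uniform(1.0, eta)          # height magnification, 1 <= Sh <= eta
    dx1 = random.uniform(0.0, sw - 1.0)    # proportion of expansion to the left
    dy1 = random.uniform(0.0, sh - 1.0)    # proportion of expansion upward
    dx2 = random.uniform(0.0, 1.0 - dx1)   # to the right, so that dx1 + dx2 <= 1
    dy2 = random.uniform(0.0, 1.0 - dy1)   # downward, so that dy1 + dy2 <= 1
    max_dx = (sw - 1.0) * (x_max - x_min)  # maximum horizontal out-expansion
    max_dy = (sh - 1.0) * (y_max - y_min)  # maximum vertical out-expansion
    # Border-crossing processing: clamp the enlarged box to the image.
    return (max(0.0, x_min - dx1 * max_dx),
            max(0.0, y_min - dy1 * max_dy),
            min(float(width), x_max + dx2 * max_dx),
            min(float(height), y_max + dy2 * max_dy))

def shift_keypoints(points, box):
    # Convert key points from original-image coordinates to box coordinates.
    return [(x - box[0], y - box[1]) for (x, y) in points]

# num (default 20) enlarged samples would be drawn per original image, e.g.:
# boxes = [enlarge_box(src_box, width, height) for _ in range(20)]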
Step 3, image preprocessing is carried out on the augmented data set.
Preferably, the image preprocessing of the augmented data set comprises one or more of the following: scaling to a fixed input size (112 × 112 by default), random rotation (note that the key point coordinates must be transformed correspondingly), random graying, random color and saturation transformation, pixel value normalization (computing the per-channel RGB pixel mean and variance over all training images, then performing mean subtraction and variance removal on each image), random erasing, and the like.
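As a small illustration, the sketch below implements two of these steps, scaling to the fixed 112 × 112 input and per-channel normalization, assuming RGB images as NumPy arrays; the function names are illustrative, and the variance-removing step is realized here as division by the standard deviation.

import cv2
import numpy as np

INPUT_SIZE = 112

def compute_channel_stats(images):
    # Per-channel mean and standard deviation over all training images (RGB).
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)

def preprocess(img, keypoints, mean, std):
    h, w = img.shape[:2]
    img = cv2.resize(img, (INPUT_SIZE, INPUT_SIZE)).astype(np.float32)
    img = (img - mean) / std  # subtract the mean, remove the variance
    # Key point coordinates must be transformed together with the image.
    kps = [(x * INPUT_SIZE / w, y * INPUT_SIZE / h) for (x, y) in keypoints]
    return img, kps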
Step 4, obtaining the Euler angle labels of the augmented data set images.
Coordinates of six points, namely the left canthus, right canthus, nose tip, left mouth corner, right mouth corner and lower jaw (chin), are obtained from the face key points; a rotation matrix is computed (preferably with the solvePnP function in OpenCV) and then converted into Euler angles: a pitch angle (pitch), a yaw angle (yaw) and a roll angle (roll).
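A sketch of this labeling step using OpenCV's solvePnP, as the text suggests. The generic 3D reference coordinates, the pinhole camera approximation and the Euler-angle decomposition convention below are assumptions for illustration, not values from the patent.

import cv2
import numpy as np

# Generic 3D model points in the order: left canthus, right canthus, nose tip,
# left mouth corner, right mouth corner, lower jaw (chin).
MODEL_3D = np.array([
    (-225.0, 170.0, -135.0),
    (225.0, 170.0, -135.0),
    (0.0, 0.0, 0.0),
    (-150.0, -150.0, -125.0),
    (150.0, -150.0, -125.0),
    (0.0, -330.0, -65.0),
], dtype=np.float64)

def euler_angles(image_points, img_w, img_h):
    # image_points: 6 x 2 array ordered as MODEL_3D; returns degrees.
    focal = img_w  # crude pinhole approximation of the focal length
    cam = np.array([[focal, 0.0, img_w / 2.0],
                    [0.0, focal, img_h / 2.0],
                    [0.0, 0.0, 1.0]], dtype=np.float64)
    dist = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, image_points.astype(np.float64), cam, dist)
    rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3 x 3 rotation matrix
    # One common decomposition of the rotation matrix into Euler angles.
    sy = np.sqrt(rot[0, 0] ** 2 + rot[1, 0] ** 2)
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
    return pitch, yaw, roll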
Step 5, training the face key point detection model,
fig. 3 is a frame diagram of a face key point model training network provided in the embodiment of the present disclosure, and step 5 in combination with the diagram specifically includes: the method comprises the steps of adopting an augmented training set, taking images in the training set, key points of the images and Euler Angle labels of the images as input, using a basic feature extraction Module for the input images to obtain basic feature parts, and respectively predicting the basic feature parts by adopting a key point network main branch (Landmark Net Module) and an Angle network auxiliary branch (Angle Net Module).
The key point network main branch comprises a first scale feature extraction module (Scale_1_Net), a second scale feature extraction module (Scale_2_Net), a third scale feature extraction module (Scale_3_Net) and a key point regression prediction module (Landmark_Classfier). First scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points.
The angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle.
Wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles. The total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
The influence weight w_θ is given by

[formula shown only as an image in the source]

where θ1, θ2 and θ3 respectively represent the values of the three Euler angles pitch, yaw and roll.

Loss_reg is given by

[formula shown only as an image in the source]

where N is the number of key points (default 106) and the per-point term represents the distance between the predicted value of the n-th key point and the true value of the corresponding point.

Loss_wing is the wing loss; its standard form is

wing(d) = w · ln(1 + |d| / e) if |d| < w, and |d| - C otherwise,

where d is the per-point error and w, e and C are constants that can be set as appropriate (here preferably w = 10.0 and e = 2.0, with C = w - w·ln(1 + w/e) = 10 - 10·ln 6 for continuity).

The total training loss is

Loss = (1 + w_θ) * Loss_reg + Loss_wing
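A sketch of this loss under stated assumptions: the wing-loss form follows the published wing loss, Loss_reg is taken here as the mean per-point distance, and w_θ is passed in as a precomputed value because its exact formula appears only as an image in the source.

import math
import torch

def wing_loss(pred, target, w=10.0, e=2.0):
    # Wing loss over per-key-point distances; C = w - w*ln(1 + w/e) keeps the
    # two pieces continuous at |d| = w.
    c = w - w * math.log(1.0 + w / e)
    d = (pred - target).view(-1, 2).norm(dim=1)  # Euclidean error per key point
    return torch.where(d < w, w * torch.log1p(d / e), d - c).mean()

def total_loss(pred_pts, true_pts, w_theta):
    # Loss = (1 + w_theta) * Loss_reg + Loss_wing
    d = (pred_pts - true_pts).view(-1, 2).norm(dim=1)
    loss_reg = d.mean()  # assumed form of the coordinate regression loss
    return (1.0 + w_theta) * loss_reg + wing_loss(pred_pts, true_pts)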
Preferably, in step 5, the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size. The first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1. The feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
Further, as shown in figs. 4 and 5, the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
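The following PyTorch sketch reproduces only the structure of the main branch, with plain strided convolutions standing in for the G-bneck blocks (whose internals appear only in the figures) and placeholder channel widths.

import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    def __init__(self, n_points=106, c=64):
        super().__init__()
        self.base = nn.Sequential(  # basic features at 1/4 of the input size
            nn.Conv2d(3, c, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU())
        self.scale1 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # 1/4 -> 1/8
        self.scale2 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # 1/8 -> 1/16
        self.scale3 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # deepest scale
        self.pool = nn.AdaptiveAvgPool2d(1)          # map each scale to 1 x 1
        self.head = nn.Linear(3 * c, 2 * n_points)   # regression on fused features

    def forward(self, x):
        base = self.base(x)
        s1 = self.scale1(base)
        s2 = self.scale2(s1)
        s3 = self.scale3(s2)
        # Fuse all three scales as the input of the regression head.
        f = torch.cat([self.pool(t).flatten(1) for t in (s1, s2, s3)], dim=1)
        return self.head(f)  # (batch, 2 * N) predicted key point coordinates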
In a second aspect: the embodiment of the present disclosure provides a method for detecting key points of a human face;
fig. 6 is a network frame diagram of face key point detection provided in the embodiment of the present disclosure, and in combination with the network frame diagram, a method for detecting face key points includes:
the method comprises the steps of training a human face key point detection network by adopting a model training method for human face key point detection according to any one of all possible implementation modes, inputting an image to be detected by adopting a basic feature extraction module and a key point network main branch module as the detection network, and obtaining a key point detection result (or key point detection coordinates).
In practical applications, scenes where the face is relatively close to the camera often occur, for example at a security check entrance where the face occupies a large proportion of the image to be detected. In that case no face detection algorithm is needed to enlarge the face frame: the original image is input directly into the key point detection network to obtain the face key point coordinates relative to the original image. For scenes where the face is relatively far from the camera, or where there is more than one face in the image, the face detection frames are often very small. A face detection algorithm (preferably MTCNN) can then be used to obtain the positions of the face detection frames; a threshold is set on the detection frame size and frames smaller than the threshold are removed. When a face detection frame is larger than the threshold, it is enlarged by a scale factor (preferably 1.8), the enlarged frame is cropped out as the detection input to obtain the key point detection result, and the predicted key point coordinates are combined with the face detection frame coordinates to calculate the face key point coordinates relative to the image to be detected.
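A sketch of this long-distance pipeline, where detect_faces stands in for the MTCNN detector, landmark_net for the trained key point network (assumed to return pixel coordinates in the 112 × 112 crop), and the minimum-size threshold is an assumed value.

import cv2
import numpy as np

SCALE, MIN_SIZE, INPUT = 1.8, 40, 112  # MIN_SIZE is an assumed threshold

def detect_keypoints(img, detect_faces, landmark_net):
    h, w = img.shape[:2]
    results = []
    for (x1, y1, x2, y2) in detect_faces(img):
        bw, bh = x2 - x1, y2 - y1
        if min(bw, bh) < MIN_SIZE:
            continue  # remove detection frames smaller than the threshold
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # enlarge around the center
        hw, hh = bw * SCALE / 2.0, bh * SCALE / 2.0
        ex1, ey1 = max(0, int(cx - hw)), max(0, int(cy - hh))
        ex2, ey2 = min(w, int(cx + hw)), min(h, int(cy + hh))
        crop = cv2.resize(img[ey1:ey2, ex1:ex2], (INPUT, INPUT))
        pts = np.asarray(landmark_net(crop), dtype=np.float32).reshape(-1, 2)
        # Map the predictions back to the coordinates of the image to be detected.
        pts[:, 0] = pts[:, 0] * (ex2 - ex1) / INPUT + ex1
        pts[:, 1] = pts[:, 1] * (ey2 - ey1) / INPUT + ey1
        results.append(pts)
    return results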
Preferably, fig. 7 is a network structure diagram of face key point detection provided by the embodiment of the present disclosure. With reference to the figure, the basic feature extraction module comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D; the key point network main branch comprises a first scale feature extraction module (Scale_1_Net), a second scale feature extraction module (Scale_2_Net), a third scale feature extraction module (Scale_3_Net) and a key point regression prediction module (Landmark_Classfier); the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules, the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules, and the third scale feature extraction module comprises 1 convolution.
It should be noted that the test stage uses only the basic feature extraction module and the key point network main branch to obtain the final face key point prediction. The angle network auxiliary branch is added during training mainly so that the trained model acquires some ability to learn face angles, enabling it to predict key point information more accurately for faces at different angles; this improves the generalization ability of the model with respect to face angle and raises the accuracy of face key point detection.
In a third aspect: the embodiment of the present disclosure provides a device for detecting key points of a human face;
based on the same technical concept, the device comprises a training module and a detection module;
the training module is used for executing the steps of the model training method for detecting the key points of the human face in any one of all possible implementation modes;
the detection module is configured to execute the steps of the method for detecting key points of a human face according to any one of all possible implementation manners.
The device for detecting the face key points, the model training method for detecting the face key points and the detection method provided by the embodiments belong to the same concept, and the specific implementation process is described in the method embodiments in detail and is not described herein again.
In a fourth aspect, an embodiment of the present disclosure provides a terminal device, where the terminal device includes a device for detecting key points of a human face in any one of all possible implementation manners.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above. Insubstantial modifications made in accordance with the principles and technical solutions of the invention, or direct application of the inventive concept and technical solutions to other occasions without improvement, all fall within the protection scope of the invention.

Claims (10)

1. A model training method for face key point detection is characterized by comprising the following steps:
step 1, collecting an original image containing a human face and labeling the coordinates of N face key points, wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height of the original image being height, and constructing an original training set;
step 2: acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set;
calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form an original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing an augmented training set;
step 3, image preprocessing is carried out on the augmented data set;
step 4, acquiring an Euler angle label of the augmented data set image;
obtaining coordinates of six points, namely a left canthus, a right canthus, a nose tip, a left mouth corner, a right mouth corner and the lower jaw, from the face key points, calculating a rotation matrix, and converting the rotation matrix into Euler angles, the Euler angles comprising a pitch angle pitch, a yaw angle yaw and a roll angle roll;
step 5, training the face key point detection model,
adopting an augmented training set, taking images in the training set, key points of the images and Euler angle labels of the images as input, using a basic feature extraction module on the input images to obtain a basic feature part, and respectively predicting the basic feature part by adopting a key point network main branch and an angle network auxiliary branch;
the key point network main branch comprises a first scale feature extraction module, a second scale feature extraction module, a third scale feature extraction module and a key point regression prediction module; first scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points;
the angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, wherein the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle;
wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles; the total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
2. The model training method for face key point detection according to claim 1, wherein the face region frame is randomly enlarged in step 2, and the specific method is as follows:
setting the width magnification factor of the target frame as Sw and the height magnification factor as Sh, and setting a maximum threshold η for Sw and Sh, i.e. 1 ≤ Sw ≤ η and 1 ≤ Sh ≤ η;
letting dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1; that is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1; the maximum outward expansion of the target frame is:
maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
then image border-crossing processing is performed; taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
and the region frame (x_min', y_min', x_max', y_max') after one random enlargement is obtained.
3. The model training method for face keypoint detection according to claim 1, characterized in that the method for image preprocessing of the augmented data set comprises one or more of the following ways: scaling to fixed input size, random rotation, random graying, random color and saturation transformation, pixel value normalization, random erasure.
4. The model training method for face key point detection according to any of claims 1 to 3, wherein in step 5 the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size; the first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1; and the feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
5. The model training method for face key point detection according to claim 4, wherein the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
6. A method for detecting key points of a human face is characterized in that,
wherein a face key point detection network trained by the model training method for face key point detection according to any one of claims 1 to 5 receives an image to be detected as input and outputs a key point detection result.
7. A method for detecting key points of a human face is characterized in that,
wherein a face key point detection network trained by the model training method for face key point detection according to any one of claims 1 to 5 adopts the basic feature extraction module and the key point network main branch as the detection network, and an image to be detected is input to obtain a key point detection result.
8. The method for detecting face key points according to claim 6 or 7, wherein the image to be detected is processed as follows: the position of the face detection frame is obtained with a face detection algorithm; a threshold is set on the size of the face detection frame and detection frames smaller than the threshold are removed; when the size of the face detection frame is larger than the threshold, the face frame is enlarged by a scale factor, the enlarged face detection frame is cropped out as the detection input to obtain key point coordinates, and the face key point coordinates relative to the image to be detected are calculated by combining the key point coordinates with the face detection frame coordinates.
9. A human face key point detection device is characterized by comprising a training module and a detection module;
the training module is used for executing the steps of the model training method for face key point detection according to any one of claims 1 to 5;
the detection module is configured to perform the steps of the method for detecting key points of a human face according to any one of claims 6 to 8.
10. A mobile terminal device, characterized in that the mobile terminal device comprises the apparatus for detecting face key points according to claim 9.
CN202010764497.XA 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment Active CN114067380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010764497.XA CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010764497.XA CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Publications (2)

Publication Number Publication Date
CN114067380A 2022-02-18
CN114067380B 2024-08-27

Family

ID=80231330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010764497.XA Active CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Country Status (1)

Country Link
CN (1) CN114067380B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508678A (en) * 2018-11-16 2019-03-22 广州市百果园信息技术有限公司 Training method, the detection method and device of face key point of Face datection model
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN110309706A (en) * 2019-05-06 2019-10-08 深圳市华付信息技术有限公司 Face critical point detection method, apparatus, computer equipment and storage medium
CN110889446A (en) * 2019-11-22 2020-03-17 高创安邦(北京)技术有限公司 Face image recognition model training and face image recognition method and device
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
石高辉; 陈晓荣; 刘亚茹; 戴星宇; 池笑宇; 李恒: "Design of a face key point detection algorithm based on convolutional neural networks" (基于卷积神经网络的人脸关键点检测算法设计), Electronic Measurement Technology (电子测量技术), no. 24

Also Published As

Publication number Publication date
CN114067380B (en) 2024-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant