CN113221812A - Training method of face key point detection model and face key point detection method - Google Patents

Info

Publication number
CN113221812A
CN113221812A (Application CN202110579203.0A)
Authority
CN
China
Prior art keywords
face
sample image
network
frame
key point
Prior art date
Legal status
Pending
Application number
CN202110579203.0A
Other languages
Chinese (zh)
Inventor
刘思伟
Current Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Original Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2021-08-06
Application filed by Guangzhou Weaving Point Intelligent Technology Co ltd filed Critical Guangzhou Weaving Point Intelligent Technology Co ltd
Priority to CN202110579203.0A
Publication of CN113221812A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for a face key point detection model and a face key point detection method. The training method comprises the following steps: acquiring a face sample image carrying preset face labeling information; inputting the face sample image into the Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image; inputting the face candidate frame and the face sample image into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; and inputting the face target frame, the preset face labeling information and the face sample image into the Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function. The method solves the technical problem that existing face key point detection methods are prone to key point drift during face detection, which leads to low key point detection accuracy.

Description

Training method of face key point detection model and face key point detection method
Technical Field
The application relates to the field of computer vision, in particular to a training method of a face key point detection model and a face key point detection method.
Background
Face key point detection plays a key role in numerous applications, such as face pose correction, pose recognition, expression recognition, fatigue monitoring, mouth shape recognition, and the like. Therefore, how to obtain high-precision face key points has always been a research hotspot in the field of computer vision.
Face key point detection refers to locating the key points of a face, such as the eyebrows, eyes, nose, mouth and face contour, in a given face image. Existing face key point detection methods are prone to key point drift when detecting a face, so the face key point detection accuracy is low.
Disclosure of Invention
The application provides a training method for a face key point detection model and a face key point detection method, which solve the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
In view of this, a first aspect of the present application provides a method for training a face keypoint detection model, including:
acquiring a face sample image carrying preset face labeling information;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Optionally, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
Optionally, the face labeling information includes: a face labeling frame and face key points;
the acquiring of the face sample image carrying the preset face labeling information specifically includes:
acquiring an unmarked original human face sample image;
carrying out face frame labeling on the original face sample image by a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Optionally, the face candidate box includes: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image, specifically comprising:
and inputting the face sample image into a Pnet network of an MTCNN (multiple-transmission-network) network, so that the Pnet network performs face frame selection on the face sample image, and classifying the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
Optionally, the selecting the face frames of the face sample image, and classifying the frames into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the face frames specifically include:
carrying out face frame selection on the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and the face labeling frame;
and classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Optionally, the classifying of the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically includes:
taking a face selection frame with an overlapping rate greater than or equal to a first threshold as a positive face candidate frame;
taking a face selection frame with an overlapping rate greater than a second threshold and smaller than the first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to the second threshold as a negative face candidate frame, wherein the second threshold is less than the first threshold.
A second aspect of the present application provides a method for detecting a face key point, including:
acquiring a human face image to be detected;
inputting the facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected, which is output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to any one of the training methods of the first aspect.
The third aspect of the present application provides a training apparatus for a face key point detection model, including:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a face sample image carrying preset face labeling information;
the first processing unit is used for inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
the second processing unit is used for inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
a third processing unit, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Optionally, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
The present application in a fourth aspect provides a face keypoint detection apparatus, comprising:
the acquisition unit is used for acquiring a face image to be detected;
a detection unit, configured to input the facial image to be detected into a preset MTCNN network model, to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to any one of the training methods of the first aspect.
From the above technical solutions, it can be seen that the present application has the following advantages:
after researching the prior art, the inventor finds that the key point falling drift in the prior face key point detection is caused by the used key point regression function. The key point regression function in the prior art is MSELoss, a certain abnormal value inevitably exists in the process of marking key points of a human face, the MSELoss is easily influenced by the abnormal value, when the abnormal value appears in data, a model using MSE can endow the abnormal point with larger weight, so that the convergence of the model is poorer, in the application, a preset WingLoss function is used during the key point regression, and because WingLoss can relieve the sensitivity degree to the abnormal value, the key point drift is not easily generated when the model obtained by training the key point regression function is used for detecting the key points of the human face, so that the technical problem that the key point drift is easily generated when the existing human face key point detection method is used for detecting the human face, and the detection accuracy of the key points of the human face is lower is solved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a first embodiment of a training method for a face keypoint detection model in an embodiment of the present application;
fig. 2 is a schematic flowchart of a second embodiment of a training method for a face keypoint detection model in the embodiment of the present application;
FIG. 3 is a schematic diagram of a Pnet network of an MTCNN network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an Rnet network of an MTCNN network according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a Onet network of the MTCNN network in the embodiment of the present application;
FIG. 6a is an effect diagram of face key point detection using the preset WingLoss function in the embodiment of the present application;
FIG. 6b is a diagram illustrating the effect of face keypoint detection using the MSELoss function in the embodiment of the present application;
fig. 7 is a schematic flowchart of an embodiment of a method for detecting a face key point in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an embodiment of a training apparatus for a face keypoint detection model in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an embodiment of a face keypoint detection apparatus in an embodiment of the present application.
Detailed Description
The embodiments of the application provide a training method for a face key point detection model and a face key point detection method, which solve the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
In order to make the solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
For easy understanding, please refer to fig. 1, where fig. 1 is a schematic flowchart of a first embodiment of a training method for a face keypoint detection model in an embodiment of the present application.
In this embodiment, a training method for a face key point detection model includes:
step 101, obtaining a face sample image carrying preset face labeling information.
And 102, inputting the face sample image into a Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image.
Deep learning algorithms represented by convolutional neural networks have strong feature extraction capability, and deep-learning-based methods have quickly surpassed traditional face key point detection algorithms. Among deep learning methods for face key point detection, MTCNN (Multi-task Cascaded Convolutional Networks) is a representative example; the algorithm exploits the latent relationship between face detection and face key points to perform face detection and face key point detection simultaneously.
MTCNN comprises three cascaded multi-task convolutional neural networks: Pnet (Proposal Network), Rnet (Refine Network) and Onet (Output Network). In principle, each of the three networks has three output branches: face classification (judging whether a region is a face), frame regression and key point regression. In practical applications, the outputs of Pnet and Rnet have two parts, classification and frame regression, while the output of Onet has three parts: classification, frame regression and key point regression. The candidate frames are filtered step by step through Pnet → Rnet → Onet, so the obtained face frame becomes increasingly accurate.
And 103, inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image.
In this embodiment, the face candidate frame output by Pnet is used as the input of Rnet, and the Rnet further screens a large number of false candidate frames by using bounding box regression and NMS to obtain a more accurate face candidate frame, that is, a face target frame.
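For illustration only, the NMS filtering mentioned above can be sketched as follows. This is a minimal NumPy implementation of standard greedy non-maximum suppression; the 0.7 overlap threshold and the function name are assumptions for the sketch and are not values fixed by this disclosure.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring box
    and discard remaining boxes whose IoU with it exceeds iou_threshold.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array of confidences."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                 # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]    # drop boxes that overlap too much
    return keep
```

In the cascade described above, such suppression is typically applied after both Pnet and Rnet to reduce the number of candidate frames passed to the next stage.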
Step 104, inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
The traditional Onet is used to further refine the results and output 5 face key points, using the MSELoss regression function for this task. MSELoss gives a larger penalty for large errors and a smaller penalty for small errors; during training it can be observed that the model prefers to penalize the points with large errors, that is, MSE gives higher weight to outliers, which reduces the overall performance of the model. In practical application scenarios, some outliers are inevitable, and when more than 5 key points are regressed, point drift can occur with MSELoss. In this application, WingLoss replaces MSELoss for key point regression. WingLoss adopts a piecewise function: when the error is small, a logarithmic term ln is used, whose gradient 1/x grows as the error approaches 0, so the loss is amplified, which is beneficial for point regression; when the error is large, keeping the original function would let these outliers dominate and reduce the regression effect, so a linear term is used instead to damp their influence.
In this embodiment, a face sample image carrying preset face labeling information is first obtained; the face sample image is then input into the Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image; the face candidate frame and the face sample image are then input into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; finally, the face target frame, the preset face labeling information and the face sample image are input into the Onet network of the MTCNN network to obtain a trained MTCNN network model. A preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low detection accuracy of face key points.
The above is an embodiment one of the training methods for a face key point detection model provided in the embodiments of the present application, and the following is an embodiment two of the training methods for a face key point detection model provided in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a flowchart illustrating a second embodiment of a training method for a face keypoint detection model according to the present application.
The training method of the face key point detection model in the embodiment comprises the following steps:
step 201, obtaining an original human face sample image which is not marked.
In this embodiment, the original face sample image may be a previously photographed image or an image downloaded from a network.
In order to further enrich the face sample images, data enhancement algorithms such as mirroring and random rotation at different angles can be adopted to increase the diversity of samples.
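As a minimal sketch of the enhancement step just mentioned, mirroring and random rotation could be implemented roughly as follows with OpenCV and NumPy. It assumes the key points are stored as an (N, 2) array of (x, y) pixel coordinates; the 15-degree angle range is an illustrative choice, and for mirrored faces the indices of left/right paired points (e.g. the two eyes) would additionally need to be swapped.

```python
import cv2
import numpy as np

def mirror_sample(image, keypoints):
    """Horizontal mirroring of an image and its key point annotations."""
    h, w = image.shape[:2]
    flipped = cv2.flip(image, 1)              # flip around the vertical axis
    kps = keypoints.astype(np.float32).copy()
    kps[:, 0] = (w - 1) - kps[:, 0]           # reflect the x coordinates
    return flipped, kps

def rotate_sample(image, keypoints, max_angle=15.0):
    """Random small-angle rotation about the image centre, applied consistently
    to the image and to its key point annotations."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)   # 2x3 affine matrix
    rotated = cv2.warpAffine(image, M, (w, h))
    ones = np.ones((keypoints.shape[0], 1), dtype=np.float32)
    kps = np.hstack([keypoints.astype(np.float32), ones]) @ M.T   # same affine on the points
    return rotated, kps
```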
Step 202, carrying out face frame labeling on the original face sample image through a preset labeling tool to obtain the face sample image labeled with the face labeling frame.
The preset labeling tool in this embodiment may be a labelme labeling tool, and in other embodiments, the preset labeling tool may also be in other forms, which are not limited and described in this embodiment.
And 203, detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates.
And 204, performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Step 205, inputting the face sample image into a Pnet network of the MTCNN network, so that the Pnet network performs face frame selection on the face sample image, and classifies the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
Specifically, in this embodiment, the face frame selection is performed on the face sample image, and the frames are classified into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face frames, which specifically includes:
selecting face frames of the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and each face labeling frame;
and classifying the face selection frames according to the corresponding overlapping rate of each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Further, according to the corresponding overlapping rate of each face frame, classifying the face frames to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame, specifically comprising:
selecting a face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
selecting the face picking frame with the overlapping rate larger than a second threshold and smaller than a first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to a second threshold value as a negative face candidate frame, wherein the second threshold value is less than the first threshold value.
It can be understood that, in this embodiment, the number ratio of the positive face candidate frames, middle face candidate frames and negative face candidate frames is 1:1:3.
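For illustration, the overlap-based classification described above can be sketched as follows. The overlap rate is taken to be the intersection-over-union between a face selection frame and the annotated face labeling frame; the concrete threshold values 0.65 and 0.3 are assumptions for the sketch, since the disclosure only requires the second threshold to be smaller than the first.

```python
def overlap_rate(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    xx1, yy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xx2, yy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def classify_selection_frame(frame, labeled_frame, first_threshold=0.65, second_threshold=0.3):
    """Assign a face selection frame to the positive / middle / negative class
    according to its overlap rate with the face labeling frame."""
    rate = overlap_rate(frame, labeled_frame)
    if rate >= first_threshold:
        return "positive"
    if rate > second_threshold:
        return "middle"
    return "negative"
```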
And step 206, inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image.
Step 207, inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Specifically, the preset WingLoss function in this embodiment is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant with C = w - w·ln(1 + w/ε), and x is the difference between the predicted value and the true value.
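For illustration, the preset WingLoss above can be written in PyTorch roughly as follows. The default values w = 10 and ε = 2 are the settings commonly used for WingLoss and are assumptions here, since the disclosure does not fix them.

```python
import math
import torch

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Preset WingLoss: logarithmic (amplified) loss for small errors,
    linear (damped) loss for large errors, so outliers are not over-weighted.
    pred, target: tensors of predicted and labeled key point coordinates."""
    x = pred - target                                    # difference between prediction and truth
    abs_x = torch.abs(x)
    c = w - w * math.log(1.0 + w / epsilon)              # constant linking the two pieces
    loss = torch.where(abs_x < w,
                       w * torch.log(1.0 + abs_x / epsilon),  # nonlinear part, |x| < w
                       abs_x - c)                              # linear part for large errors
    return loss.mean()
```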
The region corresponding to the face candidate frame is scaled to 48×48×3, and key point regression data of the same size is generated from the labeled face key points. The proportion of negative samples, middle samples, positive samples and key point samples is 3:1:1:2, and the data are input into Onet for frame regression and key point regression.
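A minimal sketch of assembling training data in the 3:1:1:2 proportion just mentioned is given below. Whether the ratio is enforced per mini-batch or over the whole training set is not specified in the disclosure; a per-batch interpretation is shown here as an assumption, and the pool names and batch size are illustrative.

```python
import random

def sample_onet_batch(negatives, middles, positives, keypoint_samples, batch_size=56):
    """Draw negative / middle / positive / key point samples in a 3:1:1:2 ratio.
    Each pool is a list of pre-cropped 48x48x3 training examples (assumed large enough)."""
    unit = batch_size // 7                      # 3 + 1 + 1 + 2 = 7 parts
    batch = (random.sample(negatives, 3 * unit)
             + random.sample(middles, unit)
             + random.sample(positives, unit)
             + random.sample(keypoint_samples, 2 * unit))
    random.shuffle(batch)
    return batch
```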
WingLoss can reduce the sensitivity to outliers, which is mainly reflected in the following two points: (1) a logarithmic loss is used. In the face key point regression task, the regression difficulty of each key point is different. In the early stage of training, the errors of all points are large and can be regarded as large errors; in the middle and later stages of training, most key points are basically accurate and can be regarded as small losses. If the key points are to be regressed more accurately, the loss needs to be amplified, which is exactly the significance of the logarithmic term adopted by WingLoss. (2) A piecewise function is adopted. The losses of a few key points in the later stage of training are still large; if the original loss function were kept, these few outliers would dominate and affect the regression, so the loss contributed by outliers should be reduced.
In the embodiment, firstly, a face sample image carrying preset face labeling information is obtained, then the face sample image is input into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image, then the face candidate frame and the face sample image are input into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image, finally the face target frame, the preset face labeling information and the face sample image are input into an Onet network of the MTCNN network to obtain a trained MTCNN network model, a preset WingLoss function is used in the key point regression, as WingLoss can relieve the sensitivity degree to abnormal values, the key point drift is not easy to occur in the key point detection of the face by the model obtained by the key point regression function training, thereby solving the problem that the existing face key point detection method is used for face detection, the method is easy to cause the technical problem of low detection accuracy of the key points of the human face due to the fact that the key points drift easily.
The second embodiment of the training method for the face key point detection model provided in the embodiment of the present application is an application example of the training method for the face key point detection model provided in the embodiment of the present application.
The training method of the face key point detection model in the embodiment comprises the following steps:
FIG. 3 shows the network structure of Pnet. Before input, the original picture is scaled to form an image pyramid, which is unified to the 12×12 pixel input size required by Pnet; after three convolution operations, the face probability and the face candidate frame coordinates are output. For the face classification task, a cross-entropy loss function is adopted; for the face frame regression task, the mean square error is adopted. This task needs to predict the offset between each candidate frame and the real face frame, and the offset consists of four variables: the horizontal coordinate of the upper-left corner of the frame, the vertical coordinate of the upper-left corner of the frame, the width of the frame and the height of the frame. Therefore, the box regression output consists of the relative shift of the abscissa of the upper-left corner, the relative shift of the ordinate of the upper-left corner, the error of the width and the error of the height, and the output shape is 1×1×4.
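For illustration, the box regression target just described (relative shifts of the upper-left corner plus the width and height errors) could be encoded as follows. Normalising all four values by the candidate frame's own width and height is an assumption here; the disclosure does not spell out the normalisation.

```python
def encode_box_offset(candidate, gt_box):
    """Regression target of a candidate frame against the real face frame,
    both given as [x1, y1, x2, y2]; returns the four values that the 1x1x4
    box-regression output of Pnet is trained to predict."""
    cw = candidate[2] - candidate[0]              # candidate width
    ch = candidate[3] - candidate[1]              # candidate height
    dx1 = (gt_box[0] - candidate[0]) / cw         # relative shift of the upper-left abscissa
    dy1 = (gt_box[1] - candidate[1]) / ch         # relative shift of the upper-left ordinate
    dw = ((gt_box[2] - gt_box[0]) - cw) / cw      # width error
    dh = ((gt_box[3] - gt_box[1]) - ch) / ch      # height error
    return dx1, dy1, dw, dh
```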
FIG. 4 shows the network structure of Rnet, which includes three convolutional layers and a fully-connected layer. The face candidate frames generated by Pnet are scaled to 24×24 pixels and used as the input of Rnet. Rnet makes the same kind of judgment on Pnet's results as Pnet itself, thereby eliminating most of the misjudgments made by Pnet.
FIG. 5 is a network architecture diagram of Onet. The face candidate frames obtained from Pnet and Rnet are further scaled to 48×48 and input to the final Onet.
Compared with Pnet and Rnet, Onet additionally outputs the locations of the face key points; for this part of the task, WingLoss is adopted in the present application.
To evaluate the performance of the present application, FIG. 6 compares the effect of using MSELoss (FIG. 6b) with that of using WingLoss (FIG. 6a). As can be seen from FIG. 6, there is clear drift of the points at the nose, mouth and chin when MSELoss is used, while the poorly fitted key points are corrected when WingLoss is used, which indicates that WingLoss is robust for key point regression.
The above is an application example of the training method for the face key point detection model provided in the embodiment of the present application, and the following is an embodiment of the face key point detection method provided in the embodiment of the present application.
Referring to fig. 7, the method for detecting key points of a human face in the present embodiment specifically includes:
and 701, acquiring a face image to be detected.
Step 702, inputting a facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to the training method of any one of the embodiments.
In this embodiment, a preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
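As a usage illustration of steps 701 and 702, the snippet below runs an off-the-shelf MTCNN from the facenet-pytorch package as a stand-in for the preset MTCNN network model described here; the preset model of this application would additionally be trained with the WingLoss key point regression, so the library model only demonstrates the interface, not the method itself.

```python
from PIL import Image
from facenet_pytorch import MTCNN

# Off-the-shelf MTCNN used as a stand-in for the preset model of this application.
mtcnn = MTCNN(keep_all=True)

img = Image.open("face_to_detect.jpg")                         # face image to be detected
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)    # boxes (N, 4), landmarks (N, 5, 2)
print(landmarks)   # five key points (eyes, nose tip, mouth corners) per detected face
```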
The above is an embodiment of a face keypoint detection method provided in the embodiment of the present application, and the following is an embodiment of a training apparatus for a face keypoint detection model provided in the embodiment of the present application.
Referring to fig. 8, the training apparatus for a face keypoint detection model in the present embodiment specifically includes:
an obtaining unit 801, configured to obtain a face sample image carrying preset face labeling information;
a first processing unit 802, configured to input the face sample image into a Pnet network of the MTCNN network, and obtain a face candidate frame corresponding to the face sample image;
the second processing unit 803 is configured to input the face candidate frame and the face sample image into an Rnet network of the MTCNN network, so as to obtain a face target frame corresponding to the face sample image;
a third processing unit 804, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Further, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant with C = w - w·ln(1 + w/ε), and x is the difference between the predicted value and the true value.
Further, the face labeling information includes: a face labeling frame and face key points;
the obtaining unit 801 specifically includes:
the acquiring subunit is used for acquiring an unlabeled original face sample image;
the first labeling subunit is used for performing face frame labeling on the original face sample image through a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
the detection subunit is used for detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and the second labeling subunit is used for performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Further, the face candidate frame includes: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
the first processing unit 802 is specifically configured to input the face sample image to a Pnet network of the MTCNN network, so that the Pnet network performs face frame selection on the face sample image, and classifies the selection frames into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face selection frames.
The face sample image is subjected to face frame selection, and the selection frames are classified into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame, and the method specifically comprises the following steps:
selecting face frames of the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and each face labeling frame;
and classifying the face selection frames according to the corresponding overlapping rate of each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically comprises:
selecting a face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
selecting the face picking frame with the overlapping rate larger than a second threshold and smaller than a first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to a second threshold value as a negative face candidate frame, wherein the second threshold value is less than the first threshold value.
In this embodiment, a face sample image carrying preset face labeling information is first obtained; the face sample image is then input into the Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image; the face candidate frame and the face sample image are then input into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; finally, the face target frame, the preset face labeling information and the face sample image are input into the Onet network of the MTCNN network to obtain a trained MTCNN network model. A preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low detection accuracy of face key points.
The above is an embodiment of the training apparatus for a face keypoint detection model provided in the embodiment of the present application, and the following is an embodiment of the face keypoint detection apparatus provided in the embodiment of the present application.
Referring to fig. 9, the face key point detection device in the embodiment specifically includes:
an acquiring unit 901, configured to acquire a face image to be detected;
the detecting unit 902 is configured to input the facial image to be detected into a preset MTCNN network model, so as to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to the training method in any embodiment.
In this embodiment, a preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A training method of a face key point detection model is characterized by comprising the following steps:
acquiring a face sample image carrying preset face labeling information;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
2. The training method according to claim 1, characterized in that the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
3. The training method of claim 1, wherein the face labeling information comprises: a face labeling frame and face key points;
the acquiring of the face sample image carrying the preset face labeling information specifically includes:
acquiring an unmarked original human face sample image;
carrying out face frame labeling on the original face sample image by a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
4. The training method of claim 3, wherein the face candidate box comprises: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
the inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image specifically includes:
and inputting the face sample image into a Pnet network of an MTCNN (multiple-transmission-network) network, so that the Pnet network performs face frame selection on the face sample image, and classifying the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
5. The training method according to claim 4, wherein the face frame selection is performed on the face sample image, and the selection frames are classified into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face selection frames, and specifically includes:
carrying out face frame selection on the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and the face labeling frame;
and classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
6. The training method according to claim 5, wherein the classifying of the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically comprises:
taking the face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
taking a face selection frame with an overlapping rate greater than a second threshold and smaller than the first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to the second threshold as a negative face candidate frame, wherein the second threshold is less than the first threshold.
7. A face key point detection method is characterized by comprising the following steps:
acquiring a human face image to be detected;
inputting the facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to the training method of any one of claims 1 to 6.
8. A training apparatus for a face key point detection model, characterized by comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a face sample image carrying preset face labeling information;
the first processing unit is used for inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
the second processing unit is used for inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
a third processing unit, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
9. The training apparatus according to claim 8, characterized in that the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
10. A face key point detection device, comprising:
the acquisition unit is used for acquiring a face image to be detected;
a detection unit, configured to input the facial image to be detected into a preset MTCNN network model, so as to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to the training method of any one of claims 1 to 6.
CN202110579203.0A 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method Pending CN113221812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579203.0A CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579203.0A CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Publications (1)

Publication Number Publication Date
CN113221812A true CN113221812A (en) 2021-08-06

Family

ID=77098668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579203.0A Pending CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Country Status (1)

Country Link
CN (1) CN113221812A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019114036A1 (en) * 2017-12-12 2019-06-20 深圳云天励飞技术有限公司 Face detection method and device, computer device, and computer readable storage medium
CN107992864A (en) * 2018-01-15 2018-05-04 武汉神目信息技术有限公司 A kind of vivo identification method and device based on image texture
CN109543545A (en) * 2018-10-25 2019-03-29 北京陌上花科技有限公司 Fast face detecting method and device
CN109815810A (en) * 2018-12-20 2019-05-28 北京以萨技术股份有限公司 A kind of biopsy method based on single camera
CN109993086A (en) * 2019-03-21 2019-07-09 北京华捷艾米科技有限公司 Method for detecting human face, device, system and terminal device
CN111191616A (en) * 2020-01-02 2020-05-22 广州织点智能科技有限公司 Face shielding detection method, device, equipment and storage medium
CN112232117A (en) * 2020-09-08 2021-01-15 深圳微步信息股份有限公司 Face recognition method, face recognition device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Z. WANG ET AL.: "Learning to Detect Head Movement in Unconstrained Remote Gaze Estimation in the Wild", 《2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)》, 31 December 2020 (2020-12-31), pages 4 *
崔馨方: "Research on Several Problems of Face Key Point Detection", China Master's Theses Full-text Database (Information Science and Technology), no. 5, 15 May 2020 (2020-05-15), pages 2-10 *
陈雨薇: "Face Detection and Facial Key Point Localization Based on an Improved MTCNN Model", China Master's Theses Full-text Database (Information Science and Technology), no. 1, 15 January 2020 (2020-01-15), pages 3-3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023020289A1 (en) * 2021-08-16 2023-02-23 北京百度网讯科技有限公司 Processing method and apparatus for network model, and device and storage medium
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel

Similar Documents

Publication Publication Date Title
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
US10318797B2 (en) Image processing apparatus and image processing method
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
CN110287963B (en) OCR recognition method for comprehensive performance test
EP0363828A2 (en) Method and apparatus for adaptive learning type general purpose image measurement and recognition
CN105550641B (en) Age estimation method and system based on multi-scale linear differential texture features
CN111652869B (en) Slab void identification method, system, medium and terminal based on deep learning
CN110287787B (en) Image recognition method, image recognition device and computer-readable storage medium
CN106980825B (en) Human face posture classification method based on normalized pixel difference features
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN108108760A (en) A kind of fast human face recognition
CN113221812A (en) Training method of face key point detection model and face key point detection method
CN110543848B (en) Driver action recognition method and device based on three-dimensional convolutional neural network
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112633221A (en) Face direction detection method and related device
CN111415339A (en) Image defect detection method for complex texture industrial product
CN114155610B (en) Panel assembly key action identification method based on upper half body posture estimation
CN111950457A (en) Oil field safety production image identification method and system
CN104298960A (en) Robust analysis for deformable object classification and recognition by image sensors
CN111626197B (en) Recognition method based on human behavior recognition network model
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN112989958A (en) Helmet wearing identification method based on YOLOv4 and significance detection
CN112580527A (en) Facial expression recognition method based on convolution long-term and short-term memory network
CN112329663A (en) Micro-expression time detection method and device based on face image sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination