CN113221812A - Training method of face key point detection model and face key point detection method - Google Patents

Info

Publication number
CN113221812A
CN113221812A (Application CN202110579203.0A)
Authority
CN
China
Prior art keywords
face
sample image
network
frame
key point
Prior art date
Legal status
Pending
Application number
CN202110579203.0A
Other languages
Chinese (zh)
Inventor
刘思伟
Current Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Original Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2021-08-06
Application filed by Guangzhou Weaving Point Intelligent Technology Co ltd filed Critical Guangzhou Weaving Point Intelligent Technology Co ltd
Priority to CN202110579203.0A
Publication of CN113221812A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for a face key point detection model and a face key point detection method. The training method comprises the following steps: acquiring a face sample image carrying preset face labeling information; inputting the face sample image into the Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image; inputting the face candidate frame and the face sample image into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; and inputting the face target frame, the preset face labeling information and the face sample image into the Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function. The method solves the technical problem that existing face key point detection methods are prone to key point drift during face detection, which leads to low key point detection accuracy.

Description

Training method of face key point detection model and face key point detection method
Technical Field
The application relates to the field of computer vision, in particular to a training method of a face key point detection model and a face key point detection method.
Background
Face key point detection plays a key role in numerous applications, such as face pose correction, pose recognition, expression recognition, fatigue monitoring, mouth shape recognition, and the like. Therefore, how to obtain high-precision face key points has always been a research hotspot in the field of computer vision.
Face key point detection refers to locating the key points of a face, such as the eyebrows, eyes, nose, mouth and face contour, in a given face image. Existing face key point detection methods are prone to key point drift when detecting a face, so the face key point detection accuracy is low.
Disclosure of Invention
The application provides a training method for a face key point detection model and a face key point detection method, which solve the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
In view of this, a first aspect of the present application provides a method for training a face keypoint detection model, including:
acquiring a face sample image carrying preset face labeling information;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Optionally, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
Optionally, the face labeling information includes: a face labeling frame and face key points;
the acquiring of the face sample image carrying the preset face labeling information specifically includes:
acquiring an unmarked original human face sample image;
carrying out face frame labeling on the original face sample image by a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Optionally, the face candidate box includes: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image, specifically comprising:
and inputting the face sample image into a Pnet network of an MTCNN (multiple-transmission-network) network, so that the Pnet network performs face frame selection on the face sample image, and classifying the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
Optionally, the selecting the face frames of the face sample image, and classifying the frames into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the face frames specifically include:
carrying out face frame selection on the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and the face labeling frame;
and classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Optionally, the classifying of the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically includes:
taking a face selection frame with an overlapping rate greater than or equal to a first threshold as a positive face candidate frame;
taking a face selection frame with an overlapping rate greater than a second threshold and smaller than the first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to the second threshold as a negative face candidate frame, wherein the second threshold is less than the first threshold.
A second aspect of the present application provides a method for detecting a face key point, including:
acquiring a human face image to be detected;
inputting the facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected, which is output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to any one of the training methods of the first aspect.
The third aspect of the present application provides a training apparatus for a face key point detection model, including:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a face sample image carrying preset face labeling information;
the first processing unit is used for inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
the second processing unit is used for inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
a third processing unit, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Optionally, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
The present application in a fourth aspect provides a face keypoint detection apparatus, comprising:
the acquisition unit is used for acquiring a face image to be detected;
a detection unit, configured to input the facial image to be detected into a preset MTCNN network model, to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to any one of the training methods of the first aspect.
From the above technical solutions, it can be seen that the present application has the following advantages:
after researching the prior art, the inventor finds that the key point falling drift in the prior face key point detection is caused by the used key point regression function. The key point regression function in the prior art is MSELoss, a certain abnormal value inevitably exists in the process of marking key points of a human face, the MSELoss is easily influenced by the abnormal value, when the abnormal value appears in data, a model using MSE can endow the abnormal point with larger weight, so that the convergence of the model is poorer, in the application, a preset WingLoss function is used during the key point regression, and because WingLoss can relieve the sensitivity degree to the abnormal value, the key point drift is not easily generated when the model obtained by training the key point regression function is used for detecting the key points of the human face, so that the technical problem that the key point drift is easily generated when the existing human face key point detection method is used for detecting the human face, and the detection accuracy of the key points of the human face is lower is solved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a first embodiment of a training method for a face keypoint detection model in an embodiment of the present application;
fig. 2 is a schematic flowchart of a second embodiment of a training method for a face keypoint detection model in the embodiment of the present application;
FIG. 3 is a schematic diagram of a Pnet network of an MTCNN network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an Rnet network of an MTCNN network according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a Onet network of the MTCNN network in the embodiment of the present application;
FIG. 6a is an effect diagram of face key point detection using the preset WingLoss function in the embodiment of the present application;
FIG. 6b is a diagram illustrating the effect of face keypoint detection using the MSELoss function in the embodiment of the present application;
fig. 7 is a schematic flowchart of an embodiment of a method for detecting a face key point in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an embodiment of a training apparatus for a face keypoint detection model in an embodiment of the present application;
fig. 9 is a schematic structural diagram of an embodiment of a face keypoint detection apparatus in an embodiment of the present application.
Detailed Description
The embodiments of the application provide a training method for a face key point detection model and a face key point detection method, which solve the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
In order to make the solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
For easy understanding, please refer to fig. 1, where fig. 1 is a schematic flowchart of a first embodiment of a training method for a face keypoint detection model in an embodiment of the present application.
In this embodiment, a training method for a face key point detection model includes:
step 101, obtaining a face sample image carrying preset face labeling information.
And 102, inputting the face sample image into a Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image.
Deep learning algorithms represented by convolutional neural networks have strong feature extraction capability, and deep-learning-based methods have quickly surpassed traditional face key point detection algorithms. Among deep learning methods for face key point detection, MTCNN (Multi-task Cascaded Convolutional Networks) is a representative example; the algorithm exploits the latent relationship between face detection and face key points to perform face detection and face key point detection simultaneously.
MTCNN comprises three cascaded multi-task convolutional neural networks: Pnet (Proposal Network), Rnet (Refine Network) and Onet (Output Network). In principle, each of the three networks has three output branches: face classification (judging whether a region is a face), frame regression and key point regression. In practical applications, the outputs of Pnet and Rnet have two parts, classification and frame regression, while the output of Onet has three parts: classification, frame regression and key point regression. The candidate frames are filtered step by step through Pnet → Rnet → Onet, so the obtained face frame becomes increasingly accurate.
And 103, inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image.
In this embodiment, the face candidate frame output by Pnet is used as the input of Rnet, and the Rnet further screens a large number of false candidate frames by using bounding box regression and NMS to obtain a more accurate face candidate frame, that is, a face target frame.
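For illustration only, the NMS filtering mentioned above can be sketched as follows. This is a minimal NumPy implementation of standard greedy non-maximum suppression; the 0.7 overlap threshold and the function name are assumptions for the sketch and are not values fixed by this disclosure.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring box
    and discard remaining boxes whose IoU with it exceeds iou_threshold.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array of confidences."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]                 # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the current best box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]    # drop boxes that overlap too much
    return keep
```

In the cascade described above, such suppression is typically applied after both Pnet and Rnet to reduce the number of candidate frames passed to the next stage.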
Step 104, inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
The traditional Onet is used to further refine the results and output 5 face key points, using the MSELoss regression function for this task. MSELoss gives a larger penalty for large errors and a smaller penalty for small errors; during training it can be observed that the model prefers to penalize the points with large errors, that is, MSE gives higher weight to outliers, which reduces the overall performance of the model. In practical application scenarios, some outliers are inevitable, and when more than 5 key points are regressed, point drift can occur with MSELoss. In this application, WingLoss replaces MSELoss for key point regression. WingLoss adopts a piecewise function: when the error is small, a logarithmic term ln is used, whose gradient 1/x grows as the error approaches 0, so the loss is amplified, which is beneficial for point regression; when the error is large, keeping the original function would let these outliers dominate and reduce the regression effect, so a linear term is used instead to damp their influence.
In this embodiment, a face sample image carrying preset face labeling information is first obtained; the face sample image is then input into the Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image; the face candidate frame and the face sample image are then input into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; finally, the face target frame, the preset face labeling information and the face sample image are input into the Onet network of the MTCNN network to obtain a trained MTCNN network model. A preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low detection accuracy of face key points.
The above is an embodiment one of the training methods for a face key point detection model provided in the embodiments of the present application, and the following is an embodiment two of the training methods for a face key point detection model provided in the embodiments of the present application.
Referring to fig. 2, fig. 2 is a flowchart illustrating a second embodiment of a training method for a face keypoint detection model according to the present application.
The training method of the face key point detection model in the embodiment comprises the following steps:
step 201, obtaining an original human face sample image which is not marked.
In this embodiment, the original face sample image may be a previously photographed image or an image downloaded from a network.
In order to further enrich the face sample images, data enhancement algorithms such as mirroring and random rotation at different angles can be adopted to increase the diversity of samples.
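As a minimal sketch of the enhancement step just mentioned, mirroring and random rotation could be implemented roughly as follows with OpenCV and NumPy. It assumes the key points are stored as an (N, 2) array of (x, y) pixel coordinates; the 15-degree angle range is an illustrative choice, and for mirrored faces the indices of left/right paired points (e.g. the two eyes) would additionally need to be swapped.

```python
import cv2
import numpy as np

def mirror_sample(image, keypoints):
    """Horizontal mirroring of an image and its key point annotations."""
    h, w = image.shape[:2]
    flipped = cv2.flip(image, 1)              # flip around the vertical axis
    kps = keypoints.astype(np.float32).copy()
    kps[:, 0] = (w - 1) - kps[:, 0]           # reflect the x coordinates
    return flipped, kps

def rotate_sample(image, keypoints, max_angle=15.0):
    """Random small-angle rotation about the image centre, applied consistently
    to the image and to its key point annotations."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)   # 2x3 affine matrix
    rotated = cv2.warpAffine(image, M, (w, h))
    ones = np.ones((keypoints.shape[0], 1), dtype=np.float32)
    kps = np.hstack([keypoints.astype(np.float32), ones]) @ M.T   # same affine on the points
    return rotated, kps
```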
Step 202, carrying out face frame labeling on the original face sample image through a preset labeling tool to obtain the face sample image labeled with the face labeling frame.
The preset labeling tool in this embodiment may be a labelme labeling tool, and in other embodiments, the preset labeling tool may also be in other forms, which are not limited and described in this embodiment.
And 203, detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates.
And 204, performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Step 205, inputting the face sample image into a Pnet network of the MTCNN network, so that the Pnet network performs face frame selection on the face sample image, and classifies the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
Specifically, in this embodiment, the face frame selection is performed on the face sample image, and the frames are classified into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face frames, which specifically includes:
selecting face frames of the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and each face labeling frame;
and classifying the face selection frames according to the corresponding overlapping rate of each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Further, according to the corresponding overlapping rate of each face frame, classifying the face frames to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame, specifically comprising:
selecting a face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
selecting the face picking frame with the overlapping rate larger than a second threshold and smaller than a first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to a second threshold value as a negative face candidate frame, wherein the second threshold value is less than the first threshold value.
It can be understood that, in this embodiment, the number ratio of the positive face candidate frames, middle face candidate frames and negative face candidate frames is 1:1:3.
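For illustration, the overlap-based classification described above can be sketched as follows. The overlap rate is taken to be the intersection-over-union between a face selection frame and the annotated face labeling frame; the concrete threshold values 0.65 and 0.3 are assumptions for the sketch, since the disclosure only requires the second threshold to be smaller than the first.

```python
def overlap_rate(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    xx1, yy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xx2, yy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def classify_selection_frame(frame, labeled_frame, first_threshold=0.65, second_threshold=0.3):
    """Assign a face selection frame to the positive / middle / negative class
    according to its overlap rate with the face labeling frame."""
    rate = overlap_rate(frame, labeled_frame)
    if rate >= first_threshold:
        return "positive"
    if rate > second_threshold:
        return "middle"
    return "negative"
```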
And step 206, inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image.
Step 207, inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Specifically, the preset WingLoss function in this embodiment is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant with C = w - w·ln(1 + w/ε), and x is the difference between the predicted value and the true value.
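For illustration, the preset WingLoss above can be written in PyTorch roughly as follows. The default values w = 10 and ε = 2 are the settings commonly used for WingLoss and are assumptions here, since the disclosure does not fix them.

```python
import math
import torch

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Preset WingLoss: logarithmic (amplified) loss for small errors,
    linear (damped) loss for large errors, so outliers are not over-weighted.
    pred, target: tensors of predicted and labeled key point coordinates."""
    x = pred - target                                    # difference between prediction and truth
    abs_x = torch.abs(x)
    c = w - w * math.log(1.0 + w / epsilon)              # constant linking the two pieces
    loss = torch.where(abs_x < w,
                       w * torch.log(1.0 + abs_x / epsilon),  # nonlinear part, |x| < w
                       abs_x - c)                              # linear part for large errors
    return loss.mean()
```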
The region corresponding to the face candidate frame is scaled to 48×48×3, and key point regression data of the same size is generated from the labeled face key points. The proportion of negative samples, middle samples, positive samples and key point samples is 3:1:1:2, and the data are input into Onet for frame regression and key point regression.
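A minimal sketch of assembling training data in the 3:1:1:2 proportion just mentioned is given below. Whether the ratio is enforced per mini-batch or over the whole training set is not specified in the disclosure; a per-batch interpretation is shown here as an assumption, and the pool names and batch size are illustrative.

```python
import random

def sample_onet_batch(negatives, middles, positives, keypoint_samples, batch_size=56):
    """Draw negative / middle / positive / key point samples in a 3:1:1:2 ratio.
    Each pool is a list of pre-cropped 48x48x3 training examples (assumed large enough)."""
    unit = batch_size // 7                      # 3 + 1 + 1 + 2 = 7 parts
    batch = (random.sample(negatives, 3 * unit)
             + random.sample(middles, unit)
             + random.sample(positives, unit)
             + random.sample(keypoint_samples, 2 * unit))
    random.shuffle(batch)
    return batch
```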
WingLoss can reduce the sensitivity to outliers, which is mainly reflected in the following two points: (1) a logarithmic loss is used. In the face key point regression task, the regression difficulty of each key point is different. In the early stage of training, the errors of all points are large and can be regarded as large errors; in the middle and later stages of training, most key points are basically accurate and can be regarded as small losses. If the key points are to be regressed more accurately, the loss needs to be amplified, which is exactly the significance of the logarithmic term adopted by WingLoss. (2) A piecewise function is adopted. The losses of a few key points in the later stage of training are still large; if the original loss function were kept, these few outliers would dominate and affect the regression, so the loss contributed by outliers should be reduced.
In the embodiment, firstly, a face sample image carrying preset face labeling information is obtained, then the face sample image is input into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image, then the face candidate frame and the face sample image are input into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image, finally the face target frame, the preset face labeling information and the face sample image are input into an Onet network of the MTCNN network to obtain a trained MTCNN network model, a preset WingLoss function is used in the key point regression, as WingLoss can relieve the sensitivity degree to abnormal values, the key point drift is not easy to occur in the key point detection of the face by the model obtained by the key point regression function training, thereby solving the problem that the existing face key point detection method is used for face detection, the method is easy to cause the technical problem of low detection accuracy of the key points of the human face due to the fact that the key points drift easily.
The second embodiment of the training method for the face key point detection model provided in the embodiment of the present application is an application example of the training method for the face key point detection model provided in the embodiment of the present application.
The training method of the face key point detection model in the embodiment comprises the following steps:
FIG. 3 shows the network structure of Pnet. Before input, the original picture is scaled to form an image pyramid, which is unified to the 12×12 pixel input size required by Pnet; after three convolution operations, the face probability and the face candidate frame coordinates are output. For the face classification task, a cross-entropy loss function is adopted; for the face frame regression task, the mean square error is adopted. This task needs to predict the offset between each candidate frame and the real face frame, and the offset consists of four variables: the horizontal coordinate of the upper-left corner of the frame, the vertical coordinate of the upper-left corner of the frame, the width of the frame and the height of the frame. Therefore, the box regression output consists of the relative shift of the abscissa of the upper-left corner, the relative shift of the ordinate of the upper-left corner, the error of the width and the error of the height, and the output shape is 1×1×4.
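For illustration, the box regression target just described (relative shifts of the upper-left corner plus the width and height errors) could be encoded as follows. Normalising all four values by the candidate frame's own width and height is an assumption here; the disclosure does not spell out the normalisation.

```python
def encode_box_offset(candidate, gt_box):
    """Regression target of a candidate frame against the real face frame,
    both given as [x1, y1, x2, y2]; returns the four values that the 1x1x4
    box-regression output of Pnet is trained to predict."""
    cw = candidate[2] - candidate[0]              # candidate width
    ch = candidate[3] - candidate[1]              # candidate height
    dx1 = (gt_box[0] - candidate[0]) / cw         # relative shift of the upper-left abscissa
    dy1 = (gt_box[1] - candidate[1]) / ch         # relative shift of the upper-left ordinate
    dw = ((gt_box[2] - gt_box[0]) - cw) / cw      # width error
    dh = ((gt_box[3] - gt_box[1]) - ch) / ch      # height error
    return dx1, dy1, dw, dh
```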
FIG. 4 shows the network structure of Rnet, which includes three convolutional layers and a fully-connected layer. The face candidate frames generated by Pnet are scaled to 24×24 pixels and used as the input of Rnet. Rnet makes the same kind of judgment on Pnet's results as Pnet itself, thereby eliminating most of the misjudgments made by Pnet.
FIG. 5 is a network architecture diagram of Onet. The face candidate frames obtained from Pnet and Rnet are further scaled to 48×48 and input to the final Onet.
Compared with Pnet and Rnet, Onet additionally outputs the locations of the face key points; for this part of the task, WingLoss is adopted in the present application.
To evaluate the performance of the present application, FIG. 6 compares the effect of using MSELoss (FIG. 6b) with that of using WingLoss (FIG. 6a). As can be seen from FIG. 6, there is clear drift of the points at the nose, mouth and chin when MSELoss is used, while the poorly fitted key points are corrected when WingLoss is used, which indicates that WingLoss is robust for key point regression.
The above is an application example of the training method for the face key point detection model provided in the embodiment of the present application, and the following is an embodiment of the face key point detection method provided in the embodiment of the present application.
Referring to fig. 7, the method for detecting key points of a human face in the present embodiment specifically includes:
and 701, acquiring a face image to be detected.
Step 702, inputting a facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to the training method of any one of the embodiments.
In this embodiment, a preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
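As a usage illustration of steps 701 and 702, the snippet below runs an off-the-shelf MTCNN from the facenet-pytorch package as a stand-in for the preset MTCNN network model described here; the preset model of this application would additionally be trained with the WingLoss key point regression, so the library model only demonstrates the interface, not the method itself.

```python
from PIL import Image
from facenet_pytorch import MTCNN

# Off-the-shelf MTCNN used as a stand-in for the preset model of this application.
mtcnn = MTCNN(keep_all=True)

img = Image.open("face_to_detect.jpg")                         # face image to be detected
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)    # boxes (N, 4), landmarks (N, 5, 2)
print(landmarks)   # five key points (eyes, nose tip, mouth corners) per detected face
```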
The above is an embodiment of a face keypoint detection method provided in the embodiment of the present application, and the following is an embodiment of a training apparatus for a face keypoint detection model provided in the embodiment of the present application.
Referring to fig. 8, the training apparatus for a face keypoint detection model in the present embodiment specifically includes:
an obtaining unit 801, configured to obtain a face sample image carrying preset face labeling information;
a first processing unit 802, configured to input the face sample image into a Pnet network of the MTCNN network, and obtain a face candidate frame corresponding to the face sample image;
the second processing unit 803 is configured to input the face candidate frame and the face sample image into an Rnet network of the MTCNN network, so as to obtain a face target frame corresponding to the face sample image;
a third processing unit 804, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
Further, the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant with C = w - w·ln(1 + w/ε), and x is the difference between the predicted value and the true value.
Further, the face labeling information includes: a face labeling frame and face key points;
the obtaining unit 801 specifically includes:
the acquiring subunit is used for acquiring an unlabeled original face sample image;
the first labeling subunit is used for performing face frame labeling on the original face sample image through a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
the detection subunit is used for detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and the second labeling subunit is used for performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
Further, the face candidate frame includes: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
the first processing unit 802 is specifically configured to input the face sample image to a Pnet network of the MTCNN network, so that the Pnet network performs face frame selection on the face sample image, and classifies the selection frames into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face selection frames.
The face sample image is subjected to face frame selection, and the selection frames are classified into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame, and the method specifically comprises the following steps:
selecting face frames of the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and each face labeling frame;
and classifying the face selection frames according to the corresponding overlapping rate of each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
Classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically comprises:
selecting a face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
selecting the face picking frame with the overlapping rate larger than a second threshold and smaller than a first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to a second threshold value as a negative face candidate frame, wherein the second threshold value is less than the first threshold value.
In this embodiment, a face sample image carrying preset face labeling information is first obtained; the face sample image is then input into the Pnet network of the MTCNN network to obtain a face candidate frame corresponding to the face sample image; the face candidate frame and the face sample image are then input into the Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image; finally, the face target frame, the preset face labeling information and the face sample image are input into the Onet network of the MTCNN network to obtain a trained MTCNN network model. A preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low detection accuracy of face key points.
The above is an embodiment of the training apparatus for a face keypoint detection model provided in the embodiment of the present application, and the following is an embodiment of the face keypoint detection apparatus provided in the embodiment of the present application.
Referring to fig. 9, the face key point detection device in the embodiment specifically includes:
an acquiring unit 901, configured to acquire a face image to be detected;
the detecting unit 902 is configured to input the facial image to be detected into a preset MTCNN network model, so as to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to the training method in any embodiment.
In this embodiment, a preset WingLoss function is used for the key point regression. Because WingLoss can reduce the sensitivity to outliers, the model trained with this key point regression function is less prone to key point drift when detecting face key points, thereby solving the technical problem that existing face key point detection methods are prone to key point drift during face detection, resulting in low face key point detection accuracy.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A training method of a face key point detection model is characterized by comprising the following steps:
acquiring a face sample image carrying preset face labeling information;
inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
inputting the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
2. The training method according to claim 1, characterized in that the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
3. The training method of claim 1, wherein the face labeling information comprises: a face labeling frame and face key points;
the acquiring of the face sample image carrying the preset face labeling information specifically includes:
acquiring an unmarked original human face sample image;
carrying out face frame labeling on the original face sample image by a preset labeling tool to obtain a face sample image labeled with a face labeling frame;
detecting key points of the original face sample image through a preset detection interface to obtain key point coordinates;
and performing key point labeling on the original face sample image according to the key point coordinates to obtain the face sample image labeled with the face key points.
4. The training method of claim 3, wherein the face candidate box comprises: a positive face candidate frame, a middle face candidate frame and a negative face candidate frame;
the inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image specifically includes:
and inputting the face sample image into a Pnet network of an MTCNN (multiple-transmission-network) network, so that the Pnet network performs face frame selection on the face sample image, and classifying the selection frames into a positive face candidate frame, a middle face candidate frame and a negative face candidate frame based on the corresponding overlapping rate of each face selection frame.
5. The training method according to claim 4, wherein the face frame selection is performed on the face sample image, and the selection frames are classified into a positive face candidate frame, a middle face candidate frame, and a negative face candidate frame based on the overlapping rates corresponding to the respective face selection frames, and specifically includes:
carrying out face frame selection on the face sample image to obtain a plurality of face selection frames;
calculating the overlapping rate between each face selecting frame and the face labeling frame;
and classifying the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame.
6. The training method according to claim 5, wherein the classifying of the face selection frames according to the overlapping rate corresponding to each face selection frame to obtain a positive face candidate frame, a middle face candidate frame and a negative face candidate frame specifically comprises:
taking the face selection frame with the overlapping rate more than or equal to a first threshold value as a positive face candidate frame;
taking a face selection frame with an overlapping rate greater than a second threshold and smaller than the first threshold as a middle face candidate frame;
and taking the face selection frame with the overlapping rate less than or equal to the second threshold as a negative face candidate frame, wherein the second threshold is less than the first threshold.
7. A face key point detection method is characterized by comprising the following steps:
acquiring a human face image to be detected;
inputting the facial image to be detected into a preset MTCNN network model to obtain key point information of the facial image to be detected output by the preset MTCNN network model, wherein the preset MTCNN network model is obtained by training according to the training method of any one of claims 1 to 6.
8. A training apparatus for a face key point detection model, characterized by comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring a face sample image carrying preset face labeling information;
the first processing unit is used for inputting the face sample image into a Pnet network of an MTCNN network to obtain a face candidate frame corresponding to the face sample image;
the second processing unit is used for inputting the face candidate frame and the face sample image into an Rnet network of the MTCNN network to obtain a face target frame corresponding to the face sample image;
a third processing unit, configured to input the face target frame, the preset face labeling information and the face sample image into an Onet network of the MTCNN network to obtain a trained MTCNN network model, wherein the key point regression function in the Onet network is a preset WingLoss function.
9. The training apparatus according to claim 8, characterized in that the preset WingLoss function is:

$$\operatorname{loss}(x)=\begin{cases} w\ln\left(1+\dfrac{|x|}{\varepsilon}\right), & \text{if } |x|<w \\ |x|-C, & \text{otherwise} \end{cases}$$

where loss(x) is the preset WingLoss function, w limits the range of the nonlinear part to (-w, w), ε limits the curvature of the nonlinear part, C is a constant, and x is the difference between the predicted value and the true value.
10. A face key point detection device, comprising:
the acquisition unit is used for acquiring a face image to be detected;
a detection unit, configured to input the facial image to be detected into a preset MTCNN network model, so as to obtain key point information of the facial image to be detected output by the preset MTCNN network model, where the preset MTCNN network model is obtained by training according to the training method of any one of claims 1 to 6.
CN202110579203.0A 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method Pending CN113221812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579203.0A CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579203.0A CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Publications (1)

Publication Number Publication Date
CN113221812A true CN113221812A (en) 2021-08-06

Family

ID=77098668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579203.0A Pending CN113221812A (en) 2021-05-26 2021-05-26 Training method of face key point detection model and face key point detection method

Country Status (1)

Country Link
CN (1) CN113221812A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019114036A1 (en) * 2017-12-12 2019-06-20 深圳云天励飞技术有限公司 Face detection method and device, computer device, and computer readable storage medium
CN107992864A (en) * 2018-01-15 2018-05-04 武汉神目信息技术有限公司 A kind of vivo identification method and device based on image texture
CN109543545A (en) * 2018-10-25 2019-03-29 北京陌上花科技有限公司 Fast face detecting method and device
CN109815810A (en) * 2018-12-20 2019-05-28 北京以萨技术股份有限公司 A kind of biopsy method based on single camera
CN109993086A (en) * 2019-03-21 2019-07-09 北京华捷艾米科技有限公司 Method for detecting human face, device, system and terminal device
CN111191616A (en) * 2020-01-02 2020-05-22 广州织点智能科技有限公司 Face shielding detection method, device, equipment and storage medium
CN112232117A (en) * 2020-09-08 2021-01-15 深圳微步信息股份有限公司 Face recognition method, face recognition device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Z. WANG ET AL.: "Learning to Detect Head Movement in Unconstrained Remote Gaze Estimation in the Wild", 《2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV)》, 31 December 2020 (2020-12-31), pages 4 *
崔馨方: "Research on Several Problems of Face Key Point Detection", China Master's Theses Full-text Database (Information Science and Technology), no. 5, 15 May 2020 (2020-05-15), pages 2-10 *
陈雨薇: "Face Detection and Facial Key Point Localization Based on an Improved MTCNN Model", China Master's Theses Full-text Database (Information Science and Technology), no. 1, 15 January 2020 (2020-01-15), pages 3-3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023020289A1 (en) * 2021-08-16 2023-02-23 北京百度网讯科技有限公司 Processing method and apparatus for network model, and device and storage medium
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel

Similar Documents

Publication Publication Date Title
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
US10318797B2 (en) Image processing apparatus and image processing method
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
CN111091109B (en) Method, system and equipment for predicting age and gender based on face image
CN110287963B (en) OCR recognition method for comprehensive performance test
EP0363828A2 (en) Method and apparatus for adaptive learning type general purpose image measurement and recognition
CN105550641B (en) Age estimation method and system based on multi-scale linear differential texture features
CN111652869B (en) Slab void identification method, system, medium and terminal based on deep learning
CN110287787B (en) Image recognition method, image recognition device and computer-readable storage medium
CN106980825B (en) Human face posture classification method based on normalized pixel difference features
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN108108760A (en) A kind of fast human face recognition
CN113221812A (en) Training method of face key point detection model and face key point detection method
CN110543848B (en) Driver action recognition method and device based on three-dimensional convolutional neural network
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN112633221A (en) Face direction detection method and related device
CN111415339A (en) Image defect detection method for complex texture industrial product
CN114155610B (en) Panel assembly key action identification method based on upper half body posture estimation
CN111950457A (en) Oil field safety production image identification method and system
CN104298960A (en) Robust analysis for deformable object classification and recognition by image sensors
CN111626197B (en) Recognition method based on human behavior recognition network model
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data
CN112989958A (en) Helmet wearing identification method based on YOLOv4 and significance detection
CN112580527A (en) Facial expression recognition method based on convolution long-term and short-term memory network
CN112329663A (en) Micro-expression time detection method and device based on face image sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination