CN113313010A - Face key point detection model training method, device and equipment - Google Patents

Face key point detection model training method, device and equipment

Info

Publication number
CN113313010A
CN113313010A
Authority
CN
China
Prior art keywords
face
training sample
net network
trained
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110579215.3A
Other languages
Chinese (zh)
Inventor
Liu Chang
Liu Siwei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Weaving Point Intelligent Technology Co ltd
Original Assignee
Guangzhou Weaving Point Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weaving Point Intelligent Technology Co ltd filed Critical Guangzhou Weaving Point Intelligent Technology Co ltd
Priority to CN202110579215.3A priority Critical patent/CN113313010A/en
Publication of CN113313010A publication Critical patent/CN113313010A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a face key point detection model training method, device and equipment. In the process of training a P-Net network, an R-Net network and an O-Net network with a first, a second and a third face training sample respectively, a first loss value of the R-Net network is adjusted by the face angle information of the second face training sample, and a second loss value of the O-Net network is adjusted by the face angle information of the third face training sample. Directionally adjusting the loss values of the face key point detection task under different poses through the face angle information improves the detection accuracy of the face key point detection model in large-pose states, and solves the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.

Description

Face key point detection model training method, device and equipment
Technical Field
The application relates to the technical field of face key point detection, in particular to a face key point detection model training method, device and equipment.
Background
Existing face key point detection technology performs well on faces in relatively ideal environments. However, face key point detection applied on mobile terminals has strict real-time requirements, so large-scale neural network models cannot be adopted there. The prior art therefore usually adopts a lightweight network model and splits the work into two tasks: a face detection task, and a face key point detection task performed on its result. With a lightweight network model, face key point detection can meet the real-time requirement, but the accuracy requirement is difficult to meet.
Existing lightweight networks for face key point detection are biased toward the face detection task and do not consider extra information available to the face key point task; in particular, when the face is at a large pose angle, such as a side face, a raised head or a lowered head, their face key point detection accuracy is low.
Disclosure of Invention
The application provides a face key point detection model training method, device and equipment, which are used to solve the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.
In view of this, a first aspect of the present application provides a method for training a face keypoint detection model, including:
acquiring a first output result of a P-Net network trained by a first face training sample, and acquiring a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample;
training an R-Net network through the second face training sample, and acquiring a first loss value of the R-Net network in the training process;
adjusting the first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain the trained R-Net network;
inputting the second face training sample into the trained R-Net network to obtain a second output result, and obtaining a third face training sample according to the second output result and the original face training sample, wherein the third face training sample is marked with face angle information;
training an O-Net network through the third face training sample, and acquiring a second loss value of the O-Net network in the training process;
adjusting the second loss value according to the face angle information of the third face training sample, and updating the network parameters of the O-Net network according to the adjusted second loss value to obtain the trained O-Net network;
and combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, wherein the face key point detection model is used for face detection, face frame detection and face key point detection.
Optionally, the obtaining a first output result of a P-Net network trained by a first face training sample, and obtaining a second face training sample based on the first output result and an original face training sample, where the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample, and includes:
inputting a first face training sample into a trained P-Net network for face frame prediction, and obtaining a face frame prediction result of the first face training sample output by the trained P-Net network, wherein the first face training sample is obtained by cutting an original face training sample;
and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
Optionally, the inputting the second face training sample into the trained R-Net network to obtain a second output result, and obtaining a third face training sample according to the second output result and the original face training sample, where the third face training sample is labeled with face angle information, includes:
inputting a second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network;
and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, and acquiring the face angle information of the third face training sample according to the face key point coordinates of the third face training sample.
Optionally, the method further includes:
according to the face angle information of the second face training sample or the third face training sample, taking the second face training sample or the third face training sample with a face angle exceeding a preset angle range as a non-target training sample, and taking the second face training sample or the third face training sample with a face angle within the preset angle range as a target training sample;
performing data enhancement on the non-target training sample to obtain an enhanced training sample;
and fusing the enhanced training sample and the target training sample to obtain the preprocessed second face training sample or the preprocessed third face training sample.
Optionally, the adjusted first loss value or the adjusted second loss value is:

[formula given only as image BDA0003085407220000031 in the original]

where L is the first loss value or the second loss value before adjustment, L̂ is the adjusted first loss value or the adjusted second loss value, C = 1, 2, 3 indexes the angles, and θ1, θ2 and θ3 are respectively the pitch angle, the yaw angle and the roll angle in the face angle information.
The second aspect of the present application provides a face key point detection model training device, including:
a first acquisition unit, configured to acquire a first output result of a P-Net network trained by a first face training sample, and to acquire a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample;
the first training unit is used for training an R-Net network through the second face training sample and acquiring a first loss value of the R-Net network in the training process;
the first adjusting unit is used for adjusting the first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain the trained R-Net network;
a second obtaining unit, configured to input the second face training sample to the trained R-Net network to obtain a second output result, and obtain a third face training sample according to the second output result and the original face training sample, where the third face training sample is marked with face angle information;
the second training unit is used for training an O-Net network through the third face training sample and acquiring a second loss value of the O-Net network in the training process;
a second adjusting unit, configured to adjust the second loss value according to the face angle information of the third face training sample, and update the network parameter of the O-Net network according to the adjusted second loss value, so as to obtain the trained O-Net network;
and the combination unit is used for combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, and the face key point detection model is used for face detection, face frame detection and face key point detection.
Optionally, the first obtaining unit is specifically configured to:
inputting a first face training sample into a trained P-Net network for face frame prediction, and obtaining a face frame prediction result of the first face training sample output by the trained P-Net network, wherein the first face training sample is obtained by cutting an original face training sample;
and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
Optionally, the second obtaining unit is specifically configured to:
inputting a second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network;
and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, and acquiring the face angle information of the third face training sample according to the face key point coordinates of the third face training sample.
Optionally, the method further includes: a data enhancement unit to:
according to the face angle information of the second face training sample or the third face training sample, taking the second face training sample or the third face training sample with a face angle exceeding a preset angle range as a non-target training sample, and taking the second face training sample or the third face training sample with a face angle within the preset angle range as a target training sample;
performing data enhancement on the non-target training sample to obtain an enhanced training sample;
and fusing the enhanced training sample and the target training sample to obtain the preprocessed second face training sample or the preprocessed third face training sample.
A third aspect of the present application provides a training device for a face keypoint detection model, which includes a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute any one of the face keypoint detection model training methods according to instructions in the program code.
According to the technical scheme, the method has the following advantages:
the application provides a face key point detection model training method, which comprises the following steps: acquiring a first output result of a P-Net network trained by a first face training sample, and acquiring a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample; training an R-Net network through a second face training sample, and acquiring a first loss value of the R-Net network in the training process; adjusting a first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network through the adjusted first loss value to obtain a trained R-Net network; inputting a second face training sample into the trained R-Net network to obtain a second output result, and acquiring a third face training sample according to the second output result and the original face training sample, wherein the third face training sample is marked with face angle information; training an O-Net network through a third face training sample, and acquiring a second loss value of the O-Net network in the training process; adjusting a second loss value according to the face angle information of the third face training sample, and updating the network parameters of the O-Net network according to the adjusted second loss value to obtain a trained O-Net network; and combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, wherein the face key point detection model is used for face detection, face frame detection and face key point detection.
According to the method, when the R-Net network and the O-Net network are trained, the loss values of the face key point detection task under different poses are directionally adjusted through the additionally annotated face angle information of the face key points. This improves the detection accuracy of the face key point detection model in large-pose states and solves the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a method for training a face key point detection model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a face keypoint detection model training device according to an embodiment of the present application.
Detailed Description
The application provides a face key point detection model training method, device and equipment, which are used to solve the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For easy understanding, please refer to fig. 1, an embodiment of a method for training a face keypoint detection model provided by the present application includes:
step 101, obtaining a first output result of a P-Net network trained by a first face training sample, and obtaining a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample.
The embodiment of the application considers that most existing face key point detection models are embedded into mobile terminals for face key point detection, which imposes high requirements on both real-time performance and accuracy. Therefore, the embodiment of the application preferably uses the lightweight MTCNN (Multi-task Cascaded Convolutional Networks) for face key point detection. The MTCNN is formed by cascading a P-Net network, an R-Net network and an O-Net network; its specific structure belongs to the prior art and is not described here again. To further improve the detection accuracy of the MTCNN in large-pose states without affecting the detection speed, the embodiment of the application does not change the network structure of the MTCNN but improves its training process.
First, the P-Net network in the MTCNN is trained. An original face training sample is acquired; the original training sample is annotated with the face frame position and the face key point coordinates. The original face training sample is cropped according to IoU (Intersection over Union) to obtain first face training samples, which comprise positive samples, negative samples and partial samples: a cropped face region (i.e., a first face training sample) whose IoU with the face frame in the original face training sample is greater than 0.65 is a positive sample, one whose IoU is less than 0.3 is a negative sample, and one whose IoU lies between 0.4 and 0.65 is a partial sample. After the first face training samples are obtained by cropping, they are input into the P-Net network for training; the P-Net network focuses on face detection, that is, foreground and background classification. When training the P-Net network, only face classification training and face frame detection training are performed; face key point detection training is not performed.
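For illustration, the IoU computation and the threshold-based sample labelling might look like the following sketch (the helper names are hypothetical; the thresholds follow the description above):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_crop(crop_box, gt_box):
    """Assign the sample type of a cropped region from its IoU with the face frame."""
    v = iou(crop_box, gt_box)
    if v > 0.65:
        return "positive"
    if v < 0.3:
        return "negative"
    if 0.4 < v <= 0.65:
        return "partial"
    return "discard"  # IoU in [0.3, 0.4] falls outside all three categories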
Inputting the first face training sample into a trained P-Net network for face detection and face frame prediction, and acquiring a face frame prediction result of the first face training sample output by the trained P-Net network; and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
The first face training sample is processed through the trained P-Net network, which detects whether a face exists in the first face sample and gives the corresponding face frame. The original face training sample is then cropped according to the face frame prediction result output by the trained P-Net network, cutting out the face region corresponding to the face frame to obtain the second face training sample; the second face training sample can likewise be divided into positive, negative and partial samples according to its IoU with the face frame. The face key point coordinates of the second face training sample can be obtained from the face key point coordinates of the original face training sample, and the three-dimensional face angles of the second face training sample can then be calculated from those key point coordinates. The three-dimensional face angles comprise the pitch angle (up-and-down rotation), the yaw angle (left-and-right rotation) and the roll angle (in-plane tilt), which together constitute the face angle information of the second face training sample.
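One common way to compute the three Euler angles from 2D key point coordinates is to solve a PnP problem against a generic 3D face model; the patent does not state its method, so the 3D reference points, the pinhole intrinsics and the PnP approach below are all assumptions:

```python
import cv2
import numpy as np

# Generic 3D reference positions (mm) for the five MTCNN-style key points:
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
# These coordinates are an assumed model, not taken from the patent.
MODEL_3D = np.array([
    [-36.0,  39.5, -27.0],   # left eye
    [ 36.0,  39.5, -27.0],   # right eye
    [  0.0,   0.0,   0.0],   # nose tip
    [-28.9, -28.9, -24.1],   # left mouth corner
    [ 28.9, -28.9, -24.1],   # right mouth corner
], dtype=np.float64)

def face_angles(landmarks_2d, img_w, img_h):
    """Estimate (pitch, yaw, roll) in degrees from the five 2D key points."""
    cam = np.array([[img_w, 0.0, img_w / 2.0],   # rough pinhole intrinsics
                    [0.0, img_w, img_h / 2.0],
                    [0.0, 0.0, 1.0]])
    ok, rvec, _tvec = cv2.solvePnP(MODEL_3D, np.asarray(landmarks_2d, np.float64),
                                   cam, None, flags=cv2.SOLVEPNP_EPNP)
    rot, _ = cv2.Rodrigues(rvec)
    # ZYX Euler decomposition (one common convention; signs depend on axis setup)
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw   = np.degrees(np.arcsin(np.clip(-rot[2, 0], -1.0, 1.0)))
    roll  = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
    return pitch, yaw, roll
```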
And 102, training the R-Net network through a second face training sample, and acquiring a first loss value of the R-Net network in the training process.
Further, before training the R-Net network, data enhancement can be performed on the second face training sample, specifically: according to the face angle information of the second face training sample, a second face training sample whose face angle exceeds a preset angle range is taken as a non-target training sample, and one whose face angle is within the preset angle range is taken as a target training sample; data enhancement is performed on the non-target training samples to obtain enhanced training samples; the enhanced training samples and the target training samples are fused to obtain the preprocessed second face training sample; the R-Net network is then trained with the preprocessed second face training sample, and the first loss value of the R-Net network is acquired during training.
The preset angle range bounds the face angle according to the normal turning limits of the human face. Taking the second face training samples whose face angle exceeds the preset angle range as non-target training samples and performing targeted data enhancement on them improves the training effect of the face key point detection model; the enhanced training samples obtained after data enhancement are then fused with the target training samples before the R-Net network is trained.
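A sketch of this split-and-fuse preprocessing is given below; the ±30° range, the oversampling factor and the brightness-jitter enhancement are illustrative assumptions, since the patent leaves all three unspecified:

```python
import random

ANGLE_RANGE = 30.0  # assumed preset range (degrees); the patent does not fix a value

def augment(sample):
    """Illustrative enhancement: brightness jitter. (A geometric augmentation
    such as mirroring would also need the key points and angles remapped.)"""
    out = dict(sample)
    out["image"] = sample["image"] * random.uniform(0.8, 1.2)
    return out

def preprocess(samples):
    """Split by pose, enhance the large-pose (non-target) samples, then fuse."""
    target, non_target = [], []
    for s in samples:
        pitch, yaw, roll = s["angles"]
        big_pose = max(abs(pitch), abs(yaw), abs(roll)) > ANGLE_RANGE
        (non_target if big_pose else target).append(s)
    enhanced = [augment(s) for s in non_target for _ in range(3)]  # oversample
    return target + enhanced  # the fused, preprocessed training set
```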
Inputting the preprocessed second face training sample into an R-Net network for multi-task training, including face classification training, face frame detection training and face key point detection training, to obtain a training result of the second face training sample; and then calculating a first loss value L through a loss function according to the training result and the label information of the second face training sample, wherein the loss function comprises a face classification loss function, a face frame loss function and a face key point loss function.
The face classification loss function is:

[formula given only as image BDA0003085407220000081 in the original]

where L^det is the face classification loss value, ŷ_m^det is the predicted face class of the second face training sample m, and y_m^det is the true face class of the second face training sample m. The partial samples do not participate in the calculation of the face classification loss value.
The face frame loss function is:

[formula given only as image BDA0003085407220000085 in the original]

where L^box is the face frame loss value, ŷ_m^box is the predicted face frame position of the second face training sample m, y_m^box is the true face frame position of the second face training sample m, (x1, y1) are the coordinates of the upper left corner of the face frame, and (x2, y2) are the coordinates of the lower right corner of the face frame.
The face key point loss function is:

[formula given only as image BDA0003085407220000092 in the original]

where L^landmark is the face key point loss value, y_{m,n}^landmark are the true coordinates of face key point n of the second face training sample m, ŷ_{m,n}^landmark are the predicted coordinates of face key point n of the second face training sample m, and K, ω and ε are network hyper-parameters whose values can be set flexibly according to the actual situation.
The first loss value L is calculated as:

L = w1·L^det + w2·L^box + w3·L^landmark

where w1, w2 and w3 are weight parameters.
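A minimal sketch of this weighted multi-task loss follows, assuming the standard MTCNN forms (cross-entropy for classification, squared error for the frame and key point regressions), since the per-task formulas appear only as images in the original; the weight values and the sample-type encoding are likewise illustrative:

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_target, box_pred, box_target,
               lmk_pred, lmk_target, sample_type,
               w1=1.0, w2=0.5, w3=1.0):
    """Weighted sum L = w1*L_det + w2*L_box + w3*L_landmark (weights illustrative)."""
    # Sample-type encoding (assumed): 0 = negative, 1 = positive, 2 = partial.
    cls_mask = sample_type != 2          # partial samples skip the classification loss
    l_det = F.cross_entropy(cls_logits[cls_mask], cls_target[cls_mask])
    box_mask = sample_type != 0          # negatives carry no face frame
    l_box = F.mse_loss(box_pred[box_mask], box_target[box_mask])
    lmk_mask = sample_type == 1          # key points regressed on positives only
    l_lmk = F.mse_loss(lmk_pred[lmk_mask], lmk_target[lmk_mask])
    return w1 * l_det + w2 * l_box + w3 * l_lmk
```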
And 103, adjusting a first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain the trained R-Net network.
After the first loss value of the R-Net network is obtained, the first loss value is adjusted through the face angle information of the second face training sample. The adjusted first loss value is:

[formula given only as image BDA0003085407220000095 in the original]

where L is the first loss value before adjustment, L̂ is the adjusted first loss value, C = 1, 2, 3 indexes the angles, and θ1, θ2 and θ3 are respectively the pitch angle, the yaw angle and the roll angle in the face angle information.
The network parameters of the R-Net network are updated with the adjusted first loss value until the R-Net network converges, giving the trained R-Net network.
And 104, inputting the second face training sample into the trained R-Net network to obtain a second output result, and acquiring a third face training sample according to the second output result and the original face training sample, wherein the third face training sample is marked with face angle information.
Inputting the second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network; and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, wherein the third face training sample can also define a positive sample, a negative sample and a partial sample according to Iou of the face frame.
And 105, training the O-Net network through the third face training sample, and acquiring a second loss value of the O-Net network in the training process.
Further, before the O-Net network is trained, data enhancement can be performed on a third face training sample, specifically: according to the face angle information of the third face training sample, taking the third face training sample with the face angle exceeding the preset angle range as a non-target training sample, and taking the third face training sample with the face angle within the preset angle range as a target training sample; performing data enhancement on the non-target training sample to obtain an enhanced training sample; fusing the enhanced training sample and the target training sample to obtain a preprocessed third face training sample; and training the O-Net network through the preprocessed third face training sample, and acquiring a second loss value of the O-Net network in the training process.
The preset angle range bounds the face angle according to the normal turning limits of the human face. Taking the third face training samples whose face angle exceeds the preset angle range as non-target training samples and performing targeted data enhancement on them improves the training effect of the face key point detection model; the enhanced training samples obtained after data enhancement are then fused with the target training samples before the O-Net network is trained.
Inputting the preprocessed third face training sample into an O-Net network for multi-task training, including face classification training, face frame detection training and face key point detection training, to obtain a training result of the third face training sample; and then calculating a second loss value through a loss function according to the training result and the label information of the third face training sample, wherein the loss function also comprises a face classification loss function, a face frame loss function and a face key point loss function.
And 106, adjusting a second loss value according to the face angle information of the third face training sample, and updating the network parameters of the O-Net network according to the adjusted second loss value to obtain the trained O-Net network.
After the second loss value of the O-Net network is obtained, the second loss value is adjusted through the face angle information of the third face training sample. The adjusted second loss value is:

[formula given only as image BDA0003085407220000101 in the original]

where L is the second loss value before adjustment, L̂ is the adjusted second loss value, C = 1, 2, 3 indexes the angles, and θ1, θ2 and θ3 are respectively the pitch angle, the yaw angle and the roll angle in the face angle information.
The network parameters of the O-Net network are updated with the adjusted second loss value until the O-Net network converges, giving the trained O-Net network.
And step 107, combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, wherein the face key point detection model is used for face detection, face frame detection and face key point detection.
The trained P-Net network, the trained R-Net network and the trained O-Net network are cascaded in sequence to obtain the face key point detection model. The model can perform face detection on an image to be detected, determine whether a face exists in it, and, once a face is detected, output the face frame and the face key point coordinates corresponding to that face. Performing face detection, face frame detection and face key point detection through this model improves the detection accuracy for faces in large-pose states, while the lightweight MTCNN still meets the real-time requirement of the mobile terminal.
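At inference time the three networks run as a cascade. A schematic sketch follows; the network callables and their interfaces are hypothetical, and the NMS and image-cropping details of the standard MTCNN pipeline are folded into them:

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def detect_keypoints(image,
                     p_net: Callable, r_net: Callable,
                     o_net: Callable) -> List[Tuple[Box, list]]:
    """Run the cascade: P-Net proposes candidate face frames, R-Net rejects
    false positives and refines the frames, O-Net outputs the final frames
    and the face key point coordinates."""
    candidates = p_net(image)              # coarse face frame proposals
    refined = r_net(image, candidates)     # filtered and refined frames
    boxes, landmarks = o_net(image, refined)
    return list(zip(boxes, landmarks))
```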
In the embodiment of the application, when the R-Net network and the O-Net network are trained, the loss values of the face key point detection task under different poses are directionally adjusted through the additionally annotated face angle information of the face key points. This improves the detection accuracy of the face key point detection model in large-pose states and solves the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.
The above is an embodiment of a training method for a face keypoint detection model provided by the present application, and the following is an embodiment of a training device for a face keypoint detection model provided by the present application.
Referring to fig. 2, an embodiment of the present application provides a face keypoint detection model training apparatus, including:
the first acquisition unit is used for acquiring a first output result of a P-Net network trained by a first face training sample, and acquiring a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample;
the first training unit is used for training the R-Net network through the second face training sample and acquiring a first loss value of the R-Net network in the training process;
the first adjusting unit is used for adjusting a first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain a trained R-Net network;
the second acquisition unit is used for inputting the second face training sample into the trained R-Net network to obtain a second output result, and acquiring a third face training sample according to the second output result and the original face training sample, wherein the third face training sample is marked with face angle information;
the second training unit is used for training the O-Net network through a third face training sample and acquiring a second loss value of the O-Net network in the training process;
the second adjusting unit is used for adjusting a second loss value according to the face angle information of the third face training sample, and updating the network parameters of the O-Net network according to the adjusted second loss value to obtain a trained O-Net network;
and the combination unit is used for combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, and the face key point detection model is used for face detection, face frame detection and face key point detection.
As a further improvement, the first obtaining unit is specifically configured to:
inputting the first face training sample into a trained P-Net network to perform face frame prediction, and obtaining a face frame prediction result of the first face training sample output by the trained P-Net network, wherein the first face training sample is obtained by cutting an original face training sample;
and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
As a further improvement, the second obtaining unit is specifically configured to:
inputting the second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network;
and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, and acquiring the face angle information of the third face training sample according to the face key point coordinates of the third face training sample.
As a further aspect, the apparatus further comprises: a data enhancement unit to:
according to the face angle information of the second face training sample or the third face training sample, taking the second face training sample or the third face training sample with the face angle exceeding a preset angle range as a non-target training sample, and taking the second face training sample or the third face training sample with the face angle within the preset angle range as a target training sample;
performing data enhancement on the non-target training sample to obtain an enhanced training sample;
and fusing the enhanced training sample and the target training sample to obtain a preprocessed second face training sample or a preprocessed third face training sample.
In the embodiment of the application, when the R-Net network and the O-Net network are trained, the loss values of the face key point detection task under different poses are directionally adjusted through the additionally annotated face angle information of the face key points. This improves the detection accuracy of the face key point detection model in large-pose states and solves the technical problem that existing face key point detection methods do not consider the extra information of the face key point task and, in particular, have low face key point detection accuracy when the face is at a large pose angle.
The embodiment of the application also provides a face key point detection model training device, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the training method of the face key point detection model in the foregoing method embodiments according to instructions in the program code.
The embodiment of the application also provides a computer-readable storage medium, which is used for storing a program code, and the program code is used for executing the face key point detection model training method in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for executing all or part of the steps of the method described in the embodiments of the present application through a computer device (which may be a personal computer, a server, or a network device). And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A face key point detection model training method is characterized by comprising the following steps:
acquiring a first output result of a P-Net network trained by a first face training sample, and acquiring a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample;
training an R-Net network through the second face training sample, and acquiring a first loss value of the R-Net network in the training process;
adjusting the first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain the trained R-Net network;
inputting the second face training sample into the trained R-Net network to obtain a second output result, and obtaining a third face training sample according to the second output result and the original face training sample, wherein the third face training sample is marked with face angle information;
training an O-Net network through the third face training sample, and acquiring a second loss value of the O-Net network in the training process;
adjusting the second loss value according to the face angle information of the third face training sample, and updating the network parameters of the O-Net network according to the adjusted second loss value to obtain the trained O-Net network;
and combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, wherein the face key point detection model is used for face detection, face frame detection and face key point detection.
2. The training method of the face keypoint detection model according to claim 1, wherein said obtaining a first output result of a P-Net network trained by a first face training sample, and obtaining a second face training sample based on the first output result and an original face training sample, the second face training sample being labeled with face angle information, the first face training sample being obtained by clipping the original face training sample, comprises:
inputting a first face training sample into a trained P-Net network for face frame prediction, and obtaining a face frame prediction result of the first face training sample output by the trained P-Net network, wherein the first face training sample is obtained by cutting an original face training sample;
and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
3. The method for training the face keypoint detection model according to claim 1, wherein the inputting the second face training sample to the trained R-Net network to obtain a second output result, and obtaining a third face training sample according to the second output result and the original face training sample, the third face training sample being labeled with face angle information comprises:
inputting a second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network;
and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, and acquiring the face angle information of the third face training sample according to the face key point coordinates of the third face training sample.
4. The training method of the face keypoint detection model according to any one of claims 1 to 3, characterized in that it further comprises:
according to the face angle information of the second face training sample or the third face training sample, taking the second face training sample or the third face training sample with a face angle exceeding a preset angle range as a non-target training sample, and taking the second face training sample or the third face training sample with a face angle within the preset angle range as a target training sample;
performing data enhancement on the non-target training sample to obtain an enhanced training sample;
and fusing the enhanced training sample and the target training sample to obtain the preprocessed second face training sample or the preprocessed third face training sample.
5. The training method of the face keypoint detection model according to claim 1, wherein the adjusted first loss value or the adjusted second loss value is:
[formula given only as image FDA0003085407210000021 in the original]

wherein L is the first loss value or the second loss value before adjustment, L̂ is the adjusted first loss value or the adjusted second loss value, C = 1, 2, 3 indexes the angles, and θ1, θ2 and θ3 are respectively the pitch angle, the yaw angle and the roll angle in the face angle information.
6. A face key point detection model training device, characterized by comprising:
a first acquisition unit, configured to acquire a first output result of a P-Net network trained by a first face training sample, and to acquire a second face training sample based on the first output result and an original face training sample, wherein the second face training sample is marked with face angle information, and the first face training sample is obtained by cutting the original face training sample;
the first training unit is used for training an R-Net network through the second face training sample and acquiring a first loss value of the R-Net network in the training process;
the first adjusting unit is used for adjusting the first loss value according to the face angle information of the second face training sample, and updating the network parameters of the R-Net network according to the adjusted first loss value to obtain the trained R-Net network;
a second obtaining unit, configured to input the second face training sample to the trained R-Net network to obtain a second output result, and obtain a third face training sample according to the second output result and the original face training sample, where the third face training sample is marked with face angle information;
the second training unit is used for training an O-Net network through the third face training sample and acquiring a second loss value of the O-Net network in the training process;
a second adjusting unit, configured to adjust the second loss value according to the face angle information of the third face training sample, and update the network parameter of the O-Net network according to the adjusted second loss value, so as to obtain the trained O-Net network;
and the combination unit is used for combining the trained P-Net network, the trained R-Net network and the trained O-Net network to obtain a face key point detection model, and the face key point detection model is used for face detection, face frame detection and face key point detection.
7. The device for training the face keypoint detection model according to claim 6, wherein the first obtaining unit is specifically configured to:
inputting a first face training sample into a trained P-Net network for face frame prediction, and obtaining a face frame prediction result of the first face training sample output by the trained P-Net network, wherein the first face training sample is obtained by cutting an original face training sample;
and cutting the original face training sample according to the face frame prediction result of the first face training sample to obtain a second face training sample, and acquiring the face angle information of the second face training sample according to the face key point coordinates of the second face training sample.
8. The device for training the face keypoint detection model according to claim 6, wherein the second obtaining unit is specifically configured to:
inputting a second face training sample into the trained R-Net network to perform face frame prediction, and acquiring a face frame prediction result of the second face training sample output by the trained R-Net network;
and cutting the original face training sample according to the face frame prediction result of the second face training sample to obtain a third face training sample, and acquiring the face angle information of the third face training sample according to the face key point coordinates of the third face training sample.
9. The training device for the face key point detection model according to any one of claims 6 to 8, further comprising: a data enhancement unit to:
according to the face angle information of the second face training sample or the third face training sample, taking the second face training sample or the third face training sample with a face angle exceeding a preset angle range as a non-target training sample, and taking the second face training sample or the third face training sample with a face angle within the preset angle range as a target training sample;
performing data enhancement on the non-target training sample to obtain an enhanced training sample;
and fusing the enhanced training sample and the target training sample to obtain the preprocessed second face training sample or the preprocessed third face training sample.
10. A human face key point detection model training device is characterized by comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the human face key point detection model training method of any one of claims 1 to 5 according to instructions in the program code.
CN202110579215.3A 2021-05-26 2021-05-26 Face key point detection model training method, device and equipment Pending CN113313010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110579215.3A CN113313010A (en) 2021-05-26 2021-05-26 Face key point detection model training method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110579215.3A CN113313010A (en) 2021-05-26 2021-05-26 Face key point detection model training method, device and equipment

Publications (1)

Publication Number Publication Date
CN113313010A true CN113313010A (en) 2021-08-27

Family

ID=77375124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110579215.3A Pending CN113313010A (en) 2021-05-26 2021-05-26 Face key point detection model training method, device and equipment

Country Status (1)

Country Link
CN (1) CN113313010A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596637A (en) * 2022-03-23 2022-06-07 北京百度网讯科技有限公司 Image sample data enhancement training method and device and electronic equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657548A (en) * 2018-11-13 2019-04-19 深圳神目信息技术有限公司 A kind of method for detecting human face and system based on deep learning
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN110188730A (en) * 2019-06-06 2019-08-30 山东大学 Face datection and alignment schemes based on MTCNN
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device
CN111401257A (en) * 2020-03-17 2020-07-10 天津理工大学 Non-constraint condition face recognition method based on cosine loss
CN111738080A (en) * 2020-05-19 2020-10-02 云知声智能科技股份有限公司 Face detection and alignment method and device
CN111814573A (en) * 2020-06-12 2020-10-23 深圳禾思众成科技有限公司 Face information detection method and device, terminal equipment and storage medium
CN112633084A (en) * 2020-12-07 2021-04-09 深圳云天励飞技术股份有限公司 Face frame determination method and device, terminal equipment and storage medium
CN112651490A (en) * 2020-12-28 2021-04-13 深圳万兴软件有限公司 Training method and device for face key point detection model and readable storage medium
WO2021068323A1 (en) * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
CN112801043A (en) * 2021-03-11 2021-05-14 河北工业大学 Real-time video face key point detection method based on deep learning

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657548A (en) * 2018-11-13 2019-04-19 深圳神目信息技术有限公司 A kind of method for detecting human face and system based on deep learning
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN110188730A (en) * 2019-06-06 2019-08-30 山东大学 Face datection and alignment schemes based on MTCNN
WO2021068323A1 (en) * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device
CN111401257A (en) * 2020-03-17 2020-07-10 天津理工大学 Non-constraint condition face recognition method based on cosine loss
CN111738080A (en) * 2020-05-19 2020-10-02 云知声智能科技股份有限公司 Face detection and alignment method and device
CN111814573A (en) * 2020-06-12 2020-10-23 深圳禾思众成科技有限公司 Face information detection method and device, terminal equipment and storage medium
CN112633084A (en) * 2020-12-07 2021-04-09 深圳云天励飞技术股份有限公司 Face frame determination method and device, terminal equipment and storage medium
CN112651490A (en) * 2020-12-28 2021-04-13 深圳万兴软件有限公司 Training method and device for face key point detection model and readable storage medium
CN112801043A (en) * 2021-03-11 2021-05-14 河北工业大学 Real-time video face key point detection method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kaipeng Zhang et al.: "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks", IEEE *
Xu Lihuai et al.: "High-precision lightweight face key point detection algorithm", Laser & Optoelectronics Progress *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596637A (en) * 2022-03-23 2022-06-07 北京百度网讯科技有限公司 Image sample data enhancement training method and device and electronic equipment
CN114596637B (en) * 2022-03-23 2024-02-06 北京百度网讯科技有限公司 Image sample data enhancement training method and device and electronic equipment

Similar Documents

Publication Publication Date Title
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
CN107169421B (en) Automobile driving scene target detection method based on deep convolutional neural network
CN111160269A (en) Face key point detection method and device
CN112084856A (en) Face posture detection method and device, terminal equipment and storage medium
CN108734078B (en) Image processing method, image processing apparatus, electronic device, storage medium, and program
JP2023541752A (en) Neural network model training methods, image retrieval methods, equipment and media
Santhalingam et al. Sign language recognition analysis using multimodal data
CN111209811B (en) Method and system for detecting eyeball attention position in real time
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN110222572A (en) Tracking, device, electronic equipment and storage medium
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
CN110569775A (en) Method, system, storage medium and electronic device for recognizing human body posture
CN110598647B (en) Head posture recognition method based on image recognition
CN113239849B (en) Body-building action quality assessment method, body-building action quality assessment system, terminal equipment and storage medium
CN113313010A (en) Face key point detection model training method, device and equipment
CN117372604A (en) 3D face model generation method, device, equipment and readable storage medium
CN110490165B (en) Dynamic gesture tracking method based on convolutional neural network
CN115205750B (en) Motion real-time counting method and system based on deep learning model
CN113807430B (en) Model training method, device, computer equipment and storage medium
Nappi et al. Introduction to the special section on biometric systems and applications
CN113362249A (en) Text image synthesis method and device, computer equipment and storage medium
CN113205530A (en) Shadow area processing method and device, computer readable medium and electronic equipment
CN112200169A (en) Method, apparatus, device and storage medium for training a model
WO2020237674A1 (en) Target tracking method and apparatus, and unmanned aerial vehicle
CN111985510B (en) Generative model training method, image generation device, medium, and terminal

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240112