CN114067380A - Model training method, detection method and device for face key point detection, and terminal equipment - Google Patents

Model training method, detection method and device for face key point detection, and terminal equipment

Info

Publication number
CN114067380A
CN114067380A (application CN202010764497.XA)
Authority
CN
China
Prior art keywords
key point
face
max
detection
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010764497.XA
Other languages
Chinese (zh)
Other versions
CN114067380B (en)
Inventor
林坚
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority to CN202010764497.XA
Publication of CN114067380A
Application granted
Publication of CN114067380B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model training method, a detection method, a device and terminal equipment for face key point detection. The model training method comprises the following steps: step 1, collecting original images containing human faces, labeling the coordinates of N face key points, and constructing an original training set; step 2, acquiring face region frames and randomly enlarging them to construct an augmented training set; step 3, performing image preprocessing on the augmented data set; step 4, acquiring Euler angle labels for the augmented data set images; and step 5, training the face key point detection model. The face key points are detected with the network obtained by the model training method, which improves the detection speed while preserving accuracy.

Description

Model training method, detection method and device for face key point detection, and terminal equipment
Technical Field
The invention relates to the technical fields of computer vision and face recognition, and in particular to a model training method, detection method and device for face key point detection, and terminal equipment.
Background
Face key point detection aims to locate key points in a given face image; the key points are artificially defined to represent points on the facial features and the face contour. Locating the key points accurately and efficiently is an important prerequisite for face analysis and recognition. At present, face key point detection methods mainly include traditional methods based on the ASM (Active Shape Model) and AAM (Active Appearance Model), methods based on CSR (Cascaded Shape Regression), and methods based on deep learning, but existing methods have difficulty achieving fast detection while ensuring accuracy.
Disclosure of Invention
In order to overcome the defects of the prior art, the embodiment of the disclosure provides a model training method, a detection method and a detection device for face key point detection, and a terminal device, which can improve the detection speed under the condition of ensuring the precision. The technical scheme is as follows:
in a first aspect, a model training method for face keypoint detection is provided, which includes the following steps:
Step 1, collecting original images containing a human face and labeling the coordinates of N face key points, wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height being height, and constructing an original training set.
Step 2: acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set;
Calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form the original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing the augmented training set.
Step 3, carrying out image preprocessing on the augmented data set.
Step 4, acquiring the Euler angle labels of the augmented data set images.
Obtaining coordinates of six points, namely a left canthus, a right canthus, a nose tip, a left mouth corner, a right mouth corner and the lower jaw, from the face key points, calculating a rotation matrix, and converting the rotation matrix into Euler angles; the Euler angles comprise a pitch angle pitch, a yaw angle yaw and a roll angle roll.
Step 5, training the face key point detection model,
the method comprises the steps of adopting an augmented training set, taking images in the training set, key points of the images and Euler angle labels of the images as input, using a basic feature extraction module on the input images to obtain basic feature parts, and respectively predicting the basic feature parts by adopting key point network main branches and angle network auxiliary branches.
The key point network main branch comprises a first scale feature extraction module, a second scale feature extraction module, a third scale feature extraction module and a key point regression prediction module. First scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points.
The angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle.
Wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles. The total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
Preferably, in step 2, the frame of the face region is randomly enlarged, and the specific method is as follows:
the width magnification factor of the target frame is Sw, the height magnification factor is Sh, and the maximum threshold eta of Sw and Sh is set, namely that Sw is more than or equal to 1 and eta is less than or equal to 1, and Sh is more than or equal to 1 and eta is less than or equal to eta.
Let dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1. That is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1.
The maximum outward expansion of the target frame is:
Maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
Maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
Image border-crossing processing is then performed. Taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
This yields the region frame (x_min', y_min', x_max', y_max') after one random enlargement.
Preferably, the method of image pre-processing the augmented data set comprises one or more of the following: scaling to fixed input size, random rotation, random graying, random color and saturation transformation, pixel value normalization, random erasure.
Preferably, in step 5, the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size. The first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1. The feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
Further, the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
In a second aspect, a method for detecting key points of a human face is provided, and the method specifically includes:
A face key point detection network trained by the model training method in any one of all possible implementation manners receives an image to be detected as input and outputs the key point detection result.
Meanwhile, another method for detecting key points of the human face is provided, which specifically comprises the following steps:
A face key point detection network obtained by training according to any one of all possible implementation manners adopts the basic feature extraction module and the key point network main branch as the detection network; the image to be detected is input to obtain the key point detection result.
Preferably, the method further comprises processing the image to be detected as follows: the position of the face detection frame is obtained with a face detection algorithm; a threshold is set on the size of the face detection frame, and detection frames smaller than the threshold are removed; when the size of the face detection frame is larger than the threshold, the face frame is enlarged by a scale factor, the enlarged face detection frame is cropped out as the detection input to obtain key point coordinates, and the face key point coordinates relative to the image to be detected are calculated by combining the key point coordinates with the face detection frame coordinates.
In a third aspect, an apparatus for detecting key points of a human face is provided, the apparatus includes a training module and a detection module;
the training module is used for executing the steps of the model training method for detecting the key points of the human face in any one of all possible implementation modes;
the detection module is configured to execute the steps of the method for detecting key points of a human face according to any one of all possible implementation manners.
In a fourth aspect, a mobile terminal device is provided, where the mobile terminal device includes a device for detecting key points of a human face in any one of all possible implementation manners.
Compared with the prior art, one of the technical schemes has the following beneficial effects:
1. During data augmentation, a new scheme of randomly enlarging the target frame generates more diverse training data, improving the accuracy and generalization ability of the model;
2. using a lightweight network as the backbone reduces the computation of the model and increases the detection speed;
3. the angle network auxiliary branch is used to strengthen training only in the training stage, improving detection accuracy without adding any computation at test time;
4. the model is trained with a combination of the weighted key point coordinate regression loss and the wing loss;
5. the position coordinates are regressed directly rather than post-processed from heatmaps, eliminating the time-consuming post-processing step.
Drawings
Fig. 1 is a schematic diagram of constructing an augmented training set according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating random enlargement of a face area according to an embodiment of the disclosure.
Fig. 3 is a frame diagram of a face key point model training network provided in the embodiment of the present disclosure.
Fig. 4 is a structure diagram of a face key point model training network provided in the embodiment of the present disclosure.
Fig. 5 is a block diagram of four G-bneck modules according to an embodiment of the present disclosure.
Fig. 6 is a network framework diagram of face keypoint detection provided by the embodiment of the present disclosure.
Fig. 7 is a network structure diagram of face key point detection provided in the embodiment of the present disclosure.
Detailed Description
In order to clarify the technical solution and the working principle of the present invention, the embodiments of the present disclosure will be described in further detail with reference to the accompanying drawings.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
The terms "step 1," "step 2," "step 3," and the like in the description and claims of this application and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may, for example, be implemented in an order other than those described herein.
In a first aspect: the embodiment of the disclosure provides a model training method for face key point detection, which comprises the following steps:
Step 1, collecting original images containing a human face and labeling the coordinates of N face key points (preferably 106 key points), wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height being height, and constructing an original training set.
Step 2: and acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set.
Fig. 1 is a schematic diagram of constructing the augmented training set according to an embodiment of the present disclosure. With reference to the figure, the specific method of step 2 includes: calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form the original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num (default 20) groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing the augmented training set.
Preferably, in step 2, the frame of the face region is randomly enlarged, and the specific method is as follows, as shown in fig. 2:
1) The width magnification factor of the target frame (i.e. the face region frame) is set to Sw and the height magnification factor to Sh, and a maximum threshold η (default 2.0) is set for Sw and Sh, i.e. 1 ≤ Sw ≤ η and 1 ≤ Sh ≤ η.
2) Let dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1. That is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1.
3) The maximum outward expansion of the target frame is:
Maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
Maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
Image border-crossing processing is then performed. Taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
This yields the region frame (x_min', y_min', x_max', y_max') after one random enlargement.
This random enlargement mechanism guarantees that the new region frame contains the original face region frame; the original frame does not sit fixed at the center of the enlarge_box but falls with equal probability at any position within the new frame, and the size and shape of the new frame are not fixed. One initial picture therefore yields many different samples, which can further improve the final accuracy and generalization ability of model training. Moreover, when the final trained model is tested, it can still locate the key points well even if the extracted face target frame is not accurate; and when the face occupies a large area of the image, an accurate result can be obtained by applying key point detection directly, without using a target detection module.
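As a concrete illustration, the following minimal Python sketch performs one random enlargement under the rules above, assuming floating-point box coordinates; the function names enlarge_box and shift_keypoints are illustrative, not from the patent.

import random

def enlarge_box(src_box, width, height, eta=2.0):
    # Randomly enlarge (x_min, y_min, x_max, y_max) within the image bounds.
    x_min, y_min, x_max, y_max = src_box
    sw = random.uniform(1.0, eta)          # width magnification, 1 <= Sw <= eta
    sh = random.uniform(1.0, eta)          # height magnification, 1 <= Sh <= eta
    dx1 = random.uniform(0.0, sw - 1.0)    # proportion of expansion to the left
    dy1 = random.uniform(0.0, sh - 1.0)    # proportion of expansion upward
    dx2 = random.uniform(0.0, 1.0 - dx1)   # to the right, so that dx1 + dx2 <= 1
    dy2 = random.uniform(0.0, 1.0 - dy1)   # downward, so that dy1 + dy2 <= 1
    max_dx = (sw - 1.0) * (x_max - x_min)  # maximum horizontal out-expansion
    max_dy = (sh - 1.0) * (y_max - y_min)  # maximum vertical out-expansion
    # Border-crossing processing: clamp the enlarged box to the image.
    return (max(0.0, x_min - dx1 * max_dx),
            max(0.0, y_min - dy1 * max_dy),
            min(float(width), x_max + dx2 * max_dx),
            min(float(height), y_max + dy2 * max_dy))

def shift_keypoints(points, box):
    # Convert key points from original-image coordinates to box coordinates.
    return [(x - box[0], y - box[1]) for (x, y) in points]

# num (default 20) enlarged samples would be drawn per original image, e.g.:
# boxes = [enlarge_box(src_box, width, height) for _ in range(20)]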
Step 3, image preprocessing is carried out on the augmented data set.
Preferably, the image preprocessing of the augmented data set comprises one or more of the following: scaling to a fixed input size (112 × 112 by default), random rotation (note that the key point coordinates must be transformed correspondingly), random graying, random color and saturation transformation, pixel value normalization (computing the per-channel RGB pixel mean and variance over all training images, then performing mean subtraction and variance removal on each image), random erasing, and the like.
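As a small illustration, the sketch below implements two of these steps, scaling to the fixed 112 × 112 input and per-channel normalization, assuming RGB images as NumPy arrays; the function names are illustrative, and the variance-removing step is realized here as division by the standard deviation.

import cv2
import numpy as np

INPUT_SIZE = 112

def compute_channel_stats(images):
    # Per-channel mean and standard deviation over all training images (RGB).
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)

def preprocess(img, keypoints, mean, std):
    h, w = img.shape[:2]
    img = cv2.resize(img, (INPUT_SIZE, INPUT_SIZE)).astype(np.float32)
    img = (img - mean) / std  # subtract the mean, remove the variance
    # Key point coordinates must be transformed together with the image.
    kps = [(x * INPUT_SIZE / w, y * INPUT_SIZE / h) for (x, y) in keypoints]
    return img, kps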
Step 4, obtaining the Euler angle labels of the augmented data set images.
Coordinates of six points, namely the left canthus, right canthus, nose tip, left mouth corner, right mouth corner and lower jaw (chin), are obtained from the face key points; a rotation matrix is computed (preferably with the solvePnP function in OpenCV) and then converted into Euler angles: a pitch angle (pitch), a yaw angle (yaw) and a roll angle (roll).
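A sketch of this labeling step using OpenCV's solvePnP, as the text suggests. The generic 3D reference coordinates, the pinhole camera approximation and the Euler-angle decomposition convention below are assumptions for illustration, not values from the patent.

import cv2
import numpy as np

# Generic 3D model points in the order: left canthus, right canthus, nose tip,
# left mouth corner, right mouth corner, lower jaw (chin).
MODEL_3D = np.array([
    (-225.0, 170.0, -135.0),
    (225.0, 170.0, -135.0),
    (0.0, 0.0, 0.0),
    (-150.0, -150.0, -125.0),
    (150.0, -150.0, -125.0),
    (0.0, -330.0, -65.0),
], dtype=np.float64)

def euler_angles(image_points, img_w, img_h):
    # image_points: 6 x 2 array ordered as MODEL_3D; returns degrees.
    focal = img_w  # crude pinhole approximation of the focal length
    cam = np.array([[focal, 0.0, img_w / 2.0],
                    [0.0, focal, img_h / 2.0],
                    [0.0, 0.0, 1.0]], dtype=np.float64)
    dist = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, image_points.astype(np.float64), cam, dist)
    rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3 x 3 rotation matrix
    # One common decomposition of the rotation matrix into Euler angles.
    sy = np.sqrt(rot[0, 0] ** 2 + rot[1, 0] ** 2)
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))
    return pitch, yaw, roll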
Step 5, training the face key point detection model,
fig. 3 is a frame diagram of a face key point model training network provided in the embodiment of the present disclosure, and step 5 in combination with the diagram specifically includes: the method comprises the steps of adopting an augmented training set, taking images in the training set, key points of the images and Euler Angle labels of the images as input, using a basic feature extraction Module for the input images to obtain basic feature parts, and respectively predicting the basic feature parts by adopting a key point network main branch (Landmark Net Module) and an Angle network auxiliary branch (Angle Net Module).
The key point network main branch comprises a first scale feature extraction module (Scale_1_Net), a second scale feature extraction module (Scale_2_Net), a third scale feature extraction module (Scale_3_Net) and a key point regression prediction module (Landmark_Classfier). First scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points.
The angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle.
Wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles. The total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
The influence weight w_θ is given by

[formula shown only as an image in the source]

where θ1, θ2 and θ3 respectively represent the values of the three Euler angles pitch, yaw and roll.

Loss_reg is given by

[formula shown only as an image in the source]

where N is the number of key points (default 106) and the per-point term represents the distance between the predicted value of the n-th key point and the true value of the corresponding point.

Loss_wing is the wing loss; its standard form is

wing(d) = w · ln(1 + |d| / e) if |d| < w, and |d| - C otherwise,

where d is the per-point error and w, e and C are constants that can be set as appropriate (here preferably w = 10.0 and e = 2.0, with C = w - w·ln(1 + w/e) = 10 - 10·ln 6 for continuity).

The total training loss is

Loss = (1 + w_θ) * Loss_reg + Loss_wing
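A sketch of this loss under stated assumptions: the wing-loss form follows the published wing loss, Loss_reg is taken here as the mean per-point distance, and w_θ is passed in as a precomputed value because its exact formula appears only as an image in the source.

import math
import torch

def wing_loss(pred, target, w=10.0, e=2.0):
    # Wing loss over per-key-point distances; C = w - w*ln(1 + w/e) keeps the
    # two pieces continuous at |d| = w.
    c = w - w * math.log(1.0 + w / e)
    d = (pred - target).view(-1, 2).norm(dim=1)  # Euclidean error per key point
    return torch.where(d < w, w * torch.log1p(d / e), d - c).mean()

def total_loss(pred_pts, true_pts, w_theta):
    # Loss = (1 + w_theta) * Loss_reg + Loss_wing
    d = (pred_pts - true_pts).view(-1, 2).norm(dim=1)
    loss_reg = d.mean()  # assumed form of the coordinate regression loss
    return (1.0 + w_theta) * loss_reg + wing_loss(pred_pts, true_pts)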
Preferably, in step 5, the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size. The first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1. The feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
Further, as shown in figs. 4 and 5, the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
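The following PyTorch sketch reproduces only the structure of the main branch, with plain strided convolutions standing in for the G-bneck blocks (whose internals appear only in the figures) and placeholder channel widths.

import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    def __init__(self, n_points=106, c=64):
        super().__init__()
        self.base = nn.Sequential(  # basic features at 1/4 of the input size
            nn.Conv2d(3, c, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU())
        self.scale1 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # 1/4 -> 1/8
        self.scale2 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # 1/8 -> 1/16
        self.scale3 = nn.Conv2d(c, c, 3, stride=2, padding=1)  # deepest scale
        self.pool = nn.AdaptiveAvgPool2d(1)          # map each scale to 1 x 1
        self.head = nn.Linear(3 * c, 2 * n_points)   # regression on fused features

    def forward(self, x):
        base = self.base(x)
        s1 = self.scale1(base)
        s2 = self.scale2(s1)
        s3 = self.scale3(s2)
        # Fuse all three scales as the input of the regression head.
        f = torch.cat([self.pool(t).flatten(1) for t in (s1, s2, s3)], dim=1)
        return self.head(f)  # (batch, 2 * N) predicted key point coordinates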
In a second aspect: the embodiment of the present disclosure provides a method for detecting key points of a human face;
fig. 6 is a network frame diagram of face key point detection provided in the embodiment of the present disclosure, and in combination with the network frame diagram, a method for detecting face key points includes:
the method comprises the steps of training a human face key point detection network by adopting a model training method for human face key point detection according to any one of all possible implementation modes, inputting an image to be detected by adopting a basic feature extraction module and a key point network main branch module as the detection network, and obtaining a key point detection result (or key point detection coordinates).
In practical applications, scenes where the face is relatively close to the camera often occur, for example at a security check entrance where the face occupies a large proportion of the image to be detected. In that case no face detection algorithm is needed to enlarge the face frame: the original image is input directly into the key point detection network to obtain the face key point coordinates relative to the original image. For scenes where the face is relatively far from the camera, or where there is more than one face in the image, the face detection frames are often very small. A face detection algorithm (preferably MTCNN) can then be used to obtain the positions of the face detection frames; a threshold is set on the detection frame size and frames smaller than the threshold are removed. When a face detection frame is larger than the threshold, it is enlarged by a scale factor (preferably 1.8), the enlarged frame is cropped out as the detection input to obtain the key point detection result, and the predicted key point coordinates are combined with the face detection frame coordinates to calculate the face key point coordinates relative to the image to be detected.
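A sketch of this long-distance pipeline, where detect_faces stands in for the MTCNN detector, landmark_net for the trained key point network (assumed to return pixel coordinates in the 112 × 112 crop), and the minimum-size threshold is an assumed value.

import cv2
import numpy as np

SCALE, MIN_SIZE, INPUT = 1.8, 40, 112  # MIN_SIZE is an assumed threshold

def detect_keypoints(img, detect_faces, landmark_net):
    h, w = img.shape[:2]
    results = []
    for (x1, y1, x2, y2) in detect_faces(img):
        bw, bh = x2 - x1, y2 - y1
        if min(bw, bh) < MIN_SIZE:
            continue  # remove detection frames smaller than the threshold
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0  # enlarge around the center
        hw, hh = bw * SCALE / 2.0, bh * SCALE / 2.0
        ex1, ey1 = max(0, int(cx - hw)), max(0, int(cy - hh))
        ex2, ey2 = min(w, int(cx + hw)), min(h, int(cy + hh))
        crop = cv2.resize(img[ey1:ey2, ex1:ex2], (INPUT, INPUT))
        pts = np.asarray(landmark_net(crop), dtype=np.float32).reshape(-1, 2)
        # Map the predictions back to the coordinates of the image to be detected.
        pts[:, 0] = pts[:, 0] * (ex2 - ex1) / INPUT + ex1
        pts[:, 1] = pts[:, 1] * (ey2 - ey1) / INPUT + ey1
        results.append(pts)
    return results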
Preferably, fig. 7 is a network structure diagram of face key point detection provided by the embodiment of the present disclosure. With reference to the figure, the basic feature extraction module comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D; the key point network main branch comprises a first scale feature extraction module (Scale_1_Net), a second scale feature extraction module (Scale_2_Net), a third scale feature extraction module (Scale_3_Net) and a key point regression prediction module (Landmark_Classfier); the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules, the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules, and the third scale feature extraction module comprises 1 convolution.
It should be noted that the test stage uses only the basic feature extraction module and the key point network main branch to obtain the final face key point prediction. The angle network auxiliary branch is added during training mainly so that the trained model acquires some ability to learn face angles, enabling it to predict key point information more accurately for faces at different angles; this improves the generalization ability of the model with respect to face angle and raises the accuracy of face key point detection.
In a third aspect: the embodiment of the present disclosure provides a device for detecting key points of a human face;
based on the same technical concept, the device comprises a training module and a detection module;
the training module is used for executing the steps of the model training method for detecting the key points of the human face in any one of all possible implementation modes;
the detection module is configured to execute the steps of the method for detecting key points of a human face according to any one of all possible implementation manners.
The device for detecting the face key points, the model training method for detecting the face key points and the detection method provided by the embodiments belong to the same concept, and the specific implementation process is described in the method embodiments in detail and is not described herein again.
In a fourth aspect, an embodiment of the present disclosure provides a terminal device, where the terminal device includes a device for detecting key points of a human face in any one of all possible implementation manners.
The invention has been described above by way of example with reference to the accompanying drawings. It should be understood that the invention is not limited to the specific embodiments described above. Insubstantial modifications made in accordance with the principles and technical solutions of the invention, or direct application of the inventive concept and technical solutions to other occasions without improvement, all fall within the protection scope of the invention.

Claims (10)

1. A model training method for face key point detection is characterized by comprising the following steps:
step 1, collecting an original image containing a human face and labeling the coordinates of N face key points, wherein the face key point coordinates are relative to the original image, the width of the original image being width and the height of the original image being height, and constructing an original training set;
step 2: acquiring a face region frame, and randomly amplifying the face region frame to obtain a constructed augmentation training set;
calculating the maximum and minimum horizontal and vertical coordinates of the key points of each image to form an original face region frame src_box (x_min, y_min, x_max, y_max); obtaining num groups of enlarged face regions enlarge_box (x_min', y_min', x_max', y_max') by applying a random enlargement method to the face region frame; and correspondingly converting the key point coordinates relative to the original image into key point coordinates relative to the enlarge_box, thereby constructing an augmented training set;
step 3, image preprocessing is carried out on the augmented data set;
step 4, acquiring an Euler angle label of the augmented data set image;
obtaining coordinates of six points, namely a left canthus, a right canthus, a nose tip, a left mouth corner, a right mouth corner and the lower jaw, from the face key points, calculating a rotation matrix, and converting the rotation matrix into Euler angles, the Euler angles comprising a pitch angle pitch, a yaw angle yaw and a roll angle roll;
step 5, training the face key point detection model,
adopting an augmented training set, taking images in the training set, key points of the images and Euler angle labels of the images as input, using a basic feature extraction module on the input images to obtain a basic feature part, and respectively predicting the basic feature part by adopting a key point network main branch and an angle network auxiliary branch;
the key point network main branch comprises a first scale feature extraction module, a second scale feature extraction module, a third scale feature extraction module and a key point regression prediction module; first scale features are extracted from the basic feature part; second scale features are then extracted from the first scale features; third scale features are extracted from the second scale features; finally, the features of all three scales are fused as the input of the key point regression prediction module to obtain the predicted key points;
the angle network auxiliary branch comprises an Euler angle feature extraction module and an Euler angle feature prediction module, wherein the Euler angle feature extraction module is used for extracting Euler angle features from the basic feature part, and then the Euler angle feature prediction module is used for obtaining a predicted Euler angle;
wing loss Loss_wing and coordinate regression loss Loss_reg are calculated by combining the predicted key point coordinates and the real key point coordinates; the influence weight w_θ on Loss_reg is calculated by combining the predicted Euler angles and the real Euler angles; the total training loss Loss is Loss_wing plus the weighted Loss_reg, and the training goal is to converge the total loss to a minimum.
2. The model training method for face key point detection according to claim 1, wherein the face region frame is randomly enlarged in step 2, and the specific method is as follows:
setting the width magnification factor of the target frame as Sw and the height magnification factor as Sh, and setting a maximum threshold η for Sw and Sh, i.e. 1 ≤ Sw ≤ η and 1 ≤ Sh ≤ η;
letting dx1 be the proportion by which the target frame is enlarged to the left, a random number in the range 0 to Sw-1; dy1 the proportion upward, a random number in the range 0 to Sh-1; dx2 the proportion to the right, a random number in the range 0 to 1.0-dx1; and dy2 the proportion downward, a random number in the range 0 to 1.0-dy1; that is: dx1 ∈ [0, Sw-1], dy1 ∈ [0, Sh-1], dx2 ∈ [0, 1.0-dx1], dy2 ∈ [0, 1.0-dy1], with 0 ≤ dx1+dx2 ≤ 1 and 0 ≤ dy1+dy2 ≤ 1; the maximum outward expansion of the target frame is:
maximum value of left or right enlargement: max_dx = (Sw - 1.0) * (x_max - x_min)
maximum value of upward or downward enlargement: max_dy = (Sh - 1.0) * (y_max - y_min)
then image border-crossing processing is performed; taking the upper-left corner of the original image as the origin, the new coordinates obtained after random enlargement are
x_min' = max(0, x_min - dx1 * max_dx)
y_min' = max(0, y_min - dy1 * max_dy)
x_max' = min(width, x_max + dx2 * max_dx)
y_max' = min(height, y_max + dy2 * max_dy)
and the region frame (x_min', y_min', x_max', y_max') after one random enlargement is obtained.
3. The model training method for face keypoint detection according to claim 1, characterized in that the method for image preprocessing of the augmented data set comprises one or more of the following ways: scaling to fixed input size, random rotation, random graying, random color and saturation transformation, pixel value normalization, random erasure.
4. The model training method for face key point detection according to any of claims 1 to 3, wherein in step 5 the basic feature extraction module uses a basic network to convert the input image into a basic feature map at 1/4 of the original image size; the first scale feature extraction module in the key point network main branch down-samples the basic feature map from 1/4 to 1/8 of the original size and maps the down-sampled features to 1 × 1 using adaptive mean pooling; the second scale feature extraction module down-samples the first scale feature map from 1/8 to 1/16 and likewise maps the result to 1 × 1 using adaptive mean pooling; the third scale feature extraction module down-samples the second scale feature map from 1/16 directly to 1 × 1; and the feature extraction module in the angle network auxiliary branch down-samples the basic feature map from 1/4 to 1/16 of the original size.
5. The model training method for face key point detection according to claim 4, wherein the basic feature extraction module in step 5 comprises 1 convolution, G-bneck B, G-bneck C and G-bneck D;
the first scale feature extraction module comprises G-bneck A and n1 G-bneck B modules,
the second scale feature extraction module comprises G-bneck A and n2 G-bneck B modules,
the third scale feature extraction module comprises 1 convolution,
and the angle network auxiliary branch comprises 4 convolution layers, a maximum pooling layer and 2 fully connected layers.
6. A method for detecting key points of a human face is characterized in that,
wherein a face key point detection network trained by the model training method for face key point detection according to any one of claims 1 to 5 receives an image to be detected as input and outputs a key point detection result.
7. A method for detecting key points of a human face is characterized in that,
wherein a face key point detection network trained by the model training method for face key point detection according to any one of claims 1 to 5 adopts the basic feature extraction module and the key point network main branch as the detection network, and an image to be detected is input to obtain a key point detection result.
8. The method for detecting face key points according to claim 6 or 7, wherein the image to be detected is processed as follows: the position of the face detection frame is obtained with a face detection algorithm; a threshold is set on the size of the face detection frame and detection frames smaller than the threshold are removed; when the size of the face detection frame is larger than the threshold, the face frame is enlarged by a scale factor, the enlarged face detection frame is cropped out as the detection input to obtain key point coordinates, and the face key point coordinates relative to the image to be detected are calculated by combining the key point coordinates with the face detection frame coordinates.
9. A human face key point detection device is characterized by comprising a training module and a detection module;
the training module is used for executing the steps of the model training method for face key point detection according to any one of claims 1 to 5;
the detection module is configured to perform the steps of the method for detecting key points of a human face according to any one of claims 6 to 8.
10. A mobile terminal device, characterized in that the mobile terminal device comprises the apparatus for detecting face key points according to claim 9.
CN202010764497.XA 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment Active CN114067380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010764497.XA CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010764497.XA CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Publications (2)

Publication Number Publication Date
CN114067380A 2022-02-18
CN114067380B 2024-08-27

Family

ID=80231330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010764497.XA Active CN114067380B (en) 2020-08-03 2020-08-03 Model training method, detection method and device for face key point detection, and terminal equipment

Country Status (1)

Country Link
CN (1) CN114067380B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508678A (en) * 2018-11-16 2019-03-22 广州市百果园信息技术有限公司 Training method, the detection method and device of face key point of Face datection model
CN109961006A (en) * 2019-01-30 2019-07-02 东华大学 A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes
CN110309706A (en) * 2019-05-06 2019-10-08 深圳市华付信息技术有限公司 Face critical point detection method, apparatus, computer equipment and storage medium
CN110889446A (en) * 2019-11-22 2020-03-17 高创安邦(北京)技术有限公司 Face image recognition model training and face image recognition method and device
CN111160269A (en) * 2019-12-30 2020-05-15 广东工业大学 Face key point detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
石高辉; 陈晓荣; 刘亚茹; 戴星宇; 池笑宇; 李恒: "Design of a face key point detection algorithm based on convolutional neural networks" (基于卷积神经网络的人脸关键点检测算法设计), Electronic Measurement Technology (电子测量技术), no. 24

Also Published As

Publication number Publication date
CN114067380B (en) 2024-08-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant