CN114118303B - Face key point detection method and device based on prior constraint - Google Patents

Face key point detection method and device based on prior constraint Download PDF

Info

Publication number
CN114118303B
CN114118303B CN202210083501.5A CN202210083501A CN114118303B CN 114118303 B CN114118303 B CN 114118303B CN 202210083501 A CN202210083501 A CN 202210083501A CN 114118303 B CN114118303 B CN 114118303B
Authority
CN
China
Prior art keywords
face
key point
face key
self
point detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210083501.5A
Other languages
Chinese (zh)
Other versions
CN114118303A (en
Inventor
王金桥
刘智威
李碧莹
赵朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Objecteye Beijing Technology Co Ltd
Original Assignee
Objecteye Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Objecteye Beijing Technology Co Ltd filed Critical Objecteye Beijing Technology Co Ltd
Priority to CN202210083501.5A priority Critical patent/CN114118303B/en
Publication of CN114118303A publication Critical patent/CN114118303A/en
Application granted granted Critical
Publication of CN114118303B publication Critical patent/CN114118303B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention provides a method and a device for detecting key points of a human face based on prior constraint, wherein the method comprises the following steps: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image. The method and the device for detecting the key points of the human face based on the prior constraint can improve the accuracy of detecting the key points of the human face, have strong anti-interference capability and improve the robustness in difficult scenes.

Description

Face key point detection method and device based on prior constraint
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for detecting key points of a human face based on prior constraint.
Background
The human face key point detection is an important task in the field of computer vision, and aims to acquire the positions of face key points such as lip corners, eye corners, nose tips and the like in pictures or videos. The process of face key point detection is a key step of a plurality of downstream tasks, and plays an important role in tasks such as face recognition, face age estimation and the like. Due to wide application scenes, the face key point detection relates to some difficult scenes such as face shielding and light change. This puts higher demands on the accuracy and robustness of the identification method.
The existing face key point detection method is mainly divided into two types, one type mainly utilizes a coordinate regression method, and the method can cause the detail structure of the face to be lost; another class uses primarily heatmap regression methods, preserving facial detail information. However, the heat map regression method has poor anti-interference capability and higher detection error in the scenes of shielding, light interference and the like.
Disclosure of Invention
The invention provides a method and a device for detecting key points of a human face based on prior constraint, which are used for solving the defects that the detail structure of the human face is lost, the anti-jamming capability is poor and the detection accuracy is low in a shielding scene and a light interference scene in the prior art, improving the accuracy of detecting the key points of the human face, having strong anti-jamming capability and improving the robustness in a difficult scene.
The invention provides a face key point detection method, which comprises the following steps:
acquiring a face image to be recognized;
inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;
the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
In some embodiments, the face keypoint detection model is trained based on the following steps:
acquiring the face sample image and an initial model;
inputting the face sample image into the initial model, extracting self-attention features by the initial model based on the face sample image, and performing image recognition based on the self-attention features to obtain a face key point detection result output by the initial model;
training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.
In some embodiments, the prior structural feature of the face sample image is obtained based on the following steps:
acquiring sample data of the positions of the key points of the human face;
taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution;
determining the prior feature of the structure based on a mask of a plurality of the keypoint coordinates.
In some embodiments, said masking each keypoint based on a gaussian distribution centered on each keypoint in the face keypoint location sample data comprises:
based on the formula:
Figure 71435DEST_PATH_IMAGE001
determining a mask for each of the keypoints;
wherein the content of the first and second substances,
Figure 285248DEST_PATH_IMAGE002
and
Figure 407925DEST_PATH_IMAGE003
bar representing the jth keypointThe coordinate and the ordinate of the device are,
Figure 774315DEST_PATH_IMAGE004
and
Figure 957035DEST_PATH_IMAGE005
and (3) representing the abscissa and the ordinate of any point in the neighborhood centered on the jth keypoint, wherein R is the radius of the neighborhood centered on the jth keypoint, and σ is the standard deviation of any point in the neighborhood centered on the jth keypoint.
In some embodiments, the training the initial model based on the loss result of the self-attention feature and the structure prior feature, and the loss result of the face keypoint detection result and the loss result of the face keypoint sample data to obtain the face keypoint detection model includes:
determining a first loss function based on the self-attention feature and the structure prior feature;
determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points;
and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain the face key point detection model.
In some embodiments, said determining a first loss function based on said self-attentional feature and said structure prior feature comprises:
based on the formula:
Figure 392564DEST_PATH_IMAGE006
Figure 53353DEST_PATH_IMAGE007
Figure 274249DEST_PATH_IMAGE008
Figure 627870DEST_PATH_IMAGE009
Figure 816275DEST_PATH_IMAGE010
determining a first loss function;
wherein the content of the first and second substances,
Figure 280755DEST_PATH_IMAGE011
representing a function of the first loss as a function of,
Figure 356158DEST_PATH_IMAGE012
representing a prior feature of the structure,
Figure 880680DEST_PATH_IMAGE013
and
Figure 290802DEST_PATH_IMAGE014
both represent two-dimensional coordinates of keypoints in the face keypoint location sample data,
Figure 824551DEST_PATH_IMAGE015
representing the self-attention features, and Z represents a feature map extracted from the face sample image by the face key point detection model;
Figure 20040DEST_PATH_IMAGE016
representing said self-attention feature
Figure 449885DEST_PATH_IMAGE015
The values of the singlets in the tensor after deformation.
In some embodiments, the determining a second loss function based on the face keypoint detection result and the face keypoint sample data comprises:
based on the formula:
Figure 612882DEST_PATH_IMAGE017
determining a second loss function;
wherein the content of the first and second substances,
Figure 419164DEST_PATH_IMAGE018
representing the second loss function in the form of a function,
Figure 203580DEST_PATH_IMAGE019
representing the detection result of the face key points,
Figure 335484DEST_PATH_IMAGE020
coordinates representing the jth key point in the face key point detection result, the
Figure 454619DEST_PATH_IMAGE021
And representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.
The invention also provides a face key point detection device, which comprises:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;
the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of any one of the above human face key point detection methods.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the above-described face keypoint detection methods.
According to the method and the device for detecting the key points of the human face based on the prior constraint, provided by the invention, the self-attention mechanism is added into the detection model of the key points of the human face, and the structural prior constraint is utilized in the training process, so that the detection model of the key points of the human face can be trained to improve the accuracy of the detection of the key points of the human face, the anti-interference capability is strong, and the robustness under difficult scenes is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a face key point detection method provided by the present invention;
FIG. 2 is a block diagram of a method for detecting key points of a human face according to the present invention;
FIG. 3 is a schematic structural diagram of a face key point detection apparatus provided in the present invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and device for detecting the key points of the human face based on the prior constraint are described below with reference to fig. 1 to 4.
As shown in fig. 1, the present invention provides a method for detecting a face keypoint, which includes the following steps 110 to 120.
And step 110, obtaining a face image to be recognized.
It can be understood that the face image to be recognized is an image which needs to be subjected to face key point detection, and the face image to be recognized may be image data acquired and trimmed by a shooting device, or image data acquired by trimming an image input by a user.
And 120, inputting the face image to be recognized into the face key point detection model to obtain the self-attention feature extracted by the face key point detection model and the position information of the face key point output based on the self-attention feature.
The face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
It is understood that the face keypoint detection model may be a model constructed on the basis of a convolutional neural network, for example, the model may be constructed by using a convolutional network structure HRNet-v2 as a model architecture, and a basic framework uses HRNet-v2, and the basic framework may include a branch module, a feature extraction module and a head module.
The branch module is composed of two layers of convolution layers. The feature extraction module is composed of parallel multi-resolution convolutional networks, wherein values 1/4, 1/8, 1/16 and 1/32 in fig. 2 respectively represent the ratio of the feature resolution used by the level to the resolution of the input image, and different resolution levels have a mutual fusion mechanism, so that the feature expression of the human face under the multi-level resolution can be obtained.
The head module outputs the extracted features as a heat map of the predicted key points through the convolution layer, and the position information of the key points of the human face is represented.
Here, a spatial self-attention module is added after the feature extraction module, so that the extracted multi-resolution face features are enhanced, and global context information is introduced. And in order to deal with the situation of lacking face structure information in a difficult scene, the spatial self-attention module is subjected to face structure prior constraint during training.
When the face key point detection model is used for detecting the face key points, a face image to be recognized can be input into the face key point detection model, a spatial self-attention module of the face key point detection model can firstly extract self-attention features from the face image to be recognized, and further the self-attention features are used for enhancing the face features to obtain the position information of the face key points.
It should be noted that the attention mechanism is a kind of mechanism that effectively enhances the network feature expression. The self-attention mechanism is one of the attention mechanisms, can model long-distance feature dependence, is applied to various computer vision tasks such as detection and identification fields, and can make up for the problem of global context information loss caused by operations such as convolution and the like.
Specifically, the face keypoint detection model may be obtained by performing supervised learning training using a face sample image dataset, where the face sample image dataset may include a large number of face sample images and face keypoint position sample data corresponding to each face sample image, and the face keypoint position sample data may be used to perform a structure prior generation operation to obtain a structure prior feature of the face sample image, and the face keypoint detection model may be trained using the face sample image, the face keypoint position sample data, and the structure prior feature of the face sample image.
In the training process, the face sample image is input into the face key point detection model to obtain the self-attention feature output by the spatial self-attention module of the face key point detection model, and the face key point detection model enhances the face feature according to the self-attention feature to predict the face key point detection result.
The prior feature of the structure of the face sample image and the position sample data of the face key point corresponding to the face sample image are given in advance, so that the supervised learning of the face key point detection model can be realized, the global context information missing in the convolutional network is compensated by using the self-attention mechanism, and the action range of the self-attention mechanism is restrained by the face structure prior, so that the accuracy of the face key point detection is improved, and the robustness under the difficult scene is improved.
It is worth mentioning that, the inventor finds out in the research and development process that the face key point detection method based on deep learning can adopt the following two types of methods: coordinate-based regression methods, and heatmap-based regression methods.
The coordinate-based regression method implicitly learns the face shape and directly predicts the coordinates of key points through a network. However, the full-link layer used in the direct prediction of this kind of method causes the loss of the detail structure of the human face.
Unlike coordinate-based regression methods, heatmap-based regression methods use convolutional neural networks, preserving spatial detail information of the face. Such methods predict a heat map for each keypoint, and take the location in the heat map with the maximum as the location of the corresponding keypoint. The heat map-based method retains the facial detail information and the overall effect is superior to the coordinate-based method.
According to the face key point detection method provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.
In some embodiments, the face keypoint detection model is trained based on the following steps: acquiring a human face sample image and an initial model; inputting a face sample image into an initial model, extracting self-attention features from the initial model based on the face sample image, and carrying out image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.
As shown in fig. 2, it is understood that a blank convolutional neural network can be used as an initial model, for example, a convolutional network structure HRNet-v2 can be used as a model architecture for construction, and the initial model can include a spatial self-attention module.
Since the receptive field of the convolutional neural network is local and lacks global context correlation, a self-attention module is added after the feature extraction module. And the main features of the key points of the human face are concentrated on the spatial dimension, so the added self-attention module is a spatial self-attention module.
And the multi-resolution C H W feature graph obtained from the feature extraction module is represented by Z, C is the number of feature graph channels, and H and W represent the height and the width respectively. The features respectively pass through three paths in parallel in the spatial self-attention module.
As shown in fig. 2, the features of the lower path are further convolved by 1 × 1 into two branches, respectively, to obtain the features with the size of C × H × W
Figure 330171DEST_PATH_IMAGE022
And
Figure 969094DEST_PATH_IMAGE023
then, the space self-attention moment matrix is obtained by transforming the feature tensor into C (H W) and multiplying the C (H W) and the H W
Figure 271899DEST_PATH_IMAGE024
Figure 878330DEST_PATH_IMAGE025
The features of the intermediate path are also convolved by 1 x 1 and transformed into C (H x W) features
Figure 291994DEST_PATH_IMAGE026
. Then using the spatial self-attention moment array
Figure 910057DEST_PATH_IMAGE027
To the characteristics
Figure 524709DEST_PATH_IMAGE028
And enhancing, adding the enhanced features to the original features, and deforming back to C H W to obtain the final output of the spatial self-attention module, namely the self-attention features, according to the formula:
Figure 228223DEST_PATH_IMAGE029
where γ is a learnable weight that follows the training of the network during the training process. The addition with the original features is element-by-element addition, other symbols are specifically explained above, the meaning of the symbols is the same as that described above, and the description is omitted here.
In the training process, the face sample image can be input into the initial model to obtain the self-attention feature output by the spatial self-attention module of the initial model, and the face key point detection model outputs the face key point detection result according to the self-attention feature.
The method can compare the self-attention feature with the structure prior feature to obtain a loss result, compares the face key point detection result with the sample data of the face key point position to obtain the loss result, and continuously adjusts the model parameters without inputting the face sample image on the basis of the two loss results, thereby continuously training the face key point detection model, and obtaining higher identification accuracy of the face key point detection model.
In some embodiments, the prior structural features of the face sample image are obtained based on the following steps: acquiring sample data of the positions of key points of the human face; taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution; based on the mask of the coordinates of the plurality of keypoints, a structure prior feature is determined.
It can be understood that, although the spatial self-attention mechanism can add spatial context information to the extracted face features, overall face structure information is lacking for face key point labeling in difficult scenes such as facial occlusion and illumination change.
The context association learned by the spatial self-attention mechanism includes not only the relationship between the key point and the neighborhood, but also the association between the key point and other irrelevant content, such as background and occlusion, which interferes with the detection of the key point.
The self-attention mechanism can only bring global context-dependent information to features, but lacks structural priors. Therefore, the invention proposes to use the face structure prior supervision self-attention mechanism to learn and constrain the features with face structure semantics, namely the features of the key point position and the neighborhood thereof.
Sample data of the positions of key points of the human face can be acquired; the number of the key points in the position sample data of the key points of the human face can be N, each key point in the position sample data of the key points of the human face can be used as a center, and the neighborhood of each key point is used as a mask of each key point based on Gaussian distribution; once the mask for each keypoint has been determined, the prior features of the structure can be determined from the masks for all keypoint coordinates.
In some embodiments, centering on each keypoint in the face keypoint location sample data, taking a neighborhood of each keypoint as a mask for each keypoint based on a gaussian distribution, comprises:
based on the formula:
Figure 570211DEST_PATH_IMAGE030
determining a mask for each keypoint;
wherein the content of the first and second substances,
Figure 308360DEST_PATH_IMAGE031
and
Figure 828334DEST_PATH_IMAGE032
the abscissa and ordinate of the jth keypoint are represented,
Figure 284723DEST_PATH_IMAGE033
and
Figure 164824DEST_PATH_IMAGE034
and (3) representing the abscissa and the ordinate of any point in the neighborhood centered on the jth keypoint, wherein R is the radius of the neighborhood centered on the jth keypoint, and σ is the standard deviation of any point in the neighborhood centered on the jth keypoint.
The masks of each key point can be added element by element, that is, the masks of a plurality of key point coordinates are added to obtain the final mask
Figure 23058DEST_PATH_IMAGE035
Figure 448354DEST_PATH_IMAGE036
The final mask can be used
Figure 392040DEST_PATH_IMAGE037
As a structural prior feature.
In some embodiments, training the initial model based on the loss result of the self-attention feature and the structure prior feature, and the loss result of the face key point detection result and the sample data of the face key point position to obtain the face key point detection model, includes: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.
It is understood that the first loss function is constructed based on the self-attention feature and the structure prior feature; constructing a second loss function according to the detection result of the face key points and the position sample data of the face key points, wherein the weight parameters of the first loss function and the second loss function can be preset
Figure 75831DEST_PATH_IMAGE038
And training the initial model according to the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.
In some embodiments, determining the first loss function based on the self-attention feature and the structure prior feature comprises:
based on the formula:
Figure 788572DEST_PATH_IMAGE039
Figure 650349DEST_PATH_IMAGE040
Figure 815751DEST_PATH_IMAGE041
Figure 37653DEST_PATH_IMAGE042
Figure 870480DEST_PATH_IMAGE043
determining a first loss function;
wherein the content of the first and second substances,
Figure 637579DEST_PATH_IMAGE044
the first loss function is represented as a function of,
Figure 290277DEST_PATH_IMAGE045
the prior characteristics of the structure are shown,
Figure 315871DEST_PATH_IMAGE046
and
Figure 3204DEST_PATH_IMAGE047
both represent the two-dimensional coordinates of the keypoints in the face keypoint location sample data,
Figure 206784DEST_PATH_IMAGE048
representing self-attention characteristics, and Z represents a characteristic diagram extracted from a human face sample image by a human face key point detection model;
Figure 346778DEST_PATH_IMAGE049
representing said self-attention feature
Figure 922202DEST_PATH_IMAGE050
The values of the singlets in the tensor after deformation.
A keypoint neighborhood may be defined as
Figure 464042DEST_PATH_IMAGE051
Constraining activation of non-neighborhood positions and calculating prior loss
In some embodiments, determining the second loss function based on the face keypoint detection result and the face keypoint sample data includes:
based on the formula:
Figure 838523DEST_PATH_IMAGE052
determining a second loss function;
wherein the content of the first and second substances,
Figure 465813DEST_PATH_IMAGE053
represents the second lossThe function of the function is that of the function,
Figure 833209DEST_PATH_IMAGE054
the detection result of the key points of the human face is shown,
Figure 229556DEST_PATH_IMAGE055
the coordinates of the jth key point in the face key point detection result are shown,
Figure 774938DEST_PATH_IMAGE056
and the method comprises the steps of representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.
Spatial self-attention model
Figure 623945DEST_PATH_IMAGE057
Deforming into H W, and
Figure 670398DEST_PATH_IMAGE058
and (4) showing. The value of the mask-corresponding position is used as a weight in the loss calculation, and the closer the distance to the key point, the higher the weight.
A second loss function may be calculated for face keypoint detection results
Figure 45885DEST_PATH_IMAGE059
And finally with the weighting parameters
Figure 886802DEST_PATH_IMAGE060
Two loss functions are balanced:
Figure 364051DEST_PATH_IMAGE061
Figure 948616DEST_PATH_IMAGE062
Figure 444188DEST_PATH_IMAGE063
wherein the content of the first and second substances,
Figure 456006DEST_PATH_IMAGE064
is a result of detecting the key points of the human face,
Figure 420551DEST_PATH_IMAGE065
is a head module of a human face key point detection model,
Figure 543228DEST_PATH_IMAGE066
is a function of the first loss to be,
Figure 893307DEST_PATH_IMAGE067
is the second Loss function and Loss is the synthetic Loss function.
In the testing and application stage, the multi-resolution features extracted by the face key point detection model are enhanced through a spatial self-attention module to obtain self-attention features, the self-attention features are used for enhancing the face features to further obtain the position information of the face key points, and the face structure does not participate in the testing and application stage of the face key point detection model in a priori manner.
The following describes the face key point detection device provided by the present invention, and the face key point detection device described below and the face key point detection method described above may be referred to in correspondence with each other.
As shown in fig. 3, the present invention provides a face key point detection device, which includes: an acquisition module 310 and an identification module 320.
An obtaining module 310, configured to obtain a face image to be recognized;
the recognition module 330 is configured to input a face image to be recognized into the face key point detection model, so as to obtain a self-attention feature extracted by the face key point detection model and face key point position information output based on the self-attention feature;
the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
In some embodiments, the face keypoint detection apparatus may also have a training module.
A training module to: acquiring a human face sample image and an initial model; inputting a face sample image into an initial model, extracting self-attention features from the initial model based on the face sample image, and carrying out image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.
In some embodiments, the training module is further configured to: acquiring sample data of the positions of key points of the human face; taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution; based on the mask of the coordinates of the plurality of keypoints, a structure prior feature is determined.
In some embodiments, the training module is further configured to:
based on the formula:
Figure 341606DEST_PATH_IMAGE068
determining a mask for each keypoint;
wherein the content of the first and second substances,
Figure 527868DEST_PATH_IMAGE069
and
Figure 454235DEST_PATH_IMAGE070
the abscissa and ordinate of the jth keypoint are represented,
Figure 658821DEST_PATH_IMAGE071
and
Figure 12442DEST_PATH_IMAGE072
and (3) representing the abscissa and the ordinate of any point in the neighborhood centered on the jth keypoint, wherein R is the radius of the neighborhood centered on the jth keypoint, and σ is the standard deviation of any point in the neighborhood centered on the jth keypoint.
In some embodiments, the training module is further configured to: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.
In some embodiments, the training module is further configured to:
based on the formula:
Figure 685999DEST_PATH_IMAGE073
Figure 416058DEST_PATH_IMAGE074
Figure 740729DEST_PATH_IMAGE008
Figure 265251DEST_PATH_IMAGE075
Figure 426105DEST_PATH_IMAGE076
determining a first loss function;
wherein the content of the first and second substances,
Figure 694276DEST_PATH_IMAGE077
the first loss function is represented as a function of,
Figure 873453DEST_PATH_IMAGE078
the prior characteristics of the structure are shown,
Figure 834456DEST_PATH_IMAGE079
and
Figure 482606DEST_PATH_IMAGE080
both represent the two-dimensional coordinates of the keypoints in the face keypoint location sample data,
Figure 288888DEST_PATH_IMAGE048
representing self-attention characteristics, and Z represents a characteristic diagram extracted from a human face sample image by a human face key point detection model;
Figure 588151DEST_PATH_IMAGE081
and the self-attention feature obtained by processing the face sample image by the face key point detection model is represented.
In some embodiments, the training module is further configured to:
based on the formula:
Figure 454476DEST_PATH_IMAGE082
determining a second loss function;
wherein the content of the first and second substances,
Figure 589922DEST_PATH_IMAGE083
the second loss function is represented as a function of,
Figure 199895DEST_PATH_IMAGE084
the detection result of the key points of the human face is shown,
Figure 353665DEST_PATH_IMAGE085
the coordinates of the jth key point in the face key point detection result are shown,
Figure 656470DEST_PATH_IMAGE086
representing face keysPoint position sample data, N represents the total number of key points in the face key point detection result, and Y represents a face sample image.
According to the face key point detection device provided by the invention, the self-attention mechanism is added in the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model obtained by training can improve the accuracy of face key point detection, has strong anti-interference capability, and improves the robustness in a difficult scene.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are communicated with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a face keypoint detection method comprising: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
According to the electronic equipment provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under a difficult scene is improved.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product, where the computer program product includes a computer program, the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, a computer can execute the face keypoint detection method provided by the above methods, and the method includes: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
According to the computer program product provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the face keypoint detection method provided by the above methods, the method comprising: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.
According to the non-transitory computer-readable storage medium provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A face key point detection method is characterized by comprising the following steps:
acquiring a face image to be recognized;
inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;
the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image;
the face key point detection model is obtained by training based on the following steps: acquiring the face sample image and an initial model; inputting the face sample image into the initial model, extracting self-attention features by the initial model based on the face sample image, and performing image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature and the loss results of the face key point detection result and the sample data of the face key point position to obtain the face key point detection model;
the training the initial model based on the loss result of the self-attention feature and the structure prior feature and the loss result of the face key point detection result and the face key point position sample data to obtain the face key point detection model comprises: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; training the initial model based on the first loss function, the second loss function and the weight parameter to obtain the face key point detection model;
said determining a first loss function based on said self-attentional feature and said structure prior feature, comprising:
based on the formula:
Figure 916682DEST_PATH_IMAGE001
Figure 541698DEST_PATH_IMAGE002
Figure 322441DEST_PATH_IMAGE003
Figure 400119DEST_PATH_IMAGE004
Figure 511294DEST_PATH_IMAGE005
determining a first loss function;
wherein the content of the first and second substances,
Figure 990817DEST_PATH_IMAGE006
representing a function of the first loss as a function of,
Figure 942462DEST_PATH_IMAGE007
representing a prior feature of the structure,
Figure 507435DEST_PATH_IMAGE008
and
Figure 687881DEST_PATH_IMAGE009
both represent two-dimensional coordinates of keypoints in the face keypoint location sample data,
Figure 21910DEST_PATH_IMAGE010
representing the self-attention features, and Z represents a feature map extracted from the face sample image by the face key point detection model;
Figure 144456DEST_PATH_IMAGE011
representing said self-attention feature
Figure 931146DEST_PATH_IMAGE010
The values of the single elements in the tensor after deformation;
the determining a second loss function based on the detection result of the face key point and the sample data of the position of the face key point includes:
based on the formula:
Figure 915283DEST_PATH_IMAGE012
determining a second loss function;
wherein the content of the first and second substances,
Figure 634977DEST_PATH_IMAGE013
representing the second loss function in the form of a function,
Figure 413577DEST_PATH_IMAGE014
representing the detection result of the face key points,
Figure 942691DEST_PATH_IMAGE015
coordinates representing the jth key point in the face key point detection result, the
Figure 730518DEST_PATH_IMAGE016
And representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.
2. The method for detecting the key points of the human face according to claim 1, wherein the structure prior characteristics of the human face sample image are obtained based on the following steps:
acquiring sample data of the positions of the key points of the human face;
taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution;
determining the prior feature of the structure based on a mask of a plurality of the keypoint coordinates.
3. The method according to claim 2, wherein the masking each keypoint by a neighborhood of each keypoint based on a gaussian distribution centered on each keypoint in the sample data of the positions of the keypoints of the face comprises:
based on the formula:
Figure 39140DEST_PATH_IMAGE017
determining a mask for each of the keypoints;
wherein the content of the first and second substances,
Figure 988641DEST_PATH_IMAGE018
and
Figure 264771DEST_PATH_IMAGE019
the abscissa and ordinate of the jth keypoint are represented,
Figure 325130DEST_PATH_IMAGE020
and
Figure 753838DEST_PATH_IMAGE021
and (3) representing the abscissa and the ordinate of any point in the neighborhood centered on the jth keypoint, wherein R is the radius of the neighborhood centered on the jth keypoint, and σ is the standard deviation of any point in the neighborhood centered on the jth keypoint.
4. A face key point detection device, comprising:
the acquisition module is used for acquiring a face image to be recognized;
the recognition module is used for inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;
the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image;
the face key point detection model is obtained by training based on the following steps: acquiring the face sample image and an initial model; inputting the face sample image into the initial model, extracting self-attention features by the initial model based on the face sample image, and performing image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature and the loss results of the face key point detection result and the sample data of the face key point position to obtain the face key point detection model;
the training the initial model based on the loss result of the self-attention feature and the structure prior feature and the loss result of the face key point detection result and the face key point position sample data to obtain the face key point detection model comprises: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; training the initial model based on the first loss function, the second loss function and the weight parameter to obtain the face key point detection model;
said determining a first loss function based on said self-attentional feature and said structure prior feature, comprising:
based on the formula:
Figure 874241DEST_PATH_IMAGE001
Figure 388398DEST_PATH_IMAGE002
Figure 501717DEST_PATH_IMAGE003
Figure 50510DEST_PATH_IMAGE004
Figure 341814DEST_PATH_IMAGE005
determining a first loss function;
wherein the content of the first and second substances,
Figure 77689DEST_PATH_IMAGE006
representing a function of the first loss as a function of,
Figure 994698DEST_PATH_IMAGE007
representing a prior feature of the structure,
Figure 397998DEST_PATH_IMAGE008
and
Figure 860203DEST_PATH_IMAGE009
both represent two-dimensional coordinates of keypoints in the face keypoint location sample data,
Figure 348953DEST_PATH_IMAGE010
representing the self-attention features, and Z represents a feature map extracted from the face sample image by the face key point detection model;
Figure 820386DEST_PATH_IMAGE011
representing said self-attention feature
Figure 78192DEST_PATH_IMAGE010
The values of the single elements in the tensor after deformation;
the determining a second loss function based on the detection result of the face key point and the sample data of the position of the face key point includes:
based on the formula:
Figure 960566DEST_PATH_IMAGE012
determining a second loss function;
wherein the content of the first and second substances,
Figure 936612DEST_PATH_IMAGE013
representing the second loss function in the form of a function,
Figure 946157DEST_PATH_IMAGE014
representing the detection result of the face key points,
Figure 58469DEST_PATH_IMAGE015
coordinates representing the jth key point in the face key point detection result, the
Figure 128056DEST_PATH_IMAGE016
And representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the face keypoint detection method according to any of claims 1 to 3 when executing the program.
6. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the face keypoint detection method according to any one of claims 1 to 3.
CN202210083501.5A 2022-01-25 2022-01-25 Face key point detection method and device based on prior constraint Active CN114118303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210083501.5A CN114118303B (en) 2022-01-25 2022-01-25 Face key point detection method and device based on prior constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210083501.5A CN114118303B (en) 2022-01-25 2022-01-25 Face key point detection method and device based on prior constraint

Publications (2)

Publication Number Publication Date
CN114118303A CN114118303A (en) 2022-03-01
CN114118303B true CN114118303B (en) 2022-04-29

Family

ID=80360880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210083501.5A Active CN114118303B (en) 2022-01-25 2022-01-25 Face key point detection method and device based on prior constraint

Country Status (1)

Country Link
CN (1) CN114118303B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343A (en) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 Image processing method, system and server
CN111832465A (en) * 2020-07-08 2020-10-27 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
CN112581370A (en) * 2020-12-28 2021-03-30 苏州科达科技股份有限公司 Training and reconstruction method of super-resolution reconstruction model of face image
CN112766158A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN112906432A (en) * 2019-12-04 2021-06-04 中南大学 Error detection and correction method applied to human face key point positioning task
WO2021208687A1 (en) * 2020-11-03 2021-10-21 平安科技(深圳)有限公司 Human-face detection model training method, device, medium, and human-face detection method
CN113609935A (en) * 2021-07-21 2021-11-05 无锡我懂了教育科技有限公司 Lightweight vague discrimination method based on deep learning face recognition
CN113658040A (en) * 2021-07-14 2021-11-16 西安理工大学 Face super-resolution method based on prior information and attention fusion mechanism

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460343A (en) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 Image processing method, system and server
CN112906432A (en) * 2019-12-04 2021-06-04 中南大学 Error detection and correction method applied to human face key point positioning task
CN111832465A (en) * 2020-07-08 2020-10-27 星宏集群有限公司 Real-time head classification detection method based on MobileNet V3
WO2021208687A1 (en) * 2020-11-03 2021-10-21 平安科技(深圳)有限公司 Human-face detection model training method, device, medium, and human-face detection method
CN112581370A (en) * 2020-12-28 2021-03-30 苏州科达科技股份有限公司 Training and reconstruction method of super-resolution reconstruction model of face image
CN112766158A (en) * 2021-01-20 2021-05-07 重庆邮电大学 Multi-task cascading type face shielding expression recognition method
CN113658040A (en) * 2021-07-14 2021-11-16 西安理工大学 Face super-resolution method based on prior information and attention fusion mechanism
CN113609935A (en) * 2021-07-21 2021-11-05 无锡我懂了教育科技有限公司 Lightweight vague discrimination method based on deep learning face recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Attention CoupleNet: Fully Convolutional Attention;Yousong Zhu等;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20190131;第113-126页 *
融合关键点属性与注意力表征的人脸表情识别;高红霞;《计算机工程与应用》;20210928;第1-10页 *

Also Published As

Publication number Publication date
CN114118303A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN110728209B (en) Gesture recognition method and device, electronic equipment and storage medium
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
WO2019100724A1 (en) Method and device for training multi-label classification model
CN109902548B (en) Object attribute identification method and device, computing equipment and system
CN112597941B (en) Face recognition method and device and electronic equipment
CN109446889B (en) Object tracking method and device based on twin matching network
Chen et al. Learning linear regression via single-convolutional layer for visual object tracking
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
CN112434655A (en) Gait recognition method based on adaptive confidence map convolution network
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
JP6107531B2 (en) Feature extraction program and information processing apparatus
CN111967527B (en) Peony variety identification method and system based on artificial intelligence
CN113361431B (en) Network model and method for face shielding detection based on graph reasoning
CN114565605A (en) Pathological image segmentation method and device
Raju et al. Detection based long term tracking in correlation filter trackers
Xu et al. Extended non-local feature for visual saliency detection in low contrast images
CN115862119B (en) Attention mechanism-based face age estimation method and device
CN114119970B (en) Target tracking method and device
CN114118303B (en) Face key point detection method and device based on prior constraint
CN111931767B (en) Multi-model target detection method, device and system based on picture informativeness and storage medium
CN113610026A (en) Pedestrian re-identification method and device based on mask attention
CN112861678A (en) Image identification method and device
CN111160161A (en) Self-learning face age estimation method based on noise elimination
CN113591593B (en) Method, equipment and medium for detecting target in abnormal weather based on causal intervention
CN114998990B (en) Method and device for identifying safety behaviors of personnel on construction site

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant