CN114118303B

CN114118303B - Face key point detection method and device based on prior constraint

Info

Publication number: CN114118303B
Application number: CN202210083501.5A
Authority: CN
Inventors: 王金桥; 刘智威; 李碧莹; 赵朝阳
Original assignee: Objecteye Beijing Technology Co Ltd
Current assignee: Objecteye Beijing Technology Co Ltd
Priority date: 2022-01-25
Filing date: 2022-01-25
Publication date: 2022-04-29
Anticipated expiration: 2042-01-25
Also published as: CN114118303A

Abstract

The invention provides a method and a device for detecting key points of a human face based on prior constraint, wherein the method comprises the following steps: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image. The method and the device for detecting the key points of the human face based on the prior constraint can improve the accuracy of detecting the key points of the human face, have strong anti-interference capability and improve the robustness in difficult scenes.

Description

Face key point detection method and device based on prior constraint

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method and a device for detecting key points of a human face based on prior constraint.

Background

The human face key point detection is an important task in the field of computer vision, and aims to acquire the positions of face key points such as lip corners, eye corners, nose tips and the like in pictures or videos. The process of face key point detection is a key step of a plurality of downstream tasks, and plays an important role in tasks such as face recognition, face age estimation and the like. Due to wide application scenes, the face key point detection relates to some difficult scenes such as face shielding and light change. This puts higher demands on the accuracy and robustness of the identification method.

The existing face key point detection method is mainly divided into two types, one type mainly utilizes a coordinate regression method, and the method can cause the detail structure of the face to be lost; another class uses primarily heatmap regression methods, preserving facial detail information. However, the heat map regression method has poor anti-interference capability and higher detection error in the scenes of shielding, light interference and the like.

Disclosure of Invention

The invention provides a method and a device for detecting key points of a human face based on prior constraint, which are used for solving the defects that the detail structure of the human face is lost, the anti-jamming capability is poor and the detection accuracy is low in a shielding scene and a light interference scene in the prior art, improving the accuracy of detecting the key points of the human face, having strong anti-jamming capability and improving the robustness in a difficult scene.

The invention provides a face key point detection method, which comprises the following steps:

acquiring a face image to be recognized;

inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;

the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.

In some embodiments, the face keypoint detection model is trained based on the following steps:

acquiring the face sample image and an initial model;

inputting the face sample image into the initial model, extracting self-attention features by the initial model based on the face sample image, and performing image recognition based on the self-attention features to obtain a face key point detection result output by the initial model;

training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.

In some embodiments, the prior structural feature of the face sample image is obtained based on the following steps:

acquiring sample data of the positions of the key points of the human face;

taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution;

determining the prior feature of the structure based on a mask of a plurality of the keypoint coordinates.

In some embodiments, said masking each keypoint based on a gaussian distribution centered on each keypoint in the face keypoint location sample data comprises:

based on the formula:

determining a mask for each of the keypoints;

wherein the content of the first and second substances,

and

bar representing the jth keypointThe coordinate and the ordinate of the device are,

and

and (3) representing the abscissa and the ordinate of any point in the neighborhood centered on the jth keypoint, wherein R is the radius of the neighborhood centered on the jth keypoint, and σ is the standard deviation of any point in the neighborhood centered on the jth keypoint.

In some embodiments, the training the initial model based on the loss result of the self-attention feature and the structure prior feature, and the loss result of the face keypoint detection result and the loss result of the face keypoint sample data to obtain the face keypoint detection model includes:

determining a first loss function based on the self-attention feature and the structure prior feature;

determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points;

and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain the face key point detection model.

In some embodiments, said determining a first loss function based on said self-attentional feature and said structure prior feature comprises:

based on the formula:

，

；

determining a first loss function;

wherein the content of the first and second substances,

representing a function of the first loss as a function of,

representing a prior feature of the structure,

and

both represent two-dimensional coordinates of keypoints in the face keypoint location sample data,

representing the self-attention features, and Z represents a feature map extracted from the face sample image by the face key point detection model;

representing said self-attention feature

The values of the singlets in the tensor after deformation.

In some embodiments, the determining a second loss function based on the face keypoint detection result and the face keypoint sample data comprises:

based on the formula:

；

determining a second loss function;

wherein the content of the first and second substances,

representing the second loss function in the form of a function,

representing the detection result of the face key points,

coordinates representing the jth key point in the face key point detection result, the

And representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.

The invention also provides a face key point detection device, which comprises:

the acquisition module is used for acquiring a face image to be recognized;

the recognition module is used for inputting the face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features;

The invention also provides an electronic device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of any one of the above human face key point detection methods.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the above-described face keypoint detection methods.

According to the method and the device for detecting the key points of the human face based on the prior constraint, provided by the invention, the self-attention mechanism is added into the detection model of the key points of the human face, and the structural prior constraint is utilized in the training process, so that the detection model of the key points of the human face can be trained to improve the accuracy of the detection of the key points of the human face, the anti-interference capability is strong, and the robustness under difficult scenes is improved.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a face key point detection method provided by the present invention;

FIG. 2 is a block diagram of a method for detecting key points of a human face according to the present invention;

FIG. 3 is a schematic structural diagram of a face key point detection apparatus provided in the present invention;

fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The method and device for detecting the key points of the human face based on the prior constraint are described below with reference to fig. 1 to 4.

As shown in fig. 1, the present invention provides a method for detecting a face keypoint, which includes the following steps 110 to 120.

And step 110, obtaining a face image to be recognized.

It can be understood that the face image to be recognized is an image which needs to be subjected to face key point detection, and the face image to be recognized may be image data acquired and trimmed by a shooting device, or image data acquired by trimming an image input by a user.

And 120, inputting the face image to be recognized into the face key point detection model to obtain the self-attention feature extracted by the face key point detection model and the position information of the face key point output based on the self-attention feature.

It is understood that the face keypoint detection model may be a model constructed on the basis of a convolutional neural network, for example, the model may be constructed by using a convolutional network structure HRNet-v2 as a model architecture, and a basic framework uses HRNet-v2, and the basic framework may include a branch module, a feature extraction module and a head module.

The branch module is composed of two layers of convolution layers. The feature extraction module is composed of parallel multi-resolution convolutional networks, wherein values 1/4, 1/8, 1/16 and 1/32 in fig. 2 respectively represent the ratio of the feature resolution used by the level to the resolution of the input image, and different resolution levels have a mutual fusion mechanism, so that the feature expression of the human face under the multi-level resolution can be obtained.

The head module outputs the extracted features as a heat map of the predicted key points through the convolution layer, and the position information of the key points of the human face is represented.

Here, a spatial self-attention module is added after the feature extraction module, so that the extracted multi-resolution face features are enhanced, and global context information is introduced. And in order to deal with the situation of lacking face structure information in a difficult scene, the spatial self-attention module is subjected to face structure prior constraint during training.

When the face key point detection model is used for detecting the face key points, a face image to be recognized can be input into the face key point detection model, a spatial self-attention module of the face key point detection model can firstly extract self-attention features from the face image to be recognized, and further the self-attention features are used for enhancing the face features to obtain the position information of the face key points.

It should be noted that the attention mechanism is a kind of mechanism that effectively enhances the network feature expression. The self-attention mechanism is one of the attention mechanisms, can model long-distance feature dependence, is applied to various computer vision tasks such as detection and identification fields, and can make up for the problem of global context information loss caused by operations such as convolution and the like.

Specifically, the face keypoint detection model may be obtained by performing supervised learning training using a face sample image dataset, where the face sample image dataset may include a large number of face sample images and face keypoint position sample data corresponding to each face sample image, and the face keypoint position sample data may be used to perform a structure prior generation operation to obtain a structure prior feature of the face sample image, and the face keypoint detection model may be trained using the face sample image, the face keypoint position sample data, and the structure prior feature of the face sample image.

In the training process, the face sample image is input into the face key point detection model to obtain the self-attention feature output by the spatial self-attention module of the face key point detection model, and the face key point detection model enhances the face feature according to the self-attention feature to predict the face key point detection result.

The prior feature of the structure of the face sample image and the position sample data of the face key point corresponding to the face sample image are given in advance, so that the supervised learning of the face key point detection model can be realized, the global context information missing in the convolutional network is compensated by using the self-attention mechanism, and the action range of the self-attention mechanism is restrained by the face structure prior, so that the accuracy of the face key point detection is improved, and the robustness under the difficult scene is improved.

It is worth mentioning that, the inventor finds out in the research and development process that the face key point detection method based on deep learning can adopt the following two types of methods: coordinate-based regression methods, and heatmap-based regression methods.

The coordinate-based regression method implicitly learns the face shape and directly predicts the coordinates of key points through a network. However, the full-link layer used in the direct prediction of this kind of method causes the loss of the detail structure of the human face.

Unlike coordinate-based regression methods, heatmap-based regression methods use convolutional neural networks, preserving spatial detail information of the face. Such methods predict a heat map for each keypoint, and take the location in the heat map with the maximum as the location of the corresponding keypoint. The heat map-based method retains the facial detail information and the overall effect is superior to the coordinate-based method.

According to the face key point detection method provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.

In some embodiments, the face keypoint detection model is trained based on the following steps: acquiring a human face sample image and an initial model; inputting a face sample image into an initial model, extracting self-attention features from the initial model based on the face sample image, and carrying out image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.

As shown in fig. 2, it is understood that a blank convolutional neural network can be used as an initial model, for example, a convolutional network structure HRNet-v2 can be used as a model architecture for construction, and the initial model can include a spatial self-attention module.

Since the receptive field of the convolutional neural network is local and lacks global context correlation, a self-attention module is added after the feature extraction module. And the main features of the key points of the human face are concentrated on the spatial dimension, so the added self-attention module is a spatial self-attention module.

And the multi-resolution C H W feature graph obtained from the feature extraction module is represented by Z, C is the number of feature graph channels, and H and W represent the height and the width respectively. The features respectively pass through three paths in parallel in the spatial self-attention module.

As shown in fig. 2, the features of the lower path are further convolved by 1 × 1 into two branches, respectively, to obtain the features with the size of C × H × W

And

then, the space self-attention moment matrix is obtained by transforming the feature tensor into C (H W) and multiplying the C (H W) and the H W

：

；

The features of the intermediate path are also convolved by 1 x 1 and transformed into C (H x W) features

. Then using the spatial self-attention moment array

To the characteristics

And enhancing, adding the enhanced features to the original features, and deforming back to C H W to obtain the final output of the spatial self-attention module, namely the self-attention features, according to the formula:

；

where γ is a learnable weight that follows the training of the network during the training process. The addition with the original features is element-by-element addition, other symbols are specifically explained above, the meaning of the symbols is the same as that described above, and the description is omitted here.

In the training process, the face sample image can be input into the initial model to obtain the self-attention feature output by the spatial self-attention module of the initial model, and the face key point detection model outputs the face key point detection result according to the self-attention feature.

The method can compare the self-attention feature with the structure prior feature to obtain a loss result, compares the face key point detection result with the sample data of the face key point position to obtain the loss result, and continuously adjusts the model parameters without inputting the face sample image on the basis of the two loss results, thereby continuously training the face key point detection model, and obtaining higher identification accuracy of the face key point detection model.

In some embodiments, the prior structural features of the face sample image are obtained based on the following steps: acquiring sample data of the positions of key points of the human face; taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution; based on the mask of the coordinates of the plurality of keypoints, a structure prior feature is determined.

It can be understood that, although the spatial self-attention mechanism can add spatial context information to the extracted face features, overall face structure information is lacking for face key point labeling in difficult scenes such as facial occlusion and illumination change.

The context association learned by the spatial self-attention mechanism includes not only the relationship between the key point and the neighborhood, but also the association between the key point and other irrelevant content, such as background and occlusion, which interferes with the detection of the key point.

The self-attention mechanism can only bring global context-dependent information to features, but lacks structural priors. Therefore, the invention proposes to use the face structure prior supervision self-attention mechanism to learn and constrain the features with face structure semantics, namely the features of the key point position and the neighborhood thereof.

Sample data of the positions of key points of the human face can be acquired; the number of the key points in the position sample data of the key points of the human face can be N, each key point in the position sample data of the key points of the human face can be used as a center, and the neighborhood of each key point is used as a mask of each key point based on Gaussian distribution; once the mask for each keypoint has been determined, the prior features of the structure can be determined from the masks for all keypoint coordinates.

In some embodiments, centering on each keypoint in the face keypoint location sample data, taking a neighborhood of each keypoint as a mask for each keypoint based on a gaussian distribution, comprises:

based on the formula:

；

determining a mask for each keypoint;

wherein the content of the first and second substances,

and

the abscissa and ordinate of the jth keypoint are represented,

and

The masks of each key point can be added element by element, that is, the masks of a plurality of key point coordinates are added to obtain the final mask

：

；

The final mask can be used

As a structural prior feature.

In some embodiments, training the initial model based on the loss result of the self-attention feature and the structure prior feature, and the loss result of the face key point detection result and the sample data of the face key point position to obtain the face key point detection model, includes: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.

It is understood that the first loss function is constructed based on the self-attention feature and the structure prior feature; constructing a second loss function according to the detection result of the face key points and the position sample data of the face key points, wherein the weight parameters of the first loss function and the second loss function can be preset

And training the initial model according to the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.

In some embodiments, determining the first loss function based on the self-attention feature and the structure prior feature comprises:

based on the formula:

；

；

，

；

；

determining a first loss function;

wherein the content of the first and second substances,

the first loss function is represented as a function of,

the prior characteristics of the structure are shown,

and

both represent the two-dimensional coordinates of the keypoints in the face keypoint location sample data,

representing self-attention characteristics, and Z represents a characteristic diagram extracted from a human face sample image by a human face key point detection model;

representing said self-attention feature

The values of the singlets in the tensor after deformation.

A keypoint neighborhood may be defined as

Constraining activation of non-neighborhood positions and calculating prior loss

In some embodiments, determining the second loss function based on the face keypoint detection result and the face keypoint sample data includes:

based on the formula:

；

determining a second loss function;

wherein the content of the first and second substances,

represents the second lossThe function of the function is that of the function,

the detection result of the key points of the human face is shown,

the coordinates of the jth key point in the face key point detection result are shown,

and the method comprises the steps of representing the position sample data of the key points of the human face, wherein N represents the total number of the key points in the detection result of the key points of the human face, and Y represents the image of the human face sample.

Spatial self-attention model

Deforming into H W, and

and (4) showing. The value of the mask-corresponding position is used as a weight in the loss calculation, and the closer the distance to the key point, the higher the weight.

A second loss function may be calculated for face keypoint detection results

And finally with the weighting parameters

Two loss functions are balanced:

；

；

；

wherein the content of the first and second substances,

is a result of detecting the key points of the human face,

is a head module of a human face key point detection model,

is a function of the first loss to be,

is the second Loss function and Loss is the synthetic Loss function.

In the testing and application stage, the multi-resolution features extracted by the face key point detection model are enhanced through a spatial self-attention module to obtain self-attention features, the self-attention features are used for enhancing the face features to further obtain the position information of the face key points, and the face structure does not participate in the testing and application stage of the face key point detection model in a priori manner.

The following describes the face key point detection device provided by the present invention, and the face key point detection device described below and the face key point detection method described above may be referred to in correspondence with each other.

As shown in fig. 3, the present invention provides a face key point detection device, which includes: an acquisition module 310 and an identification module 320.

An obtaining module 310, configured to obtain a face image to be recognized;

the recognition module 330 is configured to input a face image to be recognized into the face key point detection model, so as to obtain a self-attention feature extracted by the face key point detection model and face key point position information output based on the self-attention feature;

In some embodiments, the face keypoint detection apparatus may also have a training module.

A training module to: acquiring a human face sample image and an initial model; inputting a face sample image into an initial model, extracting self-attention features from the initial model based on the face sample image, and carrying out image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature, the loss results of the face key point detection result and the loss results of the sample data of the face key point position to obtain the face key point detection model.

In some embodiments, the training module is further configured to: acquiring sample data of the positions of key points of the human face; taking each key point in the position sample data of the key point of the human face as a center, and taking the neighborhood of each key point as a mask of each key point based on Gaussian distribution; based on the mask of the coordinates of the plurality of keypoints, a structure prior feature is determined.

In some embodiments, the training module is further configured to:

based on the formula:

；

determining a mask for each keypoint;

wherein the content of the first and second substances,

and

the abscissa and ordinate of the jth keypoint are represented,

and

In some embodiments, the training module is further configured to: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; and training the initial model based on the first loss function, the second loss function and the weight parameter to obtain a face key point detection model.

In some embodiments, the training module is further configured to:

based on the formula:

；

；

，

；

；

determining a first loss function;

wherein the content of the first and second substances,

the first loss function is represented as a function of,

the prior characteristics of the structure are shown,

and

and the self-attention feature obtained by processing the face sample image by the face key point detection model is represented.

In some embodiments, the training module is further configured to:

based on the formula:

；

determining a second loss function;

wherein the content of the first and second substances,

the second loss function is represented as a function of,

the detection result of the key points of the human face is shown,

representing face keysPoint position sample data, N represents the total number of key points in the face key point detection result, and Y represents a face sample image.

According to the face key point detection device provided by the invention, the self-attention mechanism is added in the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model obtained by training can improve the accuracy of face key point detection, has strong anti-interference capability, and improves the robustness in a difficult scene.

Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are communicated with each other via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform a face keypoint detection method comprising: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.

According to the electronic equipment provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under a difficult scene is improved.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention further provides a computer program product, where the computer program product includes a computer program, the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, a computer can execute the face keypoint detection method provided by the above methods, and the method includes: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.

According to the computer program product provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to execute the face keypoint detection method provided by the above methods, the method comprising: acquiring a face image to be recognized; inputting a face image to be recognized into a face key point detection model to obtain self-attention features extracted by the face key point detection model and face key point position information output based on the self-attention features; the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image.

According to the non-transitory computer-readable storage medium provided by the invention, the self-attention mechanism is added into the face key point detection model, and the structural prior constraint is utilized in the training process, so that the face key point detection model can be trained to improve the accuracy of face key point detection, the anti-interference capability is strong, and the robustness under difficult scenes is improved.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A face key point detection method is characterized by comprising the following steps:

acquiring a face image to be recognized;

the face key point detection model is obtained by supervised training based on a face sample image, face key point position sample data and structure prior characteristics of the face sample image, the structure prior characteristics of the face sample image are obtained by performing structure prior generation operation on the face key point position sample data, and the face key point position sample data are sample labels corresponding to the face sample image;

the face key point detection model is obtained by training based on the following steps: acquiring the face sample image and an initial model; inputting the face sample image into the initial model, extracting self-attention features by the initial model based on the face sample image, and performing image recognition based on the self-attention features to obtain a face key point detection result output by the initial model; training the initial model based on the loss results of the self-attention feature and the structure prior feature and the loss results of the face key point detection result and the sample data of the face key point position to obtain the face key point detection model;

the training the initial model based on the loss result of the self-attention feature and the structure prior feature and the loss result of the face key point detection result and the face key point position sample data to obtain the face key point detection model comprises: determining a first loss function based on the self-attention feature and the structure prior feature; determining a second loss function based on the detection result of the face key points and the sample data of the positions of the face key points; training the initial model based on the first loss function, the second loss function and the weight parameter to obtain the face key point detection model;

said determining a first loss function based on said self-attentional feature and said structure prior feature, comprising:

based on the formula:

；

；

，

；

；

determining a first loss function;

wherein the content of the first and second substances,

representing a function of the first loss as a function of,

representing a prior feature of the structure,

and

representing said self-attention feature

The values of the single elements in the tensor after deformation;

the determining a second loss function based on the detection result of the face key point and the sample data of the position of the face key point includes:

based on the formula:

；

determining a second loss function;

wherein the content of the first and second substances,

representing the second loss function in the form of a function,

representing the detection result of the face key points,

2. The method for detecting the key points of the human face according to claim 1, wherein the structure prior characteristics of the human face sample image are obtained based on the following steps:

acquiring sample data of the positions of the key points of the human face;

3. The method according to claim 2, wherein the masking each keypoint by a neighborhood of each keypoint based on a gaussian distribution centered on each keypoint in the sample data of the positions of the keypoints of the face comprises:

based on the formula:

；

determining a mask for each of the keypoints;

wherein the content of the first and second substances,

and

the abscissa and ordinate of the jth keypoint are represented,

and

4. A face key point detection device, comprising:

the acquisition module is used for acquiring a face image to be recognized;

based on the formula:

；

；

，

；

；

determining a first loss function;

wherein the content of the first and second substances,

representing a function of the first loss as a function of,

representing a prior feature of the structure,

and

representing said self-attention feature

The values of the single elements in the tensor after deformation;

based on the formula:

；

determining a second loss function;

wherein the content of the first and second substances,

representing the second loss function in the form of a function,

representing the detection result of the face key points,

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the face keypoint detection method according to any of claims 1 to 3 when executing the program.

6. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the face keypoint detection method according to any one of claims 1 to 3.