CN115131853A - Face key point positioning method and device, electronic equipment and storage medium - Google Patents

Face key point positioning method and device, electronic equipment and storage medium

Info

Publication number: CN115131853A
Application number: CN202210648126.4A
Authority: CN (China)
Prior art keywords: face image, face, key point, occlusion, sample
Legal status: Pending (assumed, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 王金桥, 马辰, 刘智威, 赵朝阳
Current Assignee: Objecteye Beijing Technology Co Ltd
Original Assignee: Objecteye Beijing Technology Co Ltd
Application filed by Objecteye Beijing Technology Co Ltd
Priority to CN202210648126.4A
Publication of CN115131853A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 — Image or video pattern matching; proximity measures in feature spaces
    • G06V10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; using context analysis; selection of dictionaries
    • G06V10/751 — Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis (PCA), independent component analysis (ICA), or self-organising maps (SOM); blind source separation
    • G06V10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • G06V40/168 — Feature extraction; Face representation

Abstract

The invention relates to the technical field of face recognition and provides a face key point positioning method and device, an electronic device, and a storage medium. An original face image is passed through a face de-occlusion model to obtain a corresponding non-occlusion face image, and a key point positioning model then accurately obtains the face key point information from that image, avoiding the result uncertainty and noise caused in the prior art by locating face key points directly on an occluded face image. In addition, the method neither requires the key point positioning model to locate face key points on occluded face images nor requires manually annotated occlusion samples to train it, so the positioning accuracy of the key point positioning model on non-occlusion face images is preserved and the positioning cost is reduced.

Description

Face key point positioning method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face recognition, and in particular to a face key point positioning method and device, an electronic device, and a storage medium.
Background
Face key point positioning detects the position of each face in a face image and the positions of its key points, such as the major facial organs: nose, eyes, mouth, and so on. The technology can be used for face pose correction, pose recognition, fatigue monitoring, three-dimensional face reconstruction, face animation, face recognition, expression analysis, and more. Incorrect key point positioning distorts and deforms the face, so an algorithm that accurately extracts face key points is very important.
At present, face key point positioning algorithms fall mainly into three categories: methods based on generative models, such as the Active Appearance Model (AAM) and Active Shape Model (ASM) and their extensions; methods based on cascaded shape regression; and methods based on deep learning.
All three categories use a positioning model to locate face key points directly on the original occluded face image. Because the apparent facial features of the occluded region are lost, the positions of key points inside it can only be guessed from the other visible points and prior knowledge of facial structure. When the occluded region is large, directly estimating key point positions inside it produces great uncertainty and noise; at the same time, manually annotating occluded samples is very difficult, so such samples are scarce, and a positioning model with robust anti-occlusion capability cannot be obtained by direct data-driven training.
Disclosure of Invention
The invention provides a face key point positioning method and device, an electronic device, and a storage medium, which address the above defects in the prior art.
The invention provides a face key point positioning method, which comprises the following steps:
acquiring an original face image on which key point positioning is to be performed;
inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;
inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
According to the method for positioning the key points of the face, provided by the invention, the first occluded face image sample is determined based on the following method:
determining first face three-dimensional information of the first non-occlusion face image sample, and determining a first texture map of a face area in the first non-occlusion face image sample based on the first face three-dimensional information;
acquiring a second occluded face image sample, determining second face three-dimensional information of the second occluded face image sample, and determining a second texture map of an occluded area in the second occluded face image sample based on the second face three-dimensional information;
and fusing the first texture map and the second texture map to obtain a first fusion result, and determining the first occluded human face image sample based on the first human face three-dimensional information and the first fusion result.
According to the method for locating the key points of the human face, provided by the invention, the original human face image is input into a human face de-occlusion model to obtain an unobstructed human face image corresponding to the original human face image output by the human face de-occlusion model, and the method comprises the following steps:
inputting the original face image into an encoder module of the face de-occlusion model, and performing feature extraction on the original face image by the encoder module to obtain a feature vector of the original face image output by the encoder module;
and inputting the feature vector to a generation module of the face de-occlusion model, and carrying out image reconstruction on the feature vector by the generation module to obtain the non-occlusion face image output by the generation module.
According to the face key point positioning method provided by the invention, the encoder module is obtained by training based on the following method:
respectively inputting the first non-occlusion face image sample and the first occlusion face image sample to an initial encoder module to obtain a first feature vector sample corresponding to the first non-occlusion face image sample and a second feature vector sample corresponding to the first occlusion face image sample, which are output by the initial encoder module;
and calculating the characteristic loss of the initial encoder module based on the first characteristic vector sample and the second characteristic vector sample, and performing parameter iteration on the initial encoder module based on the characteristic loss to obtain the encoder module.
According to the face key point positioning method provided by the invention, a first occluded face image sample carries a second face key point label, and the second face key point label is determined based on the face key point label carried by a first non-occluded face image sample;
the generation module is obtained by training based on the following method:
inputting a third feature vector sample corresponding to the first shielded face image sample into an initial generation module to obtain a first face image corresponding to the third feature vector sample output by the initial generation module;
inputting the first face image into an auxiliary key point positioning module to obtain auxiliary face key point information corresponding to the first face image and output by the auxiliary key point positioning module;
calculating the consistency loss of the key points of the initial generation module based on the auxiliary face key point information and the second face key point label;
and performing parameter iteration on the initial generation module based on the consistency loss of the key points to obtain the generation module.
According to the method for positioning key points of a human face provided by the invention, the parameter iteration is performed on the initial generation module based on the consistency loss of the key points to obtain the generation module, and the method specifically comprises the following steps:
inputting a fourth feature vector sample corresponding to the first non-occluded face image sample into an initial generation module to obtain a second face image corresponding to the fourth feature vector sample output by the initial generation module;
calculating a first image reconstruction loss of the initial generation module based on the first face image and the first non-occlusion face image sample, and calculating a second image reconstruction loss of the initial generation module based on the second face image and the first non-occlusion face image sample;
respectively inputting the first face image, the second face image and the first non-occlusion face image sample into an auxiliary discrimination module to obtain a first feature map corresponding to the first face image, a second feature map corresponding to the second face image and a third feature map corresponding to the first non-occlusion face image sample, which are extracted by the auxiliary discrimination module;
calculating a first feature matching loss of the initial generation module based on the first feature map and the third feature map, and calculating a second feature matching loss of the initial generation module based on the second feature map and the third feature map;
and performing parameter iteration on the initial generation module based on the first image reconstruction loss, the second image reconstruction loss, the first feature matching loss, the second feature matching loss and the key point consistency loss to obtain the generation module.
According to the face key point positioning method provided by the invention, the key point positioning model is obtained by training based on the following method:
inputting the second non-occlusion face image sample into an initial key point positioning model, obtaining a first Gaussian response image which is output by the initial key point positioning model and corresponds to the second non-occlusion face image sample and takes key points as centers, and determining a second Gaussian response image corresponding to a first face key point label;
calculating a keypoint localization loss of the initial keypoint localization model based on the first Gaussian response map and the second Gaussian response map;
and performing parameter iteration on the initial key point positioning model based on the key point positioning loss to obtain a key point positioning model.
The invention also provides a face key point positioning device, which comprises:
the image acquisition module is used for acquiring an original face image on which key point positioning is to be performed;
the de-occlusion module is used for inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;
the key point positioning module is used for inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the face key point positioning method described above.
The present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the face key point positioning method described above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the face key point positioning method described above.
According to the face key point positioning method and device, electronic device, and storage medium provided by the invention, the non-occlusion face image corresponding to the original face image is obtained through the face de-occlusion model, so the face key point information in the original face image can be accurately obtained through the key point positioning model, avoiding the result uncertainty and noise caused in the prior art by locating face key points directly on an occluded face image. In addition, the method neither requires the key point positioning model to locate face key points on occluded face images nor requires manually annotated occlusion samples to train it, so the positioning accuracy of the key point positioning model on non-occlusion face images is preserved and the positioning cost is further reduced.
Drawings
To more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below illustrate some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the face key point positioning method provided by the present invention;
FIG. 2 is a schematic diagram of the generation process of a first occluded face image sample in the face key point positioning method provided by the present invention;
FIG. 3 is a schematic flowchart of training the initial model of the face de-occlusion model in the face key point positioning method provided by the present invention;
FIG. 4 is a schematic structural diagram of the face key point positioning device provided by the present invention;
FIG. 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The face key point positioning method in the prior art usually uses a positioning model to locate face key points directly on the original occluded face image. When the occluded region is large, directly estimating key point positions inside it produces great uncertainty and noise; meanwhile, manually annotating occluded samples is very difficult, so such samples are scarce, and a positioning model with robust anti-occlusion capability cannot be obtained by direct data-driven training. To solve these problems, an embodiment of the present invention provides a face key point positioning method.
Fig. 1 is a schematic flowchart of a method for locating key points of a human face according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
s1, acquiring an original face image to be subjected to key point positioning;
s2, inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;
s3, inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
Specifically, an execution subject of the method for positioning key points of a human face provided in the embodiment of the present invention is a positioning apparatus for key points of a human face, and the apparatus may be configured in a server, where the server may be a local server or a cloud server, and the local server may be a computer, which is not specifically limited in the embodiment of the present invention.
Step S1 is executed first to acquire an original face image on which key point positioning is to be performed. The original face image is an image whose face region's key point information needs to be determined. It may be a non-occlusion face image, in which the face region is not covered by any occluder, or an occluded face image, in which the face region is covered by an occluder; this is not specifically limited here.
Then, step S2 is executed: the original face image is input into the face de-occlusion model, which performs a de-occlusion operation on it to obtain and output the corresponding non-occlusion face image. The de-occlusion operation removes occluders such as masks, sunglasses, hands, and scarves from the face in the face image.
Because the original face image may be either a non-occlusion face image or an occluded face image, the input of the face de-occlusion model may be either; its output is always a non-occlusion face image.
When the original face image is a non-occlusion face image, the de-occlusion operation has no practical effect on it, and the output non-occlusion face image is the original face image itself or differs from it within a preset error range. When the original face image is an occluded face image, the de-occlusion operation removes the occluder from the face region to obtain the non-occlusion face image.
In the embodiment of the invention, the face de-occlusion model can be obtained by training an initial face de-occlusion model with the first non-occlusion face image samples and the corresponding first occluded face image samples. The initial face de-occlusion model may be a neural network model or another model, and is not specifically limited here.
A first non-occlusion face image sample is a face image sample in which the face region is completely visible; it can be obtained from a conventional face image library or by shooting. A first occluded face image sample is the face image sample corresponding to a first non-occlusion face image sample in which the face region is partially covered by an occluder; it may be generated automatically from the first non-occlusion face image sample or obtained by shooting, and is not specifically limited here.
A first non-occlusion face image sample and its first occluded face image sample are two images of the same person; the difference is that the face region of the first non-occlusion face image sample is not covered by an occluder, while the face region of the first occluded face image sample is.
There are multiple first non-occlusion face image samples, and their number can be set as needed. Different first non-occlusion face image samples may be face images of the same person in different poses or face images of different people; this is not specifically limited here.
Here, a first non-occlusion face image sample may be copied to obtain two identical copies forming a first sample pair, one used as the model input and the other as the label. Training the initial face de-occlusion model with such first sample pairs gives the resulting face de-occlusion model the ability to output a non-occlusion face image directly.
A first non-occlusion face image sample and its corresponding first occluded face image sample can form a second sample pair, with the first occluded face image sample as the model input and the first non-occlusion face image sample as the label. Training the initial face de-occlusion model with such second sample pairs gives the resulting face de-occlusion model the ability to perform the de-occlusion operation on occluded face images.
Finally, step S3 is executed: the non-occlusion face image obtained in step S2 is input into the key point positioning model, which obtains and outputs the face key point information in it. The face key point information is the position information of the key points of the face region in the non-occlusion face image; the number of key points may be 5, 21, 49, 68, or 92, etc., and is not specifically limited here.
The key point positioning model can be obtained by training an initial key point positioning model with second non-occlusion face image samples carrying first face key point labels. A second non-occlusion face image sample may be the same as or different from a first non-occlusion face image sample; this is not specifically limited here. The first face key point label is the position information of the key points of the face region in the second non-occlusion face image sample. The initial key point positioning model can be an hourglass network, a U-Net, or the like.
The face key point positioning method provided by the embodiment of the invention first acquires an original face image on which key point positioning is to be performed; then inputs the original face image into the face de-occlusion model to obtain the corresponding non-occlusion face image output by the model; and finally inputs the non-occlusion face image into the key point positioning model to obtain the face key point information output by the model. The method obtains the non-occlusion face image corresponding to the original face image through the face de-occlusion model, so the key point positioning model can accurately obtain the face key point information in the original face image, avoiding the result uncertainty and noise caused in the prior art by locating face key points directly on an occluded face image. In addition, the method neither requires the key point positioning model to locate face key points on occluded face images nor requires manually annotated occlusion samples to train it, so the positioning accuracy of the key point positioning model on non-occlusion face images is preserved and the positioning cost is further reduced.
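As a concrete illustration of steps S1 to S3, the following is a minimal inference sketch, assuming the two trained models are available as PyTorch modules; all function and variable names are illustrative and not taken from the patent:

```python
import torch
import torch.nn as nn

def locate_face_keypoints(original_face: torch.Tensor,
                          deocclusion_model: nn.Module,
                          keypoint_model: nn.Module) -> torch.Tensor:
    """original_face: (1, 3, H, W) image tensor on which key points are located."""
    deocclusion_model.eval()
    keypoint_model.eval()
    with torch.no_grad():
        # S2: de-occlusion operation removes masks, sunglasses, hands, etc.
        unoccluded_face = deocclusion_model(original_face)
        # S3: locate face key point information on the non-occlusion image
        keypoints = keypoint_model(unoccluded_face)
    return keypoints
```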
In the prior art, face de-occlusion is usually performed with an occluder-removal model trained on sample triples consisting of a source face picture, an occluded face picture, and a target output face picture. The occluded face picture is usually obtained either by direct shooting or by directly pasting occluders such as masks and scarves onto the target output face picture.
For an occluded face picture obtained by direct shooting, the corresponding source face picture is hard to find; and an occluded face picture obtained by directly pasting occluder pictures such as masks and scarves onto the target output face picture differs greatly from a real occluded face picture, which degrades the de-occlusion effect of the occluder-removal model.
Therefore, on the basis of the above embodiment, in the face keypoint locating method provided in the embodiment of the present invention, the first occluded face image sample is determined based on the following method:
determining first face three-dimensional information of the first non-occlusion face image sample, and determining a first texture map of a face area in the first non-occlusion face image sample based on the first face three-dimensional information;
acquiring a second shielded face image sample, determining second face three-dimensional information of the second shielded face image sample, and determining a second texture map of a shielded area in the second shielded face image sample based on the second face three-dimensional information;
and fusing the first texture map and the second texture map to obtain a first fusion result, and determining the first occluded human face image sample based on the first human face three-dimensional information and the first fusion result.
Specifically, in the embodiment of the present invention, when determining the first occluded face image sample, the first face three-dimensional information of the first non-occlusion face image sample may be determined first; this is the three-dimensional information of the face region in that sample. The first non-occlusion face image sample can be input into a three-dimensional face reconstruction network, which reconstructs the face region in three dimensions to obtain the first face three-dimensional information. The three-dimensional face reconstruction network can be PRNet or another network.
Then, using the first face three-dimensional information, the texture of the face region in the first non-occlusion face image sample is mapped into the face texture mapping space to determine the first texture map of the face region. Mapping here means unfolding the surface texture of the three-dimensional face region and drawing it on a two-dimensional plane; the face texture mapping space can be the UV space, in which case the first texture map is a UV texture map. U and V are the coordinates of the face texture mapping space and define the position of each point in it.
After that, a second occluded face image sample can be obtained, which may be a photographed real occluded face image. It can be input into the three-dimensional face reconstruction network to reconstruct its face region in three dimensions and obtain the second face three-dimensional information. Using the second face three-dimensional information, the texture of the occluded region in the second occluded face image sample is mapped into the face texture mapping space to determine the second texture map of the occluded region. The occluded region is the area of the face region covered by the occluder; it can be obtained by occluder segmentation of the second occluded face image sample. The second texture map may also be a UV texture map.
Finally, the first texture map and the second texture map can be fused to obtain a first fusion result, which is then three-dimensionally rendered in combination with the first face three-dimensional information to determine the first occluded face image sample.
After the first fusion result is obtained, Gaussian filtering can further be applied to smooth it, and the smoothed result is then three-dimensionally rendered in combination with the first face three-dimensional information to determine the first occluded face image sample.
Fig. 2 is a schematic diagram of the generation process of a first occluded face image sample. In Fig. 2, on one hand, the second occluded face image sample undergoes occluder segmentation to obtain the texture of the occluded region, which is texture-mapped to obtain the second UV texture map. On the other hand, the first non-occlusion face image sample undergoes three-dimensional reconstruction and texture mapping to obtain the first UV texture map. The two UV texture maps are fused to obtain the first fusion result, which is rendered in combination with the first face three-dimensional information to obtain the first occluded face image sample.
In the embodiment of the invention, the first occluded face image sample corresponding to the first non-occlusion face image sample can thus be determined from an existing second occluded face image sample without direct shooting, so that the occluded region in the first occluded face image sample is closer to a real occluded region and better matches actual situations.
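A minimal sketch of the UV-space fusion step of Fig. 2, assuming the three-dimensional reconstruction (e.g. PRNet), occluder segmentation, and final rendering are performed elsewhere and their UV-space outputs are passed in; the function name and smoothing strength are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_uv_textures(uv_clean: np.ndarray,
                     uv_occluder: np.ndarray,
                     occluder_mask: np.ndarray,
                     sigma: float = 1.0) -> np.ndarray:
    """Fuse the first texture map with the occluder texture of the second.

    uv_clean:      (H, W, 3) first UV texture map of the non-occlusion sample
    uv_occluder:   (H, W, 3) second UV texture map holding the occluder texture
    occluder_mask: (H, W)    binary mask of the occluded region in UV space
    """
    # paste the occluder texture over the clean face texture (first fusion result)
    fused = np.where(occluder_mask[..., None] > 0, uv_occluder, uv_clean)
    # Gaussian filtering smooths the seam of the fusion result, as in the text
    return gaussian_filter(fused, sigma=(sigma, sigma, 0))
```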
On the basis of the foregoing embodiment, the method for locating a face key point according to the present invention, where the original face image is input to a face de-occlusion model to obtain an unobstructed face image corresponding to the original face image output by the face de-occlusion model, includes:
inputting the original face image into an encoder module of the face de-occlusion model, and performing feature extraction on the original face image by the encoder module to obtain a feature vector of the original face image output by the encoder module;
and inputting the feature vector to a generation module of the face de-occlusion model, and carrying out image reconstruction on the feature vector by the generation module to obtain the non-occlusion face image output by the generation module.
Specifically, in the embodiment of the present invention, the face de-occlusion model may include an encoder module and a generation module. The encoder module may be a mapping network with a convolutional neural network structure, used to extract features from the input. The generation module may be a style-based generative adversarial network containing a face synthesis network, which may be a fully convolutional neural network used to reconstruct an image from the encoder module's feature extraction result. It can be understood that the feature extraction process is equivalent to mapping the original face image into the latent space of the style-based generative adversarial network.
After the original face image is input into the face de-occlusion model, the encoder module extracts features from it to obtain its feature vector, which may be a 512-dimensional latent vector.
Then the feature vector is input into the generation module of the face de-occlusion model, where the face synthesis network fuses the feature vector with noise feature maps using adaptive instance normalization (AdaIN) to obtain a second fusion result. If the feature vector is $w_1$, it can be injected into each layer of the face synthesis network through AdaIN to guide the style of the face image the network generates. First, an affine transformation matrix $T_i$ is applied to $w_1$ to obtain $y_i$, namely:
$$y_i = T_i(w_1)$$
Then $y_i$ is split into two vectors $y_{s,i}$ and $y_{b,i}$, whose dimensions equal the number of channels of the corresponding layer of the face synthesis network.
Finally, Gaussian noise is added to the face synthesis network to give the synthesized image a certain randomness, and $y_{s,i}$ and $y_{b,i}$ are fused with the noise feature map $x_i$ as:
$$\mathrm{AdaIN}(x_i, y_i) = y_{s,i}\,\frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$$
where $x_i$ is the feature map of the $i$-th channel of a Gaussian noise map $z$ after a given layer of the face synthesis network, $\mathrm{AdaIN}(x_i, y_i)$ is that layer's fusion result, and $\mu(x_i)$ and $\sigma(x_i)$ are respectively the mean and standard deviation of the feature values of $x_i$.
The second fusion result is obtained through this layer-by-layer fusion, yielding the non-occlusion face image.
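A minimal sketch of one AdaIN injection layer as described above, assuming PyTorch; the class name and the small epsilon stabilizer are assumptions, not from the patent:

```python
import torch
import torch.nn as nn

class AdaINInjection(nn.Module):
    """Affine-transform w into (y_s, y_b) and modulate the per-channel
    statistics of a noise feature map x, following the AdaIN formula."""
    def __init__(self, w_dim: int, channels: int):
        super().__init__()
        self.affine = nn.Linear(w_dim, 2 * channels)  # plays the role of T_i

    def forward(self, x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) noise feature map; w: (B, w_dim) latent vector
        y = self.affine(w)                        # y_i = T_i(w_1)
        y_s, y_b = y.chunk(2, dim=1)              # split into y_{s,i}, y_{b,i}
        y_s = y_s[:, :, None, None]
        y_b = y_b[:, :, None, None]
        mu = x.mean(dim=(2, 3), keepdim=True)     # mu(x_i)
        sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8  # sigma(x_i)
        return y_s * (x - mu) / sigma + y_b       # AdaIN(x_i, y_i)
```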
In the embodiment of the present invention, the initial model may include an initial encoder module and an initial generation module. When the initial model is trained, the two can be trained separately, and the face synthesis network in the initial generation module may be pre-trained in advance; that is, it is the same as the face synthesis network in the final generation module.
In the embodiment of the invention, a specific structure of a face de-occlusion model is provided, and the acquisition of an unobstructed face image corresponding to an original face image is realized through the cooperation of all modules.
On the basis of the above embodiment, in the face key point positioning method provided in the embodiment of the present invention, the encoder module is trained based on the following method:
respectively inputting the first non-occlusion face image sample and the first occluded face image sample into an initial encoder module to obtain a first feature vector sample corresponding to the first non-occlusion face image sample and a second feature vector sample corresponding to the first occluded face image sample, which are output by the initial encoder module;
and calculating the characteristic loss of the initial encoder module based on the first characteristic vector sample and the second characteristic vector sample, and performing parameter iteration on the initial encoder module based on the characteristic loss to obtain the encoder module.
Specifically, in the embodiment of the present invention, in the process of obtaining the encoder module by training the initial encoder module, the first non-occlusion face image sample $I$ and the first occluded face image sample $I_o$ can first be input into the initial encoder module separately to obtain the first feature vector sample $w$ corresponding to $I$ and the second feature vector sample $w_o$ corresponding to $I_o$ output by the initial encoder module, namely:
$$w_o = F_{mapping}(I_o)$$
$$w = F_{mapping}(I)$$
where $F_{mapping}$ represents the initial encoder module.
The first feature vector sample $w$ and the second feature vector sample $w_o$ can be regarded as mapping the first non-occlusion face image sample $I$ and the first occluded face image sample $I_o$ to the same location in the latent space of the initial style-based generative adversarial module.
Furthermore, the feature loss $L_w$ of the initial encoder module can be calculated from the first feature vector sample and the second feature vector sample as:
$$L_w = \|w - w_o\|_2$$
The feature loss is a mean square error (MSE) that keeps the positions of the first feature vector sample $w$ and the second feature vector sample $w_o$ in the latent space as consistent as possible.
Finally, the feature loss $L_w$ is used for parameter iteration of the initial encoder module, i.e., its parameters are adjusted; on this basis, other first non-occlusion face image samples and first occluded face image samples are used to keep calculating the feature loss and adjusting the parameters until a preset number of iterations is reached or the feature loss converges, yielding the encoder module.
In the embodiment of the invention, training the initial encoder module with the first non-occlusion face image samples in combination with the first occluded face image samples gives the trained encoder module the ability to encode occluded face images as unoccluded: the feature vector it produces is that of the non-occlusion face image corresponding to the occluded one, which makes it convenient to obtain the non-occlusion face image with the generation module.
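A minimal sketch of the feature loss computation, assuming PyTorch and an encoder module that maps image tensors to latent vectors; the MSE form follows the text's description of $L_w$ as a mean square error, and all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def encoder_feature_loss(encoder: nn.Module,
                         clean_face: torch.Tensor,
                         occluded_face: torch.Tensor) -> torch.Tensor:
    """L_w pulls the latent of the occluded sample toward the latent of its
    non-occlusion counterpart so both map to the same point in latent space."""
    w = encoder(clean_face)        # w   = F_mapping(I)
    w_o = encoder(occluded_face)   # w_o = F_mapping(I_o)
    return F.mse_loss(w_o, w)      # mean square error form of L_w

# one parameter-iteration step, with an Adam optimizer assumed:
# loss = encoder_feature_loss(encoder, I, I_o)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```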
On the basis of the above embodiment, in the face key point positioning method provided in the embodiment of the present invention, the first occluded face image sample carries a second face key point label, and the second face key point label is determined based on the face key point label carried by the first non-occluded face image sample;
the generation module is obtained by training based on the following method:
inputting a third feature vector sample corresponding to the first shielded face image sample into an initial generation module to obtain a first face image corresponding to the third feature vector sample output by the initial generation module;
inputting the first face image into an auxiliary key point positioning module to obtain auxiliary face key point information corresponding to the first face image and output by the auxiliary key point positioning module;
calculating the consistency loss of key points of the initial generation module based on the auxiliary face key point information and the second face key point label;
and performing parameter iteration on the initial generation module based on the consistency loss of the key points to obtain the generation module.
Specifically, in the embodiment of the present invention, the first occluded face image sample may carry a second face key point label. Because the first occluded face image sample is generated from the first non-occlusion face image sample, the second face key point label can be taken from the face key point label carried by the first non-occlusion face image sample at generation time; the second occluded face image sample used in the generation process does not carry a label.
On this basis, when the initial generation module is trained, the third feature vector sample $w_3$ corresponding to the first occluded face image sample $I_o$ can first be input into the initial generation module to obtain the first face image $I_o'$ corresponding to $w_3$ output by the initial generation module. The third feature vector sample $w_3$ may be generated by the trained encoder module, and is not specifically limited here.
Then the first face image $I_o'$ is input into the auxiliary key point positioning module to obtain the auxiliary face key point information corresponding to $I_o'$ output by the auxiliary key point positioning module. The auxiliary key point positioning module can be a conventional module with a key point positioning function; it is introduced so that the key point consistency loss of the initial generation module can be calculated from the face key point information it obtains.
The key point consistency loss of the initial generation module can be calculated from the auxiliary face key point information and the second face key point label, namely:
$$L_{ldmk} = \|T - D_{ldmk}(I_o')\|_2$$
where $T$ is the second face key point label, which may take the form of a coordinate vector of the face key points, $D_{ldmk}$ represents the auxiliary key point positioning module, $D_{ldmk}(I_o')$ denotes the auxiliary face key point information, and $L_{ldmk}$ denotes the key point consistency loss.
It can be understood that the key point consistency loss is a mean square error, which ensures that the original facial structure of an occluded face image is not damaged by the de-occlusion operation.
Finally, parameter iteration is performed on the initial generation module according to the key point consistency loss, i.e., its parameters are adjusted; on this basis, other first occluded face image samples are used to keep calculating the key point consistency loss and adjusting the parameters until a preset number of iterations is reached or the key point consistency loss converges, yielding the generation module.
In the embodiment of the invention, training the initial generation module with the first occluded face image samples in combination with the auxiliary key point positioning module gives the trained generation module an image reconstruction function that keeps the facial structure unchanged.
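A minimal sketch of the key point consistency loss, assuming PyTorch and an auxiliary locator that outputs key point coordinate vectors; names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def keypoint_consistency_loss(aux_locator: nn.Module,
                              reconstructed_face: torch.Tensor,
                              keypoint_label: torch.Tensor) -> torch.Tensor:
    """L_ldmk = ||T - D_ldmk(I_o')||_2 as a mean square error: the auxiliary
    module's key points on the reconstructed face I_o' should match the
    second face key point label T (a coordinate vector)."""
    pred = aux_locator(reconstructed_face)   # D_ldmk(I_o')
    return F.mse_loss(pred, keypoint_label)
```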
On the basis of the foregoing embodiment, the method for locating a face keypoint provided in the embodiment of the present invention, where the parameter iteration is performed on the initial generation module based on the keypoint consistency loss to obtain the generation module, specifically includes:
inputting a fourth feature vector sample corresponding to the first non-occluded face image sample into an initial generation module to obtain a second face image corresponding to the fourth feature vector sample output by the initial generation module;
calculating a first image reconstruction loss of the initial generation module based on the first face image and the first non-occlusion face image sample, and calculating a second image reconstruction loss of the initial generation module based on the second face image and the first non-occlusion face image sample;
respectively inputting the first face image, the second face image and the first non-occlusion face image sample into an auxiliary discrimination module to obtain a first feature map corresponding to the first face image, a second feature map corresponding to the second face image and a third feature map corresponding to the first non-occlusion face image sample, which are extracted by the auxiliary discrimination module;
calculating a first feature matching loss of the initial generation module based on the first feature map and the third feature map, and calculating a second feature matching loss of the initial generation module based on the second feature map and the third feature map;
and performing parameter iteration on the initial generation module based on the first image reconstruction loss, the second image reconstruction loss, the first feature matching loss, the second feature matching loss and the key point consistency loss to obtain a generation module.
Specifically, in the embodiment of the present invention, when the initial generation module is trained, besides introducing a keypoint consistency loss, an image reconstruction loss and a feature matching loss may also be introduced.
The fourth feature vector sample $w_4$ corresponding to the first non-occlusion face image sample $I$ can first be input into the initial generation module to obtain the second face image $I'$ corresponding to $w_4$ output by the initial generation module. The fourth feature vector sample $w_4$ may be generated by the trained encoder module, and is not specifically limited here.
Then the first image reconstruction loss of the initial generation module is calculated from the first face image $I_o'$ and the first non-occlusion face image sample $I$, namely:
$$L_{recon} = \|I - I_o'\|_1$$
where $L_{recon}$ is the first image reconstruction loss, which ensures that the texture, attributes, and so on of an occluded face image remain consistent with those of the non-occlusion face image after the de-occlusion operation.
The second image reconstruction loss of the initial generation module is calculated from the second face image $I'$ and the first non-occlusion face image sample $I$, namely:
$$L'_{recon} = \|I - I'\|_1$$
where $L'_{recon}$ is the second image reconstruction loss, which ensures that the texture, attributes, and so on of a non-occlusion face image remain consistent after the de-occlusion operation.
After that, the first face image $I_o'$, the second face image $I'$, and the first non-occlusion face image sample $I$ are input into the auxiliary discrimination module separately to obtain the first feature map corresponding to $I_o'$, the second feature map corresponding to $I'$, and the third feature map corresponding to $I$, each extracted by the auxiliary discrimination module. The structure of the auxiliary discrimination module is the same as that of the face recognition network in the initial style-based adversarial generation model; it comprises multiple layers, each with a feature extraction function, and it also discriminates and classifies input images using the extracted features, i.e., the feature maps it extracts are all used for the subsequent adversarial discrimination and classification of input images. It can be understood that the discrimination classifies whether an input image is an occluded face image or a non-occlusion face image.
Furthermore, the first feature matching loss of the initial generation module can be calculated from the first feature map and the third feature map, namely:
$$L_{FM} = \sum_i \|D_i(I) - D_i(I_o')\|$$
where $D_i(I)$ is the third feature map extracted by the $i$-th layer of the auxiliary discrimination module, $D_i(I_o')$ is the first feature map extracted by the $i$-th layer, and $L_{FM}$ is the first feature matching loss, which ensures that the hidden-layer features of an occluded face image are consistent with those of the non-occlusion face image after the de-occlusion operation.
The second feature matching loss of the initial generation module can be calculated from the second feature map and the third feature map, namely:
$$L'_{FM} = \sum_i \|D_i(I) - D_i(I')\|$$
where $D_i(I')$ is the second feature map extracted by the $i$-th layer of the auxiliary discrimination module, and $L'_{FM}$ is the second feature matching loss, which ensures that the hidden-layer features of a non-occlusion face image remain consistent after the de-occlusion operation.
Finally, parameter iteration is performed on the initial generation module according to the first image reconstruction loss, the second image reconstruction loss, the first feature matching loss, the second feature matching loss, and the key point consistency loss, i.e., its parameters are adjusted; on this basis, other first occluded face image samples and first non-occlusion face image samples are used to keep calculating the losses and adjusting the parameters until a preset number of iterations is reached or all the losses converge, yielding the generation module.
In the embodiment of the invention, training the initial generation module with the first non-occlusion face image samples in combination with the auxiliary discrimination module gives the trained generation module not only an image reconstruction function that keeps the facial structure unchanged, but also one that keeps the image content and the hidden-layer features unchanged. A combined training step under these losses is sketched below.
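A minimal sketch, assuming PyTorch and an auxiliary discrimination module that exposes its per-layer feature maps; the L1 form of the feature matching losses and the unit loss weights are assumptions, since the text does not specify them:

```python
import torch
import torch.nn.functional as F
from typing import Callable, List

def generator_losses(I: torch.Tensor,          # first non-occlusion sample
                     I_o_prime: torch.Tensor,  # first face image I_o'
                     I_prime: torch.Tensor,    # second face image I'
                     aux_features: Callable[[torch.Tensor], List[torch.Tensor]],
                     l_ldmk: torch.Tensor,
                     weights=(1.0, 1.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    """Combine the five losses used to iterate the initial generation module.
    aux_features(img) is assumed to return the per-layer feature maps D_i(img)
    of the auxiliary discrimination module."""
    l_recon = F.l1_loss(I_o_prime, I)    # L_recon  = ||I - I_o'||_1
    l_recon2 = F.l1_loss(I_prime, I)     # L'_recon = ||I - I'||_1
    feats_real = [f.detach() for f in aux_features(I)]          # D_i(I)
    l_fm = sum(F.l1_loss(f, r)                                  # L_FM
               for f, r in zip(aux_features(I_o_prime), feats_real))
    l_fm2 = sum(F.l1_loss(f, r)                                 # L'_FM
                for f, r in zip(aux_features(I_prime), feats_real))
    w1, w2, w3, w4, w5 = weights
    return w1 * l_recon + w2 * l_recon2 + w3 * l_fm + w4 * l_fm2 + w5 * l_ldmk
```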
Fig. 3 is a schematic flowchart of a process of training an initial model of a face de-occlusion model in the face keypoint location method provided in the embodiment of the present invention.
As shown in fig. 3, the method includes:
First, the first occluded face image sample $I_o$ and the first non-occlusion face image sample $I$ are each input into the initial encoder module to obtain the first feature vector sample $w$ corresponding to $I$ and the second feature vector sample $w_o$ corresponding to $I_o$ output by the initial encoder module; the feature loss $L_w$ of the initial encoder module is calculated from the first and second feature vector samples, and the initial encoder module is trained to obtain the encoder module.
Then, $I_o$ and $I$ are each input into the encoder module to obtain the third feature vector sample $w_3$ corresponding to $I_o$ and the fourth feature vector sample $w_4$ corresponding to $I$ output by the encoder module; $w_3$ and $w_4$ are each injected layer by layer into the face synthesis network of the initial generation module and fused with the noise feature maps output by each layer of the network. In Fig. 3, A denotes the fusion process and N the introduced Gaussian noise.
Finally, the face synthesis network outputs the first face image $I_o'$ and the second face image $I'$ respectively. The key point consistency loss $L_{ldmk}$ can be calculated from the auxiliary face key point information and the second face key point label; the first image reconstruction loss $L_{recon}$ from $I$ and $I_o'$; and the second image reconstruction loss $L'_{recon}$ from $I$ and $I'$. In combination with the auxiliary discrimination module, the first feature matching loss and the second feature matching loss are calculated, and the generation module is obtained by training with all these losses combined. The face de-occlusion model is thus obtained.
On the basis of the above embodiment, in the face key point positioning method provided in the embodiment of the present invention, the key point positioning model is obtained by training based on the following method:
inputting the second non-occlusion face image sample into an initial key point positioning model, obtaining a first Gaussian response image which is output by the initial key point positioning model and corresponds to the second non-occlusion face image sample and takes key points as centers, and determining a second Gaussian response image corresponding to a first face key point label;
calculating a keypoint localization loss of the initial keypoint localization model based on the first Gaussian response map and the second Gaussian response map;
and training the initial key point positioning model based on the key point positioning loss to obtain a key point positioning model.
Specifically, in the embodiment of the present invention, when the initial key point location model is trained, the second non-occluded face image sample may be input to the initial key point location model, a first gaussian response map with a key point as a center corresponding to the second non-occluded face image sample output by the initial key point location model is obtained, and a second gaussian response map corresponding to the first face key point label is determined.
Then, the key point positioning loss of the initial key point positioning model is calculated through the first Gaussian response map and the second Gaussian response map, namely:

$L_{heat} = \left\| H(I) - \hat{H}(I) \right\|_2^2$

wherein $L_{heat}$ represents the key point positioning loss, $H(I)$ represents the first Gaussian response map, and $\hat{H}(I)$ represents the second Gaussian response map.
It can be understood that the key point positioning loss is a mean square error, which ensures the accuracy of the predicted key point positions.
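As a worked example, the sketch below renders a target Gaussian response map from key point labels and compares it with a predicted map by mean square error; the 64x64 heatmap resolution and sigma = 2 are illustrative assumptions.

```python
import torch

def gaussian_heatmap(keypoints: torch.Tensor, size: int = 64,
                     sigma: float = 2.0) -> torch.Tensor:
    """Render one Gaussian response map per key point, centred on the label.
    keypoints: (K, 2) pixel coordinates in heatmap space."""
    ys = torch.arange(size).view(size, 1).float()
    xs = torch.arange(size).view(1, size).float()
    maps = []
    for x0, y0 in keypoints:
        d2 = (xs - x0) ** 2 + (ys - y0) ** 2
        maps.append(torch.exp(-d2 / (2 * sigma ** 2)))
    return torch.stack(maps)  # (K, size, size)

target = gaussian_heatmap(torch.tensor([[20.0, 30.0], [40.0, 12.0]]))  # second map
predicted = torch.rand_like(target)                                    # first map H(I)
l_heat = torch.mean((predicted - target) ** 2)  # mean square error loss
```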
In summary, the embodiment of the present invention provides an occlusion-robust face key point positioning method for complex scenes, which de-occludes an occluded face image using a face de-occlusion method based on three-dimensional rendering and a style-based generative adversarial network prior, thereby effectively reducing the positioning ambiguity of the key point positioning model. When the original face image is occluded, the prior information of the face synthesis module is introduced to restore the apparent facial features of the occluded region while keeping the face structure of the visible region unchanged, and finally a high-definition non-occlusion face image is generated for the subsequent key point positioning model to perform face key point positioning, which improves the robustness of the positioning algorithm.
As shown in fig. 4, on the basis of the above embodiments, an embodiment of the present invention provides a face key point positioning device, which includes:
an image obtaining module 41, configured to obtain an original face image to be subjected to key point positioning;
the de-occlusion module 42 is configured to input the original face image into a face de-occlusion model, so as to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;
a key point positioning module 43, configured to input the non-occlusion face image into a key point positioning model, so as to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
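At inference time the three modules chain together roughly as shown below; the model arguments are hypothetical stand-ins for the trained face de-occlusion model and key point positioning model, so this sketch fixes only the data flow, not the actual architectures.

```python
import torch

def locate_keypoints(original_face: torch.Tensor,
                     de_occlusion_model: torch.nn.Module,
                     keypoint_model: torch.nn.Module) -> torch.Tensor:
    """Mirrors modules 41-43: the image is assumed already acquired;
    de-occlusion produces a clean face, then key points are predicted on it."""
    with torch.no_grad():
        clean_face = de_occlusion_model(original_face)  # de-occlusion module 42
        keypoints = keypoint_model(clean_face)          # key point positioning module 43
    return keypoints
```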
On the basis of the foregoing embodiment, the face keypoint locating device provided in the embodiment of the present invention further includes a sample generation module, configured to:
determining first face three-dimensional information of the first non-occlusion face image sample, and determining a first texture map of a face area in the first non-occlusion face image sample based on the first face three-dimensional information;
acquiring a second occluded face image sample, determining second face three-dimensional information of the second occluded face image sample, and determining a second texture map of an occluded area in the second occluded face image sample based on the second face three-dimensional information;
and fusing the first texture map and the second texture map to obtain a first fusion result, and determining the first occluded face image sample based on the first face three-dimensional information and the first fusion result.
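The fusion step can be pictured as a masked paste in a shared UV texture space, as in the hedged sketch below; the 3D reconstruction that produces the texture maps and the final render back to image space are omitted, and the mask-based blend is an assumption about how the fusion is performed.

```python
import torch

def fuse_textures(face_texture: torch.Tensor,
                  occluder_texture: torch.Tensor,
                  occluder_mask: torch.Tensor) -> torch.Tensor:
    """First fusion result: paste the occluder's texture over the face texture
    wherever the mask is set. All tensors are (C, H, W) maps assumed to live
    in a shared UV space."""
    return occluder_mask * occluder_texture + (1 - occluder_mask) * face_texture

# Illustrative stand-ins for the first and second texture maps and a mask.
first_texture = torch.rand(3, 256, 256)      # face area texture
second_texture = torch.rand(3, 256, 256)     # occluded area texture
mask = (torch.rand(1, 256, 256) > 0.7).float()
first_fusion_result = fuse_textures(first_texture, second_texture, mask)
```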
On the basis of the foregoing embodiment, in the face keypoint locating device provided in the embodiment of the present invention, the de-occlusion module is specifically configured to:
inputting the original face image into an encoder module of the face de-occlusion model, and performing feature extraction on the original face image by the encoder module to obtain a feature vector of the original face image output by the encoder module;
and inputting the feature vector into a generation module of the face de-occlusion model, and performing image reconstruction on the feature vector by the generation module to obtain the non-occlusion face image output by the generation module.
On the basis of the above embodiment, the face keypoint locating device provided in the embodiment of the present invention further includes an encoder training module, configured to:
respectively inputting the first non-occlusion face image sample and the first occlusion face image sample to an initial encoder module to obtain a first feature vector sample corresponding to the first non-occlusion face image sample and a second feature vector sample corresponding to the first occlusion face image sample, which are output by the initial encoder module;
and calculating the characteristic loss of the initial encoder module based on the first characteristic vector sample and the second characteristic vector sample, and performing parameter iteration on the initial encoder module based on the characteristic loss to obtain the encoder module.
On the basis of the above embodiment, in the face key point positioning device provided in the embodiment of the present invention, the first occluded face image sample carries a second face key point label, and the second face key point label is determined based on the face key point label carried by the first non-occluded face image sample;
the apparatus further comprises a generate training module to:
inputting a third feature vector sample corresponding to the first shielded face image sample into an initial generation module to obtain a first face image corresponding to the third feature vector sample output by the initial generation module;
inputting the first face image into an auxiliary key point positioning module to obtain auxiliary face key point information corresponding to the first face image and output by the auxiliary key point positioning module;
calculating the consistency loss of the key points of the initial generation module based on the auxiliary face key point information and the second face key point label;
and performing parameter iteration on the initial generation module based on the consistency loss of the key points to obtain the generation module.
On the basis of the foregoing embodiment, in the face keypoint locating device provided in the embodiment of the present invention, the generation training module is specifically configured to:
inputting a fourth feature vector sample corresponding to the first non-occlusion face image sample into an initial generation module to obtain a second face image corresponding to the fourth feature vector sample output by the initial generation module;
calculating a first image reconstruction loss of the initial generation module based on the first face image and the first non-occlusion face image sample, and calculating a second image reconstruction loss of the initial generation module based on the second face image and the first non-occlusion face image sample;
respectively inputting the first face image, the second face image and the first non-occlusion face image sample into an auxiliary discrimination module to obtain a first feature map corresponding to the first face image, a second feature map corresponding to the second face image and a third feature map corresponding to the first non-occlusion face image sample, which are extracted by the auxiliary discrimination module;
calculating a first feature matching loss of the initial generation module based on the first feature map and the third feature map, and calculating a second feature matching loss of the initial generation module based on the second feature map and the third feature map;
and performing parameter iteration on the initial generation module based on the first image reconstruction loss, the second image reconstruction loss, the first feature matching loss, the second feature matching loss and the key point consistency loss to obtain the generation module.
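As a sketch of how the five losses above might be combined for the parameter iteration, assuming L1 distance for the feature matching terms and a plain weighted sum (the weights here are illustrative assumptions, not values fixed by the embodiment):

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(fmap_generated: torch.Tensor,
                          fmap_real: torch.Tensor) -> torch.Tensor:
    # Distance between feature maps extracted by the auxiliary discrimination
    # module for a generated image and for the real non-occlusion sample.
    return F.l1_loss(fmap_generated, fmap_real)

def total_generator_loss(l_recon, l_recon_prime, l_fm1, l_fm2, l_ldmk,
                         weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    # Weighted sum of the two reconstruction losses, the two feature matching
    # losses, and the key point consistency loss.
    a, b, c, d, e = weights
    return a * l_recon + b * l_recon_prime + c * l_fm1 + d * l_fm2 + e * l_ldmk
```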
On the basis of the above embodiment, the face key point positioning device provided in the embodiment of the present invention further includes a key point positioning model training module, configured to:
inputting the second non-occlusion face image sample into an initial key point positioning model, obtaining a first Gaussian response map, centered on the key points, output by the initial key point positioning model and corresponding to the second non-occlusion face image sample, and determining a second Gaussian response map corresponding to the first face key point label;
calculating a keypoint localization loss of the initial keypoint localization model based on the first Gaussian response map and the second Gaussian response map;
and performing parameter iteration on the initial key point positioning model based on the key point positioning loss to obtain a key point positioning model.
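A minimal training-loop sketch for the parameter iteration above, assuming a data loader that yields pairs of face images and target heatmaps rendered from the first face key point labels:

```python
import torch

def train_keypoint_model(model: torch.nn.Module, loader, epochs: int = 10,
                         lr: float = 1e-4) -> torch.nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, target_heatmap in loader:
            predicted = model(image)          # first Gaussian response map H(I)
            l_heat = torch.mean((predicted - target_heatmap) ** 2)
            optimizer.zero_grad()
            l_heat.backward()
            optimizer.step()                  # parameter iteration step
    return model
```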
Specifically, the functions of the modules in the face key point positioning device provided in the embodiment of the present invention correspond one-to-one to the steps in the method embodiments described above, and achieve consistent effects.
Fig. 5 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 5: a processor (Processor) 510, a communication interface (Communications Interface) 520, a memory (Memory) 530 and a communication bus 540, wherein the processor 510, the communication interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may call logic instructions in the memory 530 to execute the face key point positioning method provided in the above embodiments, the method including: acquiring an original face image to be subjected to key point positioning; inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model; inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model; wherein the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, the computer is capable of executing the face key point positioning method provided in the above embodiments, the method including: acquiring an original face image to be subjected to key point positioning; inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model; inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model; wherein the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
In still another aspect, the present invention further provides a non-transitory computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the face key point positioning method provided in the above embodiments, the method including: acquiring an original face image to be subjected to key point positioning; inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model; inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model; wherein the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for locating key points of a human face is characterized by comprising the following steps:
acquiring an original face image to be subjected to key point positioning;

inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;

inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
2. The method of claim 1, wherein the first occluded face image sample is determined based on the following method:
determining first face three-dimensional information of the first non-occlusion face image sample, and determining a first texture map of a face area in the first non-occlusion face image sample based on the first face three-dimensional information;
acquiring a second shielded face image sample, determining second face three-dimensional information of the second shielded face image sample, and determining a second texture map of a shielded area in the second shielded face image sample based on the second face three-dimensional information;
and fusing the first texture map and the second texture map to obtain a first fusion result, and determining the first occluded face image sample based on the first face three-dimensional information and the first fusion result.
3. The method according to claim 1, wherein the inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model comprises:
inputting the original face image into an encoder module of the face de-occlusion model, and performing feature extraction on the original face image by the encoder module to obtain a feature vector of the original face image output by the encoder module;
and inputting the feature vector into a generation module of the face de-occlusion model, and performing image reconstruction on the feature vector by the generation module to obtain the non-occlusion face image output by the generation module.
4. The method of claim 3, wherein the encoder module is trained based on the following method:
respectively inputting the first non-occlusion face image sample and the first occlusion face image sample to an initial encoder module to obtain a first feature vector sample corresponding to the first non-occlusion face image sample and a second feature vector sample corresponding to the first occlusion face image sample, which are output by the initial encoder module;
and calculating the characteristic loss of the initial encoder module based on the first characteristic vector sample and the second characteristic vector sample, and performing parameter iteration on the initial encoder module based on the characteristic loss to obtain the encoder module.
5. The method according to claim 3, wherein the first occluded face image sample carries a second face keypoint label, and the second face keypoint label is determined based on the face keypoint label carried by the first non-occluded face image sample;
the generation module is obtained by training based on the following method:
inputting a third feature vector sample corresponding to the first shielded face image sample into an initial generation module to obtain a first face image corresponding to the third feature vector sample output by the initial generation module;
inputting the first face image into an auxiliary key point positioning module to obtain auxiliary face key point information corresponding to the first face image and output by the auxiliary key point positioning module;
calculating the consistency loss of the key points of the initial generation module based on the auxiliary face key point information and the second face key point label;
and performing parameter iteration on the initial generation module based on the consistency loss of the key points to obtain the generation module.
6. The method of claim 5, wherein the performing parameter iteration on the initial generation module based on the consistency loss of the key points to obtain the generation module specifically comprises:
inputting a fourth feature vector sample corresponding to the first non-occlusion face image sample into an initial generation module to obtain a second face image corresponding to the fourth feature vector sample output by the initial generation module;
calculating a first image reconstruction loss of the initial generation module based on the first face image and the first non-occlusion face image sample, and calculating a second image reconstruction loss of the initial generation module based on the second face image and the first non-occlusion face image sample;
respectively inputting the first face image, the second face image and the first non-occlusion face image sample into an auxiliary discrimination module to obtain a first feature map corresponding to the first face image, a second feature map corresponding to the second face image and a third feature map corresponding to the first non-occlusion face image sample, which are extracted by the auxiliary discrimination module;
calculating a first feature matching loss of the initial generation module based on the first feature map and the third feature map, and calculating a second feature matching loss of the initial generation module based on the second feature map and the third feature map;
and performing parameter iteration on the initial generation module based on the first image reconstruction loss, the second image reconstruction loss, the first feature matching loss, the second feature matching loss and the key point consistency loss to obtain the generation module.
7. A method for locating key points on a human face according to any one of claims 1 to 6, wherein the key point location model is trained based on the following method:
inputting the second non-occlusion face image sample into an initial key point positioning model to obtain a first Gaussian response map which is output by the initial key point positioning model and corresponds to the second non-occlusion face image sample and takes a key point as a center, and determining a second Gaussian response map corresponding to a first face key point label;
calculating a keypoint localization loss of the initial keypoint localization model based on the first Gaussian response map and the second Gaussian response map;
and performing parameter iteration on the initial key point positioning model based on the key point positioning loss to obtain a key point positioning model.
8. A face key point positioning device is characterized by comprising:
the image acquisition module is used for acquiring an original face image to be subjected to key point positioning;

the de-occlusion module is used for inputting the original face image into a face de-occlusion model to obtain a non-occlusion face image corresponding to the original face image output by the face de-occlusion model;

the key point positioning module is used for inputting the non-occlusion face image into a key point positioning model to obtain face key point information in the non-occlusion face image output by the key point positioning model;
the face de-occlusion model is obtained by training based on a first non-occlusion face image sample and a corresponding first occlusion face image sample, and the key point positioning model is obtained by training based on a second non-occlusion face image sample carrying a first face key point label.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the face keypoint localization method according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for locating face keypoints according to any one of claims 1-7.
CN202210648126.4A 2022-06-08 2022-06-08 Face key point positioning method and device, electronic equipment and storage medium Pending CN115131853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210648126.4A CN115131853A (en) 2022-06-08 2022-06-08 Face key point positioning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210648126.4A CN115131853A (en) 2022-06-08 2022-06-08 Face key point positioning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115131853A true CN115131853A (en) 2022-09-30

Family

ID=83377884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210648126.4A Pending CN115131853A (en) 2022-06-08 2022-06-08 Face key point positioning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115131853A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116268753A (en) * 2023-05-16 2023-06-23 广东工业大学 Visual impairment group cosmetic assistance method, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination