CN113989902A - Method, device and storage medium for recognizing an occluded face based on feature reconstruction - Google Patents

Method, device and storage medium for recognizing an occluded face based on feature reconstruction

Info

Publication number
CN113989902A
CN113989902A
Authority
CN
China
Prior art keywords
face
image
feature
reconstruction
occlusion
Prior art date
Legal status
Pending
Application number
CN202111344585.5A
Other languages
Chinese (zh)
Inventor
朱鹏飞 (Zhu Pengfei)
贾安 (Jia An)
胡清华 (Hu Qinghua)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202111344585.5A
Publication of CN113989902A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device and a storage medium for recognizing an occluded face based on feature reconstruction. The method comprises the following steps: inputting a non-occluded face image and an occluded face image respectively into a face feature reconstruction model, reconstructing both images, and obtaining the mapping relation between the occluded face image and the non-occluded face image; inputting the occluded face image features into a designed small network to obtain a weighted mapping relation, and outputting the weighted image features as the image features to be repaired; and constructing a reconstruction loss function and a style-content loss function to optimize the reconstruction effect. The device comprises a processor and a memory. The method recovers the occluded part of the face well without affecting the features of the non-occluded part.

Description

Method, device and storage medium for recognizing an occluded face based on feature reconstruction
Technical Field
The invention relates to the field of face recognition, and in particular to a method, a device and a storage medium for recognizing an occluded face based on feature reconstruction.
Background
In the field of face recognition, even the most advanced general-purpose face recognition models suffer a significant performance drop under occlusion. To solve this problem, researchers have proposed many occluded-face recognition methods, which fall into two categories: the first restores the occluded face region; the second removes the features corrupted by the occlusion.
In the first category, a representative work is Sparse Representation-based Classification, which recovers occluded face regions using a linear combination of training images. The method was later improved by designing the distribution of the sparse constraint terms or by characterizing structural information. Researchers subsequently introduced deep learning to recover occluded faces for recognition; in the prior art, an LSTM (Long Short-Term Memory recurrent neural network) was proposed to recover the occluded face region and identify the recovered face image. However, the first category of methods generally does not recover the face region well, and preserving strong identity information remains very challenging.
Early work in the second category was based on shallow models with hand-crafted features, for which deleting occlusion-damaged features is straightforward. However, the accuracy of these methods is limited by the shallow structure. Researchers later introduced deep models and designed complex algorithms to remove the damaged deep features. But because the convolutional neural network in deep learning makes the spatial mapping between the input image and the deep features opaque, it is difficult to identify the corrupted features even when the occlusion position in the input image is given. To address this, researchers added a mask branch in the middle layers of the convolutional network model, expecting it to assign lower weights to hidden units corrupted by occlusion. But the middle of the network contains too much irrelevant information, and no additional supervision guides the learning, so corrupted units are hard to identify reliably.
Currently, the best-performing approach uses a binary mask to clean corrupted features at a higher network level, where the features are more discriminative. Specifically, the face image is divided into blocks, and a dictionary covering all blocks is learned to map each occluded block to a corresponding feature mask; in the testing phase, the occluded blocks are first detected and the corresponding binary masks are then retrieved and applied to the test features. This two-stage approach must rely on an external occlusion detector, resulting in a large network model. Furthermore, to learn the dictionary, the deep face model must be trained separately, making training inefficient and time-consuming.
Disclosure of Invention
The invention provides a method, a device and a storage medium for recognizing an occluded face based on feature reconstruction, which recover the occluded part of the face well without affecting the features of the non-occluded part, as described in detail below:
In a first aspect, a method for recognizing an occluded face based on feature reconstruction includes:
inputting a non-occluded face image and an occluded face image respectively into a face feature reconstruction model, reconstructing both images, and obtaining the mapping relation between the occluded face image and the non-occluded face image;
inputting the occluded face image features into a designed small network to obtain a weighted mapping relation, and outputting the weighted image features as the image features to be repaired;
and constructing a reconstruction loss function and a style-content loss function to optimize the reconstruction effect.
In one embodiment, the face feature reconstruction model is:

Î_i = G_i(Z_i)

where Z_i = E_i(I_i) represents the hidden vector feature, I_i represents the i-th face image, E_i represents the feature extractor, G_i represents the image generator, and Î_i represents the reconstructed image.
The face feature reconstruction model comprises two branches whose weights are shared: the first branch reconstructs a new face image, and the second branch reconstructs the occluded face image.
In one embodiment, the designed small network is as follows:

the local feature Z_1 is first sent into a convolution layer to generate two new mapping features Z_2 and Z_3, {Z_2, Z_3} ∈ R^(C×H×W); Z_2 and Z_3 are reshaped to C×N, where N = H×W is the number of pixels;

Z_2 and Z_3 are matrix-multiplied, and the attention information Z_5 ∈ R^(N×N) is computed through a softmax layer (R is the feature space):

Z_5 = softmax(Z_2ᵀ · Z_3)

where Z_2 and Z_3 are the new mapping features generated from Z_1, and Z_5 is the feature obtained from the softmax layer computation.
Further, the method further comprises:

the local feature Z_1 is input to a convolution layer to generate a new feature map Z_4 ∈ R^(C×H×W), which is reshaped to R^(C×N); Z_4 and Z_5 are matrix-multiplied, the result is reshaped to R^(C×H×W) and multiplied by a scale parameter λ, and an element-wise summation with the original feature yields the final output Z_6 ∈ R^(C×H×W):

Z_6 = λ(Z_4 · Z_5) + Z_1
In one embodiment, for the second branch, the method further comprises:

designing an occlusion data synthesis algorithm that occludes partial regions of the face, so as to obtain occluded face images whose occluded regions are later recovered.
The occlusion data synthesis algorithm is as follows:
input the normal image dataset Images and the dataset attribute-position label text label;
extract the label value label from the text; obtain a specific coordinate value coordinate from the label value label;
change the pixel values within the coordinate range; store the processed image; and process each image in a loop until all images are processed.
In a second aspect, an occluded face recognition apparatus based on feature reconstruction comprises a processor and a memory, wherein program instructions are stored in the memory, and the processor calls the program instructions stored in the memory to cause the apparatus to perform the method steps of any one of the first aspect.
In a third aspect, a computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of any one of the first aspect.
The technical scheme provided by the invention has the following beneficial effects:
(1) the method recovers the occluded face features well without affecting the non-occluded face features;
(2) the two branches of the core model share parameters, which reduces the parameter count of the model;
(3) the designed small network keeps the occluded features semantically consistent with the other face features, generates the occluded local information well, and is suitable for transfer to the occluded face recognition field;
(4) the loss functions adopted are well suited to the occluded face recognition problem, accelerate the convergence of the task, and effectively improve performance;
(5) experiments prove that the method is suitable for application and popularization in the field of occluded face recognition.
Drawings
FIG. 1 is a diagram of the face reconstruction model;
FIG. 2 is a design diagram of the internal structure of the small network for face reconstruction;
FIG. 3 is a design diagram of the pre-trained model of the face reconstruction model;
FIG. 4 is a design diagram of the Bottleneck structure inside the pre-trained model of the face reconstruction model;
FIG. 5 is a flow chart of occlusion data synthesis for the feature-reconstruction-based occluded face recognition method;
FIG. 6 is an example diagram of occlusion data synthesis for the feature-reconstruction-based occluded face recognition method;
FIG. 7 is an example diagram of experimental results of the feature-reconstruction-based occluded face recognition method;
FIG. 8 is a schematic structural diagram of the feature-reconstruction-based occluded face recognition device.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
To solve the technical problem described in the background, the embodiment of the invention provides a feature-reconstruction-based method for recognizing occluded faces. On one hand, the embodiment recovers the occluded face part; on the other hand, after recovery, it removes poorly recovered features, i.e., damaged features. Moreover, the proposed algorithm model solves the partially occluded face recognition problem to a certain extent, and experiments prove the effectiveness of the algorithm.
Example 1
The embodiment of the invention provides an occluded face recognition method based on feature reconstruction. Referring to figures 1-3, the method comprises the following steps:
101: inputting a non-occluded face image and an occluded face image respectively into a face feature reconstruction model FRM (Face Reconstruction Model), reconstructing both images, and obtaining the mapping relation between the occluded face image and the non-occluded face image;
The mapping relation of the FRM is defined as f, and the input parameters are an occluded face image and a non-occluded face image, denoted Face_occ and Face_clean respectively. The aim is to make the FRM learn the mapping relation f, which can be formalized as:

FRM = f(Face_occ, Face_clean)    (1)
102: inputting the occluded face image features into the designed small network to obtain a weighted mapping relation, and outputting the weighted image features as the image features to be repaired;
The weighted mapping relation is denoted f', with parameters from E_2 and E_1 in FIG. 1, where E_2 is the feature extractor for occluded face images, E_1 is the feature extractor for non-occluded face images, and A denotes the small network. The mapping relation f' is formalized as:

f' = A(E_2(Face_occ), E_1(Face_clean))    (2)

103: constructing a reconstruction loss function and a style-content loss function to optimize the reconstruction effect.
The face feature reconstruction model comprises two branches whose weights are shared: the first branch reconstructs a new face image, and the second branch reconstructs the occluded face image.
For the second branch, before step 101, the occluded face recognition method further includes the following step:
designing an occlusion data synthesis algorithm that occludes partial regions of the face to obtain occluded face images whose occluded regions are later recovered.
The occlusion data synthesis algorithm is as follows:
input the normal image dataset Images and the dataset attribute-position label text label;
extract the label value label from the text; obtain a specific coordinate value coordinate from the label value label;
change the pixel values within the coordinate range; store the processed image; and process each image in a loop until all images are processed.
Further, the small network in step 102 is as follows: the local feature Z_1 is first sent into a convolution layer to generate two new mapping features Z_2 and Z_3, {Z_2, Z_3} ∈ R^(C×H×W); Z_2 and Z_3 are reshaped to C×N, where N = H×W is the number of pixels;

Z_2 and Z_3 are matrix-multiplied, and the attention information Z_5 ∈ R^(N×N) is computed through a softmax layer (R is the feature space):

Z_5 = softmax(Z_2ᵀ · Z_3)

where Z_2 and Z_3 are the new mapping features generated from Z_1, and Z_5 is the feature obtained from the softmax layer computation.
In summary, through steps 101 to 103, the embodiment of the invention recovers the occluded face part well without affecting the features of the non-occluded part, thereby meeting various requirements in practical applications.
Example 2
The scheme of Example 1 is further described below in conjunction with specific examples and experimental data, as detailed below:
First, design of the face feature reconstruction model
The face feature reconstruction model of the embodiment is designed overall on the basis of a VAE (Variational Auto-Encoder), and the embodiment innovatively designs two branches. The first branch reconstructs the normal face model: as shown in fig. 1, the input is a normal face image, and the face features are extracted through a pre-trained face recognition model ResNet (residual network), whose structure is shown in fig. 3.
STAGE0 preprocesses the INPUT; STAGE1 contains 3 Bottleneck layers, and the remaining STAGE2, STAGE3 and STAGE4 contain 4, 6 and 3 Bottlenecks respectively. The Bottleneck structure is shown in fig. 4, where C in the structure diagram denotes the number of channels of the input map, W denotes the width of the input map, CONV denotes a convolution layer, BN is Batch Normalization, and RELU refers to the ReLU activation function. The face features are then input into the generative model to reconstruct a new face image. The generative model is realized through a series of two-dimensional transposed convolution layers, each paired with a two-dimensional batch-normalization layer and an activation function; it maps the hidden vector obtained from the pre-trained model back to the data space, generating a new face image.
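For concreteness, the stage layout described above (3, 4, 6 and 3 Bottlenecks, i.e. the standard ResNet-50 arrangement) can be sketched as follows; the base channel width of 64 and the channel expansion factor of 4 are assumptions taken from the standard ResNet design and are not stated in the patent:

```python
# Hypothetical sketch of the backbone layout: STAGE1..STAGE4 contain
# [3, 4, 6, 3] Bottleneck blocks; each Bottleneck is a 1x1 -> 3x3 -> 1x1
# convolution stack whose output channels are EXPANSION times its middle ones.
BOTTLENECKS_PER_STAGE = [3, 4, 6, 3]
EXPANSION = 4  # assumed, as in standard ResNet Bottlenecks

def stage_channels(base=64):
    """Return the block count and (mid, out) channel widths per stage."""
    plan = []
    for i, n_blocks in enumerate(BOTTLENECKS_PER_STAGE):
        mid = base * (2 ** i)      # 64, 128, 256, 512
        out = mid * EXPANSION      # 256, 512, 1024, 2048
        plan.append({"blocks": n_blocks, "mid": mid, "out": out})
    return plan

for stage in stage_channels():
    print(stage)
```

Under these assumptions the final stage emits 2048-channel features, which is the hidden-vector width the generator would map back to image space.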
In addition, the second branch reconstructs the occluded face model; unlike the first branch, its input is an occluded face image and its output is the reconstructed occluded face image.
In the testing stage, an occluded face image is input, and a normal face image is output through the face feature reconstruction model designed by the embodiment.
The embodiment of the invention defines the face feature reconstruction process as:

Î_i = G_i(Z_i)    (3)

where Z_i = E_i(I_i) represents the hidden vector feature, I_i represents the i-th face image, E_i represents the feature extractor, G_i represents the image generator, and Î_i represents the reconstructed image.
That is, the reconstruction formula above can be used by the first branch to reconstruct the normal face model and generate a new face image, and by the second branch to reconstruct the occluded face model and generate a new occluded image. When used to reconstruct the normal face model, the input is a normal face image; when used in the second branch to reconstruct the occluded face model, the input is an occluded face image. Since the face feature reconstruction model comprises two branches, the embodiment shares the weights of the two branches, thereby reducing the parameter count of the model and accelerating its convergence. For example, for a convolution kernel of size m×m, the parameter count with weight sharing is m×m×C, where C is the number of channels; without weight sharing it would be W×H×C, where W is the image width and H its height, many times the shared-weight parameter count.
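A minimal numpy sketch of the shared-weight idea, with linear maps standing in for the real ResNet encoder and deconvolutional generator (all sizes and names here are illustrative assumptions): both branches reconstruct through the same E and G parameters, so the occluded branch adds no parameters of its own.

```python
import numpy as np

# Toy shared-weight two-branch reconstruction. W_enc and W_gen are the
# SHARED parameters; the clean branch and the occluded branch both call
# the same reconstruct(), mirroring I_hat = G(E(I)).
rng = np.random.default_rng(0)
D_IMG, D_HID = 16, 4  # toy image and hidden-vector sizes (assumed)

W_enc = rng.standard_normal((D_HID, D_IMG))  # shared encoder E
W_gen = rng.standard_normal((D_IMG, D_HID))  # shared generator G

def reconstruct(image):
    """I_hat = G(E(I)), with E and G shared by both branches."""
    z = W_enc @ image   # hidden vector feature Z = E(I)
    return W_gen @ z    # reconstructed image I_hat = G(Z)

face_clean = rng.standard_normal(D_IMG)
face_occ = rng.standard_normal(D_IMG)
out_clean = reconstruct(face_clean)  # first branch
out_occ = reconstruct(face_occ)      # second branch, same parameters
shared_params = W_enc.size + W_gen.size
print(shared_params)  # 128 -- versus 256 if each branch kept its own copy
```

The parameter count (128 here) would double if each branch held its own encoder and generator, which is the saving the shared-weight design targets.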
Second, design of the small reconstruction network
The purpose of designing the small network is to make the face feature reconstruction model learn the occluded local features so that they are smoothly recovered. The focus of the small network is therefore the recovery of occlusion features, and the embodiment introduces an attention mechanism to help achieve this. As shown in fig. 1, the input of the small network is the Z_1 vector, i.e. the local feature. To learn a better local feature, the embodiment inputs it into the small network shown in fig. 2; the core computation is a dot product followed by a softmax weighting operation, after which the weighted feature is obtained and serves as the output. Specifically:

Given a local feature Z_1 ∈ R^(C×H×W), it is first sent to a convolution layer to generate two new mapping features Z_2 and Z_3, where {Z_2, Z_3} ∈ R^(C×H×W); Z_2 and Z_3 are then reshaped to C×N, where N = H×W is the number of pixels. Finally, Z_2 and Z_3 are matrix-multiplied, and the attention information Z_5 ∈ R^(N×N) is computed through a softmax layer (R is the feature space). This process is defined as:
Z_5 = softmax(Z_2ᵀ · Z_3)    (4)

where Z_2 and Z_3 are the new mapping features generated from Z_1, and Z_5 is the feature obtained from the softmax layer computation. The formula also shows that the more similar two position features are, the stronger the correlation between them.
In addition to the above design, as shown in fig. 2, the feature Z_1 is also input to a convolution layer to generate a new feature map Z_4 ∈ R^(C×H×W), which is reshaped to R^(C×N); Z_4 is then matrix-multiplied with Z_5 and the result is reshaped to R^(C×H×W). Finally, the result is multiplied by a scale parameter λ and summed element-wise with the original feature to obtain the final output Z_6 ∈ R^(C×H×W). This process is defined as:

Z_6 = λ(Z_4 · Z_5) + Z_1    (6)

where Z_1 is the local feature and Z_4 is the new feature map generated from Z_1 through the convolution layer.

The formula shows that the resulting feature at each position is the weighted sum of the features at all positions plus the original feature, so the designed small network helps capture the correlation between the occluded local information of the face and the other information while keeping the semantics consistent.
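The attention computation above can be sketched in numpy as follows; the 1×1 convolutions are modeled as channel-mixing matrices and all sizes and weight names are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Sketch of the small attention network: Z1 is a C x H x W local feature;
# three 1x1 "convolutions" (channel-mixing matmuls on the C x N view)
# produce Z2, Z3, Z4; Z5 = softmax(Z2^T Z3) is the N x N attention map
# (N = H*W); the output is Z6 = lam * (Z4 @ Z5) + Z1.
rng = np.random.default_rng(1)
C, H, W = 8, 4, 4
N = H * W
Z1 = rng.standard_normal((C, H, W))

def conv1x1(x, weight):
    """A 1x1 convolution acts as a channel-mixing matmul on the C x N view."""
    return weight @ x.reshape(C, N)

W2, W3, W4 = (rng.standard_normal((C, C)) for _ in range(3))
Z2, Z3, Z4 = conv1x1(Z1, W2), conv1x1(Z1, W3), conv1x1(Z1, W4)  # each C x N

logits = Z2.T @ Z3                              # N x N similarity scores
logits -= logits.max(axis=0, keepdims=True)     # numerical stability
Z5 = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax

lam = 0.5  # scale parameter lambda (learned in the real model)
Z6 = lam * (Z4 @ Z5).reshape(C, H, W) + Z1      # weighted sum + residual
print(Z6.shape)  # (8, 4, 4)
```

Because Z6 keeps the C×H×W shape of Z1, the block drops into the reconstruction pipeline without changing the surrounding layer sizes.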
Third, design of the model loss function
To optimize the reconstruction effect, the embodiment designs two loss functions: a reconstruction loss and a style-content loss. Specifically, the reconstruction loss is expressed as:

L_recon = ||f(I_recon) - I_i||_2    (7)

where f(I_recon) represents the reconstructed feature and I_i represents the input feature. It constrains the features of the reconstructed image so that the reconstructed face features are closer in content to the input face features.
The style-content loss is expressed as:

Loss = α·Loss_content + β·Loss_style    (8)

This loss further constrains the style and reconstruction characteristics of the reconstructed map so that the reconstruction is more effective. Here α and β are hyper-parameters; the style loss is:

Loss_style = ||G(f(I_recon)) - G(f(I_i))||_2    (9)

where G(·) denotes the Gram matrix of the features, and the content loss is:

Loss_content = ||f(I_recon) - f(I_i)||_2    (10)
the embodiment of the invention synthesizes the learning target of the model by using the loss function, namely the smaller the total loss is, the better the learning effect of the model is.
In conclusion, after the model trained by the embodiment of the invention is used for preprocessing the occluded face image, the occluded face recognition precision is improved.
Fourth, design of the occlusion data synthesis algorithm
To strengthen the generalization ability of the model, the embodiment designs an occlusion data synthesis algorithm to help prepare the dataset: partial regions of the face are occluded, and the designed algorithm then recovers the occluded regions, thereby demonstrating the effectiveness of the proposed model. The synthesis algorithm comprises six steps in total; the flow is shown in fig. 5 and an example in fig. 6. The algorithm is as follows:
the first step is as follows: inputting normal image datasets Images and dataset attribute position label text label.
The second step is that: extracting a label value label in the text;
the third step: obtaining a specific coordinate value coordinate through a label value label;
fourthly, changing the pixel value within the coordinate value range;
the fifth step: storing the processed image;
and a sixth step: and circularly processing each image until all the images are processed.
In other words, in practical application, before the second branch receives an occluded face image, the image is processed through steps 1 to 6 and then input into the second branch.
The dataset attribute-position label text label is a markup file giving each attribute position of the faces in the dataset; its specific contents are shown in Table 1:
TABLE 1
Img_num nose_x nose_y leftmouth_x leftmouth_y rightmouth_x rightmouth_y
000001.jpg 196 249 194 271 266 260
In Table 1, the first column gives the image file name, and the following columns give the coordinate positions of the face attributes in the image.
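The six steps above can be sketched in Python as follows; the 40×40 occlusion box, the in-memory stand-in image, and the helper names are assumptions for illustration (the patent does not fix a box size or an API):

```python
import numpy as np

# Sketch of the occlusion data synthesis: parse one label line of the form
# shown in Table 1 (steps 2-3), then black out a box around a landmark
# (step 4). Looping over files and saving (steps 1, 5, 6) is omitted.
def parse_label_line(line):
    """Steps 2-3: extract label values and map them to (x, y) coordinates."""
    fields = line.split()
    img_name = fields[0]
    coords = list(map(int, fields[1:]))
    # pair up as (x, y) landmarks: nose, left mouth corner, right mouth corner
    return img_name, list(zip(coords[0::2], coords[1::2]))

def occlude(image, center, half=20, value=0):
    """Step 4: change the pixel values within the coordinate range."""
    x, y = center
    image[max(0, y - half):y + half, max(0, x - half):x + half] = value
    return image

line = "000001.jpg 196 249 194 271 266 260"  # the row shown in Table 1
name, landmarks = parse_label_line(line)
img = np.full((300, 300), 255, dtype=np.uint8)  # stand-in grayscale face
img = occlude(img, landmarks[0])                # occlude the nose region
print(name, landmarks[0], int(img[249, 196]))   # 000001.jpg (196, 249) 0
```

In a real pipeline the same two helpers would run inside a loop over the dataset, writing each occluded image back to disk before it is fed to the second branch.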
In summary, through the above steps, the embodiment of the invention recovers the occluded face part well without affecting the features of the non-occluded part, thereby meeting various requirements in practical applications.
Example 3
The following experiments verify the feasibility of the schemes of Examples 1 and 2, as detailed below:
Fig. 7 shows the feasibility verification results of Examples 1 and 2. After the model is trained, an occluded face image not seen in training is input (a mask occlusion is chosen here), and face recognition is then performed: the model recovers the occluded face, compares its features with the images in the face recognition library, accurately finds the matching library image for final face verification, and outputs the result corresponding to the second output in fig. 7. Alternatively, the non-occluded face restored from the occluded one can be used for verification, corresponding to the first output in fig. 7. In the face verification stage, after a correct match, the face is marked with a rectangular box and the corresponding face identity information is displayed.
In addition, for better verification, after training the model on the CelebA dataset, the embodiment evaluates occluded face recognition accuracy on the LFW dataset; the results are shown in Table 2 below:
TABLE 2
Dataset type | Recognition accuracy
Non-occluded faces | about 99.78%
Occluded faces, without the proposed algorithm | about 82.34%
Occluded faces, with the proposed algorithm | about 91.12%
The results in Table 2 show that recognition accuracy on non-occluded faces reaches about 99%, drops to roughly 82% under occlusion, and recovers by roughly 9 percentage points when the occluded features are repaired with the method designed by the embodiment. It is theoretically expected that with more training data the improvement would be more obvious, fully meeting the requirements of practical applications.
Example 4
An occluded face recognition apparatus based on feature reconstruction is shown in fig. 8. The apparatus comprises a processor 1 and a memory 2; program instructions are stored in the memory 2, and the processor 1 calls the program instructions stored in the memory 2 to cause the apparatus to perform the following method steps:
inputting a non-occluded face image and an occluded face image respectively into the face feature reconstruction model, reconstructing both images, and obtaining the mapping relation between the occluded face image and the non-occluded face image;
inputting the occluded face image features into the designed small network to obtain a weighted mapping relation, and outputting the weighted image features as the image features to be repaired;
and constructing a reconstruction loss function and a style-content loss function to optimize the reconstruction effect.
The face feature reconstruction model comprises two branches whose weights are shared: the first branch reconstructs a new face image, and the second branch reconstructs the occluded face image.
For the second branch, the processor is further configured to design an occlusion data synthesis algorithm that occludes partial regions of the face to obtain occluded face images.
The occlusion data synthesis algorithm is as follows: input the normal image dataset Images and the dataset's attribute-position label text label; extract the label value label from the text, and obtain the specific coordinate value coordinate from the label value label; change the pixel values within the coordinate range; store the processed image; and process each image in a loop until all images are processed.
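The loop above can be sketched in a few lines. The function name, the (x1, y1, x2, y2) box format for the label coordinates, and filling the region with zeros are all illustrative assumptions; the patent does not fix a file format or fill value:

```python
import numpy as np

def synthesize_occlusions(images, labels):
    """Occlude each image at its labelled region by overwriting pixels.

    images: list of H x W x 3 uint8 arrays (the normal image dataset).
    labels: list of (x1, y1, x2, y2) boxes taken from the attribute-position
    label text. Box format and zero-fill are assumptions for illustration.
    """
    occluded = []
    for img, (x1, y1, x2, y2) in zip(images, labels):
        out = img.copy()               # keep the original dataset untouched
        out[y1:y2, x1:x2] = 0          # change pixel values inside the coordinate range
        occluded.append(out)           # store the processed image
    return occluded                    # loop ends when all images are processed
```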
Further, the small network is as follows: the local feature Z1 is first fed into a convolutional layer to generate two new mapping features Z2 and Z3, {Z2, Z3} ∈ R^(C×H×W), and Z2 and Z3 are reshaped to C×N, where N = H×W is the number of pixels; Z2 and Z3 are then matrix-multiplied, and the attention information Z5 ∈ R^(N×N) is computed through a softmax layer, where R is the feature space:

Z5(j, i) = exp(Z2(i) · Z3(j)) / Σ_{i=1..N} exp(Z2(i) · Z3(j))

where Z2(i) and Z3(j) are the new mapping features generated from Z1, and Z5(j, i) is the feature obtained from the softmax layer computation.
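The attention-map computation above can be sketched with numpy. Plain C×C matrices w2 and w3 stand in for the convolutional layer that produces Z2 and Z3 (an assumption for illustration; the patent uses convolutions):

```python
import numpy as np

def attention_map(z1, w2, w3):
    """Sketch of the 'small network' attention weights Z5.

    z1: local feature of shape (C, H, W). w2, w3: C x C matrices standing
    in for the convolutions that generate Z2 and Z3 (illustrative
    assumption). Returns Z5 in R^(N x N), N = H * W, rows softmax-normalised.
    """
    C, H, W = z1.shape
    N = H * W
    z1_flat = z1.reshape(C, N)
    z2 = w2 @ z1_flat                  # mapping feature Z2, reset to C x N
    z3 = w3 @ z1_flat                  # mapping feature Z3, reset to C x N
    scores = z2.T @ z3                 # matrix multiplication -> N x N
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)       # softmax over each row
```

Each row of the result weights how strongly one pixel position attends to every other position, which is what lets unoccluded regions inform the repair of occluded ones.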
In summary, by means of the processor and the memory, the embodiment of the present invention can recover the occluded part of the face well without affecting the features of the unoccluded part, thereby meeting various requirements in practical applications.
Example 5
Based on the same inventive concept, an embodiment of the present invention further provides a computer-readable storage medium. The storage medium includes a stored program; when the program runs, the apparatus on which the storage medium resides is controlled to execute the method steps in the foregoing embodiments.
The computer-readable storage medium includes, but is not limited to, flash memory, a hard disk, a solid-state disk, and the like.
It should be noted that the description of the readable storage medium in the above embodiment corresponds to the description of the method in the method embodiments, and is not repeated here.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are produced in whole or in part.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium, a semiconductor medium, or the like.
In the embodiments of the present invention, except where a device model is specifically described, the models of the other devices are not limited, as long as the devices can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. An occluded-face recognition method based on feature reconstruction, the method comprising:
inputting the unoccluded face image and the occluded face image into a face feature reconstruction model respectively, reconstructing the unoccluded face image and the occluded face image, and obtaining a mapping relation between the occluded face image and the unoccluded face image;
inputting the image features of the occluded face into a designed small network to obtain a weighted mapping relation, and outputting the weighted image features as the image features to be repaired;
and constructing a reconstruction loss function and a style-content loss function to optimize the reconstruction effect.
2. The occluded-face recognition method based on feature reconstruction according to claim 1, wherein the face feature reconstruction model is:

Îi = Gi(Zi) = Gi(Ei(Ii))

wherein Zi = Ei(Ii) represents the hidden vector feature, Ii represents the i-th face image, Ei represents the feature extractor, Gi represents the image generator, and Îi represents the reconstructed image.
3. The occluded-face recognition method based on feature reconstruction according to claim 1 or 2, wherein the face feature reconstruction model comprises two weight-sharing branches: the first branch reconstructs a new face image, and the second branch reconstructs the occluded face image.
4. The occluded-face recognition method based on feature reconstruction according to claim 1, wherein the designed small network is as follows: the local feature Z1 is first fed into a convolutional layer to generate two new mapping features Z2 and Z3, {Z2, Z3} ∈ R^(C×H×W), and Z2 and Z3 are reshaped to C×N, where N = H×W is the number of pixels; Z2 and Z3 are then matrix-multiplied, and the attention information Z5 ∈ R^(N×N) is computed through a softmax layer, where R is the feature space:

Z5(j, i) = exp(Z2(i) · Z3(j)) / Σ_{i=1..N} exp(Z2(i) · Z3(j))

where Z2(i) and Z3(j) are the new mapping features generated from Z1, and Z5(j, i) is the feature obtained from the softmax layer computation.
5. The occluded-face recognition method based on feature reconstruction according to claim 1, wherein the method further comprises: inputting the local feature Z1 into a convolutional layer to generate a new feature map Z4 ∈ R^(C×H×W) and reshaping it to R^(C×N); performing matrix multiplication on Z4 and Z5 and reshaping the result to R^(C×H×W); and multiplying the result by a scale parameter λ and performing an element-wise summation with the features to obtain the final output Z6 ∈ R^(C×H×W).
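The output step of this claim can be sketched with numpy stand-ins. The function name is illustrative, and summing with Z1 (a residual form) is an interpretive assumption about which "features" enter the element-wise summation:

```python
import numpy as np

def attention_output(z1, z4, z5, lam):
    """Sketch of the output step: Z6 = lam * reshape(Z4 @ Z5) + Z1.

    z1: original local feature (C, H, W); z4: conv feature map (C, H, W);
    z5: attention weights (N, N) with N = H * W; lam: scale parameter.
    The residual sum with z1 is an interpretive assumption.
    """
    C, H, W = z4.shape
    N = H * W
    out = (z4.reshape(C, N) @ z5).reshape(C, H, W)  # matmul, then reset to C x H x W
    return lam * out + z1                            # scale by lambda, element-wise sum
```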
6. The occluded-face recognition method based on feature reconstruction according to claim 2, wherein, for the second branch, the method further comprises:
designing an occlusion data synthesis algorithm that occludes partial regions of the face, so as to obtain occluded face images for recovery.
7. The method according to claim 6, wherein the occlusion data synthesis algorithm is: input the normal image dataset Images and the dataset's attribute-position label text label; extract the label value label from the text, and obtain the specific coordinate value coordinate from the label value label; change the pixel values within the coordinate range; store the processed image; and process each image in a loop until all images are processed.
8. An occlusion face recognition apparatus based on feature reconstruction, the apparatus comprising: a processor and a memory, the memory having stored therein program instructions, the processor calling upon the program instructions stored in the memory to cause the apparatus to perform the method steps of any of claims 1-7.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method steps of any of claims 1-7.
CN202111344585.5A 2021-11-15 2021-11-15 Method, device and storage medium for identifying shielded face based on feature reconstruction Pending CN113989902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111344585.5A CN113989902A (en) 2021-11-15 2021-11-15 Method, device and storage medium for identifying shielded face based on feature reconstruction


Publications (1)

Publication Number Publication Date
CN113989902A true CN113989902A (en) 2022-01-28

Family

ID=79748385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111344585.5A Pending CN113989902A (en) 2021-11-15 2021-11-15 Method, device and storage medium for identifying shielded face based on feature reconstruction

Country Status (1)

Country Link
CN (1) CN113989902A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109377452A (en) * 2018-08-31 2019-02-22 西安电子科技大学 Facial image restorative procedure based on VAE and production confrontation network
CN109886167A (en) * 2019-02-01 2019-06-14 中国科学院信息工程研究所 One kind blocking face identification method and device
CN111127308A (en) * 2019-12-08 2020-05-08 复旦大学 Mirror image feature rearrangement repairing method for single sample face recognition under local shielding
CN112949565A (en) * 2021-03-25 2021-06-11 重庆邮电大学 Single-sample partially-shielded face recognition method and system based on attention mechanism
CN112990052A (en) * 2021-03-28 2021-06-18 南京理工大学 Partially-shielded face recognition method and device based on face restoration
CN113378980A (en) * 2021-07-02 2021-09-10 西安电子科技大学 Mask face shading recovery method based on self-adaptive context attention mechanism


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LINGXUE SONG, DIHONG GONG, ZHIFENG LI, CHANGSONG LIU, WEI LIU: "Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 17 August 2019, pages 773-782 *
RUOQI SUN, CHEN HUANG, HENGLIANG ZHU, LIZHUANG MA: "Mask-aware photorealistic facial attribute manipulation", Computational Visual Media, vol. 7, 28 April 2021, pages 363-374 *
XIAOWEI LIU, KENLI LI, KEQIN LI: "Attentive Semantic and Perceptual Faces Completion Using Self-attention Generative Adversarial Networks", Neural Processing Letters, vol. 51, 27 July 2019, pages 211-229, XP037048822, DOI: 10.1007/s11063-019-10080-2 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination