CN112016480B - Face feature representing method, system, electronic device and storage medium

Info

Publication number
CN112016480B
Authority
CN
China
Prior art keywords
face image
face
encoder
trained
decoder network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010898026.8A
Other languages
Chinese (zh)
Other versions
CN112016480A (en)
Inventor
蔡少雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Hangzhou Information Technology Co Ltd
Priority to CN202010898026.8A
Publication of CN112016480A
Application granted
Publication of CN112016480B
Legal status: Active


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/08 Learning methods
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T11/00 2D [Two Dimensional] image generation
            • G06T11/001 Texturing; Colouring; Generation of texture or colour
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V40/161 Detection; Localisation; Normalisation
                • G06V40/168 Feature extraction; Face representation
                • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention relate to the field of computer vision and disclose a method, a system, an electronic device and a storage medium for generating face features. The method for generating face features comprises the following steps: reconstructing a face image to be trained using an encoding-decoding (Encoder-Decoder) network to obtain the features of the face image to be trained and a reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network; classifying the face image to be trained with a classifier according to the features of the face image to be trained to obtain a classified face image, wherein classification means grouping face images of the same person into one class; training the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image; and obtaining the face features of a face image to be tested using the trained Encoder-Decoder network. Applied to face feature generation, the method ensures that the generated face features are the face features actually required.

Description

Face feature representing method, system, electronic device and storage medium
Technical Field
The embodiments of the invention relate to the field of computer vision, and in particular to a method, a system, an electronic device and a storage medium for generating face features.
Background
Face recognition technology generally involves feature generation followed by feature comparison. The conventional approach is face recognition based on deep learning: a model is trained on a large amount of data to obtain a face feature representation that remains stable as the training data change; the representation of the face to be recognized is then compared with stored representations, and the comparison result is analyzed to realize face recognition.
However, the inventor found that in practical applications, the face feature representation learned based on deep learning may not be the face representation actually required by the classifier, resulting in errors in face recognition.
Disclosure of Invention
The embodiments of the invention aim to provide a method, a system, an electronic device and a storage medium for generating face features, so that the learned face feature representation is the face representation actually required by the classifier.
In order to solve the above technical problem, an embodiment of the present invention provides a method for generating face features, comprising the following steps: reconstructing a face image to be trained using an encoding-decoding (Encoder-Decoder) network to obtain the features of the face image to be trained and a reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network; classifying the face image to be trained with a classifier according to the features of the face image to be trained to obtain a classified face image, wherein classification means grouping face images of the same person into one class; training the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image; and obtaining the face features of a face image to be tested using the trained Encoder-Decoder network.
An embodiment of the invention also provides a system for generating face features, comprising: a face reconstruction module, configured to reconstruct a face image to be trained using an encoding-decoding (Encoder-Decoder) network to obtain the features of the face image to be trained and a reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network, and to classify the face image to be trained with a classifier according to the features of the face image to be trained to obtain a classified face image, wherein classification means grouping face images of the same person into one class; and a multitask module, configured to train the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image, and to obtain the face features of a face image to be tested using the trained Encoder-Decoder network.
An embodiment of the invention also provides an electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating facial features described above.
An embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for generating face features.
Compared with the prior art, the embodiments of the invention introduce an Encoder-Decoder network to reconstruct the face image to be trained, obtaining both the features of the face image to be trained and the reconstructed face image; the reconstruction captures hidden features of the face that cannot be directly observed or measured. The face image to be trained is classified according to its features to obtain a classified image, multitask training is performed on the reconstructed and classified images to obtain a multitask model, the image to be tested is processed with this model, and the face features output by the Encoder network are taken as the finally generated face features. Face classification and face reconstruction interact with and promote each other, which improves the accuracy of face reconstruction by the Encoder-Decoder network, so that the face features output by the Encoder network are the face representation actually required.
In addition, reconstructing the face image to be trained using the encoding-decoding Encoder-Decoder network to obtain the features of the face image to be trained and the reconstructed face image, where the Encoder-Decoder network comprises an Encoder network and a Decoder network, comprises the following steps: obtaining the features of the face image to be trained with the Encoder network, where the Encoder network comprises a convolution layer, a batch normalization layer, an activation function layer and a pooling layer; and reconstructing the face image to be trained from the feature map using the Decoder network to obtain the reconstructed face image, where the Decoder network comprises a transposed convolution layer and a batch normalization layer. An Encoder-Decoder network can process data quickly and efficiently and capture characteristic representations of images.
In addition, obtaining the features of the face image to be trained with the Encoder network, where the Encoder network comprises a convolution layer, a batch normalization layer, an activation function layer and a pooling layer, comprises: passing the face image to be trained through the convolution layer, the batch normalization layer and the activation function layer to obtain the initial features of the face image to be trained; and pooling the initial features with the pooling layer to obtain the features of the face image to be trained. The image is thereby mapped to a low-dimensional space and the salient features of the data are extracted, which facilitates subsequent processing.
In addition, reconstructing the face image to be trained from the feature map using the Decoder network to obtain the reconstructed face image, where the Decoder network comprises a transposed convolution layer and a batch normalization layer, comprises: upsampling the features of the face image with the transposed convolution layer to obtain the upsampled features; and batch-normalizing the upsampled features with the batch normalization layer to obtain the reconstructed face image. Image reconstruction recovers latent features of the face image and better reflects the face features.
In addition, training the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image comprises: obtaining a face reconstruction sub-loss function from the reconstructed face image; obtaining a face classification sub-loss function from the classified face image; weighting and summing the face reconstruction sub-loss function and the face classification sub-loss function to obtain the loss function; and training the Encoder-Decoder network and the classifier according to the loss function. Joint training of face reconstruction and face classification on the reconstructed and classified face images lets the two tasks promote each other, so that the face features output by the Encoder-Decoder network are the required face features.
In addition, obtaining the face reconstruction sub-loss function from the reconstructed face image comprises: calculating the color loss of the pixel points from the reconstructed face image and the face image to be trained; and averaging the color loss over the pixel points to obtain the face reconstruction loss function. Using the color loss of face pixels as a supervision signal further optimizes the face reconstruction.
In addition, the classification sub-loss function is the cosine-based softmax loss function AdaCos. The AdaCos loss function needs no hyperparameters and automatically reinforces training supervision during training through its adaptive scale parameter, which makes multitask training feasible.
Drawings
One or more embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like reference numerals indicate similar elements; unless otherwise stated, the figures do not constitute a scale limitation.
Fig. 1 is a flowchart of a method for generating a face feature according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating a face feature according to a second embodiment of the present invention;
Fig. 3 is a flowchart of step 201 in the method for generating a face feature according to the second embodiment of the present invention shown in Fig. 2;
Fig. 4 is a flowchart of step 202 in the method for generating a face feature according to the second embodiment of the present invention shown in Fig. 2;
Fig. 5 is a flowchart of a method for generating a face feature according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a face feature generating system according to a fourth embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments of the present application in order to give the reader a better understanding of the application, and that the claimed application may be practiced without these specific details and with various changes and modifications based on the following embodiments. The following division into embodiments is for convenience of description and should not be construed as limiting the specific implementation of the present application; the embodiments can be combined with and referred to each other where they do not contradict.
The first embodiment of the invention relates to a method for generating face features. The specific flow, shown in Fig. 1, includes:
Step 101, reconstructing the face image to be trained using an encoding-decoding (Encoder-Decoder) network to obtain the features of the face image to be trained and the reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network.
In this embodiment, the features of the face image to be trained are the output of the Encoder network, and the reconstructed face image is the output of the Decoder network.
In this embodiment, the configuration of the Encoder-Decoder network is not limited; in practice it may be any network capable of reconstructing an image through encoding and decoding.
It should be noted that each face image to be trained contains a face, and that the faces in all the training images come from at least two people, so that in the subsequent classification the faces can be grouped by the person to whom they belong.
Step 102, classifying the face images to be trained with a classifier according to the features of the face images to be trained to obtain the classified face images, wherein the classification groups face images of the same person into one class.
In this embodiment, the classifier is not limited; in practice it may be any classifier capable of multi-class classification of face images.
Step 103, training the Encoder-Decoder network and the classifier according to the reconstructed face images and the classified face images.
In this embodiment, training the Encoder-Decoder network and the classifier is multitask learning of face reconstruction and face classification. One option is to introduce an adaptively connected neural network into the overall feature extraction network to adaptively determine the connection state between feature nodes, enabling flexible switching between global and local reasoning.
Step 104, acquiring the face features of the face image to be tested according to the trained Encoder-Decoder network.
Compared with the prior art, the embodiments of the invention introduce an Encoder-Decoder network to reconstruct the face image to be trained, obtaining both the features of the face image to be trained and the reconstructed face image; the reconstruction captures hidden features of the face that cannot be directly observed or measured. The face image to be trained is classified according to its features to obtain a classified image, multitask training is performed on the reconstructed and classified images to obtain a multitask model, the image to be tested is processed with this model, and the face features output by the Encoder network are taken as the finally generated face features. Face classification and face reconstruction interact with and promote each other, which improves the accuracy of face reconstruction by the Encoder-Decoder network, so that the face features output by the Encoder network are the face representation actually required by the classifier.
A second embodiment of the present invention relates to a method for generating a face feature. The second embodiment is substantially the same as the first embodiment, with the main difference that, as shown in Fig. 2, step 101 includes:
Step 201, obtaining the features of the image to be trained according to the Encoder network, wherein the Encoder network comprises a convolution layer, a batch normalization layer, an activation function layer and a pooling layer.
In this embodiment, the numbers of convolution layers, batch normalization layers, activation function layers and pooling layers are not limited; one possible connection is that each convolution layer is followed in sequence by a batch normalization layer, an activation function layer and a pooling layer. The features of the image to be trained form a feature map obtained by mapping the image into a low-dimensional space through the Encoder network.
Step 202, reconstructing the image to be trained from the feature map using the Decoder network to obtain the reconstructed face image, wherein the Decoder network comprises a transposed convolution layer and a batch normalization layer.
In this embodiment, the numbers of transposed convolution layers and batch normalization layers are not limited; one possible connection is that each transposed convolution layer is followed by a batch normalization layer. The transposed convolution layer uses a convolution kernel to deconvolve the features obtained by the earlier convolution processing.
In this embodiment, the number of transposed convolution layers equals the number of convolution layers, which guarantees that the reconstructed image has the same size as the input, although the two still differ in other respects, such as the color characteristics of pixels.
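As an illustration, a minimal PyTorch sketch of such a symmetric Encoder-Decoder follows; the layer counts, channel widths and kernel sizes are assumptions chosen for the example, not the configuration fixed by this embodiment.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Minimal sketch: a symmetric Encoder-Decoder for face reconstruction.
    Channel widths and depths are illustrative assumptions."""

    def __init__(self):
        super().__init__()
        # Encoder: each convolution layer is followed by batch normalization,
        # an activation function layer and a pooling layer, as described above.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Decoder: one transposed convolution per encoder convolution, each
        # followed by batch normalization, so the output matches the input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=2, stride=2),
            nn.BatchNorm2d(3),
        )

    def forward(self, x):
        features = self.encoder(x)               # low-dimensional feature map
        reconstruction = self.decoder(features)  # same spatial size as input
        return features, reconstruction
```

Calling the module on a batch of 112 x 112 face images, for instance torch.randn(8, 3, 112, 112), would return an (8, 64, 28, 28) feature map and an (8, 3, 112, 112) reconstruction.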
Specifically, as shown in Fig. 3, step 201 may include:
Step 301, passing the face image to be trained through the convolution layer, the batch normalization layer and the activation function layer to obtain the initial features of the face image to be trained.
In this embodiment, the activation function is not limited; it may be a rectified linear unit (ReLU) activation function or another activation function.
Step 302, pooling the initial features with the pooling layer to obtain the features of the face image to be trained.
In this embodiment, the pooling may be max pooling, which achieves invariance to small spatial movements while giving the pre-pooling features a large receptive field. The resulting features of the face image to be trained are the salient facial features after further dimensionality reduction.
Pooling, however, discards spatial detail; to address this, information needs to be saved before pooling, whether all of it or only part, such as the position information of the feature values.
In particular, when max pooling is selected, physical memory limits make it impossible to store all of the pre-pooling feature information; only the position, within the pre-pooling feature map, of each value retained by pooling can be stored. For a 2 x 2 pooling window this position can be encoded in 2 bits, which is far more efficient than storing the feature map at float precision; the slight loss of accuracy is still acceptable for practical applications.
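A sketch of this idea using PyTorch's built-in pooling with recorded indices follows; note that return_indices stores full integer indices, so the 2-bit-per-window packing described above should be read as a further storage optimization that this sketch does not implement.

```python
import torch
import torch.nn as nn

# Max pooling over 2 x 2 windows, recording for each retained value its
# position in the pre-pooling feature map so the decoder can restore it.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 56, 56)
pooled, indices = pool(x)           # indices: where each maximum came from
restored = unpool(pooled, indices)  # sparse map with maxima back in place

# Each 2 x 2 window has only 4 possible positions, so the per-window index
# fits in 2 bits, far cheaper than keeping the float-precision feature map.
```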
Specifically, as shown in Fig. 4, step 202 may include:
Step 401, upsampling the features of the face image according to the transposed convolution layer to obtain the upsampled features.
In this embodiment, the upsampling is performed by deconvolution, and the upsampled features are sparse features.
Step 402, batch-normalizing the upsampled features according to the batch normalization layer to obtain the reconstructed face image.
In this embodiment, the reconstructed face image has the same size as the corresponding face image to be trained, but differs in other respects, such as color characteristics.
Compared with the prior art, in addition to the beneficial effects of the first embodiment, this embodiment obtains the latent features of the face image through image reconstruction, which better reflects the face features.
A third embodiment of the present invention relates to a method for generating a face feature. The third embodiment is substantially the same as the first embodiment, with the main difference that, as shown in Fig. 5, step 103 includes:
Step 501, obtaining a face reconstruction sub-loss function according to the reconstructed face image.
In this embodiment, the face reconstruction sub-loss function may be $L_{pixel} = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H}\lVert R_{i,j} - I_{i,j}\rVert$, where W and H are the width and height of the image in pixels, and R and I are the color values of each pixel after and before reconstruction, respectively.
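As a sketch, the averaged pixel color loss can be computed as below; taking the L1 distance as the per-pixel color loss is an assumption, since the embodiment does not fix the norm.

```python
import torch

def pixel_loss(reconstructed: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
    """Average per-pixel color loss between the reconstructed face image R
    and the face image to be trained I; L1 distance assumed."""
    # Tensors of shape (batch, channels, H, W); the mean over all elements
    # implements the 1/(W*H) averaging in the formula above.
    return (reconstructed - original).abs().mean()
```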
Step 502, obtaining a face classification sub-loss function according to the classified face image.
In this embodiment, the face classification sub-loss function may be the cosine-based softmax loss AdaCos.
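For illustration, a minimal sketch of an AdaCos head is given below, following the published AdaCos formulation (initial scale sqrt(2) * log(C - 1), scale updated from batch statistics); numerical details such as the clamping constants are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaCos(nn.Module):
    """Sketch of the AdaCos cosine-softmax loss with an adaptive scale,
    so no scale or margin hyperparameters need to be tuned."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.num_classes = num_classes
        # Initial scale from the AdaCos paper: sqrt(2) * log(C - 1)
        self.scale = math.sqrt(2.0) * math.log(num_classes - 1)
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarities between normalized features and class centers
        cos_theta = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cos_theta.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        with torch.no_grad():
            one_hot = F.one_hot(labels, self.num_classes).bool()
            # Average of exp(scale * cos) over the non-target classes
            b_avg = torch.where(one_hot,
                                torch.zeros_like(cos_theta),
                                torch.exp(self.scale * cos_theta)).sum(1).mean()
            # Median angle to the target class, capped at pi/4
            theta_med = torch.median(theta[one_hot]).clamp(max=math.pi / 4)
            self.scale = (torch.log(b_avg) / torch.cos(theta_med)).item()
        # Cross-entropy on the adaptively scaled cosine logits
        return F.cross_entropy(self.scale * cos_theta, labels)
```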
Step 503, the face reconstruction sub-loss function and the face classification sub-loss function are weighted and summed to obtain a loss function.
In this embodiment, the loss function is $L = L_{AdaCos} + \lambda_1 L_{pixel}$, where the first term $L_{AdaCos}$ represents the cosine-based face classification loss and the second term $L_{pixel}$ is the face reconstruction pixel loss; the best experimental result is obtained when $\lambda_1$ takes the value 2.5.
Step 504, training the Encoder-Decoder network and the classifier according to the loss function.
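A sketch of one joint training step follows, reusing the hypothetical EncoderDecoder, AdaCos and pixel_loss sketches above; the flattened feature dimension, identity count and learning rate are illustrative assumptions (10575 identities matches CASIA-WebFace, but the input resolution is an example).

```python
import torch

lambda_1 = 2.5  # weight of the reconstruction term, as reported above

model = EncoderDecoder()
# For 112 x 112 inputs the encoder sketch yields a (64, 28, 28) feature map
head = AdaCos(feat_dim=64 * 28 * 28, num_classes=10575)
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(head.parameters()), lr=1e-3)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    features, reconstruction = model(images)
    cls_loss = head(features.flatten(1), labels)   # cosine-based classification loss
    rec_loss = pixel_loss(reconstruction, images)  # face reconstruction pixel loss
    loss = cls_loss + lambda_1 * rec_loss          # L = L_AdaCos + lambda_1 * L_pixel
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```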
It should be noted that in this embodiment, with λ1 = 2.5 and the cosine-based softmax loss AdaCos as the face classification sub-loss function, no hyperparameters are needed: the adaptive scale parameter automatically strengthens supervision during training, yielding more effective supervision. Training on the public CASIA-WebFace dataset and testing on the LFW dataset achieves 99.82% verification accuracy, comparable to the most advanced current model-based methods.
Compared with the prior art, in addition to the beneficial effects of the first embodiment, with properly chosen parameters and face classification sub-loss function this embodiment achieves results comparable to the most advanced current model-based methods.
The steps of the above methods are divided only for clarity of description; when implemented they may be combined into one step or split into multiple steps, and as long as the same logical relationship is included, such variants are within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow, is likewise within the protection scope of this patent.
A fourth embodiment of the present invention relates to a face feature generating system, as shown in fig. 6, including:
The face reconstruction module 601 is configured to reconstruct the face image to be trained using an encoding-decoding (Encoder-Decoder) network to obtain the features of the face image to be trained and the reconstructed face image, where the Encoder-Decoder network comprises an Encoder network and a Decoder network, and to classify the face image to be trained with a classifier according to the features of the face image to be trained to obtain the classified face images, where classification means grouping face images of the same person into one class.
The multitask module is configured to train the Encoder-Decoder network and the classifier according to the reconstructed face images and the classified face images, and to obtain the face features of the face image to be tested using the trained Encoder-Decoder network.
It should be noted that this embodiment is a system embodiment corresponding to the first embodiment and can be implemented in cooperation with it. The related technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the related technical details mentioned in this embodiment also apply to the first embodiment.
It should also be noted that each module in this embodiment is a logical module; in practical applications, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the invention, units less closely related to solving the technical problem presented by the invention are not introduced in this embodiment, which does not mean that no other units exist in this embodiment.
A fifth embodiment of the present invention relates to an electronic device, as shown in Fig. 7, including:
at least one processor 701; and
A memory 702 communicatively coupled to the at least one processor 701; wherein,
The memory 702 stores instructions executable by the at least one processor 701, so that the at least one processor 701 can execute the face feature generation methods according to the first to third embodiments of the present invention.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
A sixth embodiment of the present invention relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (7)

1. A method for generating face features, characterized by comprising the following steps:
reconstructing a face image to be trained using an encoding-decoding Encoder-Decoder network to obtain features of the face image to be trained and a reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network;
classifying the face image to be trained with a classifier according to the features of the face image to be trained to obtain a classified face image, wherein the classification groups face images of the same person into one class;
training the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image;
acquiring face features of a face image to be tested according to the trained Encoder-Decoder network;
wherein the training the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image comprises:
acquiring a face reconstruction sub-loss function according to the reconstructed face image;
acquiring a face classification sub-loss function according to the classified face image;
weighting and summing the face reconstruction sub-loss function and the face classification sub-loss function to obtain the loss function;
training the Encoder-Decoder network and the classifier according to the loss function;
The step of obtaining a face reconstruction sub-loss function according to the reconstructed face image comprises the following steps:
Calculating the color loss of the pixel points according to the reconstructed face image and the face image to be trained;
averaging the color loss of the pixel points to obtain a face reconstruction loss function;
the face classification sub-loss function is a loss function AdaCos.
2. The method according to claim 1, wherein reconstructing the face image to be trained using the encoding-decoding Encoder-Decoder network to obtain the features of the face image to be trained and the reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network, comprises:
acquiring the features of the face image to be trained according to the Encoder network, wherein the Encoder network comprises a convolution layer, a batch normalization layer, an activation function layer and a pooling layer;
reconstructing the face image to be trained from the features using the Decoder network to obtain the reconstructed face image, wherein the Decoder network comprises a transposed convolution layer and a batch normalization layer.
3. The method according to claim 2, wherein acquiring the features of the face image to be trained according to the Encoder network, wherein the Encoder network comprises a convolution layer, a batch normalization layer, an activation function layer and a pooling layer, comprises:
The face image to be trained passes through the convolution layer, the batch normalization layer and the activation function layer to obtain initial characteristics of the face image to be trained;
And the pooling layer pools the initial features to acquire the features of the face image to be trained.
4. The method according to claim 2, wherein reconstructing the face image to be trained from the features using the Decoder network to obtain the reconstructed face image, wherein the Decoder network comprises a transposed convolution layer and a batch normalization layer, comprises:
upsampling the features of the face image according to the transposed convolution layer to obtain the upsampled features;
and carrying out batch normalization on the up-sampled features according to the batch normalization layer to obtain the reconstructed face image.
5. A system for generating face features, characterized by comprising:
a face reconstruction module, configured to reconstruct a face image to be trained using an encoding-decoding Encoder-Decoder network to obtain features of the face image to be trained and a reconstructed face image, wherein the Encoder-Decoder network comprises an Encoder network and a Decoder network;
and configured to classify the face image to be trained with a classifier according to the features of the face image to be trained to obtain a classified face image, wherein the classification groups face images of the same person into one class, to train the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image, and to acquire face features of a face image to be tested according to the trained Encoder-Decoder network;
a multitasking module, configured to train the Encoder-Decoder network and the classifier according to the reconstructed face image and the classified face image, and to acquire the face features of the face image to be tested according to the trained Encoder-Decoder network;
wherein the multitasking module is further configured to:
acquire a face reconstruction sub-loss function according to the reconstructed face image; acquire a face classification sub-loss function according to the classified face image; weight and sum the face reconstruction sub-loss function and the face classification sub-loss function to obtain the loss function; and train the Encoder-Decoder network and the classifier according to the loss function;
calculate the color loss of the pixel points according to the reconstructed face image and the face image to be trained, and average the color loss over the pixel points to obtain the face reconstruction loss function;
the face classification sub-loss function is a loss function AdaCos.
6. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the face feature generation method of any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the face feature generation method of any one of claims 1 to 4.
CN202010898026.8A 2020-08-31 2020-08-31 Face feature representing method, system, electronic device and storage medium Active CN112016480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010898026.8A CN112016480B (en) 2020-08-31 2020-08-31 Face feature representing method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010898026.8A CN112016480B (en) 2020-08-31 2020-08-31 Face feature representing method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112016480A CN112016480A (en) 2020-12-01
CN112016480B true CN112016480B (en) 2024-05-28

Family

ID=73503990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010898026.8A Active CN112016480B (en) 2020-08-31 2020-08-31 Face feature representing method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112016480B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022317A (en) * 2016-06-27 2016-10-12 北京小米移动软件有限公司 Face identification method and apparatus
CN108710831A (en) * 2018-04-24 2018-10-26 华南理工大学 A kind of small data set face recognition algorithms based on machine vision
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN110148085A (en) * 2019-04-22 2019-08-20 智慧眼科技股份有限公司 Face image super-resolution reconstruction method and computer-readable storage medium
CN110414370A (en) * 2019-07-05 2019-11-05 深圳云天励飞技术有限公司 The recognition methods of face shape of face, device, electronic equipment and storage medium
CN110457994A (en) * 2019-06-26 2019-11-15 平安科技(深圳)有限公司 Face image synthesis method and device, storage medium, computer equipment
CN110598580A (en) * 2019-08-25 2019-12-20 南京理工大学 Human face living body detection method
CN110706157A (en) * 2019-09-18 2020-01-17 中国科学技术大学 Face super-resolution reconstruction method for generating confrontation network based on identity prior
CN110992275A (en) * 2019-11-18 2020-04-10 天津大学 Refined single image rain removing method based on generation countermeasure network
CN111080670A (en) * 2019-12-17 2020-04-28 广州视源电子科技股份有限公司 Image extraction method, device, equipment and storage medium
CN111241998A (en) * 2020-01-09 2020-06-05 中移(杭州)信息技术有限公司 Face recognition method and device, electronic equipment and storage medium
CN111275057A (en) * 2020-02-13 2020-06-12 腾讯科技(深圳)有限公司 Image processing method, device and equipment
CN111310852A (en) * 2020-03-08 2020-06-19 桂林电子科技大学 Image classification method and system
CN111340076A (en) * 2020-02-17 2020-06-26 中国人民解放军32802部队 Zero sample identification method for unknown mode of radar target of new system
CN111401272A (en) * 2020-03-19 2020-07-10 支付宝(杭州)信息技术有限公司 Face feature extraction method, device and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI382354B (en) * 2008-12-02 2013-01-11 Nat Univ Tsing Hua Face recognition method
US10650286B2 (en) * 2017-09-07 2020-05-12 International Business Machines Corporation Classifying medical images using deep convolution neural network (CNN) architecture


Also Published As

Publication number Publication date
CN112016480A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
US11455790B2 (en) Style-based architecture for generative neural networks
Robert et al. Hybridnet: Classification and reconstruction cooperation for semi-supervised learning
CN111292330A (en) Image semantic segmentation method and device based on coder and decoder
Irsoy et al. Unsupervised feature extraction with autoencoder trees
CN108305238A (en) Image processing method, device, storage medium and computer equipment
WO2022105117A1 (en) Method and device for image quality assessment, computer device, and storage medium
Lee et al. Neighborhood reconstructing autoencoders
CN110674774A (en) Improved deep learning facial expression recognition method and system
CN114821196A (en) Zero sample image identification method and identification device, medium and computer terminal thereof
CN112149802B (en) Image content conversion method with consistent semantic structure
Chen et al. Semi-supervised dictionary learning with label propagation for image classification
CN112766217A (en) Cross-modal pedestrian re-identification method based on disentanglement and feature level difference learning
CN115424013A (en) Model training method, image processing apparatus, and medium
CN114494387A (en) Data set network generation model and fog map generation method
CN113869234A (en) Facial expression recognition method, device, equipment and storage medium
CN111445545B (en) Text transfer mapping method and device, storage medium and electronic equipment
CN112016480B (en) Face feature representing method, system, electronic device and storage medium
CN116912268A (en) Skin lesion image segmentation method, device, equipment and storage medium
CN111724309B (en) Image processing method and device, training method of neural network and storage medium
CN116977714A (en) Image classification method, apparatus, device, storage medium, and program product
US11373274B1 (en) Method for super resolution imaging based on deep learning
CN116095321A (en) Significant area image coding and decoding method, system, equipment and storage medium
CN114565528A (en) Remote sensing image noise reduction method and system based on multi-scale and attention mechanism
CN113936243A (en) Discrete representation video behavior identification system and method
Wang Efficient Adaptation of Deep Vision Models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant