CN114463810A - Training method and device for face recognition model - Google Patents

Training method and device for face recognition model

Info

Publication number
CN114463810A
CN114463810A CN202210051165.6A
Authority
CN
China
Prior art keywords
training set
images
image
face
student model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210051165.6A
Other languages
Chinese (zh)
Inventor
陈新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202210051165.6A priority Critical patent/CN114463810A/en
Publication of CN114463810A publication Critical patent/CN114463810A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention provides a training method and device for a face recognition model, wherein the method comprises the following steps: acquiring an original training set and a transformed training set; inputting images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features; determining the target face feature of each image in the original training set according to the face features of the images in the original training set and the transformed training set; inputting each image of the original training set into a student model for face recognition to obtain face features, and classifying the labeled images to obtain classification results; and adjusting parameters of the student model according to the face features output by the student model for each image, the target face features corresponding to each image, and the classification results and classification labels corresponding to the labeled images, so as to train the student model. The method enables the student model to learn features fused from the original training set and the transformed training set, thereby learning more image features and improving generalization.

Description

Training method and device for face recognition model
Technical Field
The invention relates to the technical field of image recognition, in particular to a training method of a face recognition model and a training device of the face recognition model.
Background
Face recognition is a biometric technology that verifies identity based on facial feature information: a face detection algorithm determines the face region in an original image, a feature extraction algorithm extracts face features, and identity is confirmed according to the face features.
In the prior art, face features are often determined from face images and labeled by manual means. Existing labeled face images are large in data scale but simple in scene: most are face images captured by a camera over a small time span, with very high intra-class similarity, so a face recognition model trained on such images generalizes poorly.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a training method for a face recognition model and a corresponding training apparatus for a face recognition model, which overcome or at least partially solve the above problems.
In order to solve the above problems, the embodiment of the present invention discloses a training method for a face recognition model, which includes:
acquiring an original training set and a transformation training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features;
determining the target face features of the images in the original training set according to the face features of the images in the original training set and the transformed training set;
inputting each image of the original training set into a student model for face recognition to obtain face features, and classifying the labeled images to obtain classification results;
and adjusting parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, the classification results corresponding to the labeled images and the classification labels, so as to train the student model.
Optionally, before inputting each image of the original training set into a student model for face recognition to obtain face features, and classifying the labeled images to obtain classification results, the method further includes:
acquiring a student model, and adding model noise to the student model.
Optionally, the inputting each image of the original training set into a student model for face recognition to obtain a face feature, and classifying the labeled image to obtain a classification result includes:
adding image noise to the images of the original training set;
and inputting the image with the image noise into a student model to perform face recognition to obtain face features, and classifying the labeled image to obtain a classification result.
Optionally, the determining, according to the face features of the images in the original training set and the transformed training set, the target face features of each image in the original training set includes:
calculating, for each image in the original training set, the average of the face features of the image and the face features of the corresponding image in the transformed training set as the target face feature of the image.
Optionally, the adjusting, according to the face features output by the student model for each image, the target face features corresponding to each image, the classification result and the classification label corresponding to the labeled image, the parameters of the student model to train the student model includes:
calculating an L2 loss function according to the face features output by the student model to each image and the target face features corresponding to each image;
calculating an arcface loss function according to the classification result and the classification label corresponding to the labeled image;
and adjusting parameters of the student model according to the L2 loss function and the arcface loss function so as to train the student model.
Optionally, the method further comprises:
and taking the trained student model as a new teacher model, and returning to the step of obtaining an original training set and a transformed training set obtained by transforming the original training set so as to repeatedly train the student model.
The embodiment of the invention also discloses a training device of the face recognition model, which comprises the following steps:
the data acquisition module is used for acquiring an original training set and a transformation training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
the first input module is used for inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features;
the target characteristic determining module is used for determining the target face characteristics of each image in the original training set according to the face characteristics of each image in the original training set and the transformed training set;
the second input module is used for inputting all the images of the original training set into a student model for face recognition to obtain face features and classifying the images with labels to obtain classification results;
and the adjusting module is used for adjusting the parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, the classification results corresponding to the labeled images and the classification labels so as to train the student model.
Optionally, the method further comprises:
and the student model processing module is used for acquiring the student model and adding model noise to the student model.
Optionally, the second input module comprises:
the image enhancement submodule is used for adding image noise to the images of the original training set;
and the noise image input submodule is used for inputting the image with the image noise into a student model for face recognition to obtain face characteristics, and classifying the labeled image to obtain a classification result.
Optionally, the target feature determination module includes:
and the average characteristic determining submodule is used for calculating, for each image in the original training set, the average of the face features of the image and the face features of the corresponding image in the transformed training set as the target face feature of the image.
Optionally, the adjusting module includes:
the first function calculation submodule is used for calculating an L2 loss function according to the face features output by the student model to each image and the target face features corresponding to each image;
the second function calculation submodule is used for calculating an arcface loss function according to the classification result and the classification label corresponding to the labeled image;
and the parameter adjusting submodule is used for adjusting the parameters of the student model according to the L2 loss function and the arcface loss function so as to train the student model.
Optionally, the method further comprises:
and the iteration module is used for taking the trained student model as a new teacher model and returning to the step of obtaining an original training set and a transformed training set obtained by transforming the original training set so as to repeatedly train the student model.
The embodiment of the invention also discloses an electronic device, which comprises: a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing the steps of the training method of the face recognition model as described above.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the training method of the face recognition model are realized.
The embodiment of the invention has the following advantages:
according to the embodiment of the invention, the labeled image and the unlabeled image are used as the training set to train the student model, so that the student model can learn more knowledge from abundant unlabeled data, the face recognition performance is improved, and no additional processing is needed on the unlabeled data. The method comprises the steps of transforming an original training set in multiple modes to obtain multiple transformed training sets in a data distillation mode, reasoning the original training set and the transformed training sets through the same teacher model to obtain face features, and determining target face features of images in the original training set according to the face features of the images in the original training set and the face features of the images in the transformed training set; inputting each image of the original training set into a student model to perform face recognition to obtain face features, and classifying the images with labels to obtain classification results; according to the face features output by the student model to each image, the target face features corresponding to each image, the classification results corresponding to the labeled images and the classification labels, parameters of the student model are adjusted to train the student model, so that the student model can learn the features fused from the original training set and the transformed training set, more image features are learned, and the generalization is improved.
Drawings
FIG. 1 is a flowchart illustrating steps of a training method for a face recognition model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of another method for training a face recognition model according to an embodiment of the present invention;
fig. 3 is a block diagram of a structure of a training apparatus for a face recognition model according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Besides labeled face images, a large amount of unlabeled image data, such as surveillance data, also exists in real business scenarios. In such unlabeled image data, the differences in pose, expression, and illumination are large, which is of great value for improving the generalization of the model. One of the core ideas of the embodiments of the present invention is to provide a semi-supervised training scheme for face recognition models, which combines a large number of unlabeled face images with rich scenes on the basis of labeled face images, so as to mine the value of the data to the greatest extent and improve the recognition capability of the face recognition model.
Referring to fig. 1, a flowchart illustrating steps of a training method for a face recognition model according to an embodiment of the present invention is shown, where the method specifically includes the following steps:
step 101, obtaining an original training set and a transformation training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
in the embodiment of the invention, the labeled images with the classification labels and the unlabeled images without the classification labels can be selected to form an original training set. Illustratively, labeled images and unlabeled images of a preset batch size can be selected according to a ratio of 1:1 to form an original training set. Semi-supervised model training can be achieved by using labeled images and unlabeled images together as a training set.
The image with the label can select an image which is subjected to face alignment in advance and is provided with a label. The label-free images can be randomly selected, the pseudo labels are generated without depending on clustering, no assumption is made on the distribution of the label-free images, the label-free images can be overlapped with the labeled images, the data distribution can be unbalanced, and each type of sample can be few. According to the embodiment of the invention, image characteristics are mined in a data distillation mode, an original training set can be transformed in multiple modes to obtain multiple transformation training sets, then the original training set and the transformation training sets are inferred through the same teacher model, then the inference results are integrated, and then the student model learns the integrated results. The transformation may include: identity transformation, left-right turning, perspective transformation, and the like. For each image in the original training set, there is a corresponding transformed image in the transformed data set. By transforming the image, more data values can be mined.
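The transformation step can be sketched as follows; this minimal example covers only the identity and left-right flip transforms mentioned above, treating images as numpy arrays (a real pipeline would also include perspective transforms and operate on decoded face crops):

```python
import numpy as np

def make_transformed_sets(images):
    """Build transformed training sets from the original one. Only the
    identity and left-right flip transforms are shown here."""
    return {
        "identity": [img.copy() for img in images],
        "flip": [img[:, ::-1].copy() for img in images],  # flip along the width axis
    }

# toy "images": 3x4 arrays standing in for aligned face crops
imgs = [np.arange(12, dtype=float).reshape(3, 4) for _ in range(2)]
transformed = make_transformed_sets(imgs)
```

Each image of the original set has exactly one counterpart per transformed set, which is what the feature-averaging step later relies on.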
And 102, inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features.
The teacher model is a face recognition model trained in advance; the images of the original training set and of the transformed training set can be input into the same teacher model for face recognition to obtain face features. Using only one teacher model saves GPU memory and improves processing efficiency.
In one embodiment of the invention, the teacher model may be trained in advance using the labeled images.
And 103, determining the target face characteristics of each image in the original training set according to the face characteristics of each image in the original training set and the transformed training set.
The target face features of the images in the original training set can be determined by integrating the face features of the images in the original training set with the face features of the images corresponding to the transformed training set. The target face features can be used as labels of the images of the original training set, which is equivalent to setting pseudo labels for the images.
And 104, inputting the images of the original training set into a student model to perform face recognition to obtain face features, and classifying the labeled images to obtain classification results.
The student model is a face recognition model that needs training. The images of the original training set can be input into a student model, and the student model can perform face recognition on the unlabeled images to obtain face features; the student model can perform face recognition on the labeled images to obtain face features, and classifies the images according to the face features.
And 105, adjusting parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, the classification results corresponding to the labeled images and the classification labels, so as to train the student model.
The target face features corresponding to the images and the classification labels corresponding to the labeled images are labels of the images; parameters of the student model can be adjusted according to the face features output by the student model for each image, the target face features corresponding to each image, the classification results corresponding to the labeled images and the classification labels, so that the student model can be trained, and the student model can learn knowledge of the teacher model.
According to the embodiment of the invention, the labeled image and the unlabeled image are used as the training set to train the student model, so that the student model can learn more knowledge from abundant unlabeled data, the face recognition performance is improved, and no additional processing is needed on the unlabeled data. The method comprises the steps of transforming an original training set in multiple modes to obtain multiple transformed training sets in a data distillation mode, reasoning the original training set and the transformed training sets through the same teacher model to obtain face features, and determining target face features of images in the original training set according to the face features of the images in the original training set and the face features of the images in the transformed training set; inputting each image of the original training set into a student model to perform face recognition to obtain face features, and classifying the images with labels to obtain classification results; according to the face features output by the student model to each image, the target face features corresponding to each image, the classification result corresponding to the labeled image and the classification label, the parameters of the student model are adjusted to train the student model, so that the student model can learn the features fused from the original training set and the transformed training set, thereby learning more image features and improving the generalization.
Referring to fig. 2, a flowchart illustrating steps of another training method for a face recognition model according to an embodiment of the present invention is shown, where the method specifically includes the following steps:
step 201, obtaining an original training set and a transformation training set obtained by transforming the original training set; the original training set includes labeled images with classification labels and unlabeled images without classification labels.
Illustratively, the labeled images may be represented as {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, where x denotes an aligned labeled face image and y denotes the face label (0, 1, ..., m-1); the unlabeled images may be represented as {x̃_1, x̃_2, ..., x̃_n}, where x̃ denotes an unlabeled face image.
And 202, inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features.
Illustratively, the face features of the labeled images in the original training set may be expressed as f_i = F(x_i; θ_t), where θ_t denotes the model parameters of the teacher model F; the face features of the unlabeled images in the original training set may be expressed as f̃_i = F(x̃_i; θ_t); the face features of the labeled images in the transformed training set may be expressed as f'_i = F(x'_i; θ_t); and the face features of the unlabeled images in the transformed training set may be expressed as f̃'_i = F(x̃'_i; θ_t).
step 203, determining the target face features of the images in the original training set according to the face features of the images in the original training set and the transformed training set.
In this embodiment of the present invention, the step 203 may include: and calculating the average characteristic of each image in the original training set as the target human face characteristic of the image according to the human face characteristic of the image and the human face characteristic of the corresponding image in the transformed training set.
For example, let X_i denote an image of the original training set and X'_i the corresponding image of the transformed training set. For image X_1 of the original training set, the average of its face features and the face features of the corresponding image X'_1 in the transformed training set can be calculated as the target face feature of image X_1.
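A minimal sketch of this feature-averaging step, assuming the teacher's features for each view have already been stacked into arrays (the names here are illustrative):

```python
import numpy as np

def target_features(per_view_feats):
    """Average the teacher's face features over the original image and its
    transformed counterparts; the result serves as the pseudo-label feature.
    `per_view_feats` has shape (n_views, n_images, feat_dim)."""
    return np.asarray(per_view_feats).mean(axis=0)

# toy teacher features for two views (original, flipped) of 3 images, dim 4
orig_feats = np.ones((3, 4))
flip_feats = 3.0 * np.ones((3, 4))
targets = target_features([orig_feats, flip_feats])
```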
And step 204, acquiring a student model, and adding model noise to the student model.
The model noise refers to random operations set in the model, such as dropout, dropconnect, and stochastic depth. Dropout refers to randomly discarding a portion of the neurons while training a neural network, so as to reduce the complexity of the network. Dropconnect refers to randomly zeroing connection weights of the network architecture during training, so that the network adapts to different connections at each training step. Stochastic depth refers to randomly setting the depth of the neural network, i.e., randomly dropping layers during training.
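As a hedged illustration of one of these noise operations, inverted dropout can be written in a few lines of numpy (a generic sketch, not the patent's specific implementation):

```python
import numpy as np

def dropout(x, p, rng):
    """Inverted dropout: zero a random fraction p of activations during
    training and rescale the survivors so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones(1000)
noised = dropout(activations, p=0.3, rng=rng)
```

Because of the 1/(1-p) rescaling, the mean activation stays roughly 1.0 even though about 30% of the entries are zeroed.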
By adding model noise, the training task of the student model is more difficult than that of the teacher model, and besides learning the knowledge of the teacher model, the model performance can be improved.
It is worth noting that when the same model is used as both the teacher model and the student model, the pseudo labels are generated by the teacher model, so the cross-entropy loss of the student model on the unlabeled data would be zero and the student model could not learn new knowledge. Therefore, when the same model is used as the teacher model and the student model, model noise needs to be added to the student model so that the student model can learn new knowledge.
Step 205, adding image noise to the images of the original training set.
Adding image noise to an image may also be referred to as data enhancement. Illustratively, the manner of increasing the image noise may include: gaussian filtering (gaussian blur), motion blur (MotionBlur), defocus blur (defocus blur), image Compression (Compression), linear contrast (linear contrast), (gaussian noise added) additive gaussian noise, perspective change (perspective transform), piecewise affine (piewisesaffene), and the like.
By adding image noise to the image, the generalization of the student model can be improved.
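For instance, the additive Gaussian noise augmentation listed above can be sketched as follows (a generic illustration, with clipping back to the usual 8-bit pixel range):

```python
import numpy as np

def add_gaussian_noise(img, sigma, rng):
    """Additive Gaussian image noise, clipped back to the 8-bit pixel range."""
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 255.0)

rng = np.random.default_rng(42)
img = np.full((8, 8), 128.0)  # a flat gray patch standing in for a face crop
noisy = add_gaussian_noise(img, sigma=10.0, rng=rng)
```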
And step 206, inputting the image with the image noise into a student model to perform face recognition to obtain face features, and classifying the labeled image to obtain a classification result.
And step 207, adjusting parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, and the classification results and the classification labels corresponding to the labeled images, so as to train the student model.
In an embodiment of the present invention, the step 207 may include the following sub-steps:
and a substep S11 of calculating an L2 loss function according to the face features output by the student model to each image and the target face features corresponding to each image.
And calculating a loss function value by adopting an L2 loss function according to the face features of the student models for reasoning on the images in the original training set and the target face features corresponding to the images.
And a substep S12, calculating an arcface loss function according to the classification result and the classification label corresponding to the labeled image.
The arcface loss function is a loss function specifically designed for face recognition. It is used here to calculate the loss value from the classification results the student model infers for the labeled images in the original training set and the classification labels corresponding to those labeled images.
And a substep S13, adjusting parameters of the student model according to the L2 loss function and the arcface loss function so as to train the student model.
The L2 loss function and the arcface loss function are taken together as the total loss function, and the parameters of the student model are adjusted jointly according to the L2 loss function value of each image and the arcface loss function value of each labeled image, so as to train the student model.
Illustratively, the overall loss function of the student model may be:

Loss = Loss_arc(F_noise(x; θ_s), y) + Loss_kd(F_noise(x; θ_s), f) + Loss_kd(F_noise(x̃; θ_s), f̃)

where Loss_arc is the arcface loss function and Loss_kd is the L2 loss function; F_noise is the student model with noise added; θ_t and θ_s are the teacher model parameters and the student model parameters respectively; Loss_arc(F_noise(x; θ_s), y) is the arcface loss function value for the labeled images; Loss_kd(F_noise(x; θ_s), f) is the L2 loss function value for the labeled images, with f the target face feature produced by the teacher; and Loss_kd(F_noise(x̃; θ_s), f̃) is the L2 loss function value for the unlabeled images.
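Under the assumption that features and class weights are plain arrays, the two loss terms can be sketched in numpy as follows (a simplified illustration of the L2 distillation term and the arcface margin loss, not the patent's exact formulation):

```python
import numpy as np

def l2_loss(student_feats, target_feats):
    """Knowledge-distillation term: mean squared error between the student's
    features and the teacher-derived target features."""
    return float(np.mean((student_feats - target_feats) ** 2))

def arcface_loss(feats, class_weights, labels, s=64.0, m=0.5):
    """ArcFace-style margin loss: L2-normalize features and class weights so
    their dot products are cosines, add an angular margin m to the target
    class, scale by s, then apply cross-entropy."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(feats @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    logits = s * cos
    logits[rows, labels] = s * np.cos(theta[rows, labels] + m)  # margin on target class
    logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())

rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 8))    # student features for 4 labeled images
weights = rng.normal(size=(3, 8))  # classifier weights for 3 identities
labels = np.array([0, 1, 2, 0])
total = arcface_loss(feats, weights, labels) + l2_loss(feats, feats + 0.1)
```

Summing the two terms mirrors the total loss above; unlabeled images would contribute only the L2 term, since they have no classification label.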
In this embodiment of the present invention, the method may further include: and taking the trained student model as a new teacher model, and returning to the step of obtaining an original training set and a transformed training set obtained by transforming the original training set so as to repeatedly train the student model.
That is, the trained student model is used as a new teacher model, and steps 201 to 207 are performed again to repeat the training. It should be noted that a new original training set needs to be selected when retraining, rather than reusing the original training set of the previous round; and model noise still needs to be added to the student model when retraining, so as to distinguish the student model from the teacher model; otherwise the student model cannot learn new knowledge.
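The iterate-and-promote scheme above can be sketched as a simple loop (all callables here are hypothetical placeholders standing in for steps 201 to 207):

```python
def iterative_training(initial_teacher, sample_training_sets, train_student, rounds=3):
    """Sketch of the iterative scheme: each round samples a fresh
    original/transformed training set, trains a (noised) student against the
    current teacher, then promotes the student to be the next teacher."""
    teacher = initial_teacher
    students = []
    for r in range(rounds):
        original_set, transformed_set = sample_training_sets(r)  # new data each round
        student = train_student(teacher, original_set, transformed_set)
        students.append(student)
        teacher = student  # the trained student becomes the new teacher
    return teacher, students

# toy run: "models" are just integers, "training" increments the teacher
final, history = iterative_training(
    0,
    lambda r: ([f"round{r}_img"], [f"round{r}_img_flipped"]),
    lambda teacher, orig, trans: teacher + 1,
)
```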
According to the embodiment of the invention, the labeled images and the unlabeled images are used as the training set to train the student model, so that the student model can learn more knowledge from abundant unlabeled data, the face recognition performance is improved, and additional processing on the unlabeled data is not needed. The method comprises the steps of transforming an original training set in multiple modes to obtain multiple transformed training sets in a data distillation mode, reasoning the original training set and the transformed training sets through the same teacher model to obtain face features, and determining target face features of images in the original training set according to the face features of the images in the original training set and the face features of the images in the transformed training set; inputting the images of the original training set added with image noise into a student model with model noise to perform face recognition to obtain face features, and classifying the labeled images to obtain classification results; according to the face features output by the student model to each image, the target face features corresponding to each image, the classification results corresponding to the labeled images and the classification labels, parameters of the student model are adjusted to train the student model, so that the student model can learn the features fused from the original training set and the transformed training set, more image features are learned, and the generalization is improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the present invention.
Referring to fig. 3, a block diagram of a structure of a training apparatus for a face recognition model according to an embodiment of the present invention is shown, which may specifically include the following modules:
a data obtaining module 301, configured to obtain an original training set and a transformed training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
a first input module 302, configured to input images of the original training set and the transformed training set into a pre-trained teacher model for face recognition, so as to obtain face features;
a target feature determining module 303, configured to determine a target face feature of each image in the original training set according to the face features of each image in the original training set and the transformed training set;
a second input module 304, configured to input each image of the original training set into a student model for face recognition to obtain a face feature, and classify the labeled image to obtain a classification result;
an adjusting module 305, configured to adjust parameters of the student model according to the face features output by the student model for each image, the target face features corresponding to each image, and the classification result and the classification label corresponding to the labeled image, so as to train the student model.
In an embodiment of the present invention, the apparatus may further include:
and the student model processing module is used for acquiring the student model and adding model noise to the student model.
In an embodiment of the present invention, the second input module 304 may include:
the image enhancement submodule is used for adding image noise to the images of the original training set;
and the noise image input submodule is used for inputting the image with the image noise into a student model for face recognition to obtain face characteristics, and classifying the labeled image to obtain a classification result.
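As a concrete illustration of the image-noise step, the sketch below adds pixel-level Gaussian noise and a random horizontal flip. The patent does not specify which augmentations constitute "image noise", so these particular choices (and the `sigma` strength) are assumptions:

```python
import numpy as np

def add_image_noise(img, rng, sigma=10.0):
    # Pixel-level Gaussian noise; sigma is a hypothetical strength.
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    # Random horizontal flip as a second, assumed augmentation.
    if rng.random() < 0.5:
        noisy = noisy[:, ::-1]
    # Clamp back to the valid 8-bit pixel range.
    return np.clip(noisy, 0, 255).astype(np.uint8)
```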
In an embodiment of the present invention, the target characteristic determining module 303 may include:
and the average feature determining submodule is used for calculating, for each image in the original training set, the average of the face features of the image and of its corresponding images in the transformed training sets, as the target face feature of the image.
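A minimal sketch of this average-feature computation, assuming the teacher's features are fixed-length vectors and that the averaged embedding is L2-normalised afterwards (a common convention for face embeddings, not stated in the source):

```python
import numpy as np

def target_features(orig_feats, transformed_feats_list):
    # Stack the teacher's features for the original view and every
    # transformed view of the same images: shape (views, N, D).
    stacked = np.stack([orig_feats] + list(transformed_feats_list))
    avg = stacked.mean(axis=0)                  # per-image average, (N, D)
    # L2-normalise the averaged embedding (assumed convention).
    return avg / np.linalg.norm(avg, axis=1, keepdims=True)
```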
In an embodiment of the present invention, the adjusting module 305 may include:
the first function calculation submodule is used for calculating an L2 loss function according to the face features output by the student model to each image and the target face features corresponding to each image;
the second function calculation submodule is used for calculating an arcface loss function according to the classification result and the classification label corresponding to the labeled image;
and the parameter adjusting submodule is used for adjusting the parameters of the student model according to the L2 loss function and the arcface loss function so as to train the student model.
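The two losses can be sketched as follows. The scale `s=64` and margin `m=0.5` are typical ArcFace defaults rather than values from the patent, and the equal weighting of the two terms in `total_loss` is likewise an assumption, since the patent gives no hyperparameters:

```python
import numpy as np

def l2_loss(student_feats, target_feats):
    # Mean squared L2 distance between the student's features and the
    # teacher-derived target features.
    return float(np.mean(np.sum((student_feats - target_feats) ** 2, axis=1)))

def arcface_loss(embeddings, class_weights, labels, s=64.0, m=0.5):
    # ArcFace: add an angular margin m to the target-class angle,
    # scale by s, then apply softmax cross-entropy.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cos = np.clip(e @ w, -1.0, 1.0)                # (N, C) class cosines
    idx = np.arange(len(labels))
    logits = s * cos
    logits[idx, labels] = s * np.cos(np.arccos(cos[idx, labels]) + m)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_prob[idx, labels]))

def total_loss(student_feats, target_feats, embeddings, class_weights, labels):
    # Equal weighting of the two terms is an assumption.
    return l2_loss(student_feats, target_feats) + arcface_loss(
        embeddings, class_weights, labels)
```

The L2 term pulls the student's features toward the distilled target features for all images, labeled or not, while the ArcFace term applies only to the labeled images that have classification labels.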
In one embodiment of the present invention, the apparatus may further include:
and the iteration module is used for taking the trained student model as a new teacher model and returning to the step of obtaining an original training set and a transformed training set obtained by transforming the original training set so as to repeatedly train the student model.
According to the embodiment of the invention, both labeled and unlabeled images are used as the training set for the student model, so that the student model can learn more knowledge from abundant unlabeled data and face recognition performance is improved, without any additional processing of the unlabeled data. In the manner of data distillation, the original training set is transformed in multiple ways to obtain multiple transformed training sets; the original training set and the transformed training sets are passed through the same teacher model to obtain face features, and the target face feature of each image in the original training set is determined from the face features of that image in the original training set and in the transformed training sets. Each image of the original training set is input into a student model for face recognition to obtain face features, and the labeled images are classified to obtain classification results. The parameters of the student model are then adjusted according to the face features output by the student model for each image, the target face features corresponding to each image, and the classification results and classification labels of the labeled images, so as to train the student model. In this way the student model learns features fused from the original and transformed training sets, learns more image features, and achieves better generalization.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
An embodiment of the present invention further provides an electronic device, including:
the training method comprises a processor, a memory and a computer program which is stored on the memory and can run on the processor, wherein when the computer program is executed by the processor, each process of the embodiment of the training method for the face recognition model is realized, the same technical effect can be achieved, and the process is not repeated here to avoid repetition.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements each process of the above-mentioned training method for a face recognition model, and can achieve the same technical effect, and is not described here again to avoid repetition.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The training method of the face recognition model and the training apparatus of the face recognition model provided by the invention are introduced in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A training method of a face recognition model is characterized by comprising the following steps:
acquiring an original training set and a transformation training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features;
determining the target face features of the images in the original training set according to the face features of the images in the original training set and the transformed training set;
inputting each image of the original training set into a student model for face recognition to obtain face features, and classifying the labeled images to obtain classification results;
and adjusting parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, the classification results corresponding to the labeled images and the classification labels, so as to train the student model.
2. The method of claim 1, wherein before the inputting of the images of the original training set into a student model for face recognition to obtain face features and the classifying of the labeled images to obtain a classification result, the method further comprises:
and acquiring a student model, and adding model noise to the student model.
3. The method of claim 1, wherein the inputting of the images of the original training set into a student model for face recognition to obtain face features and the classifying of the labeled images to obtain classification results comprises:
adding image noise to the images of the original training set;
and inputting the image with the image noise into a student model to perform face recognition to obtain face features, and classifying the labeled image to obtain a classification result.
4. The method of claim 1, wherein determining the target facial features of the images in the original training set according to the facial features of the images in the original training set and the transformed training set comprises:
and calculating the average characteristic of each image in the original training set as the target human face characteristic of the image according to the human face characteristic of the image and the human face characteristic of the corresponding image in the transformed training set.
5. The method of claim 1, wherein the adjusting parameters of the student model according to the facial features output by the student model for each image, the target facial features corresponding to each image, the classification result corresponding to the labeled image and the classification label to train the student model comprises:
calculating an L2 loss function according to the face features output by the student model to each image and the target face features corresponding to each image;
calculating an arcface loss function according to the classification result and the classification label corresponding to the labeled image;
and adjusting parameters of the student model according to the L2 loss function and the arcface loss function so as to train the student model.
6. The method of claim 2, further comprising:
and taking the trained student model as a new teacher model, and returning to the step of obtaining an original training set and a transformed training set obtained by transforming the original training set so as to repeatedly train the student model.
7. A training device for a face recognition model is characterized by comprising:
the data acquisition module is used for acquiring an original training set and a transformation training set obtained by transforming the original training set; the original training set comprises labeled images with classification labels and unlabeled images without classification labels;
the first input module is used for inputting the images of the original training set and the transformed training set into a pre-trained teacher model for face recognition to obtain face features;
the target characteristic determining module is used for determining the target face characteristics of each image in the original training set according to the face characteristics of each image in the original training set and the transformed training set;
the second input module is used for inputting all the images of the original training set into a student model for face recognition to obtain face features and classifying the images with labels to obtain classification results;
and the adjusting module is used for adjusting the parameters of the student model according to the face features output by the student model to the images, the target face features corresponding to the images, the classification results corresponding to the labeled images and the classification labels so as to train the student model.
8. The apparatus of claim 7, further comprising:
and the student model processing module is used for acquiring the student model and adding model noise to the student model.
9. An electronic device, comprising: processor, memory and a computer program stored on the memory and being executable on the processor, the computer program, when being executed by the processor, implementing the steps of the training method of a face recognition model according to any one of claims 1-6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the training method of a face recognition model according to any one of claims 1-6.
CN202210051165.6A 2022-01-17 2022-01-17 Training method and device for face recognition model Pending CN114463810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051165.6A CN114463810A (en) 2022-01-17 2022-01-17 Training method and device for face recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051165.6A CN114463810A (en) 2022-01-17 2022-01-17 Training method and device for face recognition model

Publications (1)

Publication Number Publication Date
CN114463810A true CN114463810A (en) 2022-05-10

Family

ID=81408766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051165.6A Pending CN114463810A (en) 2022-01-17 2022-01-17 Training method and device for face recognition model

Country Status (1)

Country Link
CN (1) CN114463810A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115700845A (en) * 2022-11-15 2023-02-07 智慧眼科技股份有限公司 Face recognition model training method, face recognition device and related equipment
CN115700845B (en) * 2022-11-15 2023-08-11 智慧眼科技股份有限公司 Face recognition model training method, face recognition device and related equipment

Similar Documents

Publication Publication Date Title
Do et al. Forensics face detection from GANs using convolutional neural network
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
Wang et al. Image aesthetics assessment using Deep Chatterjee's machine
CN107437083B (en) Self-adaptive pooling video behavior identification method
Benkaddour CNN based features extraction for age estimation and gender classification
CN112861945A (en) Multi-mode fusion lie detection method
Tian et al. Sequential deep learning for disaster-related video classification
Shehu et al. Lateralized approach for robustness against attacks in emotion categorization from images
CN114463810A (en) Training method and device for face recognition model
Ashwinkumar et al. Deep learning based approach for facilitating online proctoring using transfer learning
CN113298015B (en) Video figure social relation graph generation method based on graph convolution network
CN112084936B (en) Face image preprocessing method, device, equipment and storage medium
KR20200119042A (en) Method and system for providing dance evaluation service
CN112749686B (en) Image detection method, image detection device, computer equipment and storage medium
CN113553917A (en) Office equipment identification method based on pulse transfer learning
CN112613341A (en) Training method and device, fingerprint identification method and device, and electronic device
CN115546885A (en) Motion recognition method and system based on enhanced space-time characteristics
Hui-bin et al. Recognition of individual object in focus people group based on deep learning
KR20210035535A (en) Method of learning brain connectivity and system threrfor
Gavade et al. Facial Expression Recognition in Videos by learning Spatio-Temporal Features with Deep Neural Networks
Sonkar et al. Iris recognition using transfer learning of inception v3
Goyal et al. Online Attendance Management System Based on Face Recognition Using CNN
Singh et al. Performance Analysis of ELA-CNN model for Image Forgery Detection
Goud et al. Facial Analysis Prediction: Emotion, Eye Color, Age and Gender
CN116958642A (en) Picture classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication