CN116758601A - Training method and device of face recognition model, electronic equipment and storage medium

Publication number: CN116758601A
Application number: CN202210199328.5A
Applicant and assignee: Mashang Xiaofei Finance Co Ltd
Inventors: 吕永春, 朱徽, 周迅溢, 曾定衡
Original language: Chinese (zh)
Legal status: Pending

Abstract

The application discloses a training method and apparatus for a face recognition model, an electronic device and a storage medium. The training method includes: obtaining, through a face feature generation model, first multi-dimensional depth features of each sample image in an unlabeled dataset and second multi-dimensional depth features of each sample image in a labeled dataset; inputting the first and second multi-dimensional depth features into an initial face recognition model and outputting a first recognition result for each first sample image and a second recognition result for each second sample image; determining a loss function from the first and second recognition results and training the initial face recognition model with it; and taking the initial face recognition model at the time the loss function converges as the face recognition model. Because the sample images used in training are more diverse, the face feature generation model can extract rich multi-dimensional depth features, and these rich features give the face recognition model stronger image discrimination capability, so that its accuracy is higher.

Description

Training method and device of face recognition model, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a training method and apparatus for a face recognition model, an electronic device, and a storage medium.
Background
When deep learning algorithms are used to classify face images, labeled data carrying class labels are scarce while unlabeled data without class labels are abundant, which has given rise to semi-supervised deep learning methods. A semi-supervised deep learning method can perform joint training using a small amount of labeled data and a large amount of unlabeled face data, but the recognition accuracy of recognition models obtained by existing semi-supervised training is low.
Disclosure of Invention
In view of the above, the present application provides a training method and apparatus for a face recognition model, an electronic device and a storage medium, which can solve the above problems.
In a first aspect, an embodiment of the present application provides a training method for a face recognition model, where the method includes: obtaining an unlabeled dataset and a labeled dataset, the sample images in the unlabeled dataset being different from the sample images in the labeled dataset, wherein the unlabeled dataset comprises a plurality of first sample images and the labeled dataset comprises a plurality of second sample images; inputting the unlabeled dataset and the labeled dataset into a face feature generation model, and outputting first multidimensional depth features corresponding to the first sample images and second multidimensional depth features corresponding to the second sample images; inputting the first multi-dimensional depth feature and the second multi-dimensional depth feature into an initial face recognition model, and outputting a first recognition result corresponding to each first sample image and a second recognition result corresponding to each second sample image; determining a target loss function according to the first identification result and the second identification result; training the initial face recognition model based on the target loss function until the target loss function converges, and determining the initial face recognition model when the target loss function converges as the face recognition model.
In a second aspect, an embodiment of the present application provides a training apparatus for a face recognition model, where the apparatus includes: the system comprises a data set acquisition module, a feature generation module, a sample identification module, a function construction module and a model training module. The data set acquisition module is used for acquiring an unlabeled data set and a labeled data set, wherein a sample image in the unlabeled data set is different from a sample image in the labeled data set, the unlabeled data set comprises a plurality of first sample images, and the labeled data set comprises a plurality of second sample images; the feature generation module is used for inputting the unlabeled data set and the labeled data set into a face feature generation model and outputting a first multi-dimensional depth feature corresponding to each first sample image and a second multi-dimensional depth feature corresponding to each second sample image; the sample recognition module is used for inputting the first multidimensional depth feature and the second multidimensional depth feature into an initial face recognition model and outputting a first recognition result corresponding to each first sample image and a second recognition result corresponding to each second sample image; the function construction module is used for determining a target loss function according to the first identification result and the second identification result; and the model training module is used for training the initial face recognition model based on the target loss function until the target loss function converges, and determining the initial face recognition model when the target loss function converges as the face recognition model.
In a third aspect, an embodiment of the present application provides a face recognition method, where the method includes: acquiring a face image to be recognized, inputting the face image to be recognized into a face feature generation model, and outputting a multi-dimensional depth feature to be recognized corresponding to the face image to be recognized; inputting the multi-dimensional depth features to be recognized into a face recognition model, and outputting a recognition result of the face images to be recognized, wherein the face recognition model is obtained by training a plurality of first multi-dimensional depth features of an unlabeled data set and a plurality of second multi-dimensional depth features of a labeled data set, sample images in the unlabeled data set are different from sample images in the labeled data set, the unlabeled data set comprises a plurality of first sample images, the labeled data set comprises a plurality of second sample images, the plurality of first multi-dimensional depth features are obtained by extracting each first sample image by the face feature generation model, and the plurality of second multi-dimensional depth features are obtained by extracting each second sample image by the face feature generation model.
In a fourth aspect, an embodiment of the present application provides a face recognition apparatus, including: the system comprises an image acquisition module and an image identification module. The image acquisition module is used for acquiring a face image to be identified, inputting the face image to be identified into a face feature generation model, and outputting multi-dimensional depth features to be identified corresponding to the face image to be identified; the image recognition module is used for inputting the multi-dimensional depth features to be recognized into a face recognition model and outputting a recognition result of the face images to be recognized, wherein the face recognition model is obtained by training a plurality of first multi-dimensional depth features of an unlabeled data set and a plurality of second multi-dimensional depth features of a labeled data set, sample images in the unlabeled data set are different from sample images in the labeled data set, the unlabeled data set comprises a plurality of first sample images, the labeled data set comprises a plurality of second sample images, the plurality of first multi-dimensional depth features are obtained by extracting each first sample image by the face feature generation model, and the plurality of second multi-dimensional depth features are obtained by extracting each second sample image by the face feature generation model.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the above-described method.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, the program code being callable by a processor to perform the above method.
In a seventh aspect, an embodiment of the present application provides a computer program product having instructions stored therein which, when run on a computer, cause the computer to implement the above method.
It can be seen that, in embodiments of the present application, the face recognition model may be trained simultaneously using an unlabeled dataset and a labeled dataset that contain different sample images. Specifically, first, the first multi-dimensional depth features corresponding to the first sample images in the unlabeled dataset and the second multi-dimensional depth features corresponding to the second sample images in the labeled dataset are obtained through the face feature generation model; then the first and second multi-dimensional depth features are input into the initial face recognition model, which outputs a first recognition result for each first sample image and a second recognition result for each second sample image; the first and second recognition results are then used to determine a loss function for the initial face recognition model and to train it; and finally, the initial face recognition model at the time the loss function converges is determined to be the face recognition model. In this embodiment, the sample images used during training are more diverse, so the face feature generation model can extract richer multi-dimensional depth features, and these features give the trained face recognition model stronger image discrimination capability, so that it can identify the categories of images more accurately.
These and other aspects of the application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an application environment of a training method of a face recognition model according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a training method of a face recognition model according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a training method of a face recognition model according to another embodiment of the present application;
Fig. 4 is a flow chart of a training process of an initial data generation model according to another embodiment of the present application;
Fig. 5 is a flowchart illustrating a training process of an initial face recognition model according to an embodiment of the present application;
Fig. 6 is a flowchart illustrating the overall scheme of a training method of a face recognition model according to an embodiment of the present application;
Fig. 7 is a schematic flow chart of a face recognition method according to an embodiment of the present application;
Fig. 8 is a block diagram of a training apparatus of a face recognition model according to an embodiment of the present application;
Fig. 9 is a block diagram of a face recognition apparatus according to an embodiment of the present application;
Fig. 10 is a block diagram of an electronic device according to an embodiment of the present application;
Fig. 11 is a block diagram of a computer-readable storage medium provided by an embodiment of the present application;
Fig. 12 is a block diagram of a computer program product according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.
With the development of artificial intelligence technology, deep learning methods have achieved remarkable success in the field of face recognition. However, face recognition methods based on deep learning depend on the amount of training data, and classification quality is affected by the amount of labeled data: the greater the amount of labeled data, the more effective the trained deep learning method. In practice, however, labeled data are difficult to obtain, since face images need to be labeled manually, which to a certain extent limits the performance of face recognition models based on supervised training. Thus, semi-supervised deep learning methods ensued. A semi-supervised deep learning method can perform joint training using a small amount of labeled data and a large amount of unlabeled face data, but the recognition accuracy of face recognition models obtained by existing semi-supervised training is low.
To solve the above problems, the inventors of the present application found, after long-term study, that randomly acquired labeled data and unlabeled data follow a consistent standard distribution over their respective categories. The inventors therefore propose a training method for a face recognition model that can train the face recognition model simultaneously using an unlabeled dataset and a labeled dataset containing different sample images. Specifically, first, the first multi-dimensional depth features corresponding to the first sample images in the unlabeled dataset and the second multi-dimensional depth features corresponding to the second sample images in the labeled dataset are obtained through the face feature generation model; then the first and second multi-dimensional depth features are input into the initial face recognition model, which outputs a first recognition result for each first sample image and a second recognition result for each second sample image; the first and second recognition results are then used to determine a loss function for the initial face recognition model and to train it; and finally, the initial face recognition model at the time the loss function converges is determined to be the face recognition model. In this embodiment, the sample images used during training are more diverse, so the face feature generation model can extract richer multi-dimensional depth features, and these features give the trained face recognition model stronger image discrimination capability, so that it can identify the categories of images more accurately.
In order to better understand the training method, device, electronic device and storage medium of the face recognition model provided by the embodiment of the application, an application environment suitable for the embodiment of the application is described below.
Referring to fig. 1, fig. 1 is a schematic view of an application environment of a training method of a face recognition model according to an embodiment of the application. The training method, the training device, the electronic equipment and the storage medium of the face recognition model provided by the embodiment of the application can be applied to a face recognition system. Illustratively, the face recognition system 100 may be composed of the terminal device 110 and the server 120 in fig. 1. Wherein the network is used as a medium to provide a communication link between terminal device 110 and server 120. The network may include various connection types, such as wired communication links, wireless communication links, and the like, as embodiments of the application are not limited in this regard.
It should be understood that the terminal device 110, server 120, and network in fig. 1 are merely illustrative. There may be any number of terminal devices, servers, and networks, as desired for implementation.
In some embodiments, terminal device 110 may send sample images to server 120 over the network: first sample images whose categories are unlabeled, forming an unlabeled dataset, and second sample images whose categories are labeled, forming a labeled dataset. After the server 120 receives the sample images of the labeled and unlabeled datasets, the initial face recognition model may be trained by the training method of the face recognition model according to the embodiment of the present application, so as to obtain the face recognition model.
In some embodiments, after training to obtain the face recognition model, the server 120 may further obtain a face image to be recognized sent by the terminal device 110 through the network, and identify, in the server 120, the category of the face image to be recognized using the face recognition model as the recognition result of the face image to be recognized. Optionally, the server 120 may also feed back the recognition result of the face image to be recognized to the terminal device 110. It will be appreciated that the source of the sample images (including the first and second sample images) and the source of the face image to be recognized may be the same or different, i.e., the sample images and the face image to be recognized may originate from the same terminal device 110 or from different terminal devices 110.
The server 120 may be a physical server, a server cluster formed by a plurality of servers, or the like, and the terminal device 110 may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, a wearable device, a smart speaker, or the like. It will be appreciated that embodiments of the present application may also allow multiple terminal devices 110 to access the server 120 simultaneously.
The above application environments are merely examples for facilitating understanding, and it is to be understood that embodiments of the present application are not limited to the above application environments.
The training method, device, electronic equipment and storage medium of the face recognition model provided by the embodiment of the application are explained in detail below through specific embodiments.
Referring to fig. 2, a flowchart of a training method of a face recognition model according to an embodiment of the application is shown. The following will describe the flow shown in fig. 2 in detail, and the training method of the face recognition model specifically may include the following steps:
step S210: an unlabeled dataset and a labeled dataset are obtained, the sample images in the unlabeled dataset being different from the sample images in the labeled dataset, wherein the unlabeled dataset comprises a plurality of first sample images and the labeled dataset comprises a plurality of second sample images.
In the embodiment of the application, firstly, the face recognition model can be pre-trained by using the unlabeled data set and the labeled data set simultaneously. Alternatively, the unlabeled dataset may comprise a plurality of first sample images, and the first sample images are unlabeled data, that is, the class to which the first sample images belong is unknown before the unlabeled dataset is input into the face recognition model. Alternatively, the annotated data set may comprise a plurality of second sample images, and the second sample images are annotated data, that is, each second sample image may be annotated with a category label for representing the target category to which the second sample image actually corresponds.
In some embodiments, the second sample image may be manually annotated data. The annotated dataset may comprise n different target categories, wherein the target categories corresponding to the plurality of second sample images may be the same, i.e. the plurality of second sample images may be face images of the same person. Therefore, if the number of the second sample images is B1, B1 is greater than n, and B1 and n are both positive integers.
In some exemplary embodiments, the sample image in the unlabeled dataset is different from the sample image in the labeled dataset, i.e., the target class to which the first sample image in the unlabeled dataset truly corresponds is different from the class label to which the second sample image in the labeled dataset is labeled. Optionally, after the unlabeled dataset is obtained, the unlabeled dataset may be deduplicated before being input into the face feature generation model.
In some embodiments, in the deduplication process, the initial face recognition model predicts the probability that each first sample image belongs to each target class, and all first sample images whose probability is greater than a first probability threshold are removed from the unlabeled dataset. This is equivalent to removing the first sample images whose category the initial face recognition model can already recognize accurately, while retaining those that cannot yet be accurately recognized. The face recognition model is then trained using the deduplicated unlabeled training set. Alternatively, the first probability threshold may be, for example, 0.8. In other embodiments, during the deduplication process, the similarity between each first sample image and each second sample image may be determined, and all first sample images whose similarity is greater than a similarity threshold are removed from the unlabeled dataset. This corresponds to removing, from the unlabeled dataset, sample images that are very similar to sample images in the labeled dataset, so that the sample images remaining in the unlabeled dataset are different from those in the labeled dataset.
During iterative training, the unlabeled dataset may be deduplicated in the manner of the foregoing embodiments, so that at each iteration the sample images in the current unlabeled dataset are guaranteed to differ from those in the labeled dataset. As the face recognition model is trained, it gradually learns to recognize the categories of the first sample images in the unlabeled dataset; deduplication continuously reduces the first sample images remaining in the unlabeled dataset that cannot yet be accurately recognized, until the face recognition model can recognize the categories of all first sample images.
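As a concrete illustration of the probability-based deduplication described above, the following PyTorch-style sketch keeps only the first sample images the current model cannot yet classify confidently. The function name, tensor shapes, and the use of softmax probabilities are illustrative assumptions; only the 0.8 threshold comes from the text.

```python
import torch
import torch.nn.functional as F

def deduplicate_unlabeled(unlabeled_feats, model, first_prob_threshold=0.8):
    """Keep only the unlabeled samples the current model cannot yet classify
    confidently, mirroring the probability-based deduplication step."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_feats), dim=1)  # (B2, n) class probabilities
        max_prob, _ = probs.max(dim=1)                    # confidence per first sample image
    keep_mask = max_prob <= first_prob_threshold          # drop confidently recognized images
    return unlabeled_feats[keep_mask]
```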
In some embodiments, before the first and second multi-dimensional depth features are input into the initial face recognition model, a pre-constructed neural network model may first be trained using the labeled dataset to obtain the initial face recognition model. Specifically, the labeled dataset can be input into the pre-constructed neural network model for iterative training, and the neural network model after iterative training is used as the initial face recognition model. In each iteration, the initial face feature generation model may first be used to extract the sample face features of each second sample image in the labeled dataset, and the sample face features may be input into the neural network model, which determines the sample category to which each second sample image belongs according to the sample face features. Optionally, a first loss function may be constructed from the recognition error between the sample category and the target category corresponding to the class label of the second sample image, and it is judged whether the first loss function satisfies the second error condition; if not, the parameters of the neural network model are adjusted according to the first loss function; if so, the iterative training ends, the parameters of the neural network model at that point are fixed, and the neural network model at that point is used as the initial face recognition model.
It can be understood that the sample category can be taken as a predicted value and the target category labeled on the second sample image as an expected value, and the classification error between the predicted and expected values can be calculated by constructing a loss function. Therefore, in this embodiment, an initial model loss function may be constructed from the classification error between the sample category and the target category, and the model parameters of the neural network model and the initial face feature generation model are then continuously adjusted according to the initial model loss function, iterating until the initial model loss function converges. At convergence, the iterative training is complete and the model parameters of both models are fixed; the neural network model with fixed parameters can then be used as the initial face recognition model, and the initial face feature generation model with fixed parameters as the face feature generation model.
Alternatively, the initial model loss function may be:

$$\mathcal{L}_{0} = -\frac{1}{n_1}\sum_{i=1}^{n_1}\log\frac{p\left(y_i \mid x_i\right)}{\sum_{j=1}^{n} p\left(y_j \mid x_i\right)}$$

where $n_1$ denotes the number of second sample images ($n_1$ is equal to B1), $p(y_i \mid x_i)$ denotes the probability, predicted by the neural network model, that the second sample image $x_i$ belongs to the target category $y_i$ corresponding to its category label, $n$ is the number of target categories, and $p(y_j \mid x_i)$ denotes the probability, predicted by the neural network model, that the second sample image belongs to each target category $y_j$.
Alternatively, the initial feature generation model may be built on a backbone network, which may be, for example, from the ResNet series. Alternatively, the pre-constructed neural network model may be a classifier, which may include, for example, a fully connected layer, and may be implemented using ArcFace, for example.
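A minimal sketch of how such a pretraining step might look, assuming PyTorch and torchvision: a ResNet-18 backbone standing in for the backbone network and a plain fully connected classifier standing in for the ArcFace head. All class and function names here are illustrative, and cross-entropy is used as the concrete form of the initial model loss function above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class InitialFeatureGenerator(nn.Module):
    """Illustrative initial face feature generation model (ResNet backbone)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)  # d = 512
        self.backbone = backbone

    def forward(self, images):            # (B1, 3, H, W) -> (B1, d)
        return self.backbone(images)

def pretrain_step(feat_gen, classifier, images, labels, optimizer):
    """One iteration of supervised pretraining on the labeled dataset.

    The cross-entropy below corresponds to the initial model loss function:
    the negative log of the softmax probability of each image's labeled class.
    """
    feats = feat_gen(images)              # second multi-dimensional depth features
    logits = classifier(feats)            # (B1, n) scores over target categories
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g. feat_gen = InitialFeatureGenerator(); classifier = nn.Linear(512, n)
```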
Alternatively, the face recognition model may be trained using the second sample images in batches, where B1 represents the number of second sample images used in one batch of training.
Alternatively, if the number of the first sample images is B2, B2 may be greater than n, and B2 is a positive integer. Note that, the first sample image may be input to the face recognition model in batches, and at this time, B2 may represent the number of first sample images used in one batch of training process.
Step S220: and inputting the unlabeled data set and the labeled data set into a face feature generation model, and outputting a first multi-dimensional depth feature corresponding to each first sample image and a second multi-dimensional depth feature corresponding to each second sample image.
In an embodiment of the present application, the first multi-dimensional depth feature may be a depth face feature extracted from the first sample image by the face feature generation model across multiple dimensions. Similarly, the second multi-dimensional depth feature may be a depth face feature extracted from the second sample image by the face feature generation model across multiple dimensions. The depth face features may include features of the facial organs, facial skin features, and the like, and the feature dimension d may be, for example, 512.
Step S230: and inputting the first multidimensional depth feature and the second multidimensional depth feature into an initial face recognition model, and outputting a first recognition result corresponding to each first sample image and a second recognition result corresponding to each second sample image.
It should be noted that, whether a sample image comes from the labeled or the unlabeled dataset, its depth face features reflect the category characteristics of the face in the image, i.e., the category to which the image belongs can be identified from the depth face features.
In some embodiments, a standard face image is set for each target category in the initial face recognition model, and the depth face features extracted from each standard face image by the face feature generation model are obtained and used as the standard multi-dimensional depth features corresponding to each target category. Then, for each second sample image in the labeled dataset, the initial face recognition model may be used to calculate a second similarity between the second multi-dimensional depth feature and each standard multi-dimensional depth feature, determine the second category corresponding to each second sample image from the second similarity, and output the second category as the second recognition result. For each first sample image in the unlabeled dataset, the initial face recognition model can be used to calculate a first similarity between the first multi-dimensional depth feature and each standard multi-dimensional depth feature, determine the first category corresponding to each first sample image from the first similarity, and output the first category as the first recognition result. Optionally, the higher the first similarity, the higher the coincidence between the depth face features contained in the first sample image and those contained in the standard face image, and hence the closer the category of the first sample image is to the target category corresponding to that standard face image; therefore, a target category whose corresponding first similarity is greater than a preset similarity threshold can be used as the first category. Similarly, a target category whose corresponding second similarity is greater than the preset similarity threshold can be used as the second category.
For example, if the first similarity between the first multi-dimensional depth feature of one first sample image and the standard multi-dimensional depth features of 4 standard face images is 88 (corresponding to "target class 1"), 23 (corresponding to "target class 2"), 51 (corresponding to "target class 3"), and 75 (corresponding to "target class 4"), respectively, where the preset similarity threshold is 85, then "target class 1" may be determined as the first recognition result of this first sample image. In some specific embodiments, the target class with the highest first similarity may also be used as the first recognition result of the first sample image.
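A sketch of this similarity comparison, assuming cosine similarity rescaled to [0, 100] so the values line up with the 0-100 figures in the example above; the rescaling and the function name are assumptions, not part of the text:

```python
import torch
import torch.nn.functional as F

def recognize_by_similarity(sample_feat, standard_feats, sim_threshold=85.0):
    """Compare a multi-dimensional depth feature against the standard features.

    sample_feat:    (d,) feature of one sample image.
    standard_feats: (n, d) standard multi-dimensional depth features, one per
                    target category.
    """
    sims = 100.0 * F.cosine_similarity(sample_feat.unsqueeze(0), standard_feats, dim=1)
    best_sim, best_class = sims.max(dim=0)
    if best_sim >= sim_threshold:
        return best_class.item()          # target category as the recognition result
    return None                           # no category exceeds the threshold
```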
In other embodiments, for each first sample image, each first multidimensional depth feature may be input into an initial face recognition model to obtain a first classification probability of each first sample image belonging to each target class, and then a target class corresponding to the first classification probability greater than a first preset probability value is determined as a first recognition result, where each first classification probability corresponds to one target class; for each second sample image, each second multidimensional depth feature can be input into an initial face recognition model to obtain second classification probability of each second sample image belonging to each target class, and then a target class corresponding to the second classification probability larger than the first preset probability value is determined to be a second recognition result, wherein each second classification probability corresponds to one target class.
Step S240: and determining an objective loss function according to the first identification result and the second identification result.
Step S250: training the initial face recognition model based on the target loss function until the target loss function converges, and determining the initial face recognition model when the target loss function converges as the face recognition model.
Since the sample image in the unlabeled dataset is different from the sample image in the labeled dataset, the first recognition result should also be different from the second recognition result. Thus, in some embodiments, the objective loss function may be constructed from the recognition error between each first recognition result and each second recognition result. For example, if the unlabeled dataset includes image 1, image 2, the labeled dataset includes image 3, image 4, and image 5, assuming that the first recognition result of image 1 is recognized as result 1, the first recognition result of image 2 is recognized as result 2, the second recognition result of image 3 is recognized as result 3, the second recognition result of image 4 is recognized as result 4, and the second recognition result of image 5 is recognized as result 5, respectively, the recognition errors between the result 1 of image 1 and the result 3 of image 3, the result 4 of image 4, and the result 5 of image 5 in the labeled dataset are calculated, respectively, the recognition errors between the result 2 of image 2 and the result 3 of image 3, the result 4 of image 4, and the result 5 of image 5 in the labeled dataset are calculated, and then the target loss function is constructed according to the recognition errors corresponding to image 1 and the recognition errors corresponding to image 2.
Optionally, the objective loss function may be a sum of recognition errors between each first recognition result and each second recognition result, and when the initial face recognition model is trained based on the objective loss function, model parameters of the initial face recognition model may be continuously adjusted so that the objective loss function (i.e. the recognition error between each first recognition result and each second recognition result) becomes larger until the objective loss function reaches a preset loss threshold value, which indicates that the objective loss function converges, and the model parameters of the initial face recognition model when the preset loss threshold value is reached are fixed as the model parameters of the face recognition model obtained by training.
In other embodiments, the first multi-dimensional depth feature of the unlabeled dataset may be reconstructed by the data generation model to obtain a first multi-dimensional reconstructed feature, so that a third recognition result recognized by the first sample image according to the first multi-dimensional reconstructed feature satisfies a distribution condition of each target class in the labeled dataset. That is, the model may be generated by the data that the reconstructed unlabeled dataset is consistent with the distribution of the target categories in the labeled dataset. At this time, the third recognition result may be used as a pseudo tag of each first sample image in the unlabeled dataset, the first loss function may be determined by respectively calculating the pseudo tag of each first sample image and the first recognition result recognized by the initial face recognition model, the second loss function may be determined by each class tag of each second sample image and the second recognition result recognized by the initial face recognition model, and then the target loss function may be determined according to the first loss function and the second loss function. Finally, training the initial face recognition model by using the target loss function.
In summary, according to the training method for a face recognition model provided in this embodiment, the face recognition model may be trained simultaneously using an unlabeled dataset and a labeled dataset containing different sample images. Specifically, first, the first multi-dimensional depth features corresponding to the first sample images in the unlabeled dataset and the second multi-dimensional depth features corresponding to the second sample images in the labeled dataset are obtained through the face feature generation model; then the first and second multi-dimensional depth features are input into the initial face recognition model, which outputs a first recognition result for each first sample image and a second recognition result for each second sample image; the first and second recognition results are then used to determine a loss function for the initial face recognition model and to train it; and finally, the initial face recognition model at the time the loss function converges is determined to be the face recognition model. In this embodiment, the sample images used during training are more diverse, so the face feature generation model can extract richer multi-dimensional depth features, and these features give the trained face recognition model stronger image discrimination capability, so that it can identify the categories of images more accurately.
In some implementations, optionally, on the basis of the foregoing embodiment, before determining the objective loss function according to the first recognition result and the second recognition result, the present embodiment may reconstruct the first multi-dimensional depth feature of the unlabeled dataset through the data generation model to obtain a first multi-dimensional reconstruction feature, so that a third recognition result recognized by the first sample image according to the first multi-dimensional reconstruction feature satisfies a distribution condition of each objective category in the labeled dataset.
Referring to fig. 3, a flowchart of a training method of a face recognition model according to another embodiment of the application is shown. Optionally, in the processing procedure shown in fig. 3, each second sample image in the annotated data set corresponds to one target category, which specifically includes the following steps:
step S310: and carrying out feature reconstruction on each first multidimensional depth feature according to the data generation model to obtain first multidimensional reconstruction features corresponding to the first multidimensional depth features.
Step S320: and identifying the target categories corresponding to the first sample images according to the first multidimensional reconstruction features, and taking the target categories as third identification results, wherein the third identification results meet standard distribution, and the standard distribution is used for representing the distribution condition of each target category in the marked data set.
In some embodiments, if the sample images in the labeled and unlabeled datasets are randomly acquired face images without manual screening, they conform to the same distribution. In other embodiments, the sample images in the labeled dataset (i.e., the second sample images) are often data screened by domain experts, while the sample images in the unlabeled dataset (i.e., the first sample images) have not been manually screened, so abnormal data whose distribution differs from that of the labeled dataset are likely to occur in the unlabeled dataset. In this case, the unlabeled dataset may first be filtered using the labeled dataset, so that the sample images in the unlabeled dataset conform to the same distribution as the labeled dataset.
Since the sample images in the labeled and unlabeled datasets conform to the same distribution, the depth face features extracted from them that embody category characteristics also conform to the same distribution. According to the foregoing embodiment, before the unlabeled dataset is input into the face recognition model, the category to which each first sample image belongs is unknown, so after the first sample image is recognized by the initial face recognition model to obtain the first recognition result, it cannot be determined whether the first recognition result is correct.
Based on this, in the embodiment of the present application, the distribution condition of each target class in the labeled data set may be used as a standard distribution, and then the first multidimensional depth feature is reconstructed in the data generation model according to the standard distribution, so that the third recognition results corresponding to all the reconstructed first multidimensional reconstruction features satisfy the standard distribution. The reconstruction can lead the category characteristic represented by the first multidimensional depth feature extracted by the face feature generation model to be clearer, and lead the first multidimensional reconstruction feature to more accurately highlight the category of unlabeled data. Therefore, in the embodiment of the application, the third recognition result corresponding to each first multi-dimensional reconstruction feature can be used as the standard result of the first sample image, the first recognition result determined by the first multi-dimensional depth feature extracted by the face feature generation model is used as the prediction result of the initial face recognition model, and the prediction result (i.e. the first recognition result) of the initial face recognition model can be judged to be correct or incorrect according to the standard result (i.e. the third recognition result), so that the verification of the prediction result of the initial face recognition model on the unlabeled data set is achieved in training.
In the embodiment of the application, before performing feature reconstruction on each first multidimensional depth feature according to the data generation model to obtain first multidimensional reconstruction features corresponding to each first multidimensional depth feature, an initial data generation model to be trained can be trained by using the marked data set to obtain the data generation model.
Referring to fig. 4, a flowchart of a training process of an initial data generation model according to another embodiment of the present application is shown. Optionally, the method specifically comprises the following steps:
step S410: and inputting each second multidimensional depth feature into an initial data generation model to be trained to obtain initial reconstruction features corresponding to each second multidimensional depth feature.
In the embodiment of the application, the second multidimensional depth features extracted by the face feature generation model from each second sample image in the marked data set can be input into the initial data generation model to be trained. The initial data generation model is trained through the marked data set, so that the trained data generation model can enable the reconstructed depth face characteristics to accord with standard distribution. Optionally, the initial data generation model to be trained may reconstruct the second multidimensional depth features, and output initial reconstructed features corresponding to each second multidimensional depth feature.
Step S420: and inputting the initial reconstruction features into the initial face recognition model to obtain target categories corresponding to the second sample images respectively, and taking the target categories as initial recognition results.
Step S430: and determining an initial loss function according to the second identification result and the initial identification result.
Step S440: training the initial data generation model based on the initial loss function until the initial loss function converges, and determining the initial data generation model when the initial loss function converges as the data generation model, wherein the recognition result corresponding to the reconstructed feature obtained by reconstructing each second multidimensional depth feature by the initial data generation model when the initial loss function converges accords with the standard distribution.
It should be noted that, since the data input into the initial data generating model in the training process is the second multidimensional depth features of the labeled data set, if the initial data generating model is to enable the initial recognition result corresponding to the reconstructed initial reconstruction feature to meet the standard distribution, whether the initial recognition result is consistent with the second recognition result corresponding to the second multidimensional depth feature or not can be determined in the training process. Based on this, in the training process, the recognition error between the second recognition result and the initial recognition result may be determined, and then the initial loss function is constructed according to the recognition error between the second recognition result and the initial recognition result.
Optionally, when training the initial data generating model based on the initial loss function, model parameters of the initial data generating model may be continuously adjusted so that the initial loss function (i.e. the recognition error between the second recognition result and the initial recognition result) becomes smaller until the initial loss function reaches the first loss threshold, which indicates that the initial loss function converges, and the model parameters of the initial data generating model when the first loss threshold is reached are fixed as the model parameters of the data generating model obtained by training.
Alternatively, the initial data generation model to be trained may be composed of an encoder, a resampling module, and a decoder.
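A minimal sketch of one possible encoder/resampling/decoder structure, in the style of a variational autoencoder; the layer widths, latent dimension, and Gaussian latent are assumptions, since the text only names the three components:

```python
import torch
import torch.nn as nn

class DataGenerationModel(nn.Module):
    """Illustrative encoder / resampling / decoder structure for feature
    reconstruction. Layer sizes here are assumptions for illustration."""
    def __init__(self, feat_dim=512, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.to_mean = nn.Linear(256, latent_dim)      # m: mean of the output distribution
        self.to_logvar = nn.Linear(256, latent_dim)    # log of the variance v
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )

    def forward(self, feats):
        h = self.encoder(feats)
        m, logvar = self.to_mean(h), self.to_logvar(h)
        std = torch.exp(0.5 * logvar)
        z = m + std * torch.randn_like(std)            # resampling (reparameterization)
        return self.decoder(z), m, logvar              # reconstructed feature, mean, log-variance
```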
In some embodiments of the present application, a similarity between the initial reconstructed feature and the second multi-dimensional depth feature may be calculated, and then an identification error between the second identification result and the initial identification result may be determined from the similarity between the initial reconstructed feature and the second multi-dimensional depth feature.
In other embodiments of the application, before the initial data generation model to be trained is trained using the labeled dataset to obtain the data generation model, a neural network model constructed in advance may first be trained using the labeled dataset to obtain the initial face recognition model. Optionally, the initial weight matrix of the initial face recognition model may be obtained at the same time as the initial face recognition model. If the number of target classes is n, and the dimension of the depth face features extracted from the sample image by the initial face recognition model is d, the initial weight matrix is W_{n×d}; each element in the initial weight matrix represents a weight value of a target class in one dimension of the depth face feature, which may be represented using a feature parameter w. It will be appreciated that the initial weight matrix may be represented by n column vectors, each column vector corresponding to the feature parameters (weight values) of one target class over the d feature dimensions. Illustratively, the n target classes are class 1, class 2, ..., class n; taking class 2 as an example, the column vector of the initial weight matrix corresponding to class 2 contains the feature parameters of class 2 over the d-dimensional face features.
It should be noted that, because the labeled dataset corresponds to a class label, the initial face recognition model obtained by training the labeled sample image in advance has a certain face recognition capability, so the initial weight matrix can represent the proportion of the face features of each target class in the labeled dataset, and further the column vector of the initial weight matrix can be used for representing the distribution condition of the face features of the specific target class.
In the embodiment of the application, the column vector of the initial weight matrix can be used as the characteristic standard value of the initial data generation model to be trained, and the initial reconstruction characteristic can be used as the characteristic prediction value of the initial data generation model to be trained. And then taking the error between the characteristic standard value and the characteristic predicted value as the identification error.
The initial loss function may be constructed based on an error between the feature standard value and the feature predicted value, and then parameters of the data generation model to be trained may be continuously adjusted based on the initial loss function, and iterated until the initial loss function converges, so that recognition results recognized after the training-completed data generation model reconstructs the input depth face features may satisfy the standard distribution.
Alternatively, the initial loss function may be:

$$\mathcal{L}_{G} = \frac{1}{N}\sum_{i=1}^{N}\left\|w_i - G\left(F_i\right)\right\|_2 + \frac{1}{2}\sum_{k=1}^{d}\left(m_k^2 + v_k - \log v_k - 1\right)$$

wherein the first term is the reconstruction loss and the second term is the KL loss. $w_i$ denotes a column vector of the initial weight matrix, and $F_i$ denotes the second multi-dimensional depth feature of each second sample image input into the initial data generation model. $G(\cdot)$ denotes the initial data generation model, so $G(F_i)$ denotes the initial reconstructed feature of each second sample image. $\|w_i - G(F_i)\|_2$ is the second-order norm used to calculate the distance between $w_i$ and $G(F_i)$, i.e., the error between the feature standard value and the feature predicted value. $m$ and $v$ denote the mean and variance of the standard distribution output by the initial data generation model, and $d$ and $N$ are the dimension of the depth face features and the number of second sample images, respectively. Alternatively, $N$ is equal to B1. By including both the reconstruction loss and the KL loss in the initial loss function, the features reconstructed by the data generation model retain the category characteristics of the multi-dimensional depth features before reconstruction, i.e., the recognition results obtained from the features before and after reconstruction differ little, and the recognition results corresponding to the reconstructed features conform to the standard distribution.
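Under the reconstruction of the loss given above, a direct implementation might look as follows; the pairing of each reconstructed feature with the weight-matrix column of its labeled category, and the batch averaging, are assumptions:

```python
import torch

def initial_loss(recon, w_cols, m, logvar):
    """Reconstruction + KL loss for training the initial data generation model.

    recon:     (N, d) reconstructed features G(F_i).
    w_cols:    (N, d) column vectors w_i of the initial weight matrix, one per
               sample's labeled target category (an assumed pairing).
    m, logvar: (N, latent_dim) mean and log-variance from the encoder.
    """
    recon_loss = torch.norm(w_cols - recon, p=2, dim=1).mean()  # ||w_i - G(F_i)||_2
    kl_loss = 0.5 * torch.sum(m.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()
    return recon_loss + kl_loss
```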
Step S330: and determining a first loss function according to the first identification result and the third identification result.
Based on the foregoing embodiment, it can be known that, by using the third recognition result as the standard result of the unlabeled dataset, the predicted result (i.e., the first recognition result) of the unlabeled dataset can be verified by the initial face recognition model. Thus, in some embodiments, an identification error between the first identification result and the third identification result may be determined, and then the first penalty function may be constructed based on the identification error between the first identification result and the third identification result.
Step S340: and determining a second loss function according to the target category corresponding to the second sample image and the second identification result.
Optionally, the target category corresponding to the second sample image may be a target category corresponding to a category label marked by the second sample image. Alternatively, the target category corresponding to the second sample image may be used as a standard result of the labeled dataset, and the second recognition result may be used as a prediction result of the initial face recognition model on the labeled dataset. Thus, the recognition error between the second recognition result (i.e. the prediction result made by the initial face recognition model on the annotated data set) and the target class corresponding to the second sample image (i.e. the standard result of the annotated data set) may be determined first, and then the second loss function may be constructed according to the recognition error between the second recognition result and the target class corresponding to the second sample image.
Step S350: the target loss function is determined based on the first loss function and the second loss function.
In some exemplary embodiments, the sum of the first loss function and the second loss function may be taken as the target loss function.
At this time, the verification of the first recognition result of the unlabeled dataset recognized by the initial face recognition model using the third recognition result can be realized by constructing the first loss function, and the verification of the second recognition result of the labeled dataset recognized by the initial face recognition model using the labeled target class can be realized by constructing the second loss function. The target loss function determined by the first loss function and the second loss function is restrained, so that the recognition error of the initial face recognition model can be reduced, and the purpose of improving the recognition accuracy of the initial face recognition model in training is achieved.
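As an illustration of assembling the target loss, the sketch below uses cross-entropy as the concrete recognition-error measure for both terms; the text itself only speaks of recognition errors, so this choice, like the function names, is an assumption:

```python
import torch.nn.functional as F

def target_loss(unlabeled_logits, pseudo_labels, labeled_logits, true_labels):
    """Target loss as the sum of the first and second loss functions.

    first loss:  error between the first recognition result (unlabeled logits)
                 and the third recognition result used as pseudo labels.
    second loss: error between the second recognition result (labeled logits)
                 and the labeled target categories.
    """
    first_loss = F.cross_entropy(unlabeled_logits, pseudo_labels)
    second_loss = F.cross_entropy(labeled_logits, true_labels)
    return first_loss + second_loss
```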
In some embodiments of the present application, the step of determining the first recognition result and the second recognition result in the initial face recognition model may refer to the following procedure.
Referring to fig. 5, a flowchart of a training process of an initial face recognition model according to an embodiment of the application is shown. Optionally, step S230 may specifically include the following steps:
Step S510: and inputting each first multidimensional depth feature into the initial face recognition model to obtain a first classification probability of each first sample image belonging to each target class.
Step S520: and determining a target category corresponding to a first classification probability larger than a first preset probability value as the first recognition result, wherein each first classification probability corresponds to one target category.
Step S530: and inputting each second multidimensional depth feature into the initial face recognition model to obtain a second classification probability of each second sample image belonging to each target class.
Step S540: and determining a target class corresponding to the second classification probability larger than a second preset probability value as the second recognition result, wherein each second classification probability corresponds to one target class.
In some implementations, an initial weight matrix of an initial face recognition model may be obtained. Wherein each element within the initial weight matrix is used to represent a weight value for each target class in the respective dimension of the depth face feature.
And then, multiplying the first multidimensional depth features corresponding to the first sample images by the initial weight matrix respectively to obtain first classification probabilities that the first sample images belong to the target categories, and multiplying the second multidimensional depth features corresponding to the second sample images by the initial weight matrix respectively to obtain second classification probabilities that the second sample images belong to the target categories.
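As a hedged sketch of this step in PyTorch (the softmax conversion from scores to probabilities is an assumption; the patent only speaks of classification probabilities, and the names are illustrative):

```python
import torch.nn.functional as F

def classification_probs(features, weight):
    # features: (B, d) multi-dimensional depth features for a batch of sample images
    # weight:   (n, d) initial weight matrix, one row of weight values per target class
    logits = features @ weight.t()      # (B, n) score of each sample for each target class
    return F.softmax(logits, dim=1)     # classification probability per target class
```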
During training of the initial face recognition model, the model parameters are adjusted continuously, so the initial weight matrix also changes continuously. Therefore, each time the model parameters are adjusted, the first and second classification probabilities can be recalculated from the current initial weight matrix to obtain a new first recognition result and a new second recognition result. The value of the target loss function is then computed from these new recognition results, convergence is judged from that value, and it is decided whether to continue adjusting the model parameters. This repeats until the target loss function converges, at which point training is complete and the initial face recognition model at convergence is taken as the trained face recognition model.
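The adjust-recompute-check cycle can be sketched as a plain loop; the convergence test used here (loss change below a tolerance) and all names are assumptions, and `model`, `optimizer`, and `compute_loss` are presumed to be PyTorch objects:

```python
def train_until_converged(model, compute_loss, optimizer, tol=1e-4, max_steps=100000):
    # compute_loss(model) must recompute the target loss from the model's current
    # weight matrix, i.e., from new first and second recognition results.
    prev = float("inf")
    for _ in range(max_steps):
        loss = compute_loss(model)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if abs(prev - loss.item()) < tol:   # treat a tiny change as convergence
            break
        prev = loss.item()
    return model
```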
By computing, in the initial face recognition model, the probability that each sample image belongs to each target category and comparing the resulting probability values, the target category whose probability exceeds the first preset probability value (for a first sample image) or the second preset probability value (for a second sample image) can be taken as the recognition result of that sample image. In other words, the target categories with sufficiently high recognition probability are filtered out as the recognition results determined by the initial face recognition model.
In some embodiments, in the process of adjusting the initial face recognition model, the process of determining the first classification probability may further include the steps of:
and splicing the weight value of each target class corresponding to each dimension of the depth face features in the initial weight matrix with the first multidimensional depth features corresponding to each first sample image to obtain a first weight matrix of the initial face recognition model. And then determining a first classification probability of the first sample image belonging to each target category according to the first multidimensional depth feature and the first weight matrix.
Optionally, in the initial weight matrix of the initial face recognition model, if the number of target categories is n and the dimension of the depth face features extracted from an image by the face feature generation model is d, the initial weight matrix is W_{n×d}. In some embodiments, the initial weight matrix may be represented by n column vectors, each corresponding to the initial feature parameter (composed of d weight values) of one target class across the d feature dimensions.
In some embodiments, the dimension of the initial feature parameters is d and the dimension of the first multi-dimensional depth features is also d, so the initial feature parameters of each target category in the initial weight matrix can be spliced with each first multi-dimensional depth feature to obtain the first weight matrix of the initial face recognition model. In the initial weight matrix, the number of initial feature parameters is the number n of target categories and the number of first multi-dimensional depth features is the number B2 of first sample images, so the spliced first weight matrix is W1 ∈ ℝ^{(n+B2)×d}, where ℝ denotes the set of real numbers.
After splicing, the initial feature parameters in the first weight matrix are updated to new target feature parameters. In some embodiments, a first sample similarity between each first multi-dimensional depth feature and each target feature parameter in the first weight matrix may be computed, and the first classification probability that each first sample image belongs to each target category may then be calculated from the first sample similarity. Optionally, the first sample similarity may be a cosine similarity. Optionally, the first sample similarities may be input into the classifier to obtain class-related n-dimensional logits, which may be used as the first classification probabilities for the respective target categories.
In other embodiments, the data amplification device may be used to amplify the first multi-dimensional depth features of the unlabeled dataset s times, yielding an augmented feature set in ℝ^{(s·B2)×d}, where ℝ again denotes the set of real numbers. It will be appreciated that after amplification there are s·B2 d-dimensional first face features. In this embodiment, the augmented features are spliced with each target feature parameter in the first weight matrix to obtain a new first weight matrix, which is then used together with the first multi-dimensional depth features to determine the first classification probability of each first sample image for each target category. Amplifying s times is equivalent to training the initial face recognition model s times with the same unlabeled dataset; this augmentation effectively increases the number of training passes and can noticeably improve the accuracy of the trained face recognition model. A sketch of this splicing appears below.
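A minimal sketch of the amplification-and-splice step: plain repetition stands in for whatever transformation the data amplification device actually applies, the cosine-similarity logits assume L2 normalization, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def spliced_logits(features, weight, s=1):
    # features: (B2, d) first multi-dimensional depth features of the unlabeled batch
    # weight:   (n, d)  target feature parameters, one per target class
    # s:        amplification factor (s > 1 repeats the unlabeled features s times)
    aug = features.repeat(s, 1)              # (s*B2, d) amplified features
    w1 = torch.cat([weight, aug], dim=0)     # ((n + s*B2), d) spliced weight matrix
    feats_n = F.normalize(features, dim=1)   # L2-normalize so dot products are cosines
    w1_n = F.normalize(w1, dim=1)
    return feats_n @ w1_n.t()                # cosine similarities used as logits
```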
In some embodiments, after the first classification probability and the second classification probability are determined, a target class corresponding to the first classification probability greater than the first preset probability value may be determined as a first recognition result of the first sample image, and a target class corresponding to the second classification probability greater than the second preset probability value may be determined as a second recognition result of the second sample image. Optionally, the first preset probability value and the second preset probability value may be preset, and the first preset probability value and the second preset probability value may be equal or unequal.
Then, the target loss function is determined from the first and second recognition results in the manner of the foregoing embodiments. For example, a first loss function is determined from the first recognition result and the third recognition result, a second loss function is determined from the second recognition result and the target category corresponding to the second sample image, and the target loss function is determined based on the first and second loss functions.
Alternatively, the first loss function corresponding to the unlabeled dataset may be

Loss_unlabel = -(1/n_2) Σ_{i=1}^{n_2} log( p_i⁺ / Σ_{j=1}^{n} p_{i,j} )

where n_2 denotes the number of first sample images in the unlabeled dataset (n_2 equals B2), p_i⁺ denotes the probability value that the i-th first sample image belongs to the class corresponding to its pseudo label, n is the number of target categories, and p_{i,j} denotes the probability, predicted by the initial face recognition model, that the i-th first sample image belongs to target class y_j (i.e., the first classification probability).
In the embodiment of the application, the initial face recognition model can be trained with the labeled dataset at the same time as it is trained with the unlabeled dataset. In that case, when computing the second loss function corresponding to the labeled dataset, all first sample images in the unlabeled dataset may be treated as negative samples. In some implementations, the second classification probability that each second sample image belongs to each target category may be determined from the second multi-dimensional depth features extracted by the face feature generation model for each second sample image, in conjunction with the first weight matrix. Similar to the calculation of the first classification probability, a second sample similarity between each second multi-dimensional depth feature and each target feature parameter in the first weight matrix may optionally be computed, and the second classification probability that each second sample image belongs to each target category may then be calculated from the second sample similarity. Likewise, the second sample similarity may be a cosine similarity. Optionally, class-related n-dimensional logits may be obtained after inputting the second sample similarities into the fully connected layer, and used as the second classification probabilities for the respective target categories.
Alternatively, the second loss function corresponding to the labeled dataset may be

Loss_label = -(1/n_1) Σ_{i=1}^{n_1} log( p_{i,y_i} / Σ_{j=1}^{n} p_{i,j} )

where n_1 denotes the number of second sample images in the labeled dataset (n_1 equals B1), p_{i,y_i} denotes the probability, predicted by the face recognition model, that the i-th second sample image belongs to its annotated target class y_i, n is the number of target categories, and p_{i,j} denotes the probability, predicted by the initial face recognition model, that the i-th second sample image belongs to target class y_j (i.e., the second classification probability for each target class).
In some embodiments, the target loss function of the initial face recognition model may be Loss_3 = Loss_label + Loss_unlabel. When training the initial face recognition model based on the target loss function, the model parameters can be adjusted continuously until the target loss function converges; the model parameters at convergence are then fixed and used as the model parameters of the trained face recognition model.
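Under the cross-entropy reading of the two formulas above, the target loss can be sketched as follows; using `pseudo_labels` to stand for the classes derived from the third recognition results is an assumption, as are the other names:

```python
import torch.nn.functional as F

def target_loss(logits_labeled, labels, logits_unlabeled, pseudo_labels):
    # logits_labeled:   (n1, n) class scores for the labeled batch
    # labels:           (n1,)   annotated target classes
    # logits_unlabeled: (n2, n) class scores for the unlabeled batch
    # pseudo_labels:    (n2,)   pseudo-label classes for the first sample images
    loss_label = F.cross_entropy(logits_labeled, labels)              # second loss function
    loss_unlabel = F.cross_entropy(logits_unlabeled, pseudo_labels)   # first loss function
    return loss_label + loss_unlabel                                  # Loss_3
```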
After the face recognition model is adjusted by using the unlabeled data set, the recognition accuracy of the face recognition model is improved. It should be noted that, after each adjustment, the unlabeled dataset may be deduplicated. For example, a face recognition model may be used to calculate a sample probability that each first sample image in the unlabeled dataset belongs to each target class, and for each first sample image, if there is a target probability greater than a first probability threshold in the sample probabilities of the respective target classes, then the target class corresponding to the target probability greater than the first probability threshold may be used as a class label of the first sample image, and then the first sample image may be rejected from the unlabeled dataset. Alternatively, the rejected first sample image may be added to the annotated data set.
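A minimal sketch of this deduplication pass, with all names illustrative; the threshold comparison follows the first probability threshold described above:

```python
def deduplicate(unlabeled_images, probs, threshold):
    # unlabeled_images: list of first sample images
    # probs:            (N, n) tensor of sample probabilities per target class
    # threshold:        the first probability threshold
    max_prob, max_class = probs.max(dim=1)
    remaining, promoted = [], []
    for img, p, c in zip(unlabeled_images, max_prob, max_class):
        if p > threshold:
            promoted.append((img, int(c)))   # class label assigned, leaves the unlabeled set
        else:
            remaining.append(img)
    return remaining, promoted               # promoted pairs may join the labeled dataset
```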
It will be appreciated that, as the accuracy of the initial face recognition model increases, fewer and fewer first sample images will remain in the unlabeled dataset.
Referring to fig. 6, a flow chart of an overall scheme of a training method of a face recognition model according to an embodiment of the application may specifically include the following steps:
step one: the labeled dataset is used to train the face feature generation model and the initial face recognition model.
In this embodiment, an initial feature generation model (which may be built on a skeleton network such as the ResNet series) may first be trained using the labeled dataset, and the initial face recognition model may be obtained by training a pre-built neural network model with the labeled dataset. The second sample images in the labeled dataset may be face images; the initial face feature generation model may extract a d-dimensional depth feature for each face image (for example, d = 512), and the pre-built neural network model (taking a fully connected layer as an example) may convert the d-dimensional depth feature into class-related n-dimensional logits, where n is the number of target categories in the labeled dataset. Each d-dimensional depth feature and the target category (i.e., class label) corresponding to each face image are taken as the input of the pre-built neural network model (such as ArcFace), which outputs the target category corresponding to each d-dimensional depth feature; the classification strategy may be measured with cosine similarity. The loss function of the initial face feature generation model and the pre-built neural network model, called the initial model loss function, is then calculated. During the training of step one, once the initial model loss function converges, the parameters of the face feature generation model and the initial face recognition model, together with the weight matrix W_{n×d}, are obtained.
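A hedged sketch of step one, omitting the ArcFace angular margin for brevity and treating the head as a plain classifier; `backbone`, `head`, `loader`, and `optimizer` are illustrative names:

```python
import torch.nn.functional as F

def train_step_one(backbone, head, loader, optimizer):
    # backbone: initial feature generation model (e.g., a ResNet trunk) mapping
    #           face images to d-dimensional depth features
    # head:     pre-built neural network model mapping d-dimensional features
    #           to class-related n-dimensional logits
    for images, labels in loader:
        logits = head(backbone(images))
        loss = F.cross_entropy(logits, labels)   # initial model loss function (sketch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```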
Step two: a model is generated using the labeled dataset training data.
In this embodiment, the parameters of the face feature generation model and the initial face recognition model obtained in the first step are respectively used as the initial parameters of the face feature generation model and the initial face recognition model in the second step.
And inputting the marked data set into a face feature generation model, and extracting d-dimensional depth features of each face image.
The data generation model includes an encoder, a resampling module, and a decoder, and may be implemented, for example, as a Variational Auto-Encoder (VAE). The data generation model takes the d-dimensional depth features extracted by the face feature generation model as input and produces resampled d-dimensional depth features through the three modules. The encoder predicts the distribution of the data from the depth features, outputting a mean and a variance. The resampling module samples new features from the predicted distribution parameters (mean and variance). Finally, the decoder reconstructs the sampled features into d-dimensional features. The weight matrix W_{n×d} from step one may be used to characterize the standard distribution of the respective target classes. The data generation model may calculate the error between the predicted distribution and the standard distribution using the initial loss function described earlier. During the training of step two, once the initial loss function converges, the parameters of the face feature generation model, the initial face recognition model, and the data generation module are obtained.
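A minimal VAE sketch over d-dimensional depth features, assuming fully connected encoder and decoder layers (the patent does not fix the architecture); the hidden and latent sizes are illustrative:

```python
import torch
import torch.nn as nn

class FeatureVAE(nn.Module):
    # Encoder -> resampling (reparameterization) -> decoder over depth features.
    def __init__(self, d=512, h=256, z=128):
        super().__init__()
        self.encoder = nn.Linear(d, 2 * z)   # predicts mean and log-variance
        self.decoder = nn.Sequential(nn.Linear(z, h), nn.ReLU(), nn.Linear(h, d))

    def forward(self, feats):
        mu, log_var = self.encoder(feats).chunk(2, dim=1)
        eps = torch.randn_like(mu)                   # resampling module
        latent = mu + eps * (0.5 * log_var).exp()
        return self.decoder(latent), mu, log_var     # reconstructed d-dim features
```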
Step three: and training a face recognition model by combining the marked data and the unmarked data.
And step two, obtaining parameters of the face feature generation model, parameters of the initial face recognition model and parameters of the data generation module, and respectively taking the parameters as the initial parameters of the face feature generation model, the initial parameters of the initial face recognition model and the initial parameters of the data generation module in the step three.
First, the face feature generation model is used to extract the d-dimensional depth features of the labeled dataset and the unlabeled dataset. The depth features of the labeled data F_label ∈ ℝ^{d×B_label}, the depth features of the unlabeled data F_unlabel ∈ ℝ^{d×B_unlabel}, and the weight matrix of the last fully connected layer W ∈ ℝ^{d×n} are then each L2-normalized, in preparation for computing cosine similarities. Here d, B_label, and B_unlabel denote the dimension of the face features, the number of labeled samples in the batch, and the number of unlabeled samples in the batch, respectively, and n is the number of target categories.
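Sketched in PyTorch, with tensor shapes following the d-by-batch convention read from the text; the random tensors are placeholders for the real features and weights, and the sizes are illustrative:

```python
import torch
import torch.nn.functional as F

d, b_label, b_unlabel, n = 512, 64, 64, 1000                 # illustrative sizes
f_label = F.normalize(torch.randn(d, b_label), dim=0)        # labeled-batch depth features
f_unlabel = F.normalize(torch.randn(d, b_unlabel), dim=0)    # unlabeled-batch depth features
w = F.normalize(torch.randn(d, n), dim=0)                    # last fully connected layer weights
# Every column now has unit L2 norm, so any dot product of columns is a cosine similarity.
```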
Then, the first sample images in the unlabeled dataset are deduplicated, to ensure that the categories to which they actually belong are not present in the labeled dataset. Specifically, after activation through the last fully connected layer and softmax, first sample images whose softmax activation value exceeds the first probability threshold are filtered out.
Then, a second loss function of the labeled dataset of the jointly trained initial face recognition model is calculated.
For the second loss function of the labeled dataset, since the deduplication strategy is used, every first sample image belongs to a class different from those in the labeled dataset, so all first sample images can be treated as negative samples when calculating this loss. Specifically, the weight matrix W_{n×d} is spliced with the d-dimensional depth features of all first sample images to obtain a new weight matrix W_1 ∈ ℝ^{(n+B_unlabel)×d}. The second loss function Loss_label corresponding to the labeled dataset is then calculated using the initial face recognition model.
Finally, a first loss function of unlabeled data of the jointly trained initial face recognition model is calculated.
When calculating the first loss function of the unlabeled dataset, the reconstructed features generated by the data generation model for all first sample images are used as positive samples of the first loss function, and the other features (such as the column vectors of the weight matrix, and the d-dimensional depth features and corresponding reconstructed features of the other first sample images) are used as negative samples.
Specifically, the data generation model amplifies the d-dimensional depth features of all first sample images s times, yielding amplified features in ℝ^{(s·B_unlabel)×d}. The weight matrix W_1 is then spliced with these amplified features to obtain a new weight matrix W_2. Finally, the first loss function Loss_unlabel of the unlabeled dataset is calculated using the initial face recognition model.
The initial face recognition model is trained jointly with the labeled and unlabeled datasets, taking the sum of the second loss function of the labeled dataset and the first loss function of the unlabeled dataset as the target loss function, and checking whether the target loss function converges. If it converges, the parameters of the face feature generation model, the initial face recognition model, and the data generation module are fixed, yielding the trained face recognition model.
In some embodiments, steps two and three may be trained iteratively until the target loss function converges, and the initial face recognition model at convergence is taken as the face recognition model. In other embodiments, steps two and three may be iterated until a preset training condition is reached; for example, the condition may be verified by cross-training on a training set and a test set until the accuracy of the face recognition model reaches a preset accuracy.
It can be understood that the above steps may refer to corresponding steps in the training process of the face recognition model in the foregoing embodiment, which is not described in detail in the embodiments of the present application.
In some embodiments, after training the training method of the face recognition model according to any of the foregoing embodiments to obtain the face recognition model, the face recognition model may be used to analyze the target class to which the face image to be recognized belongs, so as to obtain the recognition result of the face image to be recognized. Specifically, referring to fig. 7, a flow chart of a face recognition method according to an embodiment of the present application may specifically include the following steps:
step S710: and acquiring a face image to be recognized, inputting the face image to be recognized into a face feature generation model, and outputting a multi-dimensional depth feature to be recognized corresponding to the face image to be recognized.
Step S720: inputting the multi-dimensional depth features to be recognized into a face recognition model, and outputting a recognition result of the face images to be recognized, wherein the face recognition model is obtained by training a plurality of first multi-dimensional depth features of an unlabeled data set and a plurality of second multi-dimensional depth features of a labeled data set, sample images in the unlabeled data set are different from sample images in the labeled data set, the unlabeled data set comprises a plurality of first sample images, the labeled data set comprises a plurality of second sample images, the plurality of first multi-dimensional depth features are obtained by extracting each first sample image by the face feature generation model, and the plurality of second multi-dimensional depth features are obtained by extracting each second sample image by the face feature generation model.
In this embodiment, after obtaining the face image to be identified, the face image to be identified may be input into the face feature generation model to obtain the multi-dimensional depth feature to be identified corresponding to the face image to be identified.
The multi-dimensional depth features to be recognized may then be input into the face recognition model, which may be trained from a plurality of first multi-dimensional depth features of the unlabeled dataset and a plurality of second multi-dimensional depth features of the labeled dataset using the training method described in any of the embodiments above. Optionally, after receiving the multi-dimensional depth features to be recognized, the face recognition model may obtain, for each target category, the probability that the face image to be recognized belongs to that category, and may output the target category whose probability exceeds a third preset probability value as the recognition result of the face image to be recognized, where each such probability corresponds to one target category. The third preset probability value may be derived from the first and second preset probability values, and may, for example, be equal to them. In some typical embodiments, the target category with the highest probability may be output as the recognition result of the face image to be recognized.
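A hedged inference sketch: `feature_model` and `recognition_model` stand for the trained face feature generation model and face recognition model, and returning the argmax over softmax probabilities follows the highest-probability variant described above; all names are illustrative:

```python
import torch

@torch.no_grad()
def recognize(face_image, feature_model, recognition_model):
    # face_image:        a preprocessed image tensor of shape (C, H, W)
    # feature_model:     maps images to multi-dimensional depth features
    # recognition_model: maps depth features to per-class scores
    feats = feature_model(face_image.unsqueeze(0))
    probs = recognition_model(feats).softmax(dim=1)
    conf, cls = probs.max(dim=1)
    return int(cls), float(conf)   # highest-probability target class and its probability
```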
In application scenes such as security protection and face payment, face images of users are generally collected in real time, then the collected face images are identified, and the identity of the users is verified according to face features of the users. For example, when the collected face image is identified, the face identification method described in the embodiment of the application can be adopted to identify the collected face image as the face image to be identified, and the identification result of the face image to be identified is output. And then verifying the user identity of the acquired face image according to the recognition result of the face image to be recognized, for example, determining whether the target category corresponding to the recognition result is consistent with the category input in advance, if so, indicating that the user identity verification is passed, and if not, indicating that the user identity verification is not passed. The embodiment of the application improves the accuracy of face recognition, so that the accuracy of user identity verification can be improved when the user identity is verified, and the effect of accurately distinguishing legal users can be achieved.
It may be understood that, in the embodiment of the present application, the training process of the feature extraction module, the data identification module, and the data generation module may refer to the processing process of the corresponding steps in the foregoing embodiment, which is not described herein again.
Referring to fig. 8, a block diagram of a training device for a face recognition model according to an embodiment of the present application is shown. Specifically, the training device of the face recognition model may include: a dataset acquisition module 810, a feature generation module 820, a sample identification module 830, a function construction module 840, and a model training module 850.
Wherein the data set obtaining module 810 is configured to obtain an unlabeled data set and a labeled data set, where a sample image in the unlabeled data set is different from a sample image in the labeled data set, and the unlabeled data set includes a plurality of first sample images, and the labeled data set includes a plurality of second sample images; the feature generation module 820 is configured to input the unlabeled dataset and the labeled dataset into a face feature generation model, and output a first multi-dimensional depth feature corresponding to each of the first sample images and a second multi-dimensional depth feature corresponding to each of the second sample images; the sample recognition module 830 is configured to input the first multi-dimensional depth feature and the second multi-dimensional depth feature into an initial face recognition model, and output a first recognition result corresponding to each of the first sample images and a second recognition result corresponding to each of the second sample images; a function construction module 840, configured to determine an objective loss function according to the first recognition result and the second recognition result; the model training module 850 is configured to train the initial face recognition model based on the objective loss function until the objective loss function converges, and determine the initial face recognition model when the objective loss function converges as the face recognition model.
In some embodiments, the training device of the face recognition model may further include: the feature reconstruction module is used for carrying out feature reconstruction on each first multidimensional depth feature according to the data generation model to obtain first multidimensional reconstruction features corresponding to each first multidimensional depth feature; the distribution determining module is used for identifying the corresponding target categories of the first sample image according to the first multidimensional reconstruction features and taking the target categories as third identification results, wherein the third identification results meet standard distribution, and the standard distribution is used for representing the distribution condition of each target category in the marked data set.
Alternatively, the function construction module 840 may include: a first function construction module, configured to determine a first loss function according to the first recognition result and the third recognition result; a second function construction module, configured to determine a second loss function according to the target category corresponding to the second sample image and the second recognition result; and a function construction submodule, configured to determine the target loss function based on the first loss function and the second loss function.
Optionally, the training device of the face recognition model may further include: the initial reconstruction module is used for inputting each second multidimensional depth feature into an initial data generation model to be trained to obtain initial reconstruction features corresponding to each second multidimensional depth feature; the initial recognition module is used for inputting the initial reconstruction features into the initial face recognition model to obtain target categories corresponding to the second sample images respectively and taking the target categories as initial recognition results; the initial function construction module is used for determining an initial loss function according to the second identification result and the initial identification result; the data generation model training module is configured to train the initial data generation model based on the initial loss function until the initial loss function converges, and determine the initial data generation model when the initial loss function converges as the data generation model, where an identification result corresponding to a reconstructed feature obtained after the initial data generation model when the initial loss function converges reconstructs each second multidimensional depth feature accords with the standard distribution.
Optionally, the sample identification module 830 may include: a first probability determining module, configured to input each first multi-dimensional depth feature into the initial face recognition model to obtain a first classification probability that each first sample image belongs to each target class; a first result determining module, configured to determine a target category corresponding to a first classification probability greater than a first preset probability value as the first recognition result, where each first classification probability corresponds to one target category; a second probability determining module, configured to input each second multi-dimensional depth feature into the initial face recognition model to obtain a second classification probability that each second sample image belongs to each target class; and a second result determining module, configured to determine a target category corresponding to a second classification probability greater than a second preset probability value as the second recognition result, where each second classification probability corresponds to one target category.
Referring to fig. 9, a block diagram of a face recognition device according to an embodiment of the present application is shown. Specifically, the face recognition apparatus may include: the image acquisition module 910 and the image recognition module 920.
The image obtaining module 910 is configured to obtain a face image to be identified, input the face image to be identified into a face feature generating model, and output a multi-dimensional depth feature to be identified corresponding to the face image to be identified; the image recognition module 920 is configured to input the multi-dimensional depth feature to be recognized into a face recognition model, and output a recognition result of the face image to be recognized, where the face recognition model is obtained by training a plurality of first multi-dimensional depth features of an unlabeled dataset and a plurality of second multi-dimensional depth features of a labeled dataset, the sample image in the unlabeled dataset is different from the sample image in the labeled dataset, the unlabeled dataset includes a plurality of first sample images, the labeled dataset includes a plurality of second sample images, the plurality of first multi-dimensional depth features are obtained by extracting each of the first sample images by the face feature generation model, and the plurality of second multi-dimensional depth features are obtained by extracting each of the second sample images by the face feature generation model.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to corresponding processes in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided by the present application, the illustrated or discussed coupling or direct coupling or communication connection of the modules to each other may be through some interfaces, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other forms.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
Referring to fig. 10, a block diagram of an electronic device according to an embodiment of the application is shown. The electronic device 1000 in this embodiment may include one or more of the following components: a processor 1010, a memory 1020, and one or more applications, wherein the one or more applications may be stored in the memory 1020 and configured to be executed by the one or more processors 1010, the one or more applications configured to perform the method as described in the foregoing method embodiments.
The electronic device may be any of various types of computer system devices that are mobile or portable and perform wireless communications. Specifically, the electronic device may be a mobile phone or smart phone (e.g., an iPhone™-based phone), a portable game device (e.g., a Nintendo DS™, a PlayStation Portable™, a Gameboy Advance™), a laptop, a PDA, a portable internet device, a music player, a data storage device, or another handheld device such as a smart watch, smart bracelet, headset, or pendant; the electronic device may also be another wearable device (e.g., electronic glasses, electronic clothes, an electronic bracelet, an electronic necklace, an electronic tattoo, or a head-mounted device (HMD)).
The electronic device may also be any of a number of electronic devices including, but not limited to, cellular telephones, smart phones, smart watches, smart bracelets, other wireless communication devices, personal digital assistants, audio players, other media players, music recorders, video recorders, cameras, other media recorders, radios, medical devices, vehicle transportation equipment, calculators, programmable remote controls, pagers, laptop computers, desktop computers, printers, netbooks, personal Digital Assistants (PDAs), portable Multimedia Players (PMPs), moving picture experts group (MPEG-1 or MPEG-2) audio layer 3 (MP 3) players, portable medical devices, and digital cameras, and combinations thereof.
In some cases, the electronic device may perform a variety of functions (e.g., playing music, displaying video, storing pictures, and receiving and sending phone calls). The electronic device may be, for example, a cellular telephone, a media player, other handheld device, a wristwatch device, a pendant device, an earpiece device, or other compact portable device, if desired.
Optionally, the electronic device may be a server, for example, an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network ), and basic cloud computing services such as big data and an artificial intelligent platform, or a dedicated or platform server that provides face recognition, autopilot, industrial internet services, data communication (such as 4G, 5G, etc.).
Processor 1010 may include one or more processing cores. The processor 1010 utilizes various interfaces and lines to connect various portions of the overall electronic device, perform various functions of the electronic device, and process data by executing or executing instructions, applications, code sets, or instruction sets stored in the memory 1020, and invoking data stored in the memory 1020. Alternatively, the processor 1010 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), programmable logic array (Programmable Logic Array, PLA). The processor 1010 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), an image processor (Graphics Processing Unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for being responsible for rendering and drawing of display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 1010 and may be implemented solely by a single communication chip.
Memory 1020 may include random access memory (RAM) or read-only memory (ROM). Memory 1020 may be used to store instructions, applications, code, code sets, or instruction sets. The memory 1020 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function (e.g., a touch function, a sound playing function, an image playing function), and instructions for implementing the foregoing method embodiments. The stored-data area may store data created by the electronic device in use (e.g., phonebook, audio-video data, chat records).
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the processor 1010 and the memory 1020 of the electronic device described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Referring to fig. 11, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 1100 has stored therein program code that can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 1100 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer readable storage medium 1100 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1100 has storage space for program code 1110 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. Program code 1110 may be compressed, for example, in a suitable form. The computer readable storage medium 1100 may be, for example, read-only memory (ROM), random access memory (RAM), an SSD, electrically erasable programmable read-only memory (EEPROM), or flash memory.
In some embodiments, please refer to fig. 12, which illustrates a block diagram of a computer program product provided by an embodiment of the present application, the computer program product 1200 includes a computer program/instructions 1210, the computer program/instructions 1210 being stored in a computer readable storage medium. The computer program/instructions 1210 are read from a computer-readable storage medium by a processor of a computer device, and the processor executes the computer program/instructions 1210, causing the computer device to perform the steps of the method embodiments described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general purpose hardware platform, or of course by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, SSD, flash) comprising several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method of the embodiments of the present application.
The embodiment provides a training method, device, electronic equipment and storage medium for a face recognition model, which can use unlabeled data sets and labeled data sets containing images of different samples to train the face recognition model simultaneously. Specifically, first, a first multidimensional depth feature corresponding to a first sample image in an unlabeled data set is obtained through a face feature generation model, a second multidimensional depth feature corresponding to a second sample image in the labeled data set is obtained through a face feature generation model, then the first multidimensional depth feature and the second multidimensional depth feature are input into an initial face recognition model, a first recognition result corresponding to the first sample image and a second recognition result corresponding to the second sample image are output, then the first recognition result and the second recognition result are used for determining a loss function of the initial face recognition model and training the initial face recognition model, and finally the initial face recognition model when the loss function converges is determined to be the face recognition model. In this embodiment, the sample images used during training are more various, so the face feature generation model can extract various multi-dimensional depth features, and the various multi-dimensional depth features can enable the face recognition model obtained through training to have stronger image discrimination capability, so that the face recognition model obtained through training can identify the types of the images more accurately.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be appreciated by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not drive the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for training a face recognition model, the method comprising:
obtaining an unlabeled dataset and a labeled dataset, the sample images in the unlabeled dataset being different from the sample images in the labeled dataset, wherein the unlabeled dataset comprises a plurality of first sample images and the labeled dataset comprises a plurality of second sample images;
inputting the unlabeled dataset and the labeled dataset into a face feature generation model, and outputting first multidimensional depth features corresponding to the first sample images and second multidimensional depth features corresponding to the second sample images;
Inputting the first multi-dimensional depth feature and the second multi-dimensional depth feature into an initial face recognition model, and outputting a first recognition result corresponding to each first sample image and a second recognition result corresponding to each second sample image;
determining a target loss function according to the first identification result and the second identification result;
training the initial face recognition model based on the target loss function until the target loss function converges, and determining the initial face recognition model when the target loss function converges as the face recognition model.
2. The method of claim 1, wherein each of the second sample images in the annotated dataset corresponds to a target class, and wherein prior to determining a target loss function based on the first recognition result and the second recognition result, further comprising:
performing feature reconstruction on each first multidimensional depth feature according to a data generation model to obtain first multidimensional reconstruction features corresponding to the first multidimensional depth features;
and identifying the target categories corresponding to the first sample images according to the first multidimensional reconstruction features, and taking the target categories as third identification results, wherein the third identification results meet standard distribution, and the standard distribution is used for representing the distribution condition of each target category in the marked data set.
3. The method of claim 2, wherein the determining a target loss function from the first recognition result and the second recognition result comprises:
determining a first loss function according to the first identification result and the third identification result;
determining a second loss function according to the target category corresponding to the second sample image and the second identification result;
the target loss function is determined based on the first loss function and the second loss function.
4. The method according to claim 2, wherein before performing feature reconstruction on each first multi-dimensional depth feature according to the data generation model to obtain a first multi-dimensional reconstructed feature corresponding to each first multi-dimensional depth feature, the method further comprises:
inputting each second multidimensional depth feature into an initial data generation model to be trained to obtain initial reconstruction features corresponding to each second multidimensional depth feature;
inputting the initial reconstruction features into the initial face recognition model to obtain target categories corresponding to the second sample images respectively, and taking the target categories as initial recognition results;
determining an initial loss function according to the second identification result and the initial identification result;
Training the initial data generation model based on the initial loss function until the initial loss function converges, and determining the initial data generation model when the initial loss function converges as the data generation model, wherein the recognition result corresponding to the reconstructed feature obtained by reconstructing each second multidimensional depth feature by the initial data generation model when the initial loss function converges accords with the standard distribution.
5. The method of claim 1, wherein each of the second sample images in the annotated dataset corresponds to a target class, wherein inputting each of the first multi-dimensional depth features and each of the second multi-dimensional depth features into an initial face recognition model, outputting a first recognition result for each of the first sample images and a second recognition result for each of the second sample images, comprises:
inputting each first multidimensional depth feature into the initial face recognition model to obtain a first classification probability of each first sample image belonging to each target class;
determining a target category corresponding to a first classification probability larger than a first preset probability value as the first recognition result, wherein each first classification probability corresponds to one target category;
Inputting each second multidimensional depth feature into the initial face recognition model to obtain a second classification probability of each second sample image belonging to each target class;
and determining a target class corresponding to the second classification probability larger than a second preset probability value as the second recognition result, wherein each second classification probability corresponds to one target class.
6. A method of face recognition, the method comprising:
acquiring a face image to be recognized, inputting the face image to be recognized into a face feature generation model, and outputting a multi-dimensional depth feature to be recognized corresponding to the face image to be recognized;
inputting the multi-dimensional depth features to be recognized into a face recognition model, and outputting a recognition result of the face images to be recognized, wherein the face recognition model is obtained by training a plurality of first multi-dimensional depth features of an unlabeled data set and a plurality of second multi-dimensional depth features of a labeled data set, sample images in the unlabeled data set are different from sample images in the labeled data set, the unlabeled data set comprises a plurality of first sample images, the labeled data set comprises a plurality of second sample images, the plurality of first multi-dimensional depth features are obtained by extracting each first sample image by the face feature generation model, and the plurality of second multi-dimensional depth features are obtained by extracting each second sample image by the face feature generation model.
7. A training device for a face recognition model, the device comprising:
a data set acquisition module, configured to acquire an unlabeled data set and a labeled data set, where a sample image in the unlabeled data set is different from a sample image in the labeled data set, and the unlabeled data set includes a plurality of first sample images, and the labeled data set includes a plurality of second sample images;
the feature generation module is used for inputting the unlabeled data set and the labeled data set into a face feature generation model and outputting a first multi-dimensional depth feature corresponding to each first sample image and a second multi-dimensional depth feature corresponding to each second sample image;
the sample recognition module is used for inputting the first multidimensional depth feature and the second multidimensional depth feature into an initial face recognition model and outputting a first recognition result corresponding to each first sample image and a second recognition result corresponding to each second sample image;
the function construction module is used for determining a target loss function according to the first identification result and the second identification result;
and the model training module is used for training the initial face recognition model based on the target loss function until the target loss function converges, and determining the initial face recognition model when the target loss function converges as the face recognition model.
8. An electronic device comprising a processor, a memory, the memory storing a computer program, the processor being configured to perform the method of any one of claims 1 to 5 or the method of claim 6 by invoking the computer program.
9. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the method of any one of claims 1 to 5, or the method of claim 6.
10. A computer program product comprising instructions stored therein, which when run on a computer causes the computer to implement the method of any one of claims 1 to 5 or the method of claim 6.
CN202210199328.5A 2022-03-02 2022-03-02 Training method and device of face recognition model, electronic equipment and storage medium Pending CN116758601A (en)

