In a specific implementation, the source image set and the target image set are used to train the face recognition model. During training, the source images and the target images are subjected to domain fusion so that features of the mixed domains are learned; as a result, the trained face recognition model learns some features of the target images while learning the features of the source images.
Preferably, the comparison images in the comparison image set may be the same as the source images in the source image set. Because they are the same, the distance between the second feature vector extracted by the second neural network and the first feature vector extracted by the first neural network better measures the degree to which the second neural network is influenced by the target images, yielding a better training result for the second neural network and improving the accuracy of face recognition.
S102: the comparison image set is input into a first neural network, and a first feature vector is extracted for each comparison image in the input comparison image set by using the first neural network.
In a specific implementation, the first neural network may perform feature extraction on each comparison image in the comparison image set by using a Convolutional Neural Network (CNN), so as to obtain a first feature vector corresponding to each comparison image.
Specifically, global feature extraction may be performed on each comparison image in the comparison image set through the first neural network to obtain a global first feature vector, or the first neural network may perform local feature extraction on each input comparison image to generate local first feature vectors for each comparison image.
Here, when local feature extraction is performed on an input comparison image using the first neural network, the parts to undergo local feature extraction are determined first. For example, the local positions of the eyes, mouth, nose, eyebrows and forehead in each comparison image may be determined first, and feature extraction is then performed on the selected local positions in each comparison image using a CNN, generating a corresponding first feature vector for each local position. It should be noted that each comparison image therefore has a plurality of local first feature vectors, one for each part; together, these local first feature vectors form the first feature vector of the comparison image.
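As a minimal illustration of how the local first feature vectors jointly form an image-level first feature vector, the following sketch concatenates per-part features; the part boxes and the `extract_part_feature` helper are hypothetical stand-ins for a trained CNN:

```python
# Sketch: build an image-level first feature vector by concatenating
# local part vectors. The part crop boxes and the mean-pooling
# "feature extractor" are illustrative placeholders for a trained CNN.

PART_BOXES = {           # hypothetical (row0, row1, col0, col1) crops
    "eyes":  (2, 4, 1, 7),
    "nose":  (4, 6, 3, 5),
    "mouth": (6, 8, 2, 6),
}

def extract_part_feature(image, box):
    """Stand-in for a CNN: mean intensity of the cropped region."""
    r0, r1, c0, c1 = box
    pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return [sum(pixels) / len(pixels)]   # one-dimensional "feature" per part

def first_feature_vector(image):
    # The local first feature vectors jointly form the image's vector.
    vec = []
    for part, box in PART_BOXES.items():
        vec.extend(extract_part_feature(image, box))
    return vec

image = [[(r * 8 + c) % 16 for c in range(8)] for r in range(8)]
fv = first_feature_vector(image)
print(len(fv))   # one entry per part with these toy one-dimensional features
```

A real implementation would produce a multi-dimensional vector per part, but the concatenation pattern is the same.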
S103: inputting the source image set and the target image set into a second neural network, and extracting a second feature vector for each source image in the source image set after performing feature learning on the source images in the source image set and the target images in the target image set; and inputting the second feature vector into a face classifier to obtain a classification result.
In a specific implementation, each source image in the source image set carries a label, and the label indicates the identity corresponding to the face in the source image; the target images in the target image set may or may not carry labels. After the source image set and the target image set are input into the second neural network, the second neural network performs feature learning with shared parameters on the source images and the target images. Because the second neural network performs supervised learning on the source images and unsupervised learning on the target images, the parameters of the second neural network are continuously adjusted while the same network performs parameter-sharing learning on both kinds of images; the parameters of the second neural network are therefore influenced by the target image set during training. As a result, when the second neural network performs feature learning on the source images and the target images and then extracts a second feature vector for each source image, the second feature vector is perturbed by the target images, so that inter-domain mixing of the source images and the target images is realized.
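The effect of shared-parameter learning, where an unsupervised signal from the unlabelled target domain also shapes the parameters fitted to the labelled source domain, can be sketched with a toy scalar model; the losses and learning rate here are illustrative assumptions, not the patent's actual networks:

```python
# Sketch: one shared parameter updated by a supervised loss on the
# labelled source domain and an unsupervised loss on the unlabelled
# target domain, so the target data also shapes the shared parameters.

w = 0.5                                  # shared "encoder" parameter
source = [(1.0, 2.0), (2.0, 4.0)]        # (x, label) pairs
target = [3.0, -1.0]                     # unlabelled samples
lr = 0.01

for _ in range(200):
    # supervised gradient from the source domain: d/dw (w*x - y)^2
    g = sum(2 * (w * x - y) * x for x, y in source)
    # unsupervised gradient from the target domain: shrink feature
    # magnitude, d/dw 0.1*(w*x)^2 (a stand-in unsupervised signal)
    g += sum(0.1 * 2 * (w * x) * x for x in target)
    w -= lr * g

# Without the target term, w would converge to exactly 2.0; the
# target-domain loss pulls it toward 5/3, illustrating the influence
# of the target image set on the shared parameters.
print(round(w, 3))
```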
S104: and performing the current round of training on the second neural network and the face classifier based on the comparison result of the first feature vector and the corresponding second feature vector and the classification result.
In a specific implementation, the first feature vectors extracted from the comparison images in the comparison image set are used to constrain the influence of the target images on the second neural network: the parameters of the second neural network are influenced by the target images, but the target images must not influence those parameters too greatly. Therefore, after the first feature vector and the second feature vector are obtained, they are compared. If the difference between them is too large, the influence of the target images on the parameters of the second neural network is too large; the parameters are then readjusted to reduce that influence.
Meanwhile, because the parameters of the second neural network are influenced by the target images, the second feature vectors that the second neural network extracts from the source images are affected to a certain extent. These second feature vectors are used to perform supervised training on the classifier, so that the trained classifier can correctly classify the source images using the second feature vectors extracted from them.
Further, the comparison image from which the first feature vector is extracted and the source image from which the corresponding second feature vector is extracted are images of the same person; that is, in one round of training, the image sets input into the first neural network and the second neural network may contain images of a plurality of persons, and corresponding feature vectors are output for each person.
Specifically, in an alternative embodiment of the present invention, the current round of training on the second neural network and the face classifier based on the first neural network is completed by performing the following distance determination operation and classification operation until the distance is not greater than a preset distance threshold and the obtained classification result is correct.
As shown in fig. 2, the distance determining operation includes:
S201: A distance between the first feature vector and the currently determined corresponding second feature vector is determined.
Here, the distance between the first feature vector and the second feature vector is actually used to measure the similarity between the comparison image and the source image: the smaller the distance, the more similar the two images are, and the greater the probability that they belong to the same face.
S202: and generating first feedback information aiming at the condition that the distance is greater than a preset distance threshold value, and carrying out parameter adjustment on the second neural network based on the first feedback information.
Here, since the distance between the first feature vector and the second feature vector is used to constrain the second neural network, a distance threshold is preset; when the distance between the first feature vector and the second feature vector is greater than the preset distance threshold, the first feedback information is generated. The first feedback information is fed back to the second neural network, which can adjust its parameters based on it.
S203: based on the adjusted parameters, a new second feature vector is extracted for each source image in the source image set using the second neural network, and the distance determination operation is performed again, until the distance between the first feature vector and the second feature vector is not greater than the preset distance threshold.
It should be noted here that when the distance between the first feature vector and the second feature vector is not greater than the preset distance threshold, another piece of feedback information is still generated and fed back to the second neural network, so that the second neural network makes a smaller-magnitude adjustment to its parameters and the gradient descends toward a local optimum.
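The distance determination operation (S201 to S203) can be sketched as the following loop; the toy vectors, the blend-factor update standing in for a real parameter adjustment, and the threshold are all assumptions for illustration:

```python
# Sketch of the distance determination operation (S201-S203): keep
# adjusting the second network's output until its feature vector for
# a source image is close enough to the first network's vector for
# the matching comparison image.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

first_vec = [1.0, 0.0, 2.0]        # from the (frozen) first network
second_vec = [4.0, 3.0, -1.0]      # initial output of the second network
threshold = 0.5
rounds = 0

while euclidean(first_vec, second_vec) > threshold:
    # "first feedback information": nudge the second network so its
    # next extraction moves toward the first feature vector
    second_vec = [s + 0.3 * (f - s) for f, s in zip(first_vec, second_vec)]
    rounds += 1

assert euclidean(first_vec, second_vec) <= threshold
print(rounds)   # 7 with these toy numbers
```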
Referring to fig. 3, the classification operation includes:
S301: and classifying the currently determined second feature vector by using a face classifier.
S302: and generating second feedback information aiming at the condition that the classification result is wrong, and adjusting parameters of the second neural network and the face classifier based on the second feedback information.
Specifically, when the classifier's result for the currently determined second feature vector is wrong, the parameters of the second neural network are considered to have been influenced too strongly by the target images during domain mixing of the source images and the target images. Therefore, when the classification result is wrong, the classifier generates second feedback information and feeds it back to the second neural network and the face classifier, which then adjust their respective parameters based on this feedback information.
S303: based on the adjusted parameters, a new second feature vector is extracted for each source image in the source image set using the second neural network, and the classification operation is performed again, until the classification result of the face classifier is correct.
It should be noted here that when the classification result is correct, corresponding feedback information is still generated and fed back to the second neural network and the face classifier, which then adjust their parameters with a smaller magnitude based on this correctly-classified feedback, so that the gradient descends toward a local optimum.
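The classification operation (S301 to S303) can be sketched similarly; the nearest-centroid classifier and the feedback update below are hypothetical stand-ins for a trained face classifier and a real gradient step:

```python
# Sketch of the classification operation (S301-S303): while the face
# classifier mislabels the second feature vector, feed "second
# feedback information" back and adjust; stop once the result is
# correct.

centroids = {"person_a": [0.0, 0.0], "person_b": [5.0, 5.0]}

def classify(vec):
    """Nearest-centroid stand-in for the face classifier."""
    return min(centroids, key=lambda k: sum(
        (v - c) ** 2 for v, c in zip(vec, centroids[k])))

true_label = "person_a"
second_vec = [4.0, 4.0]            # initially lands on the wrong side

while classify(second_vec) != true_label:
    # second feedback: shift the extracted feature toward its class
    second_vec = [v + 0.5 * (c - v)
                  for v, c in zip(second_vec, centroids[true_label])]

assert classify(second_vec) == true_label
```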
S105: and performing multi-round training on the second neural network and the face classifier to obtain a face recognition model.
In a specific implementation, performing one round of training on the second neural network and the face classifier means training them with one group consisting of a comparison image set, a source image set and a target image set. Then, further groups of comparison image sets, source image sets and target image sets are input in succession to train the second neural network and the face classifier until a second neural network and a face classifier that meet the requirements are obtained; the resulting second neural network and face classifier are taken as the trained face recognition model.
In the face recognition model training method provided by the embodiment of the present invention, the source image set and the target image set are input into the same second neural network, which performs parameter-sharing feature learning on the source images and the target images. The second neural network is therefore influenced by the target images during training and learns some of their features, so the second feature vector obtained when the second neural network extracts features from a source image not only captures the features of the source image but also introduces preset features of the target images. When the second feature vectors are used to train the face classifier, the classifier is likewise influenced by the preset features of the target images. As a result, the face recognition model obtained by this training method achieves a better recognition effect when performing face recognition on images to be recognized that have the preset features, for example, images of poor quality.
In another embodiment of the present invention, referring to fig. 4, after inputting the source image set and the target image set into the second neural network and performing feature learning on the source image and the target image, the method further includes:
S401: A third feature vector is extracted for each target image in the target image set.
S402: and performing gradient reversal processing on the second feature vector and the third feature vector.
S403: and inputting the second feature vector and the third feature vector subjected to gradient reversal processing into a domain classifier.
S404: and adjusting the parameters of the second neural network according to the domain classifier's domain classification results for the source image set and the target image set respectively characterized by the second feature vector and the third feature vector.
In a specific implementation, the process of training the second neural network with the source images and the target images is actually a process of domain mixing the source images and the target images. The second feature vector obtained by feature extraction on a source image using the second neural network is influenced by features of the target images, that is, the second feature vector moves closer to the target-image features; meanwhile, the third feature vector obtained by feature extraction on a target image using the second neural network is influenced by features of the source images, that is, the third feature vector moves closer to the source-image features. Therefore, to realize domain mixing of the source images and the target images, after a third feature vector is extracted for each target image in the target image set, gradient reversal processing is performed on the second feature vectors and the third feature vectors; the processed vectors are then input into a domain classifier, which performs domain classification on them.
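The gradient reversal step can be sketched as a layer that is the identity on the forward pass and negates the gradient on the backward pass, so the feature extractor is pushed to confuse the domain classifier; the scalar domain classifier below is an illustrative assumption:

```python
# Sketch of gradient reversal: identity forward, gradient multiplied
# by -1 backward, so minimizing the domain classifier's loss with
# respect to the feature extractor actually *confuses* the domain
# classifier, which is what drives domain mixing.

def grad_reverse_forward(feature):
    return feature                    # identity going forward

def grad_reverse_backward(upstream_grad, lam=1.0):
    return -lam * upstream_grad       # flipped going backward

# toy: domain classifier loss L = (d(f) - y)^2 with d(f) = w*f
w, f, y = 0.8, 2.0, 1.0               # classifier weight, feature, domain label
pred = w * grad_reverse_forward(f)
dL_dpred = 2 * (pred - y)
dL_df_classifier = dL_dpred * w       # gradient the classifier sends down
dL_df_extractor = grad_reverse_backward(dL_df_classifier)

# the feature extractor receives the negated gradient:
assert dL_df_extractor == -dL_df_classifier
```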
The higher the probability that the domain classifier correctly classifies the second feature vectors and the third feature vectors, the smaller the degree of domain mixing; conversely, the lower the probability of correct classification, the larger the degree of domain mixing. Therefore, parameter adjustment is performed on the second neural network based on the domain classifier's classification results for the source image set and the target image set respectively characterized by the second feature vectors and the third feature vectors.
Specifically, referring to fig. 5, the parameter adjustment of the second neural network according to the domain classification result can be realized by performing the following domain classification loss determination operation:
S501: and determining the domain classification loss of the current domain classification of the source image set and the target image set respectively characterized by the current second feature vector and the third feature vector.
Here, the degree of domain mixing is characterized by the domain classification loss. The domain classification loss of the source image set refers to the number of source images classified into the target image set when the source images and the target images are classified based on the second feature vectors and the third feature vectors; the domain classification loss of the target image set refers to the number of target images classified into the source image set in the same process. After the domain classifier performs domain classification on the source image set and the target image set respectively characterized by the second feature vectors and the third feature vectors, a domain classification result is obtained, and the domain classification losses corresponding to the source image set and the target image set are then determined from that result.
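Under the counting definition given above, the two domain classification losses can be sketched as follows; the predicted domain labels are made-up examples:

```python
# Sketch of the domain classification loss described above: for each
# set, the loss counts how many of its images the domain classifier
# assigned to the other domain. The predicted labels are hypothetical.

source_true = ["source"] * 4
source_pred = ["source", "target", "source", "target"]   # made up
target_true = ["target"] * 3
target_pred = ["target", "source", "target"]             # made up

def domain_loss(true_labels, predicted):
    """Number of images classified into the wrong domain."""
    return sum(t != p for t, p in zip(true_labels, predicted))

source_set_loss = domain_loss(source_true, source_pred)
target_set_loss = domain_loss(target_true, target_pred)
print(source_set_loss, target_set_loss)   # 2 1
```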
S502: and generating third feedback information aiming at the fact that the difference between the domain classification losses of the latest preset times is not smaller than a preset difference threshold value, and carrying out parameter adjustment on the second neural network based on the third feedback information.
Here, the preset difference threshold is used to constrain the degree of domain mixing. The domain classifier stores in advance the distribution of the domains to which the second feature vectors and the third feature vectors respectively belong. When the difference between the domain classification losses of the latest preset number of times is not smaller than the preset difference threshold, the domain classification is considered to be in an unstable state: in some classifications the domain classifier can correctly distinguish the domains to which the second feature vectors and the third feature vectors respectively belong, while in others it cannot, so the degree of domain mixing is not stable. The parameters of the second neural network therefore need to be adjusted, and third feedback information indicating an overlarge domain classification loss difference is generated and fed back to the second neural network, which adjusts its parameters after receiving it.
S503: based on the adjusted parameters, a second neural network is used for extracting a new second feature vector for each source image in the source image set, a new third feature vector is extracted for a target image in the target image set, domain classification loss determination operation is carried out until the difference is not greater than a preset difference threshold value, and the current round of training of the second neural network based on the domain classifier is completed.
Here, it should be noted that when the difference between the domain classification losses of the latest preset number of times is smaller than the preset difference threshold, feedback information indicating an appropriate domain classification loss difference is still generated and fed back to the second neural network. After receiving it, the second neural network also adjusts its parameters with a smaller magnitude, so that the gradient descends toward a local optimum.
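The stopping test in the domain classification loss determination operation can be sketched as a sliding-window check; the window size, threshold, and per-round losses are illustrative assumptions:

```python
# Sketch of the stopping test in S502-S503: the round ends once the
# differences between the most recent domain classification losses
# stay within a preset threshold.
from collections import deque

PRESET_TIMES = 3
DIFF_THRESHOLD = 0.05

def losses_stable(recent):
    """True when the last PRESET_TIMES losses differ by no more than
    the preset difference threshold (domain mixing has stabilized)."""
    if len(recent) < PRESET_TIMES:
        return False
    window = list(recent)[-PRESET_TIMES:]
    return max(window) - min(window) <= DIFF_THRESHOLD

history = deque(maxlen=PRESET_TIMES)
for loss in [0.9, 0.4, 0.7, 0.52, 0.50, 0.49]:   # per-round domain losses
    history.append(loss)
    if losses_stable(history):
        break
    # otherwise a real trainer would generate third feedback
    # information and adjust the second neural network's parameters

print(losses_stable(history))   # True once the window settles
```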
Therefore, through the triple constraint of the domain classification loss, the distance between the first feature vector and the currently determined corresponding second feature vector, and the face classifier's classification result on the second feature vector, a more optimized face recognition model can be obtained when the second neural network and the face classifier are trained.
Referring to fig. 6, an embodiment of the present invention further provides a specific example of a face recognition model training method, including:
s601: and acquiring a comparison image set, a source image set and a target image set. Jumping to S602 and S603. Wherein, S602 and S603 are executed after S601 is executed, and the execution order of S602 and S603 is not limited.
S602: the comparison image set is input into a first neural network, and a first feature vector is extracted for each comparison image in the input comparison image set by using the first neural network. Jumping to S605.
S603: and inputting the source image set and the target image set into a second neural network, and performing feature learning on the source images in the source image set and the target images in the target image set. Jump to S604.
S604: and extracting a second feature vector for each source image in the source image set by using the second neural network after feature learning. Jumping to S605, S608 and S612. S605, S608, and S612 are executed after S604, and the execution order of S605, S608, and S612 is not limited.
S605: a distance between the first feature vector and a currently determined corresponding second feature vector is determined. Jump to S606.
S606: whether the distance is larger than a preset distance threshold value is detected. If yes, jumping to S607; if not, jumping to S619.
S607: and generating first feedback information, and performing parameter adjustment on the second neural network based on the first feedback information. Jump to S604.
S608: and inputting the second feature vector into a face classifier. Jump to S609.
S609: and classifying the currently determined second feature vector by using a face classifier. Jump to S610.
S610: and detecting whether the classification result is correct. If not, jumping to S611, if yes, jumping to S619.
S611: and generating second feedback information, and performing parameter adjustment on the second neural network and the face classifier based on the second feedback information. Jump to S604.
S612: and extracting a third feature vector for each target image in the target image set by using the second neural network subjected to feature learning. Jumping to S613.
S613: and performing gradient reversal processing on the second feature vector and the third feature vector. Jump to S614.
S614: and inputting the second feature vector and the third feature vector subjected to gradient reversal processing into a domain classifier. Jumping to S615.
S615: and performing domain classification on the source image set and the target image set respectively characterized by the second feature vector and the third feature vector by using a domain classifier. Jump to S616.
S616: and determining the domain classification loss of the current domain classification of the source image set and the target image set respectively characterized by the current second feature vector and the third feature vector. Jump to S617.
S617: detecting whether the difference between the domain classification losses of the latest preset times is smaller than a preset difference threshold value. If not, then jump to S618. If so, it jumps to S619.
S618: and generating third feedback information, and performing parameter adjustment on the second neural network based on the third feedback information. Jump to S604.
S619: the training of the current round is completed.
And when the second neural network and the face classifier are subjected to multi-round training and the parameters in the obtained second neural network and the face classifier tend to be stable, the obtained second neural network and the face classifier are used as a face recognition model. The face recognition model can effectively recognize face images with poor quality.
Further, the process actually jumps to S619 only when the completion conditions checked in S606, S610 and S617 are all satisfied at the same time; when any one of the three is not satisfied, the current round of training is not really completed, and information continues to be fed back to the neural network to fine-tune it as described above.
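The termination logic of S606, S610 and S617 can be sketched as a single predicate; the boolean inputs stand in for the three checks:

```python
# Sketch of the termination logic described above: a round is only
# complete (S619) when all three checks pass at once: the feature
# distance is small enough, the face classification is correct, and
# the domain classification loss has stabilized.

def round_complete(distance, distance_threshold,
                   classification_correct, domain_loss_stable):
    return (distance <= distance_threshold       # S606: "no" branch
            and classification_correct           # S610: "yes" branch
            and domain_loss_stable)              # S617: "yes" branch

assert not round_complete(0.9, 0.5, True, True)   # distance still too big
assert not round_complete(0.3, 0.5, False, True)  # classifier still wrong
assert round_complete(0.3, 0.5, True, True)       # all three satisfied
```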
Referring to fig. 7, an embodiment of the present invention further provides a face recognition method, where the face recognition method specifically includes:
S701: acquiring an image to be recognized;
S702: and performing face recognition on the image to be recognized by using the face recognition model to generate a face recognition result corresponding to the image to be recognized.
The face recognition model is obtained by adopting the face recognition model training method provided by the embodiment of the invention.
Specifically, the face recognition model includes the second neural network and the face classifier. When the face recognition model performs face recognition on an image to be recognized to generate the corresponding face recognition result, the image to be recognized is first input into the second neural network, which performs feature extraction to obtain a feature vector of the image to be recognized; the face classifier then classifies the image to be recognized based on that feature vector to obtain the classification result.
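The recognition flow of S701 and S702 can be sketched as follows; the row-mean "second network", the nearest-neighbour "classifier", and the enrolled gallery are hypothetical placeholders for the trained components:

```python
# Sketch of the recognition flow in S701-S702: the trained second
# network extracts a feature vector from the image to be recognized,
# and the face classifier labels that vector.

def second_network(image):
    """Toy feature extractor: per-row means of the image."""
    return [sum(row) / len(row) for row in image]

def face_classifier(feature, gallery):
    """Nearest-neighbour over enrolled identity features."""
    return min(gallery, key=lambda name: sum(
        (f - g) ** 2 for f, g in zip(feature, gallery[name])))

gallery = {"alice": [1.0, 2.0], "bob": [6.0, 5.0]}   # hypothetical identities
image_to_recognize = [[0.5, 1.5], [2.0, 2.0]]        # rows of pixels

feature = second_network(image_to_recognize)
result = face_classifier(feature, gallery)
print(result)   # "alice" with these toy numbers
```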
In the face recognition method provided by the embodiment of the present invention, when the face recognition model is trained, the source image set and the target image set are input into the same second neural network, which performs parameter-sharing feature learning on the source images and the target images. The second neural network is therefore influenced by the target images during training and learns some of their features, so the second feature vector obtained when the second neural network extracts features from a source image not only captures the features of the source image but also introduces the features of the poor-quality target images. When the second feature vectors are used to train the face classifier, the classifier is likewise influenced by the features of the poor-quality target images. As a result, the face recognition model obtained by this training method achieves a good recognition effect when performing face recognition on poor-quality images to be recognized.
Based on the same inventive concept, the embodiment of the invention also provides a face recognition model training device corresponding to the face recognition model training method.
Referring to fig. 8, a face recognition model training apparatus provided in an embodiment of the present invention includes:
the acquisition module is used for acquiring the comparison image set, the source image set and the target image set;
the first processing module is used for inputting the comparison image set into a first neural network and extracting a first feature vector for each comparison image in the input comparison image set by using the first neural network;
the second processing module is used for inputting the source image set and the target image set into a second neural network, and extracting a second feature vector for each source image in the source image set after performing feature learning on the source images in the source image set and the target images in the target image set; inputting the second feature vector into a face classifier to obtain a classification result;
the training module is used for carrying out the training of the current round on the second neural network and the face classifier based on the comparison result of the first feature vector and the corresponding second feature vector and the classification result; performing multi-round training on a second neural network and a face classifier to obtain a face recognition model;
wherein, the comparison image set comprises at least one comparison image with a label; the source image set comprises at least one source image with a label; and the comparison image from the first feature vector and the source image from the corresponding second feature vector are images of the same person.
Optionally, the second processing module is further configured to: input the source image set and the target image set into the second neural network, and extract a third feature vector for each target image in the target image set after performing feature learning on the source images in the source image set and the target images in the target image set;
perform gradient reversal processing on the second feature vector and the third feature vector;
input the gradient-reversed second feature vector and third feature vector into a domain classifier;
and adjust the parameters of the second neural network according to the domain classifier's domain classification results for the source image set and the target image set respectively characterized by the second feature vector and the third feature vector.
Optionally, the training module is specifically configured to: performing the following distance determination operation and classification operation until the distance between the first feature vector and the second feature vector is not greater than a preset distance threshold and the obtained classification result is correct, and finishing the current round of training on a second neural network and a face classifier based on the first neural network;
the distance determining operation includes:
determining a distance between the first feature vector and a currently determined corresponding second feature vector;
generating first feedback information aiming at the condition that the distance is greater than a preset distance threshold value, and carrying out parameter adjustment on the second neural network based on the first feedback information;
based on the adjusted parameters, extracting a new second feature vector for each source image in the source image set by using a second neural network, and executing distance determination operation again;
the classification operation includes:
classifying the second feature vector determined currently by using a face classifier;
generating second feedback information aiming at the condition that the classification result is wrong, and adjusting parameters of the second neural network and the face classifier based on the second feedback information;
based on the adjusted parameters, a new second feature vector is extracted for each source image in the source image set using the second neural network, and the classification operation is performed again.
Optionally, the second processing module is specifically configured to: perform the following domain classification loss determination operation:
determining the domain classification loss of the current domain classification of the source image set and the target image set, which are respectively represented by the current second feature vectors and third feature vectors;
when the difference between the domain classification losses of the most recent preset number of iterations is not smaller than a preset difference threshold, generating third feedback information, and adjusting the parameters of the second neural network based on the third feedback information;
based on the adjusted parameters, extracting a new second feature vector for each source image in the source image set and a new third feature vector for each target image in the target image set by using the second neural network, and executing the domain classification loss determination operation again until the difference is smaller than the preset difference threshold, thereby completing the current round of training of the second neural network based on the domain classifier.
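For illustration only, the stopping criterion of the domain classification loss determination operation may be sketched as follows. This is a hypothetical minimal example: a logistic-regression domain classifier separates source features (second feature vectors) from target features (third feature vectors), and iteration stops once the loss changes by less than a preset difference threshold. Note that in the described method the third feedback information adjusts the second neural network adversarially; here, for brevity, only the domain classifier is updated, to illustrate the loss-difference convergence check. All sizes, rates, and thresholds are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)

source_feats = rng.normal(loc=0.5, size=(32, 3))    # stand-in second feature vectors
target_feats = rng.normal(loc=-0.5, size=(32, 3))   # stand-in third feature vectors
feats = np.vstack([source_feats, target_feats])
domains = np.concatenate([np.ones(32), np.zeros(32)])  # 1 = source domain, 0 = target domain

w = np.zeros(3)
b = 0.0                     # domain classifier parameters
diff_threshold = 1e-4       # preset difference threshold (assumed value)
losses = []

for step in range(5000):
    logits = feats @ w + b
    probs = 1.0 / (1.0 + np.exp(-logits))
    # domain classification loss: binary cross-entropy over both domains
    loss = -np.mean(domains * np.log(probs + 1e-9)
                    + (1 - domains) * np.log(1 - probs + 1e-9))
    losses.append(loss)
    # stop when the loss difference over the latest iterations falls below the threshold
    if len(losses) > 1 and abs(losses[-1] - losses[-2]) < diff_threshold:
        break
    # gradient step (here on the classifier; in the method, feedback adjusts the second network)
    grad_w = feats.T @ (probs - domains) / len(domains)
    grad_b = np.mean(probs - domains)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b
```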
Optionally, the first processing module is specifically configured to: perform local feature extraction on the input comparison images by using the first neural network, and generate a local first feature vector for each comparison image.
Optionally, the source image set is the same as the comparison image set.
According to the face recognition model training device provided by the embodiment of the invention, when the face recognition model is trained, the source image set and the target image set are input into the same second neural network, and the second neural network performs feature learning with shared parameters on the source images in the source image set and the target images in the target image set. The second neural network is therefore influenced by the target images during training and learns some of their features, so that the second feature vector obtained when the second neural network performs feature extraction on a source image not only captures the features of the source image but also introduces features of the poor-quality target images. Because the face classifier is trained with second feature vectors that carry the influence of the poor-quality target images, the face recognition model obtained by the face recognition model training method can achieve a good recognition effect when performing face recognition on a poor-quality image to be recognized.
Referring to fig. 9, an embodiment of the present invention further provides a face recognition apparatus, where the apparatus includes:
the image acquisition module is used for acquiring an image to be identified;
the recognition module is used for carrying out face recognition on the image to be recognized by using the face recognition model obtained by the face recognition model training method provided by the embodiment of the invention so as to generate a face recognition result corresponding to the image to be recognized.
Specifically, the face recognition model includes: a second neural network and a face classifier. The recognition module is specifically configured to: input the image to be recognized into the second neural network, and perform feature extraction on the image to be recognized by using the second neural network to obtain a feature vector of the image to be recognized; and classify the image to be recognized by using the face classifier based on the feature vector of the image to be recognized, so as to obtain a classification result of the image to be recognized.
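For illustration only, the recognition module described above may be sketched as follows. This is a hypothetical minimal example: the trained second neural network and face classifier are stood in for by two linear maps with a softmax over known identities; the shapes and the number of identities are illustrative assumptions, not part of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(2)

W_feat = rng.normal(scale=0.1, size=(3, 8))  # stand-in for the trained second neural network
W_cls = rng.normal(scale=0.1, size=(5, 3))   # stand-in face classifier over 5 identities

def recognize(image):
    # feature extraction: obtain the feature vector of the image to be recognized
    feature = W_feat @ image
    # classification: score the feature vector against each known identity
    logits = W_cls @ feature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs      # classification result and class probabilities

image_to_recognize = rng.normal(size=8)      # stand-in for the image to be recognized
identity, probs = recognize(image_to_recognize)
print(f"predicted identity: {identity}")
```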
In the face recognition device provided by the embodiment of the invention, when the face recognition model is trained, the source image set and the target image set are input into the same second neural network, and the second neural network performs feature learning with shared parameters on the source images in the source image set and the target images in the target image set. The second neural network is therefore influenced by the target images during training and learns some of their features, so that the second feature vector obtained when the second neural network performs feature extraction on a source image not only captures the features of the source image but also introduces features of the poor-quality target images. Because the face classifier is trained with second feature vectors that carry the influence of the poor-quality target images, the face recognition model obtained by the face recognition model training method can achieve a good recognition effect when performing face recognition on a poor-quality image to be recognized.
Corresponding to the face recognition model training method in fig. 1 to fig. 6, an embodiment of the present invention further provides a computer device, as shown in fig. 10, the device includes a memory 1000, a processor 2000 and a computer program stored on the memory 1000 and operable on the processor 2000, wherein the processor 2000 implements the steps of the face recognition model training method when executing the computer program.
Specifically, the memory 1000 and the processor 2000 may be a general-purpose memory and processor, which are not limited herein. When the processor 2000 runs the computer program stored in the memory 1000, the face recognition model training method can be executed, thereby solving the problem that existing face recognition methods cannot recognize face images of poor quality, and achieving the effect of effectively recognizing poor-quality face images.
Corresponding to the face recognition model training method in fig. 1 to fig. 6, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the face recognition model training method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the face recognition model training method can be executed, thereby solving the problem that existing face recognition methods cannot recognize face images of poor quality, and achieving the effect of effectively recognizing poor-quality face images.
The computer program products of the face recognition model training method and apparatus and of the face recognition method and apparatus provided in the embodiments of the present invention include a computer-readable storage medium storing program code, and the instructions included in the program code may be used to execute the methods in the foregoing method embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.