CN113420683A - Face image recognition method, device, equipment and computer readable storage medium


Info

Publication number: CN113420683A
Application number: CN202110725803.3A
Authority: CN (China)
Prior art keywords: face image, parameter, characteristic, feature, image
Other languages: Chinese (zh)
Inventors: 陈星宇, 黄余格, 李绍欣, 张睿欣, 李季檩
Current and original assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to: CN202110725803.3A
Publication of: CN113420683A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiments of this application disclose a face image recognition method, apparatus, device, and computer-readable storage medium. The method acquires a plurality of face image samples of a target object and an image score corresponding to each sample; selects a face image sample pair from the samples according to the differences between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second; jointly trains a preset model on the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between a first feature parameter corresponding to the first face image and a second feature parameter corresponding to the second face image; and performs face recognition on a face image to be recognized through the target model. In this way, the recognition rate for low-quality face images, and hence the accuracy of face recognition, can be improved.

Description

Face image recognition method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for recognizing a face image.
Background
With the development of information security technology, face image recognition is widely used for identity authentication in scenarios such as login and verification, enabling fast authentication. However, when the quality of the captured face image is low, face image recognition may fail. To improve the recognition rate, the related art estimates the uncertainty of all features in a face image, suppresses the expression of features with high uncertainty, and increases the weight of features with low uncertainty, that is, it performs recognition by enhancing the expression of low-uncertainty features.
In researching and practicing the prior art, the inventors of this application found that existing face image recognition depends on the low-uncertainty features of a face image. When a face image contains few such features, low-quality face images cannot be recognized; the recognition rate of face images is therefore low, which degrades the accuracy of face recognition.
Disclosure of Invention
The embodiments of this application provide a face image recognition method, apparatus, device, and computer-readable storage medium that can improve the recognition rate of face images and thereby the accuracy of face recognition.
The embodiment of the application provides a face image identification method, which comprises the following steps:
acquiring a plurality of face image samples of a target object and an image score corresponding to each face image sample;
selecting a face image sample pair from the plurality of face image samples according to the difference value between the image scores, wherein the face image sample pair comprises a first face image and a second face image, and the image score of the first face image is greater than that of the second face image;
performing joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, wherein the target model is obtained by performing distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image;
and carrying out face recognition on the face image to be recognized through the target model.
Correspondingly, the embodiment of the present application provides a face image recognition apparatus, including:
the acquisition unit is used for acquiring a plurality of face image samples of the target object and an image score corresponding to each face image sample;
a selecting unit, configured to select a face image sample pair from the multiple face image samples according to a difference between the image scores, where the face image sample pair includes a first face image and a second face image, and an image score of the first face image is greater than an image score of the second face image;
the training unit is used for carrying out joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, and the target model is obtained by carrying out distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image;
and the identification unit is used for carrying out face identification on the face image to be identified through the target model.
In some embodiments, the training unit comprises:
the first training subunit is used for generating constraint parameters according to the first characteristic parameters of the first face image and the second characteristic parameters of the second face image through the preset model, and adjusting the network parameters of the preset model according to the constraint parameters until the constraint parameters are converged;
and the second training subunit is used for performing classification training on the preset model obtained after the constraint parameters converge, according to the probability value difference between the predicted probability value output by that model and a preset probability value, until the probability value difference converges, so as to obtain the trained target model.
In some embodiments, the first training subunit is further configured to:
extracting a first characteristic parameter corresponding to the first face image and extracting a second characteristic parameter corresponding to the second face image;
generating the characteristic constraint sub-parameter according to the first characteristic parameter and the second characteristic parameter, and adjusting a first network sub-parameter of the preset model according to the characteristic constraint sub-parameter until the characteristic constraint sub-parameter is converged;
converting the first feature parameters into first feature vectors and converting the second feature parameters into second feature vectors;
and determining the vector constraint sub-parameter according to the first feature vector and the second feature vector, and adjusting a second network sub-parameter of the preset model according to the vector constraint sub-parameter until the vector constraint sub-parameter is converged.
In some embodiments, the first training subunit is further configured to:
respectively carrying out average compression processing on the sub-characteristic parameters of each color channel in the first characteristic parameter and the second characteristic parameter to obtain a compressed first characteristic parameter and a compressed second characteristic parameter;
respectively carrying out normalization processing on the compressed first characteristic parameter and the compressed second characteristic parameter to obtain a first characteristic sub-parameter and a second characteristic sub-parameter;
and performing norm calculation according to the first characteristic subparameter and the second characteristic subparameter to obtain the characteristic constraint subparameter.
In some embodiments, the first training subunit is further configured to:
performing cosine calculation according to the first eigenvector and the second eigenvector to obtain a cosine constraint value;
and determining the cosine constraint value as the vector constraint subparameter.
In some embodiments, the selecting unit is further configured to:
acquiring a score difference value of image scores of any two face image samples in the plurality of face image samples;
filtering the score difference value according to a preset score difference value range to obtain a score difference value in the preset score difference value range;
and determining two face image samples corresponding to the score difference value within the preset score difference value range as the face image sample pair.
In some embodiments, the identification unit is further configured to:
inputting the face image to be recognized into the target model;
extracting the characteristics of the face image to be recognized through the target model to obtain characteristic parameters corresponding to the face image to be recognized;
normalizing the characteristic parameters through the target model to obtain characteristic vectors corresponding to the characteristic parameters;
and classifying the characteristic vectors through the target model to obtain an image recognition result corresponding to the face image to be recognized.
In addition, the embodiment of the present application further provides a computer device, which includes a processor and a memory, where the memory stores an application program, and the processor is configured to run the application program in the memory to implement the steps in the face image recognition method provided in the embodiment of the present application.
In addition, a computer-readable storage medium is provided, where multiple instructions are stored, and the instructions are suitable for being loaded by a processor to perform steps in any one of the face image recognition methods provided in the embodiments of the present application.
In addition, an embodiment of this application further provides a computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps in any face image recognition method provided in the embodiments of this application.
The embodiments of this application acquire a plurality of face image samples of a target object and an image score corresponding to each sample; select a face image sample pair from the samples according to the differences between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second; jointly train a preset model on the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between a first feature parameter corresponding to the first face image and a second feature parameter corresponding to the second face image; and perform face recognition on a face image to be recognized through the target model. Training the preset model on the face image sample pair thus distills the model on the difference between the first feature parameter of the first face image and the second feature parameter of the second face image, so that the model constrains itself until the features it extracts from the two images of the pair are consistent, yielding the target model. The feature parameters of the high-quality face image thereby improve the model's ability to represent the feature parameters of a low-quality face image: when recognizing a low-quality face image, the model produces features close to those of the high-quality image. This facilitates subsequent recognition of any face image to be recognized through the target model, improving both the recognition rate and the accuracy of face recognition.
Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a scene schematic diagram of a face image recognition system according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating steps of a face image recognition method according to an embodiment of the present application;
fig. 3 is a schematic flow chart illustrating another step of a face image recognition method according to an embodiment of the present application;
FIG. 4 is a diagram illustrating the effect of iterative training provided by an embodiment of the present application;
fig. 5 is a scene schematic diagram of a face image recognition method according to an embodiment of the present application;
fig. 6 is a scene schematic diagram of an attention feature extraction method provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a face image recognition apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of this application.
The embodiment of the application provides a face image recognition method, a face image recognition device, face image recognition equipment and a computer readable storage medium. Specifically, the embodiment of the present application will be described from the perspective of an image recognition apparatus, which may be specifically integrated in a computer device, and the computer device may be a server, or may be a terminal or other devices. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and an artificial intelligence platform. The terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
It should be noted that, in the face image recognition method or apparatus disclosed in this application, a plurality of servers may be organized into a blockchain, with each server acting as a node on the blockchain.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
The solutions provided in the embodiments of this application involve technologies such as artificial-intelligence face image recognition, which are explained through the following embodiments:
for example, referring to fig. 1, a scene diagram of an image recognition system provided in an embodiment of the present application is shown. The scene comprises a terminal 10 and a server 20, and data interaction is realized between the terminal 10 and the server 20 through a wireless communication connection.
Through the terminal 10, the user selects the sample data set to be uploaded for training (for example, face image samples and the corresponding score samples) and the face image to be recognized, and uploads them to the server 20 of the corresponding platform, so that the server 20 can train the model on the sample data set or perform face recognition on the face image to be recognized based on the model.
The server 20 may collect a plurality of face image samples of the target object and an image score corresponding to each face image sample; selecting a face image sample pair from a plurality of face image samples according to the difference between the image scores, wherein the face image sample pair comprises a first face image and a second face image, and the image score of the first face image is greater than that of the second face image; and performing combined training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, wherein the target model is obtained by performing distillation training on the difference between a first characteristic parameter corresponding to the first face image and a second characteristic parameter corresponding to the second face image. In addition, the server 20 performs face recognition on the face image to be recognized through the target model when receiving the face image to be recognized.
Face image recognition here covers processing such as collecting face image samples and image scores, selecting sample pairs, distillation training of a preset model, and recognizing the face image to be recognized.
These aspects are detailed below. Note that the order of the following embodiments does not limit their preferred order.
Referring to fig. 2, fig. 2 is a schematic flow chart illustrating steps of a face image recognition method according to an embodiment of the present application, and the specific flow chart is as follows:
101. Acquiring a plurality of face image samples of the target object and the image score corresponding to each face image sample.
A face image sample may be an image that contains only a human face, or at least contains one, and carries the biometric feature information of that face. In this embodiment, face image samples are used for model training, to improve the accuracy with which biometric recognition technology identifies face images.
It should be noted that, during face image recognition, different face images of the same target object may differ in quality, because each face image contains different facial feature information. For example, factors such as whether the face is frontal or in profile, whether the image is sharp or blurred, the angle of the face, whether the face is occluded, and the facial expression all affect the quality of a face image and hence its recognition rate.
The image score may be a quality score of the face image, reflecting its quality, which can be judged from the facial feature information available for face recognition: the more such information, the easier it is to distinguish the current face image from the face images of other people during recognition. Understandably, the more facial feature information available for face recognition, the better the quality of the face image, the easier it is to distinguish from other people's faces, and the higher its image score. For example, a face image showing a clear, frontal face contains more facial feature information usable for recognition, so its image score is higher.
To distinguish face images by quality, the embodiments of this application assign each face image an image score according to its facial feature information. Specifically, the image score may be obtained by evaluating the face image with a preset face quality model. The preset face quality model may be a face image quality assessment model based on similarity distribution distance (SDD-FIQA, Similarity Distribution Distance for Face Image Quality Assessment), and its training process may be: traverse the training data through a face recognition model and obtain the corresponding intra-class (same-person) and inter-class (different-person) similarity distributions; compute the Wasserstein distance between the intra-class and inter-class distributions and use it as the quality-score pseudo label; and train a quality-score regression network under the constraint of a regression loss function (Huber loss). When the preset face quality model evaluates the image score of a face image, the distance between the image's same-person similarity distribution and its different-person similarity distribution serves as the quality score, i.e., the image score: the farther apart the two distributions, the higher the image score; the closer together, the lower. The image score therefore reflects the quality of the face image and, further, how well the current face image can be distinguished from the face images of other people.
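As an illustrative aid only (this code is not part of the patent; the helper names and the use of embedding vectors and cosine similarity are assumptions), a quality-score pseudo label of the kind described above could be computed as follows:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def quality_pseudo_label(embedding, same_person, other_people):
    """Quality-score pseudo label in the spirit of SDD-FIQA: the Wasserstein
    distance between the intra-class (same person) and inter-class (different
    people) similarity distributions of one face image's embedding."""
    def cos_sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    intra = [cos_sim(embedding, e) for e in same_person]    # same-person similarities
    inter = [cos_sim(embedding, e) for e in other_people]   # different-person similarities
    # The farther apart the two distributions, the higher the quality score.
    return wasserstein_distance(intra, inter)
```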
102. Selecting a face image sample pair from the plurality of face image samples according to the differences between the image scores.
The face image sample pair comprises a first face image and a second face image, and the image score of the first face image is larger than that of the second face image.
The difference value may be the difference between the image scores of face images of the same object at different qualities, representing the quality gap between different face images of that object. For example, for the same object, there is a quality difference between a sharp face image and a blurred one, between a frontal face image and a profile one, and between an occluded face image and an unoccluded one; a face image pair may also combine these cases or exhibit other quality differences, which is not limited here. It should be noted that when two face images of the same object differ in quality, their image scores differ as well.
The image score reflects the quality of a face image: a higher score means the image contains more facial feature information usable for face recognition, is of better quality, and is easier to distinguish from the face images of other objects.
To improve the recognition rate of subsequent low-quality face image recognition and distinguish low-quality face images from the face images of other objects, the recognition results for low-quality and high-quality face images of the same object must be consistent. The embodiments of this application therefore select face images of the same object at different qualities as the model's training data, improving the model's recognition rate on low-quality face images.
In some embodiments, the step of "selecting a face image sample pair from a plurality of face image samples according to a difference between image scores" comprises:
(1) Acquiring the score difference of the image scores of any two face image samples among the plurality of face image samples.
A quality difference (difference of image scores) between two face image samples that is too large or too small is unfavorable for model training. Specifically, if the quality difference between the two samples is too small, the information that the high-quality face image sample can transfer to the low-quality one is limited; if it is too large, the gap from the high-quality sample to the low-quality sample may prevent the model from converging or cause overfitting. The embodiments of this application therefore select, as a face image sample pair, two face image samples of the same target object whose image score difference meets a preset requirement, to serve as the image data for subsequent model training.
To select this image data, i.e., the face image sample pair, the embodiments of this application obtain the image score difference of any two face image samples of the same target object, and then determine from these differences the two face images that meet the preset requirement as the face image sample pair.
(2) Filtering the score differences according to the preset score difference range, to obtain the score differences within that range.
The preset score difference range may be a value range for the image score differences of face images, used to filter out combinations of two face image samples whose image score difference is too large or too small.
To obtain the combinations of two face image samples that meet the preset score difference range, after the score difference of the image scores of any two face images is obtained, the score difference of each combination is compared with the preset range; combinations whose score difference does not meet the range are filtered out, leaving the score differences within the preset range, from which the usable face image samples are selected.
(3) Determining the two face image samples corresponding to a score difference within the preset score difference range as a face image sample pair.
To select the face image sample pair for subsequent model training, after the score differences within the preset range are determined, the two face image samples corresponding to each such score difference are determined as a face image sample pair.
In this way, two face image samples whose score difference meets the preset requirement can be selected from the plurality of face image samples of the target object as a face image sample pair for subsequent model training (a selection sketch follows below). This ensures that the feature difference between the high-quality and low-quality face image samples carries enough information for the model, while avoiding a gap so large that the model cannot converge or overfits, thereby improving the recognition rate of the subsequently trained model on low-quality face images and the accuracy of face image recognition.
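Purely as an illustration (the helper names and the range endpoints are assumptions, not values from the patent), selecting sample pairs under a preset score-difference range might look like this in Python:

```python
from itertools import combinations

def select_sample_pairs(samples, min_diff=0.1, max_diff=0.5):
    """Select face image sample pairs for one target object.
    `samples` is a list of (image, score) tuples; a pair is kept only if
    its score difference lies within the preset range [min_diff, max_diff]."""
    pairs = []
    for (img_a, s_a), (img_b, s_b) in combinations(samples, 2):
        if min_diff <= abs(s_a - s_b) <= max_diff:   # filter by the preset range
            # The first image is the higher-scoring one, the second the lower.
            if s_a >= s_b:
                pairs.append(((img_a, s_a), (img_b, s_b)))
            else:
                pairs.append(((img_b, s_b), (img_a, s_a)))
    return pairs
```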
103. Performing joint training on the preset model according to the face image sample pair and the corresponding image scores, to obtain the trained target model.
The target model is obtained by distillation training on the difference between the first feature parameter corresponding to the first face image and the second feature parameter corresponding to the second face image.
The network structure of the preset model and of the target model may include a feature extraction layer and a classification layer. The feature extraction layer extracts features from the face image input to the model and converts the extracted features into feature vectors; the classification layer classifies the feature vectors to obtain the classification result, i.e., the recognition result of the face image.
Further, the model of this embodiment also includes an attention network module (Transformer). For ease of understanding, take the training of the preset model as an example. After receiving a face image sample pair and the corresponding image scores, the preset model distinguishes the first face image and the second face image in the pair by their image scores. The feature extraction layer extracts a first feature parameter for the first face image and a second feature parameter for the second face image. The attention network module preprocesses the first feature parameter into a first feature sub-parameter and the second feature parameter into a second feature sub-parameter, and a first loss function determines a feature constraint sub-parameter from these sub-parameters, according to which the corresponding network parameters in the preset model, such as the first network sub-parameters of the feature extraction layer, are adjusted. The feature extraction layer also converts the first feature parameter into a first feature vector and the second feature parameter into a second feature vector, and a second loss function determines a vector constraint sub-parameter from the two vectors, according to which the corresponding network parameters, such as the second network sub-parameters of the feature extraction layer, are constrained. Finally, the classification layer classifies the feature vectors to obtain a predicted classification result (predicted probability value), and the corresponding network parameters, such as the network parameters of the classification layer (the second network parameters in this embodiment), are adjusted according to the probability difference between the predicted probability value and the preset probability value. In this way, every network parameter of the preset model is adjusted until convergence, yielding the trained target model.
In some embodiments, the step of performing joint training on a preset model according to a face image sample pair and a corresponding image score to obtain a trained target model includes:
(1) Inputting the face image sample pair and the corresponding image scores into the preset model.
To train the preset model, after the face image sample pair is selected, the pair and the image scores corresponding to it are input into the preset model. On receiving the face image sample pair and the image scores, the preset model distinguishes the first face image and the second face image in the pair by their image scores, so that knowledge self-distillation can subsequently be performed on the features corresponding to the two images.
(2) Generating constraint parameters from the first feature parameter of the first face image and the second feature parameter of the second face image through the preset model, and adjusting the network parameters of the preset model according to the constraint parameters until they converge.
To improve the preset model's recognition rate on low-quality face images, the model needs to be distilled. Specifically, the embodiments of this application generate constraint parameters from the first feature parameter of the first face image and the second feature parameter of the second face image, and adjust the network parameters of the preset model according to the constraint parameters until they converge. It should be noted that when the constraint parameters generated from the two feature parameters no longer change, i.e., the current constraint parameter is the same as or very close to the previous one, the constraint parameters have converged. At that point, the first feature parameter the model extracts from the first face image is the same as or very close to the second feature parameter extracted from the second face image, indicating that the feature parameters the current model extracts from the low-quality face image are close to those of the high-quality image, and the preset model has completed knowledge self-distillation. When the model later recognizes a low-quality face image, its feature parameters will be close to those of a high-quality image, which improves the model's ability to recognize low-quality face images, i.e., their recognition rate.
(3) Performing classification training on the preset model obtained after the constraint parameters converge, according to the probability value difference between the predicted probability value output by that model and the preset probability value, until the probability value difference converges, to obtain the trained target model.
The predicted probability value may be the predicted classification result produced when the preset model recognizes a face image sample, i.e., the similarity or probability that the sample belongs to a certain object. Specifically, the preset model extracts features from the face image sample pair and classifies them to obtain the probability value.
The preset probability value may be a probability value set in advance for the face image sample, used for convergence training of the result output by the preset model. For example, sample probability values may be set for the first face image (high quality) or the second face image (low quality) in the face sample pair, for convergence training of the model's output.
It should be noted that after the network parameters of the preset model are adjusted according to the constraint parameters, the resulting model, whose constraint parameters have converged, can recognize low-quality face images, but its output is not yet accurate. To improve the accuracy of the recognition result output by the model, convergence training of the model's output is therefore required.
In this embodiment, after the preset model with converged constraint parameters is obtained, convergence training is performed on its output probability value. Specifically, the model extracts the features of the first and second face images in the sample pair, classifies them to obtain a predicted probability value, and the corresponding network parameters (the second network parameters), such as the network parameters of the classification layer, are adjusted according to the probability value difference between the predicted and preset probability values until the difference converges, yielding the trained target model.
In this way, every network parameter of the preset model is adjusted until convergence and the trained target model is obtained, which improves the model's recognition rate on subsequently recognized face images and the accuracy of face image recognition.
In some embodiments, the constraint parameters include feature constraint sub-parameters and vector constraint sub-parameters, and the first network parameters include first network sub-parameters and second network sub-parameters. Specifically, the preset model may include a feature extraction layer and a vector conversion layer; the first network sub-parameters may be the network parameters of the feature extraction layer, and the second network sub-parameters the network parameters of the vector conversion layer. Step (2), generating constraint parameters from the first feature parameter of the first face image and the second feature parameter of the second face image through the preset model and adjusting the network parameters of the preset model according to the constraint parameters until they converge, includes the following steps:
(2.1) Extracting the first feature parameter corresponding to the first face image and the second feature parameter corresponding to the second face image.
In order to obtain the feature parameters of the first face image and the second face image in the face image sample pair for subsequent feature processing and model distillation, feature extraction needs to be performed on the first face image and the second face image. For example, convolution processing is performed on the first face image and the second face image to acquire a first feature parameter corresponding to the first face image and acquire a second feature parameter corresponding to the second face image.
(2.2) Generating the feature constraint sub-parameter from the first feature parameter and the second feature parameter, and adjusting the first network sub-parameters of the preset model according to the feature constraint sub-parameter until it converges.
The characteristic constraint sub-parameter may be a constraint parameter calculated according to the first characteristic parameter and the second characteristic parameter, and is used to adjust a corresponding network parameter in the preset model.
The first network sub-parameter may be a network parameter of a corresponding network structure layer in the preset model, such as a network parameter corresponding to the network structure layer for extracting the first characteristic parameter and the second characteristic parameter.
To make the feature parameters the model extracts from a low-quality face image the same as, or close to, those of a high-quality face image, knowledge self-distillation is needed: the model adjusts its own network parameters according to the difference, at the same moment, between the feature parameters of the high-quality and low-quality images. Specifically, the embodiments of this application compute a feature constraint sub-parameter from the first feature parameter of the first face image and the second feature parameter of the second face image, adjust the corresponding network structure of the preset model accordingly, and then recompute the constraint sub-parameter from the first and second feature parameters extracted by the adjusted model. When the feature constraint sub-parameter no longer changes, i.e., the current value is the same as or very close to the previous one, it has converged. At that point, the first feature parameter extracted by the model is the same as or very close to the second feature parameter of the second face image, indicating that the feature parameters the current network structure layer extracts from the low-quality image are close to those of the high-quality image, and knowledge self-distillation of that layer inside the preset model is complete.
In some embodiments, the step of "generating the feature constraint sub-parameter according to the first feature parameter and the second feature parameter" in (2.2) may include:
(2.2.1) Performing average compression on the sub-feature parameters of each color channel of the first feature parameter and the second feature parameter, to obtain the compressed first feature parameter and the compressed second feature parameter;
(2.2.2) Normalizing the compressed first feature parameter and the compressed second feature parameter, to obtain the first feature sub-parameter and the second feature sub-parameter;
(2.2.3) Performing a norm computation on the first feature sub-parameter and the second feature sub-parameter, to obtain the feature constraint sub-parameter.
It should be noted that the extracted first and second feature parameters may include feature parameters for several color channels, and may take the form of multi-dimensional feature matrices. To obtain the attention feature (attention map) used in the computation, i.e., the feature parameter from which the feature constraint sub-parameter is determined, the embodiments of this application downsample the extracted first and second feature parameters, for example by convolution, pooling, or fusion.
For example, after the first and second feature parameters are obtained, each of their color channels is average-compressed and the result is normalized, giving the corresponding first and second feature sub-parameters. The average compression may proceed as follows: the feature values of each color channel of the feature parameter are averaged, producing the average-compressed feature parameter.
For example, suppose the first feature parameter of the first face image and the second feature parameter of the second face image input to the model are both of size H × W × C, where H × W × C denotes "height × width × number of channels". Averaging the feature values of each color channel yields the compressed first and second feature parameters, each of size H × W (height × width). Finally, the compressed parameters are normalized separately, giving the corresponding attention features (attention maps), i.e., the first and second feature sub-parameters, each of size H × W.
Further, after the attention features of the first and second face images, i.e., the first and second feature sub-parameters, are obtained, a norm is computed from them through the corresponding loss function in the model. Specifically, an L2 norm is computed between the first and second feature sub-parameters, and the result is the feature constraint sub-parameter. Computing the feature constraint sub-parameter in this way allows back-propagation of its gradient information to adjust the network parameters of the corresponding network structure layer and optimize the model.
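As a PyTorch sketch of these two steps (an illustration under our own naming; the patent provides no code), the channel-averaged attention maps and the L2 feature constraint sub-parameter could be computed as follows:

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Compress an (N, C, H, W) feature map into one normalized H x W
    attention map per image, as described above."""
    a = feat.mean(dim=1)                      # average over color channels -> (N, H, W)
    return F.normalize(a.flatten(1), dim=1)   # normalize each map, flattened to (N, H*W)

def feature_constraint(feat_high, feat_low):
    """Feature constraint sub-parameter: the L2 norm of the difference between
    the attention maps of the high- and low-quality face images."""
    return (attention_map(feat_high) - attention_map(feat_low)).norm(dim=1).mean()
```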
(2.3) Converting the first feature parameter into a first feature vector, and the second feature parameter into a second feature vector.
The first feature vector and the second feature vector may be vector parameters obtained by the model before the classification processing.
To obtain the feature vectors used before the classification processing, the embodiments of this application convert the first and second feature parameters obtained by feature extraction into feature vectors: for example, the first feature parameter is normalized into the corresponding first feature vector, and the second feature parameter is normalized into the corresponding second feature vector.
In some embodiments, after the feature constraint sub-parameter has converged, the first feature parameter may be obtained through the converged feature extraction layer and normalized into the corresponding first feature vector, and the second feature parameter may likewise be obtained and normalized into the corresponding second feature vector. The network parameters of the model's vector conversion layer can then be further adjusted on top of the already-converged feature extraction layer, optimizing the model, speeding up iteration, and improving the efficiency of distillation training. This is not limited here.
(2.4) Determining the vector constraint sub-parameter from the first feature vector and the second feature vector, and adjusting the second network sub-parameters of the preset model according to the vector constraint sub-parameter until it converges.
The vector constraint sub-parameter may be a constraint parameter calculated according to the first eigenvector and the second eigenvector, and is used for adjusting a corresponding network parameter in the preset model.
The second network sub-parameter may be a network parameter of a corresponding network structure layer in the preset model, for example, a network parameter corresponding to the network structure layer for converting the first feature vector and the second feature vector.
To make the feature vector the model derives from a low-quality face image the same as, or close to, that of a high-quality face image, knowledge self-distillation is again needed: the model adjusts its own network parameters according to the difference, at the same moment, between the feature vectors of the high-quality and low-quality images. Specifically, the embodiments of this application compute a vector constraint sub-parameter from the first feature vector of the first face image and the second feature vector of the second face image, adjust the corresponding network structure of the preset model accordingly, and then recompute the constraint sub-parameter from the first and second feature vectors produced by the adjusted model. When the vector constraint sub-parameter no longer changes, i.e., the current value is the same as or very close to the previous one, it has converged. At that point, the first and second feature vectors obtained by the model are the same or very close, indicating that the feature vector the current network structure layer derives from the low-quality image is close to that of the high-quality image, and knowledge self-distillation of that layer inside the preset model is complete.
In some embodiments, determining the vector constraint sub-parameter from the first feature vector and the second feature vector in step (2.4) includes:
(2.4.1) Performing a cosine computation on the first and second feature vectors to obtain a cosine constraint value;
(2.4.2) Determining the cosine constraint value as the vector constraint sub-parameter.
To obtain the vector constraint sub-parameter that constrains the network parameters of the model's vector conversion layer, the embodiments of this application compute over the converted first and second feature vectors: specifically, a cosine computation is performed on the two vectors through the model's corresponding second loss function to obtain a cosine constraint value, which is determined as the vector constraint sub-parameter. Computing the vector constraint sub-parameter in this way allows subsequent back-propagation of its gradient information to adjust and optimize the network parameters of the corresponding network structure layer in the model.
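Continuing the previous sketch (again an assumption: 1 - cosine similarity is one common way to turn the cosine computation into a minimizable constraint):

```python
def vector_constraint(vec_high, vec_low):
    """Vector constraint sub-parameter: a cosine constraint between the first
    and second feature vectors. Minimizing 1 - cos drives the low-quality
    feature vector toward the high-quality one."""
    return (1.0 - F.cosine_similarity(vec_high, vec_low, dim=1)).mean()
```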
It should be noted that, in the embodiment of the present application, the model may include a feature extraction layer and a classification layer, where the feature extraction layer may include a plurality of network structure layers, for example, the feature extraction layer includes a first sub-feature extraction layer, a second sub-feature extraction layer, a third sub-feature extraction layer, and a fourth sub-vector conversion layer. In order to improve the recognition rate of the model when recognizing the low-quality face image, in the embodiment of the application, after feature processing is performed on each network structure layer during distillation training of the model, constraint parameter calculation is performed on the obtained features through corresponding loss functions so as to adjust the network parameters of the corresponding network structure layers until convergence.
For example, after the first sub-feature extraction layer, the second sub-feature extraction layer and the third sub-feature extraction layer perform feature extraction, norm calculation is performed on extracted feature parameters through a first loss function preset in a model to obtain corresponding feature constraint sub-parameters, and network parameters of the corresponding feature extraction layers are adjusted according to the corresponding feature constraint sub-parameters respectively until the corresponding feature constraint sub-parameters converge; illustratively, after extracting a first characteristic parameter of a first facial image and a second characteristic parameter of a second facial image, the first sub-feature extraction layer performs L2 norm calculation according to the first characteristic parameter and the second characteristic parameter to obtain a characteristic constraint sub-parameter, and reversely propagates gradient information of the characteristic constraint sub-parameter to adjust a network parameter of the first sub-feature extraction layer until the characteristic constraint sub-parameter converges, thereby completing distillation training of the first sub-feature extraction layer inside the model. The distillation training process of the second sub-feature extraction layer and the third sub-feature extraction layer is the same as or similar to that of the first sub-feature extraction layer, and is not repeated here.
Further, the fourth sub-vector conversion layer converts the feature parameters extracted by the third sub-feature extraction layer into the corresponding feature vectors, namely the first feature vector and the second feature vector. Cosine calculation is then performed on the converted first and second feature vectors through a second loss function preset in the model to obtain the vector constraint subparameter, and the gradient information of the vector constraint subparameter is back-propagated to adjust the network parameters of the fourth sub-vector conversion layer until the vector constraint subparameter converges, completing the distillation training of the fourth sub-vector conversion layer inside the model.
Further, network parameters of the model classification layer are adjusted according to the probability value difference between the prediction probability value output by the model and the preset probability value until the probability value difference is converged, and the trained target model is obtained.
By training the model in the above manner, a trained target model is obtained; any face image to be recognized can then be recognized through the target model, and the recognition rate and accuracy for low-quality face images are improved.
104. Carrying out face recognition on the face image to be recognized through the target model.
The face image to be recognized may be any face image, for example a high-quality face image with a higher image score or a low-quality face image with a lower image score.
It should be noted that, when the face image to be recognized is a low-quality face image, because the target model has undergone self-distillation, the feature parameters it extracts from the low-quality image approach the feature parameters corresponding to a high-quality image and are represented as high-quality feature vectors. Classification is then performed on the basis of these high-quality feature vectors to complete the recognition of the face image, which improves both the recognition rate and the accuracy for low-quality face images.
In some embodiments, the step of "performing face recognition on a face image to be recognized through a target model" includes:
(1) Inputting the face image to be recognized into the target model.
(2) Performing feature extraction on the face image to be recognized through the target model to obtain the characteristic parameters corresponding to the face image to be recognized.
It should be noted that, when the target model performs feature extraction on the face image to be recognized, the extraction may involve several sub-feature extraction layers, each implemented as convolution processing. For example, if the target model includes a first convolution layer, a second convolution layer and a third convolution layer, feature extraction is performed on the face image to be recognized through the combination of the three. Specifically, the first convolution layer convolves the face image to be recognized to obtain a first sub-parameter (a feature parameter or feature matrix), the second convolution layer convolves the first sub-parameter to obtain a second sub-parameter, and the third convolution layer convolves the second sub-parameter to obtain a third sub-parameter, which is the characteristic parameter corresponding to the face image to be recognized.
(3) Performing normalization processing on the characteristic parameters through the target model to obtain the feature vectors corresponding to the characteristic parameters.
In order to obtain the feature vector for classification processing, in the embodiment of the present application, after obtaining the feature parameter corresponding to the face image to be recognized, the feature parameter needs to be converted into the corresponding feature vector. Specifically, the target model further includes a fourth convolution layer, and the feature parameters (the third sub-parameters) corresponding to the face image to be recognized are converted into feature vectors by the fourth convolution layer, so as to perform classification processing subsequently according to the feature vectors.
(4) Classifying the feature vectors through the target model to obtain the image recognition result corresponding to the face image to be recognized.
It should be noted that the target model further includes a classification layer for classifying the received feature vectors. Specifically, after receiving a feature vector, the classification layer classifies it to obtain a classification result, which may be the similarity between the face image to be recognized and a pre-stored target face image, or the probability that the face image to be recognized matches the pre-stored target face image; the similarity or probability value is determined as the image recognition result corresponding to the face image to be recognized.
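As a hedged illustration of this classification step, the sketch below compares a query feature vector against pre-stored target embeddings by cosine similarity; the gallery structure and the threshold value are assumptions for illustration, not details of the embodiment.

```python
import torch
import torch.nn.functional as F

def recognize(embedding, gallery, threshold=0.5):
    """Compare a query feature vector with pre-stored target face embeddings
    and return the best-matching identity and its cosine similarity."""
    q = F.normalize(embedding, dim=-1)
    best_id, best_sim = None, -1.0
    for identity, ref in gallery.items():
        sim = float(torch.dot(q, F.normalize(ref, dim=-1)))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    # below the threshold the query matches no pre-stored target face image
    return (best_id if best_sim >= threshold else None), best_sim

gallery = {"alice": torch.randn(512), "bob": torch.randn(512)}
print(recognize(torch.randn(512), gallery))
```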
In the embodiment of the application, two face image samples whose image-score difference falls within a preset score difference range are selected as a face image sample pair; that is, high-quality and low-quality face images with a suitable quality gap are chosen as sample data and applied to the self-distillation training of the preset model. This ensures that the feature difference between the high-quality face image and the low-quality face image in the sample pair transmits enough information during the subsequent self-distillation of the model, while avoiding the non-convergence or over-fitting that an overly large difference between the high-quality and low-quality samples would cause.
Furthermore, in the distillation training of the model, after each network structure layer extracts the feature parameters corresponding to the high-quality and low-quality face images, constraint parameters are calculated from those feature parameters through a preset loss function, and the network parameters of the current network structure layer are adjusted according to the constraint parameters until they converge. In this way the network parameters of each network structure layer of the target model are optimized independently until all feature extraction layers are optimized; the feature parameters of the high-quality face images improve the model's ability to characterize the feature parameters of low-quality face images, so that when recognizing a low-quality face image the model's features approach those of the high-quality image, improving the model's recognition rate on face images.
In addition, after the distillation training of the model is finished, the network parameters of the classification layer are adjusted according to the probability value difference between the prediction probability value output by the model and the preset probability value until the probability value difference converges, yielding the trained target model through which any face image to be recognized can subsequently be recognized. It should be noted that this increases both the rate at which the target model recognizes low-quality face images to be recognized and the accuracy with which it recognizes face images to be recognized.
As can be seen from the above, the embodiment of the present application can acquire a plurality of face image samples of a target object and the image score corresponding to each face image sample; select a face image sample pair from the plurality of face image samples according to the difference between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second face image; perform joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image; and perform face recognition on the face image to be recognized through the target model. In this way, the preset model is trained through the face image sample pair so that it distills itself according to the difference between the first characteristic parameter of the first face image and the second characteristic parameter of the second face image, realizing self-constraint until the features the model represents for the two images of the sample pair are consistent, and obtaining the target model. The feature parameters of the high-quality face image thus improve the model's ability to characterize the feature parameters of a low-quality face image, so that when recognizing a low-quality face image the model approaches the features of the high-quality face image; this facilitates the subsequent recognition of face images to be recognized through the target model, improves the recognition rate for any face image, and thereby improves the accuracy of face recognition.
The method described in the above examples is further illustrated in detail below by way of example.
The embodiment of the present application takes face image recognition as an example, and further describes the face image recognition method provided by the embodiment of the present application.
Referring to fig. 3, fig. 3 is a schematic flowchart of further steps of the face image recognition method provided in the embodiment of the present application; fig. 4 is a schematic diagram of the effect of iterative training provided in the embodiment of the present application; fig. 5 is a schematic diagram of a scene of the face image recognition method provided in the embodiment of the present application; and fig. 6 is a schematic diagram of a scene of the attention feature extraction method provided in the embodiment of the present application. For ease of understanding, please refer to fig. 3, 4, 5 and 6 together while reading the description of the embodiments below.
In the embodiment of the present application, a description will be given from the perspective of a face image recognition apparatus, which may be specifically integrated in a computer device such as a terminal or a server. When a processor on the terminal or the server executes a program corresponding to the face image recognition method, the specific flow of the face image recognition method is as follows:
201. Acquiring a plurality of face image samples of the target object and an image score corresponding to each face image sample.
A face image sample may be an image containing only, or at least, the appearance of a human face, and carries the biometric feature information of the face.
It should be noted that, in the process of face image recognition, the quality of different face images of the same target object may be different, which is because each face image contains different facial feature information. For example, in a face image, the facial feature information may include front face and side face, sharpness and blur, angle of the face, occlusion or non-occlusion of the face, expression of the face, and other factors, which may affect the quality of the face image, and thus affect the recognition rate of the face image.
In order to distinguish the quality of face images, the embodiment of the application assigns a corresponding image score according to the facial feature information of each face image. Specifically, the image score may be obtained by evaluating the quality of the face image through a preset face quality model. The preset face quality model may be a face image quality assessment model based on similarity distribution distance (SDD-FIQA, Similarity Distribution Distance for Face Image Quality Assessment), and its training process may be: traversing the training data through a face recognition model to obtain the corresponding intra-class and inter-class similarity distributions; calculating the Wasserstein distance between the intra-class and inter-class distributions and using it as a quality score pseudo label; and training a quality score regression network under the constraint of a regression loss function (Huber loss). When the preset face quality model evaluates the image score of a face image, it uses the distance between the image's same-person similarity distribution and its different-person similarity distribution as the quality score, i.e. the image score: the farther apart the two distributions, the higher the image score; the closer they are, the lower the image score. The image score therefore reflects the quality of the face image and, further, the gap between the current face image and other face images of the same person.
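For illustration, the pseudo-label computation described above might be sketched as follows, assuming the intra-class and inter-class similarities are available as one-dimensional sample arrays; the distribution parameters in the toy usage are invented.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def quality_pseudo_label(intra_sims, inter_sims):
    """Quality score pseudo label for one face image: the Wasserstein distance
    between its same-person (intra-class) similarity distribution and its
    different-person (inter-class) similarity distribution."""
    return wasserstein_distance(intra_sims, inter_sims)

rng = np.random.default_rng(0)
intra = rng.normal(0.8, 0.05, 200)    # similarities to images of the same person
inter = rng.normal(0.2, 0.10, 2000)   # similarities to images of other people
print(quality_pseudo_label(intra, inter))  # larger distance -> higher image score
```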
202. Selecting a face image sample pair from the plurality of face image samples according to the difference between the image scores.
The face image sample pair comprises a first face image and a second face image, and the image score of the first face image is larger than that of the second face image.
The difference value may be the image score difference between face images of the same object under different qualities, and is used to represent the quality gap between different face images of that object. For example, for the same object, there is a quality difference between a clear face image and a blurred one, between a frontal face image and a profile one, and between an occluded face image and an unoccluded one; quality differences may also arise from combinations of the above examples or from other factors, which are not limited here. It should be noted that, when there is a quality difference between two face images of the same object, there is a corresponding difference between their image scores.
In order to improve the recognition rate when the model subsequently recognizes low-quality face images, a low-quality face image should still be distinguishable from the face images of other objects, so that the recognition effect on the low-quality and high-quality face images of the same object is consistent. The embodiment of the application therefore selects face images of the same object with different qualities as the training data of the model, thereby improving the model's recognition rate on low-quality face images. The method for selecting the training data may include: acquiring the score difference of the image scores of any two face image samples among the plurality of face image samples; filtering the score differences according to a preset score difference range to obtain the score differences within the preset range; and determining the two face image samples corresponding to a score difference within the preset range as a face image sample pair.
For example, assume that the image score of the first face image (high quality) in a face image sample pair is Q(x_h) and the image score of the second face image (low quality) is Q(x_l), so the score difference between the two images is Q(x_h) − Q(x_l). Let μ be the average of the score differences of the image scores corresponding to the plurality of face image samples of the same target object, and represent the preset score difference range as [(1 − τ)μ, (1 + τ)μ]. The rule for selecting a face image sample pair can then be expressed as:

1(x_h, x_l) = 1, if (1 − τ)μ ≤ Q(x_h) − Q(x_l) ≤ (1 + τ)μ; otherwise 1(x_h, x_l) = 0
If the score difference of the image scores between two face image samples falls within the preset score difference range, the two samples are selected as a face image sample pair for the distillation training of the model; specifically, gradients are back-propagated through the loss function computed on the characteristic parameters of the first face image and the second face image of the pair. If the score difference does not fall within the preset range, the corresponding loss function is set to zero and no gradient information is back-propagated.
In this way, face image sample pairs (pairs) are formed from any two face image samples, which ensures that samples of different qualities are fully covered; meanwhile, high and low quality are only relative between the two samples of a pair, so the model's optimization of low-quality samples can be improved iteratively, with higher-quality samples distilling lower-quality ones. Furthermore, selecting sample pairs within the preset score difference range ensures that the feature difference between the high-quality face image sample and the low-quality face image transmits a sufficient amount of information to the model, while avoiding the non-convergence or over-fitting caused by an overly large difference between them, thereby improving the recognition rate of the subsequently trained model on low-quality face images.
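A minimal sketch of this pair selection rule is given below; the value of τ and the use of the mean absolute score difference as μ are assumptions for illustration.

```python
import itertools
import numpy as np

def select_pairs(scores, tau=0.2):
    """Select (high, low) face image sample pairs whose image-score difference
    lies in the preset range [(1 - tau) * mu, (1 + tau) * mu]."""
    diffs = [abs(a - b) for a, b in itertools.combinations(scores.values(), 2)]
    mu = float(np.mean(diffs))            # mean score difference for this object
    lo, hi = (1 - tau) * mu, (1 + tau) * mu
    pairs = []
    for (name_a, q_a), (name_b, q_b) in itertools.combinations(scores.items(), 2):
        if lo <= abs(q_a - q_b) <= hi:    # keep pairs with a suitable quality gap
            high, low = (name_a, name_b) if q_a >= q_b else (name_b, name_a)
            pairs.append((high, low))     # (first face image, second face image)
    return pairs

scores = {"img_a": 0.91, "img_b": 0.62, "img_c": 0.45, "img_d": 0.88}
print(select_pairs(scores, tau=0.2))
```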
203. Inputting the face image sample pair and the corresponding image scores into the preset model.
The preset model may be a face recognition model trained with an additive angular margin loss function (ArcFace).
In order to train the preset model, after a face image sample pair is selected, the face image sample pair and the image scores corresponding to the sample pair are input into the preset model; after receiving the face image sample pair and the image scores, the preset model distinguishes a first face image and a second face image in the face image sample pair based on the image scores so as to carry out knowledge self-distillation subsequently according to corresponding features of the first face image and the second face image.
204. Extracting the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image.
The network structure of the preset model and of the target model may include a feature extraction layer, which extracts features from the face images input to the model and converts the extracted features into feature vectors. The feature extraction layer may include a plurality of network structure layers; for example, it includes a first sub-feature extraction layer, a second sub-feature extraction layer, a third sub-feature extraction layer, and a fourth sub-vector conversion layer. The first, second and third sub-feature extraction layers may be convolutional network layers that convolve the corresponding features in sequence to realize feature extraction and obtain the intermediate layer features, which are the first characteristic parameter and the second characteristic parameter in this embodiment of the application.
205. Generating a feature constraint sub-parameter according to the first characteristic parameter and the second characteristic parameter, and adjusting a first network sub-parameter of the preset model according to the feature constraint sub-parameter until the feature constraint sub-parameter converges.
The first network sub-parameter may be a network parameter of a corresponding network structure layer in the preset model, such as a network parameter corresponding to the network structure layer for extracting the first characteristic parameter and the second characteristic parameter.
In order for the feature parameters the model extracts from a low-quality face image to be the same as, or close to, those of a high-quality face image, knowledge self-distillation is performed on the model; that is, the model adjusts its own network parameters according to the difference between the feature parameters of the high-quality face image and those of the low-quality face image at the same moment. Specifically, in the embodiment of the application, a feature constraint sub-parameter is calculated from the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image, and the corresponding network structure in the preset model is adjusted according to this sub-parameter. After the adjustment, the constraint sub-parameter is recalculated from the first and second characteristic parameters extracted by the adjusted preset model. When the feature constraint sub-parameter generated from the two characteristic parameters no longer changes, that is, the current value is the same as or very close to the previous one, the sub-parameter has converged. At this point the first characteristic parameter extracted by the model is the same as or very close to the second characteristic parameter, indicating that the feature parameters the current network structure layer extracts from the low-quality face image are close to those of the high-quality face image, and the knowledge self-distillation of that network structure layer in the preset model is complete.
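The adjust-and-recheck loop described above can be sketched roughly as follows; the tolerance-based convergence test, the step budget, and applying one shared layer to both images (self-distillation) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distill_layer(layer, x_high, x_low, optimizer, constraint_fn,
                  tol=1e-4, max_steps=1000):
    """Adjust one network structure layer until its constraint sub-parameter
    stops changing between iterations, i.e. until it converges."""
    prev = float("inf")
    for _ in range(max_steps):
        f_high = layer(x_high)                 # first characteristic parameter
        f_low = layer(x_low)                   # second characteristic parameter
        loss = constraint_fn(f_high, f_low)    # constraint sub-parameter
        optimizer.zero_grad()
        loss.backward()                        # back-propagate gradient information
        optimizer.step()
        if abs(prev - loss.item()) < tol:      # value no longer changes: converged
            break
        prev = loss.item()
    return layer

layer = torch.nn.Conv2d(3, 16, 3, padding=1)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
x_h, x_l = torch.randn(4, 3, 112, 112), torch.randn(4, 3, 112, 112)
distill_layer(layer, x_h, x_l, opt,
              lambda fh, fl: F.mse_loss(fl, fh.detach()))
```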
For example, the model according to the embodiment of the application further includes an attention network module (Transformer) that extracts attention features (attention maps) from the first characteristic parameter and the second characteristic parameter respectively. Referring to fig. 6, exemplarily, assuming the first face image and the second face image input to the model have size H × W × C, after feature extraction by the first sub-feature extraction layer the corresponding intermediate layer features (the first and second characteristic parameters) are obtained. The Transformer module performs average compression on the sub-feature parameters of each color channel in the first and second characteristic parameters, namely avg(X[i, j, :]), to obtain the compressed first and second characteristic parameters; it then normalizes the compressed parameters, namely (X − min(X)) / (max(X) − min(X)), to obtain a first feature sub-parameter and a second feature sub-parameter. The first feature sub-parameter is the attention feature (attention map) of the first characteristic parameter, the second feature sub-parameter is the attention feature (attention map) of the second characteristic parameter, and each attention map has size H × W.
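A minimal sketch of this attention map extraction, assuming N × C × H × W feature tensors, might be:

```python
import torch

def attention_map(features):
    """Turn an N x C x H x W feature map into N x H x W attention maps:
    average over the feature channels at each spatial position, then
    min-max normalize each map, i.e. (X - min(X)) / (max(X) - min(X))."""
    x = features.mean(dim=1)                        # avg(X[i, j, :]) per position
    x_min = x.amin(dim=(1, 2), keepdim=True)
    x_max = x.amax(dim=(1, 2), keepdim=True)
    return (x - x_min) / (x_max - x_min + 1e-8)     # epsilon guards division by zero

feats = torch.randn(4, 64, 56, 56)
print(attention_map(feats).shape)                   # torch.Size([4, 56, 56])
```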
Further, an L2 norm calculation is performed on the first feature sub-parameter and the second feature sub-parameter through the first loss function L_MSE in the model, for example by squaring the element-wise differences between the two sub-parameters and then taking the square root, to obtain the feature constraint sub-parameter. The gradient information of the feature constraint sub-parameter is back-propagated to adjust the network parameters of the corresponding first sub-feature extraction layer in the model and optimize that layer.
In addition, after the second sub-feature extraction layer in the preset model extracts intermediate layer features, the Transformer module extracts the attention features (attention maps) from them in the same way as it does for the intermediate layer features extracted by the first sub-feature extraction layer. The attention features of the intermediate layer features extracted by the third sub-feature extraction layer are obtained in the same way, which is not repeated in this embodiment of the application.
Further, an L2 norm calculation is performed through the first loss function L_MSE on the attention features (attention maps) corresponding to the intermediate layer features (the first and second characteristic parameters) extracted by the second sub-feature extraction layer, obtaining the corresponding feature constraint sub-parameter; its gradient information is back-propagated to adjust the network parameters of the corresponding second sub-feature extraction layer in the model and optimize that layer.
Likewise, an L2 norm calculation is performed through the first loss function L_MSE on the attention features (attention maps) corresponding to the intermediate layer features (the first and second characteristic parameters) extracted by the third sub-feature extraction layer, obtaining the corresponding feature constraint sub-parameter; its gradient information is back-propagated to adjust the network parameters of the corresponding third sub-feature extraction layer in the model and optimize that layer.
In the above manner, the distillation training of the first, second and third sub-feature extraction layers in the preset model is achieved according to the difference between the high-quality and low-quality face images.
206. Converting the first characteristic parameter into a first feature vector, and converting the second characteristic parameter into a second feature vector.
Specifically, the feature extraction layer of the preset model includes a fourth vector conversion layer, through which the intermediate layer features extracted by the third sub-feature extraction layer are converted into the corresponding feature vectors. For example, the first characteristic parameter extracted by the third sub-feature extraction layer is converted into the first feature vector, and the second characteristic parameter into the second feature vector. The fourth vector conversion layer may process the feature parameters by normalization.
207. Performing cosine calculation according to the first feature vector and the second feature vector to obtain a cosine constraint value, determining the cosine constraint value as the vector constraint sub-parameter, and adjusting a second network sub-parameter of the preset model according to the vector constraint sub-parameter until the vector constraint sub-parameter converges.
The vector constraint sub-parameter may be a constraint parameter calculated according to the first feature vector and the second feature vector, and is used for adjusting the corresponding network parameters in the preset model.
The second network sub-parameter may be a network parameter of a corresponding network structure layer in the preset model, for example, a network parameter corresponding to a fourth vector conversion layer for converting to obtain the first feature vector and the second feature vector.
In order to obtain a vector constraint sub-parameter for constraining the network parameters of the model's vector conversion layer, the embodiment of the application performs a calculation on the converted first and second feature vectors; specifically, cosine calculation is performed on the two vectors according to the second loss function L_cos in the model to obtain a cosine constraint value, which is determined as the vector constraint sub-parameter. In this way the vector constraint sub-parameter is obtained by calculation, so that the gradient information corresponding to it can subsequently be back-propagated to adjust the network parameters of the fourth vector conversion layer and optimize that layer of the model.
208. Performing classification training on the preset model after constraint parameter convergence according to the probability value difference between the prediction probability value output by that model and the preset probability value, until the probability value difference converges, obtaining the trained target model.
The prediction probability value may be the predicted classification result produced when the preset model recognizes a face image sample, that is, the similarity or probability that the face image sample belongs to a certain object. Specifically, the preset model performs feature extraction on the face image sample pair and then classifies the features to obtain the probability value.
The preset probability value can be a probability value set for the face image sample in advance, and is used for carrying out convergence training on a result output by the preset model.
It should be noted that, after the network parameters of the preset model have been adjusted according to the corresponding constraint parameters in steps 203-207, a preset model whose constraint parameters have converged is obtained. At this point the model is capable of recognizing low-quality face images, but its output results are not yet accurate. Therefore, in order to improve the accuracy of the recognition results output by the model, convergence training of the output results must be performed on the model.
For example, after the preset model with converged constraint parameters is obtained, convergence training is performed on the probability values output by that model. Specifically, after the preset model extracts the features of the first face image and the second face image in the face image sample pair, the classification layer classifies the vectors corresponding to the extracted features to obtain the prediction probability value. When there is a difference between the prediction probability value and the preset probability value, back propagation is performed according to the gradient information of that probability value difference to adjust the network parameters of the classification layer (L_cls) in the preset model until the probability value difference converges, obtaining the trained target model.
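One convergence-training step for the classification layer might be sketched as below; freezing the already-distilled feature extractor and using cross-entropy as the probability value difference are assumptions for illustration, as are the toy backbone and identity count.

```python
import torch
import torch.nn.functional as F

def classification_step(backbone, classifier, optimizer, images, labels):
    """One convergence-training step for the classification layer: the gap
    between the predicted probability values and the preset (label) values
    is back-propagated to adjust the classification layer's parameters."""
    with torch.no_grad():
        embeddings = backbone(images)    # feature extraction already distilled
    logits = classifier(embeddings)
    loss = F.cross_entropy(logits, labels)   # probability value difference (L_cls)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

backbone = torch.nn.Sequential(torch.nn.Flatten(),
                               torch.nn.Linear(3 * 112 * 112, 512))
classifier = torch.nn.Linear(512, 10)    # 10 identities, for illustration
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
imgs, ids = torch.randn(8, 3, 112, 112), torch.randint(0, 10, (8,))
print(classification_step(backbone, classifier, opt, imgs, ids))
```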
The QKD (Quality-aware Knowledge Distillation) algorithm provided by the embodiment of the application mainly uses the quality score as a prior to push the same-person similarity of low-quality faces toward the same-person similarity of high-quality faces. As shown in fig. 4, the abscissa is the quality score (image score) and the ordinate is the similarity (or probability value) for the same target object; as the number of training iterations increases, the same-person similarity of the low-quality samples keeps improving, which ultimately improves the recognition effect on low-quality samples.
Specifically, the target loss function proposed for the QKD in the embodiment of the present application consists of three loss functions, namely the first loss function L_MSE, the second loss function L_cos, and the third loss function L_cls. The target loss function formula is as follows:

L = L_cls + 1(x_h, x_l) · (λ_a · L_MSE + λ_b · L_cos)

where L denotes the target loss function, λ_a denotes the loss weight of the first loss function L_MSE, and λ_b denotes the loss weight of the second loss function L_cos.
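The target loss can be assembled directly from this formula; in the sketch below, the default weight values are placeholders rather than values from the embodiment.

```python
def qkd_loss(l_cls, l_mse, l_cos, pair_in_range, lambda_a=1.0, lambda_b=1.0):
    """L = L_cls + 1(x_h, x_l) * (lambda_a * L_MSE + lambda_b * L_cos).
    The indicator is 1 only when the sample pair's score difference lies in
    the preset range, so out-of-range pairs contribute no distillation loss."""
    indicator = 1.0 if pair_in_range else 0.0
    return l_cls + indicator * (lambda_a * l_mse + lambda_b * l_cos)
```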
By performing steps 203-208 in the embodiment of the application: after the first, second and third sub-feature extraction layers extract the corresponding feature parameters, the corresponding feature constraint sub-parameters are calculated through the first loss function L_MSE, and their gradient information is back-propagated to adjust the network parameters of the corresponding sub-feature extraction layers until the feature constraint sub-parameters converge, optimizing those layers. After the fourth vector conversion layer converts the feature parameters extracted by the third sub-feature extraction layer into the first and second feature vectors, cosine calculation is performed on the two vectors through the second loss function L_cos to obtain a cosine constraint value, which is determined as the vector constraint sub-parameter; its gradient information is back-propagated to adjust the network parameters of the fourth vector conversion layer in the model until the vector constraint sub-parameter converges, optimizing that layer. Further, the network parameters of the classification layer (L_cls) are adjusted according to the probability value difference between the prediction probability value and the preset probability value until the difference converges, yielding the trained target model. In this way, both the final feature vector (embedding) and the intermediate layer features (characteristic parameters) are constrained and guided while the network parameters of the preset model are adjusted, so that more information is transmitted from the high-quality face image samples to the low-quality ones while the risk of over-fitting is avoided, improving the subsequent recognition rate and accuracy of the model on face images.
209. Carrying out face recognition on the face image to be recognized through the target model to obtain the image recognition result corresponding to the face image to be recognized.
It should be noted that, when the face image to be recognized is a low-quality face image, because the target model has undergone self-distillation, the feature parameters it extracts from the low-quality image approach the feature parameters corresponding to a high-quality image and are represented as high-quality feature vectors. Classification is then performed on the basis of these high-quality feature vectors to complete the recognition of the face image, which improves both the recognition rate and the accuracy for low-quality face images.
For example, the recognition process of the target model for the face image to be recognized may be: inputting the face image to be recognized into the target model; performing feature extraction on the face image to be recognized through the target model to obtain the characteristic parameters corresponding to it; normalizing the characteristic parameters through the target model to obtain the corresponding feature vectors; and classifying the feature vectors through the target model to obtain the image recognition result corresponding to the face image to be recognized.
It should be noted that, when the target model performs feature extraction on the face image to be recognized, several sub-feature extraction layers may be involved. For example, the feature extraction layer includes a first, a second and a third sub-feature extraction layer and a fourth sub-vector conversion layer, where the first three perform convolution processing. Specifically, the first sub-feature extraction layer convolves the face image to be recognized to obtain a first sub-parameter (a feature parameter or feature matrix); the second sub-feature extraction layer convolves the first sub-parameter to obtain a second sub-parameter; and the third sub-feature extraction layer convolves the second sub-parameter to obtain a third sub-parameter. Further, the fourth sub-vector conversion layer normalizes the third sub-parameter to obtain the feature vector. Finally, the classification layer of the target model classifies the feature vector to obtain a classification result, which may be the similarity between the face image to be recognized and a pre-stored target face image, or the probability that the face image to be recognized matches the pre-stored target face image; the similarity or probability value is determined as the image recognition result corresponding to the face image to be recognized.
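For ease of understanding, a toy stand-in for this inference pipeline is sketched below; the channel counts, strides, embedding dimension and class name are invented for illustration and do not reflect the actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBackbone(nn.Module):
    """Toy stand-in for the target model's feature extraction layer: three
    convolutional sub-feature extraction layers followed by a vector
    conversion step that outputs a normalized feature vector."""
    def __init__(self, emb_dim=512):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)    # 1st sub-layer
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)   # 2nd sub-layer
        self.conv3 = nn.Conv2d(64, 128, 3, stride=2, padding=1)  # 3rd sub-layer
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, emb_dim)                        # vector conversion

    def forward(self, x):
        x = F.relu(self.conv1(x))        # first sub-parameter
        x = F.relu(self.conv2(x))        # second sub-parameter
        x = F.relu(self.conv3(x))        # third sub-parameter
        x = self.pool(x).flatten(1)
        return F.normalize(self.fc(x), dim=-1)   # normalized feature vector

model = ToyBackbone()
face = torch.randn(1, 3, 112, 112)       # face image to be recognized
print(model(face).shape)                  # torch.Size([1, 512])
```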
As can be seen from the above, the embodiment of the present application can acquire a plurality of face image samples of a target object and the image score corresponding to each face image sample; select a face image sample pair from the plurality of face image samples according to the difference between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second face image; perform joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image; and perform face recognition on the face image to be recognized through the target model. In this way, the preset model is trained through the face image sample pair so that it distills itself according to the difference between the first characteristic parameter of the first face image and the second characteristic parameter of the second face image, realizing self-constraint until the features the model represents for the two images of the sample pair are consistent and obtaining the target model; this facilitates the subsequent recognition of the face image to be recognized through the target model, improves the recognition rate for any face image, and thereby improves the accuracy of face recognition.
In order to better implement the method, the embodiment of the present application further provides a facial image recognition apparatus, which may be integrated in a network device, such as a server or a terminal, and the terminal may include a tablet computer, a notebook computer, and/or a personal computer.
For example, as shown in fig. 7, the facial image recognition apparatus may include an acquisition unit 301, a selection unit 302, a training unit 303, and a recognition unit 304.
An acquisition unit 301, configured to acquire a plurality of face image samples of a target object and an image score corresponding to each face image sample;
a selecting unit 302, configured to select a face image sample pair from multiple face image samples according to a difference between image scores, where the face image sample pair includes a first face image and a second face image, and an image score of the first face image is greater than an image score of the second face image;
the training unit 303 is configured to perform joint training on a preset model according to the face image sample pair and the corresponding image score to obtain a trained target model, where the target model is obtained by performing distillation training on a difference between a first characteristic parameter corresponding to the first face image and a second characteristic parameter corresponding to the second face image;
and the recognition unit 304 is configured to perform face recognition on the face image to be recognized through the target model.
In some embodiments, the training unit 303 comprises:
the first training subunit is used for generating constraint parameters according to the first characteristic parameters of the first face image and the second characteristic parameters of the second face image through a preset model, and adjusting the network parameters of the preset model according to the constraint parameters until the constraint parameters are converged;
and the second training subunit is used for carrying out classification training on the preset model after the constraint parameter convergence according to the probability value difference between the predicted probability value output by the preset model after the constraint parameter convergence and the preset probability value until the probability value difference is converged to obtain the trained target model.
In some embodiments, the first training subunit is further specifically configured to:
extracting a first characteristic parameter corresponding to the first face image and extracting a second characteristic parameter corresponding to the second face image; generating a feature constraint sub-parameter according to the first characteristic parameter and the second characteristic parameter, and adjusting a first network sub-parameter of the preset model according to the feature constraint sub-parameter until the feature constraint sub-parameter converges; converting the first characteristic parameter into a first feature vector and converting the second characteristic parameter into a second feature vector; and determining a vector constraint sub-parameter according to the first feature vector and the second feature vector, and adjusting a second network sub-parameter of the preset model according to the vector constraint sub-parameter until the vector constraint sub-parameter converges.
In some embodiments, the first training subunit is further specifically configured to:
respectively carrying out average compression processing on the sub-characteristic parameters of each color channel in the first characteristic parameter and the second characteristic parameter to obtain a compressed first characteristic parameter and a compressed second characteristic parameter; respectively carrying out normalization processing on the compressed first characteristic parameter and the compressed second characteristic parameter to obtain a first characteristic sub-parameter and a second characteristic sub-parameter; and performing norm calculation according to the first characteristic subparameter and the second characteristic subparameter to obtain a characteristic constraint subparameter.
In some embodiments, the first training subunit is further specifically configured to:
performing cosine calculation according to the first feature vector and the second feature vector to obtain a cosine constraint value;
the cosine constraint value is determined as a vector constraint subparameter.
In some embodiments, the selecting unit 302 is further configured to:
acquiring a score difference value of image scores of any two face image samples in a plurality of face image samples; filtering the score difference value according to a preset score difference value range to obtain a score difference value in the preset score difference value range; and determining two face image samples corresponding to the score difference value within the preset score difference value range as a face image sample pair.
In some embodiments, the identifying unit 304 is further configured to:
inputting a face image to be recognized into a target model; performing feature extraction on the face image to be recognized through a target model to obtain feature parameters corresponding to the face image to be recognized; normalizing the characteristic parameters through the target model to obtain characteristic vectors corresponding to the characteristic parameters; and classifying the characteristic vectors through the target model to obtain an image recognition result corresponding to the face image to be recognized.
As can be seen from the above, in the embodiment of the present application, the acquisition unit 301 may acquire a plurality of face image samples of a target object and the image score corresponding to each face image sample; the selection unit 302 selects a face image sample pair from the plurality of face image samples according to the difference between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second face image; the training unit 303 performs joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image; and the recognition unit 304 performs face recognition on the face image to be recognized through the target model. In this way, the preset model is trained through the face image sample pair so that it distills itself according to the difference between the first characteristic parameter of the first face image and the second characteristic parameter of the second face image, realizing self-constraint until the features the model represents for the two images of the sample pair are consistent, and obtaining the target model. The feature parameters of the high-quality face image thus improve the model's ability to characterize the feature parameters of a low-quality face image, so that when recognizing a low-quality face image the model approaches the features of the high-quality face image; this facilitates the subsequent recognition of face images to be recognized through the target model, improves the recognition rate for any face image, and thereby improves the accuracy of face recognition.
The embodiment of the present application further provides a computer device, as shown in fig. 8, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, and specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 8 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 via a power management system, so that functions of managing charging, discharging, and power consumption are implemented via the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment of the present application, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:
acquiring a plurality of face image samples of a target object and an image score corresponding to each face image sample; selecting a face image sample pair from a plurality of face image samples according to the difference between the image scores, wherein the face image sample pair comprises a first face image and a second face image, and the image score of the first face image is greater than that of the second face image; performing combined training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, wherein the target model is obtained by performing distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image; and carrying out face recognition on the face image to be recognized through the target model.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, the embodiment of the present application can acquire a plurality of face image samples of a target object and the image score corresponding to each face image sample; select a face image sample pair from the plurality of face image samples according to the difference between the image scores, the pair comprising a first face image and a second face image, where the image score of the first face image is greater than that of the second face image; perform joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, the target model being obtained by distillation training on the difference between the first characteristic parameter corresponding to the first face image and the second characteristic parameter corresponding to the second face image; and perform face recognition on the face image to be recognized through the target model. In this way, the preset model is trained through the face image sample pair so that it distills itself according to the difference between the first characteristic parameter of the first face image and the second characteristic parameter of the second face image, realizing self-constraint until the features the model represents for the two images of the sample pair are consistent and obtaining the target model; this facilitates the subsequent recognition of the face image to be recognized through the target model, improves the recognition rate for any face image, and thereby improves the accuracy of face recognition.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the face image recognition methods provided in the present application. For example, the instructions may perform the steps of:
acquiring a plurality of face image samples of a target object and an image score corresponding to each face image sample; selecting a face image sample pair from a plurality of face image samples according to the difference between the image scores, wherein the face image sample pair comprises a first face image and a second face image, and the image score of the first face image is greater than that of the second face image; performing combined training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, wherein the target model is obtained by performing distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image; and carrying out face recognition on the face image to be recognized through the target model.
For implementation details of the above operations, refer to the foregoing embodiments; they are not repeated here.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of any face image recognition method provided in the embodiments of the present application, they can achieve the beneficial effects of any of those methods; see the foregoing embodiments for details, which are not repeated here.
The face image recognition method, apparatus, device and computer-readable storage medium provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A face image recognition method is characterized by comprising the following steps:
acquiring a plurality of face image samples of a target object and an image score corresponding to each face image sample;
selecting a face image sample pair from the plurality of face image samples according to the difference value between the image scores, wherein the face image sample pair comprises a first face image and a second face image, and the image score of the first face image is greater than that of the second face image;
performing joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, wherein the target model is obtained by performing distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image;
and carrying out face recognition on the face image to be recognized through the target model.
2. The method according to claim 1, wherein the jointly training the preset model according to the face image sample pair and the corresponding image scores to obtain the trained target model comprises:
inputting the face image sample pair and the corresponding image scores into a preset model;
generating constraint parameters according to the first characteristic parameter of the first face image and the second characteristic parameter of the second face image through the preset model, and adjusting the network parameters of the preset model according to the constraint parameters until the constraint parameters converge;
and performing classification training on the preset model after the constraint parameters converge, according to the probability difference between the predicted probability value output by that model and a preset probability value, until the probability difference converges, so as to obtain the trained target model.
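Read procedurally, claim 2 describes a two-phase loop: first drive the constraint parameters to convergence, then run classification training until the gap between predicted and preset probabilities converges. The following is a hedged sketch of that reading; the model, its assumed `features` method, the optimizer, and the convergence tests are illustrative choices, not details taken from the patent.

```python
# Hypothetical two-phase training loop for claim 2; all hyperparameters,
# the `model.features` method, and the convergence tests are assumptions.
import torch
import torch.nn.functional as F

def joint_train(model, pair_loader, epochs=10, tol=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Phase 1: adjust network parameters until the constraint converges.
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for img_high, img_low, _ in pair_loader:
            constraint = torch.norm(
                model.features(img_high).detach() - model.features(img_low),
                p=2, dim=1).mean()
            opt.zero_grad()
            constraint.backward()
            opt.step()
            total += constraint.item()
        if abs(prev - total) < tol:  # constraint parameter has converged
            break
        prev = total

    # Phase 2: classification training until the difference between the
    # predicted probability and the preset (label) probability converges.
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for _, img_low, label in pair_loader:
            cls_loss = F.cross_entropy(model(img_low), label)
            opt.zero_grad()
            cls_loss.backward()
            opt.step()
            total += cls_loss.item()
        if abs(prev - total) < tol:
            break
        prev = total
    return model
```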
3. The method according to claim 2, wherein the constraint parameters comprise a feature constraint sub-parameter and a vector constraint sub-parameter, the network parameters comprise a first network sub-parameter and a second network sub-parameter, and the generating constraint parameters according to the first characteristic parameter of the first face image and the second characteristic parameter of the second face image through the preset model, and adjusting the network parameters of the preset model according to the constraint parameters until the constraint parameters converge, comprises:
extracting a first characteristic parameter corresponding to the first face image and extracting a second characteristic parameter corresponding to the second face image;
generating the feature constraint sub-parameter according to the first characteristic parameter and the second characteristic parameter, and adjusting the first network sub-parameter of the preset model according to the feature constraint sub-parameter until the feature constraint sub-parameter converges;
converting the first characteristic parameter into a first feature vector and converting the second characteristic parameter into a second feature vector;
and determining the vector constraint sub-parameter according to the first feature vector and the second feature vector, and adjusting the second network sub-parameter of the preset model according to the vector constraint sub-parameter until the vector constraint sub-parameter converges.
4. The method according to claim 3, wherein the generating the feature constraint sub-parameter according to the first characteristic parameter and the second characteristic parameter comprises:
performing average compression processing on the sub-characteristic parameters of each color channel in the first characteristic parameter and in the second characteristic parameter respectively, to obtain a compressed first characteristic parameter and a compressed second characteristic parameter;
performing normalization processing on the compressed first characteristic parameter and the compressed second characteristic parameter respectively, to obtain a first characteristic sub-parameter and a second characteristic sub-parameter;
and performing norm calculation on the first characteristic sub-parameter and the second characteristic sub-parameter to obtain the feature constraint sub-parameter.
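Claim 4's pipeline (channel-wise average compression, normalization, then a norm over the difference) maps naturally onto global average pooling followed by L2 normalization. The sketch below adopts that reading; the tensor layout (batch, channels, H, W) and the choice of the L2 norm are assumptions, not claim language.

```python
# Hypothetical realization of claim 4, assuming the characteristic
# parameters are feature maps of shape (batch, channels, H, W).
import torch
import torch.nn.functional as F

def feature_constraint(fmap_high: torch.Tensor, fmap_low: torch.Tensor) -> torch.Tensor:
    # Average-compress each channel's sub-characteristic map to one value.
    c_high = fmap_high.mean(dim=(2, 3))           # -> (batch, channels)
    c_low = fmap_low.mean(dim=(2, 3))
    # Normalize the compressed parameters to unit length.
    s_high = F.normalize(c_high, p=2, dim=1)      # first characteristic sub-parameter
    s_low = F.normalize(c_low, p=2, dim=1)        # second characteristic sub-parameter
    # Norm of the difference gives the feature constraint sub-parameter.
    return torch.norm(s_high - s_low, p=2, dim=1).mean()

fmap_high = torch.randn(4, 64, 7, 7)
fmap_low = torch.randn(4, 64, 7, 7)
print(feature_constraint(fmap_high, fmap_low).item())
```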
5. The method according to claim 3, wherein the determining the vector constraint sub-parameter according to the first feature vector and the second feature vector comprises:
performing cosine calculation on the first feature vector and the second feature vector to obtain a cosine constraint value;
and determining the cosine constraint value as the vector constraint sub-parameter.
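The cosine calculation of claim 5 can be read as a cosine-similarity term between the two feature vectors; one common formulation uses 1 minus the similarity so that identical vectors yield a zero constraint. A minimal sketch on that assumption:

```python
# Hypothetical vector constraint sub-parameter from claim 5. The claim only
# says "cosine calculation"; the 1 - similarity form is an assumption.
import torch
import torch.nn.functional as F

def vector_constraint(vec_high: torch.Tensor, vec_low: torch.Tensor) -> torch.Tensor:
    cos = F.cosine_similarity(vec_high, vec_low, dim=1)
    return (1.0 - cos).mean()

vec_high = torch.randn(4, 512)
vec_low = torch.randn(4, 512)
print(vector_constraint(vec_high, vec_low).item())
```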
6. The method according to claim 1, wherein the selecting a face image sample pair from the plurality of face image samples according to the difference between the image scores comprises:
acquiring a score difference value of image scores of any two face image samples in the plurality of face image samples;
filtering the score difference value according to a preset score difference value range to obtain a score difference value in the preset score difference value range;
and determining two face image samples corresponding to the score difference value within the preset score difference value range as the face image sample pair.
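A possible implementation of the pair selection in claim 6: enumerate the score differences over all pairs of the target object's samples and keep those falling within a preset range. The range endpoints and the score scale below are illustrative only.

```python
# Hypothetical pair selection per claim 6; lo/hi bounds are illustrative.
from itertools import combinations

def select_pairs(samples, lo=0.2, hi=0.6):
    """samples: list of (image_id, score) for one target object.

    Returns (high, low) pairs whose score difference lies in [lo, hi],
    with the higher-scored sample first (the "first face image").
    """
    pairs = []
    for (id_a, s_a), (id_b, s_b) in combinations(samples, 2):
        if lo <= abs(s_a - s_b) <= hi:
            high, low = ((id_a, s_a), (id_b, s_b)) if s_a > s_b else ((id_b, s_b), (id_a, s_a))
            pairs.append((high, low))
    return pairs

samples = [("img1", 0.9), ("img2", 0.5), ("img3", 0.85), ("img4", 0.3)]
print(select_pairs(samples))
```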
7. The method according to claim 1, wherein the performing face recognition on the face image to be recognized through the target model comprises:
inputting the face image to be recognized into the target model;
extracting the characteristics of the face image to be recognized through the target model to obtain characteristic parameters corresponding to the face image to be recognized;
normalizing the characteristic parameters through the target model to obtain characteristic vectors corresponding to the characteristic parameters;
and classifying the characteristic vectors through the target model to obtain an image recognition result corresponding to the face image to be recognized.
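Claim 7's inference path (feature extraction, normalization into a feature vector, classification) is sketched below with a toy backbone; the architecture, input size, and identity count are stand-ins, not the patent's model.

```python
# Hypothetical inference path for claim 7; the backbone and classifier
# are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceRecognizer(nn.Module):
    def __init__(self, num_identities: int = 1000):
        super().__init__()
        self.backbone = nn.Sequential(              # toy feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128))
        self.classifier = nn.Linear(128, num_identities)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)                     # characteristic parameters
        vec = F.normalize(feat, p=2, dim=1)         # feature vector
        return self.classifier(vec)                 # image recognition result

model = FaceRecognizer()
face = torch.randn(1, 3, 112, 112)                  # face image to be recognized
print(model(face).argmax(dim=1))
```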
8. A face image recognition apparatus, comprising:
the acquisition unit is used for acquiring a plurality of face image samples of the target object and an image score corresponding to each face image sample;
a selecting unit, configured to select a face image sample pair from the multiple face image samples according to a difference between the image scores, where the face image sample pair includes a first face image and a second face image, and an image score of the first face image is greater than an image score of the second face image;
the training unit is used for carrying out joint training on a preset model according to the face image sample pair and the corresponding image scores to obtain a trained target model, and the target model is obtained by carrying out distillation training on the difference between a first characteristic parameter corresponding to a first face image and a second characteristic parameter corresponding to a second face image;
and the identification unit is used for carrying out face identification on the face image to be identified through the target model.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps in the face image recognition method according to any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps of the face image recognition method according to any one of claims 1 to 7.
CN202110725803.3A (priority date 2021-06-29; filing date 2021-06-29): Face image recognition method, device, equipment and computer readable storage medium. Status: Pending. Publication: CN113420683A (en)

Priority Applications (1)

Application Number: CN202110725803.3A (publication: CN113420683A)
Priority Date: 2021-06-29; Filing Date: 2021-06-29
Title: Face image recognition method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number: CN202110725803.3A (publication: CN113420683A)
Priority Date: 2021-06-29; Filing Date: 2021-06-29
Title: Face image recognition method, device, equipment and computer readable storage medium

Publications (1)

Publication Number: CN113420683A; Publication Date: 2021-09-21

Family ID: 77717149

Family Applications (1)

Application Number: CN202110725803.3A; Status: Pending; Publication: CN113420683A (en)

Country Status (1)

Country: CN; Document: CN113420683A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN114255354A *: priority date 2021-12-31, published 2022-03-29, assignee 智慧眼科技股份有限公司. Face recognition model training method, face recognition device and related equipment.
WO2023124237A1 *: priority date 2021-12-29, published 2023-07-06, assignee 荣耀终端有限公司. Image processing method and apparatus based on under-screen image, and storage medium.


Similar Documents

Publication Publication Date Title
CN111325155B (en) Video motion recognition method based on residual difference type 3D CNN and multi-mode feature fusion strategy
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
US9449432B2 (en) System and method for identifying faces in unconstrained media
WO2019076227A1 (en) Human face image classification method and apparatus, and server
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
CN100520807C (en) Independent component analysis human face recognition method based on multi- scale total variation based quotient image
CN111178208A (en) Pedestrian detection method, device and medium based on deep learning
CN104346503A (en) Human face image based emotional health monitoring method and mobile phone
CN111079833B (en) Image recognition method, image recognition device and computer-readable storage medium
CN113420683A (en) Face image recognition method, device, equipment and computer readable storage medium
CN108197669B (en) Feature training method and device of convolutional neural network
CN111144284B (en) Method and device for generating depth face image, electronic equipment and medium
CN111242019B (en) Video content detection method and device, electronic equipment and storage medium
CN112052759B (en) Living body detection method and device
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN111985650A (en) Activity recognition model and system considering both universality and individuation
CN113052150B (en) Living body detection method, living body detection device, electronic apparatus, and computer-readable storage medium
CN107067022B (en) Method, device and equipment for establishing image classification model
CN115393225A (en) Low-illumination image enhancement method based on multilevel feature extraction and fusion
CN111080746A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112418302A (en) Task prediction method and device
Liu Human face expression recognition based on deep learning-deep convolutional neural network
CN117115595B (en) Training method and device of attitude estimation model, electronic equipment and storage medium
CN114783022A (en) Information processing method and device, computer equipment and storage medium
CN116740485A (en) Training method and device for lesion detection model, electronic equipment and storage medium

Legal Events

PB01: Publication
REG: Reference to a national code (country code: HK; legal event code: DE; document number: 40052361)
SE01: Entry into force of request for substantive examination