CN114140862A - Model training method, face recognition method, apparatus, device, medium, and product - Google Patents

Model training method, face recognition method, apparatus, device, medium, and product

Info

Publication number
CN114140862A
CN114140862A (application number CN202111531367.2A)
Authority
CN
China
Prior art keywords
feature
sample
face
image
network
Prior art date
Legal status
Pending
Application number
CN202111531367.2A
Other languages
Chinese (zh)
Inventor
郑明悟
黄迪
杨鸿宇
邱迪
甄成
陈新华
闫鹏飞
韩浩
魏晓林
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202111531367.2A
Publication of CN114140862A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a model training method, a face recognition method, an apparatus, a device, a medium, and a product, and belongs to the technical field of artificial intelligence. The method comprises the following steps: obtaining at least one group of face image samples, where each group comprises a first image sample and a second image sample, the second image sample being obtained by augmenting the first image sample; extracting a first face feature from the first image sample through a recognition network; performing face feature enhancement processing on the second image sample through a feature enhancement network to obtain an enhanced image sample; extracting a second face feature from the enhanced image sample through the recognition network; and adjusting network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain a trained face recognition model. The method can improve the recognition accuracy of the face recognition model.

Description

Model training method, face recognition method, apparatus, device, medium, and product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model training method, a face recognition method, an apparatus, a device, a medium, and a product.
Background
Nowadays, three-dimensional face recognition technology is widely deployed; its application scenarios include gate check-in, delivery-rider identity verification, offline payment, and the like.
In the related art, a three-dimensional face recognition network is used to recognize three-dimensional faces in various application scenarios. Generally, the network is trained on pairs of high-quality and low-quality face images, where a high-quality face image is one whose image quality exceeds a specified index and a low-quality face image is one whose image quality falls below that index.
This training approach makes the three-dimensional face recognition network prone to over-fitting during feature learning.
Disclosure of Invention
The embodiments of the application provide a model training method, a face recognition method, an apparatus, a device, a medium, and a product. In the model training method, face features are extracted from different images and the model parameters are adjusted using the error between those features; compared with adjusting model parameters based on pixel-level errors, this avoids over-fitting of feature learning in the model. In addition, a feature enhancement network is added to the model, so that face images can be feature-enhanced before face recognition, improving recognition accuracy. The technical scheme is as follows:
according to one aspect of the present application, there is provided a model training method for a face recognition model, the face recognition model including a feature enhancement network and a pre-trained recognition network, the method including:
obtaining at least one group of face image samples, where each group of face image samples comprises a first image sample and a second image sample, and the second image sample is obtained by augmenting the first image sample;
extracting a first face feature from the first image sample through a recognition network;
performing face feature enhancement processing on the second image sample through a feature enhancement network to obtain an enhanced image sample; extracting a second face feature from the enhanced image sample through the recognition network;
and adjusting network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain a trained face recognition model.
According to another aspect of the present application, there is provided a face recognition method, which uses the face recognition model obtained by the above-mentioned model training method, and the method includes:
calling a face recognition model to perform face feature enhancement processing on the acquired face image to obtain an enhanced image;
calling a face recognition model to perform feature extraction on the enhanced image to obtain face features in the face image;
and calling a face recognition model to perform recognition based on the face features to obtain a face recognition result.
According to another aspect of the present application, there is provided a model training apparatus for a face recognition model, the face recognition model including a feature enhancement network and a pre-trained recognition network, the apparatus comprising:
the sample acquisition module is used for acquiring at least one group of face image samples, where each group of face image samples comprises a first image sample and a second image sample, and the second image sample is obtained by augmenting the first image sample;
the feature acquisition module is used for extracting a first face feature from the first image sample through the recognition network, performing face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample, and extracting a second face feature from the enhanced image sample through the recognition network;
and the parameter adjusting module is used for adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model.
According to another aspect of the present application, there is provided a face recognition apparatus, which uses the face recognition model obtained by the above-mentioned model training method, the apparatus comprising:
the feature enhancement module is used for calling the face recognition model to perform face feature enhancement processing on the acquired face image to obtain an enhanced image;
the feature extraction module is used for calling a face recognition model to perform feature extraction on the enhanced image to obtain face features in the face image;
and the face recognition module is used for calling the face recognition model to perform recognition based on the face features to obtain a face recognition result.
According to another aspect of the present application, there is provided a computer apparatus, including: a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the model training method of the face recognition model as described above, or the face recognition method as described above.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored therein a computer program, which is loaded and executed by a processor to implement the model training method for a face recognition model as described above, or the face recognition method as described above.
According to another aspect of the present application, there is provided a computer program product (or computer program) comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the model training method of the face recognition model as described above, or the face recognition method as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
the face recognition model in the model training method comprises two parts: a feature enhancement network and a pre-trained recognition network. A first face feature is extracted from a first image sample through the recognition network; meanwhile, an augmented sample of the first image sample (namely, a second image sample) is feature-enhanced through the feature enhancement network, and a second face feature is extracted from the enhanced image sample through the recognition network. The network parameters of the feature enhancement network are adjusted based on the feature error between the first face feature and the second face feature, improving the feature enhancement network's ability to enhance face features; as a result, before the face recognition model extracts face features, the image quality of a face image can be improved through the feature enhancement network, and higher-quality face features can be extracted from the enhanced face image (higher quality relative to face features extracted directly from the face image). Moreover, compared with training the model on errors between pixels, training on errors between features avoids over-fitting of feature learning in the model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating a structure of a face recognition model according to an exemplary embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for model training of a face recognition model according to an exemplary embodiment of the present application;
FIG. 3 shows a schematic diagram of an augmented human face provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for model training of a face recognition model according to another exemplary embodiment of the present application;
FIG. 5 is a flow chart illustrating a face recognition method provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a face image before and after denoising provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a face image before and after denoising provided by another exemplary embodiment of the present application;
FIG. 8 is a block diagram of a model training apparatus for a face recognition model according to an exemplary embodiment of the present application;
FIG. 9 illustrates a block diagram of a face recognition apparatus provided in an exemplary embodiment of the present application;
fig. 10 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference will first be made to several terms referred to in this application:
face recognition is a biometric technology for identity recognition based on facial feature information of a person. The method comprises the following steps of collecting images or video streams containing human faces by using a camera or a camera, automatically detecting and tracking the human faces in the images, and further carrying out face recognition on the detected human faces.
Three-dimensional face recognition realizes face recognition based on three-dimensional face images; it can make full use of spatial geometric information and overcome the limitations of two-dimensional face recognition with respect to illumination, makeup, pose, and the like. However, three-dimensional face images acquired by sensors suffer from various defects, such as voids of varying size and position (that is, missing pixel values), random noise that is difficult to model, inherent noise that varies with the sensor and the acquisition scene, and information loss when the three-dimensional face image is down-sampled into a depth map. The quality of the three-dimensional face image therefore directly affects the recognition accuracy of the three-dimensional face recognition network.
Generally, pairs of high-quality and low-quality face images are used as training data to improve the recognition accuracy of the three-dimensional face recognition network on low-quality face images. However, during training, the model parameters of the network are adjusted based on the pixel-level loss between the high-quality and low-quality face images, so the resulting network is prone to over-fitting during feature learning.
Therefore, in view of the above problems, the present application provides a model training method for a face recognition model which, as shown in fig. 1, is designed with two parts: a feature enhancement network 110 and a recognition network 120. The feature enhancement network 110 performs feature enhancement on the face in a three-dimensional face image so that the enhanced image yields more accurate face features; the recognition network 120 extracts face features from three-dimensional face images (both before and after feature enhancement).
For training of the face recognition model, the recognition network 120 is first pre-trained separately to obtain a recognition network 120 with face recognition capability. Then, the first image sample is input into the recognition network 120, which outputs the first face feature corresponding to the first image sample; the second image sample is input into the feature enhancement network 110, which performs face feature enhancement processing on it to obtain an enhanced image sample, and the enhanced image sample is input into the recognition network 120, which outputs the second face feature corresponding to the second image sample. The second image sample may be obtained by adding artificial noise to the first image sample, so its image quality is lower than that of the first image sample. The first face feature can therefore be used as a reference feature: a feature error (that is, a feature loss) between the second face feature and the first face feature is calculated, the network parameters of the feature enhancement network 110 are adjusted based on this feature error, and the trained feature enhancement network 110 is obtained after multiple parameter adjustments. After the network parameters of the recognition network 120 and the feature enhancement network 110 have been adjusted in sequence, a third image sample is further used to fine-tune the parameters of the face recognition model as a whole, finally completing the training of the face recognition model.
The feature enhancement network is added to the face recognition model so that face features can be enhanced before recognition, allowing the recognition network to obtain more accurate face features. In contrast to adjusting network parameters with pixel-level recognition losses, adjusting them based on losses between features avoids over-fitting of feature learning in the model. For a detailed implementation of the training method for the face recognition model provided in the present application, refer to the following embodiments.
Fig. 2 shows a flowchart of a method for training a face recognition model according to an exemplary embodiment of the present application, where the method is applied to a computer device, which may be, for example, a terminal or a server, and the method includes:
Step 210, at least one group of face image samples is obtained, where each group of face image samples includes a first image sample and a second image sample, and the second image sample is obtained by augmenting the first image sample.
For example, each first image sample may correspond to multiple groups of face image samples, where the second image samples differ across groups. Alternatively, each first image sample may correspond to a single group of face image samples that contains a plurality of different second image samples.
The face image samples may be prepared before model training; for example, the computer device may augment the first image sample using at least one of the following methods:
Adding noise to the first image sample.
Illustratively, the noise may be artificial noise, such as at least one of shot noise, dark noise, and readout noise; the computer device may randomly add one or more of these to the first image sample to obtain the second image sample corresponding to it.
Down-sampling or up-sampling the first image sample.
The computer device randomly down-samples the first image sample to obtain a second image sample; or randomly up-samples it; or performs random down-sampling followed by random up-sampling; or performs random up-sampling followed by random down-sampling.
Adjusting the depth distance between the person and the background in the first image sample.
The computer device adjusts the depth distance between the person and the background in the first image sample to obtain a distance-augmented second image sample, where the background refers to the pixel area other than the person in the first image sample. Illustratively, the computer device applies random distances to the first image sample to obtain the second image sample.
As shown in fig. 3, the computer device augments the original image to obtain a noise-augmented image, a distance-augmented image, a mask-augmented image, and pose-augmented images, where pitch-pose augmentations 1 to 4 are generated using different pitch angles, yaw-pose augmentations 1 to 3 are generated using different rightward yaw angles, and yaw-pose augmentations 4 to 6 are generated using different leftward yaw angles.
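To make the augmentation step concrete, the following is a minimal sketch of the first three augmentation operations (added noise, down-/up-sampling, and a depth-distance shift). It assumes PyTorch and a normalized single-channel depth map; the noise scale, resampling range, and shift range are illustrative assumptions rather than values given in this application.

    import torch
    import torch.nn.functional as F

    def augment_depth_image(img: torch.Tensor) -> torch.Tensor:
        # img: (1, 1, H, W) depth map scaled to [0, 1]; returns a degraded copy.
        x = img.clone()
        # 1) Additive noise (a stand-in for shot / dark / readout noise).
        x = x + 0.02 * torch.randn_like(x)
        # 2) Random down-sampling followed by up-sampling back to the original size.
        h, w = x.shape[-2:]
        scale = float(torch.empty(1).uniform_(0.25, 0.75))
        x = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
        x = F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
        # 3) Random depth shift between subject and background (a crude global proxy).
        x = (x + float(torch.empty(1).uniform_(-0.05, 0.05))).clamp(0.0, 1.0)
        return x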
After obtaining the second image sample, the computer device stores it together with its corresponding first image sample in memory, or in a database dedicated to training samples; during model training, the computer device reads the required face image samples from the memory or database.
The face recognition model comprises a feature enhancement network and a pre-trained recognition network, and for the first image sample, the step 220 is executed to extract the face features; for the second image sample, steps 230 to 240 are performed to extract the facial features, as follows:
Step 220, extracting a first face feature from the first image sample through the recognition network.
Through the recognition network, the computer device performs M rounds of feature extraction on the first image sample to obtain M first sample feature vectors of different levels, where the level indicates the round of feature extraction and the first sample feature vector at each level is extracted from the first sample feature vector of the previous level. The recognition network then performs multi-scale feature fusion (Multi-Scale Feature Fusion) on the M first sample feature vectors to obtain a fused first sample feature vector, where M is an integer greater than 1; performs feature learning on the fused first sample feature vector under a spatial attention mechanism (Spatial Attention) to obtain a learned first sample feature vector; and extracts the first face feature from the learned first sample feature vector.
Illustratively, for feature fusion, the computer device performs max pooling (Max Pooling) on each of the M first sample feature vectors to obtain M pooled first sample feature vectors, and fuses the M pooled first sample feature vectors to obtain the fused first sample feature vector.
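As an illustration of this backbone, here is a minimal PyTorch sketch of a recognition network with M = 4 feature-extraction stages, multi-scale fusion via max pooling, and a simple spatial-attention reweighting. The stage widths, the exact attention form, and the feature dimension are assumptions; the application does not fix these details.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleAttentionBackbone(nn.Module):
        def __init__(self, feat_dim: int = 128):
            super().__init__()
            chans = [1, 32, 64, 128, 256]
            self.stages = nn.ModuleList(
                nn.Sequential(
                    nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                    nn.BatchNorm2d(chans[i + 1]),
                    nn.ReLU(inplace=True),
                )
                for i in range(4)
            )
            fused_ch = sum(chans[1:])              # channels after concatenation
            self.attn = nn.Conv2d(fused_ch, 1, 1)  # 1x1 conv -> spatial attention map
            self.head = nn.Linear(fused_ch, feat_dim)

        def forward(self, x):
            feats = []
            for stage in self.stages:              # M = 4 levels of feature extraction
                x = stage(x)
                feats.append(x)
            size = feats[-1].shape[-2:]            # pool every level to the smallest map
            pooled = [F.adaptive_max_pool2d(f, size) for f in feats]
            fused = torch.cat(pooled, dim=1)       # multi-scale feature fusion
            fused = fused * torch.sigmoid(self.attn(fused))  # spatial-attention weighting
            vec = F.adaptive_avg_pool2d(fused, 1).flatten(1)
            return F.normalize(self.head(vec), dim=1)        # face feature vector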
Step 230, performing face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample.
The feature enhancement network comprises a noise reduction network and a feature superposition function. The computer device performs noise reduction on the second image sample through the noise reduction network to obtain a noise-reduced sample residual image, and superposes the sample residual image onto the second image sample through the feature superposition function to obtain the enhanced image sample.
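In code, the feature superposition amounts to a residual connection around the noise reduction network; a minimal sketch (assuming PyTorch) follows:

    import torch.nn as nn

    class FeatureEnhancementNetwork(nn.Module):
        # Enhanced sample = input + predicted residual ("feature superposition").
        def __init__(self, denoiser: nn.Module):
            super().__init__()
            self.denoiser = denoiser  # predicts the noise-reduced residual image

        def forward(self, x):
            return x + self.denoiser(x)  # superpose the residual onto the input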
Optionally, the noise reduction network includes a first dilated convolution and a first activation function, and N second dilated convolutions and N second activation functions, where each second dilated convolution corresponds to one second activation function. To denoise the second image sample: the first dilated convolution is applied to the second image sample to obtain a feature-extracted sample feature image; the sample feature image is input into the first activation function to obtain a non-linearly processed intermediate sample feature image; the second dilated convolution is applied to the intermediate sample feature image to obtain a re-extracted sample feature image; and the re-extracted sample feature image is input into the second activation function to obtain an updated intermediate sample feature image. The step of updating the intermediate sample feature image is repeated N times to obtain the noise-reduced sample residual image, where N is an integer greater than 1.
For example, the noise reduction network may also adopt IRCNN, a convolutional neural network for image restoration.
Optionally, the noise reduction network includes a first dilated convolution and a first activation function; N second dilated convolutions, N batch normalization (Batch Normalization) layers, and N second activation functions; and a third dilated convolution, where each second dilated convolution corresponds to one batch normalization layer and one second activation function.
To denoise the second image sample, the first dilated convolution is applied to the second image sample to obtain a feature-extracted sample feature image, and the sample feature image is input into the first activation function to obtain a non-linearly processed intermediate sample feature image.
The second dilated convolution is then applied to the intermediate sample feature image to obtain a re-extracted sample feature image; the re-extracted sample feature image is input into the batch normalization layer to obtain a normalized sample feature image; and the normalized sample feature image is input into the second activation function to obtain an updated intermediate sample feature image. This update of the intermediate sample feature image is repeated N times, where N is an integer greater than 1.
Finally, the third dilated convolution is applied to the N-times-updated intermediate sample feature image to obtain the noise-reduced sample residual image.
Illustratively, as shown in fig. 1, the noise reduction network includes a first dilated convolution and a first activation function, five second dilated convolutions with five batch normalization layers and five second activation functions, and a third dilated convolution. The computer device applies the first dilated convolution to the low-quality image (namely, the second image sample) to obtain a feature-extracted sample feature image, and inputs it into the first activation function to obtain the non-linearly processed intermediate sample feature image 11.
The 1st second dilated convolution is applied to the intermediate sample feature image 11 to obtain the re-extracted sample feature image 1; sample feature image 1 is input into the 1st batch normalization layer to obtain the normalized sample feature image 1; and the normalized sample feature image 1 is input into the 1st second activation function to obtain the updated intermediate sample feature image 12.
In the same way, the 2nd to 5th second dilated convolutions, batch normalization layers, and second activation functions successively transform the intermediate sample feature image 12 into the intermediate sample feature images 13, 14, 15, and 16 (via the re-extracted and normalized sample feature images 2 to 5).
Finally, the third dilated convolution is applied to the intermediate sample feature image 16 to obtain the noise-reduced sample residual image 17; the sample residual image 17 is superposed onto the low-quality image, completing the face feature enhancement of the low-quality image and yielding the enhanced image sample 18.
Optionally, the activation function (Activation Function) is a ReLU (Rectified Linear Unit) function.
Optionally, adjacent dilated convolutions have different dilation factors, where the dilated convolutions include the first and second dilated convolutions. For example, the dilation factors (dilation rates) may increase and then decrease at equal intervals, with the first dilated convolution and the last dilated convolution sharing the same factor; in the noise reduction network shown in fig. 1, the dilation factor of the first dilated convolution is 1, and the dilation factors of the six subsequent dilated convolutions (the five second dilated convolutions and the third dilated convolution) are 2, 3, 4, 3, 2, and 1 in this order.
Optionally, the combination of a second dilated convolution, a batch normalization layer, and a second activation function is treated as one second recognition layer, and the number of feature maps of each second recognition layer is set to 64.
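Putting the pieces above together, the following is a sketch of the noise reduction network under the stated configuration (dilation factors 1, 2, 3, 4, 3, 2, 1 and 64 feature maps); the 3x3 kernel size and single-channel input are assumptions:

    import torch.nn as nn

    def make_denoiser(channels: int = 1, feature_maps: int = 64) -> nn.Sequential:
        # First dilated convolution + first activation function.
        layers = [nn.Conv2d(channels, feature_maps, 3, padding=1, dilation=1),
                  nn.ReLU(inplace=True)]
        # Five "second" dilated convolutions, each with batch norm + activation.
        for d in (2, 3, 4, 3, 2):
            layers += [nn.Conv2d(feature_maps, feature_maps, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(feature_maps),
                       nn.ReLU(inplace=True)]
        # Third dilated convolution produces the residual image.
        layers.append(nn.Conv2d(feature_maps, channels, 3, padding=1, dilation=1))
        return nn.Sequential(*layers)

With padding equal to the dilation factor, each 3x3 layer preserves the spatial size, so the residual output matches the input image.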
Step 240, extracting a second face feature from the enhanced image sample through the recognition network.
Through the recognition network, the computer device performs M rounds of feature extraction on the enhanced image sample to obtain M second sample feature vectors of different levels, where the level indicates the round of feature extraction and the second sample feature vector at each level is extracted from the second sample feature vector of the previous level; performs multi-scale feature fusion on the M second sample feature vectors to obtain a fused second sample feature vector, where M is an integer greater than 1; performs feature learning on the fused second sample feature vector under the spatial attention mechanism to obtain a learned second sample feature vector; and extracts the second face feature from the learned second sample feature vector.
For example, for feature fusion, the computer device performs max pooling on each of the M second sample feature vectors to obtain M pooled second sample feature vectors, and fuses them to obtain the fused second sample feature vector.
Illustratively, as shown in FIG. 1, the recognition network includes four feature extraction layers; the computer equipment extracts a sample feature vector 21 from a target image sample through a first feature extraction layer, extracts a sample feature vector 22 from the sample feature vector 21 through a second feature extraction layer, extracts a sample feature vector 23 from the sample feature vector 22 through a third feature extraction layer, and extracts a sample feature vector 24 from the sample feature vector 23 through a fourth feature extraction layer; performing multi-scale feature fusion on the sample feature vector 21, the sample feature vector 22, the sample feature vector 23 and the sample feature vector 24 to obtain a fused sample feature vector (not shown in the figure); performing feature learning on the fused sample feature vector under a spatial attention vector mechanism to obtain a learned sample feature vector 25; target face features are extracted from the learned sample feature vectors 25.
If the target image sample is the first image sample, the sample feature vector is the first sample feature vector, the fused sample feature vector is the fused first sample feature vector, the learned sample feature vector is the learned first sample feature vector, and the target face feature is the first face feature 26; if the target image sample is the second image sample, the sample feature vector is the second sample feature vector, the fused sample feature vector is the fused second sample feature vector, the learned sample feature vector is the learned second sample feature vector, and the target face feature is the second face feature 26'.
Step 250, adjusting network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model.
The computer device performs back-propagation training on the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model.
Illustratively, the computer device calculates the feature loss (that is, the feature error) between the first face feature and the second face feature using a loss function, which may be a squared loss function or a cross-entropy loss function.
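A minimal training step consistent with this description might look as follows, assuming PyTorch, the squared-loss option, and an optimizer constructed only over the enhancement network's parameters (for example torch.optim.Adam(enhancer.parameters())), so that the pre-trained recognition network stays frozen:

    import torch
    import torch.nn.functional as F

    def train_step(enhancer, recognizer, optimizer, first_sample, second_sample):
        recognizer.eval()
        with torch.no_grad():
            ref_feat = recognizer(first_sample)     # first face feature (reference)
        feat = recognizer(enhancer(second_sample))  # second face feature
        loss = F.mse_loss(feat, ref_feat)           # feature error, not pixel error
        optimizer.zero_grad()
        loss.backward()  # gradients flow through the recognizer into the enhancer
        optimizer.step()
        return loss.item()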
Optionally, the computer device first adjusts the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained feature enhancement network, and then fine-tunes the network parameters of the face recognition model as a whole. The computer device acquires at least one third image sample; calls the trained feature enhancement network to perform face feature enhancement processing on the third image sample to obtain an enhanced third image sample; calls the recognition network to extract a third face feature from the enhanced third image sample; and adjusts the network parameters of the trained feature enhancement network and of the pre-trained recognition network based on the third face feature to obtain the trained face recognition model.
For example, the computer device may calculate a face recognition loss based on the third face feature and adjust the network parameters of the trained feature enhancement network and of the pre-trained recognition network based on that loss to obtain the trained face recognition model. Illustratively, the face recognition loss may be computed with at least one of the Softmax loss function, the Center Loss function, the AM-Softmax (Additive Margin Softmax) loss function, the ArcFace (Additive Angular Margin) loss function, and the like.
Optionally, the third image sample includes the second image sample; illustratively, the computer device performs the above overall parameter fine-tuning of the face recognition model using the second image sample.
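The overall fine-tuning stage could then be sketched as below; the classifier head and the use of cross-entropy (a Softmax-based loss) are assumptions consistent with the options listed above, with the optimizer built over the parameters of all three modules:

    import torch.nn.functional as F

    def finetune_step(enhancer, recognizer, classifier, optimizer, third_sample, labels):
        # Joint fine-tuning: gradients now reach both networks (and the classifier head).
        logits = classifier(recognizer(enhancer(third_sample)))
        loss = F.cross_entropy(logits, labels)  # softmax-based recognition loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()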
In summary, in the model training method provided in this embodiment, the pre-trained recognition network extracts the first face feature from the first image sample; the feature enhancement network performs feature enhancement processing on the augmented sample of the first image sample (that is, the second image sample); and the recognition network extracts the second face feature from the enhanced image sample. Adjusting the network parameters of the feature enhancement network based on the feature error between the first and second face features improves the network's ability to enhance face features. As a result, the image quality of a face image can be improved through the feature enhancement network before the face recognition model extracts face features, which reduces the burden on subsequent recognition-network learning, makes the learned features purer and more generalizable, and allows higher-quality face features to be extracted from the enhanced face image.
Compared with training the model on errors between pixels, training on errors between features learns what makes features high-quality from the data itself; it does not depend on constructing pixel-to-pixel paired data that matches the real noise distribution, and it avoids over-fitting of feature learning in the model.
The method learns the migration from low-quality features to high-quality features in the feature space. The feature space has low dimensionality, and each feature and its position carry clear semantics; guiding low-quality features toward a more separable direction in the feature space (that is, migrating them toward the high-quality features) avoids the disturbances encountered when converging in image space, such as the pixel-averaging effect of a traditional noise reduction network, and generalizes across any factor that affects face recognition.
For the feature recognition loss of the face recognition model, the computer device may compute it using positive-sample and negative-sample face features; for example, as shown in fig. 4, step 250 above may be replaced by steps 252 to 256, as follows:
step 252, after the first face feature and the second face feature are obtained, other face features are obtained.
The other face features are extracted from second image samples other than the current second image sample, and the faces in those other second image samples differ from the face in the current second image sample.
The computer device performs feature extraction on second image samples other than the current one to obtain their face features and stores or caches them; during model training on the first and second image samples, once the first and second face features have been extracted, the stored or cached other face features are retrieved.
When multiple other face features are stored or cached, the computer device acquires the most recently stored one, or the most recently cached one, or randomly selects one of them.
Step 254, calculating the feature loss based on the first face feature, the second face feature, and the other face features.
The computer device takes the second face feature as the positive-sample face feature and the other face features as negative-sample face features, and calculates the feature loss based on the first face feature, the second face feature, and the other face features.
Illustratively, a triplet (tuple) loss function may be defined as follows:
L_tuple = max(||f_hq - f_p|| - ||f_hq - f_n|| + margin, 0)
where L_tuple denotes the above feature loss, f_hq denotes the first face feature, f_p denotes the second face feature (the positive sample), f_n denotes the other face feature (the negative sample), max denotes taking the maximum, and margin denotes a hyperparameter.
The computer device inputs the first face feature, the second face feature, and the other face features into this loss function to calculate the feature loss.
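A direct PyTorch transcription of this loss is given below; the default margin value is an assumption, since the application only names margin as a hyperparameter:

    import torch

    def tuple_loss(f_hq, f_p, f_n, margin: float = 0.3):
        # L = max(||f_hq - f_p|| - ||f_hq - f_n|| + margin, 0)
        d_pos = (f_hq - f_p).norm(dim=-1)  # anchor-to-positive distance
        d_neg = (f_hq - f_n).norm(dim=-1)  # anchor-to-negative distance
        return torch.clamp(d_pos - d_neg + margin, min=0).mean()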
Step 256, adjusting the network parameters of the feature enhancement network based on the feature loss to obtain the trained face recognition model.
Optionally, the computer device adjusts the network parameters of the feature enhancement network based on the feature loss to obtain the trained feature enhancement network; acquires at least one third image sample; calls the trained feature enhancement network to perform face feature enhancement processing on the third image sample to obtain an enhanced third image sample; calls the recognition network to extract a third face feature from the enhanced third image sample; and adjusts the network parameters of the trained feature enhancement network and of the recognition network based on the third face feature to obtain the trained face recognition model. For example, the computer device may calculate the face recognition loss corresponding to the third face feature using a Softmax loss function, and adjust the trained feature enhancement network and the pre-trained recognition network based on that loss.
Optionally, the third image sample includes the second image sample, and the computer device may perform the above overall parameter fine-tuning of the face recognition model using the second image sample.
Illustratively, the algorithm flow of the model training method in this embodiment is given in an accompanying figure (not reproduced here).
Led3D is a lightweight and efficient three-dimensional face recognition network ("Led3D: A Lightweight and Efficient Deep Approach to Recognizing Low-quality 3D Faces").
In summary, the model training method provided in this embodiment performs the loss calculation based on both positive-sample and negative-sample face features, thereby taking into account the distribution of positive and negative samples in the high-quality feature space; this better matches the embedding of the high-quality feature space and enables the face recognition model to achieve more accurate face recognition.
It should be further noted that the pre-trained recognition network may be obtained by training on the first image samples, that is, on high-quality face images, so that the recognition network acquires strong face recognition capability.
Illustratively, through the recognition network, the computer device performs M rounds of feature extraction on the first image sample to obtain M first sample feature vectors of different levels, where the level indicates the round of feature extraction and the first sample feature vector at each level is extracted from the feature vector of the previous level; performs multi-scale feature fusion on the M first sample feature vectors to obtain a fused first sample feature vector, where M is an integer greater than 1; performs feature learning on the fused first sample feature vector under the spatial attention mechanism to obtain a learned first sample feature vector; and extracts the first face feature from the learned first sample feature vector. The network parameters of the recognition network are adjusted based on the recognition loss of the first face feature, and the pre-trained recognition network is obtained after multiple rounds of training. Illustratively, a Softmax loss function may be employed to calculate the recognition loss of the first face feature.
The first face features may be stored so that, during model training, the first face feature corresponding to a given first image sample can be obtained directly.
Fig. 5 is a flowchart illustrating a face recognition method according to an exemplary embodiment of the present application, where the method is applied to a computer device, which may be, for example, a terminal or a server; the method adopts the face recognition model provided by the embodiment, and comprises the following steps:
and step 310, calling a face recognition model to perform face feature enhancement processing on the acquired face image to obtain an enhanced image.
The face recognition model comprises a noise reduction network and a feature superposition function. The computer device performs noise reduction on the face image through the noise reduction network to obtain a noise-reduced residual image, and superposes the residual image onto the face image through the feature superposition function to obtain the enhanced image.
Optionally, the noise reduction network comprises a first dilated convolution and a first activation function, and N second dilated convolutions and N second activation functions. To obtain the residual image, the computer device applies the first dilated convolution to the face image to obtain a feature-extracted feature image, and inputs the feature image into the first activation function to obtain a non-linearly processed intermediate feature image; it then applies the second dilated convolution to the intermediate feature image to obtain a re-extracted feature image, and inputs the re-extracted feature image into the second activation function to obtain an updated intermediate feature image. The step of updating the intermediate feature image is repeated N times to obtain the noise-reduced residual image, where N is an integer greater than 1.
Optionally, the noise reduction network comprises a first dilated convolution and a first activation function; N second dilated convolutions, N batch normalization layers, and N second activation functions; and a third dilated convolution, where each second dilated convolution corresponds to one batch normalization layer and one second activation function.
To obtain the residual image, the computer device applies the first dilated convolution to the face image to obtain a feature-extracted feature image, and inputs the feature image into the first activation function to obtain a non-linearly processed intermediate feature image.
It then applies the second dilated convolution to the intermediate feature image to obtain a re-extracted feature image; inputs the re-extracted feature image into the batch normalization layer to obtain a normalized feature image; and inputs the normalized feature image into the second activation function to obtain an updated intermediate feature image. This update of the intermediate feature image is repeated N times, where N is an integer greater than 1.
Finally, the third dilated convolution is applied to the N-times-updated intermediate feature image to obtain the noise-reduced residual image.
Optionally, the activation function includes a ReLU function.
Optionally, adjacent dilated convolutions have different dilation factors, where the dilated convolutions include the first and second dilated convolutions. For example, the dilation factors may increase and then decrease at equal intervals, with the first dilated convolution and the last dilated convolution sharing the same factor.
Optionally, the combination of a second dilated convolution, a batch normalization layer, and a second activation function is treated as one second recognition layer, and the number of feature maps of each second recognition layer is set to 64.
Step 320, calling the face recognition model to perform feature extraction on the enhanced image to obtain the face features in the face image.
The face recognition model comprises the recognition network. Through the recognition network, the computer device performs M rounds of feature extraction on the enhanced image to obtain M feature vectors of different levels, where the level indicates the round of feature extraction and the feature vector at each level is extracted from the feature vector of the previous level; performs multi-scale feature fusion on the M feature vectors to obtain a fused feature vector, where M is an integer greater than 1; performs feature learning on the fused feature vector under the spatial attention mechanism to obtain a learned feature vector; and extracts the face features from the learned feature vector.
Illustratively, for feature fusion, the computer device performs max pooling on each of the M feature vectors to obtain M pooled feature vectors, and fuses them to obtain the fused feature vector.
Step 330, calling the face recognition model to perform recognition based on the face features to obtain a face recognition result.
For example, the face recognition result may be the result of identity verification performed on the facial feature information, or it may simply be the extracted facial feature information (that is, the face features).
Illustratively, the face recognition model further includes a Dropout layer, a fully connected (FC, Fully Connected) layer, and a classification (Softmax) layer. The computer device inputs the face features into the Dropout layer, which discards part of the features to yield processed face features; inputs the processed face features into the FC layer, which combines them into secondarily processed face features; and inputs the secondarily processed face features into the Softmax layer for recognition to obtain the face recognition result. The Dropout layer reduces over-fitting of feature learning and improves the generalization ability of the network.
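A minimal sketch of such a head follows; the feature dimension, identity count, and drop probability are assumptions, as the application does not specify them:

    import torch.nn as nn

    class RecognitionHead(nn.Module):
        def __init__(self, feat_dim: int = 128, num_identities: int = 1000, p: float = 0.5):
            super().__init__()
            self.dropout = nn.Dropout(p)  # discards part of the features against over-fitting
            self.fc = nn.Linear(feat_dim, num_identities)  # fully connected (FC) layer
        def forward(self, feat):
            # Softmax layer turns the FC outputs into identity probabilities.
            return self.fc(self.dropout(feat)).softmax(dim=-1)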
In summary, the face recognition method provided in this embodiment performs face recognition with a face recognition model that includes a noise reduction network and a feature superposition function: the noise reduction network denoises the face image to obtain a residual image, the feature superposition function superposes the residual image onto the face image to enhance its face features, and recognition is then performed on the feature-enhanced image. This reduces the burden on subsequent recognition-network learning, makes the learned features purer and more generalizable, allows higher-quality face features to be extracted from the enhanced face image, and thereby achieves more accurate face recognition.
Illustratively, Gaussian noise with varying variance was randomly added to a high-quality three-dimensional face dataset, and recognition experiments were run with scheme 1, scheme 2, and the scheme of the present application; the results are shown in Table 1 below:
TABLE 1
Model          Accuracy
Led3D          90.78%
DnCNN + Led3D  94.15%
Ours           97.15%
Here, scheme 1 uses the Led3D model alone for recognition; scheme 2 denoises the face image with a denoising convolutional neural network (DnCNN) and then recognizes it with Led3D; and scheme 3, denoted Ours, is the scheme provided by the present application. Compared with scheme 1, the recognition accuracy of the proposed scheme improves by about 6.37%; compared with DnCNN-based noise reduction, it improves by 3%. Fig. 6 shows, for schemes 2 and 3, the face before noise reduction, the face after noise reduction, and the difference between them; the difference image for scheme 3 is visibly cleaner, indicating a better noise reduction effect.
Illustratively, experiments were also performed on a real low-quality three-dimensional face dataset, with the results shown in Table 2 below:
TABLE 2
Model   Accuracy
Led3D   89.27%
Ours    94.77%
Compared with scheme 1, the recognition accuracy of the proposed scheme improves by 5.5%. Fig. 7 shows the recognition effect on real images: the face before noise reduction, the face after noise reduction, and the difference between them under scheme 3 in a real scene. The difference image for scheme 3 is clear, indicating an excellent noise reduction effect.
Fig. 8 is a block diagram of a model training apparatus for a face recognition model according to an exemplary embodiment of the present application, which may be implemented, through software, hardware, or a combination of the two, as part or all of a computer device such as a server or a terminal. The face recognition model includes a feature enhancement network and a pre-trained recognition network, and the apparatus includes:
a sample obtaining module 412, configured to obtain at least one group of face image samples, where each group of face image samples includes a first image sample and a second image sample, and the second image sample is obtained by amplifying the first image sample;
a feature obtaining module 414, configured to extract a first face feature from the first image sample through the recognition network, to perform face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample, and to extract a second face feature from the enhanced image sample through the recognition network;
and a parameter adjusting module 416, configured to adjust the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model.
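The parameter-adjustment step carried out by these modules can be sketched as follows. This is a hypothetical PyTorch rendering: the mean-squared feature error, the frozen recognition network, and the function name are assumptions, since this application only states that a feature error between the two face features drives the update:

```python
import torch
import torch.nn.functional as F

def train_step(enhance_net, recog_net, first_sample, second_sample, optimizer):
    """One hypothetical parameter-adjustment step: only the feature enhancement
    network is updated; the pre-trained recognition network stays frozen."""
    with torch.no_grad():
        first_feat = recog_net(first_sample)      # first face feature (target)
    enhanced = enhance_net(second_sample)         # face feature enhancement processing
    second_feat = recog_net(enhanced)             # second face feature
    loss = F.mse_loss(second_feat, first_feat)    # feature error (MSE is an assumption)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```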
In some embodiments, the feature enhancement network includes a noise reduction network and a feature superposition function;
performing face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample includes:
performing noise reduction processing on the second image sample through the noise reduction network to obtain a denoised sample residual image;
and superimposing the sample residual image onto the second image sample through the feature superposition function to obtain the enhanced image sample.
In some embodiments, the noise reduction network includes a first dilated convolution and a first activation function, and N second dilated convolutions and N second activation functions;
performing noise reduction processing on the second image sample through the noise reduction network to obtain a denoised sample residual image includes:
performing a convolution calculation on the second image sample with the first dilated convolution to obtain a sample feature image after feature extraction; inputting the sample feature image into the first activation function to obtain a nonlinearly processed intermediate sample feature image;
performing a convolution calculation on the intermediate sample feature image with a second dilated convolution to obtain a sample feature image after renewed feature extraction; inputting this sample feature image into the corresponding second activation function to obtain an updated intermediate sample feature image;
and repeating the step of updating the intermediate sample feature image N times to obtain the denoised sample residual image, where N is an integer greater than 1.
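A minimal sketch of such a noise reduction network, together with the feature superposition step from the preceding embodiment, might look as follows; the channel widths, kernel sizes, ReLU activations, and concrete dilation factors are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DilatedDenoiser(nn.Module):
    """Sketch of the noise reduction network: a first dilated convolution plus
    activation, followed by N repeated dilated-convolution/activation blocks
    that keep updating the intermediate feature image."""
    def __init__(self, channels: int = 1, width: int = 64, n_blocks: int = 5):
        super().__init__()
        self.first = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1, dilation=1),  # first dilated convolution
            nn.ReLU(inplace=True),                                 # first activation function
        )
        blocks = []
        for i in range(n_blocks):
            d = 2 if i % 2 == 0 else 3  # adjacent blocks use different dilation factors
            blocks.append(nn.Conv2d(width, width, 3, padding=d, dilation=d))
            blocks.append(nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(*blocks)
        self.out = nn.Conv2d(width, channels, 3, padding=1)  # maps back to a residual image

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(self.blocks(self.first(x)))  # denoised residual image

def enhance(image: torch.Tensor, denoiser: DilatedDenoiser) -> torch.Tensor:
    """Feature superposition: add the residual image back onto the input image."""
    return image + denoiser(image)
```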
In some embodiments, the dilation factor differs between adjacent dilated convolutions, where the dilated convolutions include the first dilated convolution and the second dilated convolutions.
In some embodiments, extracting the first face feature from the first image sample through the recognition network includes:
performing M feature extractions on the first image sample through the recognition network to obtain M first sample feature vectors at different levels, where the level indicates the number of feature extractions performed and the first sample feature vector at each level is extracted from the first sample feature vector at the previous level;
performing multi-scale feature fusion on the M first sample feature vectors through the recognition network to obtain a fused first sample feature vector, where M is an integer greater than 1;
performing feature learning under a spatial attention mechanism on the fused first sample feature vector through the recognition network to obtain a learned first sample feature vector;
and extracting the first face feature from the learned first sample feature vector through the recognition network.
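A hedged sketch of such a recognition backbone is given below, assuming M = 4 convolutional stages, sigmoid-gated spatial attention, and a 512-dimensional face feature; none of these concrete choices is specified in this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleRecognizer(nn.Module):
    """Sketch: M = 4 successive feature extractions, multi-scale fusion of the
    per-level maps, spatial attention, then a face feature embedding."""
    def __init__(self, in_ch: int = 1, width: int = 32, embed_dim: int = 512):
        super().__init__()
        chans = [in_ch, width, width * 2, width * 4, width * 8]
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(4)  # each level extracts from the previous level's output
        )
        fused_ch = sum(chans[1:])
        self.attention = nn.Conv2d(fused_ch, 1, 1)   # spatial attention map
        self.embed = nn.Linear(fused_ch, embed_dim)  # final face feature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        size = feats[-1].shape[-2:]
        fused = torch.cat([F.adaptive_avg_pool2d(f, size) for f in feats], dim=1)  # multi-scale fusion
        attn = torch.sigmoid(self.attention(fused))  # feature learning under spatial attention
        pooled = F.adaptive_avg_pool2d(fused * attn, 1).flatten(1)
        return self.embed(pooled)                    # extracted face feature
```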
In some embodiments, extracting the second face feature from the enhanced image sample through the recognition network includes:
performing M feature extractions on the enhanced image sample through the recognition network to obtain M second sample feature vectors at different levels, where the level indicates the number of feature extractions performed and the second sample feature vector at each level is extracted from the second sample feature vector at the previous level;
performing multi-scale feature fusion on the M second sample feature vectors through the recognition network to obtain a fused second sample feature vector, where M is an integer greater than 1;
performing feature learning under a spatial attention mechanism on the fused second sample feature vector through the recognition network to obtain a learned second sample feature vector;
and extracting the second face feature from the learned second sample feature vector through the recognition network.
In some embodiments, the apparatus further comprises:
a feature obtaining module 414, configured to obtain, after the first face feature and the second face feature are obtained, other face features extracted from second image samples other than the second image sample, where the faces in those other second image samples differ from the face in the second image sample;
a parameter adjustment module 416, configured to compute a feature loss based on the first face feature, the second face feature, and the other face features, and to adjust the network parameters of the feature enhancement network based on the feature loss to obtain the trained face recognition model.
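One plausible form of such a feature loss is a triplet-style objective that pulls the second face feature toward the first and pushes it away from the other face features; the triplet form and the margin value are assumptions, as this application only states that a loss is computed from the three features:

```python
import torch
import torch.nn.functional as F

def feature_loss(first_feat, second_feat, other_feat, margin: float = 0.5):
    """Hypothetical triplet-style feature loss over (B, D) feature batches."""
    pos = F.pairwise_distance(second_feat, first_feat)  # same-face distance
    neg = F.pairwise_distance(second_feat, other_feat)  # different-face distance
    return F.relu(pos - neg + margin).mean()            # hinge on the distance gap
```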
In some embodiments, adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model includes:
adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain a trained feature enhancement network;
obtaining at least one third image sample;
invoking the trained feature enhancement network to perform face feature enhancement processing on the third image sample to obtain an enhanced third image sample;
invoking the recognition network to extract a third face feature from the enhanced third image sample;
and adjusting the network parameters of the trained feature enhancement network and the network parameters of the recognition network based on the third face feature to obtain the trained face recognition model.
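The two-stage schedule just described can be sketched as follows; the optimizers, learning rates, and the per-batch train_step/joint_step helpers are hypothetical placeholders:

```python
import torch

def two_stage_training(enhance_net, recog_net, stage1_loader, stage2_loader,
                       train_step, joint_step):
    """Sketch of the two-stage schedule (stage logic only; train_step and
    joint_step are assumed per-batch update functions)."""
    # Stage 1: adjust only the feature enhancement network on (first, second) pairs.
    recog_net.requires_grad_(False)
    opt1 = torch.optim.Adam(enhance_net.parameters(), lr=1e-4)
    for first, second in stage1_loader:
        train_step(enhance_net, recog_net, first, second, opt1)

    # Stage 2: fine-tune both networks jointly on third image samples.
    recog_net.requires_grad_(True)
    params = list(enhance_net.parameters()) + list(recog_net.parameters())
    opt2 = torch.optim.Adam(params, lr=1e-5)
    for third in stage2_loader:
        joint_step(enhance_net, recog_net, third, opt2)
```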
In some embodiments, the second image sample is obtained by amplifying the first image sample in at least one of the following ways (a code sketch of these operations follows the list):
adding noise to the first image samples;
adding an occlusion on the first image sample;
down-sampling or up-sampling the first image sample;
rotating the first image sample;
and adjusting the depth distance between the person and the background in the first image sample, where the background refers to the pixel area of the first image sample other than the person.
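A sketch of these amplification operations for a (C, H, W) image tensor follows; the noise scale, occlusion region, resampling factor, and 90-degree rotation are illustrative choices, and the depth-distance adjustment is omitted because it requires three-dimensional data not modeled here:

```python
import torch
import torch.nn.functional as F

def amplify(first: torch.Tensor, mode: str) -> torch.Tensor:
    """Hypothetical second-sample generation from a (C, H, W) first image sample."""
    if mode == "noise":
        return first + 0.05 * torch.randn_like(first)  # add noise
    if mode == "occlusion":
        out = first.clone()
        out[:, 40:80, 40:80] = 0.0                     # add a square occlusion (illustrative region)
        return out
    if mode == "resample":
        h, w = first.shape[-2:]
        down = F.interpolate(first[None], scale_factor=0.5,
                             mode="bilinear", align_corners=False)   # down-sample
        return F.interpolate(down, size=(h, w),
                             mode="bilinear", align_corners=False)[0]  # up-sample back
    if mode == "rotate":
        return torch.rot90(first, k=1, dims=(-2, -1))  # 90-degree rotation for simplicity
    raise ValueError(mode)
```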
In summary, in the model training apparatus for a face recognition model provided in this embodiment, the pre-trained recognition network extracts a first face feature from the first image sample while the feature enhancement network performs feature enhancement on the amplified sample (i.e., the second image sample) of that first image sample; the recognition network then extracts a second face feature from the enhanced image sample. Adjusting the network parameters of the feature enhancement network based on the feature error between the first and second face features improves the network's ability to enhance facial features, so that before the face recognition model extracts face features, the feature enhancement network can raise the image quality of the face image, reduce the learning burden of the subsequent recognition network, make the learned features purer and more generalizable, and allow higher-quality face features to be extracted from the enhanced face image.
Compared with training on errors between pixels, the apparatus trains the model on errors between features; it learns what constitutes high-quality data at the feature level, does not depend on constructing pixel-to-pixel paired data that matches the real noise distribution, and avoids overfitting in the model's feature learning.
The apparatus learns the migration from low-quality features to high-quality features in a feature space. The feature space has low dimensionality, and each feature and its position carry clear semantics; guiding low-quality features toward a more separable direction in this space (i.e., migrating them toward high-quality features) avoids the disturbances encountered when converging in image space, such as the pixel-averaging effect of traditional noise reduction networks, and generalizes to any factor that affects face recognition.
Fig. 9 shows a block diagram of a face recognition apparatus provided in an exemplary embodiment of the present application, which may be implemented, through software, hardware, or a combination of the two, as part or all of a computer device such as a server or a terminal. The apparatus uses the face recognition model described in the above embodiments and includes:
the feature enhancement module 422, configured to invoke the face recognition model to perform face feature enhancement processing on the captured face image to obtain an enhanced image;
the feature extraction module 424, configured to invoke the face recognition model to perform feature extraction on the enhanced image to obtain the face features in the face image;
and the face recognition module 426, configured to invoke the face recognition model to perform recognition based on the face features to obtain a face recognition result.
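Chained together, the three modules give the following hypothetical inference flow, reusing the sketched components above (enhance_net is assumed to return the enhanced image, recog_net the face feature, and head the classification result):

```python
import torch

@torch.no_grad()
def recognize(face_image, enhance_net, recog_net, head):
    """Sketch of the three-module inference flow described above."""
    enhanced = enhance_net(face_image)  # feature enhancement module
    features = recog_net(enhanced)      # feature extraction module
    probs = head(features)              # face recognition module
    return probs.argmax(dim=1)          # predicted identity
```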
In some embodiments, the face recognition model includes a noise reduction network and a feature superposition function;
invoking the face recognition model to perform face feature enhancement processing on the captured face image to obtain an enhanced image includes:
performing noise reduction processing on the face image through the noise reduction network to obtain a denoised residual image;
and superimposing the residual image onto the face image through the feature superposition function to obtain the enhanced image.
In some embodiments, the noise reduction network includes a first dilated convolution and a first activation function, and N second dilated convolutions and N second activation functions;
performing noise reduction processing on the face image through the noise reduction network to obtain a denoised residual image includes:
performing a convolution calculation on the face image with the first dilated convolution to obtain a feature image after feature extraction; inputting the feature image into the first activation function to obtain a nonlinearly processed intermediate feature image;
performing a convolution calculation on the intermediate feature image with a second dilated convolution to obtain a feature image after renewed feature extraction; inputting the feature image after renewed feature extraction into the corresponding second activation function to obtain an updated intermediate feature image;
and repeating the step of updating the intermediate feature image N times to obtain the denoised residual image, where N is an integer greater than 1.
In some embodiments, the dilation factor differs between adjacent dilated convolutions, where the dilated convolutions include the first dilated convolution and the second dilated convolutions.
In some embodiments, the face recognition model includes a recognition network;
invoking the face recognition model to perform feature extraction on the enhanced image to obtain the face features in the face image includes:
performing M feature extractions on the enhanced image through the recognition network to obtain M feature vectors at different levels, where the level indicates the number of feature extractions performed and the feature vector at each level is extracted from the feature vector at the previous level;
performing multi-scale feature fusion on the M feature vectors through the recognition network to obtain a fused feature vector, where M is an integer greater than 1;
performing feature learning under a spatial attention mechanism on the fused feature vector through the recognition network to obtain a learned feature vector;
and extracting the face features from the learned feature vector through the recognition network.
In summary, the face recognition apparatus provided in this embodiment performs face recognition with a face recognition model that includes a noise reduction network and a feature superposition function. The noise reduction network denoises the face image to obtain a residual image, and the feature superposition function superimposes the residual image onto the face image to enhance its facial features. Performing recognition on the feature-enhanced image reduces the learning burden of the subsequent recognition network, so that the learned features are purer and generalize better; higher-quality face features are then extracted from the enhanced face image, enabling more accurate face recognition.
Fig. 10 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application. The computer device may be a device that performs the model training method of the face recognition model and/or the face recognition method provided herein, and may be a terminal or a server. Specifically:
the computer apparatus 500 includes a Central Processing Unit (CPU) 501, a system Memory 504 including a Random Access Memory (RAM) 502 and a Read Only Memory (ROM) 503, and a system bus 505 connecting the system Memory 504 and the Central Processing Unit 501. The computer device 500 also includes a basic Input/Output System (I/O System)506, which facilitates information transfer between various devices within the computer, and a mass storage device 507, which stores an operating System 513, application programs 514, and other program modules 515.
The basic input/output system 506 includes a display 508 for displaying information and an input device 509, such as a mouse or keyboard, through which a user inputs information. The display 508 and the input device 509 are both connected to the central processing unit 501 through an input/output controller 510 connected to the system bus 505. The basic input/output system 506 may also include the input/output controller 510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 510 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 507 is connected to the central processing unit 501 through a mass storage controller (not shown) connected to the system bus 505. The mass storage device 507 and its associated computer-readable media provide non-volatile storage for the computer device 500. That is, mass storage device 507 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.
Computer-readable media may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology such as Solid State Drives (SSD), CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The random access memory may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM). Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 504 and the mass storage device 507 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 500 may also run by connecting to a remote computer on a network, such as the Internet. That is, the computer device 500 may be connected to the network 512 through the network interface unit 511 connected to the system bus 505, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 511.
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
In an alternative embodiment, a computer device is provided, which comprises a processor and a memory, wherein at least one instruction, at least one program, set of codes, or set of instructions is stored in the memory, and the at least one instruction, at least one program, set of codes, or set of instructions is loaded and executed by the processor to implement the model training method for a face recognition model, and/or the face recognition method, as described above.
In an alternative embodiment, a computer-readable storage medium is provided, in which at least one instruction, at least one program, code set, or set of instructions is stored, which is loaded and executed by a processor to implement the model training method for a face recognition model, and/or the face recognition method, as described above.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the model training method for a face recognition model and/or the face recognition method provided by the above-mentioned method embodiments.
The present application also provides a computer program product comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the model training method of the face recognition model and/or the face recognition method.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A model training method of a face recognition model is characterized in that the face recognition model comprises a feature enhancement network and a pre-trained recognition network, and the method comprises the following steps:
obtaining at least one group of face image samples, wherein each group of face image samples comprises a first image sample and a second image sample, and the second image sample is a sample obtained by amplifying the first image sample;
extracting a first facial feature from the first image sample through the recognition network;
performing face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample; extracting a second face feature from the enhanced image sample through the identification network;
and adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain a trained face recognition model.
2. The method of claim 1, wherein the feature enhancement network comprises a noise reduction network and a feature superposition function;
the processing of enhancing the face features of the second image sample through the feature enhancement network to obtain an enhanced image sample includes:
denoising the second image sample through the denoising network to obtain a denoised sample residual image;
and superposing the sample residual image to the second image sample through the characteristic superposition function to obtain the enhanced image sample.
3. The method of claim 2, wherein the noise reduction network comprises a first dilated convolution and a first activation function, and N second dilated convolutions and N second activation functions;
the performing noise reduction processing on the second image sample through the noise reduction network to obtain a denoised sample residual image comprises:
performing a convolution calculation on the second image sample with the first dilated convolution to obtain a sample feature image after feature extraction; inputting the sample feature image into the first activation function to obtain a nonlinearly processed intermediate sample feature image;
performing a convolution calculation on the intermediate sample feature image with the second dilated convolution to obtain a sample feature image after renewed feature extraction; inputting the sample feature image after renewed feature extraction into the second activation function to obtain an updated intermediate sample feature image;
and repeating the step of updating the intermediate sample feature image N times to obtain the denoised sample residual image, wherein N is an integer greater than 1.
4. The method of claim 3, wherein a dilation factor differs between adjacent dilated convolutions, wherein the dilated convolutions comprise the first dilated convolution and the second dilated convolutions.
5. The method according to any one of claims 1 to 4, wherein the extracting, from the first image sample via the recognition network, first facial features comprises:
performing M times of feature extraction through the identification network based on the first image sample to obtain M first sample feature vectors of different levels, wherein the levels are used for indicating the times of feature extraction, and the first sample feature vector of each level is extracted from the first sample feature vector of the previous level;
performing multi-scale feature fusion on the M first sample feature vectors through the identification network to obtain fused first sample feature vectors, wherein M is an integer greater than 1;
performing feature learning under a space attention vector mechanism on the fused first sample feature vector through the identification network to obtain a learned first sample feature vector;
and extracting the first face feature from the learned first sample feature vector through the identification network.
6. The method according to any one of claims 1 to 4, wherein the extracting, by the recognition network, the second face feature from the enhanced image sample comprises:
performing M times of feature extraction through the identification network based on the enhanced image sample to obtain M second sample feature vectors of different levels, wherein the levels are used for indicating the times of feature extraction, and the second sample feature vector of each level is extracted from the second sample feature vector of the previous level;
performing multi-scale feature fusion on the M second sample feature vectors through the identification network to obtain fused second sample feature vectors, wherein M is an integer greater than 1;
performing feature learning under a space attention vector mechanism on the fused second sample feature vector through the identification network to obtain a learned second sample feature vector;
and extracting the second face features from the learned second sample feature vector through the identification network.
7. The method of any of claims 1 to 4, further comprising:
after the first face features and the second face features are obtained, obtaining other face features, wherein the other face features are extracted from other second image samples except the second image samples, and faces corresponding to the other second image samples are different from faces corresponding to the second image samples;
calculating a feature loss based on the first facial features, the second facial features, and the other facial features;
and adjusting the network parameters of the feature enhancement network based on the feature loss to obtain a trained face recognition model.
8. The method according to any one of claims 1 to 4, wherein the adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model comprises:
adjusting network parameters of the feature enhancement network based on a feature error between the first face feature and the second face feature to obtain a trained feature enhancement network;
acquiring at least one third image sample;
calling the trained feature enhancement network to perform face feature enhancement processing on the third image sample to obtain an enhanced third image sample;
calling the identification network to extract a third face feature from the enhanced third image sample;
and adjusting the network parameters of the trained feature enhancement network and the network parameters of the recognition network based on the third face features to obtain the trained face recognition model.
9. The method of any of claims 1 to 4, wherein the second image sample is amplified in a manner that includes at least one of:
adding noise to the first image samples;
down-sampling or up-sampling the first image sample;
and adjusting the depth distance between the person in the first image sample and a background, wherein the background refers to a pixel area except the person in the first image sample.
10. A face recognition method using the face recognition model according to any one of claims 1 to 9, the method comprising:
calling the face recognition model to perform face feature enhancement processing on the collected face image to obtain an enhanced image;
calling the face recognition model to perform feature extraction on the enhanced image to obtain face features in the face image;
and calling the face recognition model to perform recognition based on the face features to obtain a face recognition result.
11. A model training apparatus for a face recognition model, wherein the face recognition model includes a feature enhancement network and a pre-trained recognition network, the apparatus comprising:
the system comprises a sample acquisition module, a comparison module and a comparison module, wherein the sample acquisition module is used for acquiring at least one group of face image samples, each group of face image samples comprises a first image sample and a second image sample, and the second image sample is a sample obtained by amplifying the first image sample;
the characteristic acquisition module is used for extracting a first face characteristic from the first image sample through the identification network; performing face feature enhancement processing on the second image sample through the feature enhancement network to obtain an enhanced image sample, and extracting second face features from the enhanced image sample through the identification network;
and the parameter adjusting module is used for adjusting the network parameters of the feature enhancement network based on the feature error between the first face feature and the second face feature to obtain the trained face recognition model.
12. A face recognition apparatus using the face recognition model according to any one of claims 1 to 9, the apparatus comprising:
the characteristic enhancement module is used for calling the face recognition model to carry out face characteristic enhancement processing on the collected face image to obtain an enhanced image;
the feature extraction module is used for calling the face recognition model to perform feature extraction on the enhanced image to obtain face features in the face image;
and the face recognition module is used for calling the face recognition model to perform recognition based on the face features to obtain a face recognition result.
13. A computer device, characterized in that the computer device comprises: a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the method of model training of a face recognition model according to any one of claims 1 to 9, or the method of face recognition according to claim 10.
14. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the model training method for a face recognition model according to any one of claims 1 to 9, or the face recognition method according to claim 10.
15. A computer program product, characterized in that the computer program product comprises computer instructions, the computer instructions being stored in a computer readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium, the processor executing the computer instructions to cause the computer device to perform the model training method of the face recognition model according to any one of claims 1 to 9, or the face recognition method according to claim 10.
CN202111531367.2A 2021-12-14 2021-12-14 Model training method, face recognition device, face recognition equipment, face recognition medium and product Pending CN114140862A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111531367.2A CN114140862A (en) 2021-12-14 2021-12-14 Model training method, face recognition device, face recognition equipment, face recognition medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111531367.2A CN114140862A (en) 2021-12-14 2021-12-14 Model training method, face recognition device, face recognition equipment, face recognition medium and product

Publications (1)

Publication Number Publication Date
CN114140862A true CN114140862A (en) 2022-03-04

Family

ID=80382391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111531367.2A Pending CN114140862A (en) 2021-12-14 2021-12-14 Model training method, face recognition device, face recognition equipment, face recognition medium and product

Country Status (1)

Country Link
CN (1) CN114140862A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708437A (en) * 2022-06-02 2022-07-05 深圳比特微电子科技有限公司 Training method of target detection model, target detection method, device and medium
CN114708437B (en) * 2022-06-02 2022-09-06 深圳比特微电子科技有限公司 Training method of target detection model, target detection method, device and medium
CN116524327A (en) * 2023-06-25 2023-08-01 云账户技术(天津)有限公司 Training method and device of face recognition model, electronic equipment and storage medium
CN116524327B (en) * 2023-06-25 2023-08-25 云账户技术(天津)有限公司 Training method and device of face recognition model, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111311629B (en) Image processing method, image processing device and equipment
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN111539879B (en) Video blind denoising method and device based on deep learning
CN108038420B (en) Human behavior recognition method based on depth video
CN111091503B (en) Image defocusing and blurring method based on deep learning
CN111354017A (en) Target tracking method based on twin neural network and parallel attention module
CN111444881A (en) Fake face video detection method and device
CN114140862A (en) Model training method, face recognition device, face recognition equipment, face recognition medium and product
CN110570443B (en) Image linear target extraction method based on structural constraint condition generation model
CN111695421B (en) Image recognition method and device and electronic equipment
CN107992807B (en) Face recognition method and device based on CNN model
JP2023545190A (en) Image line-of-sight correction method, device, electronic device, and computer program
CN111739064B (en) Method for tracking target in video, storage device and control device
CN113269722A (en) Training method for generating countermeasure network and high-resolution image reconstruction method
CN114419102B (en) Multi-target tracking detection method based on frame difference time sequence motion information
CN110570375B (en) Image processing method, device, electronic device and storage medium
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN112150497A (en) Local activation method and system based on binary neural network
CN113393385B (en) Multi-scale fusion-based unsupervised rain removing method, system, device and medium
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN116129417A (en) Digital instrument reading detection method based on low-quality image
CN113222016B (en) Change detection method and device based on cross enhancement of high-level and low-level features
CN114821048A (en) Object segmentation method and related device
CN113901916A (en) Visual optical flow feature-based facial fraud action identification method
CN111027616A (en) Line feature description system based on end-to-end learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination