WO2024021504A1 - Facial recognition model training method and apparatus, recognition method, and device and medium - Google Patents


Info

Publication number
WO2024021504A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
recognition model
model
face recognition
Prior art date
Application number
PCT/CN2022/142236
Other languages
French (fr)
Chinese (zh)
Inventor
吴鹏
肖嵘
王孝宇
Original Assignee
成都云天励飞技术有限公司
深圳云天励飞技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 成都云天励飞技术有限公司 and 深圳云天励飞技术股份有限公司
Publication of WO2024021504A1 publication Critical patent/WO2024021504A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/20: Image enhancement or restoration by the use of local operators
    • G06T5/30: Erosion or dilatation, e.g. thinning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the field of face recognition technology, and in particular to a face recognition model training method, recognition method, device, equipment and medium.
  • the main purpose of this application is to provide a face recognition model training method, recognition method, device, equipment and medium, aiming to make the trained face recognition model more accurate and to improve the accuracy of low-resolution face image recognition.
  • this application provides a face recognition model training method, which includes the following steps:
  • a preset face recognition model is trained according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • this application also provides a face recognition method, including:
  • the identity information of the person corresponding to the face image to be recognized is determined.
  • this application also provides a face recognition model training device.
  • the face recognition model training device includes a first acquisition module, a generation module and a training module, wherein:
  • the first acquisition module is used to acquire a plurality of first sample face images and an identity identification code corresponding to each of the first sample face images;
  • the generation module is used to perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and to perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, wherein the first sample face image and the corresponding second face image have the same identity identification code, and the image augmentation model is used to perform image blur augmentation on the first face image;
  • the first training module is used to train a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • the present application also provides a terminal device, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein, when the computer program is executed by the processor, the steps of the face recognition model training method and/or the face recognition method as mentioned above are implemented.
  • the present application also provides a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of the above-mentioned face recognition model training method and/or face recognition method are implemented.
  • This application provides a face recognition model training method, recognition method, device, equipment and medium.
  • This application obtains a plurality of first sample face images and the identity identification code corresponding to each first sample face image; then performs preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and performs augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where the first sample face image and the corresponding second face image have the same identity identification code and the image augmentation model is used to perform image blur augmentation on the first face image; a preset face recognition model is then trained on the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • This solution uses preset augmentation and an image augmentation model to augment a plurality of first sample face images, yielding a large number of first face images and second face images and greatly increasing the number of training samples. By jointly training the preset face recognition model on the first face images and the second face images, the trained face recognition model is made more accurate.
  • Figure 1 is a schematic flow chart of a face recognition model training method provided by an embodiment of the present application
  • Figure 2 is a schematic flowchart of an image augmentation model training provided by an embodiment of the present application
  • Figure 3 is a schematic flowchart of the sub-steps of image augmentation model training in Figure 2;
  • Figure 4 is a schematic flowchart of the sub-steps of the face recognition model training method in Figure 1;
  • Figure 5 is a schematic flow chart of the steps of the face recognition method provided by the embodiment of the present application.
  • Figure 6 is a schematic block diagram of a face recognition model training device provided by an embodiment of the present application.
  • Figure 7 is a schematic block diagram of a sub-module of the face recognition model training device provided by the embodiment of the present application.
  • Figure 8 is a schematic block diagram of an image augmentation model training device provided by an embodiment of the present application.
  • Figure 9 is a schematic block diagram of a face recognition device provided by an embodiment of the present application.
  • Figure 10 is a schematic structural block diagram of a terminal device provided by an embodiment of the present application.
  • Embodiments of the present application provide a face recognition model training method, recognition method, device, equipment and medium.
  • the face recognition model training method can be applied to terminal devices, which can be electronic devices such as mobile phones, tablet computers, notebook computers, desktop computers, personal digital assistants, and wearable devices.
  • FIG. 1 is a schematic flowchart of a face recognition model training method provided by an embodiment of the present application.
  • the face recognition model training method includes steps S101 to S103.
  • Step S101 Obtain a plurality of first sample face images and an identity identification code corresponding to each first sample face image.
  • the identity identification code is the identity identification corresponding to the first sample face image.
  • the identity identification code can be set according to the actual situation. This is not specifically limited in the embodiment of the present invention.
  • the identity identification code can be an ID card number.
  • the preset resolution can be set according to the actual situation, and the embodiment of the present invention does not specifically limit this.
  • the preset resolution is 720P.
  • Specifically, a plurality of first sample face images and the identity identification code corresponding to each first sample face image are obtained, where the resolution of each first sample face image is less than or equal to the preset resolution.
  • the acquisition method of the first sample face image can be selected according to the actual situation, and the embodiment of the present invention does not specifically limit this.
  • the first sample face image can be an image captured from a video, or an image collected by a shooting device.
  • the shooting device may be selected according to the actual situation. This is not specifically limited in the embodiment of the present invention.
  • the shooting device may be a camera, a camcorder, a mobile phone, and other devices.
  • Step S102 Perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images.
  • Preset augmentation processing and image augmentation model processing are performed on multiple first sample face images to obtain more samples.
  • FIG. 2 is a schematic flowchart of an image augmentation model training provided by an embodiment of the present application.
  • the image augmentation model training includes steps S201 to S202.
  • Step S201 Acquire a plurality of second sample face images, and add noise to each second sample face image to obtain a plurality of third sample face images.
  • the second sample face image may be an image captured from a video or an image collected by a shooting device.
  • the shooting device may be selected according to the actual situation. This is not specifically limited in the embodiment of the present invention.
  • the shooting device may be a camera, a video camera, a mobile phone, or another such device.
  • Specifically, preset photon noise, readout noise and quantization noise are obtained; the photon noise, readout noise and quantization noise are then added to each second sample face image according to the resolution of that image, to obtain a plurality of third sample face images.
  • the preset photon noise, readout noise and quantization noise can be set according to actual conditions, and this is not specifically limited in the embodiment of the present invention.
  • the photon noise is the optical noise generated by the photoelectric effect when photons are converted into electrons during image capture;
  • the readout noise is the error introduced by inherent properties of the circuit while electrons are converted into voltage during image capture, such as the thermal movement of electrons in the device;
  • the quantization noise is the information loss caused when voltage is converted into numbers during image capture, i.e. when a continuous signal is converted into a digital signal; this loss is also called quantization error or rounding error.
  • the method of obtaining photon noise can also be: treat the collected image as the number of photons I received by the sensor, and fit a Poisson distribution to the photon count I to obtain the photon noise.
  • the photon noise can be accurately obtained by fitting the number of received photons through the Poisson distribution.
  • the method of obtaining the readout noise can also be: obtain the error in the process of converting electrons into voltage during image acquisition, perform Gaussian distribution processing on the error, and process it through a preset Tukey lambda distribution to generate the readout noise. By processing the error arising as electrons are converted into voltage during image acquisition, the readout noise can be accurately obtained.
  • the method of obtaining the quantization noise may also be: obtain the quantization noise distribution, which is [-0.5q, 0.5q], where q is the quantization step size.
  • when the quantization step q is 1, the quantization noise is distributed in [-0.5, 0.5]; when the quantization step q is 2, the quantization noise is distributed in [-1, 1].
  • the resolution of each second sample face image is obtained, and photon noise, readout noise and quantization noise are added to each second sample face image according to its resolution, yielding a plurality of third sample face images. By adding photon noise, readout noise and quantization noise to the second sample face images, sample images that better match the characteristics of low-resolution images can be obtained.
  • a noise superposition formula is obtained, in which N 2 is the per-pixel readout noise and N 3 is the per-pixel quantization noise; from it, the photon noise, readout noise and quantization noise are obtained.
  • the photon noise gain value is set according to the imaging system and can be chosen according to the actual situation; this is not specifically limited in the embodiment of the present invention. According to the resolution of the second sample face image, the total noise is added to the second sample face image to generate the third sample face image.
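The three noise sources described above can be sketched in code. This is an illustrative model only: the patent's exact superposition formula and gain settings are not reproduced, so the additive combination and the gain, sigma and q values below are assumptions.

```python
import numpy as np

def add_sensor_noise(image, gain=0.01, read_sigma=2.0, q=1.0, rng=None):
    """Add photon (Poisson), readout (Gaussian) and quantization (uniform)
    noise to a face image. Illustrative sketch; the additive model and the
    parameter values are assumptions, not taken from the text."""
    rng = np.random.default_rng() if rng is None else rng
    img = image.astype(np.float64)

    # Photon noise: fit the received photon count with a Poisson distribution.
    photons = img / max(gain, 1e-12)
    shot = rng.poisson(photons) * gain - img           # N1 per pixel

    # Readout noise: Gaussian error from converting electrons to voltage.
    readout = rng.normal(0.0, read_sigma, img.shape)   # N2 per pixel

    # Quantization noise: uniform over [-0.5q, 0.5q].
    quant = rng.uniform(-0.5 * q, 0.5 * q, img.shape)  # N3 per pixel

    noisy = img + shot + readout + quant
    return np.clip(noisy, 0, 255).astype(np.uint8)

clean = np.full((8, 8), 128, dtype=np.uint8)
noisy = add_sensor_noise(clean)
print(noisy.shape, noisy.dtype)
```

The clip to [0, 255] mimics sensor saturation; a real pipeline would scale the gain with the image resolution as the text describes.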
  • Step S202 Train a preset image augmentation model based on a plurality of the third sample face images until the image augmentation model converges.
  • the image augmentation model includes an image downsampling model and a Gaussian blur model.
  • step S202 includes sub-steps S2021 to sub-step S2023.
  • Sub-step S2021 Process each of the third sample face images through a preset image augmentation model to obtain a plurality of third face images.
  • Each third sample face image is down-sampled through an image down-sampling model, and the down-sampled image is processed with a Gaussian blur model to obtain a third face image corresponding to each third sample face image.
  • Using the image downsampling model to downsample the third sample face image can make the sample image fit the size of the display area and generate corresponding image thumbnails. Gaussian blur processing on the thumbnails can accurately obtain the third face image.
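The two stages of the image augmentation model (the image downsampling model followed by the Gaussian blur model) can be sketched as below; the downsampling factor and Gaussian sigma stand in for the downsampling parameter and the Gaussian kernel parameter, and the concrete values are illustrative.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    # Build a normalized 1-D Gaussian kernel.
    radius = int(3 * sigma) if radius is None else radius
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_and_downsample(image, factor=4, sigma=1.5):
    """Image augmentation model sketch: downsample by an integer factor
    (image downsampling model), then apply a separable Gaussian blur
    (Gaussian blur model). factor and sigma are illustrative values."""
    img = image.astype(np.float64)
    small = img[::factor, ::factor]
    k = gaussian_kernel1d(sigma)
    # Separable convolution: blur rows, then columns.
    small = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, small)
    small = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, small)
    return small

face = np.random.default_rng(0).integers(0, 256, size=(64, 64))
third = blur_and_downsample(face, factor=4, sigma=1.5)
print(third.shape)
```

The separable convolution keeps the sketch dependency-free; a production version would more likely use an image library's resize and Gaussian blur routines.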
  • Sub-step S2022 Determine whether the image augmentation model converges based on a plurality of second sample face images and a plurality of third face images.
  • Specifically: calculate the facial feature similarity between the two second sample face images matching each identity identification code to obtain at least one face similarity corresponding to each identity identification code, and establish a first similarity histogram based on each face similarity; calculate the facial feature similarity between the two third face images matching each identity identification code to obtain at least one face similarity corresponding to each identity identification code, and establish a second similarity histogram based on each face similarity.
  • Then perform curve fitting on the first similarity histogram to obtain a first curve, and perform curve fitting on the second similarity histogram to obtain a second curve; determine the first area enclosed by the first curve and the coordinate axes and the second area enclosed by the second curve and the coordinate axes. When the area of the intersection of the first area and the second area is greater than or equal to a preset area threshold, it is determined that the image augmentation model has converged; when the area of the intersection is less than the preset area threshold, it is determined that the image augmentation model has not converged.
  • the preset area threshold can be set according to actual conditions, and this is not specifically limited in the embodiment of the present invention. Whether the image augmentation model has converged can be accurately determined from the area of the intersection of the first area, enclosed by the curve fitted to the second sample face image similarities, and the second area, enclosed by the curve fitted to the third face image similarities.
  • the method of calculating the facial feature similarity between the two second sample face images matching each identity identification code may be: obtain the two second sample face images matching the identity identification code, calculate the cosine distance between the features of the two second sample face images, and obtain the similarity of the two second sample face images. By calculating the cosine distance between the features of the two second sample face images, their similarity can be accurately obtained.
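The cosine computation for a matched pair of face feature vectors is a one-liner; a minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two face feature vectors, as used to
    compare the two images sharing an identity identification code."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 0], [1, 0]))  # identical features: 1.0
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal features: 0.0
```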
  • the method of establishing the first similarity histogram based on the similarity of each face may be: using the face similarity as the abscissa, and using the number of the same face similarities as the ordinate to establish a rectangular coordinate system, according to The similarity of each face and the number of similarities of the same face establish a first similarity histogram.
  • the method of calculating the facial feature similarity between the two third face images matching each identity identification code, and of obtaining at least one face similarity corresponding to each identity identification code, may follow the method of calculating the facial feature similarity between the two second sample face images matching each identity identification code; likewise, the second similarity histogram may be established from each face similarity in the same way as the first similarity histogram.
  • a preset curve fitting method is obtained; curve fitting is performed on the first similarity histogram based on the preset curve fitting method to obtain the first curve, and curve fitting is performed on the second similarity histogram to obtain the second curve.
  • the preset curve fitting method can be selected according to the actual situation. The embodiment of the present invention does not specifically limit this.
  • the preset curve fitting method may be plotting with the mlab module in matplotlib or with the distplot function in the seaborn library. Through such a curve fitting method, the first curve corresponding to the first similarity histogram and the second curve corresponding to the second similarity histogram can be accurately obtained.
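As a sketch of the convergence test, the curve-fitting-and-intersection step can be approximated by directly intersecting the two normalized histograms (summing bin-wise minima). This replaces the mlab/distplot curve fitting with a discrete equivalent; the area threshold and the sample distributions below are illustrative assumptions.

```python
import numpy as np

def histogram_overlap(sims_real, sims_aug, bins=20):
    """Approximate the intersection area of the two similarity
    distributions: build both histograms on a shared [0, 1] similarity
    axis, then sum the bin-wise minimum times the bin width."""
    h1, edges = np.histogram(sims_real, bins=bins, range=(0.0, 1.0), density=True)
    h2, _ = np.histogram(sims_aug, bins=bins, range=(0.0, 1.0), density=True)
    width = edges[1] - edges[0]
    return float(np.minimum(h1, h2).sum() * width)

rng = np.random.default_rng(0)
real = rng.normal(0.80, 0.05, 1000).clip(0, 1)  # genuine-pair similarities
aug = rng.normal(0.78, 0.06, 1000).clip(0, 1)   # augmented-pair similarities
overlap = histogram_overlap(real, aug)
converged = overlap >= 0.7                      # preset area threshold (assumed)
print(round(overlap, 3), converged)
```

A larger overlap means the augmented images produce similarity statistics close to the real low-resolution images, which is exactly the convergence criterion described above.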
  • Sub-step S2023 If the image augmentation model has not converged, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
  • since the image augmentation model includes an image downsampling model and a Gaussian blur model, adjusting the image augmentation model means adjusting the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model, thereby updating the image downsampling model and the Gaussian blur model; training of the updated models continues until the image downsampling model and the Gaussian blur model converge, at which point a converged image augmentation model is accurately obtained.
  • the method of adjusting the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model may be: selecting parameters from a preset downsampling parameter library and a preset model parameter library to serve as the adjusted downsampling parameters of the image downsampling model and the adjusted model parameters of the Gaussian blur model.
  • the preset downsampling parameter library and the preset model parameter library can be set according to actual conditions, and this is not specifically limited in the embodiment of the present invention.
  • the downsampling parameters included in the downsampling parameter library can be 10 times, 20 times, and 50 times;
  • the model parameters include Gaussian kernel parameters, and the Gaussian kernel parameters can be 0.5, 5, and 8.
  • By selecting the downsampling parameters and model parameters from the preset downsampling parameter library and the preset model parameter library, the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model in the image augmentation model can be accurately adjusted.
  • a plurality of first sample face images are subjected to preset augmentation processing to obtain a plurality of first face images.
  • the preset augmentation can be selected according to the actual situation, and the embodiment of the present invention does not specifically limit this.
  • the preset augmentation can be random flipping, brightness contrast adjustment, image grayscale, random erasure, etc.
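The named preset augmentations (random flipping, brightness/contrast adjustment, image grayscale, random erasure) can be sketched as follows; the probabilities and parameter ranges are illustrative assumptions, not values from the text.

```python
import numpy as np

def preset_augment(image, rng=None):
    """Apply the preset augmentations named above to an HxWx3 face image:
    random horizontal flip, brightness/contrast adjustment, grayscale
    conversion and random erasing. Probabilities and ranges are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    img = image.astype(np.float64)

    if rng.random() < 0.5:                 # random flipping
        img = img[:, ::-1]
    alpha = rng.uniform(0.8, 1.2)          # contrast factor
    beta = rng.uniform(-20, 20)            # brightness shift
    img = img * alpha + beta
    if rng.random() < 0.2:                 # image grayscale
        gray = img.mean(axis=2, keepdims=True)
        img = np.repeat(gray, 3, axis=2)
    if rng.random() < 0.3:                 # random erasure of a patch
        h, w = img.shape[:2]
        eh, ew = h // 4, w // 4
        y = rng.integers(0, h - eh)
        x = rng.integers(0, w - ew)
        img[y:y + eh, x:x + ew] = 0
    return np.clip(img, 0, 255).astype(np.uint8)

face = np.random.default_rng(1).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = preset_augment(face)
print(aug.shape, aug.dtype)
```

Each call yields a different first face image from the same first sample face image, which is how the training set is enlarged.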
  • a plurality of first sample face images are augmented according to a preset image augmentation model to obtain a plurality of second face images, where the first sample face image and the corresponding second face image have the same identity identification code.
  • the first sample face image is augmented through an image augmentation model to obtain multiple second face images.
  • Step S103 Train a preset face recognition model based on the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • the face recognition model is a neural network model.
  • the specific type of the neural network model can be selected according to the actual situation. This is not specifically limited in the embodiment of the present invention.
  • the neural network model can be a knowledge distillation neural network model.
  • the face recognition model can be a face recognition model based on knowledge distillation neural network, a face recognition model based on convolutional neural network, and other models.
  • the structure included in the face recognition model can be backbone+L2 norm.
  • the backbone can be selected according to the actual situation. This embodiment of the present invention does not specifically limit this.
  • the backbone can be MobileFaceNet, iresnet, and vit.
  • the structure of the face recognition model can be MobileFaceNet+L2 norm, iresnet+L2 norm, vit+L2 norm and other structural models.
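A minimal sketch of the backbone + L2 norm structure, with a toy projection standing in for MobileFaceNet, iresnet or vit (none of which are reimplemented here):

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """The L2 norm stage: project each feature vector onto the unit
    hypersphere so that faces are compared by angle only."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.maximum(norms, eps)

def toy_backbone(images):
    # Illustrative stand-in for a real backbone: flatten each image and
    # project it with a fixed random matrix to a 128-d embedding.
    rng = np.random.default_rng(42)
    flat = images.reshape(images.shape[0], -1).astype(np.float64)
    proj = rng.normal(size=(flat.shape[1], 128))
    return flat @ proj

batch = np.random.default_rng(0).random((4, 16, 16))
embeddings = l2_normalize(toy_backbone(batch))
print(embeddings.shape)
```

After the L2 norm stage every embedding has unit length, so the cosine similarity used later reduces to a dot product.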
  • step S103 includes sub-steps S1031 to sub-step S1034.
  • Sub-step S1031 Input the first face image to the face recognition model for processing to obtain a first feature vector.
  • the first face image is input to the face recognition model for processing to obtain a first feature vector.
  • the first feature vector corresponding to the first face image can be accurately obtained through the face recognition model.
  • Sub-step S1032 Input the second face image to the face recognition model for processing to obtain a second feature vector.
  • the second face image is input to the face recognition model for processing to obtain a second feature vector.
  • the second feature vector corresponding to the second face image can be accurately obtained through the face recognition model.
  • Sub-step S1033 Determine the target loss value of the face recognition model based on the first feature vector and the second feature vector, and determine whether the face recognition model converges based on the target loss value.
  • the method of generating the first loss value may be: obtaining a preset first loss value formula.
  • a first loss value is generated. The first loss value can be accurately calculated through the first loss value formula.
  • the method of generating the second loss value based on the second feature vector and the first feature vector may be: performing distillation learning on the second feature vector and the first feature vector. Specifically, a triplet is constructed from the second feature vector and the first feature vectors: the second feature vector serves as the anchor; the first feature vector that has the same identity identification code as the second feature vector and the smallest feature similarity serves as the positive; and the first feature vector that has a different identity identification code from the second feature vector and the greatest feature similarity serves as the negative.
  • a triplet loss value is then calculated on this triplet to obtain the second loss value.
  • the second loss value can be accurately calculated based on the constructed triplet and the triplet loss principle. It should be noted that the triplet loss principle here is formalized in terms of the Euclidean distance.
  • a preset second loss value formula is obtained, with mapping function f(x). The formula is simplified, giving the simplified second loss value formula: L2 = (1/N) * sum_n max(||f(a) - f(p)||^2 - ||f(a) - f(n)||^2 + m, 0), where L2 is the second loss value, N is the total number of samples, n indexes the samples, a is the anchor, p is the positive, n is the negative, and m is a constant.
  • the constant m and the mapping function f(x) can be set according to the actual situation, and the embodiment of the present invention does not specifically limit this.
  • a second loss value is generated.
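Under the Euclidean-distance formalization noted above, the second loss value can be sketched as a standard squared-Euclidean triplet loss; the margin value and the toy feature vectors below are illustrative, not taken from the text.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Second loss value sketch: squared-Euclidean triplet loss over a
    batch of feature vectors. margin plays the role of the constant m,
    which the text leaves unspecified."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

a = np.array([[1.0, 0.0]])   # anchor (second feature vector)
p = np.array([[0.9, 0.1]])   # hard positive: same identity, least similar
n = np.array([[0.0, 1.0]])   # hard negative: other identity, most similar
print(triplet_loss(a, p, n))
```

With the positive far closer than the negative, the hinge is inactive and the loss is zero; swapping positive and negative makes it large, which is what pushes same-identity features together during training.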
  • the first loss value and the second loss value are weighted and summed to obtain the target loss value as follows: a first weight parameter and a second weight parameter are obtained; the first weight parameter is multiplied by the first loss value to obtain a third loss value; the second weight parameter is multiplied by the second loss value to obtain a fourth loss value; and the third loss value and the fourth loss value are summed to obtain the target loss value.
  • the first weight parameter and the second weight parameter can be set according to the actual situation; the embodiment of the present invention does not specifically limit this. By performing a weighted sum of the first loss value and the second loss value, the target loss value can be accurately obtained.
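The weighted sum described above reduces to a one-line computation; the weights and the convergence threshold below are illustrative, since the text leaves them to be set according to the actual situation.

```python
def target_loss(loss1, loss2, w1=0.5, w2=0.5):
    """Target loss value sketch: weighted sum of the first loss value and
    the second (triplet/distillation) loss value. Weights are illustrative."""
    return w1 * loss1 + w2 * loss2

total = target_loss(1.2, 0.4, w1=0.7, w2=0.3)  # third + fourth loss values
converged = total <= 1.0                       # preset threshold (assumed)
print(total, converged)
```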
  • After the target loss value is obtained, it is determined whether the target loss value is less than or equal to a preset threshold. If the target loss value is less than or equal to the preset threshold, it is determined that the face recognition model has converged; if the target loss value is greater than the preset threshold, it is determined that the face recognition model has not converged.
  • the preset threshold can be set according to the actual situation, and the embodiment of the present invention does not specifically limit this.
  • Sub-step S1034 If the face recognition model has not converged, adjust the model parameters of the face recognition model to update the face recognition model, and continue to train the updated face recognition model; if the face recognition model has converged, the converged face recognition model is obtained.
  • Specifically, it is determined whether the target loss value is less than or equal to the preset threshold: if so, the face recognition model has converged; if the target loss value is greater than the preset threshold, the face recognition model has not converged, so the model parameters of the face recognition model are adjusted to update it and training continues. If the target loss value of the updated face recognition model is less than or equal to the preset threshold, the face recognition model is determined to have converged. In this way, whenever the model has not converged, the parameters are updated and training continues until a converged face recognition model is obtained.
  • the face recognition model training method obtains a plurality of first sample face images and the identity identification code corresponding to each first sample face image; performs preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and performs augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where the first sample face image and the corresponding second face image have the same identity identification code; and trains the preset face recognition model on the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • This solution uses preset augmentation and an image augmentation model to augment a plurality of first sample face images, yielding a large number of first face images and second face images and greatly increasing the number of training samples. By jointly training the preset face recognition model on the first face images and the second face images, the trained face recognition model is made more accurate.
  • FIG. 5 is a schematic flowchart of the steps of the face recognition method provided by the embodiment of the present application.
  • the face recognition method includes steps S301 to S303.
  • Step S301 Obtain the face image to be recognized.
  • the face image may be a face photo or a frame of face image in a video. This is not specifically limited in the embodiment of the present invention.
  • Step S302 Input the face image to be recognized to the face recognition model to obtain the identity characteristics of the person corresponding to the face image to be recognized.
  • the face recognition model is trained through the aforementioned face recognition model training method.
  • the identity characteristics of the person corresponding to the face image can be accurately obtained.
  • Step S303 Determine the identity information of the person corresponding to the face image to be recognized based on the identity characteristics and the preset identity information database.
  • the preset identity information database is an identity information database established in advance based on the identity information of each person, and each piece of identity information in the identity information database is mapped to the preset identity feature of the corresponding person.
  • the preset identity information database can be established according to actual conditions, and this is not specifically limited in the embodiment of the present invention.
  • the similarity between the identity feature and each preset identity feature in the identity information database is calculated to obtain a similarity for each preset identity feature; the preset identity feature corresponding to the largest similarity is selected from the similarity queue as the target identity feature, and the identity information corresponding to the target identity feature is used as the identity information of the person corresponding to the face image to be recognized.
  • the similarity between the identity feature and each preset identity feature in the identity information database can be obtained as follows: obtain the preset cosine similarity formula, L3 = (A·B) / (‖A‖ × ‖B‖), where L3 is the identity feature similarity, A is the identity feature, and B is the preset identity feature; substitute the identity feature and the preset identity feature into the cosine similarity formula to obtain the similarity between the identity feature and the preset identity feature.
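The retrieval step above can be sketched in plain Python as follows. The gallery contents and feature values are invented for illustration; only the cosine similarity formula itself comes from the description:

```python
from math import sqrt

def cosine_similarity(a, b):
    # L3 = (A . B) / (||A|| x ||B||), the cosine similarity formula above.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(identity_feature, gallery):
    """Return the identity whose preset identity feature has the largest
    cosine similarity to the query identity feature.
    `gallery` maps identity information -> preset identity feature."""
    return max(gallery,
               key=lambda ident: cosine_similarity(identity_feature,
                                                   gallery[ident]))
```

For instance, a query feature of `[0.9, 0.1]` against a two-entry gallery `{"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}` (hypothetical names and vectors) would match `person_a`.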
  • the face recognition method of the above embodiments obtains the face image to be recognized; the face image is then input into the face recognition model to obtain the identity characteristics of the person corresponding to the face image, and the identity information of the person corresponding to the face image to be recognized is determined based on the identity characteristics and the preset identity information database.
  • Figure 6 is a schematic block diagram of a face recognition model training device provided by an embodiment of the present application.
  • the face recognition model training device 400 includes a first acquisition module 410, a generation module 420 and a first training module 430, where:
  • the first acquisition module 410 is used to acquire a plurality of first sample face images and the identity identification code corresponding to each of the first sample face images;
  • the generation module 420 is used to perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and to perform augmentation processing on the plurality of first sample face images according to the preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity identification code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face image;
  • the first training module 430 is used to train a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
  • the first training module 430 also includes a first processing module 431, a second processing module 432, a first determination module 433 and an update module 434, wherein:
  • the first processing module 431 is used to input the first face image to the face recognition model for processing to obtain a first feature vector
  • the second processing module 432 is used to input the second face image to the face recognition model for processing to obtain a second feature vector
  • the first determination module 433 is configured to determine the target loss value of the face recognition model based on the first feature vector and the second feature vector, and determine the face recognition model based on the target loss value. Whether to converge;
  • the update module 434 is configured to, if the face recognition model does not converge, adjust the model parameters of the face recognition model to update the face recognition model and continue to train the updated face recognition model; if the face recognition model converges, a converged face recognition model is obtained.
  • the first determination module 433 is also used to:
  • the first loss value and the second loss value are weighted and summed to obtain a target loss value.
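A minimal sketch of this weighted sum follows; the weight values are assumed hyperparameters, since the application does not fix them:

```python
def target_loss(first_loss, second_loss, w1=0.5, w2=0.5):
    # Weighted sum of the first and second loss values; w1 and w2 are
    # illustrative weights, not values specified in the application.
    return w1 * first_loss + w2 * second_loss
```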
  • FIG. 8 is a schematic block diagram of an image augmentation model training device provided by an embodiment of the present application.
  • the image augmentation model training device 500 includes a second acquisition module 510, an adding module 520 and a second training module 530, wherein:
  • the second acquisition module 510 is used to acquire a plurality of second sample face images
  • the adding module 520 is used to add noise to each of the second sample face images to obtain multiple third sample face images
  • the second training module 530 is used to train a preset image augmentation model based on a plurality of the third sample face images until the image augmentation model converges.
  • the adding module 520 is also used to:
  • the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images.
  • the second training module 530 is also used to:
  • if the image augmentation model does not converge, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
  • the second training module 530 is also used to:
  • FIG. 9 is a schematic block diagram of a face recognition device provided by an embodiment of the present application.
  • the face recognition device 600 includes a third acquisition module 610, a recognition module 620 and a second determination module 630, wherein:
  • the third acquisition module 610 is used to acquire the face image to be recognized
  • the recognition module 620 is used to input the face image to be recognized to the face recognition model and obtain the identity characteristics of the person corresponding to the face image to be recognized;
  • the second determination module 630 is used to determine the identity information of the person corresponding to the face image to be recognized based on the identity characteristics and the preset identity information database.
  • FIG. 10 is a schematic structural block diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device 700 includes a processor 701 and a memory 702.
  • the processor 701 and the memory 702 are connected through a bus 703, which is, for example, an I2C (Inter-integrated Circuit) bus.
  • the processor 701 is used to provide computing and control capabilities to support the operation of the entire terminal device 700 .
  • the processor 701 can be a central processing unit (Central Processing Unit, CPU).
  • the processor 701 can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general processor may be a microprocessor or the processor may be any conventional processor.
  • the memory 702 may be a Flash chip, a read-only memory (ROM, Read-Only Memory) disk, an optical disk, a U disk or a mobile hard disk, etc.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present invention, and does not constitute a limitation on the terminal equipment to which the solution of the present invention is applied; the specific terminal device 700 may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
  • the processor 701 is used to run a computer program stored in the memory to implement the following steps:
  • a preset face recognition model is trained until the face recognition model converges.
  • when the processor 701 trains a preset face recognition model based on the plurality of first face images and the plurality of second face images until the face recognition model converges, the processor 701 is used to implement:
  • if the face recognition model does not converge, adjust the model parameters of the face recognition model to update the face recognition model, and continue to train the updated face recognition model; if the face recognition model converges, the converged face recognition model is obtained.
  • the processor 701 when determining the target loss value of the face recognition model based on the first feature vector and the second feature vector, is configured to:
  • the first loss value and the second loss value are weighted and summed to obtain a target loss value.
  • before implementing the acquisition of multiple first sample face images and the identity identification code corresponding to each of the first sample face images, the processor 701 is also configured to:
  • a preset image augmentation model is trained according to a plurality of the third sample face images until the image augmentation model converges.
  • when the processor 701 adds noise to each of the second sample face images to obtain multiple third sample face images, the processor 701 is configured to:
  • the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images.
  • when the processor 701 trains a preset image augmentation model based on a plurality of the third sample face images until the image augmentation model converges, the processor 701 is configured to implement:
  • if the image augmentation model does not converge, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
  • when the processor 701 determines whether the image augmentation model converges based on a plurality of the second sample face images and a plurality of the third sample face images, the processor 701 is configured to implement:
  • the processor 701 is used to implement:
  • the identity information of the person corresponding to the face image to be recognized is determined.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • a computer program is stored on the computer-readable storage medium.
  • the computer program includes program instructions.
  • for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the face recognition method of the present application.
  • the computer-readable storage medium may be an internal storage unit of the computer device described in the previous embodiment, such as a hard disk or memory of the computer device.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, etc. equipped on the computer device.

Abstract

Provided in the present application are a facial recognition model training method and apparatus, a recognition method, and a device and a medium. The facial recognition model training method comprises: acquiring a plurality of first sample facial images and an identity identification code corresponding to each first sample facial image; performing preset augmentation processing on the plurality of first sample facial images so as to obtain a plurality of first facial images, and according to a preset image augmentation model, performing augmentation processing on the plurality of first sample facial images so as to obtain a plurality of second facial images; and training a preset facial recognition model according to the plurality of first facial images and the plurality of second facial images, until the facial recognition model converges. In the present application, joint training is performed on a preset facial recognition model by means of first facial images and second facial images, such that a trained facial recognition model is more accurate.

Description

Face recognition model training method, recognition method, device, equipment and medium

This application claims priority to the Chinese patent application filed with the China Patent Office on July 29, 2022, with application number 202210914189.X and the invention title "Face recognition model training method, recognition method, device, equipment and medium", the entire contents of which are incorporated herein by reference.
Technical Field

The present application relates to the field of face recognition technology, and in particular to a face recognition model training method, recognition method, device, equipment and medium.
Background

As a basic attribute that distinguishes one individual from another, faces are frequently recognized in computer vision and multimedia applications. In these applications, face recognition models need to be deployed on mobile phones and even smart cameras, for use in many fields such as camera autofocus, human-computer interaction, photo management, urban security monitoring, and smart driving. At present, practical applications of face recognition under open environment conditions often need to recognize low-resolution face images, but the current recognition accuracy for low-resolution face images is poor. To improve the accuracy of low-resolution face image recognition, enhancement-based methods and embedding-based methods are currently used, but neither of these two approaches is ideal, and they cannot meet user requirements. Therefore, how to improve the accuracy of low-resolution face image recognition is an urgent problem to be solved.
Summary

The main purpose of this application is to provide a face recognition model training method, recognition method, device, equipment and medium, aiming to make the trained face recognition model more accurate, so as to improve the accuracy of low-resolution face image recognition.
In a first aspect, the present application provides a face recognition model training method, which includes the following steps:

obtaining a plurality of first sample face images and the identity identification code corresponding to each first sample face image;

performing preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and performing augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity identification code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face image;

training a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
In a second aspect, the present application further provides a face recognition method, including:

obtaining a face image to be recognized;

inputting the face image to be recognized into a face recognition model to obtain the identity characteristics of the person corresponding to the face image to be recognized, where the face recognition model is trained by the above face recognition model training method;

determining the identity information of the person corresponding to the face image to be recognized according to the identity characteristics and a preset identity information database.
In a third aspect, the present application further provides a face recognition model training device, which includes a first acquisition module, a generation module and a first training module, wherein:

the first acquisition module is used to acquire a plurality of first sample face images and the identity identification code corresponding to each first sample face image;

the generation module is used to perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and to perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity identification code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face image;

the first training module is used to train a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
In a fourth aspect, the present application further provides a terminal device, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, where the computer program, when executed by the processor, implements the steps of the above face recognition model training method and/or face recognition method.

In a fifth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above face recognition model training method and/or face recognition method.

This application provides a face recognition model training method, recognition method, device, equipment and medium. The application obtains a plurality of first sample face images and the identity identification code corresponding to each first sample face image; then performs preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and performs augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity identification code as its corresponding second face image and the image augmentation model is used to perform image blur augmentation on the first face image; and trains a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges. By augmenting the plurality of first sample face images with both the preset augmentation processing and the image augmentation model, this solution can obtain a large number of first face images and second face images, greatly increasing the number of training samples; by jointly training the preset face recognition model on the first face images and the second face images, the trained face recognition model is made more accurate.
Brief Description of the Drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained based on these drawings without creative effort.

Figure 1 is a schematic flowchart of a face recognition model training method provided by an embodiment of the present application;

Figure 2 is a schematic flowchart of image augmentation model training provided by an embodiment of the present application;

Figure 3 is a schematic flowchart of the sub-steps of the image augmentation model training in Figure 2;

Figure 4 is a schematic flowchart of the sub-steps of the face recognition model training method in Figure 1;

Figure 5 is a schematic flowchart of the steps of the face recognition method provided by an embodiment of the present application;

Figure 6 is a schematic block diagram of a face recognition model training device provided by an embodiment of the present application;

Figure 7 is a schematic block diagram of sub-modules of the face recognition model training device provided by an embodiment of the present application;

Figure 8 is a schematic block diagram of an image augmentation model training device provided by an embodiment of the present application;

Figure 9 is a schematic block diagram of a face recognition device provided by an embodiment of the present application;

Figure 10 is a schematic structural block diagram of a terminal device provided by an embodiment of the present application.
The realization of the purpose, functional features and advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of this application.

The flowcharts shown in the accompanying drawings are only illustrative and do not necessarily include all contents and operations/steps, nor must they be performed in the order described. For example, some operations/steps can be decomposed, combined or partially merged, so the actual order of execution may change according to actual conditions.
Embodiments of the present application provide a face recognition model training method, recognition method, device, equipment and medium. The face recognition model training method can be applied to terminal devices, which can be electronic devices such as mobile phones, tablet computers, notebook computers, desktop computers, personal digital assistants and wearable devices.

Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and the features in the embodiments may be combined with each other without conflict.
Please refer to Figure 1, which is a schematic flowchart of a face recognition model training method provided by an embodiment of the present application.

As shown in Figure 1, the face recognition model training method includes steps S101 to S103.

Step S101: Obtain a plurality of first sample face images and the identity identification code corresponding to each first sample face image.

The identity identification code is the identity identifier corresponding to the first sample face image. The identity identification code can be set according to the actual situation, which is not specifically limited in the embodiments of the present invention; for example, the identity identification code can be an ID card number. The preset resolution can also be set according to the actual situation, which is not specifically limited in the embodiments of the present invention; for example, the preset resolution is 720P.

In one embodiment, a plurality of first sample face images and the identity identification code corresponding to each first sample face image are acquired, where the resolution of each first sample face image is less than or equal to the preset resolution.

It should be noted that the way of acquiring the first sample face images can be selected according to the actual situation, which is not specifically limited in the embodiments of the present invention. For example, a first sample face image can be an image captured from a video, or an image collected by a shooting device; the shooting device can be selected according to the actual situation, which is not specifically limited in the embodiments of the present invention, and can be, for example, a camera, a video camera or a mobile phone.
Step S102: Perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and perform augmentation processing on the plurality of first sample face images according to the preset image augmentation model to obtain a plurality of second face images.

Preset augmentation processing and image augmentation model processing are performed on the plurality of first sample face images to obtain more samples.

In one embodiment, please refer to Figure 2, which is a schematic flowchart of image augmentation model training provided by an embodiment of the present application.

As shown in Figure 2, the image augmentation model training includes steps S201 to S202.

Step S201: Acquire a plurality of second sample face images, and add noise to each second sample face image to obtain a plurality of third sample face images.

A plurality of second sample face images are acquired, where the way of acquiring them can be selected according to the actual situation, which is not specifically limited in the embodiments of the present invention. For example, a second sample face image can be an image captured from a video, or an image collected by a shooting device; the shooting device can be selected according to the actual situation, which is not specifically limited in the embodiments of the present invention, and can be, for example, a camera, a video camera or a mobile phone.
In one embodiment, preset photon noise, readout noise and quantization noise are obtained; according to the resolution of each second sample face image, the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images. The preset photon noise, readout noise and quantization noise can be set according to the actual situation, which is not specifically limited in the embodiments of the present invention. By adding noise to each second sample face image, the sample images used for training the image augmentation model better match real conditions, so that the trained image augmentation model is more accurate.

It should be noted that the photon noise is the optical noise generated by the photoelectric effect when photons are converted into electrons during image acquisition. The readout noise arises from factors inherent in the circuit during the conversion of electrons into voltage when acquiring an image, such as the thermal motion of electrons in the device, which make the result inaccurate; the resulting error is called readout noise. The quantization noise arises when the voltage is converted into numbers during image acquisition, i.e., when a continuous signal is converted into a digital signal; the resulting information loss is called quantization error or rounding error, which is the quantization noise.
In one embodiment, the photon noise may also be obtained as follows: the number of photons I received by the sensor during image acquisition is obtained, and a Poisson distribution is fitted to the photon count I to obtain the photon noise. Fitting the received photon count with a Poisson distribution yields the photon noise accurately.
In one embodiment, the readout noise may also be obtained as follows: the error arising during the electron-to-voltage conversion in the image acquisition process is obtained, and this error is modeled with a Gaussian distribution and further processed through a preset Tukey lambda distribution to generate the readout noise. By processing the error of the electron-to-voltage conversion during image acquisition, the readout noise can be obtained accurately.
In one embodiment, the quantization noise may also be obtained as follows: a quantization noise distribution is obtained, the quantization noise distribution being [-0.5q, 0.5q], where q is the quantization step size. The quantization step size is obtained and applied to the quantization noise distribution to obtain the quantization noise. For example, when q is 1, the quantization noise lies in [-0.5, 0.5]; when q is 2, the quantization noise lies in [-1, 1].
In one embodiment, the resolution of each second sample face image is obtained, and photon noise, readout noise and quantization noise are added to each second sample face image according to its resolution, so as to obtain a plurality of third sample face images. By adding photon noise, readout noise and quantization noise to the second sample face images, sample images that better match the characteristics of low-resolution images can be obtained.
Exemplarily, a noise superposition formula N = kN1 + N2 + N3 is obtained, where N is the total noise, k is the photon noise gain value, N1 is the photon noise at each pixel, N2 is the readout noise at each pixel, and N3 is the quantization noise at each pixel. The photon noise, readout noise and quantization noise are obtained and superposed according to this formula to obtain the total noise. The photon noise gain value is set according to the imaging system; it may be set according to the actual situation and is not specifically limited in the embodiments of the present invention. According to the resolution of each second sample face image, the total noise is added to it to generate a third sample face image.
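The noise model above (Poisson photon noise, Gaussian-like readout noise shaped by a Tukey lambda distribution, uniform quantization noise in [-0.5q, 0.5q], superposed as N = kN1 + N2 + N3) can be sketched in code. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the zero-centering of the photon term, and the default parameter values are all choices made for the example.

```python
import math
import random

def sample_poisson(lam):
    # Knuth's algorithm: photon arrivals during exposure follow a Poisson law.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def sample_tukey_lambda(shape):
    # Inverse-CDF sampling of the Tukey lambda distribution (shape != 0);
    # for shape near 0.14 it closely approximates a Gaussian.
    u = random.random()
    return (u ** shape - (1.0 - u) ** shape) / shape

def add_sensor_noise(image, k=1.0, read_sigma=1.0, read_lambda=0.14, q=1.0):
    """Superpose the three noise terms on a grayscale image (list of rows):
    N = k*N1 + N2 + N3 per pixel."""
    noisy = []
    for row in image:
        out_row = []
        for pixel in row:
            n1 = sample_poisson(max(pixel, 0.0)) - pixel        # photon noise, zero-mean
            n2 = read_sigma * sample_tukey_lambda(read_lambda)  # readout noise
            n3 = random.uniform(-0.5 * q, 0.5 * q)              # quantization noise
            out_row.append(pixel + k * n1 + n2 + n3)
        noisy.append(out_row)
    return noisy
```

With a shape parameter near 0.14 the Tukey lambda distribution is close to a Gaussian, which loosely matches the description of readout noise as a Gaussian error further processed through a Tukey lambda distribution.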
Step S202: a preset image augmentation model is trained according to the plurality of third sample face images until the image augmentation model converges.
The image augmentation model includes an image downsampling model and a Gaussian blur model.
In one embodiment, as shown in Figure 3, step S202 includes sub-steps S2021 to S2023.
Sub-step S2021: each third sample face image is processed through the preset image augmentation model to obtain a plurality of third face images.
Each third sample face image is downsampled through the image downsampling model, and the downsampled image is processed through the Gaussian blur model to obtain the third face image corresponding to each third sample face image. Downsampling the third sample face images through the image downsampling model makes the sample images fit the size of the display area and generates corresponding thumbnails; applying Gaussian blur to the thumbnails then yields the third face images accurately.
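Sub-step S2021 (downsample, then Gaussian blur) can be sketched as follows for a grayscale image stored as a list of rows. The stride-based downsampling and the separable, clamped-border blur are illustrative assumptions rather than the patented downsampling and Gaussian blur models.

```python
import math

def downsample(image, factor):
    # Keep every factor-th pixel in both directions.
    return [row[::factor] for row in image[::factor]]

def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]  # normalized so a constant image is preserved

def blur_1d(row, kernel, radius):
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), len(row) - 1)  # clamp at borders
            acc += w * row[idx]
        out.append(acc)
    return out

def gaussian_blur(image, sigma=1.0):
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(r, kernel, radius) for r in image]       # horizontal pass
    cols = [blur_1d(list(c), kernel, radius) for c in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]

def augment(image, factor=2, sigma=1.0):
    # Downsample first, then blur, as in sub-step S2021.
    return gaussian_blur(downsample(image, factor), sigma)
```

Because the kernel is normalized and borders are clamped, a constant image passes through unchanged, which is a quick way to check the blur for bias.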
Sub-step S2022: whether the image augmentation model has converged is determined according to the plurality of second sample face images and the plurality of third face images.
The facial-feature similarity between each pair of second sample face images whose identity codes match is calculated to obtain at least one face similarity corresponding to each identity code, and a first similarity histogram is built from these face similarities. The facial-feature similarity between each pair of third face images whose identity codes match is likewise calculated to obtain at least one face similarity corresponding to each identity code, and a second similarity histogram is built from them. Curve fitting is performed on the first similarity histogram to obtain a first curve, and on the second similarity histogram to obtain a second curve. The first region enclosed by the first curve and the coordinate axes, and the second region enclosed by the second curve and the coordinate axes, are then determined. When the area of the intersection of the first region and the second region is greater than or equal to a preset area threshold, it is determined that the image augmentation model has converged; when the area of the intersection is smaller than the preset area threshold, it is determined that the image augmentation model has not converged. The preset area threshold may be set according to the actual situation and is not specifically limited in the embodiments of the present invention. By determining the area of the intersection of the region enclosed by the curve for the second sample face images and the region enclosed by the curve for the third face images, whether the image augmentation model has converged can be known accurately.
In one embodiment, the facial-feature similarity between two second sample face images whose identity codes match may be calculated as follows: the two second sample face images matching the identity code are obtained, and the cosine distance between their features is calculated to obtain the similarity of the two images. Calculating the cosine distance between the features of the two second sample face images yields their similarity accurately.
In one embodiment, the first similarity histogram may be built from the face similarities as follows: a rectangular coordinate system is established with the face similarity on the horizontal axis and the count of identical face-similarity values on the vertical axis, and the first similarity histogram is built from the face similarities and their counts. Building the first similarity histogram from the face similarities can improve the accuracy of model training.
It should be noted that the facial-feature similarity between two third face images whose identity codes match, and thus the at least one face similarity corresponding to each identity code, may be calculated in the same way as described above for the second sample face images; likewise, the second similarity histogram may be built from the face similarities in the same way as the first similarity histogram. These steps are therefore not described again in detail.
In one embodiment, a preset curve-fitting method is obtained, curve fitting is performed on the first similarity histogram based on this method to obtain the first curve, and curve fitting is performed on the second similarity histogram to obtain the second curve. The preset curve-fitting method may be selected according to the actual situation and is not specifically limited in the embodiments of the present invention; for example, it may use the mlab module in matplotlib or the distplot function in the seaborn library. Through such curve fitting, the first curve corresponding to the first similarity histogram and the second curve corresponding to the second similarity histogram can be obtained accurately.
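The convergence test of sub-step S2022 can be approximated in code by estimating the overlap of the two similarity distributions directly from their histograms (histogram intersection), instead of fitting curves and intersecting the enclosed regions as the embodiment describes; the function names and the bin layout here are assumptions made for illustration.

```python
def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def histogram(values, bins=20, lo=-1.0, hi=1.0):
    # Count face-similarity values per bin over [-1, 1].
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def overlap_ratio(sims_a, sims_b, bins=20):
    # Histogram intersection as a stand-in for the intersection of the
    # areas under the two fitted curves.
    ha, hb = histogram(sims_a, bins), histogram(sims_b, bins)
    inter = sum(min(x, y) for x, y in zip(ha, hb))
    total = max(sum(ha), sum(hb))
    return inter / total if total else 0.0

def has_converged(sims_a, sims_b, area_threshold=0.8):
    # Converged when the distribution overlap reaches the preset area threshold.
    return overlap_ratio(sims_a, sims_b) >= area_threshold
```

When the augmented (third) images produce a similarity distribution that largely overlaps the one from the second sample images, the overlap ratio approaches 1 and the model is treated as converged.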
Sub-step S2023: if the image augmentation model has not converged, the model parameters of the image augmentation model are adjusted to update it, and the updated image augmentation model continues to be trained until it converges.
Whether the area of the intersection of the first region and the second region is greater than or equal to the preset area threshold is determined. When it is greater than or equal to the threshold, the image augmentation model is determined to have converged. When it is smaller than the threshold, the image augmentation model is determined not to have converged; in this case, since the image augmentation model includes an image downsampling model and a Gaussian blur model, the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model are adjusted to update both models, and the updated image downsampling model and Gaussian blur model continue to be trained until they converge, yielding a converged image augmentation model. When the image augmentation model has not converged, adjusting its model parameters and continuing to train the updated model makes it possible to obtain a converged image augmentation model accurately.
In one embodiment, the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model may be adjusted as follows: one parameter is selected from a preset downsampling-parameter library and one from a preset model-parameter library as the new downsampling parameter and the new Gaussian-blur model parameter, respectively. The preset downsampling-parameter library and the preset model-parameter library may be set according to the actual situation and are not specifically limited in the embodiments of the present invention. For example, the downsampling parameters in the library may be factors such as 10x, 20x and 50x; the model parameters include Gaussian kernel parameters, which may be values such as 0.5, 5 and 8. By selecting downsampling parameters and model parameters from the preset libraries, the downsampling parameters of the image downsampling model and the model parameters of the Gaussian blur model can be adjusted accurately.
In one embodiment, preset augmentation processing is performed on the plurality of first sample face images to obtain a plurality of first face images. The preset augmentation may be selected according to the actual situation and is not specifically limited in the embodiments of the present invention; it may include random flipping, brightness and contrast adjustment, image grayscaling, random erasing, and the like. Performing preset augmentation on the plurality of first sample face images yields a plurality of first face images and enriches the sample set.
In one embodiment, the plurality of first sample face images are augmented according to the preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity code as its corresponding second face image. Augmenting the first sample face images through the image augmentation model yields the plurality of second face images.
Step S103: a preset face recognition model is trained according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
The face recognition model is a neural network model. The specific type of the neural network model may be selected according to the actual situation and is not specifically limited in the embodiments of the present invention; for example, the face recognition model may be a face recognition model based on a knowledge-distillation neural network, a face recognition model based on a convolutional neural network, or the like.
Exemplarily, the face recognition model may have a backbone + L2 norm structure. The backbone may be selected according to the actual situation and is not specifically limited in the embodiments of the present invention; for example, it may be MobileFaceNet, iresnet or vit, so that the structure of the face recognition model may be MobileFaceNet + L2 norm, iresnet + L2 norm, vit + L2 norm, or the like.
In one embodiment, as shown in Figure 4, step S103 includes sub-steps S1031 to S1034.
Sub-step S1031: the first face image is input into the face recognition model for processing to obtain a first feature vector.
The first face image is input into the face recognition model for processing to obtain the first feature vector; the first feature vector corresponding to the first face image can be obtained accurately through the face recognition model.
Sub-step S1032: the second face image is input into the face recognition model for processing to obtain a second feature vector.
The second face image is input into the face recognition model for processing to obtain the second feature vector; the second feature vector corresponding to the second face image can be obtained accurately through the face recognition model.
Sub-step S1033: the target loss value of the face recognition model is determined according to the first feature vector and the second feature vector, and whether the face recognition model has converged is determined according to the target loss value.
A first loss value is generated according to the first feature vector and the identity code corresponding to the first feature vector; a second loss value is generated according to the second feature vector and the first feature vector; and a weighted sum of the first loss value and the second loss value is computed to obtain the target loss value. By determining the first loss value and the second loss value and computing their weighted sum, the target loss value of the face recognition model can be obtained accurately.
In one embodiment, the first loss value may be generated from the first feature vector and the identity code corresponding to the first feature vector as follows: a preset first-loss formula is obtained. Consistent with the parameters defined below, it takes the additive-angular-margin form

$$L_1 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}$$

where L_1 is the first loss value, N is the number of images in the mini-batch, n is the number of identity codes of the first sample face images participating in training, m is the angular margin, s is the factor by which the cosine distance of the first feature vector is scaled, and θ_{y_i} is the angle between the first feature vector and the feature prototype of its corresponding identity code. Based on this first-loss formula, the first loss value is generated from the first feature vector and the identity code corresponding to it; the formula allows the first loss value to be computed accurately.
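As a sanity check of the first-loss formula, the additive-angular-margin loss can be evaluated numerically from the angles between a feature vector and the class prototypes. The sketch below follows the symbols defined above (s, m, theta); the default values s = 64 and m = 0.5 are common choices in the literature, not values fixed by the embodiment.

```python
import math

def margin_loss_single(theta_y, other_thetas, s=64.0, m=0.5):
    """Additive-angular-margin loss for one sample: the margin m is added to
    the angle of the true class before the softmax over scaled cosines."""
    target = math.exp(s * math.cos(theta_y + m))
    others = sum(math.exp(s * math.cos(t)) for t in other_thetas)
    return -math.log(target / (target + others))

def first_loss(batch, s=64.0, m=0.5):
    # batch: list of (theta_y, [theta_j for j != y_i]) pairs; average over N.
    return sum(margin_loss_single(ty, to, s, m) for ty, to in batch) / len(batch)
```

A sample whose feature vector is well aligned with its prototype (small theta_y, large angles to every other prototype) contributes a near-zero loss, while a misaligned sample is penalized heavily; this is what drives the prototypes of different identity codes apart during training.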
In one embodiment, the second loss value may be generated from the second feature vector and the first feature vectors by performing distillation learning on them, specifically as follows: triplets are constructed from the second feature vector and the first feature vectors, with the second feature vector serving as the anchor, the first feature vector that has the same identity code as the second feature vector and the smallest feature similarity serving as the positive, and the first feature vector that has a different identity code and the largest feature similarity serving as the negative. A triplet loss is then computed over the second feature vector corresponding to each anchor, the first feature vector corresponding to the positive, and the first feature vector corresponding to the negative, to obtain the second loss value. By constructing triplets from the second feature vector and the first feature vectors, the second loss value can be computed accurately based on the constructed triplets and the triplet-loss principle. It should be noted that the triplet-loss principle is based on a Euclidean-distance formulation.
Exemplarily, a preset second-loss formula is obtained: L_2 = max{d(a, p) - d(a, n) + m, 0}. Letting the sample be x and the mapping function be f(x), and writing the loss out over all triplets, the expanded second-loss formula is

$$L_2 = \frac{1}{N}\sum_{i=1}^{N}\max\left\{\left\|f(x_i^{a})-f(x_i^{p})\right\|^2-\left\|f(x_i^{a})-f(x_i^{n})\right\|^2+m,\ 0\right\}$$

where L_2 is the second loss value, N is the total number of samples, i indexes the sample, a denotes the anchor, p the positive, n the negative, and m is a constant margin. The constant m and the mapping function f(x) may be set according to the actual situation and are not specifically limited in the embodiments of the present invention. Based on this second-loss formula, triplets are constructed from the second feature vector and the first feature vectors to generate the second loss value.
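The triplet mining and the second-loss formula can be sketched as follows: the hardest positive (same identity code, smallest cosine similarity) and the hardest negative (different identity code, largest cosine similarity) are selected for each anchor, and the margin hinge is then applied to squared Euclidean distances. The helper names and the default margin are assumptions made for this example.

```python
def squared_distance(u, v):
    # Squared Euclidean distance, matching the formula's norm terms.
    return sum((x - y) ** 2 for x, y in zip(u, v))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / ((sum(x * x for x in u) ** 0.5) * (sum(y * y for y in v) ** 0.5))

def triplet_loss(anchor, positives, negatives, m=0.2):
    """Second loss for one anchor: hardest positive = same identity,
    smallest similarity; hardest negative = other identity, largest similarity."""
    p = min(positives, key=lambda v: cosine(anchor, v))
    n = max(negatives, key=lambda v: cosine(anchor, v))
    return max(squared_distance(anchor, p) - squared_distance(anchor, n) + m, 0.0)

def second_loss(triplets, m=0.2):
    # triplets: list of (anchor, positives, negatives) tuples; average over N.
    return sum(triplet_loss(a, ps, ns, m) for a, ps, ns in triplets) / len(triplets)
```

When the anchor already sits much closer to every matching first feature vector than to any non-matching one, the hinge clips the term to zero, so only triplets that still violate the margin contribute to the second loss.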
In one embodiment, the weighted sum of the first loss value and the second loss value may be computed as follows: a first weight parameter and a second weight parameter are obtained; the first weight parameter is multiplied by the first loss value to obtain a third loss value; the second weight parameter is multiplied by the second loss value to obtain a fourth loss value; and the third and fourth loss values are summed to obtain the target loss value. The first weight parameter and the second weight parameter may be set according to the actual situation and are not specifically limited in the embodiments of the present invention. By computing the weighted sum of the first and second loss values, the target loss value can be obtained accurately.
In one embodiment, after the target loss value is obtained, whether it is less than or equal to a preset threshold is determined. If the target loss value is less than or equal to the preset threshold, the face recognition model is determined to have converged; if the target loss value is greater than the preset threshold, the face recognition model is determined not to have converged. The preset threshold may be set according to the actual situation and is not specifically limited in the embodiments of the present invention.
Sub-step S1034: if the face recognition model has not converged, the model parameters of the face recognition model are adjusted to update it, and the updated face recognition model continues to be trained; if the face recognition model has converged, the converged face recognition model is obtained.
Whether the target loss value is less than or equal to the preset threshold is determined. If the target loss value is less than or equal to the preset threshold, the face recognition model is determined to have converged; if it is greater than the preset threshold, the face recognition model is determined not to have converged, in which case its model parameters are adjusted to update it and the updated face recognition model continues to be trained. When the target loss value of the updated face recognition model becomes less than or equal to the preset threshold, the face recognition model is determined to have converged. Updating the model parameters and continuing training whenever the model has not converged yields a converged face recognition model.
In the face recognition model training method provided by the above embodiments, a plurality of first sample face images and the identity code corresponding to each first sample face image are obtained; preset augmentation processing is then performed on the plurality of first sample face images to obtain a plurality of first face images, and the plurality of first sample face images are also augmented according to the preset image augmentation model to obtain a plurality of second face images, each first sample face image having the same identity code as its corresponding second face image; finally, the preset face recognition model is trained according to the plurality of first face images and the plurality of second face images until the face recognition model converges. By augmenting the first sample face images both with the preset augmentation and with the image augmentation model, a large number of first and second face images can be obtained, greatly increasing the number of training samples; jointly training the preset face recognition model on the first face images and the second face images makes the trained face recognition model more accurate.
Please refer to Figure 5, which is a schematic flowchart of the steps of the face recognition method provided by an embodiment of the present application.
As shown in Figure 5, the face recognition method includes steps S301 to S303.
Step S301: a face image to be recognized is obtained.
The face image to be recognized is obtained; it may be a face photograph or a frame containing a face taken from a video, which is not specifically limited in the embodiments of the present invention.
Step S302: the face image to be recognized is input into the face recognition model to obtain the identity features of the person corresponding to the face image to be recognized.
The face recognition model here is obtained by training with the face recognition model training method described above.
The face image is input into the preset face recognition model to obtain the identity features of the person corresponding to the face image; by inputting the face image into the preset face recognition model, these identity features can be obtained accurately.
Step S303: the identity information of the person corresponding to the face image to be recognized is determined according to the identity features and a preset identity information database.
The preset identity information database is established in advance from the identity information of each person, and each piece of identity information in the database is mapped to the preset identity features of the corresponding person. The preset identity information database may be established according to the actual situation and is not specifically limited in the embodiments of the present invention.
In one embodiment, the similarity between the identity features and each set of preset identity features in the identity information database is calculated to obtain the similarity of the identity features to each set of preset identity features; the preset identity features with the largest similarity are selected from the resulting similarity queue as the target identity features, and the identity information corresponding to the target identity features is taken as the identity information of the person corresponding to the face image to be recognized. By calculating the similarity between the identity features and each set of preset identity features in the database, the identity information of the person corresponding to the face image to be recognized can be determined accurately.
In one embodiment, the similarity between the identity features and each set of preset identity features in the identity information database may be calculated as follows: a preset cosine similarity formula is obtained,

$$L_3 = \frac{A\cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i}A_iB_i}{\sqrt{\sum_{i}A_i^2}\,\sqrt{\sum_{i}B_i^2}}$$

where L_3 is the identity-feature similarity, A is the identity feature vector, and B is a preset identity feature vector. The identity features and a set of preset identity features are substituted into the cosine similarity formula to obtain the similarity between them.
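The lookup in step S303 then reduces to computing L_3 against every set of preset identity features and keeping the largest. A minimal sketch with a made-up in-memory gallery (the dictionary layout and names are assumptions, not the patented identity information database):

```python
def cosine_similarity(a, b):
    # L3 = A.B / (|A| |B|)
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def identify(identity_feature, gallery):
    """gallery: dict mapping identity information (e.g. a name) to its
    preset identity feature vector. Returns the entry with the largest L3."""
    best_info, best_sim = None, -2.0  # cosine similarity is always >= -1
    for info, preset in gallery.items():
        sim = cosine_similarity(identity_feature, preset)
        if sim > best_sim:
            best_info, best_sim = info, sim
    return best_info, best_sim
```

For example, `identify([0.9, 0.1], {"alice": [1.0, 0.0], "bob": [0.0, 1.0]})` selects "alice", whose preset feature has a cosine similarity near 0.99 with the query.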
上述实施例提供的人脸识别方法，通过获取待识别的人脸图像；然后将人脸图像输入至人脸识别模型中，得到人脸图像对应的人物的身份特征，之后根据身份特征和预设的身份信息库，确定待识别的人脸图像对应的人物的身份信息。通过将该人脸图像输入至人脸识别模型中，可以准确地识别分辨率较低的图像，极大地提高了人脸识别的准确性。The face recognition method provided by the above embodiment obtains a face image to be recognized, inputs the face image into the face recognition model to obtain the identity feature of the person corresponding to the face image, and then determines the identity information of the person corresponding to the face image to be recognized according to the identity feature and the preset identity information database. By inputting the face image into the face recognition model, images with lower resolution can be accurately recognized, which greatly improves the accuracy of face recognition.
请参阅图6，图6为本申请实施例提供的一种人脸识别模型训练装置的示意性框图。Please refer to Figure 6. Figure 6 is a schematic block diagram of a face recognition model training device provided by an embodiment of the present application.
如图6所示,人脸识别模型训练装置400包括第一获取模块410、生成模块420和第一训练模块430,其中:As shown in Figure 6, the face recognition model training device 400 includes a first acquisition module 410, a generation module 420 and a first training module 430, where:
所述第一获取模块410,用于获取多个第一样本人脸图像以及每个所述第一样本人脸图像对应的身份标识码;The first acquisition module 410 is used to acquire a plurality of first sample face images and the identity identification code corresponding to each of the first sample face images;
所述生成模块420，用于对所述多个第一样本人脸图像进行预设增广处理，得到多个第一人脸图像，并根据预设的图像增广模型，对所述多个第一样本人脸图像进行增广处理，得到多个第二人脸图像，所述第一样本人脸图像与对应所述第二人脸图像的身份标识码相同，所述图像增广模型用于对所述第一人脸图像进行图像模糊增广；The generation module 420 is configured to perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and to perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face images;
所述第一训练模块430，用于根据所述多个第一人脸图像和所述多个第二人脸图像，对预设的人脸识别模型进行训练，直至所述人脸识别模型收敛。The first training module 430 is configured to train a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
在一实施例中,如图7所示,所述第一训练模块430还包括第一处理模块431、第二处理模块432、第一确定模块433和更新模块434,其中:In one embodiment, as shown in Figure 7, the first training module 430 also includes a first processing module 431, a second processing module 432, a first determination module 433 and an update module 434, wherein:
第一处理模块431、用于将所述第一人脸图像输入至所述人脸识别模型进行处理,得到第一特征向量;The first processing module 431 is used to input the first face image to the face recognition model for processing to obtain a first feature vector;
第二处理模块432、用于将所述第二人脸图像输入至所述人脸识别模型进行处理，得到第二特征向量；The second processing module 432 is used to input the second face image to the face recognition model for processing to obtain a second feature vector;
第一确定模块433、用于根据所述第一特征向量和所述第二特征向量,确定所述人脸识别模型的目标损失值,并根据所述目标损失值,确定所述人脸识别模型是否收敛;The first determination module 433 is configured to determine the target loss value of the face recognition model based on the first feature vector and the second feature vector, and determine the face recognition model based on the target loss value. Whether to converge;
更新模块434、用于若所述人脸识别模型未收敛，则调整所述人脸识别模型的模型参数，以更新所述人脸识别模型，并继续训练更新后的所述人脸识别模型，若所述人脸识别模型收敛，则得到收敛后的人脸识别模型。The update module 434 is configured to, if the face recognition model has not converged, adjust the model parameters of the face recognition model to update the face recognition model and continue training the updated face recognition model; if the face recognition model converges, the converged face recognition model is obtained.
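The forward / loss / convergence-check / parameter-update loop handled by the four modules above can be sketched as follows. This is a deliberately tiny stand-in: a single linear embedding layer trained with a plain consistency objective between the two views, with the loss value itself used as the convergence proxy. The model, the loss form, and all parameter values are illustrative assumptions, not the patent's actual network or criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyFaceModel:
    """Hypothetical stand-in for the face recognition model:
    one linear embedding layer."""
    def __init__(self, in_dim, emb_dim):
        self.w = rng.normal(scale=0.1, size=(in_dim, emb_dim))

    def forward(self, x):
        return x @ self.w  # feature vectors

def train_until_converged(model, first_imgs, second_imgs,
                          lr=0.5, tol=1e-4, max_steps=1000):
    """Forward both views, compute a loss, update the parameters,
    and stop once the loss falls below `tol` (convergence proxy)."""
    loss = float("inf")
    for step in range(max_steps):
        f1 = model.forward(first_imgs)   # first feature vectors
        f2 = model.forward(second_imgs)  # second feature vectors
        loss = float(np.mean((f1 - f2) ** 2))
        if loss < tol:                   # model has converged
            return step, loss
        # gradient-style update that shrinks the feature gap
        d = first_imgs - second_imgs
        grad = 2.0 * d.T @ (d @ model.w) / d.shape[0]
        model.w -= lr * grad             # adjust model parameters
    return max_steps, loss

# toy "clean" and "blur-augmented" inputs
x1 = rng.normal(size=(32, 4))
x2 = x1 + 0.5 * rng.normal(size=(32, 4))
model = ToyFaceModel(in_dim=4, emb_dim=3)
steps, final_loss = train_until_converged(model, x1, x2)
```

In the real method the loss would be the weighted target loss built from the first and second feature vectors, and the update a standard optimizer step; only the control flow is common to this sketch.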
在一实施例中,所述第一确定模块433,还用于:In one embodiment, the first determination module 433 is also used to:
根据所述第一特征向量以及所述第一特征向量对应的所述身份标识码,生成第一损失值;Generate a first loss value according to the first feature vector and the identity code corresponding to the first feature vector;
根据所述第二特征向量和所述第一特征向量,生成第二损失值;Generate a second loss value according to the second feature vector and the first feature vector;
对所述第一损失值和所述第二损失值进行加权求和,得到目标损失值。The first loss value and the second loss value are weighted and summed to obtain a target loss value.
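A minimal sketch of this target-loss computation. The patent does not fix the concrete form of the two losses, so a softmax cross-entropy against the identity code (first loss) and a mean-squared distance between the two feature vectors (second loss) are assumed here, as are the weights `w1` and `w2`:

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """First loss (assumed form): classification loss of the first
    feature vector's logits against its identity code `label`."""
    z = logits - np.max(logits)              # stabilized softmax
    logp = z - np.log(np.sum(np.exp(z)))
    return -float(logp[label])

def consistency_loss(f1, f2):
    """Second loss (assumed form): mean squared distance between the
    feature vectors of the clean and the blur-augmented image."""
    return float(np.mean((f1 - f2) ** 2))

def target_loss(logits, label, f1, f2, w1=1.0, w2=0.5):
    """Weighted sum of the two losses; the weights are illustrative."""
    return w1 * softmax_cross_entropy(logits, label) + w2 * consistency_loss(f1, f2)
```

Weighting lets the classification term anchor identity discrimination while the consistency term pulls blurred and clean features together.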
在一实施例中，请参阅图8，图8为本申请实施例提供的一种图像增广模型训练装置的示意性框图。该图像增广模型训练装置500包括第二获取模块510、添加模块520和第二训练模块530，其中：In one embodiment, please refer to FIG. 8, which is a schematic block diagram of an image augmentation model training device provided by an embodiment of the present application. The image augmentation model training device 500 includes a second acquisition module 510, an adding module 520 and a second training module 530, wherein:
第二获取模块510,用于获取多个第二样本人脸图像;The second acquisition module 510 is used to acquire a plurality of second sample face images;
添加模块520,用于给每个所述第二样本人脸图像添加噪声,得到多个第三样本人脸图像;Adding module 520 is used to add noise to each of the second sample face images to obtain multiple third sample face images;
第二训练模块530，用于根据多个所述第三样本人脸图像，对预设的图像增广模型进行训练，直至所述图像增广模型收敛。The second training module 530 is used to train a preset image augmentation model based on a plurality of the third sample face images until the image augmentation model converges.
在一实施例中,所述添加模块520,还用于:In one embodiment, the adding module 520 is also used to:
获取预设的光子噪声、读出噪声和量化噪声;Obtain preset photon noise, readout noise and quantization noise;
根据每个所述第二样本人脸图像的分辨率,给每个所述第二样本人脸图像添加所述光子噪声、读出噪声和量化噪声,得到多个第三样本人脸图像。According to the resolution of each second sample face image, the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images.
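The noise-addition step can be sketched as below for an image with pixel values in [0, 1]. The patent names the three noise types but not their magnitudes, so `photon_scale`, `read_sigma`, and `quant_levels` are illustrative parameters; tying them to the image resolution (e.g. stronger noise for smaller images) would follow the resolution-dependent step above:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_sensor_noise(img, photon_scale=50.0, read_sigma=2.0, quant_levels=64):
    """Add the three preset noise types to an image in [0, 1].

    All parameter values are hypothetical; the patent only names the
    noise types (photon, readout, quantization), not their strengths.
    """
    # photon (shot) noise: Poisson statistics on a scaled photon count
    noisy = rng.poisson(img * photon_scale) / photon_scale
    # readout noise: additive Gaussian noise from the sensor electronics
    noisy = noisy + rng.normal(0.0, read_sigma / 255.0, size=img.shape)
    # quantization noise: rounding to a limited number of gray levels
    noisy = np.round(noisy * (quant_levels - 1)) / (quant_levels - 1)
    return np.clip(noisy, 0.0, 1.0)
```

Applying this to each second sample face image yields the noisy third sample face images used to train the augmentation model.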
在一实施例中,所述第二训练模块530,还用于:In one embodiment, the second training module 530 is also used to:
通过预设的图像增广模型对各所述第三样本人脸图像进行处理,得到多个第三人脸图像;Process each of the third sample face images through a preset image augmentation model to obtain a plurality of third face images;
根据多个所述第二样本人脸图像和多个所述第三人脸图像,确定所述图像增广模型是否收敛;Determine whether the image augmentation model converges according to a plurality of the second sample face images and a plurality of the third face images;
若所述图像增广模型未收敛，调整所述图像增广模型的模型参数，以更新所述图像增广模型，并继续训练更新后的所述图像增广模型，直至所述图像增广模型收敛。If the image augmentation model has not converged, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
在一实施例中,所述第二训练模块530,还用于:In one embodiment, the second training module 530 is also used to:
计算各身份标识码相匹配的两个所述第二样本人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第一相似度直方图；Calculate the facial feature similarity between two second sample face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a first similarity histogram according to each face similarity;
计算各身份标识码相匹配的两个所述第三人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第二相似度直方图；Calculate the facial feature similarity between two third face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a second similarity histogram according to each face similarity;
对所述第一相似度直方图进行曲线拟合,得到第一曲线,并对所述第二相似度直方图进行曲线拟合,得到第二曲线;Perform curve fitting on the first similarity histogram to obtain a first curve, and perform curve fitting on the second similarity histogram to obtain a second curve;
确定所述第一曲线与坐标轴所围成的第一区域以及所述第二曲线与坐标轴所围成的第二区域;Determine the first area enclosed by the first curve and the coordinate axis and the second area enclosed by the second curve and the coordinate axis;
在所述第一区域与所述第二区域的交集区域的面积大于或等于预设面积阈值时,确定所述图像增广模型已收敛。When the area of the intersection area of the first area and the second area is greater than or equal to the preset area threshold, it is determined that the image augmentation model has converged.
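The convergence test above can be approximated as follows. Here the curve-fitting step is replaced by normalized histogram densities and the intersection area of the two curves by bin-wise minima; both are illustrative simplifications of steps the patent leaves open, as is the 0.8 area threshold:

```python
import numpy as np

def histogram_overlap(sims_real, sims_generated, bins=20, threshold=0.8):
    """Build the two similarity histograms over [-1, 1] and measure how
    much their density curves overlap; the model is declared converged
    when the overlap area reaches `threshold` (an assumed value)."""
    edges = np.linspace(-1.0, 1.0, bins + 1)
    h1, _ = np.histogram(sims_real, bins=edges, density=True)
    h2, _ = np.histogram(sims_generated, bins=edges, density=True)
    width = edges[1] - edges[0]
    # intersection area approximated by the bin-wise minimum of densities
    overlap = float(np.sum(np.minimum(h1, h2)) * width)
    return overlap, overlap >= threshold
```

Intuitively, when the generated images' same-identity similarity distribution matches that of the real second sample images, the two curves enclose nearly the same region and the overlap approaches 1.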
需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述人脸识别模型训练装置的具体工作过程，可以参考前述人脸识别模型训练方法实施例中的对应过程，在此不再赘述。It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the face recognition model training device described above, reference may be made to the corresponding process in the foregoing embodiment of the face recognition model training method, which will not be described again here.
请参阅图9,图9为本申请实施例提供的一种人脸识别装置的示意性框图。该人脸识别装置600包括第三获取模块610、识别模块620和第二确定模块630,其中:Please refer to FIG. 9 , which is a schematic block diagram of a face recognition device provided by an embodiment of the present application. The face recognition device 600 includes a third acquisition module 610, a recognition module 620 and a second determination module 630, wherein:
第三获取模块610,用于获取待识别的人脸图像;The third acquisition module 610 is used to acquire the face image to be recognized;
识别模块620,用于将所述待识别的人脸图像输入至人脸识别模型,得到所述待识别的人脸图像对应的人物的身份特征;The recognition module 620 is used to input the face image to be recognized to the face recognition model and obtain the identity characteristics of the person corresponding to the face image to be recognized;
所述第二确定模块630,用于根据所述身份特征和预设的身份信息库,确定所述待识别的人脸图像对应的人物的身份信息。The second determination module 630 is used to determine the identity information of the person corresponding to the face image to be recognized based on the identity characteristics and the preset identity information database.
需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述人脸识别装置的具体工作过程，可以参考前述人脸识别方法实施例中的对应过程，在此不再赘述。It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the above face recognition device, reference may be made to the corresponding process in the foregoing embodiment of the face recognition method, which will not be described again here.
请参阅图10,图10为本申请实施例提供的一种终端设备的结构示意性框图。Please refer to FIG. 10 , which is a schematic structural block diagram of a terminal device provided by an embodiment of the present application.
如图10所示,终端设备700包括处理器701和存储器702,处理器701和存储器702通过总线703连接,该总线比如为I2C(Inter-integrated Circuit)总线。As shown in Figure 10, the terminal device 700 includes a processor 701 and a memory 702. The processor 701 and the memory 702 are connected through a bus 703, which is, for example, an I2C (Inter-integrated Circuit) bus.
具体地，处理器701用于提供计算和控制能力，支撑整个终端设备700的运行。处理器701可以是中央处理单元(Central Processing Unit，CPU)，该处理器701还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中，通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。Specifically, the processor 701 is used to provide computing and control capabilities to support the operation of the entire terminal device 700. The processor 701 may be a central processing unit (Central Processing Unit, CPU); the processor 701 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
具体地,存储器702可以是Flash芯片、只读存储器(ROM,Read-Only Memory)磁盘、光盘、U盘或移动硬盘等。Specifically, the memory 702 may be a Flash chip, a read-only memory (ROM, Read-Only Memory) disk, an optical disk, a U disk or a mobile hard disk, etc.
本领域技术人员可以理解，图10中示出的结构，仅仅是与本发明方案相关的部分结构的框图，并不构成对本发明方案所应用于其上的终端设备的限定，具体的终端设备700可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。其中，在一个实施例中，所述处理器701用于运行存储在存储器中的计算机程序，以实现如下步骤：Those skilled in the art can understand that the structure shown in Figure 10 is only a block diagram of part of the structure related to the solution of the present invention, and does not constitute a limitation on the terminal device to which the solution of the present invention is applied; the specific terminal device 700 may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components. In one embodiment, the processor 701 is configured to run a computer program stored in the memory to implement the following steps:
获取多个第一样本人脸图像以及每个所述第一样本人脸图像对应的身份标识码;Obtain a plurality of first sample face images and an identity identification code corresponding to each first sample face image;
对所述多个第一样本人脸图像进行预设增广处理，得到多个第一人脸图像，并根据预设的图像增广模型，对所述多个第一样本人脸图像进行增广处理，得到多个第二人脸图像，所述第一样本人脸图像与对应所述第二人脸图像的身份标识码相同，所述图像增广模型用于对所述第一人脸图像进行图像模糊增广；Perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face images;
根据所述多个第一人脸图像和所述多个第二人脸图像，对预设的人脸识别模型进行训练，直至所述人脸识别模型收敛。According to the plurality of first face images and the plurality of second face images, a preset face recognition model is trained until the face recognition model converges.
在一个实施例中，所述处理器701在实现所述根据所述多个第一人脸图像和所述多个第二人脸图像，对预设的人脸识别模型进行训练，直至所述人脸识别模型收敛时，用于实现：In one embodiment, when implementing the training of the preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges, the processor 701 is configured to:
将所述第一人脸图像输入至所述人脸识别模型进行处理,得到第一特征向量;Input the first face image to the face recognition model for processing to obtain a first feature vector;
将所述第二人脸图像输入至所述人脸识别模型进行处理，得到第二特征向量；Input the second face image to the face recognition model for processing to obtain a second feature vector;
根据所述第一特征向量和所述第二特征向量,确定所述人脸识别模型的目标损失值,并根据所述目标损失值,确定所述人脸识别模型是否收敛;Determine a target loss value of the face recognition model based on the first feature vector and the second feature vector, and determine whether the face recognition model converges based on the target loss value;
若所述人脸识别模型未收敛，则调整所述人脸识别模型的模型参数，以更新所述人脸识别模型，并继续训练更新后的所述人脸识别模型，若所述人脸识别模型收敛，则得到收敛后的人脸识别模型。If the face recognition model has not converged, adjust the model parameters of the face recognition model to update the face recognition model and continue to train the updated face recognition model; if the face recognition model converges, the converged face recognition model is obtained.
在一个实施例中,所述处理器701在实现所述根据所述第一特征向量和第二特征向量,确定所述人脸识别模型的目标损失值时,用于实现:In one embodiment, when determining the target loss value of the face recognition model based on the first feature vector and the second feature vector, the processor 701 is configured to:
根据所述第一特征向量以及所述第一特征向量对应的所述身份标识码,生成第一损失值;Generate a first loss value according to the first feature vector and the identity code corresponding to the first feature vector;
根据所述第二特征向量和所述第一特征向量,生成第二损失值;Generate a second loss value according to the second feature vector and the first feature vector;
对所述第一损失值和所述第二损失值进行加权求和,得到目标损失值。The first loss value and the second loss value are weighted and summed to obtain a target loss value.
在一个实施例中,所述处理器701在实现所述获取多个第一样本人脸图像以及每个所述第一样本人脸图像对应的身份标识码之前,还用于实现:In one embodiment, before implementing the acquisition of multiple first sample face images and the identity identification code corresponding to each of the first sample face images, the processor 701 is also configured to:
获取多个第二样本人脸图像,并给每个所述第二样本人脸图像添加噪声,得到多个第三样本人脸图像;Obtain multiple second sample face images, and add noise to each second sample face image to obtain multiple third sample face images;
根据多个所述第三样本人脸图像,对预设的图像增广模型进行训练,直至所述图像增广模型收敛。A preset image augmentation model is trained according to a plurality of the third sample face images until the image augmentation model converges.
在一个实施例中,所述处理器701在实现所述给每个所述第二样本人脸图像添加噪声,得到多个第三样本人脸图像时,用于实现:In one embodiment, when the processor 701 adds noise to each of the second sample face images to obtain multiple third sample face images, the processor 701 is configured to:
获取预设的光子噪声、读出噪声和量化噪声;Obtain preset photon noise, readout noise and quantization noise;
根据每个所述第二样本人脸图像的分辨率,给每个所述第二样本人脸图像添加所述光子噪声、读出噪声和量化噪声,得到多个第三样本人脸图像。According to the resolution of each second sample face image, the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images.
在一个实施例中，所述处理器701在实现所述根据多个所述第三样本人脸图像，对预设的图像增广模型进行训练，直至所述图像增广模型收敛时，用于实现：In one embodiment, when implementing the training of the preset image augmentation model according to the plurality of third sample face images until the image augmentation model converges, the processor 701 is configured to:
通过预设的图像增广模型对各所述第三样本人脸图像进行处理,得到多个第三人脸图像;Process each of the third sample face images through a preset image augmentation model to obtain a plurality of third face images;
根据多个所述第二样本人脸图像和多个所述第三人脸图像,确定所述图像增广模型是否收敛;Determine whether the image augmentation model converges according to a plurality of the second sample face images and a plurality of the third face images;
若所述图像增广模型未收敛，调整所述图像增广模型的模型参数，以更新所述图像增广模型，并继续训练更新后的所述图像增广模型，直至所述图像增广模型收敛。If the image augmentation model has not converged, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
在一个实施例中，所述处理器701在实现所述根据多个所述第二样本人脸图像和多个所述第三人脸图像，确定所述图像增广模型是否收敛时，用于实现：In one embodiment, when implementing the determination of whether the image augmentation model converges according to the plurality of second sample face images and the plurality of third face images, the processor 701 is configured to:
计算各身份标识码相匹配的两个所述第二样本人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第一相似度直方图；Calculate the facial feature similarity between two second sample face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a first similarity histogram according to each face similarity;
计算各身份标识码相匹配的两个所述第三人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第二相似度直方图；Calculate the facial feature similarity between two third face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a second similarity histogram according to each face similarity;
对所述第一相似度直方图进行曲线拟合,得到第一曲线,并对所述第二相似度直方图进行曲线拟合,得到第二曲线;Perform curve fitting on the first similarity histogram to obtain a first curve, and perform curve fitting on the second similarity histogram to obtain a second curve;
确定所述第一曲线与坐标轴所围成的第一区域以及所述第二曲线与坐标轴所围成的第二区域;Determine the first area enclosed by the first curve and the coordinate axis and the second area enclosed by the second curve and the coordinate axis;
在所述第一区域与所述第二区域的交集区域的面积大于或等于预设面积阈值时,确定所述图像增广模型已收敛。When the area of the intersection area of the first area and the second area is greater than or equal to the preset area threshold, it is determined that the image augmentation model has converged.
在一个实施例中,所述处理器701用于实现:In one embodiment, the processor 701 is used to implement:
获取待识别的人脸图像;Obtain the face image to be recognized;
将所述待识别的人脸图像输入至人脸识别模型,得到所述待识别的人脸图像对应的人物的身份特征;Input the face image to be recognized into the face recognition model to obtain the identity characteristics of the person corresponding to the face image to be recognized;
根据所述身份特征和预设的身份信息库，确定所述待识别的人脸图像对应的人物的身份信息。According to the identity characteristics and the preset identity information database, the identity information of the person corresponding to the face image to be recognized is determined.
需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述终端设备的具体工作过程，可以参考前述人脸识别模型训练方法和/或人脸识别方法实施例中的对应过程，在此不再赘述。It should be noted that those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working process of the terminal device described above, reference may be made to the corresponding processes in the foregoing embodiments of the face recognition model training method and/or the face recognition method, which will not be described again here.
本申请实施例还提供一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，所述计算机程序中包括程序指令，所述程序指令被执行时所实现的方法可参照本申请人脸识别方法的各个实施例。Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored. The computer program includes program instructions, and for the method implemented when the program instructions are executed, reference may be made to the embodiments of the face recognition method of the present application.
其中，所述计算机可读存储介质可以是前述实施例所述的计算机设备的内部存储单元，例如所述计算机设备的硬盘或内存。所述计算机可读存储介质可以是非易失性的，也可以是易失性的。所述计算机可读存储介质也可以是所述计算机设备的外部存储设备，例如所述计算机设备上配备的插接式硬盘，智能存储卡(Smart Media Card,SMC)，安全数字(Secure Digital,SD)卡，闪存卡(Flash Card)等。The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or memory of the computer device. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device.
应当理解,在此本申请说明书中所使用的术语仅仅是出于描述特定实施例的目的而并不意在限制本申请。如在本申请说明书和所附权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。It should be understood that the terminology used in the specification of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly dictates otherwise.
还应当理解，在本申请说明书和所附权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合，并且包括这些组合。需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should also be understood that the term "and/or" used in the specification of this application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations. It should be noted that, as used herein, the terms "include", "comprise" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the statement "comprises a..." does not exclude the presence of other identical elements in the process, method, article or system that includes that element.
上述本申请实施例序号仅仅为了描述，不代表实施例的优劣。以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到各种等效的修改或替换，这些修改或替换都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。The above serial numbers of the embodiments of the present application are for description only and do not represent the merits of the embodiments. The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

  1. 一种人脸识别模型训练方法,其特征在于,包括:A face recognition model training method, which is characterized by including:
    获取多个第一样本人脸图像以及每个所述第一样本人脸图像对应的身份标识码;Obtain a plurality of first sample face images and an identity identification code corresponding to each first sample face image;
    对所述多个第一样本人脸图像进行预设增广处理，得到多个第一人脸图像，并根据预设的图像增广模型，对所述多个第一样本人脸图像进行增广处理，得到多个第二人脸图像，所述第一样本人脸图像与对应所述第二人脸图像的身份标识码相同，所述图像增广模型用于对所述第一人脸图像进行图像模糊增广；Perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, where each first sample face image has the same identity code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face images;
    根据所述多个第一人脸图像和所述多个第二人脸图像,对预设的人脸识别模型进行训练,直至所述人脸识别模型收敛。A preset face recognition model is trained according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
  2. 如权利要求1所述的人脸识别模型训练方法，其特征在于，所述根据所述多个第一人脸图像和所述多个第二人脸图像，对预设的人脸识别模型进行训练，直至所述人脸识别模型收敛，包括：The face recognition model training method according to claim 1, wherein training the preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges includes:
    将所述第一人脸图像输入至所述人脸识别模型进行处理,得到第一特征向量;Input the first face image to the face recognition model for processing to obtain a first feature vector;
    将所述第二人脸图像输入至所述人脸识别模型进行处理,得到第二特征向量;Input the second face image to the face recognition model for processing to obtain a second feature vector;
    根据所述第一特征向量和所述第二特征向量,确定所述人脸识别模型的目标损失值,并根据所述目标损失值,确定所述人脸识别模型是否收敛;Determine a target loss value of the face recognition model based on the first feature vector and the second feature vector, and determine whether the face recognition model converges based on the target loss value;
    若所述人脸识别模型未收敛，则调整所述人脸识别模型的模型参数，以更新所述人脸识别模型，并继续训练更新后的所述人脸识别模型，若所述人脸识别模型收敛，则得到收敛后的人脸识别模型。If the face recognition model has not converged, adjust the model parameters of the face recognition model to update the face recognition model and continue to train the updated face recognition model; if the face recognition model converges, the converged face recognition model is obtained.
  3. 如权利要求2所述的人脸识别模型训练方法,其特征在于,所述根据所述第一特征向量和第二特征向量,确定所述人脸识别模型的目标损失值,包括:The face recognition model training method according to claim 2, wherein determining the target loss value of the face recognition model according to the first feature vector and the second feature vector includes:
    根据所述第一特征向量以及所述第一特征向量对应的所述身份标识码,生成第一损失值;Generate a first loss value according to the first feature vector and the identity code corresponding to the first feature vector;
    根据所述第二特征向量和所述第一特征向量,生成第二损失值;Generate a second loss value according to the second feature vector and the first feature vector;
    对所述第一损失值和所述第二损失值进行加权求和,得到目标损失值。The first loss value and the second loss value are weighted and summed to obtain a target loss value.
  4. 如权利要求1所述的人脸识别模型训练方法,其特征在于,所述获取多个第一样本人脸图像以及每个所述第一样本人脸图像对应的身份标识码之前,还包括:The face recognition model training method according to claim 1, characterized in that before obtaining a plurality of first sample face images and the identity identification code corresponding to each first sample face image, it further includes:
    获取多个第二样本人脸图像,并给每个所述第二样本人脸图像添加噪声,得到多个第三样本人脸图像;Obtain multiple second sample face images, and add noise to each second sample face image to obtain multiple third sample face images;
    根据多个所述第三样本人脸图像,对预设的图像增广模型进行训练,直至所述图像增广模型收敛。A preset image augmentation model is trained according to a plurality of the third sample face images until the image augmentation model converges.
  5. 如权利要求4所述的人脸识别模型训练方法,其特征在于,所述给每个所述第二样本人脸图像添加噪声,得到多个第三样本人脸图像,包括:The face recognition model training method according to claim 4, characterized in that adding noise to each of the second sample face images to obtain a plurality of third sample face images includes:
    获取预设的光子噪声、读出噪声和量化噪声;Obtain preset photon noise, readout noise and quantization noise;
    根据每个所述第二样本人脸图像的分辨率,给每个所述第二样本人脸图像添加所述光子噪声、读出噪声和量化噪声,得到多个第三样本人脸图像。According to the resolution of each second sample face image, the photon noise, readout noise and quantization noise are added to each second sample face image to obtain a plurality of third sample face images.
  6. 如权利要求4所述的人脸识别模型训练方法，其特征在于，所述根据多个所述第三样本人脸图像，对预设的图像增广模型进行训练，直至所述图像增广模型收敛，包括：The face recognition model training method according to claim 4, wherein training the preset image augmentation model according to a plurality of the third sample face images until the image augmentation model converges includes:
    通过预设的图像增广模型对各所述第三样本人脸图像进行处理,得到多个第三人脸图像;Process each of the third sample face images through a preset image augmentation model to obtain a plurality of third face images;
    根据多个所述第二样本人脸图像和多个所述第三人脸图像,确定所述图像增广模型是否收敛;Determine whether the image augmentation model converges according to a plurality of the second sample face images and a plurality of the third face images;
    若所述图像增广模型未收敛，调整所述图像增广模型的模型参数，以更新所述图像增广模型，并继续训练更新后的所述图像增广模型，直至所述图像增广模型收敛。If the image augmentation model has not converged, adjust the model parameters of the image augmentation model to update the image augmentation model, and continue to train the updated image augmentation model until the image augmentation model converges.
  7. 如权利要求6所述的人脸识别模型训练方法，其特征在于，所述根据多个所述第二样本人脸图像和多个所述第三人脸图像，确定所述图像增广模型是否收敛，包括：The face recognition model training method according to claim 6, wherein determining whether the image augmentation model converges according to a plurality of the second sample face images and a plurality of the third face images includes:
    计算各身份标识码相匹配的两个所述第二样本人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第一相似度直方图；Calculate the facial feature similarity between two second sample face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a first similarity histogram according to each face similarity;
    计算各身份标识码相匹配的两个所述第三人脸图像之间的人脸特征相似度，得到各所述身份标识码对应的至少一个人脸相似度，并根据各所述人脸相似度建立第二相似度直方图；Calculate the facial feature similarity between two third face images matching each identity code to obtain at least one face similarity corresponding to each identity code, and establish a second similarity histogram according to each face similarity;
    对所述第一相似度直方图进行曲线拟合,得到第一曲线,并对所述第二相似度直方图进行曲线拟合,得到第二曲线;Perform curve fitting on the first similarity histogram to obtain a first curve, and perform curve fitting on the second similarity histogram to obtain a second curve;
    确定所述第一曲线与坐标轴所围成的第一区域以及所述第二曲线与坐标轴所围成的第二区域;Determine the first area enclosed by the first curve and the coordinate axis and the second area enclosed by the second curve and the coordinate axis;
    在所述第一区域与所述第二区域的交集区域的面积大于或等于预设面积阈值时,确定所述图像增广模型已收敛。When the area of the intersection area of the first area and the second area is greater than or equal to the preset area threshold, it is determined that the image augmentation model has converged.
  8. 一种人脸识别方法,其特征在于,包括:A face recognition method, characterized by including:
    获取待识别的人脸图像;Obtain the face image to be recognized;
    将所述待识别的人脸图像输入至人脸识别模型，得到所述待识别的人脸图像对应的人物的身份特征，其中，所述人脸识别模型是通过权利要求1-7中任一项所述的人脸识别模型训练方法进行训练得到的；Input the face image to be recognized into a face recognition model to obtain the identity feature of the person corresponding to the face image to be recognized, wherein the face recognition model is obtained by training according to the face recognition model training method of any one of claims 1 to 7;
    根据所述身份特征和预设的身份信息库,确定所述待识别的人脸图像对应的人物的身份信息。According to the identity characteristics and the preset identity information database, the identity information of the person corresponding to the face image to be recognized is determined.
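The lookup step of the recognition method can be sketched as a nearest-neighbor search over enrolled embeddings; the gallery structure, the names, and the cosine-similarity threshold are illustrative assumptions not specified by the claim:

```python
import numpy as np

def identify(embedding, gallery, threshold=0.5):
    """Match an extracted identity feature against a preset identity
    information database (here a dict of name -> enrolled embedding)
    and return the best match above a similarity threshold, else None."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_id, best_sim = None, -1.0
    for person, ref in gallery.items():
        sim = cos(embedding, ref)
        if sim > best_sim:
            best_id, best_sim = person, sim
    return best_id if best_sim >= threshold else None

# Toy database; in practice embeddings come from the trained model.
gallery = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob":   np.array([0.0, 1.0, 0.0]),
}
print(identify(np.array([0.9, 0.1, 0.0]), gallery))
```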
  9. A face recognition model training apparatus, comprising a first obtaining module, a generation module and a first training module, wherein:
    the first obtaining module is configured to obtain a plurality of first sample face images and an identity identification code corresponding to each first sample face image;
    the generation module is configured to perform preset augmentation processing on the plurality of first sample face images to obtain a plurality of first face images, and to perform augmentation processing on the plurality of first sample face images according to a preset image augmentation model to obtain a plurality of second face images, wherein each first sample face image has the same identity identification code as its corresponding second face image, and the image augmentation model is used to perform image blur augmentation on the first face images;
    the first training module is configured to train a preset face recognition model according to the plurality of first face images and the plurality of second face images until the face recognition model converges.
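The generation module's data flow can be sketched as follows; the stand-in transforms (`flip` for the preset augmentation, a crude box blur for the image augmentation model) and the list-of-pixels image representation are purely illustrative assumptions:

```python
def build_training_pairs(samples, preset_aug, blur_aug):
    """Sketch of the generation module: each first sample face image yields
    a preset-augmented first face image and a blur-augmented second face
    image, both labeled with the same identity identification code."""
    pairs = []
    for image, identity_code in samples:
        first = preset_aug(image)   # preset augmentation (e.g. flip/crop)
        second = blur_aug(image)    # image augmentation model (blur)
        pairs.append((first, identity_code))
        pairs.append((second, identity_code))
    return pairs

# Toy stand-ins: "images" are lists of pixel values.
samples = [([10, 20, 30], "id-001"), ([40, 50, 60], "id-002")]
flip = lambda img: img[::-1]
blur = lambda img: [sum(img) // len(img)] * len(img)  # crude box blur
print(build_training_pairs(samples, flip, blur))
```

The resulting list would then feed the first training module, which trains the preset face recognition model until convergence.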
  10. A terminal device, comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the steps of the face recognition model training method according to any one of claims 1 to 7 and/or the face recognition method according to claim 8.
  11. A storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the face recognition model training method according to any one of claims 1 to 7 and/or the face recognition method according to claim 8.
PCT/CN2022/142236 2022-07-29 2022-12-27 Facial recognition model training method and apparatus, recognition method, and device and medium WO2024021504A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210914189.XA CN115410249A (en) 2022-07-29 2022-07-29 Face recognition model training method, recognition method, device, equipment and medium
CN202210914189.X 2022-07-29

Publications (1)

Publication Number Publication Date
WO2024021504A1 true WO2024021504A1 (en) 2024-02-01

Family

ID=84159815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142236 WO2024021504A1 (en) 2022-07-29 2022-12-27 Facial recognition model training method and apparatus, recognition method, and device and medium

Country Status (2)

Country Link
CN (1) CN115410249A (en)
WO (1) WO2024021504A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410249A (en) * 2022-07-29 2022-11-29 成都云天励飞技术有限公司 Face recognition model training method, recognition method, device, equipment and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050472A (en) * 2014-06-12 2014-09-17 浙江工业大学 Self-adaptation global threshold method for gray level image binaryzation
CN110363047A (en) * 2018-03-26 2019-10-22 普天信息技术有限公司 Method, apparatus, electronic equipment and the storage medium of recognition of face
CN111767906A (en) * 2020-09-01 2020-10-13 腾讯科技(深圳)有限公司 Face detection model training method, face detection device and electronic equipment
CN112613385A (en) * 2020-12-18 2021-04-06 成都三零凯天通信实业有限公司 Face recognition method based on monitoring video
CN112669244A (en) * 2020-12-29 2021-04-16 中国平安人寿保险股份有限公司 Face image enhancement method and device, computer equipment and readable storage medium
CN114359397A (en) * 2021-09-29 2022-04-15 大连中科创达软件有限公司 Image optimization method, device, equipment and storage medium
CN114783017A (en) * 2022-03-17 2022-07-22 北京明略昭辉科技有限公司 Method and device for generating confrontation network optimization based on inverse mapping
CN115410249A (en) * 2022-07-29 2022-11-29 成都云天励飞技术有限公司 Face recognition model training method, recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN115410249A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
CN108399383B (en) Expression migration method, device storage medium, and program
EP3968179A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
WO2017096753A1 (en) Facial key point tracking method, terminal, and nonvolatile computer readable storage medium
KR102385463B1 (en) Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium
WO2022027912A1 (en) Face pose recognition method and apparatus, terminal device, and storage medium.
CN111507333B (en) Image correction method and device, electronic equipment and storage medium
WO2022156622A1 (en) Sight correction method and apparatus for face image, device, computer-readable storage medium, and computer program product
WO2021164269A1 (en) Attention mechanism-based disparity map acquisition method and apparatus
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
WO2024045442A1 (en) Image correction model training method, image correction method, device and storage medium
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
WO2023206944A1 (en) Semantic segmentation method and apparatus, computer device, and storage medium
WO2024021504A1 (en) Facial recognition model training method and apparatus, recognition method, and device and medium
WO2023221790A1 (en) Image encoder training method and apparatus, device, and medium
WO2023124040A1 (en) Facial recognition method and apparatus
WO2022262474A1 (en) Zoom control method and apparatus, electronic device, and computer-readable storage medium
CN111612842A (en) Method and device for generating pose estimation model
CN106803077A (en) A kind of image pickup method and terminal
CN112232506A (en) Network model training method, image target recognition method, device and electronic equipment
WO2022213761A1 (en) Image processing method and apparatus, electronic device, and storage medium
WO2022252640A1 (en) Image classification pre-processing method and apparatus, image classification method and apparatus, and device and storage medium
WO2022063321A1 (en) Image processing method and apparatus, device and storage medium
WO2021238586A1 (en) Training method and apparatus, device, and computer readable storage medium
CN112614110B (en) Method and device for evaluating image quality and terminal equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22952933

Country of ref document: EP

Kind code of ref document: A1