CN112464916B - Face recognition method and model training method thereof - Google Patents


Info

Publication number
CN112464916B
CN112464916B CN202011614220.5A CN202011614220A CN112464916B
Authority
CN
China
Prior art keywords
face
neural network
trained
network
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011614220.5A
Other languages
Chinese (zh)
Other versions
CN112464916A (en)
Inventor
吕桢飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qigan Electronic Information Technology Co ltd
Original Assignee
Shanghai Qigan Electronic Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qigan Electronic Information Technology Co ltd filed Critical Shanghai Qigan Electronic Information Technology Co ltd
Priority to CN202011614220.5A priority Critical patent/CN112464916B/en
Publication of CN112464916A publication Critical patent/CN112464916A/en
Application granted granted Critical
Publication of CN112464916B publication Critical patent/CN112464916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation


Abstract

A face recognition method and a training method for its model are provided. The face recognition model training method comprises the following steps: a) obtaining a face data set; b) preprocessing the face data set to obtain input data; c) inputting the input data into a neural network to be trained, the neural network extracting features from the input data; d) calculating the network loss from the output vector of the neural network; e) updating the weight values according to the network loss; f) repeating steps c, d and e until the network loss converges below a predetermined first target value; g) setting weight values in the neural network that are smaller than a predetermined first threshold to 0; h) repeating steps c, d and e until the network loss converges below a predetermined second target value. The invention can improve the accuracy of face detection and increase the speed of face recognition.

Description

Face recognition method and model training method thereof
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a face recognition method and a model training method thereof.
Background
With the development of the modern information industry, identity authentication has shifted toward biometrics. Current biometric identification techniques include fingerprint recognition, retina recognition, gait recognition, and so on. Compared with other recognition methods, face recognition is direct, friendly and convenient; users face no psychological barrier and accept it readily, so it is widely applied in fields such as criminal investigation, certificate verification, video surveillance and access control.
Early face recognition systems adopted traditional machine-learning methods such as geometric features and local feature analysis, but these methods suffer from low recognition rates and poor robustness to interference, which greatly hinders practical deployment of face recognition systems.
In recent years, with the rapid development of artificial intelligence, using trained artificial neural networks for face recognition has become the future trend. However, prior-art artificial neural networks for face recognition often suffer from high false detection rates and low detection speeds, which hinders their application in the face recognition field.
Therefore, how to overcome these drawbacks and train an artificial neural network with a low false detection rate and a fast detection speed, so as to improve both the accuracy and the speed of face detection, is a problem that needs to be solved.
Disclosure of Invention
The technical problem solved by the invention is how to improve the detection accuracy and detection speed of face detection.
In order to solve the above technical problems, an embodiment of the present invention provides a face recognition model training method, including:
a) Obtaining a face data set, wherein the face data set is used for training a neural network to be trained for face recognition;
b) Preprocessing a face data set to obtain input data;
c) Inputting input data into a neural network to be trained, and extracting characteristics of the input data by the neural network to be trained;
d) Calculating the network loss according to the output vector of the neural network to be trained, which comprises calculating the network loss of the neural network to be trained with the following formula:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\lVert x_i - C_{y_i}\right\rVert^2 + \sum_{k}w_k$$

wherein L represents the network loss, N represents the number of training samples in each batch, s represents the radius of the hypersphere onto which face vectors are mapped, y_i represents the class (person) of the i-th sample, m represents the angular margin, λ represents a balance parameter that balances the weights of the terms on either side of the plus sign, x_i represents the deep feature vector of the i-th sample, C_{y_i} represents the center of the deep features of the y_i-th class, k indexes the network layers, and w_k represents the sum of the absolute values of all weights of the k-th layer;
e) Updating the weight value according to the network loss;
f) Repeating the steps c, d and e until the network loss converges to be lower than a preset first target value;
g) Setting a weight value smaller than a preset first threshold value in the neural network to be trained to 0;
h) Repeating the steps c, d and e until the network loss is converged below a predetermined second target value.
Optionally, the obtaining the face data set includes: and acquiring a face data set by adopting a local camera and an OpenCV.
Optionally, the obtaining the face data set includes: the disclosed face data set is downloaded from the internet.
Optionally, the obtaining of the face data set further includes cleaning the face data set. The cleaning of the face data set comprises the following steps:
acquiring a first neural network, the first neural network being an already-trained face recognition neural network;
inputting the face data set into the first neural network, wherein photos of the same person are placed in the same folder, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
for each folder, calculating the mean of the output vectors of all photos in that folder;
for each photo, calculating the distance between its output vector and the mean output vector of the folder containing it, as a first distance;
and deleting photos whose first distance is larger than a predetermined second threshold.
Optionally, the obtaining the face data set includes: the photographs of the plurality of public characters are downloaded from the internet to form a face data set including a plurality of photographs of the respective public characters.
Optionally, the preprocessing the face data set includes: for YUV graphics in the face data set, extracting the component of the YUV graphics in the Y direction as the input of the neural network to be trained.
Optionally, the preprocessing the face data set includes: the photos in the face dataset are randomly rotated, randomly flipped and/or randomly cropped.
Optionally, the feature extraction of the input data by the neural network to be trained includes: using a neural network with the Inception-ResNet-v1 architecture to extract features from the input data.
Optionally, the updating of the weight values according to the network loss includes: calculating the derivative of the network loss with respect to each network weight, and updating each weight based on its derivative and the learning rate.
Optionally, the second target value is less than or equal to the first target value.
In order to solve the above technical problem, an embodiment of the present invention further provides a face recognition method, including:
training the neural network to be trained by adopting the face recognition model training method;
and performing face recognition with the trained neural network.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
After training until the network loss converges below a predetermined first target value, weight values in the neural network that are smaller than a predetermined first threshold are set to 0, and training then continues until the network loss converges below a predetermined second target value. This lets the algorithm better exploit the sparse-computation module in hardware, reducing the running time of the face recognition algorithm and speeding up face recognition.
Furthermore, a newly designed loss function is used to calculate the network loss. It combines the ArcFace loss with the center loss, making the intra-class distribution tighter, which reduces the false detection rate and improves face detection accuracy.
Furthermore, on top of the improved network loss, an L1 regularization term on the weight values is added, which further increases weight sparsity and speeds up face recognition.
Drawings
FIG. 1 is a flowchart of a face recognition model training method in an embodiment of the invention;
FIG. 2 is a training flow chart of a face recognition algorithm in the prior art;
fig. 3 is a training flowchart of a deep learning face recognition algorithm in an embodiment of the present invention.
Detailed Description
As analyzed in the background section, prior-art artificial neural networks for face recognition often suffer from high false detection rates and low detection speeds.
In the present invention, after training until the network loss converges below a predetermined first target value, weight values smaller than a predetermined first threshold are set to 0, and training then continues until the network loss converges below a predetermined second target value. At the same time, a newly designed loss function, which combines the ArcFace loss with the center loss and further adds an L1 regularization term on the weight values, is used to calculate the network loss, so that the trained face detection neural network achieves both higher detection accuracy and higher detection speed.
In order that those skilled in the art will better understand and practice the invention, a detailed description will be given below with reference to specific embodiments thereof.
Example 1
As described below, the embodiment of the invention provides a face recognition model training method.
Referring to the flowchart of the face recognition model training method shown in fig. 1, the following detailed description is given by specific steps:
s101, obtaining a face data set.
The face data set is used for training the neural network to be trained for face recognition.
In some embodiments, a local camera and OpenCV may be used to acquire the face data set. Specifically, a Python script starts the camera of the local computer, performs local face detection, and saves the detected face crops to a local folder.
For each volunteer's face samples, photographs should be collected under varied conditions: different lighting conditions, different times of day and/or different expressions.
In other embodiments, a publicly available face data set may be downloaded from the Internet. Common data sets include LFW, CelebFaces, VGG-Face and MegaFace, each containing hundreds of thousands or even millions of face images, which makes them effective data sets for face recognition training. However, they contain considerable label noise, so this embodiment uses an automatic method to clean the noise out of the downloaded data set. The cleaning of the face data set may specifically include the following steps:
acquiring a first neural network, the first neural network being an already-trained face recognition neural network;
inputting the face data set into the first neural network, wherein photos of the same person are placed in the same folder, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
for each folder, calculating the mean of the output vectors of all photos in that folder;
for each photo, calculating the distance between its output vector and the mean output vector of the folder containing it, as a first distance;
and deleting photos whose first distance is larger than a predetermined second threshold.
This data cleaning method effectively increases the signal-to-noise ratio of the data set and cleans it automatically.
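In code, the folder-level cleaning described above might look like the following sketch. The toy two-dimensional "embeddings" stand in for the output vectors of the first neural network, and the function names and threshold value are illustrative only:

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    """Euclidean distance between two vectors (the 'first distance')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clean_folder(embeddings, threshold):
    """Keep only photos whose embedding lies within `threshold`
    of the folder's mean embedding; the rest are deleted."""
    center = mean_vector(embeddings)
    return [e for e in embeddings if euclidean(e, center) <= threshold]

# Toy folder: three consistent embeddings of one person and one outlier
# (e.g. a mislabeled photo) that the cleaning step should remove.
folder = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [5.0, 5.0]]
kept = clean_folder(folder, threshold=2.0)
```

With these values the outlier `[5.0, 5.0]` is farther than the threshold from the folder mean and is dropped, while the three consistent embeddings survive.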
In other embodiments, a face data set containing multiple photos of each of a number of public figures may be downloaded from the Internet. Specifically, multiple pictures of multiple public figures are obtained from a publicly available platform, such as Baidu Images. The face region in each picture is then cropped by a face detection network and resized to the input size required by the network.
S102, preprocessing the face data set to obtain input data.
In some embodiments, the preprocessing of the face data set may include: for YUV images in the face data set, extracting the Y (luma) component as the input to the neural network to be trained.
In other embodiments, the preprocessing of the face data set may include: randomly rotating, randomly flipping and/or randomly cropping the photos in the face data set.
The appearance of an image is affected by the intensity of the ambient light, reflections from objects, properties of the camera used, and similar factors. To prevent these external factors from affecting training, the input image is whitened: the pixel values of the picture are shifted and scaled to have zero mean and unit variance.
Furthermore, for applications running on deep learning accelerators, using a single-channel picture reduces the input size and therefore the computation per picture. Here, for an input YUV image, only the Y component is extracted as the network input, which reduces the amount of data in the input layer.
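The two preprocessing steps just described, taking only the Y (luma) plane of a YUV picture and whitening the result to zero mean and unit variance, can be sketched as follows. The nested lists stand in for real pixel buffers, and the function names are hypothetical:

```python
import math

def extract_y(yuv_image):
    """Keep only the Y (luma) plane of a per-pixel (Y, U, V) image."""
    return [[pixel[0] for pixel in row] for row in yuv_image]

def whiten(image):
    """Shift and scale pixel values to zero mean and unit variance."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    std = math.sqrt(var) or 1.0        # guard against constant images
    return [[(p - mean) / std for p in row] for row in image]

# 2x2 toy YUV image: each pixel is a (Y, U, V) triple.
yuv = [[(10, 1, 2), (20, 1, 2)],
       [(30, 1, 2), (40, 1, 2)]]
y_plane = extract_y(yuv)
whitened = whiten(y_plane)
```

After whitening, the pixel values of `whitened` have mean 0 and unit variance, regardless of the original brightness range.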
At the same time, the input face recognition training pictures are augmented in the following ways:
random rotation: rotating the input picture by a random angle;
random flipping: flipping the input picture left-right or up-down at random;
random cropping: cropping a random region of the picture and resizing it back to the original picture size.
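A minimal sketch of the flip and crop augmentations follows; rotation by an arbitrary angle needs interpolation, so in practice it would be delegated to an image library such as OpenCV, and the resize back to the original size is likewise omitted here. The 4x4 integer grid is a stand-in for a picture:

```python
import random

def random_flip(image, rng):
    """Randomly mirror the image left-right and/or up-down."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]   # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1]                    # vertical flip
    return image

def random_crop(image, crop_h, crop_w, rng):
    """Cut a random crop_h x crop_w window out of the image."""
    top = rng.randrange(len(image) - crop_h + 1)
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

rng = random.Random(0)                          # fixed seed for reproducibility
img = [[r * 4 + c for c in range(4)] for r in range(4)]
crop = random_crop(random_flip(img, rng), 3, 3, rng)
```

Each call with a different seed yields a different flipped/cropped view of the same source picture, which is exactly the variety the training set needs.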
S103, inputting the input data into a neural network to be trained, and extracting the characteristics of the input data by the neural network to be trained.
In some embodiments, the feature extraction of the input data by the neural network to be trained may include: using a neural network with the Inception-ResNet-v1 architecture to extract features from the input data.
During training, the input data prepared in step S102 is fed into the deep learning framework for layer-by-layer feature extraction, yielding the output of the network.
A deep learning network typically consists of several kinds of structure: convolution layers, pooling layers, fully connected layers, dropout layers, and so on. Different combinations of these structures yield different network architectures. The architecture adopted for feature extraction in this embodiment is the Inception-ResNet-v1 structure proposed by Google, although any newer deep learning architecture that extracts features better may equally be used.
S104, calculating network loss according to the output vector of the neural network to be trained.
In some embodiments, the network loss of the neural network to be trained may be calculated with the following formula:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\lVert x_i - C_{y_i}\right\rVert^2 + \sum_{k}w_k$$

wherein L represents the network loss, N represents the number of training samples in each batch, s represents the radius of the hypersphere onto which face vectors are mapped, y_i represents the class (person) of the i-th sample, m represents the angular margin, λ represents a balance parameter that balances the weights of the terms on either side of the plus sign, x_i represents the deep feature vector of the i-th sample, C_{y_i} represents the center of the deep features of the y_i-th class, k indexes the network layers, and w_k represents the sum of the absolute values of all weights of the k-th layer.
Step S103 yields the network output for a batch of input face pictures, but whether that output is correct, that is, whether the network is well trained, is judged by computing the loss value of the network: the smaller the loss, the better.
The loss function is a mathematical function for calculating network loss, and for face recognition, the currently-used deep learning loss function with prominent effect based on face recognition has the following steps:
1)Cross-Entropy Loss
2)Additive-Margin Softmax Loss
3)ArcFace Loss
Models trained with these loss functions increase the distance between different classes, but the distance within the same class is not tight enough. An overly dispersed intra-class distribution increases the false detection rate of face recognition, so how to reduce intra-class distances and make the intra-class distribution more compact is the problem to solve.
The invention designs a new loss function and trains the face recognition algorithm with it, reducing intra-class distances and thereby reducing the false detection rate of the face recognition algorithm.
ArcFace is currently one of the most effective loss functions in face recognition and can effectively increase the distance between different classes. The ArcFace loss function is:

$$L_{arc} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$
through research, the model trained by the loss function has the defects of insufficiently tight distribution among the similar types and scattered distribution ratio of the similar types, namely the false detection rate of face recognition is increased, and the problem that A is recognized as B occurs. In order to reduce the false detection rate, the invention adds a centrol function on the basis of Arcface:
wherein C_{y_i} represents the center point of the deep features of the y_i-th class. During model training, each time a batch finishes, the deep features belonging to the same class are averaged to update that class's center; this makes the intra-class distribution more compact and thereby reduces the false detection rate. The improved ArcFace loss function is then the ArcFace loss plus a weighted center loss:

$$L = L_{arc} + \lambda L_C$$

where λ is a balance parameter; adjusting λ balances the weights of the two parts. If a more compact intra-class distribution is desired, a larger λ value is needed.
Since the face recognition model runs on a deep learning accelerator platform, and current deep learning accelerators support sparse matrix computation, training a network with higher sparsity increases the network's running speed.
The sparsity of the model can likewise be increased by changing the loss function: adding an L1 regularization term on the weights further increases weight sparsity. The L1 regularization term is:

$$R_{L1} = \sum_{k}w_k$$

where w_k is the sum of the absolute values of the weights of the k-th layer.
the final improved face recognition loss function is as follows:
+/>
wherein L represents network loss, N represents the number of training samples of each batch, s represents the radius of the face vector mapping hypersphere, y i Represents a person or a person, m represents an angle margin, lambda represents a balance parameter for balancing weights of the front part and the rear part of the plus sign in the formula, and x i Representing a certain class of vectors, C yi Represents the y i Center of depth feature of class sample, k represents the k-th layer of network, w i Representing the sum of the i-th layer ownership weights.
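The combined loss, an ArcFace term plus a λ-weighted center-loss term plus an L1 weight term, can be sketched in plain Python as below. This is an illustrative reading of the loss described in the text, not the author's implementation: the function name and toy inputs are hypothetical, and the cosines are assumed to be precomputed cosine similarities between each deep feature and each class weight vector.

```python
import math

def combined_loss(cosines, labels, features, centers, weights,
                  s=16.0, m=0.5, lam=0.1):
    """ArcFace term + lam-weighted center-loss term + L1 weight term.

    cosines[i][j] : cos(theta_j) between feature i and class j
    labels[i]     : ground-truth class of sample i
    features[i]   : deep feature vector of sample i
    centers[c]    : running center of class c's deep features
    weights       : flat list of all network weights (for the L1 term)
    """
    n = len(labels)
    arc = 0.0
    for i, y in enumerate(labels):
        theta = math.acos(max(-1.0, min(1.0, cosines[i][y])))
        target = math.exp(s * math.cos(theta + m))      # margin-penalized logit
        others = sum(math.exp(s * c)
                     for j, c in enumerate(cosines[i]) if j != y)
        arc += -math.log(target / (target + others))
    arc /= n

    center = 0.5 * sum(                                  # pull features to class centers
        sum((f - c) ** 2 for f, c in zip(features[i], centers[y]))
        for i, y in enumerate(labels))

    l1 = sum(abs(w) for w in weights)                    # sparsity-promoting term
    return arc + lam * center + l1

loss = combined_loss(
    cosines=[[0.9, 0.1], [0.2, 0.8]],
    labels=[0, 1],
    features=[[1.0, 0.0], [0.0, 1.0]],
    centers=[[0.9, 0.0], [0.0, 0.9]],
    weights=[0.5, -0.25, 0.0])
```

Note that, all else being equal, larger-magnitude weights raise the L1 term and hence the total loss, which is what drives the training toward sparser weights.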
As can be seen from the above description of the technical solution: in this embodiment, a newly designed algorithm is used to calculate the network loss, and the algorithm combines the Arcface loss function with the centroloss function, so that the similar distribution is tighter, thereby being beneficial to reducing the false detection rate and improving the accuracy of face detection.
Furthermore, on the basis of the improved network loss algorithm, the L1 regularization term for the weight value is further increased, so that the sparseness of the weight value is further increased, and the detection speed of face recognition is increased.
And S105, updating the weight value according to the network loss.
In some embodiments, the updating of the weight values according to the network loss may include: calculating the derivative of the network loss with respect to each network weight, and updating each weight based on its derivative and the learning rate.
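The update rule just stated, new weight equals old weight minus learning rate times derivative, can be illustrated with a numeric gradient on a toy quadratic loss. This is a generic gradient-descent sketch under assumed names, not the specific optimizer used in the patent:

```python
def numeric_grad(loss_fn, weights, i, eps=1e-6):
    """Central-difference estimate of d(loss)/d(weights[i])."""
    up = list(weights); up[i] += eps
    down = list(weights); down[i] -= eps
    return (loss_fn(up) - loss_fn(down)) / (2 * eps)

def sgd_step(loss_fn, weights, lr=0.1):
    """One update: w <- w - lr * dL/dw for every weight."""
    grads = [numeric_grad(loss_fn, weights, i) for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, grads)]

# Toy quadratic loss with its minimum at (1, -2); repeated steps
# should drive the weights toward that minimum.
quad_loss = lambda w: (w[0] - 1) ** 2 + (w[1] + 2) ** 2
w = [0.0, 0.0]
for _ in range(50):
    w = sgd_step(quad_loss, w)
```

In a real framework the derivatives come from backpropagation rather than finite differences, but the update step itself is the same.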
S106, judging whether the network loss is converged below a preset first target value.
If so, the process advances to the subsequent step S107.
If not, repeating the steps S103 to S105 until the network loss is converged below the predetermined first target value.
And S107, setting the weight value smaller than a preset first threshold value in the neural network to be trained to be 0.
In a deep network accelerator, the network can be further compressed according to its sparsity, and multiplication by sparse matrices can be accelerated. A sparse matrix is one in which the number of zero parameters exceeds the number of non-zero parameters. At present, however, models are usually trained without increasing network sparsity for the sake of on-device deployment.
Network pruning increases the sparsity of the network by choosing a suitable threshold and setting every weight value smaller than that threshold to 0.
Following this principle, network pruning is applied to the trained face recognition network, yielding sparser network weight parameters.
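The pruning step, zeroing every weight whose magnitude falls below the threshold, is straightforward to sketch; the weight list and threshold below are illustrative values only:

```python
def prune(weights, threshold):
    """Zero every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

def sparsity(weights):
    """Fraction of exactly-zero parameters (higher = sparser)."""
    return sum(1 for w in weights if w == 0.0) / len(weights)

w = [0.001, -0.5, 0.02, 0.8, -0.003, 0.0001, 0.3, -0.04]
pruned = prune(w, threshold=0.05)
```

After pruning, more than half of the toy weights are exactly zero, the kind of sparsity a hardware sparse-matrix module can exploit; training then continues (steps S103 to S105) to recover any accuracy lost to pruning.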
S108, judging whether the network loss is converged below a preset second target value.
If yes, the sparse network training ends and the face recognition algorithm model is obtained.
If not, repeating the steps S103 to S105 until the network loss is converged below the predetermined second target value.
The second target value is smaller than or equal to the first target value; that is, steps S107 and S108 together keep the network loss from increasing, so the recognition accuracy of the face recognition model is preserved while its recognition speed is improved.
Fig. 2 shows the training flow of a prior-art face recognition algorithm; fig. 3 shows the training flow of the deep learning face recognition algorithm in this embodiment.
As can be seen from the above description of the technical solution: in this embodiment, after training until the network loss converges below a predetermined first target value, weight values in the neural network that are smaller than a predetermined first threshold are set to 0, and training then continues until the network loss converges below a predetermined second target value. This lets the algorithm better exploit the sparse-computation module in hardware, reducing the running time of the face recognition algorithm and speeding up face recognition.
Example two
As described below, the embodiment of the invention provides a face recognition method.
This face recognition method differs from the prior art in that it trains the neural network to be trained with the face recognition model training method provided by the embodiment of the invention. Accordingly, after training until the network loss converges below a predetermined first target value, weight values smaller than a predetermined first threshold are set to 0, and training continues until the network loss converges below a predetermined second target value. The algorithm can therefore better exploit the sparse-computation module in hardware, reducing the running time of the face recognition algorithm and speeding up face recognition.
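One common way a trained face recognition network is then used at recognition time (a plausible sketch, not something the patent mandates) is to compare the embeddings of two face images against a similarity threshold; the vectors and the threshold value below are toy stand-ins for real network outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def same_person(emb_a, emb_b, threshold=0.7):
    """Declare a match when the embeddings are similar enough."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy embeddings: a probe photo, an enrolled photo of the same
# person, and an unrelated stranger.
probe = [0.9, 0.1, 0.4]
enrolled = [0.8, 0.2, 0.5]
stranger = [-0.5, 0.9, -0.1]
```

Raising the threshold trades a lower false-accept rate for a higher false-reject rate, which ties directly to the false detection rate the training method aims to reduce.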
Those of ordinary skill in the art will appreciate that in the methods of the above embodiments, all or part of the steps may be performed by hardware under the control of program instructions, and the program may be stored in a computer-readable storage medium, which may include ROM, RAM, magnetic disks, optical disks, and the like.
Although the present invention is disclosed above, it is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention is therefore defined by the appended claims.

Claims (10)

1. A face recognition model training method, characterized by comprising the following steps:
a) Obtaining a face data set, wherein the face data set is used for training a neural network to be trained for face recognition;
b) Preprocessing a face data set to obtain input data;
c) Inputting input data into a neural network to be trained, and extracting characteristics of the input data by the neural network to be trained;
d) Calculating the network loss according to the output vector of the neural network to be trained, which comprises calculating the network loss of the neural network to be trained with the following formula:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\lVert x_i - C_{y_i}\right\rVert^2 + \sum_{k}w_k$$

wherein L represents the network loss, N represents the number of training samples in each batch, s represents the radius of the hypersphere onto which face vectors are mapped, y_i represents the class (person) of the i-th sample, m represents the angular margin, λ represents a balance parameter that balances the weights of the terms on either side of the plus sign, x_i represents the deep feature vector of the i-th sample, C_{y_i} represents the center of the deep features of the y_i-th class, k indexes the network layers, and w_k represents the sum of the absolute values of all weights of the k-th layer;
e) Updating the weight value according to the network loss;
f) Repeating the steps c, d and e until the network loss converges to be lower than a preset first target value;
g) Setting a weight value smaller than a preset first threshold value in the neural network to be trained to 0;
h) Repeating the steps c, d and e until the network loss converges to be lower than a preset second target value, wherein the second target value is smaller than or equal to the first target value.
2. The face recognition model training method of claim 1, wherein the obtaining of the face data set comprises: acquiring the face data set using a local camera and OpenCV.
3. The face recognition model training method of claim 1, wherein the obtaining of the face data set comprises: downloading a publicly available face data set from the Internet.
4. The face recognition model training method of claim 3, wherein the obtaining a face data set further comprises: performing data cleaning on the face data set; the data cleaning of the face data set comprises the following steps:
acquiring a first neural network, the first neural network being a trained face recognition neural network;
inputting the face data set into the first neural network, wherein photos of the same person are placed in the same folder for input, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
for each folder, calculating the average of the output vectors of all photos in that folder;
for each photo, calculating the distance between its output vector and the average output vector of the folder containing it, as a first distance;
and deleting the photos whose first distance is greater than a preset second threshold.
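The cleaning steps of claim 4 amount to an outlier filter on embeddings: average the output vectors per folder, measure each photo's distance to its folder's average, and drop photos beyond a threshold. A minimal NumPy sketch (function and argument names are illustrative, not from the patent):

```python
import numpy as np

def clean_face_dataset(folders, second_threshold):
    """Outlier filter per claim 4.

    `folders` maps a person's folder name to a list of output vectors
    (one embedding per photo, e.g. produced by a trained face
    recognition network). Returns, per folder, the indices of the
    photos to keep.
    """
    kept = {}
    for name, vectors in folders.items():
        vecs = np.asarray(vectors, dtype=float)
        mean_vec = vecs.mean(axis=0)                      # folder's average output vector
        dists = np.linalg.norm(vecs - mean_vec, axis=1)   # "first distance" per photo
        kept[name] = [i for i, d in enumerate(dists) if d <= second_threshold]
    return kept
```

A photo whose embedding sits far from its folder's mean (a mislabeled or bad-quality image) exceeds the threshold and is dropped.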
5. The face recognition model training method of claim 1, wherein the obtaining a face data set comprises: downloading photos of a plurality of public figures from the Internet to form a face data set comprising a plurality of photos of each public figure.
6. The face recognition model training method of claim 1, wherein the preprocessing the face data set comprises: for YUV images in the face data set, extracting the Y component of the image as the input of the neural network to be trained.
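Extracting the Y (luma) component, as claim 6 describes, discards chrominance and feeds a single-channel image to the network. A sketch assuming an interleaved H×W×3 array whose first channel is Y; real YUV buffers are often planar or subsampled (e.g. NV12) and would need reshaping first:

```python
import numpy as np

def y_component(yuv_image):
    """Return the Y (luma) plane of an H x W x 3 YUV array,
    scaled to [0, 1] as network input."""
    y = yuv_image[..., 0].astype(np.float32)
    return y / 255.0
```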
7. The face recognition model training method of claim 1, wherein the preprocessing the face data set comprises: randomly rotating, randomly flipping and/or randomly cropping the photos in the face data set.
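The augmentations of claim 7 can be sketched with NumPy alone; this toy version uses 90° rotation steps for simplicity, whereas a production pipeline would use small-angle rotations with interpolation (e.g. via OpenCV):

```python
import numpy as np

def augment(photo, rng):
    """Randomly rotate, flip, and crop a photo (claim 7 sketch)."""
    img = np.rot90(photo, k=int(rng.integers(0, 4)))  # random rotation (multiples of 90 deg)
    if rng.random() < 0.5:
        img = np.fliplr(img)                          # random horizontal flip
    h, w = img.shape[:2]
    ch, cw = max(1, int(h * 0.9)), max(1, int(w * 0.9))
    top = int(rng.integers(0, h - ch + 1))            # random crop to 90% of each side
    left = int(rng.integers(0, w - cw + 1))
    return img[top:top + ch, left:left + cw]
```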
8. The face recognition model training method of claim 1, wherein the feature extraction of the input data by the neural network to be trained comprises: performing feature extraction on the input data with a neural network having the Inception-ResNet-v1 network structure.
9. The face recognition model training method of claim 1, wherein the updating the weight values according to the network loss comprises: calculating the derivative of the network loss with respect to each network weight value, and updating the weight values according to the derivatives and the learning rate.
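Claim 9 describes plain gradient descent: each weight moves against its loss derivative, scaled by the learning rate. A minimal sketch over a dictionary of weight arrays (the names are illustrative):

```python
import numpy as np

def sgd_update(weights, grads, lr):
    """Per claim 9: w <- w - learning_rate * dL/dw for every weight."""
    return {name: w - lr * grads[name] for name, w in weights.items()}
```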
10. A face recognition method, comprising:
training a neural network to be trained using the face recognition model training method of any one of claims 1 to 9;
and performing face recognition using the trained neural network.
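The prune-and-retrain procedure of claim 1 (train until the first target, zero out small weights in step g), then fine-tune until the second target) hinges on magnitude pruning. A minimal sketch of step g) alone, again over a dictionary of weight arrays:

```python
import numpy as np

def prune_small_weights(weights, first_threshold):
    """Step g) of claim 1: zero every weight whose magnitude is below
    the preset first threshold, yielding a sparse network that the
    repeated steps c)-e) then fine-tune."""
    return {name: np.where(np.abs(w) < first_threshold, 0.0, w)
            for name, w in weights.items()}
```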
CN202011614220.5A 2020-12-31 2020-12-31 Face recognition method and model training method thereof Active CN112464916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011614220.5A CN112464916B (en) 2020-12-31 2020-12-31 Face recognition method and model training method thereof


Publications (2)

Publication Number Publication Date
CN112464916A CN112464916A (en) 2021-03-09
CN112464916B true CN112464916B (en) 2023-09-19

Family

ID=74802152


Country Status (1)

Country Link
CN (1) CN112464916B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887538B (en) * 2021-11-30 2022-03-25 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197099A (en) * 2018-02-26 2019-09-03 Tencent Technology (Shenzhen) Co., Ltd. Method and apparatus for cross-age face recognition and model training thereof
CN110414432A (en) * 2019-07-29 2019-11-05 Tencent Technology (Shenzhen) Co., Ltd. Training method for object recognition model, object recognition method, and corresponding devices
CN110942108A (en) * 2019-12-13 2020-03-31 Shenzhen University Face image clustering method and device, and computer-readable storage medium
CN111783698A (en) * 2020-07-06 2020-10-16 Zhou Shutian Method for improving training stability of face recognition model
CN111967392A (en) * 2020-08-18 2020-11-20 Guangdong Diankeyuan Energy Technology Co., Ltd. Face recognition neural network training method, system, device and storage medium
CN111985310A (en) * 2020-07-08 2020-11-24 South China University of Technology Training method of a deep convolutional neural network for face recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426858B (en) * 2017-08-29 2021-04-06 BOE Technology Group Co., Ltd. Neural network, training method, image processing method, and image processing apparatus


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Design of a deep neural network accelerator supporting sparse convolution; Zhou Guofei; Electronic Technology & Software Engineering; full text *
Loss functions in face recognition; Zhao Wenzhong; Electronic Technology & Software Engineering; full text *
A lightweight and multi-pose face recognition method based on deep learning; Gong Rui; Ding Sheng; Zhang Chaohua; Su Hao; Journal of Computer Applications (Issue 03); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant