CN112464916A - Face recognition method and model training method thereof - Google Patents


Info

Publication number: CN112464916A (granted as CN112464916B)
Application number: CN202011614220.5A
Authority: CN (China)
Original language: Chinese (zh)
Inventor: 吕桢飞
Assignee: Shanghai Qigan Electronic Information Technology Co., Ltd.
Legal status: Active (granted)

Classifications

    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06V40/168 — Human faces: feature extraction; face representation
    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/08 — Neural networks: learning methods


Abstract

A face recognition method and a model training method thereof are provided, wherein the face recognition model training method comprises the following steps: a) obtaining a face data set; b) preprocessing the face data set to obtain input data; c) inputting the input data into a neural network to be trained, which performs feature extraction on the input data; d) calculating the network loss from the output vector of the neural network to be trained; e) updating the weight values according to the network loss; f) repeating steps c, d and e until the network loss converges below a predetermined first target value; g) setting the weight values in the neural network to be trained that are smaller than a preset first threshold to 0; h) repeating steps c, d and e until the network loss converges below a predetermined second target value. The invention can improve the accuracy of face detection and accelerate face recognition.

Description

Face recognition method and model training method thereof
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a face recognition method and a model training method thereof.
Background
With the development of the modern information industry, identity authentication has advanced to the biometric level. Current biometric techniques include fingerprint recognition, retina recognition, gait recognition, and so on. Compared with other identification methods, face recognition is direct, friendly and convenient, so users face no psychological barrier and accept it readily; it is therefore widely applied in fields such as criminal investigation, certificate verification, video surveillance, and access control.
Early face recognition systems adopted traditional machine learning methods, including geometric features, local feature analysis, and the like, but these methods suffer from low recognition rates and poor robustness to interference, which has greatly hampered the deployment of face recognition systems.
In recent years, with the rapid development of artificial intelligence, using a trained artificial neural network for face recognition has become the trend. In the prior art, however, the artificial neural networks used for face recognition often suffer from high false detection rates and low detection speeds, which hinders their application in the field of face recognition.
Therefore, how to overcome these defects and train an artificial neural network with a low false detection rate and a high detection speed, thereby improving the detection accuracy and detection speed of face detection, is a problem that urgently needs to be solved.
Disclosure of Invention
The technical problem solved by the invention is as follows: how to improve the detection accuracy and the detection speed of the face detection.
In order to solve the above technical problem, an embodiment of the present invention provides a face recognition model training method, including:
a) obtaining a face data set, wherein the face data set is used for training a neural network to be trained for face recognition;
b) preprocessing a face data set to obtain input data;
c) inputting input data into a neural network to be trained, and extracting the characteristics of the input data by the neural network to be trained;
d) calculating the network loss according to the output vector output by the neural network to be trained, wherein the network loss of the neural network to be trained is calculated by adopting the following formula:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2 + \sum_{k}\left\|W_k\right\|_1

where L denotes the network loss, N the number of training samples in each batch, s the radius of the hypersphere onto which face vectors are mapped, y_i the class (the person) of sample i, m the angular margin, \lambda a balance parameter that balances the weights of the added parts, x_i the deep feature vector of sample i, c_{y_i} the center of the deep features of class y_i, n the number of classes, \theta_j the angle between x_i and the weight vector of class j, k the index of a network layer, and \left\|W_k\right\|_1 the sum of the absolute values of all weights of layer k;
e) updating the weight value according to the network loss;
f) repeating the steps c, d and e until the network loss converges to be lower than a preset first target value;
g) setting a weight value smaller than a preset first threshold value in the neural network to be trained to be 0;
h) repeating steps c, d and e until the network loss converges below a predetermined second target value.
Optionally, the obtaining the face data set includes: and acquiring a face data set by adopting a local camera and OpenCV.
Optionally, the obtaining the face data set includes: the public face data set is downloaded from the internet.
Optionally, the obtaining the face data set further includes: performing data cleaning on the face data set; the data cleaning of the face data set comprises the following steps:
acquiring a first neural network, wherein the first neural network is a trained face recognition neural network;
inputting a face data set into a first neural network, wherein photos of the same person are put into the same folder for inputting, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
respectively calculating the average value of the output vectors of all the photos in each folder;
calculating the distance between the output vector of each photo and the average value of the output vectors of all the photos in the folder where the photo is located, and taking the distance as a first distance;
and deleting the photos with the first distance larger than a preset second threshold value.
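The cleaning steps above can be sketched in a few lines. The helper below is illustrative only; the `threshold` value and the dict-of-arrays layout are assumptions, not part of the patent:

```python
import numpy as np

def clean_face_dataset(embeddings_by_person, threshold):
    """Keep only photos whose output vector lies within `threshold` of the
    mean output vector of their person's folder (the "first distance")."""
    keep = {}
    for person, vecs in embeddings_by_person.items():
        center = vecs.mean(axis=0)                     # average output vector of the folder
        dists = np.linalg.norm(vecs - center, axis=1)  # first distance, one per photo
        keep[person] = dists <= threshold              # photos beyond the threshold are deleted
    return keep
```

In practice `embeddings_by_person` would hold the output vectors of the trained first neural network, one array per folder.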
Optionally, the obtaining the face data set includes: the photos of a plurality of public figures are downloaded from the internet to form a face data set, and the face data set comprises a plurality of photos related to the public figures.
Optionally, the preprocessing the face data set includes: and extracting the component of the YUV graph in the Y direction as the input of the neural network to be trained for the YUV graph in the face data set.
Optionally, the preprocessing the face data set includes: and carrying out random rotation, random turning and/or random cutting on the photos in the face data set.
Optionally, the performing, by the neural network to be trained, feature extraction on the input data includes: the neural network of the Inception-Resnet-v1 network structure is adopted to perform feature extraction on input data.
Optionally, the updating the weight value according to the network loss includes: a derivative of the network loss for each network weight value is calculated, and the weight values are updated according to the derivative and the learning rate.
Optionally, the second target value is less than or equal to the first target value.
In order to solve the above technical problem, an embodiment of the present invention further provides a face recognition method, including:
training a neural network to be trained by adopting the face recognition model training method;
and adopting the trained neural network to be trained to perform face recognition.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
After training until the network loss converges below a predetermined first target value, the weight values in the neural network to be trained that are smaller than a preset first threshold are set to 0, and training then continues until the network loss converges below a predetermined second target value. This lets the sparse-computation module in hardware take fuller effect, reducing the running time of the face recognition algorithm and accelerating face recognition.
Furthermore, a newly designed algorithm is adopted to calculate the network loss. It combines the ArcFace loss function with the Center loss function, so that the intra-class distribution becomes tighter, the false detection rate is reduced, and the accuracy of face detection is improved.
Further, on the basis of the improved network loss algorithm, an L1 regularization term over the weight values is added, which further increases the sparsity of the weights and speeds up face recognition.
Drawings
FIG. 1 is a flow chart of a face recognition model training method in an embodiment of the present invention;
FIG. 2 is a flow chart of a training process of a face recognition algorithm in the prior art;
fig. 3 is a training flow chart of the deep learning face recognition algorithm in the embodiment of the present invention.
Detailed Description
As analyzed in the Background section, the artificial neural networks used for face recognition in the prior art often suffer from high false detection rates and low detection speeds.
In the present invention, after training until the network loss converges below a predetermined first target value, the weight values in the neural network to be trained that are smaller than a preset first threshold are set to 0, and training then continues until the network loss converges below a predetermined second target value. At the same time, the network loss is calculated with a newly designed algorithm that combines the ArcFace loss function with the Center loss function and adds an L1 regularization term over the weight values, so that the trained face detection neural network achieves higher detection accuracy and a higher detection speed.
In order that those skilled in the art will better understand and realize the present invention, the following detailed description is given by way of specific embodiments with reference to the accompanying drawings.
Example one
As described below, an embodiment of the present invention provides a face recognition model training method.
Referring to a flow chart of a face recognition model training method shown in fig. 1, the following detailed description is made through specific steps:
and S101, obtaining a face data set.
The face data set is used for training a neural network to be trained for face recognition.
In some embodiments, a local camera and OpenCV may be employed to acquire the face data set. Specifically, a Python script is written that starts the local computer's camera, performs local face detection, and stores the face crops obtained by face detection in a local folder.
For a face sample of a volunteer, it is necessary to collect photographs meeting the following conditions: different lighting conditions, different times of day, and/or different expressions.
In other embodiments, a public face data set may be downloaded from the internet. Commonly used data sets include LFW, CelebFaces, VGG-Face, MegaFace and the like; each contains hundreds of thousands or even millions of face images and is an effective data set for face recognition training. These data sets contain considerable noise, however, so in this embodiment an automatic method is used to clean the noise out of the known data set.
After downloading the public face data set from the internet, data cleaning can be performed on it. The data cleaning of the face data set may specifically include the following steps:
acquiring a first neural network, wherein the first neural network is a trained face recognition neural network;
inputting a face data set into a first neural network, wherein photos of the same person are put into the same folder for inputting, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
respectively calculating the average value of the output vectors of all the photos in each folder;
calculating the distance between the output vector of each photo and the average value of the output vectors of all the photos in the folder where the photo is located, and taking the distance as a first distance;
and deleting the photos with the first distance larger than a preset second threshold value.
By the data cleaning method, the signal-to-noise ratio can be effectively increased, and the data set can be automatically cleaned.
In other embodiments, multiple photographs of public figures may be downloaded from the internet to form a face data set containing multiple photographs of those public figures. Specifically, pictures of a number of public figures are obtained from a publicly available platform such as Baidu Images; the face region in each picture is then cropped by a face detection network and resized to the input size required by the network.
S102, preprocessing the face data set to obtain input data.
In some embodiments, the preprocessing the face data set may include: and extracting the component of the YUV graph in the Y direction as the input of the neural network to be trained for the YUV graph in the face data set.
In further embodiments, the preprocessing the face data set may include: and carrying out random rotation, random turning and/or random cutting on the photos in the face data set.
The appearance of an image is affected by the intensity of the ambient light, the reflectance of objects, and the camera used. To keep the photographs from being dominated by these external factors, the input image is whitened: the pixel value distribution of the picture is converted to zero mean and unit variance.
For applications that must run on deep learning accelerators, further reducing the input size lowers the computational load, and a single-channel picture suffices. Therefore, for an input YUV image, only the component in the Y direction is extracted as the input to the network, which reduces the data volume of the input-layer picture.
Meanwhile, data enhancement is carried out on the input face recognition training picture, and the data enhancement mode comprises the following steps:
random rotation, i.e., rotating the input picture by a random angle;
random flipping, i.e., flipping the input picture left-right or up-down at random;
random cropping, i.e., cropping the picture at a random position and then resizing it back to the original picture size.
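A minimal sketch of this preprocessing, using only numpy; the array shapes, the crop fraction, and the restriction of rotation to 90-degree steps (to avoid an interpolation dependency) are assumptions for illustration:

```python
import numpy as np

def preprocess(yuv_image):
    """Keep only the Y (luminance) component of an HxWx3 YUV image and
    whiten it to zero mean and unit variance."""
    y = yuv_image[..., 0].astype(np.float64)
    return (y - y.mean()) / (y.std() + 1e-8)

def augment(img, rng):
    """Random flip, random rotation, and random crop resized back to the
    original size. Assumes a square input so np.rot90 preserves the shape."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                               # random horizontal flip
    if rng.random() < 0.5:
        img = np.rot90(img, k=int(rng.integers(0, 4)))   # random 90-degree rotation
    h, w = img.shape
    top = int(rng.integers(0, h // 8 + 1))               # random crop offsets
    left = int(rng.integers(0, w // 8 + 1))
    crop = img[top:, left:]
    rows = np.arange(h) * crop.shape[0] // h             # nearest-neighbour resize back
    cols = np.arange(w) * crop.shape[1] // w
    return crop[np.ix_(rows, cols)]
```

A real pipeline would rotate by arbitrary angles with an image library; the structure of the steps is what matters here.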
S103, inputting the input data into the neural network to be trained, and extracting the characteristics of the input data by the neural network to be trained.
In some embodiments, the feature extraction of the input data by the neural network to be trained may include: the neural network of the Inception-Resnet-v1 network structure is adopted to perform feature extraction on input data.
During training, the input data prepared in step S102 is fed into the deep learning framework for layer-by-layer feature extraction, producing the output of the network.
Generally, a deep learning network is composed of the following structures: convolutional layers, pooling layers, fully connected layers, Dropout layers, and so on. Different combinations of these four structures yield different network architectures. The network structure adopted for feature extraction in this embodiment is the Inception-Resnet-v1 structure introduced by Google, but any newly introduced deep learning architecture that extracts features better is also supported.
And S104, calculating the network loss according to the output vector output by the neural network to be trained.
In some embodiments, the following formula may be used to calculate the network loss of the neural network to be trained:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2 + \sum_{k}\left\|W_k\right\|_1

where L denotes the network loss, N the number of training samples in each batch, s the radius of the hypersphere onto which face vectors are mapped, y_i the class (the person) of sample i, m the angular margin, \lambda a balance parameter that balances the weights of the added parts, x_i the deep feature vector of sample i, c_{y_i} the center of the deep features of class y_i, n the number of classes, \theta_j the angle between x_i and the weight vector of class j, k the index of a network layer, and \left\|W_k\right\|_1 the sum of the absolute values of all weights of layer k.
Step S103 yields the network output for a batch of input face pictures, but whether the result is correct, that is, whether the network is sufficiently trained, must be judged by calculating the loss value of the network; the smaller the loss, the better.
The loss function is a mathematical function for calculating network loss, and for face recognition, the currently common deep learning loss functions based on face recognition with prominent effect include the following:
1)Cross-Entropy Loss
Figure 25540DEST_PATH_IMAGE003
2)Additive-Margin Softmax Loss
Figure 163260DEST_PATH_IMAGE004
3)ArcFace Loss
Figure 828728DEST_PATH_IMAGE005
Models trained with these loss functions increase the distance between different persons, but the distribution within the same person is not tight enough. An overly dispersed intra-class distribution increases the false detection rate of face recognition, so the problem becomes how to reduce the distance within each class and make the intra-class distribution tighter.
The invention designs a new loss function and trains the face recognition algorithm with it, which reduces the intra-class distance and thereby the false detection rate of the face recognition algorithm.
ArcFace is a loss function with a prominent effect in current face recognition; it effectively increases the distance between different classes. The ArcFace loss function is:

L_{ArcFace} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}}
Research shows that models trained with this loss function still leave the intra-class distribution insufficiently tight; a dispersed intra-class distribution raises the false detection rate of face recognition, that is, person A is recognized as person B. To reduce the false detection rate, the invention adds a Center loss term on the basis of ArcFace:

L_C = \frac{1}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2

where c_{y_i} denotes the center point of the deep features of class y_i. During model training, when each batch completes, the deep features belonging to the same class are averaged to update the class center; this makes the intra-class distribution more compact and lowers the false detection rate. The improved ArcFace loss function is now:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2

where \lambda is a balance parameter; adjusting its value balances the weights of the two parts. If the intra-class distribution is to be made more compact, a larger \lambda value is needed.
Because the human face recognition model is based on a deep learning accelerator platform, the current deep learning network accelerator supports sparse matrix calculation. If we can train a network with higher sparsity, the network operation speed can be improved.
To increase the sparsity of the model, the loss function can be modified once more: adding an L1 regularization term over the weights further increases the degree of weight sparsity. The L1 regularization term is:

L_{reg} = \sum_{k}\left\|W_k\right\|_1

The final improved face recognition loss function is:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2 + \sum_{k}\left\|W_k\right\|_1

where L denotes the network loss, N the number of training samples in each batch, s the radius of the hypersphere onto which face vectors are mapped, y_i the class (the person) of sample i, m the angular margin, \lambda a balance parameter that balances the weights of the ArcFace and Center loss parts, x_i the deep feature vector of sample i, c_{y_i} the center of the deep features of class y_i, n the number of classes, \theta_j the angle between x_i and the weight vector of class j, k the index of a network layer, and \left\|W_k\right\|_1 the sum of the absolute values of all weights of layer k.
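A small numpy sketch of the final combined loss (ArcFace term plus lambda-weighted Center loss plus L1 over layer weights) may help clarify the three parts. The hyperparameter values, the fixed (non-updated) class centers, and the batch layout are all illustrative assumptions:

```python
import numpy as np

def face_loss(cos_theta, labels, features, centers, layer_weights,
              s=64.0, m=0.5, lam=0.01):
    """ArcFace term + (lam/2) * center-loss term + L1 term over layer weights.

    cos_theta: (N, n_classes) cosines between embeddings and class weights.
    features:  (N, d) deep feature vectors x_i; centers: (n_classes, d) c_y.
    """
    N = cos_theta.shape[0]
    rows = np.arange(N)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    logits = s * cos_theta
    logits[rows, labels] = s * np.cos(theta[rows, labels] + m)  # angular margin on true class
    logits = logits - logits.max(axis=1, keepdims=True)         # numerically stable softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    arcface = -log_prob[rows, labels].mean()
    center = 0.5 * lam * np.sum((features - centers[labels]) ** 2)
    l1 = sum(float(np.abs(w).sum()) for w in layer_weights)     # L1 regularization term
    return arcface + center + l1
```

A real implementation would compute this in an autodiff framework so the gradients in step S105 come for free, and would update the class centers after every batch.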
The above description of the technical solution shows that: in this embodiment, a newly designed algorithm is adopted to calculate the network loss. It combines the ArcFace loss function with the Center loss function, so that the intra-class distribution is tighter, the false detection rate is reduced, and the accuracy of face detection is improved.
Further, on the basis of the improved network loss algorithm, an L1 regularization term for the weight value is further added, so that the sparsity of the weight value is further increased, and the detection speed of face recognition is increased.
And S105, updating the weight value according to the network loss.
In some embodiments, updating the weight values according to the network loss may include: calculating the derivative of the network loss with respect to each network weight value, and updating each weight value according to its derivative and the learning rate.
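The update rule of step S105, weight minus derivative times learning rate, can be shown on a toy one-weight loss; the loss function and all numeric values here are illustrative:

```python
# Gradient-descent update: w <- w - learning_rate * dL/dw,
# demonstrated on the toy loss L(w) = (w - 3)^2, minimised at w = 3.
w, lr = 0.0, 0.1
for _ in range(100):
    grad = 2.0 * (w - 3.0)  # derivative of the loss with respect to the weight
    w -= lr * grad          # update by derivative times learning rate
# after enough steps, w has converged close to the minimiser
```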
S106, judging whether the network loss converges to be lower than a preset first target value or not.
If so, the process proceeds to the subsequent step S107.
If not, the above steps S103 to S105 are repeated until the network loss converges below the predetermined first target value.
And S107, setting a weight value smaller than a preset first threshold value in the neural network to be trained to be 0.
In a deep network accelerator, the network can be further compressed according to its sparsity, and acceleration can be realized with sparse matrix multiplication. A sparse matrix is one in which the number of zero parameters exceeds the number of non-zero parameters. At present, however, model training rarely takes on-device deployment into account, so the sparsity of the network is not increased during training.
Network pruning increases the sparsity of the network: a suitable threshold is chosen, and weight values smaller than that threshold are set to 0.
Following this principle, network pruning is applied to the trained face recognition network, yielding sparser network weight parameters.
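A minimal sketch of this pruning step; the threshold value and the list-of-arrays weight layout are assumptions:

```python
import numpy as np

def prune_weights(layer_weights, threshold):
    """Set every weight whose magnitude is below `threshold` to 0 (step S107)
    and report the resulting sparsity (fraction of zero parameters)."""
    pruned = [np.where(np.abs(w) < threshold, 0.0, w) for w in layer_weights]
    zeros = sum(int((w == 0).sum()) for w in pruned)
    total = sum(w.size for w in pruned)
    return pruned, zeros / total
```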
And S108, judging whether the network loss converges to be lower than a preset second target value or not.
If so, finishing the sparse network training to obtain the face recognition algorithm model.
If not, the above steps S103 to S105 are repeated until the network loss converges below a predetermined second target value.
The second target value is smaller than or equal to the first target value; that is, the pruning of step S107 followed by the continued training of step S108 reduces the network loss further, improving both the recognition accuracy and the recognition speed of the face recognition model.
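The two-phase schedule just described (train to the first target, prune, then fine-tune to the smaller second target) can be demonstrated end to end on a toy quadratic loss; every numeric value and the loss itself are illustrative stand-ins for the real network training:

```python
import numpy as np

def two_phase_train(w, lr, first_target, second_target, prune_threshold):
    """Toy version of steps S103-S108 with loss L(w) = sum(w**2)."""
    loss = lambda w: float(np.sum(w ** 2))
    while loss(w) >= first_target:      # phase 1: train until below the first target
        w = w - lr * 2.0 * w            # gradient step (dL/dw = 2w)
    w = np.where(np.abs(w) < prune_threshold, 0.0, w)   # prune small weights to 0
    while loss(w) >= second_target:     # phase 2: fine-tune to the second target
        w = w - lr * 2.0 * w            # pruned weights stay exactly zero
    return w
```

Note that the pruned coordinates receive zero gradient under this loss, so the sparsity created in the pruning step survives the fine-tuning phase.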
As shown in fig. 2, it is a training process of a face recognition algorithm in the prior art; fig. 3 shows a training flow of the face recognition algorithm for deep learning in this embodiment.
The above description of the technical solution shows that: in this embodiment, after training until the network loss converges below the predetermined first target value, the weight values in the neural network to be trained that are smaller than the preset first threshold are set to 0, and training then continues until the network loss converges below the predetermined second target value. This lets the sparse-computation module in hardware take fuller effect, reducing the running time of the face recognition algorithm and accelerating face recognition.
Example two
As described below, an embodiment of the present invention provides a face recognition method.
The difference from the prior art is that this face recognition method trains the neural network to be trained with the face recognition model training method provided in the first embodiment of the invention. Accordingly, after training until the network loss converges below a predetermined first target value, the method sets the weight values in the neural network to be trained that are smaller than the preset first threshold to 0, then continues training until the network loss converges below a predetermined second target value. This lets the sparse-computation module in hardware take fuller effect, reducing the running time of the face recognition algorithm and accelerating face recognition.
Those skilled in the art will understand that, in the methods of the embodiments, all or part of the steps can be performed by hardware associated with program instructions, and the program can be stored in a computer-readable storage medium, which can include: ROM, RAM, magnetic or optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A face recognition model training method is characterized by comprising the following steps:
a) obtaining a face data set, wherein the face data set is used for training a neural network to be trained for face recognition;
b) preprocessing a face data set to obtain input data;
c) inputting input data into a neural network to be trained, and extracting the characteristics of the input data by the neural network to be trained;
d) calculating the network loss according to the output vector output by the neural network to be trained, wherein the network loss of the neural network to be trained is calculated by adopting the following formula:

L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,\,j\neq y_i}^{n}e^{s\cos\theta_j}} + \frac{\lambda}{2}\sum_{i=1}^{N}\left\|x_i-c_{y_i}\right\|_2^2 + \sum_{k}\left\|W_k\right\|_1

where L denotes the network loss, N the number of training samples in each batch, s the radius of the hypersphere onto which face vectors are mapped, y_i the class (the person) of sample i, m the angular margin, \lambda a balance parameter that balances the weights of the added parts, x_i the deep feature vector of sample i, c_{y_i} the center of the deep features of class y_i, n the number of classes, \theta_j the angle between x_i and the weight vector of class j, k the index of a network layer, and \left\|W_k\right\|_1 the sum of the absolute values of all weights of layer k;
e) updating the weight value according to the network loss;
f) repeating the steps c, d and e until the network loss converges to be lower than a preset first target value;
g) setting a weight value smaller than a preset first threshold value in the neural network to be trained to be 0;
h) and c, repeatedly executing the steps c, d and e until the network loss converges to be lower than a preset second target value.
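The margin-based part of the loss in step d) can be illustrated numerically. The sketch below assumes an ArcFace-style additive angular margin, inferred from the symbols s (hypersphere radius), m (angle margin) and the class angles described in the claim; the exact formula appears only as images in the original filing, so this form is an assumption, and the λ-weighted sparsity term over the layer weights is omitted:

```python
import math

# Toy illustration of an additive-angular-margin softmax term (an assumed
# ArcFace-style form; the patent's exact formula is not recoverable from the
# text). The lambda-weighted sparsity term is omitted.
def margin_softmax_loss(cos_thetas, target, s=64.0, m=0.5):
    """cos_thetas[j]: cosine of the angle between a feature and class j's center."""
    theta = math.acos(cos_thetas[target])
    logits = [s * c for c in cos_thetas]
    logits[target] = s * math.cos(theta + m)   # add the angle margin to the true class
    mx = max(logits)                           # stabilise the softmax numerically
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[target] / sum(exps))

# the margin makes the same prediction cost more, forcing larger angular gaps
loss_with_margin = margin_softmax_loss([0.9, 0.2, -0.1], target=0, s=8.0, m=0.5)
loss_no_margin = margin_softmax_loss([0.9, 0.2, -0.1], target=0, s=8.0, m=0.0)
```

Because the true-class logit is shrunk by the margin, a sample must be separated from other classes by more than m radians before its loss approaches zero.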
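Steps c) through h) can be sketched as a minimal runnable skeleton. A toy one-dimensional linear model stands in for the neural network; the pruning threshold, the loss targets, and the choice to keep pruned weights frozen at 0 during the second phase are illustrative assumptions, not taken from the claim:

```python
import random

# Runnable skeleton of steps c)-h): train until the loss converges below a
# first target, set small weights to 0, then continue training to a second
# target. A toy linear model replaces the neural network; threshold and
# targets are illustrative.
random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(4)]
sample = ([1.0, 0.5, -0.2, 0.05], 1.3)          # one (features, label) pair

def train_until(w, target, lr=0.1, max_steps=10000):
    """Steps c-e repeated (steps f/h): update weights until loss < target."""
    x, y = sample
    for _ in range(max_steps):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss = err * err
        if loss < target:
            return w, loss
        # gradient of squared error; pruned (zero) weights are left untouched
        w = [wi if wi == 0.0 else wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    return w, loss

weights, loss1 = train_until(weights, target=1e-4)            # step f: first target
weights = [0.0 if abs(wi) < 0.25 else wi for wi in weights]   # step g: prune
weights, loss2 = train_until(weights, target=1e-6)            # step h: second target
print(weights.count(0.0) >= 1, loss2 < 1e-6)  # True True
```

The second training phase recovers the accuracy lost by zeroing small weights while preserving the sparsity they provide.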
2. The training method of a face recognition model according to claim 1, wherein the obtaining the face data set comprises: acquiring the face data set by using a local camera and OpenCV.
3. The training method of a face recognition model according to claim 1, wherein the obtaining the face data set comprises: downloading a public face data set from the internet.
4. The training method of a face recognition model according to claim 3, wherein the obtaining the face data set further comprises: performing data cleaning on the face data set; the data cleaning of the face data set comprises the following steps:
acquiring a first neural network, wherein the first neural network is a trained face recognition neural network;
inputting the face data set into the first neural network, wherein photos of the same person are placed in the same folder for input, and different folders correspond to photos of different persons;
for each photo, the first neural network outputs an output vector;
calculating the average value of the output vectors of all the photos in each folder;
for each photo, calculating the distance between its output vector and the average value of the output vectors of all the photos in the folder where the photo is located, and taking the distance as a first distance;
deleting the photos whose first distance is larger than a preset second threshold value.
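The cleaning rule of claim 4 can be sketched as follows; the toy 2-D embeddings stand in for the first neural network's output vectors, and the threshold value is illustrative:

```python
import math

# Sketch of the data-cleaning rule of claim 4: embed every photo in a folder,
# average the embeddings, and delete photos whose distance to that average
# exceeds a threshold.
def clean_folder(embeddings, threshold):
    dims = range(len(embeddings[0]))
    mean = [sum(e[d] for e in embeddings) / len(embeddings) for d in dims]
    # keep the indices whose "first distance" to the folder mean is small enough
    return [i for i, e in enumerate(embeddings) if math.dist(e, mean) <= threshold]

# three consistent photos of one person plus one mislabeled outlier
folder = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]]
print(clean_folder(folder, threshold=2.0))  # [0, 1, 2] - the outlier is dropped
```

This removes mislabeled or low-quality photos that would otherwise pull the class center away from the person's true appearance.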
5. The training method of a face recognition model according to claim 1, wherein the obtaining the face data set comprises: downloading photos of a plurality of public figures from the internet to form the face data set, wherein the face data set comprises a plurality of photos related to the public figures.
6. The training method of a face recognition model according to claim 1, wherein the preprocessing the face data set comprises: for a YUV image in the face data set, extracting its Y component as the input of the neural network to be trained.
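A minimal sketch of the Y-component extraction of claim 6, modelling pixels as (Y, U, V) tuples for clarity; a real pipeline would slice a planar or packed YUV buffer instead:

```python
# Sketch of the preprocessing of claim 6: for an image already in YUV format,
# keep only the luminance (Y) plane as network input and discard U and V.
def y_plane(yuv_image):
    """yuv_image: 2-D list of (Y, U, V) tuples -> 2-D list of Y values."""
    return [[y for (y, _u, _v) in row] for row in yuv_image]

img = [[(16, 128, 128), (235, 110, 140)],
       [(81, 90, 240), (145, 54, 34)]]
print(y_plane(img))  # [[16, 235], [81, 145]]
```

Using only the luminance plane cuts the input to one channel, which suits grayscale-oriented face features and reduces computation.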
7. The training method of a face recognition model according to claim 1, wherein the preprocessing the face data set comprises: performing random rotation, random flipping and/or random cropping on the photos in the face data set.
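The augmentations of claim 7 can be sketched on a toy 2-D image as a random horizontal flip plus a random crop; arbitrary-angle rotation needs interpolation and is omitted here, and the crop size and probability are illustrative:

```python
import random

# Sketch of the augmentations of claim 7 on a toy 2-D "image" (list of rows):
# a random horizontal flip followed by a random crop.
def augment(image, crop_h, crop_w, rng=random):
    if rng.random() < 0.5:                            # random horizontal flip
        image = [row[::-1] for row in image]
    top = rng.randrange(len(image) - crop_h + 1)      # random crop origin
    left = rng.randrange(len(image[0]) - crop_w + 1)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

rng = random.Random(42)
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = augment(img, crop_h=2, crop_w=2, rng=rng)   # a 2x2 window of the (possibly flipped) image
```

Such augmentation multiplies the effective size of the face data set and makes the trained model less sensitive to pose and framing.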
8. The training method of the face recognition model according to claim 1, wherein the feature extraction of the input data by the neural network to be trained comprises: performing feature extraction on the input data by using a neural network with the Inception-Resnet-v1 network structure.
9. The method of training a face recognition model according to claim 1, wherein the updating the weight values according to the network loss comprises: calculating the derivative of the network loss with respect to each network weight value, and updating the weight values according to the derivative and the learning rate.
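The weight update of claim 9 is ordinary gradient descent, w ← w − lr · dL/dw; a short numeric illustration with made-up values:

```python
# Numeric illustration of claim 9: each weight moves against the derivative of
# the network loss with respect to that weight, scaled by the learning rate.
# The numbers are illustrative.
def sgd_step(weights, grads, lr=0.01):
    return [w - lr * g for w, g in zip(weights, grads)]

w = [0.5, -0.2]
g = [2.0, -1.0]               # dL/dw for each weight
print(sgd_step(w, g))         # approximately [0.48, -0.19]
```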
10. The face recognition model training method of claim 1, wherein the second target value is less than or equal to the first target value.
11. A face recognition method, comprising:
training a neural network to be trained by adopting the face recognition model training method of any one of claims 1 to 10;
performing face recognition by using the trained neural network.
CN202011614220.5A 2020-12-31 2020-12-31 Face recognition method and model training method thereof Active CN112464916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011614220.5A CN112464916B (en) 2020-12-31 2020-12-31 Face recognition method and model training method thereof


Publications (2)

Publication Number Publication Date
CN112464916A true CN112464916A (en) 2021-03-09
CN112464916B CN112464916B (en) 2023-09-19

Family

ID=74802152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011614220.5A Active CN112464916B (en) 2020-12-31 2020-12-31 Face recognition method and model training method thereof

Country Status (1)

Country Link
CN (1) CN112464916B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887538A (en) * 2021-11-30 2022-01-04 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
US20190220746A1 (en) * 2017-08-29 2019-07-18 Boe Technology Group Co., Ltd. Image processing method, image processing device, and training method of neural network
CN110197099A (en) * 2018-02-26 2019-09-03 腾讯科技(深圳)有限公司 The method and apparatus of across age recognition of face and its model training
CN110414432A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, object identifying method and the corresponding device of Object identifying model
CN110942108A (en) * 2019-12-13 2020-03-31 深圳大学 Face image clustering method and device and computer readable storage medium
CN111783698A (en) * 2020-07-06 2020-10-16 周书田 Method for improving training stability of face recognition model
CN111967392A (en) * 2020-08-18 2020-11-20 广东电科院能源技术有限责任公司 Face recognition neural network training method, system, equipment and storage medium
CN111985310A (en) * 2020-07-08 2020-11-24 华南理工大学 Training method of deep convolutional neural network for face recognition


Non-Patent Citations (3)

Title
周国飞: "Design of a deep neural network accelerator supporting sparse convolution", Electronic Technology & Software Engineering *
赵文忠: "Loss functions in face recognition", Electronic Technology & Software Engineering *
龚锐; 丁胜; 章超华; 苏浩: "Lightweight and multi-pose face recognition method based on deep learning", Journal of Computer Applications, no. 03




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant