CN114882558B - Learning scene real-time identity authentication method based on face recognition technology - Google Patents
- Publication number
- CN114882558B (application CN202210471987.XA / CN202210471987A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/31—User authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
Abstract
A learning-scene real-time identity authentication method based on face recognition technology is composed of two modules: face detection and face recognition. The face detection module adopts MobileNetV1_0.25 as the backbone network to extract face features, adopts the ATSS method to sample positive examples, and adopts Generalized Focal Loss to optimize the classification and regression branches. The face recognition module extracts features from aligned faces with the lightweight backbone network MobileFaceNet, and uses the MV-Softmax loss function to adaptively re-weight mis-classified samples, enhancing the discrimination of the features and mining error-prone samples. The invention provides a real-time face recognition method for authenticating learners' identities in a learning scene with both high precision and high speed.
Description
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a learning scene real-time identity authentication method based on the face recognition technology.
Background
Before entering a learning scene such as a library or an examination room, a learner's identity information must be obtained, and whether the learner is permitted to enter is judged from that information. With the development of deep neural networks and surveillance equipment, the cost of acquiring surveillance video has dropped. Identifying learners in learning scenes by combining surveillance video data with computer vision techniques is challenging, has broad application scenarios and research value, and is attracting increasing attention from academia and industry.
Mainstream face detection and face recognition methods are developing rapidly, and their accuracy on public datasets keeps rising, but they fail to balance accuracy against speed. In the task of authenticating learner identity in educational scenes, identities must be authenticated both accurately and in real time.
Therefore, providing an identity authentication method that achieves both precision and speed is a technical problem that currently needs to be solved in educational scenes.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide the learning scene real-time identity authentication method based on the face recognition technology, which has the advantages of simple method, high recognition speed and high recognition precision.
The technical scheme adopted for solving the technical problems is as follows: the learning scene real-time identity authentication method based on the face recognition technology comprises the following steps:
s1, data preprocessing
Collect a video data set of learners in learning scenes, and generate from it a learning-scene face detection data set FDL and a learning-scene face recognition data set FRL. Take the original pictures from FDL, divide them into a training set and a test set at a certain ratio, and resize the pictures to a uniform size; take the original pictures from FRL, divide them into a training set and a test set at a certain ratio, and resize the pictures to a uniform size;
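The train/test split of step S1 can be sketched as follows; the function name, the 0.9 default ratio and the seed are illustrative choices, not prescribed by the method:

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle and split a list of picture paths into train / test sets.

    The ratios 9:1, 8:2 and 7:3 mentioned in the method correspond to
    train_ratio = 0.9, 0.8 and 0.7; the seed keeps the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# hypothetical file names standing in for frames of the learner video set
pictures = [f"frame_{i:04d}.jpg" for i in range(100)]
train, test = split_dataset(pictures, train_ratio=0.9)
```

The same helper covers both FDL and FRL, since only the ratio and the picture list differ.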
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch;
1) Construction of backbone branches
Input each picture in the training set of the learning-scene face detection data set FDL into the MobileNetV1-0.25 backbone network to extract feature maps, obtaining the basic feature map set {C1, C2, C3, C4, C5};
2) Construction of feature fusion branches
Input the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain the fused feature map set {P3, P4, P5, P6, P7}; input the fused feature maps {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features of different scales, obtaining the enhanced feature map set {N3, N4, N5, N6, N7};
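The top-down fusion performed by a feature pyramid can be illustrated with a minimal single-channel sketch; a real FPN first applies 1×1 convolutions to align channels and uses learned or interpolated upsampling, both of which are simplified away here:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour 2x upsampling (stand-in for interpolation)
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(c_maps):
    """Top-down pyramid merge over single-channel maps, coarse to fine:
    each level adds the 2x-upsampled next-coarser level to its own map."""
    p = [None] * len(c_maps)
    p[-1] = c_maps[-1]
    for i in range(len(c_maps) - 2, -1, -1):
        p[i] = c_maps[i] + upsample2x(p[i + 1])
    return p

# three toy pyramid levels, each half the resolution of the previous
c3, c4, c5 = np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))
p3, p4, p5 = fpn_merge([c3, c4, c5])
```

Each output level keeps its input resolution while accumulating semantic information from all coarser levels.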
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, constructing a regression branch, wherein the specific method is as follows:
First, on each feature map in the enhanced set {N3, N4, N5, N6, N7}, set at every pixel an anchor with area taken from [64, 128, 256, 512, 1024] and aspect ratio 1; then apply t convolution operations to each feature map in {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × a(n+1), where H is the feature map height, W is the feature map width, a is the number of anchor-box regression values, and n is the maximum of the integral set;
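In Generalized Focal Loss, the a(n+1) regression channels represent each of the a = 4 box sides as a discrete distribution over the integers 0..n, and the regressed offset is recovered as that distribution's expectation. A sketch of the decoding (the name `decode_dfl` is illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decode_dfl(logits, n=16):
    """Decode regression logits of shape (..., 4, n + 1): each of the
    4 box sides is a distribution over the integers 0..n, and the
    regressed offset is that distribution's expectation."""
    probs = softmax(logits)
    return (probs * np.arange(n + 1)).sum(axis=-1)

logits = np.full((4, 17), -10.0)   # 4 sides, n + 1 = 17 bins
logits[:, 5] = 10.0                # nearly all probability mass on bin 5
offsets = decode_dfl(logits)       # each side decodes to roughly 5
```

With n = 16 and a = 4, this matches the H × W × a(n+1) = H × W × 68 regression output described above.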
secondly, constructing a classification branch, wherein the specific method is as follows:
Apply t convolution operations to each feature map in the enhanced set {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × C, where C is the number of detection classes;
finally, determining the loss L of the face detection network:
L = L_cls + L_box + L_dfl
wherein L_cls is the classification loss, L_box is the predicted-box regression loss, and L_dfl is the distribution focal loss;
Optimize the feature vectors extracted by the backbone network branch and the feature fusion branch through the classification loss, predicted-box regression loss and distribution loss; back-propagate and update the network parameters with a stochastic gradient descent algorithm;
s3, constructing a face recognition network
1) Construction of backbone branches
Extract features from each picture in the training set of the learning-scene face recognition data set FRL with the MobileFaceNet network, generating a 512-dimensional feature vector for each face picture;
2) Optimizing feature vectors extracted from backbone network branches through an MV-Softmax loss function, enhancing the distinction of different features and carrying out back propagation;
the face recognition network loss is defined by the MV-Softmax loss function as follows:
L = -log( e^(s·f(m, θ_wy,x)) / ( e^(s·f(m, θ_wy,x)) + Σ_(k≠y) h(t, θ_wk,x, I_k)·e^(s·cos θ_wk,x) ) )
wherein k denotes the k-th class, x is the vector to be classified, s is the scale hyper-parameter, w is the weight vector, y is the ground-truth class, m is the minimum margin on the target angle, p_y is the predicted posterior probability, g(p_y) is the function used to mine hard samples, I_k is an indicator function that dynamically specifies whether the sample is mis-classified with respect to class k, h(t, θ_wk,x, I_k) is the weighting function applied to mis-classified samples, and θ_wk,x is the angle between the weight vector w_k and the vector to be classified;
s4, training a face detection network and a face recognition network
The method for training the face detection network comprises the following steps:
Input the training set of the learning-scene face detection data set FDL, the corresponding label files, and weights pre-trained on ImageNet into the face detection network for training to obtain a face detection model. During training, set an initial learning rate R and optimize the model with an SGD optimizer; each iteration uses B1 pictures, the total number of training epochs is E1, and the learning rate decays to 10^-1·R and 10^-2·R after K11 and K12 epochs;
The method for training the face recognition network comprises the following steps:
Input the training set of the learning-scene face recognition data set FRL, the corresponding label files, and weights pre-trained on ImageNet into the face recognition network for training to obtain a face recognition model. During training, set the initial learning rate to D and optimize the model with an SGD optimizer; each iteration uses B2 pictures, the total number of training epochs is E2, and the learning rate decays to 10^-1·D, 10^-2·D and 10^-3·D after K21, K22 and K23 epochs;
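The stepwise decay R → 0.1R → 0.01R (and likewise D → 0.1D → 0.01D → 0.001D) amounts to multiplying the learning rate by 0.1 at each milestone epoch. A sketch, with placeholder milestone values since K11/K12 and K21/K22/K23 are left unspecified at this step:

```python
def step_lr(initial_lr, epoch, milestones):
    """Piecewise-constant schedule: multiply the learning rate by 0.1
    once each milestone epoch has been reached. The milestone epochs
    below are placeholders, not values fixed by the method."""
    lr = initial_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= 0.1
    return lr

# illustrative schedules for the two networks
detection_lrs = [step_lr(0.01, e, [16, 22]) for e in range(24)]
recognition_lrs = [step_lr(0.1, e, [10, 14, 16]) for e in range(18)]
```

In practice this is what a standard multi-step scheduler wrapped around the SGD optimizer computes.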
S5, saving the weight file
The face detection model and the face recognition model both store optimized weight files after F rounds of iteration;
s6, testing real-time identity authentication
1) Face detection test
Input the saved face detection weight file, the test data set and the corresponding label files into the face detection network for testing; post-process the network output with non-maximum suppression and a confidence-threshold method to obtain the face detection results, and use the ground-truth and predicted target boxes to compute the mAP value, quantitatively evaluating the accuracy and speed of the face detection results;
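The non-maximum-suppression post-processing can be sketched as greedy IoU-based filtering; the two thresholds below are illustrative assumptions, as the method does not state their values:

```python
import numpy as np

def box_area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou(box, boxes):
    # IoU of one (x1, y1, x2, y2) box against an array of boxes
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)

def nms(boxes, scores, iou_thr=0.4, score_thr=0.5):
    """Greedy non-maximum suppression plus confidence filtering:
    keep the highest-scoring box, drop overlapping boxes, repeat."""
    order = np.argsort(scores)[::-1]
    order = order[scores[order] >= score_thr]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```

Here the second box heavily overlaps the first and is suppressed, while the distant third box survives.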
2) Face recognition test
Input the saved face recognition weight file, the test data set and the corresponding label files into the face recognition network for testing to obtain the recognition results; compute the cosine similarity between the real face label and the predicted face label, and quantitatively evaluate the accuracy of the recognition results with a confidence-threshold method.
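A sketch of threshold-based authentication by cosine similarity; the gallery structure and helper names are assumptions for illustration, and the 0.8 threshold follows the confidence threshold used in embodiment 1:

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(query, gallery, threshold=0.8):
    """Compare a query face embedding against enrolled embeddings and
    return (learner_id, similarity), or (None, similarity) when the
    best match falls below the confidence threshold."""
    best_id, best_sim = None, -1.0
    for learner_id, emb in gallery.items():
        sim = cosine_similarity(query, emb)
        if sim > best_sim:
            best_id, best_sim = learner_id, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

# 2-d toy embeddings standing in for the 512-d MobileFaceNet vectors
gallery = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
accepted = authenticate(np.array([0.9, 0.1]), gallery)
rejected = authenticate(np.array([0.5, 0.5]), gallery)
```

A query close to an enrolled vector is accepted with that identity; an ambiguous query falls below the threshold and is rejected.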
As a preferred technical solution, in step S1 the ratio of training set to test set of the face detection data set FDL in the learning scenario is 9:1, 8:2 or 7:3, and the pictures are uniformly resized to 416 × 416 by bilinear interpolation; the ratio of training set to test set of the face recognition data set FRL in the learning scenario is 9:1, 8:2 or 7:3, and the pictures are uniformly resized to 112 × 112 by bilinear interpolation.
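The bilinear resizing to a uniform size can be sketched for a single-channel image (sampling at pixel centres is an assumed convention; colour images would apply this per channel):

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D (single-channel) image by bilinear interpolation:
    each output pixel blends its four nearest input pixels, weighted
    by distance."""
    h, w = img.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# e.g. a cropped face patch resized to the 112 x 112 recognition input
face_crop = resize_bilinear(np.full((56, 56), 5.0), 112, 112)
```

In practice an image library's bilinear resize would be used; the sketch only makes the interpolation explicit.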
As a preferred technical solution, in the regression branch of step S2 each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations, where the optimal value of t is 4, the convolution kernel size is 3 × 3 and the stride is 1, yielding a feature map of size H × W × a(n+1); H is the feature map height, W is the feature map width, a is the number of anchor-box regression values with optimal value 4, and n is the maximum of the integral set with optimal value 16;
in the classification branch, each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations, where the optimal value of t is 4, the convolution kernel size is 3 × 3 and the stride is 1, yielding a feature map of size H × W × C; C is the number of detection classes, with optimal value 1.
As a preferred technical solution, the face detection network in step S4 is trained with an initial learning rate R of 0.01; during training, each iteration uses B1 = 12 pictures, the number of epochs E1 is 24, and the learning rate decays to 0.001 and 0.0001 after K11 and K12 epochs;
the face recognition network is trained with an initial learning rate D of 0.1; each iteration uses B2 = 512 pictures, the number of epochs E2 is 18, and the learning rate decays to 0.01, 0.001 and 0.0001 after K21, K22 and K23 epochs.
As a preferred technical solution, the face detection network in step S4 is trained with an initial learning rate R of 0.05; during training, each iteration uses B1 = 12 pictures, the number of epochs E1 is 24, and the learning rate decays to 0.005 and 0.0005 after K11 and K12 epochs;
the face recognition network is trained with an initial learning rate D of 0.5; each iteration uses B2 = 512 pictures, the number of epochs E2 is 18, and the learning rate decays to 0.05, 0.005 and 0.0005 after K21, K22 and K23 epochs.
As a preferable technical solution, the iteration round F in step S5 is 1.
The beneficial effects of the invention are as follows:
the invention adopts a lightweight backbone network to reduce the parameter number and detection speed of the network, adopts a scale balance pyramid module to fully integrate the detail information of the shallow layer and the semantic information of the deep layer, and adopts an ATSS sampling method and GFL Loss to optimize the detection precision. Compared with the prior art, the method has the advantages of simplicity, high positioning speed, high positioning precision and the like, and can detect dense face pictures in the monitoring video of the educational scene in real time.
Drawings
Fig. 1 is a flow chart of embodiment 1 of the present invention.
Fig. 2 is four pictures in the face detection data set FDL in the learning scene of embodiment 1.
Fig. 3 is eight pictures in the face recognition data set FRL in the learning scenario of embodiment 1.
Fig. 4 is a schematic structural diagram of a face detection network according to embodiment 1 of the present invention.
Fig. 5 is a schematic structural diagram of a face recognition network according to embodiment 1 of the present invention.
Fig. 6 is a diagram of the face detection result of the present invention.
Fig. 7 is a graph of the recognition result of fig. 2 according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the following embodiments.
Example 1
Taking monitoring video data from an educational scenario as an example, the learning scenario real-time identity authentication method based on face recognition technology of the present embodiment, as shown in fig. 1, includes the following steps:
s1, data preprocessing
Collect a video data set of learners in learning scenes, and generate from it a learning-scene face detection data set FDL, shown in fig. 2, and a learning-scene face recognition data set FRL, shown in fig. 3. The original pictures from FDL are divided into a training set and a test set at a ratio of 9:1 (8:2 or 7:3 may also be used), and the pictures are uniformly resized to 416 × 416 by bilinear interpolation; the original pictures from FRL are divided into a training set and a test set at a ratio of 9:1 (8:2 or 7:3 may also be used), and the pictures are uniformly resized to 112 × 112 by bilinear interpolation;
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch, as shown in fig. 4;
1) Construction of backbone branches
Input each picture in the training set of the learning-scene face detection data set FDL into the MobileNetV1-0.25 backbone network to extract feature maps, obtaining the basic feature map set {C1, C2, C3, C4, C5};
2) Construction of feature fusion branches
Input the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain the fused feature map set {P3, P4, P5, P6, P7}; input the fused feature maps {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features of different scales, obtaining the enhanced feature map set {N3, N4, N5, N6, N7};
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, constructing a regression branch, wherein the specific method is as follows:
First, on each feature map in the enhanced set {N3, N4, N5, N6, N7}, set at every pixel an anchor with area taken from [64, 128, 256, 512, 1024] and aspect ratio 1; then apply t convolution operations to each feature map in {N3, N4, N5, N6, N7}, where t is 4, the convolution kernel size is 3 × 3 and the stride is 1, obtaining a feature map of size H × W × a(n+1); H is the feature map height, W is the feature map width, a is the number of anchor-box regression values, with value 4, and n is the maximum of the integral set, with value 16;
secondly, constructing a classification branch, wherein the specific method is as follows:
Apply t convolution operations to each feature map in the enhanced set {N3, N4, N5, N6, N7}, where t is 4, the convolution kernel size is 3 × 3 and the stride is 1, obtaining a feature map of size H × W × C, where C is the number of detection classes, with value 1;
finally, determining the loss L of the face detection network:
L = L_cls + L_box + L_dfl
wherein L_cls is the classification loss, L_box is the predicted-box regression loss, and L_dfl is the distribution focal loss;
Optimize the feature vectors extracted by the backbone network branch and the feature fusion branch through the classification loss, predicted-box regression loss and distribution loss; back-propagate and update the network parameters with a stochastic gradient descent algorithm;
s3, constructing a face recognition network
The face recognition network consists of backbone network branches and loss functions, wherein the backbone network branches are responsible for extracting face features, and the loss functions are responsible for optimizing the features, as shown in fig. 5;
1) Construction of backbone branches
Extract features from each picture in the training set of the learning-scene face recognition data set FRL with the MobileFaceNet network, generating a 512-dimensional feature vector for each face picture;
2) Optimizing feature vectors extracted from backbone network branches through an MV-Softmax loss function, enhancing the distinction of different features and carrying out back propagation;
the face recognition network loss is defined by the MV-Softmax loss function as follows:
L = -log( e^(s·f(m, θ_wy,x)) / ( e^(s·f(m, θ_wy,x)) + Σ_(k≠y) h(t, θ_wk,x, I_k)·e^(s·cos θ_wk,x) ) )
wherein k denotes the k-th class, x is the vector to be classified, s is the scale hyper-parameter, w is the weight vector, y is the ground-truth class, m is the minimum margin on the target angle, p_y is the predicted posterior probability, g(p_y) is the function used to mine hard samples, I_k is an indicator function that dynamically specifies whether the sample is mis-classified with respect to class k, h(t, θ_wk,x, I_k) is the weighting function applied to mis-classified samples, and θ_wk,x is the angle between the weight vector w_k and the vector to be classified;
s4, training a face detection network and a face recognition network
The method for training the face detection network comprises the following steps:
Input the training set of the learning-scene face detection data set FDL, the corresponding label files, and weights pre-trained on ImageNet into the face detection network for training to obtain a face detection model. During training, the initial learning rate R is set to 0.01 and the model is optimized with an SGD optimizer; each iteration uses B1 = 12 pictures, the total number of epochs E1 is 24, and the learning rate decays to 0.001 and 0.0001 after K11 and K12 epochs;
the method for training the face recognition network comprises the following steps:
Input the training set of the learning-scene face recognition data set FRL, the corresponding label files, and weights pre-trained on ImageNet into the face recognition network for training to obtain a face recognition model. During training, the initial learning rate D is set to 0.1 and the model is optimized with an SGD optimizer; each iteration uses B2 = 512 pictures, the total number of epochs E2 is 18, and the learning rate decays to 0.01, 0.001 and 0.0001 after K21, K22 and K23 epochs;
s5, saving the weight file
The face detection model and the face recognition model store optimized weight files after F=1 rounds of iteration;
s6, testing real-time identity authentication
1) Face detection test
Input the saved face detection weight file, the test data set and the corresponding label files into the face detection network for testing, and post-process the network output with non-maximum suppression and a confidence-threshold method to obtain the face detection results, each comprising the center point, length and width of a detection box. The mean average precision (mAP), computed from the ground-truth and predicted target boxes, is 0.888, and the detection speed reaches 50.5 FPS, as shown in fig. 6;
2) Face recognition test
Input the saved face recognition weight file, the test data set and the corresponding label files into the face recognition network for testing to obtain the recognition results, which include each learner's name, student number and major, as shown in fig. 7. Compute the cosine similarity between the real face label and the predicted face label; with the confidence threshold set to 0.8, results whose cosine similarity exceeds 0.8 are considered correctly recognized, giving an accuracy of 94.27%.
Example 2
In this embodiment, the initial learning rate R in the training of the face detection network in step S4 is set to 0.05; during training, each iteration uses B1 = 12 pictures, the number of epochs E1 is 24, and the learning rate decays to 0.005 and 0.0005 after K11 and K12 epochs. The initial learning rate D in the training of the face recognition network is set to 0.5; each iteration uses B2 = 512 pictures, the number of epochs E2 is 18, and the learning rate decays to 0.05, 0.005 and 0.0005 after K21, K22 and K23 epochs.
Other operation steps were the same as in example 1.
Test 1
To verify the beneficial effects of the invention, the inventors compared the learning-scene real-time identity authentication method of embodiment 1 with prior-art face detection and recognition methods in an identity authentication experiment on educational-scene surveillance video; the comparison results are shown in Tables 1 and 2 below;
TABLE 1 comparison of real-time face detection model and mainstream model
Wherein: RetinaFace is a single-stage, multi-task face detection model that uses anchors of different scales on different feature levels, incorporates independent context modules, and uses deformable convolution in the cross-layer connections; TinaFace is also a single-stage face detection model that introduces an IoU prediction branch, proposes using the Inception module to enhance features, and uses DIoU as the regression loss.
Table 2 comparison results of real-time face recognition model and mainstream model
Wherein: ResNet is a classical classification network; by adding skip connections to its convolution blocks it effectively alleviates gradient explosion and vanishing during training, greatly increasing the feasible depth of convolutional neural networks. EfficientNet uses neural architecture search to jointly choose the depth, width and input resolution of the network, achieving higher accuracy with far fewer parameters than hand-designed classification networks. HRNet is a powerful backbone network that maintains high-resolution features throughout, performing better on small targets and fine-grained data. GhostNet addresses the feature redundancy of ordinary convolution modules by generating a series of cheap feature maps from a few base feature maps via linear transformations, retaining good accuracy at an extremely low parameter count.
Analysis of experimental results:
The face detection model maintains a high detection speed and a small parameter count while meeting high detection-accuracy requirements, striking a good balance relative to TinaFace and RetinaFace.
The accuracy of the face recognition model is on par with large backbones such as ResNet and HRNet, and MV-Softmax as the loss function has better overall performance than the other loss functions, so the face recognition model holds a clear advantage in parameter count and computation while meeting the accuracy requirement.
Claims (6)
1. The learning scene real-time identity authentication method based on the face recognition technology is characterized by comprising the following steps of:
s1, data preprocessing
Collecting a learning-scene learner video data set, and generating from it a learning-scene face detection data set FDL and a learning-scene face recognition data set FRL; taking the original pictures from the face detection data set FDL, dividing them into a training set and a test set at a given quantity ratio, and resizing the pictures to a uniform size; taking the original pictures from the face recognition data set FRL, dividing them into a training set and a test set at a given quantity ratio, and resizing the pictures to a uniform size;
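The split-and-resize step above can be sketched as follows; this is a minimal illustration, not the patented implementation, and the file names, 9:1 ratio (from claim 2), and resize call are assumptions:

```python
# Sketch of the S1 split step on placeholder file names (assumed layout).
import random

def split_dataset(image_paths, train_ratio=0.9, seed=0):
    """Shuffle and split image paths into train/test sets (e.g. 9:1)."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

images = [f"fdl/img_{i:04d}.jpg" for i in range(100)]   # hypothetical names
train, test = split_dataset(images, train_ratio=0.9)
# The pictures themselves would then be resized to a uniform size
# (e.g. 416x416 for detection, 112x112 for recognition) with bilinear
# interpolation, for instance PIL's Image.resize((416, 416), Image.BILINEAR).
```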
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch;
1) Construction of backbone branches
Inputting each picture in the training set of the learning-scene face detection data set FDL into a MobileNetV1-0.25 network to extract feature maps, obtaining a basic feature map set {C1, C2, C3, C4, C5};
2) Construction of feature fusion branches
Inputting the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain a fused feature map set {P3, P4, P5, P6, P7}; inputting the feature maps in the fused set {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features of different scales, obtaining an enhanced feature map set {N3, N4, N5, N6, N7};
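The top-down fusion inside a feature pyramid can be sketched in numpy as below; the 1x1 lateral convolutions are replaced by identity for brevity, equal channel counts across levels are assumed, and nearest-neighbour upsampling stands in for whatever interpolation the real network uses:

```python
# Minimal numpy sketch of top-down feature-pyramid fusion (step S2-2).
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_fuse(c_maps):
    """Fuse coarse-to-fine: P_i = C_i + upsample(P_{i+1})."""
    p_maps = [c_maps[-1]]                 # start from the coarsest level
    for c in reversed(c_maps[:-1]):
        p_maps.append(c + upsample2x(p_maps[-1]))
    return p_maps[::-1]                   # return in fine-to-coarse order

c3 = np.ones((8, 8, 16)); c4 = np.ones((4, 4, 16)); c5 = np.ones((2, 2, 16))
p3, p4, p5 = fpn_fuse([c3, c4, c5])
```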
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, constructing a regression branch, wherein the specific method is as follows:
First, at each pixel position on each feature map in the enhanced set {N3, N4, N5, N6, N7}, set one anchor with aspect ratio 1 and scale [64, 128, 256, 512, 1024] respectively; then apply t convolution operations to each feature map in {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × a(n+1), where H is the feature map height, W the feature map width, a the number of anchor-box regression values, and n the maximum value of the integral set;
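The anchor layout just described — one square anchor per pixel of each enhanced map — can be sketched as follows; the stride values and coordinate convention are assumptions (the usual power-of-two strides), not taken from the claims:

```python
# Hedged sketch of the regression-branch anchor grid: aspect ratio 1,
# side lengths [64, 128, 256, 512, 1024] assigned to N3..N7 respectively.
import numpy as np

def make_anchors(feat_h, feat_w, stride, side):
    """Return (H*W, 4) anchors as (cx, cy, w, h) in input-image coordinates."""
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    cx = (xs.ravel() + 0.5) * stride          # anchor centres at pixel centres
    cy = (ys.ravel() + 0.5) * stride
    wh = np.full_like(cx, side, dtype=float)  # square anchor: w == h == side
    return np.stack([cx, cy, wh, wh], axis=1)

# e.g. the N3 level of a 416x416 input with an assumed stride of 8:
anchors_n3 = make_anchors(feat_h=52, feat_w=52, stride=8, side=64)
```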
secondly, constructing a classification branch, wherein the specific method is as follows:
Apply t convolution operations to each feature map in the enhanced set {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × C, where C is the number of detection categories;
finally, determining the loss L of the face detection network:

L = L_cls + L_box + L_dfl

where L_cls is the classification loss, L_box the predicted-box regression loss, and L_dfl the distribution focal loss;
optimizing the feature vectors extracted by the backbone branch and the feature fusion branch through the classification loss, predicted-box regression loss, and distribution focal loss, and back-propagating and updating the network parameters with the stochastic gradient descent algorithm;
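A distribution focal loss term matching the a(n+1) regression layout — each box offset predicted as a discrete distribution over n+1 integral bins — can be sketched as below; this is a generic DFL illustration under the assumption that the continuous target is supervised through its two neighbouring bins, not a reproduction of the patented loss:

```python
# Hedged numpy sketch of a distribution focal loss (DFL) term.
import numpy as np

def dfl(probs, y):
    """probs: (n+1,) softmax distribution over bins 0..n; y: continuous target."""
    lo = int(np.floor(y))
    hi = lo + 1
    w_hi, w_lo = y - lo, hi - y           # linear weights of the two neighbours
    return -(w_lo * np.log(probs[lo]) + w_hi * np.log(probs[hi]))

probs = np.full(17, 1.0 / 17)             # uniform distribution, n + 1 = 17 bins
loss_val = dfl(probs, y=4.3)              # -> log(17) for a uniform distribution
```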
s3, constructing a face recognition network
1) Construction of backbone branches
Extracting features of each picture in the training set of the learning-scene face recognition data set FRL with a MobileFaceNet network, generating a 512-dimensional feature vector for each face picture;
2) Optimizing the feature vectors extracted by the backbone branch through the MV-Softmax loss function, enhancing the separation of different features, and back-propagating;
the face recognition network loss is defined by the MV-Softmax loss function as follows:

L_MV = -g(p_y) · log(p_y)

p_y = e^(s·(cosθ_{ω_y,x} − m)) / ( e^(s·(cosθ_{ω_y,x} − m)) + Σ_{k≠y} h(t, θ_{ω_k,x}, I_k) · e^(s·cosθ_{ω_k,x}) )

I_k = 0 if cosθ_{ω_y,x} − m − cosθ_{ω_k,x} ≥ 0, and I_k = 1 otherwise

where k indexes the classes, x is the vector to be classified, s is the scale hyperparameter, ω_k is the weight vector of class k, y is the true class, m is the minimum margin on the error angle, p_y is the predicted posterior probability, g(p_y) is a function used to mine hard samples, I_k is an indicator function that dynamically specifies whether a sample is misclassified, h(t, θ_{ω_k,x}, I_k) is the weighting function for misclassified samples, and θ_{ω_k,x} is the error angle between the weight vector and the vector to be classified;
s4, training a face detection network and a face recognition network
The method for training the face detection network comprises the following steps:
inputting the training set of the learning-scene face detection data set FDL, the corresponding label files, and weights pre-trained on ImageNet into the face detection network for training to obtain the face detection model; during training, set an initial learning rate R and optimize the model with an SGD optimizer; each iteration uses B1 pictures, the total number of iterations is E1, and the learning rate decays to 0.1R and 0.01R after K11 and K12 epochs respectively;
The method for training the face recognition network comprises the following steps:
inputting the training set of the learning-scene face recognition data set FRL, the corresponding label files, and weights pre-trained on ImageNet into the face recognition network for training to obtain the face recognition model; during training, set an initial learning rate D and optimize the model with an SGD optimizer; each iteration uses B2 pictures, the total number of iterations is E2, and the learning rate decays to 0.1D, 0.01D, and 0.001D after K21, K22, and K23 epochs respectively;
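The step schedule used for both networks — divide the learning rate by 10 after each milestone epoch — can be sketched as follows; the milestone values in the example are illustrative, not the K11/K12 or K21/K22/K23 values of the claims:

```python
# Sketch of the step learning-rate decay described in S4.
def step_lr(initial_lr, epoch, milestones):
    """Decay the learning rate by 10x after each milestone epoch is passed."""
    passed = sum(1 for k in milestones if epoch >= k)
    return initial_lr * (0.1 ** passed)

lr_before = step_lr(0.1, epoch=5, milestones=[10, 14, 16])    # still 0.1
lr_middle = step_lr(0.1, epoch=12, milestones=[10, 14, 16])   # one decay
lr_late = step_lr(0.1, epoch=17, milestones=[10, 14, 16])     # three decays
```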
S5, saving the weight file
The face detection model and the face recognition model each save an optimized weight file after every F rounds of iteration;
s6, testing real-time identity authentication
1) Face detection test
Inputting the saved face detection weight file, the test data set, and the corresponding label files into the face detection network for testing; post-processing the network output with non-maximum suppression and a confidence-threshold method to obtain the face detection result; and quantitatively evaluating the accuracy (mAP value) and speed of the detection result using the real target boxes and the predicted target boxes;
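The confidence-threshold plus non-maximum-suppression post-processing can be sketched in numpy as below; the (x1, y1, x2, y2) box convention and the threshold values are illustrative assumptions:

```python
# Minimal numpy NMS sketch for the detection post-processing step.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, conf_thr=0.5, iou_thr=0.4):
    order = np.argsort(-scores)                 # highest confidence first
    order = order[scores[order] >= conf_thr]    # confidence threshold first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # drop remaining boxes that overlap the kept one too strongly
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thr]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))    # the near-duplicate is dropped
```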
2) Face recognition test
Inputting the saved face recognition weight file, the test data set, and the corresponding label files into the face recognition network for testing to obtain the recognition result; computing the cosine similarity between the real face label and the predicted face label; and quantitatively evaluating the accuracy of the recognition result with a confidence-threshold method.
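The cosine-similarity check at the heart of the recognition test can be sketched as follows; the 512-dimensional embedding size follows step S3, while the 0.6 acceptance threshold is an illustrative assumption:

```python
# Sketch of the recognition test: cosine similarity against a threshold.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
enrolled = rng.standard_normal(512)                       # enrolled embedding
same = cosine_similarity(enrolled, enrolled)              # same identity -> 1.0
other = cosine_similarity(enrolled, rng.standard_normal(512))
accepted = same >= 0.6                                    # threshold decision
```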
2. The learning-scene real-time identity authentication method based on face recognition technology according to claim 1, characterized in that: in step S1, the quantity ratio of training set to test set of the face detection data set FDL in the learning scene is 9:1, 8:2, or 7:3, and the pictures are uniformly resized to 416 × 416 by bilinear interpolation; the quantity ratio of training set to test set of the face recognition data set FRL in the learning scene is 9:1, 8:2, or 7:3, and the pictures are uniformly resized to 112 × 112 by bilinear interpolation.
3. The learning-scene real-time identity authentication method based on face recognition technology according to claim 1, characterized in that: in the regression branch of step S2, each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations, where the optimal value of t is 4, the convolution kernel size is 3 × 3, and the stride is 1, yielding a feature map of size H × W × a(n+1); H is the feature map height, W the feature map width, a the number of anchor-box regression values with an optimal value of 4, and n the maximum value of the integral set with an optimal value of 16;

in the classification branch, each feature map in the enhanced set {N3, N4, N5, N6, N7} undergoes t convolution operations, where the optimal value of t is 4, the convolution kernel size is 3 × 3, and the stride is 1, yielding a feature map of size H × W × C, where C is the number of detection categories with an optimal value of 1.
4. The learning-scene real-time identity authentication method based on face recognition technology according to claim 1, characterized in that: in training the face detection network in step S4, the initial learning rate R is set to 0.01; during training, the number of pictures per iteration B1 is 12, the number of iterations E1 is 24, and the learning rate decays to 0.001 and 0.0001 after K11 and K12 epochs;

in training the face recognition network, the initial learning rate D is set to 0.1, the number of pictures per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate decays to 0.01, 0.001, and 0.0001 after K21, K22, and K23 epochs.
5. The learning-scene real-time identity authentication method based on face recognition technology according to claim 1, characterized in that: in training the face detection network in step S4, the initial learning rate R is set to 0.05; during training, the number of pictures per iteration B1 is 12, the number of iterations E1 is 24, and the learning rate decays to 0.005 and 0.0005 after K11 and K12 epochs;

in training the face recognition network, the initial learning rate D is set to 0.5, the number of pictures per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate decays to 0.05, 0.005, and 0.0005 after K21, K22, and K23 epochs.
6. The learning scenario real-time identity authentication method based on face recognition technology according to claim 1, wherein the method is characterized in that: and in the step S5, the iteration round F is 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210471987.XA CN114882558B (en) | 2022-04-29 | 2022-04-29 | Learning scene real-time identity authentication method based on face recognition technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114882558A CN114882558A (en) | 2022-08-09 |
CN114882558B true CN114882558B (en) | 2024-02-23 |
Family
ID=82674446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210471987.XA Active CN114882558B (en) | 2022-04-29 | 2022-04-29 | Learning scene real-time identity authentication method based on face recognition technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882558B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019128367A1 (en) * | 2017-12-26 | 2019-07-04 | 广州广电运通金融电子股份有限公司 | Face verification method and apparatus based on triplet loss, and computer device and storage medium |
CN111783532A (en) * | 2020-05-27 | 2020-10-16 | 东南大学 | Cross-age face recognition method based on online learning |
CN113158862A (en) * | 2021-04-13 | 2021-07-23 | 哈尔滨工业大学(深圳) | Lightweight real-time face detection method based on multiple tasks |
CN113298004A (en) * | 2021-06-03 | 2021-08-24 | 南京佑驾科技有限公司 | Lightweight multi-head age estimation method based on face feature learning |
Non-Patent Citations (1)
Title |
---|
Deep-learning-based real-time multi-face detection in natural scenes; Li Haoxuan; Wu Dongdong; Journal of Test and Measurement Technology; 2020-01-17 (01); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114882558A (en) | 2022-08-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110533631B (en) | SAR image change detection method based on pyramid pooling twin network | |
WO2020010785A1 (en) | Classroom teaching cognitive load measuring system | |
CN104794489B (en) | A kind of induction type image classification method and system based on deep tag prediction | |
CN111814704A (en) | Full convolution examination room target detection method based on cascade attention and point supervision mechanism | |
CN112650886B (en) | Cross-modal video time retrieval method based on cross-modal dynamic convolution network | |
CN110728656A (en) | Meta-learning-based no-reference image quality data processing method and intelligent terminal | |
CN108537119A (en) | A kind of small sample video frequency identifying method | |
CN113435269A (en) | Improved water surface floating object detection and identification method and system based on YOLOv3 | |
CN110689523A (en) | Personalized image information evaluation method based on meta-learning and information data processing terminal | |
CN111401105B (en) | Video expression recognition method, device and equipment | |
CN115147641A (en) | Video classification method based on knowledge distillation and multi-mode fusion | |
CN115170874A (en) | Self-distillation implementation method based on decoupling distillation loss | |
CN111126155A (en) | Pedestrian re-identification method for generating confrontation network based on semantic constraint | |
CN114241587A (en) | Evaluation method and device for human face living body detection confrontation robustness | |
CN111832479B (en) | Video target detection method based on improved self-adaptive anchor point R-CNN | |
CN113095251A (en) | Human body posture estimation method and system | |
CN113411566A (en) | No-reference video quality evaluation method based on deep learning | |
CN114882558B (en) | Learning scene real-time identity authentication method based on face recognition technology | |
CN115601745A (en) | Multi-view three-dimensional object identification method facing application end | |
CN114782983A (en) | Road scene pedestrian detection method based on improved feature pyramid and boundary loss | |
CN113919983A (en) | Test question portrait method, device, electronic equipment and storage medium | |
CN113449631A (en) | Image classification method and system | |
CN113688789A (en) | Online learning investment recognition method and system based on deep learning | |
CN111369124A (en) | Image aesthetic prediction method based on self-generation global features and attention | |
CN111199283A (en) | Air temperature prediction system and method based on convolution cyclic neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||