CN114882558A - Learning scene real-time identity authentication method based on face recognition technology - Google Patents
- Publication number
- CN114882558A CN114882558A CN202210471987.XA CN202210471987A CN114882558A CN 114882558 A CN114882558 A CN 114882558A CN 202210471987 A CN202210471987 A CN 202210471987A CN 114882558 A CN114882558 A CN 114882558A
- Authority
- CN
- China
- Prior art keywords
- training
- network
- face
- face recognition
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06F21/31—User authentication
- G06N3/045—Combinations of networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
- G06V10/761—Proximity, similarity or dissimilarity measures
- G06V10/764—Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V20/36—Indoor scenes
Abstract
A learning-scene real-time identity authentication method based on face recognition technology is composed of a face detection module and a face recognition module. The face detection module adopts MobileNetV1_0.25 as the backbone network for extracting face features, uses the ATSS method to sample positive examples, and optimizes the classification and regression branches with Generalized Focal Loss. The face recognition module adopts the lightweight backbone network MobileFaceNet to extract features from aligned faces, and uses an MV-Softmax loss function to adaptively re-weight misclassified samples, enhancing feature discriminability and mining error-prone samples. The invention provides a real-time face recognition method for authenticating the identity of learners in a learning scene with both high precision and high speed.
Description
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a learning scene real-time identity authentication method based on a face recognition technology.
Background
When entering a learning scene such as a library or an examination room, a learner's identity information must be acquired and used to judge whether the learner is authorized to enter. With the development of deep neural networks and monitoring equipment, the cost of acquiring surveillance video has fallen. Authenticating learners' identities in learning scenes using surveillance video data and computer vision techniques is challenging, has wide application scenarios and research value, and is drawing increasing attention in both academia and industry.
At present, mainstream face detection and face recognition methods are developing rapidly, and detection and recognition accuracy on public data sets keeps improving, but existing methods cannot balance accuracy against speed. In the learner identity-authentication task in educational scenes, the learner's identity must be authenticated accurately and in real time.
Therefore, a current technical problem to be solved in educational scenes is to provide an identity authentication method that achieves both high accuracy and high speed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a learning-scene real-time identity authentication method based on face recognition technology that is simple, fast in recognition, and high in recognition precision.
The technical scheme for solving the technical problems is as follows: a learning scene real-time identity authentication method based on a face recognition technology comprises the following steps:
s1 data preprocessing
Collect a video data set of learners in a learning scene, and generate from it a face detection data set FDL and a face recognition data set FRL. Take the original pictures from the face detection data set FDL, divide them into training and test sets in a fixed ratio, and resize the pictures to a uniform size; likewise, take the original pictures from the face recognition data set FRL, divide them into training and test sets in a fixed ratio, and resize the pictures to a uniform size;
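Purely as an illustration of the split step above (not part of the claimed method; the helper name, seed, and default ratio are assumptions), a fixed-ratio train/test split can be sketched as:

```python
import random

def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle file names and split them into train/test subsets (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, test = split_dataset(names, train_ratio=0.9)
```

A 9:1 ratio matches the preferred embodiment; the 8:2 and 7:3 alternatives follow by changing `train_ratio`.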
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch;
1) building backbone network branches
Input each picture in the training set of the face detection data set FDL into a MobileNetV1_0.25 backbone network to extract feature maps, obtaining the basic feature map set {C1, C2, C3, C4, C5};
2) constructing feature fusion branches
Input the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain the fused feature map set {P3, P4, P5, P6, P7}; then input the fused feature maps {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features at different scales, obtaining the enhanced feature map set {N3, N4, N5, N6, N7};
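The top-down fusion described above can be sketched on toy 1-D "feature maps"; a real feature pyramid operates on 4-D tensors and applies 1×1 lateral convolutions before adding, so this scalar version is only illustrative:

```python
def upsample2x(row):
    """Nearest-neighbor 2x upsampling of a 1-D 'feature map' (stand-in for spatial upsampling)."""
    return [v for v in row for _ in range(2)]

def fuse(c_levels):
    """Top-down FPN-style fusion: each level adds the upsampled coarser level.

    c_levels is ordered fine-to-coarse, each level half the length of the previous.
    """
    p = [None] * len(c_levels)
    p[-1] = c_levels[-1][:]                      # coarsest level passes through
    for i in range(len(c_levels) - 2, -1, -1):
        up = upsample2x(p[i + 1])                # bring coarser features to this resolution
        p[i] = [a + b for a, b in zip(c_levels[i], up)]
    return p

fused = fuse([[1, 1, 1, 1], [2, 2], [3]])
```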
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, a regression branch is constructed, and the specific method comprises the following steps:
first, for the pixel points on each feature map in the enhanced set {N3, N4, N5, N6, N7}, set anchors with areas [64, 128, 256, 512, 1024] (in order of level) and aspect ratio 1; then apply t convolution operations to each feature map in {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × a(n+1), where H is the height of the feature map, W is its width, a is the number of anchor-frame regression values, and n is the maximum value of the discretized integral set;
secondly, constructing a classification branch, wherein the specific method comprises the following steps:
set enhanced feature map N 3 ,N 4 ,N 5 ,N 6 ,N 7 Performing convolution operation on each feature map for t times to obtain a feature map with the size of H multiplied by W multiplied by C, wherein C is a detection type;
and finally determining the loss L of the face detection network:
L = L_cls + L_box + L_dfl
where L_cls is the classification loss, L_box is the predicted-box regression loss, and L_dfl is the distribution focal loss;
optimize the feature vectors extracted by the backbone and feature-fusion branches with the classification, box-regression, and distribution losses, back-propagate using stochastic gradient descent, and update the network parameters;
s3 construction of face recognition network
1) Building backbone network branches
Extracting features of each picture in a training set of a face recognition data set FRL in a learning scene through a MobileFaceNet network, and generating a feature vector with 512 dimensions for each face picture;
2) optimizing the feature vectors extracted from the backbone network branches through an MV-Softmax loss function, enhancing the discrimination of different features and carrying out reverse propagation;
the face recognition network loss is defined by the MV-Softmax loss function as follows:
L = −(1/N) Σ_x log [ e^{s·f(m, θ_{ωy,x})} / ( e^{s·f(m, θ_{ωy,x})} + Σ_{k≠y} h(t, θ_{ωk,x}, I_k) · e^{s·cos θ_{ωk,x}} ) ]
where k indexes the k-th category, x is the vector to be classified, s is the scale hyper-parameter, ω is the weight vector, y is the ground-truth category, m is the minimum angular margin, p_y is the predicted posterior probability, g(p_y) is a function used to mine difficult samples, I_k is an indicator function that dynamically specifies whether a sample is misclassified, h(t, θ_{ωk,x}, I_k) is a weighting function for misclassified samples, and θ_{ω,x} is the angle between the weight vector and the vector to be classified;
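A pure-Python sketch of an MV-Softmax-style loss for a single sample follows. This is an illustration under stated assumptions, not the patent's implementation: the additive-margin form of f, the fold of h into the logits, and the default hyper-parameter values are all assumptions.

```python
import math

def mv_softmax_loss(cos_thetas, y, s=32.0, m=0.35, t=0.2):
    """MV-Softmax sketch (additive-margin variant) for one sample.

    cos_thetas: cosines between the embedding x and each class weight w_k;
    y: ground-truth class index; s: scale; m: margin;
    t: extra weight applied to misclassified (hard) negative classes.
    """
    target = cos_thetas[y] - m                       # margin-penalised target logit
    logits = []
    for k, c in enumerate(cos_thetas):
        if k == y:
            logits.append(s * target)
        else:
            i_k = 1.0 if c > target else 0.0         # I_k: negative beats the margined target
            h = t * (c + 1.0) * i_k                  # h(t, theta, I_k) re-weighting term
            logits.append(s * (c + h))
    mx = max(logits)                                 # stabilised softmax
    exps = [math.exp(v - mx) for v in logits]
    p_y = exps[y] / sum(exps)
    return -math.log(p_y)
```

Folding h into the logit this way emphasises negatives that already score above the margined target, which is the hard-sample-mining behaviour the loss is designed for.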
s4 training human face detection network and human face recognition network
The method for training the face detection network comprises the following steps:
Input the training set of the face detection data set FDL, the corresponding label files, and weights pre-trained on ImageNet into the face detection network for training to obtain a face detection model. During training, set an initial learning rate R and optimize the model with an SGD optimizer; each iteration uses B1 pictures, the total number of iterations is E1, and the learning rate is attenuated to 10^{-1}R and 10^{-2}R after K11 and K12 epochs respectively;
The method for training the face recognition network comprises the following steps:
Input the training set of the face recognition data set FRL, the corresponding label files, and weights pre-trained on ImageNet into the face recognition network for training to obtain a face recognition model. During training, set an initial learning rate D and optimize the model with an SGD optimizer; each iteration uses B2 pictures, the total number of iterations is E2, and the learning rate is attenuated to 10^{-1}D, 10^{-2}D, and 10^{-3}D after K21, K22, and K23 epochs respectively;
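The stepped schedule used above (decay by a factor of 10 at each milestone epoch) can be sketched as follows; the function name is a hypothetical helper:

```python
def stepped_lr(epoch, base_lr, milestones):
    """Return the learning rate for a given epoch under 10x step decay.

    milestones: sorted epochs (e.g. K11, K12) at which the rate drops by 10x.
    """
    factor = sum(1 for m in milestones if epoch >= m)
    return base_lr * (0.1 ** factor)
```

For example, with R = 0.01 and milestones K11 and K12, the rate is 0.01 before K11, 0.001 from K11, and 0.0001 from K12, matching the preferred embodiment.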
S5 saving the weight file
The face detection model and the face recognition model save the optimized weight files after every F rounds of iteration;
s6 testing real-time identity authentication
1) Face detection test
Input the saved face detection weight file, the test data set, and the corresponding label files into the face detection network for testing; post-process the network output with non-maximum suppression and a confidence threshold to obtain the face detection results; then quantitatively evaluate accuracy and speed by computing the mAP value from the real and predicted target boxes;
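Greedy non-maximum suppression with a confidence threshold, as used in the post-processing step above, can be sketched as follows; the threshold defaults are illustrative, not values from the patent:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.4):
    """Confidence filtering followed by greedy NMS; returns kept box indices."""
    order = sorted((i for i, sc in enumerate(scores) if sc >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```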
2) face recognition testing
Input the saved face recognition weight file, the test data set, and the corresponding label files into the face recognition network for testing to obtain recognition results; compute the cosine similarity between the real and predicted face labels, and quantitatively evaluate the precision of the recognition results with a confidence-threshold method.
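The cosine-similarity comparison above can be sketched as follows; the gallery data structure is an assumption, and the 0.8 threshold default is the value used in embodiment 1:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def authenticate(query_emb, gallery, threshold=0.8):
    """Return the best-matching identity, or None if below the threshold.

    gallery: dict mapping identity -> enrolled embedding (hypothetical layout).
    """
    best_id, best_sim = None, -1.0
    for ident, emb in gallery.items():
        sim = cosine_similarity(query_emb, emb)
        if sim > best_sim:
            best_id, best_sim = ident, sim
    return best_id if best_sim >= threshold else None
```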
As a preferred technical solution, in step S1 the ratio of training to test pictures for the face detection data set FDL is 9:1, 8:2, or 7:3, and the pictures are uniformly resized to 416 × 416 using bilinear interpolation; the ratio of training to test pictures for the face recognition data set FRL is 9:1, 8:2, or 7:3, and the pictures are uniformly resized to 112 × 112 using bilinear interpolation.
As a preferred technical solution, in constructing the regression branch in step S2, each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations with preferred value t = 4, kernel size 3 × 3, and stride 1, yielding a feature map of size H × W × a(n+1), where H is the height of the feature map, W its width, a the number of anchor-frame regression values with preferred value a = 4, and n the maximum value of the discretized integral set with preferred value n = 16;
in constructing the classification branch, each feature map in the enhanced set {N3, N4, N5, N6, N7} undergoes t convolution operations with preferred value t = 4, kernel size 3 × 3, and stride 1, yielding a feature map of size H × W × C, where C is the number of detection categories with preferred value C = 1.
As a preferable technical solution, in the training face detection network of step S4, an initial learning rate R is set to be 0.01, during training, the number of pictures B1 in each iteration is 12, the number of iterations E1 is 24, and the learning rate is attenuated to 0.001 and 0.0001 after K11 and K12 epochs;
in the face recognition network training, the initial learning rate D is set to 0.1, the number of pictures per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate decays to 0.01, 0.001, and 0.0001 at epochs K21, K22, and K23.
As a preferable technical solution, the initial learning rate R is set to be 0.05 in the training face detection network in step S4, during training, the number of pictures B1 in each iteration is 12, the number of iterations E1 is 24, and the learning rate is attenuated to 0.005 and 0.0005 after K11 and K12 epochs;
in the face recognition network training, the initial learning rate D is set to 0.5, the number of pictures per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate decays to 0.05, 0.005, and 0.0005 at epochs K21, K22, and K23.
As a preferred technical solution, in step S5 the iteration interval F is 1.
The invention has the following beneficial effects:
the invention adopts the light-weight backbone network to reduce the parameter quantity and the detection speed of the network, adopts the scale balance pyramid module to fully fuse the detail information of the shallow layer and the semantic information of the deep layer, and adopts the ATSS sampling method and the GFL Loss to optimize the detection precision. Compared with the prior art, the method has the advantages of simplicity, high positioning speed, high positioning precision and the like, and can be used for detecting the dense face pictures in the educational scene monitoring video in real time.
Drawings
FIG. 1 is a flowchart of example 1 of the present invention.
Fig. 2 shows four pictures in the face detection data set FDL in the learning scene in example 1.
Fig. 3 is eight pictures in the face recognition data set FRL in the learning scenario of example 1.
Fig. 4 is a schematic structural diagram of a face detection network in embodiment 1 of the present invention.
Fig. 5 is a schematic structural diagram of a face recognition network according to embodiment 1 of the present invention.
Fig. 6 is a diagram of the result of the face detection according to the present invention.
Fig. 7 is a graph of the recognition result of fig. 2 according to the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, but the present invention is not limited to the embodiments described below.
Example 1
Taking monitoring video data from an educational scene as an example, the learning scene real-time identity authentication method based on the face recognition technology of the embodiment, as shown in fig. 1, includes the following steps:
s1 data preprocessing
Collect a video data set of learners in a learning scene and generate from it a face detection data set FDL, shown in FIG. 2, and a face recognition data set FRL, shown in FIG. 3. Take the original pictures from FDL and divide them into training and test sets at a 9:1 ratio (8:2 or 7:3 are also possible), uniformly resizing the pictures to 416 × 416 with bilinear interpolation; take the original pictures from FRL and divide them into training and test sets at a 9:1 ratio (8:2 or 7:3 are also possible), uniformly resizing the pictures to 112 × 112 with bilinear interpolation;
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch, as shown in FIG. 4;
1) building backbone network branches
Input each picture in the training set of the face detection data set FDL into a MobileNetV1_0.25 backbone network to extract feature maps, obtaining the basic feature map set {C1, C2, C3, C4, C5};
2) constructing feature fusion branches
Input the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain the fused feature map set {P3, P4, P5, P6, P7}; then input the fused feature maps {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features at different scales, obtaining the enhanced feature map set {N3, N4, N5, N6, N7};
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, a regression branch is constructed, and the specific method comprises the following steps:
first, for the pixel points on each feature map in the enhanced set {N3, N4, N5, N6, N7}, set anchors with areas [64, 128, 256, 512, 1024] (in order of level) and aspect ratio 1; then apply t convolution operations (t = 4, kernel size 3 × 3, stride 1) to each feature map in {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × a(n+1), where H is the height of the feature map, W its width, a the number of anchor-frame regression values (a = 4), and n the maximum value of the discretized integral set (n = 16);
secondly, constructing a classification branch, wherein the specific method comprises the following steps:
apply t convolution operations (t = 4, kernel size 3 × 3, stride 1) to each feature map in the enhanced set {N3, N4, N5, N6, N7} to obtain a feature map of size H × W × C, where C is the number of detection categories (C = 1);
and finally determining the loss L of the face detection network:
L = L_cls + L_box + L_dfl
where L_cls is the classification loss, L_box is the predicted-box regression loss, and L_dfl is the distribution focal loss;
optimizing the feature vectors extracted from the backbone network branch and the feature fusion branch through category loss, prediction frame regression loss and distribution loss, performing back propagation by using a random gradient descent algorithm, and updating network parameters;
s3 construction of face recognition network
The face recognition network consists of a backbone network branch and a loss function, wherein the backbone network branch is responsible for extracting face features, and the loss function is responsible for optimizing the features, as shown in fig. 5;
1) building backbone network branches
Extracting features of each picture in a training set of a face recognition data set FRL in a learning scene through a MobileFaceNet network, and generating a feature vector with 512 dimensions for each face picture;
2) optimizing the feature vectors extracted from the backbone network branches through an MV-Softmax loss function, enhancing the discrimination of different features and performing reverse propagation;
the face recognition network loss is defined by the MV-Softmax loss function as follows:
L = −(1/N) Σ_x log [ e^{s·f(m, θ_{ωy,x})} / ( e^{s·f(m, θ_{ωy,x})} + Σ_{k≠y} h(t, θ_{ωk,x}, I_k) · e^{s·cos θ_{ωk,x}} ) ]
where k indexes the k-th category, x is the vector to be classified, s is the scale hyper-parameter, ω is the weight vector, y is the ground-truth category, m is the minimum angular margin, p_y is the predicted posterior probability, g(p_y) is a function used to mine difficult samples, I_k is an indicator function that dynamically specifies whether a sample is misclassified, h(t, θ_{ωk,x}, I_k) is a weighting function for misclassified samples, and θ_{ω,x} is the angle between the weight vector and the vector to be classified;
s4 training face detection network and face recognition network
The method for training the face detection network comprises the following steps:
Input the training set of the face detection data set FDL, the corresponding label files, and weights pre-trained on ImageNet into the face detection network for training to obtain a face detection model. During training, set the initial learning rate to 0.01 and optimize the model with an SGD optimizer; the number of pictures per iteration is B1 = 12, the total number of iterations is E1 = 24, and the learning rate is attenuated to 0.001 and 0.0001 after K11 and K12 epochs;
the method for training the face recognition network comprises the following steps:
Input the training set of the face recognition data set FRL, the corresponding label files, and weights pre-trained on ImageNet into the face recognition network for training to obtain a face recognition model. During training, set the initial learning rate D to 0.1 and optimize the model with an SGD optimizer; the number of pictures per iteration is B2 = 512, the total number of iterations is E2 = 18, and the learning rate is attenuated to 0.01, 0.001, and 0.0001 after K21, K22, and K23 epochs;
s5 saving the weight file
The face detection model and the face recognition model save the optimized weight files after every round of iteration;
s6 testing real-time identity authentication
1) Face detection test
Input the saved face detection weight file, the test data set, and the corresponding label files into the face detection network for testing; post-process the network output with non-maximum suppression and a confidence threshold to obtain the face detection results, each consisting of the center point, length, and width of a detection box. Computing with the real and predicted target boxes yields a mean average precision (mAP) of 0.888 and an inference speed of 50.5 FPS, as shown in FIG. 6;
2) face recognition testing
Input the saved face recognition weight file, the test data set, and the corresponding label files into the face recognition network for testing to obtain recognition results, including the learner's name, student number, and major, as shown in FIG. 7; compute the cosine similarity between the real and predicted face labels, set the confidence threshold to 0.8, and consider results with cosine similarity above 0.8 correctly recognized, yielding an accuracy of 94.27%.
Example 2
In this embodiment, the initial learning rate R for the face detection training of step S4 is set to 0.05; during training, the number of pictures per iteration B1 is 12, the number of iterations E1 is 24, and the learning rate is attenuated to 0.005 and 0.0005 after K11 and K12 epochs. In the face recognition training, the initial learning rate D is set to 0.5, the number of pictures per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate decays to 0.05, 0.005, and 0.0005 at epochs K21, K22, and K23.
The other operation steps are the same as in example 1.
To verify the beneficial effects of the invention, the inventors compared the learning-scene real-time identity authentication method of embodiment 1 against prior-art face detection and recognition methods on identity authentication in educational-scene surveillance video; the comparison results are shown in Tables 1 and 2 below;
TABLE 1 comparison of real-time face detection model and mainstream model
Wherein: RetinaFace is a single-stage, multi-task face detection model that uses anchors of different ratios on different feature levels, introduces an independent context module, and uses deformable convolutions in its lateral connections; TinaFace is also a single-stage face detection model that introduces an IoU prediction branch, proposes an Inception module to enhance features, and uses DIoU as the regression loss.
TABLE 2 comparison of real-time face recognition model and mainstream model
Wherein: ResNet is one of the classic classification networks; by adding skip connections to its convolution blocks, it effectively alleviates gradient explosion and vanishing during training and greatly increases the attainable depth of the convolutional neural network. EfficientNet searches the depth, width and resolution of a convolutional neural network with a neural architecture search method, achieving higher accuracy with far fewer parameters than manually designed classification networks. HRNet is a powerful backbone network; by maintaining high-resolution features throughout, it performs better on small targets and fine-grained data. GhostNet addresses the feature redundancy of ordinary convolution modules by generating a series of cheap feature maps from a few base feature maps via linear transformations, achieving good accuracy with an extremely low parameter count.
Analyzing the experimental result:
the face detection model of the invention maintains a fast detection speed and a small parameter count while achieving high detection accuracy, striking a good balance between TinaFace and RetinaFace.
The accuracy of the face recognition model is on par with large-scale backbones such as ResNet and HRNet, and MV-Softmax as a loss function outperforms the other loss functions overall, showing that the face recognition model has clear advantages in parameter count and computational cost while meeting the accuracy requirement.
Claims (6)
1. A learning scene real-time identity authentication method based on a face recognition technology is characterized by comprising the following steps:
s1 data preprocessing
Collecting a learning scene learner video data set and generating from it a learning scene face detection data set FDL and a learning scene face recognition data set FRL; taking the original pictures of the face detection data set FDL, dividing them into a training set and a test set in a certain proportion, and resizing the pictures to a uniform size; taking the original pictures of the face recognition data set FRL, dividing them into a training set and a test set in a certain proportion, and resizing the pictures to a uniform size;
s2, constructing a face detection network, wherein the face detection network is formed by sequentially connecting a backbone network branch, a feature fusion branch and a detection head branch;
1) building backbone network branches
Inputting each picture in the training set of the face detection data set FDL into a MobileNetV1_0.25 network to extract feature maps, obtaining the basic feature map set {C1, C2, C3, C4, C5};
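The choice of MobileNetV1 with a 0.25 width multiplier as the backbone can be motivated by a quick parameter count (a sketch under the standard MobileNetV1 definitions; the channel numbers below are illustrative, not taken from the patent):

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a plain k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """MobileNetV1 block: k x k depthwise conv + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def width_scaled(c, alpha=0.25):
    """Apply MobileNetV1's width multiplier (0.25 here) to a channel count."""
    return max(1, int(c * alpha))
```

For a 64-to-128-channel layer the separable block needs roughly an order of magnitude fewer weights than a plain convolution, and the 0.25 multiplier shrinks every channel count by a further factor of four, which is what keeps the detector real-time.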
2) constructing feature fusion branches
Inputting the basic feature map set {C1, C2, C3, C4, C5} into a feature pyramid for feature fusion to obtain the fused feature map set {P3, P4, P5, P6, P7}; then inputting the feature maps of {P3, P4, P5, P6, P7} into a scale-balanced pyramid module to further fuse features of different scales, obtaining the enhanced feature map set {N3, N4, N5, N6, N7};
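The feature-pyramid fusion step can be sketched as a top-down pathway that upsamples the coarser map and adds it to the next finer lateral (a minimal NumPy sketch with nearest-neighbour upsampling; a real FPN also applies 1 × 1 lateral and 3 × 3 output convolutions, omitted here):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_merge(laterals):
    """Top-down pathway: start from the coarsest map and repeatedly
    upsample-and-add into the next finer lateral.
    laterals: list of (C, H, W) arrays ordered fine -> coarse."""
    out = [laterals[-1]]
    for lat in reversed(laterals[:-1]):
        out.append(lat + upsample2x(out[-1]))
    return list(reversed(out))  # back to fine -> coarse order
```

Each output level thus mixes its own resolution with all coarser, more semantic levels, which is what lets small faces benefit from deep-layer context.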
3) Constructing a detection head branch, wherein the detection head branch comprises a regression branch and a classification branch;
firstly, a regression branch is constructed, and the specific method comprises the following steps:
firstly, anchors with areas [64, 128, 256, 512, 1024] and an aspect ratio of 1 are set in turn for the pixel points on each feature map in the enhanced feature map set {N3, N4, N5, N6, N7}; then each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations to obtain a feature map of size H × W × a(n+1), where H is the height of the feature map, W is the width of the feature map, a is the number of regression values of an anchor frame, and n is the maximum value of the integral set;
secondly, constructing a classification branch, wherein the specific method comprises the following steps:
set enhanced feature map N 3 ,N 4 ,N 5 ,N 6 ,N 7 Performing convolution operation on each feature map for t times to obtain a feature map with the size of H multiplied by W multiplied by C, wherein C is a detection type;
and finally determining the loss L of the face detection network:
L = L_cls + L_box + L_dfl

where L_cls is the classification loss, L_box is the prediction box regression loss, and L_dfl is the distribution loss;
optimizing the feature vectors extracted by the backbone network branch and the feature fusion branch through the classification loss, prediction box regression loss and distribution loss, performing back propagation with a stochastic gradient descent algorithm, and updating the network parameters;
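The distribution loss L_dfl can be sketched as the standard Distribution Focal Loss, which takes the cross-entropy of the predicted distribution against the two integer bins bracketing the continuous target (an assumption based on the "distribution loss" and integral-set wording; the patent does not give the formula):

```python
import numpy as np

def distribution_focal_loss(probs, y):
    """DFL for one box side.
    probs: softmax distribution over bins 0..n; y: continuous target
    strictly inside [0, n). The two neighbouring integer bins are
    penalised in proportion to their distance from y."""
    yl = int(np.floor(y))
    yr = yl + 1
    return float(-((yr - y) * np.log(probs[yl]) + (y - yl) * np.log(probs[yr])))
```

The loss is minimised when the distribution concentrates its mass on the two bins around the true offset, which pairs naturally with the expectation-based decoding of the regression branch.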
s3 construction of face recognition network
1) Building backbone network branches
Extracting features of each picture in a training set of a face recognition data set FRL in a learning scene through a MobileFaceNet network, and generating a feature vector with 512 dimensions for each face picture;
2) optimizing the feature vectors extracted from the backbone network branches through an MV-Softmax loss function, enhancing the discrimination of different features and carrying out reverse propagation;
the face recognition network loss is defined by the MV-Softmax loss function as follows:
where k indexes the k-th class, x is the vector to be classified, s is the scale hyper-parameter, ω is the weight vector, y is the ground-truth class, m is the minimum margin of the angular error, p_y is the predicted posterior probability, the function g(p_y) is used to mine hard samples, I_k is an indicator function that dynamically marks whether a sample is misclassified, h(·) is a weighting function for the misclassified samples, and θ is the angle between the weight vector and the vector to be classified;
s4 training face detection network and face recognition network
The method for training the face detection network comprises the following steps:
inputting the training set of the face detection data set FDL, the corresponding label files and weights pre-trained on ImageNet into the face detection network for training to obtain a face detection model; during training, an initial learning rate R is set and the model is optimized with an SGD optimizer; each iteration uses B1 pictures, the total number of iterations is E1, and the learning rate is attenuated to 10⁻¹R and 10⁻²R after epochs K11 and K12;
The method for training the face recognition network comprises the following steps:
inputting the training set of the face recognition data set FRL, the corresponding label files and weights pre-trained on ImageNet into the face recognition network for training to obtain a face recognition model; during training, an initial learning rate D is set and the model is optimized with an SGD optimizer; each iteration uses B2 pictures, the total number of iterations is E2, and the learning rate is attenuated to 10⁻¹D, 10⁻²D and 10⁻³D after epochs K21, K22 and K23;
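Both training recipes use the same step-decay pattern: the learning rate is divided by 10 at each milestone epoch. That schedule can be sketched as (a generic helper; the milestone values K11..K23 are left symbolic in the claims):

```python
def lr_at_epoch(epoch, base_lr, milestones):
    """Step schedule: divide the learning rate by 10 once each milestone
    epoch has been reached (e.g. base 0.1 -> 0.01 -> 0.001 -> 0.0001)."""
    lr = base_lr
    for k in milestones:
        if epoch >= k:
            lr *= 0.1
    return lr
```

For the recognition network of claim 4 (D = 0.1 with three milestones), this reproduces the stated 0.01, 0.001 and 0.0001 values.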
S5 saving the weight file
after F rounds of iteration, the face detection model and the face recognition model save their optimized weight files;
s6 testing real-time identity authentication
1) Face detection test
inputting the saved face detection weight file, the test data set and the corresponding label files into the face detection network for testing; post-processing the output of the face detection network with non-maximum suppression and a confidence threshold to obtain the face detection results; and quantitatively evaluating the accuracy and speed of the face detection results by computing the mAP value from the real target boxes and the predicted target boxes;
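The non-maximum suppression post-processing can be sketched as the standard greedy algorithm (a minimal NumPy version; the thresholds are illustrative, and the returned indices refer to the boxes that survive the confidence filter):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5, score_thr=0.02):
    """Greedy NMS: drop low-confidence boxes, then repeatedly keep the
    highest-scoring remaining box and suppress boxes overlapping it.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    keep_mask = scores >= score_thr
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        # IoU of the kept box against all remaining candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thr]
    return keep
```

Two heavily overlapping detections of the same face collapse to the higher-scoring one, while detections of different faces survive.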
2) face recognition testing
inputting the saved face recognition weight file, the test data set and the corresponding label files into the face recognition network for testing to obtain the recognition results; calculating the cosine similarity between the real face labels and the predicted face labels, and quantitatively evaluating the accuracy of the recognition results with a confidence threshold method.
2. The learning scene real-time identity authentication method based on the face recognition technology as claimed in claim 1, wherein: in step S1, the ratio of training set to test set of the face detection data set FDL in the learning scene is 9:1, 8:2 or 7:3, and the picture size is uniformly adjusted to 416 × 416 by bilinear interpolation; the ratio of training set to test set of the face recognition data set FRL in the learning scene is 9:1, 8:2 or 7:3, and the picture size is uniformly adjusted to 112 × 112 by bilinear interpolation.
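The bilinear resizing of claim 2 can be sketched as follows (a minimal NumPy version using half-pixel sample positions, as in common image libraries; frameworks differ in their corner-alignment conventions, so this is one reasonable choice, not the patent's specified one):

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of an (H, W) or (H, W, C) image using half-pixel
    centre sampling (align_corners=False style)."""
    h, w = img.shape[:2]
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    if img.ndim == 3:                 # broadcast weights over channels
        wy = wy[..., None]; wx = wx[..., None]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

A detection-branch picture would be resized to 416 × 416 and a recognition-branch crop to 112 × 112 with the same routine.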
3. The learning scene real-time identity authentication method based on the face recognition technology as claimed in claim 1, wherein: in the regression branch constructed in step S2, after each feature map in {N3, N4, N5, N6, N7} undergoes t convolution operations, the optimal value of t is 4, the convolution kernel size is 3 × 3 and the stride is 1, yielding a feature map of size H × W × a(n+1), where H is the height of the feature map, W is the width of the feature map, a is the number of regression values of an anchor frame with an optimal value of 4, and n is the maximum value of the integral set with an optimal value of 16;
in the constructed classification branch, after each feature map in the enhanced feature map set {N3, N4, N5, N6, N7} undergoes t convolution operations, the optimal value of t is 4, the convolution kernel size is 3 × 3 and the stride is 1, yielding a feature map of size H × W × C, where C is the number of detection categories with an optimal value of 1.
4. The learning scene real-time identity authentication method based on the face recognition technology as claimed in claim 1, wherein: when training the face detection network in step S4, the initial learning rate R is set to 0.01; during training, the number of images per iteration B1 is 12, the number of iterations E1 is 24, and the learning rate is attenuated to 0.001 and 0.0001 after epochs K11 and K12;
when training the face recognition network, the initial learning rate D is set to 0.1, the number of images per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate is attenuated to 0.01, 0.001 and 0.0001 at epochs K21, K22 and K23.
5. The learning scene real-time identity authentication method based on the face recognition technology as claimed in claim 1, wherein: when training the face detection network in step S4, the initial learning rate R is set to 0.05; during training, the number of images per iteration B1 is 12, the number of iterations E1 is 24, and the learning rate is attenuated to 0.005 and 0.0005 after epochs K11 and K12;
when training the face recognition network, the initial learning rate D is set to 0.5, the number of images per iteration B2 is 512, the number of iterations E2 is 18, and the learning rate is attenuated to 0.05, 0.005 and 0.0005 at epochs K21, K22 and K23.
6. The learning scene real-time identity authentication method based on the face recognition technology as claimed in claim 1, wherein: in step S5, the iteration count F is 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210471987.XA CN114882558B (en) | 2022-04-29 | 2022-04-29 | Learning scene real-time identity authentication method based on face recognition technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114882558A true CN114882558A (en) | 2022-08-09 |
CN114882558B CN114882558B (en) | 2024-02-23 |
Family
ID=82674446
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210471987.XA Active CN114882558B (en) | 2022-04-29 | 2022-04-29 | Learning scene real-time identity authentication method based on face recognition technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114882558B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019128367A1 (en) * | 2017-12-26 | 2019-07-04 | 广州广电运通金融电子股份有限公司 | Face verification method and apparatus based on triplet loss, and computer device and storage medium |
CN111783532A (en) * | 2020-05-27 | 2020-10-16 | 东南大学 | Cross-age face recognition method based on online learning |
CN113158862A (en) * | 2021-04-13 | 2021-07-23 | 哈尔滨工业大学(深圳) | Lightweight real-time face detection method based on multiple tasks |
CN113298004A (en) * | 2021-06-03 | 2021-08-24 | 南京佑驾科技有限公司 | Lightweight multi-head age estimation method based on face feature learning |
Non-Patent Citations (1)
Title |
---|
Li Haoxuan; Wu Dongdong: "Real-time multi-face detection in natural scenes based on deep learning" (基于深度学习的自然场景下多人脸实时检测), 测试技术学报, no. 01, 17 January 2020 (2020-01-17) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||