CN115273202A - Face comparison method, system, equipment and storage medium - Google Patents

Face comparison method, system, equipment and storage medium

Info

Publication number
CN115273202A
Authority
CN
China
Prior art keywords
face
module
convolution
faces
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210949819.7A
Other languages
Chinese (zh)
Inventor
Wei Tao
Du Huan
Liang Yong
Wu Kangjie
Li Peng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Asean Information Harbor Co ltd
Original Assignee
China Asean Information Harbor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Asean Information Harbor Co ltd filed Critical China Asean Information Harbor Co ltd
Priority to CN202210949819.7A priority Critical patent/CN115273202A/en
Publication of CN115273202A publication Critical patent/CN115273202A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a face comparison method, system, equipment and storage medium in the technical field of computer vision, addressing the poor stability and slow comparison speed of existing face comparison methods. The method comprises the following steps: constructing a lightweight face detection module based on a convolutional neural network and detecting a face image to obtain a series of face candidate frames; decoding the position information of the face candidate frames and converting it into candidate frame information on the original image; screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images; constructing a face feature extraction module based on a convolutional neural network and feeding the input images into it to obtain a series of feature values quantizing the face information; and calculating the similarity of two faces from their feature values and judging whether they are the same person.

Description

Face comparison method, system, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technology, and more particularly, to a method, system, device and storage medium for comparing human faces.
Background
Face comparison technology, also called face verification technology, judges whether the faces in two images belong to the same person. It is now widely applied in national security, military security, public security, civil affairs, the economy and other fields, for example in access control systems that open by face and financial systems that authorize payment by face, and it therefore has very important research value and significance.
In the prior art, an LBP operator is used to extract face image features: an LBP-coded image of the whole picture is computed and divided into several regions, an LBP-code histogram is obtained for each region, and together these form the LBP histogram representation of the whole image. During face verification, the feature distance between the two compared images is calculated with a histogram-based image similarity function; if the feature distance is larger than a set threshold, the images are regarded as faces of the same person. The advantage of this method is that, within a certain range, it reduces the error caused by imperfect alignment of the face regions.
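For concreteness, this prior-art pipeline can be sketched with scikit-image's local_binary_pattern; the 4 × 4 region grid and the histogram-intersection similarity below are illustrative assumptions, since the description does not fix them.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, grid=(4, 4), n_points=8, radius=1):
    """LBP-code the whole image, split it into regions, and concatenate
    the per-region code histograms (grid size is an illustrative choice)."""
    codes = local_binary_pattern(gray, n_points, radius, method="uniform")
    n_bins = n_points + 2  # the "uniform" method yields values 0..n_points+1
    rh, rw = gray.shape[0] // grid[0], gray.shape[1] // grid[1]
    hists = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            region = codes[r * rh:(r + 1) * rh, c * rw:(c + 1) * rw]
            h, _ = np.histogram(region, bins=n_bins, range=(0, n_bins), density=True)
            hists.append(h)
    return np.concatenate(hists)

def histogram_similarity(h1, h2):
    """Histogram intersection, one possible histogram-based similarity function."""
    return float(np.minimum(h1, h2).sum())
```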
This face comparison method has several defects: blurred images, profile faces, reflections and occlusion all interfere with the face recognition process, so its stability is low; at the same time, its comparison speed in practical applications is slow and cannot meet the real-time requirements of some scenarios, so the method has clear limitations.
Disclosure of Invention
The technical problem to be solved by the present invention is to remedy the above deficiencies of the prior art. The first object of the invention is to provide a face comparison method with good stability and a high comparison speed.
The second object of the invention is to provide a face comparison system with good stability and a high comparison speed.
The third object of the invention is to provide a computer device.
The fourth object of the present invention is to provide a computer-readable storage medium.
In order to achieve the first object, the present invention provides a face comparison method, including:
S1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of faces, obtaining a series of face candidate frames;
S2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
S3, screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
S4, constructing a face feature extraction module based on a convolutional neural network, and feeding the input image into it to obtain a series of feature values quantizing the face information;
S5, calculating the similarity of the two faces from their feature values, and judging from the similarity whether the two faces are the same person.
As a further improvement, in step S1, the face detection module is constructed as follows:
S11, constructing an input layer; to satisfy the fixed-size input requirement of the convolutional neural network, the input layer resizes the image to 500 × 500 × 3;
S12, constructing a lightweight convolution submodule consisting mainly of convolution kernels of two scales, 3 × 3 and 1 × 1, with 64 of the 3 × 3 kernels and 32 of the 1 × 1 kernels; within the submodule, a 1 × 1 convolution kernel is connected behind a 3 × 3 convolution kernel, and a nonlinear activation operation follows the 1 × 1 convolution kernel;
S13, connecting 3 convolution submodules after the input layer, flattening the feature map output by the convolution modules through a Flatten layer, integrating the convolutional information through two fully connected layers, and finally outputting, through a Reshape layer, the information of 5 × 5 × 2 regression frames (b_x, b_y, b_w, b_h, confidence, score), where (b_x, b_y) are the coordinates of the center point of a regression frame, b_w is its width, b_h is its height, confidence is its confidence score, score is the score that it contains a face, and 5 × 5 × 2 indicates that the original image is divided into 5 × 5 regions with the model predicting the position information of 2 regression frames in each region.
Further, in step S2, the position information of the regression frames is decoded, and each regression frame is converted into face candidate frame information on the original image using the following formulas:

t_x = (x_offset + b_x) × w / S
t_y = (y_offset + b_y) × h / S
t_w = b_w × w
t_h = b_h × h

where (t_x, t_y) are the coordinates of the center point of the face candidate frame on the original image; t_w and t_h are its width and height respectively; w is the original image width and h is the original image height; S indicates that the original image is divided into S × S regions, with S taken as 5; x_offset is the abscissa of the region to which the regression frame belongs, and y_offset is its ordinate.
Further, in step S3, the face candidate frames are sorted by their face-prediction score from largest to smallest and the top m candidate frames are selected; a non-maximum suppression algorithm then performs a second screening of these m frames according to the confidence of each regression frame, and n of the screened frames are selected as the face detection result, where m and n are set values.
Further, if fewer than n candidate frames remain after the second screening, all remaining candidate frames are taken as the detection result.
Further, in step S4, the face feature extraction module is constructed as follows:
S41, constructing an input layer and preprocessing the input image; the preprocessing mainly resizes the input image to a uniform size;
S42, connecting 3 lightweight convolution submodules behind the input layer; their function is to speed up feature extraction and increase the nonlinear expressive capability of the network, so that face features are extracted better;
S43, constructing an Inception module behind the lightweight convolution submodules; it consists of convolution layers of 3 scales and a max pooling layer, with kernel sizes of 1 × 1, 3 × 3 and 5 × 5; the 3 convolution layers and 1 pooling layer are connected in parallel, and at the end of the module the outputs of the 4 network layers are concatenated as the output of the Inception module;
S44, constructing a residual module behind the Inception module; it contains two convolution layers with 3 × 3 kernels; the module's input feature map is convolved by the two layers, the convolved feature map is then added element-wise to the input feature map, and the resulting feature map is the output of the residual module;
S45, connecting two fully connected layers behind the residual module to integrate the convolutional information, with the final output produced by a fully connected layer of k neurons; these k neurons form the k-dimensional feature vector extracted from the face.
Further, in step S5, the k-dimensional feature vector can be mapped to a feature point in a k-dimensional feature space, so the feature vectors of the two faces map to two feature points; the distance between the two feature points represents the degree of similarity of the two faces, with a closer distance indicating more similar faces. The cosine distance is used as the similarity distance between the two feature points, computed as follows:

Calculate the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):

dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T

Calculate the two-norm of each face feature vector:

||X|| = sqrt(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = sqrt(y_1^2 + y_2^2 + ... + y_k^2)

Calculate the cosine distance of the two feature vectors from the dot product and the norms:

cos(X, Y) = dot_XY / (||X|| × ||Y||)

The cosine distance is the similarity distance of the two faces; if the distance is greater than a threshold, the two faces are considered to be the same person; otherwise, they are not.
In order to achieve the second objective, the present invention provides a face comparison system, which includes:
the face detection module is used for detecting a face image to determine the position information of a face so as to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
the face feature extraction module is used for extracting features from the input image to obtain a series of feature values quantizing the face information;
and the similarity comparison module is used for calculating the similarity of two faces from their feature values and judging from the similarity whether they are the same person.
In order to achieve the third object, the invention provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the computer program to implement the face comparison method described above.
In order to achieve the fourth object, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the face comparison method described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
the invention realizes the processes of face detection, face feature extraction and face comparison by constructing a series of modules based on the convolutional neural network, compared with other existing face comparison technologies, the invention uses the convolutional neural network to extract the face features, thereby improving the quality of face feature extraction, and simultaneously, a series of improvements are carried out on the network structure, thereby improving the speed of feature extraction and further improving the efficiency of face comparison.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flowchart of comparing two faces in a practical application of the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments shown in the drawings.
Referring to fig. 1 and 2, a face comparison method includes:
S1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of faces, obtaining a series of face candidate frames;
S2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
S3, screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
S4, constructing a face feature extraction module based on a convolutional neural network, and feeding the input image into it to obtain a series of feature values quantizing the face information;
S5, calculating the similarity of the two faces from their feature values, and judging from the similarity whether the two faces are the same person.
In step S1, the face detection module is constructed as follows:
S11, constructing an input layer; to satisfy the fixed-size input requirement of the convolutional neural network, the input layer resizes the image to 500 × 500 × 3;
S12, constructing a lightweight convolution submodule consisting mainly of convolution kernels of two scales, 3 × 3 and 1 × 1, with 64 of the 3 × 3 kernels and 32 of the 1 × 1 kernels; within the submodule, a 1 × 1 convolution kernel is connected behind a 3 × 3 convolution kernel, and a nonlinear activation operation follows the 1 × 1 convolution kernel; the advantage of this arrangement is that the 1 × 1 convolution does not change the spatial size of its input feature map (no feature-map resolution is lost) while reducing the number of channels, and the nonlinear activation added after the 1 × 1 convolution increases the nonlinear expressive capability of the network;
S13, connecting 3 convolution submodules after the input layer, flattening the feature map output by the convolution modules through a Flatten layer, integrating the convolutional information through two fully connected layers, and finally outputting, through a Reshape layer, the information of 5 × 5 × 2 regression frames (b_x, b_y, b_w, b_h, confidence, score), where (b_x, b_y) are the coordinates of the center point of a regression frame, b_w is its width, b_h is its height, confidence is its confidence score, score is the score that it contains a face, and 5 × 5 × 2 indicates that the original image is divided into 5 × 5 regions with the model predicting the position information of 2 regression frames in each region.
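For illustration, steps S11 to S13 can be realized as the following Keras sketch. The stride-2 downsampling, the width of the first fully connected layer and all variable names are assumptions the text does not specify, so this is one possible reading rather than the exact network of the invention.

```python
from tensorflow.keras import layers, models

def lightweight_submodule(x):
    # S12: 64 3x3 kernels followed by 32 1x1 kernels, with the nonlinear
    # activation placed after the 1x1 convolution; the stride-2 downsampling
    # is an assumption (the text does not specify strides or pooling).
    x = layers.Conv2D(64, 3, strides=2, padding="same")(x)
    return layers.Conv2D(32, 1, padding="same", activation="relu")(x)

inputs = layers.Input(shape=(500, 500, 3))    # S11: fixed 500 x 500 x 3 input
x = inputs
for _ in range(3):                            # S13: three convolution submodules
    x = lightweight_submodule(x)
x = layers.Flatten()(x)                       # Flatten layer
x = layers.Dense(256, activation="relu")(x)   # two fully connected layers
x = layers.Dense(5 * 5 * 2 * 6)(x)            # (widths are assumptions)
# Reshape to 5 x 5 regions, 2 regression frames per region, 6 values per
# frame: (b_x, b_y, b_w, b_h, confidence, score)
outputs = layers.Reshape((5, 5, 2, 6))(x)
detector = models.Model(inputs, outputs)
```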
In step S2, the regression frame information output by the face detection module is a series of face candidate frames preliminarily predicted by the model, and the predicted position information consists of normalized values; to obtain the actual face positions, the position information of these candidate frames must be decoded and the frames further screened. The position information of each regression frame is decoded and converted into face candidate frame information on the original image using the following formulas:

t_x = (x_offset + b_x) × w / S
t_y = (y_offset + b_y) × h / S
t_w = b_w × w
t_h = b_h × h

where (t_x, t_y) are the coordinates of the center point of the face candidate frame on the original image; t_w and t_h are its width and height respectively; w is the original image width and h is the original image height; S indicates that the original image is divided into S × S regions (the invention divides it into 5 × 5 regions, so S is 5); x_offset is the abscissa of the region to which the regression frame belongs, and y_offset is its ordinate.
In step S3, the face candidate frames are first sorted by their face-prediction score from largest to smallest and the top m candidate frames are selected; a non-maximum suppression algorithm then performs a second screening of these m frames according to the confidence of each regression frame, and n of the screened frames are selected as the face detection result, where m and n are set values. If fewer than n candidate frames remain after the second screening, all remaining frames are taken as the detection result.
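Written out, the decoding formulas and the two-stage screening take roughly the following form; the raw network output is assumed to be the (5, 5, 2, 6) tensor from step S13, and m, n and the IoU threshold stand in for the set values mentioned in the text.

```python
import numpy as np

def decode(preds, w=500, h=500, S=5):
    """Turn the raw (S, S, 2, 6) output into rows of
    (t_x, t_y, t_w, t_h, confidence, score) on the original image."""
    rows = []
    for gy in range(S):                        # y_offset: region row
        for gx in range(S):                    # x_offset: region column
            for bx, by, bw, bh, conf, score in preds[gy, gx]:
                rows.append(((gx + bx) * w / S, (gy + by) * h / S,
                             bw * w, bh * h, conf, score))
    return np.array(rows)

def _iou(a, b):
    # a, b are (cx, cy, w, h); convert to corners and compute the overlap ratio
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def screen(rows, m=10, n=1, iou_thresh=0.5):
    """Step S3: keep the m highest-scoring frames, then suppress by confidence."""
    top = rows[np.argsort(-rows[:, 5])][:m]    # first screen: by score
    kept = []
    for cand in top[np.argsort(-top[:, 4])]:   # second screen: NMS by confidence
        if all(_iou(cand[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(cand)
        if len(kept) == n:                     # at most n results; fewer are
            break                              # returned if NMS removes more
    return np.array(kept)
```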
In step S4, to compare whether two faces are consistent, the information of each face is quantized into a series of feature values, and the similarity of the two sets of feature values determines whether the two faces belong to the same person; for this purpose, a face feature extraction module based on a convolutional neural network is constructed to extract face features. The module is constructed as follows:
S41, constructing an input layer and preprocessing the input image; the preprocessing mainly resizes the input image to a uniform size, because the face images output by the face detection module vary in size while the feature extraction module contains fully connected layers;
S42, connecting 3 lightweight convolution submodules behind the input layer; their function is to speed up feature extraction and increase the nonlinear expressive capability of the network, so that face features are extracted better;
S43, constructing an Inception module behind the lightweight convolution submodules; it consists of convolution layers of 3 scales and a max pooling layer, with kernel sizes of 1 × 1, 3 × 3 and 5 × 5; the 3 convolution layers and 1 pooling layer are connected in parallel, and at the end of the module the outputs of the 4 network layers are concatenated as the module output; the advantage of this module is that face features are extracted at multiple scales, ensuring they are rich enough and thereby improving comparison accuracy;
S44, constructing a residual module behind the Inception module; it contains two convolution layers with 3 × 3 kernels; the module's input feature map is convolved by the two layers, the convolved feature map is then added element-wise to the input feature map, and the resulting feature map is the output of the residual module; introducing this module makes the model converge more easily during training and increases its fitting capability;
S45, connecting two fully connected layers behind the residual module to integrate the convolutional information, with the final output produced by a fully connected layer of k neurons; these k neurons form the k-dimensional feature vector extracted from the face.
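Under the same caveats as the detection sketch (Keras, assumed strides and layer widths), steps S41 to S45 read as the network below; the 224 × 224 × 3 input and k = 160 are taken from the worked example in the practical application section.

```python
from tensorflow.keras import layers, models

def inception_block(x, f=32):
    # S43: parallel 1x1, 3x3 and 5x5 convolutions plus a max pooling layer,
    # with the four outputs concatenated along the channel axis.
    b1 = layers.Conv2D(f, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f, 5, padding="same", activation="relu")(x)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate()([b1, b2, b3, b4])

def residual_block(x):
    # S44: two 3x3 convolutions, then an element-wise add with the input.
    f = x.shape[-1]
    y = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(f, 3, padding="same")(y)
    return layers.Add()([x, y])

inp = layers.Input(shape=(224, 224, 3))       # S41: faces resized to one size
x = inp
for _ in range(3):                            # S42: three lightweight submodules
    x = layers.Conv2D(64, 3, strides=2, padding="same")(x)
    x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
x = inception_block(x)
x = residual_block(x)
x = layers.Flatten()(x)
x = layers.Dense(512, activation="relu")(x)   # S45: two fully connected layers
embedding = layers.Dense(160)(x)              # k = 160 dimensional feature vector
extractor = models.Model(inp, embedding)
```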
In step S5, the k-dimensional feature vector can be mapped to a feature point in a k-dimensional feature space, so the feature vectors of the two faces map to two feature points; the distance between the two feature points represents the degree of similarity of the two faces, with a closer distance indicating more similar faces. The face comparison problem is thus converted into the problem of calculating the similarity distance between face feature points. The cosine distance is used as the similarity distance between the two feature points, computed as follows:

Calculate the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):

dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T

Calculate the two-norm of each face feature vector:

||X|| = sqrt(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = sqrt(y_1^2 + y_2^2 + ... + y_k^2)

Calculate the cosine distance of the two feature vectors from the dot product and the norms:

cos(X, Y) = dot_XY / (||X|| × ||Y||)

The cosine distance is the similarity distance of the two faces; if the distance is greater than a threshold, the two faces are considered to be the same person; otherwise, they are not.
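The three formulas translate directly into code; a minimal NumPy sketch follows, where the 0.6 threshold is illustrative rather than a value fixed by the text.

```python
import numpy as np

def cosine_similarity(x, y):
    dot_xy = float(np.dot(x, y))           # dot product of the two vectors
    norm_x = float(np.linalg.norm(x))      # two-norm of x
    norm_y = float(np.linalg.norm(y))      # two-norm of y
    return dot_xy / (norm_x * norm_y)

# Judge whether two 160-dimensional face vectors belong to the same person.
vec_a, vec_b = np.random.rand(160), np.random.rand(160)
same_person = cosine_similarity(vec_a, vec_b) > 0.6   # threshold is illustrative
```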
A face comparison system, comprising:
the face detection module is used for detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
the face feature extraction module is used for extracting features from the input image to obtain a series of feature values quantizing the face information;
and the similarity comparison module is used for calculating the similarity of two faces from their feature values and judging from the similarity whether they are the same person.
A computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is used for executing the computer program to implement the face comparison method described above.
A computer-readable storage medium, on which a computer program is stored; when executed by a processor, the computer program implements the face comparison method described above.
Practical application
The following takes as an example the certificate photo and the scene photo retained when a user transacts business at an operator's service outlet, and analyzes the face comparison process for these two images; the certificate photo usually contains only 1 face, while the scene photo may contain several faces.
1. The sizes of the certificate photo and the scene photo are uniformly adjusted to 500 × 500 × 3;
2. Face detection on the certificate photo: the certificate photo is input into the face detection module, which outputs the information of 50 regression frames; the position information (b_x, b_y, b_w, b_h) of each regression frame is decoded using the following formulas:

t_x = (x_offset + b_x) × w / S
t_y = (y_offset + b_y) × h / S
t_w = b_w × w
t_h = b_h × h

For this example, S takes the value 5, w is 500 and h is 500. From (t_x, t_y, t_w, t_h), the positions of the decoded candidate frames on the original image are obtained, together with their corresponding confidence scores confidence and face prediction scores score;
3. Screening the certificate-photo face candidate frames: the candidate frames are sorted by score from largest to smallest and the top 10 are selected; a non-maximum suppression algorithm then performs a second screening of these 10 frames according to their confidence, and the 1 remaining frame is selected as the face detection result for the certificate photo;
4. Face detection on the scene photo: the scene photo is input into the face detection module, which outputs the information of 50 regression frames; the position information (b_x, b_y, b_w, b_h) of each regression frame is decoded using the following formulas:

t_x = (x_offset + b_x) × w / S
t_y = (y_offset + b_y) × h / S
t_w = b_w × w
t_h = b_h × h

For this example, S takes the value 5, w is 500 and h is 500. From (t_x, t_y, t_w, t_h), the positions of the decoded candidate frames on the original image are obtained, together with their corresponding confidence scores confidence and face prediction scores score;
5. Screening the scene-photo face candidate frames: the candidate frames are sorted by score from largest to smallest and the top 20 are selected; a non-maximum suppression algorithm then performs a second screening of these 20 frames according to their confidence, and 5 of the screened frames are selected as the scene-photo face detection result; if fewer than 5 frames remain after screening, all remaining frames are taken as the result;
6. The certificate-photo face detection result extracted in step 3 is first preprocessed: its size is adjusted to 224 × 224 × 3. It is then input into the face feature extraction module, and after the module's series of convolution operations, a 160-dimensional feature vector (x_1, x_2, ..., x_160) is output through the fully connected layer of 160 neurons;
7. The scene-photo face detection results (possibly several) extracted in step 5 are each preprocessed and uniformly resized to 224 × 224 × 3, then input into the face feature extraction module; after the module's series of convolution operations, 160-dimensional feature vectors (i_1, i_2, ..., i_160), (j_1, j_2, ..., j_160), (k_1, k_2, ..., k_160), ... are output through the fully connected layer of 160 neurons and placed into the scene-photo face feature vector set;
8. A feature vector is taken without replacement from the scene-photo face feature vector set, and its similarity distance to the certificate-photo face feature vector is calculated with the cosine distance formula; this is the degree of similarity between that scene-photo face and the certificate-photo face, and the result is placed into a comparison set;
9. Step 8 is repeated until the similarity distance between every vector in the scene-photo face feature vector set and the certificate-photo face feature vector has been calculated, i.e., every face in the scene photo has been compared with the certificate-photo face;
10. The maximum similarity value in the comparison set is taken as the face comparison result between the scene photo and the certificate photo and compared with the set face comparison threshold; if it is greater than the threshold, the scene photo and the certificate photo pass the face comparison; otherwise the two faces do not pass.
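Steps 8 to 10 amount to taking the maximum pairwise similarity between the certificate-photo vector and the scene-photo vectors; a compact sketch of this loop, with an illustrative threshold, might be:

```python
import numpy as np

def cosine_similarity(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def verify(id_vec, scene_vecs, threshold=0.6):
    """Compare the certificate-photo vector with every scene-photo face
    vector (steps 8 and 9) and keep the best similarity (step 10)."""
    comparison_set = [cosine_similarity(id_vec, v) for v in scene_vecs]
    best = max(comparison_set, default=0.0)
    return best > threshold, best
```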
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the structure of the invention, and these do not affect the effect of its implementation or the utility of the patent.

Claims (10)

1. A face comparison method, characterized by comprising the following steps:
S1, constructing a lightweight face detection module based on a convolutional neural network, and detecting a face image to determine the position information of faces, obtaining a series of face candidate frames;
S2, decoding the position information of the face candidate frames, and converting each face candidate frame into face candidate frame information on the original image;
S3, screening several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
S4, constructing a face feature extraction module based on a convolutional neural network, and feeding the input image into it to obtain a series of feature values quantizing the face information;
S5, calculating the similarity of the two faces from their feature values, and judging from the similarity whether the two faces are the same person.
2. The method according to claim 1, wherein in step S1, the face detection module is constructed as follows:
S11, constructing an input layer; to satisfy the fixed-size input requirement of the convolutional neural network, the input layer resizes the image to 500 × 500 × 3;
S12, constructing a lightweight convolution submodule consisting mainly of convolution kernels of two scales, 3 × 3 and 1 × 1, with 64 of the 3 × 3 kernels and 32 of the 1 × 1 kernels; within the submodule, a 1 × 1 convolution kernel is connected behind a 3 × 3 convolution kernel, and a nonlinear activation operation follows the 1 × 1 convolution kernel;
S13, connecting 3 convolution submodules after the input layer, flattening the feature map output by the convolution modules through a Flatten layer, integrating the convolutional information through two fully connected layers, and finally outputting, through a Reshape layer, the information of 5 × 5 × 2 regression frames (b_x, b_y, b_w, b_h, confidence, score), where (b_x, b_y) are the coordinates of the center point of a regression frame, b_w is its width, b_h is its height, confidence is its confidence score, score is the score that it contains a face, and 5 × 5 × 2 indicates that the original image is divided into 5 × 5 regions with the model predicting the position information of 2 regression frames in each region.
3. The method of claim 2, wherein in step S2, the position information of the regression frames is decoded, and each regression frame is converted into face candidate frame information on the original image using the following formulas:

t_x = (x_offset + b_x) × w / S
t_y = (y_offset + b_y) × h / S
t_w = b_w × w
t_h = b_h × h

where (t_x, t_y) are the coordinates of the center point of the face candidate frame on the original image; t_w and t_h are its width and height respectively; w is the original image width and h is the original image height; S indicates that the original image is divided into S × S regions, with S taken as 5; x_offset is the abscissa of the region to which the regression frame belongs, and y_offset is its ordinate.
4. The face comparison method as claimed in claim 2, wherein in step S3, the face candidate frames are sorted by their face-prediction score from largest to smallest and the top m candidate frames are selected; a non-maximum suppression algorithm then performs a second screening of these m frames according to the confidence of each regression frame, and n of the screened frames are selected as the face detection result, where m and n are set values.
5. The method according to claim 4, wherein if fewer than n candidate frames remain after the second screening, all remaining candidate frames are used as the detection result.
6. The method of claim 1, wherein in step S4, the face feature extraction module is constructed as follows:
S41, constructing an input layer and preprocessing the input image; the preprocessing mainly resizes the input image to a uniform size;
S42, connecting 3 lightweight convolution submodules behind the input layer; their function is to speed up feature extraction and increase the nonlinear expressive capability of the network, so that face features are extracted better;
S43, constructing an Inception module behind the lightweight convolution submodules; it consists of convolution layers of 3 scales and a max pooling layer, with kernel sizes of 1 × 1, 3 × 3 and 5 × 5; the 3 convolution layers and 1 pooling layer are connected in parallel, and at the end of the module the outputs of the 4 network layers are concatenated as the output of the Inception module;
S44, constructing a residual module behind the Inception module; it contains two convolution layers with 3 × 3 kernels; the module's input feature map is convolved by the two layers, the convolved feature map is then added element-wise to the input feature map, and the resulting feature map is the output of the residual module;
S45, connecting two fully connected layers behind the residual module to integrate the convolutional information, with the final output produced by a fully connected layer of k neurons; these k neurons form the k-dimensional feature vector extracted from the face.
7. The method according to claim 6, wherein in step S5, the k-dimensional feature vector can be mapped to a feature point in a k-dimensional feature space, and the feature vectors of two faces can be mapped to two feature points; the distance between the two feature points represents the degree of similarity of the two faces, with a closer distance indicating more similar faces; the cosine distance is used as the similarity distance between the two feature points, computed as follows:

calculate the dot product of the two face feature vectors (x_1, x_2, ..., x_k) and (y_1, y_2, ..., y_k):

dot_XY = (x_1, x_2, ..., x_k) × (y_1, y_2, ..., y_k)^T

calculate the two-norm of each face feature vector:

||X|| = sqrt(x_1^2 + x_2^2 + ... + x_k^2)
||Y|| = sqrt(y_1^2 + y_2^2 + ... + y_k^2)

calculate the cosine distance of the two feature vectors from the dot product and the norms:

cos(X, Y) = dot_XY / (||X|| × ||Y||)
the cosine distance is the similarity distance of two faces, and if the distance is greater than a threshold value, the two faces are considered as the same person; otherwise the two faces are considered not to be the same person.
8. A face comparison system, comprising:
the face detection module is used for detecting a face image to determine the position information of a face to obtain a series of face candidate frames;
the decoding module is used for decoding the position information of the face candidate frames and converting each face candidate frame into face candidate frame information on the original image;
the screening module is used for screening out several face candidate frames as the detection result according to each candidate frame's face-prediction score, and cutting the face regions out of the original image according to the detection result to serve as input images for the subsequent face feature extraction module;
the face feature extraction module is used for extracting features from the input image to obtain a series of feature values quantizing the face information;
and the similarity comparison module is used for calculating the similarity of two faces from their feature values and judging from the similarity whether they are the same person.
9. A computer device comprising a memory, a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program to implement a face comparison method as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is configured to implement a face comparison method according to any one of claims 1 to 7.
CN202210949819.7A 2022-08-09 2022-08-09 Face comparison method, system, equipment and storage medium Pending CN115273202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210949819.7A CN115273202A (en) 2022-08-09 2022-08-09 Face comparison method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210949819.7A CN115273202A (en) 2022-08-09 2022-08-09 Face comparison method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115273202A true CN115273202A (en) 2022-11-01

Family

ID=83748799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210949819.7A Pending CN115273202A (en) 2022-08-09 2022-08-09 Face comparison method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115273202A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690934A (en) * 2023-01-05 2023-02-03 武汉利楚商务服务有限公司 Master and student attendance card punching method and device based on batch face recognition


Similar Documents

Publication Publication Date Title
CN111639692B (en) Shadow detection method based on attention mechanism
CN110287960B (en) Method for detecting and identifying curve characters in natural scene image
CN111968150B (en) Weak surveillance video target segmentation method based on full convolution neural network
CN111524145B (en) Intelligent picture cropping method, intelligent picture cropping system, computer equipment and storage medium
CN109145745B (en) Face recognition method under shielding condition
WO2017080196A1 (en) Video classification method and device based on human face image
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN113361495A (en) Face image similarity calculation method, device, equipment and storage medium
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN113920400B (en) Metal surface defect detection method based on improvement YOLOv3
CN110826558B (en) Image classification method, computer device, and storage medium
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN117237599A (en) Image target detection method and device
CN115273202A (en) Face comparison method, system, equipment and storage medium
CN109359530B (en) Intelligent video monitoring method and device
CN114463732A (en) Scene text detection method and device based on knowledge distillation
CN118230354A (en) Sign language recognition method based on improvement YOLOv under complex scene
CN115587994A (en) Model fusion image tampering detection method and device based on multi-view features
CN116403237A (en) Method for re-identifying blocked pedestrians based on associated information and attention mechanism
CN113191195B (en) Face detection method and system based on deep learning
CN114694042A (en) Disguised person target detection method based on improved Scaled-YOLOv4
CN114332915A (en) Human body attribute detection method and device, computer equipment and storage medium
CN112767427A (en) Low-resolution image recognition algorithm for compensating edge information
CN114385843A (en) Classification network construction method and image retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination