CN114360030A - Face recognition method based on convolutional neural network - Google Patents


Info

Publication number
CN114360030A
Authority
CN
China
Prior art keywords
layer
spatial attention
face recognition
pooling
attention module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210049539.0A
Other languages
Chinese (zh)
Inventor
李琦 (Li Qi)
赖艳 (Lai Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Ruiyun Technology Co ltd
Original Assignee
Chongqing Ruiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Ruiyun Technology Co., Ltd.
Priority to CN202210049539.0A
Publication of CN114360030A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face recognition method based on a convolutional neural network. The method trains a face recognition network, comprising DSAG units, GM pooling layers, a GAP layer, a full connection layer and a softmax layer, on a training data set; inputs a face image into the trained network; performs a global average pooling operation on the intermediate feature map using the GAP layer; and passes the resulting primary feature vector through the full connection layer and the softmax layer in turn to output the classification result. By placing a global median pooling operation layer in each of the two spatial attention modules, and by feeding part of the calibration map of the first spatial attention module into the second, the second calibration map calibrates feature information at different scales, and the DSAG unit extracts useful feature information from low-resolution face images more fully.

Description

Face recognition method based on convolutional neural network
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a face recognition method based on a convolutional neural network.
Background
With the emergence of various deep learning models, the accuracy of convolutional neural networks on face recognition tasks has steadily improved, and judging from current deployed applications (face-scan gate passage, face-scan ticket checking, access control systems based on face recognition, and the like), the existing technology can well meet the identity recognition requirements of urban public transport scenes. However, in some tourist attractions the crowd moves over a large and scattered area, infrastructure construction costs are high, and a large number of cameras cannot be deployed to collect face images as in a city. In such application scenarios, monitoring equipment can only be installed at key positions; the face occupies only a small part of the captured image, the resolution of the face image is low, and the recognition accuracy of existing models under these conditions is not high.
Disclosure of Invention
In view of the above deficiencies in the prior art, the present invention provides a face recognition method based on a convolutional neural network, so as to recognize low-resolution face images more accurately.
In order to achieve the above purpose, the solution adopted by the invention is as follows: a face recognition method based on a convolutional neural network comprises the following steps:
s10, building a face recognition network, and training the face recognition network by using a training data set;
the face recognition network comprises DSAG units, GM pooling layers, a GAP layer, a full connection layer and a softmax layer, wherein the DSAG units are used for extracting feature information from the image; there are a plurality of DSAG units and GM pooling layers, and they are arranged alternately along the depth direction of the face recognition network;
S20, acquiring a face image to be recognized and inputting it into the trained face recognition network, the face image passing sequentially through the DSAG units and GM pooling layers to obtain an intermediate feature map;
S30, performing a global average pooling operation on the intermediate feature map by using the GAP layer to obtain a primary feature vector;
S40, passing the primary feature vector sequentially through the full connection layer and the softmax layer, and outputting the classification result to complete face recognition;
wherein the DSAG unit can be represented by the following mathematical model:
T1 = f_RC1(Y_{n-1})
T2 = f_SA1(T1) * T1
T3 = f_RC2(T2)
Y_n = f_SA2(T3, U) * T3
wherein Y_{n-1} and Y_n respectively represent the input and output of the DSAG unit, f_RC1() represents the first feature extraction component, f_RC2() represents the second feature extraction component, f_SA1() represents the first spatial attention module, f_SA2() represents the second spatial attention module, U represents the calibration map passed from the first spatial attention module into the second spatial attention module, and T3 and U together serve as the input of the second spatial attention module.
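By way of illustration, the data flow of the DSAG unit can be sketched in PyTorch as follows. This is a minimal wiring sketch, not part of the disclosure: the submodule names rc1, sa1, rc2 and sa2 are assumptions standing in for f_RC1, f_SA1, f_RC2 and f_SA2, whose internals are given by the formulas that follow, and in this sketch the calibration map U is recovered inside the second attention module by re-pooling T1.

```python
import torch.nn as nn

class DSAGUnit(nn.Module):
    """Wiring of one DSAG unit following the four formulas above."""

    def __init__(self, rc1, sa1, rc2, sa2):
        super().__init__()
        self.rc1, self.sa1 = rc1, sa1    # f_RC1, f_SA1
        self.rc2, self.sa2 = rc2, sa2    # f_RC2, f_SA2

    def forward(self, y_prev):
        t1 = self.rc1(y_prev)            # T1 = f_RC1(Y_{n-1})
        t2 = self.sa1(t1) * t1           # T2 = f_SA1(T1) * T1
        t3 = self.rc2(t2)                # T3 = f_RC2(T2)
        # U (the pooled calibration maps of T1) is recomputed from T1
        # inside the second attention module in this sketch.
        return self.sa2(t1, t3) * t3     # Y_n = f_SA2(T3, U) * T3
```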
Further, the first feature extraction component and the second feature extraction component each comprise a plurality of convolution residual blocks connected in sequence, and the convolution residual block can be expressed by the following formula:
M_n = λ_2(f_2(λ_1(f_1(M_{n-1})))) + M_{n-1}
wherein M_{n-1} and M_n respectively represent the input and output of the convolution residual block, f_1 and f_2 both represent convolution layers with a convolution kernel size of 3×3, and λ_1 and λ_2 both represent the ReLU activation function.
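A hedged PyTorch sketch of the convolution residual block is given below. The padding of 1, chosen so that the 3×3 convolutions preserve height and width and the skip connection shapes match, is an assumption, as is keeping the channel count fixed (the first block of some components in the embodiment changes the channel count instead).

```python
import torch.nn as nn

class ConvResidualBlock(nn.Module):
    """M_n = λ2(f2(λ1(f1(M_{n-1})))) + M_{n-1}, for a fixed channel count."""

    def __init__(self, channels):
        super().__init__()
        self.f1 = nn.Conv2d(channels, channels, 3, padding=1)  # f1: 3x3 conv
        self.f2 = nn.Conv2d(channels, channels, 3, padding=1)  # f2: 3x3 conv
        self.relu = nn.ReLU()                                  # λ1 = λ2 = ReLU

    def forward(self, m_prev):
        return self.relu(self.f2(self.relu(self.f1(m_prev)))) + m_prev
```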
Further, the first spatial attention module can be represented by the following formula:
Z1 = θ1(f_CS1(<AP1(T1), MP1(T1), DP1(T1)>))
wherein T1 is the feature map input into the first spatial attention module, Z1 represents the first calibration map output by the first spatial attention module, f_CS1 represents a convolution operation with a convolution kernel size of 1×1, θ1 represents the sigmoid activation function, <·> represents a splicing operation, AP1() represents the average pooling operation layer, MP1() represents the maximum pooling operation layer, and DP1() represents the median pooling operation layer.
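The channel-direction pooling and the first spatial attention module can be sketched as follows; this is a sketch under the formula above, assuming the usual (B, C, H, W) tensor layout.

```python
import torch
import torch.nn as nn

def channel_pool(x):
    """AP / MP / DP along the channel direction: each result has one
    channel and the same height and width as the input."""
    return (x.mean(dim=1, keepdim=True),            # AP: average pooling
            x.max(dim=1, keepdim=True).values,      # MP: maximum pooling
            x.median(dim=1, keepdim=True).values)   # DP: median pooling

class FirstSpatialAttention(nn.Module):
    """Z1 = θ1(f_CS1(<AP1(T1), MP1(T1), DP1(T1)>))."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size=1)  # f_CS1: 1x1 conv on 3 maps

    def forward(self, t1):
        ap, mp, dp = channel_pool(t1)
        # splice the three one-channel maps, convolve, then apply sigmoid
        return torch.sigmoid(self.conv(torch.cat([ap, mp, dp], dim=1)))
```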
Further, the second spatial attention module can be represented by the following mathematical model:
X1 = AP1(T1) + MP1(T1) - AP2(T3)
X2 = MP1(T1) + MP2(T3)
X3 = θ2(f_CS2(<X1, X2, AP2(T3), DP1(T1), DP2(T3)>))
wherein T1 is the feature map input into the first spatial attention module, T3 is the feature map input into the second spatial attention module, X3 represents the second calibration map output by the second spatial attention module, AP1() and AP2() respectively represent the average pooling operation layers in the first and second spatial attention modules, MP1() and MP2() respectively represent the maximum pooling operation layers in the first and second spatial attention modules, DP1() and DP2() respectively represent the median pooling operation layers in the first and second spatial attention modules, f_CS2 represents a convolution operation with a convolution kernel size of 1×1, θ2 represents the sigmoid activation function, and <·> represents a splicing operation. The maximum pooling, average pooling and median pooling operation layers all operate along the channel direction of the feature map; the output of each is a calibration map with a channel number of 1 and the same length and width as the input.
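A corresponding sketch of the second spatial attention module follows, reusing channel_pool() from the previous sketch. Passing T1 in and re-pooling it here is this sketch's stand-in for the calibration map U received from the first module; the patent does not fix how U is transmitted.

```python
import torch
import torch.nn as nn

class SecondSpatialAttention(nn.Module):
    """X3 = θ2(f_CS2(<X1, X2, AP2(T3), DP1(T1), DP2(T3)>)), where
    X1 = AP1(T1) + MP1(T1) - AP2(T3) and X2 = MP1(T1) + MP2(T3)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(5, 1, kernel_size=1)  # f_CS2: 1x1 conv on 5 maps

    def forward(self, t1, t3):
        ap1, mp1, dp1 = channel_pool(t1)            # pooled maps of T1 (U)
        ap2, mp2, dp2 = channel_pool(t3)            # pooled maps of T3
        x1 = ap1 + mp1 - ap2
        x2 = mp1 + mp2
        feats = torch.cat([x1, x2, ap2, dp1, dp2], dim=1)
        return torch.sigmoid(self.conv(feats))      # second calibration map X3
```

Note that within one DSAG unit T1 and T3 share the same spatial size (downsampling happens only in the GM pooling layers between units), so the element-wise sums above are well defined.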
Further, the pooling window size of the GM pooling layer is 3×3 with a step size of 2, and the operation of the GM pooling layer can be represented by the following mathematical model:
K1 = sort(P)
K2 = Avg(max1(K1) + max2(K1) + max3(K1)) + max1(K1)
wherein P is a 3×3 matrix input into the GM pooling layer, sort(P) represents sorting the elements of the matrix P from large to small, max1(K1) represents taking the value of the element at the first position of the sorted sequence K1 (i.e., the maximum value), max2(K1) represents taking the value of the element at the second position of K1, max3(K1) represents taking the value of the element at the third position of K1, and Avg() represents an averaging operation. By filling the edges of the feature map, the length and width of the feature map become half of their original size after passing through the GM pooling layer.
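A sketch of the GM pooling operation is shown below. The replicate padding of one row and one column on the bottom and right is an assumption chosen so that an even-sized input is halved; the patent only states that the edges of the feature map are filled.

```python
import torch
import torch.nn.functional as F

def gm_pool(x):
    """GM pooling: 3x3 window, stride 2; per window compute
    K2 = Avg(max1 + max2 + max3) + max1 on the sorted window values."""
    b, c, h, w = x.shape
    x = F.pad(x, (0, 1, 0, 1), mode='replicate')    # fill right/bottom edges
    patches = F.unfold(x, kernel_size=3, stride=2)  # (B, C*9, L) windows
    patches = patches.view(b, c, 9, -1)
    top3 = patches.topk(3, dim=2).values            # max1, max2, max3
    out = top3.mean(dim=2) + top3[:, :, 0, :]       # Avg(top 3) + max1
    return out.view(b, c, h // 2, w // 2)

# e.g. gm_pool(torch.randn(1, 64, 112, 112)).shape == (1, 64, 56, 56)
```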
The invention has the beneficial effects that:
(1) In current convolutional-neural-network face recognition models, the attention mechanism uses only average pooling and maximum pooling. Considering the characteristics of face images, fully and accurately extracting the edge information in a face image plays an important role in improving face recognition accuracy. The invention therefore places a global median pooling operation layer in each of the two spatial attention modules, so that more edge feature information is captured when the spatial calibration map calibrates the feature map, thereby improving the accuracy of low-resolution face recognition;
(2) As the network depth increases, the receptive field of the convolution operations grows gradually. By feeding part of the calibration map of the first spatial attention module into the second spatial attention module, the invention enlarges the receptive range of the calibration map generated by the second spatial attention module, so that the second calibration map calibrates feature information at different scales rather than only feature information at the same scale as the T3 feature map. In this way the DSAG unit extracts useful feature information from the low-resolution face image more fully;
(3) In a conventional classification network, the feature map is downsampled with a maximum pooling operation; although this operation is simple, its utilization of the features is low, and especially when the image resolution is low, the limited effective information that is available is easily lost. The GM pooling layer of the invention instead combines the several largest responses in each window, so that more of this effective information is retained.
Drawings
FIG. 1 is a schematic diagram of a face recognition network according to an embodiment;
fig. 2 is a schematic diagram of an internal structure of a DSAG unit in the face recognition network shown in fig. 1;
FIG. 3 is a diagram illustrating the structure of the convolution residual block in the DSAG unit shown in FIG. 2;
FIG. 4 is a schematic diagram of the first spatial attention module and the second spatial attention module of the DSAG unit of FIG. 2;
in the drawings:
1-DSAG unit, 11-first feature extraction component, 12-second feature extraction component, 13-first spatial attention module, 14-second spatial attention module, 15-convolution residual block, 2-GM pooling layer, 3-GAP layer, 4-full connection layer, 5-softmax layer, 6-face image to be recognized.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
example (b):
Fig. 1 is a schematic diagram of the overall structure of the face recognition network in this embodiment, in which the DSAG units 1 and the GM pooling layers 2 are arranged in correspondence, four of each. The specific structure of the DSAG unit 1 is shown in fig. 2: in each DSAG unit 1, four sequentially connected convolution residual blocks 15 are arranged in each of the first feature extraction component 11 and the second feature extraction component 12. The structure of the convolution residual block 15 is shown in fig. 3, and the structures of the first spatial attention module 13 and the second spatial attention module 14 are shown in fig. 4. When the model is trained, cross entropy is adopted as the loss function, epoch is set to 1500, and batch-size is set to 16.
Taking the length, width and channel number of the input face image 6 to be recognized as 112 × 112 × 3: in the network, the first convolution operation of the first convolution residual block 15 in the first DSAG unit 1 is used to increase the number of channels of the feature map, and the size of its output feature map is 112 × 112 × 64. In each DSAG unit 1, the first convolution operation of the first convolution residual block 15 in the second feature extraction component 12 likewise increases the number of feature map channels, the number of output channels being twice the number of input channels. For the other convolution operations within the convolution residual blocks 15 in the network, the length, width and channel size of the feature map are unchanged between the input and output of the convolution. The sizes of the feature maps at different positions in the network of this embodiment are shown in the following table:
[Table: feature map sizes at different positions in the network; reproduced only as an image in the original publication]
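Under the sizes just described, the overall network of the embodiment can be sketched as the following skeleton. The factory make_dsag and the per-stage channel widths 128/256/512/1024 are assumptions inferred from the text above; gm_pool is the sketch given earlier.

```python
import torch.nn as nn

class FaceRecNet(nn.Module):
    """Skeleton of the embodiment: four DSAG stages alternating with GM
    pooling, then global average pooling, a 1024-input full connection
    layer and softmax."""

    def __init__(self, make_dsag, num_identities):
        super().__init__()
        widths = [(3, 128), (128, 256), (256, 512), (512, 1024)]
        self.stages = nn.ModuleList([make_dsag(ci, co) for ci, co in widths])
        self.fc = nn.Linear(1024, num_identities)

    def forward(self, x):                    # x: (B, 3, 112, 112)
        for stage in self.stages:
            x = gm_pool(stage(x))            # DSAG unit, then GM pooling
        x = x.mean(dim=(2, 3))               # GAP -> (B, 1024)
        return self.fc(x).softmax(dim=1)     # classification result
```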
the GAP layer 3 is used for carrying out global average pooling operation on the intermediate feature graph, for the full connection layer 4, the number of input nodes is 1024, and the number of output nodes is set according to the total number of identities needing to be identified actually. It should be noted that, according to the application scenario, the softmax layer 5 may be removed, and the open set identification is realized by calculating the distance between the output feature vector of the full connection layer 4 and the preset sample feature vector.
VGG19, ResNet101 and the face recognition network provided by the invention were each trained on the same training set and then tested on the same test set; the results are shown in the following table:
[Table: test accuracy of VGG19, ResNet101 and the proposed network; reproduced only as an image in the original publication]
From the above results it can be seen that, compared with the prior art, the recognition accuracy of the face recognition network provided by the invention on low-resolution face images is greatly improved, which constitutes a substantive advance.
On the basis of the present embodiment, the GM pooling layer 2 was replaced by an ordinary maximum pooling layer (pooling window size 3 × 3, step size 2) with the rest of the network unchanged, yielding comparison network A. Also on the basis of the present embodiment, the connection between the first spatial attention module 13 and the second spatial attention module 14 was removed, so that the calibration map of the first spatial attention module 13 is no longer input into the second spatial attention module 14, with the other parts of the network unchanged, yielding comparison network B. Exactly the same training and testing procedure was used, and the test results are shown in the following table:
[Table: ablation results for comparison networks A and B versus the full network; reproduced only as an image in the original publication]
From the above results, the GM pooling layer 2 and the DSAG unit 1 provided by the invention each have a clear positive effect on improving the network's recognition accuracy on low-resolution face images.
The above embodiment expresses only a specific implementation of the present invention, and its description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Claims (5)

1. A face recognition method based on a convolutional neural network, characterized by comprising the following steps:
s10, building a face recognition network, and training the face recognition network by using a training data set;
the face recognition network comprises DSAG units, GM pooling layers, a GAP layer, a full connection layer and a softmax layer, wherein the DSAG units are used for extracting feature information from the image; there are a plurality of DSAG units and GM pooling layers, and they are arranged alternately along the depth direction of the face recognition network;
S20, acquiring a face image to be recognized and inputting it into the trained face recognition network, the face image passing sequentially through the DSAG units and GM pooling layers to obtain an intermediate feature map;
S30, performing a global average pooling operation on the intermediate feature map by using the GAP layer to obtain a primary feature vector;
S40, passing the primary feature vector sequentially through the full connection layer and the softmax layer, and outputting the classification result to complete face recognition;
wherein the DSAG unit can be represented by the following mathematical model:
T1 = f_RC1(Y_{n-1})
T2 = f_SA1(T1) * T1
T3 = f_RC2(T2)
Y_n = f_SA2(T3, U) * T3
wherein Y_{n-1} and Y_n respectively represent the input and output of the DSAG unit, f_RC1() represents the first feature extraction component, f_RC2() represents the second feature extraction component, f_SA1() represents the first spatial attention module, f_SA2() represents the second spatial attention module, U represents the calibration map passed from the first spatial attention module into the second spatial attention module, and T3 and U are both inputs of the second spatial attention module.
2. The face recognition method based on the convolutional neural network as claimed in claim 1, wherein: the first feature extraction component and the second feature extraction component each comprise a plurality of convolution residual blocks connected in sequence, and the convolution residual block can be expressed by the following formula:
M_n = λ_2(f_2(λ_1(f_1(M_{n-1})))) + M_{n-1}
wherein M_{n-1} and M_n respectively represent the input and output of the convolution residual block, f_1 and f_2 both represent convolution layers with a convolution kernel size of 3×3, and λ_1 and λ_2 both represent the ReLU activation function.
3. The face recognition method based on the convolutional neural network as claimed in claim 1, wherein: the first spatial attention module can be represented by the following formula:
Z1 = θ1(f_CS1(<AP1(T1), MP1(T1), DP1(T1)>))
wherein T1 is the feature map input into the first spatial attention module, Z1 represents the first calibration map output by the first spatial attention module, f_CS1 represents a convolution operation with a convolution kernel size of 1×1, θ1 represents the sigmoid activation function, <·> represents a splicing operation, AP1() represents the average pooling operation layer, MP1() represents the maximum pooling operation layer, and DP1() represents the median pooling operation layer.
4. The face recognition method based on the convolutional neural network as claimed in claim 3, wherein: the second spatial attention module can be represented by the following mathematical model:
X1 = AP1(T1) + MP1(T1) - AP2(T3)
X2 = MP1(T1) + MP2(T3)
X3 = θ2(f_CS2(<X1, X2, AP2(T3), DP1(T1), DP2(T3)>))
wherein T1 is the feature map input into the first spatial attention module, T3 is the feature map input into the second spatial attention module, X3 represents the second calibration map output by the second spatial attention module, AP1() and AP2() respectively represent the average pooling operation layers in the first and second spatial attention modules, MP1() and MP2() respectively represent the maximum pooling operation layers in the first and second spatial attention modules, DP1() and DP2() respectively represent the median pooling operation layers in the first and second spatial attention modules, f_CS2 represents a convolution operation with a convolution kernel size of 1×1, θ2 represents the sigmoid activation function, and <·> represents a splicing operation.
5. The face recognition method based on the convolutional neural network as claimed in claim 1, wherein: the pooling window size of the GM pooling layer is 3×3 with a step size of 2, and the operation of the GM pooling layer can be represented by the following mathematical model:
K1 = sort(P)
K2 = Avg(max1(K1) + max2(K1) + max3(K1)) + max1(K1)
wherein P is a 3×3 matrix input into the GM pooling layer, sort(P) represents sorting the elements of the matrix P from large to small, max1(K1) represents taking the value of the element at the first position of the sorted sequence K1, max2(K1) represents taking the value of the element at the second position of K1, max3(K1) represents taking the value of the element at the third position of K1, and Avg() represents an averaging operation.
CN202210049539.0A, filed 2022-01-17 (priority date 2022-01-17): Face recognition method based on convolutional neural network. Status: pending. Published as CN114360030A.

Priority Applications (1)

Application Number: CN202210049539.0A; Priority Date: 2022-01-17; Filing Date: 2022-01-17; Title: Face recognition method based on convolutional neural network

Applications Claiming Priority (1)

Application Number: CN202210049539.0A; Priority Date: 2022-01-17; Filing Date: 2022-01-17; Title: Face recognition method based on convolutional neural network

Publications (1)

Publication Number: CN114360030A; Publication Date: 2022-04-15

Family

ID=81091275

Family Applications (1)

Application Number: CN202210049539.0A (pending); Priority Date: 2022-01-17; Filing Date: 2022-01-17; Title: Face recognition method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN114360030A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205614A (en) * 2022-05-20 2022-10-18 钟家兴 Ore X-ray image identification method for intelligent manufacturing
CN115205614B (en) * 2022-05-20 2023-12-22 深圳市沃锐图像技术有限公司 Ore X-ray image identification method for intelligent manufacturing
CN115187814A (en) * 2022-07-25 2022-10-14 重庆芸山实业有限公司 Chrysanthemum mosaic disease diagnosis method and device based on artificial intelligence
CN115187814B (en) * 2022-07-25 2024-05-10 重庆芸山实业有限公司 Artificial intelligence-based chrysanthemum mosaic disease diagnosis method and equipment
CN115661911A (en) * 2022-12-23 2023-01-31 四川轻化工大学 Face feature extraction method, device and storage medium
CN115937956A (en) * 2023-01-05 2023-04-07 广州蚁窝智能科技有限公司 Face recognition method and board system for kitchen
CN115984949A (en) * 2023-03-21 2023-04-18 威海职业学院(威海市技术学院) Low-quality face image recognition method and device with attention mechanism


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination