CN110610129A - Deep learning face recognition system and method based on self-attention mechanism - Google Patents
Info
- Publication number: CN110610129A (application CN201910719368.6A)
- Authority: CN (China)
- Prior art keywords: characteristic diagram, channel, attention, obtaining, self
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture: combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks: learning methods
- G06V40/161 — Recognition of human faces in image or video data: detection; localisation; normalisation
- G06V40/168 — Recognition of human faces in image or video data: feature extraction; face representation
- G06V40/172 — Recognition of human faces in image or video data: classification, e.g. identification
Abstract
The invention discloses a system and method for deep learning face recognition based on a self-attention mechanism, belonging to the field of computer vision and pattern recognition. The invention constructs a channel self-attention module that performs dimension conversion and transposition on the three-dimensional data of a feature map and learns a cross-correlation matrix among channels to express the relative relationships between different channels; the channel-optimized features are then obtained through calculation with the original features, assigning different weights to different channels, thereby realizing channel filtering and reducing redundant information in the feature channels. A spatial self-attention module is also constructed, which models the spatial information of the three-dimensional feature map and learns a cross-correlation matrix among the spatial positions of the feature map to represent the relative relationships between different positions; the spatially optimized features are obtained through calculation with the input features, assigning different weights to different positions of the face feature map, thereby realizing the selection of important facial feature regions and concentrating the features on the important areas of the face.
Description
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a deep learning face recognition system and method based on a self-attention mechanism.
Background
In recent years, with the rapid development of the parallel computing capability of computers, the field of computer vision has advanced greatly under the impetus of deep learning and now has application requirements in many areas. Face recognition is a technology that enables a computer to automatically recognize, via visual algorithms, the identities of people appearing in monitoring data; it is widely applied in intelligent security, personnel attendance, community inspection, self-service, and other fields. For example, the sky-eye monitoring system in China's "safe city, smart community" plan tracks and captures suspects using face recognition technology. In daily life and work, face recognition is often used in systems installed in campus laboratories and enterprise offices, which both handle attendance for staff and prevent intrusion by outsiders. In the field of financial payment, face recognition is fully utilized: systems installed on bank ATMs prevent fraudulent card use, and face-scan payment adopted in mobile payment further guarantees security. In an actual deployment environment, face recognition generally requires only an ordinary camera to complete recognition and authentication; it is dynamic and requires no cooperation from the subject, making it more convenient than traditional biometrics such as iris and fingerprint recognition, and these factors have made its application increasingly widespread.
Since 2012, computers' understanding and analysis of face images has leapt forward, owing to theoretical advances in deep learning and technological progress in GPU acceleration. Face recognition technology based on convolutional neural networks has accordingly moved at high speed into commercial application even under non-cooperative conditions. In particular, current real-time personnel control systems based on surveillance video can, while analyzing the monitoring video stream, automatically detect, analyze, and capture the face regions in the video, upload them to a background server for real-time face comparison, and raise alarms on abnormal face images, saving a great deal of manpower, material, and financial resources in the construction of today's "safe cities".
Owing to the analysis capability of deep convolutional neural networks (DCNNs) on images, deep features based on convolutional neural networks have gradually replaced traditional handcrafted features in face recognition. Compared with traditional shallow handcrafted features, deep features have stronger discriminative power and robustness. At present, face recognition algorithms based on convolutional neural networks mainly constrain the feature space by modifying the loss function, as in CosFace, ArcFace, and the like, but do not conduct targeted research on the network structure. Such methods extract features with a general-purpose classification convolutional neural network and then constrain the feature space at the final classification layer, thereby increasing inter-class distance and reducing intra-class distance. Modifying the loss function does, to a great extent, enhance the discriminative power of the features, but these methods ignore the problems of the convolutional neural network structure itself in face recognition feature extraction. Existing convolutional neural networks have a single, fixed structure, so forward-propagated feature extraction suffers from information redundancy and similar problems, flexibility is weak, and generalization ability is somewhat poor.
Most existing face recognition algorithms use a general-purpose image classification backbone network, and such networks have two disadvantages in actual face applications. First, the feature maps extracted by a standard CNN often have a large number of channels; for example, the channel count in the later stages of ResNet reaches 2048. Such a large number of channels brings considerable information redundancy and may even risk overfitting the network. Although regularization approaches such as Dropout can effectively alleviate this problem, the results are still unsatisfactory. Second, based on human cognition of faces in the real world, different parts of a face image have different importance in actual recognition, but the parameter-sharing mechanism of convolution kernels in a convolutional neural network gives the same weight to all image pixels and cannot treat different positions differently.
Disclosure of Invention
The invention provides a deep learning face recognition method based on a self-attention mechanism, aimed at defects of the prior art and the corresponding improvement requirements: overfitting caused by the high channel count of feature maps in general convolutional neural networks, and the failure to treat different face positions differently due to the weight-sharing mechanism of convolution kernels. The method learns the cross-correlation information among feature-map channels through a channel self-attention module to obtain the matrix relation among channels and assign different channels different importance; a spatial self-attention module then learns the cross-correlation information between feature-map positions to obtain the matrix relation among positions and gives different weights to spatial positions of the feature map, learning the importance of different positions of the face. The method retains the excellent performance of the original convolutional neural network while optimizing the face image features during forward propagation, reducing information redundancy among image channels, concentrating the convolution kernels on the more important positions in the face image, improving face recognition accuracy, and enhancing the flexibility and generalization ability of the model.
To achieve the above object, according to one aspect of the present invention, there is provided a deep learning face recognition system based on a self-attention mechanism, the system including:
the input module is used for selecting a face picture training set and inputting a face picture to be recognized;
a self-attention-based deep learning module with ResNet as the backbone network, comprising a plurality of residual blocks and a plurality of attention modules, each attention module comprising a channel attention module and/or a spatial attention module connected in series at the end of each residual block, with a fully connected layer as the last layer; the residual blocks further extract feature maps from the input face picture or from preceding feature maps; the channel attention module learns a cross-correlation matrix among feature-map channels during forward propagation to obtain a channel-optimized feature map; the spatial attention module learns a cross-correlation matrix among feature-map spatial positions during forward propagation to obtain a spatially optimized feature map; and the fully connected layer converts the final optimized feature map into features;
the training module is used for training the self-attention-based deep learning module by adopting the face picture training set to obtain a trained self-attention-based deep learning module;
and the face recognition module is used for inputting the face picture to be recognized into the trained self-attention-based deep learning module and outputting a face recognition result.
Specifically, the channel attention module is realized by the following steps:
inputting a feature map F_I ∈ R^(C×H×W) and passing it through two parallel convolutions to obtain feature maps θ(F_I) ∈ R^(C×H×W) and φ(F_I) ∈ R^(C×H×W);
passing θ(F_I) and φ(F_I) respectively through parallel maximum pooling and average pooling to obtain feature maps Pool(F_I)_1, Pool(F_I)_2 ∈ R^(C×(H/2)×(W/2));
converting the dimensions of Pool(F_I)_1 to obtain Pool′(F_I)_1 ∈ R^(C×(HW/4)), converting the dimensions of Pool(F_I)_2 to obtain Pool′(F_I)_2 ∈ R^(C×(HW/4)), and then transposing Pool′(F_I)_2;
obtaining the channel self-attention matrix A_C = Softmax(Pool′(F_I)_1 ⊗ Pool′(F_I)_2^T) ∈ R^(C×C), where the Softmax activation function operates row-wise;
passing F_I through a convolution to obtain a feature map ρ(F_I) ∈ R^(C×H×W), and converting its dimensions to obtain ρ′(F_I) ∈ R^(C×HW);
performing matrix multiplication of A_C and ρ′(F_I), converting the dimensions of the result, and adding it bitwise to F_I to obtain the final channel-optimized feature map F_C = α·(A_C ⊗ ρ′(F_I)) ⊕ F_I of dimension C×H×W;
where C, H, W denote the channel dimension, height, and width of the original feature map; θ, φ, ρ denote channel convolution operations; ⊕ denotes the bitwise addition operation; ⊗ denotes the matrix multiplication operation; and α is a coefficient controlling the proportion of the original features to the channel-optimized features.
In particular, the channel self-attention matrix A_C expands as A_C = [A_C^(i,j)]_(C×C), where A_C^(i,j) represents the correlation between the i-th channel and the j-th channel.
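The channel self-attention steps above can be sketched in NumPy at the level of shapes and data flow. This is a hedged illustration, not the patent's implementation: the 1×1 convolutions θ, φ, ρ are stood in for by random channel-mixing matrices (a trained model would learn them), and fusing the parallel max- and average-pooling branches by summation is an assumption, since the source does not state the fusion rule.

```python
import numpy as np

def softmax_rows(x):
    # row-wise Softmax, as used for the channel self-attention matrix A_C
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pool_2x2(x):
    # parallel 2x2 max pooling and average pooling, fused by summation
    # (fusion rule is an assumption); keeps one quarter of the spatial data
    C, H, W = x.shape
    b = x.reshape(C, H // 2, 2, W // 2, 2)
    return b.max(axis=(2, 4)) + b.mean(axis=(2, 4))

def channel_self_attention(F_I, alpha=1.0, rng=np.random.default_rng(0)):
    # F_I: (C, H, W). Random matrices stand in for the learned 1x1 convolutions.
    C, H, W = F_I.shape
    mix = lambda: rng.standard_normal((C, C)) * 0.1
    flat = F_I.reshape(C, H * W)
    theta = (mix() @ flat).reshape(C, H, W)            # theta(F_I)
    phi = (mix() @ flat).reshape(C, H, W)              # phi(F_I)
    p1 = pool_2x2(theta).reshape(C, -1)                # Pool'(F_I)_1: (C, HW/4)
    p2 = pool_2x2(phi).reshape(C, -1)                  # Pool'(F_I)_2: (C, HW/4)
    A_C = softmax_rows(p1 @ p2.T)                      # (C, C) channel attention
    rho = mix() @ flat                                 # rho'(F_I): (C, HW)
    F_C = alpha * (A_C @ rho).reshape(C, H, W) + F_I   # bitwise residual add
    return F_C, A_C
```

Note that the output keeps the input dimension C×H×W, so the module can be inserted after any residual block without changing the surrounding network.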
Specifically, the spatial attention module is realized by the following steps:
inputting a feature map F_C ∈ R^(C×H×W) and passing it through two parallel convolutions to obtain feature maps θ(F_C), φ(F_C) ∈ R^((C/r)×H×W);
passing θ(F_C) through parallel maximum pooling and average pooling to obtain a feature map Pool(F_C)_1 ∈ R^((C/r)×(H/2)×(W/2));
converting the dimensions of φ(F_C) and Pool(F_C)_1 respectively to obtain feature maps φ′(F_C) ∈ R^((C/r)×HW) and Pool′(F_C)_1 ∈ R^((C/r)×(HW/4)), and then transposing φ′(F_C);
performing matrix multiplication of φ′(F_C)^T and Pool′(F_C)_1 followed by Softmax nonlinear activation to obtain the spatial self-attention matrix A_S ∈ R^(HW×(HW/4));
passing F_C through a convolution to obtain a feature map ρ(F_C) ∈ R^(C×H×W), through parallel maximum pooling and average pooling to obtain a feature map of dimension C×(H/2)×(W/2), and through dimension conversion to obtain ρ′(F_C) ∈ R^(C×(HW/4));
obtaining the spatially optimized feature map F_S = β·(ρ′(F_C) ⊗ A_S^T) ⊕ F_C of dimension C×H×W through matrix multiplication and bitwise addition;
where C, H, W denote the channel dimension, height, and width of the original feature map; θ, φ, ρ denote channel convolution operations; ⊕ denotes the bitwise addition operation; ⊗ denotes the matrix multiplication operation; β is a coefficient controlling the proportion of the original features to the spatially optimized features; and the variable r is a channel dimension-reduction coefficient satisfying r > 1 with C/r an integer.
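The spatial self-attention steps admit a similar shape-level NumPy sketch. As before, this is a hedged illustration under assumptions: random matrices stand in for the learned convolutions, and the parallel max/average pooling branches are fused by summation.

```python
import numpy as np

def softmax_rows(x):
    # row-wise Softmax over the attended positions
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def pool_2x2(x):
    # parallel 2x2 max + average pooling, fused by summation (an assumption)
    C, H, W = x.shape
    b = x.reshape(C, H // 2, 2, W // 2, 2)
    return b.max(axis=(2, 4)) + b.mean(axis=(2, 4))

def spatial_self_attention(F_C, beta=1.0, r=2, rng=np.random.default_rng(0)):
    # F_C: (C, H, W); r is the channel dimension-reduction coefficient
    C, H, W = F_C.shape
    Cr = C // r
    flat = F_C.reshape(C, H * W)
    theta = (rng.standard_normal((Cr, C)) * 0.1 @ flat).reshape(Cr, H, W)
    phi_f = rng.standard_normal((Cr, C)) * 0.1 @ flat      # phi'(F_C): (C/r, HW)
    p1 = pool_2x2(theta).reshape(Cr, -1)                   # Pool'(F_C)_1: (C/r, HW/4)
    A_S = softmax_rows(phi_f.T @ p1)                       # (HW, HW/4) spatial attention
    rho = rng.standard_normal((C, C)) * 0.1 @ flat
    rho_p = pool_2x2(rho.reshape(C, H, W)).reshape(C, -1)  # rho'(F_C): (C, HW/4)
    F_S = beta * (rho_p @ A_S.T).reshape(C, H, W) + F_C    # bitwise residual add
    return F_S, A_S
```

The pooling on the key/value paths is what shrinks A_S from HW×HW to HW×(HW/4), which is the memory saving the text attributes to the added pooling layers.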
Specifically, the loss function L is calculated as
L = −(1/N) Σ_(i=1)^N log( e^(s(cos(θ_(y_i) + m_1) − m_2)) / ( e^(s(cos(θ_(y_i) + m_1) − m_2)) + Σ_(j≠y_i) e^(s·cos θ_j) ) )
where N and n respectively denote the number of samples in the current batch and the total number of classes; the hyperparameter s is a scale factor; θ_(y_i) denotes the angle between the current sample's feature and the weight of its corresponding class; θ_j denotes the angle between a class weight and the corresponding sample; and m_1 and m_2, the angular margin and the cosine margin, are the two hyperparameters of the loss function.
To achieve the above object, according to another aspect of the present invention, there is provided a deep learning face recognition method based on a self-attention mechanism, the method including the steps of:
training the self-attention-based deep learning network by adopting a face picture training set to obtain a trained self-attention-based deep learning network;
inputting the face picture to be recognized into the trained self-attention-based deep learning network, and outputting a face recognition result;
the self-attention-based deep learning network takes ResNet as the backbone network and comprises a plurality of residual blocks and a plurality of attention modules, each attention module comprising a channel attention module and/or a spatial attention module connected in series at the end of each residual block, with a fully connected layer as the last layer; the residual blocks further extract feature maps from the input face picture or from preceding feature maps; the channel attention module learns a cross-correlation matrix among feature-map channels during forward propagation to obtain a channel-optimized feature map; the spatial attention module learns a cross-correlation matrix among feature-map spatial positions during forward propagation to obtain a spatially optimized feature map; and the fully connected layer converts the final optimized feature map into features.
Specifically, the channel attention module is realized by the following steps:
inputting a feature map F_I ∈ R^(C×H×W) and passing it through two parallel convolutions to obtain feature maps θ(F_I) ∈ R^(C×H×W) and φ(F_I) ∈ R^(C×H×W);
passing θ(F_I) and φ(F_I) respectively through parallel maximum pooling and average pooling to obtain feature maps Pool(F_I)_1, Pool(F_I)_2 ∈ R^(C×(H/2)×(W/2));
converting the dimensions of Pool(F_I)_1 to obtain Pool′(F_I)_1 ∈ R^(C×(HW/4)), converting the dimensions of Pool(F_I)_2 to obtain Pool′(F_I)_2 ∈ R^(C×(HW/4)), and then transposing Pool′(F_I)_2;
obtaining the channel self-attention matrix A_C = Softmax(Pool′(F_I)_1 ⊗ Pool′(F_I)_2^T) ∈ R^(C×C), where the Softmax activation function operates row-wise;
passing F_I through a convolution to obtain a feature map ρ(F_I) ∈ R^(C×H×W), and converting its dimensions to obtain ρ′(F_I) ∈ R^(C×HW);
performing matrix multiplication of A_C and ρ′(F_I), converting the dimensions of the result, and adding it bitwise to F_I to obtain the final channel-optimized feature map F_C = α·(A_C ⊗ ρ′(F_I)) ⊕ F_I of dimension C×H×W;
where C, H, W denote the channel dimension, height, and width of the original feature map; θ, φ, ρ denote channel convolution operations; ⊕ denotes the bitwise addition operation; ⊗ denotes the matrix multiplication operation; and α is a coefficient controlling the proportion of the original features to the channel-optimized features.
In particular, the channel self-attention matrix A_C expands as A_C = [A_C^(i,j)]_(C×C), where A_C^(i,j) represents the correlation between the i-th channel and the j-th channel.
Specifically, the spatial attention module is realized by the following steps:
inputting a feature map F_C ∈ R^(C×H×W) and passing it through two parallel convolutions to obtain feature maps θ(F_C), φ(F_C) ∈ R^((C/r)×H×W);
passing θ(F_C) through parallel maximum pooling and average pooling to obtain a feature map Pool(F_C)_1 ∈ R^((C/r)×(H/2)×(W/2));
converting the dimensions of φ(F_C) and Pool(F_C)_1 respectively to obtain feature maps φ′(F_C) ∈ R^((C/r)×HW) and Pool′(F_C)_1 ∈ R^((C/r)×(HW/4)), and then transposing φ′(F_C);
performing matrix multiplication of φ′(F_C)^T and Pool′(F_C)_1 followed by Softmax nonlinear activation to obtain the spatial self-attention matrix A_S ∈ R^(HW×(HW/4));
passing F_C through a convolution to obtain a feature map ρ(F_C) ∈ R^(C×H×W), through parallel maximum pooling and average pooling to obtain a feature map of dimension C×(H/2)×(W/2), and through dimension conversion to obtain ρ′(F_C) ∈ R^(C×(HW/4));
obtaining the spatially optimized feature map F_S = β·(ρ′(F_C) ⊗ A_S^T) ⊕ F_C of dimension C×H×W through matrix multiplication and bitwise addition;
where C, H, W denote the channel dimension, height, and width of the original feature map; θ, φ, ρ denote channel convolution operations; ⊕ denotes the bitwise addition operation; ⊗ denotes the matrix multiplication operation; β is a coefficient controlling the proportion of the original features to the spatially optimized features; and the variable r is a channel dimension-reduction coefficient satisfying r > 1 with C/r an integer.
Specifically, the loss function L is calculated as
L = −(1/N) Σ_(i=1)^N log( e^(s(cos(θ_(y_i) + m_1) − m_2)) / ( e^(s(cos(θ_(y_i) + m_1) − m_2)) + Σ_(j≠y_i) e^(s·cos θ_j) ) )
where N and n respectively denote the number of samples in the current batch and the total number of classes; the hyperparameter s is a scale factor; θ_(y_i) denotes the angle between the current sample's feature and the weight of its corresponding class; θ_j denotes the angle between a class weight and the corresponding sample; and m_1 and m_2, the angular margin and the cosine margin, are the two hyperparameters of the loss function.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
1. According to the principle of the attention mechanism, a channel self-attention module is constructed. The module learns a cross-correlation matrix among channels by performing dimension conversion, transposition, and similar operations on the three-dimensional data of the feature map; the matrix represents the relative relationships among different channels. The channel-optimized features are then obtained through calculation with the original features, so that the cross-correlation among different channels is learned and different channels are assigned different weights, realizing channel filtering and reducing redundant information in the feature channels.
2. According to the principles of the attention mechanism and global feature expression, a spatial self-attention module is constructed. The module models the spatial information of the three-dimensional feature map and learns a cross-correlation matrix among the spatial positions of the feature map; the matrix represents the relative relationships among different positions. The spatially optimized features are then obtained through calculation with the input features, so that the cross-correlation among different spatial positions is learned and different positions of the face feature map are given different weights, realizing the selection of important facial feature regions, treating different parts differently, and concentrating the features on the most important regions of the face.
Drawings
Fig. 1 is an overall framework diagram of a deep learning face recognition system based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 2 is a block diagram of a channel self-attention module according to an embodiment of the present invention;
FIG. 3 is a block diagram of a spatial self-attention module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model combination of channel self-attention and spatial self-attention provided by an embodiment of the present invention;
fig. 5 is an effect diagram of a face recognition method based on a self-attention mechanism according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the deep learning face recognition system based on the self-attention mechanism improves on standard ResNet with a Residual Self-Attention Network model (SRANet). Specifically, the invention appends a serial channel attention module and spatial attention module at the end of each standard residual block to compute the channel and spatial attention relation matrices, then obtains the final optimized features through matrix multiplication. In addition, the last average pooling layer in the original ResNet structure is removed and replaced by a fully connected layer of fixed 512-dimensional size for the final feature extraction. Compared with the single per-channel average of average pooling, the fully connected layer considers channel and spatial information simultaneously, which matches the channel and spatial attention modules and makes the design more reasonable.
Taking the data in fig. 1 as an example, assume the original output of a certain residual block in the convolutional neural network is F_I. In SRANet, based on the self-attention mechanism, F_I is first input into the channel self-attention module to calculate the channel attention matrix; after the cross-correlation matrix information among different channels is obtained, F_I is matrix-multiplied with the channel self-attention matrix and added bitwise to the original input to obtain the channel-optimized feature F_C. Similarly, the same method yields the spatially optimized feature F_S. Finally, F_S, the output of this residual structure, is input into the next residual structure; the ellipses in the figure indicate that there are several such structures.
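The wiring of residual blocks, attention modules, and the 512-dimensional fully connected head can be sketched with stand-in functions. All function bodies here are illustrative placeholders (not the patent's trained layers); the point is the F_I → F_C → F_S data flow and the replacement of average pooling by a fully connected embedding.

```python
import numpy as np

def residual_block(x):
    # stand-in for a standard ResNet residual block (identity + learned convs)
    return x + 0.1 * np.tanh(x)

def channel_attention(x):
    # stand-in for the channel self-attention module (F_I -> F_C)
    return x

def spatial_attention(x):
    # stand-in for the spatial self-attention module (F_C -> F_S)
    return x

def sranet_stage(x, n_blocks=3):
    # each residual block is followed in series by channel then spatial attention
    for _ in range(n_blocks):
        F_I = residual_block(x)
        F_C = channel_attention(F_I)   # channel-optimized feature
        x = spatial_attention(F_C)     # spatially optimized feature F_S
    return x

def embedding_head(feature_map, rng=np.random.default_rng(0)):
    # ResNet's final average pooling is replaced by a fixed 512-d fully
    # connected layer, which sees channel and spatial information jointly
    v = feature_map.reshape(-1)
    W_fc = rng.standard_normal((512, v.size)) * 0.01   # learned in practice
    return W_fc @ v
```

Because every module preserves the C×H×W shape, any number of such residual-plus-attention structures can be chained before the embedding head.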
The invention divides the face recognition into four stages: the method comprises a face image preprocessing stage, a self-attention model building stage, a loss function calculating stage and a feature extraction and retrieval comparison stage.
Preprocessing stage of face image
The face image preprocessing stage comprises selection of a face data set and preprocessing of the face data. The preprocessing is mainly divided into two parts: face detection with key-point alignment, and image data normalization.
For face detection and face key point alignment, the invention uses MTCNN, a cascaded multi-task convolutional neural network commonly used in the industry, to predict the face position and the face key points simultaneously. In actual training, 4 bounding-box coordinates and 5 key point positions are predicted, and the detected original face is then cropped into fixed-size 112 × 112 face pictures through similarity transformation.
For image data normalization, the present invention normalizes the pixel values of the original RGB image to [-1, 1] by subtracting 127.5 and then dividing by 128. In addition, during training, the normalized training images are horizontally flipped with a probability of 50%, expanding the data set and improving overall system accuracy.
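As a concrete illustration, the normalization and random-flip steps above can be sketched as follows (a minimal NumPy sketch; the function name and the H × W × C array layout are assumptions for illustration, not part of the patent):

```python
import numpy as np

def preprocess(img, rng=None):
    """Normalize an aligned 112x112 RGB face crop to roughly [-1, 1]
    by subtracting 127.5 and dividing by 128, then (during training)
    flip it horizontally with 50% probability."""
    x = (img.astype(np.float32) - 127.5) / 128.0
    if rng is not None and rng.random() < 0.5:
        x = x[:, ::-1, :].copy()  # flip along the width axis (H, W, C layout)
    return x

img = np.random.randint(0, 256, size=(112, 112, 3), dtype=np.uint8)
out = preprocess(img)
```

Passing a random generator (e.g. `np.random.default_rng()`) enables the training-time flip; at inference time it is omitted so the crop is only normalized.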
Self-attention model construction phase
(2.1) selecting backbone network
The invention adopts ResNet-50/ResNet-100 as the backbone network of the self-attention model to train the face recognition model, and in the design of ResNet residual block, the convolution kernel size of 3 x 3 is selected.
(2.2) design and implementation of channel self-attention Module
And a channel self-attention module is added behind each residual block of the backbone network to learn the cross-correlation relationship among the characteristic diagram channels in the forward transmission process of the convolutional neural network.
The structure of the channel self-attention module is shown in fig. 2. The input feature map F_I ∈ R^{C×H×W} is first fed into two parallel 1 × 1 convolution layers, which keep its spatial scale unchanged but halve the channel number, yielding feature maps θ(F_I) and φ(F_I) ∈ R^{(C/2)×H×W}. To reduce the burden of the matrix calculation, the invention further inserts parallel max pooling and average pooling after the convolution layers and before the matrix calculation; their pooling kernels are the same size, both 2 × 2. On one hand this keeps performance stable, and on the other it greatly reduces video memory consumption. Through the two pooling layers, the channel self-attention module retains only one quarter of the spatial data for calculation, so there are:
wherein,
Next, dimension conversion and/or transposition operations are performed on the two feature maps Pool(F_I)_1 and Pool(F_I)_2. The feature map Pool(F_I)_1 is converted through dimension conversion from R^{(C/2)×(H/2)×(W/2)} to R^{(C/2)×(HW/4)}, yielding the feature map Pool'(F_I)_1. The feature map Pool(F_I)_2 is likewise converted to R^{(C/2)×(HW/4)}, yielding the feature map Pool'(F_I)_2, which is then transposed to Pool'(F_I)_2^T ∈ R^{(HW/4)×(C/2)}.
Finally, the channel self-attention matrix A_C is obtained through a Softmax activation function operated by rows.
The formula is developed:
wherein A_C^{i,j} denotes the correlation between the i-th channel and the j-th channel. After the channel attention matrix is computed, the input feature F_I is likewise passed through a 1 × 1 convolution to obtain ρ(F_I), which is converted through dimension conversion into a feature map ρ'(F_I). Then A_C and ρ'(F_I) are matrix-multiplied and dimension-converted, and the result is added bitwise to F_I to obtain the final channel-optimized feature F_C with dimension C × H × W.
In all the above formulas, C, H and W represent the channel dimension, height and width of the input feature map respectively; θ, φ and ρ represent convolution operations; ⊕ denotes the bitwise addition operation; and ⊗ denotes the matrix multiplication operation. The coefficient α, which controls the proportion of the original features to the channel-optimized features, is a learnable parameter initialized to 0; its purpose is to reduce the difficulty of training the neural network at the start.
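A minimal NumPy sketch of the channel self-attention computation described above may help fix the shapes. This is a sketch under assumptions: the 2 × 2 average pooling, the random 1 × 1 convolution weights, and the final 1 × 1 projection restoring the channel count are illustrative, not the patent's exact configuration; with α = 0 the module initially passes its input through unchanged, as the text describes:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def conv1x1(x, w):
    # 1x1 convolution as channel mixing: x (C_in, H, W), w (C_out, C_in)
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(w.shape[0], h, wd)

def pool2x2(x):
    # 2x2 average pooling, halving each spatial dimension
    c, h, wd = x.shape
    return x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))

def channel_attention(F_I, w_theta, w_phi, w_rho, w_out, alpha=0.0):
    C, H, W = F_I.shape
    p1 = pool2x2(conv1x1(F_I, w_theta)).reshape(C // 2, -1)  # (C/2, HW/4)
    p2 = pool2x2(conv1x1(F_I, w_phi)).reshape(C // 2, -1)    # (C/2, HW/4)
    A_C = softmax_rows(p1 @ p2.T)                            # (C/2, C/2) channel relations
    rho = conv1x1(F_I, w_rho).reshape(C // 2, -1)            # (C/2, HW)
    attended = (A_C @ rho).reshape(C // 2, H, W)
    restored = conv1x1(attended, w_out)  # assumed projection back to C channels
    return F_I + alpha * restored        # bitwise addition with the input

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
F_I = rng.standard_normal((C, H, W))
ws = [rng.standard_normal((C // 2, C)) for _ in range(3)]
w_out = rng.standard_normal((C, C // 2))
F_C = channel_attention(F_I, *ws, w_out)
```

With the learnable α at its initial value 0, the output equals the input, so the attention branch is eased in during training.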
(2.3) design and implementation of spatial self-attention Module
After the channel self-attention module, a spatial self-attention module is connected in series to learn the relationships between feature map positions; all of its parameters are trained through neural network back propagation and learned adaptively.
As shown in fig. 3, for an input feature map F_C ∈ R^{C×H×W}, the spatial self-attention module first feeds F_C into two parallel 1 × 1 convolution layers, keeping the spatial scale unchanged but reducing the channel dimension to C/r, where r > 1 and C/r is an integer, yielding feature maps θ(F_C) and φ(F_C) ∈ R^{(C/r)×H×W}. Then parallel max pooling and average pooling (with identical pooling kernels) are adopted to reduce the spatial dimension of one of the feature maps; as shown in fig. 3, θ(F_C) is selected, yielding the feature map Pool(F_C)_1.
Then φ(F_C) and Pool(F_C)_1 are dimension-converted to R^{(C/r)×HW} and R^{(C/r)×(HW/4)} respectively, yielding the feature maps φ'(F_C) and Pool'(F_C)_1. φ'(F_C) is transposed to φ'(F_C)^T ∈ R^{HW×(C/r)}. The invention then performs matrix multiplication and Softmax nonlinear activation on the two features to obtain the spatial self-attention matrix A_S.
Expanding in the same way as the channel self-attention, one obtains:
In this formula, HW/4 denotes the number of features in the pooled spatial dimension. A_S is a 2-dimensional matrix representing the relationship between any two spatial locations of the input features; for example, A_S^{i,j} denotes the correlation between the i-th position of φ'(F_C)^T and the j-th position of Pool'(F_C)_1, where Softmax is computed by row.
After the spatial self-attention relationship matrix is computed, the input feature F_C is likewise passed through one convolution to obtain ρ(F_C) ∈ R^{C×H×W}, which is pooled and converted through dimension conversion into the matrix Pool'(ρ(F_C)) ∈ R^{C×(HW/4)}. Finally, the spatially optimized feature F_S with dimension C × H × W is obtained through matrix multiplication and bitwise addition.
In all the above equations, θ, φ and ρ represent convolution operations; ⊕ denotes the bitwise addition operation; and ⊗ denotes the matrix multiplication operation. β is a learnable parameter initialized to 0, and the variable r, a channel dimension-reduction coefficient, is finally set to 16 through comparison experiments.
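The spatial branch can be sketched the same way. This NumPy sketch is under assumptions: random 1 × 1 convolution weights, r = 2 for compactness, and ρ taken as the identity for brevity (the patent applies a convolution there); with β = 0 the module initially passes F_C through unchanged:

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def conv1x1(x, w):
    # 1x1 convolution as channel mixing: x (C_in, H, W), w (C_out, C_in)
    c, h, wd = x.shape
    return (w @ x.reshape(c, h * wd)).reshape(w.shape[0], h, wd)

def pool2x2(x):
    # 2x2 average pooling, halving each spatial dimension
    c, h, wd = x.shape
    return x.reshape(c, h // 2, 2, wd // 2, 2).mean(axis=(2, 4))

def spatial_attention(F_C, w_theta, w_phi, beta=0.0):
    C, H, W = F_C.shape
    theta = pool2x2(conv1x1(F_C, w_theta)).reshape(w_theta.shape[0], -1)  # (C/r, HW/4)
    phi = conv1x1(F_C, w_phi).reshape(w_phi.shape[0], -1)                 # (C/r, HW)
    A_S = softmax_rows(phi.T @ theta)                                     # (HW, HW/4) position relations
    rho = pool2x2(F_C).reshape(C, -1)  # (C, HW/4); rho conv assumed identity here
    out = (rho @ A_S.T).reshape(C, H, W)  # (C, HW) reshaped back to (C, H, W)
    return F_C + beta * out

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
F_C = rng.standard_normal((C, H, W))
F_S = spatial_attention(F_C, rng.standard_normal((C // r, C)),
                        rng.standard_normal((C // r, C)))
```

Note how pooling only one side of the product makes A_S rectangular (HW × HW/4), which is exactly where the quarter-size memory saving comes from.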
(2.4) feature optimization and feature extraction settings
As shown in fig. 4, in order to fully and comprehensively optimize the three-dimensional feature map, the channel self-attention module and the spatial self-attention module are connected in series after the ResNet residual blocks of the backbone network to optimize the feature map during forward propagation. The topology of fig. 4 includes dot-multiply and dot-add operations, with arrows indicating the direction of flow from input to output.
In addition, for final feature extraction, the global average pooling layer of the original ResNet is removed and replaced by a fully connected layer with a fixed dimension of 512; the optimized features are input into this 512-dimensional fully connected layer for the final feature extraction.
Loss function calculation stage
In order to effectively solve the problem that conventional loss functions cannot comprehensively and effectively constrain all samples of the feature space, the invention provides an improved loss function L based on multi-margin constraints.
The formula is established on the basis of weight normalization and feature normalization; that is, the invention first requires that each class weight is normalized to unit length with zero bias and that each sample feature is normalized and rescaled. After such constraints, all sample features are distributed on a hypersphere, where x_i ∈ R^d denotes the feature of the i-th sample, which belongs to the y_i-th class; w_j ∈ R^d denotes the j-th column of the weight parameter W; b_j is the corresponding bias term parameter; N and n respectively denote the number of samples in the current batch and the total number of classes; θ_{y_i} denotes the angle between the current sample feature and the corresponding class weight; m1 and m2, the two hyper-parameters of the loss function, denote the angle margin and the cosine margin respectively; the hyper-parameter s denotes a scale factor used to avoid gradient vanishing; θ_j denotes the angle between a class weight and the corresponding sample; and || · || denotes the 2-norm operation.
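Since the formula images are not reproduced here, the following NumPy sketch shows one consistent reading of the multi-margin constraint. It is an assumption modeled on combining an additive angular margin m1 with a cosine margin m2 on the target logit after weight and feature normalization; the hyper-parameter values are illustrative:

```python
import numpy as np

def multi_margin_loss(x, W, y, s=64.0, m1=0.5, m2=0.35):
    """Sketch of a multi-margin softmax loss: features and class weights
    are L2-normalized, the target logit cos(theta_y) is replaced by
    cos(theta_y + m1) - m2, and all logits are scaled by s."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # feature normalization
    W = W / np.linalg.norm(W, axis=0, keepdims=True)   # weight normalization
    cos = x @ W                                        # (N, n) cosine logits
    N = x.shape[0]
    theta_y = np.arccos(np.clip(cos[np.arange(N), y], -1.0, 1.0))
    cos_target = np.cos(theta_y + m1) - m2             # apply angle and cosine margins
    logits = s * cos
    logits[np.arange(N), y] = s * cos_target
    # numerically stable cross-entropy over the margin-adjusted logits
    logits -= logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(N), y].mean()

rng = np.random.default_rng(1)
loss = multi_margin_loss(rng.standard_normal((4, 512)),
                         rng.standard_normal((512, 10)),
                         np.array([0, 3, 7, 2]))
```

Shrinking the target logit by the two margins forces a larger angular gap between classes on the hypersphere, which is the stated goal of the constraint.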
Feature extraction and retrieval comparison stage
After the face image to be recognized is processed by the trained model, a feature vector of fixed dimension is obtained. This vector is compared in real time with features extracted offline from the library, and whether the image shows the person to be retrieved is judged according to the computed cosine similarity and a set threshold. In this embodiment, the threshold is usually set in the range of 0.6 to 0.7.
Feature extraction and retrieval comparison take place during online real-time face recognition. The given face to be searched is processed in the same way and input into the trained model, a feature vector with a fixed size of 512 dimensions is extracted at the last fully connected layer, and cosine similarity comparison is performed between this vector and the features extracted offline in the library. The cosine similarity is calculated as:

cos(A, B) = Σ_{i=1}^{P} A_i·B_i / ( ||A|| · ||B|| )
wherein A_i and B_i respectively denote the i-th components of the feature of the face image to be retrieved and of a stored face feature in the search library, and P denotes the dimension of the feature vector, here 512. The several images with the highest similarity whose similarity exceeds the set threshold are taken as the query results, completing the final face recognition process.
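The retrieval comparison can be sketched as follows (illustrative function names; the 0.65 threshold sits inside the 0.6 to 0.7 range mentioned above):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query, gallery, threshold=0.65, top_k=5):
    """Rank gallery features by cosine similarity to the query and keep
    the top-k results whose similarity exceeds the threshold."""
    sims = sorted(((i, cosine_similarity(query, g)) for i, g in enumerate(gallery)),
                  key=lambda t: t[1], reverse=True)
    return [(i, s) for i, s in sims[:top_k] if s > threshold]

rng = np.random.default_rng(2)
q = rng.standard_normal(512)
gallery = [q + 0.1 * rng.standard_normal(512),  # near-duplicate of the query
           rng.standard_normal(512)]            # unrelated identity
hits = search(q, gallery)
```

In practice the gallery features are extracted offline once and only the 512-dimensional query feature is computed online, so each comparison is a single dot product.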
Examples
In order to prove that the deep learning face recognition method based on the self-attention mechanism has advantages in performance and adaptability, the method is verified and analyzed through the following experiments:
A. experimental data set
Training sets: CASIA-Webface and MS-Celeb-1M. CASIA-Webface contains 10,575 identities with a total of about 494,000 face images. The raw MS-Celeb-1M data contains 100K identities and 10M face pictures, but with many noisy samples; therefore cleaned data are adopted in training, totaling 86,876 identities and 3.9M images.
Test sets: LFW, AgeDB-30, CFP-FP, and MegaFace. LFW, AgeDB-30 and CFP-FP test face verification accuracy at small scale, while MegaFace tests million-scale face identification accuracy and face verification accuracy at a one-in-a-million false alarm rate.
B. Evaluation criteria
The invention adopts the mainstream evaluation standards of face recognition research at home and abroad. For the face verification test, accuracy is evaluated: if the tested sample set has K pairs of pictures, of which L pairs are judged incorrectly, the face verification accuracy is:

Acc = (K − L) / K
for the recognition accuracy of MegaFace in the million level, a cumulative matching feature first accuracy rate CMC @1, namely a Rank1 recognition rate, is adopted. For the assumption that the size of a face query set is Q, each image Q to be queried in the face query set isiQ performs similarity rank matching work, if each query image Q is a query image Q, i is 1, 2iThe first correctly matched image location is r (q)i) Then the calculation formula for CMC @ K is:
in the CMC curve, the identification accuracy is higher when K is larger, and in the MegaFace test protocol, the identification result of Rank1 is analyzed, namely CMC @ 1.
C. Results of the experiment
Experiments show that the face verification accuracy of the invention on LFW, AgeDB-30 and CFP-FP reaches 99.83%, 98.67% and 95.86% respectively. In addition, the Rank-1 recognition rate on million-scale MegaFace is 98.38%, and the verification rate at a one-in-a-million false alarm rate is 98.45%, both reaching leading levels. Meanwhile, the invention is compared with existing mainstream schemes on several data sets; the experimental results are shown in the following tables:
TABLE 1 face verification accuracy (%) -of LFW, AgeDB-30 and CFP-FP
TABLE 2 MegaFace test results (%)
From the above two tables it can be seen that the present invention shows superior performance under the same experimental environment. In addition, the invention also visualizes the face model based on the self-attention mechanism; as shown in fig. 5, the face model with the attention modules produces a clearer face contour, making the person easier to recognize. This fully demonstrates that the model based on the self-attention mechanism can effectively optimize features during the forward propagation of the convolutional neural network, enhancing the discriminability and robustness of the face features.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A system for deep learning face recognition based on a self-attention mechanism, the system comprising:
the input module is used for selecting a face picture training set and inputting a face picture to be recognized;
a self-attention based deep learning module with ResNet as a backbone network, comprising a plurality of residual blocks and a plurality of attention modules, the attention modules comprising a channel attention module and/or a spatial attention module connected in series at the end of each residual block, the last layer being a fully connected layer; the residual blocks are used for further extracting feature maps from the input face picture or from preceding feature maps; the channel attention module is used for learning a cross-correlation relationship matrix among feature map channels in the forward propagation process to obtain a channel-optimized feature map; the spatial attention module is used for learning a cross-correlation relationship matrix among feature map spatial positions in the forward propagation process to obtain a spatially optimized feature map; and the fully connected layer is used for converting the finally optimized feature map into features;
the training module is used for training the self-attention-based deep learning module by adopting the face picture training set to obtain a trained self-attention-based deep learning module;
and the face recognition module is used for inputting the face picture to be recognized into the trained self-attention-based deep learning module and outputting a face recognition result.
2. The face recognition system of claim 1, wherein the channel attention module is implemented by:
inputting a feature map F_I ∈ R^{C×H×W} and obtaining feature maps θ(F_I) ∈ R^{C×H×W} and φ(F_I) through two parallel convolutions respectively;
the feature maps θ(F_I) and φ(F_I) are respectively passed through parallel max pooling and average pooling to obtain feature maps Pool(F_I)_1 and Pool(F_I)_2;
the feature map Pool(F_I)_1 is dimension-converted to obtain a feature map Pool'(F_I)_1; the feature map Pool(F_I)_2 is dimension-converted to obtain a feature map Pool'(F_I)_2, which is then transposed;
a channel self-attention matrix A_C is obtained through a Softmax activation function operated by rows;
the feature map F_I is convolved to obtain a feature map ρ(F_I), which is dimension-converted to obtain a feature map ρ'(F_I);
A_C and ρ'(F_I) are matrix-multiplied and dimension-converted, and the converted result is added bitwise to F_I to obtain the final channel-optimized feature map F_C with dimension C × H × W;
wherein C, H and W represent the channel dimension, height and width of the original feature map; θ, φ and ρ represent convolution operations; ⊕ represents the bitwise addition operation; ⊗ represents the matrix multiplication operation; and α represents a coefficient controlling the proportion of the original features to the channel-optimized features.
3. The face recognition system of claim 2, wherein the channel self-attention matrix A_C is expanded as follows:
wherein A_C^{i,j} denotes the correlation between the i-th channel and the j-th channel.
4. The face recognition system of claim 1, wherein the spatial attention module is implemented by:
inputting a feature map F_C ∈ R^{C×H×W} and obtaining feature maps θ(F_C) and φ(F_C) through two parallel convolutions respectively;
the feature map θ(F_C) is passed through parallel max pooling and average pooling to obtain a feature map Pool(F_C)_1;
the feature maps φ(F_C) and Pool(F_C)_1 are respectively dimension-converted to obtain feature maps φ'(F_C) and Pool'(F_C)_1, and φ'(F_C) is transposed;
matrix multiplication and Softmax nonlinear activation calculation are performed on the feature maps φ'(F_C)^T and Pool'(F_C)_1 to obtain a spatial self-attention matrix A_S;
the feature map F_C is convolved to obtain a feature map ρ(F_C) ∈ R^{C×H×W}, which is passed through parallel max pooling and average pooling and then dimension-converted to obtain a feature map Pool'(ρ(F_C));
a spatially optimized feature map F_S with dimension C × H × W is obtained through matrix multiplication and bitwise addition;
wherein C, H and W represent the channel dimension, height and width of the original feature map; θ, φ and ρ represent convolution operations; ⊕ represents the bitwise addition operation; ⊗ represents the matrix multiplication operation; β represents a coefficient controlling the proportion of the original features to the spatially optimized features; and the variable r serves as a channel dimension-reduction coefficient, with the requirement that r > 1 and C/r is an integer.
5. A face recognition system as claimed in any one of claims 1 to 4, wherein the loss function L is calculated as follows:
wherein N and n respectively represent the number of samples in the current batch and the total number of classes; the hyper-parameter s represents a scale factor; θ_{y_i} represents the angle between the current sample feature and the corresponding class weight; θ_j represents the angle between a class weight and the corresponding sample; and m1 and m2, the two hyper-parameters of the loss function, represent the angle margin and the cosine margin respectively.
6. A deep learning face recognition method based on a self-attention mechanism is characterized by comprising the following steps:
training the self-attention-based deep learning network by adopting a face picture training set to obtain a trained self-attention-based deep learning network;
inputting the face picture to be recognized into the trained self-attention-based deep learning network, and outputting a face recognition result;
the self-attention based deep learning network takes ResNet as a backbone network and comprises a plurality of residual blocks and a plurality of attention modules, the attention modules comprising a channel attention module and/or a spatial attention module connected in series at the end of each residual block, the last layer being a fully connected layer; the residual blocks are used for further extracting feature maps from the input face picture or from preceding feature maps; the channel attention module is used for learning a cross-correlation relationship matrix among feature map channels in the forward propagation process to obtain a channel-optimized feature map; the spatial attention module is used for learning a cross-correlation relationship matrix among feature map spatial positions in the forward propagation process to obtain a spatially optimized feature map; and the fully connected layer is used for converting the finally optimized feature map into features.
7. The face recognition method of claim 6, wherein the channel attention module is implemented by:
inputting a feature map F_I ∈ R^{C×H×W} and obtaining feature maps θ(F_I) ∈ R^{C×H×W} and φ(F_I) through two parallel convolutions respectively;
the feature maps θ(F_I) and φ(F_I) are respectively passed through parallel max pooling and average pooling to obtain feature maps Pool(F_I)_1 and Pool(F_I)_2;
the feature map Pool(F_I)_1 is dimension-converted to obtain a feature map Pool'(F_I)_1; the feature map Pool(F_I)_2 is dimension-converted to obtain a feature map Pool'(F_I)_2, which is then transposed;
a channel self-attention matrix A_C is obtained through a Softmax activation function operated by rows;
the feature map F_I is convolved to obtain a feature map ρ(F_I), which is dimension-converted to obtain a feature map ρ'(F_I);
A_C and ρ'(F_I) are matrix-multiplied and dimension-converted, and the converted result is added bitwise to F_I to obtain the final channel-optimized feature map F_C with dimension C × H × W;
wherein C, H and W represent the channel dimension, height and width of the original feature map; θ, φ and ρ represent convolution operations; ⊕ represents the bitwise addition operation; ⊗ represents the matrix multiplication operation; and α represents a coefficient controlling the proportion of the original features to the channel-optimized features.
8. The face recognition method of claim 7, wherein the channel self-attention matrix A_C is expanded as follows:
wherein A_C^{i,j} denotes the correlation between the i-th channel and the j-th channel.
9. The face recognition method of claim 6, wherein the spatial attention module is implemented by:
inputting a feature map F_C ∈ R^{C×H×W} and obtaining feature maps θ(F_C) and φ(F_C) through two parallel convolutions respectively;
the feature map θ(F_C) is passed through parallel max pooling and average pooling to obtain a feature map Pool(F_C)_1;
the feature maps φ(F_C) and Pool(F_C)_1 are respectively dimension-converted to obtain feature maps φ'(F_C) and Pool'(F_C)_1, and φ'(F_C) is transposed;
matrix multiplication and Softmax nonlinear activation calculation are performed on the feature maps φ'(F_C)^T and Pool'(F_C)_1 to obtain a spatial self-attention matrix A_S;
the feature map F_C is convolved to obtain a feature map ρ(F_C) ∈ R^{C×H×W}, which is passed through parallel max pooling and average pooling and then dimension-converted to obtain a feature map Pool'(ρ(F_C));
a spatially optimized feature map F_S with dimension C × H × W is obtained through matrix multiplication and bitwise addition;
wherein C, H and W represent the channel dimension, height and width of the original feature map; θ, φ and ρ represent convolution operations; ⊕ represents the bitwise addition operation; ⊗ represents the matrix multiplication operation; β represents a coefficient controlling the proportion of the original features to the spatially optimized features; and the variable r serves as a channel dimension-reduction coefficient, with the requirement that r > 1 and C/r is an integer.
10. The face recognition method according to any one of claims 6 to 9, wherein the loss function L is calculated as follows:
wherein N and n respectively represent the number of samples in the current batch and the total number of classes; the hyper-parameter s represents a scale factor; θ_{y_i} represents the angle between the current sample feature and the corresponding class weight; θ_j represents the angle between a class weight and the corresponding sample; and m1 and m2, the two hyper-parameters of the loss function, represent the angle margin and the cosine margin respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910719368.6A CN110610129A (en) | 2019-08-05 | 2019-08-05 | Deep learning face recognition system and method based on self-attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910719368.6A CN110610129A (en) | 2019-08-05 | 2019-08-05 | Deep learning face recognition system and method based on self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110610129A true CN110610129A (en) | 2019-12-24 |
Family
ID=68890322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910719368.6A Pending CN110610129A (en) | 2019-08-05 | 2019-08-05 | Deep learning face recognition system and method based on self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110610129A (en) |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199233A (en) * | 2019-12-30 | 2020-05-26 | 四川大学 | Improved deep learning pornographic image identification method |
CN111222515A (en) * | 2020-01-06 | 2020-06-02 | 北方民族大学 | Image translation method based on context-aware attention |
CN111260462A (en) * | 2020-01-16 | 2020-06-09 | 东华大学 | Transaction fraud detection method based on heterogeneous relation network attention mechanism |
CN111274999A (en) * | 2020-02-17 | 2020-06-12 | 北京迈格威科技有限公司 | Data processing method, image processing method, device and electronic equipment |
CN111325145A (en) * | 2020-02-19 | 2020-06-23 | 中山大学 | Behavior identification method based on combination of time domain channel correlation blocks |
CN111368815A (en) * | 2020-05-28 | 2020-07-03 | 之江实验室 | Pedestrian re-identification method based on multi-component self-attention mechanism |
CN111582215A (en) * | 2020-05-17 | 2020-08-25 | 华中科技大学同济医学院附属协和医院 | Scanning identification system and method for normal anatomical structure of biliary-pancreatic system |
CN111798445A (en) * | 2020-07-17 | 2020-10-20 | 北京大学口腔医院 | Tooth image caries identification method and system based on convolutional neural network |
CN111860393A (en) * | 2020-07-28 | 2020-10-30 | 浙江工业大学 | Face detection and recognition method on security system |
CN111881746A (en) * | 2020-06-23 | 2020-11-03 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on information fusion |
CN112001215A (en) * | 2020-05-25 | 2020-11-27 | 天津大学 | Method for identifying identity of text-independent speaker based on three-dimensional lip movement |
CN112084911A (en) * | 2020-08-28 | 2020-12-15 | 安徽清新互联信息科技有限公司 | Human face feature point positioning method and system based on global attention |
CN112101434A (en) * | 2020-09-04 | 2020-12-18 | 河南大学 | Infrared image weak and small target detection method based on improved YOLO v3 |
CN112101456A (en) * | 2020-09-15 | 2020-12-18 | 推想医疗科技股份有限公司 | Attention feature map acquisition method and device and target detection method and device |
CN112183190A (en) * | 2020-08-18 | 2021-01-05 | 杭州翌微科技有限公司 | Human face quality evaluation method based on local key feature recognition |
CN112270213A (en) * | 2020-10-12 | 2021-01-26 | 萱闱(北京)生物科技有限公司 | Improved HRnet based on attention mechanism |
CN112464787A (en) * | 2020-11-25 | 2021-03-09 | 北京航空航天大学 | Remote sensing image ship target fine-grained classification method based on spatial fusion attention |
CN112464851A (en) * | 2020-12-08 | 2021-03-09 | 国网陕西省电力公司电力科学研究院 | Smart power grid foreign matter intrusion detection method and system based on visual perception |
CN112465026A (en) * | 2020-11-26 | 2021-03-09 | 深圳市对庄科技有限公司 | Model training method and device for jadeite mosaic recognition |
CN112633158A (en) * | 2020-12-22 | 2021-04-09 | 广东电网有限责任公司电力科学研究院 | Power transmission line corridor vehicle identification method, device, equipment and storage medium |
CN112667841A (en) * | 2020-12-28 | 2021-04-16 | 山东建筑大学 | Weak supervision depth context-aware image characterization method and system |
CN112801069A (en) * | 2021-04-14 | 2021-05-14 | 四川翼飞视科技有限公司 | Face key feature point detection device, method and storage medium |
CN112949841A (en) * | 2021-05-13 | 2021-06-11 | 德鲁动力科技(成都)有限公司 | Attention-based CNN neural network training method |
CN112967327A (en) * | 2021-03-04 | 2021-06-15 | 国网河北省电力有限公司检修分公司 | Monocular depth method based on combined self-attention mechanism |
CN113065550A (en) * | 2021-03-12 | 2021-07-02 | 国网河北省电力有限公司 | Text recognition method based on self-attention mechanism |
CN113344875A (en) * | 2021-06-07 | 2021-09-03 | 武汉象点科技有限公司 | Abnormal image detection method based on self-supervision learning |
CN113379657A (en) * | 2021-05-19 | 2021-09-10 | 上海壁仞智能科技有限公司 | Image processing method and device based on random matrix |
CN113392696A (en) * | 2021-04-06 | 2021-09-14 | 四川大学 | Intelligent court monitoring face recognition system and method based on fractional calculus |
CN113469335A (en) * | 2021-06-29 | 2021-10-01 | 杭州中葳数字科技有限公司 | Method for distributing weight for feature by using relationship between features of different convolutional layers |
CN113554151A (en) * | 2021-07-07 | 2021-10-26 | 浙江工业大学 | Attention mechanism method based on convolution interlayer relation |
CN113616209A (en) * | 2021-08-25 | 2021-11-09 | 西南石油大学 | Schizophrenia patient discrimination method based on space-time attention mechanism |
CN113989579A (en) * | 2021-10-27 | 2022-01-28 | 腾讯科技(深圳)有限公司 | Image detection method, device, equipment and storage medium |
CN114005078A (en) * | 2021-12-31 | 2022-02-01 | 山东交通学院 | Vehicle weight identification method based on double-relation attention mechanism |
CN114118140A (en) * | 2021-10-29 | 2022-03-01 | 新黎明科技股份有限公司 | Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing |
CN114550162A (en) * | 2022-02-16 | 2022-05-27 | 北京工业大学 | Three-dimensional object identification method combining view importance network and self-attention mechanism |
CN115100709A (en) * | 2022-06-23 | 2022-09-23 | 北京邮电大学 | Feature-separated image face recognition and age estimation method |
WO2023005161A1 (en) * | 2021-07-27 | 2023-02-02 | 平安科技(深圳)有限公司 | Face image similarity calculation method, apparatus and device, and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256450A (en) * | 2018-01-04 | 2018-07-06 | 天津大学 | A kind of supervised learning method of recognition of face and face verification based on deep learning |
CN109543606A (en) * | 2018-11-22 | 2019-03-29 | 中山大学 | A kind of face identification method that attention mechanism is added |
2019-08-05: application CN201910719368.6A filed in China (publication CN110610129A), status Pending.
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256450A (en) * | 2018-01-04 | 2018-07-06 | 天津大学 | A supervised learning method for face recognition and face verification based on deep learning |
CN109543606A (en) * | 2018-11-22 | 2019-03-29 | 中山大学 | A face recognition method incorporating an attention mechanism |
Non-Patent Citations (2)
Title |
---|
Hefei Ling et al.: "Self Residual Attention Network for Deep Face Recognition", IEEE Access * |
Jiankang Deng et al.: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", arXiv * |
Cited By (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199233A (en) * | 2019-12-30 | 2020-05-26 | 四川大学 | Improved deep learning pornographic image identification method |
CN111222515A (en) * | 2020-01-06 | 2020-06-02 | 北方民族大学 | Image translation method based on context-aware attention |
CN111222515B (en) * | 2020-01-06 | 2023-04-07 | 北方民族大学 | Image translation method based on context-aware attention |
CN111260462A (en) * | 2020-01-16 | 2020-06-09 | 东华大学 | Transaction fraud detection method based on heterogeneous relation network attention mechanism |
CN111260462B (en) * | 2020-01-16 | 2022-05-27 | 东华大学 | Transaction fraud detection method based on heterogeneous relation network attention mechanism |
CN111274999A (en) * | 2020-02-17 | 2020-06-12 | 北京迈格威科技有限公司 | Data processing method, image processing method, device and electronic equipment |
CN111274999B (en) * | 2020-02-17 | 2024-04-19 | 北京迈格威科技有限公司 | Data processing method, image processing device and electronic equipment |
CN111325145A (en) * | 2020-02-19 | 2020-06-23 | 中山大学 | Behavior identification method based on combination of time domain channel correlation blocks |
CN111325145B (en) * | 2020-02-19 | 2023-04-25 | 中山大学 | Behavior recognition method based on combined time domain channel correlation block |
CN111582215A (en) * | 2020-05-17 | 2020-08-25 | 华中科技大学同济医学院附属协和医院 | Scanning identification system and method for normal anatomical structure of biliary-pancreatic system |
CN112001215A (en) * | 2020-05-25 | 2020-11-27 | 天津大学 | Method for identifying identity of text-independent speaker based on three-dimensional lip movement |
CN112001215B (en) * | 2020-05-25 | 2023-11-24 | 天津大学 | Text-independent speaker identification method based on three-dimensional lip movement |
CN111368815A (en) * | 2020-05-28 | 2020-07-03 | 之江实验室 | Pedestrian re-identification method based on multi-component self-attention mechanism |
CN111881746A (en) * | 2020-06-23 | 2020-11-03 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on information fusion |
CN111881746B (en) * | 2020-06-23 | 2024-04-02 | 安徽清新互联信息科技有限公司 | Face feature point positioning method and system based on information fusion |
CN111798445A (en) * | 2020-07-17 | 2020-10-20 | 北京大学口腔医院 | Tooth image caries identification method and system based on convolutional neural network |
CN111798445B (en) * | 2020-07-17 | 2023-10-31 | 北京大学口腔医院 | Tooth image caries identification method and system based on convolutional neural network |
CN111860393A (en) * | 2020-07-28 | 2020-10-30 | 浙江工业大学 | Face detection and recognition method on security system |
CN112183190A (en) * | 2020-08-18 | 2021-01-05 | 杭州翌微科技有限公司 | Human face quality evaluation method based on local key feature recognition |
CN112084911B (en) * | 2020-08-28 | 2023-03-07 | 安徽清新互联信息科技有限公司 | Human face feature point positioning method and system based on global attention |
CN112084911A (en) * | 2020-08-28 | 2020-12-15 | 安徽清新互联信息科技有限公司 | Human face feature point positioning method and system based on global attention |
CN112101434B (en) * | 2020-09-04 | 2022-09-09 | 河南大学 | Infrared image dim and small target detection method based on improved YOLOv3 |
CN112101434A (en) * | 2020-09-04 | 2020-12-18 | 河南大学 | Infrared image dim and small target detection method based on improved YOLOv3 |
CN112101456A (en) * | 2020-09-15 | 2020-12-18 | 推想医疗科技股份有限公司 | Attention feature map acquisition method and device and target detection method and device |
CN112101456B (en) * | 2020-09-15 | 2024-04-26 | 推想医疗科技股份有限公司 | Attention characteristic diagram acquisition method and device and target detection method and device |
CN112270213A (en) * | 2020-10-12 | 2021-01-26 | 萱闱(北京)生物科技有限公司 | Improved HRnet based on attention mechanism |
CN112464787A (en) * | 2020-11-25 | 2021-03-09 | 北京航空航天大学 | Remote sensing image ship target fine-grained classification method based on spatial fusion attention |
CN112464787B (en) * | 2020-11-25 | 2022-07-08 | 北京航空航天大学 | Remote sensing image ship target fine-grained classification method based on spatial fusion attention |
CN112465026A (en) * | 2020-11-26 | 2021-03-09 | 深圳市对庄科技有限公司 | Model training method and device for jadeite mosaic recognition |
CN112464851A (en) * | 2020-12-08 | 2021-03-09 | 国网陕西省电力公司电力科学研究院 | Smart power grid foreign matter intrusion detection method and system based on visual perception |
CN112633158A (en) * | 2020-12-22 | 2021-04-09 | 广东电网有限责任公司电力科学研究院 | Power transmission line corridor vehicle identification method, device, equipment and storage medium |
CN112667841A (en) * | 2020-12-28 | 2021-04-16 | 山东建筑大学 | Weakly supervised deep context-aware image representation method and system |
CN112967327A (en) * | 2021-03-04 | 2021-06-15 | 国网河北省电力有限公司检修分公司 | Monocular depth estimation method based on combined self-attention mechanism |
CN113065550A (en) * | 2021-03-12 | 2021-07-02 | 国网河北省电力有限公司 | Text recognition method based on self-attention mechanism |
CN113392696A (en) * | 2021-04-06 | 2021-09-14 | 四川大学 | Intelligent court monitoring face recognition system and method based on fractional calculus |
CN112801069A (en) * | 2021-04-14 | 2021-05-14 | 四川翼飞视科技有限公司 | Face key feature point detection device, method and storage medium |
CN112949841A (en) * | 2021-05-13 | 2021-06-11 | 德鲁动力科技(成都)有限公司 | Attention-based CNN neural network training method |
CN113379657A (en) * | 2021-05-19 | 2021-09-10 | 上海壁仞智能科技有限公司 | Image processing method and device based on random matrix |
CN113379657B (en) * | 2021-05-19 | 2022-11-25 | 上海壁仞智能科技有限公司 | Image processing method and device based on random matrix |
CN113344875A (en) * | 2021-06-07 | 2021-09-03 | 武汉象点科技有限公司 | Abnormal image detection method based on self-supervision learning |
CN113469335B (en) * | 2021-06-29 | 2024-05-10 | 杭州中葳数字科技有限公司 | Method for distributing weights for features by utilizing relation among features of different convolution layers |
CN113469335A (en) * | 2021-06-29 | 2021-10-01 | 杭州中葳数字科技有限公司 | Method for distributing weight for feature by using relationship between features of different convolutional layers |
CN113554151A (en) * | 2021-07-07 | 2021-10-26 | 浙江工业大学 | Attention mechanism method based on convolution interlayer relation |
CN113554151B (en) * | 2021-07-07 | 2024-03-22 | 浙江工业大学 | Attention mechanism method based on convolution interlayer relation |
WO2023005161A1 (en) * | 2021-07-27 | 2023-02-02 | 平安科技(深圳)有限公司 | Face image similarity calculation method, apparatus and device, and storage medium |
CN113616209B (en) * | 2021-08-25 | 2023-08-04 | 西南石油大学 | Method for screening schizophrenic patients based on space-time attention mechanism |
CN113616209A (en) * | 2021-08-25 | 2021-11-09 | 西南石油大学 | Schizophrenia patient discrimination method based on space-time attention mechanism |
CN113989579A (en) * | 2021-10-27 | 2022-01-28 | 腾讯科技(深圳)有限公司 | Image detection method, device, equipment and storage medium |
CN114118140A (en) * | 2021-10-29 | 2022-03-01 | 新黎明科技股份有限公司 | Multi-view intelligent fault diagnosis method and system for explosion-proof motor bearing |
CN114005078B (en) * | 2021-12-31 | 2022-03-29 | 山东交通学院 | Vehicle re-identification method based on dual-relation attention mechanism |
CN114005078A (en) * | 2021-12-31 | 2022-02-01 | 山东交通学院 | Vehicle re-identification method based on dual-relation attention mechanism |
CN114550162B (en) * | 2022-02-16 | 2024-04-02 | 北京工业大学 | Three-dimensional object recognition method combining view importance network and self-attention mechanism |
CN114550162A (en) * | 2022-02-16 | 2022-05-27 | 北京工业大学 | Three-dimensional object identification method combining view importance network and self-attention mechanism |
CN115100709A (en) * | 2022-06-23 | 2022-09-23 | 北京邮电大学 | Feature-separated image face recognition and age estimation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110610129A (en) | Deep learning face recognition system and method based on self-attention mechanism | |
CN109543606B (en) | Human face recognition method with attention mechanism | |
CN111325115B (en) | Cross-modal adversarial pedestrian re-identification method and system with triplet constraint loss |
CN111126360A (en) | Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model | |
CN105138998B (en) | Pedestrian based on the adaptive sub-space learning algorithm in visual angle recognition methods and system again | |
CN109359541A (en) | A sketch face recognition method based on deep transfer learning |
CN106503687A (en) | Surveillance video person identification system and method fusing multi-angle face features |
CN102156887A (en) | Human face recognition method based on local feature learning | |
CN108564040B (en) | Fingerprint activity detection method based on deep convolution characteristics | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN111950525B (en) | Fine-grained image classification method based on destructive reconstruction learning and GoogLeNet | |
Lin et al. | Face gender recognition based on face recognition feature vectors | |
CN109145704B (en) | Face portrait recognition method based on face attributes | |
CN112232184A (en) | Multi-angle face recognition method based on deep learning and space conversion network | |
CN115830531A (en) | Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion | |
CN110414431B (en) | Face recognition method and system based on elastic context relation loss function | |
CN106886771A (en) | Image principal information extraction and face recognition method based on modular PCA |
Chen et al. | A finger vein recognition algorithm based on deep learning | |
Saravanan et al. | Using machine learning principles, the classification method for face spoof detection in artificial neural networks | |
Ebrahimpour et al. | Liveness control in face recognition with deep learning methods | |
Ge et al. | Deep and discriminative feature learning for fingerprint classification | |
Elbarawy et al. | Facial expressions recognition in thermal images based on deep learning techniques | |
Xiao et al. | An improved siamese network model for handwritten signature verification | |
CN111898400A (en) | Fingerprint activity detection method based on multi-modal feature fusion | |
Desai et al. | Face anti-spoofing technique using CNN and SVM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191224 |