CN112580502A - SICNN-based low-quality video face recognition method - Google Patents
- Publication number
- CN112580502A (application number CN202011496030.8A)
- Authority
- CN
- China
- Prior art keywords
- network
- face
- sicnn
- loss
- quality video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06F18/213 — Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
- G06V40/168 — Human faces: feature extraction; face representation
Abstract
The invention discloses a low-quality video face recognition method based on SICNN. First, key frames are selected by clustering the key points that characterize face orientation, according to the positional features of the face in the video. Next, a SICNN reconstruction model is built: features are extracted by a reconstruction network and a recognition network to obtain a reconstruction loss and a recognition loss respectively, from which an identity loss is defined; the reconstruction network is trained with an alternating training strategy, yielding frame images that have high resolution and carry more identity features. Finally, the reconstructed frame images are fed into the recognition network Inception-ResNet v2, depth features are extracted for classification and recognition, and the recognition results of all image frames are voted to obtain the video recognition result. Applied to low-quality video face recognition, the method effectively improves recognition accuracy.
Description
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a low-quality video face recognition method based on SICNN.
Background
In recent years, advances in computer vision have turned more and more technologies into practical products in everyday life. With the rise of deep neural networks, face recognition technology has developed rapidly. Image-based face recognition has achieved excellent results, while research on video face recognition lags comparatively behind. This is because video face recognition faces not only the same illumination, occlusion and pose problems as image face recognition, but also image frames whose quality in practical applications (such as surveillance scenes) is usually inferior to still images. Current video face recognition methods fall into two categories: classical methods and deep learning methods. A common deep learning approach splits a video into image frames, performs face recognition on each frame, and votes over the per-frame results to obtain the final video recognition result. A common way to improve low-quality video is super-resolution reconstruction, which reconstructs low-resolution video frames into super-resolution images with better visual quality and more features, thereby addressing the mismatch in spatial feature dimensions between images of different resolutions.
However, general super-resolution reconstruction algorithms still have problems: most existing methods focus on whether the output image is clear and vivid, i.e. on improving visual quality, and therefore neglect the recovery of facial features. They cannot generate a face close to the true identity, so they fail to improve face recognition accuracy and do not achieve the expected recognition effect.
Disclosure of Invention
The invention aims to overcome the defects of existing super-resolution reconstruction algorithms and to provide a reconstruction algorithm that benefits face recognition. The invention adopts a low-quality video face recognition method based on SICNN (Super-Identity Convolutional Neural Network), which effectively addresses the problem of poor recognition performance on low-quality video and achieves a good recognition effect.
In order to achieve the purpose, the invention adopts the technical scheme that:
A low-quality video face recognition method based on SICNN comprises the following steps:
Step 1, data preprocessing: split the low-quality video data in the data set into image frames, crop faces to 32 × 40 px by face detection, and divide the image set into a training set and a testing set at a ratio of 7:3;
Step 2, key-frame selection: take the key-point positions of the low-quality video face image frames in the data set as face features and select key frames using a K-means clustering algorithm and a random algorithm;
Step 3, input the low-quality video key-frame image I^LR into the improved SICNN reconstruction network CNN_H, extract features and reconstruct a super-resolution image I^SR; compare it with the high-resolution face image I^HR to obtain the super-resolution reconstruction loss L_SR;
Step 4, input the super-resolution image I^SR reconstructed in step 3 into the improved recognition network CNN_R, extract depth features, and map the features to a hypersphere space for classification and recognition, obtaining the recognition loss L_FR and the super-identity loss L_SI;
Step 5, train the networks with an alternating training strategy: train the recognition network with the recognition loss L_FR obtained in step 4, and train the reconstruction network with the weighted super-identity loss L_SI from step 4 together with the super-resolution reconstruction loss L_SR from step 3 until convergence;
Step 6, input the video image frames reconstructed by the SICNN in step 3 into an Inception-ResNet v2 network, extract features with small convolutions, use a softmax classifier, and train the network with an improved CenterLoss as the loss function, where the improved CenterLoss is obtained from the original CenterLoss by taking the features of the high-resolution face image directly as the class centers;
Step 7, vote over the recognition results of the image frames to obtain the final video recognition result.
Further, the data set used in step 1 is the COX data set; ten splits of the COX data set into training and testing samples are used, and the reported result is the average over the ten experiments.
Further, the K value of the K-means algorithm in step 2 is 5, representing 5 different face poses: left profile, left-deflected face, frontal face, right-deflected face and right profile; 10 key frames are selected from each group by a random algorithm.
Further, the CNN_H network in step 3 comprises DB (Dense Block), convolution, DB, deconvolution, convolution, DB and convolution connected in sequence; the CNN_H network uses the DBs to extract semantic features, deconvolution to enlarge the resolution of the input features, and convolution to realize mapping and reconstruction. Because the resolution of the low-quality video key frames used is 32 × 40, a 4× enlargement suffices for recognition; the number of deconvolutions in the CNN_H network is therefore reduced from 3 to 2, changing the original 8× enlargement to 4×.
Furthermore, each DB block is formed by 6 identical DenseLayer structures connected in sequence; each DenseLayer comprises a 1 × 1 and a 3 × 3 convolutional layer connected in sequence, the 1 × 1 convolutional layer being a bottleneck layer that reduces the number of input feature maps, i.e. performs dimensionality reduction. Each convolutional layer is composed of Batch Normalization + ReLU + Conv; the growth_rate of the DB block equals 32 and its bn_size equals 4.
Further, in step 4 the ResNet-like CNN_R network in the SICNN model is improved. CNN_R comprises 36 convolutional layers, in which 6 convolutional layers and 5 residual layers are connected alternately, in the order convolutional layer 1a, convolutional layer 1b, residual layer 1, convolutional layer 2, residual layer 2, convolutional layer 3, residual layer 3, convolutional layer 4, residual layer 4, convolutional layer 5, residual layer 5; each residual layer comprises one convolutional layer and several residual blocks, the 5 residual layers containing 1, 2, 4, 6 and 2 residual blocks respectively. CNN_R emphasizes enhancing the face identity features so that the reconstructed SR image is closer to the original HR image; its loss function uses A-Softmax, which introduces an angular classification margin and is equivalent to learning features in a hypersphere space, giving the learned face features better separability.
Further, the super-identity loss L_SI in step 5 is a kind of perceptual loss that computes a normalized Euclidean distance, directly relating the loss to identity in the hypersphere space.
Further, the Inception-ResNet v2 network in step 6 is connected in sequence by Stem, 5 × Inception-ResNet-A, Reduction-A, 10 × Inception-ResNet-B, Reduction-B, 5 × Inception-ResNet-C, Average Pooling, Dropout and Softmax layers. The network implements dimensionality reduction with 1 × 1 convolutions in each layer, uses two 3 × 3 convolutions in place of a 5 × 5 convolution, and decomposes the 7 × 7 and 3 × 3 convolutions in the Stem and Reduction layers into two one-dimensional convolutions (1 × 7, 7 × 1) and (1 × 3, 3 × 1). A Stem module replaces the sequential convolution and pooling before the Inception-ResNet structure, yielding a deeper network. Training uses the softmax cross-entropy loss function and the improved center loss function CenterLoss.
Further, the final classification result in step 7 is obtained by voting, and the result with the largest number of votes obtained is the final video identification result.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a low-quality video face recognition model based on super-resolution reconstruction; aimed at the low resolution and illumination variation characteristic of low-quality video, it improves video recognition accuracy through key-frame extraction, super-resolution reconstruction and face recognition classification of the video frames;
according to the method, the video key frames are selected through the key frame selection algorithm, the calculation complexity of reconstruction and recognition is reduced on the basis of not influencing the recognition efficiency, and the training and testing time is reduced;
according to the invention, by introducing identity loss into the reconstruction network through the SICNN reconstruction method, the reconstructed super-resolution image can obtain more identity characteristics, and the face recognition accuracy can be improved;
the Inception-ResNet v2 + CenterLoss recognition network in the invention uses the center loss function to classify results more accurately, and the Inception-ResNet v2 network reduces computational cost and accelerates learning.
Drawings
FIG. 1 is a diagram of a SICNN-based low-quality video face recognition model;
FIG. 2 is a SICNN model framework;
fig. 3 is an Inception-ResNet v2 network architecture.
Detailed Description
The present invention will be further described with reference to the following examples.
Example 1
A low-quality video face recognition method based on SICNN comprises the following steps:
Step 1: the COX face data set targets the problems of video-to-still (V2S), still-to-video (S2V) and video-to-video (V2V) face recognition. The data set contains 1,000 subjects; for each, a video surveillance scene is simulated and 1 high-quality still image and 3 video sequences (cam1, cam2, cam3) are captured. After face detection and data preprocessing, most video sequences contain more than 100 face image frames, some even more than 300.
Step 2: select key frames, taking the key-point positions of the low-quality video face image frames in the data set as face features, and selecting key frames with a K-means clustering algorithm and a random algorithm. The K value of the K-means algorithm is 5, representing 5 different face poses: left profile, left-deflected face, frontal face, right-deflected face and right profile; 10 key frames are selected from each group by a random algorithm.
The invention performs K-means clustering on the images according to the positions of the face key points. Let the key-point position of the a-th sample be x^(a). An image produced by face detection is annotated with the positions of the two eyes, which the invention takes as the key points, i.e. x^(a) = (x_aL, y_aL, x_aR, y_aR), where (x_aL, y_aL) are the left-eye position coordinates and (x_aR, y_aR) the right-eye position coordinates of the a-th sample. The distance function of the a-th sample in K-means clustering is therefore defined as:

l_aj = ||x^(a) − μ_j|| = sqrt( (x_aL − x_jL)^2 + (y_aL − y_jL)^2 + (x_aR − x_jR)^2 + (y_aR − y_jR)^2 )

where l_aj is the distance between the a-th sample and the centroid μ_j of class j, μ_j is the centroid of class j, (x_jL, y_jL) are the left-eye position coordinates of μ_j, and (x_jR, y_jR) are its right-eye position coordinates.
Assume the input sample position set is S = {x^(1), x^(2), …, x^(a), …, x^(m)}, x^(a) ∈ R^n, where m is the number of samples and R^n is the n-dimensional real space. The algorithm steps are as follows:

(1) Randomly select k cluster centroids μ_1, μ_2, …, μ_k ∈ R^n, where μ_k is the centroid of class k;
(2) Repeat the following process until convergence {
For each sample x^(a), compute the class it belongs to (c^(a) is the class of the a-th sample, j is the class index):
c^(a) = argmin_j l_aj
For each class j, recompute the centroid of the class:
μ_j = ( Σ_{a=1}^{m} 1{c^(a) = j} · x^(a) ) / ( Σ_{a=1}^{m} 1{c^(a) = j} )
}
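The clustering-plus-random key-frame selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: function names are hypothetical, and each frame is assumed to be represented by its four eye coordinates (x_L, y_L, x_R, y_R) as defined earlier.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means: assign each sample to the nearest centroid (distance l_aj),
    then recompute each centroid as the mean of its members."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # l_aj: Euclidean distance from each sample to each centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def select_key_frames(eye_coords, k=5, per_cluster=10, seed=0):
    """Cluster frames into k pose groups by eye coordinates, then randomly
    keep up to `per_cluster` frames from each group (k=5, 10 per group here)."""
    rng = np.random.default_rng(seed)
    labels = kmeans(np.asarray(eye_coords, float), k, seed=seed)
    selected = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        take = min(per_cluster, len(members))
        selected.extend(rng.choice(members, size=take, replace=False).tolist())
    return sorted(selected)
```

With k = 5 and 10 frames per cluster, at most 50 key frames are retained per video, which is the reduction in reconstruction and recognition workload the method relies on.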
Step 3: input the i-th low-quality video key frame, i.e. the low-resolution face image I_i^LR, into the improved SICNN reconstruction network CNN_H, extract features, and reconstruct the super-resolution face image I_i^SR; compare it with the high-resolution face image I_i^HR to obtain the super-resolution reconstruction loss L_SR. The CNN_H network comprises DB (Dense Block), convolution, DB, deconvolution, convolution, DB and convolution connected in sequence; it uses the DBs to extract semantic features, deconvolution to enlarge the resolution of the input features, and convolution for mapping and reconstruction. Since the resolution of the low-quality video key frames used is 32 × 40, a 4× enlargement suffices for recognition; the number of deconvolutions in the improved CNN_H network is therefore reduced from 3 to 2, changing the original 8× enlargement to 4×.
3.1 DB (Dense Block)

The CNN_H network uses DBs to extract semantic features. To counter the vanishing-gradient problem, a DB block borrows the idea of ResNet and connects all layers directly, while guaranteeing maximal information flow between the layers of the network. Simply put, the input of each layer comes from the outputs of all preceding layers.

The number of output feature maps per convolutional layer in the DB is small (fewer than 100), unlike networks that are hundreds or thousands of maps wide. At the same time, this connection pattern makes the propagation of features and gradients more effective and the network easier to train. The deeper a network is, the more prone it is to vanishing gradients, because input and gradient information must pass through many layers; dense connectivity effectively links every layer directly to the input and to the loss, which reduces the vanishing-gradient phenomenon and makes deeper networks unproblematic.

Each DB block in the invention is composed of 6 substructures; each substructure comprises one 1 × 1 convolutional layer and one 3 × 3 convolutional layer, the 1 × 1 convolutional layer being a bottleneck layer that reduces the number of input feature maps, i.e. performs dimensionality reduction. Each convolutional layer is composed of Batch Normalization + ReLU + Conv; the growth_rate of the DB block equals 32 and its bn_size equals 4.
In the original SICNN, a face image of size 32 × 40 is input and the reconstructed resolution is enlarged to 8 times the original, i.e. 256 × 320; with the two deconvolutions used here, the enlargement is 4 times, i.e. 128 × 160.
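The channel bookkeeping implied by the DB parameters above (6 DenseLayers, growth_rate = 32, bn_size = 4) can be sketched as plain arithmetic. This is an illustrative trace only; the input channel count of 64 is an assumed example, not a value from the patent.

```python
def dense_block_channels(in_channels, num_layers=6, growth_rate=32, bn_size=4):
    """Trace feature-map counts through one DB block: each DenseLayer sees the
    concatenation of the block input and all previous layers' outputs, squeezes
    it through a 1x1 bottleneck of bn_size*growth_rate maps, and emits
    growth_rate new maps via its 3x3 convolution."""
    trace = []
    for layer in range(num_layers):
        layer_in = in_channels + layer * growth_rate  # dense concatenation
        bottleneck = bn_size * growth_rate            # 1x1 conv output: 4 * 32 = 128
        trace.append((layer_in, bottleneck, growth_rate))
    out_channels = in_channels + num_layers * growth_rate
    return trace, out_channels
```

For 64 input channels the block emits 64 + 6 × 32 = 256 maps, which shows why the bottleneck layer is needed: without it the 3 × 3 convolutions would operate on ever-wider inputs.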
3.2 Loss function

On the reconstruction network CNN_H, the pixel-wise Euclidean distance between the SR image reconstructed from the LR image and the high-resolution HR image is defined as the super-resolution loss L_SR. For the i-th low-resolution face image, the super-resolution loss after CNN_H reconstruction is:

L_SR = || CNN_H(I_i^LR) − I_i^HR ||_2^2

where I_i^LR and I_i^HR are the i-th LR and HR face images in the training set, and CNN_H(I_i^LR) denotes the reconstructed output, which may also be written I_i^SR.
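The super-resolution loss amounts to a squared pixel-wise Euclidean distance; a minimal NumPy sketch, averaged over a mini-batch (the batch averaging is an assumption for illustration, the patent states only the per-image form):

```python
import numpy as np

def super_resolution_loss(sr_batch, hr_batch):
    """L_SR: squared pixel-wise Euclidean distance between each reconstructed
    SR image and its HR target, averaged over the mini-batch."""
    sr = np.asarray(sr_batch, dtype=float)
    hr = np.asarray(hr_batch, dtype=float)
    # sum over all pixel axes of each image, keep the batch axis
    per_image = np.sum((sr - hr) ** 2, axis=tuple(range(1, sr.ndim)))
    return float(per_image.mean())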
Step 4: input the super-resolution image I_i^SR reconstructed from the i-th low-resolution face image in step 3 into the improved recognition network CNN_R, extract depth features, and map the features to a hypersphere space for classification and recognition, obtaining the recognition loss L_FR and the super-identity loss L_SI. The ResNet-like CNN_R network in the SICNN model is improved: CNN_R comprises 36 convolutional layers, in which 6 convolutional layers and 5 residual layers are connected alternately, in the order convolutional layer 1a, convolutional layer 1b, residual layer 1, convolutional layer 2, residual layer 2, convolutional layer 3, residual layer 3, convolutional layer 4, residual layer 4, convolutional layer 5, residual layer 5; each residual layer comprises one convolutional layer and several residual blocks, the 5 residual layers containing 1, 2, 4, 6 and 2 residual blocks respectively. CNN_R emphasizes enhancing the face identity features so that the reconstructed SR image is closer to the original HR image; the loss function uses A-Softmax, which introduces an angular classification margin, equivalent to learning features in a hypersphere space, giving the learned face features better separability.
4.1 Recognition network CNN_R

CNN_R is a ResNet-like network comprising 36 convolutional layers, in which 6 convolutional layers and 5 residual layers are connected alternately, in the order convolutional layer 1a, convolutional layer 1b, residual layer 1, convolutional layer 2, residual layer 2, convolutional layer 3, residual layer 3, convolutional layer 4, residual layer 4, convolutional layer 5, residual layer 5; each residual layer comprises one convolutional layer and several residual blocks, the 5 residual layers containing 1, 2, 4, 6 and 2 residual blocks respectively. The network structure of CNN_R is as follows:
4.2 Recognition loss

CNN_R uses the A-Softmax loss as its loss function. The function introduces an angular classification margin and is equivalent to learning features in a hypersphere space, so the learned face features have better separability. CNN_R expresses this loss function as the recognition loss L_FR. For a super-resolution face image input I_i^SR belonging to the y_i-th identity, the recognition loss is:

L_FR = −log( e^{||f_i|| ψ(θ_{y_i,i})} / ( e^{||f_i|| ψ(θ_{y_i,i})} + Σ_{j≠y_i} e^{||f_i|| cos θ_{j,i}} ) )

where e is the base of the natural logarithm, f_i is the identity feature extracted by CNN_R from the input face image I_i^SR, θ_{y_i,i} and θ_{j,i} are the learned angles between f_i and identities y_i and y_j respectively, and ψ(θ_{y_i,i}) and cos(θ_{j,i}) are generalized monotonically decreasing functions of θ_{y_i,i} and θ_{j,i} respectively. ψ is derived as:

ψ(θ) = (−1)^c · cos(bθ) − 2c,  for θ ∈ [cπ/b, (c+1)π/b]

where b is a hyper-parameter of the angle-margin constraint, b ≥ 1, c is a non-negative integer and c ∈ [0, b−1]; preferably b = 4, c ∈ [0, 3].
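The piecewise margin function ψ with b = 4 can be sketched directly; the interval index c is recovered from θ itself. This is an illustrative reading of the definition above, not the patented code:

```python
import numpy as np

def psi(theta, b=4):
    """A-Softmax margin: psi(theta) = (-1)^c * cos(b*theta) - 2c, where
    c = floor(theta / (pi/b)) indexes the interval [c*pi/b, (c+1)*pi/b]."""
    c = min(int(theta // (np.pi / b)), b - 1)  # clamp so theta = pi falls in the last piece
    return (-1) ** c * np.cos(b * theta) - 2 * c
```

Evaluating ψ at 0, π/2 and π with b = 4 gives 1, −3 and −7, confirming that ψ decreases monotonically over [0, π], which is what makes it a usable angular margin.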
The super-identity loss L_SI is the normalized Euclidean distance between the SR and HR identity representations on the unit hypersphere:

L_SI = || ĩ(I_i^SR) − ĩ(I_i^HR) ||_2^2,  with ĩ(I) = CNN_R(I) / ||CNN_R(I)||_2

where ĩ(I_i^SR) is the identity representation of I_i^SR projected onto the unit hypersphere, ĩ(I_i^HR) is that of I_i^HR, and CNN_R(I_i^SR) and CNN_R(I_i^HR) are the identity features extracted by CNN_R from I_i^SR and I_i^HR respectively.
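The hypersphere projection and normalized distance can be sketched in a few lines (a minimal illustration; the small eps guard is an added assumption for numerical safety):

```python
import numpy as np

def super_identity_loss(feat_sr, feat_hr, eps=1e-12):
    """L_SI: squared Euclidean distance between the SR and HR identity features
    after both are projected onto the unit hypersphere (length-normalized)."""
    a = np.asarray(feat_sr, float)
    b = np.asarray(feat_hr, float)
    a = a / (np.linalg.norm(a) + eps)   # identity representation of the SR image
    b = b / (np.linalg.norm(b) + eps)   # identity representation of the HR image
    return float(np.sum((a - b) ** 2))
```

Because both features are normalized first, the loss depends only on the angle between them: features pointing in the same direction give zero loss regardless of their magnitudes.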
Step 5: train the networks with an alternating training strategy.

5.1 Training strategy

Input: recognition model CNN_R trained on high-resolution HR face images; face-hallucination model CNN_H trained with the super-resolution loss L_SR; mini-batch size N; (I_i^LR, I_i^HR) the i-th low- and high-resolution face image pair.
Output: the trained SICNN.

1 while not converged {
2   sample N image pairs (I_i^LR, I_i^HR) and reconstruct I_i^SR = CNN_H(I_i^LR);
3   update the recognition model CNN_R using the average recognition loss L_FR;
4   using the average super-resolution loss L_SR and the average super-identity loss L_SI of the N image pairs, update the reconstruction model CNN_H by minimizing L_SR + α·L_SI (α is the weight of the super-identity loss L_SI, equal to 8);
5 }
Step 6: input the video image frames reconstructed by the SICNN in step 3 into an Inception-ResNet v2 network, extract features with small convolutions, use a softmax classifier, and train the network with the improved CenterLoss as the loss function. Unlike the convolutional and pooling layers of traditional networks, the Inception-ResNet v2 network runs 1 × 1 convolutions, 3 × 3 convolutions and 3 × 3 pooling in parallel within the same layer. The network is connected in sequence by Stem, 5 × Inception-ResNet-A, Reduction-A, 10 × Inception-ResNet-B, Reduction-B, 5 × Inception-ResNet-C, Average Pooling, Dropout and Softmax layers. It uses 1 × 1 convolutions for dimensionality reduction in each layer, replaces 5 × 5 convolutions with two 3 × 3 convolutions, and decomposes the 7 × 7 and 3 × 3 convolutions in the Stem and Reduction layers into two one-dimensional convolutions (1 × 7, 7 × 1) and (1 × 3, 3 × 1), reducing the parameter count, accelerating computation and allowing greater network depth. A Stem module replaces the sequential convolution and pooling before the Inception-ResNet structure, yielding a deeper network. Training uses the softmax cross-entropy loss function and the improved center loss function CenterLoss.
Suppose the feature extracted by the Inception-ResNet v2 network from the super-resolution face image I_i^SR reconstructed from the i-th low-resolution face image is denoted I_i, its true class is y_i, and the center of each class is denoted c_{y_i}. The center loss that increases intra-class cohesion is defined as L_center:

L_center = (1/2) Σ_i || I_i − c_{y_i} ||_2^2

Considering the characteristics of video recognition, the feature H_i extracted by the Inception-ResNet v2 network from the high-resolution face image corresponding to I_i is taken as the center of the true class y_i; that is, in the improved L_center the center c_{y_i} is H_i, and the centers remain unchanged during training. The improved L_center is:

L_center = (1/2) Σ_i || I_i − H_i ||_2^2
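Because the improved CenterLoss fixes the class center at the HR-image feature, it reduces to a plain sum of squared distances with no running center update. A minimal sketch (function name illustrative):

```python
import numpy as np

def improved_center_loss(sr_feats, hr_feats):
    """Improved CenterLoss: the HR-image feature H_i acts as the fixed center
    for the SR-frame feature I_i, so no center update is needed in training."""
    I = np.asarray(sr_feats, float)   # features of reconstructed SR frames
    H = np.asarray(hr_feats, float)   # features of the corresponding HR images
    return 0.5 * float(np.sum((I - H) ** 2))
```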
and 7, voting the identification result of the image frame to obtain a final video identification result. And the final classification result is obtained by voting the identification results of all the video frames, and the result with the largest number of votes is the final video identification result.
In summary: first, key frames are selected by clustering the key-point positions that characterize face orientation, according to the positional features of the face in the video; then a SICNN reconstruction model is built, features are extracted by a reconstruction network and a recognition network to obtain a reconstruction loss and a recognition loss respectively, from which an identity loss is defined, and the reconstruction network is trained with an alternating training strategy, yielding frame images of high resolution that carry more identity features; finally, the reconstructed frame images are fed into the recognition network Inception-ResNet v2, depth features are extracted for classification and recognition, and the recognition results of all image frames are voted to obtain the video recognition result. Compared with classical point-to-set video recognition methods and ordinary super-resolution reconstruction methods, applying this super-resolution-reconstruction-based method to low-quality video face recognition effectively improves the accuracy of low-quality video face recognition.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (9)
1. A low-quality video face recognition method based on SICNN is characterized by comprising the following steps:
step 1, data preprocessing: splitting the low-quality video data in the data set into image frames, cropping faces to 32 × 40 px by face detection, and dividing the image set into a training set and a testing set at a ratio of 7:3;
step 2, selecting key frames: taking the key-point positions of the low-quality video face image frames in the data set as face features, and selecting the key frames using a K-means clustering algorithm and a random algorithm;
step 3, inputting the low-quality video key-frame image I^LR into the improved SICNN reconstruction network CNN_H, extracting features and reconstructing a super-resolution image I^SR, and comparing it with the high-resolution face image I^HR to obtain the super-resolution reconstruction loss L_SR;
step 4, inputting the super-resolution image I^SR reconstructed in step 3 into the improved recognition network CNN_R, extracting depth features, and mapping the features to a hypersphere space for classification and recognition, obtaining the recognition loss L_FR and the super-identity loss L_SI;
step 5, training the networks with an alternating training strategy: training the recognition network with the recognition loss L_FR obtained in step 4, and training the reconstruction network with the weighted super-identity loss L_SI from step 4 together with the super-resolution reconstruction loss L_SR from step 3 until convergence;
step 6, inputting the video image frames reconstructed by the SICNN in step 3 into an Inception-ResNet v2 network, extracting features with small convolutions, using a softmax classifier, and training the network with an improved CenterLoss as the loss function, wherein the improved CenterLoss is obtained from the original CenterLoss by taking the features of the high-resolution face image directly as the class centers;
and 7, voting the identification result of the image frame to obtain a final video identification result.
2. The SICNN-based low-quality video face recognition method of claim 1, wherein: the data set used in step 1 is the COX data set; ten splits of the COX data set into training and testing samples are used, and the result is the average over ten experiments.
3. The SICNN-based low-quality video face recognition method of claim 1, wherein: the K value of the K-means algorithm in step 2 is 5, representing 5 different face poses: left profile, left-deflected face, frontal face, right-deflected face and right profile; 10 key frames are selected from each group by a random algorithm.
4. The SICNN-based low-quality video face recognition method of claim 1, wherein: the CNN_H network in step 3 comprises DB, convolution, DB, deconvolution, convolution, DB and convolution connected in sequence; the CNN_H network uses the DBs to extract semantic features, deconvolution to scale up the resolution of the input features, and convolution to achieve mapping and reconstruction.
5. The SICNN-based low-quality video face recognition method of claim 4, wherein: each DB block is formed by 6 identical DenseLayer structures connected in sequence; each DenseLayer structure comprises sequentially connected 1 × 1 and 3 × 3 convolutional layers, the 1 × 1 convolutional layer being a bottleneck layer; each convolutional layer is composed of Batch Normalization + ReLU + convolution; the growth_rate of the DB block is 32 and the bn_size is 4.
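The channel bookkeeping implied by claim 5 can be checked with simple arithmetic (the input channel count of 64 is only an example, not stated in the claim):

```python
def dense_block_out_channels(c_in, num_layers=6, growth_rate=32):
    """Each DenseLayer concatenates growth_rate new feature maps onto its input,
    so a DB block grows the channel count by num_layers * growth_rate."""
    return c_in + num_layers * growth_rate

def bottleneck_channels(bn_size=4, growth_rate=32):
    """The 1x1 bottleneck layer expands to bn_size * growth_rate channels
    before the 3x3 convolution reduces them back to growth_rate."""
    return bn_size * growth_rate

print(dense_block_out_channels(64))  # 64 + 6 * 32
print(bottleneck_channels())         # 4 * 32
```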
6. The SICNN-based low-quality video face recognition method of claim 1, wherein: in step 4, the CNN_R in the SICNN model comprises 36 convolutional layers in total, in which 6 convolutional layers and 5 residual layers are alternately connected, in the order convolutional layer 1a, convolutional layer 1b, residual layer 1, convolutional layer 2, residual layer 2, convolutional layer 3, residual layer 3, convolutional layer 4, residual layer 4, convolutional layer 5 and residual layer 5; each residual layer comprises one convolutional layer and several residual blocks, the numbers of residual blocks in the 5 residual layers being 1, 2, 4, 6 and 2 respectively; the loss function uses A-Softmax.
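A-Softmax (SphereFace) replaces cos(θ) for the target class with a piecewise-monotone angular-margin function ψ(θ). A sketch with the commonly used margin m = 4 (the claim does not specify m, so that value is an assumption):

```python
import math

def psi(theta, m=4):
    """A-Softmax angular margin: psi(theta) = (-1)**k * cos(m*theta) - 2*k
    for theta in [k*pi/m, (k+1)*pi/m], k = 0 .. m-1."""
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k
```

Because ψ(θ) ≤ cos(θ) and decreases monotonically over [0, π], the target-class logit is penalized unless the feature lies within a tighter angular cone of its class weight vector on the hypersphere.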
7. The SICNN-based low-quality video face recognition method of claim 1, wherein: the super-identity loss L_SI in step 5 is a kind of perceptual loss that computes a normalized Euclidean distance.
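The super-identity loss of claim 7 can be sketched as a perceptual-style loss on L2-normalized face embeddings; the batch-mean reduction here is an assumption:

```python
import numpy as np

def super_identity_loss(z_sr, z_hr):
    """Mean squared normalized Euclidean distance between embeddings of the
    super-resolved faces (z_sr) and their high-resolution counterparts (z_hr)."""
    z_sr = z_sr / np.linalg.norm(z_sr, axis=1, keepdims=True)
    z_hr = z_hr / np.linalg.norm(z_hr, axis=1, keepdims=True)
    return float(np.mean(np.sum((z_sr - z_hr) ** 2, axis=1)))

a = np.array([[3.0, 4.0]])  # toy embedding, normalizes to [0.6, 0.8]
b = np.array([[4.0, 3.0]])  # toy embedding, normalizes to [0.8, 0.6]
```

Normalizing both embeddings first means the loss measures identity (direction on the hypersphere) rather than feature magnitude, which matches the hypersphere classification of step 4.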
8. The SICNN-based low-quality video face recognition method of claim 1, wherein: the Inception-ResNet-v2 network in step 6 is formed by Stem, 5 × Inception-ResNet-A, Reduction-A, 10 × Inception-ResNet-B, Reduction-B, 5 × Inception-ResNet-C, Average Pooling, Dropout and Softmax layers connected in sequence; the Inception-ResNet-v2 network uses 1 × 1 convolutions in each layer and replaces each 5 × 5 convolution with two 3 × 3 convolutions, while the 7 × 7 and 3 × 3 convolutions in the Stem and Reduction A/B/C layers are decomposed into two one-dimensional convolutions (1 × 7, 7 × 1) and (1 × 3, 3 × 1); the Inception-ResNet-v2 network replaces the sequential connection of convolution and pooling before the Inception-ResNet structure with a Stem module; training uses the softmax cross-entropy loss function and the improved center loss function CenterLoss.
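The improved CenterLoss of claims 1 and 8 keeps the usual center-loss form but pins each class center to the fixed high-resolution face feature instead of a learned center. A minimal NumPy sketch (the 0.5 scaling follows the standard center-loss formulation, which the claim does not spell out):

```python
import numpy as np

def improved_center_loss(features, labels, hr_centers):
    """Center loss 0.5 * mean ||x_i - c_{y_i}||^2, where c_{y_i} is the fixed
    feature of the high-resolution face image of class y_i."""
    diffs = features - hr_centers[labels]
    return 0.5 * float(np.mean(np.sum(diffs ** 2, axis=1)))

feats = np.array([[1.0, 0.0], [0.0, 1.0]])    # toy batch of extracted features
labels = np.array([0, 1])                     # class index of each sample
centers = np.array([[1.0, 0.0], [0.0, 0.0]])  # fixed HR-image features per class
```

Using HR features as centers pulls the low-quality/super-resolved features toward the high-resolution representation of the same identity, rather than toward a moving average that must itself be learned.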
9. The SICNN-based low-quality video face recognition method of claim 1, wherein: the final classification result in step 7 is obtained by voting, and the result receiving the largest number of votes is the final video recognition result.
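The per-frame voting of claims 1 and 9 is a plain majority vote; the tie-breaking rule below (first-seen identity wins) is an assumption, since the claim only specifies that the identity with the most votes is chosen.

```python
from collections import Counter

def video_identity(frame_predictions):
    """Majority vote over the per-frame identities; on a tie, the identity
    seen first wins (Counter.most_common preserves first-seen order)."""
    return Counter(frame_predictions).most_common(1)[0][0]
```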
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011496030.8A CN112580502A (en) | 2020-12-17 | 2020-12-17 | SICNN-based low-quality video face recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112580502A true CN112580502A (en) | 2021-03-30 |
Family
ID=75135971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011496030.8A Pending CN112580502A (en) | 2020-12-17 | 2020-12-17 | SICNN-based low-quality video face recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580502A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113375676A (en) * | 2021-05-26 | 2021-09-10 | Nanjing University of Aeronautics and Astronautics | Detector landing point positioning method based on impulse neural network |
CN113375676B (en) * | 2021-05-26 | 2024-02-20 | Nanjing University of Aeronautics and Astronautics | Detector landing site positioning method based on impulse neural network |
CN114612990A (en) * | 2022-03-22 | 2022-06-10 | Hohai University | Unmanned aerial vehicle face recognition method based on super-resolution |
CN115205768A (en) * | 2022-09-16 | 2022-10-18 | Shandong Baimeng Information Technology Co., Ltd. | Video classification method based on resolution self-adaptive network |
- 2020-12-17: application CN202011496030.8A filed (CN); published as CN112580502A (en); status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110827213B (en) | Super-resolution image restoration method based on generation type countermeasure network | |
CN108520535B (en) | Object classification method based on depth recovery information | |
CN112580502A (en) | SICNN-based low-quality video face recognition method | |
CN104268593B (en) | The face identification method of many rarefaction representations under a kind of Small Sample Size | |
CN109615582A (en) | A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description | |
CN106778796B (en) | Human body action recognition method and system based on hybrid cooperative training | |
CN109360170B (en) | Human face repairing method based on advanced features | |
CN109376787B (en) | Manifold learning network and computer vision image set classification method based on manifold learning network | |
CN111523483B (en) | Chinese meal dish image recognition method and device | |
CN113033345B (en) | V2V video face recognition method based on public feature subspace | |
CN112950480A (en) | Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention | |
CN112784929A (en) | Small sample image classification method and device based on double-element group expansion | |
CN109711442A (en) | Unsupervised layer-by-layer generation fights character representation learning method | |
CN113628297A (en) | COVID-19 deep learning diagnosis system based on attention mechanism and transfer learning | |
CN113378949A (en) | Dual-generation confrontation learning method based on capsule network and mixed attention | |
CN109446997A (en) | Document code automatic identifying method | |
CN110414431B (en) | Face recognition method and system based on elastic context relation loss function | |
CN111611909A (en) | Multi-subspace-domain self-adaptive face recognition method | |
CN111507356A (en) | Segmentation method of handwritten characters of lower case money of financial bills | |
CN111695455A (en) | Low-resolution face recognition method based on coupling discrimination manifold alignment | |
Chen et al. | Generalized face antispoofing by learning to fuse features from high-and low-frequency domains | |
Zhang et al. | Attention-enhanced CNN for chinese calligraphy styles classification | |
CN114818963A (en) | Small sample detection algorithm based on cross-image feature fusion | |
CN114492634A (en) | Fine-grained equipment image classification and identification method and system | |
CN114155572A (en) | Facial expression recognition method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |