CN111967331B - Face representation attack detection method and system based on fusion feature and dictionary learning - Google Patents
Info
- Publication number
- CN111967331B (application CN202010696193.4A)
- Authority
- CN
- China
- Prior art keywords
- dictionary
- fusion
- face
- term
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/172—Human faces: classification, e.g. identification
- G06F18/2135—Feature extraction by transforming the feature space, based on approximation criteria, e.g. principal component analysis
- G06F18/214—Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Neural network architectures: combinations of networks
- G06N3/08—Neural networks: learning methods
- G06V40/45—Spoof detection: detection of the body part being alive
- Y02T10/40—Engine management systems
Abstract
The invention discloses a face representation attack detection method and system based on fusion features and dictionary learning. The method comprises: extracting image quality features from the whole face image according to the distortion sources introduced by secondary imaging of the face image; constructing a deep convolutional network model and extracting deep network features from face image patches through the network; concatenating the two feature sets and reducing them with PCA to generate the final fusion features; initializing dictionary atoms with the fusion features and training a dictionary-learning classifier based on a low-rank shared dictionary; and deciding the class of a test sample from the magnitude of the fusion-feature reconstruction residuals. By combining image quality features with deep network features for the first time, the invention makes better use of the information a single frame provides and effectively strengthens the discriminative power of the extracted features; by stripping the patterns common to genuine and fake samples into a low-rank shared dictionary for the first time, it improves the accuracy of attack detection and generalizes well.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a face representation attack detection method and system based on fusion features and dictionary learning.
Background
Face recognition technology is widely used in security, payment, entertainment and other scenarios. However, face recognition systems carry certain security risks. With the growth of social networks and the popularity of smartphones, more and more people share personal photos and videos online, and criminals can use such media to attack a face recognition system, impersonating others or deliberately obscuring their own identity in order to violate others' property or evade legal sanctions. An attempt to pass a face recognition system with a legitimate user's pictures, videos or similar media in order to borrow that user's identity is called a face representation attack, and methods for detecting such attacks are known as face liveness detection.
In face liveness detection, face images fall into two classes. The first comprises images captured directly from the user in person. The second comprises images whose photographed subject is merely an object highly similar to the legitimate user's face, such as a photo, video or wax figure of that user. Images of the second class are called face representation attack images (attack faces for short) and are the targets that liveness detection technology must identify.
The core of a face liveness detection algorithm is extracting the features of a face image that are most discriminative for liveness. Traditional detection techniques rely on hand-crafted features such as LBP (local binary patterns) and LPQ (local phase quantization), but as device imaging quality keeps improving, hand-designing features that can expose attack faces has become very difficult. In recent years, automatically extracting features with convolutional neural networks has become mainstream. Deep convolutional neural networks perform excellently on image classification tasks, but they are limited by the scale of liveness detection datasets: a deep network supervised only by class labels tends to memorize whatever features happen to exist in the training set, which easily leads to overfitting and poor generalization.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention provides a face representation attack detection method based on fusion features and dictionary learning. It makes full use of the information in a single frame by fusing hand-designed image quality features with deep network features, which effectively strengthens the discriminative power of the features; it classifies genuine and fake samples with a dictionary learning method built on a low-rank shared dictionary, which strips the commonality of genuine and fake samples into the shared dictionary and thereby improves the accuracy of attack detection.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a face representation attack detection method based on fusion features and dictionary learning, comprising the following steps:
performing face detection and cropping on the input video to build a face image database;
extracting the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
extracting image quality features from the whole face image according to the distortion sources of secondary imaging of the face image;
constructing a deep convolutional network model and extracting deep network features of face image patches through the deep convolutional network;
normalizing the image quality features and the deep network features, concatenating them, and reducing the concatenated features with PCA to generate the final fusion features;
initializing dictionary atoms based on the fusion features, and training a dictionary-learning classifier based on a low-rank shared dictionary;
and deciding the class of a test sample based on the magnitude of the fusion-feature reconstruction residuals.
As a preferred technical solution, extracting the image quality features of the whole face image according to the distortion sources of secondary imaging comprises: extracting specular reflection features, blur features, color moment features and color diversity features, and concatenating the extracted features to obtain the image quality features.
As a preferred technical solution, extracting the deep network features of face image patches through the deep convolutional network comprises:
generating face image patches by randomly scaling and randomly cropping the complete face image; constructing a lightweight deep convolutional network model that takes the patches as input; training the model with the Focal Loss function to extract the deep network features; and converting the one-hot labels into soft labels by label smoothing, thereby improving the training of the deep convolutional network.
As a preferred technical solution, initializing the dictionary atoms based on the fusion features and training the dictionary-learning classifier based on the low-rank shared dictionary comprises: minimizing the cost function of the dictionary model by alternately optimizing the dictionary and the sparse coefficients, and saving the dictionary after a set number of optimization iterations.
As a preferred technical solution, the cost function of the dictionary model is expressed as:

J(D, X) = Σ_{c=1}^{2} r(Y_c, D, X_c) + λ_1 f(X) + λ_2 ||X||_1 + η ||D_0||_*

where the first term is the discriminative fidelity term, the second term is the discriminative coefficient term based on the Fisher criterion, the third term is the L1 regularization term, and the fourth term is the nuclear norm. The discriminative fidelity term realizes the discriminative power of the dictionary; the discriminative coefficient term increases within-class similarity and reduces between-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary D_0, guaranteeing its low rank; and λ_1, λ_2 and η weigh the terms of the cost function;

the discriminative fidelity term is defined as:

r(Y_c, D, X_c) = ||Y_c - D X_c||_F^2 + ||Y_c - D_c X_c^c||_F^2 + Σ_{i≠c} ||D_i X_c^i||_F^2

where Y_c ∈ R^{m×n_c} denotes the samples of class c, each sample being a fusion feature, m denotes the fusion-feature dimension, n_c denotes the number of class-c samples, D denotes the total dictionary, D_c denotes the sub-dictionary of class c, and X_c^i denotes the coefficients of the class-c samples on the class-i dictionary;

the discriminative coefficient term is defined as:

f(X) = Σ_{c=1}^{2} ( ||X_c - M_c||_F^2 - ||M_c - M||_F^2 ) + ||X||_F^2 + ||X^0 - M_0||_F^2

where M_c denotes the mean of the sparse coefficients of the class-c samples, M denotes the mean of the sparse coefficients over the whole training set, M_0 denotes the mean of the coefficients on the shared dictionary, and the term ||X^0 - M_0||_F^2 (with X^0 the coefficients on the shared dictionary) forces the coefficients of all training samples on the shared dictionary to stay close to their average.
As a preferred technical solution, the method further comprises a step of solving the sparse coefficients of a test sample, specifically: constructing two class dictionaries augmented with the shared dictionary from the saved dictionary, and solving the sparse coefficients of the test sample with the class dictionaries fixed.
As a preferred technical solution, deciding the class of the test sample based on the magnitude of the fusion-feature reconstruction residuals comprises:
obtaining the sparse coefficients of the test sample by elastic-net regularization, reconstructing the fusion features of the test sample from the sparse coefficients, and taking the class with the smallest reconstruction residual as the predicted class of the test sample.
The invention further provides a face representation attack detection system based on fusion features and dictionary learning, comprising: a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary-learning classifier training module, and a test-sample class decision module;
the preliminary fusion feature extraction module comprises an image quality feature extraction module and a deep network feature extraction module;
the face image database construction module is used to perform face detection and cropping on the input video to build a face image database;
the preliminary fusion feature extraction module is used to extract the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
the image quality feature extraction module is used to extract image quality features from the whole face image according to the distortion sources of secondary imaging;
the deep network feature extraction module is used to construct a deep convolutional network model and extract the deep network features of face image patches through the deep convolutional network;
the final fusion feature generation module is used to normalize the image quality features and the deep network features, concatenate them, and reduce the concatenated features with PCA to generate the final fusion features;
the dictionary-learning classifier training module is used to initialize dictionary atoms based on the fusion features and train a dictionary-learning classifier based on a low-rank shared dictionary;
the test-sample class decision module is used to decide the class of a test sample based on the magnitude of the fusion-feature reconstruction residuals.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) By fusing hand-designed image quality features with deep network features, the invention makes full use of the information a single frame provides and strengthens the discriminative power of the features.
(2) The invention uses a low-rank shared dictionary to strip out the commonality of genuine and fake samples, so the class dictionaries better represent the differences between genuine and attack samples; it avoids the overfitting that fully connected layers are prone to, generalizes well, and further improves the accuracy of attack detection.
(3) The invention replaces conventional L1 regularization with elastic-net regularization, which mitigates L1's tendency to discard certain features, helps preserve fine-grained features, and strengthens the discriminability of the sparse coefficients.
(4) The invention feeds randomly scaled, randomly cropped image patches to the convolutional neural network, so the network concentrates on learning and extracting the effective information related to spoofing patterns; this also enlarges the dataset in an effective way and alleviates the performance degradation caused by small data scale.
Drawings
Fig. 1 is a flow chart of a face representation attack detection method based on fusion features and dictionary learning in the present embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
As shown in fig. 1, the present embodiment provides a face representation attack detection method based on fusion features and dictionary learning, which includes the following steps:
s1: performing face detection and clipping on the input video to construct a face image database;
the embodiment selects a public face representation ATTACK video data set REPLAY-ATTACK, CASIA-FASD and MSU-MFSD, wherein the three data sets comprise real face videos and ATTACK face videos, the division of a training set and a testing set is provided, the first 30 frames of each video of the data sets are extracted, a cascade classifier based on Haar characteristics is adopted to detect the position of a face in a picture frame, and a face image is cut;
s2: extracting fusion characteristics of face images in a face image database, wherein the fusion characteristics comprise image quality characteristics and depth network characteristics, and the method comprises the following specific steps of:
s21) extracting image quality characteristics of the whole face image according to a distortion source of the secondary imaging of the face image,
according to the face image database, sub-feature extraction is carried out on the face image through the aspects of ambiguity, specular reflection, color distortion and the like, and the final sample image quality feature vector is formed by splicing all sub-feature vectors, and the specific steps are as follows:
under the same imaging environment, the real access face is a primary imaging picture, the face representation attack is a secondary imaging picture, and analyzing the source of face image distortion in the secondary imaging process is helpful for enhancing the discrimination of the extracted features, and the embodiment extracts the image quality features from four aspects of specular reflection, ambiguity, color moment distortion and color diversity distortion;
Specular reflection features: iteratively replace the chromaticity at highlight positions of the input face image with the maximum diffuse chromaticity of neighboring pixels to extract the specular reflection component of the image, then form the specular reflection features from the percentage, mean and variance of that component;
Blur features: extract the image blur features with a re-blurring-based method. The input image is converted to grayscale and low-pass filtered with a Gaussian filter (3×3 convolution kernel); the filtered result is called the blurred image. Sharpness is measured by comparing the variation between adjacent pixels of the two images: the absolute-difference images in the horizontal and vertical directions are computed, the gray values of all their pixels are summed (the horizontal difference sum and the vertical difference sum), and the ratios of these sums before and after filtering form the blur features.
Color moment features: first convert the input face image from RGB to the relatively channel-independent HSV space, then compute the mean, variance and skewness of each channel along with the percentages of pixels in the channel's minimum and maximum histogram bins; these 5 values per channel form the color moment features.
Color diversity features: color-quantize the R, G, B channels of the input picture, then build the color diversity features from the histogram bin counts of the 100 most frequently occurring colors and the number of distinct colors appearing in the face image.
The four groups of extracted features are concatenated and called the image quality features; their dimension is 121.
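To make the feature design concrete, here is a hedged sketch of two of the four sub-feature extractors just described, the re-blurring-based blur measure and the HSV color moments; the exact histogram binning and the function names are assumptions where the patent does not pin them down.

```python
import cv2
import numpy as np

def blur_feature(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    blurred = cv2.GaussianBlur(gray, (3, 3), 0)      # low-pass "re-blur"
    def diff_sums(g):
        dh = np.abs(np.diff(g, axis=1)).sum()        # horizontal difference sum
        dv = np.abs(np.diff(g, axis=0)).sum()        # vertical difference sum
        return dh, dv
    dh0, dv0 = diff_sums(gray)
    dh1, dv1 = diff_sums(blurred)
    # Ratios of post- to pre-blur difference sums: close to 1 for an
    # already-blurry image, smaller for a sharp one.
    return np.array([dh1 / (dh0 + 1e-8), dv1 / (dv0 + 1e-8)])

def color_moment_feature(img_bgr, bins=32):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    feats = []
    for ch in cv2.split(hsv):
        mean, std = ch.mean(), ch.std()
        skew = ((ch - mean) ** 3).mean() / (std ** 3 + 1e-8)
        hist, _ = np.histogram(ch, bins=bins)
        pct_min = hist[0] / ch.size                  # share of pixels in min bin
        pct_max = hist[-1] / ch.size                 # share of pixels in max bin
        feats += [mean, std, skew, pct_min, pct_max]
    return np.array(feats)                           # 15 values (5 per channel)
```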
S22) construct a deep convolutional network model and extract the deep network features of face image patches through the network, explained as follows:
The face images obtained in step S1 are rescaled to 112×112, and random patches are produced by random scaling followed by random cropping, with the patch size set to 48×48. The patches serve as the input of the convolutional neural network, which is trained to extract features. Using local patches enlarges the training set, lets the network concentrate on learning the effective information related to spoofing attack patterns, and preserves the resolution of the original input so that discriminative information is not lost;
Because the available public datasets are small, a convolutional network model of modest complexity suffices. This embodiment uses a ResNet18 pre-trained on the ImageNet dataset, with the kernel size of its first convolutional layer reduced to 3×3 and its stride reduced to 1. The layer after the last convolutional layer is a global pooling layer, which averages each feature map output by the convolutions and concatenates the averages into a one-dimensional vector; the layer after that is a fully connected layer, which takes this vector as input and whose output dimension equals the number of classes. In this embodiment the number of classes is set to 2, corresponding to genuine faces and attack faces respectively.
The loss function used when training the convolutional network is the Focal Loss, whose formula is:

FL(p_t) = -(1 - p_t)^γ · log(p_t)

where p_t = p if y = 1 and p_t = 1 - p otherwise; p is the genuine-face probability output by the network, y is the true label of the input image (1 for a genuine face, 0 for an attack face), and γ > 0 is called the focusing parameter. The modulating factor (1 - p_t)^γ feeds the model's own prediction score back into the loss, so the loss adapts to the difficulty of each sample; γ is set to 2 in this embodiment. In general the attack face videos outnumber the genuine face videos several times over, and replacing the conventional cross-entropy loss with the Focal Loss mitigates the data imbalance that is common in these datasets.
During training, label smoothing is adopted to convert the traditional one-hot labels into soft labels:

y_ls = (1 - α) · y_oh + (α / (K - 1)) · (1 - y_oh)

where y_oh is the conventional one-hot label, y_ls is the soft label after smoothing, K is the number of classes, and α ∈ [0, 1]; label smoothing lowers the value at the correct class to 1 - α and raises each originally-zero entry to α/(K - 1). In this embodiment α is 0.1. Label smoothing encourages the model to choose the correct class, but without excessive confidence. In face representation attack detection the positive and negative samples are very similar in the image domain, so hard labels easily make the network fit too quickly in the early stages of training; introducing label smoothing further improves the generalization of the convolutional network model.
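The following sketch combines the two training ingredients above, Focal Loss with γ = 2 computed against labels smoothed with α = 0.1; the smoothing variant is reconstructed from the prose (correct class lowered to 1 - α, zeros raised to α/(K - 1)) and may differ in detail from the original.

```python
import torch
import torch.nn.functional as F

def focal_loss_ls(logits, targets, gamma=2.0, alpha=0.1, num_classes=2):
    """logits: (N, K) scores; targets: (N,) integer class labels."""
    # Label smoothing: correct class -> 1 - alpha, others -> alpha / (K - 1).
    y_oh = F.one_hot(targets, num_classes).float()
    y_ls = y_oh * (1.0 - alpha) + (1.0 - y_oh) * alpha / (num_classes - 1)
    log_p = F.log_softmax(logits, dim=1)
    p = log_p.exp()
    # Focal modulation (1 - p)^gamma down-weights easy, well-classified samples.
    loss = -(y_ls * (1.0 - p) ** gamma * log_p).sum(dim=1)
    return loss.mean()
```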
The optimization method used to train the network is stochastic gradient descent; the initial learning rate and weight decay are set to 0.001 and 0.00001 respectively. The learning-rate scheduler is cosine annealing with warm restarts, with a minimum learning rate of 0.00004 and a cosine period of 5 epochs, for 30 epochs in total. After training, the final global average pooling layer and the fully connected layer are removed, and the preceding convolutional block groups are used to extract the deep network features of a face image patch; the deep-network feature dimension is 512;
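A hedged sketch of this training setup, reusing `net` and `focal_loss_ls` from the previous sketches; `train_loader` (batches of 48×48 patches with integer labels) is assumed to exist. The prose removes the pooling and FC layers yet reports 512-d features, so the extractor below keeps global average pooling to yield one 512-d vector per patch, an interpretation rather than a literal reading.

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, weight_decay=1e-5)
# Cosine annealing with warm restarts: period 5 epochs, learning-rate floor 4e-5.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, eta_min=4e-5)

for epoch in range(30):
    for patches, labels in train_loader:   # assumed DataLoader of face patches
        optimizer.zero_grad()
        loss = focal_loss_ls(net(patches), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

# Drop the classifier head; each patch then maps to a 512-d deep feature.
feature_extractor = nn.Sequential(*list(net.children())[:-1], nn.Flatten())
```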
s23) respectively standardizing the two features according to the image quality features and the depth network features, then cascading the two features, and performing dimension reduction on the cascading features through PCA to generate final fusion features, wherein the specific description is as follows:
extracting two groups of features of image quality features and depth network features from each face image of a face image database, calculating to obtain an average value and a variance of the two groups of features, normalizing the two groups of features, and directly cascading the normalized corresponding image quality features and depth network features for each face image, wherein the direct cascading length of the two features is 633;
adopting PCA (principal component analysis) to reduce the dimension of the cascade features, wherein the feature after the dimension reduction is called as a fusion feature, and in order to determine a relatively good principal component number of the PCA, the embodiment determines a cutting point by setting an experiment with a larger principal component number; in this embodiment, the dimension of the PCA after dimension reduction is set to 400, then the principal components are sorted from large to small according to the variance, the cumulative value of the variance is calculated, and the dimension of the PCA dimension reduction is redetermined according to the proportion of the sum of the variance to the sum of the total variances. The dimension of PCA dimension reduction is set to 256 dimensions in the embodiment;
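A minimal sketch of this fusion step with scikit-learn, assuming the 121-dim quality features and 512-dim deep features are already extracted as matrices; the helper names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_fusion(quality_feats, deep_feats, out_dim=256):
    """quality_feats: (N, 121), deep_feats: (N, 512) training matrices."""
    sq = StandardScaler().fit(quality_feats)
    sd = StandardScaler().fit(deep_feats)
    concat = np.hstack([sq.transform(quality_feats), sd.transform(deep_feats)])
    pca = PCA(n_components=out_dim).fit(concat)      # 633 -> 256 dims
    return sq, sd, pca

def fuse(sq, sd, pca, quality_feats, deep_feats):
    concat = np.hstack([sq.transform(quality_feats), sd.transform(deep_feats)])
    return pca.transform(concat)                     # final fusion features
```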
s3: initializing dictionary atoms by using fusion characteristics of training samples, and training a dictionary learning classifier based on a low-rank shared dictionary;
the embodiment adopts a dictionary learning method based on a low-rank shared dictionary, and sets a total dictionary D= [ D ] 1 ,D 2 ,D 0 ]∈R m×n Where m represents the dimension of the fused feature, n represents the size of the dictionary, class dictionary D 1 And D 2 The size of the class dictionary is set to 125 corresponding to the real face and the attack face, respectively. Shared dictionary D 0 The size of the dictionary is set to be 20, fusion features are extracted from the training set images, dictionary atoms are initialized by using the fusion features, wherein two class dictionaries randomly extract samples from corresponding classes, a shared dictionary randomly extracts samples from the whole training set, and the atoms of the dictionary are normalized by L2;
the cost function J of the dictionary model is minimized by iteratively optimizing the dictionary D and the coefficients X, and in this embodiment, the iteration number is set to 25, and the cost function J of the dictionary model is defined as follows:
the first term is a discrimination fidelity term, the second term is a discrimination coefficient term based on Fisher criterion, the third term is an L1 regularization term, the fourth term is a nuclear norm, and the effect of the discrimination fidelity term is to realize the recognition capability of a dictionary; the function of the judging coefficient term is to increase similarity and reduce similarity among classes, and the function of the L1 regularization term is to realize sparseness of the coefficient X; the function of the kernel norm is to restrict the size of subspace formed by the shared dictionary, ensure the low rank performance of the shared dictionary, lambda 1 、λ 2 And eta are used to weigh the specific gravity of each item of the cost function, in this embodiment lambda 1 Set to 0.1 lambda 2 Let 0.01, η be 0.0001;
specifically, the definition of the discriminant fidelity term is as follows:
wherein,,a sample representing class c, the sample being a fusion feature, m representing the dimension of the fusion feature, n c Representing the number of samples of class c, c having a value of 1 or 2, D representing the total dictionary, D c Sub-dictionary representing class c, +.>Representing the coefficient of the c-type sample on the i-type dictionary, wherein the value of i is 1 or 2;
specifically, the definition of the discrimination coefficient term is as follows:
wherein M is c Mean value of sparse coefficient of c-th sample, M represents mean value of sparse coefficient of whole training set, M 0 Representing the average value of the coefficients in the shared dictionary,the function of (2) is to force the coefficients of all training samples on the shared dictionary to be close to the average value, so as to prevent the too large contribution gap of the shared dictionary to the samples of different categories from affecting the classification performance;
The cost function of the dictionary model is minimized by alternately optimizing the dictionary and the sparse coefficients; after the set number of optimization iterations the dictionary is saved. From the saved dictionary, two class dictionaries augmented with the shared dictionary are constructed, and the sparse coefficients of a test sample are solved with these class dictionaries held fixed.
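For concreteness, the sketch below evaluates the cost J for given dictionaries and coefficients using the reconstructed terms above (default weights matching this embodiment); the alternating solver that updates D and X in turn is omitted, and the fidelity form is an FDDL-style reconstruction rather than a verbatim copy of the patent's formula.

```python
import numpy as np

def dictionary_cost(Y, X, D1, D2, D0, labels, lam1=0.1, lam2=0.01, eta=1e-4):
    """Y: (m, N) fusion features; X: (k_total, N) codes; labels in {0, 1}."""
    D = np.hstack([D1, D2, D0])
    k1, k2 = D1.shape[1], D2.shape[1]
    blocks = [slice(0, k1), slice(k1, k1 + k2), slice(k1 + k2, None)]
    M = X.mean(axis=1, keepdims=True)            # global coefficient mean
    fidelity, fisher = 0.0, 0.0
    for c, Dc in ((0, D1), (1, D2)):
        Yc, Xc = Y[:, labels == c], X[:, labels == c]
        Mc = Xc.mean(axis=1, keepdims=True)      # class-c coefficient mean
        fidelity += np.linalg.norm(Yc - D @ Xc) ** 2
        fidelity += np.linalg.norm(Yc - Dc @ Xc[blocks[c]]) ** 2
        other = blocks[1 - c]                    # the other class's block
        Do = D2 if c == 0 else D1
        fidelity += np.linalg.norm(Do @ Xc[other]) ** 2
        fisher += np.linalg.norm(Xc - Mc) ** 2 - np.linalg.norm(Mc - M) ** 2
    X0 = X[blocks[2]]                            # codes on the shared dictionary
    M0 = X0.mean(axis=1, keepdims=True)
    fisher += np.linalg.norm(X) ** 2 + np.linalg.norm(X0 - M0) ** 2
    nuclear = np.linalg.norm(D0, ord="nuc")      # low-rank constraint on D0
    return fidelity + lam1 * fisher + lam2 * np.abs(X).sum() + eta * nuclear
```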
S4: decide the class of the test sample based on the magnitude of the fusion-feature reconstruction residuals.
The sparse coefficients of the test sample are obtained with elastic-net regularization, the fusion features of the test sample are reconstructed from the sparse coefficients, and the class with the smallest reconstruction residual is taken as the predicted class of the test sample.
The dictionary D obtained in this embodiment is used to construct two sub-dictionaries, D̄_1 = [D_1, D_0] and D̄_2 = [D_2, D_0], i.e., the class dictionaries augmented with the shared dictionary saved in step S3. When solving the sparse coefficients of a test sample y, this embodiment adopts elastic-net regularization, and the model's optimization problem is:

x̂ = argmin_x ||y - D̄_c x||_2^2 + λ_a ||x||_1 + λ_b ||x||_2^2

where D̄_c denotes a class dictionary augmented with the shared dictionary and x denotes the sparse coefficients of the test sample y. The second term is the L1 regularization term and the third term is the L2 regularization term, with λ_a and λ_b balancing their weights; in this embodiment λ_a is set to 0.01 and λ_b to 0.01. L2 regularization tends to make the solution x smoother than L1 regularization does, so linearly combining the two regularizers yields improved sparse coding.

After the sparse coefficients of the test sample y are obtained, y is reconstructed from the coefficients corresponding to each class's sub-dictionary, and the class with the smallest reconstruction residual is taken as the prediction, as in the following formula:

ĉ = argmin_c ||y - D̄_c x̂_c||_2
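A hedged sketch of this test-time step using scikit-learn's ElasticNet, whose (alpha, l1_ratio) parameterization is mapped from λ_a and λ_b under its 1/(2n) scaling convention; the mapping and the class labels (0 = genuine, 1 = attack) are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def classify(y, D1, D2, D0, lam_a=0.01, lam_b=0.01):
    """y: (m,) fusion feature of a test sample; returns predicted class index."""
    n = len(y)
    # Map ||y - Dx||^2 + lam_a*||x||_1 + lam_b*||x||_2^2 onto sklearn's
    # (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + alpha*(1-l1_ratio)/2*||w||_2^2.
    alpha = (lam_a + 2 * lam_b) / (2 * n)
    l1_ratio = lam_a / (lam_a + 2 * lam_b)
    residuals = []
    for Dc in (D1, D2):
        Dbar = np.hstack([Dc, D0])   # class dictionary with shared dictionary
        enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                          fit_intercept=False, max_iter=5000)
        enet.fit(Dbar, y)
        residuals.append(np.linalg.norm(y - Dbar @ enet.coef_))
    return int(np.argmin(residuals))  # smallest reconstruction residual wins
```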
as shown in Table 1 below, the performance of this example was compared with that of a single feature on three data sets of REPLAY-ATTACK, CASIA-FASD, MSU-MFSD, and the evaluation index was HTER (half total error rate).
Table 1 comparison of performance using different features on three published data sets
REPLAY-ATTACK | CASIA-FASD | MSU-MFSD | |
Image quality features | 12.85% | 13.99% | 13.71% |
Deep network features | 2.37% | 4.81% | 11.13% |
Fusion features | 1.92% | 4.41% | 9.39% |
Table 1 shows that the deep network features do not automatically capture all the discriminative factors present in the hand-designed features; by fusing the image quality features with the deep network features, the method exploits the image information more fully and effectively strengthens the discriminative power of the features.
As shown in Table 2 below, the performance of this embodiment is compared with other methods in the CASIA-FASD / REPLAY-ATTACK cross-dataset scenario; the evaluation metric is again HTER.

Table 2. Comparison of performance using different methods in the cross-dataset scenario

Table 2 shows that, compared with hand-designed methods such as LBP and with a single CNN, the method generalizes better in the cross-dataset scenario.
This embodiment also provides a face representation attack detection system based on fusion features and dictionary learning, comprising: a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary-learning classifier training module, and a test-sample class decision module;
In this embodiment, the preliminary fusion feature extraction module comprises an image quality feature extraction module and a deep network feature extraction module;
In this embodiment, the face image database construction module is used to perform face detection and cropping on the input video to build a face image database;
In this embodiment, the preliminary fusion feature extraction module is used to extract the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
In this embodiment, the image quality feature extraction module is used to extract image quality features from the whole face image according to the distortion sources of secondary imaging;
In this embodiment, the deep network feature extraction module is used to construct a deep convolutional network model and extract the deep network features of face image patches through the network;
In this embodiment, the final fusion feature generation module is used to normalize the image quality features and the deep network features, concatenate them, and reduce the concatenated features with PCA to generate the final fusion features;
In this embodiment, the dictionary-learning classifier training module is used to initialize dictionary atoms based on the fusion features and train a dictionary-learning classifier based on a low-rank shared dictionary;
In this embodiment, the test-sample class decision module is used to decide the class of a test sample based on the magnitude of the fusion-feature reconstruction residuals.
As the above description of the technical scheme shows, by combining hand-designed image quality features with deep network features, the invention makes full use of the information a single frame provides and strengthens the discriminative power of the features. The structure and training procedure of the convolutional network are tailored to the characteristics of face representation attack datasets: the Focal Loss function addresses data imbalance, and label smoothing further improves the generalization of the deep network. In addition, a low-rank shared dictionary strips out the commonality of genuine and fake samples, and elastic-net regularization improves the sparse coding of test samples, further raising the accuracy of the dictionary-learning classifier. The method generalizes well and is suitable for two-dimensional face representation attack detection in real scenarios.
The above examples are preferred embodiments of the present invention, but the embodiments are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.
Claims (6)
1. A face representation attack detection method based on fusion features and dictionary learning, characterized by comprising the following steps:
performing face detection and cropping on the input video to build a face image database;
extracting the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
extracting image quality features from the whole face image according to the distortion sources of secondary imaging of the face image;
constructing a deep convolutional network model and extracting deep network features of face image patches through the deep convolutional network;
normalizing the image quality features and the deep network features, concatenating them, and reducing the concatenated features with PCA to generate the final fusion features;
initializing dictionary atoms based on the fusion features, and training a dictionary-learning classifier based on a low-rank shared dictionary;
wherein initializing the dictionary atoms based on the fusion features and training the dictionary-learning classifier based on the low-rank shared dictionary comprises: minimizing the cost function of the dictionary model by alternately optimizing the dictionary and the sparse coefficients, and saving the dictionary after a set number of optimization iterations;
the cost function of the dictionary model is expressed as:

J(D, X) = Σ_{c=1}^{2} r(Y_c, D, X_c) + λ_1 f(X) + λ_2 ||X||_1 + η ||D_0||_*

wherein the first term is the discriminative fidelity term, the second term is the discriminative coefficient term based on the Fisher criterion, the third term is the L1 regularization term, and the fourth term is the nuclear norm; the discriminative fidelity term realizes the discriminative power of the dictionary; the discriminative coefficient term increases within-class similarity and reduces between-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary, guaranteeing its low rank; λ_1, λ_2 and η weigh the terms of the cost function; and D_0 denotes the shared dictionary;

the discriminative fidelity term is defined as:

r(Y_c, D, X_c) = ||Y_c - D X_c||_F^2 + ||Y_c - D_c X_c^c||_F^2 + Σ_{i≠c} ||D_i X_c^i||_F^2

wherein Y_c ∈ R^{m×n_c} denotes the samples of class c, each sample being a fusion feature, m denotes the fusion-feature dimension, n_c denotes the number of class-c samples, D denotes the total dictionary, D_c denotes the sub-dictionary of class c, and X_c^i denotes the coefficients of the class-c samples on the class-i dictionary;

the discriminative coefficient term is defined as:

f(X) = Σ_{c=1}^{2} ( ||X_c - M_c||_F^2 - ||M_c - M||_F^2 ) + ||X||_F^2 + ||X^0 - M_0||_F^2

wherein M_c denotes the mean of the sparse coefficients of the class-c samples, M denotes the mean of the sparse coefficients over the whole training set, M_0 denotes the mean of the coefficients on the shared dictionary, and the term ||X^0 - M_0||_F^2 (with X^0 the coefficients on the shared dictionary) forces the coefficients of all training samples on the shared dictionary to stay close to their average;
and deciding the class of a test sample based on the magnitude of the fusion-feature reconstruction residuals.
2. The face representation attack detection method based on fusion features and dictionary learning according to claim 1, wherein extracting the image quality features of the whole face image according to the distortion sources of secondary imaging comprises: extracting specular reflection features, blur features, color moment features and color diversity features, and concatenating the extracted features to obtain the image quality features.
3. The face representation attack detection method based on fusion features and dictionary learning according to claim 1, wherein extracting the deep network features of face image patches through the deep convolutional network comprises:
generating face image patches by randomly scaling and randomly cropping the complete face image; constructing a lightweight deep convolutional network model that takes the patches as input; training the model with the Focal Loss function to extract the deep network features; and converting the one-hot labels into soft labels by label smoothing, thereby improving the training of the deep convolutional network.
4. The face representation attack detection method based on fusion features and dictionary learning according to claim 1, further comprising a step of solving the sparse coefficients of a test sample, specifically: constructing two class dictionaries augmented with the shared dictionary from the saved dictionary, and solving the sparse coefficients of the test sample with the class dictionaries fixed.
5. The face representation attack detection method based on fusion features and dictionary learning according to claim 1, wherein deciding the class of the test sample based on the magnitude of the fusion-feature reconstruction residuals comprises:
obtaining the sparse coefficients of the test sample by elastic-net regularization, reconstructing the fusion features of the test sample from the sparse coefficients, and taking the class with the smallest reconstruction residual as the predicted class of the test sample.
6. A face representation attack detection system based on fusion features and dictionary learning, characterized by comprising: a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary-learning classifier training module, and a test-sample class decision module;
the preliminary fusion feature extraction module comprises an image quality feature extraction module and a deep network feature extraction module;
the face image database construction module is used to perform face detection and cropping on the input video to build a face image database;
the preliminary fusion feature extraction module is used to extract the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
the image quality feature extraction module is used to extract image quality features from the whole face image according to the distortion sources of secondary imaging;
the deep network feature extraction module is used to construct a deep convolutional network model and extract the deep network features of face image patches through the deep convolutional network;
the final fusion feature generation module is used to normalize the image quality features and the deep network features, concatenate them, and reduce the concatenated features with PCA to generate the final fusion features;
the dictionary-learning classifier training module is used to initialize dictionary atoms based on the fusion features and train a dictionary-learning classifier based on a low-rank shared dictionary;
wherein initializing the dictionary atoms based on the fusion features and training the dictionary-learning classifier based on the low-rank shared dictionary comprises: minimizing the cost function of the dictionary model by alternately optimizing the dictionary and the sparse coefficients, and saving the dictionary after a set number of optimization iterations;
the cost function of the dictionary model is expressed as:

J(D, X) = Σ_{c=1}^{2} r(Y_c, D, X_c) + λ_1 f(X) + λ_2 ||X||_1 + η ||D_0||_*

wherein the first term is the discriminative fidelity term, the second term is the discriminative coefficient term based on the Fisher criterion, the third term is the L1 regularization term, and the fourth term is the nuclear norm; the discriminative fidelity term realizes the discriminative power of the dictionary; the discriminative coefficient term increases within-class similarity and reduces between-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary, guaranteeing its low rank; λ_1, λ_2 and η weigh the terms of the cost function; and D_0 denotes the shared dictionary;

the discriminative fidelity term is defined as:

r(Y_c, D, X_c) = ||Y_c - D X_c||_F^2 + ||Y_c - D_c X_c^c||_F^2 + Σ_{i≠c} ||D_i X_c^i||_F^2

wherein Y_c ∈ R^{m×n_c} denotes the samples of class c, each sample being a fusion feature, m denotes the fusion-feature dimension, n_c denotes the number of class-c samples, D denotes the total dictionary, D_c denotes the sub-dictionary of class c, and X_c^i denotes the coefficients of the class-c samples on the class-i dictionary;

the discriminative coefficient term is defined as:

f(X) = Σ_{c=1}^{2} ( ||X_c - M_c||_F^2 - ||M_c - M||_F^2 ) + ||X||_F^2 + ||X^0 - M_0||_F^2

wherein M_c denotes the mean of the sparse coefficients of the class-c samples, M denotes the mean of the sparse coefficients over the whole training set, M_0 denotes the mean of the coefficients on the shared dictionary, and the term ||X^0 - M_0||_F^2 (with X^0 the coefficients on the shared dictionary) forces the coefficients of all training samples on the shared dictionary to stay close to their average;
the test-sample class decision module is used to decide the class of a test sample based on the magnitude of the fusion-feature reconstruction residuals.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010696193.4A | 2020-07-20 | 2020-07-20 | Face representation attack detection method and system based on fusion feature and dictionary learning |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010696193.4A | 2020-07-20 | 2020-07-20 | Face representation attack detection method and system based on fusion feature and dictionary learning |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111967331A | 2020-11-20 |
| CN111967331B | 2023-07-21 |
Family

ID: 73362137

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010696193.4A | Face representation attack detection method and system based on fusion feature and dictionary learning | 2020-07-20 | 2020-07-20 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111967331B |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505722B (en) * | 2021-07-23 | 2024-01-02 | 中山大学 | Living body detection method, system and device based on multi-scale feature fusion |
CN113449707B (en) * | 2021-08-31 | 2021-11-30 | 杭州魔点科技有限公司 | Living body detection method, electronic apparatus, and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281845A (en) * | 2014-10-29 | 2015-01-14 | 中国科学院自动化研究所 | Face recognition method based on rotation invariant dictionary learning model |
CN105844223A (en) * | 2016-03-18 | 2016-08-10 | 常州大学 | Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning |
CN107194873A (en) * | 2017-05-11 | 2017-09-22 | 南京邮电大学 | Low-rank nuclear norm canonical facial image ultra-resolution method based on coupling dictionary learning |
CN107832747A (en) * | 2017-12-05 | 2018-03-23 | 广东技术师范学院 | A kind of face identification method based on low-rank dictionary learning algorithm |
CN108985177A (en) * | 2018-06-21 | 2018-12-11 | 南京师范大学 | A kind of facial image classification method of the quick low-rank dictionary learning of combination sparse constraint |
CN109766813A (en) * | 2018-12-31 | 2019-05-17 | 陕西师范大学 | Dictionary learning face identification method based on symmetrical face exptended sample |
CN110428392A (en) * | 2019-09-10 | 2019-11-08 | 哈尔滨理工大学 | A kind of Method of Medical Image Fusion based on dictionary learning and low-rank representation |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780342A (en) * | 2016-12-28 | 2017-05-31 | 深圳市华星光电技术有限公司 | Single-frame image super-resolution reconstruction method and device based on the reconstruct of sparse domain |
Also Published As

| Publication number | Publication date |
|---|---|
| CN111967331A | 2020-11-20 |
Similar Documents

| Publication | Title |
|---|---|
| CN111460931B | Face spoofing detection method and system based on color channel difference image characteristics |
| Nishiyama et al. | Facial deblur inference using subspace analysis for recognition of blurred faces |
| CN111160313B | Face representation attack detection method based on LBP-VAE anomaly detection model |
| CN1975759A | Human face identifying method based on structural principal element analysis |
| CN111709313B | Pedestrian re-identification method based on local and channel combination characteristics |
| CN113011357A | Depth fake face video positioning method based on space-time fusion |
| CN110472089B | Infrared and visible light image retrieval method based on countermeasure generation network |
| CN111967331B | Face representation attack detection method and system based on fusion feature and dictionary learning |
| CN105956570B | Smiling face recognition method based on lip features and deep learning |
| Ding et al. | Noise-resistant network: a deep-learning method for face recognition under noise |
| CN112381987A | Intelligent entrance guard epidemic prevention system based on face recognition |
| CN114913588A | Face image restoration and recognition method applied to complex scenes |
| CN112818774A | Living body detection method and device |
| CN114764939A | Heterogeneous face recognition method and system based on identity-attribute decoupling |
| Zhang et al. | Spatial–temporal gray-level co-occurrence aware CNN for SAR image change detection |
| Ma | Improving SAR target recognition performance using multiple preprocessing techniques |
| Huang et al. | Multi-Teacher Single-Student Visual Transformer with Multi-Level Attention for Face Spoofing Detection |
| CN117095471B | Face counterfeiting tracing method based on multi-scale features |
| Giap et al. | Adaptive multiple layer retinex-enabled color face enhancement for deep learning-based recognition |
| JP3962517B2 | Face detection method and apparatus, and computer-readable medium |
| Li et al. | A new QR code recognition method using deblurring and modified local adaptive thresholding techniques |
| Krupiński et al. | Binarization of degraded document images with generalized Gaussian distribution |
| Karamizadeh et al. | Skin classification for adult image recognition based on combination of Gaussian and weight-KNN |
| Shu et al. | Face anti-spoofing based on weighted neighborhood pixel difference pattern |
| CN111754459B | Dyed fake image detection method based on statistical depth features and electronic device |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |