CN111967331A - Face representation attack detection method and system based on fusion feature and dictionary learning - Google Patents

Face representation attack detection method and system based on fusion feature and dictionary learning

Info

Publication number
CN111967331A
CN111967331A (application CN202010696193.4A)
Authority
CN
China
Prior art keywords
dictionary
fusion
face
face image
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010696193.4A
Other languages
Chinese (zh)
Other versions
CN111967331B (en)
Inventor
傅予力
黄汉业
向友君
许晓燕
吕玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010696193.4A priority Critical patent/CN111967331B/en
Publication of CN111967331A publication Critical patent/CN111967331A/en
Application granted granted Critical
Publication of CN111967331B publication Critical patent/CN111967331B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face representation attack detection method and system based on fusion features and dictionary learning. The method comprises the following steps: extracting image quality features of the complete face image according to the distortion sources of secondary imaging; constructing a deep convolutional network model and extracting deep network features of face image blocks through the network; concatenating the two kinds of features and reducing their dimensionality through PCA to generate the final fusion features; initializing dictionary atoms with the fusion features and training a dictionary learning classifier based on a low-rank shared dictionary; and judging the category of a test sample by the size of the fusion-feature reconstruction residual. By combining image quality features with deep network features for the first time, the method makes better use of the information provided by a single-frame image and effectively strengthens the discriminative power of the extracted features; by stripping out the patterns shared by genuine and attack samples with the low-rank shared dictionary for the first time, it improves the accuracy of attack detection and generalizes well.

Description

Face representation attack detection method and system based on fusion feature and dictionary learning
Technical Field
The invention relates to the technical field of image processing, in particular to a face representation attack detection method and system based on fusion features and dictionary learning.
Background
Nowadays, face recognition technology is widely applied in security, payment, entertainment and other scenarios. However, face recognition systems carry certain safety hazards. With the development of social networks and the popularity of smartphones, more and more people share personal photos and videos online. Criminals can attack a face recognition system by impersonating others with these media, or deliberately confuse their own identity, in order to infringe on the property of others, escape legal sanctions, and so on. Passing a face recognition system with pictures, videos and the like of a legitimate user in order to borrow that user's identity is called a face representation attack, and the technology for detecting such attacks is called face liveness detection.
In face liveness detection, face images fall into two categories. One category consists of images obtained by directly photographing a legitimate user. In the other category, the photographed subject may be a photograph, video, wax figure, etc. that is highly similar to the legitimate user's face. Images of the latter category are called face representation attack images (attack faces for short) and are the objects that liveness detection technology must detect.
The core of a face liveness detection algorithm is to extract the features of a face image that are most discriminative of a live subject. Traditional detection techniques rely on hand-crafted features such as Local Binary Patterns (LBP) and Local Phase Quantization (LPQ); as device imaging quality keeps improving, hand-crafting features capable of detecting attack faces has become very difficult. In recent years, automatic feature extraction with convolutional neural networks has become mainstream. Deep convolutional neural networks excel at image classification, but are limited by the scale of liveness detection datasets: a deep network supervised only by class labels tends to memorize whatever characteristics exist in the training set, which easily causes overfitting and poor generalization.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a face representation attack detection method and system based on fusion features and dictionary learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a face representation attack detection method based on fusion features and dictionary learning, which comprises the following steps:
carrying out face detection and cropping on an input video to construct a face image database;
extracting fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
extracting the image quality features of the complete face image according to the distortion sources of the secondary imaging of the face image;
constructing a deep convolutional network model and extracting the deep network features of face image blocks through the network;
standardizing the image quality features and the deep network features respectively, concatenating them, and reducing the dimensionality of the concatenated features through PCA to generate the final fusion features;
initializing dictionary atoms based on the fusion features and training a dictionary learning classifier based on a low-rank shared dictionary;
and judging the category of a test sample based on the size of the fusion-feature reconstruction residual.
As a preferred technical solution, the image quality features of the complete face image are extracted according to the distortion sources of the secondary imaging of the face image; the specific steps comprise: extracting specular reflection features, blur features, color moment features and color diversity features, and concatenating the extracted features to obtain the image quality features.
As a preferred technical scheme, the extracting of the depth network features of the face image block through the depth convolution network specifically comprises the following steps:
the method comprises the steps of generating a face image block by randomly zooming and randomly cutting a complete face image, constructing a lightweight depth convolution network model, taking the face image block as the input of the convolution network model, training the convolution network model by adopting a Focal local Loss function to extract the depth network characteristics of the face image block, converting a one-hot coded label into a soft label by adopting a label smoothing method, and optimizing the training process of a depth convolution neural network.
As a preferred technical solution, initializing the dictionary atoms based on the fusion features and training the dictionary learning classifier based on the low-rank shared dictionary specifically comprise: alternately optimizing the dictionary and the sparse coefficients to minimize the cost function of the dictionary model, and saving the dictionary after a set number of optimization iterations.
As a preferred technical solution, the cost function of the dictionary model is expressed as:

$$J_{(D,X)} = r(Y,D,X) + \lambda_2 f(X) + \lambda_1 \lVert X \rVert_1 + \eta \lVert D_0 \rVert_*$$

wherein the first term is the discriminative fidelity term, the second term is the discriminant coefficient term based on the Fisher criterion, the third term is the L1 regularization term and the fourth term is the nuclear norm. The discriminative fidelity term realizes the discriminative power of the dictionary; the discriminant coefficient term increases intra-class similarity and reduces inter-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary, guaranteeing its low rank; and λ1, λ2 and η trade off the weights of the terms of the cost function.

The discriminative fidelity term is defined as:

$$r(Y,D,X) = \sum_{c=1}^{2} \Big( \lVert Y_c - D X_c \rVert_F^2 + \lVert Y_c - D_c X_c^c - D_0 X_c^0 \rVert_F^2 + \sum_{i \neq c} \lVert D_i X_c^i \rVert_F^2 \Big)$$

wherein $Y_c \in \mathbb{R}^{m \times n_c}$ represents the samples of class c (the samples being fusion features), m represents the dimension of the fusion features, $n_c$ represents the number of class-c samples, D represents the total dictionary, $D_c$ represents the sub-dictionary of class c, and $X_c^i$ represents the coefficients of the class-c samples on the class-i dictionary.

The discriminant coefficient term is defined as:

$$f(X) = \sum_{c=1}^{2} \big( \lVert X_c - M_c \rVert_F^2 - \lVert M_c - M \rVert_F^2 \big) + \lVert X^0 - M^0 \rVert_F^2$$

wherein $M_c$ represents the mean of the sparse coefficients of the class-c samples, M represents the mean of the sparse coefficients of the whole training set, and $M^0$ represents the mean of the coefficients on the shared dictionary; the term $\lVert X^0 - M^0 \rVert_F^2$ forces the coefficients of all training samples on the shared dictionary to be close to the average.
As a preferred technical solution, the method further comprises a step of solving the sparse coefficients of the test sample, specifically: constructing two class dictionaries augmented with the shared dictionary from the saved dictionary, and solving the sparse coefficients of the test sample with the class dictionaries fixed.
As a preferred technical solution, judging the category of the test sample based on the size of the fusion-feature reconstruction residual specifically includes:
solving the sparse coefficients of the test sample based on elastic net regularization, reconstructing the fusion features of the test sample from the sparse coefficients, and taking the class with the smallest reconstruction residual as the predicted class of the test sample.
The invention provides a face representation attack detection system based on fusion features and dictionary learning, which comprises: a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a test sample category judgment module;
the preliminary fusion feature extraction module comprises an image quality feature extraction module and a deep network feature extraction module;
the face image database construction module is used for carrying out face detection and cropping on an input video to construct a face image database;
the preliminary fusion feature extraction module is used for extracting the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
the image quality feature extraction module is used for extracting the image quality features of the complete face image according to the distortion sources of the secondary imaging of the face image;
the deep network feature extraction module is used for constructing a deep convolutional network model and extracting the deep network features of face image blocks through the network;
the final fusion feature generation module is used for standardizing the image quality features and the deep network features respectively, concatenating them, and reducing the dimensionality of the concatenated features through PCA to generate the final fusion features;
the dictionary learning classifier training module is used for initializing dictionary atoms based on the fusion features and training a dictionary learning classifier based on a low-rank shared dictionary;
the test sample category judgment module is used for judging the category of a test sample based on the size of the fusion-feature reconstruction residual.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) By fusing hand-crafted image quality features with deep network features, the invention makes full use of the information provided by a single-frame image and enhances the discriminative power of the features.
(2) The method uses the low-rank shared dictionary to strip out the commonalities of genuine and attack samples, ensuring that the class dictionaries better represent the differences between real samples and attack samples; it avoids the overfitting that easily occurs with a fully connected classification layer, generalizes well, and further improves the accuracy of attack detection.
(3) The method replaces the traditional L1 regularization with elastic net regularization, alleviating the tendency of L1 regularization to ignore certain features; this helps preserve detailed features and strengthens the discriminability of the sparse coefficients.
(4) The invention uses randomly cropped image blocks as the input of the convolutional neural network, so that the network focuses on learning the effective information related to the spoofing pattern; this enlarges the dataset in an efficient way and effectively mitigates the performance degradation caused by small data scale.
Drawings
Fig. 1 is a schematic flow chart of a face representation attack detection method based on fusion features and dictionary learning according to the present embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Examples
As shown in fig. 1, the present embodiment provides a face representation attack detection method based on fusion feature and dictionary learning, including the following steps:
s1: carrying out face detection and cutting on an input video to construct a face image database;
In this embodiment, the public face representation attack video datasets REPLAY-ATTACK, CASIA-FASD and MSU-MFSD are selected. All three datasets contain real-face videos and attack-face videos and are divided into training and test sets. The first 30 frames of each video are extracted, a cascade classifier based on Haar features detects the face position in each frame, and the face image is cropped out;
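As an illustration, this step could look like the following minimal sketch using OpenCV's bundled Haar cascade; the cascade file name and the one-face-per-frame policy are assumptions, not taken from the patent:

```python
# A minimal sketch of the face detection and cropping step, assuming OpenCV.
import cv2

def crop_faces(video_path, n_frames=30):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    crops = []
    for _ in range(n_frames):  # first 30 frames of each video
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            x, y, w, h = faces[0]          # keep one face per frame
            crops.append(frame[y:y + h, x:x + w])
    cap.release()
    return crops
```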
s2: extracting fusion characteristics of face images in a face image database, wherein the fusion characteristics comprise image quality characteristics and depth network characteristics, and the method comprises the following specific steps:
s21) extracting the image quality characteristics of the complete face image according to the distortion source of the face image secondary imaging,
Sub-features are extracted from the face images of the face image database in terms of blur, specular reflection, color distortion and the like, and the final image quality feature vector of a sample is formed by concatenating all the sub-feature vectors. The details are as follows:
Under the same imaging environment, a genuinely presented face is a once-imaged picture, while a face representation attack is a twice-imaged picture. Analyzing the sources of distortion introduced in the secondary imaging process helps strengthen the discriminability of the extracted features, so image quality features are extracted from four aspects: specular reflection, blur, color moment distortion and color diversity distortion;
Specular reflection feature extraction: the chromaticity at highlight positions of the input face image is iteratively replaced with the maximum diffuse chromaticity of neighboring pixels, the specular reflection component of the image is then extracted, and the percentage, mean and variance of this component form the specular reflection features;
Blur feature extraction: the blur features of the image are extracted with a re-blur based method. The input image is converted to a grayscale image, which is low-pass filtered with a Gaussian filter with a 3 × 3 convolution kernel; the filtered image is called the blurred image. Sharpness is measured by comparing the variation between adjacent pixels: absolute difference images are computed in the horizontal and vertical directions, and the gray values of all pixels of each difference image are summed, giving the horizontal difference sum and the vertical difference sum. The ratios of these sums before and after filtering form the blur feature.
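A minimal sketch of this re-blur measure, assuming OpenCV and NumPy; the Gaussian sigma (derived from the kernel size) and the epsilon guard are assumptions, since the patent only fixes the kernel size:

```python
import cv2
import numpy as np

def blur_features(face_bgr):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    blurred = cv2.GaussianBlur(gray, (3, 3), 0)  # sigma derived from kernel size

    def diff_sums(img):
        dh = np.abs(np.diff(img, axis=1)).sum()  # horizontal difference sum
        dv = np.abs(np.diff(img, axis=0)).sum()  # vertical difference sum
        return dh, dv

    h0, v0 = diff_sums(gray)      # before filtering
    h1, v1 = diff_sums(blurred)   # after filtering
    # Sharp originals lose more adjacent-pixel variation under re-blurring
    # than already blurred (recaptured) images do.
    return np.array([h1 / (h0 + 1e-8), v1 / (v0 + 1e-8)])
```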
Color moment feature extraction: the input face image is first converted from RGB space to HSV space, whose channels are relatively independent; then the mean, variance and skewness of each channel are computed, together with the percentage of pixels falling in the minimum and maximum histogram bins of each channel, and these 5 values per channel form the color moment features.
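A minimal sketch of these color moment features; the number of histogram bins is an assumption, since the patent does not specify it:

```python
import cv2
import numpy as np
from scipy.stats import skew

def color_moment_features(face_bgr, bins=16):
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    feats = []
    for channel in cv2.split(hsv):
        v = channel.astype(np.float32).ravel()
        hist, _ = np.histogram(v, bins=bins)
        feats += [v.mean(), v.var(), skew(v),
                  hist[0] / v.size,    # share of pixels in the minimum bin
                  hist[-1] / v.size]   # share of pixels in the maximum bin
    return np.array(feats)             # 3 channels x 5 values
```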
Color diversity feature extraction: the R, G and B channels of the input picture are color-quantized, and the color diversity feature is built from the histogram bin counts of the 100 most frequently occurring colors and the number of distinct colors appearing in the face image.
The four groups of extracted features are concatenated and called the image quality feature; its dimension is 121.
S22) constructing a deep convolutional network model and extracting the deep network features of the face image blocks through the network, specifically:
The face images obtained in step S1 are scaled to 112 × 112 and then divided into blocks by random scaling and random cropping, with the block size set to 48 × 48. The face image blocks serve as the input of the convolutional neural network, which is trained to extract features. Using local image blocks enlarges the training set and focuses the convolutional network on learning the effective information related to the spoofing attack pattern, while keeping the original input resolution and preventing the loss of discriminative information;
the scale of the existing public data set is small, and a convolution network model can adopt a model with small complexity. This embodiment employs the ResNet18 model pre-trained on ImageNet datasets. Meanwhile, the convolution kernel size of the first convolutional layer of the ResNet18 is reduced to 3 × 3, the step size is reduced to 1, the next layer of the last convolutional layer is a global pooling layer, the global pooling layer averages each feature image output by the convolutional layer, and then connects the averages into a one-dimensional vector, the next layer of the global pooling layer is a full-link layer, the full-link layer takes the one-dimensional vector output by the global pooling layer as input, the output dimension is the corresponding category number, and the category number in this embodiment is set to 2, and respectively corresponds to a real face and an attack face.
The loss function for training the convolutional network is the Focal Loss, defined as:

$$FL(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$$

wherein

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & y = 0 \end{cases}$$

p is the probability of the real-face class output by the network, y denotes the true label of the input image (1 for a real face, 0 for an attack face), and γ > 0 is called the focusing parameter. The modulating factor $(1 - p_t)^{\gamma}$ folds the model's prediction score into the loss, so the loss adapts to the difficulty of each sample; γ is set to 2 here. Since there are several times more attack-face videos than real-face videos, adopting the Focal Loss instead of the conventional cross-entropy loss also addresses the data imbalance common in these datasets.
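As an illustrative sketch (not the authors' code), the Focal Loss above could be implemented in PyTorch as follows; averaging over the batch is an assumption:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # logits: (N, 2) class scores; targets: (N,) long, 1 = real, 0 = attack.
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()  # modulating factor applied
```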
During convolutional neural network training, label smoothing converts the traditional one-hot coded label into a soft label:

$$y_{ls} = (1 - \alpha)\, y_{oh} + \frac{\alpha}{K}$$

wherein $y_{oh}$ denotes the conventional one-hot coded label and $y_{ls}$ denotes the smoothed soft label. Label smoothing scales the value at the correct class by (1 - α), and entries that were 0 become α/K, where K is the number of classes and α ∈ [0, 1]; α is 0.1 in this embodiment. By moderately lowering the value of the correct label, label smoothing encourages the model to choose the correct class without excessive confidence. For face representation attack detection, positive and negative samples are very similar in the image domain, so with hard labels the network tends to fit quickly in the early stage of training; introducing label smoothing further improves the generalization of the convolutional network model.
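A minimal sketch of this soft-label conversion, assuming K = 2 and α = 0.1:

```python
import torch
import torch.nn.functional as F

def smooth_labels(targets, num_classes=2, alpha=0.1):
    one_hot = F.one_hot(targets, num_classes).float()
    # y_ls = (1 - alpha) * y_oh + alpha / K: the correct-class entry becomes
    # 1 - alpha + alpha/K, and entries that were 0 become alpha/K.
    return (1.0 - alpha) * one_hot + alpha / num_classes
```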
The optimizer used for training the neural network is stochastic gradient descent, with the initial learning rate and weight decay set to 0.001 and 0.00001 respectively. The learning-rate scheduler is a cosine annealing scheduler with warm restarts: the minimum learning rate is set to 0.00004, the cosine period is 5 epochs, and training runs for 30 epochs in total. After training, the final global average pooling layer and fully connected layer are removed, and the preceding convolutional block groups are used to extract the deep network features of the face image blocks; the deep network feature has 512 dimensions;
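The training configuration might be sketched as follows, with `build_model` the hypothetical helper from the sketch above. Since averaging the 512 output feature maps yields the stated 512-dimensional feature, this sketch drops only the fully connected head and keeps the pooling, which is an interpretation of the text:

```python
import torch

model = build_model()  # hypothetical helper from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, weight_decay=0.00001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, eta_min=0.00004)  # restart every 5 epochs, 30 epochs total

# After training: 512-dimensional deep network feature extractor.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])
```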
s23) respectively standardizing and cascading the two features according to the image quality feature and the depth network feature, and performing dimensionality reduction on the cascaded features through PCA to generate final fusion features, wherein the details are as follows:
extracting two groups of features of image quality features and depth network features from each face image of a face image database, calculating to obtain the average value and variance of the two groups of features, standardizing the two groups of features, and directly cascading the image quality features and the depth network features which correspond to each standardized face image, wherein the direct cascading length of the two features is 633;
adopting PCA (principal component analysis) to reduce the dimension of the cascade feature, wherein the feature after dimension reduction is called as a fusion feature, and in order to determine a relatively good PCA principal component number, the embodiment firstly determines a cutting point by setting an experiment with a larger principal component number; in this embodiment, the dimensionality of the PCA after dimension reduction is set to 400, then the principal components are sorted from large to small according to the variance, the cumulative value of the variance is calculated, and the dimension of the PCA after dimension reduction is determined again according to the proportion of the cumulative sum of the variances to the sum of the total variances. The dimension of PCA dimensionality reduction is selected to be 256 dimensions in the embodiment;
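A minimal sketch of the standardization, concatenation and PCA step, assuming scikit-learn; fitting the scalers and PCA on the training set only is assumed:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def fit_fusion(quality_feats, deep_feats, n_components=256):
    # quality_feats: (N, 121), deep_feats: (N, 512) -> concatenated: (N, 633)
    sq = StandardScaler().fit(quality_feats)
    sd = StandardScaler().fit(deep_feats)
    fused = np.hstack([sq.transform(quality_feats), sd.transform(deep_feats)])
    pca = PCA(n_components=n_components).fit(fused)  # 256 kept via variance ratio
    return pca.transform(fused), (sq, sd, pca)
```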
s3: initializing dictionary atoms by utilizing the fusion characteristics of training samples, and training a dictionary learning classifier based on a low-rank shared dictionary;
In this embodiment, a dictionary learning method based on a low-rank shared dictionary is adopted. The total dictionary is set as $D = [D_1, D_2, D_0] \in \mathbb{R}^{m \times n}$, where m represents the dimension of the fusion feature and n the size of the dictionary. The class dictionaries $D_1$ and $D_2$ correspond to real faces and attack faces respectively, and the size of each class dictionary is set to 125; the size of the shared dictionary $D_0$ is set to 20. Fusion features are extracted from the training set images and used to initialize the dictionary atoms: the two class dictionaries randomly draw samples from their corresponding classes, the shared dictionary randomly draws samples from the whole training set, and the atoms of the dictionary are L2-normalized;
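Initialization might look like the following minimal sketch; the random sampling details are assumptions:

```python
import numpy as np

def init_dictionary(X_real, X_attack, n_class_atoms=125, n_shared=20, seed=0):
    rng = np.random.default_rng(seed)

    def sample_atoms(X, k):
        idx = rng.choice(X.shape[0], size=k, replace=False)
        A = X[idx].T                         # (m, k): one atom per column
        return A / np.linalg.norm(A, axis=0, keepdims=True)  # L2-normalize

    D1 = sample_atoms(X_real, n_class_atoms)    # real-face class dictionary
    D2 = sample_atoms(X_attack, n_class_atoms)  # attack-face class dictionary
    D0 = sample_atoms(np.vstack([X_real, X_attack]), n_shared)  # shared dictionary
    return np.hstack([D1, D2, D0])              # total dictionary D = [D1, D2, D0]
```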
The cost function J of the dictionary model is minimized by iteratively optimizing the dictionary D and the coefficients X; in this embodiment the number of iterations is set to 25. The cost function J is defined as:

$$J_{(D,X)} = r(Y,D,X) + \lambda_2 f(X) + \lambda_1 \lVert X \rVert_1 + \eta \lVert D_0 \rVert_*$$

wherein the first term is the discriminative fidelity term, the second term is the discriminant coefficient term based on the Fisher criterion, the third term is the L1 regularization term and the fourth term is the nuclear norm. The discriminative fidelity term realizes the discriminative power of the dictionary; the discriminant coefficient term increases intra-class similarity and reduces inter-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary and guarantees its low rank. λ1, λ2 and η weigh the terms of the cost function; in this embodiment λ1 is set to 0.1, λ2 to 0.01 and η to 0.0001;
Specifically, the discriminative fidelity term is defined as:

$$r(Y,D,X) = \sum_{c=1}^{2} \Big( \lVert Y_c - D X_c \rVert_F^2 + \lVert Y_c - D_c X_c^c - D_0 X_c^0 \rVert_F^2 + \sum_{i \neq c} \lVert D_i X_c^i \rVert_F^2 \Big)$$

wherein $Y_c \in \mathbb{R}^{m \times n_c}$ represents the samples of class c (the samples being fusion features), m represents the dimension of the fusion features, $n_c$ represents the number of class-c samples (c takes the value 1 or 2), D represents the total dictionary, $D_c$ represents the sub-dictionary of class c, and $X_c^i$ represents the coefficients of the class-c samples on the class-i dictionary (i takes the value 1 or 2);
Specifically, the discriminant coefficient term is defined as:

$$f(X) = \sum_{c=1}^{2} \big( \lVert X_c - M_c \rVert_F^2 - \lVert M_c - M \rVert_F^2 \big) + \lVert X^0 - M^0 \rVert_F^2$$

wherein $M_c$ represents the mean of the sparse coefficients of the class-c samples, M represents the mean of the sparse coefficients of the whole training set, and $M^0$ represents the mean of the coefficients on the shared dictionary. The term $\lVert X^0 - M^0 \rVert_F^2$ forces the coefficients of all training samples on the shared dictionary to be close to their average, preventing the shared dictionary from contributing very differently to samples of different classes and thereby harming classification performance;
The dictionary and the sparse coefficients are optimized alternately to minimize the cost function of the dictionary model; after the set number of iterations the dictionary is saved. From the saved dictionary, two class dictionaries augmented with the shared dictionary are constructed, and with the class dictionaries fixed, the sparse coefficients of the test samples are solved.
S4: judging the category of the test sample based on the size of the fusion-feature reconstruction residual.
The sparse coefficients of the test sample are solved with elastic net regularization, the fusion feature of the test sample is reconstructed from the sparse coefficients, and the class with the smallest reconstruction residual is taken as the predicted class of the test sample.
Using the dictionary D obtained by this embodiment, two sub-dictionaries are constructed:

$$\bar{D}_1 = [D_1, D_0], \qquad \bar{D}_2 = [D_2, D_0]$$

that is, the dictionary saved in step S3 yields two class dictionaries augmented with the shared dictionary. When solving the sparse coefficients of a test sample y, this embodiment adopts elastic net regularization; the model optimization problem is:

$$\hat{x}_c = \arg\min_{x}\, \lVert y - \bar{D}_c x \rVert_2^2 + \lambda_a \lVert x \rVert_1 + \lambda_b \lVert x \rVert_2^2$$

wherein $\bar{D}_c$ denotes a class dictionary augmented with the shared dictionary and x denotes the sparse coefficients of the test sample y. The second term is the L1 regularization term and the third term is the L2 regularization term; λa and λb weigh the two regularization terms, and both are set to 0.01 in this embodiment. Compared with L1 regularization alone, L2 regularization tends to make the solution for x smoother, so linearly combining L1 and L2 regularization produces improved sparse coding.
After the sparse coefficients of the test sample y are obtained, y is reconstructed from the coefficients corresponding to each class's sub-dictionary, and the class with the smallest reconstruction residual is taken as the predicted class:

$$\hat{c} = \arg\min_{c \in \{1,2\}} \lVert y - \bar{D}_c \hat{x}_c \rVert_2$$
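A minimal sketch of this test-time sparse coding and residual decision, using scikit-learn's ElasticNet as an off-the-shelf solver; the mapping of (λa, λb) onto sklearn's (alpha, l1_ratio) parameterization, which also rescales by the sample count, is an assumption:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def predict(y, D1, D2, D0, lam_a=0.01, lam_b=0.01):
    residuals = []
    for Dc in (np.hstack([D1, D0]), np.hstack([D2, D0])):  # class + shared dicts
        alpha = (lam_a + lam_b) / y.shape[0]
        enet = ElasticNet(alpha=alpha, l1_ratio=lam_a / (lam_a + lam_b),
                          fit_intercept=False, max_iter=5000)
        enet.fit(Dc, y)                                    # solve for x_hat_c
        residuals.append(np.linalg.norm(y - Dc @ enet.coef_))
    return int(np.argmin(residuals))  # 0 = real face, 1 = attack face
```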
as shown in Table 1 below, the performance of this example was compared with that of a single feature on three data sets, REPLAY-ATTACK, CASIA-FASD and MSU-MFSD, and the evaluation index was HTER (half total error rate).
TABLE 1 Performance (HTER) using different features on three public data sets

Feature                    REPLAY-ATTACK    CASIA-FASD    MSU-MFSD
Image quality features     12.85%           13.99%        13.71%
Deep network features      2.37%            4.81%         11.13%
Fusion features            1.92%            4.41%         9.39%
Table 1 shows that deep network features do not automatically capture all the discriminative factors present in hand-crafted features; by fusing image quality features with deep network features, the method makes further use of the image information and effectively strengthens the discriminative power of the features.
Table 2 below compares this embodiment with other methods in the cross-dataset scenario between CASIA-FASD and REPLAY-ATTACK; the evaluation index is HTER.

TABLE 2 Performance comparison with different methods in cross-dataset scenarios

[Table 2 appears only as an image in the original publication; its numeric contents are not reproduced here.]
Table 2 shows that, compared with hand-crafted methods such as LBP and with a single CNN, the proposed method generalizes better in cross-dataset scenarios.
The embodiment further provides a face representation attack detection system based on fusion feature and dictionary learning, which includes: the system comprises a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a test sample category judgment module;
In this embodiment, the preliminary fusion feature extraction module comprises an image quality feature extraction module and a deep network feature extraction module;
in this embodiment, the face image database construction module is configured to perform face detection and cropping on an input video to construct a face image database;
in this embodiment, the preliminary fusion feature extraction module is configured to extract the fusion features of the face images in the face image database, the fusion features comprising image quality features and deep network features;
in this embodiment, the image quality feature extraction module is configured to extract the image quality features of the complete face image according to the distortion sources of the secondary imaging of the face image;
in this embodiment, the deep network feature extraction module is configured to construct a deep convolutional network model and extract the deep network features of face image blocks through the network;
in this embodiment, the final fusion feature generation module is configured to standardize the image quality features and the deep network features respectively, concatenate them, and reduce the dimensionality of the concatenated features through PCA to generate the final fusion features;
in this embodiment, the dictionary learning classifier training module is configured to initialize dictionary atoms based on the fusion features and train a dictionary learning classifier based on the low-rank shared dictionary;
in this embodiment, the test sample category judgment module is configured to judge the category of a test sample based on the size of the fusion-feature reconstruction residual.
The above description of the technical scheme shows that, by combining hand-crafted image quality features with deep network features, the invention makes full use of the information provided by a single-frame image and strengthens the discriminative power of the features. The structure and training scheme of the convolutional neural network are tailored to the characteristics of face representation attack datasets: the Focal Loss addresses data imbalance, and the label smoothing technique further improves the generalization of the deep network. In addition, a low-rank shared dictionary is introduced to strip out the commonality of genuine and attack samples, and elastic net regularization improves the sparse coding of the test samples, further raising the accuracy of the dictionary learning classifier. The method generalizes well and is suitable for detecting two-dimensional face representation attacks in real scenarios.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (8)

1. A face representation attack detection method based on fusion features and dictionary learning is characterized by comprising the following steps:
carrying out face detection and cutting on an input video to construct a face image database;
extracting fusion characteristics of face images in a face image database, wherein the fusion characteristics comprise image quality characteristics and depth network characteristics;
extracting image quality characteristics of the complete face image according to a distortion source of secondary imaging of the face image;
constructing a depth convolution network model, and extracting the depth network characteristics of the human face image block through a depth convolution network;
standardizing the image quality characteristics and the depth network characteristics respectively, cascading them, and reducing the dimension of the cascaded characteristics through PCA to generate final fusion characteristics;
initializing dictionary atoms based on fusion characteristics, and training a dictionary learning classifier based on a low-rank shared dictionary;
and judging the category of the test sample based on the size of the fusion feature reconstruction residual error.
2. The method for detecting the face representation attack based on the fusion feature and the dictionary learning according to claim 1, characterized in that the image quality features of the complete face image are extracted according to the distortion sources of the secondary imaging of the face image, and the specific steps comprise: extracting specular reflection features, blur features, color moment features and color diversity features, and cascading the extracted features to obtain the image quality features.
3. The method for detecting the face representation attack based on the fusion feature and the dictionary learning as claimed in claim 1, wherein the depth network feature of the face image block is extracted through a depth convolution network, and the specific steps include:
generating face image blocks from the complete face image by random scaling and random cropping; constructing a lightweight deep convolutional network model that takes the face image blocks as input; training the convolutional network model with a Focal Loss function to extract the deep network features of the face image blocks; and converting one-hot coded labels into soft labels by label smoothing to optimize the training process of the deep convolutional neural network.
4. The method for detecting the face representation attack based on the fusion feature and the dictionary learning according to claim 1, wherein the dictionary atoms are initialized based on the fusion features and the dictionary learning classifier based on the low-rank shared dictionary is trained, with the following specific steps: alternately optimizing the dictionary and the sparse coefficients to minimize the cost function of the dictionary model, and saving the dictionary after a set number of optimization iterations.
5. The method for detecting human face representation attack based on fusion feature and dictionary learning according to claim 4, wherein the cost function of the dictionary model is expressed as:

$$J_{(D,X)} = r(Y,D,X) + \lambda_2 f(X) + \lambda_1 \lVert X \rVert_1 + \eta \lVert D_0 \rVert_*$$

wherein the first term is the discriminative fidelity term, the second term is the discriminant coefficient term based on the Fisher criterion, the third term is the L1 regularization term and the fourth term is the nuclear norm; the discriminative fidelity term realizes the discriminative power of the dictionary; the discriminant coefficient term increases intra-class similarity and reduces inter-class similarity; the L1 regularization term realizes the sparsity of the coefficients X; the nuclear norm constrains the size of the subspace spanned by the shared dictionary and guarantees its low rank; λ1, λ2 and η trade off the weights of the terms of the cost function;

the discriminative fidelity term is defined as:

$$r(Y,D,X) = \sum_{c=1}^{2} \Big( \lVert Y_c - D X_c \rVert_F^2 + \lVert Y_c - D_c X_c^c - D_0 X_c^0 \rVert_F^2 + \sum_{i \neq c} \lVert D_i X_c^i \rVert_F^2 \Big)$$

wherein $Y_c \in \mathbb{R}^{m \times n_c}$ represents the samples of class c, the samples being fusion features; m represents the dimension of the fusion features; $n_c$ represents the number of class-c samples; D represents the total dictionary; $D_c$ represents the sub-dictionary of class c; and $X_c^i$ represents the coefficients of the class-c samples on the class-i dictionary;

the discriminant coefficient term is defined as:

$$f(X) = \sum_{c=1}^{2} \big( \lVert X_c - M_c \rVert_F^2 - \lVert M_c - M \rVert_F^2 \big) + \lVert X^0 - M^0 \rVert_F^2$$

wherein $M_c$ represents the mean of the sparse coefficients of the class-c samples, M represents the mean of the sparse coefficients of the whole training set, $M^0$ represents the mean of the coefficients on the shared dictionary, and the term $\lVert X^0 - M^0 \rVert_F^2$ forces the coefficients of all training samples on the shared dictionary to be close to the average.
6. The face representation attack detection method based on fusion feature and dictionary learning according to claim 4, further comprising a step of solving the sparse coefficients of the test sample, specifically: constructing two class dictionaries augmented with the shared dictionary from the saved dictionary, and solving the sparse coefficients of the test sample with the class dictionaries fixed.
7. The method for detecting the face representation attack based on the fusion feature and the dictionary learning according to claim 1, wherein the method for judging the category of the test sample based on the size of the residual error of the fusion feature reconstruction comprises the following specific steps:
solving the sparse coefficients of the test sample based on elastic net regularization, reconstructing the fusion features of the test sample from the sparse coefficients, and taking the class with the smallest reconstruction residual as the predicted class of the test sample.
8. A face representation attack detection system based on fusion feature and dictionary learning, comprising: the system comprises a face image database construction module, a preliminary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a test sample category judgment module;
the preliminary fusion feature extraction module comprises an image quality feature extraction module and a depth network feature extraction module;
the face image database construction module is used for carrying out face detection and cutting on an input video to construct a face image database;
the preliminary fusion feature extraction module is used for extracting fusion features of the face images in the face image database, and the fusion features comprise image quality features and depth network features;
the image quality characteristic extraction module is used for extracting the image quality characteristics of the complete face image according to the distortion source of the secondary imaging of the face image;
the depth network feature extraction module is used for constructing a depth convolution network model and extracting the depth network features of the human face image blocks through a depth convolution network;
the final fusion feature generation module is used for standardizing the image quality features and the depth network features respectively, cascading them, and reducing the dimension of the cascaded features through PCA to generate final fusion features;
the dictionary learning classifier training module is used for initializing dictionary atoms based on fusion characteristics and training a dictionary learning classifier based on a low-rank shared dictionary;
the type judgment module of the test sample is used for judging the type of the test sample based on the size of the fusion characteristic reconstruction residual error.
CN202010696193.4A 2020-07-20 2020-07-20 Face representation attack detection method and system based on fusion feature and dictionary learning Active CN111967331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010696193.4A CN111967331B (en) 2020-07-20 2020-07-20 Face representation attack detection method and system based on fusion feature and dictionary learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010696193.4A CN111967331B (en) 2020-07-20 2020-07-20 Face representation attack detection method and system based on fusion feature and dictionary learning

Publications (2)

Publication Number Publication Date
CN111967331A true CN111967331A (en) 2020-11-20
CN111967331B CN111967331B (en) 2023-07-21

Family

ID=73362137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010696193.4A Active CN111967331B (en) 2020-07-20 2020-07-20 Face representation attack detection method and system based on fusion feature and dictionary learning

Country Status (1)

Country Link
CN (1) CN111967331B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449707A (en) * 2021-08-31 2021-09-28 杭州魔点科技有限公司 Living body detection method, electronic apparatus, and storage medium
CN113505722A (en) * 2021-07-23 2021-10-15 中山大学 In-vivo detection method, system and device based on multi-scale feature fusion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281845A (en) * 2014-10-29 2015-01-14 中国科学院自动化研究所 Face recognition method based on rotation invariant dictionary learning model
CN105844223A (en) * 2016-03-18 2016-08-10 常州大学 Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning
CN107194873A (en) * 2017-05-11 2017-09-22 南京邮电大学 Low-rank nuclear norm canonical facial image ultra-resolution method based on coupling dictionary learning
CN107832747A (en) * 2017-12-05 2018-03-23 广东技术师范学院 A kind of face identification method based on low-rank dictionary learning algorithm
US20180225807A1 (en) * 2016-12-28 2018-08-09 Shenzhen China Star Optoelectronics Technology Co., Ltd. Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
CN108985177A (en) * 2018-06-21 2018-12-11 南京师范大学 A kind of facial image classification method of the quick low-rank dictionary learning of combination sparse constraint
CN109766813A (en) * 2018-12-31 2019-05-17 陕西师范大学 Dictionary learning face identification method based on symmetrical face exptended sample
CN110428392A (en) * 2019-09-10 2019-11-08 哈尔滨理工大学 A kind of Method of Medical Image Fusion based on dictionary learning and low-rank representation

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281845A (en) * 2014-10-29 2015-01-14 中国科学院自动化研究所 Face recognition method based on rotation invariant dictionary learning model
CN105844223A (en) * 2016-03-18 2016-08-10 常州大学 Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning
US20180225807A1 (en) * 2016-12-28 2018-08-09 Shenzhen China Star Optoelectronics Technology Co., Ltd. Single-frame super-resolution reconstruction method and device based on sparse domain reconstruction
CN107194873A (en) * 2017-05-11 2017-09-22 南京邮电大学 Low-rank nuclear norm canonical facial image ultra-resolution method based on coupling dictionary learning
CN107832747A (en) * 2017-12-05 2018-03-23 广东技术师范学院 A kind of face identification method based on low-rank dictionary learning algorithm
CN108985177A (en) * 2018-06-21 2018-12-11 南京师范大学 A kind of facial image classification method of the quick low-rank dictionary learning of combination sparse constraint
CN109766813A (en) * 2018-12-31 2019-05-17 陕西师范大学 Dictionary learning face identification method based on symmetrical face exptended sample
CN110428392A (en) * 2019-09-10 2019-11-08 哈尔滨理工大学 A kind of Method of Medical Image Fusion based on dictionary learning and low-rank representation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505722A (en) * 2021-07-23 2021-10-15 中山大学 In-vivo detection method, system and device based on multi-scale feature fusion
CN113505722B (en) * 2021-07-23 2024-01-02 中山大学 Living body detection method, system and device based on multi-scale feature fusion
CN113449707A (en) * 2021-08-31 2021-09-28 杭州魔点科技有限公司 Living body detection method, electronic apparatus, and storage medium
CN113449707B (en) * 2021-08-31 2021-11-30 杭州魔点科技有限公司 Living body detection method, electronic apparatus, and storage medium

Also Published As

Publication number Publication date
CN111967331B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111460931B (en) Face spoofing detection method and system based on color channel difference image characteristics
Ye et al. Real-time no-reference image quality assessment based on filter learning
CN111160313B (en) Face representation attack detection method based on LBP-VAE anomaly detection model
Tereikovskyi et al. The method of semantic image segmentation using neural networks
CN108389189B (en) Three-dimensional image quality evaluation method based on dictionary learning
Zhong et al. DCT histogram optimization for image database retrieval
CN111967331B (en) Face representation attack detection method and system based on fusion feature and dictionary learning
CN113392791A (en) Skin prediction processing method, device, equipment and storage medium
CN114764939A (en) Heterogeneous face recognition method and system based on identity-attribute decoupling
Ma Improving SAR target recognition performance using multiple preprocessing techniques
Szankin et al. Influence of thermal imagery resolution on accuracy of deep learning based face recognition
CN117095471B (en) Face counterfeiting tracing method based on multi-scale characteristics
CN107133579A (en) Based on CSGF (2D)2The face identification method of PCANet convolutional networks
Nguyen et al. Convolution autoencoder-based sparse representation wavelet for image classification
CN112818774A (en) Living body detection method and device
JP3962517B2 (en) Face detection method and apparatus, and computer-readable medium
Bruckert et al. Deep learning for inter-observer congruency prediction
CN111242114A (en) Character recognition method and device
Khanna et al. Memorability‐based image compression
Raihan et al. CNN modeling for recognizing local fish
Du et al. Robust image hashing based on multi-view dimension reduction
CN110147824B (en) Automatic image classification method and device
CN111754459B (en) Dyeing fake image detection method based on statistical depth characteristics and electronic device
Alsandi Image splicing detection scheme using surf and mean-LBP based morphological operations
Mokalla Deep learning based face detection and recognition in MWIR and visible bands

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant