CN111967331B

CN111967331B - Face representation attack detection method and system based on fusion feature and dictionary learning

Info

Publication number: CN111967331B
Application number: CN202010696193.4A
Authority: CN
Inventors: 傅予力; 黄汉业; 向友君; 许晓燕; 吕玲玲
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2020-07-20
Filing date: 2020-07-20
Publication date: 2023-07-21
Anticipated expiration: 2040-07-20
Also published as: CN111967331A

Abstract

The invention discloses a face representation attack detection method and a face representation attack detection system based on fusion characteristics and dictionary learning, wherein the method comprises the following steps: extracting image quality characteristics of the whole face image according to a distortion source of the face image secondary imaging; constructing a depth convolution network model, and extracting depth network characteristics of the face image block through the depth convolution network; cascading the two features through PCA to generate a final fusion feature; initializing dictionary atoms by using fusion features, and training a dictionary learning classifier based on a low-rank shared dictionary; and judging the category of the test sample based on the size of the fusion characteristic reconstruction residual error. According to the invention, the face representation attack detection is carried out by combining the image quality characteristics and the depth network characteristics for the first time, so that the information provided by a single frame image is better utilized, and the discrimination capability of the extracted characteristics is effectively enhanced; the same mode of the true and false samples is stripped through the low-rank shared dictionary for the first time, so that the accuracy of attack detection is successfully improved, and the method has good generalization.

Description

Face representation attack detection method and system based on fusion feature and dictionary learning

Technical Field

The invention relates to the technical field of image processing, in particular to a face representation attack detection method and system based on fusion characteristics and dictionary learning.

Background

Face recognition technology is widely applied to security, payment, entertainment and other scenes. However, face recognition systems have certain potential safety hazards. With the development of social networks and the popularization of smart phones, more and more people share photos and videos of individuals on the network, lawbreakers can attack a face recognition system by using the media to disguise as other people or deliberately confuse the identities of the individuals, and the purposes of infringement of property safety of other people, escape of legal sanctions and the like are achieved. Attempts to borrow the identity of a legitimate user by means of pictures, videos, etc. of the user through operation of a face recognition system are known as face representation attacks, and methods of detecting such attacks are known as face in vivo detection.

In the living body detection of a face, the face image can be classified into two types, and one type is an image obtained by directly photographing the user himself. The shot object of another type of image may be an object with high similarity to the face of the legal user, such as a photo, a video, a wax image, etc. of the legal user. Such images are called face representation attack images (attack faces for short), and are objects to be detected by the living body detection technology.

The core of the human face living body detection algorithm is to extract the characteristic of the human face image, which has the most discriminant capability for detecting living bodies, and the traditional detection technology is based on the manually designed characteristic, such as LBP (local binary pattern) and LPQ (local phase quantization), so that the manually designed characteristic capable of detecting the attacking human face becomes very difficult along with the continuous improvement of the imaging quality of equipment. In recent years, the use of convolutional neural networks to automatically extract features has become the mainstream. The deep convolutional neural network performs excellently on image classification tasks, but is limited by the scale of the living body detection data set, and the deep network supervised by the class labels only tends to memorize any feature existing in the training set, so that overfitting is easy to cause, and algorithm generalization is poor.

Disclosure of Invention

In order to overcome the defects and shortcomings in the prior art, the invention provides the face representation attack detection method based on fusion characteristics and dictionary learning, which fully utilizes information provided by a single frame image by fusing artificially designed image quality characteristics and depth network characteristics, effectively enhances the recognition capability of the characteristics, adopts a dictionary learning method based on a low-rank shared dictionary to realize classification of true and false samples, and can strip the commonality of the true and false samples from the shared dictionary, thereby improving the accuracy of attack detection.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

the invention provides a face representation attack detection method based on fusion characteristics and dictionary learning, which comprises the following steps:

performing face detection and clipping on the input video to construct a face image database;

extracting fusion characteristics of face images in a face image database, wherein the fusion characteristics comprise image quality characteristics and depth network characteristics;

extracting image quality characteristics of the whole face image according to a distortion source of the face image secondary imaging;

constructing a depth convolution network model, and extracting depth network characteristics of the face image block through the depth convolution network;

according to the image quality characteristics and the depth network characteristics, respectively standardizing the two characteristics, cascading, and performing dimension reduction on the cascading characteristics through PCA to generate final fusion characteristics;

initializing dictionary atoms based on fusion features, and training a dictionary learning classifier based on a low-rank shared dictionary;

and judging the category of the test sample based on the size of the fusion characteristic reconstruction residual error.

As an preferable technical solution, the extracting the image quality feature of the whole face image according to the distortion source of the face image secondary imaging specifically includes: and extracting specular reflection characteristics, extracting fuzzy characteristics, extracting color moment characteristics and extracting color diversity characteristics, and cascading the extracted characteristics to obtain image quality characteristics.

As an preferable technical solution, the extracting the depth network feature of the face image block through the depth convolution network specifically includes:

the face image block is generated by carrying out random scaling and random cutting on the complete face image, a lightweight depth convolution network model is constructed, the face image block is used as input of the convolution network model, the depth network characteristics of the face image block are extracted by training the convolution network model through a Focal Loss function, and a label smoothing method is adopted to convert the single-heat coding label into a soft label, so that the training process of the depth convolution neural network is optimized.

As an preferable technical solution, the initializing dictionary atoms based on the fusion features, training the dictionary learning classifier based on the low-rank shared dictionary, specifically includes the steps of: and (3) the dictionary is saved after the set times of iterative optimization are performed by alternately optimizing the dictionary and minimizing the cost function of the dictionary model by the sparse coefficient.

As a preferred technical solution, the cost function of the dictionary model is expressed as:

the first term is a discrimination fidelity term, the second term is a discrimination coefficient term based on Fisher criterion, the third term is an L1 regularization term, the fourth term is a nuclear norm, and the discrimination fidelity term is used for realizing the recognition capability of the dictionary; the judging coefficient term is used for increasing similarity and reducing similarity among the classes, and the L1 regularization term is used for realizing sparsity of the coefficient X; the kernel norm is used for restricting the subspace size formed by the shared dictionary, ensuring the low rank performance of the shared dictionary, and lambda ₁ 、λ ₂ And η is used to weigh the specific gravity of each item of the cost function;

the discriminant fidelity term is defined as:

wherein,,a sample representing class c, the sample being a fusion feature, m representing the dimension of the fusion feature, n _c Representing the number of samples of class c, D representing the global dictionary, D _c Sub-dictionary representing class c, X _c i represents class cCoefficients of the sample on the class i dictionary;

the discrimination coefficient term is defined as:

wherein M is _c Mean value of sparse coefficient of c-th sample, M represents mean value of sparse coefficient of whole training set, M ⁰ Representing the average value of the coefficients in the shared dictionary,the effect of (c) is to force the coefficients of all training samples on the shared dictionary to be close to the average.

As a preferable technical scheme, the method further comprises a step of solving the sparse coefficient of the test sample, specifically comprising the following steps: and constructing two class dictionaries with shared dictionaries through the saved dictionaries, and solving sparse coefficients of the test samples by using the fixed class dictionaries.

As a preferred technical solution, the determining the class of the test sample based on the magnitude of the fusion feature reconstruction residual error specifically includes:

based on the sparse coefficient of the test sample obtained by regularization of the elastic network, reconstructing fusion characteristics of the test sample through the sparse coefficient, and reconstructing the category with the minimum residual error as the prediction category of the test sample.

The invention provides a face representation attack detection system based on fusion characteristics and dictionary learning, which comprises the following steps: the device comprises a face image database construction module, a primary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a class judgment module of a test sample;

the primary fusion feature extraction module comprises an image quality feature extraction module and a depth network feature extraction module;

the face image database construction module is used for carrying out face detection and cutting on the input video to construct a face image database;

the primary fusion feature extraction module is used for extracting fusion features of face images in a face image database, wherein the fusion features comprise image quality features and depth network features;

the image quality feature extraction module is used for extracting the image quality features of the whole face image according to the distortion sources of the secondary imaging of the face image;

the depth network feature extraction module is used for constructing a depth convolution network model and extracting the depth network features of the face image blocks through the depth convolution network;

the final fusion feature generation module is used for respectively standardizing the two features according to the image quality features and the depth network features, then cascading the two features, and performing dimension reduction on the cascading features through PCA to generate final fusion features;

the dictionary learning classifier training module is used for initializing dictionary atoms based on the fusion characteristics and training a dictionary learning classifier based on a low-rank shared dictionary;

the class judging module of the test sample is used for judging the class of the test sample based on the size of the fusion characteristic reconstruction residual error.

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) According to the invention, by fusing the image quality characteristics and the depth network characteristics of the manual design, the information provided by a single frame image is fully utilized, and the distinguishing capability of the characteristics is enhanced.

(2) The invention adopts the low-rank shared dictionary to strip the commonality of the true sample, ensures that the class dictionary can better represent the difference between the true sample and the attack sample, avoids the defect that the adoption of the full-connection layer is easy to generate overfitting, has good generalization and further improves the accuracy of attack detection.

(3) According to the invention, the elastic network regularization is adopted to replace the traditional L1 regularization, so that the problem that certain characteristics are easy to ignore when the L1 regularization is used is solved, the method is beneficial to maintaining fine characteristics, and the discrimination of sparse coefficients is enhanced.

(4) The invention adopts the image blocks which are randomly cut and randomly generated as the input of the convolutional neural network, so that the convolutional neural network concentrates on learning and extracting the effective information related to the deception mode, the data set scale is enlarged in an effective mode, and the problem of performance degradation caused by small data scale is effectively solved.

Drawings

Fig. 1 is a flow chart of a face representation attack detection method based on fusion features and dictionary learning in the present embodiment.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Examples

As shown in fig. 1, the present embodiment provides a face representation attack detection method based on fusion features and dictionary learning, which includes the following steps:

s1: performing face detection and clipping on the input video to construct a face image database;

the embodiment selects a public face representation ATTACK video data set REPLAY-ATTACK, CASIA-FASD and MSU-MFSD, wherein the three data sets comprise real face videos and ATTACK face videos, the division of a training set and a testing set is provided, the first 30 frames of each video of the data sets are extracted, a cascade classifier based on Haar characteristics is adopted to detect the position of a face in a picture frame, and a face image is cut;

s2: extracting fusion characteristics of face images in a face image database, wherein the fusion characteristics comprise image quality characteristics and depth network characteristics, and the method comprises the following specific steps of:

s21) extracting image quality characteristics of the whole face image according to a distortion source of the secondary imaging of the face image,

according to the face image database, sub-feature extraction is carried out on the face image through the aspects of ambiguity, specular reflection, color distortion and the like, and the final sample image quality feature vector is formed by splicing all sub-feature vectors, and the specific steps are as follows:

under the same imaging environment, the real access face is a primary imaging picture, the face representation attack is a secondary imaging picture, and analyzing the source of face image distortion in the secondary imaging process is helpful for enhancing the discrimination of the extracted features, and the embodiment extracts the image quality features from four aspects of specular reflection, ambiguity, color moment distortion and color diversity distortion;

extracting specular reflection characteristics: iteratively replacing chromaticity of a highlight position of an input face image with the maximum diffuse reflection chromaticity of adjacent pixels, extracting specular reflection components in the image, and further utilizing percentage values, average values and variance values occupied by the components to form specular reflection characteristics;

extracting fuzzy characteristics: and extracting the fuzzy characteristics of the image by adopting a method based on secondary blurring. The input image is converted into a gray image, the gray image is subjected to low-pass filtering by using a Gaussian filter with a convolution kernel size of 33, and the filtered image is called a blurred image. The image definition is measured by comparing the changes of adjacent pixels of the images, specifically, the absolute difference images in the horizontal direction and the vertical direction are calculated, the gray value sum of all pixel points of the absolute difference images is calculated and is respectively called as the horizontal difference sum and the vertical difference sum, and the ratio of the horizontal difference sum and the vertical difference sum of the images before and after filtering forms a fuzzy characteristic.

Extracting color moment characteristics: firstly, an input face image is converted from an RGB space to an HSV space with relatively independent channels, then the average value, the variance value and the skewness of each channel are calculated, the percentage of pixels in the minimum histogram bin and the maximum histogram bin of each channel are calculated, and the 5 values form a color moment characteristic.

Extracting color diversity characteristics: the R, G, B three channels of the input picture are color quantized and then color diversity features are constructed using the histogram bin counts of the first 100 most frequently occurring colors and the number of all different colors that occur in the face image.

The four extracted features are cascaded and called image quality features, and the dimension of the image quality features is 121 dimensions.

S22) constructing a deep convolution network model, and extracting the deep network characteristics of the face image block through the deep convolution network, wherein the specific explanation is as follows:

the size of the face image obtained in the step S1 is scaled to 112 multiplied by 112, random segmentation of the face image is realized through random scaling and random cutting, the size of an image block is set to 48 multiplied by 48, the face image block is used as the input of a convolutional neural network, the neural network is trained to extract features, the local image block is used for increasing the scale of a training set, the convolutional neural network can concentrate on learning and extracting effective information related to a spoofing attack mode, meanwhile, the resolution of the original input is maintained, and the discriminative information is prevented from losing;

the existing public data set has smaller scale, and the convolution network model can adopt a model with smaller complexity. This example uses a pre-trained ResNet18 model on an ImageNet dataset. Meanwhile, the convolution kernel size of the first convolution layer of the ResNet18 is reduced to 3×3, the step length is reduced to 1, the next layer of the last convolution layer is a global pooling layer, the global pooling layer averages each feature image output by the convolution layer, then the averages are connected into a one-dimensional vector, the next layer of the global pooling layer is a full-connection layer, the full-connection layer takes the one-dimensional vector output by the global pooling layer as an input, the output dimension is a corresponding class number, in the embodiment, the class number is set to 2, and the real face and the attack face are respectively corresponding.

The Loss function when the convolutional network is trained is the Focal Loss function. The formula of the Focal Loss function is as follows:

FL(p _t )＝-(1-p _t ) ^γ log(p _t )

wherein,,p is the probability of a real face sample output by a network, y represents the real label of an input image, the label value of the real face is 1, the label value of an attack face is 0, gamma is called a focusing parameter, and the value is larger than 0. Modulation factor (1-p) _t ) ^γ The prediction score of the model is integrated into the loss function, so that the model can be adaptively adjusted according to the difficulty of the sample, and the gamma value is 2 in the embodiment, and in general, the face video is attackedThe number of the cross entropy Loss functions is several times that of the real face videos, and the problem of data unbalance commonly existing in a data set can be solved by adopting the Focal Loss function to replace the conventional cross entropy Loss function.

In the convolutional neural network training process, a label smoothing finger is adopted to convert a traditional single-heat coding label into a soft label, and the label smoothing finger is as follows:

wherein y is _oh Representing a conventional one-time heat coded label, y _ls Representing the soft label after label smoothing. Tag smoothing reduces the tag value at the correct class to 1-alpha times the original, and the original 0 item becomesK represents the number of categories, alpha ε [0,1 ]]. In this embodiment, α has a value of 0.1. Tag smoothing encourages models to choose the correct category by properly reducing the value of the correct tag, but without undue confidence. For face representation attack detection, the positive and negative samples are very similar in the image domain. Therefore, in the initial stage of the network, the adoption of the hard tag easily leads to rapid fitting of the network, and the generalization capability of a convolution network model is further improved by introducing a tag smoothing method.

The optimization method used in training the neural network is a random gradient descent method, the initial learning rate and the weight attenuation are respectively set to 0.001 and 0.00001, the learning rate regulator is a cosine annealing regulator with restarting, the lowest learning rate is set to 0.00004, the cosine period is 5 rounds, the total iteration is 30 rounds, after the network training is finished, the last global average pooling layer and the full connection layer are removed, the front convolution block group is utilized to extract the depth network characteristics of the face image block, and the depth network characteristic dimension is 512 dimensions;

s23) respectively standardizing the two features according to the image quality features and the depth network features, then cascading the two features, and performing dimension reduction on the cascading features through PCA to generate final fusion features, wherein the specific description is as follows:

extracting two groups of features of image quality features and depth network features from each face image of a face image database, calculating to obtain an average value and a variance of the two groups of features, normalizing the two groups of features, and directly cascading the normalized corresponding image quality features and depth network features for each face image, wherein the direct cascading length of the two features is 633;

adopting PCA (principal component analysis) to reduce the dimension of the cascade features, wherein the feature after the dimension reduction is called as a fusion feature, and in order to determine a relatively good principal component number of the PCA, the embodiment determines a cutting point by setting an experiment with a larger principal component number; in this embodiment, the dimension of the PCA after dimension reduction is set to 400, then the principal components are sorted from large to small according to the variance, the cumulative value of the variance is calculated, and the dimension of the PCA dimension reduction is redetermined according to the proportion of the sum of the variance to the sum of the total variances. The dimension of PCA dimension reduction is set to 256 dimensions in the embodiment;

s3: initializing dictionary atoms by using fusion characteristics of training samples, and training a dictionary learning classifier based on a low-rank shared dictionary;

the embodiment adopts a dictionary learning method based on a low-rank shared dictionary, and sets a total dictionary D= [ D ] ₁ ,D ₂ ,D ₀ ]∈R ^m×n Where m represents the dimension of the fused feature, n represents the size of the dictionary, class dictionary D ₁ And D ₂ The size of the class dictionary is set to 125 corresponding to the real face and the attack face, respectively. Shared dictionary D ₀ The size of the dictionary is set to be 20, fusion features are extracted from the training set images, dictionary atoms are initialized by using the fusion features, wherein two class dictionaries randomly extract samples from corresponding classes, a shared dictionary randomly extracts samples from the whole training set, and the atoms of the dictionary are normalized by L2;

the cost function J of the dictionary model is minimized by iteratively optimizing the dictionary D and the coefficients X, and in this embodiment, the iteration number is set to 25, and the cost function J of the dictionary model is defined as follows:

the first term is a discrimination fidelity term, the second term is a discrimination coefficient term based on Fisher criterion, the third term is an L1 regularization term, the fourth term is a nuclear norm, and the effect of the discrimination fidelity term is to realize the recognition capability of a dictionary; the function of the judging coefficient term is to increase similarity and reduce similarity among classes, and the function of the L1 regularization term is to realize sparseness of the coefficient X; the function of the kernel norm is to restrict the size of subspace formed by the shared dictionary, ensure the low rank performance of the shared dictionary, lambda ₁ 、λ ₂ And eta are used to weigh the specific gravity of each item of the cost function, in this embodiment lambda ₁ Set to 0.1 lambda ₂ Let 0.01, η be 0.0001;

specifically, the definition of the discriminant fidelity term is as follows:

wherein,,a sample representing class c, the sample being a fusion feature, m representing the dimension of the fusion feature, n _c Representing the number of samples of class c, c having a value of 1 or 2, D representing the total dictionary, D _c Sub-dictionary representing class c, +.>Representing the coefficient of the c-type sample on the i-type dictionary, wherein the value of i is 1 or 2;

specifically, the definition of the discrimination coefficient term is as follows:

wherein M is _c Mean value of sparse coefficient of c-th sample, M represents mean value of sparse coefficient of whole training set, M ⁰ Representing the average value of the coefficients in the shared dictionary,the function of (2) is to force the coefficients of all training samples on the shared dictionary to be close to the average value, so as to prevent the too large contribution gap of the shared dictionary to the samples of different categories from affecting the classification performance;

and (3) iteratively optimizing the dictionary for a certain number of times by alternately optimizing the dictionary and minimizing the cost function of the dictionary model by using the sparse coefficient, then saving the dictionary, constructing two class dictionaries with shared dictionaries by using the saved dictionaries, and solving the sparse coefficient of the test sample by using the fixed class dictionary.

S4: and judging the category of the test sample based on the size of the fusion characteristic reconstruction residual error.

The dictionary D obtained by the present embodiment is used to construct two sub-dictionaries, respectivelyAnd->Namely, the dictionary stored in step S3 constructs two class dictionaries with shared dictionaries, and when the sparse coefficient of the test sample y is solved, the embodiment adopts elastic network regularization, and the optimization problem of the model is as follows:

wherein,,and (3) representing a class dictionary with a shared dictionary, and x represents sparse coefficients corresponding to the test sample y. The second term is an L1 regularization term, the third term is an L2 regularization term, λ _a And lambda (lambda) _b For balancing the specific gravity of the L1 regularization term and the L2 regularization termIn the examples lambda _a Set to 0.01 lambda _b Let 0.01, L2 regularization tends to make the solution of x smoother than L1 regularization, and thus by linearly combining L1 regularization and L2 regularization, improved sparse coding can be produced.

After the sparse coefficient of the test sample y is obtained, reconstructing y according to the coefficient corresponding to the sub-dictionary of each class. The class with the smallest reconstructed residual is taken as the prediction class, and the following formula is shown:

as shown in Table 1 below, the performance of this example was compared with that of a single feature on three data sets of REPLAY-ATTACK, CASIA-FASD, MSU-MFSD, and the evaluation index was HTER (half total error rate).

Table 1 comparison of performance using different features on three published data sets

	REPLAY-ATTACK	CASIA-FASD	MSU-MFSD
				Image quality features	12.85％	13.99％	13.71％
Deep network features	2.37％	4.81％	11.13％
				Fusion features	1.92％	4.41％	9.39％

Table 1 shows that the depth network features cannot automatically extract all discriminant factors in the artificial design features, and the method further utilizes image information by fusing the image quality features and the depth network features, so that the feature recognition capability is effectively enhanced.

As shown in Table 2 below, the performance of this example was compared with other methods in the CASIA-FASD and REPLAY-ATTACK cross dataset scenarios, with an evaluation index HTER.

TABLE 2 Performance comparison Using different features in a Cross dataset scenario

Table 2 shows that compared with the LBP et al artificial design method and the single CNN method, the method has better generalization under the cross-dataset scene.

The embodiment also provides a face representation attack detection system based on fusion features and dictionary learning, which comprises: the device comprises a face image database construction module, a primary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a class judgment module of a test sample;

in this embodiment, the preliminary fusion feature extraction module includes an image quality feature extraction module and a depth network feature extraction module;

in this embodiment, the face image database construction module is configured to perform face detection and clipping on an input video to construct a face image database;

in this embodiment, the primary fusion feature extraction module is configured to extract fusion features of face images in a face image database, where the fusion features include image quality features and depth network features;

in this embodiment, the image quality feature extraction module is configured to extract image quality features of the entire face image according to a distortion source of secondary imaging of the face image;

in this embodiment, the depth network feature extraction module is configured to construct a depth convolution network model, and extract a depth network feature of the face image block through the depth convolution network;

in this embodiment, the final fusion feature generation module is configured to respectively normalize two features according to an image quality feature and a depth network feature, and then concatenate the two features, where the concatenated features undergo dimension reduction through PCA, so as to generate a final fusion feature;

in this embodiment, the dictionary learning classifier training module is configured to initialize dictionary atoms based on the fusion features, and train a dictionary learning classifier based on a low-rank shared dictionary;

in this embodiment, the class determination module of the test sample is configured to determine a class of the test sample based on a size of the fusion feature reconstruction residual.

Through the description of the technical scheme, the invention can fully utilize the information provided by a single frame image by combining the image quality characteristics and the depth network characteristics which are designed manually, and enhance the distinguishing capability of the characteristics. The invention optimizes the structure and training mode of the convolutional neural network aiming at the characteristic of the face representing attack data set, solves the problem of data unbalance through the Focal Loss function, and further improves the generalization capability of the depth network through the label smoothing technology. In addition, the commonality of true and false samples is stripped by introducing a low-rank shared dictionary, and sparse coding of test samples is improved by adopting elastic network regularization, so that the accuracy of the dictionary learning classifier is further improved. The method has good generalization and is suitable for two-dimensional face representation attack detection in actual scenes.

The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principle of the present invention should be made in the equivalent manner, and the embodiments are included in the protection scope of the present invention.

Claims

1. The face representation attack detection method based on fusion characteristics and dictionary learning is characterized by comprising the following steps of:

the dictionary atoms are initialized based on the fusion characteristics, the dictionary learning classifier based on the low-rank shared dictionary is trained, and the specific steps comprise: minimizing a cost function of the dictionary model by alternately optimizing the dictionary and the sparse coefficient, and storing the dictionary after iterative optimization for set times;

the cost function of the dictionary model is expressed as:

wherein the first term is a discrimination fidelity term, the second term is a discrimination coefficient term based on Fisher criterion, and the third term is L1 regularizing terms, wherein a fourth term is a nuclear norm, and distinguishing the fidelity terms is used for realizing the recognition power of the dictionary; the judging coefficient term is used for increasing similarity and reducing similarity among the classes, and the L1 regularization term is used for realizing sparsity of the coefficient X; the kernel norm is used for restricting the subspace size formed by the shared dictionary, ensuring the low rank performance of the shared dictionary, and lambda ₁ 、λ ₂ And eta is used for balancing the proportion of each item of the cost function, D ₀ Representing a shared dictionary;

the discriminant fidelity term is defined as:

wherein,,a sample representing class c, the sample being a fusion feature, m representing the dimension of the fusion feature, n _c Representing the number of samples of class c, D representing the global dictionary, D _c Sub-dictionary representing class c, X _c ⁱ Representing coefficients of class c samples on a class i dictionary;

the discrimination coefficient term is defined as:

wherein M is _c Mean value of sparse coefficient of c-th sample, M represents mean value of sparse coefficient of whole training set, M ⁰ Representing the average value of the coefficients in the shared dictionary,the effect of (2) is to force the coefficients of all training samples on the shared dictionary to be close to the average value;

2. The face representation attack detection method based on fusion feature and dictionary learning according to claim 1, wherein the extracting the image quality feature of the whole face image according to the distortion source of the face image secondary imaging comprises the specific steps of: and extracting specular reflection characteristics, extracting fuzzy characteristics, extracting color moment characteristics and extracting color diversity characteristics, and cascading the extracted characteristics to obtain image quality characteristics.

3. The face representation attack detection method based on fusion feature and dictionary learning according to claim 1, wherein the depth network feature of the face image block is extracted through a depth convolution network, and the specific steps include:

4. The face representation attack detection method based on fusion features and dictionary learning according to claim 1, further comprising a step of solving a sparse coefficient of a test sample, specifically: and constructing two class dictionaries with shared dictionaries through the saved dictionaries, and solving sparse coefficients of the test samples by using the fixed class dictionaries.

5. The face representation attack detection method based on fusion feature and dictionary learning according to claim 1, wherein the step of judging the class of the test sample based on the magnitude of the fusion feature reconstruction residual error comprises the following specific steps:

6. A face representation attack detection system based on fusion features and dictionary learning, comprising: the device comprises a face image database construction module, a primary fusion feature extraction module, a final fusion feature generation module, a dictionary learning classifier training module and a class judgment module of a test sample;

the cost function of the dictionary model is expressed as:

the first term is a discrimination fidelity term, the second term is a discrimination coefficient term based on Fisher criterion, the third term is an L1 regularization term, the fourth term is a nuclear norm, and the discrimination fidelity term is used for realizing the recognition capability of the dictionary; the judging coefficient term is used for increasing similarity and reducing similarity among the classes, and the L1 regularization term is used for realizing sparsity of the coefficient X; the kernel norm is used for restricting the subspace size formed by the shared dictionary, ensuring the low rank performance of the shared dictionary, and lambda ₁ 、λ ₂ And eta is used for balancing the proportion of each item of the cost function, D ₀ Representing a shared dictionary;

the discriminant fidelity term is defined as:

the discrimination coefficient term is defined as:

wherein M is _c Mean value of sparse coefficient of c-th sample, M represents mean value of sparse coefficient of whole training set, M ⁰ Representing the average value of the coefficients in the shared dictionary,the effect of (a) is to force the coefficients of all training samples on a shared dictionary to be close to flatThe average value;