CN112580576B - Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics - Google Patents

Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics

Info

Publication number
CN112580576B
CN112580576B
Authority
CN
China
Prior art keywords
texture
face
scale
module
illumination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011577117.8A
Other languages
Chinese (zh)
Other versions
CN112580576A (en)
Inventor
胡永健
罗鑫
葛治中
刘琲贝
王宇飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Sino Singapore International Joint Research Institute
Original Assignee
South China University of Technology SCUT
Sino Singapore International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT, Sino Singapore International Joint Research Institute filed Critical South China University of Technology SCUT
Priority to CN202011577117.8A priority Critical patent/CN112580576B/en
Publication of CN112580576A publication Critical patent/CN112580576A/en
Application granted granted Critical
Publication of CN112580576B publication Critical patent/CN112580576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face spoofing detection method and system based on multi-scale illumination-invariant texture features, the method comprising the following steps: after framing a video, crop the face image and separate channels to obtain color channel maps; pass these through an illumination-separation texture-retaining module to obtain illumination-invariant texture feature maps, merge the normalized texture feature maps with the color channel maps into face features, and apply data enhancement to obtain the input features to be trained; construct a multi-scale texture module from multi-receptive-field central difference convolutions and embed it into a lightweight network to build a lightweight multi-scale texture network; weight the pixel-level loss and the cross-entropy loss as the total loss; feed the input features into the lightweight multi-scale texture network to learn intrinsic texture spoofing features; update the network parameters according to the loss function, and save the network model and parameters after training; predict the classification result with the saved model. The invention accurately extracts intrinsic texture spoofing features, effectively improves the generalization performance of the model, and reduces the storage and computation cost of deployment.

Description

Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics
Technical Field
The invention relates to the technical field of face spoofing detection, in particular to a face spoofing detection method and system based on multi-scale illumination-invariant texture features.
Background
Facial biometrics carry rich and unique personal information for authentication and identification; thanks to their convenience and user-friendliness they have become the most popular biometric modality and achieve very accurate authentication and identification performance. However, a variety of face spoofing attacks exist: a photo attack means an attacker spoofs the authentication system with a printed photo or a face image shown on a display screen; a video replay attack means an attacker spoofs the authentication system with a prerecorded video of the person being attacked; a mask attack means an attacker wears a mask carefully manufactured from the face of the person being attacked to deceive the system; an adversarial sample attack means an attacker generates specific sample noise through a GAN network to interfere with the face authentication system and produce falsely directed authentication. These face spoofing attacks are not only inexpensive but can fool the system, severely impacting and threatening the application of face recognition systems.
In related research, texture features such as LBP, HoG, SIFT and SURF capture relatively coarse texture details and are easily affected by illumination and scene changes; detection algorithms based on physiological characteristics such as depth maps and rPPG signals (facial blood-pulse signals) have high computational complexity; illumination-free features such as MSR (Multi-Scale Retinex) and reflectance maps cannot preserve rich spoofing marks and textures. Existing methods are therefore considerably limited in generalization performance and computational complexity.
Existing face anti-spoofing detection algorithms can achieve good detection results within a database, but under large changes of illumination and environment their generalization performance degrades severely. In addition, face recognition systems are widely deployed on embedded terminals, mobile devices and monitoring equipment, whose computing and storage resources can hardly meet the high storage and computation requirements of existing face anti-spoofing detection algorithms.
Disclosure of Invention
To overcome the defects and shortcomings of the prior art, the invention provides a face spoofing detection method and system based on multi-scale illumination-invariant texture features. The illumination-invariant texture features are combined with the original images; the combined features contain material characteristics related only to the reflection coefficient, together with distinguishing clues such as spoofing noise introduced by secondary imaging and color-gamut loss in the illumination components. Matched with a lightweight multi-scale texture network, the spoofing features can be accurately extracted, improving the detection and generalization performance of the algorithm model. Moreover, being lightweight, the multi-scale texture network designed by the invention reduces memory and computing-resource requirements and is convenient to deploy on mobile phones, monitoring equipment and embedded terminals.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
The invention provides a face spoofing detection method based on multi-scale illumination-invariant texture features, which comprises the following steps:
after framing the video, cropping the face image and separating channels to obtain a plurality of color channel maps;
passing the color channel maps through an illumination-separation texture-retaining module to obtain illumination-invariant texture feature maps;
merging the normalized illumination-invariant texture feature maps and the color channel maps into face features;
applying real-time data enhancement to the face features to obtain the input features to be trained;
constructing a multi-scale texture module from multi-receptive-field central difference convolutions;
embedding the multi-scale texture module into a lightweight network to construct a lightweight multi-scale texture network;
weighting the pixel-level mean-square-error loss and the two-class cross-entropy loss as the loss function of network training;
feeding the enhanced input features into the lightweight multi-scale texture network to learn intrinsic texture spoofing features, training the network with minimization of the loss function as the objective;
updating the network parameters with an optimizer according to the loss function, and saving the lightweight multi-scale texture network model and parameters after training;
and acquiring face images of the video data to be detected, feeding the resulting input features into the saved lightweight multi-scale texture network, and predicting the classification result.
As a preferred technical solution, the step of cropping the face image after framing the video and separating channels to obtain a plurality of color channel maps specifically comprises:
framing the video data to obtain image frames, detecting the face region with the MTCNN face recognition algorithm, and cropping to a unified size to obtain the face image; the face image is in RGB format, and channel separation yields the red, green and blue color channel face maps F_r, F_g and F_b respectively.
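A minimal sketch of this preprocessing step is given below, assuming the MTCNN implementation from the facenet-pytorch package and OpenCV for video framing; these library choices, like the function name video_to_channel_maps, are illustrative assumptions rather than the patent's own tooling.

import cv2
from facenet_pytorch import MTCNN

detector = MTCNN(image_size=256, post_process=False)  # unified 256x256 crops

def video_to_channel_maps(video_path):
    """Frame the video, crop the detected face, and split it into R/G/B maps."""
    cap = cv2.VideoCapture(video_path)
    faces = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        face = detector(rgb)                  # 3x256x256 tensor, or None
        if face is not None:
            f_r, f_g, f_b = face[0], face[1], face[2]
            faces.append((f_r, f_g, f_b))
    cap.release()
    return faces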
As a preferred technical solution, the step of passing the color channel maps through the illumination-separation texture-retaining module to obtain the illumination-invariant texture feature maps comprises:
computing, from the face map of each color channel, the corresponding illumination-invariant texture maps INTF_r, INTF_g, INTF_b:
INTF_c = arctan( sqrt( [ (F_c ⊗ Filter_x) / (F_c + ε) ]^2 + [ (F_c ⊗ Filter_y) / (F_c + ε) ]^2 ) )
where c indexes the red, green and blue color channels, ⊗ denotes convolution, and Filter_x and Filter_y are respectively the 5×5 convolution kernels in the horizontal and vertical directions of the pattern of local gravitational force (PLGF);
the convolution operation over the color-channel face map is characterized by the following two formulas:
F(x,y)=R(x,y)×L(x,y)
the first formula is the Lambertian reflection model, where (x, y) are the coordinates of a pixel, F(x, y) is the pixel value at that coordinate, R(x, y) is the reflection coefficient of that pixel, and L(x, y) is the illumination intensity at imaging of that pixel;
(F ⊗ Filter_x)(x, y) / (F(x, y) + ε) = ( Σ_{i=1}^{P} R_i · L · Fx_i ) / ( R_c · L + ε ) ≈ Σ_{i=1}^{P} (R_i / R_c) · Fx_i
where ε is a small constant set to prevent the denominator from being 0; L is treated as a constant value; R_c is the reflection coefficient of the central pixel of the face map within the convolution region; P is the number of pixels of the face map within the convolution region; R_i is the reflection coefficient of the pixel with index i within the convolution region; Fx_i and Fy_i are respectively the constant values at index i of the convolution kernels Filter_x and Filter_y; and INTF is a face texture feature related only to the reflection coefficient.
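The following PyTorch sketch realizes this module under the reconstructed formula above. The kernel construction is a common gravitational-force form (the force component of 1/r^2 along each axis), assumed here for illustration; the patented coefficients of FIG. 3 may differ.

import torch
import torch.nn.functional as F

def plgf_kernel(size=5):
    # gravitational-force-style kernels: x and y components of 1/r^2
    r = torch.arange(size) - size // 2
    yy, xx = torch.meshgrid(r, r, indexing="ij")
    d3 = (xx ** 2 + yy ** 2).float() ** 1.5
    d3[size // 2, size // 2] = 1.0          # avoid division by zero at the center
    fx = (xx / d3).view(1, 1, size, size)   # horizontal kernel (assumed values)
    fy = (yy / d3).view(1, 1, size, size)   # vertical kernel (assumed values)
    return fx, fy

eps = 1e-4  # the small constant preventing a zero denominator (embodiment value)

def intf(channel):
    """channel: 1x1xHxW single color-channel face map (float tensor)."""
    fx, fy = plgf_kernel(5)
    gx = F.conv2d(channel, fx, padding=2) / (channel + eps)
    gy = F.conv2d(channel, fy, padding=2) / (channel + eps)
    # illumination L cancels in the ratios, leaving a reflectance-only map
    return torch.atan(torch.sqrt(gx ** 2 + gy ** 2))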
As a preferred technical solution, the normalized illumination-invariant texture feature maps are merged with the color channel maps into the face features; the normalization is calculated as:
INTF_c' = ( INTF_c − Min(INTF_c) ) / ( Max(INTF_c) − Min(INTF_c) )
where Max(INTF_c) and Min(INTF_c) are respectively the maximum and minimum values of the single-channel illumination-invariant texture map INTF_c;
the merge with the color channel maps is expressed as:
Input = concat[ INTF_r, INTF_g, INTF_b, F_r, F_g, F_b ]
where Input denotes the merged 6-channel face features (the three normalized texture maps plus the three original color channels).
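A short sketch of this step, reusing intf() from the sketch above and assuming f_r, f_g, f_b are 1x1xHxW color-channel tensors:

import torch

def min_max(x):
    # per-channel min-max normalization to [0, 1]
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

# f_r, f_g, f_b: 1x1xHxW color-channel maps; intf() as sketched earlier
intf_maps = [min_max(intf(c)) for c in (f_r, f_g, f_b)]
model_input = torch.cat(intf_maps + [f_r, f_g, f_b], dim=1)   # 1x6xHxW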
As a preferred technical solution, the step of applying real-time data enhancement to the face features to obtain the input features to be trained specifically comprises:
applying a random horizontal flip, randomly selecting a region of 0-28 pixels in size and setting its pixel values to 0, and applying data enhancement to chromaticity, brightness, saturation and contrast.
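One possible realization of this real-time enhancement on the 6-channel features, using torchvision's functional transforms; the jitter magnitudes are assumed values, and the color adjustments are applied to the RGB half only because these transforms expect 1- or 3-channel input:

import torch
import torchvision.transforms.functional as TF

def augment(x):
    """x: 6xHxW tensor (3 INTF channels followed by 3 color channels)."""
    if torch.rand(1) < 0.5:                       # random horizontal flip
        x = torch.flip(x, dims=[-1])
    size = int(torch.randint(0, 29, (1,)))        # erase a 0-28 pixel square
    if size > 0:
        top = int(torch.randint(0, x.shape[-2] - size, (1,)))
        left = int(torch.randint(0, x.shape[-1] - size, (1,)))
        x[:, top:top + size, left:left + size] = 0
    rgb = x[3:]
    rgb = TF.adjust_brightness(rgb, 1 + 0.2 * (torch.rand(1).item() - 0.5))
    rgb = TF.adjust_saturation(rgb, 1 + 0.2 * (torch.rand(1).item() - 0.5))
    rgb = TF.adjust_contrast(rgb, 1 + 0.2 * (torch.rand(1).item() - 0.5))
    rgb = TF.adjust_hue(rgb, 0.05 * (torch.rand(1).item() - 0.5))  # chromaticity
    return torch.cat([x[:3], rgb], dim=0)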
As a preferred technical solution, the multi-scale texture module is constructed from multi-receptive-field central difference convolutions, which comprises extracting and fusing multi-scale local texture features with central difference convolutions of multiple receptive fields;
the central difference convolution is specifically expressed as:
y(P_0) = (1 − θ) · Σ_{P_n ∈ R} ω(P_n) · x(P_0 + P_n) + θ · Σ_{P_n ∈ R} ω(P_n) · ( x(P_0 + P_n) − x(P_0) )
where the first term on the right of the equation is an ordinary convolution and the second term is the differential convolution between the center pixel and the neighboring pixels of the convolution region; θ is the weight with which the two convolutions are added; P_0 is the center position of the convolution region; P_n is a position index over the convolution region R; ω(P_n) is the weight at index P_n in the convolution kernel; and x(P_0 + P_n) is the pixel value at index P_n in the face map;
the fused multi-scale local texture features are expressed as:
Y(P_0) ≈ concat[ y(P_0, v=1), y(P_0, v=2), y(P_0, v=3), x(P_0) ]
where y(P_0, v=r), r ∈ {1,2,3}, is the output of the central difference convolution with dilation rate r; x(P_0) is the input of the multi-scale texture module; and Y(P_0), the channel merge of these features, approximately denotes the output of the multi-scale texture module.
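A PyTorch sketch of the central difference convolution and the multi-scale texture module follows; θ = 0.7 is taken from the embodiment, while the class names and branch channel widths are illustrative assumptions. The identity y = Σω·x(P_0+P_n) − θ·x(P_0)·Σω, equivalent to the weighted sum above, lets the central term be computed as a 1×1 convolution with the summed kernel weights.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Conv2d):
    """3x3 convolution combined with a theta-weighted central-difference term."""
    def __init__(self, in_ch, out_ch, dilation=1, theta=0.7):
        super().__init__(in_ch, out_ch, kernel_size=3,
                         padding=dilation, dilation=dilation, bias=False)
        self.theta = theta

    def forward(self, x):
        vanilla = super().forward(x)                   # ordinary convolution
        kernel_sum = self.weight.sum(dim=(2, 3), keepdim=True)
        central = F.conv2d(x, kernel_sum)              # x(P_0) times summed weights
        # equals (1 - theta) * conv + theta * (conv - central term)
        return vanilla - self.theta * central

class MultiScaleTexture(nn.Module):
    """Three CDC branches with dilation rates 1/2/3, concatenated with a shortcut."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            CDConv2d(in_ch, branch_ch, dilation=r) for r in (1, 2, 3))

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches] + [x], dim=1)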
As a preferred technical solution, embedding the multi-scale texture module into a lightweight network to construct the lightweight multi-scale texture network specifically comprises:
inputting combinations of features of different dimensions into the lightweight multi-scale texture network for training, where the combinations of features of different dimensions refer to skip-connected low-, mid- and high-dimensional features of the network, specifically expressed as:
x_concat_64_128=concat[downsample64_64(x_128_64),x_64_64]
x_concat_32_256=concat[downsample32_32(x_64_128),x_32_128]
x_concat_16_384=concat[downsample16_16(x_32_256),x_16_128]
where downsample denotes feature downsampling, x is a channel feature, and x_concat is the feature after channel merging.
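A sketch of these skip connections, assuming average pooling as the downsampling operator (the patent specifies only the target sizes):

import torch
import torch.nn.functional as F

def skip_merge(shallow, deep, target):
    """Downsample the shallower feature map to `target` and merge channels."""
    pooled = F.adaptive_avg_pool2d(shallow, target)
    return torch.cat([pooled, deep], dim=1)

# x_concat_64_128 = skip_merge(x_128_64, x_64_64, 64)    # 128 ch, 64x64
# x_concat_32_256 = skip_merge(x_64_128, x_32_128, 32)   # 256 ch, 32x32
# x_concat_16_384 = skip_merge(x_32_256, x_16_128, 16)   # 384 ch, 16x16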
As a preferred technical solution, the pixel-level mean-square-error loss and the two-class cross-entropy loss are weighted as the loss function of network training, specifically calculated as:
L = αL_maps + βL_cls
where L_maps is the pixel-level mean-square-error loss between the 0/1 map output by the lightweight multi-scale texture network and the 0/1 map label, and L_cls is the two-class cross-entropy loss between the 0-1 output and the 0-1 label; the 0/1 map label is an all-0 or all-1 map of size 16×16, and α and β represent the weights.
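A minimal sketch of this total loss; nn.CrossEntropyLoss is used for the two-class term under the assumption that the network emits two logits, and α = 0.8, β = 0.2 are the embodiment's values:

import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()

def total_loss(map_pred, map_label, logits, label, alpha=0.8, beta=0.2):
    """map_pred/map_label: Nx16x16 0/1 maps; logits: Nx2; label: N (0 or 1)."""
    return alpha * mse(map_pred, map_label) + beta * ce(logits, label)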
As a preferred technical solution, the network parameters are updated with an optimizer according to the loss function, using an Adam optimizer with weight decay 1e-4; the learning rate starts from 0.0002 and decays stepwise with the number of training iterations t, the decay applying a floor operation ⌊·⌋ to the iteration count; lr(t) is the learning rate at iteration t.
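A sketch of the optimizer and schedule; since the exact decay appears only as a figure, the step size of 500 iterations and factor 0.5 below are stand-in assumptions, with only the Adam settings (lr 0.0002, weight decay 1e-4) taken from the text. `model` stands for the lightweight multi-scale texture network.

import torch.optim as optim

optimizer = optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
scheduler = optim.lr_scheduler.LambdaLR(
    optimizer, lambda t: 0.5 ** (t // 500))  # assumed step decay using floor(t/500)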
A face spoofing detection system based on multi-scale illumination-invariant texture features, comprising:
the system comprises a face region color channel image acquisition module, an illumination separation texture retaining module, an illumination invariant texture map and color map merging module, an input characteristic data enhancement module, a multi-scale texture module construction module, a light-weight multi-scale texture network construction module, a total loss construction module, a light-weight multi-scale texture network training module, a parameter retaining module and a detection module;
the face region color channel image acquisition module is used for carrying out face positioning after framing the video to obtain a face region image, and separating the face image channels to obtain a multi-color channel face image;
the illumination-separation texture-retaining module is used for processing the face maps of the plurality of color channels respectively to obtain the corresponding illumination-invariant texture maps;
the illumination invariant texture map and color map merging module is used for merging the normalized illumination invariant texture feature map and the color channel map into a face feature;
the input feature data enhancement module is used for applying real-time data enhancement to the face features, which then serve as the input features to be trained;
the multi-scale texture module construction module is used for constructing a multi-scale texture module by adopting center difference convolution of multiple receptive fields;
the lightweight multi-scale texture network construction module is used for embedding the multi-scale texture module into a lightweight network to construct a lightweight multi-scale texture network;
the total loss construction module is used for weighting the pixel-level mean square error loss and the two-class cross entropy loss as a loss function of network training;
the lightweight multi-scale texture network training module is used for feeding the data-enhanced input features into the lightweight multi-scale texture network to learn intrinsic texture spoofing features, training the network with minimization of the loss function as the objective;
the parameter storage module is used for updating network parameters by using an optimizer according to the loss function, and storing a lightweight multi-scale texture network model and parameters after training is completed;
the detection module is used for extracting face images of video data to be detected, combining an original image and a normalized illumination invariant texture image, inputting the enhanced data into a stored lightweight multi-scale texture network, and predicting a classification result.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention adopts face features that merge the illumination-invariant texture maps with the original maps; these features contain material characteristics related only to the reflection coefficient, together with distinguishing clues such as spoofing noise introduced by secondary imaging and color-gamut loss in the illumination components, reducing the influence of different illumination and environments on model performance and thereby improving the generalization performance of the face spoofing detection model.
(2) The input features adopted by the invention retain rich texture information, and the lightweight multi-scale texture network can extract local texture features under multiple receptive fields; these features have strong spoof-distinguishing capability, yielding good detection performance within a database and good generalization performance across databases.
(3) Being lightweight, the multi-scale texture network designed by the invention reduces memory and computing-resource requirements, is convenient to deploy on mobile phones, monitoring equipment and embedded terminals, runs fast while achieving high detection performance, and is thus well suited to application in real scenarios.
Drawings
Fig. 1 is a schematic overall flow chart of a face spoofing detection method based on multi-scale illumination invariance texture features in the embodiment;
FIG. 2 is a schematic diagram of a lightweight multi-scale texture network according to the present embodiment;
FIG. 3 is a schematic diagram of the convolution kernels in the horizontal and vertical directions of the pattern of local gravitational force (PLGF) of this embodiment;
fig. 4 is an illumination invariant texture map obtained after processing a real face sample and an attack face sample according to the present embodiment;
FIG. 5 is a schematic diagram of the center difference convolution of the present embodiment;
FIG. 6 is a schematic diagram of a multi-scale texture module according to the present embodiment;
FIG. 7 is a schematic diagram of a lightweight multi-scale texture network according to the present embodiment;
FIG. 8 is a schematic diagram of the training process in this embodiment;
fig. 9 is a schematic diagram of a test flow in this embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
In this embodiment, the CASIA-MFSD face spoofing database, the MSU-MFSD database and the Replay-Attack database are adopted for training and detection.
The CASIA-MFSD face spoofing database is divided into a training set captured from 20 subjects and a test set captured from 30 subjects; the fake faces are made by recapturing the real faces. The database collects videos of different imaging quality with three cameras, at resolutions of 640×480, 480×640 and 1920×1080 pixels. The attack modes comprise bent-photo attack, cut-photo attack and video replay attack, and the database contains 50 subjects and 600 videos.
The Replay-Attack database consists of real and attack videos of 50 subjects; the capture device is a MacBook camera at 320×240 pixels, and the 1200 recorded videos cover two illumination conditions: (1) uniform background, fluorescent lighting; (2) non-uniform background, poor lighting. Three attack modes are included: (1) print attack: high-resolution pictures printed on A4 paper; (2) mobile-phone attack: high-resolution pictures and videos displayed on an iPhone 3GS screen; (3) high-definition attack: pictures and videos displayed on an iPad screen with a resolution of 1024×1080 pixels.
The MSU-MFSD database comprises 280 videos attacking 35 subjects. The attack modes comprise: (1) high-resolution video replayed on an iPad Air screen at 2048×1536 pixels; (2) video replay attack on an iPhone 5S screen at 1136×640 pixels; (3) photo attack with A3-paper prints. The three attack modes are recaptured by a laptop camera and an Android camera, namely the built-in camera of a MacBook Air 13 (resolution 640×480) and the front camera of a Google Nexus 5 Android phone, for a total of 6 recaptured attack videos per subject.
This embodiment is carried out on a Linux system and implemented mainly on the deep learning framework PyTorch 1.0.1; the graphics card is a GTX 1080Ti, with CUDA version 8.0.61 and cuDNN version 6.0.21.
As shown in fig. 1, fig. 2 and fig. 3, this embodiment provides a face spoofing detection method based on multi-scale illumination-invariant texture features, covering face-region color channel image acquisition, illumination-separation texture retaining, merging of the illumination-invariant texture maps with the color maps, input feature data enhancement, multi-scale texture module construction, lightweight multi-scale texture network construction, total loss construction, lightweight multi-scale texture network training, parameter updating and saving, and detection. The specific steps are as follows:
s1, acquiring a color channel image of a face region:
the method comprises framing the video data to obtain image frames, detecting the face region with the MTCNN face recognition algorithm, and cropping to a unified 256×256 size to obtain the face image; the face image is in RGB format, and channel separation yields the red, green and blue channel face maps F_r, F_g and F_b respectively.
S2, respectively obtaining an illumination invariant texture feature map for the color channels through an illumination separation texture retaining module;
the illumination-separation texture-retaining module specifically operates as follows: from the face map of each color channel, compute the corresponding illumination-invariant texture maps INTF_r, INTF_g, INTF_b:
INTF_c = arctan( sqrt( [ (F_c ⊗ Filter_x) / (F_c + ε) ]^2 + [ (F_c ⊗ Filter_y) / (F_c + ε) ]^2 ) )
where c indexes the red, green and blue color channels; ⊗ denotes convolution; Filter_x and Filter_y are respectively the 5×5 convolution kernels in the horizontal and vertical directions of the pattern of local gravitational force (PLGF), with the specific parameters set as shown in FIG. 3;
the convolution operation over the color-channel face map is characterized by the following two formulas:
F(x,y)=R(x,y)×L(x,y)
the first formula is the Lambertian reflection model, where (x, y) are the coordinates of a pixel, F(x, y) is the pixel value at that coordinate, R(x, y) is the reflection coefficient of that pixel, and L(x, y) is the illumination intensity at imaging of that pixel.
(F ⊗ Filter_x)(x, y) / (F(x, y) + ε) = ( Σ_{i=1}^{P} R_i · L · Fx_i ) / ( R_c · L + ε ) ≈ Σ_{i=1}^{P} (R_i / R_c) · Fx_i
where ε is a small constant set to prevent the denominator from being 0, with empirical value 0.0001. L can be treated as a constant value because the illumination intensity varies slowly within small regions. R_c is the reflection coefficient of the central pixel of the face map within the 5×5 convolution region; P is the number of pixels of the face map within the 5×5 convolution region, namely 25; R_i is the reflection coefficient of the pixel with index i within the 5×5 convolution region; and Fx_i and Fy_i are respectively the constant values at index i of the convolution kernels Filter_x and Filter_y. The resulting INTF is a face texture feature related only to the reflection coefficient; it carries rich texture information and can serve as an effective feature for spoofing detection.
As shown in FIG. 4, the illumination-invariant texture features remove the illumination components while preserving rich texture details. Since the accuracy of existing detection algorithms drops sharply with changes in illumination and environment, filtering out the illumination components improves the generalization performance of the algorithm; moreover, compared with existing texture-based detection algorithms, the illumination-invariant texture features in this example preserve richer texture details, which benefits the detection of spoofing marks.
S3, merging the illumination invariant texture map and the color map into a face feature, wherein the specific steps comprise:
S31: normalize each channel of the illumination-invariant texture features with the following formula:
INTF_c' = ( INTF_c − Min(INTF_c) ) / ( Max(INTF_c) − Min(INTF_c) )
where Max(INTF_c) and Min(INTF_c) are respectively the maximum and minimum values of the single-channel illumination-invariant texture map INTF_c;
S32: merge the illumination-invariant texture features with the original image channels using the following formula:
Input = concat[ INTF_r, INTF_g, INTF_b, F_r, F_g, F_b ]
where Input is the 6-channel face features;
S4, applying data enhancement to the input features, with the following specific steps:
S41: randomly flip the 6-channel face features in the horizontal direction;
S42: randomly select a region of 0-28 pixels in size in the face features and set it to 0;
S43: apply data enhancement to the chromaticity, brightness, saturation and contrast of the face features;
S5, constructing the multi-scale texture module from multi-receptive-field central difference convolutions, specifically by extracting and fusing multi-scale local texture features with central difference convolutions of multiple receptive fields:
S51: based on observation of the illumination-invariant texture maps, spoofing clues are reflected in texture differences, so central difference convolution is adopted to extract local details; the specific formula of the central difference convolution is:
y(P_0) = (1 − θ) · Σ_{P_n ∈ R} ω(P_n) · x(P_0 + P_n) + θ · Σ_{P_n ∈ R} ω(P_n) · ( x(P_0 + P_n) − x(P_0) )
where the first term on the right of the equation is an ordinary convolution and the second term is the differential convolution between the center pixel and the neighboring pixels of the 3×3 convolution region; θ, the weight with which the two convolutions are added, is set to 0.7; P_0 is the center position of the 3×3 convolution region; P_n is a position index over the convolution region R; ω(P_n) is the weight at index P_n in the 3×3 convolution kernel; and x(P_0 + P_n) is the pixel value at index P_n in the face map.
As shown in fig. 5, subtracting the center pixel from all the pixels in the 3×3 region extracts local gradient information, which is well suited to extracting spoofing traces from the illumination-invariant texture features; the features extracted by the differential convolution and the ordinary convolution are added according to the weight θ as the output of the central difference convolution.
S52: considering that the field of view of local gradient information of the central difference convolution feeling is smaller, the local texture information obtained by the central difference convolution of different receptive fields is fused, and the multiscale local texture characteristics are fused, wherein the specific formula is expressed as follows:
Y(P 0 )≈concat[y(P 0 ,v=1),y(P 0 ,v=2),y(P 0 ,v=3),x(P 0 )]
wherein y (P) 0 V=r), r ε {1,2,3} is the output of the center difference convolution of different void fractions r, x (P) 0 ) Is the input of the multi-scale texture module, and Y (P 0 ) For the combination of the above features, approximately denoted as the output of the multi-scale texture module;
As shown in fig. 6, the input features pass through three branches of central difference convolutions with different dilation rates to obtain multi-scale texture information; after merging, the short-circuited input information is appended to form the output features. The specific dimensional changes of the features are as shown in fig. 6; for the branch with r=1 no dilation is set, i.e. it is a standard central difference convolution.
S6, embedding the multi-scale texture module into a lightweight network to construct a lightweight multi-scale texture network, combining the characteristics of different dimensions as shown in FIG. 2, inputting the characteristics into the lightweight multi-scale texture network for training, and fully utilizing the low-dimensional characteristics with strong cheating distinguishing capability to perform cheating detection; the formula of the three-time characteristic jump in the network is expressed as follows:
x_concat_64_128=concat[downsample64_64(x_128_64),x_64_64]
x_concat_32_256=concat[downsample32_32(x_64_128),x_32_128]
x_concat_16_384=concat[downsample16_16(x_32_256),x_16_128]
Taking the first term as an example: downsample64_64 denotes downsampling a feature to size 64×64, x_128_64 is a 64-channel feature of size 128×128, and x_concat_64_128 is the 128-channel feature of size 64×64 after channel merging.
In this embodiment, the specific steps of constructing the lightweight multi-scale texture network comprise:
S61: build the lightweight multi-scale texture network according to fig. 2, with the specific layers and parameter details of the network structure as shown in fig. 7;
S62: merge the first-pooled feature, downsampled to 64×64 (downsample64_64), with the feature channels after the second pooling as the input of multi-scale texture module 2;
S63: merge the preceding merged feature, downsampled to 32×32 (downsample32_32), with the feature channels after the third pooling as the input of multi-scale texture module 3;
S64: merge the preceding merged feature, downsampled to 16×16 (downsample16_16), with the feature channels after the fourth pooling as the input of Dropout;
S7, constructing the total loss function of network training. In this embodiment the total loss function is the weighted sum of the pixel-level mean-square-error loss and the two-class cross-entropy loss, with the optimal values of the weights α and β set to 0.8 and 0.2; the specific calculation formula is:
L = 0.8·L_maps + 0.2·L_cls
where L_maps is the pixel-level mean-square-error loss between the 0/1 map output by the lightweight multi-scale texture network and the 0/1 map label, and L_cls is the two-class cross-entropy loss between the 0-1 output and the 0-1 label; the 0/1 map label is an all-0 or all-1 map of size 16×16.
S8, as shown in FIG. 8, lightweight multi-scale texture network training with parameter updating and saving. This embodiment trains the lightweight multi-scale texture network built in step S6 with an Adam optimizer with weight decay 1e-4; the learning rate starts from 0.0002 and decays stepwise with the number of training iterations t, the decay applying a floor operation ⌊·⌋ to the iteration count; lr(t) is the learning rate at iteration t.
The features after the data enhancement of step S4 are fed into the lightweight multi-scale texture network; the loss value is calculated from the output predictions and the labels, the optimizer updates the network according to the loss value, and the lightweight multi-scale texture network and its weights are saved after training.
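A compact training-loop sketch consistent with this flow; model, train_loader and the total_loss/optimizer/scheduler objects are the assumed names from the earlier sketches, and the epoch count and file name are illustrative:

import torch

for epoch in range(60):                               # assumed epoch count
    for feats, map_label, label in train_loader:
        map_pred, logits = model(feats)               # 16x16 map + 2-class logits
        loss = total_loss(map_pred, map_label, logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                              # per-iteration lr decay
torch.save(model.state_dict(), "ms_texture_net.pth")  # hypothetical file name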
S9, model test: as shown in fig. 9, face images of the video data to be detected are acquired and the input features are obtained following the steps above; the input features are fed into the saved lightweight multi-scale texture network, all test data are predicted as 0/1 maps following the test flow chart, and the mean value is taken as the classification result.
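An inference sketch matching this test flow; the mean of the predicted 0/1 map serves as the score, and the 0.5 decision threshold is an assumption:

import torch

model.load_state_dict(torch.load("ms_texture_net.pth"))
model.eval()
with torch.no_grad():
    map_pred, _ = model(test_feats)    # test_feats: Nx6xHxW input features
    score = map_pred.mean().item()     # near 1 -> real face, near 0 -> attack
    predicted_real = score > 0.5       # assumed threshold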
The performance of the face spoofing detection algorithm of this embodiment is evaluated with the False Acceptance Rate (FAR), the False Rejection Rate (FRR), the Equal Error Rate (EER) and the Half Total Error Rate (HTER). These indicators are explained with the confusion matrix of Table 1 below:
TABLE 1 Confusion matrix

Label \ Prediction    Predicted true    Predicted false
Label true            TA                FR
Label false           FA                TR
The False Acceptance Rate (FAR) is the proportion of samples labeled as non-living faces that are judged to be living faces:
FAR = FA / (FA + TR)
The False Rejection Rate (FRR) is the proportion of samples labeled as living faces that are judged to be non-living faces:
FRR = FR / (TA + FR)
The Equal Error Rate (EER) is the error rate at the point where FRR equals FAR.
The Half Total Error Rate (HTER) is the average of FRR and FAR:
HTER = (FAR + FRR) / 2
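These indicators can be computed from per-video scores as in the following standard sketch (not code from the patent):

import numpy as np

def far_frr(scores, labels, thr):
    """labels: 1 for living faces, 0 for attacks; scores: higher = more live."""
    live, spoof = scores[labels == 1], scores[labels == 0]
    far = np.mean(spoof >= thr)        # FA / (FA + TR)
    frr = np.mean(live < thr)          # FR / (TA + FR)
    return far, frr

def eer(scores, labels):
    thrs = np.unique(scores)
    pairs = np.array([far_frr(scores, labels, t) for t in thrs])
    i = np.argmin(np.abs(pairs[:, 0] - pairs[:, 1]))   # where FAR ~ FRR
    return pairs[i].mean()

def hter(scores, labels, thr):
    far, frr = far_frr(scores, labels, thr)
    return (far + frr) / 2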
the embodiment also provides a face spoofing detection system based on the multi-scale illumination invariance texture feature, which comprises: the system comprises a face region color channel image acquisition module, an illumination separation texture retaining module, an illumination invariant texture map and color map merging module, an input characteristic data enhancement module, a multi-scale texture module construction module, a light-weight multi-scale texture network construction module, a total loss construction module, a light-weight multi-scale texture network training module, a parameter retaining module and a detection module;
in this embodiment, the face region color channel image acquisition module is configured to perform face positioning on a video after framing by using an MTCNN face recognition algorithm to obtain a face region image, and separate face image channels to obtain a face image of three color channels of red, green and blue;
in this embodiment, the illumination separation texture preserving module is configured to process the face images of the three color channels of red, green and blue to obtain corresponding illumination invariant texture images;
in this embodiment, the illumination invariant texture map and color map merging module is configured to normalize the illumination invariant texture map and merge the normalized illumination invariant texture map with three color channel face maps of red, green and blue to form a 6-channel face feature;
in this embodiment, the input feature data enhancement module is configured to perform random horizontal overturn, local clipping, chroma, brightness, saturation, contrast adjustment, and other data enhancement on the input 6-channel face feature;
in this embodiment, the multi-scale texture module construction module is used for constructing a multi-scale texture module by central difference convolution of multiple receptive fields;
in this embodiment, the lightweight multi-scale texture network construction module is used for embedding the multi-scale texture module into a lightweight network and skip-connecting the low-, mid- and high-dimensional network layers to construct the lightweight multi-scale texture network;
in this embodiment, the total loss building module is configured to set a total loss function of the network training, where the total loss function is equal to a weighted sum of a pixel-level mean square error loss and a two-class cross entropy loss;
in this embodiment, the lightweight multi-scale texture network training module is used for feeding the data-enhanced 6-channel face features into the lightweight multi-scale texture network to learn intrinsic texture spoofing features, training the network with minimization of the loss function as the objective;
in this embodiment, the parameter storage module is configured to update parameters of the network model with an Adam optimizer according to the loss function, and store the lightweight multi-scale texture network model and the parameters after training is completed;
in this embodiment, the detection module is configured to extract a face image of video data to be detected, combine the original image and the normalized illumination invariant texture image, enhance the data, and input the enhanced data into a saved lightweight multi-scale texture network, and take the average value of the predicted 0/1 image as a classification result.
To demonstrate the effectiveness of the invention and test the generalization performance of the method, intra-database and cross-database experiments were carried out on the CASIA-MFSD, Replay-Attack and MSU-MFSD databases. The intra-database and cross-database results are shown in Tables 2 and 3 respectively:
Table 2: intra-database experimental results
Table 3: cross-database experimental results
As can be seen from Table 2, compared with current methods the intra-database half total error rate and equal error rate are lower, showing excellent spoofing detection performance within a database; as can be seen from Table 3, the half total error rate of cross-database detection is also lower than that of current methods. Compared with texture-analysis methods such as LBP, HoG, SIFT and SURF, the method reduces the influence of scene and illumination changes; compared with illumination-invariant features such as MSR (Multi-Scale Retinex) and reflectance maps, the illumination-invariant texture features retain rich texture information and are material-attribute features, so spoofing marks can be detected effectively. The experimental results prove that while maintaining high intra-database accuracy, the method greatly reduces the cross-database error rate and markedly improves generalization performance.
To demonstrate the lightweight effectiveness of the model of the invention, the lightweight multi-scale texture network used in the invention is compared in parameter count and computation amount with currently popular deep learning network frameworks; the experimental results are shown in Table 4 below:
Table 4: comparison of parameter counts and computation amounts
As can be seen from the experimental results, the method has the smallest parameter count and the lightest model; in terms of computation it requires less than all the other compared models except MobileNet with its separable convolutions. The method therefore reduces the algorithm's memory and computation requirements and is convenient to deploy on mobile phones, monitoring equipment and embedded terminals.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be an equivalent replacement and is included in the protection scope of the present invention.

Claims (9)

1. A face spoofing detection method based on multi-scale illumination-invariant texture features, characterized by comprising the following steps:
after framing the video, cropping the face image and separating channels to obtain a plurality of color channel maps;
passing the color channel maps through an illumination-separation texture-retaining module to obtain illumination-invariant texture feature maps;
wherein the step of passing the color channel maps through the illumination-separation texture-retaining module to obtain the illumination-invariant texture feature maps specifically comprises:
computing, from the face map of each color channel, the corresponding illumination-invariant texture maps INTF_r, INTF_g, INTF_b:
INTF_c = arctan( sqrt( [ (F_c ⊗ Filter_x) / (F_c + ε) ]^2 + [ (F_c ⊗ Filter_y) / (F_c + ε) ]^2 ) )
where c indexes the red, green and blue color channels, ⊗ denotes convolution, and Filter_x and Filter_y are respectively the 5×5 convolution kernels in the horizontal and vertical directions of the pattern of local gravitational force (PLGF);
the convolution operation over the color-channel face map is characterized by the following two formulas:
F(x,y)=R(x,y)×L(x,y)
the first formula is the Lambertian reflection model, where (x, y) are the coordinates of a pixel, F(x, y) is the pixel value at that coordinate, R(x, y) is the reflection coefficient of that pixel, and L(x, y) is the illumination intensity at imaging of that pixel;
(F ⊗ Filter_x)(x, y) / (F(x, y) + ε) = ( Σ_{i=1}^{P} R_i · L · Fx_i ) / ( R_c · L + ε ) ≈ Σ_{i=1}^{P} (R_i / R_c) · Fx_i
where ε is a small constant set to prevent the denominator from being 0; L is treated as a constant value; R_c is the reflection coefficient of the central pixel of the face map within the convolution region; P is the number of pixels of the face map within the convolution region; R_i is the reflection coefficient of the pixel with index i within the convolution region; Fx_i and Fy_i are respectively the constant values at index i of the convolution kernels Filter_x and Filter_y; and INTF is a face texture feature related only to the reflection coefficient;
merging the normalized illumination-invariant texture feature maps and the color channel maps into face features;
applying real-time data enhancement to the face features to obtain the input features to be trained;
constructing a multi-scale texture module from multi-receptive-field central difference convolutions;
embedding the multi-scale texture module into a lightweight network to construct a lightweight multi-scale texture network;
weighting the pixel-level mean-square-error loss and the two-class cross-entropy loss as the loss function of network training;
feeding the enhanced input features into the lightweight multi-scale texture network to learn intrinsic texture spoofing features, training the network with minimization of the loss function as the objective;
updating the network parameters with an optimizer according to the loss function, and saving the lightweight multi-scale texture network model and parameters after training;
and acquiring face images of the video data to be detected, feeding the resulting input features into the saved lightweight multi-scale texture network, and predicting the classification result.
2. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the step of cropping the face image after framing the video and separating channels to obtain a plurality of color channel maps comprises:
framing the video data to obtain image frames, detecting the face region with the MTCNN face recognition algorithm, and cropping to a unified size to obtain the face image; the face image is in RGB format, and channel separation yields the red, green and blue color channel face maps F_r, F_g and F_b respectively.
3. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the illumination-invariant texture feature maps are normalized and then merged with the color channel maps into face features, the normalization being calculated as:
INTF_c' = ( INTF_c − Min(INTF_c) ) / ( Max(INTF_c) − Min(INTF_c) )
where Max(INTF_c) and Min(INTF_c) are respectively the maximum and minimum values of the single-channel illumination-invariant texture map INTF_c;
the merge with the color channel maps is expressed as:
Input = concat[ INTF_r, INTF_g, INTF_b, F_r, F_g, F_b ]
where Input denotes the merged face features.
4. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the step of applying real-time data enhancement to the face features to obtain the input features to be trained specifically comprises:
applying a random horizontal flip, randomly selecting a region of 0-28 pixels in size and setting its pixel values to 0, and applying data enhancement to chromaticity, brightness, saturation and contrast.
5. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the step of constructing a multi-scale texture module from multi-receptive-field central difference convolutions comprises extracting and fusing multi-scale local texture features with central difference convolutions of multiple receptive fields;
the central difference convolution is specifically expressed as:
y(P_0) = (1 − θ) · Σ_{P_n ∈ R} ω(P_n) · x(P_0 + P_n) + θ · Σ_{P_n ∈ R} ω(P_n) · ( x(P_0 + P_n) − x(P_0) )
where the first term on the right of the equation is an ordinary convolution and the second term is the differential convolution between the center pixel and the neighboring pixels of the convolution region; θ is the weight with which the two convolutions are added; P_0 is the center position of the convolution region; P_n is a position index over the convolution region R; ω(P_n) is the weight at index P_n in the convolution kernel; and x(P_0 + P_n) is the pixel value at index P_n in the face map;
the fused multi-scale local texture features are expressed as:
Y(P_0) ≈ concat[ y(P_0, v=1), y(P_0, v=2), y(P_0, v=3), x(P_0) ]
where y(P_0, v=r), r ∈ {1,2,3}, is the output of the central difference convolution with dilation rate r; x(P_0) is the input of the multi-scale texture module; and Y(P_0), the merge of these features, approximately denotes the output of the multi-scale texture module.
6. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein embedding the multi-scale texture module into a lightweight network to construct the lightweight multi-scale texture network specifically comprises:
inputting combinations of features of different dimensions into the lightweight multi-scale texture network for training, where the combinations of features of different dimensions refer to skip-connected low-, mid- and high-dimensional features of the network, specifically expressed as:
x_concat_64_128=concat[downsample64_64(x_128_64),x_64_64]
x_concat_32_256=concat[downsample32_32(x_64_128),x_32_128]
x_concat_16_384=concat[downsample16_16(x_32_256),x_16_128]
where downsample denotes feature downsampling, x is a channel feature, and x_concat is the feature after channel merging.
7. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the pixel-level mean-square-error loss and the two-class cross-entropy loss are weighted as the loss function of network training, specifically calculated as:
L = αL_maps + βL_cls
where L_maps is the pixel-level mean-square-error loss between the 0/1 map output by the lightweight multi-scale texture network and the 0/1 map label, and L_cls is the two-class cross-entropy loss between the 0-1 output and the 0-1 label; the 0/1 map label is an all-0 or all-1 map of size 16×16, and α and β represent the weights.
8. The face spoofing detection method based on multi-scale illumination-invariant texture features of claim 1, wherein the network parameters are updated with an optimizer according to the loss function, using an Adam optimizer with weight decay 1e-4; the learning rate starts from 0.0002 and decays stepwise with the number of training iterations t, the decay applying a floor operation ⌊·⌋ to the iteration count; lr(t) is the learning rate at iteration t.
9. A face spoofing detection system based on multi-scale illumination-invariant texture features, comprising:
the system comprises a face region color channel image acquisition module, an illumination separation texture retaining module, an illumination invariant texture map and color map merging module, an input characteristic data enhancement module, a multi-scale texture module construction module, a light-weight multi-scale texture network construction module, a total loss construction module, a light-weight multi-scale texture network training module, a parameter retaining module and a detection module;
the face region color channel image acquisition module is used for carrying out face positioning after framing the video to obtain a face region image, and separating the face image channels to obtain a multi-color channel face image;
the illumination-separation texture-retaining module is used for processing the face maps of the plurality of color channels respectively to obtain the corresponding illumination-invariant texture maps;
wherein the plurality of color channel maps pass through the illumination-separation texture-retaining module to obtain the illumination-invariant texture feature maps, specifically as follows:
computing, from the face map of each color channel, the corresponding illumination-invariant texture maps INTF_r, INTF_g, INTF_b:
INTF_c = arctan( sqrt( [ (F_c ⊗ Filter_x) / (F_c + ε) ]^2 + [ (F_c ⊗ Filter_y) / (F_c + ε) ]^2 ) )
where c indexes the red, green and blue color channels, ⊗ denotes convolution, and Filter_x and Filter_y are respectively the 5×5 convolution kernels in the horizontal and vertical directions of the pattern of local gravitational force (PLGF);
the convolution operation over the color-channel face map is characterized by the following two formulas:
F(x,y)=R(x,y)×L(x,y)
the first formula is the Lambertian reflection model, where (x, y) are the coordinates of a pixel, F(x, y) is the pixel value at that coordinate, R(x, y) is the reflection coefficient of that pixel, and L(x, y) is the illumination intensity at imaging of that pixel;
[second formula given in the source only as equation image FDA0004185877820000052: the expansion of INTF in terms of R_c, R_i, Fx_i, Fy_i, P and ε defined below, in which the illumination term L cancels]
where ε is a small constant that prevents the denominator from being 0, L is treated as a constant within the convolution region, R_c is the reflection coefficient of the central pixel of the face picture in the convolution region, P is the number of pixels of the face picture in the convolution region, R_i is the reflection coefficient of the pixel with index i in the convolution region, Fx_i and Fy_i are the constant values of the convolution kernels Filterx and Filtery corresponding to index i, and INTF is the face texture feature related only to the reflection coefficient;
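Because both formulas appear in the source only as equation images, the following Python sketch merely illustrates the idea: with F = R·L and L constant over the 5×5 region, a ratio of the Filterx and Filtery convolution responses cancels L, leaving a texture that depends only on the reflectance. The arctan-of-ratio form and the placeholder kernels are assumptions, not the patent's actual local-gravity-mode constants:

import numpy as np
from scipy.signal import convolve2d

def intf_channel(f, filterx, filtery, eps=1e-8):
    # f: HxW array, one color channel of the face image (F = R * L).
    # filterx, filtery: 5x5 kernels standing in for the Fx_i, Fy_i constants.
    gx = convolve2d(f, filterx, mode="same", boundary="symm")
    gy = convolve2d(f, filtery, mode="same", boundary="symm")
    # With L constant over the window, gx = L * sum(Fx_i * R_i) and
    # gy = L * sum(Fy_i * R_i), so L cancels in the ratio and the result
    # depends only on the reflection coefficients. The arctan form is an
    # assumed stand-in; eps keeps the denominator away from 0, as in the claim.
    return np.arctan(gy / (gx + eps))

# Per-channel usage with placeholder gradient-like kernels:
fx = np.outer([1, 2, 4, 2, 1], [-1, -2, 0, 2, 1]).astype(float)
fy = fx.T
intf_r = intf_channel(np.random.rand(256, 256), fx, fy)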
the illumination invariant texture map and color map merging module is used for merging the normalized illumination-invariant texture feature maps and the color channel maps into the face features;
the input feature data enhancement module is used for applying real-time data augmentation to the face features, which then serve as the input features to be trained on;
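The claim does not enumerate the augmentations; as an illustrative sketch, common real-time (applied on the fly during training) choices can be expressed with torchvision:

import torch
from torchvision import transforms

# Flip and crop are illustrative choices only; the merged face feature is
# assumed here to have 6 channels (3 color + 3 INTF maps).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomResizedCrop(size=256, scale=(0.9, 1.0), antialias=True),
])

face_features = torch.rand(6, 256, 256)
augmented = augment(face_features)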
the multi-scale texture module construction module is used for constructing a multi-scale texture module from central difference convolutions with multiple receptive fields;
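Central difference convolution is a published operator (Yu et al., CVPR 2020); a common implementation subtracts a θ-weighted central term from a vanilla convolution. The multi-receptive-field arrangement below, parallel branches with different dilations concatenated along the channel axis, is an assumed sketch of such a module, not necessarily the patent's exact topology:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Module):
    # Central difference convolution: vanilla conv minus theta * central term.
    def __init__(self, in_ch, out_ch, k=3, dilation=1, theta=0.7):
        super().__init__()
        pad = dilation * (k - 1) // 2  # keep the spatial size
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=pad,
                              dilation=dilation, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)
        # sum_n w(p_n) * x(p_0) equals a 1x1 conv with the spatially summed kernel.
        w_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, w_sum)

class MultiScaleTexture(nn.Module):
    # Assumed sketch: parallel CDCs with different receptive fields, concatenated.
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            [CDConv2d(in_ch, branch_ch, k=3, dilation=d) for d in (1, 2, 3)])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

y = MultiScaleTexture(64, 32)(torch.randn(1, 64, 32, 32))  # -> (1, 96, 32, 32)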
the lightweight multi-scale texture network construction module is used for embedding the multi-scale texture module into a lightweight network to construct the lightweight multi-scale texture network;
the total loss construction module is used for weighting the pixel-level mean squared error loss and the binary cross-entropy loss to form the loss function for network training;
the lightweight multi-scale texture network training module is used for feeding the augmented input features to the lightweight multi-scale texture network so that it learns the intrinsic texture cues of spoofing, training the network with the goal of minimizing the loss function;
the parameter storage module is used for updating the network parameters with the optimizer according to the loss function, and for saving the lightweight multi-scale texture network model and its parameters once training is completed;
the detection module is used for extracting face images from the video data to be detected, merging the original image with the normalized illumination-invariant texture maps, feeding the enhanced data into the saved lightweight multi-scale texture network, and predicting the classification result.
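Pulling the modules together, detection on new video data could look like the sketch below; every helper passed in (face_detector, compute_intf, normalize) is a hypothetical stand-in for the modules described above, and averaging the frame scores into a per-video decision is an assumption, since the claim only states that a classification result is predicted:

import torch

def detect_video(frames, model, face_detector, compute_intf, normalize):
    # frames: iterable of HxWx3 uint8 video frames.
    scores = []
    for frame in frames:
        face = face_detector(frame)  # face localization and cropping
        rgb = torch.as_tensor(face).permute(2, 0, 1).float() / 255.0
        # Merge the original color channels with the normalized INTF maps.
        x = torch.cat([rgb, normalize(compute_intf(rgb))], dim=0)
        with torch.no_grad():
            pred_map, logit = model(x.unsqueeze(0))  # two-head network assumed
        scores.append(torch.sigmoid(logit).item())
    return sum(scores) / len(scores)  # assumed frame-score aggregation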
CN202011577117.8A 2020-12-28 2020-12-28 Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics Active CN112580576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011577117.8A CN112580576B (en) 2020-12-28 2020-12-28 Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics

Publications (2)

Publication Number Publication Date
CN112580576A (en) 2021-03-30
CN112580576B (en) 2023-06-20

Family

ID=75140110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011577117.8A Active CN112580576B (en) 2020-12-28 2020-12-28 Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics

Country Status (1)

Country Link
CN (1) CN112580576B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076876B (en) * 2021-04-02 2023-01-31 华南理工大学 Face spoofing detection method and system based on three-dimensional structure supervision and confidence weighting
CN113312965B (en) * 2021-04-14 2023-04-28 重庆邮电大学 Living body detection method and system for unknown face spoofing attacks
CN113343770B (en) * 2021-05-12 2022-04-29 武汉大学 Face anti-counterfeiting method based on feature screening
CN113139520B (en) * 2021-05-14 2022-07-29 江苏中天互联科技有限公司 Equipment diaphragm performance monitoring method for industrial Internet
CN113255562B (en) * 2021-06-10 2023-10-20 中山大学 Human face living body detection method and system based on illumination difference elimination
CN114724220A (en) * 2022-04-12 2022-07-08 广州广电卓识智能科技有限公司 Living body detection method, living body detection device, and readable medium
CN115273186A (en) * 2022-07-18 2022-11-01 中国人民警察大学 Deepfake face video detection method and system based on image feature fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101236598A (en) * 2007-12-28 2008-08-06 北京交通大学 Independent component analysis human face recognition method based on multi-scale total variation based quotient image
CN102024141A (en) * 2010-06-29 2011-04-20 上海大学 Face recognition method based on Gabor wavelet transform and local binary pattern (LBP) optimization
CN102298709A (en) * 2011-09-07 2011-12-28 江西财经大学 Energy-saving intelligent identification digital signage fused with multiple characteristics in complicated environment
CN110309798A (en) * 2019-07-05 2019-10-08 中新国际联合研究院 Face spoofing detection method based on domain adaptive learning and domain generalization
CN111274967A (en) * 2020-01-20 2020-06-12 重庆邮电大学 Face spoofing detection method based on color analysis
CN111460931A (en) * 2020-03-17 2020-07-28 华南理工大学 Face spoofing detection method and system based on color channel difference image characteristics

Also Published As

Publication number Publication date
CN112580576A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112580576B (en) Face spoofing detection method and system based on multi-scale illumination invariance texture characteristics
US20220092882A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN103116763B Living body face detection method based on HSV color space statistical characteristics
US8761446B1 (en) Object detection with false positive filtering
CN114783003B (en) Pedestrian re-identification method and device based on local feature attention
CN108399362A Rapid pedestrian detection method and device
CN111046964B (en) Convolutional neural network-based human and vehicle infrared thermal image identification method
CN110348319A Face anti-counterfeiting method based on fusion of face depth information and edge images
CN109948566B (en) Double-flow face anti-fraud detection method based on weight fusion and feature selection
CN103020992B Video image saliency detection method based on motion color associations
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN111401324A (en) Image quality evaluation method, device, storage medium and electronic equipment
CN108446690B (en) Human face in-vivo detection method based on multi-view dynamic features
CN108764096B (en) Pedestrian re-identification system and method
CN108985200A Non-cooperative liveness detection algorithm based on terminal devices
CN107133558A Infrared pedestrian saliency detection method based on probability propagation
CN111401213A (en) Flame detection experimental device and method for chemical gas detection
CN111507416A (en) Smoking behavior real-time detection method based on deep learning
Chien et al. Detecting nonexistent pedestrians
CN102148919A (en) Method and system for detecting balls
CN113468954B (en) Face counterfeiting detection method based on local area features under multiple channels
CN114067275A (en) Target object reminding method and system in monitoring scene and electronic equipment
CN108460751A 4K ultra-high-definition image quality evaluation method based on the visual nerve
CN112417961A (en) Sea surface target detection method based on scene prior knowledge
CN111126283A (en) Rapid in-vivo detection method and system for automatically filtering fuzzy human face

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant