CN113269167B - Face counterfeiting detection method based on image blocking and disordering - Google Patents

Face counterfeiting detection method based on image blocking and disordering Download PDF

Info

Publication number
CN113269167B
CN113269167B CN202110810798.6A CN202110810798A CN113269167B CN 113269167 B CN113269167 B CN 113269167B CN 202110810798 A CN202110810798 A CN 202110810798A CN 113269167 B CN113269167 B CN 113269167B
Authority
CN
China
Prior art keywords
image
face
image block
feature
space
Prior art date
Legal status
Active
Application number
CN202110810798.6A
Other languages
Chinese (zh)
Other versions
CN113269167A (en)
Inventor
练智超
刘思佟
李千目
李硕豪
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202110810798.6A
Publication of CN113269167A
Application granted
Publication of CN113269167B

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face forgery detection method based on image block scrambling, comprising the following steps: perform face detection on the input image using the face detection model BlazeFace and crop the face region; scale the cropped image to a fixed size and partition it into blocks; scramble the pixels within each image block to generate a new image; extract multilayer features of the generated image using the backbone network EfficientNet-B3; further discriminate the feature information using an image-block space-channel attention mechanism, extract high-level semantic information, and obtain image-block-level features; and input the obtained image-block-level features into a fully connected layer to obtain the probability that the image is forged. The invention effectively improves the accuracy of face forgery detection, enhances the generalization ability of the detection model, and mitigates the severe performance degradation of the detection task when facing unknown forgery methods.

Description

Face counterfeiting detection method based on image blocking and disordering
Technical Field
The invention relates to the technical field of face forgery image detection, and in particular to a face forgery detection method based on image block scrambling.
Background
In recent years, face forgery technology has been on the rise on the Internet, with face replacement technology represented by Deepfakes being especially prominent. This technology can replace a target face in an image to produce a specific false image. With the continuous development of deep learning, deep forgery technology can synthesize highly realistic fake images from only a few face photographs; such images are strongly deceptive and difficult to tell from genuine ones. If lawless persons spread such false information on the network, social public opinion will be seriously affected.
The classical face replacement operation can be divided into three stages: 1) face detection; 2) face synthesis; 3) blending. Existing face replacement forgery methods cover all stages and generate highly realistic forged faces by means of complex algorithms. Early face replacement relied primarily on manual editing of images, for example with image editing software such as PhotoShop. With the development of computer graphics, graphics-based face-swapping techniques began to attract attention. FaceSwap extracts key points of the human face to render the corresponding positions of a 3D model, reduces the difference between the projected shape of the target and the key-point positions, and finally blends the rendered image with the original image and applies color correction to obtain the final image. Nirkin et al (Yuval Nirkin, Y. Keller, and Tal Hassner. FSGAN: Subject agnostic face swapping and reenactment [C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 7183-7192.) segment the face, estimate its 3D shape, and finally fuse and align the faces to obtain the forgery. Graphics-based face swapping generally has a high threshold and high cost; with the development of deep learning, the application of technologies such as autoencoders and generative adversarial networks has greatly reduced the cost of face forgery. Deepfakes encodes different human faces with a shared encoder, then trains two decoders to separately learn to reconstruct the different faces; swapping the decoders completes the face replacement. Faceswap-GAN adds adversarial training on top of Deepfakes, introducing a discriminator adversarial loss that greatly improves the quality of the generated images.
Most current detection schemes treat face replacement forgery as a binary classification problem and detect forged faces with a deep convolutional network trained in a data-driven manner. Zhou et al (Peng Zhou, Xintong Han, Vlad I. Morariu, and L. Davis. Two-stream neural networks for tampered face detection [C]//2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 1831-1839.) designed a two-stream network to detect forged images: the first stream is a CNN-based binary classifier, and the second extracts patch-level hidden features to capture low-level camera characteristics such as local noise. Li et al (Yuezun Li and Siwei Lyu. Exposing DeepFake videos by detecting face warping artifacts [C]//2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018.) found that Deepfakes leaves distinctive artifacts after affine warping and achieved effective detection with a classical classifier. Matern et al (Falko Matern, C. Riess, and M. Stamminger. Exploiting visual artifacts to expose deepfakes and face manipulations [C]//2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2019, pp. 83-92.) detect forged faces from subtle differences in visual features such as eyes, teeth, and contours. Li et al (Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and B. Guo. Face X-ray for more general face forgery detection [C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5000-5009.) detect forged images by locating the boundary where the forged face is blended with the original image. However, these detection methods rely heavily on data, are often effective only against specific forgery methods, have limited generalization ability, and suffer significant performance drops when facing unknown forgery methods.
Patent CN112784781A discloses a forged face detection method and apparatus based on difference-perception meta-learning, but the backbone network it uses is complex, its running time is long, and it places high demands on the hardware environment; it also does not remove the dependence on data, needing data covering a variety of forgery methods to assist training. Patent CN112395943A discloses a deep-learning-based forged face video detection method that performs forgery detection with a capsule network model; although it achieves a certain robustness, it still does not remove the dependence on data.
In summary, because current face forgery detection methods depend on data, the detection models are limited to specific forgery methods, their generalization ability is limited, and their detection performance drops markedly when facing unknown forgery methods; moreover, the networks are complex and computationally heavy, which hinders training and deployment.
Disclosure of Invention
The invention aims to provide a face forgery detection method based on image block scrambling, which enhances the generalization ability of the detection model through image block scrambling and a space-channel attention mechanism, preserves the detection performance on the original dataset, and obtains accurate face image forgery detection results.
The technical solution for realizing the purpose of the invention is as follows: a face forgery detection method based on image block scrambling, comprising the following steps:
Step 1: perform face detection on the input image using the face detection model BlazeFace, and crop the face region;
Step 2: scale the cropped image to a fixed size and partition the image into blocks;
Step 3: scramble the pixels within each image block to generate a new image;
Step 4: extract multilayer features of the new image generated in step 3 using the backbone network EfficientNet-B3;
Step 5: further discriminate the feature information using the image-block space-channel attention mechanism, extract high-level semantic information, and obtain image-block-level features;
Step 6: input the image-block-level features obtained in step 5 into a fully connected layer to obtain the probability that the image is forged.
Compared with the prior art, the invention has the following remarkable advantages:
(1) the image block scrambling method destroys the tiny flaws contained in forged images, helping the model reduce its dependence on unstable fine-grained features and enhancing its generalization ability;
(2) the spatial attention mechanism weights the low-level features extracted by the network in space, and the channel attention mechanism weights the high-level semantic information, enhancing the detection ability of the network and increasing its attention to image-block-level features;
(3) the detection performance on the original dataset is preserved, and accurate face image forgery detection results can be obtained.
Drawings
Fig. 1 is a flow chart of the face forgery detection method based on image block scrambling of the present invention.
Fig. 2 is a diagram illustrating the intra-block scrambling method used in the present invention.
Fig. 3 is a schematic diagram of the integration of the space-channel attention modules with the neural network used in the present invention.
Fig. 4 is a schematic diagram of the spatial attention module used in the present invention.
Fig. 5 is a schematic diagram of the channel attention module used in the present invention.
Detailed Description
The invention relates to a face forgery detection method based on image block scrambling, comprising the following steps:
Step 1: perform face detection on the input image using the face detection model BlazeFace, and crop the face region;
Step 2: scale the cropped image to a fixed size and partition the image into blocks;
Step 3: scramble the pixels within each image block to generate a new image;
Step 4: extract multilayer features of the new image generated in step 3 using the backbone network EfficientNet-B3;
Step 5: further discriminate the feature information using the image-block space-channel attention mechanism, extract high-level semantic information, and obtain image-block-level features;
Step 6: input the image-block-level features obtained in step 5 into a fully connected layer to obtain the probability that the image is forged.
Further, in step 1, the face detection model BlazeFace is used to perform face detection on the input image and the face region is cropped, specifically as follows:
Perform face detection on the input image I using the face detection model BlazeFace to obtain the region I_f where the face is located:
I_f = I(x_f, y_f, w_f, h_f) (1)
where x_f and y_f denote the abscissa and ordinate of the center of the detected face, w_f denotes the width of the detected face, and h_f denotes its height; I(x_f, y_f, w_f, h_f) denotes the region of image I centered at (x_f, y_f) with width w_f and height h_f.
The face cropping region I_c of image I is:
I_c = I(x_f, y_f, 1.2×w_f, 1.2×h_f) (2)
i.e., the cropping region I_c is 1.2 times the size of the face region I_f.
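As a concrete illustration of step 1, the following minimal Python sketch uses MediaPipe's FaceDetection solution as the BlazeFace implementation (an assumption; the patent names the model but not a library), converts the returned top-left relative bounding box to the center form of Eq. (1), and applies the 1.2× expansion of Eq. (2). The crop_face helper and the clamping to image bounds are illustrative additions.

```python
import cv2
import mediapipe as mp

def crop_face(image_bgr, scale=1.2):
    """Detect a face with BlazeFace (via MediaPipe) and crop a region
    `scale` times the detected box, per Eqs. (1)-(2)."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_detection.FaceDetection(model_selection=0) as det:
        result = det.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.detections:
        return None
    box = result.detections[0].location_data.relative_bounding_box
    # Convert the relative top-left box to the center form (x_f, y_f, w_f, h_f).
    wf, hf = box.width * w, box.height * h
    xf, yf = box.xmin * w + wf / 2, box.ymin * h + hf / 2
    # Eq. (2): crop a region 1.2x the detected face, clamped to the image.
    wc, hc = scale * wf, scale * hf
    x0, y0 = max(0, int(xf - wc / 2)), max(0, int(yf - hc / 2))
    x1, y1 = min(w, int(xf + wc / 2)), min(h, int(yf + hc / 2))
    return image_bgr[y0:y1, x0:x1]
```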
Further, step 2 scales the cropped image to a fixed size and partitions the image into blocks, specifically as follows:
Scale the image to 224 pixels in both width and height using bilinear interpolation, and divide the scaled image into blocks with a side length of 4 pixels, giving 56 × 56 = 3136 blocks in total.
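A minimal sketch of step 2, assuming OpenCV for the bilinear resize; the reshape/swapaxes pair views the 224 × 224 image as a 56 × 56 grid of 4 × 4 blocks.

```python
import cv2
import numpy as np

def scale_and_block(face_img, size=224, block=4):
    # Bilinear resize to a fixed 224 x 224, then view the result as a grid
    # of 56 x 56 = 3136 non-overlapping 4 x 4 blocks.
    img = cv2.resize(face_img, (size, size), interpolation=cv2.INTER_LINEAR)
    n = size // block                          # 56 blocks per side
    blocks = img.reshape(n, block, n, block, -1).swapaxes(1, 2)
    return blocks                              # shape (56, 56, 4, 4, C)
```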
Further, in step 3, the pixels within each image block are scrambled to generate a new image, specifically as follows:
For each 4 × 4 image block, let p(i) be the ith pixel in the block, i ∈ {1, 2, …, 16}; the scrambled pixel value p'(i) is expressed as:
p'(i) = p(α_i) (3)
where α is generated by randomly permuting the elements of the vector [1, 2, …, 16], and is regenerated for each image block; α_i denotes the ith element of the vector α.
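A minimal sketch of step 3, operating on the block grid produced by the sketch above; NumPy draws a fresh permutation α for every block, implementing Eq. (3), and the scrambled blocks are reassembled into a full image.

```python
import numpy as np

def scramble_blocks(blocks, rng=None):
    # blocks: (56, 56, 4, 4, C) grid from the blocking step.
    rng = rng or np.random.default_rng()
    nb1, nb2, b, _, c = blocks.shape
    flat = blocks.reshape(nb1, nb2, b * b, c).copy()
    for i in range(nb1):
        for j in range(nb2):
            alpha = rng.permutation(b * b)     # p'(i) = p(alpha_i), Eq. (3)
            flat[i, j] = flat[i, j][alpha]
    # Reassemble the scrambled blocks into a 224 x 224 x C image.
    img = (flat.reshape(nb1, nb2, b, b, c)
               .swapaxes(1, 2)
               .reshape(nb1 * b, nb2 * b, c))
    return img
```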
Further, in step 4, the multilayer features of the new image generated in step 3 are extracted using the backbone network EfficientNet-B3, specifically as follows:
For the new image generated in step 3, first extract partial low-level features F_1 using the first 2 convolution blocks of the original EfficientNet-B3 network, then extract the global feature F_2 with the remaining convolution blocks. For an input image of size 224 × 224, the low-level feature F_1 has spatial size 112 × 112 with 24 channels, and the global feature F_2 has spatial size 7 × 7 with 1536 channels.
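A hedged sketch of step 4: the patent does not name an implementation, so torchvision's efficientnet_b3 is assumed here. Splitting its features module after the first two entries (the stem plus the first MBConv stage) reproduces the stated shapes: F_1 of size 112 × 112 with 24 channels and F_2 of size 7 × 7 with 1536 channels for a 224 × 224 input.

```python
import torch
from torchvision.models import efficientnet_b3, EfficientNet_B3_Weights

backbone = efficientnet_b3(weights=EfficientNet_B3_Weights.DEFAULT).features
low_stages, high_stages = backbone[:2], backbone[2:]   # split after 2 blocks

x = torch.randn(1, 3, 224, 224)   # a scrambled input image
f1 = low_stages(x)                # F_1: (1, 24, 112, 112)
f2 = high_stages(f1)              # F_2: (1, 1536, 7, 7)
```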
Further, step 5 further discriminates the feature information using the image-block space-channel attention mechanism, extracts high-level semantic information, and obtains image-block-level features, specifically as follows:
The space-channel attention mechanism consists of two parts, spatial attention and channel attention; spatial attention acts on the low-level feature F_1 and channel attention acts on the global feature F_2.
Spatial attention is realized as follows: first perform a max-pooling operation and an average-pooling operation on the low-level feature F_1 to generate one feature map each, with a pooling unit size of 2 × 2 pixels, i.e. half the block size of step 2; concatenate the two feature maps obtained by the pooling operations, and perform max pooling and average pooling along the channel axis; concatenate the pooling results and apply a 7 × 7 convolution to obtain a spatial attention map of size 56 × 56 with 1 channel; upsample the spatial attention map to 112 × 112 using nearest-neighbor interpolation to obtain the final spatial attention map M_s, and multiply it element-wise with the low-level feature F_1 to complete the weighting, yielding the weighted feature F_1'.
Channel attention is realized as follows: for the global feature F_2, first perform spatial max pooling and average pooling to generate one feature vector each; concatenate the two feature vectors obtained by the pooling operations and feed them into a single-hidden-layer perceptron to generate the channel attention map M_c; multiply the channel attention map M_c with the global feature F_2 along the channel axis to complete the weighting, yielding the weighted feature F_2'.
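The following sketch implements the two attention modules as just described. The sigmoid activations and the hidden width of the single-hidden-layer perceptron are assumptions (the patent specifies neither); the sigmoid follows the CBAM convention for attention maps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """Weights F_1 (B, 24, 112, 112): 2x2 max/avg pooling, channel-axis
    max/avg, 7x7 conv, nearest upsampling, element-wise multiply."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f1):
        pooled = torch.cat([F.max_pool2d(f1, 2), F.avg_pool2d(f1, 2)], dim=1)
        m = torch.cat([pooled.max(dim=1, keepdim=True).values,
                       pooled.mean(dim=1, keepdim=True)], dim=1)
        m = torch.sigmoid(self.conv(m))                 # M_s: (B, 1, 56, 56)
        m = F.interpolate(m, scale_factor=2, mode="nearest")
        return f1 * m                                   # F_1'

class ChannelAttention(nn.Module):
    """Weights F_2 (B, 1536, 7, 7): spatial max/avg pooling, concatenated
    vectors through a single-hidden-layer perceptron, channel-wise multiply."""
    def __init__(self, channels=1536, hidden=96):       # hidden width assumed
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * channels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, channels))

    def forward(self, f2):
        mx = F.adaptive_max_pool2d(f2, 1).flatten(1)
        av = F.adaptive_avg_pool2d(f2, 1).flatten(1)
        mc = torch.sigmoid(self.mlp(torch.cat([mx, av], dim=1)))
        return f2 * mc[:, :, None, None]                # F_2'
```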
Further, step 6 inputs the image-block-level features obtained in step 5 into a fully connected layer to obtain the probability that the image is forged, specifically as follows:
The features extracted in step 5 have spatial size 7 × 7 and 1536 channels; they are concatenated spatially row by row and then across channels to obtain a 75264-dimensional vector, and a fully connected layer outputs a single probability representing the probability that the image is a forged image.
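A minimal sketch of step 6. Here nn.Flatten orders the 7 × 7 × 1536 tensor channel-first rather than rows-then-channels as stated in the text, which merely permutes the input of the dense layer; the sigmoid output is also an assumption, since the patent only states that a probability is produced.

```python
import torch
import torch.nn as nn

# Flatten F_2' into a 7 * 7 * 1536 = 75264-dim vector and map it to a
# single forgery probability.
head = nn.Sequential(nn.Flatten(), nn.Linear(7 * 7 * 1536, 1), nn.Sigmoid())

f2_weighted = torch.randn(1, 1536, 7, 7)   # output of the channel attention
prob_fake = head(f2_weighted)              # (1, 1), probability in (0, 1)
```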
The invention is described in further detail below with reference to the figures and the embodiments.
Examples
As shown in Fig. 1, the face forgery detection method based on image block scrambling, PSAM-Net, proceeds as follows: for an input image, perform face detection using the face detection model BlazeFace and crop the face region to obtain the face image I_c; scale the face image I_c to a fixed size, input it to the image blocking and scrambling module, partition it into blocks, and scramble the pixels within each image block; input the scrambled image into a convolutional neural network integrated with the attention mechanism, use the space-channel attention module to further discriminate the feature information, extract high-level semantic information, and obtain image-block-level features; finally pass the extracted features through a fully connected layer to obtain the probability that the image is forged. The invention effectively improves the accuracy of face forgery detection, enhances the generalization ability of the face forgery detection model, and mitigates the severe performance degradation of the detection task when facing unknown forgery methods.
Step 1: perform face detection on the input image I using the face detection model BlazeFace to obtain the region I_f where the face is located:
I_f = I(x_f, y_f, w_f, h_f)
where x_f and y_f denote the abscissa and ordinate of the center of the detected face, w_f denotes the width of the detected face, and h_f denotes its height; I(x_f, y_f, w_f, h_f) denotes the region of image I centered at (x_f, y_f) with width w_f and height h_f.
The face cropping region I_c of image I is:
I_c = I(x_f, y_f, 1.2×w_f, 1.2×h_f)
i.e., the cropping region I_c is 1.2 times the size of the face region I_f.
and 2, scaling the image into blocks with the width and the height both being 224 pixels by a bilinear interpolation method, and dividing the scaled image into blocks with the side length of 4 pixels, wherein the total number of the blocks is 3136.
Step 3: for each 4 × 4 image block, let p(i) be the ith pixel in the block, i ∈ {1, 2, …, 16}; the scrambled pixel value p'(i) is expressed as:
p'(i) = p(α_i)
where α is generated by randomly permuting the elements of the vector [1, 2, …, 16] and is regenerated for each image block; α_i denotes the ith element of the vector α. Fig. 2 illustrates the scrambling flow for a 2 × 2 image block: based on the randomly generated α, the correspondence between the scrambled and original pixels is p'(1) = p(3), p'(2) = p(1), p'(3) = p(4), and p'(4) = p(2).
Step 4: first extract partial low-level features F_1 using the first 2 convolution blocks of the original EfficientNet-B3 network, then extract the global feature F_2 with the remaining convolution blocks. For an input image of size 224 × 224, the low-level feature F_1 has spatial size 112 × 112 with 24 channels, and the global feature F_2 has spatial size 7 × 7 with 1536 channels.
Step 5: the space-channel attention mechanism consists of two parts, spatial attention and channel attention. Spatial attention acts on the low-level feature F_1 and channel attention acts on the global feature F_2; the integration of the two attention modules with the neural network is shown in Fig. 3, where M_s denotes the spatial attention map, F_1' denotes the low-level feature F_1 weighted with the spatial attention map, M_c denotes the channel attention map, and F_2' denotes the global feature F_2 weighted with the channel attention map.
Spatial attention is realized as shown in Fig. 4: first perform a max-pooling operation and an average-pooling operation on the low-level feature F_1 to generate one feature map each; since F_1 has already been pooled once during extraction and is half the size of the input image, the pooling unit size is half the block size of step 2, i.e. 2 × 2. Concatenate the two feature maps obtained by the pooling operations, perform max pooling and average pooling along the channel axis, concatenate the pooling results, and apply a 7 × 7 convolution to obtain a spatial attention map of size 56 × 56 with 1 channel. Upsample the spatial attention map to 112 × 112 using nearest-neighbor interpolation to obtain the final spatial attention map M_s, and multiply it element-wise with the low-level feature F_1 to complete the weighting, yielding the weighted feature F_1'.
Channel attention is realized as shown in Fig. 5: for the global feature F_2, first perform spatial max pooling and average pooling to generate one feature vector each; concatenate the two feature vectors obtained by the pooling operations and feed them into a single-hidden-layer perceptron to generate the channel attention map M_c; multiply the channel attention map M_c with the global feature F_2 along the channel axis to complete the weighting, yielding the weighted feature F_2'.
Step 6: the features extracted in step 5 have spatial size 7 × 7 and 1536 channels; the feature map is flattened into a 75264-dimensional vector, and a fully connected layer outputs a single probability representing the probability that the image is a forged image.
TABLE 1 Comparison of the method of the invention on the Celeb-DF v2 dataset
Table 1 compares the results of the proposed method PSAM-Net on the Celeb-DF v2 dataset. Evaluation was performed on Celeb-DF v2 and on forged images generated from Celeb-DF v2 by the FSGAN forgery method, with detection accuracy as the metric. The proposed method performs well against the FSGAN forgery method, which does not appear in the training set, and its detection accuracy on the original dataset is also better than that of the compared methods, showing that it can effectively enhance the generalization ability of the face forgery detection network model.
From the perspective of robust optimization, the invention proposes an intra-image-block scrambling method and a space-channel attention method to improve the generalization ability of forgery detection models. The intra-block scrambling method destroys the tiny flaws in forged images, helping the model reduce its reliance on the data. The space-channel attention module enhances the detection ability of the model. At a low performance overhead, the method effectively alleviates the problems that detection methods depend on data, generalize poorly, and degrade markedly when facing unknown forgery methods.

Claims (6)

1. A face forgery detection method based on image block scrambling, characterized by comprising the following steps:
step 1: perform face detection on the input image using the face detection model BlazeFace and crop the face region, specifically as follows:
perform face detection on the input image I using the face detection model BlazeFace to obtain the region I_f where the face is located:
I_f = I(x_f, y_f, w_f, h_f) (1)
where x_f and y_f denote the abscissa and ordinate of the center of the detected face, w_f denotes the width of the detected face, and h_f denotes its height; I(x_f, y_f, w_f, h_f) denotes the region of image I centered at (x_f, y_f) with width w_f and height h_f;
the face cropping region I_c of image I is:
I_c = I(x_f, y_f, 1.2×w_f, 1.2×h_f) (2)
i.e., the cropping region I_c is 1.2 times the size of the face region I_f;
step 2: scale the cropped image to a fixed size and partition the image into blocks;
step 3: scramble the pixels within each image block to generate a new image;
step 4: extract multilayer features of the new image generated in step 3 using the backbone network EfficientNet-B3;
step 5: further discriminate the feature information using the image-block space-channel attention mechanism, extract high-level semantic information, and obtain image-block-level features;
step 6: input the image-block-level features obtained in step 5 into a fully connected layer to obtain the probability that the image is forged.
2. The face forgery detection method based on image block scrambling according to claim 1, characterized in that step 2 scales the cropped image to a fixed size and partitions the image into blocks, specifically as follows:
scale the image to 224 pixels in both width and height using bilinear interpolation, and divide the scaled image into blocks with a side length of 4 pixels, 3136 blocks in total.
3. The face forgery detection method based on image block scrambling according to claim 2, characterized in that scrambling the pixels within each image block in step 3 generates a new image, specifically as follows:
for each 4 × 4 image block, let p(i) be the ith pixel in the block, i ∈ {1, 2, …, 16}; the scrambled pixel value p'(i) is expressed as:
p'(i) = p(α_i) (3)
where α is generated by randomly permuting the elements of the vector [1, 2, …, 16], and is regenerated for each image block; α_i denotes the ith element of the vector α.
4. The face forgery detection method based on image block scrambling according to claim 3, characterized in that step 4 extracts the multilayer features of the new image generated in step 3 using the backbone network EfficientNet-B3, specifically as follows:
for the new image generated in step 3, first extract partial low-level features F_1 using the first 2 convolution blocks of the original EfficientNet-B3 network, then extract the global feature F_2 with the remaining convolution blocks; for an input image of size 224 × 224, the low-level feature F_1 has spatial size 112 × 112 with 24 channels, and the global feature F_2 has spatial size 7 × 7 with 1536 channels.
5. The method according to claim 4, characterized in that step 5 further discriminates the feature information using the image-block space-channel attention mechanism, extracts high-level semantic information, and obtains image-block-level features, specifically as follows:
the space-channel attention mechanism consists of two parts, spatial attention and channel attention; spatial attention acts on the low-level feature F_1 and channel attention acts on the global feature F_2;
spatial attention is realized as follows: first perform a max-pooling operation and an average-pooling operation on the low-level feature F_1 to generate one feature map each, with a pooling unit size of 2 × 2 pixels, i.e. half the block size of step 2; concatenate the two feature maps obtained by the pooling operations, and perform max pooling and average pooling along the channel axis; concatenate the pooling results and apply a 7 × 7 convolution to obtain a spatial attention map of size 56 × 56 with 1 channel; upsample the spatial attention map to 112 × 112 using nearest-neighbor interpolation to obtain the final spatial attention map M_s, and multiply it element-wise with the low-level feature F_1 to complete the weighting, yielding the weighted feature F_1';
channel attention is realized as follows: for the global feature F_2, first perform spatial max pooling and average pooling to generate one feature vector each; concatenate the two feature vectors obtained by the pooling operations and feed them into a single-hidden-layer perceptron to generate the channel attention map M_c; multiply the channel attention map M_c with the global feature F_2 along the channel axis to complete the weighting, yielding the weighted feature F_2'.
6. The face forgery detection method based on image block scrambling according to claim 5, characterized in that step 6 inputs the image-block-level features obtained in step 5 into a fully connected layer to obtain the probability that the image is forged, specifically as follows:
the features extracted in step 5 have spatial size 7 × 7 and 1536 channels; they are concatenated spatially row by row and then across channels to obtain a 75264-dimensional vector, and a fully connected layer outputs a single probability representing the probability that the image is a forged image.
CN202110810798.6A 2021-07-19 2021-07-19 Face counterfeiting detection method based on image blocking and disordering Active CN113269167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110810798.6A CN113269167B (en) 2021-07-19 2021-07-19 Face counterfeiting detection method based on image blocking and disordering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110810798.6A CN113269167B (en) 2021-07-19 2021-07-19 Face counterfeiting detection method based on image blocking and disordering

Publications (2)

Publication Number Publication Date
CN113269167A CN113269167A (en) 2021-08-17
CN113269167B (en) 2021-09-28

Family

ID=77236706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110810798.6A Active CN113269167B (en) 2021-07-19 2021-07-19 Face counterfeiting detection method based on image blocking and disordering

Country Status (1)

Country Link
CN (1) CN113269167B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463860B (en) * 2021-12-14 2023-05-23 浙江大华技术股份有限公司 Training method of detection model, living body detection method and related device
CN118154884A (en) * 2024-05-13 2024-06-07 山东锋士信息技术有限公司 Weak supervision image semantic segmentation method based on sample mixing and contrast learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101296227A (en) * 2008-06-19 2008-10-29 上海交通大学 IPSec VPN protocol depth detection method based on packet offset matching
CN111967344A (en) * 2020-07-28 2020-11-20 南京信息工程大学 Refined feature fusion method for face forgery video detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Face liveness detection technology based on feature fusion and support vector machine; Yan Jian; Digital Technology & Application; 2020-05-31; Vol. 38, No. 05; full text *
A survey of intelligent face forgery and detection; Cao Yuhong et al.; Journal of Engineering Studies; 2020-12-31; Vol. 12, No. 06; full text *

Also Published As

Publication number Publication date
CN113269167A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN110378844B (en) Image blind motion blur removing method based on cyclic multi-scale generation countermeasure network
CN109903223B (en) Image super-resolution method based on dense connection network and generation type countermeasure network
CN111062872A (en) Image super-resolution reconstruction method and system based on edge detection
Liu et al. Wdnet: Watermark-decomposition network for visible watermark removal
CN111639692A (en) Shadow detection method based on attention mechanism
CN113269167B (en) Face counterfeiting detection method based on image blocking and disordering
CN112288632B (en) Single image super-resolution method and system based on simplified ESRGAN
CN111709888B (en) Aerial image defogging method based on improved generation countermeasure network
CN117151990B (en) Image defogging method based on self-attention coding and decoding
CN111797702A (en) Face counterfeit video detection method based on spatial local binary pattern and optical flow gradient
CN111798359A (en) Deep learning-based image watermark removing method
CN113947794A (en) Fake face changing enhancement detection method based on head posture deviation correction
CN116029902A (en) Knowledge distillation-based unsupervised real world image super-resolution method
CN114283058A (en) Image super-resolution reconstruction method based on countermeasure network and maximum mutual information optimization
CN113628129B (en) Edge attention single image shadow removing method based on semi-supervised learning
CN112348762A (en) Single image rain removing method for generating confrontation network based on multi-scale fusion
CN115641632A (en) Face counterfeiting detection method based on separation three-dimensional convolution neural network
CN111259792A (en) Face living body detection method based on DWT-LBP-DCT characteristics
CN110766609B (en) Depth-of-field map super-resolution reconstruction method for ToF camera
CN114973364A (en) Depth image false distinguishing method and system based on face region attention mechanism
CN114820381A (en) Digital image restoration method based on structure information embedding and attention mechanism
CN112381725B (en) Image restoration method and device based on depth convolution countermeasure generation network
CN110853040B (en) Image collaborative segmentation method based on super-resolution reconstruction
CN113570564A (en) Multi-definition fake face video detection method based on multi-path convolutional network
CN113160061B (en) Method and system for reconstructing high-resolution face image from fuzzy monitoring video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant