CN115546873A - Face counterfeiting detection method based on local region regularization - Google Patents


Info

Publication number
CN115546873A
CN115546873A (application CN202211365584.3A)
Authority
CN
China
Prior art keywords
image
sequence
disordering
new
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211365584.3A
Other languages
Chinese (zh)
Inventor
刘思佟 (Liu Sitong)
练智超 (Lian Zhichao)
肖亮 (Xiao Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202211365584.3A
Publication of CN115546873A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/95 - Pattern authentication; Markers therefor; Forgery detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face forgery detection method based on local-region regularization, belonging to the technical field of artificial intelligence. The face-region image is scaled and divided into blocks, the arrangement order of the blocks is randomly shuffled to generate a new image, and the arrangement order is recorded; features are then extracted from the shuffled image. During training, the extracted features of the shuffled image are fed into a position-reconstruction branch that recovers the arrangement order of the blocks, encouraging the model to model the correlation between image blocks; the extracted features are also fed into a classifier to obtain the probability that the image is forged. By shuffling the image at block level and reconstructing the order, the method realizes local-region regularization, overcomes overfitting caused by the distribution shift between training and test sets, encourages the model to attend to local regions of the image and to model the correlation between image blocks, enhances the detection and generalization ability of the network, and improves forgery detection performance on degraded data.

Description

Face counterfeiting detection method based on local region regularization
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a face counterfeiting detection method based on local region regularization.
Background
With the rapid development of deep learning, face forgery techniques based on deep learning have made remarkable progress. Researchers typically classify them into face replacement, face attribute editing and face synthesis according to the forgery target. Face replacement techniques represented by Deepfakes can change the identity of the face in a photo and fabricate specific false images, and the application of techniques such as autoencoders and generative adversarial networks has greatly reduced the cost of forgery. Deepfakes encodes different faces with a shared encoder, trains two decoders to learn and reconstruct the respective faces, and completes face replacement by swapping the decoders. The large-scale spread of such false images over the Internet creates security risks, which has further promoted the development of deepfake detection techniques.
At present, most forgery detection techniques for face replacement treat the task as a binary classification problem and train deep convolutional networks in a data-driven manner to extract spatial-domain features and frequency-domain information for detecting forged faces. Nguyen et al. (Nguyen, Huy Hoang, Junichi Yamagishi and Isao Echizen. Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos [C]// ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp 2307-2311) proposed a capsule-network-based detection method. Li et al. (Yuezun Li and Siwei Lyu. Exposing DeepFake Videos by Detecting Face Warping Artifacts [C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)) proposed SSPNet to detect the artifacts introduced by the face-warping step of the forgery pipeline. Li et al. (Lingzhi Li, Jianmin Bao, Ting Zhang, Hao Yang, Dong Chen, Fang Wen, and B. Guo. Face X-ray for More General Face Forgery Detection [C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp 5000-5009) proposed Face X-ray, which detects the blending boundary left when a forged face is merged into a background image. Durall et al. (Durall, Ricard, Margret Keuper, Franz-Josef Pfreundt and Janis Keuper. Unmasking DeepFakes with Simple Features [C]// arXiv preprint) detect forged images with simple features derived from the discrete Fourier transform (DFT). Luo et al. (Luo, Yuchen, Yong Zhang, Junchi Yan and Wei Liu. Generalizing Face Forgery Detection with High-frequency Features [C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp 16312-16321) improve generalization by exploiting high-frequency features. Qian et al. (Qian, Yuyang, Guojun Yin, Lu Sheng, Zixuan Chen and Jing Shao. Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues [C]// ECCV, 2020) use frequency-aware image decomposition and extract local frequency statistics to improve forgery detection performance on highly compressed video. However, the performance of these methods drops significantly across datasets.
Since the differences between real and forged images are mainly subtle and local, researchers have sought solutions in local regions. Du et al. (Du, Mengnan, Shiva K. Pentyala, Yuening Li and Xia Hu. Towards Generalizable Deepfake Detection with Locality-aware AutoEncoder [C]// Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020) focused on local spatial features, learning an internal representation of local forged regions with an autoencoder to bridge the generalization gap. Zhao et al. (Zhao, Hanqing, Wenbo Zhou, Dongdong Chen, Tianyi Wei, Weiming Zhang and Nenghai Yu. Multi-attentional Deepfake Detection [C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp 2185-2194) proposed a multi-attention network that attends to different local parts and subtle artifacts. Wang et al. (Wang, Chengrui and Weihong Deng. Representative Forgery Mining for Fake Face Detection [C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp 14918-14927) proposed an attention-based data-augmentation framework that encourages the network to mine diverse features by occluding sensitive regions. Wang et al. (Wang, Junke, Zuxuan Wu, Jingjing Chen and Yu-Gang Jiang. M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [C]// Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022) proposed a multi-modal multi-scale transformer to capture subtle artifacts at different scales. However, images tend to degrade to different degrees during propagation (e.g., resizing and blurring), which challenges the generalization ability of detection methods. In severe cases, a deep model may perform well on the training data yet suffer a significant drop in accuracy on new data.
Patent CN113269167A discloses a face forgery detection method based on image-block scrambling, which divides an image into blocks and scrambles them to enhance forgery detection; however, the scrambling itself introduces noise that misleads the detection model, so the gains in generalization and robustness are limited and performance on low-quality images is poor. Patent CN114445891A discloses a face forgery detection and localization method based on SSIM difference maps, but the backbone network it uses is complex, runs slowly, places high demands on the hardware environment, remains dependent on data, and requires various data-augmentation modules to assist training.
In summary, in current face forgery detection the dependence of detection methods on data limits the generalization ability of the detection model. Most images and videos transmitted over networks suffer some degree of quality degradation, which makes the local features of forged images unstable; the detection performance of existing forgery detection models drops markedly on such degraded data, and the networks are highly complex and computationally expensive, which hinders training and deployment.
Disclosure of Invention
The technical problem solved by the invention is as follows: the invention aims to provide a face forgery detection method based on image-block scrambling and recovery, which obtains accurate face forgery detection results by using scrambling and recovery to encourage the model to model the correlation between image blocks.
The technical scheme adopted by the invention to solve the above technical problem is as follows:
A face forgery detection method based on local-region regularization comprises the following steps: randomly shuffling the arrangement order of the image blocks formed after the face-region image is scaled and partitioned to generate a new image, and recording the arrangement order; extracting the features of the shuffled image; during training, feeding the extracted features of the shuffled image into a position-reconstruction branch to recover the arrangement order of the blocks, thereby encouraging the model to model the correlation between image blocks; and feeding the extracted features of the shuffled image into a classifier to obtain the probability that the image is forged. By shuffling the image at block level and reconstructing the order, the method realizes local-region regularization, overcomes overfitting caused by the distribution shift between training and test sets, encourages the model to attend to local regions of the image and to model the correlation between image blocks, enhances the detection and generalization ability of the network, and improves forgery detection performance on degraded data.
Further, in the step of randomly shuffling the arrangement order of the image blocks formed by scaling and partitioning the face-region image to generate a new image and recording the arrangement order: for an image divided into blocks of size 32 × 32, let p(i) be the i-th block of the image, i ∈ {1, 2, …, 49}. The shuffled block p'(i) is expressed as:
p'(i) = p(α_i) (3)
where α is generated by randomly permuting the order of the elements of the vector [1, 2, …, 49], and α_i is the i-th element of α. For each new image I* generated by the shuffling, α is recorded and is called the arrangement order M of the image.
Further, in the step of extracting the features of the shuffled image, the features F of the shuffled image I* are extracted using the 5 convolution blocks of the backbone network Xception.
Furthermore, the position-reconstruction branch R consists of a PixelShuffle upsampling layer, a 1 × 1 convolution layer and a HardTanh activation layer.
During training, the features F of the shuffled image are first fed into the PixelShuffle layer for pixel rearrangement; the result passes through the 1 × 1 convolution layer to obtain features of size 7 × 7 with 2 channels; these pass through the HardTanh activation layer to yield a recovered order of size 7 × 7 with 2 channels. The recovered order and the recorded arrangement order are fed into a Smooth L1 loss function, and the position-reconstruction loss L_pos is expressed as:
L_pos = SmoothL1( R( C_{1:5}(I*); θ_R ), M ) (4)
where C denotes the backbone network, C_{1:5} denotes convolution blocks 1 to 5 of the backbone C, R denotes the position-reconstruction branch, θ_R denotes the parameters of R, I* denotes the shuffled image, and M denotes the arrangement order of the image.
Further, the classifier consists of an average pooling layer and a fully connected layer. The feature F of the shuffled image I* has size 7 × 7 and 2048 channels; it is fed into the average pooling layer to obtain a feature of size 1 × 1 with 2048 channels, and the fully connected layer outputs a probability representing the probability that the image is forged. The classification cross-entropy loss is expressed as:
L_cls = CE( C(I*; θ_C), y ) (5)
where θ_C denotes the parameters of the backbone network C and y is the authenticity label of the image.
Beneficial effects: compared with the prior art, the face forgery detection method based on local-region regularization of the invention has the following advantages:
(1) The inter-block shuffling destroys the global structure of the image and indirectly realizes equal-probability sampling of the blocks, encouraging the model to attend to local regions of the image, preventing the forgery detection task from degenerating into a face identification task, enhancing the translation invariance of the network, and strengthening the model's ability to detect deepfake images;
(2) The image-block recovery restores the original arrangement of the blocks and models the semantic correlation between them, further enhancing the model's ability to detect deepfake images;
(3) Shuffling and recovering the block order realizes regularization of local regions, overcomes overfitting caused by the distribution shift between training and test sets induced by image degradation, and enhances the model's generalization to degraded images;
(4) Detection performance on the original dataset is preserved and accurate face forgery detection results are obtained. The method effectively improves the accuracy of face forgery detection, enhances the generalization ability of the face forgery detection model, and mitigates the severe performance degradation of detection on degraded data.
Drawings
Fig. 1 is a flowchart of a face forgery detection method based on image block scrambling recovery according to the present invention.
FIG. 2 is a diagram illustrating an exemplary method for inter-block scrambling according to the present invention.
Fig. 3 is a schematic diagram of a position reconstruction branch used in the present invention.
Fig. 4 is a schematic diagram of the effect of shuffling and restoring face images forged by different methods according to the invention.
FIG. 5 is a statistical diagram of the position gap after shuffled reconstruction on the FaceForensics++ dataset according to the present invention.
Detailed Description
The present invention will be further illustrated by the following specific examples, which are carried out on the premise of the technical scheme of the present invention, and it should be understood that these examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
In the face forgery detection method based on local-region regularization of this embodiment, a face detection model is used to perform face detection on the input image, and the face region is cropped to obtain a face image I_c. The face image I_c is scaled to a fixed size and partitioned into blocks; the blocks are randomly shuffled and the arrangement order is recorded. The shuffled image is fed into a convolutional neural network to extract image features. During model training, the extracted features of the shuffled image are fed into a position-reconstruction branch to recover the arrangement order of the blocks, encouraging the model to model the correlation between image blocks; the extracted features are also fed into a classifier to obtain the probability that the image is forged. The method specifically comprises the following steps:
step 1, carrying out face detection on an input image by using a face detection model RetinaFace, and cutting a face area; the method comprises the following specific steps:
input image pair by using human face detection model RetinaFaceICarrying out face detection to obtain the region of the face in the imageI f
I f = I(x f ,y f ,w f ,h f ) (1)
In the formula (I), the compound is shown in the specification,x f the abscissa representing the center of the detected face,y f the ordinate of the center of the detected face is represented,w f representing faces detectedThe width of the paper is that of the paper,h f representing the height of the detected human face;I(x f ,y f ,w f ,h f )representing imagesIIn a step ofx f ,y f ) Is central and widew f High ish f The area of (a);
image of a personIFace cutting areaI c Comprises the following steps:
I c = I(x f ,y f ,1.2×w f ,1.2×h f ) (2)
the above formula represents the face clipping regionI c Is the area of the faceI f 1.2 times of the total weight of the powder.
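Equations (1) and (2) amount to a center crop enlarged by a factor of 1.2. A minimal sketch in Python follows; the face detector itself is omitted, and the helper name `crop_face` and the image-boundary clamping are our additions:

```python
import numpy as np

def crop_face(image: np.ndarray, x_f: int, y_f: int, w_f: int, h_f: int,
              margin: float = 1.2) -> np.ndarray:
    """Crop the region centered at (x_f, y_f), enlarged by `margin` (Eq. 2),
    clamped to the image bounds."""
    h, w = image.shape[:2]
    cw, ch = int(margin * w_f), int(margin * h_f)
    x0 = max(0, x_f - cw // 2)
    y0 = max(0, y_f - ch // 2)
    x1 = min(w, x0 + cw)
    y1 = min(h, y0 + ch)
    return image[y0:y1, x0:x1]
```

In practice the (x_f, y_f, w_f, h_f) box would come from a RetinaFace detection.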
Step 2, scale the cropped image to a fixed size and partition it into blocks. Specifically:
The face crop I_c is scaled to 224 × 224 pixels by bicubic interpolation so that it can be partitioned and fed into the backbone network for feature extraction. The scaled image is divided into square blocks of side 32 pixels, 49 blocks in total.
Step 3, randomly shuffle the arrangement order of the image blocks to generate a new image, and record the arrangement order. Specifically:
For an image divided into blocks of size 32 × 32, let p(i) be the i-th block of the image, i ∈ {1, 2, …, 49}. The shuffled block p'(i) is expressed as:
p'(i) = p(α_i) (3)
where α is generated by randomly permuting the order of the elements of the vector [1, 2, …, 49], and α_i is the i-th element of α. For each new image I* generated by the shuffling, α is recorded and is called the arrangement order M of the image. Fig. 2 shows the shuffling flow for an image divided into 7 × 7 blocks, where the blocks are rearranged according to a randomly generated α.
Shuffling the blocks destroys the global structure of the image and indirectly realizes equal-probability sampling of the blocks, encouraging the model to attend to local regions, preventing the forgery detection task from degenerating into face identification, enhancing the translation invariance of the network, and strengthening the model's ability to detect deepfake images.
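Step 3 can be sketched as follows, assuming a 224 × 224 input and 32-pixel blocks as in step 2; the function name and the row-major block ordering are our choices:

```python
import numpy as np

def shuffle_blocks(img: np.ndarray, block: int = 32, rng=None):
    """Split a (224, 224, C) image into 7x7 blocks of 32 px, permute them
    with a random order alpha (Eq. 3), and return the shuffled image I*
    together with alpha (recorded as the arrangement order M)."""
    rng = np.random.default_rng(rng)
    h = img.shape[0]
    n = h // block                      # 7 blocks per side
    # blocks in row-major order: p(0), p(1), ..., p(48)
    blocks = [img[r * block:(r + 1) * block, c * block:(c + 1) * block]
              for r in range(n) for c in range(n)]
    alpha = rng.permutation(n * n)      # random permutation of [0, ..., 48]
    shuffled = np.zeros_like(img)
    for i, a in enumerate(alpha):       # p'(i) = p(alpha_i)
        r, c = divmod(i, n)
        shuffled[r * block:(r + 1) * block, c * block:(c + 1) * block] = blocks[a]
    return shuffled, alpha
```

Since alpha is recorded, the original arrangement can later be restored by applying the inverse permutation, which is what the position-reconstruction branch is trained to recover.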
Step 4, extract the features of the shuffled image of step 3 using the backbone network Xception. Specifically:
For the new image generated in step 3, the image features F are extracted with the 5 convolution blocks of the original Xception network C. The input image size is 224 × 224; the feature F has size 7 × 7 and 2048 channels. The Xception network reduces the number of parameters effectively through depthwise-separable convolutions over the feature channels.
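For illustration of the tensor geometry only, the stand-in module below maps a 224 × 224 input to the stated 7 × 7 feature map with 2048 channels. It is not Xception; a real implementation would use Xception's depthwise-separable blocks, e.g. from a pretrained model zoo:

```python
import torch
import torch.nn as nn

class BackboneStub(nn.Module):
    """Stand-in for the 5 convolution blocks of Xception: maps a 224x224
    RGB image to a 7x7 feature map with 2048 channels. Each stride-2 conv
    halves the spatial size: 224 -> 112 -> 56 -> 28 -> 14 -> 7."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(512, 1024, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 2048, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.features(x)
```

The output shape (batch, 2048, 7, 7) is what the position-reconstruction branch and the classifier below both consume.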
Step 5, during training, feed the features extracted in step 4 into the position-reconstruction branch to recover the arrangement order of the blocks, encouraging the model to model the correlation between image blocks. Specifically:
The position-reconstruction branch is shown in Fig. 3; the branch R consists of a PixelShuffle upsampling layer, a 1 × 1 convolution layer and a HardTanh activation layer.
During training, the features F of the shuffled image I* obtained in step 4 are first fed into the PixelShuffle layer for pixel rearrangement; the result passes through the 1 × 1 convolution layer to obtain features of size 7 × 7 with 2 channels; these pass through the HardTanh activation layer to yield a recovered order of size 7 × 7 with 2 channels. The recovered order and the recorded arrangement order are fed into a Smooth L1 loss function, and the position-reconstruction loss L_pos is expressed as:
L_pos = SmoothL1( R( C_{1:5}(I*); θ_R ), M ) (4)
where C denotes the backbone network, C_{1:5} denotes convolution blocks 1 to 5 of the backbone C, R denotes the position-reconstruction branch, θ_R denotes its parameters, I* denotes the shuffled image, and M denotes the arrangement order recorded in step 3. Recovering the positions of the image blocks models the semantic correlation between local regions and enhances the detection ability of the network.
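The branch of step 5 might be sketched as below. The patent lists the layer types and the 7 × 7, 2-channel output but not the exact geometry, so the PixelShuffle factor, the adaptive pooling that restores the 7 × 7 size, and the reading of M as a two-channel map of normalized block coordinates are all our assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionBranch(nn.Module):
    """Sketch of the position-reconstruction branch R:
    PixelShuffle -> 1x1 conv -> HardTanh, producing a 7x7 map with
    2 channels, read as the predicted (row, col) origin of each block."""
    def __init__(self, in_ch: int = 2048, r: int = 2):
        super().__init__()
        self.shuffle = nn.PixelShuffle(r)              # 2048 -> 512 ch, 7 -> 14 px
        self.conv = nn.Conv2d(in_ch // (r * r), 2, 1)  # 1x1 conv to 2 channels
        self.act = nn.Hardtanh()                       # bound predictions to [-1, 1]

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.act(self.conv(self.shuffle(feat)))
        # restore the 7x7 grid (our assumption; the patent states only the
        # output size, not how it is reached after PixelShuffle)
        return F.adaptive_avg_pool2d(x, 7)

def position_loss(pred: torch.Tensor, order_map: torch.Tensor) -> torch.Tensor:
    """Smooth L1 loss of Eq. (4) between the recovered order and the
    recorded arrangement order M (a 2x7x7 map of normalized coordinates)."""
    return F.smooth_l1_loss(pred, order_map)
```

This head is used only during training; at inference time only the classifier branch is needed.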
Step 6, feed the feature F of the shuffled image I* obtained in step 4 into the classifier to obtain the probability that the image is forged. Specifically:
The classifier consists of an average pooling layer and a fully connected layer. The feature F extracted from I* in step 4 has size 7 × 7 and 2048 channels; it is fed into the average pooling layer to obtain a feature of size 1 × 1 with 2048 channels, and the fully connected layer outputs a probability representing the probability that the image is forged. The classification cross-entropy loss is expressed as:
L_cls = CE( C(I*; θ_C), y ) (5)
where θ_C denotes the parameters of the backbone network C and y is the authenticity label of the image.
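Step 6 reduces to global average pooling followed by a linear head. In the sketch below, the single-logit sigmoid output with binary cross-entropy stands in for the patent's "probability through a fully connected layer" and the cross-entropy of Eq. (5); a two-logit softmax head would be an equally valid reading:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForgeryClassifier(nn.Module):
    """Average pooling + fully connected layer on the 7x7x2048 feature F,
    outputting the probability that the image is forged."""
    def __init__(self, in_ch: int = 2048):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # 2048 x 7 x 7 -> 2048 x 1 x 1
        self.fc = nn.Linear(in_ch, 1)         # single forgery logit

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = self.pool(feat).flatten(1)
        return torch.sigmoid(self.fc(x)).squeeze(1)   # forgery probability

def classification_loss(prob: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # binary cross-entropy against the authenticity label y (Eq. 5)
    return F.binary_cross_entropy(prob, y)
```

During training the total objective would combine this with the position-reconstruction loss of step 5; the patent does not state a weighting between the two.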
TABLE 1 Comparison of the method of the invention on the FaceForensics++ dataset
Table 1 compares the results of the method of the invention on the FaceForensics++ dataset. The evaluation is carried out on the c23- and c40-quality videos of FaceForensics++: the model is trained on c23 data and then tested on both video qualities, with detection accuracy as the metric. The invention performs well on the c40 video quality that does not appear in the training set, and its detection accuracy on the c23 quality exceeds that of the original method; the generalization ability of the face forgery detection model is thus effectively enhanced.
FIG. 4 shows the effect of shuffling and reconstruction on real data and on data from different forgery methods in the FaceForensics++ dataset. The first row shows the original images, the second the shuffled result, and the third the reconstruction; the shuffled face images essentially recover the correct arrangement after reconstruction by the method. Fig. 5 shows statistics of the distance between the original and reconstructed block positions on FaceForensics++: more than 80% of the blocks are restored to a position at distance at most 1 from their original position, demonstrating the effectiveness of the reconstruction branch in the method of the invention.
The invention provides an image-block shuffle-and-recover method for improving the generalization ability of forgery detection models. The method encourages the model to attend to local block-level regions and to model the correlation between image blocks, and, at low performance cost, effectively alleviates the limited generalization of detection methods and their marked performance drop on degraded data.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and variations can be made without departing from the principle of the present invention, and these should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A face forgery detection method based on local-region regularization, characterized by comprising the following steps: randomly shuffling the arrangement order of the image blocks formed after the face-region image is scaled and partitioned to generate a new image, and recording the arrangement order; extracting the features of the shuffled image; during training, feeding the extracted features of the shuffled image into a position-reconstruction branch to recover the arrangement order of the blocks, thereby encouraging the model to model the correlation between image blocks; and feeding the extracted features of the shuffled image into a classifier to obtain the probability that the image is forged.
2. The face forgery detection method based on local-region regularization as claimed in claim 1, characterized in that: in the step of randomly shuffling the arrangement order of the image blocks formed by scaling and partitioning the face-region image to generate a new image and recording the arrangement order, for an image divided into blocks of size 32 × 32, p(i) is the i-th block of the image, i ∈ {1, 2, …, 49}, and the shuffled block p'(i) is expressed as:
p'(i) = p(α_i) (3)
where α is generated by randomly permuting the order of the elements of the vector [1, 2, …, 49], and α_i is the i-th element of α; for each new image I* generated by the shuffling, α is recorded and is called the arrangement order M of the image.
3. The face forgery detection method based on local-region regularization as claimed in claim 1, characterized in that: in the step of extracting the features of the shuffled image, the features F of the shuffled image I* are extracted using the 5 convolution blocks of the backbone network Xception.
4. The face forgery detection method based on local-region regularization as claimed in claim 1, characterized in that: the position-reconstruction branch R consists of a PixelShuffle upsampling layer, a 1 × 1 convolution layer and a HardTanh activation layer;
during training, the features F of the shuffled image are first fed into the PixelShuffle layer for pixel rearrangement; the result passes through the 1 × 1 convolution layer to obtain features of size 7 × 7 with 2 channels; these pass through the HardTanh activation layer to yield a recovered order of size 7 × 7 with 2 channels; the recovered order and the arrangement order are fed into a loss function to compute the loss.
5. The local region regularization-based face forgery detection method as claimed in claim 4, wherein: the recovered order and the arrangement order are input into a Smooth L1 loss function to compute the loss, and the position reconstruction loss L_rec is expressed as:
L_rec = SmoothL1(R(C_{1~5}(I*); θ_R), M) (4)
where C denotes the backbone network, C_{1~5} denotes convolution blocks 1 to 5 of the backbone network C, R denotes the position reconstruction branch, θ_R denotes the parameters of the position reconstruction branch R, I* denotes the new image generated by disordering the order, and M denotes the arrangement order of the image.
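A minimal NumPy sketch of the Smooth L1 (Huber) loss in equation (4), applied to an illustrative 2×7×7 recovered order versus a ground-truth arrangement order (the arrays are dummies; only the loss formula follows standard Smooth L1 semantics):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Elementwise Smooth L1 loss, averaged over all elements:
    0.5*d^2/beta if |d| < beta, else |d| - 0.5*beta, with d = pred - target."""
    d = np.abs(pred - target)
    loss = np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta)
    return loss.mean()

# illustrative recovered order vs. ground-truth arrangement order (2 x 7 x 7)
rec = np.zeros((2, 7, 7))
M = np.full((2, 7, 7), 0.5)
loss = smooth_l1(rec, M)   # every residual is 0.5, inside the quadratic zone
```

Compared with plain L1, the quadratic zone near zero keeps gradients small when the recovered order is already close to M.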
6. The local region regularization-based face forgery detection method as claimed in claim 1, wherein: the classifier consists of an average pooling layer and a fully connected layer; the feature F of the new image I* generated by disordering the order has size 7×7 with 2048 channels; F is input into the average pooling layer to obtain a feature of size 1×1 with 2048 channels, and a probability that the image is a forged image is output through the fully connected layer.
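The classifier head of claim 6 can be sketched in NumPy as global average pooling followed by a single fully connected layer with a sigmoid (the random weights are placeholders; the patent does not specify the FC initialization or output activation):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((2048, 7, 7))      # feature F of the scrambled image I*

pooled = F.mean(axis=(1, 2))               # average pooling -> (2048,)
W = rng.standard_normal((1, 2048)) * 0.01  # illustrative FC weights
b = np.zeros(1)
logit = W @ pooled + b
prob_fake = 1.0 / (1.0 + np.exp(-logit))   # probability the image is forged
```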
7. The local region regularization-based face forgery detection method as claimed in claim 6, wherein: the cross-entropy loss for the classification is expressed as:
L_cls = CE(C(I*; θ_C), y) (5)
where θ_C denotes the parameters of the backbone network C, and y denotes the authenticity label of the image.
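For a single forgery probability output, the cross-entropy of equation (5) reduces to binary cross-entropy. A minimal sketch (the label convention 1 = forged, 0 = real is an assumption, as is the epsilon guard):

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy between the predicted forgery probability p
    and the authenticity label y (assumed: 1 = forged, 0 = real)."""
    eps = 1e-12  # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
```

A confident correct prediction (p close to y) yields a small loss; a maximally uncertain prediction p = 0.5 yields log 2.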
CN202211365584.3A 2022-11-03 2022-11-03 Face counterfeiting detection method based on local region regularization Pending CN115546873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211365584.3A CN115546873A (en) 2022-11-03 2022-11-03 Face counterfeiting detection method based on local region regularization


Publications (1)

Publication Number Publication Date
CN115546873A true CN115546873A (en) 2022-12-30

Family

ID=84720927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211365584.3A Pending CN115546873A (en) 2022-11-03 2022-11-03 Face counterfeiting detection method based on local region regularization

Country Status (1)

Country Link
CN (1) CN115546873A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311482A (en) * 2023-05-23 2023-06-23 中国科学技术大学 Face fake detection method, system, equipment and storage medium
CN116311482B (en) * 2023-05-23 2023-08-29 中国科学技术大学 Face fake detection method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination