CN115482595B - Specific character visual sense counterfeiting detection and identification method based on semantic segmentation - Google Patents

Specific character visual sense counterfeiting detection and identification method based on semantic segmentation

Info

Publication number
CN115482595B
CN115482595B
Authority
CN
China
Prior art keywords
semantic segmentation
image
face
detection
mask
Prior art date
Legal status
Active
Application number
CN202211188905.7A
Other languages
Chinese (zh)
Other versions
CN115482595A (en)
Inventor
周琳娜
杨震
王任颖
陈贤浩
林清然
储贝林
毛羽哲
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date: 2022-09-27
Filing date: 2022-09-27
Publication date: 2023-04-07
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202211188905.7A
Publication of CN115482595A
Application granted
Publication of CN115482595B
Status: Active

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00: Computing arrangements based on biological models
                    • G06N 3/02: Neural networks
                        • G06N 3/08: Learning methods
            • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 10/00: Arrangements for image or video recognition or understanding
                    • G06V 10/70: using pattern recognition or machine learning
                        • G06V 10/764: using classification, e.g. of video objects
                            • G06V 10/765: using rules for classification or partitioning the feature space
                        • G06V 10/82: using neural networks
                • G06V 20/00: Scenes; Scene-specific elements
                    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
                • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
                    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
                        • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                            • G06V 40/161: Detection; Localisation; Normalisation
                            • G06V 40/172: Classification, e.g. identification
                    • G06V 40/40: Spoof detection, e.g. liveness detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic segmentation-based method for detecting and identifying visual forgeries of a specific person, belonging to the technical field of deepfake generation and detection. Taking deepfake video detection for a specific person as its target, the method builds a personal feature model of the target person on the basis of semi-supervised learning and semantic segmentation, selects and classifies attribute masks constructed for facial regions, and outputs a result by combining the classification weights of the individual attributes. First, a mask dataset of the target person's facial regions is constructed via semantic segmentation; second, a personal semantic model is built for visual forgery detection and identification of deepfake videos. During dataset construction, the dataset is augmented with a semi-supervised machine learning algorithm, which alleviates the shortage of data for the specific person and reduces manual annotation cost.

Description

A Person-specific Visual Forgery Detection and Identification Method Based on Semantic Segmentation

Technical Field

The invention belongs to the technical field of deepfake generation and detection. It is a forged-video detection method, specifically a person-specific visual forgery detection and identification method based on semantic segmentation.

Background

"DeepFake" is a portmanteau of "deep learning" and "fake", that is, the combination of deep learning with forgery. Deepfaking is a deep-learning-based technique for producing fake videos and images by swapping people's faces. The term originates with a machine learning algorithm released in 2017 by the Reddit user "deepfakes", who claimed it could transplant celebrity faces into pornographic videos. The release drew intense public and media attention and set off a wave of research on visual deepfake algorithms. In 2018, BuzzFeed published a deepfake video of Barack Obama giving a speech, produced with the FakeApp software created by a Reddit user. Between 2017 and 2020, the number of papers on deepfakes grew from 3 to more than 250; over the same period, FakeApp, Faceswap, Zao, FaceApp, and other consumer-facing tools that require no technical expertise were released one after another. The many categories of fake videos produced with visual deepfake technology have raised concerns about identity theft, impersonation, and the spread of disinformation on social media.

Existing visual deepfake methods fall roughly into three types: synthesizing new faces, facial modification, and face swapping. Synthesizing a new face means using a GAN to create a face image that does not exist; facial modification means altering certain parts of an existing face; face swapping means exchanging two faces, either locally or in their entirety.

Face synthesis methods use powerful generative adversarial networks (GANs) to create, from scratch, entire face images that never existed; the current face-synthesis datasets are built on the ProGAN and StyleGAN architectures, and every generated image carries the fingerprint of the specific GAN that produced it. Facial modification methods add edits to a target face, such as changing hair color or skin tone, altering the target's apparent gender, or adding glasses; these methods are also GAN-based, and the recent StarGAN technique can partition a face into multiple domains and edit them simultaneously. Face swapping methods come in two forms. The first replaces the face of the target person in a video with someone else's face; this is currently the most popular direction in visual deepfaking, used by DeepFakes and FaceSwap, and unlike the first two categories, which operate on still images, it can be used to synthesize deepfake videos. The second form is facial expression transfer, also known as face reenactment, which maps another person's facial expressions onto the target's face, for example altering Obama's expressions and movements so that he appears to deliver a fabricated "speech".

Visual deepfake detection generally proceeds through feature extraction, model building, and detection and classification. First, researchers preprocess the image or video data to be examined and determine the features to be detected from prior knowledge or image-processing techniques. Next, an algorithm is designed to extract those features, and a network model matched to the detection task is built. Finally, the detection algorithm is tested on the data to be examined, validating the soundness of the selected features and the effectiveness of the classification model. The keys to detection performance are selecting features that effectively separate real faces from fake ones and building a model that classifies them well.

Different deepfake detection methods emphasize different stages of this pipeline, so they can be categorized as follows:

Artifact-based visual deepfake detection focuses on the feature-determination stage of the pipeline. Working from an image-processing perspective, it captures, at pixel-level granularity, anomalies such as blur, jitter, and ghosting in generated images or videos. How discriminative the artifact features are directly determines the detector's performance.

Data-driven visual deepfake detection focuses on the model-building stage, using carefully designed neural networks to train classifiers on the temporal-domain and frequency-domain information extracted from forgeries. A well-designed network can extract subtle latent features more effectively.

Inconsistency-based visual deepfake detection starts from high-level semantics such as inherent biological traits, temporal continuity, and motion vectors, and captures where a forgery contradicts objective regularities. Because extracting high-level semantic features is relatively complex, this line of work emphasizes both the feature-determination and feature-extraction stages.

Because a large amount of real face data is available for a specific public figure, a GAN trained extensively on those real faces can produce highly realistic deepfakes, further aided by forgery techniques such as Wav2Lip. Deepfakes of specific figures can easily cause serious harm, and current general-purpose detection methods do not identify forgeries of a specific person with adequate performance. Research on person-specific deepfake detection is therefore needed.

Summary of the Invention

In view of the above problems, the present invention proposes a person-specific visual forgery detection and identification method based on semantic segmentation, which effectively improves forgery detection and identification capability.

The method is divided into a semantic segmentation part and a forgery detection and identification part.

The semantic segmentation part performs semantic segmentation on deepfaked faces: images of the target person are annotated according to eleven facial features to form an initial training set, and a semi-supervised semantic segmentation model trained on this initial set generates a mask dataset of the target person.

The forgery detection and identification part takes the dot product of the semantically segmented target-person mask data with the corresponding face image to obtain designated attribute regions of the image, and then builds a model over the obtained region attributes, as follows:

For each original face image z of the target person, the per-feature image mask a from the mask dataset is combined with a manually selected region-of-interest vector V over the facial features to obtain the facial regions of interest; this is then dot-multiplied with the corresponding original face image to produce the conditional tensor T of the desired facial regions of interest.

The input image z is dot-multiplied with its conditional tensor T, and the result p(z) is fed into a generative adversarial network for pose-independent recognition. The dot product of the conditional tensor T of the selected regions of interest with the original image z is:

p(z) = z · T = z · a · V

p(z) and a given pose are fed to a generator G. The generator G synthesizes a corresponding fake image in the given pose, while a discriminator D judges the pose and identity of the generated image. Adversarial training continues until it reaches the critical state in which D judges the image generated by G to have the same identity as the original input, yielding a pose-independent face image.

After pose-independent recognition, each segmented facial attribute region of the pose-normalized face image x is fed into a single convolutional neural network for classification, and a new CNN binary classifier is built: convolutional layers learn image features, pooling layers reduce output dimensionality, and a fully connected layer fuses the deep features to produce the final classification output, identifying the input image as a positive or negative sample.

The advantages of the present invention are:

1. The method constructs a person-specific facial mask dataset and can expand face-forgery data without increasing manual annotation cost.

2. The method makes effective use of an attention mechanism, greatly improving the classifier's detection accuracy on forged samples.

3. The method builds a pose-independent module that handles input images in arbitrary poses, increasing the robustness of forgery detection.

4. The method can detect images and videos generated by a variety of face-forgery techniques, increasing the generalization of forgery detection.

Brief Description of the Drawings

Figure 1 is a flow chart of the person-specific visual forgery detection and identification method based on semantic segmentation;

Figure 2 shows the structure of the semantic segmentation network.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

As shown in Figure 1, the person-specific visual forgery detection and identification method based on semantic segmentation proceeds as follows:

Step 1: Using semantic segmentation, generate a mask dataset of the target person's segmented facial features.

101. Collect a large amount of data on the target person to form the initial training set.

The dataset comprises four parts: real faces of the target person, real faces of other, non-target people, deepfaked faces of the target person, and imitators and impersonators of the target person. Together these form the basic resources for model training.

With the subsequent video processing in mind, real-face footage of the target person can be gathered from public video sites by selecting high-resolution videos in which the target faces the camera as frontally as possible, and downloading them. About 60 hours of video of the target person's real face are collected so that the model can effectively learn the target's genuine facial characteristics. Imitators and impersonators of the target, and real faces of non-target people, are collected in the same way as the target's real faces; the deepfaked faces of the target person are produced from the target's real faces with three forgery methods: FaceSwap, Wav2Lip, and First Order Motion. For the non-target real faces, people whose facial features resemble the target's as closely as possible can be chosen, to strengthen the model's ability to discriminate similar faces in real scenes.

With the target person as the protected subject, the video dataset of the target's real face serves as the positive samples, while the imitators and impersonators of the target, the deepfaked faces of the target, and the forged material of non-target people serve as the negative samples.

102. Perform facial semantic segmentation on the collected real-face and forged datasets of the target person.

For the faces in all positive and negative samples, N frames are randomly extracted from the videos and manually annotated with the LabelMe tool, as shown in Figure 1. During manual annotation, the regions of 11 facial parts are marked: left eye, right eye, left eyebrow, right eyebrow, nose, upper lip, lower lip, hair, left ear, right ear, and neck. The JSON files produced by annotation are then processed together with the original annotated images to obtain mask datasets with the different facial category labels.
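As a concrete illustration of this conversion step, the Python sketch below rasterizes one LabelMe JSON file into an integer label mask. The file name and the exact label strings are illustrative assumptions; only the LabelMe keys ("shapes", "points", "imageWidth", "imageHeight") follow the tool's actual output format.

    import json
    import numpy as np
    from PIL import Image, ImageDraw

    # The eleven facial regions of step 102; index 0 is background.
    LABELS = ["background", "left_eye", "right_eye", "left_brow", "right_brow",
              "nose", "upper_lip", "lower_lip", "hair", "left_ear", "right_ear",
              "neck"]

    def labelme_json_to_mask(json_path):
        """Rasterize the polygons of one LabelMe annotation into a label mask."""
        with open(json_path, "r", encoding="utf-8") as f:
            ann = json.load(f)
        mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
        draw = ImageDraw.Draw(mask)
        for shape in ann["shapes"]:
            cls = LABELS.index(shape["label"])   # annotation labels must match LABELS
            draw.polygon([tuple(p) for p in shape["points"]], fill=cls)
        return np.asarray(mask, dtype=np.uint8)

    mask = labelme_json_to_mask("frame_0001.json")   # hypothetical frame name
    Image.fromarray(mask).save("frame_0001_mask.png")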

103. Use the manually annotated mask images from the previous step to train a machine annotator by building the semi-supervised semantic segmentation network Deeplabv3+.

The semantic segmentation network Deeplabv3+ is selected for training. The N manually annotated mask images from step 102 are used as input, and the deep model then automatically annotates the remaining M unannotated images among the positive and negative samples collected in step 1, realizing semi-supervised annotation as follows:

A. As shown in Figure 2, two semantic segmentation networks P1 and P2 with the same structure but different initial weights are constructed:

P1 = f(X; θ1)

P2 = f(X; θ2)

P1 and P2 are both Deeplabv3+ networks and differ only in the initial values of their weight parameters.

Here X denotes the input images obtained by applying data augmentation to the N annotated images; θ1 and θ2 are the weights of the networks P1 and P2; and Y denotes the pseudo-labels, i.e., the preliminary segmentation results, produced by the two networks. The two semantic segmentation networks P1 and P2 use Deeplabv3+, and the classifiers producing Y1 and Y2 use a ResNet101 network. Compared with other semantic segmentation architectures, Deeplabv3+ lets the earlier layers of the network encode multi-scale context by convolving or pooling the input features, while the later layers gradually recover spatial information to capture sharp object boundaries, making it well suited to facial semantic segmentation. For each of the two segmentation networks, the corresponding one-hot labels Y1 and Y2 are obtained through an argmax operation. These two pseudo-labels then serve as supervision signals: Y2 supervises P1 and Y1 supervises P2, constrained by a cross-entropy loss, which improves the performance of the semantic segmentation networks.
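A minimal PyTorch sketch of this cross-supervision step follows. It uses torchvision's Deeplabv3 with a ResNet101 backbone as a stand-in for the Deeplabv3+ networks described above (torchvision does not ship Deeplabv3+), and elides data loading and the ordinary supervised loss on the N labeled frames.

    import torch
    import torch.nn.functional as F
    from torchvision.models.segmentation import deeplabv3_resnet101

    NUM_CLASSES = 12  # 11 facial regions plus background

    # Two networks with identical structure but independent random initial weights.
    p1 = deeplabv3_resnet101(weights=None, num_classes=NUM_CLASSES)
    p2 = deeplabv3_resnet101(weights=None, num_classes=NUM_CLASSES)

    def cross_pseudo_loss(x):
        """Each network is supervised by the argmax pseudo-label of the other."""
        logits1 = p1(x)["out"]               # (B, C, H, W)
        logits2 = p2(x)["out"]
        y1 = logits1.argmax(dim=1).detach()  # pseudo-label from P1 (class indices)
        y2 = logits2.argmax(dim=1).detach()  # pseudo-label from P2
        # Y2 supervises P1 and Y1 supervises P2, both through cross-entropy.
        return F.cross_entropy(logits1, y2) + F.cross_entropy(logits2, y1)

    x = torch.randn(2, 3, 256, 256)          # a batch of augmented unlabeled frames
    loss = cross_pseudo_loss(x)

On the N labeled frames, the same two networks would additionally be trained with cross-entropy against the manual masks, matching the semi-supervised setup described above.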

Finally, the M images machine-annotated by the semantic segmentation networks are combined with the original N manually annotated images to form the Mask dataset.

Step 2: Build a personal semantic model for visual forgery detection and identification.

The person-specific visual forgery detection and identification method proceeds through data preprocessing, pose-independent recognition, and model construction, as follows:

201: Preprocessing of the original target face data

Each original face image of the target person collected in step 101 is denoted z ∈ Z, where Z is all the face data collected in step 101, i.e., the face set comprising both positive and negative samples. For each face image, the per-feature image mask a from the Mask dataset obtained in step 1 is combined with a manually selected region-of-interest vector V over the facial features (e.g., with one component each for nose, eyes, and ears, selecting the nose gives V = [1, 0, 0]) to obtain the facial regions of interest, which are then dot-multiplied with the corresponding original face image to produce the conditional tensor T of the desired facial regions of interest.

The input image z is dot-multiplied with its conditional tensor T, and the result p(z) is fed into the generative adversarial network for pose-independent recognition. The dot product of the conditional tensor T of the selected regions of interest with the original image z is:

p(z) = z · T = z · a · V
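A minimal NumPy sketch of this masking step follows; the image size and the three-feature vector V are illustrative assumptions matching the V = [1, 0, 0] example above (the full method uses eleven regions).

    import numpy as np

    H, W = 256, 256
    z = np.random.rand(H, W, 3)                           # original face image z
    a = np.random.randint(0, 2, (H, W, 3)).astype(float)  # one binary mask per feature
    V = np.array([1.0, 0.0, 0.0])                         # ROI vector: nose only

    # Combine the per-feature masks with V, then multiply into the image,
    # element-wise over the selected regions.
    T = (a * V).sum(axis=-1, keepdims=True)   # conditional tensor T = a · V, (H, W, 1)
    p_z = z * T                               # p(z) = z · T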

202: Pose-independent recognition

The dot product p(z) of each image z with its conditional tensor T, together with a given pose, is fed to the generator G. In the present invention the given pose is the face oriented frontally toward the camera, i.e., a 0° angle from straight ahead. The generator G synthesizes a corresponding fake image in the given pose; the discriminator D judges the pose and identity of the generated image; and adversarial training continues until it reaches the critical state in which D judges the image generated by G to have the same identity as the original input, yielding a pose-independent face image.
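The PyTorch sketch below illustrates one step of this adversarial loop with toy stand-ins for G and D; the layer sizes, pose encoding, and loss weighting are all illustrative assumptions, since the patent does not specify the network internals.

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        """Toy pose-conditioned generator: takes the masked input p(z) and a pose code."""
        def __init__(self, n_poses=9):
            super().__init__()
            self.emb = nn.Embedding(n_poses, 16)
            self.net = nn.Sequential(nn.Conv2d(3 + 16, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
        def forward(self, p_z, pose):
            e = self.emb(pose)[:, :, None, None].expand(-1, -1, *p_z.shape[2:])
            return self.net(torch.cat([p_z, e], 1))

    class Discriminator(nn.Module):
        """Toy discriminator with real/fake, pose, and identity heads."""
        def __init__(self, n_poses=9, n_ids=2):
            super().__init__()
            self.feat = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.adv = nn.Linear(32, 1)
            self.pose = nn.Linear(32, n_poses)
            self.ident = nn.Linear(32, n_ids)
        def forward(self, x):
            h = self.feat(x)
            return self.adv(h), self.pose(h), self.ident(h)

    G, D = Generator(), Discriminator()
    g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

    p_z = torch.randn(4, 3, 64, 64)             # masked inputs p(z)
    real = torch.randn(4, 3, 64, 64)            # real target-person images
    ident = torch.zeros(4, dtype=torch.long)    # identity label of the target
    frontal = torch.zeros(4, dtype=torch.long)  # given pose: class 0 = 0°, frontal

    fake = G(p_z, frontal)
    # Discriminator step: real vs. fake, plus identity supervision on real images.
    r_adv, _, r_id = D(real)
    f_adv, _, _ = D(fake.detach())
    d_loss = (bce(r_adv, torch.ones_like(r_adv)) +
              bce(f_adv, torch.zeros_like(f_adv)) + ce(r_id, ident))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator step: fool D while matching the given pose and the input identity.
    adv, pose_pred, id_pred = D(fake)
    g_loss = (bce(adv, torch.ones_like(adv)) +
              ce(pose_pred, frontal) + ce(id_pred, ident))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()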

203: Construction of the deepfake detection model

After pose-independent recognition, each segmented facial attribute region of the pose-normalized face image x is fed into a single convolutional neural network for classification, and a new CNN binary classifier is built: convolutional layers learn image features, pooling layers reduce output dimensionality, and a fully connected layer fuses the deep features into the final classification output, identifying the input image as a positive or negative sample.
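A minimal sketch of such a per-region binary classifier follows; the layer widths are illustrative assumptions, and the final sigmoid produces the probability p used by the BCE loss defined below.

    import torch
    import torch.nn as nn

    class RegionForgeryClassifier(nn.Module):
        """Conv layers learn features, pooling shrinks them, FC layers fuse them."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(4))
            self.classifier = nn.Sequential(
                nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
                nn.Linear(128, 1), nn.Sigmoid())   # sigmoid output p in [0, 1]
        def forward(self, x):
            return self.classifier(self.features(x))

    model = RegionForgeryClassifier()
    p = model(torch.randn(8, 3, 128, 128))  # batch of pose-normalized region crops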

For the design of the loss function, the binary cross-entropy (BCE) loss measures the classification loss; L_Y is formally defined as:

L_Y(x, y) = BCE(p, y) = −[y·log(p) + (1−y)·log(1−p)]

where p is the classifier's predicted classification output and y ∈ {0, 1} is the real/fake label; a sigmoid activation function processes the output in the binary classification task.
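A quick numerical check of this formula against PyTorch's built-in BCE (the probability and label values are arbitrary):

    import torch
    import torch.nn.functional as F

    p = torch.tensor(0.8)  # predicted probability from the sigmoid output
    y = torch.tensor(1.0)  # ground-truth label: 1 = positive (real) sample

    manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
    builtin = F.binary_cross_entropy(p, y)
    print(manual.item(), builtin.item())  # both print roughly 0.2231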

In summary, the method builds a mask training dataset for facial image segmentation of a specific person and augments it with a semi-supervised machine learning algorithm, alleviating the shortage of person-specific data and reducing manual annotation cost. Adding the semantic segmentation module and the pose-change module before the binary detection model, and modeling and fitting different features, improves forgery detection and identification capability and raises the accuracy of the binary detection model by 5-10 percentage points.

Claims (3)

1. A person-specific visual forgery detection and identification method based on semantic segmentation, characterized in that it is divided into a semantic segmentation part and a forgery detection and identification part;
the semantic segmentation part performs semantic segmentation on deepfaked faces: images of the target person are annotated according to eleven facial features to form an initial training set, and a semi-supervised semantic segmentation model trained on the initial set generates a mask dataset of the target person;
the forgery detection and identification part takes the dot product of the semantically segmented target-person mask data with the corresponding face image to obtain designated attribute regions of the image, and then builds a model over the obtained region attributes, as follows:
for each original face image z of the target person, the per-feature image mask a from the mask dataset is combined with a manually selected region-of-interest vector V over the facial features to obtain the facial regions of interest, which are then dot-multiplied with the corresponding original face image to produce the conditional tensor T of the desired facial regions of interest;
the input image z is dot-multiplied with its conditional tensor T, and the result p(z) is fed into a generative adversarial network for pose-independent recognition; the dot product of the conditional tensor T with the original image z is:
p(z) = z · T = z · a · V
p(z) and a given pose are fed to a generator G; the generator G synthesizes a corresponding fake image in the given pose, a discriminator D judges the pose and identity of the generated image, and adversarial training continues until the critical state in which D judges the image generated by G to have the same identity as the original input, yielding a pose-independent face image;
after pose-independent recognition, each segmented facial attribute region of the pose-normalized face image x is fed into a single convolutional neural network for classification, and a new CNN binary classifier is built, in which convolutional layers learn image features, pooling layers reduce output dimensionality, and a fully connected layer fuses the deep features into the final classification output, identifying the input image as a positive or negative sample.

2. The person-specific visual forgery detection and identification method based on semantic segmentation of claim 1, characterized in that the semantic segmentation method is:
N frames randomly extracted from the selected face videos are manually annotated, marking the regions of 11 facial parts; the JSON files produced by annotation are then processed together with the original annotated images to obtain mask datasets with different facial category labels;
the semantic segmentation network Deeplabv3+ is selected for training; the N manually annotated mask images are used as input, and the deep model automatically annotates the remaining M unannotated face images, realizing semi-supervised annotation as follows:
two semantic segmentation networks P1 and P2 with the same structure but different initial weights are constructed:
P1 = f(X; θ1)
P2 = f(X; θ2)
where X denotes the input images obtained by applying data augmentation to the N annotated images, θ1 and θ2 are the weights of the networks P1 and P2, and Y denotes the pseudo-labels produced by the two networks; for the two segmentation networks, the corresponding one-hot labels Y1 and Y2 are obtained through an argmax operation; the two pseudo-labels then serve as supervision signals, with Y2 supervising P1 and Y1 supervising P2, constrained by a cross-entropy loss; finally, the M images machine-annotated by the semantic segmentation networks are combined with the original N manually annotated images to form the mask dataset of the target person.

3. The person-specific visual forgery detection and identification method based on semantic segmentation of claim 1, characterized in that, in the design of the classifier loss function, the binary cross-entropy (BCE) loss measures the classification loss, with L_Y formally defined as:
L_Y(x, y) = BCE(p, y) = −[y·log(p) + (1−y)·log(1−p)]
where x is the input image, p is the classifier's predicted classification output, and y ∈ {0, 1} is the real/fake label; a sigmoid activation function processes the output in the binary classification task.
CN202211188905.7A 2022-09-27 2022-09-27 Specific character visual sense counterfeiting detection and identification method based on semantic segmentation Active CN115482595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211188905.7A CN115482595B (en) 2022-09-27 2022-09-27 Specific character visual sense counterfeiting detection and identification method based on semantic segmentation


Publications (2)

Publication Number Publication Date
CN115482595A CN115482595A (en) 2022-12-16
CN115482595B true CN115482595B (en) 2023-04-07

Family

ID=84393686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211188905.7A Active CN115482595B (en) 2022-09-27 2022-09-27 Specific character visual sense counterfeiting detection and identification method based on semantic segmentation

Country Status (1)

Country Link
CN (1) CN115482595B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116957919B (en) * 2023-07-12 2024-07-16 珠海凌烟阁芯片科技有限公司 RGBD image-based 3D human body model generation method and system
CN118379608B (en) * 2024-06-26 2024-10-18 浙江大学 A highly robust deepfake detection method based on adaptive learning

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111985303A (en) * 2020-07-01 2020-11-24 江西拓世智能科技有限公司 Human face recognition and human eye light spot living body detection device and method

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10474883B2 (en) * 2016-11-08 2019-11-12 Nec Corporation Siamese reconstruction convolutional neural network for pose-invariant face recognition
WO2019056000A1 (en) * 2017-09-18 2019-03-21 Board Of Trustees Of Michigan State University Disentangled representation learning generative adversarial network for pose-invariant face recognition
CN113554045B (en) * 2020-04-23 2024-04-09 国家广播电视总局广播电视科学研究院 Data set manufacturing method, device, equipment and storage medium
US12125256B2 (en) * 2021-06-23 2024-10-22 Intel Corporation Generator exploitation for deepfake detection
CN113537027B (en) * 2021-07-09 2023-09-01 中国科学院计算技术研究所 Method and system for face deep forgery detection based on face segmentation
CN113887573B (en) * 2021-09-10 2025-02-28 合肥高维数据技术有限公司 Face forgery detection method based on visual converter
CN113989713B (en) * 2021-10-28 2023-05-12 杭州中科睿鉴科技有限公司 Depth forgery detection method based on video frame sequence prediction


Also Published As

Publication number Publication date
CN115482595A (en) 2022-12-16


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant