CN114360034A - Method, system and equipment for detecting deeply forged human face based on triplet network - Google Patents


Info

Publication number
CN114360034A
Authority
CN
China
Prior art keywords
network, net, face, image, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210269883.0A
Other languages
Chinese (zh)
Inventor
王中元
梁步云
黄宝金
王骞
王闻捷
艾家欣
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202210269883.0A
Publication of CN114360034A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a method, system and device for detecting deepfake faces based on a triplet network. A face image I to be detected is preprocessed to 299 × 299 × 3 and input into the backbone feature extraction network of the triplet network to obtain the deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector. A classification network then classifies Net(I): for the 2048-dimensional input vector it outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities of the picture being genuine or forged; a picture whose probability exceeds a preset value is judged a forged face picture. The triplet network can extract more effective real/fake face discrimination features and achieves more accurate deepfake face detection.

Description

Deepfake face detection method, system and device based on a triplet network

Technical Field

The invention belongs to the technical field of artificial intelligence security and relates to a deepfake face detection method, system and device, in particular to a deepfake face detection method, system and device based on a triplet network.

Background Art

Deepfake is a technology that creates or synthesizes fake content (such as images and videos) with intelligent methods such as deep learning. In recent years, with the development of deep learning, deepfakes have advanced at an unprecedented speed. Current deepfake technology can not only generate face-swapped images and imitate the speech, movements and expressions of real people, but can also create people who do not exist in reality and are hard to tell apart from real ones, subverting the traditional notion that "seeing is believing".

Once abused, deepfake technology will bring great harm to individuals, society and the state.

The best way to counter deepfakes is deepfake detection, whose purpose is to determine whether an image or video has been forged with deepfake technology. Current mainstream detection methods include those based on traditional image features and those based on deep learning. As deep learning develops, more and more novel deepfake detection techniques are being applied: researchers build convolutional neural networks of different architectures to extract deep features from an image and use them to judge whether a face has been forged. To improve the expressiveness of the features, new network architectures keep being proposed, the mainstream ones being the Xception network, residual networks and DenseNet. Some researchers have also introduced frequency-domain information into convolutional neural networks to increase feature expressiveness.

However, although these convolutional neural network architectures extract the main features of an image well, such single-sample-input networks tend to attend to feature expressions unrelated to the authenticity of the picture, such as background or skin color, and struggle to capture the intrinsic features that are related to authenticity. In particular, when several pictures look alike but differ in authenticity, these networks easily extract similar features, which hurts detection accuracy.

SUMMARY OF THE INVENTION

To solve the above technical problems, the present invention proposes a deepfake face detection method, system and electronic device based on a triplet network. Feeding the network three coupled samples, namely the original picture, the target picture and the fake picture, for learning enables the network to capture feature expressions that look alike but have different real/fake attributes.

The technical scheme adopted by the method of the present invention is a deepfake face detection method based on a triplet network, comprising the following steps:

Step 1: Preprocess the face image I to be detected into a preset size and input it into the backbone feature extraction network of the triplet network to obtain the deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector;

Step 2: Use the classification network to classify Net(I). For the input 2048-dimensional feature vector the classification network outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities representing the authenticity of the picture; a picture whose probability exceeds a preset value is a forged face picture;

The backbone feature extraction network of the triplet network adopts the skeleton of the Xception network, comprising an entry flow, a middle flow and an exit flow. The entry flow contains two 3×3 convolutions activated with the ReLU activation function, followed by three convolution blocks; the middle flow contains eight convolution modules; the exit flow contains one convolution block and two 3×3 depthwise separable convolutions activated with the ReLU function, followed by a final average pooling operation. The three backbone feature extraction networks share the same weights;

The classification network is a BP neural network with three layers: the first layer is the input layer with 2048 nodes, the middle layer contains 1024 nodes, and the output layer contains 2 nodes; the ReLU activation function is used between layers.

The technical scheme adopted by the system of the present invention is a deepfake face detection system based on a triplet network, comprising the following modules:

Module 1, for preprocessing the face image I to be detected into a preset size and inputting it into the backbone feature extraction network of the triplet network to obtain the deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector;

Module 2, for classifying Net(I) with the classification network. For the input 2048-dimensional feature vector the classification network outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities representing the authenticity of the picture; a picture whose probability exceeds a preset value is a forged face picture;

The backbone feature extraction network of the triplet network adopts the skeleton of the Xception network, comprising an entry flow, a middle flow and an exit flow. The entry flow contains two 3×3 convolutions activated with the ReLU activation function, followed by three convolution blocks; the middle flow contains eight convolution modules; the exit flow contains one convolution block and two 3×3 depthwise separable convolutions activated with the ReLU function, followed by a final average pooling operation. The three backbone feature extraction networks share the same weights;

The classification network is a BP neural network with three layers: the first layer is the input layer with 2048 nodes, the middle layer contains 1024 nodes, and the output layer contains 2 nodes; the ReLU activation function is used between layers.

The technical scheme adopted by the device of the present invention is a deepfake face detection device based on a triplet network, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the deepfake face detection method based on a triplet network.

Advantages and positive effects of the present invention:

(1) The present invention learns from three coupled samples: the original picture, the target picture and the fake picture. Compared with a single-sample-input network, using triples drives the network to capture feature expressions that look alike but differ in real/fake attributes.

(2) The present invention realizes identification of deepfake faces and solves the security problems that forged faces bring to practical application scenarios.

Description of the Drawings

Fig. 1 is a schematic diagram of the method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the triplet network constructed in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the classification network constructed in an example of the present invention;

Fig. 4 shows experimental results of the network constructed in an example of the present invention.

Detailed Description of the Embodiments

To facilitate understanding and implementation of the present invention by those of ordinary skill in the art, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the examples described here serve only to illustrate and explain the present invention and are not intended to limit it.

Existing convolutional neural network structures can only extract deep features of an image, and these deep features do not attend to the feature expressions related to the authenticity of the image. Feeding the three coupled samples, the original picture, the target picture and the fake picture, into the triplet network for learning enables the triplet network to capture feature expressions that look alike but have different real/fake attributes.

Referring to Fig. 1, the deepfake face detection method based on a triplet network provided by the present invention comprises the following steps:

Step 1: Preprocess the face image I to be detected into 299×299×3 and input it into the backbone feature extraction network of the triplet network to obtain the deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector;

Step 2: Use the classification network to classify Net(I). For the input 2048-dimensional feature vector the classification network outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities representing the authenticity of the picture; a picture whose probability exceeds a preset value is a forged face picture.

Referring to Fig. 2, the backbone feature extraction network of this embodiment adopts the skeleton of the Xception network, comprising an entry flow, a middle flow and an exit flow. The entry flow converts a 299×299×3 picture into a 19×19×728 feature map and contains two 3×3 convolutions activated with the ReLU activation function, followed by three convolution blocks; the middle flow contains eight convolution modules; the exit flow converts the 19×19×728 feature map into a 2048-dimensional feature vector and contains one convolution block and two 3×3 depthwise separable convolutions activated with the ReLU function, followed by a final average pooling operation. The three backbone feature extraction networks share the same weights; through the backbone feature extraction network, a 299×299×3 image is thus converted into a 2048-dimensional feature vector.
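The building block of the Xception skeleton described above is the depthwise separable convolution. As an illustration only (not the patent's implementation), the operation can be sketched in NumPy; the toy input size, kernel sizes and channel counts below are assumptions chosen for brevity:

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_k):
    """Depthwise separable convolution, the building block of Xception.

    x:       input feature map, shape (H, W, C_in)
    depth_k: per-channel spatial kernels, shape (kh, kw, C_in)
    point_k: 1x1 pointwise kernels, shape (C_in, C_out)
    Returns a ReLU-activated map of shape (H - kh + 1, W - kw + 1, C_out).
    """
    H, W, C_in = x.shape
    kh, kw, _ = depth_k.shape
    out_h, out_w = H - kh + 1, W - kw + 1

    # Depthwise step: each input channel is filtered by its own kernel.
    depth_out = np.zeros((out_h, out_w, C_in))
    for c in range(C_in):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i:i + kh, j:j + kw, c]
                depth_out[i, j, c] = np.sum(patch * depth_k[:, :, c])

    # Pointwise step: a 1x1 convolution mixes the channels.
    point_out = depth_out @ point_k          # (out_h, out_w, C_out)
    return np.maximum(point_out, 0.0)        # ReLU activation

# Toy example: an 8x8 input with 3 channels mapped to 4 output channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
dk = rng.standard_normal((3, 3, 3))
pk = rng.standard_normal((3, 4))
y = depthwise_separable_conv(x, dk, pk)
```

The split into a depthwise and a pointwise step is what lets Xception stack many such blocks cheaply compared with full 3D convolutions.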

Referring to Fig. 3, the classification network of this embodiment is a BP neural network with three layers: the first layer is the input layer with 2048 nodes, the middle layer contains 1024 nodes, and the output layer contains 2 nodes; the ReLU activation function is used between layers.
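As a minimal sketch of this 2048-1024-2 classifier's forward pass (with randomly initialized weights standing in for the trained parameters, and the convention that the second Softmax output is the forgery probability, which is an assumption):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(feat, W1, b1, W2, b2, threshold=0.5):
    """Forward pass of the 2048 -> 1024 -> 2 classification network.

    feat: 2048-dim feature vector Net(I) from the backbone.
    Returns (probabilities, is_fake); probabilities[1] is taken here,
    by assumption, as the probability that the picture is forged.
    """
    h = np.maximum(feat @ W1 + b1, 0.0)   # hidden layer with ReLU
    logits = h @ W2 + b2                  # the 2-dimensional output feature
    p = softmax(logits)                   # convert values to probabilities
    return p, bool(p[1] > threshold)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2048, 1024)) * 0.01
b1 = np.zeros(1024)
W2 = rng.standard_normal((1024, 2)) * 0.01
b2 = np.zeros(2)
feat = rng.standard_normal(2048)          # stand-in for Net(I)
probs, is_fake = classify(feat, W1, b1, W2, b2)
```

A picture is flagged as forged when the forgery probability exceeds the preset threshold, matching step 2 of the method.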

The backbone feature extraction network of this embodiment is a trained backbone feature extraction network; its training process includes the following sub-steps:

Step 1.1: Collect a number of original-image / target-image / fake-image triples, denoted (original, target, fake);

In this embodiment, the fake face video, the original face video and the target face video are first down-sampled: frame images starting from a specific frame and spaced at a fixed frame interval are selected, keeping a one-to-one correspondence between the frames down-sampled from the fake face video and those from its source original face video and the target face video. This embodiment selects frames starting at 0 seconds, one frame per second, ten frames per video, as the raw data set;

Next, this embodiment uses the RetinaFace face detection algorithm to detect the face region in the extracted frame images and crops the face images. Five facial landmarks are extracted: the left eye, right eye, nose, left mouth corner and right mouth corner. The face is aligned through these landmarks so that the aligned face lies at the center of the image; the picture is resized with OpenCV to a height of 299 pixels, a width of 299 pixels and 3 channels. The processed face images are organized into original-face / target-face / fake-face triple data, denoted (original, target, fake).
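RetinaFace itself is an external detector, so only the alignment step is sketched here: a least-squares similarity transform (the Umeyama method, one common way to align faces from landmarks; the patent does not specify the algorithm) mapping the five detected landmarks onto a canonical template. The template coordinates below are invented for illustration:

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src -> dst.

    src, dst: (N, 2) landmark arrays. Returns a 2x3 affine matrix
    M = [sR | t] such that dst is approximately src @ (sR).T + t.
    """
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Guard against reflections so the transform stays a rotation.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = mu_d - scale * (R @ mu_s)
    return np.hstack([scale * R, t[:, None]])   # usable with cv2.warpAffine

# Hypothetical 5-point template for a 299x299 aligned face
# (eye/nose/mouth-corner positions are illustrative, not from the patent).
template = np.array([[100., 110.], [199., 110.], [150., 160.],
                     [110., 215.], [190., 215.]])
detected = template * 0.5 + np.array([40., 30.])   # simulated detection
M = similarity_transform(detected, template)
mapped = detected @ M[:, :2].T + M[:, 2]
```

In a full pipeline the 2×3 matrix `M` would be passed to an image-warping routine such as OpenCV's `warpAffine` to produce the centered 299×299 crop.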

This embodiment uses the original-face / target-face / fake-face triple data obtained in step 1.1 as input and supervision samples to continuously train the backbone feature extraction network.

Step 1.2: For each triple (original, target, fake), send the images in turn into the backbone feature extraction network Net( ) of the triplet network, obtaining the deep features of the images, denoted (Net(original), Net(target), Net(fake)); the weights of the backbone feature extraction network Net( ) are shared;

Step 1.3: Compute the feature distance Dis(Net(original), Net(target)) between the deep features of the original image and the target image, and the feature distance Dis(Net(target), Net(fake)) between the deep features of the target image and the fake image, where Dis(a, b) denotes the feature distance between two feature vectors a and b, computed as:

Dis(a, b) = ||a − b||₂;

where ||·||₂ denotes the L2 distance;

Step 1.4: From the feature distances computed in step 1.3, compute the triplet network loss function loss. To drive the network model to attend not to appearance similarity (deep features should be close even when appearances differ) but to real/fake attribute expression (deep features should be far apart even when appearances are alike), the feature distance between the two genuine pictures, Dis(Net(original), Net(target)), should be as small as possible, while the feature distance between the genuine and fake images, Dis(Net(target), Net(fake)), should be as large as possible. The loss function is therefore:

loss = max(Dis(Net(original), Net(target)) − Dis(Net(target), Net(fake)) + margin, 0);

where margin is a hyperparameter that sets the gap between the two feature distances;

In this embodiment margin = 0.2, meaning that no loss is incurred once the feature distance between the original picture and the target picture is at least 0.2 smaller than the feature distance between the target picture and the fake picture.
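The distance and loss of steps 1.3 and 1.4, with the embodiment's margin of 0.2, can be sketched in plain Python (toy 3-dimensional vectors stand in for the 2048-dimensional backbone features):

```python
import math

def l2_distance(a, b):
    """Dis(a, b) = ||a - b||_2 between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(f_orig, f_target, f_fake, margin=0.2):
    """loss = max(Dis(orig, target) - Dis(target, fake) + margin, 0).

    Pulls the two genuine features together and pushes the fake
    feature at least `margin` further away from the target feature.
    """
    d_real = l2_distance(f_orig, f_target)
    d_fake = l2_distance(f_target, f_fake)
    return max(d_real - d_fake + margin, 0.0)

# Genuine pair close, fake feature far away: the margin is satisfied.
loss_easy = triplet_loss([0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 0.0, 0.0])
# Fake feature as close to the target as the original: loss equals margin.
loss_hard = triplet_loss([0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.2, 0.0, 0.0])
```

The second case shows why the hinge matters: gradients flow only while the fake feature sits inside the margin, which is what forces the shared backbone to separate real/fake attributes rather than appearance.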

Step 1.5: After computing the loss, use the Adam optimizer to back-propagate and optimize the backbone feature extraction network;

Step 1.6: Repeat steps 1.1-1.5 until the backbone feature extraction network converges, obtaining the trained backbone feature extraction network.

The classification network of this embodiment is a trained classification network; its training process includes the following sub-steps:

Step 2.1: Preprocess the face image I to be detected into 299×299×3 and input it into the backbone feature extraction network of the trained triplet network, obtaining the deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector;

Step 2.2: Use the classification network to classify Net(I). For the input 2048-dimensional feature vector the classification network outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities representing the authenticity of the picture;

Step 2.3: Compute the cross-entropy loss function loss between the predicted results and the actual results; loss is computed as:

loss = −Σᵢ [yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ)];

where pᵢ is the probability that sample i is genuine and yᵢ is the label of sample i: yᵢ = 0 if sample i is a fake picture, otherwise yᵢ = 1;
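The cross-entropy of step 2.3 reduces to binary cross-entropy over the Softmax outputs. A minimal sketch in plain Python (the batch averaging and the clipping that guards log(0) are implementation assumptions, not stated in the patent):

```python
import math

def cross_entropy_loss(p_real, labels, eps=1e-12):
    """Binary cross-entropy: -[y_i*log(p_i) + (1 - y_i)*log(1 - p_i)].

    p_real: probability that each sample is genuine (from Softmax).
    labels: y_i = 1 for a genuine picture, y_i = 0 for a fake one.
    Returns the mean loss over the batch (an assumed convention).
    """
    total = 0.0
    for p, y in zip(p_real, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# A confident, correct prediction costs little...
low = cross_entropy_loss([0.99], [1])
# ...while a confident, wrong one is heavily penalized.
high = cross_entropy_loss([0.99], [0])
```

This asymmetry between `low` and `high` is what pushes the classifier's Softmax probabilities toward the true real/fake labels during SGD training in step 2.4.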

Step 2.4: After computing the cross-entropy loss, use the SGD optimizer and gradient descent to back-propagate and optimize the classification network;

Step 2.5: Repeat steps 2.1-2.4 until the classification network converges, obtaining the trained classification network.

After the above steps, experiments are carried out on the network. Fake and real images are mixed and fed into the trained triplet network for feature extraction. The extracted feature vectors are mapped onto two-dimensional rectangular coordinates; the result is shown in Fig. 4. As Fig. 4 shows, the features of fake faces are clearly separated from those of real faces. The present invention can detect the authenticity of face images, and the detection results are highly reliable.

It should be understood that the above description of the preferred embodiments is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Under the inspiration of the present invention and without departing from the scope protected by the claims, those of ordinary skill in the art may also make substitutions or variations, all of which fall within the protection scope of the present invention; the claimed scope of the present invention shall be subject to the appended claims.

Claims (6)

1. A deepfake face detection method based on a triplet network, characterized by comprising the following steps:
Step 1: preprocessing a face image I to be detected into a preset size and inputting it into a backbone feature extraction network of the triplet network to obtain a deep feature Net(I) of the face image; Net(I) is a 2048-dimensional feature vector;
Step 2: classifying Net(I) with a classification network: for the input 2048-dimensional feature vector the classification network outputs a 2-dimensional feature, whose values are converted by Softmax into relative probabilities expressing the authenticity of the picture; a picture whose probability exceeds a preset value is a forged face picture;
the backbone feature extraction network of the triplet network adopts the skeleton of the Xception network, comprising an entry flow, a middle flow and an exit flow; the entry flow contains two 3×3 convolutions activated with the ReLU activation function and three convolution blocks; the middle flow contains eight convolution modules; the exit flow contains one convolution block and two 3×3 depthwise separable convolutions activated with the ReLU function, followed by a final average pooling operation; the three backbone feature extraction networks share the same weights;
the classification network is a BP neural network comprising three layers, the first layer being an input layer with 2048 nodes; the middle layer contains 1024 nodes and the output layer contains 2 nodes; a ReLU activation function is used between layers.
2. The deepfake face detection method based on a triplet network as claimed in claim 1, wherein the backbone feature extraction network of the triplet network in step 1 is a trained backbone feature extraction network, whose training process comprises the following sub-steps:
Step 1.1: acquiring a number of original-image / target-image / fake-image triples, denoted (original, target, fake);
Step 1.2: for each triple (original, target, fake), sending the images in turn into the backbone feature extraction network Net( ) of the triplet network, obtaining the deep features of the images, denoted (Net(original), Net(target), Net(fake)); the weights of the backbone feature extraction network Net( ) are shared;
Step 1.3: computing the feature distance Dis(Net(original), Net(target)) between the deep feature of the original image and that of the target image, and the feature distance Dis(Net(target), Net(fake)) between the deep feature of the target image and that of the fake image, where Dis(a, b) represents the feature distance between two feature vectors a and b, computed as:
Dis(a, b) = ||a − b||₂;
where ||·||₂ represents the L2 distance;
Step 1.4: computing the triplet network loss function loss from the feature distances computed in step 1.3; the loss function is:
loss = max(Dis(Net(original), Net(target)) − Dis(Net(target), Net(fake)) + margin, 0);
where margin is a hyperparameter that sets the gap between the two feature distances;
Step 1.5: after computing the loss, back-propagating and optimizing the backbone feature extraction network with the Adam optimizer;
Step 1.6: repeating steps 1.1-1.5 to train the backbone feature extraction network until convergence, obtaining the trained backbone feature extraction network.
3. The deepfake face detection method based on a triplet network as claimed in claim 2, wherein step 1.1 comprises the following steps:
Step 1.1.1: selecting frame images starting from a specified frame and spaced at a fixed frame interval from the fake face video, the target face video and the original face video; making the original face video frames, the target face video frames and the fake face video frames correspond one to one; generating a group of original-face / target-face / fake-face images;
Step 1.1.2: preprocessing the images of step 1.1.1: identifying and cropping the face region of each image by face detection; aligning the face through facial landmarks so that the aligned face lies at the center of the image; organizing the obtained images into triples of original image, target image and fake image, denoted (original, target, fake).
4. The method for detecting the deeply forged face based on the triplet network as claimed in claim 1, wherein the classification network in step 2 is a trained classification network, and its training process comprises the following substeps:
step 2.1: preprocessing the forged face image I to be detected to a size of 299 × 299 × 3 and inputting it into the trunk feature extraction network of the trained triplet network to obtain the depth feature Net(I) of the face image, where Net(I) is a 2048-dimensional feature vector;
step 2.2: classifying Net(I) with the classification network: for the input 2048-dimensional feature vector, the classification network outputs a 2-dimensional feature, whose values are converted into relative probabilities by Softmax processing to express the relative probability of the picture's authenticity;
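The Softmax conversion in step 2.2 can be sketched in NumPy as follows (the [P(fake), P(real)] ordering in the example is our assumption, not stated by the patent):

```python
import numpy as np

def softmax(logits):
    """Convert the classifier's 2-dimensional output into relative probabilities."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: raw 2-dimensional output -> two probabilities summing to 1.
probs = softmax([2.0, 0.5])
```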
step 2.3: calculating the cross-entropy loss function loss between the predicted result and the actual result; loss is calculated as follows:
loss = -Σᵢ [yᵢ · log(pᵢ) + (1 - yᵢ) · log(1 - pᵢ)];
wherein pᵢ represents the probability that sample i is genuine, and yᵢ represents the label of sample i: if sample i is a forged picture then yᵢ = 0, otherwise yᵢ = 1;
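Step 2.3's binary cross-entropy can be sketched as follows (function names and the clamping epsilon are our choices; the clamp only guards against log(0)):

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one sample.

    p: predicted probability that the sample is genuine (p_i);
    y: label, 0 for a forged picture, 1 for a genuine one (y_i).
    """
    p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def batch_cross_entropy(ps, ys):
    """Mean cross-entropy loss over a batch of predictions."""
    return sum(cross_entropy(p, y) for p, y in zip(ps, ys)) / len(ps)
```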
step 2.4: after the cross-entropy loss is calculated, performing back propagation and optimizing the classification network by gradient descent with an SGD optimizer;
step 2.5: and (5) repeating the step 2.1 and the step 2.4 until the classification network is converged to obtain the trained classification network.
5. A deeply forged face detection system based on a triplet network, characterized by comprising the following modules:
module 1, for preprocessing the forged face image I to be detected to a preset size and inputting it into the trunk feature extraction network of a triplet network to obtain the depth feature Net(I) of the face image, where Net(I) is a 2048-dimensional feature vector;
module 2, for classifying Net(I) with a classification network: for the input 2048-dimensional feature vector, the classification network outputs a 2-dimensional feature, whose values are converted into relative probabilities by Softmax processing to express the relative probability of the picture's authenticity; a picture whose forgery probability is greater than a preset value is judged a forged face picture;
the trunk feature extraction network of the triplet network adopts the architecture of an Xception network, comprising an entry flow, a middle flow and an exit flow; the entry flow contains 2 convolutions of 3 × 3 activated with the ReLU activation function, followed by 3 convolution blocks; the middle flow contains 8 convolution modules; the exit flow comprises a convolution block and two 3 × 3 depthwise separable convolutions activated with the ReLU function, followed finally by an average pooling operation; the three trunk feature extraction networks share one set of weights;
the classification network adopts a BP neural network with three layers: the first layer is the input layer with 2048 nodes, the middle layer contains 1024 nodes, and the output layer contains 2 nodes; a ReLU activation function is applied between layers.
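The three-layer BP classifier (2048 → 1024 → 2 with ReLU, then Softmax) can be sketched as a NumPy forward pass. The random weight initialization below is purely illustrative; a trained network would load learned weights:

```python
import numpy as np

rng = np.random.default_rng(42)

# Three-layer BP network: 2048-node input, 1024-node hidden, 2-node output.
W1 = rng.normal(scale=0.01, size=(2048, 1024))
b1 = np.zeros(1024)
W2 = rng.normal(scale=0.01, size=(1024, 2))
b2 = np.zeros(2)

def relu(x):
    return np.maximum(x, 0.0)

def classify(feature):
    """Forward pass: 2048-d depth feature -> 2-d logits -> Softmax probabilities."""
    h = relu(feature @ W1 + b1)      # hidden layer, ReLU activation between layers
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = classify(rng.normal(size=2048))
```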
6. A deeply forged face detection device based on a triplet network, characterized by comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the triplet-network-based deeply forged face detection method according to any one of claims 1 to 4.
CN202210269883.0A 2022-03-18 2022-03-18 Method, system and equipment for detecting deeply forged human face based on triplet network Pending CN114360034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210269883.0A CN114360034A (en) 2022-03-18 2022-03-18 Method, system and equipment for detecting deeply forged human face based on triplet network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210269883.0A CN114360034A (en) 2022-03-18 2022-03-18 Method, system and equipment for detecting deeply forged human face based on triplet network

Publications (1)

Publication Number Publication Date
CN114360034A true CN114360034A (en) 2022-04-15

Family

ID=81094968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210269883.0A Pending CN114360034A (en) 2022-03-18 2022-03-18 Method, system and equipment for detecting deeply forged human face based on triplet network

Country Status (1)

Country Link
CN (1) CN114360034A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium
CN111291863A (en) * 2020-01-20 2020-06-16 腾讯科技(深圳)有限公司 Training method of face changing identification model, face changing identification method, device and equipment
CN112215043A (en) * 2019-07-12 2021-01-12 普天信息技术有限公司 Human face living body detection method
CN112686331A (en) * 2021-01-11 2021-04-20 中国科学技术大学 Forged image recognition model training method and forged image recognition method
WO2021158205A1 (en) * 2020-02-03 2021-08-12 Google Llc Verification of the authenticity of images using a decoding neural network
CN114120148A (en) * 2022-01-25 2022-03-01 武汉易米景科技有限公司 Method for detecting changing area of remote sensing image building

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Anqi: "Image Recognition Model Based on Siamese Convolutional Neural Network and Triplet Loss Function", Practical Electronics (电子制作) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114841340A (en) * 2022-04-22 2022-08-02 马上消费金融股份有限公司 Deep forgery algorithm identification method and device, electronic equipment and storage medium
CN114841340B (en) * 2022-04-22 2023-07-28 马上消费金融股份有限公司 Identification method and device for depth counterfeiting algorithm, electronic equipment and storage medium
CN115100128A (en) * 2022-06-15 2022-09-23 人民网股份有限公司 A Deep Forgery Detection Method Based on Artifact Noise

Similar Documents

Publication Publication Date Title
CN112418095B (en) A method and system for facial expression recognition combined with attention mechanism
CN108537743B (en) A Facial Image Enhancement Method Based on Generative Adversarial Networks
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN107368831B (en) English words and digit recognition method in a kind of natural scene image
CN113343937B (en) Lip language identification method based on deep convolution and attention mechanism
CN113221655B (en) Face spoofing detection method based on feature space constraint
CN110348376A (en) A kind of pedestrian's real-time detection method neural network based
CN112766159A (en) Cross-database micro-expression identification method based on multi-feature fusion
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN110796057A (en) Pedestrian re-identification method and device and computer equipment
CN110414350A (en) Face anti-counterfeiting detection method based on two-way convolutional neural network based on attention model
CN106778496A (en) Biopsy method and device
CN104504362A (en) Face detection method based on convolutional neural network
CN110348320A (en) A kind of face method for anti-counterfeit based on the fusion of more Damage degrees
CN112580445B (en) Human body gait image visual angle conversion method based on generation of confrontation network
CN114360034A (en) Method, system and equipment for detecting deeply forged human face based on triplet network
CN114093013A (en) Reverse tracing method and system for deeply forged human faces
CN116403294B Transformer-based multi-view broad learning living body detection method, medium and device
CN114937298A (en) Micro-expression recognition method based on feature decoupling
CN115797970A (en) Dense pedestrian target detection method and system based on YOLOv5 model
CN114550270A (en) A Micro-expression Recognition Method Based on Dual Attention Mechanism
CN110222568A (en) A kind of across visual angle gait recognition method based on space-time diagram
CN117152812A (en) Expression recognition method and system based on multi-feature fusion and triple-cross attention mechanism
CN114155572B (en) Facial expression recognition method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination