CN111986180B - Face forged video detection method based on multi-correlation frame attention mechanism

Face forged video detection method based on multi-correlation frame attention mechanism

Info

Publication number
CN111986180B
Authority
CN
China
Prior art keywords
frame
video
attention
face
target frame
Prior art date
Legal status
Active
Application number
CN202010851718.7A
Other languages
Chinese (zh)
Other versions
CN111986180A (en)
Inventor
张勇东
胡梓珩
谢洪涛
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202010851718.7A
Publication of CN111986180A
Application granted
Publication of CN111986180B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06T 7/0002 Image analysis; Inspection of images, e.g. flaw detection
    • G06F 18/253 Pattern recognition; Fusion techniques of extracted features
    • G06N 3/045 Neural networks; Combinations of networks
    • G06N 3/084 Learning methods; Backpropagation, e.g. using gradient descent
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2207/10016 Image acquisition modality; Video; Image sequence
    • G06T 2207/30201 Subject of image; Human being; Person; Face

Abstract

The invention discloses a face forgery video detection method based on a multi-correlation frame attention mechanism. An inter-frame attention mechanism is designed to calculate the dynamic association information among the feature streams of the individual frames; this information is fused with the intra-frame static features and inter-frame dynamic features of a target frame and used as the prediction basis, so that face tampering is judged from the perspective of the whole video. The method can improve the detection accuracy for forged videos while remaining robust to image-quality degradation and to new tampering methods.

Description

Face forged video detection method based on multi-correlation frame attention mechanism
Technical Field
The invention relates to the technical field of counterfeit video detection, in particular to a face counterfeit video detection method based on a multi-correlation frame attention mechanism.
Background
With the development of deep learning, especially GANs (generative adversarial networks), many video face-tampering methods and programs have recently appeared. They can replace the face of the original person in a video with another person's face, or tamper with the person's expression, while keeping the video visually realistic. These programs are simple to operate and produce vivid results that ordinary viewers find hard to distinguish; if used maliciously, they can cause legal and ethical harm, so an effective forged-video detection method is urgently needed.
Existing detection techniques for face-forged video fall mainly into two categories. (1) Methods based on single frames of the video. These convert the video classification problem into an image classification problem and do not consider the temporal information of the video: a large number of real and forged videos are decomposed into frame images that serve as the training data set, various network structures are designed to train a real/fake binary classifier on these images, and several frames are extracted from the video to be detected, each being given a prediction result. (2) Methods based on frame sequences, which feed multiple frames of the video into a network and fuse the frame features by means of RNNs, LSTMs and the like to give a binary classification result. These prior-art methods achieve some basic results but still have problems: with method (1), the accuracy rises quickly to a high level during training but drops sharply at test time; with method (2), the detection accuracy is poor when the video quality is low. In particular, as forged videos spread on the Internet they are repeatedly re-uploaded and compressed, the image quality degrades and the tampering traces become blurred, which further increases the difficulty of detection.
Disclosure of Invention
The invention aims to provide a face forgery video detection method based on a multi-correlation frame attention mechanism, which can improve the detection accuracy for forged videos and is robust to degradation of the image quality of the detected video and to the use of new tampering methods.
The object of the invention is achieved by the following technical solution:
a face forgery video detection method based on a multi-correlation frame attention mechanism comprises the following steps:
decoding a video to be detected into a frame sequence, and extracting a face image of each frame;
selecting a frame as a target frame, selecting N reference frames before and after the target frame, extracting features from the face images in the 2N+1 frames, respectively calculating inter-frame attention information between the image features of the target frame and the image features of each reference frame, respectively averaging the inter-frame attention information before and after the target frame to obtain the pre-frame and post-frame attention information of the target frame, and then fusing the image features of the target frame with the pre-frame and post-frame attention information;
and predicting based on the fusion result, so as to judge, from the prediction result at the whole-video level, whether the video to be detected is a face-forged video.
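For illustration, the overall flow summarized above can be sketched as follows. This is a minimal sketch only: the helper callables passed in (the cross-frame attention fusion and the prediction head) are placeholders, and the averaging of per-target-frame scores into a single video-level score is an assumption of the sketch, not a detail fixed by this disclosure.

```python
# Minimal sketch of the whole-video detection flow; all helpers are injected
# placeholders, and score averaging across target frames is an assumption.
def detect_forged_video(face_feats, cross_frame_attention, predict,
                        n_ref=1, threshold=0.5):
    """face_feats: per-frame face feature maps of one video, in temporal order."""
    scores = []
    for t in range(n_ref, len(face_feats) - n_ref):
        target = face_feats[t]
        before = face_feats[t - n_ref:t]              # N reference frames before
        after = face_feats[t + 1:t + 1 + n_ref]       # N reference frames after
        fused = cross_frame_attention(target, before, after)
        scores.append(predict(fused))                 # per-target-frame prediction score
    video_score = sum(scores) / max(len(scores), 1)   # assumed whole-video aggregation
    return video_score > threshold                    # above the threshold: judged forged
```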
According to the technical solution provided by the invention, a multi-stream structure is adopted for a video, with multiple frames taken as input. An inter-frame attention mechanism is designed to calculate the dynamic association information among the feature streams of the individual frames; this information is fused with the intra-frame static features of the target frame and used as the prediction basis, so that face tampering is judged from the perspective of the whole video. The method can improve the detection accuracy for forged videos and is robust to degradation of the image quality of the detected video and to the use of new tampering methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic overall flow chart of a face-forged video detection method based on a multi-correlation frame attention mechanism according to an embodiment of the present invention;
fig. 2 is a flowchart of the fusion of a variable number of related frames in the testing stage according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an inter-frame attention mechanism provided by an embodiment of the present invention;
FIG. 4 is a diagram of a prediction module network architecture according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a visualization result of a convolutional layer according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Existing methods rely mainly on features within a single frame and do not mine the dynamic relationships among video frames, yet when the whole video is viewed, some tampering traces manifest themselves dynamically. The reason is that, when a video is tampered with, the face in each frame is tampered separately and then blended back into the original frame, so the tampering traces of the different frames of the same video are not identical and show up dynamically. For this reason, it is necessary to use the inter-frame information of the video during detection; mining dynamic features from the inter-frame information can improve the detection of tampered videos. Based on this, an embodiment of the present invention provides a face-forged video detection method based on a multi-correlation frame attention mechanism, as shown in fig. 1, which mainly comprises:
1. The video to be detected is decoded into a frame sequence, and a face image is extracted from each frame.
In the embodiment of the invention, DLIB (or another face detection module) can be used to detect the face in each frame; illustratively, the face region may be enlarged by a factor of 1.3 and the size set to 3 × 224 × 224.
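A minimal sketch of this decoding and face-cropping step is given below, assuming OpenCV is used for decoding and the dlib frontal face detector for detection; the helper name and the centre-based box enlargement are illustrative choices, not prescribed by the invention.

```python
# Sketch of frame decoding and enlarged face cropping, assuming OpenCV + dlib.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()

def extract_face_crops(video_path, scale=1.3, out_size=224):
    """Decode a video and return one enlarged, resized face crop per frame."""
    crops = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        dets = detector(rgb, 1)                        # upsample once for small faces
        if not dets:
            continue
        d = max(dets, key=lambda r: r.width() * r.height())
        cx, cy = (d.left() + d.right()) / 2, (d.top() + d.bottom()) / 2
        half = max(d.width(), d.height()) * scale / 2  # enlarge the box by `scale`
        x0, y0 = int(max(cx - half, 0)), int(max(cy - half, 0))
        x1, y1 = int(min(cx + half, frame.shape[1])), int(min(cy + half, frame.shape[0]))
        crops.append(cv2.resize(rgb[y0:y1, x0:x1], (out_size, out_size)))
    cap.release()
    return crops                                       # each crop is 224 x 224 x 3
```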
2. A frame is selected from the frame sequence as the target frame and N reference frames are selected before and after it; features are extracted from the face images in the 2N+1 frames; the inter-frame attention information between the image features of the target frame and the image features of each reference frame is calculated; the inter-frame attention information before and after the target frame is averaged separately to obtain the pre-frame and post-frame attention information of the target frame; and the image features of the target frame are then fused with the pre-frame and post-frame attention information.
In the embodiment of the invention, the image can be input into a feature extraction network with ResNet50 as the backbone to obtain the corresponding image features; illustratively, the size of the feature extraction network output may be 256 × 29 × 29 (corresponding to C × H × W below).
The feature extraction network comprises five bottleneck stages, layer1 to layer5. Each bottleneck stage has three convolutional layers together with batch normalization and ReLU. The feature map produced by the last stage, layer5, is used as the output of the feature extraction module.
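As a rough sketch, a feature extractor of this kind can be assembled in PyTorch as below. Note that this uses the standard torchvision ResNet-50 trunk plus a 1 × 1 reduction convolution as a stand-in; the patent's exact five-stage configuration and the 256 × 29 × 29 output size are not reproduced, so the shapes shown are assumptions of the sketch.

```python
# Sketch of a ResNet50-backed feature extractor (torchvision trunk as a stand-in).
import torch
import torch.nn as nn
from torchvision.models import resnet50

class FeatureExtractor(nn.Module):
    def __init__(self, out_channels=256):
        super().__init__()
        backbone = resnet50(weights=None)
        # keep the convolutional trunk, drop global average pooling and the classifier
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        self.reduce = nn.Conv2d(2048, out_channels, kernel_size=1)  # channel reduction

    def forward(self, x):              # x: (B, 3, 224, 224)
        f = self.trunk(x)              # (B, 2048, 7, 7) for a standard ResNet50
        return self.reduce(f)          # (B, 256, 7, 7) in this sketch
```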
Preferred embodiments of this step are given below:
Training stage:
In the model training phase, N = 1 is used; of course, other values may also be used for model training, and N = 1 is taken here only as an example.
For ease of understanding, fig. 1 gives a specific example in which three frames are selected. The middle frame is called the target frame and is denoted F_1; the other two frames are reference frames, denoted F_2 and F_3. The image features extracted by the feature extraction network are denoted V_1, V_2, V_3; each is a 3-dimensional matrix of size C × H × W, where C, H and W denote the number of channels, the height and the width, respectively. To facilitate the subsequent calculations, each is reshaped into a 2-dimensional matrix of size C × HW. Then, based on the inter-frame attention mechanism shown in fig. 3, the similarity matrices A_{12} between V_1 and V_2 and A_{13} between V_1 and V_3 are calculated:
A_{12} = V_1^{T} W V_2

A_{13} = V_1^{T} W V_3
where W is a C × C weight parameter matrix, and the resulting similarity matrices A_{12}, A_{13} have size HW × HW. The attention maps Z_{12} and Z_{13} are then calculated; taking Z_{12} as an example, the formula is:
Z_{12} = V_2 A_{12}
The target frame feature V_1 is passed through a convolutional layer to obtain G_1, and the two attention maps Z_{12} and Z_{13} are each passed through a convolutional layer and normalized with softmax to obtain the attention information I_{12} and I_{13}:
G_{1,i} = W_i^g V_1 + b_i^g

I_{12,i} = \mathrm{softmax}(W_i^z Z_{12} + b_i^z)

I_{13,i} = \mathrm{softmax}(W_i^z Z_{13} + b_i^z)
As will be understood by those skilled in the art, a convolutional layer can be expressed mathematically as a weight W plus an offset b. Here W_i^g, b_i^g are the weight and offset of the convolutional layer that processes the target frame features, W_i^z, b_i^z are those of the convolutional layer that processes the two attention maps, and K is the number of convolution kernels in each convolutional layer; that is, the outputs G_1, I_{12} and I_{13} each have K channels, indexed by i.
Since N is 1 in this training-phase example, no averaging is needed; the fusion is performed directly by multiplying the corresponding channels of G_1 and I_{12}, and of G_1 and I_{13}, and then concatenating the results:
I_1 = \mathrm{concat}_{i=1}^{K}\left(G_{1,i} \odot I_{12,i},\; G_{1,i} \odot I_{13,i}\right)
where K represents the number of convolution kernels in the convolution layer.
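A compact PyTorch sketch of this inter-frame attention and fusion for the N = 1 case is given below. It assumes the bilinear similarity A = V_1^T W V_2 reconstructed above and applies the softmax over the spatial positions of each output channel; the module, parameter and shape choices are illustrative, not the definitive implementation.

```python
# Sketch of the inter-frame (cross-frame) attention and channel-wise fusion, N = 1.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFrameAttention(nn.Module):
    def __init__(self, channels=256, num_kernels=64):
        super().__init__()
        self.W = nn.Parameter(torch.randn(channels, channels) * 0.01)  # C x C weight
        self.conv_g = nn.Conv2d(channels, num_kernels, kernel_size=1)  # target-frame conv
        self.conv_z = nn.Conv2d(channels, num_kernels, kernel_size=1)  # attention-map conv
        self.num_kernels = num_kernels

    def attention(self, v_t, v_r, h, w):
        # v_t, v_r: (B, C, H*W) flattened target / reference features
        a = torch.einsum('bci,cd,bdj->bij', v_t, self.W, v_r)           # (B, HW, HW) similarity
        z = torch.bmm(v_r, a).view(v_r.size(0), v_r.size(1), h, w)      # attention map Z
        i = F.softmax(self.conv_z(z).flatten(2), dim=-1)                # softmax over positions
        return i.view(z.size(0), self.num_kernels, h, w)

    def forward(self, feat_target, feat_prev, feat_next):
        b, c, h, w = feat_target.shape
        v_t = feat_target.flatten(2)                                    # (B, C, HW)
        g = self.conv_g(feat_target)                                    # G_1, (B, K, H, W)
        i_prev = self.attention(v_t, feat_prev.flatten(2), h, w)
        i_next = self.attention(v_t, feat_next.flatten(2), h, w)
        # channel-wise products with the target features, then concatenation
        return torch.cat([g * i_prev, g * i_next], dim=1)               # (B, 2K, H, W)
```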
Information fusion of a variable number of related frames in the test stage:
as shown in fig. 2, a workflow diagram of the test phase information fusion is shown (wherein Cross-attention is the inter-frame attention mechanism shown in fig. 3). In the target frame FtThe N reference frames selected before and after the frame are respectively marked as { Fb1,Fb2,...,FbNAnd { F }a1,Fa2,...,FaN}. Feature extraction of the face image in the 2N +1 frame, denoted asVt、{Vb1,Vb2,...,VbNAnd { V }a1,Va2,...,VaN}。
The similarity matrix between the image feature extracted from each reference frame and the image feature V_t extracted from the target frame F_t is calculated as:
A_{mn} = V_t^{T} W V_{mn}
where V_{mn} denotes the image feature extracted from a reference frame, m ∈ {a, b}, n = 1, 2, ..., N;
each reference frame utilizes its own image feature VmnWith corresponding similarity matrix AmnCalculating an attention map ZmnExpressed as:
Z_{mn} = V_{mn} A_{mn}
characterizing the target frame by VtAfter each attention map is passed through a convolution layer (with K convolution kernels) and normalized using softmax (fig. 2 does not show the figure, and according to fig. 3, the operations of convolution and softmax are in a cross-annotation module and are not shown, fig. 2 shows the feature vector V of the target frametConvolution operation of (d) to obtain inter-frame attention information:
I_{mn,i} = \mathrm{softmax}(W_i^z Z_{mn} + b_i^z)

G_{t,i} = W_i^g V_t + b_i^g

where W_i^g, b_i^g and W_i^z, b_i^z are the weights and offsets of the convolutional layers that process the target frame feature and the attention maps Z_{mn}, respectively; each convolutional layer has K convolution kernels, so the outputs have K channels, and i is the channel index.
In this way, the inter-frame attention information between the image feature of the target frame and the image features of the N reference frames selected before and after it is obtained: {I_{b1}, I_{b2}, ..., I_{bN}} and {I_{a1}, I_{a2}, ..., I_{aN}}.
The inter-frame attention information is averaged to obtain the pre-frame attention information I_b and the post-frame attention information I_a of the target frame:
I_b = \frac{1}{N} \sum_{n=1}^{N} I_{bn}

I_a = \frac{1}{N} \sum_{n=1}^{N} I_{an}
G_t is multiplied channel-wise with the pre-frame attention information I_b and the post-frame attention information I_a of the target frame, and the results are concatenated to obtain the fusion result I_t:
I_t = \mathrm{concat}_{i=1}^{K}\left(G_{t,i} \odot I_{b,i},\; G_{t,i} \odot I_{a,i}\right)
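Under the same assumptions, the variable-N test-time fusion can be sketched by reusing the CrossFrameAttention module from the earlier sketch and averaging the attention information over the N pre- and post-frames, following the formulas for I_b and I_a above; names and shapes remain illustrative.

```python
# Sketch of test-time fusion for a variable number N of reference frames.
import torch

def fuse_variable_n(attn, feat_target, feats_before, feats_after):
    """attn: a CrossFrameAttention instance; feats_*: lists of N reference features."""
    b, c, h, w = feat_target.shape
    v_t = feat_target.flatten(2)                                   # (B, C, HW)
    g = attn.conv_g(feat_target)                                   # G_t
    i_b = torch.stack([attn.attention(v_t, f.flatten(2), h, w)
                       for f in feats_before]).mean(dim=0)         # pre-frame attention I_b
    i_a = torch.stack([attn.attention(v_t, f.flatten(2), h, w)
                       for f in feats_after]).mean(dim=0)          # post-frame attention I_a
    return torch.cat([g * i_b, g * i_a], dim=1)                    # fusion result I_t
```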
3. The calculated fusion result is input into a prediction module, and whether the video to be detected is a face-forged video is judged from the prediction result at the whole-video level.
In the embodiment of the present invention, the prediction module shown in fig. 4 is used for prediction, and it comprises: a convolutional layer, an average pooling layer, three fully-connected layers and a softmax layer. The output of the convolutional layer is followed by batch normalization and a ReLU activation function. To prevent overfitting, dropout is added after the first two fully-connected layers (the random drop probability may be set to 0.75), each also followed by a ReLU activation function.
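A minimal sketch of such a prediction head is shown below; the channel counts, hidden width and input shape are illustrative assumptions, while the layer order, the batch normalization and ReLU after the convolution, and the 0.75 dropout after the first two fully-connected layers follow the description above.

```python
# Sketch of the prediction module: conv + BN + ReLU, average pooling,
# three fully-connected layers (dropout 0.75 after the first two), softmax.
import torch
import torch.nn as nn

class PredictionModule(nn.Module):
    def __init__(self, in_channels=128, hidden=256, drop=0.75):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                 # average pooling to 1 x 1
        )
        self.classifier = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True), nn.Dropout(drop),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True), nn.Dropout(drop),
            nn.Linear(hidden, 2),
            nn.Softmax(dim=1),                       # outputs [p0, p1]
        )

    def forward(self, fused):                        # fused: (B, 2K, H, W)
        x = self.features(fused).flatten(1)
        return self.classifier(x)
```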
The prediction module outputs a two-dimensional vector [p_0, p_1]. A cross-entropy loss function is used in the training phase, expressed as:
L = -(y_0 \log p_0 + y_1 \log p_1)
where [y_0, y_1] is the label of the video in the training set: [y_0, y_1] = [1.0, 0.0] when the video is an unforged video, and [y_0, y_1] = [0.0, 1.0] when the video is a forged video; [p_0, p_1] is the two-dimensional vector output by the prediction module.
In the test phase, p_0/(p_0 + p_1) is calculated as the prediction score. For example, a threshold of 0.5 may be set: when the prediction score is greater than 0.5, the video is judged to be a forged video; when it is less than 0.5, the video is judged to be a real video.
In the embodiment of the present invention, the network model shown in fig. 1 may be implemented based on a PyTorch framework.
Illustratively, the training phase is optimized with a stochastic gradient descent (SGD) optimizer with an initial learning rate of 0.001, weight decay of 0.0001 and momentum of 0.95; the batch size is set to 12, and all parameters in the network are initialized from a Gaussian distribution with mean 0 and variance 0.001. The model was trained on a server equipped with 2 NVIDIA RTX 2080 Ti GPUs, Intel Xeon E5-2695 CPUs and the Ubuntu 16.04 operating system. As previously described, 3 frames per video are used in the training phase, and the parameters are updated by back-propagation of the cross-entropy loss.
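A short sketch of this training configuration (weight initialization and optimizer only) is given below; the model itself and the data loading are assumed to be defined elsewhere.

```python
# Sketch of the training configuration: Gaussian init and SGD with the stated
# hyper-parameters; `model` is assumed to be the full detection network.
import torch
import torch.nn as nn

def init_weights(module):
    # N(0, 0.001) initialization, i.e. std = sqrt(0.001), for learnable parameters
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=0.001 ** 0.5)
        if module.bias is not None:
            nn.init.normal_(module.bias, mean=0.0, std=0.001 ** 0.5)

def configure_training(model):
    model.apply(init_weights)
    return torch.optim.SGD(model.parameters(),
                           lr=0.001, momentum=0.95, weight_decay=0.0001)
```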
After model training is finished, testing is performed. In the testing stage the loss function is not calculated, no back-propagation is performed, and the parameters are kept fixed. Unlike the training phase, the number N of selected related frames need not be 1; that is, the amount of fused inter-frame correlation information is variable.
The effects of the above method of the embodiment of the present invention are explained as follows:
the experiment was performed starting from 1 for the number of relevant frames N. When N is 1, it is equivalent to take one frame before and after the target frame (i.e. the example given above with reference to fig. 1), and then increase the number of N. Experiments show that the detection effect is best when N is 4 for a 10-second video, which is equivalent to taking 4 frames before and 4 frames after the target frame, adding the target frame to the 4 frames, and totaling 9 frames, and averaging 1 second frame. If the number of reference frames is continuously increased, the detection effect is not obviously improved. The inter-frame attention mechanism and the multi-frame structure designed by the method are effective, and the dynamic change of the inter-frame can be extracted, and the dynamic change mode of tampering the video fake trace is learned, so that the detection effect is improved. In implementation, the program and the trained model can be installed on a social media website or a background server of a short video application to detect the video uploaded by a user, so that forged videos made by various mainstream face tampering methods can be effectively detected, the authenticity of the uploaded video is ensured, and the propagation of false information by using the forged videos is prevented, so that adverse effects are avoided.
For some of the more realistic tampered videos, the face-replacement effect is vivid and it is difficult to judge authenticity from a single still frame, yet flaws appear during dynamic playback. Compared with the prior art, the present scheme discovers the dynamic change information between frames by computing attention maps across multiple frames. In addition, a multi-stream structure is used, taking a frame sequence of the video as input and computing an attention map between every pair of frames, so that the inter-frame relationships of the video are modeled and the analysis is carried out from the perspective of the whole video. With this design, not only the static traces of tampering but also the dynamic change patterns produced by tampering are learned, which strengthens the detection performance and remedies the failure of existing methods to exploit inter-frame information.
The method achieves state-of-the-art results in experiments on two face-tampering video data sets, FaceForensics++ and Celeb-DF (V2). For the mainstream tampering methods Deepfakes, Face2Face and FaceSwap, the accuracy exceeds 98%, and it remains above 95% even at extremely low image quality (FaceForensics++ c40). A model trained on FaceForensics++ c40 and tested on Celeb-DF (V2) reaches an AUC of 70.4. Experiments also confirm that increasing the number N of related frames in the testing stage can improve detection accuracy; the suitable number is related to the length of the video to be detected, and the best effect is obtained when roughly one related frame per second is taken, with larger N giving only slight further improvement at the cost of more computing resources. As can be seen from the visualization in fig. 5 of the convolutional layer after the attention information is fused with the target frame features, the network focuses on tampered regions with large dynamic changes, such as the eyes and mouth, which supports the correct judgement.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A face forgery video detection method based on a multi-correlation frame attention mechanism is characterized by comprising the following steps:
decoding a video to be detected into a frame sequence, and extracting a face image of each frame;
selecting a frame as a target frame, selecting N reference frames before and after the target frame, extracting features from the face images in the 2N+1 frames, respectively calculating inter-frame attention information between the image features of the target frame and the image features of each reference frame, respectively averaging the inter-frame attention information before and after the target frame to obtain the pre-frame and post-frame attention information of the target frame, and then fusing the image features of the target frame with the pre-frame and post-frame attention information;
predicting based on the fusion result, and judging, from the prediction result at the whole-video level, whether the video to be detected is a face-forged video;
wherein respectively calculating the inter-frame attention information between the image features of the target frame and the image features of each reference frame comprises: respectively calculating a similarity matrix between the image feature extracted from each reference frame and the image feature V_t extracted from the target frame F_t; and, for each reference frame, calculating an attention map from its own image feature and the corresponding similarity matrix, thereby obtaining the inter-frame attention information.
2. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 1, wherein the N reference frames selected before and after the target frame F_t are denoted {F_{b1}, F_{b2}, ..., F_{bN}} and {F_{a1}, F_{a2}, ..., F_{aN}}, respectively;
features are extracted from the face images in the 2N+1 frames and denoted V_t, {V_{b1}, V_{b2}, ..., V_{bN}} and {V_{a1}, V_{a2}, ..., V_{aN}}.
3. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 2, wherein
the similarity matrix is calculated as:

A_{mn} = V_t^{T} W V_{mn}

where V_{mn} denotes the image feature extracted from a reference frame, m ∈ {a, b}, n = 1, 2, ..., N, and W is a weight parameter matrix;

the attention map is calculated as:

Z_{mn} = V_{mn} A_{mn}

after each attention map is passed through a convolutional layer and normalized using softmax, the inter-frame attention information is obtained:

I_{mn,i} = \mathrm{softmax}(W_i^z Z_{mn} + b_i^z)

where W_i^z and b_i^z are the weight and offset of the convolutional layer that processes the attention maps; K represents the number of convolution kernels in the convolutional layer and i is the channel index.
4. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 1, 2 or 3, wherein
the inter-frame attention information between the image feature of the target frame and the image features of the N reference frames selected before and after it is denoted {I_{b1}, I_{b2}, ..., I_{bN}} and {I_{a1}, I_{a2}, ..., I_{aN}};
the inter-frame attention information is averaged to obtain the pre-frame attention information I_b and the post-frame attention information I_a of the target frame:

I_b = \frac{1}{N} \sum_{n=1}^{N} I_{bn}

I_a = \frac{1}{N} \sum_{n=1}^{N} I_{an}
5. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 1, wherein
the image feature V_t of the target frame is passed through a convolutional layer to obtain G_t, which is then multiplied channel-wise with the pre-frame attention information I_b and the post-frame attention information I_a of the target frame, and the results are concatenated to obtain the fusion result I_t:

G_{t,i} = W_i^g V_t + b_i^g

I_t = \mathrm{concat}_{i=1}^{K}\left(G_{t,i} \odot I_{b,i},\; G_{t,i} \odot I_{a,i}\right)

where W_i^g and b_i^g are the weight and offset of the convolutional layer that processes the feature vector of the target frame; K represents the number of convolution kernels in the convolutional layer and i is the channel index.
6. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 1, wherein the prediction is performed by a prediction module which comprises, in sequence: a convolutional layer, an average pooling layer, three fully-connected layers and a softmax layer; the prediction module outputs a two-dimensional vector [p_0, p_1], and p_0/(p_0 + p_1) is calculated as the prediction score; the prediction score is compared with a threshold, and if the prediction score exceeds the threshold, the video to be detected is judged to be a forged video; otherwise, it is a real video.
7. The face forgery video detection method based on a multi-correlation frame attention mechanism according to claim 1 or 6, wherein a cross-entropy loss function is used in the training process, expressed as:

L = -(y_0 \log p_0 + y_1 \log p_1)

where [y_0, y_1] is the label of the video in the training set: [y_0, y_1] = [1.0, 0.0] when the video is an unforged video, and [y_0, y_1] = [0.0, 1.0] when the video is a forged video; [p_0, p_1] is the two-dimensional vector output by the prediction module.
CN202010851718.7A 2020-08-21 2020-08-21 Face forged video detection method based on multi-correlation frame attention mechanism Active CN111986180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010851718.7A CN111986180B (en) 2020-08-21 2020-08-21 Face forged video detection method based on multi-correlation frame attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010851718.7A CN111986180B (en) 2020-08-21 2020-08-21 Face forged video detection method based on multi-correlation frame attention mechanism

Publications (2)

Publication Number Publication Date
CN111986180A CN111986180A (en) 2020-11-24
CN111986180B true CN111986180B (en) 2021-07-06

Family

ID=73443101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010851718.7A Active CN111986180B (en) 2020-08-21 2020-08-21 Face forged video detection method based on multi-correlation frame attention mechanism

Country Status (1)

Country Link
CN (1) CN111986180B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733625B (en) * 2020-12-28 2022-06-14 华南理工大学 False face video tampering detection method and system based on time domain self-attention mechanism
CN112733733A (en) * 2021-01-11 2021-04-30 中国科学技术大学 Counterfeit video detection method, electronic device and storage medium
CN112749686B (en) * 2021-01-29 2021-10-29 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer equipment and storage medium
CN113205044B (en) * 2021-04-30 2022-09-30 湖南大学 Deep fake video detection method based on characterization contrast prediction learning
CN113435292B (en) * 2021-06-22 2023-09-19 北京交通大学 AI fake face detection method based on inherent feature mining
CN113378514B (en) * 2021-08-12 2021-11-05 华东交通大学 Multi-label data feature selection processing method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69637782D1 (en) * 1995-05-08 2009-01-29 Digimarc Corp Method for embedding machine-readable steganographic code
US7376241B2 (en) * 2003-08-06 2008-05-20 The Boeing Company Discrete fourier transform (DFT) watermark
CN108229325A (en) * 2017-03-16 2018-06-29 北京市商汤科技开发有限公司 Method for detecting human face and system, electronic equipment, program and medium
CN107392142B (en) * 2017-07-19 2020-11-13 广东工业大学 Method and device for identifying true and false face
CN110880172A (en) * 2019-11-12 2020-03-13 中山大学 Video face tampering detection method and system based on cyclic convolution neural network
CN111144314B (en) * 2019-12-27 2020-09-18 北京中科研究院 Method for detecting tampered face video

Also Published As

Publication number Publication date
CN111986180A (en) 2020-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant