CN113283393A - Method for detecting Deepfake video based on image group and two-stream network - Google Patents
- Publication number
- CN113283393A (application number CN202110717852.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- video
- frame
- stream
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a method for detecting Deepfake videos based on an image group and a two-stream network, comprising the following steps: (1) extract key frames from the video to be detected to form an image group; (2) input the first frame of the image group into the spatial stream of the two-stream network to extract spatial features; (3) subtract the first frame from each remaining frame of the image group to obtain difference maps, form a difference-map sequence, and input it into the temporal stream of the two-stream network to extract temporal features; (4) fuse the extracted spatial and temporal features and evaluate the authenticity of the video with a dynamic routing algorithm. Compared with the prior art, the method reduces computational redundancy by using the image group, so the network concentrates on the key frames; it fully exploits the spatio-temporal information of the key frames by fusing the spatial and temporal features; and it classifies with a dynamic routing algorithm to obtain a more accurate evaluation result.
Description
Technical Field
The invention belongs to the field of video detection, and in particular relates to a method for detecting Deepfake videos based on an image group and a two-stream network.
Background
With the rise and development of artificial intelligence, face-swapping technology has gradually attracted wide attention. The advent of Deepfake was a breakthrough in face swapping: it is a technology that can replace the face of a source person in a video with the face of a target person. With the emergence and refinement of generative adversarial networks, face swapping has become easier and harder to notice with the naked eye. Celebrities and politicians, as public figures, have large numbers of videos published online, so malicious actors can forge videos at will to spread false information and sow confusion, threatening society. Detecting Deepfake videos is therefore an urgent task of great practical significance.
Deepfake video detection methods can be divided into detection based on intra-frame artifacts and detection based on inter-frame temporal features. Artifact-based methods first decompose the video into frames, analyze every frame, and judge the authenticity of the video by averaging the per-frame results to obtain a video-level prediction. This is similar to image detection, except that compression reduces the sharpness of video frames and increases the difficulty of detection. Even if a CNN can correctly predict each frame, predicting the authenticity of a video by averaging is inaccurate. The second class of methods, based on inter-frame temporal features, treats the video as a whole and takes the temporal correlation between frames into account, evaluating Deepfake videos more reasonably. However, both classes share a common problem: a result can only be obtained by analyzing the entire video, and the similarity between video frames inevitably causes high information redundancy, so these methods are computationally expensive and slow.
Disclosure of Invention
To solve the problems of heavy computation and low efficiency in existing Deepfake video detection techniques, the invention provides a Deepfake video detection method based on an image group and a two-stream network. The technical scheme adopted by the invention is as follows:
a method for detecting a Deepfake video based on an image group and a two-stream network comprises the following steps:
Step 1: extracting key frames from the video to be detected to form an image group;
Step 2: inputting the first frame of the image group into the spatial stream of the two-stream network to extract spatial features;
Step 3: subtracting the first frame from each remaining frame of the image group to obtain difference maps, forming a difference-map sequence, and inputting the difference-map sequence into the temporal stream of the two-stream network to extract temporal features;
Step 4: fusing the extracted spatial features and temporal features, and evaluating the authenticity of the video with a dynamic routing algorithm.
Further, in step 1, face-region images are cropped from the video frames at a fixed size; the face-region images of adjacent frames are differenced, the 10 frames whose face regions change the most, measured by the average intensity of the inter-frame difference, are extracted as key frames, and an image group is formed in temporal order to represent the video.
Further, the inter-frame difference is computed as

absDiff_i = F_i − F_{i−1},

where F_i and F_{i−1} respectively denote the face-region images of the i-th and (i−1)-th frames, and absDiff_i is the difference between them. The average intensity of the inter-frame difference is computed as

diffMean_i = (1 / (width × height)) Σ_{x=1}^{width} Σ_{y=1}^{height} |absDiff_i(x, y)|,

where absDiff_i(x, y) is the value of absDiff_i at coordinate (x, y), width and height are the width and height of the face-region image, and diffMean_i is the average intensity of the difference between the face-region images of the i-th and (i−1)-th frames.
Further, the two-stream network in steps 2 and 3 comprises a spatial stream and a temporal stream. The spatial stream consists of the first through fifth stages of a pre-trained ResNet50 network followed by a primary capsule network, and is used to extract spatial features. The temporal stream consists of a spatial pyramid pooling network and a GRU network, and is used to extract temporal features; the spatial features are assigned, as auxiliary information, to the hidden state of the GRU network, which analyzes temporal coherence. The two-stream network is trained with the Adam optimization algorithm, and the loss function is the cross-entropy loss

L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)],

where L is the loss value, and y and ŷ respectively denote the sample label and the predicted label.
Furthermore, the capsules of the primary capsule network share the same structure, each comprising two-dimensional convolutional layers, a statistics pooling layer and a one-dimensional convolutional layer, where the statistics pooling layer computes the mean and variance of each convolution feature map. The mean is computed as

μ_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} I_kij,

and the variance as

σ_k² = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (I_kij − μ_k)²,

where μ_k is the mean of the k-th feature map, I_kij is its value at position (i, j), W and H are its width and height, and σ_k² is its variance.
Furthermore, the output of the spatial pyramid pooling network is a one-dimensional feature vector whose length is determined by the number of pyramid levels N, with a factor of 3 corresponding to the number of channels of the difference map.
Further, the difference maps in step 3 can be expressed as

Diff_{m−1} = F_m − F_1, m = 2, …, 10,

where Diff_{m−1} is the (m−1)-th difference map, and F_m and F_1 are the m-th and first frames of the image group, respectively.
Further, in step 4, the spatial features and temporal features are concatenated, fused, and passed to the digit capsule network through the dynamic routing algorithm. The output vectors of the digit capsule network are passed through softmax and averaged to obtain the final network output vector ŷ = (ŷ₀, ŷ₁), where ŷ₀ is the probability that the video is a Deepfake video and ŷ₁ is the probability that the video is real. If ŷ₀ > ŷ₁, the network predicts label 0 and the video to be detected is a Deepfake video; if ŷ₀ ≤ ŷ₁, the network predicts label 1 and the video to be detected is a real video.
Compared with the prior art, the invention has the following beneficial effects: key frames selected by inter-frame differencing form an image group that replaces the full video as network input, so the network concentrates on learning the features of the key frames, reducing computational redundancy and improving efficiency; and the proposed spatio-temporal two-stream detection network, with classification by a dynamic routing algorithm, fully exploits the spatial and temporal features of the image group, effectively improving detection accuracy.
Drawings
FIG. 1 is a block diagram of the method of the present invention.
FIG. 2 is a schematic diagram of the structure of the spatial-stream network of the present invention.
FIG. 3 is a schematic diagram of the structure of the spatial pyramid pooling network of the present invention.
FIG. 4 is pseudocode of the dynamic routing algorithm of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
FIG. 1 shows a flow chart of the present invention, which comprises the following steps:
(1) extracting key frames from video to be detected to form image group
The video to be detected is cropped at a fixed size to obtain face-region images, and key frames are extracted by inter-frame differencing to form the image group fed to the network: adjacent face images are differenced, and the frames with the largest change, measured by the average intensity of the inter-frame difference, are taken as key frames. Since video frames are strongly temporally correlated, the 10 extracted key frames are combined into an image group in temporal order, so that temporal features are not lost, and this image group represents the video. The inter-frame difference is computed as in equation (1), and the average intensity of the inter-frame difference as in equation (2):

absDiff_i = F_i − F_{i−1},   (1)

diffMean_i = (1 / (width × height)) Σ_{x=1}^{width} Σ_{y=1}^{height} |absDiff_i(x, y)|,   (2)

where F_i and F_{i−1} are the face-region images of the i-th and (i−1)-th frames, absDiff_i is the difference between them, absDiff_i(x, y) is the value of absDiff_i at coordinate (x, y), width and height are the width and height of the face-region image, and diffMean_i is the average intensity of the difference between the face-region images of the i-th and (i−1)-th frames.
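As a non-authoritative sketch, the key-frame selection of step (1) might look as follows. The function name, the use of pre-cropped NumPy face images, and the absolute-value differencing (suggested by the name absDiff) are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def select_key_frames(faces, k=10):
    """Select the k face crops with the largest inter-frame change.

    faces: list of equally sized face-region images, uint8 arrays of
           shape (H, W, 3) -- a hypothetical stand-in for the crops
           the patent obtains with a fixed-size crop.
    Returns the selected frames in their original temporal order.
    """
    # absDiff_i = F_i - F_{i-1}; cast to int32 so uint8 values do not wrap.
    diff_means = []
    for i in range(1, len(faces)):
        abs_diff = np.abs(faces[i].astype(np.int32) - faces[i - 1].astype(np.int32))
        diff_means.append(abs_diff.mean())   # diffMean_i: average intensity
    # Frame indices sorted by decreasing average difference intensity.
    order = sorted(range(1, len(faces)), key=lambda i: diff_means[i - 1], reverse=True)
    keep = sorted(order[:k])                 # restore temporal order
    return [faces[i] for i in keep]
```

In practice the crops would come from a face detector run on decoded video frames; that stage is omitted here.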
(2) The first frame of the image group is input into the spatial stream of the two-stream network to extract spatial features.
Because the number of Deepfake videos is small, training the network from scratch is inadvisable. To avoid overfitting, part of a ResNet50 network pre-trained on the ILSVRC database is used to extract latent features. Compared with the full ResNet50, using only the first through fifth stages of the pre-trained network (the first conv layer and two blocks of the second conv layer) is more favorable for detection, because the full ResNet50 extracts high-level semantic information and thereby ignores intra-frame artifact features.
As shown in FIG. 2, the complete capsule network comprises several primary capsule networks for extracting key features and a digit capsule network for classification. The primary capsule network consists of groups of neurons called capsules. Capsules may in general have different structures; to simplify computation, the invention uses capsules of identical structure, each comprising two-dimensional convolutional layers, a statistics pooling layer and a one-dimensional convolutional layer, where the statistics pooling layer computes the mean and variance of each convolution feature map, given by equations (3) and (4):

μ_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} I_kij,   (3)

σ_k² = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (I_kij − μ_k)²,   (4)

where μ_k is the mean of the k-th feature map, I_kij is its value at position (i, j), W and H are its width and height, and σ_k² is its variance.
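A minimal sketch of the statistics pooling layer, computing the per-feature-map mean and variance described above (treating the K feature maps as a single stacked array is an implementation assumption):

```python
import numpy as np

def statistical_pooling(feature_maps):
    """Per-map mean/variance statistics, as in a statistics pooling layer.

    feature_maps: array of shape (K, H, W) -- K convolution feature maps.
    Returns an array of shape (K, 2) holding (mu_k, sigma_k^2) per map.
    """
    mu = feature_maps.mean(axis=(1, 2))                                 # mu_k
    var = ((feature_maps - mu[:, None, None]) ** 2).mean(axis=(1, 2))   # sigma_k^2
    return np.stack([mu, var], axis=1)
```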
In image processing, a CNN focuses on detecting important features in an image while ignoring the spatial relationships between them. A capsule network instead learns the features of each complete entity: each capsule represents a different facial region, such as the eyes, nose or mouth, as a direction vector that can reflect spatial-hierarchy information, making it more robust for fake-face detection.
(3) The temporal stream of the two-stream network extracts inter-frame inconsistency from the remaining frames of the image group.
Because the frames in the image group are highly similar, and the main (first) frame already undergoes spatial-feature analysis, the difference-map sequence is obtained by subtracting the first frame from each of the remaining frames, as in equation (5); this reduces feature redundancy and saves computing resources.

Diff_{m−1} = F_m − F_1, m = 2, …, 10,   (5)

where Diff_{m−1} is the (m−1)-th difference map, and F_m and F_1 are the m-th and first frames of the image group, respectively.
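Equation (5) reduces to a few lines of array arithmetic; this sketch assumes float face images and keeps signed differences, since the patent does not specify an absolute value here:

```python
import numpy as np

def difference_maps(group):
    """Diff_{m-1} = F_m - F_1 for m = 2..len(group).

    group: list of equally sized face images (arrays of shape (H, W, 3));
    group[0] is the first key frame, which feeds the spatial stream.
    Returns the difference-map sequence for the temporal stream.
    """
    first = group[0].astype(np.float32)
    return [frame.astype(np.float32) - first for frame in group[1:]]
```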
After the difference-map sequence is generated, the temporal coherence between frames is analyzed with a GRU network. A GRU is usually used for text analysis, where each cell processes one word represented by a one-dimensional vector; the face difference maps in the invention are three-dimensional, so they must be flattened into one dimension to fit the GRU. Because the difference maps are sparse, direct flattening would both waste space and increase the amount of computation, so the invention uses a spatial pyramid pooling network to extract the key information of each three-dimensional difference map. A spatial pyramid pooling network produces a fixed-size output regardless of input size: as shown in FIG. 3, the difference map is pooled at several scales, and the pooled features of all scales are concatenated into a one-dimensional feature vector whose length is determined by the number of pyramid levels N (the factor 3 corresponding to the channels of the difference map); N is typically 3 to 5.
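A rough sketch of spatial pyramid pooling over one difference map. The n × n grid per level and the choice of max-pooling are common-formulation assumptions; the patent only fixes the number of levels N and the channel factor 3:

```python
import numpy as np

def spp(image, levels=(1, 2, 3)):
    """Pool one difference map into a fixed-length 1-D vector.

    Each level n splits the map into an n x n grid and max-pools every
    cell per channel, so the output length is channels * sum(n^2) --
    independent of the input's height and width.
    """
    h, w, c = image.shape
    feats = []
    for n in levels:
        # Cell boundaries that cover the map even when h, w are not divisible by n.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
                feats.append(cell.max(axis=(0, 1)))   # one value per channel
    return np.concatenate(feats)
```

With levels (1, 2, 3) and 3 channels, the output length is 3 × (1 + 4 + 9) = 42 regardless of the face-crop size.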
The one-dimensional feature vector obtained from each three-dimensional difference map is input into the GRU network to extract temporal-inconsistency information. Compared with an LSTM, a GRU uses a single update gate to control both forgetting and memorizing, greatly reducing the number of parameters and accelerating training. The hidden state of a GRU is usually initialized to zero; in the invention, the spatial features extracted by the spatial stream are instead assigned to the hidden state as auxiliary information. Because the temporal-stream inputs are differences against the first frame, many important features are lost; since those features have already been extracted by the spatial stream, they are injected directly into the temporal stream, avoiding repeated extraction of spatial features and thereby reducing redundancy and accelerating training and detection.
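The hidden-state seeding idea can be illustrated with a bare-bones GRU cell. The weight shapes, random initialization, and function names below are illustrative assumptions; a real implementation would use a trained deep-learning-framework GRU:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGRUCell:
    """A tiny NumPy GRU cell (no biases) for illustration only."""

    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hid_dim, in_dim + hid_dim)
        self.Wz, self.Wr, self.Wh = (rng.standard_normal(shape) * 0.1 for _ in range(3))

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                          # update gate
        r = sigmoid(self.Wr @ xh)                          # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

def run_temporal_stream(cell, diff_vectors, spatial_feature):
    # The patent's twist: seed the hidden state with the spatial-stream
    # feature instead of the usual all-zero initialization.
    h = spatial_feature
    for x in diff_vectors:
        h = cell.step(x, h)
    return h
```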
(4) The authenticity of the video to be detected is evaluated with a dynamic routing algorithm.
After the two-stream network has learned the temporal and spatial features, the two are concatenated to fuse the spatio-temporal features, and the likelihood that the video is genuine is computed with a dynamic routing algorithm to obtain the evaluation result. The dynamic routing algorithm, proposed with the capsule network, can be regarded as a vector-version fully-connected layer; by expressing the probability that an entity exists as the length of a vector, it routes features to their category more accurately. The specific algorithm is given in FIG. 4: the spatial and temporal features are concatenated, fused, and passed to the digit capsule network via dynamic routing, and the output vectors of the digit capsule network are passed through softmax and averaged to obtain the final network output vector ŷ = (ŷ₀, ŷ₁), where ŷ₀ is the probability that the video is a Deepfake video and ŷ₁ the probability that it is real. If ŷ₀ > ŷ₁, the network predicts label 0 and the video to be detected is a Deepfake video; otherwise the network predicts label 1 and the video to be detected is real.
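Since FIG. 4 is not reproduced here, the sketch below follows the common routing-by-agreement formulation (3 iterations, softmax coupling coefficients, squash nonlinearity); shapes and details are assumptions rather than the patent's exact pseudocode:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Capsule squashing: keep direction, map length into [0, 1)."""
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between primary and digit capsules.

    u_hat: prediction vectors, shape (num_in, num_out, dim_out).
    Returns the output capsules, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                            # routing logits
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coeffs
        s = (c[:, :, None] * u_hat).sum(axis=0)                # weighted sum
        v = squash(s)                                          # output capsules
        b += (u_hat * v[None]).sum(axis=-1)                    # agreement update
    return v
```

The lengths of the two output capsules then play the role of the class scores that, after softmax and averaging, give ŷ₀ and ŷ₁.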
Since the capsule network is used here for forensics and performs no reconstruction, the network is trained with only the cross-entropy loss,

L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)],

where L is the loss value, and y and ŷ respectively denote the sample label and the predicted label. The training data come from the FaceForensics++ dataset: key frames are extracted from each video in the dataset to form image groups, where image groups from Deepfake videos are labeled 0 and image groups from real videos are labeled 1.
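The training loss can be sketched as a minimal scalar binary cross-entropy; batching and the Adam optimizer of the actual training loop are omitted, and the clamping epsilon is an implementation assumption:

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """L = -(y*log(y_hat) + (1-y)*log(1-y_hat)).

    y: sample label (0 = Deepfake image group, 1 = real image group).
    y_hat: predicted probability of the real class.
    """
    y_hat = min(max(y_hat, eps), 1.0 - eps)   # avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
```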
In conclusion, the method for detecting Deepfake videos uses the image group to greatly reduce computational redundancy, so the network concentrates on the key video segments; it uses the two-stream network to extract spatial and temporal features of the image group, fully mining the key features that decide video authenticity; and it finally classifies with a dynamic routing algorithm, obtaining a more accurate evaluation result.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above-mentioned embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may be made by those skilled in the art without departing from the principle of the invention.
Claims (8)
1. A method for detecting a Deepfake video based on an image group and a two-stream network is characterized by comprising the following steps:
step 1: extracting key frames of a video to be detected to form an image group;
step 2: inputting a first frame of an image group into a spatial stream in a two-stream network to extract spatial information as spatial features;
Step 3: subtracting the first frame from each remaining frame of the image group to obtain difference maps, forming a difference-map sequence, and inputting the difference-map sequence into the temporal stream of the two-stream network to extract inter-frame inconsistency as temporal features;
Step 4: fusing the extracted spatial features and temporal features, and evaluating the authenticity of the video with a dynamic routing algorithm.
2. The method as claimed in claim 1, wherein in step 1 face-region images are cropped from the video frames at a fixed size; the face-region images of adjacent frames are differenced, the 10 frames whose face regions change the most, measured by the average intensity of the inter-frame difference, are extracted as key frames, and an image group is formed in temporal order to represent the video.
3. The method for detecting the Deepfake video based on the image group and the two-stream network as claimed in claim 2, wherein the inter-frame difference is computed as

absDiff_i = F_i − F_{i−1},

where F_i and F_{i−1} respectively denote the face-region images of the i-th and (i−1)-th frames, and absDiff_i is the difference between them; the average intensity of the inter-frame difference is computed as

diffMean_i = (1 / (width × height)) Σ_{x=1}^{width} Σ_{y=1}^{height} |absDiff_i(x, y)|,

where absDiff_i(x, y) is the value of absDiff_i at coordinate (x, y), width and height are the width and height of the face-region image, and diffMean_i is the average intensity of the difference between the face-region images of the i-th and (i−1)-th frames.
4. The method for detecting the Deepfake video based on the image group and the two-stream network as claimed in claim 1, wherein the two-stream network in steps 2 and 3 comprises a spatial stream and a temporal stream; the spatial stream consists of the first through fifth stages of a pre-trained ResNet50 network and a primary capsule network, and is used to extract spatial features; the temporal stream consists of a spatial pyramid pooling network and a GRU network, and is used to extract temporal features; the spatial features are assigned, as auxiliary information, to the hidden state of the GRU network; the GRU network analyzes temporal coherence; the two-stream network is trained with the Adam optimization algorithm, and the loss function is the cross-entropy loss

L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)],

where L is the loss value, and y and ŷ respectively denote the sample label and the predicted label.
5. The method as claimed in claim 4, wherein the capsules of the primary capsule network share the same structure, each comprising two-dimensional convolutional layers, a statistics pooling layer and a one-dimensional convolutional layer, the statistics pooling layer computing the mean and variance of each convolution feature map; the mean is computed as

μ_k = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} I_kij,

and the variance as

σ_k² = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (I_kij − μ_k)²,

where μ_k is the mean of the k-th feature map, I_kij is its value at position (i, j), W and H are its width and height, and σ_k² is its variance.
7. The method for detecting the Deepfake video based on the image group and the two-stream network as claimed in claim 1, wherein the difference maps in step 3 are expressed as

Diff_{m−1} = F_m − F_1, m = 2, …, 10,

where Diff_{m−1} is the (m−1)-th difference map, and F_m and F_1 are the m-th and first frames of the image group, respectively.
8. The method for detecting the Deepfake video based on the image group and the two-stream network as claimed in claim 1, wherein in step 4 the spatial features and the temporal features are concatenated, fused, and passed to the digit capsule network through the dynamic routing algorithm; the output vectors of the digit capsule network are passed through softmax and averaged to obtain the final network output vector ŷ = (ŷ₀, ŷ₁), where ŷ₀ is the probability that the video is a Deepfake video and ŷ₁ is the probability that the video is a real video; if ŷ₀ > ŷ₁, the network predicts label 0 and the video to be detected is a Deepfake video; if ŷ₀ ≤ ŷ₁, the network predicts label 1 and the video to be detected is a real video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110717852.2A CN113283393B (en) | 2021-06-28 | 2021-06-28 | Deepfake video detection method based on image group and two-stream network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110717852.2A CN113283393B (en) | 2021-06-28 | 2021-06-28 | Deepfake video detection method based on image group and two-stream network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113283393A true CN113283393A (en) | 2021-08-20 |
CN113283393B CN113283393B (en) | 2023-07-25 |
Family
ID=77285677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110717852.2A Active CN113283393B (en) | 2021-06-28 | 2021-06-28 | Deepfake video detection method based on image group and two-stream network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113283393B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114494804A (en) * | 2022-04-18 | 2022-05-13 | 武汉明捷科技有限责任公司 | Unsupervised field adaptive image classification method based on domain specific information acquisition |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030090505A1 (en) * | 1999-11-04 | 2003-05-15 | Koninklijke Philips Electronics N.V. | Significant scene detection and frame filtering for a visual indexing system using dynamic thresholds |
US20120008836A1 (en) * | 2010-07-12 | 2012-01-12 | International Business Machines Corporation | Sequential event detection from video |
CN107451552A (en) * | 2017-07-25 | 2017-12-08 | 北京联合大学 | A kind of gesture identification method based on 3D CNN and convolution LSTM |
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning system optimization method, device, equipment and readable storage medium |
CN111182292A (en) * | 2020-01-05 | 2020-05-19 | 西安电子科技大学 | No-reference video quality evaluation method and system, video receiver and intelligent terminal |
CN111241958A (en) * | 2020-01-06 | 2020-06-05 | 电子科技大学 | Video image identification method based on residual error-capsule network |
CN111860414A (en) * | 2020-07-29 | 2020-10-30 | 中国科学院深圳先进技术研究院 | Method for detecting Deepfake video based on multi-feature fusion |
CN111967427A (en) * | 2020-08-28 | 2020-11-20 | 广东工业大学 | Fake face video identification method, system and readable storage medium |
KR20200132665A (en) * | 2019-05-17 | 2020-11-25 | 삼성전자주식회사 | Attention layer included generator based prediction image generating apparatus and controlling method thereof |
CN112163488A (en) * | 2020-09-21 | 2021-01-01 | 中国科学院信息工程研究所 | Video false face detection method and electronic device |
US20210042529A1 (en) * | 2019-08-07 | 2021-02-11 | Zerofox, Inc. | Methods and systems for detecting deepfakes |
CN112487989A (en) * | 2020-12-01 | 2021-03-12 | 重庆邮电大学 | Video expression recognition method based on capsule-long-and-short-term memory neural network |
CN112801037A (en) * | 2021-03-01 | 2021-05-14 | 山东政法学院 | Face tampering detection method based on continuous inter-frame difference |
CN112927202A (en) * | 2021-02-25 | 2021-06-08 | 华南理工大学 | Method and system for detecting Deepfake video with combination of multiple time domains and multiple characteristics |
CN112991278A (en) * | 2021-03-01 | 2021-06-18 | 华南理工大学 | Method and system for detecting Deepfake video by combining RGB (red, green and blue) space domain characteristics and LoG (LoG) time domain characteristics |
- 2021-06-28: application CN202110717852.2A filed; granted as patent CN113283393B (status: Active)
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030090505A1 (en) * | 1999-11-04 | 2003-05-15 | Koninklijke Philips Electronics N.V. | Significant scene detection and frame filtering for a visual indexing system using dynamic thresholds |
US20120008836A1 (en) * | 2010-07-12 | 2012-01-12 | International Business Machines Corporation | Sequential event detection from video |
CN107451552A (en) * | 2017-07-25 | 2017-12-08 | Beijing Union University | Gesture recognition method based on 3D CNN and convolutional LSTM |
KR20200132665A (en) * | 2019-05-17 | 2020-11-25 | Samsung Electronics Co., Ltd. | Attention layer included generator based prediction image generating apparatus and controlling method thereof |
US20210042529A1 (en) * | 2019-08-07 | 2021-02-11 | Zerofox, Inc. | Methods and systems for detecting deepfakes |
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | WeBank Co., Ltd. | Longitudinal federated learning system optimization method, device, equipment and readable storage medium |
CN111182292A (en) * | 2020-01-05 | 2020-05-19 | Xidian University | No-reference video quality evaluation method and system, video receiver and intelligent terminal |
CN111241958A (en) * | 2020-01-06 | 2020-06-05 | University of Electronic Science and Technology of China | Video image identification method based on residual-capsule network |
CN111860414A (en) * | 2020-07-29 | 2020-10-30 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Method for detecting Deepfake video based on multi-feature fusion |
CN111967427A (en) * | 2020-08-28 | 2020-11-20 | Guangdong University of Technology | Fake face video identification method, system and readable storage medium |
CN112163488A (en) * | 2020-09-21 | 2021-01-01 | Institute of Information Engineering, Chinese Academy of Sciences | Video false face detection method and electronic device |
CN112487989A (en) * | 2020-12-01 | 2021-03-12 | Chongqing University of Posts and Telecommunications | Video expression recognition method based on capsule long short-term memory neural network |
CN112927202A (en) * | 2021-02-25 | 2021-06-08 | South China University of Technology | Method and system for detecting Deepfake video with combination of multiple time domains and multiple characteristics |
CN112801037A (en) * | 2021-03-01 | 2021-05-14 | Shandong University of Political Science and Law | Face tampering detection method based on continuous inter-frame difference |
CN112991278A (en) * | 2021-03-01 | 2021-06-18 | South China University of Technology | Method and system for detecting Deepfake video by combining RGB spatial-domain features and LoG temporal-domain features |
Non-Patent Citations (7)
Title |
---|
AKUL MEHRA et al.: "Deepfake Detection using Capsule Networks and Long Short-Term Memory Networks", HTTPS://PURL.UTWENTE.NL//ESSAYS/83028, pages 407 - 414 * |
OSCAR DE LIMA et al.: "Deepfake Detection using Spatiotemporal Convolutional Networks", arXiv, pages 1 - 6 * |
ZHANG Yixuan et al.: "Face Tampering Video Detection Method Based on Inter-Frame Difference", Journal of Cyber Security, vol. 05, no. 02, pages 49 - 72 * |
ZHANG Meigui: "Keyframe-Based Deepfake Video Detection Algorithm", China Masters' Theses Full-text Database, Information Science and Technology, no. 2023, pages 138 - 2734 * |
GENG Pengzhi et al.: "Deepfake Detection Method Based on Tampering Artifacts", Computer Engineering, vol. 47, no. 12, pages 156 - 162 * |
ZHAO Lei et al.: "Deepfake Video Detection Model Based on Spatiotemporal Feature Consistency", Advanced Engineering Sciences, vol. 52, no. 04, pages 243 - 250 * |
XIANG Jun et al.: "Research on the Impact of Temporal Models on Video-Based Person Re-identification Performance", Computer Engineering and Applications, vol. 56, no. 20, pages 152 - 157 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114494804A (en) * | 2022-04-18 | 2022-05-13 | 武汉明捷科技有限责任公司 | Unsupervised field adaptive image classification method based on domain specific information acquisition |
CN114494804B (en) * | 2022-04-18 | 2022-10-25 | 武汉明捷科技有限责任公司 | Unsupervised field adaptive image classification method based on domain specific information acquisition |
Also Published As
Publication number | Publication date |
---|---|
CN113283393B (en) | 2023-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108133188B (en) | Behavior identification method based on motion history image and convolutional neural network | |
CN112347859B (en) | Method for detecting significance target of optical remote sensing image | |
Wang et al. | Deep metric learning for crowdedness regression | |
Fan et al. | A survey of crowd counting and density estimation based on convolutional neural network | |
CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
CN109740419A (en) | Video behavior recognition method based on Attention-LSTM network | |
CN111723693B (en) | Crowd counting method based on small sample learning | |
CN110889449A (en) | Edge-enhanced multi-scale remote sensing image building semantic feature extraction method | |
CN112560831B (en) | Pedestrian attribute identification method based on multi-scale space correction | |
CN110097028B (en) | Crowd abnormal event detection method based on three-dimensional pyramid image generation network | |
CN113221641A (en) | Video pedestrian re-identification method based on generation of confrontation network and attention mechanism | |
CN109829495A (en) | Timing image prediction method based on LSTM and DCGAN | |
CN111931602A (en) | Multi-stream segmented network human body action identification method and system based on attention mechanism | |
CN106650617A (en) | Pedestrian abnormality identification method based on probabilistic latent semantic analysis | |
CN115512103A (en) | Multi-scale fusion remote sensing image semantic segmentation method and system | |
CN114220154A (en) | Micro-expression feature extraction and identification method based on deep learning | |
CN114612456B (en) | Billet automatic semantic segmentation recognition method based on deep learning | |
CN116580278A (en) | Lip language identification method, equipment and storage medium based on multi-attention mechanism | |
CN115113165A (en) | Radar echo extrapolation method, device and system | |
CN113283393B (en) | Deepfake video detection method based on image group and two-stream network | |
CN115049739A (en) | Binocular vision stereo matching method based on edge detection | |
CN113033283B (en) | Improved video classification system | |
CN113221683A (en) | Expression recognition method based on CNN model in teaching scene | |
CN114758285B (en) | Video interaction action detection method based on anchor freedom and long-term attention perception | |
CN115170985B (en) | Remote sensing image semantic segmentation network and segmentation method based on threshold attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||