CN116563957B - Face fake video detection method based on Fourier domain adaptation - Google Patents
- Publication number: CN116563957B
- Application number: CN202310834717.5A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/45—Detection of the body part being alive
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Learning methods
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V40/168—Feature extraction; Face representation
- G06V40/172—Classification, e.g. identification
Abstract
The invention discloses a face fake video detection method based on Fourier domain adaptation, which relates to the technical field of face forgery detection. The method mainly comprises the following steps: S1: carrying out Fourier domain adaptation on the video sequences in a source domain data set and a target domain data set to obtain domain-aligned video sequences; S2: inputting each frame image of the domain-aligned video sequence into an Xception network to obtain a feature vector of each frame image; S3: inputting the domain-aligned video sequence into a TimeSformer spatio-temporal Transformer network to obtain a feature vector of the video sequence; S4: fusing the feature vectors output by the Xception network and the TimeSformer network with each other to obtain a fused feature vector; S5: inputting the fused feature vector into a classifier to obtain a judgment result of whether the video sequence contains a forged face.
Description
Technical Field
The invention relates to the technical field of face counterfeiting detection, in particular to a face counterfeiting video detection method based on Fourier domain adaptation.
Background
Face forgery refers to the process of falsifying or replacing a real face by means of digital image processing or artificial intelligence techniques so as to generate a false face image or video. Face forgery techniques may be used in entertainment, education, medicine and other fields, but they may also be used for malicious purposes such as fraud, defamation and disruption of social order. Face forgery detection is therefore an important security protection means that can protect personal privacy and social fairness by analyzing whether a face in an image or video is genuine or forged.
At present, face forgery detection techniques fall mainly into two categories: methods based on conventional image processing and methods based on deep learning. Methods based on conventional image processing mainly use statistical features or visual artifacts in the image, such as color distribution, edge sharpness, illumination inconsistency and blink frequency, to determine whether a face is forged. Such methods are simple and easy to implement, but a different feature extractor must be designed for each forgery type, their generalization ability is poor, and they are easily disturbed by factors such as noise, compression and occlusion. Methods based on deep learning mainly use models such as convolutional neural networks or recurrent neural networks to automatically learn features from images or videos and to perform classification or regression. Such methods can extract high-level semantic features, adapt well to different forgery types, and handle high-resolution, high-frame-rate data. Their disadvantage, however, is that a large amount of annotated data is required for training, and their generalization to unknown forgery types or cross-domain datasets is poor.
In order to improve the generalization ability and cross-domain adaptability of face forgery detection, some researchers have proposed methods based on domain adaptation or domain alignment. Domain adaptation or domain alignment refers to transforming or mapping datasets with different distributions or styles so that they become more similar, or even identical, under some measure. For example, a paper at CVPR 2022, a top-level international conference on artificial intelligence, describes a face forgery detection method based on a spatial domain adaptation network (Spatial Domain Adaptation Network, SDAN) and a frequency domain adaptation network (Frequency Domain Adaptation Network, FDAN); it first performs spatial domain adaptation and frequency domain adaptation on the images in the source domain dataset and the target domain dataset, and then inputs the adapted images into a shared convolutional neural network for feature extraction and classification. This method can effectively reduce the differences between the source domain dataset and the target domain dataset in the spatial and frequency domains, and improves the accuracy of cross-domain detection.
However, the above method considers only spatial domain adaptation and frequency domain adaptation of individual images and ignores the temporal information present in video sequences. A video sequence contains frame-to-frame dynamic changes and correlations that are useful for distinguishing real faces from fake faces. For example, in a video sequence a real face usually shows self-consistent motion such as expression changes, eye blinking and head rotation, whereas a fake face may show anomalies such as incoherence, stiffness or repetition. Therefore, face forgery detection should take into account not only image information but also video information.
Disclosure of Invention
In order to remedy the defects of the prior art, the invention provides a face forgery detection method based on Fourier domain adaptation and deep learning networks, which can effectively use both image information and video information to judge whether a forged face exists in a video sequence, and which has good generalization ability and cross-domain adaptability.
The invention is realized by the following technical scheme:
A face fake video detection method based on Fourier domain adaptation comprises the following steps:
S1: carrying out Fourier domain adaptation on the video sequences in a source domain data set and a target domain data set to obtain domain-aligned video sequences;
S2: inputting each frame image of the domain-aligned video sequence into an Xception network to obtain a feature vector of each frame image;
S3: inputting the domain-aligned video sequence into a TimeSformer spatio-temporal Transformer network to obtain a feature vector of the video sequence;
S4: fusing the feature vectors output by the Xception network and the TimeSformer spatio-temporal Transformer network with each other to obtain a fused feature vector;
S5: inputting the fused feature vector into a classifier to obtain a judgment result of whether the video sequence contains a forged face.
In S1, Fourier domain adaptation is performed on the video sequences in the source domain data set and the target domain data set to obtain domain-aligned video sequences; the implementation steps include:
S11: Let the video data of the source domain dataset be D^s = {(X^s, x^s, y^s)} and the video data of the target domain dataset be D^t = {(X^t, x^t, y^t)}, wherein X^s denotes a video of the source domain dataset, x^s denotes a color picture frame of the corresponding video with x^s ∈ R^{H×W×3}, R denotes the real number field, H and W denote the height and width of the image, 3 denotes an RGB image whose color channels are red, green and blue, and y^s denotes the label corresponding to the video or picture, i.e. whether the face video is real or fake; X^t denotes a video of the target domain dataset, x^t denotes a picture of the target domain dataset, and y^t denotes the corresponding label of the target domain dataset;

S12: Let F^A denote the amplitude component of the Fourier transform of a color image and F^P denote the phase component of the Fourier transform of a color image; a single-channel image is converted from the spatial domain to the frequency domain by equation (1):

F(u,v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} f(x,y)\, e^{-j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (1)

wherein f(x,y) is the pixel value of the image at coordinates (x,y), F(u,v) is the value of the transformed image at coordinates (u,v), j is the imaginary unit, e is Euler's number, x denotes the abscissa of the image, y denotes the ordinate of the image, u and v denote the coordinates in the frequency domain, u denotes the frequency variable in the horizontal direction, v denotes the frequency variable in the vertical direction, H denotes the height of the image and W denotes the width of the image;

S13: Let M_β denote the mask matrix used to replace the low-frequency region of the image, expressed by equation (2):

M_\beta(u,v) = \begin{cases} 1, & (u,v) \in [-\beta H : \beta H] \times [-\beta W : \beta W] \\ 0, & \text{otherwise} \end{cases}    (2)

wherein, taking the center of the image as the origin, the region in which the mask value is 1 forms a square; β ∈ (0,1) indicates the size of this square region, H and W denote the height and width of the image, and βH and βW denote the height and width of the region to be masked;

S14: The frequency-domain image is converted back to the spatial domain by the inverse Fourier transform to obtain the domain-aligned image; the transformation formula is given by equation (3):

f(x,y) = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} F(u,v)\, e^{j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (3)

S15: Writing equation (3) as F^{-1} and given frame pictures x^s, x^t from the two video sequences, the Fourier domain adaptation is expressed by equation (4):

x^{s \to t} = F^{-1}\big( M_\beta \circ F^A(x^t) + (1 - M_\beta) \circ F^A(x^s),\; F^P(x^s) \big)    (4)

wherein F^{-1} denotes the inverse Fourier transform, x^s denotes an image of the source domain video, x^t denotes an image of the target domain video, x^{s→t} denotes the image generated after style migration, F^P(x^s) denotes the phase part of the source domain image after Fourier transform, F^A(x^t) denotes the amplitude part of the target domain image after Fourier transform, F^A(x^s) denotes the amplitude part of the source domain image after Fourier transform, M_β denotes the mask matrix, and ∘ denotes the composition of the two operations. In S15, β is set to 0.001. The Fourier domain adaptation refers to: performing Fourier transform on each video sequence in the source domain data set and the target domain data set on the time-frequency plane and calculating its amplitude spectrum and phase spectrum; randomly pairing each video sequence in the source domain data set with a video sequence in the target domain data set and exchanging the amplitude spectra of the paired video sequences; and finally performing inverse Fourier transform on the video sequences after the amplitude spectrum exchange on the time-frequency plane while retaining their original phase spectra.
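A minimal illustrative sketch of the per-frame amplitude swap of equation (4) is given below, using NumPy; the function name fda_amplitude_swap and the handling of very small mask sizes are assumptions made for illustration and are not part of the claimed method.

```python
import numpy as np

def fda_amplitude_swap(src_img, tgt_img, beta=0.001):
    """Replace the centred low-frequency amplitude of the source frame with that of the
    target frame (cf. equation (4)), keeping the source phase. Inputs are H x W x 3 arrays."""
    out = np.zeros_like(src_img, dtype=np.float64)
    h, w = src_img.shape[:2]
    bh, bw = max(1, int(beta * h)), max(1, int(beta * w))   # half-size of the M_beta square
    cy, cx = h // 2, w // 2
    for c in range(src_img.shape[2]):                        # each colour channel separately
        fft_src = np.fft.fftshift(np.fft.fft2(src_img[:, :, c]))   # equation (1), centred spectrum
        fft_tgt = np.fft.fftshift(np.fft.fft2(tgt_img[:, :, c]))
        amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)      # F^A(x^s), F^P(x^s)
        amp_tgt = np.abs(fft_tgt)                                   # F^A(x^t)
        # M_beta: overwrite only the centred low-frequency square of the amplitude spectrum
        amp_src[cy - bh:cy + bh, cx - bw:cx + bw] = amp_tgt[cy - bh:cy + bh, cx - bw:cx + bw]
        mixed = amp_src * np.exp(1j * pha_src)                      # recombine amplitude and phase
        out[:, :, c] = np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))  # equation (3)
    return out
```

Applying such a swap frame by frame to a randomly paired source/target video reproduces the amplitude-spectrum exchange described above; in practice the result would be clipped back to the valid pixel range before being fed to the networks.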
As a further limitation of the present technical solution, the feature fusion refers to combining or integrating the feature vectors of different networks to generate a new feature vector with stronger expressive power or better suited to the classification task. The Xception network is denoted v_i = X(x_i^{s→t}), where the function X(·) extracts the features of an image and v_i denotes the extracted feature vector, as shown in equation (5); the average feature vector \bar{v} corresponding to the frame sequence is then obtained from the corresponding number of frames, see equation (6), wherein N denotes the number of frames contained in the frame sequence. Similarly, the TimeSformer network is denoted u = T(X^{s→t}), where the function T(·) extracts the features of an image sequence, u denotes the extracted feature vector and X^{s→t} denotes the frame sequence generated after style migration, as shown in equation (7). Let z be the fused feature vector, expressed by equation (8), which denotes that the two feature vectors \bar{v} and u are added element-wise to obtain the fused feature vector z; the final prediction probability p is then obtained by equation (9), wherein Softmax denotes a softmax layer and Linear denotes a linear layer. For the Xception network, \bar{v} can be converted into the prediction-class probability p_X through the linear layer and the softmax layer according to equation (10):

v_i = X(x_i^{s \to t})    (5)

\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i    (6)

u = T(X^{s \to t})    (7)

z = \bar{v} + u    (8)

p = \mathrm{Softmax}(\mathrm{Linear}(z))    (9)

p_X = \mathrm{Softmax}(\mathrm{Linear}(\bar{v}))    (10)
As a further limitation of the present technical solution, for the Xception network the loss function L_{CE} used is the cross-entropy loss, whose calculation formula is expressed by equation (11):

L_{CE} = -\big[\, y \log(p_X) + (1 - y) \log(1 - p_X) \,\big]    (11)

wherein log denotes the natural logarithm with base e, and y is the true sample label.
As a further limitation of the present technical solution, for the TimeSformer spatio-temporal Transformer network the loss function L_{FL} used is the focal loss, whose calculation formula is expressed by equation (12):

L_{FL} = -\alpha (1 - p)^{\gamma}\, y \log(p) - (1 - \alpha)\, p^{\gamma}\, (1 - y) \log(1 - p)    (12)

wherein log denotes the natural logarithm with base e, γ is the regulating parameter and is set to 2, (1 - p)^{\gamma} denotes the scaling factor, and α adjusts the weight of the positive and negative samples and is set to 0.25; L_{FL} denotes the loss function of the TimeSformer network.
As a further limitation of the present technical solution, a parameter λ is set as the weight parameter for the different losses:

L = L_{CE} + \lambda L_{FL}    (13)

wherein L is the total loss of the whole process, L_{CE} is the loss of the Xception network, L_{FL} is the loss of the TimeSformer network, and λ is the proportionality coefficient, set to 0.5.
The beneficial effects of the invention are as follows:
(1) Not only is Fourier domain adaptation carried out on the image, but also Fourier domain adaptation is carried out on continuous frames in the video sequence, domain alignment is realized by utilizing information on a frequency domain, and domain alignment operation is carried out between a source domain and a target domain, so that the accuracy of face counterfeiting detection is greatly improved;
(2) Instead of a spatial domain adaptation network, an Xception network and a TimeSformer network are adopted to extract the features of the images and of the video sequence respectively, and these features are fused with each other, so that both spatial and temporal information is used to improve the detection effect;
(3) Domain alignment between different data sets is combined with full training of the neural networks, so that very good performance can be achieved in face forgery detection. The Xception network uses multi-layer convolution operations and can extract multi-scale features of images; the TimeSformer is a spatio-temporal modeling method based on an attention mechanism and can effectively capture the temporal features in videos. Combining Xception and TimeSformer makes comprehensive use of the feature information of both static images and video sequences, thereby improving the ability to detect face forgery. Both Xception and TimeSformer are models trained on large-scale data sets, have strong robustness and generalization ability, and can cope with the variation and interference of different samples.
Drawings
Fig. 1 is a diagram of the transfer of the low-frequency amplitude components of Fourier-transformed images according to the present invention.
Fig. 2 is a flowchart of feature fusion of an image sequence using an Xception network and a TimeSformer network according to the present invention.
Fig. 3 is a schematic diagram of the operation of the TimeSformer encoder module.
Fig. 4 shows the process by which the TimeSformer converts a video sequence into temporal features.
Fig. 5 is a flow chart of the present invention.
Detailed Description
In order to clearly illustrate the technical features of the present solution, the present invention will be described in detail below with reference to the following detailed description and the accompanying fig. 1 to 5. In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "left", "right", "front", "rear", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
The specific embodiments of the present invention are as follows:
A face fake video detection method based on Fourier domain adaptation comprises the following steps:
S1: carrying out Fourier domain adaptation on the video sequences in a source domain data set and a target domain data set to obtain domain-aligned video sequences;
S2: inputting each frame image of the domain-aligned video sequence into an Xception network to obtain a feature vector of each frame image;
S3: inputting the domain-aligned video sequence into a TimeSformer spatio-temporal Transformer network to obtain a feature vector of the video sequence;
S4: fusing the feature vectors output by the Xception network and the TimeSformer spatio-temporal Transformer network with each other to obtain a fused feature vector;
S5: inputting the fused feature vector into a classifier to obtain a judgment result of whether the video sequence contains a forged face.
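Purely as an illustrative outline of how steps S1 to S5 fit together, the sketch below (Python, reusing the fda_amplitude_swap sketch given earlier and assuming backbone objects xception and timesformer that each return a 2048-dimensional feature, plus a linear classifier) shows one forward pass; the names and shapes are assumptions, not the patented implementation.

```python
import torch

def detect_forged_face(src_frames, tgt_frames, xception, timesformer, classifier, beta=0.001):
    """Illustrative forward pass for one source-domain video paired with a target-domain video.
    src_frames / tgt_frames: lists of H x W x 3 numpy arrays of equal length."""
    # S1: Fourier domain adaptation, frame by frame (see fda_amplitude_swap above)
    aligned = [fda_amplitude_swap(s, t, beta) for s, t in zip(src_frames, tgt_frames)]
    clip = torch.stack([torch.from_numpy(a).permute(2, 0, 1).float() for a in aligned])
    # S2: per-frame Xception features, averaged over the N frames (equations (5)-(6))
    v_bar = torch.stack([xception(f.unsqueeze(0)).squeeze(0) for f in clip]).mean(dim=0)
    # S3: sequence-level TimeSformer feature (equation (7))
    u = timesformer(clip.unsqueeze(0)).squeeze(0)
    # S4: element-wise fusion of the two feature vectors (equation (8))
    z = v_bar + u
    # S5: the classifier outputs the real/fake probabilities (equation (9))
    return torch.softmax(classifier(z), dim=-1)
```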
In S1, fourier domain adaptation is performed on video sequences in a source domain data set and a target domain data set to obtain a video sequence after domain alignment, and the implementation steps include:
S11: Let the video data of the source domain dataset be D^s = {(X^s, x^s, y^s)} and the video data of the target domain dataset be D^t = {(X^t, x^t, y^t)}, wherein X^s denotes a video of the source domain dataset, x^s denotes a color picture frame of the corresponding video with x^s ∈ R^{H×W×3}, R denotes the real number field, H and W denote the height and width of the image, 3 denotes an RGB image whose color channels are red, green and blue, and y^s denotes the label corresponding to the video or picture, i.e. whether the face video is real or fake; X^t denotes a video of the target domain dataset, x^t denotes a picture of the target domain dataset, and y^t denotes the corresponding label of the target domain dataset;

S12: Let F^A denote the amplitude component of the Fourier transform of a color image and F^P denote the phase component of the Fourier transform of a color image; a single-channel image is converted from the spatial domain to the frequency domain by equation (1):

F(u,v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} f(x,y)\, e^{-j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (1)

wherein f(x,y) is the pixel value of the image at coordinates (x,y), F(u,v) is the value of the transformed image at coordinates (u,v), j is the imaginary unit, e is Euler's number, x denotes the abscissa of the image, y denotes the ordinate of the image, u and v denote the coordinates in the frequency domain, u denotes the frequency variable in the horizontal direction, v denotes the frequency variable in the vertical direction, H denotes the height of the image and W denotes the width of the image;

S13: Let M_β denote the mask matrix used to replace the low-frequency region of the image, expressed by equation (2):

M_\beta(u,v) = \begin{cases} 1, & (u,v) \in [-\beta H : \beta H] \times [-\beta W : \beta W] \\ 0, & \text{otherwise} \end{cases}    (2)

wherein, taking the center of the image as the origin, the region in which the mask value is 1 forms a square; β ∈ (0,1) indicates the size of this square region, H and W denote the height and width of the image, and βH and βW denote the height and width of the region to be masked;

S14: The frequency-domain image is converted back to the spatial domain by the inverse Fourier transform to obtain the domain-aligned image; the transformation formula is given by equation (3):

f(x,y) = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} F(u,v)\, e^{j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (3)

S15: Writing equation (3) as F^{-1} and given frame pictures x^s, x^t from the two video sequences, the Fourier domain adaptation is expressed by equation (4):

x^{s \to t} = F^{-1}\big( M_\beta \circ F^A(x^t) + (1 - M_\beta) \circ F^A(x^s),\; F^P(x^s) \big)    (4)

wherein F^{-1} denotes the inverse Fourier transform, x^s denotes an image of the source domain video, x^t denotes an image of the target domain video, x^{s→t} denotes the image generated after style migration, F^P(x^s) denotes the phase part of the source domain image after Fourier transform, F^A(x^t) denotes the amplitude part of the target domain image after Fourier transform, F^A(x^s) denotes the amplitude part of the source domain image after Fourier transform, M_β denotes the mask matrix, and ∘ denotes the composition of the two operations.
The low-frequency part of the amplitude of the source domain video image, F^A(x^s), is replaced by the low-frequency part of the target domain video image, F^A(x^t); that is, the low-frequency region of the source image is replaced by the low-frequency region of the target image, and the generated image x^{s→t} has the same content as x^s and the same style as x^t (i.e. the same appearance as the target domain).
In S15, as β gradually increases from 0 to 1, the generated image x^{s→t} becomes closer and closer to x^t, but visible artifacts also appear; β is therefore set to 0.001. The low-frequency part of the source domain video image (i.e. the region where the gray values change slowly) is replaced by the low-frequency part of the target domain video image (i.e. the region matching the target style), so that an image with the same content as the source domain and the same style as the target domain is generated. In this way the domain gap between the source domain and the target domain can be significantly reduced, and a better effect is achieved in forgery detection.
The Fourier domain adaptation refers to: performing Fourier transform on each video sequence in the source domain data set and the target domain data set on the time-frequency plane, and calculating its amplitude spectrum and phase spectrum; randomly pairing each video sequence in the source domain data set with a video sequence in the target domain data set, and exchanging the amplitude spectra of the paired video sequences; and finally performing inverse Fourier transform on the video sequences after the amplitude spectrum exchange on the time-frequency plane while retaining their original phase spectra.
The Xception network is a convolutional neural network model designed on the basis of depthwise separable convolution (Depthwise Separable Convolution); it extracts features from each picture of the video sequence, thereby improving the accuracy of image recognition. Xception replaces conventional convolution with depthwise separable convolution, which reduces the number of parameters and the amount of computation. A depthwise separable convolution comprises a depthwise convolution layer and a pointwise convolution layer: the depthwise convolution layer performs a convolution operation on each input channel separately, the depth of each convolution kernel being set to 1 so that the input depth remains unchanged; the pointwise convolution layer then combines the feature maps of the different channels to form the output feature map.
Each frame image is input into the Xception network, and a feature vector of length 2048 is obtained as the feature representation of that frame.
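As a minimal sketch only (assuming PyTorch and illustrative channel sizes, not the actual Xception building block), the code below shows the depthwise-plus-pointwise structure described above.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution (one filter per input channel, groups=in_channels)
    followed by a 1x1 pointwise convolution that mixes the channels."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# e.g. a 224x224 RGB frame mapped to 64 feature channels
frame = torch.randn(1, 3, 224, 224)
feat = DepthwiseSeparableConv(3, 64)(frame)   # shape (1, 64, 224, 224)
```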
The TimeSformer network is a video classification model designed on the basis of the Transformer structure; it models each frame of the video sequence and learns the temporal relationships between frames.
Each video sequence is input into the TimeSformer network, and a feature vector of length 2048 is obtained as the feature representation of the video sequence.
The TimeSformer network comprises a block partition layer, a position embedding layer, a linear embedding layer, 12 encoder modules and a global average pooling layer.
Each image frame is divided into image blocks by the block partition; the image blocks are linearly embedded into vector form and added to the position information produced by the position embedding, forming the embedding vectors that serve as input to the encoder modules.
The encoder module processes the input video sequence using a divided (separate) space-time attention mechanism. The divided space-time attention mechanism includes: a temporal attention mechanism, which lets each image block of a frame interact with the image blocks at the same spatial position in the other frames; a spatial attention mechanism, which lets each image block interact with the other image blocks in the same frame; and a multi-layer perceptron module, which transforms and maps the features produced by the temporal and spatial attention mechanisms, the output of the multi-layer perceptron module serving as the input of the next encoder module. After 12 such iterations, the output of the last encoder module enters the global average pooling layer, yielding the required sequence features.
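For illustration only, the sketch below (PyTorch, with assumed dimensions and a simplified multi-head attention in place of the real TimeSformer encoder) shows the order of operations in one such block: temporal attention over blocks at the same spatial position, then spatial attention within each frame, then a multi-layer perceptron.

```python
import torch
import torch.nn as nn

class DividedSpaceTimeBlock(nn.Module):
    """One simplified encoder block: temporal attention, spatial attention, MLP."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, T frames, P blocks per frame, dim)
        b, t, p, d = x.shape
        # temporal attention: the block at one spatial position attends across the T frames
        xt = self.n1(x).permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt, _ = self.time_attn(xt, xt, xt)
        x = x + xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # spatial attention: each block attends to the other blocks of the same frame
        xs = self.n2(x).reshape(b * t, p, d)
        xs, _ = self.space_attn(xs, xs, xs)
        x = x + xs.reshape(b, t, p, d)
        # MLP applied block-wise; the output feeds the next encoder block
        return x + self.mlp(self.n3(x))
```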
The feature fusion means that the feature vectors of different networks are combined or integrated to generate a new feature vector with stronger expressive power or better suited to the classification task. The Xception network is denoted v_i = X(x_i^{s→t}), where the function X(·) extracts the features of an image and v_i denotes the extracted feature vector, as shown in equation (5); the average feature vector \bar{v} corresponding to the frame sequence is then obtained from the corresponding number of frames, see equation (6), wherein N denotes the number of frames contained in the frame sequence. Similarly, the TimeSformer network is denoted u = T(X^{s→t}), where the function T(·) extracts the features of an image sequence, u denotes the extracted feature vector and X^{s→t} denotes the frame sequence generated after style migration, as shown in equation (7). Let z be the fused feature vector, expressed by equation (8), which denotes that the two feature vectors \bar{v} and u are added element-wise to obtain the fused feature vector z; the final prediction probability p is then obtained by equation (9), wherein Softmax denotes a softmax layer and Linear denotes a linear layer. For the Xception network, \bar{v} can be converted into the prediction-class probability p_X through the linear layer and the softmax layer according to equation (10):

v_i = X(x_i^{s \to t})    (5)

\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i    (6)

u = T(X^{s \to t})    (7)

z = \bar{v} + u    (8)

p = \mathrm{Softmax}(\mathrm{Linear}(z))    (9)

p_X = \mathrm{Softmax}(\mathrm{Linear}(\bar{v}))    (10)
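The following minimal sketch (PyTorch; the 2048-dimensional feature size, the class count and the use of two separate linear heads are assumptions made for illustration, since the description does not state whether the two branches share one linear layer) illustrates equations (5) to (10).

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 2
fuse_head = nn.Linear(feat_dim, num_classes)       # produces p   (equation (9))
frame_head = nn.Linear(feat_dim, num_classes)      # produces p_X (equation (10))

def fuse_and_classify(frame_feats, seq_feat):
    """frame_feats: (N, 2048) Xception features of the N frames (equation (5));
    seq_feat: (2048,) TimeSformer feature of the whole sequence (equation (7))."""
    v_bar = frame_feats.mean(dim=0)                 # equation (6): average over the frames
    z = v_bar + seq_feat                            # equation (8): element-wise fusion
    p = torch.softmax(fuse_head(z), dim=-1)         # equation (9): fused prediction
    p_x = torch.softmax(frame_head(v_bar), dim=-1)  # equation (10): Xception-branch prediction
    return p, p_x
```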
For the Xception network, the loss function L_{CE} used is the cross-entropy loss, whose calculation formula is expressed by equation (11):

L_{CE} = -\big[\, y \log(p_X) + (1 - y) \log(1 - p_X) \,\big]    (11)

wherein log denotes the natural logarithm with base e, and y is the true sample label.
For the TimeSformer spatio-temporal Transformer network, the loss function L_{FL} used is the focal loss. The focal loss is suited to handling class imbalance; specifically, it modifies the cross-entropy loss so that the loss is reduced on samples that are already classified correctly and increased on samples that are difficult to classify, as shown in equation (12):

L_{FL} = -\alpha (1 - p)^{\gamma}\, y \log(p) - (1 - \alpha)\, p^{\gamma}\, (1 - y) \log(1 - p)    (12)

wherein log denotes the natural logarithm with base e, γ is the regulating parameter and is set to 2, and (1 - p)^{\gamma} denotes the scaling factor used to adjust the weight of easily classified samples: when a sample is predicted correctly, the predicted probability of its true class is close to 1, the scaling factor is close to 0, and the weight of the easily classified sample is reduced; when a sample is predicted incorrectly, the predicted probability of its true class is close to 0, the scaling factor is close to 1, and the weight of the hard-to-classify sample is increased. α adjusts the weight of the positive and negative samples and is set to 0.25. In this section L_{FL} denotes the loss function of the TimeSformer network, i.e. the focal loss is used for the optimization of the whole network.
A parameter λ is set to balance the proportions of the cross-entropy loss and the focal loss in the whole network:

L = L_{CE} + \lambda L_{FL}    (13)

wherein L is the total loss of the whole process, L_{CE} is the loss of the Xception network, L_{FL} is the loss of the TimeSformer network, and λ is the proportionality coefficient that adjusts the weight of the two loss functions; if the detection effect is poor, λ can be increased appropriately to increase the proportion occupied by the focal loss.
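As an illustrative sketch only (binary labels, PyTorch, batch-mean reduction assumed), the combined objective of equations (11) to (13) might be written as follows, with γ = 2, α = 0.25 and λ = 0.5.

```python
import torch

def total_loss(p_x, p, y, gamma=2.0, alpha=0.25, lam=0.5):
    """p_x: Xception-branch fake probability; p: fused/TimeSformer-branch fake probability;
    y: ground-truth label (1 = fake, 0 = real); all tensors of shape (batch,)."""
    eps = 1e-7
    p_x, p = p_x.clamp(eps, 1 - eps), p.clamp(eps, 1 - eps)
    # equation (11): cross-entropy loss for the Xception branch
    l_ce = -(y * torch.log(p_x) + (1 - y) * torch.log(1 - p_x))
    # equation (12): focal loss, down-weighting easy samples via the (1 - p)^gamma factor
    l_fl = -(alpha * (1 - p) ** gamma * y * torch.log(p)
             + (1 - alpha) * p ** gamma * (1 - y) * torch.log(1 - p))
    # equation (13): weighted total loss with proportionality coefficient lambda
    return (l_ce + lam * l_fl).mean()
```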
The specific implementation mode of the invention is as follows:
the source domain data set selected by the invention is Celeb-DF (v 2), and the Celeb-DF (v 2) data set comprises real and deep synthesized video, and the video quality is similar to that of online transmission. Celeb-DF includes 590 raw videos collected from video websites, which have topics of different ages and sexes, and 5639 corresponding DeepFake videos. The target domain dataset was chosen from faceforensis++ datasets, which are a Face counterfeited dataset consisting of 1000 original video sequences, created using four methods of operation, face2Face, faceSwap, deepFakes and neurosortutes. The method for detecting whether the video sequence in the target domain data set is the face fake or not comprises the following steps:
first, each video sequence in the source domain dataset (face a, style a) and the target domain dataset (face B, style B) is fourier transformed on the time-frequency plane and its magnitude and phase spectra are calculated. Wherein face a and face B represent faces in different data sets and style a and style B represent different styles for each image. Then, randomly pairing each video sequence in the source domain data set with each video sequence in the target domain data set, and exchanging the amplitude spectrum between the paired video sequences; and finally, carrying out inverse Fourier transform on the video sequence after the amplitude spectrum exchange on a time-frequency plane, and reserving the original phase spectrum of the video sequence to form an image containing a face A and a style B.
Then, each frame picture of the domain-aligned video sequence is input into the Xception network to obtain the feature vector of each frame image, and the domain-aligned video sequence is input into the TimeSformer network to obtain the feature vector of the video sequence.
Then, the feature vectors output by the Xception network and the TimeSformer network are fused with each other to obtain the fused feature vector: specifically, the feature vectors of the frame images output by the Xception network are averaged, and the average is added to the feature vector of the video sequence output by the TimeSformer network, so as to obtain a feature vector that contains both image information and video information and fully reflects whether the face in the video sequence is forged.
Finally, the fused feature vector is input into a classifier to obtain a judgment result of whether the video sequence contains a forged face.
Matters not described in detail in the present application are well known to those skilled in the art. Finally, it is noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it; although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.
Claims (4)
1. A face fake video detection method based on Fourier domain adaptation, characterized by comprising the following steps:
S1: carrying out Fourier domain adaptation on the video sequences in a source domain data set and a target domain data set to obtain domain-aligned video sequences;
S2: inputting each frame image of the domain-aligned video sequence into an Xception network to obtain a feature vector of each frame image;
S3: inputting the domain-aligned video sequence into a TimeSformer spatio-temporal Transformer network to obtain a feature vector of the video sequence;
S4: fusing the feature vectors output by the Xception network and the TimeSformer spatio-temporal Transformer network with each other to obtain a fused feature vector;
S5: inputting the fused feature vector into a classifier to obtain a judgment result of whether the video sequence contains a forged face;
in S1, Fourier domain adaptation is performed on the video sequences in the source domain data set and the target domain data set to obtain domain-aligned video sequences, and the implementation steps include:
S11: let the video data of the source domain dataset be D^s = {(X^s, x^s, y^s)} and the video data of the target domain dataset be D^t = {(X^t, x^t, y^t)}, wherein X^s denotes a video of the source domain dataset, x^s denotes a color picture frame of the corresponding video with x^s ∈ R^{H×W×3}, R denotes the real number field, H and W denote the height and width of the image, 3 denotes an RGB image whose color channels are red, green and blue, and y^s denotes the label corresponding to the video or picture, i.e. whether the face video is real or fake; X^t denotes a video of the target domain dataset, x^t denotes a picture of the target domain dataset, and y^t denotes the corresponding label of the target domain dataset;

S12: let F^A denote the amplitude component of the Fourier transform of a color image and F^P denote the phase component of the Fourier transform of a color image; a single-channel image is converted from the spatial domain to the frequency domain by equation (1):

F(u,v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} f(x,y)\, e^{-j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (1)

wherein f(x,y) is the pixel value of the image at coordinates (x,y), F(u,v) is the value of the transformed image at coordinates (u,v), j is the imaginary unit, e is Euler's number, x denotes the abscissa of the image, y denotes the ordinate of the image, u and v denote the coordinates in the frequency domain, u denotes the frequency variable in the horizontal direction, v denotes the frequency variable in the vertical direction, H denotes the height of the image and W denotes the width of the image;

S13: let M_β denote the mask matrix used to replace the low-frequency region of the image, expressed by equation (2):

M_\beta(u,v) = \begin{cases} 1, & (u,v) \in [-\beta H : \beta H] \times [-\beta W : \beta W] \\ 0, & \text{otherwise} \end{cases}    (2)

wherein, taking the center of the image as the origin, the region in which the mask value is 1 forms a square; β ∈ (0,1) indicates the size of this square region, H and W denote the height and width of the image, and βH and βW denote the height and width of the region to be masked;

S14: the frequency-domain image is converted back to the spatial domain by the inverse Fourier transform to obtain the domain-aligned image; the transformation formula is given by equation (3):

f(x,y) = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} F(u,v)\, e^{j 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}    (3)

S15: writing equation (3) as F^{-1} and given frame pictures x^s, x^t from the two video sequences, the Fourier domain adaptation is expressed by equation (4):

x^{s \to t} = F^{-1}\big( M_\beta \circ F^A(x^t) + (1 - M_\beta) \circ F^A(x^s),\; F^P(x^s) \big)    (4)

wherein F^{-1} denotes the inverse Fourier transform, x^s denotes an image of the source domain video, x^t denotes an image of the target domain video, x^{s→t} denotes the image generated after style migration, F^P(x^s) denotes the phase part of the source domain image after Fourier transform, F^A(x^t) denotes the amplitude part of the target domain image after Fourier transform, F^A(x^s) denotes the amplitude part of the source domain image after Fourier transform, M_β denotes the mask matrix, and ∘ denotes the composition of the two operations;

in S15, β is set to 0.001, and the Fourier domain adaptation refers to: performing Fourier transform on each video sequence in the source domain data set and the target domain data set on the time-frequency plane, calculating its amplitude spectrum and phase spectrum, randomly pairing each video sequence in the source domain data set with a video sequence in the target domain data set, and exchanging the amplitude spectra of the paired video sequences; finally, performing inverse Fourier transform on the video sequences after the amplitude spectrum exchange on the time-frequency plane while retaining their original phase spectra;

the feature fusion means that the feature vectors of different networks are combined or integrated to generate a new feature vector with stronger expressive power or better suited to the classification task; the Xception network is denoted v_i = X(x_i^{s→t}), where the function X(·) extracts the features of an image and v_i denotes the extracted feature vector, as shown in equation (5); the average feature vector \bar{v} corresponding to the frame sequence is then obtained from the corresponding number of frames, see equation (6), wherein N denotes the number of frames contained in the frame sequence; similarly, the TimeSformer network is denoted u = T(X^{s→t}), where the function T(·) extracts the features of an image sequence, u denotes the extracted feature vector and X^{s→t} denotes the frame sequence generated after style migration, as shown in equation (7); let z be the fused feature vector, expressed by equation (8), which denotes that the two feature vectors \bar{v} and u are added element-wise to obtain the fused feature vector z; the final prediction probability p is then obtained by equation (9), wherein Softmax denotes a softmax layer and Linear denotes a linear layer; for the Xception network, \bar{v} can be converted into the prediction-class probability p_X through the linear layer and the softmax layer according to equation (10):

v_i = X(x_i^{s \to t})    (5)

\bar{v} = \frac{1}{N} \sum_{i=1}^{N} v_i    (6)

u = T(X^{s \to t})    (7)

z = \bar{v} + u    (8)

p = \mathrm{Softmax}(\mathrm{Linear}(z))    (9)

p_X = \mathrm{Softmax}(\mathrm{Linear}(\bar{v}))    (10)
2. The face fake video detection method based on Fourier domain adaptation according to claim 1, characterized in that: for the Xception network, the loss function L_{CE} used is the cross-entropy loss, whose calculation formula is expressed by equation (11):

L_{CE} = -\big[\, y \log(p_X) + (1 - y) \log(1 - p_X) \,\big]    (11)

wherein log denotes the natural logarithm with base e, and y is the true sample label.
3. The face fake video detection method based on Fourier domain adaptation according to claim 2, characterized in that: for the TimeSformer spatio-temporal Transformer network, the loss function L_{FL} used is the focal loss, whose calculation formula is expressed by equation (12):

L_{FL} = -\alpha (1 - p)^{\gamma}\, y \log(p) - (1 - \alpha)\, p^{\gamma}\, (1 - y) \log(1 - p)    (12)

wherein log denotes the natural logarithm with base e, γ is the regulating parameter and is set to 2, (1 - p)^{\gamma} denotes the scaling factor, α adjusts the weight of the positive and negative samples and is set to 0.25, and L_{FL} denotes the loss function of the TimeSformer network.
4. The face fake video detection method based on Fourier domain adaptation according to claim 3, characterized in that: a parameter λ is set as the weight parameter for the different losses:

L = L_{CE} + \lambda L_{FL}    (13)

wherein L is the total loss of the whole process, L_{CE} is the loss of the Xception network, L_{FL} is the loss of the TimeSformer network, and λ is the proportionality coefficient, set to 0.5.
Priority Applications (1)
- CN202310834717.5A (granted as CN116563957B), priority date 2023-07-10, filing date 2023-07-10: Face fake video detection method based on Fourier domain adaptation

Applications Claiming Priority (1)
- CN202310834717.5A (granted as CN116563957B), priority date 2023-07-10, filing date 2023-07-10: Face fake video detection method based on Fourier domain adaptation

Publications (2)
- CN116563957A, published 2023-08-08
- CN116563957B, published 2023-09-29

Family
- ID=87488318

Family Applications (1)
- CN202310834717.5A (CN116563957B, Active), priority date 2023-07-10, filing date 2023-07-10

Country Status (1)
- CN: CN116563957B
Families Citing this family (2)
- CN117115927A, Guangzhou Bairui Network Technology Co., Ltd., published 2023-11-24: Audio and video security verification method and system applied to living body detection in financial business
- CN118334473B, Nanchang University, published 2024-08-23: Deep fake image detection method based on semantic entanglement
Family Cites Families (2)
- CN111814871B, Zhejiang University, published 2024-02-09: Image classification method based on reliable weight optimal transmission
- CN114913565B, Tencent Technology (Shenzhen) Co., Ltd., published 2023-11-17: Face image detection method, model training method, device and storage medium
Patent Citations (13) (cited by examiner)
- EP3818526A1, published 2021-05-12: Hybrid audio synthesis using neural networks
- CN112734696A, published 2021-04-30: Face changing video tampering detection method and system based on multi-domain feature fusion
- CN113313054A, published 2021-08-27: Face counterfeit video detection method, system, equipment and storage medium
- CN113435292A, published 2021-09-24: AI counterfeit face detection method based on inherent feature mining
- WO2023280423A1, published 2023-01-12: Methods, systems and computer programs for processing and adapting image data from different domains
- CN114519897A, published 2022-05-20: Human face in-vivo detection method based on color space fusion and recurrent neural network
- CN114492599A, published 2022-05-13: Medical image preprocessing method and device based on Fourier domain self-adaptation
- CN114758272A, published 2022-07-15: Forged video detection method based on frequency domain self-attention
- CN115273169A, published 2022-11-01: Face counterfeiting detection system and method based on time-space-frequency domain clue enhancement
- CN115188039A, published 2022-10-14: Depth forgery video technology tracing method based on image frequency domain information
- CN115909129A, published 2023-04-04: Face forgery detection method based on frequency domain feature double-flow network
- CN115761459A, published 2023-03-07: Multi-scene self-adaption method for bridge and tunnel apparent disease identification
- CN116386590A, published 2023-07-04: Multi-mode expressive voice synthesis method and device
Non-Patent Citations (6) (cited by examiner)
- Hui Qi. A Real-Time Face Detection Method Based on Blink Detection. IEEE Access, 2023.
- Hui Qi. A Real-Time Face Detection Method Based on Blink Detection. IEEE Access, 2023.
- Chunpeng Wang. RD-IWAN: Residual Dense Based Imperceptible Watermark Attack Network. IEEE Transactions on Circuits and Systems for Video Technology.
- Chen Ran, Wu Shiqian, Xu Wangming. A face liveness detection algorithm based on multi-feature fusion in the spatial and frequency domains. Video Engineering, No. 3.
- Han Han, Xu Zhi. Research on face recognition based on domain adaptation and multiple subspaces. Journal of Guilin University of Electronic Technology, No. 3.
- Chen Peng, Liang Tao, Liu Jin, Dai Jiao, Han Jizhong. Forged face video detection method fusing global temporal and local spatial features. Journal of Cyber Security, 2020, No. 2.
Also Published As
- CN116563957A, published 2023-08-08
Similar Documents
- Guo et al.: Fake face detection via adaptive manipulation traces extraction network
- CN116563957B: Face fake video detection method based on Fourier domain adaptation
- CN109949317B: Semi-supervised image example segmentation method based on gradual confrontation learning
- Jin et al.: Generative adversarial network technologies and applications in computer vision
- CN111667400B: Human face contour feature stylization generation method based on unsupervised learning
- CN110880172A: Video face tampering detection method and system based on cyclic convolution neural network
- CN113536972B: Self-supervision cross-domain crowd counting method based on target domain pseudo label
- CN113283444B: Heterogeneous image migration method based on generation countermeasure network
- Ma et al.: Unsupervised domain adaptation augmented by mutually boosted attention for semantic segmentation of VHR remote sensing images
- Xia et al.: Towards deepfake video forensics based on facial textural disparities in multi-color channels
- CN115482595B: Specific character visual sense counterfeiting detection and identification method based on semantic segmentation
- CN113689382A: Tumor postoperative life prediction method and system based on medical images and pathological images
- Li et al.: Zooming into face forensics: A pixel-level analysis
- CN113553954A: Method and apparatus for training behavior recognition model, device, medium, and program product
- CN114119356A: Method for converting thermal infrared image into visible light color image based on cycleGAN
- Xiao et al.: Securing the socio-cyber world: Multiorder attribute node association classification for manipulated media
- Wen et al.: A hybrid model for natural face de-identification with adjustable privacy
- CN117095471B: Face counterfeiting tracing method based on multi-scale characteristics
- Peng et al.: Presentation attack detection based on two-stream vision transformers with self-attention fusion
- CN114937298A: Micro-expression recognition method based on feature decoupling
- CN112990340B: Self-learning migration method based on feature sharing
- Sabitha et al.: Enhanced model for fake image detection (EMFID) using convolutional neural networks with histogram and wavelet based feature extractions
- CN114519897B: Human face living body detection method based on color space fusion and cyclic neural network
- He et al.: Dynamic residual distillation network for face anti-spoofing with feature attention learning
- Chen: Evaluation technology of classroom students' learning state based on deep learning
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant