CN114758272A - Forged video detection method based on frequency domain self-attention

Info

Publication number
CN114758272A
Authority
CN
China
Prior art keywords: video, face image, video frame, image, forged
Prior art date
Legal status: Pending
Application number
CN202210334683.9A
Other languages
Chinese (zh)
Inventor
李邵梅
吉立新
黄瑞阳
马欣
杨帆
高超
张建朋
Current Assignee
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202210334683.9A
Publication of CN114758272A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a forged video detection method based on frequency domain self-attention. The method comprises the following steps: dividing a video to be detected into a plurality of video frames; determining whether each video frame is a forged video frame, which specifically comprises: extracting the face image in the current video frame and recording it as the original face image; extracting the phase spectrum of the original face image, reconstructing the original face image based on the phase spectrum, and recording the result as the reconstructed face image; splitting the reconstructed face image into a plurality of image blocks of the same size and converting the image blocks into sequence data; inputting the sequence data into a trained Transformer model to extract a feature vector, inputting the feature vector into a multi-layer perceptron, and determining whether the video frame corresponding to the feature vector is a forged video frame; and counting the number of forged video frames and the number of real video frames: if the former is greater than the latter, the video to be detected is considered a forged video; otherwise it is considered a real video.

Description

Forged video detection method based on frequency domain self-attention
Technical Field
The invention relates to the technical field of video processing and cyberspace security, and in particular to a forged video detection method based on frequency domain self-attention.
Background
Traditional forged-video face detection methods are mainly based on CNNs (convolutional neural networks). Research in recent years has found that Transformer-based attention models can achieve better performance in forged video detection. However, existing Transformer-based forged video detection models only learn forgery features from the original image pixels or CNN-based feature maps and do not consider phase spectrum features obtained by frequency-domain transformation, so there is still room to improve detection accuracy.
Disclosure of Invention
Aiming at the problem of low detection accuracy of existing forged video detection methods, the invention provides a forged video detection method based on frequency domain self-attention.
The invention provides a forged video detection method based on frequency domain self-attention, which comprises the following steps:
step 1: dividing a video to be detected into a plurality of video frames;
step 2: determining whether each video frame is a forged video frame, which specifically comprises:
step 2.1: extracting the face image in the current video frame and recording it as the original face image; extracting the phase spectrum of the original face image, reconstructing the original face image based on the phase spectrum, and recording the result as the reconstructed face image;
step 2.2: splitting the reconstructed face image into a plurality of image blocks of the same size and converting the image blocks into sequence data;
step 2.3: inputting the sequence data into a trained Transformer model to extract a feature vector, inputting the feature vector into a multi-layer perceptron, and determining whether the video frame corresponding to the feature vector is a forged video frame;
step 3: counting the number of forged video frames and the number of real video frames; if the former is greater than the latter, the video to be detected is considered a forged video, otherwise it is considered a real video.
Further, step 2.1 specifically includes:
converting the original face image I(x, y) into a grayscale image I_g(x, y); performing a fast Fourier transform on I_g(x, y) according to formula (1) to obtain F(x, y); computing the phase spectrum S(x, y) according to formula (2); and finally obtaining the reconstructed face image P(x, y) according to formula (3);
F(x, y) = FFT(I_g(x, y))    (1)
S(x, y) = p(F(x, y))    (2)
P(x, y) = IFFT(e^{i·S(x, y)})    (3)
wherein FFT(·) and IFFT(·) denote the fast Fourier transform and inverse fast Fourier transform, respectively, and p(·) extracts the phase angle.
Further, step 2.2 specifically includes:
setting the size of the reconstructed face image to H × W and the size of each image block to P × P, obtaining N image blocks, where N = (H × W) / P²;
converting the N image blocks into the sequence data z_0 according to formula (4);
z_0 = [x_class; x_p^1 E; x_p^2 E; …; x_p^N E] + E_pos    (4)
wherein x_class is a D-dimensional learnable class-related variable, x_p^1, …, x_p^N are the N flattened P × P pixel blocks, E is the linear mapping matrix that projects each image block to a D-dimensional embedding, and E_pos is the position embedding matrix.
Further, step 2.3 specifically includes:
the feature extraction process of the Transformer model is represented by formulas (5)-(6), and the decision process of the multi-layer perceptron is represented by formula (7):
z'_l = MHA(LN(z_{l-1})) + z_{l-1},  l = 1…L    (5)
z_l = MLP(LN(z'_l)) + z'_l,  l = 1…L    (6)
y = MLP(LN(z_L^0))    (7)
wherein MHA(·) denotes the multi-head attention mechanism, LN(·) denotes layer normalization, and MLP(·) denotes a multi-layer perceptron; the multi-layer perceptron in formula (6) is recorded as the first multi-layer perceptron and the one in formula (7) as the second multi-layer perceptron; L is the total number of Transformer layers, l denotes the l-th layer, z_l is the output of the l-th layer MLP, z'_l is the output of the l-th layer MHA, and z_L^0 denotes the data of the 1st dimension of z_L.
Further, the first multi-layer perceptron consists of two hidden layers; the first hidden layer has H_1 nodes and the second hidden layer has H_2 nodes, where H_1 = D and H_2 equals the output dimension of the multi-layer perceptron;
the calculation formula of the first hidden layer is shown as formula (10), and that of the second hidden layer as formula (11):
a_i^(1) = Σ_j W_ij^(1) x_j,  h_i^(1) = g(a_i^(1))    (10)
a_i^(2) = Σ_j W_ij^(2) h_j^(1),  h_i^(2) = g(a_i^(2))    (11)
wherein W^(1) and W^(2) are the learnable weights of the first hidden layer and the second hidden layer, respectively, g(·) is the activation function, a_i^(1) and a_i^(2) are the intermediate temporary values of the i-th hidden node of the first and second hidden layers of the MLP, h_i^(1) and h_i^(2) are the outputs of the i-th hidden node of the first and second hidden layers of the MLP, and x_j is the input of the j-th dimension.
Further, the second multi-layer perceptron has one hidden layer with two nodes; the value of the first node is taken as the probability that the video frame is a real video frame, and the value of the second node as the probability that the video frame is a forged video frame.
The invention has the beneficial effects that:
By combining the Transformer model with frequency-domain phase spectrum features, the method achieves better forged-video feature extraction than traditional CNN-based feature extraction networks; and compared with existing Transformer-based forged video detection methods, it takes the influence of the phase spectrum features into account and can further improve the detection accuracy for forged videos.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting a forged video based on frequency domain self-attention according to an embodiment of the present invention;
fig. 2 is an effect diagram of reconstructing a face image based on a phase spectrum according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a process of determining whether a current video frame is a forged video frame according to an embodiment of the present invention;
FIG. 4 is a structural diagram of the prior-art Transformer model;
fig. 5 is a schematic diagram of adding the learnable class embedding and the position embedding to the input embedding according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be described clearly below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a forged video detection method based on frequency domain self-attention, including the following steps:
S101: dividing a video to be detected into a plurality of video frames;
S102: determining whether each video frame is a forged video frame, which specifically comprises:
S1021: as shown in fig. 2, extracting the face image in the current video frame and recording it as the original face image; extracting the phase spectrum of the original face image, reconstructing the original face image based on the phase spectrum, and recording the result as the reconstructed face image;
Specifically, the classical RetinaFace model can be used to extract the face image from the video frame. Since the phase information does not depend on the color of the image, the original face image I(x, y) is converted into a grayscale image I_g(x, y); a fast Fourier transform is applied to I_g(x, y) according to formula (1) to obtain F(x, y); the phase spectrum S(x, y) is then computed according to formula (2); finally, the reconstructed face image P(x, y) is obtained according to formula (3);
F(x, y) = FFT(I_g(x, y))    (1)
S(x, y) = p(F(x, y))    (2)
P(x, y) = IFFT(e^{i·S(x, y)})    (3)
wherein FFT(·) and IFFT(·) denote the fast Fourier transform and inverse fast Fourier transform, respectively, and p(·) extracts the phase angle;
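The phase-only reconstruction of formulas (1)-(3) can be illustrated with a short script. The following is a minimal sketch assuming NumPy and OpenCV are available; the function name reconstruct_from_phase and the commented file name are illustrative and not part of the patent.

```python
import cv2
import numpy as np

def reconstruct_from_phase(face_bgr: np.ndarray) -> np.ndarray:
    """Rebuild a face image from its Fourier phase spectrum only (formulas (1)-(3))."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)  # I_g(x, y)
    spectrum = np.fft.fft2(gray)              # F(x, y) = FFT(I_g(x, y)), formula (1)
    phase = np.angle(spectrum)                # S(x, y) = p(F(x, y)), formula (2)
    recon = np.fft.ifft2(np.exp(1j * phase))  # P(x, y) = IFFT(e^{i*S(x, y)}), formula (3)
    return np.real(recon)

# usage (illustrative): face = cv2.imread("face_crop.png")   # e.g. a RetinaFace crop
#                       recon = reconstruct_from_phase(face)
```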
S1022: as shown in fig. 3, splitting the reconstructed face image into a plurality of image blocks of the same size and converting the image blocks into sequence data;
Specifically, if the size of the reconstructed face image is H × W and the size of each image block is P × P, N image blocks are obtained, where N = (H × W) / P².
The N image blocks are converted into the sequence data z_0 according to formula (4):
z_0 = [x_class; x_p^1 E; x_p^2 E; …; x_p^N E] + E_pos    (4)
wherein x_class is a D-dimensional learnable class-related variable, x_p^1, …, x_p^N are the N flattened P × P pixel blocks, E is the linear mapping matrix that projects each image block to a D-dimensional embedding, and E_pos is the position embedding matrix.
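A minimal sketch of formula (4) in NumPy follows; the random matrices stand in for the learnable parameters x_class, E and E_pos (which would be trained in practice), and the helper name to_sequence is illustrative.

```python
import numpy as np

def to_sequence(img: np.ndarray, P: int, D: int, rng=np.random.default_rng(0)):
    """Split an H x W image into N = (H*W)/P^2 blocks and build z_0 as in formula (4)."""
    H, W = img.shape
    N = (H * W) // (P * P)
    # x_p^1 ... x_p^N: the N flattened P x P blocks
    patches = (img.reshape(H // P, P, W // P, P)
                  .transpose(0, 2, 1, 3)
                  .reshape(N, P * P))
    E = rng.standard_normal((P * P, D))       # linear mapping to D-dimensional embeddings
    x_class = rng.standard_normal((1, D))     # learnable class token (random placeholder)
    E_pos = rng.standard_normal((N + 1, D))   # position embedding matrix
    return np.concatenate([x_class, patches @ E], axis=0) + E_pos   # z_0, shape (N+1, D)

# e.g. a 256 x 256 reconstruction with P = 32 and D = 1024 yields z_0 of shape (65, 1024).
```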
S1023: as shown in fig. 3, inputting the sequence data into the trained Transformer model to extract a feature vector, inputting the feature vector into the multi-layer perceptron, and determining whether the video frame corresponding to the feature vector is a forged video frame;
Specifically, the feature extraction process is represented by formulas (5)-(6), and the decision process of the multi-layer perceptron by formula (7):
z'_l = MHA(LN(z_{l-1})) + z_{l-1},  l = 1…L    (5)
z_l = MLP(LN(z'_l)) + z'_l,  l = 1…L    (6)
y = MLP(LN(z_L^0))    (7)
wherein MHA(·) denotes the multi-head attention mechanism, LN(·) denotes layer normalization, and MLP(·) denotes a multi-layer perceptron; for the sake of distinction and description, the multi-layer perceptron in formula (6) is referred to as the first multi-layer perceptron and the one in formula (7) as the second multi-layer perceptron; L is the total number of Transformer layers, l denotes the l-th layer, z_l is the output of the l-th layer MLP, z'_l is the output of the l-th layer MHA, and z_L^0 denotes the data of the 1st dimension of z_L.
It should be noted that LN(·) normalizes one or several dimensions of the input. For example, to normalize an input x = {x_1, x_2, …, x_n} along a given dimension, LN(·) is calculated according to formula (8):
LN(x) = (x − E[x]) / √(Var[x] + ε)    (8)
wherein E[x] = (1/n) Σ_i x_i is the mean of x, Var[x] = (1/n) Σ_i (x_i − E[x])² is the variance of x, and ε is a small value added to prevent the denominator from being 0, typically 1e-05.
MHA(·) is calculated from the scaled dot-product attention in formula (9):
Attention(Q, K, V) = softmax(Q K^T / √d) V, with Q = z W_Q, K = z W_K, V = z W_V    (9)
wherein W_Q, W_K and W_V are learnable parameter matrices and d is the dimension of K^T Q. For softmax(·), let K denote the number of output categories of the neural network, v the output vector, v_j the value of the j-th output category in v, and i the category currently being computed; the result lies between 0 and 1, and the softmax values of all categories sum to 1:
softmax(v_i) = e^{v_i} / Σ_{j=1}^{K} e^{v_j}
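The layer normalization, softmax and scaled dot-product attention described above can be sketched in NumPy as follows; this is a single-head illustration with random placeholder weights (the embodiment described later uses 16 heads), and the function names are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # (x - E[x]) / sqrt(Var[x] + eps), applied over the last dimension
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(v, axis=-1):
    e = np.exp(v - v.max(axis=axis, keepdims=True))   # stabilised e^{v_i} / sum_j e^{v_j}
    return e / e.sum(axis=axis, keepdims=True)

def attention(z, W_Q, W_K, W_V):
    # softmax(Q K^T / sqrt(d)) V with Q = z W_Q, K = z W_K, V = z W_V
    Q, K, V = z @ W_Q, z @ W_K, z @ W_V
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

# usage (illustrative): z = np.random.randn(65, 1024)
# W_Q, W_K, W_V = (np.random.randn(1024, 64) for _ in range(3))
# out = attention(layer_norm(z), W_Q, W_K, W_V)   # one attention head over the 65 tokens
```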
In this embodiment, the first multi-layer perceptron is composed of two hidden layers; the first hidden layer of the first multi-layer perceptron has H_1 nodes and the second hidden layer has H_2 nodes, where H_1 = D and H_2 equals the output dimension of the multi-layer perceptron;
the calculation formula of the first hidden layer is shown as formula (10), and the calculation formula of the second hidden layer is shown as formula (11):
Figure BDA0003576292610000072
Figure BDA0003576292610000073
Wherein the content of the first and second substances,
Figure BDA0003576292610000074
representing the learnable weights of the first hidden layer and the second hidden layer, respectively, g (-) represents the activation function,
Figure BDA0003576292610000075
an intermediate temporary value representing the i-th hidden node of the first hidden layer of the MLP,
Figure BDA0003576292610000076
an intermediate temporary value representing the i-th hidden node of the second hidden layer of the MLP,
Figure BDA0003576292610000077
the output of the i-th hidden node representing the first hidden layer of the MLP, as can be seen from equation (10), is via
Figure BDA0003576292610000078
Sending the activation function to obtain;
Figure BDA0003576292610000079
the output of the i-th hidden node representing the second hidden layer of the MLP, as seen by equation (11), is via
Figure BDA00035762926100000710
Sending the activation function to obtain; x is the number ofjIs the input for the j-th dimension.
In this embodiment, the ReLU function is used as the activation function; its calculation formula is:
g(x) = max(0, x)
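The first multi-layer perceptron of formulas (10)-(11), with the ReLU activation above, can be sketched as follows; the weight matrices are random placeholders for the learned parameters, and the sizes in the usage comment follow the embodiment described later.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)          # g(x) = max(0, x)

def first_mlp(x, W1, W2):
    a1 = x @ W1                        # a_i^(1) = sum_j W_ij^(1) x_j, formula (10)
    h1 = relu(a1)                      # h_i^(1) = g(a_i^(1))
    a2 = h1 @ W2                       # a_i^(2) = sum_j W_ij^(2) h_j^(1), formula (11)
    return relu(a2)                    # h_i^(2) = g(a_i^(2))

# usage (illustrative): x = np.random.randn(65, 1024)
# W1, W2 = np.random.randn(1024, 2048), np.random.randn(2048, 1024)
# out = first_mlp(x, W1, W2)           # fed back through the residual connection of formula (6)
```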
In this embodiment, the second multi-layer perceptron has one hidden layer with two nodes; the value of the first node is taken as the probability that the video frame is a real video frame, and the value of the second node as the probability that the video frame is a forged video frame. If the value of the first node is greater than that of the second node, the current video frame is considered a real video frame; otherwise it is considered a forged video frame.
As an implementation, the training process of the Transformer model is as follows:
First, M real face images and M forged face images generated by deep forgery are collected. Then the RetinaFace model is used to locate the face regions in the M real and M forged face images, and the face regions are cropped out. The extracted real and forged face images are then reconstructed based on the phase spectrum; the M reconstructed real face images form the positive sample set p = {p_1, p_2, …, p_M}, and the M reconstructed forged face images form the negative sample set n = {n_1, n_2, …, n_M}. The label of each sample in the positive sample set is set to 1 and the label of each sample in the negative sample set to 0. Finally, p and n are fed into the network shown in FIG. 4 for training. Preferably, M = 100.
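The training-set construction described above can be sketched as follows, reusing the reconstruct_from_phase and to_sequence helpers from the earlier sketches; the folder names, file pattern and the train(...) call are assumptions for illustration only.

```python
import glob
import cv2

def build_dataset(real_dir: str, fake_dir: str, M: int = 100):
    samples, labels = [], []
    for label, folder in ((1, real_dir), (0, fake_dir)):       # 1 = real, 0 = forged
        for path in sorted(glob.glob(folder + "/*.png"))[:M]:
            face = cv2.imread(path)                             # cropped face, e.g. from RetinaFace
            recon = reconstruct_from_phase(face)                # phase-only reconstruction
            recon = cv2.resize(recon.astype("float32"), (256, 256))
            samples.append(to_sequence(recon, P=32, D=1024))
            labels.append(label)
    return samples, labels

# samples, labels = build_dataset("real_faces", "fake_faces")  # hypothetical folders
# train(transformer_model, samples, labels)                    # training routine not shown here
```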
S103: counting the number of forged video frames and the number of real video frames; if the former is greater than the latter, the video to be detected is considered a forged video, otherwise it is considered a real video.
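The video-level decision of S103 reduces to a majority count over the per-frame decisions, as in the sketch below; is_forged_frame stands for the frame-level pipeline of S1021-S1023 and is a placeholder name.

```python
def detect_video(frames, is_forged_frame) -> str:
    forged = sum(1 for frame in frames if is_forged_frame(frame))
    real = len(frames) - forged
    return "forged video" if forged > real else "real video"
```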
Example 2
The following describes the processing flow by taking the detection of one video frame as an example. First, the RetinaFace model is used to extract the face region from the video frame, and then the phase-spectrum-reconstructed face image is obtained using formulas (1)-(3).
For each phase-spectrum-reconstructed grayscale face image, the image is first resized to 256 × 256 and then cut into image blocks of size 32 × 32, giving 64 image blocks; each image block is mapped to 32 × 32 = 1024 dimensions by linear mapping. The 64 image block embeddings together with one 1 × 1024-dimensional learnable classification embedding (the vector indicated at the far left of fig. 3) form a 65 × 1024-dimensional embedding. Considering that the positional relationship between image blocks is meaningful for understanding the contents of an image, a 65 × 1024-dimensional learnable position embedding is added to this embedding to obtain the final 65 × 1024-dimensional embedding used as the input of the Transformer model.
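As a quick check of the sizes quoted above (a sketch, not part of the patent):

```python
H = W = 256; P = 32; D = 1024
N = (H * W) // (P * P)     # 64 image blocks of 32 x 32
print(N, P * P, N + 1, D)  # 64 1024 65 1024 -> a 65 x 1024 input embedding
```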
The above-mentioned 65 × 1024-dimensional embedding I_1 is fed into the Transformer model to extract features. The Transformer model consists of 6 (i.e., L = 6) copies of the network structure shown in FIG. 4; the multi-head attention mechanism has 16 attention heads, the number of nodes H_1 of the 1st hidden layer of the multi-layer perceptron is 2048, and the number of nodes H_2 of the 2nd hidden layer is 1024. After Transformer encoding, a new 65 × 1024-dimensional image representation I_2 is output.
The 1024-dimensional vector in the 1st row of I_2 is extracted as the learned category vector and input into the multi-layer perceptron at the top of fig. 3. This perceptron has only one hidden layer with 2 nodes and converts the 1024-dimensional input vector into a 2-dimensional category vector, which here is {0.03, 0.4}; since 0.03 < 0.4, the video frame is judged to be a forged video frame.
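The final decision of this example can be sketched as follows; the weights and the resulting scores are placeholders, with the 2-dimensional output standing in for the {0.03, 0.4} category vector above.

```python
import numpy as np

I2 = np.random.randn(65, 1024)           # stand-in for the Transformer output
W_head = np.random.randn(1024, 2)        # second multi-layer perceptron (2 nodes)
real_score, fake_score = I2[0] @ W_head  # row 0 of I2 is the learned category vector
verdict = "real frame" if real_score > fake_score else "forged frame"
```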
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A forged video detection method based on frequency domain self-attention, characterized by comprising the following steps:
step 1: dividing a video to be detected into a plurality of video frames;
step 2: determining whether each video frame is a forged video frame, which specifically comprises:
step 2.1: extracting the face image in the current video frame and recording it as the original face image; extracting the phase spectrum of the original face image, reconstructing the original face image based on the phase spectrum, and recording the result as the reconstructed face image;
step 2.2: splitting the reconstructed face image into a plurality of image blocks of the same size and converting the image blocks into sequence data;
step 2.3: inputting the sequence data into a trained Transformer model to extract a feature vector, inputting the feature vector into a multi-layer perceptron, and determining whether the video frame corresponding to the feature vector is a forged video frame;
step 3: counting the number of forged video frames and the number of real video frames; if the former is greater than the latter, the video to be detected is considered a forged video, otherwise it is considered a real video.
2. The forged video detection method based on frequency domain self-attention as claimed in claim 1, wherein step 2.1 specifically comprises:
converting the original face image I(x, y) into a grayscale image I_g(x, y); performing a fast Fourier transform on I_g(x, y) according to formula (1) to obtain F(x, y); computing the phase spectrum S(x, y) according to formula (2); and finally obtaining the reconstructed face image P(x, y) according to formula (3);
F(x, y) = FFT(I_g(x, y))    (1)
S(x, y) = p(F(x, y))    (2)
P(x, y) = IFFT(e^{i·S(x, y)})    (3)
wherein FFT(·) and IFFT(·) denote the fast Fourier transform and inverse fast Fourier transform, respectively, and p(·) extracts the phase angle.
3. The forged video detection method based on frequency domain self-attention as claimed in claim 1, wherein step 2.2 specifically comprises:
setting the size of the reconstructed face image to H × W and the size of each image block to P × P, obtaining N image blocks, where N = (H × W) / P²;
converting the N image blocks into the sequence data z_0 according to formula (4);
z_0 = [x_class; x_p^1 E; x_p^2 E; …; x_p^N E] + E_pos    (4)
wherein x_class is a D-dimensional learnable class-related variable, x_p^1, …, x_p^N are the N flattened P × P pixel blocks, E is the linear mapping matrix that projects each image block to a D-dimensional embedding, and E_pos is the position embedding matrix.
4. The forged video detection method based on frequency domain self-attention as claimed in claim 3, wherein step 2.3 specifically comprises:
the feature extraction process of the Transformer model is represented by formulas (5)-(6), and the decision process of the multi-layer perceptron by formula (7):
z'_l = MHA(LN(z_{l-1})) + z_{l-1},  l = 1…L    (5)
z_l = MLP(LN(z'_l)) + z'_l,  l = 1…L    (6)
y = MLP(LN(z_L^0))    (7)
wherein MHA(·) denotes the multi-head attention mechanism, LN(·) denotes layer normalization, and MLP(·) denotes a multi-layer perceptron; the multi-layer perceptron in formula (6) is recorded as the first multi-layer perceptron and the one in formula (7) as the second multi-layer perceptron; L is the total number of Transformer layers, l denotes the l-th layer, z_l is the output of the l-th layer MLP, z'_l is the output of the l-th layer MHA, and z_L^0 denotes the data of the 1st dimension of z_L.
5. The forged video detection method based on frequency domain self-attention according to claim 4, wherein the first multi-layer perceptron consists of two hidden layers; the first hidden layer has H_1 nodes and the second hidden layer has H_2 nodes, where H_1 = D and H_2 equals the output dimension of the multi-layer perceptron;
the calculation formula of the first hidden layer is shown as formula (10), and that of the second hidden layer as formula (11):
a_i^(1) = Σ_j W_ij^(1) x_j,  h_i^(1) = g(a_i^(1))    (10)
a_i^(2) = Σ_j W_ij^(2) h_j^(1),  h_i^(2) = g(a_i^(2))    (11)
wherein W^(1) and W^(2) are the learnable weights of the first hidden layer and the second hidden layer, respectively, g(·) is the activation function, a_i^(1) and a_i^(2) are the intermediate temporary values of the i-th hidden node of the first and second hidden layers of the MLP, h_i^(1) and h_i^(2) are the outputs of the i-th hidden node of the first and second hidden layers of the MLP, and x_j is the input of the j-th dimension.
6. The method according to claim 4, wherein the second multi-layer perceptron has one hidden layer with two nodes; the value of the first node is taken as the probability that the video frame is a real video frame, and the value of the second node as the probability that the video frame is a forged video frame.
CN202210334683.9A (priority date 2022-03-31, filing date 2022-03-31): Forged video detection method based on frequency domain self-attention; published as CN114758272A (en), status: Pending

Priority Applications (1)

Application Number: CN202210334683.9A (published as CN114758272A); Priority Date: 2022-03-31; Filing Date: 2022-03-31; Title: Forged video detection method based on frequency domain self-attention

Publications (1)

Publication Number: CN114758272A; Publication Date: 2022-07-15

Family

ID=82329306

Country Status (1)

Country: CN; Publication: CN114758272A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311720A * 2022-08-11 2022-11-08 山东省人工智能研究院 Deepfake generation method based on Transformer
CN116563957A (en) * 2023-07-10 2023-08-08 齐鲁工业大学(山东省科学院) Face fake video detection method based on Fourier domain adaptation
CN116563957B (en) * 2023-07-10 2023-09-29 齐鲁工业大学(山东省科学院) Face fake video detection method based on Fourier domain adaptation


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination