CN114155139A - Deepfake generation method based on vector discretization representation - Google Patents


Info

Publication number
CN114155139A
CN114155139A
Authority
CN
China
Prior art keywords
video frame
picture
vector
face
target video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111400589.0A
Other languages
Chinese (zh)
Other versions
CN114155139B (en)
Inventor
舒明雷
曹伟
陈达
刘丽
许继勇
孔祥龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority: CN202111400589.0A
Publication of CN114155139A
Application granted
Publication of CN114155139B
Legal status: Active
Anticipated expiration

Classifications

    • G06T3/04
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06T5/73
    • G06T9/002 Image coding using neural networks
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; Image merging
    • G06T2207/30201 Face


Abstract

A deepfake generation method based on vector discretization representation extracts video frames from a source video and a target video, then sequentially applies face detection, face alignment, a trained face-exchange network, face sharpening and fusion, and video-frame combination to the source video frames to obtain the final result. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation. Meanwhile, a discriminator added during training makes the details of the decoded picture clearer and the quality more stable.

Description

Deepfake generation method based on vector discretization representation
Technical Field
The invention relates to the field of face-swap generation in video, and in particular to a deepfake generation method based on vector discretization representation.
Background
With the development of deep learning techniques and the flood of personal media data in public network environments, many fake face-swapping videos have been produced. The technique of generating such fake face-swapping videos with deep learning is called deepfake. Specifically, the technique replaces the face in the source video with the face from the target video, while ensuring that the swapped face keeps the source face's attribute information (expression, illumination, background, and so on) and the target face's identity information. Current generation techniques are mainly implemented with autoencoders and generative adversarial networks.
Autoencoder-based generation uses a common encoder and two decoders, one for the source and one for the target. During training, the source face picture from the source video frames and the target face picture from the target video frames are fed into the common encoder to extract general facial features, which are then reconstructed into the source and target face pictures by their respective decoders. Face swapping then feeds the source face picture into the trained common encoder and outputs it through the target face's decoder, yielding the face-swapping result. However, faces synthesized by such autoencoders are manipulated entirely at the pixel level, so the process produces artifacts, and the synthesized images lack detail sharpness and stable quality.
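The shared-encoder, two-decoder routing described above can be sketched as follows. The linear maps standing in for the encoder and decoders are toy placeholders, not the trained networks of the invention; only the routing (encode with the common encoder, decode with the target decoder) mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: one shared "encoder" and one "decoder" per identity.
W_enc = rng.standard_normal((8, 16))            # 16-dim face vector -> 8-dim code
W_dec = {
    "source": rng.standard_normal((16, 8)),     # would be trained to rebuild source faces
    "target": rng.standard_normal((16, 8)),     # would be trained to rebuild target faces
}

def encode(face):
    # Common encoder shared by both identities.
    return W_enc @ face

def decode(code, identity):
    # Identity-specific decoder.
    return W_dec[identity] @ code

def face_swap(source_face):
    # Face swap = source face through the SHARED encoder,
    # then out through the TARGET identity's decoder.
    return decode(encode(source_face), "target")

source_face = rng.standard_normal(16)
swapped = face_swap(source_face)
```

With trained decoders, `swapped` would carry the target identity while keeping the source face's pose and expression; with these random placeholders the sketch only demonstrates the data flow.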
Disclosure of Invention
In order to overcome the shortcomings of the above technology, the invention provides a deepfake generation method based on vector discretization representation, which reduces the artifacts produced during the face-swapping operation.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
a deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
Further, step a) comprises the following steps:
a-1) Using the multimedia processing tool ffmpeg, extract source video frames frame_s from the source video V_s and target video frames frame_t from the target video V_t;
a-2) From the source video frames frame_s and the target video frames frame_t, crop out face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) Align the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align respectively, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align.
Further, step b) comprises the following steps:
b-1) Establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t. The encoder E consists sequentially of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist sequentially of 2 residual units and 4 upsampling convolutional layers; the discrete vector embedding units E_s and E_t each consist sequentially of 2 residual units, the AttnBlock attention module of a Transformer model and a dictionary vector Embedding function; the discriminators D_s and D_t each consist sequentially of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function;
b-2) Input P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively. Input the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and compute, through the formula
Z_s = argmin_{z_k ∈ Z} ||s_q - z_k||,
the space vector Z_s quantized to its nearest neighbour in the discrete space Z, where s_q ∈ R^(h×w×n_z), h is the height of P_s^i and P_t^j, w is the width of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k ∈ R^(n_z) | k = 1, ..., K} and K is the number of dictionary vectors. Input the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and compute, through the formula
Z_t = argmin_{z_k ∈ Z} ||t_q - z_k||,
the space vector Z_t quantized to its nearest neighbour in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
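The quantization in b-2) replaces each n_z-dimensional encoder output with its nearest dictionary vector. A minimal numpy sketch of this nearest-neighbour lookup (the dictionary contents and tensor sizes below are arbitrary illustrations, not the patent's values):

```python
import numpy as np

def quantize(codes, codebook):
    """Snap each n_z-dim vector in `codes` (h x w x n_z) to the nearest
    row of `codebook` (K x n_z) in Euclidean distance, as in step b-2)."""
    h, w, nz = codes.shape
    flat = codes.reshape(-1, nz)                                   # (h*w, n_z)
    # Squared distance from every spatial position to every dictionary vector.
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (h*w, K)
    idx = d2.argmin(axis=1)                                        # nearest entry per position
    return codebook[idx].reshape(h, w, nz), idx.reshape(h, w)

K, nz = 4, 3
codebook = np.eye(K, nz)     # toy dictionary: rows [1,0,0], [0,1,0], [0,0,1], [0,0,0]
codes = np.zeros((2, 2, nz))
codes[..., 0] = 0.9          # every position lies closest to dictionary row 0
Z_q, idx = quantize(codes, codebook)
```

Every encoder output is thus expressed exactly as one of the K dictionary vectors, which is what makes the representation discrete.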
b-3) Compute the loss l_1 = ||s_g - P_s^i||_2 + ||t_g - P_t^j||_2 and the loss l_2 = ||s_g - Z_s||_2 + ||t_g - Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for discrimination and the decoding result t_g into the target video frame picture discriminator D_t for discrimination, and compute the loss between the reconstructed pictures and the original pictures l_3 = log D_s(s_g) + log(1 - D_s(P_s^i)) + log D_t(t_g) + log(1 - D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3 and iteratively adjust the network model of b-1) with an optimizer.
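The three losses of b-3) can be written out directly. The discriminator outputs below are scalar stand-in probabilities in (0, 1), since the real D_s and D_t are convolutional networks:

```python
import numpy as np

def dist(a, b):
    # ||a - b||_2 over whole tensors.
    return float(np.linalg.norm(a - b))

def training_losses(s_g, P_si, t_g, P_tj, Z_s, Z_t,
                    Ds_fake, Ds_real, Dt_fake, Dt_real):
    """l_1: reconstruction loss, l_2: quantization loss, l_3: adversarial
    loss, as written in step b-3). The D*_fake / D*_real arguments are
    scalar stand-ins for the discriminator outputs."""
    l1 = dist(s_g, P_si) + dist(t_g, P_tj)
    l2 = dist(s_g, Z_s) + dist(t_g, Z_t)
    l3 = (np.log(Ds_fake) + np.log(1.0 - Ds_real)
          + np.log(Dt_fake) + np.log(1.0 - Dt_real))
    return l1, l2, float(l3)

x = np.ones((4, 4))
l1, l2, l3 = training_losses(x, x, x, x, x, x, 0.5, 0.5, 0.5, 0.5)
```

Perfect reconstruction drives l_1 and l_2 to zero, while an undecided discriminator (every output 0.5) gives l_3 = 4·log(0.5).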
Further, in step c) the aligned face picture P_s^face_align from the source video frame is input into the network model iterated in step b-3) and passes sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
Further, in step d) the decoding result t_stog is sharpened and fused to obtain a plurality of video frames frame_f.
Further, in step e) the plurality of video frames frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg.
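The two ffmpeg invocations (frame extraction in step a-1) and recombination in step e)) can be assembled from Python. The file names, the `frame_%06d.png` output pattern and the 25 fps rate are illustrative assumptions; the commands are only built here, not executed.

```python
def extract_frames_cmd(video_path: str, out_dir: str) -> list:
    # Dump every frame of the video as numbered PNGs (step a-1)).
    return ["ffmpeg", "-i", video_path, f"{out_dir}/frame_%06d.png"]

def combine_frames_cmd(frames_dir: str, out_video: str, fps: int = 25) -> list:
    # Reassemble numbered frames into a video (step e)).
    return ["ffmpeg", "-framerate", str(fps),
            "-i", f"{frames_dir}/frame_%06d.png", out_video]

extract = extract_frames_cmd("V_s.mp4", "frames_s")   # hypothetical paths
combine = combine_frames_cmd("frames_f", "V_f.mp4")
# To actually run one: subprocess.run(extract, check=True)
```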
Preferably, the residual unit in step b-1) comprises a normalized convolution module and a convolutional layer. The normalized convolution module consists sequentially of two normalization-layer-plus-convolutional-layer stages; the picture passes sequentially through these two stages, and the result is added to the output of the convolutional layer. The convolution kernel of the convolutional layer is 3×3 with stride 1 and padding 1.
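A shape-level sketch of that residual unit follows. The `norm` and `conv3x3` placeholders are simple array functions that only reproduce the wiring (two norm+conv stages whose result is added to a convolutional skip path); they are assumptions standing in for the actual layers, not implementations of them.

```python
import numpy as np

def norm(x):
    # Placeholder for a normalization layer.
    return (x - x.mean()) / (x.std() + 1e-5)

def conv3x3(x):
    # Placeholder for a 3x3, stride-1, padding-1 convolution:
    # shape-preserving, here just a scaled copy.
    return 0.5 * x

def residual_unit(x):
    # Two (normalization + convolution) stages ...
    y = conv3x3(norm(x))
    y = conv3x3(norm(y))
    # ... whose result is added to the output of the skip convolutional layer.
    return y + conv3x3(x)

x = np.arange(16.0).reshape(4, 4)
out = residual_unit(x)
```

Because every stage is shape-preserving (3×3 kernel, stride 1, padding 1), the addition at the end is well defined.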
Preferably, the convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1. In the source video frame picture discriminator D_s and the target video frame picture discriminator D_t, the convolutions have 4×4 kernels with stride 2 and padding 1, and the activation function is the LeakyReLU activation function.
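These kernel/stride/padding choices can be sanity-checked with the standard convolution output formula out = floor((n + 2p − k)/s) + 1. The 256×256 input resolution below is a hypothetical example, not a value stated in the patent:

```python
def conv_out(n: int, k: int, s: int, p: int) -> int:
    # Output size of a convolution along one spatial dimension.
    return (n + 2 * p - k) // s + 1

# Encoder E: four downsampling convs, kernel 3, stride 2, padding 0.
n = 256                                   # hypothetical input resolution
sizes = []
for _ in range(4):
    n = conv_out(n, k=3, s=2, p=0)
    sizes.append(n)
# 256 -> 127 -> 63 -> 31 -> 15

# Discriminator convs (kernel 4, stride 2, padding 1) halve resolution exactly.
half = conv_out(256, k=4, s=2, p=1)
```

Note the slight asymmetry: the padding-0 encoder convs shrink the map a little more than a clean halving, while the discriminator's 4×4/stride-2/padding-1 convs halve it exactly.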
The invention has the beneficial effects that: the final result is obtained by extracting video frames from a source video and a target video, then sequentially applying face detection, face alignment, the trained face-exchange network, face sharpening and fusion, and video-frame combination to the source video frames. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded picture clearer and the quality more stable.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the preprocessing process of the present invention;
FIG. 3 is a diagram of a training network of the model of the present invention;
FIG. 4 is a diagram of the testing and post-processing of the model of the present invention;
FIG. 5 is a diagram of a residual unit structure of the model of the present invention;
FIG. 6 is a diagram of an encoder network model of the present invention;
FIG. 7 is a diagram of a decoder network model of the present invention;
FIG. 8 is a diagram of a discrete space embedding process of the model of the present invention;
FIG. 9 is a diagram of a network model of the discriminator according to the present invention.
Detailed Description
The present invention will be further described with reference to fig. 1 to 9.
A deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
The source face picture obtained through preprocessing is fed into the trained shared encoder to obtain an encoding result. To reduce the artifacts produced during the face-swapping operation, the encoding result is quantized into a space vector in discrete representation, which is then fed into the trained target-face decoder to obtain the face-swapping result. A series of post-processing steps on the face-swapping result yields the final video. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation; meanwhile, the discriminator added during training makes the details of the decoded picture clearer and the quality more stable.
The step a) comprises the following steps:
a-1) Using the multimedia processing tool ffmpeg, extract source video frames frame_s from the source video V_s and target video frames frame_t from the target video V_t;
a-2) From the source video frames frame_s and the target video frames frame_t, crop out face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) Align the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align respectively, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align.
The step b) comprises the following steps:
b-1) Establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t. The encoder E consists sequentially of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist sequentially of 2 residual units and 4 upsampling convolutional layers; the discrete vector embedding units E_s and E_t each consist sequentially of 2 residual units, the AttnBlock attention module of a Transformer model and a dictionary vector Embedding function; the discriminators D_s and D_t each consist sequentially of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function.
b-2) Input P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively. Input the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and compute, through the formula Z_s = argmin_{z_k ∈ Z} ||s_q - z_k||, the space vector Z_s quantized to its nearest neighbour in the discrete space Z, where s_q ∈ R^(h×w×n_z), h is the height of P_s^i and P_t^j, w is the width of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k ∈ R^(n_z) | k = 1, ..., K} and K is the number of dictionary vectors. Input the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and compute, through the formula Z_t = argmin_{z_k ∈ Z} ||t_q - z_k||, the space vector Z_t quantized to its nearest neighbour in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g.
b-3) Compute the loss l_1 = ||s_g - P_s^i||_2 + ||t_g - P_t^j||_2 and the loss l_2 = ||s_g - Z_s||_2 + ||t_g - Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for discrimination and the decoding result t_g into the target video frame picture discriminator D_t for discrimination, and compute the loss between the reconstructed pictures and the original pictures l_3 = log D_s(s_g) + log(1 - D_s(P_s^i)) + log D_t(t_g) + log(1 - D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3 and iteratively adjust the network model of b-1) with an optimizer.
In step c), the aligned face picture P_s^face_align from the source video frame is input into the network model iterated in step b-3) and passes sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
In step d), the decoding result t_stog is sharpened and fused to obtain a plurality of video frames frame_f.
In step e), the plurality of video frames frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg. The residual unit in step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists sequentially of two normalization-layer-plus-convolutional-layer stages, the picture passes sequentially through these two stages, and the result is added to the output of the convolutional layer; the convolution kernel of the convolutional layer is 3×3 with stride 1 and padding 1. The convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1; in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t, the convolutions have 4×4 kernels with stride 2 and padding 1, and the activation function is the LeakyReLU activation function.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A deepfake generation method based on vector discretization representation is characterized by comprising the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
2. The method for deepfake generation based on vector discretization representation according to claim 1, wherein step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extracting source video frames frame_s from the source video V_s and target video frames frame_t from the target video V_t;
a-2) from the source video frames frame_s and the target video frames frame_t, cropping out face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) aligning the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align respectively, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align.
3. The method for deepfake generation based on vector discretization representation according to claim 2, wherein step b) comprises the following steps:
b-1) establishing a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t, wherein the encoder E consists sequentially of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist sequentially of 2 residual units and 4 upsampling convolutional layers; the discrete vector embedding units E_s and E_t each consist sequentially of 2 residual units, the AttnBlock attention module of a Transformer model and a dictionary vector Embedding function; and the discriminators D_s and D_t each consist sequentially of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function;
b-2) inputting P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively; inputting the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and computing, through the formula Z_s = argmin_{z_k ∈ Z} ||s_q - z_k||, the space vector Z_s quantized to its nearest neighbour in the discrete space Z, where s_q ∈ R^(h×w×n_z), h is the height of P_s^i and P_t^j, w is the width of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k ∈ R^(n_z) | k = 1, ..., K} and K is the number of dictionary vectors; inputting the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and computing, through the formula Z_t = argmin_{z_k ∈ Z} ||t_q - z_k||, the space vector Z_t quantized to its nearest neighbour in the discrete space Z; inputting the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and inputting the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
b-3) computing the loss l_1 = ||s_g - P_s^i||_2 + ||t_g - P_t^j||_2 and the loss l_2 = ||s_g - Z_s||_2 + ||t_g - Z_t||_2; inputting the decoding result s_g into the source video frame picture discriminator D_s for discrimination and the decoding result t_g into the target video frame picture discriminator D_t for discrimination; computing the loss between the reconstructed pictures and the original pictures l_3 = log D_s(s_g) + log(1 - D_s(P_s^i)) + log D_t(t_g) + log(1 - D_t(P_t^j)); and back-propagating the losses l_1, l_2 and l_3 while iteratively adjusting the network model of b-1) with an optimizer.
4. The deepfake generation method based on vector discretization representation according to claim 3, characterized in that: in step c) the aligned face picture P_s^face_align from the source video frame is input into the network model iterated in step b-3) and passes sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
5. The deepfake generation method based on vector discretization representation according to claim 4, characterized in that: in step d) the decoding result t_stog is sharpened and fused to obtain a plurality of video frames frame_f.
6. The deepfake generation method based on vector discretization representation according to claim 5, characterized in that: in step e) the plurality of video frames frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg.
7. The deepfake generation method based on vector discretization representation according to claim 1, characterized in that: the residual unit in step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists sequentially of two normalization-layer-plus-convolutional-layer stages; the picture passes sequentially through these two stages, and the result is added to the output of the convolutional layer; the convolution kernel of the convolutional layer is 3×3 with stride 1 and padding 1.
8. The deepfake generation method based on vector discretization representation according to claim 1, characterized in that: the convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1; in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t, the convolutions have 4×4 kernels with stride 2 and padding 1, and the activation function is the LeakyReLU activation function.
CN202111400589.0A 2021-11-23 2021-11-23 Deepfake generation method based on vector discretization representation Active CN114155139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111400589.0A CN114155139B (en) 2021-11-23 2021-11-23 Deepfake generation method based on vector discretization representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111400589.0A CN114155139B (en) 2021-11-23 2021-11-23 Deepfake generation method based on vector discretization representation

Publications (2)

Publication Number Publication Date
CN114155139A (en) 2022-03-08
CN114155139B (en) 2022-07-22

Family

ID=80457238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111400589.0A Active CN114155139B (en) 2021-11-23 2021-11-23 Deepfake generation method based on vector discretization representation

Country Status (1)

Country Link
CN (1) CN114155139B (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6415056B1 (en) * 1996-01-22 2002-07-02 Matsushita Electric Industrial, Co., Ltd. Digital image encoding and decoding method and digital image encoding and decoding device using the same
US20130235045A1 (en) * 2012-03-06 2013-09-12 Mixamo, Inc. Systems and methods for creating and distributing modifiable animated video messages
CN103489011A (en) * 2013-09-16 2014-01-01 广东工业大学 Three-dimensional face identification method with topology robustness
US20200288171A1 (en) * 2017-07-03 2020-09-10 Nokia Technologies Oy Apparatus, a method and a computer program for omnidirectional video
CN112446364A (en) * 2021-01-29 2021-03-05 中国科学院自动化研究所 High-definition face replacement video generation method and system
US20210084290A1 (en) * 2017-12-14 2021-03-18 Electronics And Telecommunications Research Institute Image encoding and decoding method and device using prediction network
US20210090217A1 (en) * 2019-09-23 2021-03-25 Tencent America LLC Video coding for machine (vcm) based system and method for video super resolution (sr)
US20210124996A1 (en) * 2019-10-24 2021-04-29 Sony Interactive Entertainment Inc. Encoding and decoding apparatus
CN113192161A (en) * 2021-04-22 2021-07-30 清华珠三角研究院 Virtual human image video generation method, system, device and storage medium
CN113240575A (en) * 2021-05-12 2021-08-10 中国科学技术大学 Face counterfeit video effect enhancement method


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DAVID GUERA: "Deepfake Video Detection Using Recurrent Neural Networks", IEEE *
IVAN PEROV: "DeepFaceLab: A simple, flexible and extensible face", arXiv *
WANG Xianxian et al.: "A Facial Expression Generation Method Based on an Improved Conditional Generative Adversarial Network", Journal of Chinese Computer Systems (《小型微型计算机系统》) *
GAO Wei et al.: "Security Issues Behind DeepFake Technology: Opportunities and Challenges", Journal of Information Security Research (《信息安全研究》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311720A (en) * 2022-08-11 2022-11-08 山东省人工智能研究院 Defekake generation method based on Transformer
CN116246022A (en) * 2023-03-09 2023-06-09 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance
CN116246022B (en) * 2023-03-09 2024-01-26 山东省人工智能研究院 Face image identity synthesis method based on progressive denoising guidance


Similar Documents

Publication Publication Date Title
Gu et al. NTIRE 2022 challenge on perceptual image quality assessment
CN113658051B (en) Image defogging method and system based on cyclic generation countermeasure network
CN114155139B (en) Deepfake generation method based on vector discretization representation
CN109993678B (en) Robust information hiding method based on deep confrontation generation network
CN115311720B (en) Method for generating deepfake based on transducer
CN111369565A (en) Digital pathological image segmentation and classification method based on graph convolution network
Oquab et al. Low bandwidth video-chat compression using deep generative models
CN116246022B (en) Face image identity synthesis method based on progressive denoising guidance
CN110880193A (en) Image compression method using depth semantic segmentation technology
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
CN115546060A (en) Reversible underwater image enhancement method
CN116309107A (en) Underwater image enhancement method based on Transformer and generated type countermeasure network
CN115713680A (en) Semantic guidance-based face image identity synthesis method
CN114449276B (en) Super prior side information compensation image compression method based on learning
CN116612211A (en) Face image identity synthesis method based on GAN and 3D coefficient reconstruction
Fujihashi et al. Wireless 3D point cloud delivery using deep graph neural networks
CN113781324B (en) Old photo restoration method
CN108171325B (en) Time sequence integration network, coding device and decoding device for multi-scale face recovery
CN115879516B (en) Data evidence obtaining method
CN116523985A (en) Structure and texture feature guided double-encoder image restoration method
Huang et al. CLSR: cross-layer interaction pyramid super-resolution network
CN114494387A (en) Data set network generation model and fog map generation method
CN107018287A (en) The method and apparatus for carrying out noise reduction to image using video epitome
Xie et al. Visual Redundancy Removal of Composite Images via Multimodal Learning
CN115272122B (en) Priori-guided single-stage distillation image defogging method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant