CN114155139A - Deepfake generation method based on vector discretization representation - Google Patents
Deepfake generation method based on vector discretization representation
- Publication number
- CN114155139A (application CN202111400589.0A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- picture
- vector
- face
- target video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T3/04
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks: combinations of networks
- G06N3/048 — Neural networks: activation functions
- G06N3/084 — Neural network learning methods: backpropagation, e.g. using gradient descent
- G06T5/73
- G06T9/002 — Image coding using neural networks
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
- G06T2207/30201 — Face
Abstract
A deepfake generation method based on vector discretization representation extracts the video frames of a source video and a target video, then sequentially applies face detection, face alignment, a trained face-swapping network, face sharpening and fusion, and video-frame recombination to the source video frames to obtain the final result. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
Description
Technical Field
The invention relates to the field of face-swap generation in videos, and in particular to a deepfake generation method based on vector discretization representation.
Background
With the development of deep-learning techniques and the flood of personal media data in public network environments, many fake face-swapped videos have been produced. The technique of generating fake face-swapped videos with deep learning is called deepfake. Specifically, the technique replaces the face in the source video with the face from the target video while ensuring that the swapped face keeps the source face's attribute information (expression, illumination, background, etc.) and the target face's identity information. Current generation techniques are mainly realized with autoencoders and generative adversarial networks.
Autoencoder-based generation uses a common encoder and two decoders, one for the source and one for the target. During training, the source face pictures from the source video frames and the target face pictures from the target video frames are put into the common encoder to extract general facial features, which are then reconstructed into source and target face pictures by their respective decoders. Face swapping puts a source face picture into the trained common encoder and outputs it through the target face's decoder to obtain the swap result. However, the faces synthesized by the autoencoder are manipulated entirely at the pixel level, so the whole process produces artifacts, and the synthesized images lack detail sharpness and stable quality.
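The shared-encoder/two-decoder idea above can be sketched with linear maps standing in for the networks. This is a toy illustration only, not the patent's convolutional architecture; all sizes and weights here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": flattened 8-dim vectors standing in for face crops.
dim, latent = 8, 4

# One shared encoder and two identity-specific decoders, reduced to
# linear maps (a real deepfake autoencoder uses convolutional networks).
W_enc = rng.normal(size=(latent, dim))
W_dec_source = rng.normal(size=(dim, latent))
W_dec_target = rng.normal(size=(dim, latent))

def encode(x):
    return W_enc @ x          # shared features for both identities

def decode_source(z):
    return W_dec_source @ z   # renders faces with the source identity

def decode_target(z):
    return W_dec_target @ z   # renders faces with the target identity

source_face = rng.normal(size=dim)

# Face swap: encode the source face, decode with the TARGET decoder,
# so the source attributes are rendered with the target identity.
swapped = decode_target(encode(source_face))
print(swapped.shape)
```

Feeding the same shared latent into the other decoder is the entire swap mechanism; the patent's contribution is quantizing that latent before decoding.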
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides a deepfake generation method based on vector discretization representation that reduces the artifacts produced during the face-swapping operation.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
a deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
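The five steps above can be sketched as a pipeline skeleton. Every face operation below is a stub standing in for the real component (S3FD, 2DFAN, the trained encoder/embedding/decoder network, and ffmpeg for frame extraction and recombination), so this only shows the orchestration:

```python
import numpy as np

def extract_frames(video):          # step a): e.g. ffmpeg frame extraction
    return [np.zeros((64, 64, 3)) for _ in range(video["n_frames"])]

def detect_and_align(frame):        # step a): S3FD detection + 2DFAN alignment
    return frame                    # stub: returns the frame unchanged

def swap_face(aligned):             # step c): encoder -> embedding unit -> decoder
    return aligned                  # stub

def sharpen_and_fuse(face, frame):  # step d): paste the swapped face back
    return frame                    # stub

def combine(frames):                # step e): e.g. ffmpeg frame recombination
    return {"n_frames": len(frames)}

source = {"n_frames": 5}
out_frames = []
for frame in extract_frames(source):        # steps c)-d) repeat per frame
    aligned = detect_and_align(frame)
    swapped = swap_face(aligned)
    out_frames.append(sharpen_and_fuse(swapped, frame))
result = combine(out_frames)
print(result["n_frames"])
```

The per-frame loop mirrors step e): steps c) and d) run once per source frame before the frames are recombined into the final video.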
Further, step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract the source video frames frame_s from the source video V_s and the target video frames frame_t from the target video V_t;
a-2) from the source video frames frame_s and the target video frames frame_t, crop out the face pictures P_s^{face_detection} and P_t^{face_detection} respectively using the S3FD face detection algorithm;
a-3) perform the alignment operation on the facial landmarks of the face pictures P_s^{face_detection} and P_t^{face_detection} using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^{face_align} and P_t^{face_align} respectively, where P_s^i is the i-th aligned face picture in P_s^{face_align} and P_t^j is the j-th aligned face picture in P_t^{face_align}.
Further, step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of a Transformer model and a dictionary vector embedding function Embedding; the discriminators D_s and D_t each consist, in sequence, of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function;
b-2) input P_s^i and P_t^j into the encoder E to obtain the coding vectors s_q and t_q respectively. Input the coding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate, through the formula Z_s = argmin_{z_k ∈ Z} ||s_q − z_k|| (applied at each spatial position), the space vector Z_s ∈ R^{h×w×n_z} obtained by quantizing s_q to its nearest vectors in the discrete space Z = {z_k}_{k=1}^{K}, z_k ∈ R^{n_z}, where h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension and K is the number of dictionary vectors. Input the coding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate, through the formula Z_t = argmin_{z_k ∈ Z} ||t_q − z_k||, the space vector Z_t quantized to its nearest vectors in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
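The embedding step can be illustrated as standard nearest-neighbour vector quantization over a learned dictionary, assuming the usual VQ-VAE/VQGAN formulation; the sizes h, w, n_z and K below are made-up examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 4x4 grid of spatial positions, embedding
# dimension n_z = 8, and a dictionary of K = 16 vectors.
h, w, n_z, K = 4, 4, 8, 16
s_q = rng.normal(size=(h, w, n_z))   # encoder output for one picture
Z = rng.normal(size=(K, n_z))        # discrete dictionary {z_k}

def quantize(feat, dictionary):
    # Squared L2 distance from every spatial position to every z_k,
    # then replace each position by its nearest dictionary vector.
    flat = feat.reshape(-1, feat.shape[-1])                     # (h*w, n_z)
    d = ((flat[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                                      # argmin_k ||. - z_k||
    return dictionary[idx].reshape(feat.shape), idx

Z_s, indices = quantize(s_q, Z)
print(Z_s.shape)   # same shape as s_q, but every vector comes from Z
```

Because every output vector is drawn from the finite dictionary, the decoder only ever sees latents it was trained on, which is the mechanism the patent credits with reducing face-swap artifacts.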
b-3) calculate the loss l_1 through the formula l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2, and calculate the loss l_2 through the formula l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for discrimination and input the decoding result t_g into the target video frame picture discriminator D_t for discrimination, and calculate the loss l_3 between the reconstructed pictures and the original pictures through the formula l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3 and iteratively adjust the network model of step b-1) with an optimizer.
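On random stand-in tensors, the three losses of step b-3) can be computed as follows. The sigmoid-of-mean discriminator and the tensor shapes (including giving Z_s/Z_t the same shape as the decodings) are placeholders for illustration only; in the method they are the convolutional discriminators and quantized latents of steps b-1) and b-2):

```python
import numpy as np

rng = np.random.default_rng(1)

P_s, P_t = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))  # originals
s_g, t_g = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))  # decodings
Z_s, Z_t = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))  # quantized latents (shape is illustrative)

def D(x):
    # Placeholder discriminator: sigmoid of the mean, output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x.mean()))

l1 = np.sum((s_g - P_s) ** 2) + np.sum((t_g - P_t) ** 2)   # reconstruction term
l2 = np.sum((s_g - Z_s) ** 2) + np.sum((t_g - Z_t) ** 2)   # embedding term
l3 = (np.log(D(s_g)) + np.log(1 - D(P_s))
      + np.log(D(t_g)) + np.log(1 - D(P_t)))               # adversarial term

print(l1 > 0, l2 > 0, l3 < 0)
```

Since D outputs values strictly inside (0, 1), each log term in l3 is negative, so the adversarial term is always below zero in this sketch; the optimizer balances it against the two non-negative distance terms.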
Further, in step c) the aligned face pictures P_s^{face_align} in the source video frames are input into the network model iterated in step b-3) and passed, in sequence, through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
Further, in step d) the decoding result t_stog undergoes the sharpening and fusion operations to obtain a plurality of video frames frame_f.
Further, in step e) the plurality of video frames frame_f are synthesized into the final video V_f using the multimedia processing tool ffmpeg.
Preferably, the residual unit in step b-1) comprises a normalized convolution module and a convolutional layer. The normalized convolution module consists, in sequence, of two normalization-layer-plus-convolution-layer blocks; the picture passes through the normalized convolution module and through the convolutional layer, and the two output results are added. The convolution kernel of the convolutional layers is 3 × 3 with stride 1 and padding 1.
Preferably, the convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3 × 3 with stride 2 and padding 0, and the convolution kernel of the upsampling convolutional layers is 3 × 3 with stride 1 and padding 1. The convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4 × 4 with stride 2 and padding 1, and the activation function in D_s and D_t is the LeakyReLU activation function.
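With the stated kernel sizes, strides and paddings, the spatial dimensions at each stage can be checked with the standard convolution output-size formula out = floor((in + 2·pad − kernel) / stride) + 1. The 256 × 256 input size is an assumption for illustration; the patent does not state one:

```python
def conv_out(size, kernel=3, stride=2, pad=0):
    # Standard convolution output-size formula.
    return (size + 2 * pad - kernel) // stride + 1

# Encoder: four downsampling convolutions, kernel 3x3, stride 2, padding 0.
size = 256
sizes = [size]
for _ in range(4):
    size = conv_out(size)
    sizes.append(size)
print(sizes)  # [256, 127, 63, 31, 15]

# Discriminator convolutions (kernel 4x4, stride 2, padding 1) exactly
# halve an even input size, e.g. 64 -> 32.
print(conv_out(64, kernel=4, stride=2, pad=1))  # 32
```

Note that with padding 0 the encoder's output sizes are odd at every stage, whereas the 4 × 4 / stride 2 / padding 1 discriminator convolutions cleanly halve even sizes; this is a common pairing in VQGAN-style models.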
The invention has the beneficial effects that: the final result is obtained by extracting the video frames of a source video and a target video and then sequentially applying face detection, face alignment, a trained face-swapping network, face sharpening and fusion, and video-frame recombination to the source video frames. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of the preprocessing process of the present invention;
FIG. 3 is a diagram of a training network of the model of the present invention;
FIG. 4 is a diagram of the testing and post-processing of the model of the present invention;
FIG. 5 is a diagram of a residual unit structure of the model of the present invention;
FIG. 6 is a diagram of an encoder network model of the present invention;
FIG. 7 is a diagram of a decoder network model of the present invention;
FIG. 8 is a diagram of a discrete space embedding process of the model of the present invention;
FIG. 9 is a diagram of a network model of the discriminator according to the present invention.
Detailed Description
The present invention will be further described with reference to fig. 1 to 9.
A deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
The source face pictures obtained through preprocessing are put into the trained shared encoder to obtain an encoding result; to reduce the artifacts produced during the face-swapping operation, the encoding result is quantized into a space vector in discrete representation form, and the discretely represented space vector is then put into the trained target face decoder to obtain the face-swap result. A series of post-processing steps is applied to the face-swap result to obtain the final video result. The method converts the encoding result into a discrete vector representation, reducing the artifacts produced during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
The step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract the source video frames frame_s from the source video V_s and the target video frames frame_t from the target video V_t;
a-2) from the source video frames frame_s and the target video frames frame_t, crop out the face pictures P_s^{face_detection} and P_t^{face_detection} respectively using the S3FD face detection algorithm;
a-3) perform the alignment operation on the facial landmarks of the face pictures P_s^{face_detection} and P_t^{face_detection} using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^{face_align} and P_t^{face_align} respectively, where P_s^i is the i-th aligned face picture in P_s^{face_align} and P_t^j is the j-th aligned face picture in P_t^{face_align}.
The step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of a Transformer model and a dictionary vector embedding function Embedding; the discriminators D_s and D_t each consist, in sequence, of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function;
b-2) input P_s^i and P_t^j into the encoder E to obtain the coding vectors s_q and t_q respectively. Input the coding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate, through the formula Z_s = argmin_{z_k ∈ Z} ||s_q − z_k|| (applied at each spatial position), the space vector Z_s ∈ R^{h×w×n_z} obtained by quantizing s_q to its nearest vectors in the discrete space Z = {z_k}_{k=1}^{K}, z_k ∈ R^{n_z}, where h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension and K is the number of dictionary vectors. Input the coding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate, through the formula Z_t = argmin_{z_k ∈ Z} ||t_q − z_k||, the space vector Z_t quantized to its nearest vectors in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
b-3) calculate the loss l_1 through the formula l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2, and calculate the loss l_2 through the formula l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for discrimination and input the decoding result t_g into the target video frame picture discriminator D_t for discrimination, and calculate the loss l_3 between the reconstructed pictures and the original pictures through the formula l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3 and iteratively adjust the network model of step b-1) with an optimizer.
In step c) the aligned face pictures P_s^{face_align} in the source video frames are input into the network model iterated in step b-3) and passed, in sequence, through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
In step d) the decoding result t_stog undergoes the sharpening and fusion operations to obtain a plurality of video frames frame_f.
In step e) the plurality of video frames frame_f are synthesized into the final video V_f using the multimedia processing tool ffmpeg. The residual unit in step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists, in sequence, of two normalization-layer-plus-convolution-layer blocks; the picture passes through the normalized convolution module and through the convolutional layer, and the two output results are added; the convolution kernel of the convolutional layers is 3 × 3 with stride 1 and padding 1. The convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3 × 3 with stride 2 and padding 0, and the convolution kernel of the upsampling convolutional layers is 3 × 3 with stride 1 and padding 1; the convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4 × 4 with stride 2 and padding 1, and the activation function in D_s and D_t is the LeakyReLU activation function.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A deepfake generation method based on vector discretization representation is characterized by comprising the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
2. The method for deepfake generation based on vector discretization representation according to claim 1, wherein step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract the source video frames frame_s from the source video V_s and the target video frames frame_t from the target video V_t;
a-2) from the source video frames frame_s and the target video frames frame_t, crop out the face pictures P_s^{face_detection} and P_t^{face_detection} respectively using the S3FD face detection algorithm;
a-3) perform the alignment operation on the facial landmarks of the face pictures P_s^{face_detection} and P_t^{face_detection} using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^{face_align} and P_t^{face_align} respectively, where P_s^i is the i-th aligned face picture in P_s^{face_align} and P_t^j is the j-th aligned face picture in P_t^{face_align}.
3. The method for deepfake generation based on vector discretization representation according to claim 2, wherein step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of a Transformer model and a dictionary vector embedding function Embedding; the discriminators D_s and D_t each consist, in sequence, of 2 layers of convolution plus activation function, 3 layers of convolution plus activation function plus batch normalization, and 2 layers of convolution plus activation function;
b-2) input P_s^i and P_t^j into the encoder E to obtain the coding vectors s_q and t_q respectively. Input the coding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate, through the formula Z_s = argmin_{z_k ∈ Z} ||s_q − z_k|| (applied at each spatial position), the space vector Z_s ∈ R^{h×w×n_z} obtained by quantizing s_q to its nearest vectors in the discrete space Z = {z_k}_{k=1}^{K}, z_k ∈ R^{n_z}, where h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension and K is the number of dictionary vectors. Input the coding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate, through the formula Z_t = argmin_{z_k ∈ Z} ||t_q − z_k||, the space vector Z_t quantized to its nearest vectors in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
b-3) calculate the loss l_1 through the formula l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2, and calculate the loss l_2 through the formula l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for discrimination and input the decoding result t_g into the target video frame picture discriminator D_t for discrimination, and calculate the loss l_3 between the reconstructed pictures and the original pictures through the formula l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3 and iteratively adjust the network model of step b-1) with an optimizer.
4. The method of deepfake generation based on vector discretization representation according to claim 3, characterized in that: in step c) the aligned face pictures P_s^{face_align} in the source video frames are input into the network model iterated in step b-3) and passed, in sequence, through the encoder E, the target video frame picture discrete vector embedding unit E_t and the target video frame picture decoder G_t to obtain the decoding result t_stog.
5. The method of deepfake generation based on vector discretization representation according to claim 4, characterized in that: in step d) the decoding result t_stog undergoes the sharpening and fusion operations to obtain a plurality of video frames frame_f.
6. The method of deepfake generation based on vector discretization representation according to claim 5, characterized in that: in step e) the plurality of video frames frame_f are synthesized into the final video V_f using the multimedia processing tool ffmpeg.
7. The method of deepfake generation based on vector discretization representation according to claim 1, characterized in that: the residual unit in step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists, in sequence, of two normalization-layer-plus-convolution-layer blocks; the picture passes through the normalized convolution module and through the convolutional layer, and the two output results are added; the convolution kernel of the convolutional layers is 3 × 3 with stride 1 and padding 1.
8. The method of deepfake generation based on vector discretization representation according to claim 1, characterized in that: the convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3 × 3 with stride 2 and padding 0, and the convolution kernel of the upsampling convolutional layers is 3 × 3 with stride 1 and padding 1; the convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4 × 4 with stride 2 and padding 1, and the activation function in D_s and D_t is the LeakyReLU activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111400589.0A CN114155139B (en) | 2021-11-23 | 2021-11-23 | Deepfake generation method based on vector discretization representation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111400589.0A CN114155139B (en) | 2021-11-23 | 2021-11-23 | Deepfake generation method based on vector discretization representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114155139A true CN114155139A (en) | 2022-03-08 |
CN114155139B CN114155139B (en) | 2022-07-22 |
Family
ID=80457238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111400589.0A Active CN114155139B (en) | 2021-11-23 | 2021-11-23 | Deepfake generation method based on vector discretization representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114155139B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311720A (en) * | 2022-08-11 | 2022-11-08 | 山东省人工智能研究院 | Deepfake generation method based on Transformer
CN116246022A (en) * | 2023-03-09 | 2023-06-09 | 山东省人工智能研究院 | Face image identity synthesis method based on progressive denoising guidance |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6415056B1 (en) * | 1996-01-22 | 2002-07-02 | Matsushita Electric Industrial, Co., Ltd. | Digital image encoding and decoding method and digital image encoding and decoding device using the same |
US20130235045A1 (en) * | 2012-03-06 | 2013-09-12 | Mixamo, Inc. | Systems and methods for creating and distributing modifiable animated video messages |
CN103489011A (en) * | 2013-09-16 | 2014-01-01 | 广东工业大学 | Three-dimensional face identification method with topology robustness |
US20200288171A1 (en) * | 2017-07-03 | 2020-09-10 | Nokia Technologies Oy | Apparatus, a method and a computer program for omnidirectional video |
CN112446364A (en) * | 2021-01-29 | 2021-03-05 | 中国科学院自动化研究所 | High-definition face replacement video generation method and system |
US20210084290A1 (en) * | 2017-12-14 | 2021-03-18 | Electronics And Telecommunications Research Institute | Image encoding and decoding method and device using prediction network |
US20210090217A1 (en) * | 2019-09-23 | 2021-03-25 | Tencent America LLC | Video coding for machine (vcm) based system and method for video super resolution (sr) |
US20210124996A1 (en) * | 2019-10-24 | 2021-04-29 | Sony Interactive Entertainment Inc. | Encoding and decoding apparatus |
CN113192161A (en) * | 2021-04-22 | 2021-07-30 | 清华珠三角研究院 | Virtual human image video generation method, system, device and storage medium |
CN113240575A (en) * | 2021-05-12 | 2021-08-10 | 中国科学技术大学 | Face counterfeit video effect enhancement method |
- 2021-11-23: CN application CN202111400589.0A filed; granted as patent CN114155139B (status: Active)
Non-Patent Citations (4)
Title |
---|
DAVID GUERA: "Deepfake Video Detection Using Recurrent Neural Networks", 《IEEE》 *
IVAN PEROV: "DeepFaceLab: A simple, flexible and extensible face swapping framework", 《ARXIV》 *
WANG Xianxian et al.: "A Facial Expression Generation Method Based on an Improved Conditional Generative Adversarial Network", 《Journal of Chinese Computer Systems》 *
GAO Wei et al.: "Security Issues Behind DeepFake Technology: Opportunities and Challenges", 《Journal of Information Security Research》 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311720A (en) * | 2022-08-11 | 2022-11-08 | 山东省人工智能研究院 | Deepfake generation method based on Transformer |
CN116246022A (en) * | 2023-03-09 | 2023-06-09 | 山东省人工智能研究院 | Face image identity synthesis method based on progressive denoising guidance |
CN116246022B (en) * | 2023-03-09 | 2024-01-26 | 山东省人工智能研究院 | Face image identity synthesis method based on progressive denoising guidance |
Also Published As
Publication number | Publication date |
---|---|
CN114155139B (en) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gu et al. | NTIRE 2022 challenge on perceptual image quality assessment | |
CN113658051B (en) | Image defogging method and system based on cyclic generation countermeasure network | |
CN114155139B (en) | Deepfake generation method based on vector discretization representation | |
CN109993678B (en) | Robust information hiding method based on deep confrontation generation network | |
CN115311720B (en) | Method for generating deepfake based on Transformer | |
CN111369565A (en) | Digital pathological image segmentation and classification method based on graph convolution network | |
Oquab et al. | Low bandwidth video-chat compression using deep generative models | |
CN116246022B (en) | Face image identity synthesis method based on progressive denoising guidance | |
CN110880193A (en) | Image compression method using depth semantic segmentation technology | |
CN115829876A (en) | Real degraded image blind restoration method based on cross attention mechanism | |
CN115546060A (en) | Reversible underwater image enhancement method | |
CN116309107A (en) | Underwater image enhancement method based on Transformer and generated type countermeasure network | |
CN115713680A (en) | Semantic guidance-based face image identity synthesis method | |
CN114449276B (en) | Super prior side information compensation image compression method based on learning | |
CN116612211A (en) | Face image identity synthesis method based on GAN and 3D coefficient reconstruction | |
Fujihashi et al. | Wireless 3D point cloud delivery using deep graph neural networks | |
CN113781324B (en) | Old photo restoration method | |
CN108171325B (en) | Time sequence integration network, coding device and decoding device for multi-scale face recovery | |
CN115879516B (en) | Data evidence obtaining method | |
CN116523985A (en) | Structure and texture feature guided double-encoder image restoration method | |
Huang et al. | CLSR: cross-layer interaction pyramid super-resolution network | |
CN114494387A (en) | Data set network generation model and fog map generation method | |
CN107018287A (en) | The method and apparatus for carrying out noise reduction to image using video epitome | |
Xie et al. | Visual Redundancy Removal of Composite Images via Multimodal Learning | |
CN115272122B (en) | Priori-guided single-stage distillation image defogging method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||