CN114155139B - Deepfake generation method based on vector discretization representation - Google Patents
- Publication number
- CN114155139B (application CN202111400589.0A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- picture
- vector
- face
- target video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T3/04
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Neural networks; activation functions
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06T5/73
- G06T9/002 — Image coding using neural networks
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
- G06T2207/30201 — Face
Abstract
A deepfake generation method based on vector discretization representation extracts video frames from a source video and a target video, then sequentially applies face detection, face alignment, a trained face-swap network, face sharpening and fusion, and video frame combination to the source video frames to obtain the final result. The method converts the encoding result into a discrete vector representation, which reduces the artifacts generated during the face-swapping operation. Meanwhile, a discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
Description
Technical Field
The invention relates to the field of face swapping in videos, and in particular to a deepfake generation method based on vector discretization representation.
Background
With the development of deep learning techniques and the flood of personal media data in public network environments, many fake face-swapped videos have been produced. The technique of generating fake face-swapped videos with deep learning is called Deepfake. Specifically, the technique replaces the face in a source video with the face from a target video, ensuring that the swapped face keeps the source face's attribute information (expression, illumination, background, etc.) and the target face's identity information. Current generation techniques are mainly realized with autoencoders and generative adversarial networks.
Autoencoder-based generation uses a common encoder and two decoders, one for the source and one for the target. During training, the source face pictures from the source video frames and the target face pictures from the target video frames are put into the common encoder to extract general facial features, which are then reconstructed into source and target face pictures by their respective decoders. Face swapping puts a source face picture into the trained shared encoder and outputs it from the target face's decoder to obtain the face-swap result. However, faces synthesized by the autoencoder are all manipulated at the pixel level, so the whole process produces artifacts, and the synthesized images lack detail sharpness and stable quality.
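For illustration, the shared-encoder/two-decoder wiring described above can be sketched with linear stand-ins (a hypothetical sketch: the dimensions, weight matrices, and function names below are illustrative and are not part of the patent, which uses convolutional networks):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-ins for the shared encoder E and the two
# decoders G_s, G_t; real systems use convolutional networks, but the
# wiring of training and face swapping is the same.
D_IMG, D_LAT = 64, 16
W_enc = rng.normal(size=(D_LAT, D_IMG))      # shared encoder
W_dec_src = rng.normal(size=(D_IMG, D_LAT))  # source-identity decoder
W_dec_tgt = rng.normal(size=(D_IMG, D_LAT))  # target-identity decoder

def encode(x):
    return W_enc @ x                 # common feature extraction

def decode_src(z):
    return W_dec_src @ z             # reconstruct a source-identity face

def decode_tgt(z):
    return W_dec_tgt @ z             # reconstruct a target-identity face

# Training reconstructs each face through its own decoder...
src_face = rng.normal(size=D_IMG)
recon_src = decode_src(encode(src_face))

# ...while face swapping routes a source face through the TARGET decoder.
swapped = decode_tgt(encode(src_face))
```

The key design point is that only the decoders differ per identity; the encoder's weights are shared so that both identities map into one common feature space.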
Disclosure of Invention
To overcome the defects of the above technology, the invention provides a deepfake generation method based on vector discretization representation, which reduces the artifacts generated during the face-swapping operation.
The technical scheme adopted by the invention for overcoming the technical problems is as follows:
a deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
Further, step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract source video frames Frame_s from the source video V_s and target video frames Frame_t from the target video V_t;
a-2) from the source video frames Frame_s and the target video frames Frame_t, crop out the face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) align the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align.
Further, step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s, and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of the Transformer model, and a dictionary vector Embedding function; the discriminators D_s and D_t each consist, in sequence, of 2 convolution-plus-activation layers, 3 convolution-plus-activation-plus-batch-normalization layers, and 2 convolution-plus-activation layers;
b-2) input P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively. Input the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate Z_s = q(s_q) := argmin_{z_k ∈ Z} ||s_q − z_k|| to obtain the space vector Z_s, quantized to the most similar vector in the discrete space Z. In the formula, s_q ∈ R^{h×w×n_z}, h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k}_{k=1}^K ⊂ R^{n_z}, and K is the number of dictionary vectors. Input the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate Z_t = q(t_q) in the same way to obtain the space vector Z_t, quantized to the most similar vector in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
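The nearest-neighbour quantization q(·) performed by the discrete vector embedding units can be sketched as follows (a NumPy illustration; the toy sizes h = w = 2, n_z = 4, K = 8 and the function name `quantize` are assumptions for illustration only):

```python
import numpy as np

def quantize(encodings: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace each spatial encoding vector by its nearest dictionary vector.

    encodings: (h, w, n_z) output of the encoder for one picture.
    codebook:  (K, n_z) discrete space Z of K embedding vectors z_k.
    """
    h, w, n_z = encodings.shape
    flat = encodings.reshape(-1, n_z)                                # (h*w, n_z)
    # Squared distance from every encoding vector to every z_k.
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (h*w, K)
    nearest = dist.argmin(axis=1)                                    # argmin over k
    return codebook[nearest].reshape(h, w, n_z)

rng = np.random.default_rng(1)
Z = rng.normal(size=(8, 4))        # K = 8 dictionary vectors, n_z = 4
s_q = rng.normal(size=(2, 2, 4))   # toy encoder output, h = w = 2
Z_s = quantize(s_q, Z)             # quantized space vector
```

After quantization, every spatial position of Z_s is an exact copy of one dictionary vector, which is what removes the continuous pixel-level freedom the background section blames for artifacts.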
b-3) calculate the loss l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2 and the loss l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for judgment and the decoding result t_g into the target video frame picture discriminator D_t for judgment, and calculate the loss between the reconstructed pictures and the original pictures as l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3, and continuously adjust the network model of b-1) iteratively using an optimizer.
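The three losses of step b-3) can be written out directly from the formulas above (an illustrative NumPy sketch; the function names are assumed, and the discriminator scores are taken to be probabilities in (0, 1)):

```python
import numpy as np

def loss_l1(s_g, p_s_i, t_g, p_t_j):
    # l1: reconstruction loss against the original aligned face pictures.
    return np.sum((s_g - p_s_i) ** 2) + np.sum((t_g - p_t_j) ** 2)

def loss_l2(s_g, z_s, t_g, z_t):
    # l2: keeps the decoding results close to the quantized space vectors.
    return np.sum((s_g - z_s) ** 2) + np.sum((t_g - z_t) ** 2)

def loss_l3(ds_fake, ds_real, dt_fake, dt_real):
    # l3: adversarial term built from the discriminator scores of
    # both the source and target branches.
    return (np.log(ds_fake) + np.log(1 - ds_real)
            + np.log(dt_fake) + np.log(1 - dt_real))
```

In a real training loop these would be summed and back-propagated by an optimizer; handling the non-differentiable quantization step (e.g. with a straight-through gradient) is not detailed by the source.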
Further, in step c), the aligned face pictures P_s^face_align from the source video frames are input into the network model iterated in step b-3), passing sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t, and the target video frame picture decoder G_t to obtain the decoding result t_stog.
Further, in step d), the decoding result t_stog is sharpened and fused to obtain a number of video frames Frame_f.
Further, in step e), the video frames Frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg.
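The ffmpeg frame extraction of step a-1) and the frame combination of step e) can be sketched as command builders (the specific ffmpeg flags, frame-naming pattern, and codec below are common usage, not specified by the patent):

```python
def frame_extract_cmd(video: str, out_pattern: str) -> list[str]:
    # Step a-1: split a video into numbered image frames.
    return ["ffmpeg", "-i", video, out_pattern]

def frame_combine_cmd(pattern: str, fps: int, out_video: str) -> list[str]:
    # Step e): assemble processed frames back into the final video.
    return ["ffmpeg", "-framerate", str(fps), "-i", pattern,
            "-c:v", "libx264", "-pix_fmt", "yuv420p", out_video]

# Example invocation (commented out so the sketch has no side effects):
# import subprocess
# subprocess.run(frame_extract_cmd("source.mp4", "src/frame_%06d.png"),
#                check=True)
```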
Preferably, the residual unit of step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists, in sequence, of two normalization-plus-convolution layers. The picture passes through the normalized convolution module and through the convolutional layer, and the two outputs are added. The convolution kernel of these convolutional layers is 3×3 with stride 1 and padding 1.
Preferably, the convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1. The convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4×4 with stride 2 and padding 1, and the activation function in D_s and D_t is the LeakyReLU activation function.
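The spatial sizes produced by these convolution settings follow the standard formula floor((n + 2p − k) / s) + 1. A small sketch (the 128-pixel input side is an assumed example, since the patent does not state an input resolution):

```python
def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Encoder E downsampling: 3x3 kernel, stride 2, padding 0, applied 4 times.
side = 128                     # assumed input resolution for illustration
for _ in range(4):
    side = conv_out(side, k=3, s=2, p=0)   # 128 -> 63 -> 31 -> 15 -> 7

# Discriminator convolution: 4x4 kernel, stride 2, padding 1 halves the side.
half = conv_out(128, k=4, s=2, p=1)        # 64

# Residual-unit convolution: 3x3, stride 1, padding 1 preserves the side.
same = conv_out(128, k=3, s=1, p=1)        # 128
```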
The beneficial effects of the invention are: the final result is obtained by extracting video frames from the source and target videos, then sequentially applying face detection, face alignment, the trained face-swap network, face sharpening and fusion, and video frame combination to the source video frames. The method converts the encoding result into a discrete vector representation, reducing the artifacts generated during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of the preprocessing process of the invention;
FIG. 3 is a diagram of a training network of the model of the present invention;
FIG. 4 is a diagram of the testing and post-processing of the model of the present invention;
FIG. 5 is a diagram of a residual unit structure of the model of the present invention;
FIG. 6 is a diagram of an encoder network model of the present invention;
FIG. 7 is a diagram of a decoder network model of the present invention;
FIG. 8 is a diagram of a discrete spatial embedding process of the model of the present invention;
FIG. 9 is a diagram of a network model of the discriminator according to the present invention.
Detailed Description
The present invention is further described with reference to fig. 1 to 9.
A deepfake generation method based on vector discretization representation comprises the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating the steps c) -d), and combining the video frames into a final video.
The preprocessed source face picture is put into the trained shared encoder to obtain an encoding result. To reduce the artifacts generated during the face-swapping operation, the encoding result is quantized into a space vector in discrete representation, which is then put into the trained target face decoder to obtain the face-swap result. A series of post-processing steps on the face-swap result yields the final video. The method converts the encoding result into a discrete vector representation, reducing the artifacts generated during the face-swapping operation. Meanwhile, the discriminator added during training makes the details of the decoded pictures clearer and their quality more stable.
The step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract source video frames Frame_s from the source video V_s and target video frames Frame_t from the target video V_t;
a-2) from the source video frames Frame_s and the target video frames Frame_t, crop out the face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) align the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align.
Step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s, and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of the Transformer model, and a dictionary vector Embedding function; the discriminators D_s and D_t each consist, in sequence, of 2 convolution-plus-activation layers, 3 convolution-plus-activation-plus-batch-normalization layers, and 2 convolution-plus-activation layers;
b-2) input P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively. Input the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate Z_s = q(s_q) := argmin_{z_k ∈ Z} ||s_q − z_k|| to obtain the space vector Z_s, quantized to the most similar vector in the discrete space Z. In the formula, s_q ∈ R^{h×w×n_z}, h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k}_{k=1}^K ⊂ R^{n_z}, and K is the number of dictionary vectors. Input the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate Z_t = q(t_q) in the same way to obtain the space vector Z_t. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
b-3) calculate the loss l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2 and the loss l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for judgment and the decoding result t_g into the target video frame picture discriminator D_t for judgment, and calculate the loss between the reconstructed pictures and the original pictures as l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3, and continuously adjust the network model of b-1) iteratively using an optimizer.
In step c), the aligned face pictures P_s^face_align from the source video frames are input into the network model iterated in step b-3), passing sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t, and the target video frame picture decoder G_t to obtain the decoding result t_stog.
In step d), the decoding result t_stog is sharpened and fused to obtain a number of video frames Frame_f.
In step e), the video frames Frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg. The residual unit of step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists, in sequence, of two normalization-plus-convolution layers; the picture passes through the normalized convolution module and through the convolutional layer, and the two outputs are added. The convolution kernel of these convolutional layers is 3×3 with stride 1 and padding 1. The convolution kernel of the downsampling convolutional layers of the encoder E in step b-1) is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1. The convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4×4 with stride 2 and padding 1, and the activation function in D_s and D_t is the Leaky ReLU activation function.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A deepfake generation method based on vector discretization representation, characterized by comprising the following steps:
a) extracting frames of a source video and a target video, and identifying and aligning human faces in the frames of the source video and the target video;
b) establishing a network model, and optimizing the network model by using a loss function;
c) sequentially passing aligned face pictures in a source video frame through an encoder, a discrete vector embedding unit and a decoder to obtain a face-changed picture;
d) sharpening and fusing the face-changed picture, and putting the sharpened and fused picture into a video frame;
e) repeating steps c) -d), and combining the video frames into a final video;
the step a) comprises the following steps:
a-1) using the multimedia processing tool ffmpeg, extract source video frames Frame_s from the source video V_s and target video frames Frame_t from the target video V_t;
a-2) from the source video frames Frame_s and the target video frames Frame_t, crop out the face pictures P_s^face_detection and P_t^face_detection respectively using the S3FD face detection algorithm;
a-3) align the facial feature points of the face pictures P_s^face_detection and P_t^face_detection using the 2DFAN face alignment algorithm to obtain the aligned face pictures P_s^face_align and P_t^face_align, where P_s^i is the i-th aligned face picture in P_s^face_align and P_t^j is the j-th aligned face picture in P_t^face_align; step b) comprises the following steps:
b-1) establish a network model composed of an encoder E, a source video frame picture decoder G_s, a target video frame picture decoder G_t, a source video frame picture discrete vector embedding unit E_s, a target video frame picture discrete vector embedding unit E_t, a source video frame picture discriminator D_s, and a target video frame picture discriminator D_t. The encoder E consists, in sequence, of 2 residual units and 4 downsampling convolutional layers; the decoders G_s and G_t each consist, in sequence, of 2 residual units and 4 upsampling convolutional layers; the embedding units E_s and E_t each consist, in sequence, of 2 residual units, the AttnBlock module of the Transformer model, and a dictionary vector Embedding function; the discriminators D_s and D_t each consist, in sequence, of 2 convolution-plus-activation layers, 3 convolution-plus-activation-plus-batch-normalization layers, and 2 convolution-plus-activation layers;
b-2) input P_s^i and P_t^j into the encoder E to obtain the encoding vectors s_q and t_q respectively. Input the encoding vector s_q into the source video frame picture discrete vector embedding unit E_s and calculate Z_s = q(s_q) := argmin_{z_k ∈ Z} ||s_q − z_k|| to obtain the space vector Z_s, quantized to the most similar vector in the discrete space Z. In the formula, s_q ∈ R^{h×w×n_z}, h is the height value of P_s^i and P_t^j, w is the width value of P_s^i and P_t^j, n_z is the embedding dimension, Z = {z_k}_{k=1}^K ⊂ R^{n_z}, and K is the number of dictionary vectors. Input the encoding vector t_q into the target video frame picture discrete vector embedding unit E_t and calculate Z_t = q(t_q) in the same way to obtain the space vector Z_t, quantized in the discrete space Z. Input the space vector Z_s into the source video frame picture decoder G_s to obtain the decoding result s_g, and input the space vector Z_t into the target video frame picture decoder G_t to obtain the decoding result t_g;
b-3) calculate the loss l_1 = ||s_g − P_s^i||_2 + ||t_g − P_t^j||_2 and the loss l_2 = ||s_g − Z_s||_2 + ||t_g − Z_t||_2. Input the decoding result s_g into the source video frame picture discriminator D_s for judgment and the decoding result t_g into the target video frame picture discriminator D_t for judgment, and calculate the loss between the reconstructed pictures and the original pictures as l_3 = log D_s(s_g) + log(1 − D_s(P_s^i)) + log D_t(t_g) + log(1 − D_t(P_t^j)). Back-propagate the losses l_1, l_2 and l_3, and continuously adjust the network model of b-1) iteratively using an optimizer.
2. The deepfake generation method based on vector discretization representation according to claim 1, wherein: in step c), the aligned face pictures P_s^face_align from the source video frames are input into the network model iterated in step b-3), passing sequentially through the encoder E, the target video frame picture discrete vector embedding unit E_t, and the target video frame picture decoder G_t to obtain the decoding result t_stog.
3. The deepfake generation method based on vector discretization representation according to claim 2, wherein: in step d), the decoding result t_stog is sharpened and fused to obtain a number of video frames Frame_f.
4. The deepfake generation method based on vector discretization representation according to claim 3, wherein: in step e), the video frames Frame_f are combined into the final video V_f using the multimedia processing tool ffmpeg.
5. The deepfake generation method based on vector discretization representation according to claim 1, wherein: the residual unit of step b-1) comprises a normalized convolution module and a convolutional layer; the normalized convolution module consists, in sequence, of two normalization-plus-convolution layers; the picture passes through the normalized convolution module and through the convolutional layer, and the two outputs are added; the convolution kernel of the convolutional layers is 3×3 with stride 1 and padding 1.
6. The deepfake generation method based on vector discretization representation according to claim 1, wherein: in step b-1), the convolution kernel of the downsampling convolutional layers of the encoder E is 3×3 with stride 2 and padding 0; the convolution kernel of the upsampling convolutional layers is 3×3 with stride 1 and padding 1; the convolution kernel of the convolutions in the source video frame picture discriminator D_s and the target video frame picture discriminator D_t is 4×4 with stride 2 and padding 1, and the activation function in D_s and D_t is the Leaky ReLU activation function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111400589.0A | 2021-11-23 | 2021-11-23 | Deepfake generation method based on vector discretization representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114155139A | 2022-03-08 |
CN114155139B | 2022-07-22 |
Family
ID=80457238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111400589.0A | Deepfake generation method based on vector discretization representation | 2021-11-23 | 2021-11-23 |
Country Status (1)
Country | Link |
---|---|
CN | CN114155139B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115311720B (en) * | 2022-08-11 | 2023-06-06 | 山东省人工智能研究院 | Method for generating deepfake based on transducer |
CN116246022B (en) * | 2023-03-09 | 2024-01-26 | 山东省人工智能研究院 | Face image identity synthesis method based on progressive denoising guidance |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1125568C (en) * | 1996-01-22 | 2003-10-22 | 松下电器产业株式会社 | Digital image encoding and decoding method and apparatus using same |
US9747495B2 (en) * | 2012-03-06 | 2017-08-29 | Adobe Systems Incorporated | Systems and methods for creating and distributing modifiable animated video messages |
CN103489011A (en) * | 2013-09-16 | 2014-01-01 | 广东工业大学 | Three-dimensional face identification method with topology robustness |
CN110870302B (en) * | 2017-07-03 | 2021-06-29 | 诺基亚技术有限公司 | Apparatus, method and computer program for omnidirectional video |
KR102262554B1 (en) * | 2017-12-14 | 2021-06-09 | 한국전자통신연구원 | Method and apparatus for encoding and decoding image using prediction network |
US11410275B2 (en) * | 2019-09-23 | 2022-08-09 | Tencent America LLC | Video coding for machine (VCM) based system and method for video super resolution (SR) |
GB2588438B (en) * | 2019-10-24 | 2022-06-08 | Sony Interactive Entertainment Inc | Encoding and decoding apparatus |
CN112446364B (en) * | 2021-01-29 | 2021-06-08 | 中国科学院自动化研究所 | High-definition face replacement video generation method and system |
CN113192161B (en) * | 2021-04-22 | 2022-10-18 | 清华珠三角研究院 | Virtual human image video generation method, system, device and storage medium |
CN113240575A (en) * | 2021-05-12 | 2021-08-10 | 中国科学技术大学 | Face counterfeit video effect enhancement method |
- 2021-11-23: application CN202111400589.0A filed in CN; patent CN114155139B granted, status Active
Non-Patent Citations (2)
Title |
---|
Ivan Perov: "DeepFaceLab: A simple, flexible and extensible face…", arXiv, 2020-05-20, pp. 1-17 * |
Gao Wei et al.: "Security issues behind DeepFake technology: opportunities and challenges" (DeepFake技术背后的安全问题：机遇与挑战), Information Security Research, no. 7, 2020-07-05, pp. 64-74 * |
Also Published As
Publication number | Publication date |
---|---|
CN114155139A (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114155139B (en) | Deepfake generation method based on vector discretization representation | |
CN109993678B (en) | Robust information hiding method based on deep confrontation generation network | |
CN115311720B (en) | Method for generating deepfake based on transducer | |
CN111369565A (en) | Digital pathological image segmentation and classification method based on graph convolution network | |
CN109996073B (en) | Image compression method, system, readable storage medium and computer equipment | |
Ji et al. | U2-former: A nested u-shaped transformer for image restoration | |
CN116246022B (en) | Face image identity synthesis method based on progressive denoising guidance | |
CN110880193A (en) | Image compression method using depth semantic segmentation technology | |
CN116309107A (en) | Underwater image enhancement method based on Transformer and generated type countermeasure network | |
CN115829876A (en) | Real degraded image blind restoration method based on cross attention mechanism | |
CN113747163A (en) | Image coding and decoding method and compression method based on context reorganization modeling | |
CN115713680A (en) | Semantic guidance-based face image identity synthesis method | |
CN113781324B (en) | Old photo restoration method | |
CN112750175B (en) | Image compression method and system based on octave convolution and semantic segmentation | |
CN108171325B (en) | Time sequence integration network, coding device and decoding device for multi-scale face recovery | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
CN115880762B (en) | Human-machine hybrid vision-oriented scalable face image coding method and system | |
CN117061760A (en) | Video compression method and system based on attention mechanism | |
Ma et al. | AFEC: adaptive feature extraction modules for learned image compression | |
Kim et al. | End-to-end learnable multi-scale feature compression for vcm | |
CN115619681A (en) | Image reconstruction method based on multi-granularity Vit automatic encoder | |
CN115393452A (en) | Point cloud geometric compression method based on asymmetric self-encoder structure | |
CN115496134A (en) | Traffic scene video description generation method and device based on multi-modal feature fusion | |
Huang et al. | CLSR: cross-layer interaction pyramid super-resolution network | |
CN114494387A (en) | Data set network generation model and fog map generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |