CN111263226B - Video processing method, video processing device, electronic equipment and medium - Google Patents


Info

Publication number
CN111263226B
CN111263226B (granted patent; application CN202010057682.5A)
Authority
CN
China
Prior art keywords
replaced
training
encoder
video
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010057682.5A
Other languages
Chinese (zh)
Other versions
CN111263226A (en)
Inventor
张勇东
胡梓珩
谢洪涛
邓旭冉
李岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Original Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Research Institute and University of Science and Technology of China (USTC)
Priority to CN202010057682.5A
Publication of application CN111263226A
Application granted
Publication of granted patent CN111263226B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/60: Editing figures and text; combining figures or text
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00: Image coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/472: End-user interface for requesting content, additional data or services; end-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Abstract

Video processing method, apparatus, electronic device, and medium. A video processing method comprising: decoding a video to be replaced and a target video into a first frame sequence and a second frame sequence, respectively, and acquiring the corresponding object image to be replaced and target object image; encoding the object image to be replaced, with preselected noise added during encoding; performing style migration on the encoding result; decoding and reconstructing the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image; and fusing the reconstructed image into the first frame sequence to obtain a replaced first frame sequence, which is restored to video. The method saves time and material costs, reduces replacement artifacts, ensures a clear and realistic face-swapping result with good viewing quality, and is simple to operate.

Description

Video processing method, video processing device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a video processing method, apparatus, electronic device, and medium.
Background
In today's film and entertainment industry it is sometimes desirable, for various reasons, to replace certain actors after shooting has been completed. Engaging other actors to re-shoot the affected scenes consumes significant time and material costs. A video is a sequence of frames, usually measured in frames per second (FPS). At the conventional 24 FPS, every 10-second clip contains 240 frames, so manual processing is extremely labor-intensive and must be performed by professionals with strong skills and proficiency; otherwise the quality of the result cannot be guaranteed. With the continuous development of deep learning, applying artificial intelligence to automatic face replacement in video has become feasible.
Disclosure of Invention
Technical problem to be solved
In view of the above technical problems, the present disclosure provides a video processing method, apparatus, electronic device, and medium, intended to at least solve the technical problems described above.
(II) technical scheme
According to a first aspect of the embodiments of the present disclosure, there is provided a video processing method, including: decoding a video to be replaced and a target video into a first frame sequence and a second frame sequence, respectively, and acquiring the corresponding object image to be replaced and target object image; encoding the object image to be replaced, with preselected noise added during encoding; performing style migration on the encoding result; decoding and reconstructing the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image; and fusing the reconstructed image into the first frame sequence to obtain a replaced first frame sequence, which is restored to video.
Optionally, encoding the object image to be replaced includes: acquiring a first denoising autoencoder; and inputting the object image to be replaced into the first denoising autoencoder for encoding.
Optionally, decoding and reconstructing the style-migrated encoding result includes: acquiring a second denoising autoencoder; and inputting the style-migrated encoding result into the second denoising autoencoder for decoding and reconstruction.
Optionally, the method further comprises training the first and second denoising autoencoders, including: acquiring a first training data set and a second training data set, wherein the first training data set comprises a video to be replaced for training and the second training data set comprises a target video for training; extracting first image data from the video to be replaced for training and second image data from the target video for training; and, using a hierarchical training method, training the first denoising autoencoder with the first image data and the second denoising autoencoder with the second image data.
Optionally, training the first denoising autoencoder with the first image data using the hierarchical training method includes: first training on the first image data with a two-layer convolution to obtain first parameters; training on the first image data and the first parameters with a four-layer convolution to obtain second parameters; training on the first image data and the second parameters with a six-layer convolution to obtain third parameters; and so on, adding two convolution layers each time, one belonging to the encoder and one to the decoder of the first denoising autoencoder.
Optionally, the method further comprises acquiring position information of the object image to be replaced within the video to be replaced.
Optionally, fusing the reconstructed image into the frame sequence comprises fusing the reconstructed image at the position in the frame sequence indicated by the position information.
Optionally, the preselected noise is Gaussian noise.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including: a decomposition module for decoding the video to be replaced and the target video into a first frame sequence and a second frame sequence, respectively, and acquiring the corresponding object image to be replaced and target object image; a first self-encoder for encoding the object image to be replaced, with preselected noise added during encoding; a migration module for performing style migration on the encoding result; a second self-encoder for decoding and reconstructing the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image; and a replacement module for fusing the reconstructed image into the first frame sequence to obtain a replaced first frame sequence and restoring the replaced first frame sequence to video.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program comprising computer executable instructions for implementing the method as described above when executed.
(III) advantageous effects
The video processing method, apparatus, electronic device and medium provided by the present disclosure have the following beneficial effects:
1. When replacing an object in a video work, training one model for the object to be replaced suffices to process every shot in the video, saving time and material costs.
2. A denoising mechanism is added while encoding the object to be replaced, style migration is applied to the encoding result, and a hierarchical training method is used during model training. Together these reduce replacement artifacts, ensuring a clear and realistic face-swapping result and better viewing quality.
3. The operations of the method are packaged in a program stored on a device or electronic apparatus, so a user needs only basic computer skills to follow the training steps and usage flow, without expert knowledge of computer science or image processing; the method is therefore simple to operate.
Drawings
For a more complete understanding of the present disclosure and its advantages, reference is made to the following description taken in conjunction with the accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure, and together with the description serve to explain its principles. In the drawings:
fig. 1 schematically shows a flow chart of a video processing method according to an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates the principle of a denoising autoencoder according to an exemplary embodiment of the present disclosure;
fig. 3 schematically illustrates the network structure of an autoencoder according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a training method of an autoencoder according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a network architecture diagram of a VGG19 according to an exemplary embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of a video processing apparatus according to an exemplary embodiment of the present disclosure; and
fig. 7 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The embodiments of the present disclosure provide a video processing method including: decoding a video to be replaced and a target video into a first frame sequence and a second frame sequence, respectively, to obtain the corresponding object image to be replaced and target object image; encoding the object image to be replaced, with preselected noise added during encoding; performing style migration on the encoding result; decoding and reconstructing the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image; and fusing the reconstructed image into the first frame sequence to obtain a replaced first frame sequence, which is restored to video.
Fig. 1 schematically shows a flow diagram of a video processing method according to an exemplary embodiment of the present disclosure; the method may include, for example, operations S101 to S105.
S101, decode the video to be replaced and the target video into a first frame sequence and a second frame sequence, respectively, and obtain the corresponding object image to be replaced and target object image.
In one feasible implementation of this embodiment, video decoding software may be used to decode the video to be replaced into frames, yielding the first frame sequence, and the target video into frames, yielding the second frame sequence; the two sequences may be stored in separate folders. From each frame, the object image to be replaced and the target object image are extracted. The object image to be replaced may, for example, be a human face, although the disclosure is not limited thereto. For faces, a face detector such as DLIB or a Multi-task Cascaded Convolutional Neural Network (MTCNN) may be used to extract and align the face image in each frame; the specific extraction method is not limited by the disclosure. To allow the replaced object (face) to be restored to its original position in the video later, the position information of the object to be replaced is acquired and stored so that the replacement can be fused back into the original frame sequence.
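The crop-and-record bookkeeping of S101 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function name, the box format and the dictionary layout are assumptions, and a real pipeline would obtain the box from a DLIB or MTCNN detection rather than pass it in by hand.

```python
import numpy as np

def extract_face(frame: np.ndarray, box: tuple):
    """Crop a face region from a decoded frame and record its position.

    `frame` is an H x W x 3 array from the decoded frame sequence; `box`
    is (top, left, height, width), a stand-in for a DLIB/MTCNN detection.
    The returned position record is what lets S105 paste the replacement
    back at the original location.
    """
    top, left, h, w = box
    crop = frame[top:top + h, left:left + w].copy()
    position = {"top": top, "left": left, "height": h, "width": w}
    return crop, position

# usage on a dummy 720p frame
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
crop, pos = extract_face(frame, (100, 200, 256, 256))
```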
S102, encode the object image to be replaced, adding preselected noise during encoding.
In one feasible implementation of this embodiment, an autoencoder may be used to encode the object image to be replaced. An autoencoder is an unsupervised learning model composed of an encoder and a decoder. The encoder can be represented by a function h = f(x) and the decoder by a function r = g(h). The autoencoder imposes constraints on the output so that the encoder learns discriminative features of the image and the image data x' reconstructed by the decoder reproduces the input image x as closely as possible.
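The h = f(x), r = g(h) interface can be made concrete with a deliberately tiny linear autoencoder. This toy is our own construction for illustration only; the patent's encoder and decoder are 5-layer convolutional networks, and the tied-weight linear maps below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear autoencoder: the encoder h = f(x) compresses a 32-dim input
# to an 8-dim code, and the decoder r = g(h) maps it back. Tied weights
# (W and W.T) are an illustrative choice, not taken from the patent.
W = 0.1 * rng.standard_normal((8, 32))

def f(x):
    """Encoder: h = f(x)."""
    return W @ x

def g(h):
    """Decoder: r = g(h)."""
    return W.T @ h

x = rng.standard_normal(32)   # stand-in for a flattened input image
h = f(x)                      # latent code, trained to be discriminative
r = g(h)                      # reconstruction x', trained to reproduce x
```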
In practice, scenes in video are complex: the angle and expression of a face vary greatly, and local regions of the face are easily occluded by hair or other objects; that is, face images carry considerable irrelevant noise. Face replacement may therefore produce local blurring or even erroneous replacement, leaving large noise in the output image. To overcome this, the present embodiment retrains an ordinary autoencoder into a first denoising autoencoder, whose principle is shown in fig. 2. The training process is described in detail below using face replacement in video as an example, though it is not limited to faces.
First, a first training data set is acquired; it contains the video to be replaced for training, i.e. a to-be-replaced video used for training purposes.
Then the video to be replaced for training is decoded into a frame sequence, and the face image x in each frame is extracted as the first image data; DLIB, for example, may be used for extraction. Noise, e.g. Gaussian noise, is added to each face image to obtain a noisy picture. The noisy picture is input to the autoencoder to be trained. The network structure of its encoder and decoder is shown in fig. 3, with the convolution kernel sizes and output channel counts of each layer annotated; the encoder and the decoder each comprise 5 convolutional layers.
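The noisy-picture construction (corrupt the face image with Gaussian noise, then train the network to output the clean image) can be sketched as below; sigma = 0.1 and the [0, 1] pixel range are assumed values, since the patent does not fix them.

```python
import numpy as np

rng = np.random.default_rng(42)

def corrupt(x: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Add zero-mean Gaussian noise to a face image in [0, 1] to build
    the denoising-autoencoder training input; sigma is an assumed value,
    the patent does not specify one."""
    noisy = x + rng.normal(0.0, sigma, size=x.shape)
    return np.clip(noisy, 0.0, 1.0)

x = np.full((64, 64, 3), 0.5)   # stand-in clean face image
x_noisy = corrupt(x)            # network input; the training target stays x
```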
Training the network with plain stochastic gradient descent is prone to vanishing gradients. Therefore, in one feasible implementation of this embodiment, the first image data is trained with a hierarchical training method, shown in fig. 4: first train with a two-layer convolution on the first image data to obtain first parameters, i.e. only the first encoder/decoder layer pair is trained; then train with a four-layer convolution on the first image data and the first parameters to obtain second parameters, i.e. the first-layer parameters are carried into the second stage; then train with a six-layer convolution on the first image data and the second parameters to obtain third parameters; and so on, adding two convolutional layers each time, one for the encoder and one for the decoder of the first denoising autoencoder. The network is deepened by one layer pair every 100000 training rounds. Example training parameters: batch_size 16, weight decay 0.0001, initial learning rate 0.0001. For the network output image y, the loss function is the mean square error (MSE):

L(x, y) = ||x - y||^2

The encoder obtained at this point is denoted E_A and the decoder D_A.
After training, the first denoising autoencoder is obtained, and the object image to be replaced is input into it for encoding.
Because the video must later be restored with the object image to be replaced swapped for the target object image, the encoding result must also be decoded. A second denoising autoencoder is therefore trained with the same method; its encoder is denoted E_B and its decoder D_B. This training uses the second training data set, which contains the target video for training.
S103, perform style migration on the encoding result.
The object to be replaced and the target object differ in video scene and shooting environment, so their lighting differs considerably. Natural light passes through the lens and CCD before reaching the camera's electronics, which inevitably add dark-current, thermal and shot noise. The camera also performs mathematical quantization operations, such as interpolation, white balance and gamma correction, on the sensed signal to produce the final digital image, introducing quantization noise. The object to be replaced therefore has a different light-and-shadow appearance and noise pattern from the target object. If the encoder E_A and decoder D_B were used directly, the replaced faces would meet the basic requirements of face swapping but would not look real and natural enough, with noticeable artificial traces and poor viewing quality.
Therefore, in one feasible implementation of this embodiment, a migration network T_θ is cascaded between the encoder E_A and the decoder D_B, and the replacement process is further trained with a VGG loss; the original style characteristics are preserved and the overall effect is more realistic and harmonious.
The migration network T_θ may, for example, be trained as follows.
The first training data set, i.e. the input face images x to be replaced, is used as training data. The migration network may, for example, consist of two fully connected layers followed by one convolutional layer; the fully connected layers output 1024 and 4 × 1024 neurons respectively, and the convolutional layer has a 3 × 3 kernel and 1024 output channels.
The training loss function is:

Loss = L_vgg(y, x) = L_vgg(D_B(T_θ(E_A(x))), x)

where L_vgg(y, x) = ||φ(y) - φ(x)||^2, φ(x) is the feature map of the image x output by the VGG19 network, y is the image reconstructed from x by encoding, migration and decoding, and φ(y) is the feature map of the reconstructed image y output by the VGG19 network. The structure of the VGG19 network is shown in fig. 5.
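A runnable sketch of this perceptual loss follows. The 2x2 average-pool stands in for the VGG19 feature map φ so the computation runs without pretrained weights, and the 1/(C·H·W) normalization is the common perceptual-loss convention, added here as an assumption.

```python
import numpy as np

def phi(img):
    """Stand-in for the VGG19 feature map: a 2x2 average-pool over an
    H x W x C image. A real implementation would use VGG19 activations."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def vgg_loss(y, x):
    """L_vgg(y, x) = ||phi(y) - phi(x)||^2 over the feature map,
    normalized by C*H*W (assumed standard perceptual-loss form)."""
    fy, fx = phi(y), phi(x)
    h, w, c = fy.shape
    return float(np.sum((fy - fx) ** 2) / (c * h * w))

x = np.ones((8, 8, 3))    # stand-in source face image
y = np.zeros((8, 8, 3))   # stand-in reconstruction D_B(T_theta(E_A(x)))
loss = vgg_loss(y, x)
```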
S104, decode and reconstruct the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image.
The trained second denoising autoencoder is acquired, the style-migrated encoding result is input into it for decoding, and an image is reconstructed from the decoding result; the reconstructed image carries the identity of the target object. Thus the input image x passes through the encoder E_A of the first denoising autoencoder, then through the migration network, and finally through the decoder D_B of the second denoising autoencoder, giving the reconstructed image y = D_B(T_θ(E_A(x))).
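The composition y = D_B(T_θ(E_A(x))) can be sketched with stand-in maps; the linear encoder/decoder and the tanh "migration" below are placeholders of ours with the right interfaces, not the trained networks.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins with the right interfaces: E_A encodes, T_theta transforms
# the latent code, D_B decodes. The real E_A/D_B are trained CNNs and
# T_theta is two fully connected layers plus a convolution.
W_e = 0.1 * rng.standard_normal((8, 32))
W_d = 0.1 * rng.standard_normal((32, 8))

def E_A(x):
    return W_e @ x            # encoder of the first denoising autoencoder

def T_theta(h):
    return np.tanh(h)         # placeholder style-migration map

def D_B(h):
    return W_d @ h            # decoder of the second denoising autoencoder

x = rng.standard_normal(32)   # object image to be replaced (flattened)
y = D_B(T_theta(E_A(x)))      # reconstruction carrying the target identity
```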
S105, fuse the reconstructed image into the first frame sequence to obtain a replaced first frame sequence, and restore the replaced first frame sequence to video.
The replaced reconstructed image is seamlessly fused into the first frame sequence according to the stored position information, and the frame sequence is restored to video, completing the replacement of the object in the video to be replaced.
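The paste-back of S105 can be sketched as follows. The feathered alpha mask is a simplified stand-in of ours for the seamless (e.g. Poisson) blending the text implies, and the position-record format is an assumption.

```python
import numpy as np

def fuse(frame: np.ndarray, patch: np.ndarray, pos: dict) -> np.ndarray:
    """Paste the reconstructed face back at the stored position using a
    feathered alpha mask: full patch weight at the centre, ramping to
    zero at the border so the seam blends into the original frame."""
    out = frame.astype(np.float64).copy()
    t, l = pos["top"], pos["left"]
    h, w = patch.shape[:2]
    # distance-to-border ramps along each axis, combined by minimum
    ry = np.minimum(np.arange(h), np.arange(h)[::-1])
    rx = np.minimum(np.arange(w), np.arange(w)[::-1])
    alpha = np.minimum.outer(ry, rx).astype(np.float64)
    alpha = np.clip(alpha / max(alpha.max(), 1), 0.0, 1.0)[..., None]
    region = out[t:t + h, l:l + w]
    out[t:t + h, l:l + w] = alpha * patch + (1.0 - alpha) * region
    return out.astype(frame.dtype)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
patch = np.full((64, 64, 3), 255, dtype=np.uint8)
fused = fuse(frame, patch, {"top": 100, "left": 200})
```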
When replacing an object in a video work with this embodiment, training one model for the object to be replaced suffices to process every shot in the video, saving time and material costs. A denoising mechanism is added while encoding the object to be replaced, style migration is applied to the encoding result, and a hierarchical training method is used during model training; together these reduce replacement artifacts, ensuring a clear and realistic face-swapping result and better viewing quality. The operations of the method are packaged in a program stored on a device or electronic apparatus, so a user needs only basic computer skills to follow the training steps and usage flow, without expert knowledge of computer science or image processing; the method is simple to operate.
Fig. 6 schematically shows a block diagram of a video processing apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the apparatus 600 may include, for example, a decomposition module 610, a first self-encoder 620, a migration module 630, a second self-encoder 640, and a replacement module 650.
The decomposition module 610 decodes the video to be replaced and the target video into a first frame sequence and a second frame sequence, respectively, and obtains the corresponding object image to be replaced and target object image.
The first self-encoder 620 encodes the object image to be replaced, with preselected noise added during encoding.
The migration module 630 performs style migration on the encoding result.
The second self-encoder 640 decodes and reconstructs the style-migrated encoding result so that the target object image replaces the object image to be replaced, yielding a reconstructed image.
The replacement module 650 fuses the reconstructed image into the first frame sequence to obtain a replaced first frame sequence and restores the replaced first frame sequence to video.
It should be noted that the embodiment of the apparatus portion is similar to the embodiment of the method portion, and please refer to the method embodiment portion for details, which are not described herein again.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable means of hardware or firmware by integrating or packaging the circuits, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any plurality of the decomposition module 610, the first self-encoder 620, the migration module 630, the second self-encoder 640, and the replacement module 650 may be combined in one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the decomposition module 610, the first self-encoder 620, the migration module 630, the second self-encoder 640, and the replacement module 650 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, at least one of the decomposition module 610, the first self-encoder 620, the migration module 630, the second self-encoder 640 and the replacement module 650 may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 700 includes a processor 710, a computer-readable storage medium 720. The electronic device 700 may perform a method according to an embodiment of the present disclosure.
In particular, processor 710 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 710 may also include on-board memory for caching purposes. Processor 710 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 720, for example, may be a non-volatile computer-readable storage medium, specific examples including, but not limited to: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); memory such as Random Access Memory (RAM) or flash memory, etc.
The computer-readable storage medium 720 may include a computer program 721, which computer program 721 may include code/computer-executable instructions that, when executed by the processor 710, cause the processor 710 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 721 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in the computer program 721 may include one or more program modules, for example modules 721A, 721B, etc. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, so that when these program modules are executed by the processor 710, the processor 710 may perform the method according to an embodiment of the present disclosure, or any variation thereof.
At least one of the decomposition module 610, the first self-encoder 620, the migration module 630, the second self-encoder 640, and the replacement module 650 may be implemented as a computer program module described with reference to fig. 7, which when executed by the processor 710 may implement the respective operations described above, according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be included in the apparatus/device/system described in the above embodiments, or may exist separately without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Accordingly, the scope of the present disclosure should not be limited to the above-described embodiments, but should be defined by the appended claims and their equivalents.

Claims (8)

1. A method of video processing, the method comprising:
decoding a video to be replaced and a target video into a first frame sequence and a second frame sequence, respectively, and acquiring an object image to be replaced and a corresponding target object image from the respective frame sequences;
acquiring a first noise reduction self-encoder, inputting the object image to be replaced into the first noise reduction self-encoder for encoding, and adding preselected noise in the encoding process;
carrying out style migration on the coding result;
acquiring a second noise reduction self-encoder, inputting the encoding result of the style migration into the second noise reduction self-encoder for decoding and reconstruction, and enabling the target object image to replace the object image to be replaced to obtain a reconstructed image;
and fusing the reconstructed image to the first frame sequence to obtain a replaced first frame sequence, and restoring the replaced first frame sequence to a video.
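For illustration only, the pipeline of claim 1 can be sketched with toy stand-ins: a single linear map plays the role of each noise reduction self-encoder, small patches stand in for frames, and style migration is approximated by matching the latent code's mean and standard deviation to the target's. All names and sizes here (`D`, `H`, `W_enc`, `W_dec`, `encode_with_noise`, `style_migrate`) are invented for this sketch and are not part of the claimed method, which uses convolutional self-encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "frame" is an 8x8 grayscale patch, and each self-encoder
# is a single linear map. A real embodiment would use convolutional networks.
D, H = 64, 16                                 # flattened patch size, latent size
W_enc = rng.normal(scale=0.1, size=(H, D))    # first (noise reduction) encoder
W_dec = rng.normal(scale=0.1, size=(D, H))    # second self-encoder's decoder

def encode_with_noise(patch, sigma=0.1):
    """Encode the object-to-replace patch, adding Gaussian noise during encoding."""
    x = patch.reshape(-1)
    x_noisy = x + rng.normal(scale=sigma, size=x.shape)  # preselected noise
    return W_enc @ x_noisy

def style_migrate(code, target_code):
    """Match the code's mean/std to the target code's (a crude stand-in
    for the style migration step)."""
    mu_s, std_s = code.mean(), code.std() + 1e-8
    mu_t, std_t = target_code.mean(), target_code.std() + 1e-8
    return (code - mu_s) / std_s * std_t + mu_t

def decode(code):
    """Decode and reconstruct via the second self-encoder's decoder."""
    return (W_dec @ code).reshape(8, 8)

frame_patch = rng.random((8, 8))        # object image to be replaced
target_patch = rng.random((8, 8))       # target object image

code = encode_with_noise(frame_patch)
code = style_migrate(code, W_enc @ target_patch.reshape(-1))
reconstructed = decode(code)            # reconstructed (replaced) image
print(reconstructed.shape)
```

The reconstructed patch would then be fused back into the first frame sequence as in the final step of the claim.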
2. The method of claim 1, further comprising training the first noise reduction self-encoder or the second noise reduction self-encoder, the training comprising:
acquiring a first training data set and a second training data set, wherein the first training data set comprises videos to be replaced for training, and the second training data set comprises target videos for training;
extracting first image data in the video to be replaced for training, and extracting second image data of the target video for training;
and training, using a hierarchical training method, the first noise reduction self-encoder with the first image data and the second noise reduction self-encoder with the second image data.
3. The method of claim 2, wherein training the first noise-reducing self-encoder using the hierarchical training method using the first image data comprises:
performing first training on the first image data by adopting double-layer convolution to obtain a first parameter;
training the first image data and the first parameters by adopting four layers of convolution to obtain second parameters;
training the first image data and the second parameter by adopting six layers of convolution to obtain a third parameter;
and so on, adding two layers of convolution each time, wherein one layer corresponds to the encoder of the first noise reduction self-encoder and the other layer corresponds to the decoder of the first noise reduction self-encoder.
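A minimal sketch of the hierarchical training of claim 3, with plain linear layers standing in for the claimed convolution layers: each round adds one encoder layer and one matching decoder layer and trains the new pair by a few gradient steps. For brevity, earlier layers are kept frozen here, whereas the claim retrains the network initialized from the previous round's parameters; the widths, learning rate, and step count are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((64, 256))          # "first image data": 256 flattened 8x8 patches
sizes = [64, 32, 16, 8]            # encoder layer widths per round (assumed)

enc, dec = [], []
H = X
for k in range(1, len(sizes)):
    n_in, n_out = sizes[k - 1], sizes[k]
    We = rng.normal(scale=0.1, size=(n_out, n_in))   # new encoder layer
    Wd = rng.normal(scale=0.1, size=(n_in, n_out))   # new matching decoder layer
    # Train only the newly added linear pair to reconstruct the current
    # representation H (earlier layers stay frozen in this sketch).
    lr, n = 1e-3, H.shape[1]
    for _ in range(200):
        Z = We @ H                 # encode
        G = Wd @ Z - H             # reconstruction error (gradient of MSE w.r.t. output)
        Wd -= lr * (G @ Z.T) / n
        We -= lr * (Wd.T @ G @ H.T) / n
    err = float(np.mean((Wd @ (We @ H) - H) ** 2))
    print(f"round {k}: {2 * k} total layers, reconstruction MSE = {err:.4f}")
    enc.append(We)
    dec.insert(0, Wd)              # decoder mirrors the encoder stack
    H = We @ H                     # input representation for the next round
```

After three rounds this yields a six-layer stack, matching the two-then-four-then-six progression of the claim.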
4. The method of claim 1, further comprising:
and acquiring the position information of the object image to be replaced in the video to be replaced.
5. The method of claim 4, wherein the fusing the reconstructed image to the first frame sequence comprises:
fusing the reconstructed image to the position in the first frame sequence pointed to by the position information.
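Claims 4 and 5 can be illustrated by a simple paste of the reconstructed patch at the recorded location. This is a hard paste; a real fusion step would typically blend the patch boundary (for example, with Poisson blending or feathering), and the frame size, patch size, and coordinates below are invented for the sketch.

```python
import numpy as np

frame = np.zeros((720, 1280, 3), dtype=np.uint8)       # one frame of the first frame sequence
reconstructed = np.full((96, 96, 3), 255, np.uint8)    # reconstructed image (toy: all white)

def fuse(frame, patch, top, left):
    """Paste the reconstructed patch at the position recorded for the
    object to be replaced (hard paste; no boundary blending)."""
    h, w = patch.shape[:2]
    out = frame.copy()
    out[top:top + h, left:left + w] = patch
    return out

# Position information for the object image to be replaced (assumed values).
fused = fuse(frame, reconstructed, top=100, left=200)
print(fused[100, 200], fused[0, 0])    # [255 255 255] [0 0 0]
```

Repeating this per frame, with per-frame position information, produces the replaced first frame sequence that is then re-encoded into a video.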
6. The method of claim 1, wherein the preselected noise is Gaussian noise.
7. A video processing apparatus, characterized in that the apparatus comprises:
the decomposition module is used for decoding the video to be replaced and the target video into a first frame sequence and a second frame sequence, respectively, and acquiring the object image to be replaced and the corresponding target object image;
the first self-encoder is used for encoding the object image to be replaced, and preselection noise is added in the encoding process;
the migration module is used for carrying out style migration on the coding result;
the second self-encoder is used for decoding and reconstructing the encoding result of the style migration, so that the target object image replaces the object image to be replaced, and a reconstructed image is obtained;
and the replacing module is used for fusing the reconstructed image to the first frame sequence to obtain a replaced first frame sequence and restoring the replaced first frame sequence into a video.
8. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
CN202010057682.5A 2020-01-17 2020-01-17 Video processing method, video processing device, electronic equipment and medium Active CN111263226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010057682.5A CN111263226B (en) 2020-01-17 2020-01-17 Video processing method, video processing device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111263226A CN111263226A (en) 2020-06-09
CN111263226B true CN111263226B (en) 2021-10-22

Family

ID=70948985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010057682.5A Active CN111263226B (en) 2020-01-17 2020-01-17 Video processing method, video processing device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111263226B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112752147A (en) * 2020-09-04 2021-05-04 腾讯科技(深圳)有限公司 Video processing method, device and storage medium
CN112651449B (en) * 2020-12-29 2023-08-01 北京百度网讯科技有限公司 Method, device, electronic equipment and storage medium for determining content characteristics of video

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108564166A (en) * 2018-03-22 2018-09-21 南京大学 Based on the semi-supervised feature learning method of the convolutional neural networks with symmetrical parallel link
CN110503703A (en) * 2019-08-27 2019-11-26 北京百度网讯科技有限公司 Method and apparatus for generating image
CN110533579A (en) * 2019-07-26 2019-12-03 西安电子科技大学 Based on the video style conversion method from coding structure and gradient order-preserving
CN110533585A (en) * 2019-09-04 2019-12-03 广州华多网络科技有限公司 A kind of method, apparatus that image is changed face, system, equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9639743B2 (en) * 2013-05-02 2017-05-02 Emotient, Inc. Anonymization of facial images


Non-Patent Citations (1)

Title
A face recognition method based on stacked denoising autoencoders; Ran Peng et al.; Industrial Control Computer; 2016-12-31; Vol. 29, No. 9; main text *


Similar Documents

Publication Publication Date Title
Lutz et al. Alphagan: Generative adversarial networks for natural image matting
Niu et al. HDR-GAN: HDR image reconstruction from multi-exposed LDR images with large motions
Lin et al. Real-time high-resolution background matting
Zhang et al. Semantic image inpainting with progressive generative networks
US11055828B2 (en) Video inpainting with deep internal learning
Wang et al. Deep learning for hdr imaging: State-of-the-art and future trends
Lu et al. Layered neural rendering for retiming people in video
CN113658051A (en) Image defogging method and system based on cyclic generation countermeasure network
CN111263226B (en) Video processing method, video processing device, electronic equipment and medium
Wang et al. Event-driven video frame synthesis
US11562597B1 (en) Visual dubbing using synthetic models
Messikommer et al. Multi-bracket high dynamic range imaging with event cameras
US20220156987A1 (en) Adaptive convolutions in neural networks
US20230274400A1 (en) Automatically removing moving objects from video streams
Wan et al. Purifying low-light images via near-infrared enlightened image
CN113542780B (en) Method and device for removing compression artifacts of live webcast video
CN115719399A (en) Object illumination editing method, system and medium based on single picture
Barua et al. Arthdr-net: Perceptually realistic and accurate hdr content creation
CN112669234A (en) High-resolution image restoration method and system based on neural network
Yang et al. Multi-scale extreme exposure images fusion based on deep learning
Flaxton HD Aesthetics and Digital Cinematography
Que et al. Residual dense U‐Net for abnormal exposure restoration from single images
CN114449280B (en) Video coding and decoding method, device and equipment
US20230044969A1 (en) Video matting
Suraj et al. A Technique for Video Inpainting using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant