CN109359687B - Video style conversion processing method and device


Info

Publication number: CN109359687B
Application number: CN201811220100.XA
Authority: CN (China)
Prior art keywords: target, video, model, output vector, style
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN109359687A
Inventors: 柏提, 孙昊, 刘霄, 李鑫, 赵翔, 杨凡, 李旭斌, 文石磊, 丁二锐
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN201811220100.XA
Publications: CN109359687A (application), CN109359687B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping

Abstract

The application provides a video style conversion processing method and a video style conversion processing device. The method includes: setting a first target output vector for the network layer reflecting style attribute features according to the style attribute information of a sample picture; setting a second target output vector for the network layer reflecting content features according to the content information of the currently input video frame; setting a third target output vector for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame; training the network parameters of each network layer in a target model according to the first target output vector, the second target output vector and the third target output vector; when preset training conditions are met, generating a video style conversion model corresponding to the sample picture from the resulting target network parameters and the target model; and converting a target video according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.

Description

Video style conversion processing method and device
Technical Field
The present application relates to the field of video processing technologies, and in particular to a video style conversion processing method and apparatus.
Background
With the continuous development of internet technology, users demand ever richer media resources: from classic text content, to picture content, to the videos, and in particular short videos, that are widely popular today. At the same time, users also want to apply new artistic processing to the content itself to obtain more novel and creative art forms, such as the artistic style conversion of videos.
In the related art, artistic style conversion methods can only perform style conversion on a single picture, so video style conversion is carried out in the classic picture-by-picture manner. Because video content often contains massive amounts of data, this inevitably takes a long time. Moreover, abrupt changes may appear between consecutive pictures: the picture content after style conversion no longer shares the optical flow field of the original picture content, which harms the fluency of the video.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a video style conversion processing method, which improves efficiency of video style conversion processing while ensuring video smoothness by training and generating a video style conversion model.
A second object of the present application is to propose another video style conversion processing method.
A third object of the present application is to provide a video style conversion processing apparatus.
A fourth object of the present application is to provide another video style conversion processing apparatus.
A fifth object of the present application is to propose a computer device.
A sixth object of the present application is to propose a computer program product.
A seventh object of the present application is to propose a non-transitory computer-readable storage medium.
To achieve the above objects, an embodiment of the first aspect of the present application provides a video style conversion processing method, including:
obtaining a sample picture for model training and a corresponding sample video set;
obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting a target video according to the video style conversion model to generate a video style matched with the sample picture.
To achieve the above objects, an embodiment of the second aspect of the present application provides another video style conversion processing method, including:
acquiring a video style conversion request containing a target video and a target picture;
acquiring a pre-trained target video style conversion model corresponding to the target picture;
and converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
To achieve the above objects, an embodiment of the third aspect of the present application provides a video style conversion processing apparatus, including:
the first acquisition module is used for acquiring a sample picture for model training and a corresponding sample video set;
the first setting module is used for obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
the second setting module is used for acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and the training generation module is used for training the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
To achieve the above objects, an embodiment of the fourth aspect of the present application provides another video style conversion processing apparatus, including:
the second acquisition module is used for acquiring a video style conversion request containing a target video and a target picture;
the third acquisition module is used for acquiring a pre-trained target video style conversion model corresponding to the target picture;
and the conversion module is used for converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
To achieve the above objects, an embodiment of the fifth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the video style conversion processing method described in the foregoing method embodiments is implemented.
To achieve the above objects, an embodiment of the sixth aspect of the present application provides a computer program product; when instructions in the computer program product are executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
To achieve the above objects, an embodiment of the seventh aspect of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
A sample picture for model training and a corresponding sample video set are obtained, and the style attribute information of the sample picture is obtained. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are trained according to the first, second and third target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. Therefore, the efficiency of the video style conversion processing is improved while video fluency is ensured.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a video style conversion processing method according to one embodiment of the present application;
FIG. 2 is a flow chart of a video style conversion processing method according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a video style conversion processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video style conversion processing apparatus according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of a video style conversion processing apparatus according to still another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The present application aims to solve the problems in the related art that style conversion performed in the classic frame-by-frame manner takes too long, and that video fluency suffers because the picture content after style conversion does not share the same optical flow field as the original picture content.
To this end, the present application generates a video style conversion model so that the video content is unchanged before and after the conversion processing, only the style of the video is converted, and the fluency of the video is ensured.
The following describes a video style conversion processing method and apparatus according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a video style conversion processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step 101, obtaining a sample picture for model training and a corresponding sample video set.
It can be understood that one sample picture represents one artistic style; that is, each sample picture corresponds to one video style conversion model.
It can also be understood that, in order to enable the video style conversion model generated by training in the example of the present application to be suitable for style conversion of different target videos, it is necessary to acquire as many videos of different scenes as possible as a sample video set for model training.
In order to improve the effectiveness of the video style conversion model, on the order of a thousand natural-scene videos are used as the sample video set for model training wherever possible.
It should be noted that, in order to further improve the richness and accuracy of model training, as a possible implementation the sample picture and/or each video frame in the sample video set is resized according to the size of the input picture of the target model, so that the adjusted sizes match the size of the input picture.
The input picture size may be set according to actual application requirements, and the sample picture and/or the video frames in the sample video set can be resized in various ways, for example:
In a first example, the sample picture and/or each video frame in the sample video set is cropped to the target size.
The cropping may be performed at any position of the sample picture and/or video frame, which further improves the flexibility of the processing.
In a second example, the sample picture and/or each video frame in the sample video set is scaled to the target size by interpolation.
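The patent leaves the concrete resizing operations open. Purely as an illustration, a minimal sketch of the two options named above, cropping and bilinear interpolation (cf. the G06T3/4007 classification), might look as follows; the function name and the center-crop choice are assumptions, not part of the patent:

```python
import cv2
import numpy as np

def resize_to_model_input(frame: np.ndarray, size: tuple[int, int],
                          mode: str = "interpolate") -> np.ndarray:
    """Match a frame (or the sample picture) to the model's input size.

    mode="crop":        cut a window of the target size; the crop could be
                        taken at any position, here simply the center.
    mode="interpolate": rescale the whole frame with bilinear interpolation.
    """
    th, tw = size
    if mode == "crop":
        h, w = frame.shape[:2]
        top, left = max((h - th) // 2, 0), max((w - tw) // 2, 0)
        return frame[top:top + th, left:left + tw]
    return cv2.resize(frame, (tw, th), interpolation=cv2.INTER_LINEAR)
```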
Step 102, obtaining style attribute information of the sample picture, and further setting, in the process of training the target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information.
Specifically, different sample pictures have different style attribute information, which can be obtained through picture processing algorithms and other means in the related art. For example, with Van Gogh's The Starry Night as the sample picture, the style attribute information obtained may include its sea-like blue and its soft, quiet color tone.
It can be understood that the target model has a plurality of network layers, one or more of which can be designated as needed as the network layer reflecting style attribute features, and the first target output vector reflecting style attribute features is set according to the style attribute information. The target model may be, for example, a VGG19 model trained on ImageNet.
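The patent does not specify how the style attribute information is encoded into the first target output vector. As a hedged sketch only: a common choice in the style-transfer literature is the Gram matrix of feature maps from selected VGG19 layers, which fits the VGG19-on-ImageNet target model mentioned above; the layer indices below are illustrative assumptions.

```python
import torch
from torchvision.models import vgg19, VGG19_Weights

# Layers assumed to reflect style attribute features; the patent only says
# "one or more network layers, as needed", so these indices are examples.
STYLE_LAYERS = [1, 6, 11, 20]

vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()

def gram(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (batch, channels, h, w) feature map."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

@torch.no_grad()
def style_targets(sample_picture: torch.Tensor) -> list[torch.Tensor]:
    """First target output vectors: one per designated style layer."""
    targets, x = [], sample_picture
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            targets.append(gram(x))
    return targets
```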
Step 103, obtaining content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame.
It is to be understood that the sample video is composed of a plurality of video frames, each having corresponding content information and optical flow field information: content information such as characters and images in the frame, and optical flow field information such as the motion information of objects and rich information about the three-dimensional structure of the scene.
Similarly, one or more of the multiple network layers of the target model can be used as a network layer reflecting the content characteristics as required, and a second target output vector reflecting the content characteristics can be set according to the content information of the current input video frame; one or more of the multiple network layers of the target model can be used as a network layer reflecting the optical flow field characteristics as required, and a third target output vector reflecting the optical flow field characteristics can be set according to the optical flow field information of the current input video frame.
For example, if the target model has 90 network layers, the bottom 30 layers may serve as the network layers reflecting style attribute features, the middle 30 layers as the network layers reflecting content features, and the top 30 layers as the network layers reflecting optical flow field features.
It is emphasized that, in this example, a network layer in the target model cannot be assigned repeatedly: a network layer that has been set to reflect style attribute features cannot be assigned again, and likewise a network layer that has been set to reflect optical flow field features cannot be assigned again.
That is, different network layers in the target model may be used to reflect different features, such as style attribute features, content features, optical flow field features, and the like, so as to improve the effectiveness of model training.
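As a sketch of how the designated layers' outputs might be read out during training, forward hooks can record each group's activations while keeping the three groups disjoint, so no layer is assigned twice; the 30/30/30 split mirrors the 90-layer example above, and all names are illustrative rather than taken from the patent:

```python
import torch.nn as nn

def register_group_hooks(model: nn.Sequential) -> dict:
    """Record the outputs of the layers designated to reflect style,
    content and optical-flow features: bottom 30 layers style, middle
    30 content, top 30 optical flow, no layer in more than one group."""
    outputs = {"style": [], "content": [], "flow": []}
    groups = {i: "style" for i in range(30)}
    groups.update({i: "content" for i in range(30, 60)})
    groups.update({i: "flow" for i in range(60, 90)})

    def make_hook(name):
        def hook(module, inputs, output):
            outputs[name].append(output)
        return hook

    for idx, layer in enumerate(model):
        if idx in groups:
            layer.register_forward_hook(make_hook(groups[idx]))
    # Caller should clear the three lists before every forward pass.
    return outputs
```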
Step 104, training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector; when preset training conditions are met, generating a video style conversion model corresponding to the sample picture from the resulting target network parameters and the target model; and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
Specifically, the network parameters of each network layer in the target model are trained with the first target output vector, the second target output vector and the third target output vector, so that the outputs of the corresponding layers approach, respectively, the targets of the network layers reflecting style attribute features, content features and optical flow field features. When the preset training conditions are met, the video style conversion model corresponding to the sample picture can be generated from the resulting network parameters and the target model.
Furthermore, the target video can be converted according to the video style conversion model to generate a video style matched with the sample picture.
It can be understood that the output picture of the video style conversion model and the sample picture are trained against the network layers of the target model reflecting style attribute features, which ensures that the two are similar in style; the output picture and the input video frame are trained against the network layers reflecting content features, which ensures that the two are similar in content; and the optical flow fields of the video before and after conversion are trained to be similar, so that the style-converted video remains smooth between frames.
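A minimal sketch of the combined objective, under the assumption (not stated in the patent) that each term is a mean-squared distance between a layer's output and its target output vector; the weights are illustrative:

```python
import torch.nn.functional as F

def total_loss(style_out, style_tgt, content_out, content_tgt,
               flow_out, flow_tgt, w_style=1e5, w_content=1.0, w_flow=1e2):
    """Sum of the three training terms: style similarity to the sample
    picture, content similarity to the input frame, and optical-flow
    similarity between the video before and after conversion."""
    l_style = sum(F.mse_loss(o, t) for o, t in zip(style_out, style_tgt))
    l_content = sum(F.mse_loss(o, t) for o, t in zip(content_out, content_tgt))
    l_flow = sum(F.mse_loss(o, t) for o, t in zip(flow_out, flow_tgt))
    return w_style * l_style + w_content * l_content + w_flow * l_flow
```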
In practical applications, different users have different preferences for artistic styles, and in order to enable the video style conversion model generated by training in the example of the present application to meet the video style conversion requirements of different users, it is necessary to obtain as many sample pictures with different artistic styles as possible for generating a plurality of different video style conversion models.
Specifically, sample pictures with different artistic styles can be randomly or purposefully selected, and as an example, a plurality of artistic pictures with western artistic painting styles are obtained to be used as sample pictures for generating a plurality of different video style conversion models; as another example, a plurality of western art painting-style art pictures, a plurality of chinese traditional painting-style art pictures, and a plurality of japanese cartoon-style art pictures are acquired as sample pictures for generating a plurality of different video-style conversion models.
When the method is applied, a target video style conversion model is selected according to user requirements to convert a target video to generate a video style matched with a sample picture. Wherein the artistic style of the sample picture is the target artistic style that the user needs to convert into.
In this example, in order to ensure that the video style conversion model can run in real time on terminal devices, after the video style conversion model corresponding to the sample picture is generated, the target network parameters are evaluated with a preset algorithm, and the network layers corresponding to the candidate network parameters whose results meet the preset filtering condition are deleted.
As a possible implementation, a filtering algorithm based on the L1 norm is applied to the target network parameters, the network layers whose candidate network parameters have an absolute-value sum below a preset threshold are identified, and these layers are deleted, thereby compressing and accelerating the video style conversion model.
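An L1-norm filtering pass of the kind described might be sketched as follows; the threshold value and the per-filter granularity are assumptions, since the patent only says that parameters whose L1 statistic meets a preset filtering condition are deleted:

```python
import torch
import torch.nn as nn

def l1_prune_mask(conv: nn.Conv2d, threshold: float) -> torch.Tensor:
    """Mark output filters whose L1 norm falls below the threshold.
    In a full implementation, the marked filters (and the matching input
    channels of the following layer) would be physically removed to
    compress and accelerate the video style conversion model."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per filter
    return norms < threshold

def report_prunable(model: nn.Module, threshold: float = 1e-2) -> None:
    """Example usage: count prunable filters in every conv layer."""
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            mask = l1_prune_mask(m, threshold)
            print(f"{name}: {int(mask.sum())}/{m.out_channels} below threshold")
```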
That is to say, once training of the video style conversion model is complete, a smoother stylized video can be generated without explicitly computing the optical flow field, which greatly increases the processing speed of the video style conversion model and further improves its practicability.
In the present example, when the video style conversion model is deployed, a memory multiplexing technique may also be employed to keep memory usage efficient.
As a possible implementation manner, a memory multiplexing setting is performed on the network layer in the video style conversion model, so that in the process of performing conversion processing on the target video according to the video style conversion model, processing data of the network layer stored in the memory is deleted.
That is, after the style conversion processing of a target video A is finished, the processing data of the network layers configured for memory multiplexing in the video style conversion model can be deleted, so that the next target video B can be style-converted, which improves the efficiency of the video style conversion processing.
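The memory multiplexing described here amounts to discarding a layer's stored processing data as soon as it has been consumed, so the same memory can serve the next layer and the next video. A toy sketch of the idea, not the patent's implementation:

```python
import torch

@torch.no_grad()
def stylize_frames(model_layers, frames):
    """Run frames through the layers while keeping only the live
    activation; each layer's stored data is released (del) once the
    next layer has consumed it, so memory is reused across layers
    and across consecutive videos."""
    results = []
    for frame in frames:
        x = frame
        for layer in model_layers:
            y = layer(x)
            del x           # drop the previous layer's processing data
            x = y
        results.append(x)
    return results
```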
To sum up, the video style conversion processing method of the embodiment of the present application obtains a sample picture for model training and a corresponding sample video set, and obtains the style attribute information of the sample picture. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are then trained according to the three target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.
Fig. 2 is a flowchart of a video style conversion processing method according to another embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
Step 201, a video style conversion request containing a target video and a target picture is obtained.
Step 202, a pre-trained target video style conversion model corresponding to the target picture is obtained.
Step 203, the target video is converted according to the target video style conversion model to generate a video style matched with the target picture.
Specifically, different users have different preferences for artistic styles, and may have different scenes and different artistic-style requirements. It is therefore desirable to obtain as many sample pictures of different artistic styles as possible for generating a plurality of different video style conversion models.
Therefore, when the method is applied, a video style conversion request containing the target video and the target picture is obtained. When a user needs to perform style conversion on a target video, the target video and a target picture are determined: the target video is the video to be converted, and the target picture carries the target style to convert to. A pre-trained target video style conversion model corresponding to the target picture, that is, to the target style, is then obtained, and the target video is converted according to this model to generate a video style matched with the target picture. Video style conversion can thus be performed quickly, which improves the user experience.
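End to end, the serving path of steps 201 to 203 might look like the following sketch; the frame normalization, channel handling and file formats are assumptions for illustration, not specified by the patent:

```python
import cv2
import torch

def convert_video(model: torch.nn.Module, src: str, dst: str) -> None:
    """Convert every frame of the target video with the pre-trained
    target video style conversion model and write the stylized video."""
    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    model.eval()
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # OpenCV yields HxWx3 uint8 (BGR); convert to a 1x3xHxW float tensor.
            x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255
            y = model(x).squeeze(0).clamp(0, 1)
            out.write((y.permute(1, 2, 0).cpu().numpy() * 255).astype("uint8"))
    cap.release()
    out.release()
```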
In order to implement the foregoing embodiment, an embodiment of the present application further provides a video style conversion processing apparatus, and fig. 3 is a schematic structural diagram of the video style conversion processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the video style conversion processing apparatus includes: a first acquisition module 310, a first setup module 320, a second setup module 330, and a training generation module 340.
The first obtaining module 310 is configured to obtain a sample picture for model training and a corresponding sample video set.
The first setting module 320 is configured to obtain style attribute information of the sample picture, and further to set, in the process of training the target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information.
The second setting module 330 is configured to obtain content information and optical flow field information of each video frame in the sample video, and further to set, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame.
The training generation module 340 is configured to train the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and to generate a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when the preset training conditions are met, so that the target video can be converted according to the video style conversion model to generate a video style matched with the sample picture.
In an embodiment of the present application, as shown in fig. 4, on the basis of fig. 3, the method further includes: an adjusting module 350, a calculating and deleting module 360 and a multiplexing and deleting module 370.
The adjusting module 350 is configured to resize the sample picture and/or each video frame in the sample video set according to the size of the input picture of the target model, so that the adjusted sizes match the size of the input picture.
In one embodiment of the present application, the sample picture and/or each video frame in the sample video set is cropped or interpolated to the target size.
And the calculation deletion module 360 is configured to calculate the target network parameter according to a preset algorithm, and delete the network layer corresponding to the candidate network parameter whose calculation result meets the preset filtering condition.
And a multiplexing deletion module 370, configured to perform memory multiplexing on the network layer in the video style conversion model, so that in the process of performing conversion processing on the target video according to the video style conversion model, the processing data of the network layer stored in the memory is deleted.
That is to say, once training of the video style conversion model is complete, a smoother stylized video can be generated without explicitly computing the optical flow field, which greatly increases the processing speed of the video style conversion model and further improves its practicability.
That is, after the style conversion processing of a target video A is finished, the processing data of the network layers configured for memory multiplexing in the video style conversion model can be deleted, so that the next target video B can be style-converted, which improves the efficiency of the video style conversion processing.
It should be noted that the foregoing explanation on the embodiment of the video style conversion processing method is also applicable to the video style conversion processing apparatus of the embodiment, and the implementation principle thereof is similar, and is not repeated here.
To sum up, the video style conversion processing apparatus of the embodiment of the present application obtains a sample picture for model training and a corresponding sample video set, and obtains the style attribute information of the sample picture. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are then trained according to the three target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.
In order to implement the foregoing embodiment, an embodiment of the present application further provides a video style conversion processing apparatus, and fig. 5 is a schematic structural diagram of a video style conversion processing apparatus according to yet another embodiment of the present application, as shown in fig. 5, the video style conversion processing apparatus includes: a second obtaining module 510, a third obtaining module 520, and a converting module 530.
A second obtaining module 510, configured to obtain a video style conversion request including a target video and a target picture.
A third obtaining module 520, configured to obtain a pre-trained target video style conversion model corresponding to the target picture.
And the conversion module 530 is configured to perform conversion processing on the target video according to the target video style conversion model to generate a video style matched with the target picture.
Therefore, when the apparatus is applied, a video style conversion request containing the target video and the target picture is obtained. When a user needs to perform style conversion on a target video, the target video and a target picture are determined: the target video is the video to be converted, and the target picture carries the target style to convert to. A pre-trained target video style conversion model corresponding to the target picture, that is, to the target style, is then obtained, and the target video is converted according to this model to generate a video style matched with the target picture. Video style conversion can thus be performed quickly, which improves the user experience.
In order to implement the foregoing embodiments, the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the video style conversion processing method described in the foregoing method embodiments is implemented.
In order to implement the foregoing embodiments, the present application also proposes a computer program product; when the instructions in the computer program product are executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
In order to implement the above embodiments, the present application also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video style conversion processing method as described in the foregoing method embodiments.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A video style conversion processing method is characterized by comprising the following steps:
obtaining a sample picture for model training and a corresponding sample video set;
obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting a target video according to the video style conversion model to generate a video style matched with the sample picture.
2. The method of claim 1, wherein after the obtaining of the sample picture for model training and the corresponding sample video set, the method further comprises:
and adjusting the size of the sample picture and/or each video frame in the sample video set according to the size of the input picture of the target model, so that the adjusted size of the sample picture and/or each video frame in the sample video set matches the size of the input picture.
3. The method of claim 2, wherein the resizing of the sample picture and/or each video frame in the sample video set comprises:
cropping the sample picture and/or each video frame in the sample video set to size, or,
interpolating the sample picture and/or each video frame in the sample video set to size.
4. The method of claim 1, wherein after the generating of the video style conversion model corresponding to the sample picture, the method further comprises:
and calculating the target network parameters according to a preset algorithm, and deleting the network layer corresponding to the candidate network parameters of which the calculation results meet the preset filtering conditions.
5. The method of claim 1, wherein before the converting of the target video according to the video style conversion model to generate the video style matched with the sample picture, the method further comprises:
and carrying out memory multiplexing setting on the network layer in the video style conversion model so as to delete the processing data of the network layer stored in the memory in the process of carrying out conversion processing on the target video according to the video style conversion model.
6. A video style conversion processing method is characterized by comprising the following steps:
acquiring a video style conversion request containing a target video and a target picture;
acquiring a pre-trained target video style conversion model corresponding to the target picture, the model being trained by: acquiring the target picture for model training and a corresponding sample video set; obtaining style attribute information of the target picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information; acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame; and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and generating the video style conversion model corresponding to the target picture from the target model and the target network parameters obtained when preset training conditions are met;
and converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
7. A video style conversion processing apparatus, comprising:
the first acquisition module is used for acquiring a sample picture for model training and a corresponding sample video set;
the first setting module is used for obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
the second setting module is used for acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and the training generation module is used for training the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
8. A video style conversion processing apparatus, comprising:
the second acquisition module is used for acquiring a video style conversion request containing a target video and a target picture;
the third acquisition module is used for acquiring a pre-trained target video style conversion model corresponding to the target picture, the model being trained by: acquiring the target picture for model training and a corresponding sample video set; obtaining style attribute information of the target picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information; acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame; and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and generating the video style conversion model corresponding to the target picture from the target model and the target network parameters obtained when preset training conditions are met;
and the conversion module is used for converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the video style conversion processing method according to any one of claims 1 to 5.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the video style conversion processing method according to any one of claims 1 to 5.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the video style conversion process of claim 6.
12. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the video style conversion processing method of claim 6.
Application CN201811220100.XA, priority date 2018-10-19, filing date 2018-10-19: Video style conversion processing method and device. Status: Active. Granted publication: CN109359687B (en).

Priority Applications (1)

CN201811220100.XA (priority date 2018-10-19, filing date 2018-10-19): Video style conversion processing method and device

Applications Claiming Priority (1)

CN201811220100.XA (priority date 2018-10-19, filing date 2018-10-19): Video style conversion processing method and device

Publications (2)

Publication Number Publication Date
CN109359687A CN109359687A (en) 2019-02-19
CN109359687B true CN109359687B (en) 2020-11-24

Family

ID=65345917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811220100.XA Active CN109359687B (en) 2018-10-19 2018-10-19 Video style conversion processing method and device

Country Status (1)

Country Link
CN (1) CN109359687B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599421B (en) * 2019-09-12 2023-06-09 腾讯科技(深圳)有限公司 Model training method, video fuzzy frame conversion method, device and storage medium
CN111556244B (en) * 2020-04-23 2022-03-11 北京百度网讯科技有限公司 Video style migration method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355555A (en) * 2011-09-22 2012-02-15 中国科学院深圳先进技术研究院 Video processing method and system
CN105303598A (en) * 2015-10-23 2016-02-03 浙江工业大学 Multi-style video artistic processing method based on texture transfer
WO2018075927A1 (en) * 2016-10-21 2018-04-26 Google Llc Stylizing input images
WO2018111786A1 (en) * 2016-12-16 2018-06-21 Microsoft Technology Licensing, Llc Image stylization based on learning network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355555A (en) * 2011-09-22 2012-02-15 中国科学院深圳先进技术研究院 Video processing method and system
CN105303598A (en) * 2015-10-23 2016-02-03 浙江工业大学 Multi-style video artistic processing method based on texture transfer
WO2018075927A1 (en) * 2016-10-21 2018-04-26 Google Llc Stylizing input images
WO2018111786A1 (en) * 2016-12-16 2018-06-21 Microsoft Technology Licensing, Llc Image stylization based on learning network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of Image and Video Stylization Based on Deep Learning (基于深度学习的图像与视频风格化研究与实现); 操江峰; China Master's Theses Full-text Database, Information Science and Technology; 2017-10-15; I138-232 *

Also Published As

Publication number Publication date
CN109359687A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
US7657060B2 (en) Stylization of video
US7764310B2 (en) Image processing apparatus, program and method for performing preprocessing for movie reproduction of still images
US10650570B2 (en) Dynamic local temporal-consistent textured mesh compression
CN110085244B (en) Live broadcast interaction method and device, electronic equipment and readable storage medium
CN107180443B (en) A kind of Freehandhand-drawing animation producing method and its device
US9129655B2 (en) Time compressing video content
CN104394422A (en) Video segmentation point acquisition method and device
US20240087610A1 (en) Modification of objects in film
CN109359687B (en) Video style conversion processing method and device
US11582519B1 (en) Person replacement utilizing deferred neural rendering
US11581020B1 (en) Facial synchronization utilizing deferred neural rendering
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN114972574A (en) WEB-based digital image real-time editing using latent vector stream renderer and image modification neural network
KR102546631B1 (en) Apparatus for video data argumentation and method for the same
Hoogeboom et al. High-fidelity image compression with score-based generative models
JP5109038B2 (en) Lip sync animation creation device and computer program
CN115049558A (en) Model training method, human face image processing device, electronic equipment and readable storage medium
JP3859989B2 (en) Image matching method and image processing method and apparatus capable of using the method
CN115049559A (en) Model training method, human face image processing method, human face model processing device, electronic equipment and readable storage medium
Ravichandran et al. Synthesizing photorealistic virtual humans through cross-modal disentanglement
CN115917647A (en) Automatic non-linear editing style transfer
Rajatha et al. Cartoonizer: Convert Images and Videos to Cartoon-Style Images and Videos
CN115761065A (en) Intermediate frame generation method, device, equipment and medium
Ghadekar et al. Video Regeneration Using Image Diffusion Model
Yan et al. Analogies based video editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant