CN109359687B - Video style conversion processing method and device


Info

Publication number: CN109359687B
Application number: CN201811220100.XA
Authority: CN (China)
Prior art keywords: target, video, model, output vector, style
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN109359687A
Inventors: 柏提, 孙昊, 刘霄, 李鑫, 赵翔, 杨凡, 李旭斌, 文石磊, 丁二锐
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN201811220100.XA
Publications: CN109359687A (application), CN109359687B (grant)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping

Abstract

The application provides a video style conversion processing method and a video style conversion processing device. The method includes: setting a first target output vector for the network layer reflecting style attribute features according to the style attribute information of a sample picture; setting a second target output vector for the network layer reflecting content features according to the content information of the currently input video frame; setting a third target output vector for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame; training the network parameters of each network layer in a target model according to the first target output vector, the second target output vector and the third target output vector; when preset training conditions are met, generating a video style conversion model corresponding to the sample picture from the resulting target network parameters and the target model; and converting a target video according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.

Description

Video style conversion processing method and device
Technical Field
The present application relates to the field of video processing technologies, and in particular to a video style conversion processing method and apparatus.
Background
With the continuous development of internet technology, users demand ever richer media resources: from classic text content, to picture content, to the videos, and in particular short videos, that are widely popular today. At the same time, users also want to apply new artistic processing to the content itself to obtain more novel and creative art forms, such as the artistic style conversion of videos.
In the related art, artistic style conversion methods can only perform style conversion on a single picture, so video style conversion is carried out in the classic picture-by-picture manner. Because video content often contains massive amounts of data, this inevitably takes a long time. Moreover, abrupt changes may appear between consecutive pictures: the picture content after style conversion no longer shares the optical flow field of the original picture content, which harms the fluency of the video.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a video style conversion processing method, which improves efficiency of video style conversion processing while ensuring video smoothness by training and generating a video style conversion model.
A second object of the present application is to propose another video style conversion processing method.
A third object of the present application is to provide a video style conversion processing apparatus.
A fourth object of the present application is to provide another video style conversion processing apparatus.
A fifth object of the present application is to propose a computer device.
A sixth object of the present application is to propose a computer program product.
A seventh object of the present application is to propose a non-transitory computer-readable storage medium.
To achieve the above objects, an embodiment of the first aspect of the present application provides a video style conversion processing method, including:
obtaining a sample picture for model training and a corresponding sample video set;
obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting a target video according to the video style conversion model to generate a video style matched with the sample picture.
To achieve the above objects, an embodiment of the second aspect of the present application provides another video style conversion processing method, including:
acquiring a video style conversion request containing a target video and a target picture;
acquiring a pre-trained target video style conversion model corresponding to the target picture;
and converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
To achieve the above objects, an embodiment of the third aspect of the present application provides a video style conversion processing apparatus, including:
the first acquisition module is used for acquiring a sample picture for model training and a corresponding sample video set;
the first setting module is used for obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
the second setting module is used for acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and the training generation module is used for training the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
To achieve the above objects, an embodiment of the fourth aspect of the present application provides another video style conversion processing apparatus, including:
the second acquisition module is used for acquiring a video style conversion request containing a target video and a target picture;
the third acquisition module is used for acquiring a pre-trained target video style conversion model corresponding to the target picture;
and the conversion module is used for converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
To achieve the above objects, an embodiment of the fifth aspect of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the video style conversion processing method described in the foregoing method embodiments is implemented.
To achieve the above objects, an embodiment of the sixth aspect of the present application provides a computer program product; when instructions in the computer program product are executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
To achieve the above objects, an embodiment of the seventh aspect of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
A sample picture for model training and a corresponding sample video set are obtained, and the style attribute information of the sample picture is obtained. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are trained according to the first, second and third target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. Therefore, the efficiency of the video style conversion processing is improved while video fluency is ensured.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a video style conversion processing method according to one embodiment of the present application;
FIG. 2 is a flow chart of a video style conversion processing method according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a video style conversion processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a video style conversion processing apparatus according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of a video style conversion processing apparatus according to still another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The present application aims to solve the problems in the related art that style conversion performed in the classic frame-by-frame manner takes too long, and that video fluency suffers because the picture content after style conversion does not share the same optical flow field as the original picture content.
To this end, the present application generates a video style conversion model so that the video content is unchanged before and after the conversion processing, only the style of the video is converted, and the fluency of the video is ensured.
The following describes a video style conversion processing method and apparatus according to an embodiment of the present application with reference to the drawings.
Fig. 1 is a flowchart of a video style conversion processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step 101, obtaining a sample picture for model training and a corresponding sample video set.
It can be understood that one sample picture represents one artistic style; that is, each sample picture corresponds to one video style conversion model.
It can also be understood that, in order to enable the video style conversion model generated by training in the example of the present application to be suitable for style conversion of different target videos, it is necessary to acquire as many videos of different scenes as possible as a sample video set for model training.
In order to improve the effectiveness of the video style conversion model, on the order of a thousand natural-scene videos are used as the sample video set for model training wherever possible.
It should be noted that, in order to further improve the richness and accuracy of model training, as a possible implementation the sample picture and/or each video frame in the sample video set is resized according to the size of the input picture of the target model, so that the adjusted sizes match the size of the input picture.
The input picture size may be set according to actual application requirements, and the sample picture and/or the video frames in the sample video set can be resized in various ways, for example:
In a first example, the sample picture and/or each video frame in the sample video set is cropped to the target size.
The cropping may be performed at any position of the sample picture and/or video frame, which further improves the flexibility of the processing.
In a second example, the sample picture and/or each video frame in the sample video set is scaled to the target size by interpolation.
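The patent leaves the concrete resizing operations open. Purely as an illustration, a minimal sketch of the two options named above, cropping and bilinear interpolation (cf. the G06T3/4007 classification), might look as follows; the function name and the center-crop choice are assumptions, not part of the patent:

```python
import cv2
import numpy as np

def resize_to_model_input(frame: np.ndarray, size: tuple[int, int],
                          mode: str = "interpolate") -> np.ndarray:
    """Match a frame (or the sample picture) to the model's input size.

    mode="crop":        cut a window of the target size; the crop could be
                        taken at any position, here simply the center.
    mode="interpolate": rescale the whole frame with bilinear interpolation.
    """
    th, tw = size
    if mode == "crop":
        h, w = frame.shape[:2]
        top, left = max((h - th) // 2, 0), max((w - tw) // 2, 0)
        return frame[top:top + th, left:left + tw]
    return cv2.resize(frame, (tw, th), interpolation=cv2.INTER_LINEAR)
```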
Step 102, obtaining style attribute information of the sample picture, and further setting, in the process of training the target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information.
Specifically, different sample pictures have different style attribute information, which can be obtained through picture processing algorithms and other means in the related art. For example, with Van Gogh's The Starry Night as the sample picture, the style attribute information obtained may include its sea-like blue and its soft, quiet color tone.
It can be understood that the target model has a plurality of network layers, one or more of which can be designated as needed as the network layer reflecting style attribute features, and the first target output vector reflecting style attribute features is set according to the style attribute information. The target model may be, for example, a VGG19 model trained on ImageNet.
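The patent does not specify how the style attribute information is encoded into the first target output vector. As a hedged sketch only: a common choice in the style-transfer literature is the Gram matrix of feature maps from selected VGG19 layers, which fits the VGG19-on-ImageNet target model mentioned above; the layer indices below are illustrative assumptions.

```python
import torch
from torchvision.models import vgg19, VGG19_Weights

# Layers assumed to reflect style attribute features; the patent only says
# "one or more network layers, as needed", so these indices are examples.
STYLE_LAYERS = [1, 6, 11, 20]

vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features.eval()

def gram(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (batch, channels, h, w) feature map."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

@torch.no_grad()
def style_targets(sample_picture: torch.Tensor) -> list[torch.Tensor]:
    """First target output vectors: one per designated style layer."""
    targets, x = [], sample_picture
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            targets.append(gram(x))
    return targets
```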
Step 103, obtaining content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame.
It is to be understood that the sample video is composed of a plurality of video frames, each having corresponding content information and optical flow field information: content information such as characters and images in the frame, and optical flow field information such as the motion information of objects and rich information about the three-dimensional structure of the scene.
Similarly, one or more of the multiple network layers of the target model can be used as a network layer reflecting the content characteristics as required, and a second target output vector reflecting the content characteristics can be set according to the content information of the current input video frame; one or more of the multiple network layers of the target model can be used as a network layer reflecting the optical flow field characteristics as required, and a third target output vector reflecting the optical flow field characteristics can be set according to the optical flow field information of the current input video frame.
For example, if the target model has 90 network layers, the bottom 30 layers may serve as the network layers reflecting style attribute features, the middle 30 layers as the network layers reflecting content features, and the top 30 layers as the network layers reflecting optical flow field features.
It is emphasized that, in this example, a network layer in the target model cannot be assigned repeatedly: a network layer that has been set to reflect style attribute features cannot be assigned again, and likewise a network layer that has been set to reflect optical flow field features cannot be assigned again.
That is, different network layers in the target model may be used to reflect different features, such as style attribute features, content features, optical flow field features, and the like, so as to improve the effectiveness of model training.
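As a sketch of how the designated layers' outputs might be read out during training, forward hooks can record each group's activations while keeping the three groups disjoint, so no layer is assigned twice; the 30/30/30 split mirrors the 90-layer example above, and all names are illustrative rather than taken from the patent:

```python
import torch.nn as nn

def register_group_hooks(model: nn.Sequential) -> dict:
    """Record the outputs of the layers designated to reflect style,
    content and optical-flow features: bottom 30 layers style, middle
    30 content, top 30 optical flow, no layer in more than one group."""
    outputs = {"style": [], "content": [], "flow": []}
    groups = {i: "style" for i in range(30)}
    groups.update({i: "content" for i in range(30, 60)})
    groups.update({i: "flow" for i in range(60, 90)})

    def make_hook(name):
        def hook(module, inputs, output):
            outputs[name].append(output)
        return hook

    for idx, layer in enumerate(model):
        if idx in groups:
            layer.register_forward_hook(make_hook(groups[idx]))
    # Caller should clear the three lists before every forward pass.
    return outputs
```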
Step 104, training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector; when preset training conditions are met, generating a video style conversion model corresponding to the sample picture from the resulting target network parameters and the target model; and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
Specifically, the network parameters of each network layer in the target model are trained with the first target output vector, the second target output vector and the third target output vector, so that the outputs of the corresponding layers approach, respectively, the targets of the network layers reflecting style attribute features, content features and optical flow field features. When the preset training conditions are met, the video style conversion model corresponding to the sample picture can be generated from the resulting network parameters and the target model.
Furthermore, the target video can be converted according to the video style conversion model to generate a video style matched with the sample picture.
It can be understood that the output picture of the video style conversion model and the sample picture are trained against the network layers of the target model reflecting style attribute features, which ensures that the two are similar in style; the output picture and the input video frame are trained against the network layers reflecting content features, which ensures that the two are similar in content; and the optical flow fields of the video before and after conversion are trained to be similar, so that the style-converted video remains smooth between frames.
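A minimal sketch of the combined objective, under the assumption (not stated in the patent) that each term is a mean-squared distance between a layer's output and its target output vector; the weights are illustrative:

```python
import torch.nn.functional as F

def total_loss(style_out, style_tgt, content_out, content_tgt,
               flow_out, flow_tgt, w_style=1e5, w_content=1.0, w_flow=1e2):
    """Sum of the three training terms: style similarity to the sample
    picture, content similarity to the input frame, and optical-flow
    similarity between the video before and after conversion."""
    l_style = sum(F.mse_loss(o, t) for o, t in zip(style_out, style_tgt))
    l_content = sum(F.mse_loss(o, t) for o, t in zip(content_out, content_tgt))
    l_flow = sum(F.mse_loss(o, t) for o, t in zip(flow_out, flow_tgt))
    return w_style * l_style + w_content * l_content + w_flow * l_flow
```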
In practical applications, different users have different preferences for artistic styles, and in order to enable the video style conversion model generated by training in the example of the present application to meet the video style conversion requirements of different users, it is necessary to obtain as many sample pictures with different artistic styles as possible for generating a plurality of different video style conversion models.
Specifically, sample pictures with different artistic styles can be randomly or purposefully selected, and as an example, a plurality of artistic pictures with western artistic painting styles are obtained to be used as sample pictures for generating a plurality of different video style conversion models; as another example, a plurality of western art painting-style art pictures, a plurality of chinese traditional painting-style art pictures, and a plurality of japanese cartoon-style art pictures are acquired as sample pictures for generating a plurality of different video-style conversion models.
When the method is applied, a target video style conversion model is selected according to user requirements to convert a target video to generate a video style matched with a sample picture. Wherein the artistic style of the sample picture is the target artistic style that the user needs to convert into.
In this example, in order to ensure that the video style conversion model can run in real time on terminal devices, after the video style conversion model corresponding to the sample picture is generated, the target network parameters are evaluated with a preset algorithm, and the network layers corresponding to the candidate network parameters whose results meet the preset filtering condition are deleted.
As a possible implementation, a filtering algorithm based on the L1 norm is applied to the target network parameters, the network layers whose candidate network parameters have an absolute-value sum below a preset threshold are identified, and these layers are deleted, thereby compressing and accelerating the video style conversion model.
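An L1-norm filtering pass of the kind described might be sketched as follows; the threshold value and the per-filter granularity are assumptions, since the patent only says that parameters whose L1 statistic meets a preset filtering condition are deleted:

```python
import torch
import torch.nn as nn

def l1_prune_mask(conv: nn.Conv2d, threshold: float) -> torch.Tensor:
    """Mark output filters whose L1 norm falls below the threshold.
    In a full implementation, the marked filters (and the matching input
    channels of the following layer) would be physically removed to
    compress and accelerate the video style conversion model."""
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one norm per filter
    return norms < threshold

def report_prunable(model: nn.Module, threshold: float = 1e-2) -> None:
    """Example usage: count prunable filters in every conv layer."""
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            mask = l1_prune_mask(m, threshold)
            print(f"{name}: {int(mask.sum())}/{m.out_channels} below threshold")
```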
That is to say, once training of the video style conversion model is complete, a smoother stylized video can be generated without explicitly computing the optical flow field, which greatly increases the processing speed of the video style conversion model and further improves its practicability.
In the present example, when the video style conversion model is deployed, a memory multiplexing technique may also be employed to keep memory usage efficient.
As a possible implementation manner, a memory multiplexing setting is performed on the network layer in the video style conversion model, so that in the process of performing conversion processing on the target video according to the video style conversion model, processing data of the network layer stored in the memory is deleted.
That is, after the style conversion processing of a target video A is finished, the processing data of the network layers configured for memory multiplexing in the video style conversion model can be deleted, so that the next target video B can be style-converted, which improves the efficiency of the video style conversion processing.
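The memory multiplexing described here amounts to discarding a layer's stored processing data as soon as it has been consumed, so the same memory can serve the next layer and the next video. A toy sketch of the idea, not the patent's implementation:

```python
import torch

@torch.no_grad()
def stylize_frames(model_layers, frames):
    """Run frames through the layers while keeping only the live
    activation; each layer's stored data is released (del) once the
    next layer has consumed it, so memory is reused across layers
    and across consecutive videos."""
    results = []
    for frame in frames:
        x = frame
        for layer in model_layers:
            y = layer(x)
            del x           # drop the previous layer's processing data
            x = y
        results.append(x)
    return results
```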
To sum up, the video style conversion processing method of the embodiment of the present application obtains a sample picture for model training and a corresponding sample video set, and obtains the style attribute information of the sample picture. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are then trained according to the three target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.
Fig. 2 is a flowchart of a video style conversion processing method according to another embodiment of the present application. As shown in Fig. 2, the method includes the following steps:
Step 201, a video style conversion request containing a target video and a target picture is obtained.
Step 202, a pre-trained target video style conversion model corresponding to the target picture is obtained.
Step 203, the target video is converted according to the target video style conversion model to generate a video style matched with the target picture.
Specifically, different users have different preferences for artistic styles, and may have different scenes and different artistic-style requirements. It is therefore desirable to obtain as many sample pictures of different artistic styles as possible for generating a plurality of different video style conversion models.
Therefore, when the method is applied, a video style conversion request containing the target video and the target picture is obtained. When a user needs to perform style conversion on a target video, the target video and a target picture are determined: the target video is the video to be converted, and the target picture carries the target style to convert to. A pre-trained target video style conversion model corresponding to the target picture, that is, to the target style, is then obtained, and the target video is converted according to this model to generate a video style matched with the target picture. Video style conversion can thus be performed quickly, which improves the user experience.
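End to end, the serving path of steps 201 to 203 might look like the following sketch; the frame normalization, channel handling and file formats are assumptions for illustration, not specified by the patent:

```python
import cv2
import torch

def convert_video(model: torch.nn.Module, src: str, dst: str) -> None:
    """Convert every frame of the target video with the pre-trained
    target video style conversion model and write the stylized video."""
    cap = cv2.VideoCapture(src)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    model.eval()
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # OpenCV yields HxWx3 uint8 (BGR); convert to a 1x3xHxW float tensor.
            x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255
            y = model(x).squeeze(0).clamp(0, 1)
            out.write((y.permute(1, 2, 0).cpu().numpy() * 255).astype("uint8"))
    cap.release()
    out.release()
```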
In order to implement the foregoing embodiment, an embodiment of the present application further provides a video style conversion processing apparatus, and fig. 3 is a schematic structural diagram of the video style conversion processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the video style conversion processing apparatus includes: a first acquisition module 310, a first setup module 320, a second setup module 330, and a training generation module 340.
The first obtaining module 310 is configured to obtain a sample picture for model training and a corresponding sample video set.
The first setting module 320 is configured to obtain style attribute information of the sample picture, and further to set, in the process of training the target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information.
The second setting module 330 is configured to obtain content information and optical flow field information of each video frame in the sample video, and further to set, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame.
The training generation module 340 is configured to train the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and to generate a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when the preset training conditions are met, so that the target video can be converted according to the video style conversion model to generate a video style matched with the sample picture.
In an embodiment of the present application, as shown in fig. 4, on the basis of fig. 3, the method further includes: an adjusting module 350, a calculating and deleting module 360 and a multiplexing and deleting module 370.
The adjusting module 350 is configured to resize the sample picture and/or each video frame in the sample video set according to the size of the input picture of the target model, so that the adjusted sizes match the size of the input picture.
In one embodiment of the present application, the sample picture and/or each video frame in the sample video set is cropped or interpolated to the target size.
And the calculation deletion module 360 is configured to calculate the target network parameter according to a preset algorithm, and delete the network layer corresponding to the candidate network parameter whose calculation result meets the preset filtering condition.
And a multiplexing deletion module 370, configured to perform memory multiplexing on the network layer in the video style conversion model, so that in the process of performing conversion processing on the target video according to the video style conversion model, the processing data of the network layer stored in the memory is deleted.
That is to say, once training of the video style conversion model is complete, a smoother stylized video can be generated without explicitly computing the optical flow field, which greatly increases the processing speed of the video style conversion model and further improves its practicability.
That is, after the style conversion processing of a target video A is finished, the processing data of the network layers configured for memory multiplexing in the video style conversion model can be deleted, so that the next target video B can be style-converted, which improves the efficiency of the video style conversion processing.
It should be noted that the foregoing explanation on the embodiment of the video style conversion processing method is also applicable to the video style conversion processing apparatus of the embodiment, and the implementation principle thereof is similar, and is not repeated here.
To sum up, the video style conversion processing apparatus of the embodiment of the present application obtains a sample picture for model training and a corresponding sample video set, and obtains the style attribute information of the sample picture. In the process of training the target model, a first target output vector is set for the network layer reflecting style attribute features according to the style attribute information; the content information and optical flow field information of each video frame in the sample video are obtained, a second target output vector is set for the network layer reflecting content features according to the content information of the currently input video frame, and a third target output vector is set for the network layer reflecting optical flow field features according to the optical flow field information of the currently input video frame. The network parameters of each network layer in the target model are then trained according to the three target output vectors; when the preset training conditions are met, a video style conversion model corresponding to the sample picture is generated from the resulting target network parameters and the target model, and the target video is converted according to the video style conversion model to generate a video style matched with the sample picture. The efficiency of the video style conversion processing is thus improved while video fluency is ensured.
In order to implement the foregoing embodiment, an embodiment of the present application further provides a video style conversion processing apparatus, and fig. 5 is a schematic structural diagram of a video style conversion processing apparatus according to yet another embodiment of the present application, as shown in fig. 5, the video style conversion processing apparatus includes: a second obtaining module 510, a third obtaining module 520, and a converting module 530.
A second obtaining module 510, configured to obtain a video style conversion request including a target video and a target picture.
A third obtaining module 520, configured to obtain a pre-trained target video style conversion model corresponding to the target picture.
And the conversion module 530 is configured to perform conversion processing on the target video according to the target video style conversion model to generate a video style matched with the target picture.
Therefore, when the apparatus is applied, a video style conversion request containing the target video and the target picture is obtained. When a user needs to perform style conversion on a target video, the target video and a target picture are determined: the target video is the video to be converted, and the target picture carries the target style to convert to. A pre-trained target video style conversion model corresponding to the target picture, that is, to the target style, is then obtained, and the target video is converted according to this model to generate a video style matched with the target picture. Video style conversion can thus be performed quickly, which improves the user experience.
In order to implement the foregoing embodiments, the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the video style conversion processing method described in the foregoing method embodiments is implemented.
In order to implement the foregoing embodiments, the present application also proposes a computer program product; when the instructions in the computer program product are executed by a processor, the video style conversion processing method described in the foregoing method embodiments is implemented.
In order to implement the above embodiments, the present application also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video style conversion processing method as described in the foregoing method embodiments.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A video style conversion processing method is characterized by comprising the following steps:
obtaining a sample picture for model training and a corresponding sample video set;
obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting a target video according to the video style conversion model to generate a video style matched with the sample picture.
2. The method of claim 1, wherein after the obtaining of the sample picture for model training and the corresponding sample video set, the method further comprises:
and adjusting the size of the sample picture and/or each video frame in the sample video set according to the size of the input picture of the target model, so that the adjusted size of the sample picture and/or each video frame in the sample video set matches the size of the input picture.
3. The method of claim 2, wherein the resizing of the sample picture and/or each video frame in the sample video set comprises:
cropping the sample picture and/or each video frame in the sample video set to size, or,
interpolating the sample picture and/or each video frame in the sample video set to size.
4. The method of claim 1, wherein after the generating of the video style conversion model corresponding to the sample picture, the method further comprises:
and calculating the target network parameters according to a preset algorithm, and deleting the network layer corresponding to the candidate network parameters of which the calculation results meet the preset filtering conditions.
5. The method of claim 1, wherein before the converting of the target video according to the video style conversion model to generate the video style matched with the sample picture, the method further comprises:
and carrying out memory multiplexing setting on the network layer in the video style conversion model so as to delete the processing data of the network layer stored in the memory in the process of carrying out conversion processing on the target video according to the video style conversion model.
6. A video style conversion processing method is characterized by comprising the following steps:
acquiring a video style conversion request containing a target video and a target picture;
acquiring a pre-trained target video style conversion model corresponding to the target picture, the model being trained by: acquiring the target picture for model training and a corresponding sample video set; obtaining style attribute information of the target picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information; acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame; and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and generating the video style conversion model corresponding to the target picture from the target model and the target network parameters obtained when preset training conditions are met;
and converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
7. A video style conversion processing apparatus, comprising:
the first acquisition module is used for acquiring a sample picture for model training and a corresponding sample video set;
the first setting module is used for obtaining style attribute information of the sample picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information;
the second setting module is used for acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame;
and the training generation module is used for training the network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, generating a video style conversion model corresponding to the sample picture from the target model and the target network parameters obtained when preset training conditions are met, and converting the target video according to the video style conversion model to generate a video style matched with the sample picture.
8. A video style conversion processing apparatus, comprising:
the second acquisition module is used for acquiring a video style conversion request containing a target video and a target picture;
the third acquisition module is used for acquiring a pre-trained target video style conversion model corresponding to the target picture, the model being trained by: acquiring the target picture for model training and a corresponding sample video set; obtaining style attribute information of the target picture, and further setting, in the process of training a target model, a first target output vector for the network layer in the target model reflecting style attribute features according to the style attribute information; acquiring content information and optical flow field information of each video frame in the sample video, and further setting, in the process of training the target model, a second target output vector for the network layer in the target model reflecting content features according to the content information of the currently input video frame, and a third target output vector for the network layer in the target model reflecting optical flow field features according to the optical flow field information of the currently input video frame; and training network parameters of each network layer in the target model according to the first target output vector, the second target output vector and the third target output vector, and generating the video style conversion model corresponding to the target picture from the target model and the target network parameters obtained when preset training conditions are met;
and the conversion module is used for converting the target video according to the target video style conversion model to generate a video style matched with the target picture.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the video style conversion processing method according to any one of claims 1 to 5.
10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the video style conversion processing method according to any one of claims 1 to 5.
11. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the video style conversion process of claim 6.
12. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the video style conversion processing method of claim 6.
Application CN201811220100.XA, priority date 2018-10-19, filing date 2018-10-19: Video style conversion processing method and device. Status: Active. Granted publication: CN109359687B (en).

Priority Applications (1)

CN201811220100.XA (priority date 2018-10-19, filing date 2018-10-19): Video style conversion processing method and device

Applications Claiming Priority (1)

CN201811220100.XA (priority date 2018-10-19, filing date 2018-10-19): Video style conversion processing method and device

Publications (2)

Publication Number Publication Date
CN109359687A CN109359687A (en) 2019-02-19
CN109359687B true CN109359687B (en) 2020-11-24

Family

ID=65345917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811220100.XA Active CN109359687B (en) 2018-10-19 2018-10-19 Video style conversion processing method and device

Country Status (1)

Country Link
CN (1) CN109359687B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110599421B (en) * 2019-09-12 2023-06-09 腾讯科技(深圳)有限公司 Model training method, video fuzzy frame conversion method, device and storage medium
CN111556244B (en) * 2020-04-23 2022-03-11 北京百度网讯科技有限公司 Video style migration method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355555A (en) * 2011-09-22 2012-02-15 中国科学院深圳先进技术研究院 Video processing method and system
CN105303598A (en) * 2015-10-23 2016-02-03 浙江工业大学 Multi-style video artistic processing method based on texture transfer
WO2018075927A1 (en) * 2016-10-21 2018-04-26 Google Llc Stylizing input images
WO2018111786A1 (en) * 2016-12-16 2018-06-21 Microsoft Technology Licensing, Llc Image stylization based on learning network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355555A (en) * 2011-09-22 2012-02-15 中国科学院深圳先进技术研究院 Video processing method and system
CN105303598A (en) * 2015-10-23 2016-02-03 浙江工业大学 Multi-style video artistic processing method based on texture transfer
WO2018075927A1 (en) * 2016-10-21 2018-04-26 Google Llc Stylizing input images
WO2018111786A1 (en) * 2016-12-16 2018-06-21 Microsoft Technology Licensing, Llc Image stylization based on learning network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of Image and Video Stylization Based on Deep Learning (基于深度学习的图像与视频风格化研究与实现); 操江峰; China Master's Theses Full-text Database, Information Science and Technology; 2017-10-15; I138-232 *

Also Published As

Publication number Publication date
CN109359687A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
US7657060B2 (en) Stylization of video
US7764310B2 (en) Image processing apparatus, program and method for performing preprocessing for movie reproduction of still images
US10650570B2 (en) Dynamic local temporal-consistent textured mesh compression
CN110085244B (en) Live broadcast interaction method and device, electronic equipment and readable storage medium
CN107180443B (en) A kind of Freehandhand-drawing animation producing method and its device
US9129655B2 (en) Time compressing video content
CN104394422A (en) Video segmentation point acquisition method and device
US20240087610A1 (en) Modification of objects in film
CN109359687B (en) Video style conversion processing method and device
US11582519B1 (en) Person replacement utilizing deferred neural rendering
US11581020B1 (en) Facial synchronization utilizing deferred neural rendering
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN114972574A (en) WEB-based digital image real-time editing using latent vector stream renderer and image modification neural network
KR102546631B1 (en) Apparatus for video data argumentation and method for the same
Hoogeboom et al. High-fidelity image compression with score-based generative models
JP5109038B2 (en) Lip sync animation creation device and computer program
CN115049558A (en) Model training method, human face image processing device, electronic equipment and readable storage medium
JP3859989B2 (en) Image matching method and image processing method and apparatus capable of using the method
CN115049559A (en) Model training method, human face image processing method, human face model processing device, electronic equipment and readable storage medium
Ravichandran et al. Synthesizing photorealistic virtual humans through cross-modal disentanglement
CN115917647A (en) Automatic non-linear editing style transfer
Rajatha et al. Cartoonizer: Convert Images and Videos to Cartoon-Style Images and Videos
CN115761065A (en) Intermediate frame generation method, device, equipment and medium
Ghadekar et al. Video Regeneration Using Image Diffusion Model
Yan et al. Analogies based video editing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant