CN111556244B - Video style migration method and device

Info

Publication number
CN111556244B
CN111556244B
Authority
CN
China
Prior art keywords
image frame
feature
style migration
intermediate feature
migration
Prior art date
Legal status
Active
Application number
CN202010327039.XA
Other languages
Chinese (zh)
Other versions
CN111556244A (en)
Inventor
林天威
李鑫
李甫
何栋梁
张赫男
孙昊
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010327039.XA
Publication of CN111556244A
Application granted
Publication of CN111556244B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Abstract

The application discloses a video style migration method and device, relating to the technical field of computer vision. One embodiment of the method comprises: acquiring the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video; acquiring the optical flow of the two adjacent frames, and warping the intermediate feature of the style migration of the previous image frame based on the optical flow to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; and updating the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame to obtain the stylized frame of the subsequent image frame after style migration. The embodiment improves the frame rate of video style migration.

Description

Video style migration method and device
Technical Field
The embodiments of the present application relate to the field of computer technology, in particular to the field of computer vision, and more particularly to a video style migration method and device.
Background
A generative adversarial network (GAN) can convert pictures of one style into pictures of another style, for example, converting a self-portrait photo of a real person into a cartoon style. Current GAN networks usually generate a single image frame at a time, and generation is usually slow, so it is difficult to achieve a real-time effect on a mobile terminal when they are used to generate video.
Disclosure of Invention
Embodiments of the present application provide a video style migration method and device, an electronic device, and a computer readable medium.
According to a first aspect, there is provided a video style migration method, the method comprising: acquiring the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video; acquiring the optical flow of the two adjacent frames; warping the intermediate feature of the style migration of the previous image frame based on the optical flow, to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; and updating the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame, to obtain the stylized frame of the subsequent image frame after style migration.
According to a second aspect, there is provided a video style migration apparatus, the apparatus comprising: a feature acquisition module configured to acquire the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video; an optical flow acquisition module configured to acquire the optical flow of the two adjacent frames; a warping processing module configured to warp the intermediate feature of the style migration of the previous image frame based on the optical flow, to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; and a style migration module configured to update the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame, to obtain the stylized frame of the subsequent image frame after style migration.
According to a third aspect, there is provided an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
According to a fourth aspect, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first aspect.
According to the video style migration method and device of the present application, the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video is first acquired; the optical flow of the two adjacent frames is then acquired, and the intermediate feature of the style migration of the previous image frame is warped based on the optical flow to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; finally, the intermediate feature of the style migration of the previous image frame is updated with the intermediate feature of the style migration of the subsequent image frame to obtain the stylized frame of the subsequent image frame after style migration. A full style migration pass is therefore not needed for the subsequent image frame; only the latter part of the style migration pipeline needs to be run on its intermediate feature. This saves computation, reduces the time consumed by video style migration, and improves the frame rate of video style migration without degrading the migration effect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary flowchart of an embodiment of a video style migration method according to the present application;
FIG. 2 is an exemplary flowchart of a method for acquiring the optical flow of two adjacent frames according to an embodiment of the present application;
FIG. 3 is an exemplary flowchart of a method for obtaining the intermediate feature of the style migration of a subsequent image frame according to an embodiment of the present application;
FIG. 4 is an exemplary flowchart of a method for obtaining the smoothed intermediate feature according to an embodiment of the present application;
FIG. 5 is an example diagram of a previous image frame in a video style migration method according to the present application;
FIG. 6 is an example diagram of the stylized frame after the style migration of the previous image frame in a video style migration method according to the present application;
FIG. 7 is an example diagram of a subsequent image frame in a video style migration method according to the present application;
FIG. 8 is an example diagram of the optical flow of two adjacent frames in a video style migration method according to the present application;
FIG. 9 is an example diagram of the stylized frame after the style migration of the subsequent image frame in a video style migration method according to the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a video style migration apparatus according to the present application;
FIG. 11 is a block diagram of an electronic device for implementing a video style migration method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
FIG. 1 shows an exemplary flow 100 of an embodiment of a video style migration method according to the present application. The video style migration method comprises the following steps:
Step 101: acquire the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video.
In this embodiment, the execution subject on which the video style migration method runs (e.g., a server or a terminal device) may acquire the intermediate feature of the style migration of the previous image frame of two adjacent frames in the video.
A style migration network may be adopted to migrate each image frame in the video into its corresponding stylized frame, one by one. The intermediate feature of the style migration of each image frame in the video may be: a feature generated inside the style migration network after the image frame is input into the network and before the network produces the stylized frame for that image frame. The execution subject may acquire the intermediate feature of the style migration of each image frame directly from the style migration network. Alternatively, the intermediate features of the style migration of the image frames may be features pre-stored on the local device or on another electronic device, and the execution subject may acquire them directly from there.
Further, the style migration network, which converts pictures of one style into pictures of another style, may be a style migration network with an encoder/decoder structure, adopting a network structure such as an auto-encoder (AE) or a GAN; based on different training data, the style migration network may convert a photo of a real person into a cartoon-style picture, an oil-painting-style picture, a hand-drawing-style picture, and so on. In practice, the intermediate feature of the style migration of each image frame in the video may be the feature generated by the encoder of the style migration network; for example, the intermediate feature output by the encoder may have a shape of 64 × 64 × 256, where 64 × 64 is the size of each feature map in the intermediate feature and 256 is the number of feature map channels.
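As an illustration only, the following is a minimal sketch of such an encoder/decoder style migration network, written here in PyTorch; the layer choices, channel count, and names are assumptions of this sketch, not the architecture disclosed by the patent.

import torch
import torch.nn as nn

class StyleMigrationNet(nn.Module):
    # Illustrative encoder/decoder style migration network (assumed design).
    def __init__(self, channels: int = 256):
        super().__init__()
        # Encoder: downsample a (3, 256, 256) frame to an intermediate
        # feature map of shape (channels, 64, 64).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: upsample the intermediate feature back to a stylized frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, frame: torch.Tensor):
        feature = self.encoder(frame)  # the intermediate feature reused below
        return self.decoder(feature), feature

Calling the network on a (1, 3, 256, 256) input returns the stylized frame together with the intermediate feature; it is this second output that the method caches, warps, and reuses for the subsequent frame.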
Step 102: acquire the optical flow of the two adjacent frames.
Specifically, optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. When the time interval is small (for example, between two consecutive frames of a video), the optical flow is equivalent to the displacement of the target pixel point, that is, the displacement of each pixel as the image frames of the video advance. For example, if the position of pixel point A in the T-th image frame is (x1, y1), and its position in the (T+1)-th image frame becomes (x2, y2), then the displacement of pixel point A over the two adjacent frames is (u, v) = (x2, y2) - (x1, y1).
In this embodiment, the optical flow may be a dense optical flow or a sparse optical flow. Dense optical flow is an image registration method that matches the image frames of a video point by point: unlike sparse optical flow, which is computed only for a number of feature points on the image frame, dense optical flow calculates the offset of every point on the image frame, forming a dense optical flow field. With a dense optical flow field, image registration can be performed at the pixel level, so the registration effect is noticeably better than that of sparse optical flow; but since dense optical flow calculates the offset of every point, its computational cost is also significantly higher.
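For contrast with the dense optical flow used below, the following is a hedged sketch of sparse optical flow with OpenCV's pyramidal Lucas-Kanade tracker, which follows only a set of detected feature points; the parameter values are illustrative assumptions.

import cv2
import numpy as np

def sparse_flow(prev_gray: np.ndarray, next_gray: np.ndarray) -> np.ndarray:
    # Detect up to 100 strong corners in the previous frame to track.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                  qualityLevel=0.01, minDistance=5)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      pts, None)
    good = status.ravel() == 1
    # Per-point displacement (u, v) of the successfully tracked points only.
    return next_pts[good] - pts[good]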
Specifically, the optical flow can be obtained in various ways. For example, the execution subject may acquire the optical flow directly from the local device or another electronic device. Alternatively, the execution subject may acquire the previous image frame and the subsequent image frame adjacent to it and generate the optical flow itself. For example, the execution subject may generate an initial optical flow between the previous image frame and the subsequent image frame and preprocess it to obtain the optical flow, where the preprocessing makes the displacements of feature points in weak regions of the initial optical flow as close as possible to the offsets of feature points in strong regions, or keeps the displacements of two adjacent feature points from differing too much, that is, keeps the rate of change of the displacements small.
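As a hedged sketch of one such way, the dense optical flow between two adjacent frames could be computed with OpenCV's Farneback method and then lightly smoothed, in the spirit of the preprocessing described above; the function name and all parameter values are assumptions for illustration.

import cv2
import numpy as np

def dense_flow(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Per-pixel displacement field of shape (H, W, 2); channels are (u, v).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    # Illustrative preprocessing: smooth the flow so that the displacements
    # of neighboring points do not differ too sharply.
    return cv2.blur(flow, (5, 5))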
Step 103: warp the intermediate feature of the style migration of the previous image frame based on the optical flow, to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames.
In this embodiment, warping an image means shifting the position of each pixel point in the image. Warping the intermediate feature of the style migration of the previous image frame comprises: shifting, based on the optical flow, the positions of the pixel points in the feature map of each channel of the previous image frame's intermediate feature, with the amount of the shift given by the optical flow.
Optionally, shifting the pixel points in the feature map of each channel based on the optical flow may comprise: offsetting the intermediate feature of the style migration of the previous image frame according to the optical flow, to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames.
Optionally, it may alternatively comprise: offsetting the intermediate feature of the style migration of the previous image frame according to the optical flow, and then smoothing the feature map of each channel (for example, by mean filtering or median filtering), to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames.
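The following is a minimal sketch, assuming PyTorch, of warping an intermediate feature by an optical flow with bilinear sampling; the helper name and the normalization details are assumptions of this sketch.

import torch
import torch.nn.functional as F

def warp_feature(feature: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # feature: (N, C, H, W); flow: (N, 2, H, W) of per-pixel (u, v) offsets.
    # Depending on the flow's direction convention, the flow may need to be
    # negated or computed from the subsequent frame back to the previous one.
    _, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    src = grid + flow  # each output pixel samples a flow-displaced location
    # Normalize sampling coordinates to [-1, 1] as grid_sample requires.
    src_x = 2.0 * src[:, 0] / max(w - 1, 1) - 1.0
    src_y = 2.0 * src[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((src_x, src_y), dim=-1)  # (N, H, W, 2)
    return F.grid_sample(feature, sample_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)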
Step 104: update the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame, to obtain the stylized frame of the subsequent image frame after style migration.
In this embodiment, when the intermediate feature of the style migration of each image frame in the video is the feature generated by the encoder of the style migration network, updating the intermediate feature of the style migration of the previous image frame with that of the subsequent image frame means replacing the former with the latter, so that the stylized frame output by the style migration network is the stylized frame of the subsequent image frame after style migration.
According to the video style migration method provided by this embodiment, the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video is first acquired; the optical flow of the two adjacent frames is then acquired, and the intermediate feature of the style migration of the previous image frame is warped based on the optical flow to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; finally, the intermediate feature of the style migration of the previous image frame is updated with the intermediate feature of the style migration of the subsequent image frame to obtain the stylized frame of the subsequent image frame after style migration. A full style migration pass is therefore not needed for the subsequent image frame; only the latter part of the style migration pipeline needs to be run on its intermediate feature. This saves computation, reduces the time consumed by video style migration, and improves the frame rate of video style migration without degrading the migration effect.
In order to further reduce the time consumed by video style migration, in some optional implementations of this embodiment, with further reference to fig. 2, the optical flow is a dense optical flow, and the optical flow of the two adjacent frames may be acquired as follows:
Step 201: reduce the previous image frame to the size of the intermediate feature of the style migration of the previous image frame.
In this implementation, the size of the intermediate feature of the style migration of the previous image frame may be determined in advance by the style migration network. Before reducing the previous image frame of the video, the execution subject of the video style migration method needs to acquire this size and reduce the previous image frame according to the size of each feature map in the intermediate feature; for example, if the size of the previous image frame is 256 × 256 and the size of each feature map is 32 × 32, the previous image frame needs to be reduced to 32 × 32.
Step 202: reduce the subsequent image frame of the two adjacent frames to the size of the intermediate feature of the style migration of the previous image frame.
In this optional implementation, a variety of algorithms may be used to reduce the previous image frame and the subsequent image frame (a resizing sketch follows this list):
1) The nearest-neighbor interpolation algorithm among classical interpolation methods, whose reconstruction function is a constant function. The simplest image scaling uses the same scaling ratio in the horizontal and vertical directions: on the basis of the original image, one point is taken every other pixel in each row, and the operation is repeated every other row, producing the scaled image.
2) The bilinear interpolation algorithm among classical interpolation methods, which exploits the correlation of the four pixel points surrounding the pixel point of the original image to be processed; its reconstruction function is computed from a bilinear function.
Reduced images obtained by classical interpolation methods may exhibit edge blurring, aliasing, and similar artifacts.
3) The surface interpolation method among non-classical interpolation methods fits a hyperbolic surface within the square region whose vertices are any 4 adjacent points in the image that form a square. Its effect is better than that of the classical interpolation methods, but the reference points are few and the local characteristics of the image are ignored, so when the image is reduced substantially, a small amount of aliasing may remain at the edges.
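A brief sketch of such a reduction with OpenCV follows; cv2.INTER_NEAREST and cv2.INTER_LINEAR correspond to the nearest-neighbor algorithm of 1) and the bilinear algorithm of 2) above, and the helper name and default size are illustrative assumptions.

import cv2
import numpy as np

def reduce_to_feature_size(frame: np.ndarray, feat_hw=(32, 32)) -> np.ndarray:
    h, w = feat_hw
    # cv2.resize takes (width, height); swap cv2.INTER_LINEAR for
    # cv2.INTER_NEAREST to use nearest-neighbor interpolation instead.
    return cv2.resize(frame, (w, h), interpolation=cv2.INTER_LINEAR)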
Step 203: calculate the dense optical flow between the reduced previous image frame and the reduced subsequent image frame using an optical flow method, and take the dense optical flow as the optical flow of the two adjacent frames.
In this optional implementation, an optical flow method is a method that uses the temporal change of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby calculates the motion information of objects between adjacent frames.
Optical flow methods can be classified into: gradient-based methods, matching-based methods, energy-based methods, phase-based methods, and the like.
In this embodiment, the reduced previous image frame and the reduced subsequent image frame differ from the original frames only in size; both are conventional frame images, so the calculation of the dense optical flow between them can be implemented by any of the optical flow methods described above.
In this optional implementation, based on the principle that the dense optical flow reflects the mapping relationship between image pixels, the previous image frame and the subsequent image frame are first reduced so that their sizes become the size of the intermediate feature, which is generally smaller than the sizes of the frames themselves; computing the dense optical flow at this smaller size makes the optical flow computation much cheaper.
Because dense optical flow is used between the two adjacent frames, the pixel points in solid-color regions of the image may not find accurate offsets, so a smoothing step needs to be added in order to obtain a more accurate intermediate feature of the style migration of the subsequent image frame. Specifically, in some optional implementations of this embodiment, with further reference to fig. 3, warping the intermediate feature of the style migration of the previous image frame based on the optical flow to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames may proceed as follows:
step 301, based on the optical flow, respectively performing warping processing on the intermediate features subjected to style transition in the reduced subsequent image frame and the previous image frame, and generating a warped subsequent image frame and a warped intermediate feature.
In this optional implementation, the warping the reduced subsequent image frame may include: the reduced subsequent image frame is shifted by the optical flow to generate a distorted subsequent image frame with the optical flow as a shift amount.
Step 302, based on the distorted subsequent image frame and the reduced subsequent image frame, smoothing the distorted intermediate feature to obtain a smoothed intermediate feature.
In the image processing technology, smoothing processing may also be called filtering or smoothing filtering, and smoothing filtering is a low-frequency enhanced spatial domain filtering technology, and its purpose is divided into two categories: one is blur; the other is noise cancellation. The smoothing filtering in the spatial domain is generally performed by a simple averaging method, that is, an average luminance value of neighboring pixel points is obtained. The size of the neighborhood is directly related to the smoothing effect, the larger the neighborhood is, the better the smoothing effect is, but the larger the neighborhood is, the larger the loss of the image edge information is due to the fact that the smoothing effect is, so that the output image becomes fuzzy, and therefore the size of the neighborhood needs to be reasonably selected.
In this optional implementation, the principle of smoothing the distorted intermediate feature is the same as that of smoothing in the image processing technology, and it is also for the purpose that the offset effect of each feature point in the distorted intermediate feature is relatively uniform.
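A short sketch of the mean and median filtering mentioned above, applied channel by channel to a warped intermediate feature held as a numpy array; the kernel size of 3 is an illustrative assumption.

import cv2
import numpy as np

def smooth_feature_map(feature: np.ndarray, method: str = "mean") -> np.ndarray:
    # feature: (C, H, W) warped intermediate feature; filter each channel map.
    out = np.empty_like(feature)
    for c in range(feature.shape[0]):
        if method == "mean":
            out[c] = cv2.blur(feature[c], (3, 3))  # neighborhood average
        else:
            out[c] = cv2.medianBlur(feature[c].astype(np.float32), 3)
    return out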
Furthermore, in order to make the smoothed intermediate feature approximate the actual intermediate feature of the style migration of the subsequent image frame more accurately, and considering that both the warped subsequent image frame and the reduced subsequent image frame reflect the characteristics of the subsequent image frame, the two are used as references when smoothing the warped intermediate feature, which improves the accuracy of the smoothed intermediate feature.
Furthermore, a large number of sample experiments may be performed on the style migration network to determine the correspondence among the warped subsequent image frame, the reduced subsequent image frame, and the smoothed intermediate feature; the warped intermediate feature is then smoothed according to this correspondence to obtain the smoothed intermediate feature.
Optionally, a correspondence table of warped subsequent image frames and reduced subsequent image frames is preset in the execution subject. After the current warped subsequent image frame and the current reduced subsequent image frame are obtained, the current smoothed intermediate feature corresponding to them is determined by looking up the correspondence table.
The warped intermediate feature is compared with the current smoothed intermediate feature; whenever the comparison result does not satisfy a preset condition, the warped intermediate feature is offset by a set multiple of the optical flow, and the resulting offset intermediate feature is the smoothed intermediate feature.
It should be noted that the preset condition may be determined according to the accuracy of the style migration network, for example, that the warped intermediate feature is less than 80% of the current smoothed intermediate feature. The set multiple may be determined according to the warping requirement, for example, 2.
Step 303: take the smoothed intermediate feature as the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames.
In this embodiment, the smoothed intermediate feature may replace the intermediate feature of the style migration of the subsequent image frame and be fed into the style migration network, so that the stylized frame of the subsequent image frame output by the style migration network can be obtained.
In the method for obtaining the intermediate feature of the style migration of the subsequent image frame provided by this optional implementation, the reduced previous image frame is warped to generate the warped subsequent image frame; based on the warped subsequent image frame and the reduced subsequent image frame, the warped intermediate feature is smoothed so that the offsets of the feature points in weaker regions of the warped intermediate feature are as close as possible to the offsets of the strong feature points, and the offsets of two adjacent feature points differ little, that is, the rate of change of the offsets is small. This ensures the accuracy of the intermediate feature of the style migration of the subsequent image frame and improves the accuracy of the style migration.
In some optional implementations of this embodiment, with further reference to fig. 4, smoothing the warped intermediate feature based on the warped subsequent image frame and the reduced subsequent image frame to obtain the smoothed intermediate feature may proceed as follows:
Step 401: calculate the first absolute difference between the warped subsequent image frame and the reduced subsequent image frame, where the first absolute difference is the absolute value of the difference between the pixel values at corresponding positions of the warped subsequent image frame and the reduced subsequent image frame.
In this optional implementation, the warped intermediate feature may be regarded as an approximation of the true intermediate feature of the subsequent image frame, but it may contain places that are not smooth or accurate enough. The first absolute difference, i.e., the optical flow mapping error, reflects those insufficiently accurate positions to a certain extent, providing a basis for subsequently correcting the insufficiently smooth and accurate places in the warped intermediate feature.
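The first absolute difference itself is a one-line computation; a sketch follows, with the names assumed for illustration.

import numpy as np

def first_abs_diff(warped_next: np.ndarray, reduced_next: np.ndarray) -> np.ndarray:
    # Absolute value of the difference between the pixel values at the
    # corresponding positions of the two small frames.
    return np.abs(warped_next.astype(np.float32) - reduced_next.astype(np.float32))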
Step 402: input the first absolute difference and the warped intermediate feature into the feature time-sequence migration model, to obtain the smoothed intermediate feature output by the model.
In this optional implementation, the working principle of the feature time-sequence migration model is as follows: generate the smoothed intermediate feature based on the first absolute difference and the warped intermediate feature. To achieve high accuracy of the output smoothed intermediate feature, the feature time-sequence migration model can be trained on a large number of data samples.
In the implementation shown in fig. 4, the first absolute difference is obtained first, and the warped intermediate feature is then processed using the first absolute difference and the feature time-sequence migration model to obtain the smoothed intermediate feature. Compared with the warped intermediate feature, the smoothed intermediate feature can be closer to the intermediate feature actually produced by the style migration of the subsequent image frame, thereby improving the accuracy and reliability of the style migration.
In another optional implementation of this embodiment, the inputs of the feature time-sequence migration model are the first absolute difference and the warped intermediate feature, its output is the smoothed intermediate feature, and the model takes keeping the second absolute difference below a preset threshold as its optimization target.
The second absolute difference is the absolute value of the difference between the pixel values at corresponding positions of the feature maps, channel by channel, of the smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame.
In this optional implementation, the feature time-sequence migration model may be a model obtained by end-to-end learning based on a convolutional neural network (CNN), which refines the smoothed intermediate feature. End-to-end learning is a one-step learning paradigm whose principle is: once the form of the input, the form of the output, and the supervision on the output are determined, the module can be trained directly. For example, when training the feature time-sequence migration model, its inputs are determined to be the first absolute difference and the warped intermediate feature, its output the smoothed intermediate feature, and the supervision is that the second absolute difference be smaller than the preset threshold.
Specifically, the preset threshold may change with the warped intermediate feature; for example, the preset threshold is the absolute value of the difference between the pixel values at corresponding positions of the feature maps, channel by channel, of the warped intermediate feature and the intermediate feature of the style migration of the subsequent image frame, so that the error between the smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame is minimized. Alternatively, the preset threshold may be a fixed value determined according to the accuracy of the style migration network.
In this optional implementation, the preset threshold is set so that the difference between the generated smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame is smaller than the difference between the warped intermediate feature and that intermediate feature. This realizes the refinement of the smoothed intermediate feature by the feature time-sequence migration module, reduces the unevenness and jitter of the warped intermediate feature, and ensures the accuracy and reliability of the smoothed intermediate feature. Furthermore, setting the preset threshold serves to minimize the error between the smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame during video style migration.
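A minimal sketch, assuming PyTorch, of what such a feature time-sequence migration model could look like: a small CNN that takes the warped intermediate feature and the first absolute difference and outputs the smoothed intermediate feature, trained end to end with an L1 objective. The residual design, channel counts, and names are assumptions of this sketch, not the patent's exact model.

import torch
import torch.nn as nn

class FeatureTimingMigration(nn.Module):
    def __init__(self, feat_channels: int = 256, diff_channels: int = 3):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_channels + diff_channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, feat_channels, 3, padding=1),
        )

    def forward(self, warped_feat: torch.Tensor, abs_diff: torch.Tensor):
        x = torch.cat((warped_feat, abs_diff), dim=1)
        # Residual refinement of the warped feature guided by the flow error.
        return warped_feat + self.refine(x)

# End-to-end supervision: drive the second absolute difference (L1 error
# against the true intermediate feature of the subsequent frame) below the
# preset threshold; here it is simply minimized.
loss_fn = nn.L1Loss()
# loss = loss_fn(model(warped_feat, abs_diff), true_subsequent_feature)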
In another optional implementation of this embodiment, acquiring the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video comprises: inputting the previous image frame of the two adjacent frames into the encoder of the generator of a generative adversarial network, to obtain the intermediate feature output by the encoder of the generator of the generative adversarial network.
In this optional implementation, the generative adversarial network is used to obtain the intermediate feature of the style migration of the previous image frame, so the speed of video style migration can be improved without changing the structure or the computational load of the generative adversarial network.
The generative adversarial network (GAN) employed by this implementation is introduced as follows: the GAN takes as input a picture X of style A (e.g., real-person style) with size (3, 256, 256), where 3 is the number of color channels, and outputs a picture Y of style B (e.g., cartoon style) with size (3, 256, 256). The GAN can be divided into two parts: an encoder and a decoder. The encoder downsamples and encodes picture X to obtain an intermediate feature F of smaller spatial size and larger channel count, with size (C, 32, 32); the decoder then upsamples from F to generate the output picture Y. Typically, the encoder and the decoder have the same computational cost, each accounting for 50% of the total.
Each execution step of a specific implementation of the video style migration method of this embodiment is described in detail below with reference to figs. 5 to 9:
A. For the (T-1)-th image frame X_{T-1} in the video (i.e., the previous image frame of the two adjacent frames, as shown in fig. 5), the execution subject may use the encoder and decoder of the GAN network to generate the corresponding stylized picture Y_{T-1} (i.e., the stylized frame after the style migration of the previous image frame, as shown in fig. 6), and store its intermediate feature F_{T-1} (i.e., the intermediate feature of the style migration of the previous image frame).
B. Scale the two frames X_{T-1} and X_T (i.e., the subsequent image frame of the two adjacent frames, as shown in fig. 7) down to 32 × 32, i.e., to the same size as the intermediate feature F_{T-1}, obtaining x_{T-1} (the reduced previous image frame) and x_T (the reduced subsequent image frame), and calculate the dense optical flow f between the two reduced frames (as shown in fig. 8). The dense optical flow f reflects the mapping relationship between image pixels; because it is computed at a small scale, the computation is very fast, about 3 ms.
C. Based on the dense optical flow f, warp the intermediate feature F_{T-1} to approximately generate the intermediate feature F'_T at time T (i.e., the warped intermediate feature); at the same time, warp x_{T-1} to obtain x'_T (i.e., the warped subsequent image frame).
D. Compute the first absolute difference Δx based on x'_T and x_T, input F'_T and Δx into the feature time-sequence migration module, and obtain F''_T (i.e., the smoothed intermediate feature) output by the module; the feature time-sequence migration module performs smoothing on the warped feature map.
E. Based on F''_T, use the decoder of the generator of the GAN network to generate the stylized result Y_T at time T (i.e., the stylized frame after the style migration of the subsequent image frame, as shown in fig. 9).
Steps A to E are repeated cyclically over the video for the (T+1)-th frame image, the (T+2)-th frame image, and so on: one of every two image frames of the video is fully generated by the encoder and decoder of the generator of the GAN network, while the other frame requires only the decoder of the generator of the GAN network, which saves computation.
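Putting steps A through E together, the following sketch reuses the illustrative helpers from the earlier snippets (StyleMigrationNet, dense_flow, reduce_to_feature_size, warp_feature, FeatureTimingMigration); it is an assumed reconstruction for illustration, not the patent's reference implementation.

import numpy as np
import torch

def to_tensor(img: np.ndarray) -> torch.Tensor:
    # (H, W, 3) uint8 frame -> (1, 3, H, W) float tensor in [0, 1].
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float() / 255.0

def stylize_pair(net, timing_model, frame_prev, frame_next):
    with torch.no_grad():
        # A. Full encoder+decoder pass on the previous frame; cache its feature.
        stylized_prev, feat_prev = net(to_tensor(frame_prev))

        # B. Reduce both frames to the feature size and compute the dense flow.
        h, w = feat_prev.shape[-2:]
        small_prev = reduce_to_feature_size(frame_prev, (h, w))
        small_next = reduce_to_feature_size(frame_next, (h, w))
        flow = torch.from_numpy(dense_flow(small_prev, small_next))
        flow = flow.permute(2, 0, 1).unsqueeze(0).float()

        # C. Warp the cached feature and the reduced previous frame.
        feat_warped = warp_feature(feat_prev, flow)
        warped_next = warp_feature(to_tensor(small_prev), flow)

        # D. First absolute difference, then the smoothing model.
        abs_diff = (warped_next - to_tensor(small_next)).abs()
        feat_smoothed = timing_model(feat_warped, abs_diff)

        # E. Decoder-only pass generates the stylized subsequent frame.
        stylized_next = net.decoder(feat_smoothed)
    return stylized_prev, stylized_next

Under this scheme, roughly half of the per-frame computation (the encoder) is skipped for every second frame, which is where the frame-rate gain described above comes from.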
Furthermore, in this embodiment, when the GAN network is used to generate multi-frame images in this way, generation is comparatively fast, so a real-time effect can be achieved on the mobile terminal when generating video, which improves the user experience.
With further reference to fig. 10, as an implementation of the method shown in the above figures, the present application provides an embodiment of a video style migration apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in fig. 10, the video style migration apparatus 1000 according to the present embodiment includes: a feature acquisition module 1001, an optical flow acquisition module 1002, a warping processing module 1003, and a style migration module 1004. The feature obtaining module 1001 may be configured to obtain an intermediate feature for performing style migration on a previous image frame of two adjacent frames in the video. The optical flow obtaining module 1002 may be configured to obtain optical flows of two adjacent frames. The warping processing module 1003 may be configured to warp the intermediate feature subjected to style migration of the previous image frame based on the optical flow, so as to obtain the intermediate feature subjected to style migration of the subsequent image frame in two adjacent frames. The style migration module 1004 may be configured to update the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame, so as to obtain a stylized frame after the style migration of the subsequent image frame.
In this embodiment, in the video style migration apparatus 1000: the specific processing and the technical effects of the feature obtaining module 1001, the optical flow obtaining module 1002, the warping processing module 1003, and the style migration module 1004 may refer to the related descriptions of step 101, step 102, step 103, and step 104 in the corresponding embodiment of fig. 1, which are not described herein again.
In some optional implementations of this embodiment, the optical flow is a dense optical flow, and the optical flow acquisition module 1002 includes: a previous-frame reduction unit (not shown), a subsequent-frame reduction unit (not shown), and an optical flow calculation unit (not shown). The previous-frame reduction unit may be configured to reduce the previous image frame to the size of the intermediate feature of the style migration of the previous image frame. The subsequent-frame reduction unit may be configured to reduce the subsequent image frame of the two adjacent frames to the size of the intermediate feature of the style migration of the previous image frame. The optical flow calculation unit may be configured to calculate the dense optical flow between the reduced previous image frame and the reduced subsequent image frame using an optical flow method, and to take the dense optical flow as the optical flow of the two adjacent frames.
In some optional implementations of this embodiment, the warping processing module 1003 includes: a feature warping unit (not shown), a smoothing unit (not shown), and a feature transfer unit (not shown). The feature warping unit may be configured to respectively warp, based on the optical flow, the reduced previous image frame and the intermediate feature of the style migration of the previous image frame, generating the warped subsequent image frame and the warped intermediate feature. The smoothing unit may be configured to smooth the warped intermediate feature based on the warped subsequent image frame and the reduced subsequent image frame, to obtain the smoothed intermediate feature. The feature transfer unit may be configured to take the smoothed intermediate feature as the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames.
In some optional implementations of this embodiment, the smoothing unit includes: a parameter calculation subunit (not shown) and a feature output subunit (not shown). The parameter calculation subunit may be configured to calculate the first absolute difference between the warped subsequent image frame and the reduced subsequent image frame, where the first absolute difference is the absolute value of the difference between the pixel values at corresponding positions of the warped subsequent image frame and the reduced subsequent image frame. The feature output subunit may be configured to input the first absolute difference and the warped intermediate feature into the feature time-sequence migration model, to obtain the smoothed intermediate feature output by the model.
Further, the inputs of the feature time-sequence migration model are the first absolute difference and the warped intermediate feature, its output is the smoothed intermediate feature, and the model takes keeping the second absolute difference below a preset threshold as its optimization target; the second absolute difference is the absolute value of the difference between the pixel values at corresponding positions of the feature maps, channel by channel, of the smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame.
Specifically, the preset threshold may change with the warped intermediate feature; for example, the preset threshold is the absolute value of the difference between the pixel values at corresponding positions of the feature maps, channel by channel, of the warped intermediate feature and the intermediate feature of the style migration of the subsequent image frame, so that the error between the smoothed intermediate feature and the intermediate feature of the style migration of the subsequent image frame is minimized. Alternatively, the preset threshold may be a fixed value determined according to the accuracy of the style migration network.
In some optional implementations of this embodiment, the feature acquisition module 1001 includes: a feature acquisition unit (not shown). The feature acquisition unit may be configured to input the previous image frame of two adjacent frames in the video into the encoder of the generator of a generative adversarial network, to obtain the intermediate feature output by the encoder of the generator of the generative adversarial network.
In the video style migration apparatus provided by this embodiment of the present application, the feature acquisition module 1001 first acquires the intermediate feature of the style migration of the previous image frame of two adjacent frames in a video; the optical flow acquisition module 1002 then acquires the optical flow of the two adjacent frames, and the warping processing module 1003 warps the intermediate feature of the style migration of the previous image frame based on the optical flow, to obtain the intermediate feature of the style migration of the subsequent image frame of the two adjacent frames; finally, the style migration module 1004 updates the intermediate feature of the style migration of the previous image frame with the intermediate feature of the style migration of the subsequent image frame, to obtain the stylized frame of the subsequent image frame after style migration. A full style migration pass is therefore not needed for the subsequent image frame; only the latter part of the style migration pipeline needs to be run on its intermediate feature. This saves computation, reduces the time consumed by video style migration, and improves the frame rate of video style migration without degrading the migration effect.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 11 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 11, the electronic device includes: one or more processors 1101, a memory 1102, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories. Also, multiple electronic devices may be connected, with each device providing some of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 11, one processor 1101 is taken as an example.
The memory 1102 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the video style migration method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the video style migration method provided herein.
Memory 1102, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the video style migration method in the embodiments of the present application (e.g., feature acquisition module 1001, optical flow acquisition module 1002, warping processing module 1003, and style migration module 1004 shown in fig. 10). The processor 1101 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 1102, that is, implements the video style migration method in the above-described method embodiment.
The memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the video style migration method, and the like. Further, the memory 1102 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely from the processor 1101, which may be connected to the electronic device of the video style migration method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the video style migration method may further include: an input device 1103 and an output device 1104. The processor 1101, memory 1102, input device 1103, and output device 1104 may be connected by a bus 1105 or other means, such as by bus 1105 in fig. 11.
The input device 1103 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the video style migration method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 1104 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that steps may be reordered, added, or deleted in the various flows described above. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (11)

1. A video style migration method, the method comprising:
acquiring an intermediate feature of style migration of a preceding image frame of two adjacent frames in a video;
acquiring an optical flow of the two adjacent frames;
warping, based on the optical flow, the intermediate feature of the style migration of the preceding image frame to obtain an intermediate feature for style migration of a subsequent image frame of the two adjacent frames; and
updating the intermediate feature of the style migration of the preceding image frame with the intermediate feature for style migration of the subsequent image frame, to obtain a stylized frame of the subsequent image frame after the style migration;
wherein the optical flow is a dense optical flow, and the acquiring the optical flow of the two adjacent frames comprises:
reducing the preceding image frame to the size of the intermediate feature of the style migration of the preceding image frame;
reducing the subsequent image frame of the two adjacent frames to the size of the intermediate feature of the style migration of the preceding image frame; and
calculating, by an optical flow method, the dense optical flow between the reduced preceding image frame and the reduced subsequent image frame, and taking the dense optical flow as the optical flow of the two adjacent frames.
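The reduce-then-flow step just recited can be pictured with a short sketch. The Python snippet below is a hedged illustration only, assuming OpenCV is available; the choice of Farneback's algorithm and every name in it (dense_flow_at_feature_size, feat_h, feat_w) are assumptions, since the claim does not fix a particular dense optical flow method.

```python
# Minimal sketch of the optical-flow step in claim 1 (illustrative, not the
# patented implementation): both adjacent frames are reduced to the spatial
# size of the intermediate feature map, then a dense optical flow is computed
# between the reduced frames. Farneback's method is an assumed stand-in for
# the unspecified "optical flow method".
import cv2
import numpy as np

def dense_flow_at_feature_size(prev_frame: np.ndarray,
                               next_frame: np.ndarray,
                               feat_h: int, feat_w: int) -> np.ndarray:
    """Return a dense flow of shape (feat_h, feat_w, 2) between the
    reduced preceding and subsequent frames."""
    prev_small = cv2.resize(prev_frame, (feat_w, feat_h))
    next_small = cv2.resize(next_frame, (feat_w, feat_h))
    prev_gray = cv2.cvtColor(prev_small, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_small, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```

Computing the flow at the feature map's resolution rather than at full frame size keeps the warp cheap and lets the flow be applied to the intermediate feature directly.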
2. The method of claim 1, wherein the warping, based on the optical flow, the intermediate feature of the style migration of the preceding image frame to obtain the intermediate feature for style migration of the subsequent image frame of the two adjacent frames comprises:
warping, based on the optical flow, the reduced subsequent image frame and the intermediate feature of the style migration of the preceding image frame respectively, to generate a warped subsequent image frame and a warped intermediate feature;
smoothing the warped intermediate feature based on the warped subsequent image frame and the reduced subsequent image frame, to obtain a smoothed intermediate feature; and
taking the smoothed intermediate feature as the intermediate feature for style migration of the subsequent image frame of the two adjacent frames.
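The warping of claim 2 also admits a compact sketch. The PyTorch code below is an assumption-laden illustration: it converts a dense pixel-unit flow into a sampling grid for grid_sample and applies the same backward warp to both the reduced subsequent frame and the preceding frame's intermediate feature; warp_with_flow and the tensor layouts are hypothetical names, not taken from the patent.

```python
# Hedged sketch of the warping step in claim 2: the same backward warp is
# applied to the reduced subsequent image frame and to the intermediate
# feature of the preceding frame. Tensor layouts are assumptions.
import torch
import torch.nn.functional as F

def warp_with_flow(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp an (N, C, H, W) tensor by an (N, 2, H, W) pixel-unit flow."""
    n, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0).to(x.device)  # (1,2,H,W)
    coords = base + flow                      # absolute sampling positions
    # Normalize coordinates to [-1, 1], the range grid_sample expects.
    coords[:, 0] = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = coords.permute(0, 2, 3, 1)         # (N, H, W, 2), x before y
    return F.grid_sample(x, grid, align_corners=True)

# Illustrative use (flow from the OpenCV sketch above, converted to a tensor):
# flow_t = torch.from_numpy(flow).permute(2, 0, 1).unsqueeze(0).float()
# warped_frame = warp_with_flow(next_small_t, flow_t)  # warped subsequent frame
# warped_feat  = warp_with_flow(prev_feat, flow_t)     # warped intermediate feature
```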
3. The method of claim 2, wherein the smoothing the warped intermediate feature based on the warped subsequent image frame and the reduced subsequent image frame, to obtain the smoothed intermediate feature comprises:
calculating a first absolute difference between the warped subsequent image frame and the reduced subsequent image frame, wherein the first absolute difference is the absolute value of the difference between pixel values at corresponding positions of the warped subsequent image frame and the reduced subsequent image frame; and
inputting the first absolute difference and the warped intermediate feature into a feature temporal migration model, to obtain the smoothed intermediate feature output by the feature temporal migration model.
4. The method according to claim 3, wherein the inputs of the feature temporal migration model are the first absolute difference and the warped intermediate feature, the output of the feature temporal migration model is the smoothed intermediate feature, and the feature temporal migration model is trained with the optimization target of keeping a second absolute difference below a preset threshold;
wherein the second absolute difference is the absolute value of the difference between pixel values at corresponding positions of the feature maps, channel by channel, of the smoothed intermediate feature and the intermediate feature obtained by performing style migration on the subsequent image frame directly.
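Claims 3 and 4 together describe the feature temporal migration model: it consumes the first absolute difference and the warped intermediate feature, emits a smoothed feature, and is trained so the second absolute difference stays below a preset threshold. The sketch below assumes a small convolutional network and an L1-style penalty as the optimization target; the architecture and the names FeatureTemporalMigration and second_difference_loss are hypothetical readings, as the claims fix neither a network design nor a loss.

```python
# Hedged sketch of claims 3-4: a small convolutional model maps the first
# absolute difference plus the warped feature to a smoothed feature, and an
# L1 penalty against the directly stylized feature stands in for "second
# absolute difference below a preset threshold". All design choices are assumed.
import torch
import torch.nn as nn

class FeatureTemporalMigration(nn.Module):
    """Illustrative stand-in for the feature temporal migration model."""
    def __init__(self, feat_channels: int, diff_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_channels + diff_channels, feat_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )

    def forward(self, first_abs_diff: torch.Tensor,
                warped_feat: torch.Tensor) -> torch.Tensor:
        # Concatenate the first absolute difference with the warped feature
        # along the channel axis, per claim 3, and predict the smoothed feature.
        return self.net(torch.cat([first_abs_diff, warped_feat], dim=1))

def second_difference_loss(smoothed_feat: torch.Tensor,
                           direct_feat: torch.Tensor) -> torch.Tensor:
    # Mean absolute difference between corresponding feature-map positions,
    # channel by channel; minimizing it pushes the second absolute difference
    # of claim 4 under its preset threshold.
    return (smoothed_feat - direct_feat).abs().mean()
```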
5. The method according to any one of claims 1 to 4, wherein the acquiring the intermediate feature of style migration of the preceding image frame of two adjacent frames in the video comprises:
inputting the preceding image frame of the two adjacent frames in the video into an encoder of a generator of a generative adversarial network, to obtain the intermediate feature output by the encoder of the generator of the generative adversarial network.
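Claim 5 ties the intermediate feature to the encoder half of a generative adversarial network's generator. As a sketch only, assuming an encoder-decoder generator whose encoder and decoder attribute names are hypothetical rather than the patent's API:

```python
# Illustrative sketch of claim 5, assuming an encoder-decoder generator; the
# attribute names `encoder` and `decoder` are assumptions, not the patent's API.
import torch

@torch.no_grad()
def extract_intermediate_feature(generator, prev_frame: torch.Tensor) -> torch.Tensor:
    # The preceding frame passes through the generator's encoder; its output
    # is the intermediate feature that the method warps and smooths.
    return generator.encoder(prev_frame)

# The stylized subsequent frame would then come from decoding the warped,
# smoothed feature, e.g. generator.decoder(smoothed_feat).
```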
6. A video style migration apparatus, the apparatus comprising:
a feature acquisition module configured to acquire an intermediate feature of style migration of a preceding image frame of two adjacent frames in a video;
an optical flow acquisition module configured to acquire an optical flow of the two adjacent frames;
a warping module configured to warp, based on the optical flow, the intermediate feature of the style migration of the preceding image frame to obtain an intermediate feature for style migration of a subsequent image frame of the two adjacent frames; and
a style migration module configured to update the intermediate feature of the style migration of the preceding image frame with the intermediate feature for style migration of the subsequent image frame, to obtain a stylized frame of the subsequent image frame after the style migration;
wherein the optical flow is a dense optical flow, and the optical flow acquisition module comprises:
a preceding frame reduction unit configured to reduce the preceding image frame to the size of the intermediate feature of the style migration of the preceding image frame;
a subsequent frame reduction unit configured to reduce the subsequent image frame of the two adjacent frames to the size of the intermediate feature of the style migration of the preceding image frame; and
an optical flow calculation unit configured to calculate, by an optical flow method, the dense optical flow between the reduced preceding image frame and the reduced subsequent image frame, and to take the dense optical flow as the optical flow of the two adjacent frames.
7. The apparatus of claim 6, wherein the warping module comprises:
a feature warping unit configured to warp, based on the optical flow, the reduced subsequent image frame and the intermediate feature of the style migration of the preceding image frame respectively, to generate a warped subsequent image frame and a warped intermediate feature;
a smoothing unit configured to smooth the warped intermediate feature based on the warped subsequent image frame and the reduced subsequent image frame, to obtain a smoothed intermediate feature; and
a feature transfer unit configured to take the smoothed intermediate feature as the intermediate feature for style migration of the subsequent image frame of the two adjacent frames.
8. The apparatus of claim 7, wherein the smoothing unit comprises:
a parameter calculation subunit configured to calculate a first absolute difference between the warped subsequent image frame and the reduced subsequent image frame, wherein the first absolute difference is the absolute value of the difference between pixel values at corresponding positions of the warped subsequent image frame and the reduced subsequent image frame; and
a feature output subunit configured to input the first absolute difference and the warped intermediate feature into a feature temporal migration model, to obtain the smoothed intermediate feature output by the feature temporal migration model.
9. The apparatus according to any one of claims 6 to 8, wherein the feature acquisition module comprises:
a feature acquisition unit configured to input the preceding image frame of the two adjacent frames in the video into an encoder of a generator of a generative adversarial network, to obtain the intermediate feature output by the encoder of the generator of the generative adversarial network.
10. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
11. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-5.
CN202010327039.XA 2020-04-23 2020-04-23 Video style migration method and device Active CN111556244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010327039.XA CN111556244B (en) 2020-04-23 2020-04-23 Video style migration method and device


Publications (2)

Publication Number Publication Date
CN111556244A CN111556244A (en) 2020-08-18
CN111556244B true CN111556244B (en) 2022-03-11

Family

ID=72007707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010327039.XA Active CN111556244B (en) 2020-04-23 2020-04-23 Video style migration method and device

Country Status (1)

Country Link
CN (1) CN111556244B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012038B (en) * 2021-03-19 2023-11-28 深圳市兴海物联科技有限公司 Image style migration processing method, mobile terminal and cloud server
CN113256484B (en) * 2021-05-17 2023-12-05 百果园技术(新加坡)有限公司 Method and device for performing stylization processing on image


Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN103208110B (en) * 2012-01-16 2018-08-24 展讯通信(上海)有限公司 The conversion method and device of video image
US10726560B2 (en) * 2014-10-31 2020-07-28 Fyusion, Inc. Real-time mobile device capture and generation of art-styled AR/VR content
US10504267B2 (en) * 2017-06-06 2019-12-10 Adobe Inc. Generating a stylized image or stylized animation by matching semantic features via an appearance guide, a segmentation guide, and/or a temporal guide
CN110175951B (en) * 2019-05-16 2022-12-02 西安电子科技大学 Video style migration method based on time domain consistency constraint

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN102355555A (en) * 2011-09-22 2012-02-15 中国科学院深圳先进技术研究院 Video processing method and system
CN105139438A (en) * 2014-09-19 2015-12-09 电子科技大学 Video face cartoon animation generation method
CN105303598A (en) * 2015-10-23 2016-02-03 浙江工业大学 Multi-style video artistic processing method based on texture transfer
CN107481185A (en) * 2017-08-24 2017-12-15 深圳市唯特视科技有限公司 A kind of style conversion method based on video image optimization
CN109359687A (en) * 2018-10-19 2019-02-19 百度在线网络技术(北京)有限公司 Video style conversion process method and device
CN110533579A (en) * 2019-07-26 2019-12-03 西安电子科技大学 Based on the video style conversion method from coding structure and gradient order-preserving
CN110798676A (en) * 2019-11-29 2020-02-14 苏州新光维医疗科技有限公司 Method and device for forming 3D vision by using dynamic image of endoscope lens

Non-Patent Citations (1)

Title
"Deep Learning Based Artistic Stylization of Images"; Qiao Lisha; China Master's Theses Full-text Database, Information Science and Technology Series; 2019-01-15; pp. I138-998 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant