CN113365107A - Video processing method, movie video processing method and device

Video processing method, movie video processing method and device

Info

Publication number
CN113365107A
CN113365107A
Authority
CN
China
Prior art keywords
video
resolution
processing
output
downsampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010147566.2A
Other languages
Chinese (zh)
Other versions
CN113365107B (en)
Inventor
刘国栋 (Liu Guodong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010147566.2A
Publication of CN113365107A
Application granted
Publication of CN113365107B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a video processing method comprising the following steps: acquiring a first video, where the first video is obtained by down-sampling an original video and has a resolution lower than that of the original video; processing the first video to obtain a second video, where the resolution of the second video is the same as that of the first video; and inputting the second video into a preset resolution enhancement component, which increases the resolution of the second video to output a result video whose resolution is the same as that of the original video. The invention also discloses a corresponding movie video processing method and a computing device.

Description

Video processing method, movie video processing method and device
Technical Field
The invention relates to the technical field of video processing, and in particular to a video processing method, a movie video processing method, and a corresponding apparatus.
Background
With the development of video capture and display equipment and the continued expansion of network bandwidth, a large number of videos are shot at high-definition resolution, efficiently encoded and transmitted after post-production, and played on high-quality display equipment for presentation to users.
High-resolution video provides a good visual experience for the user, but it also increases the cost of post-production. Post-production of high-resolution video is time-consuming and places higher demands on the storage and computing capacity of video production equipment. In the field of movie production in particular, a large number of films are now shot at ultra-high-definition resolutions of 4K or even 8K, with very fine image quality. Post-producing such film and television videos requires enormous manpower, time, and software and hardware costs; the cost is very high and the efficiency is very low.
Disclosure of Invention
To this end, the present invention provides a video processing method, a movie video processing method, and an apparatus that seek to solve, or at least alleviate, the problems described above.
According to a first aspect of the present invention, there is provided a video processing method comprising: acquiring a first video, where the first video is obtained by down-sampling an original video and has a resolution lower than that of the original video; processing the first video to obtain a second video, where the resolution of the second video is the same as that of the first video; and inputting the second video into a preset resolution enhancement component, which increases the resolution of the second video to output a result video whose resolution is the same as that of the original video.
According to a second aspect of the present invention, there is provided a method for processing a movie video, comprising the steps of: acquiring a first video output by a movie production device, wherein the first video is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video; performing post-processing on the first video to obtain a second video, wherein the resolution of the second video is the same as that of the first video; and inputting the second video into a preset resolution enhancement component, which increases the resolution of the second video to output a result video, wherein the resolution of the result video is the same as that of the original movie video.
According to a third aspect of the present invention, there is provided a method for processing a movie video, comprising the steps of: acquiring a first video output by a movie production device, wherein the first video is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video; performing post-processing on the first video to obtain a second video, wherein the resolution of the second video is the same as that of the first video, and the post-processing does not include color grading; inputting the second video into a preset resolution enhancement component, which increases the resolution of the second video to output a third video, wherein the resolution of the third video is the same as that of the original movie video; and performing color grading on the third video to obtain a result video.
According to a fourth aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions that, when read and executed by the processor, cause the computing device to perform the above-described video processing method or movie video processing method.
According to a fifth aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to execute the above-described video processing method or movie video processing method.
In the video processing scheme of the present invention, the first video is a low-resolution video obtained by down-sampling a high-resolution original video. The low-resolution first video is processed to obtain a second video; the resolution is not changed during this processing, so the second video is still low-resolution. The second video is then input into a preset resolution enhancement component, which is adapted to convert the low-resolution second video into a high-resolution result video.
The video processing scheme of the invention processes the down-sampled video of the original video, i.e., the low-resolution first video, instead of directly processing the high-resolution original video, and then restores the low-resolution video to high resolution through the resolution enhancement component, thereby greatly reducing the cost of video post-production and improving its efficiency.
Furthermore, the resolution enhancement component comprises a conversion unit and a super-resolution unit. The conversion unit is adapted to preprocess the second video and output a third video; the super-resolution unit is adapted to upscale the third video and output a high-resolution result video. The conversion unit does not change the resolution of the video but improves its image quality, so that after its output is fed into the super-resolution unit, the super-resolution unit can output a high-quality, high-resolution result video. That is, the third video has the same resolution as the second video but higher image quality, enabling the super-resolution unit to output high-quality, high-resolution video.
The conversion unit and the super-resolution unit are both implemented as neural networks. The network structure of the super-resolution unit is very deep and highly complex, so the time cost of retraining it is very high; the conversion unit is lightweight and its structure is much simpler than that of the super-resolution unit, so it trains quickly and is easy to retrain. In the technical scheme of the invention, the conversion unit can be retrained for different application scenarios (i.e., for the different down-sampling methods used to generate the first video) while the super-resolution unit is kept unchanged, which greatly improves the training efficiency of the resolution enhancement component. In addition, within the resolution enhancement component, the video is first preprocessed by the conversion unit into the state most favorable for resolution enhancement by the super-resolution unit, ensuring that the super-resolution unit outputs high-quality, high-resolution video. Therefore, the video processing scheme of the invention saves cost and improves efficiency while ensuring the image quality of the result video, so that the result video can have almost the same visual effect as the original video.
The foregoing is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly, and that the above and other objects, features, and advantages of the present invention may become more readily apparent, embodiments of the present invention are described below.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a prior art video production process;
FIG. 2 shows a schematic diagram of a video production process according to one embodiment of the invention;
FIG. 3 shows a flow diagram of a video processing method 300 according to one embodiment of the invention;
FIG. 4 shows a schematic diagram of a resolution enhancement assembly according to one embodiment of the invention;
FIG. 5 shows a block diagram of a conversion unit according to one embodiment of the invention;
FIG. 6 shows a block diagram of a residual module according to one embodiment of the invention;
FIG. 7 shows a block diagram of a residual channel attention module, according to one embodiment of the invention;
FIG. 8 illustrates a flow diagram of a method 800 for movie video processing according to one embodiment of the present invention;
FIG. 9 is a diagram illustrating a movie video processing procedure according to one embodiment of the invention;
FIG. 10 illustrates a flow diagram of a method 1000 of movie video processing according to one embodiment of the invention;
FIG. 11 is a diagram illustrating a movie video processing procedure according to another embodiment of the present invention;
FIG. 12 shows a schematic diagram of a computing device 1200, according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a schematic diagram of a video production process of the prior art. As shown in fig. 1, first, a high-resolution original film, i.e., an original video, is shot with a video capture device (e.g., a video camera, a mobile phone, etc.). The original resolution may be, for example, 4K (4096 × 2160 pixels), 8K, or another value; 4K resolution is used as the example in fig. 1 to describe the conventional video production process.
After receiving the original footage, post-production staff perform post-processing such as editing, special-effects production, and color grading, and finally output the finished film. The entire post-processing procedure operates at 4K resolution. Because of the high resolution, the demands on the software and hardware used for post-processing are very high: older equipment can hardly meet them and must be upgraded or replaced, and post-processing is time-consuming and inefficient.
To address the high cost and low efficiency caused by directly post-processing a high-resolution original video in the prior art, the invention provides a video processing method that reduces the cost of video post-processing and improves its efficiency.
Fig. 2 shows a schematic diagram of a video production process implemented on the basis of the video processing method of the present invention. As shown in fig. 2, first, a high-resolution original film (a 4K original in fig. 2 as an example), i.e., the original video, is shot with a video capture device. The original is then downsampled to obtain a low-resolution video with a lower resolution than the original. Post-processing such as editing, special-effects production, and color grading is then performed on the low-resolution video; the processed low-resolution video is input into a preset resolution enhancement component, which converts it into a high-resolution video and outputs the high-resolution finished film (a 4K film). In this process, the post-processing of the video is carried out at low resolution, which greatly reduces the cost of video post-processing and improves its efficiency.
It should be noted that although the technical effect of the video processing scheme of the present invention is described above using video post-production as an example, those skilled in the art will understand that the scheme can be applied not only to video post-production but also to scenarios such as video on demand, live video, and short-video processing and playback, improving the efficiency of video processing, audio-video communication, and playback.
It should be understood by those skilled in the art that the present invention is not limited to the specific application scenario of the video processing scheme, and any application scenario to which the video processing scheme of the present invention is applied is within the scope of the present invention.
For example, in one embodiment, the video processing scheme of the present invention may be integrated into a variety of media processing tools. When a video needs to be processed by several media processing tools in turn, according to the video processing scheme of the present invention, the high-resolution video to be processed can be converted into a low-resolution video, each media processing tool processes the video at low resolution, and the processed low-resolution video is transmitted to the next media processing tool over a communication channel such as a mobile communication network (the internet, 3G, 4G, 5G, GPRS, etc.) or a near-field network such as Wi-Fi or Bluetooth™. After the last media processing tool has processed the low-resolution video, that tool can restore it to high resolution with the resolution enhancement component of the present invention. Throughout this process, both the video processing and the communication are carried out at low resolution, which greatly improves video processing and network communication efficiency.
In another embodiment, the video processing scheme of the present invention may be integrated into a user's terminal device, including but not limited to a mobile phone, a tablet computer, a multimedia playing device, a smart wearable device, and the like. The user can access the network on the terminal device to request movies, TV series, and short videos of interest, or to watch live video, and so on. Users always want high-resolution video to improve the viewing experience, but obtaining high-resolution video requires high network bandwidth. Moreover, some video assets in the network may exist only in low-resolution versions, with no high-resolution version available, because they were shot long ago or with poorly configured equipment. In these cases, based on the video processing scheme provided by the invention, a user can acquire a low-resolution video from the network through the terminal device and then convert it into a high-resolution video with the resolution enhancement component of the invention, which saves network bandwidth while improving the viewing experience. The video processing scheme of the present invention is described in detail below.
Fig. 3 shows a flow diagram of a video processing method 300 according to an embodiment of the invention. The method 300 is executed in a computing device for achieving low-cost, high-efficiency video post-processing while ensuring processing effectiveness. The computing device may be any device suitable for image processing, such as a workstation dedicated to processing images and video data, or a personal computer such as a desktop computer and a notebook computer, and may also be a mobile phone, a tablet computer, an internet of things device, and the like, but is not limited thereto. As shown in fig. 3, the method 300 begins at step S310.
In step S310, a first video, which is a video obtained by down-sampling an original video and having a resolution lower than that of the original video, is acquired.
In an embodiment of the invention, the resolution of the video is the resolution of the image frames of the video.
The original video is the unprocessed video captured by a video capture device. The resolution of the original video is generally high, for example 1080P (1920 × 1080 pixels), 4K (4096 × 2160 pixels), or 8K (7680 × 4320 pixels). The original video can be any video; the invention does not limit its source, type, subject, and so on. For example, the original video may be a movie, TV series, or television program captured by professional equipment such as a film camera, or an everyday video captured by an ordinary user with a terminal device such as a mobile phone or tablet computer.
It should be noted that the resolution of the original video may be any value; the invention does not limit it. In addition, the terms high resolution and low resolution as used herein are relative concepts and do not correspond to specific numerical ranges. In the embodiments of the present invention, the resolutions of the original video, the first sample video, and the second sample video are regarded as high, and correspondingly the resolutions of their down-sampled videos are regarded as low; the invention does not limit the specific values of the high and low resolutions.
In step S310, the first video is a low-resolution video obtained by down-sampling the original video. The original video is downsampled, that is, each image frame of the original video is downsampled, the resolution of each image frame is reduced, and the downsampled image frames with low resolution are combined to form the first video with low resolution. For example, the original video is a movie video, and the first video is a video output by a movie production device downsampling the movie video.
It should be noted that the present invention is not limited to the down-sampling method employed to generate the first video. Specifically, the first video may be generated using a specified downsampling method including, but not limited to, nearest neighbor sampling, quadratic interpolation, bicubic interpolation, and the like. The first video may also be generated and output by a video production device. The video production device can be, for example, a video acquisition device such as a video camera and a video camera, and can also be a video post-processing device such as a high-performance workstation and a digital intermediate film system. The video production apparatus itself can convert the high-resolution original video into the low-resolution first video for output, but it is unknown what down-sampling method is used in generating the first video.
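For illustration, the following is a minimal sketch of how the first video could be produced by downsampling each frame of the original video. It assumes OpenCV, a 2× scale factor, and bicubic interpolation as one of the downsampling methods named above; the function name and file-based I/O are illustrative assumptions, not part of the invention.

```python
import cv2

def downsample_video(src_path: str, dst_path: str, factor: int = 2) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) // factor
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) // factor
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Reduce the resolution of every image frame; the low-resolution
        # frames together form the first video.
        out.write(cv2.resize(frame, (w, h), interpolation=cv2.INTER_CUBIC))
    cap.release()
    out.release()
```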
After the first video with low resolution is acquired in step S310, step S320 is performed.
Subsequently, in step S320, the first video is processed to obtain a second video, and the resolution of the second video is the same as the resolution of the first video.
The processing performed on the first video in step S320 may be any post-processing, such as editing, special-effects production, or color grading. The video processing in step S320 is performed at low resolution, and the second video generated by the processing has the same resolution as the first video; both are low-resolution videos.
Subsequently, in step S330, the second video is input into a preset resolution increasing component, and the resolution increasing component increases the resolution of the second video to output a result video, where the resolution of the result video is the same as the original video.
The resolution boosting component may be, for example, a deep learning network for image processing, which is adapted to boost the resolution of the input video, outputting an output video having a higher resolution than the input video. Specifically, the resolution improving component takes each image frame of the input video as input, improves the resolution of each image frame, outputs the image frame with high resolution, and the image frames with high resolution are combined to form the output video with high resolution.
Before step S330, the training of the resolution improving component is completed in advance, and in step S330, the trained resolution improving component is used to perform forward calculation on the second video, so as to improve the resolution of the second video, and output the result video, where the resolution of the result video is the same as that of the original video.
Those skilled in the art will appreciate that a video is composed of multiple image frames, and that an image may have multiple channels, e.g., RGB channels, YUV channels, etc., and accordingly, a video also has multiple channels.
According to an embodiment, in step S330, all channels of the second video may be input into the preset resolution enhancement component, which outputs a single-channel result video for each channel; the single-channel result videos are superimposed to form the complete result video.
For example, the second video is a video of an RGB channel. In step S330, the RGB channels are all input to the resolution increasing component, the resolution increasing component outputs single-channel result videos of the RGB channels, and the three single-channel result videos are superimposed to form a complete result video.
According to an embodiment, in step S330, a part of channels of the second video may be input to a preset resolution increasing component, and the resolution of the part of channels is increased by the resolution increasing component; for another part of channels of the second video, the resolution is increased by means of up-sampling, and the resolution is not increased by adopting a resolution increasing component. The upsampling method may be, for example, a Lanczos3 method, a bicubic interpolation method, or other interpolation methods based on one-dimensional filters, etc., but is not limited thereto.
Increasing resolution by upsampling is faster and more efficient than using the resolution enhancement component, but the enhancement quality is not as good. In step S330, the resolutions of different channels of the second video are increased by the two methods, upsampling and the resolution enhancement component, so that efficiency can be improved while quality is ensured.
According to an embodiment, since human eyes are sensitive to luminance and less sensitive to chrominance, in step S330 the luminance channel of the second video may be input to the resolution enhancement component, which increases its resolution and outputs a luminance result video; the chrominance channels of the second video are up-sampled to obtain a chrominance result video; and the luminance result video and the chrominance result video are superimposed to obtain the result video.
For example, the second video is a video with YUV channels, where Y is the luminance channel and U, V are the chrominance channels. Because human eyes are sensitive to luminance but less sensitive to chrominance, the resolution enhancement component can be used to increase the resolution of the luminance channel Y, ensuring a good enhancement result for luminance, while upsampling is used to increase the resolution of the chrominance channels U, V to increase processing speed.
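As a sketch of this channel-splitting strategy, the following assumes a PyTorch setting in which frames are available as YUV tensors and `resolution_booster` is a hypothetical stand-in for the trained resolution enhancement component operating on the luminance channel; bicubic interpolation stands in for Lanczos3 or another one-dimensional-filter upsampler.

```python
import torch
import torch.nn.functional as F

def upscale_yuv_frame(yuv: torch.Tensor, resolution_booster, scale: int = 2):
    # yuv: tensor of shape (3, H, W) holding the Y, U and V planes.
    y = yuv[0:1].unsqueeze(0)            # (1, 1, H, W) luminance
    uv = yuv[1:3].unsqueeze(0)           # (1, 2, H, W) chrominance
    with torch.no_grad():
        y_hr = resolution_booster(y)     # learned resolution enhancement
    uv_hr = F.interpolate(uv, scale_factor=scale, mode="bicubic",
                          align_corners=False)   # plain interpolation
    # Superimpose the luminance and chrominance result frames.
    return torch.cat([y_hr, uv_hr], dim=1).squeeze(0)   # (3, sH, sW)
```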
According to an embodiment, the resolution enhancement component comprises a conversion unit and a super-resolution unit, as shown in fig. 4. The conversion unit is adapted to preprocess the second video and output a third video that has the same resolution as the second video and is suited to processing by the super-resolution unit. The third video generated by the conversion unit is then input to the super-resolution unit, which is adapted to perform resolution enhancement on the third video and output a result video with the same resolution as the original video.
The super-resolution unit can be any neural network capable of improving the image resolution, and the specific structure of the super-resolution unit is not limited by the invention. The super-resolution unit may be implemented as, for example, a Residual Channel Attention Network (RCAN), but is not limited thereto.
The conversion unit may, for example, be a lightweight neural network that does not change the resolution of the video, i.e., the resolution of the conversion unit's output video is the same as that of its input video.
FIG. 5 shows a block diagram of a conversion unit according to one embodiment of the invention. As shown in fig. 5, the conversion unit comprises an input convolutional layer, an activation layer, a plurality of residual modules (two are shown in fig. 5), an intermediate convolutional layer, a residual connection layer, and an output convolutional layer, connected in sequence, where the residual connection layer adds the output of the activation layer to the output of the intermediate convolutional layer. The residual connection layer mitigates the vanishing-gradient problem during training of the conversion unit, helps preserve image detail, and improves training efficiency and network accuracy.
It will be understood by those skilled in the art that the structure of the conversion unit shown in FIG. 5 is merely an example. In practice, the specific structure of the transformation unit may be adjusted on the basis of fig. 5, for example, the number and parameters of convolution layers, the number of residual modules, the activation function adopted by the activation layer, and the like are adjusted, and the specific structure of the transformation unit is not limited by the present invention.
As shown in fig. 5, the second video sequentially passes through the input convolutional layer Conv1, the activation layer PReLU, the two residual modules Residual Group1 and Residual Group2, the intermediate convolutional layer Conv2, the residual connection layer Skip Connection1, and the output convolutional layer Conv3, which outputs the third video. Each convolutional layer in fig. 5 has four parameters: the first is the number of input channels, the second the number of output channels, and the third and fourth the size of the convolution kernel. For example, the parameters 1, 48, 3, 3 of the input convolutional layer Conv1 indicate that it has 1 input channel, 48 output channels, and a 3 × 3 convolution kernel. The activation layer uses the PReLU activation function; unlike ReLU, PReLU does not zero out negative inputs, so it can achieve a better result at the same network depth.
It should be noted that the number of input channels of the input convolutional layer Conv1 and the number of output channels of the output convolutional layer Conv3 in fig. 5 should be the same as the number of channels of the second video processed by the conversion unit. For example, the number of input channels of the input convolutional layer Conv1 and the number of output channels of the output convolutional layer Conv3 in fig. 5 are both 1, which indicates that the conversion unit processes only one channel of the second video, for example, only the luminance channel of the second video. If the conversion unit needs to process multiple channels of the second video, for example, the second video is an RGB video, and the conversion unit needs to process all three channels of RGB, the number of input channels of the input convolutional layer Conv1 and the number of output channels of the output convolutional layer Conv3 in fig. 5 should both be 3.
According to one embodiment, the two residual modules in fig. 5 are identical in structure. FIG. 6 shows a block diagram of a residual module according to one embodiment of the invention. As shown in fig. 6, the Residual module includes a plurality of Residual Channel Attention Blocks (RCABs) (four are shown in fig. 6), a convolutional layer Conv4, and a Residual Connection layer Skip Connection2, which are connected in sequence.
Those skilled in the art will appreciate that the structure of the residual module shown in fig. 6 is only an example, and in practice, the specific structure of the residual module may be adjusted on the basis of fig. 6, for example, the number of RCAB modules, the number of convolutional layers, parameters, and the like, and the present invention is not limited to the specific structure of the residual module.
The four residual channel attention modules (RCAB modules) shown in fig. 6 are identical in structure, according to one embodiment. Figure 7 shows a block diagram of an RCAB module in accordance with one embodiment of the present invention. As shown in fig. 7, the RCAB module includes a convolutional layer Conv5, an activation layer PReLU, a convolutional layer Conv6, an average pooling layer, a convolutional layer Conv7, an activation layer PReLU, a convolutional layer Conv8, an activation layer Sigmoid, an element-wise product layer, and a residual connection layer Skip Connection3, connected in sequence. The first two activation layers use the PReLU activation function, and the third uses the Sigmoid activation function.
The element-wise product layer multiplies the feature maps output by the convolutional layer Conv6 with the feature maps output by the Sigmoid activation layer element by element at corresponding positions; the resulting feature maps have the same size as those output by Conv6 and the Sigmoid layer. The residual connection layer Skip Connection3 adds the input of the convolutional layer Conv5 to the output of the element-wise product layer.
Those skilled in the art will appreciate that the RCAB structure shown in fig. 7 is merely an example. In practice, the specific structure of the RCAB may be adjusted based on fig. 7, for example, the number and parameters of the convolutional layers, the activation function adopted by the activation layers, and the like, and the invention is not limited to the specific structure of the RCAB.
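The following PyTorch sketch assembles the conversion unit bottom-up from the structures of figs. 5-7. The 48-channel width and 3 × 3 kernels follow the Conv1 parameters given above; the channel-attention reduction ratio (4) and the use of global average pooling are assumptions in the spirit of RCAN, since the patent does not fix these parameters.

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual channel attention block (fig. 7)."""
    def __init__(self, channels: int = 48, reduction: int = 4):
        super().__init__()
        # Conv5 -> PReLU -> Conv6
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Average pooling -> Conv7 -> PReLU -> Conv8 -> Sigmoid
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.PReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.body(x)
        f = f * self.attention(f)   # element-wise product layer
        return x + f                # Skip Connection3

class ResidualGroup(nn.Module):
    """Residual module: four RCABs, Conv4, and a skip connection (fig. 6)."""
    def __init__(self, channels: int = 48, n_blocks: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            *[RCAB(channels) for _ in range(n_blocks)],
            nn.Conv2d(channels, channels, 3, padding=1),  # Conv4
        )

    def forward(self, x):
        return x + self.body(x)     # Skip Connection2

class ConversionUnit(nn.Module):
    """Lightweight conversion unit (fig. 5); output resolution equals input."""
    def __init__(self, in_channels: int = 1, channels: int = 48):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, channels, 3, padding=1),  # Conv1
            nn.PReLU(),
        )
        self.body = nn.Sequential(
            ResidualGroup(channels),
            ResidualGroup(channels),
            nn.Conv2d(channels, channels, 3, padding=1),     # Conv2
        )
        self.tail = nn.Conv2d(channels, in_channels, 3, padding=1)  # Conv3

    def forward(self, x):
        h = self.head(x)
        # Skip Connection1 adds the activation output to Conv2's output.
        return self.tail(h + self.body(h))
```

Since every convolution uses stride 1 with padding 1, the spatial size, and hence the video resolution, is preserved end to end, consistent with the conversion unit's role.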
As can be understood by those skilled in the art, in order to enable the super-resolution unit to achieve high-quality output results, the neural network structure is often deep, the complexity of the algorithm is high, and the training process takes a long time, so that it is inconvenient to retrain the super-resolution unit repeatedly. The conversion unit does not change the resolution of the video, the network structure is much simpler compared with the super-resolution unit, the training speed is high, and retraining is convenient.
Therefore, in the technical solution of the present invention, the trained super-resolution unit is preferably kept unchanged, and the conversion unit is retrained according to the super-resolution unit and different application scenarios (i.e. different down-sampling methods adopted for generating the first video), so as to greatly improve the training efficiency of the entire resolution enhancement assembly. In addition, in the resolution improving assembly, the video is preprocessed through the converting unit firstly, so that the state which is most beneficial to the resolution improving of the super-resolution unit is achieved, and the super-resolution unit can be ensured to output high-quality result video. Therefore, the video processing scheme of the invention can save cost and improve efficiency, and simultaneously ensure the image quality of the result video, so that the result video can have almost the same visual effect as the original video.
In the embodiment of the present invention, the conversion unit is configured to ensure that the super-resolution unit outputs a high-quality result video while avoiding retraining the super-resolution unit. The conversion unit therefore needs to be trained to match the super-resolution unit. In particular, during network training, the down-sampling method used to generate the input images of the super-resolution unit should be the same as the down-sampling method used to generate the target images of the conversion unit.
According to one embodiment, the super-resolution unit is a neural network trained using the first sample video and a first downsampled video of the first sample video. The first downsampling video of the first sample video is obtained by downsampling the first sample video by adopting a first downsampling method. In the training process of the super-resolution unit, each image frame of the first down-sampling video of the first sample video is input to the super-resolution unit, and the corresponding image frame of the first sample video is an output target of the super-resolution unit.
One or more first sample videos, all high-resolution videos, may participate in training the super-resolution unit. After the first sample videos are obtained, each is down-sampled with the first downsampling method to obtain its first downsampled video. The first downsampling method is typically a high-quality method such as bicubic interpolation, but is not limited thereto. After the first sample videos and their first downsampled videos are obtained, the super-resolution unit is trained on them: each image frame of a first downsampled video serves as the input of the super-resolution unit, and the corresponding image frame of the first sample video serves as its output target.
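A minimal sketch of this pair construction, assuming frames are available as NumPy arrays and using OpenCV's bicubic interpolation as the first downsampling method with an assumed 2× factor:

```python
import cv2

def make_sr_training_pairs(frames, factor: int = 2):
    """Build (input, target) pairs for training the super-resolution unit."""
    pairs = []
    for hr in frames:                  # hr: high-resolution frame (H, W, C)
        h, w = hr.shape[:2]
        # First downsampling method (bicubic) produces the network input.
        lr = cv2.resize(hr, (w // factor, h // factor),
                        interpolation=cv2.INTER_CUBIC)
        pairs.append((lr, hr))         # input frame, output target frame
    return pairs
```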
In the training process of the super-resolution unit, the input image of the super-resolution unit is a first downsampled video of the high-resolution video. In order for the conversion unit to be able to generate a video that is advantageous for processing by the super-resolution unit, the output target of the conversion unit should therefore be the same as the input of the super-resolution unit, both being the first downsampled video of the high-resolution video.
According to an embodiment, the conversion unit is trained by using a first downsampling video and a second downsampling video of a second sample video, where the first downsampling video and the second downsampling video of the second sample video are obtained by downsampling the second sample video by using a first downsampling method and a second downsampling method, respectively, and the second downsampling method is the downsampling method used for converting the original video into the first video in the foregoing step S310. In the training process of the conversion unit, each image frame of a second downsampling video of the second sample video is input into the conversion unit, and a corresponding image frame of a first downsampling video of the second sample video is an output target of the conversion unit.
One or more second sample videos participating in the training of the conversion unit can be provided, and the second sample videos are high-resolution videos. The second set of sample videos used to train the transformation unit may overlap completely or partially with the first set of sample videos used to train the super-resolution unit, or be completely different. After the plurality of second sample videos are obtained, a first downsampling method is adopted to perform downsampling on each second sample video respectively, and a first downsampling video of each second sample video is obtained. Then, the second down-sampling method is used to down-sample the second sample video, and a second down-sampled video of each second sample video is obtained. The second down-sampling method is the same as the down-sampling method employed to convert the original video into the first video in step S310.
After the first downsampling video and the second downsampling video of the second sample video are obtained, the first downsampling video and the second downsampling video of the second sample video are adopted to train the conversion unit. In the training process, each image frame of a second downsampling video of the second sample video is used as the input of the conversion unit, and the corresponding image frame of a first downsampling video of the second sample video is used as the output target of the conversion unit.
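The analogous construction for the conversion unit is sketched below, under the assumption that the device-generated low-resolution frame is already available; only the target is computed, with bicubic interpolation again standing in for the first downsampling method.

```python
import cv2

def conversion_unit_pair(sample_frame, device_lr_frame, factor: int = 2):
    """One (input, target) pair for the conversion unit; both low resolution."""
    h, w = sample_frame.shape[:2]
    # Target: first downsampled video of the second sample video.
    target = cv2.resize(sample_frame, (w // factor, h // factor),
                        interpolation=cv2.INTER_CUBIC)
    # Input: second downsampled video, produced by the device's unknown
    # downsampling method at the same (low) resolution as the target.
    return device_lr_frame, target
```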
It should be noted that although the second downsampling method is the same as the downsampling method used to convert the original video into the first video in step S310, the second downsampling method itself may be unknown. For example, the first video in step S310 may be generated and output by a video production device, with no way to tell what downsampling method the device uses internally. In this case, the first downsampled video of the second sample video is obtained by explicitly applying the first downsampling method to the second sample video, while the second downsampled video of the second sample video is output directly by the video production device.
For example, the second sample video used to train the conversion unit may be a high resolution video captured by a video capture device. The first downsampling method is adopted to downsample the high-resolution video, so that a first downsampled video of the high-resolution video can be obtained. The second downsampled video of the high-resolution video is internally generated and output by the video capture device.
The conversion unit of the present invention is particularly suitable for an application scenario in which the first video is generated and output by a video production device, and the down-sampling method (i.e., the second down-sampling method) used by the video production device to generate the first video is unknown and has uncontrollable quality. In this application scenario, the conversion unit may convert the low-resolution video (corresponding to the second video) generated by the unknown downsampling method into the low-resolution video (corresponding to the third video) generated by the first downsampling method, and the first downsampling method is the downsampling method used for generating the training input image of the super-resolution unit, so that the conversion unit may convert the low-resolution video into a state most beneficial to resolution enhancement by the super-resolution unit, and the super-resolution unit is guaranteed to achieve a good output effect.
It should be noted that different parts of an image differ in how readily the conversion unit can transform them to the target result: smooth areas are easy to process to the target, while strong-texture and strong-edge areas are comparatively harder. Therefore, during training of the conversion unit, the loss function can be designed around this characteristic to improve the conversion of strong-texture and strong-edge regions.
According to one embodiment, during the training of the conversion unit, the loss value (loss) may be determined according to the following method:
First, the gradient value of each pixel of the conversion unit's target image is calculated, and the gradient values together form a gradient image. For example, denote the target image by object; the gradient value of the pixel in the ith row and jth column of the target image is recorded as grad(i, j), where 1 ≤ i ≤ M, 1 ≤ j ≤ N, and M and N are the height and width of the target image. The gradient values of all pixels constitute the gradient image grad.
And then, low-pass filtering the gradient image, and determining the weight of each pixel according to the gradient value of each pixel after filtering, wherein the larger the gradient value of the pixel is, the larger the weight of the pixel is. The algorithm of the low-pass filtering is not limited in the present invention, and may be, for example, mean filtering, median filtering, gaussian filtering, etc., but is not limited thereto. After the gradient image grad is low-pass filtered, the gradient value grad' (i, j) of each pixel after filtering is obtained. The weight of each pixel is determined from the filtered gradient value grad' (i, j) of each pixel, wherein the greater the filtered gradient value, the greater the weight of the pixel.
Then, calculating an absolute difference image of the output image of the conversion unit and the corresponding target image, wherein the pixel value in the absolute difference image is the absolute value of the difference of the pixel values of the same position of the output image and the target image; the result of the weighted summation of the pixel values in the absolute difference image is taken as the loss value.
That is, the loss function used in training the conversion unit is the L1 norm of the difference between the conversion unit's output image and its target image, weighted by the above weights, i.e., the loss value is

loss = Σ_{i=1..M} Σ_{j=1..N} w(i, j) · |output(i, j) − object(i, j)|

where w(i, j) is the weight of the pixel in the ith row and jth column, output(i, j) is the pixel value in the ith row and jth column of the output image, and object(i, j) is the pixel value in the ith row and jth column of the target image.
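A minimal sketch of this weighted loss, assuming single-channel NumPy images; the Sobel operator, Gaussian blur, and the mapping from filtered gradient to weight (1 plus the normalized gradient) are illustrative choices, since the patent only requires a gradient, a low-pass filter, and weights that grow with the filtered gradient value:

```python
import cv2
import numpy as np

def weighted_l1_loss(output: np.ndarray, target: np.ndarray) -> float:
    # Gradient magnitude of each pixel of the target image.
    gx = cv2.Sobel(target, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(target, cv2.CV_32F, 0, 1)
    grad = np.sqrt(gx * gx + gy * gy)
    # Low-pass filter the gradient image (Gaussian blur here).
    grad = cv2.GaussianBlur(grad, (5, 5), sigmaX=1.0)
    # Larger filtered gradient -> larger weight (strong texture/edges).
    w = 1.0 + grad / (grad.max() + 1e-8)
    # Weighted sum over the absolute difference image.
    diff = np.abs(output.astype(np.float32) - target.astype(np.float32))
    return float(np.sum(w * diff))
```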
According to the video processing method 300 of the present invention, the first video is a low-resolution video obtained by down-sampling a high-resolution original video. The low-resolution first video is processed to obtain a second video; the resolution is not changed during this processing, so the second video is still low-resolution. The second video is then input into a preset resolution enhancement component, which is adapted to convert the low-resolution second video into a high-resolution result video.
The video processing method 300 of the present invention processes the down-sampled video of the original video, i.e., the low-resolution first video, instead of directly processing the high-resolution original video, and then restores the low-resolution video to high resolution through the resolution enhancement component, thereby greatly reducing the cost of video post-production and improving its efficiency.
Furthermore, the resolution enhancement component comprises a conversion unit and a super-resolution unit. The conversion unit is adapted to preprocess the second video and output a third video; the super-resolution unit is adapted to upscale the third video and output a high-resolution result video. The conversion unit does not change the resolution of the video but improves its image quality, so that after its output is fed into the super-resolution unit, the super-resolution unit can output a high-quality, high-resolution result video. That is, the third video has the same resolution as the second video but higher image quality, enabling the super-resolution unit to output high-quality, high-resolution video.
The network structure of the super-resolution unit is very deep and highly complex, so the time cost of retraining it is very high; the conversion unit is lightweight and its structure is much simpler than that of the super-resolution unit, so it trains quickly and is easy to retrain. In the technical scheme of the invention, the conversion unit can be retrained for different application scenarios (i.e., for the different down-sampling methods used to generate the first video) while the super-resolution unit is kept unchanged, which greatly improves the training efficiency of the resolution enhancement component. In addition, within the resolution enhancement component, the video is first preprocessed by the conversion unit into the state most favorable for resolution enhancement by the super-resolution unit, ensuring that the super-resolution unit outputs high-quality, high-resolution video. Therefore, the video processing scheme of the invention saves cost and improves efficiency while ensuring the image quality of the result video, so that the result video can have almost the same visual effect as the original video.
The video processing method 300 of the invention can be applied to post-production of movie and television video: it improves the efficiency of post-production, saves cost, and ensures that the produced result video has good image quality, presenting almost the same visual effect as the original video.
Fig. 8 shows a flow chart of a method 800 for processing movie video, resulting from applying the video processing method 300 of the present invention to a movie post-production scene. The method 800 is executed in a computing device for achieving low-cost, efficient post-processing of movie and video videos while ensuring post-production effects. As shown in fig. 8, the method 800 begins at step S810.
In step S810, a first video output by the movie production device is obtained, where the first video is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video.
Movie video is video intended for showing on a screen, and includes movies, television shows, television programs, animation, and the like. Because movie and television videos are shown on cinema or television screens, the requirements on image quality and definition are high; movie videos therefore have high resolution, currently typically 4K, with some reaching 8K or higher.
The original movie video is the unprocessed footage captured by video capture equipment such as cameras and video cameras.
Movie production equipment is used to produce movie videos and includes capture equipment such as cameras and video cameras, as well as dedicated post-processing equipment such as high-performance workstations and digital intermediate systems. Movie production equipment generally has a built-in downsampling module that can downsample an original movie video and output a first video with a lower resolution than the original. However, this built-in downsampling module is a black box to the user, who cannot determine what downsampling method the movie production device actually uses to generate the first video.
After the first video with low resolution is acquired in step S810, step S820 is executed.
In step S820, the first video is post-processed to obtain a second video, and the resolution of the second video is the same as the resolution of the first video.
The post-processing in step S820 can be divided into two types, one is a processing step that is greatly affected by the resolution, and the other is a processing step that is not affected by the resolution or is less affected by the resolution.
Post-processing steps that are greatly affected by resolution include editing, special-effects production, color grading, and the like. These steps operate on specific image frames of the video, and the video resolution affects the efficiency of storing, loading, and computing on those frames, so these post-processing steps are greatly affected by the video resolution.
Post-processing steps that are unaffected or only slightly affected by resolution include dubbing, sound-effect synthesis, subtitling, and the like.
According to an embodiment, in step S820, the post-processing steps that are greatly affected by resolution, such as editing, special-effects production, and color grading, are preferably performed on the low-resolution first video, which greatly saves post-processing time and software and hardware costs and improves processing efficiency. For post-processing steps that are hardly affected by resolution, such as sound-effect synthesis and subtitling, performing them on the low-resolution first video saves little; therefore, to ensure their processing effect, these steps are preferably performed at high resolution, i.e., after the high-resolution result video is generated in step S830, sound-effect synthesis, subtitling, and the like are performed on the result video.
After the post-processing of the low-resolution first video is completed in step S820, step S830 is performed.
In step S830, the second video is input into a preset resolution increasing component, and the resolution increasing component increases the resolution of the second video to output a result video, where the resolution of the result video is the same as that of the original movie video.
The resolution increasing component in step S830 is the same as the resolution increasing component in step S330 and includes two parts: a conversion unit and a super-resolution unit. The conversion unit preprocesses the second video; it does not change the resolution of the video but improves its image quality, so that the processed third video reaches the state most favorable for resolution enhancement by the super-resolution unit. That is, the third video has the same resolution as the second video but higher image quality, enabling the super-resolution unit to output a high-quality, high-resolution video.
With regard to the structure of the conversion unit and the super-resolution unit, the training samples, the training process, and so on, reference may be made to the description of step S330, which is not repeated here.
In step S830, the second video with low resolution may be converted into the result video with high resolution.
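As a concrete illustration of this two-part component, the following is a minimal sketch assuming PyTorch. The module internals are placeholders, not the patented design: the claims describe a conversion unit built from residual channel attention modules with a PReLU activation, while the 4x scale factor, channel counts, and layer choices below are all assumptions made for brevity.

```python
import torch.nn as nn

class ConversionUnit(nn.Module):
    """Placeholder conversion unit: preprocesses at the same resolution."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.PReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        # Residual refinement: resolution unchanged, image quality improved.
        return x + self.body(x)


class SuperResolutionUnit(nn.Module):
    """Placeholder super-resolution unit using sub-pixel (pixel shuffle) upsampling."""
    def __init__(self, scale=4, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),          # rearranges channels into spatial resolution
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class ResolutionBooster(nn.Module):
    """Conversion unit followed by super-resolution unit, as in step S830."""
    def __init__(self, scale=4):
        super().__init__()
        self.convert = ConversionUnit()
        self.sr = SuperResolutionUnit(scale)

    def forward(self, second):
        third = self.convert(second)   # "third video": same resolution, better quality
        return self.sr(third)          # high-resolution result video
```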
Fig. 9 is a diagram illustrating a movie production process implemented by the movie video processing method 800 of the present invention. As shown in fig. 9, a movie acquisition device (e.g., a camera or video camera) first captures a high-resolution original movie video. The original movie video is then downsampled, by the acquisition device itself or by other movie production equipment (e.g., a high-performance workstation or digital intermediate system), to generate a low-resolution first video. Post-processing such as clipping, special effect production, and toning is then performed on the low-resolution first video to generate a low-resolution second video. Performing these resolution-sensitive steps at low resolution saves post-processing time and software and hardware cost and improves processing efficiency. The low-resolution second video is then input into the resolution increasing component, which increases its resolution and outputs a high-resolution result video. Finally, post-processing steps that are unrelated or only weakly related to video resolution, such as sound effect synthesis and subtitle addition, are performed on the result video to obtain the high-resolution film.
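Putting the pieces together, the Fig. 9 flow might be expressed as below. Every helper name here is a hypothetical stand-in for a real production tool; `downsample_video_frames` refers to the earlier sketch, and `booster` is any callable implementing the resolution increasing component.

```python
def edit_effects_and_grade(frames):
    # Hypothetical stand-in for clipping, special effect production, and toning.
    return frames

def add_audio_and_subtitles(frames):
    # Hypothetical stand-in for sound effect synthesis and subtitle addition.
    return frames

def produce_film(original_frames, booster):
    """Illustrative end-to-end flow of Fig. 9 (method 800)."""
    first = downsample_video_frames(original_frames)  # low-resolution working copy
    second = edit_effects_and_grade(first)            # resolution-sensitive steps at low res
    result = booster(second)                          # resolution increasing component
    return add_audio_and_subtitles(result)            # resolution-insensitive steps at high res
```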
Fig. 10 shows a flow diagram of a movie video processing method 1000 according to another embodiment of the invention. The method 1000 is executed in a computing device to achieve low-cost, high-efficiency post-processing of movie videos while ensuring the post-production effect. As shown in fig. 10, the method 1000 begins at step S1010.
In step S1010, a first video output by the movie production device is acquired, where the first video is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video.
Step S1010 is the same as step S810 of the method 800, and is not described herein again.
Subsequently, in step S1020, the first video is post-processed to obtain a second video, where the resolution of the second video is the same as the resolution of the first video; the post-processing here does not involve toning processing.
Subsequently, in step S1030, the second video is input to a preset resolution increasing component, and the resolution increasing component increases the resolution of the second video to output a third video, where the resolution of the third video is the same as that of the original movie video.
Subsequently, in step S1040, the third video is subjected to toning processing to obtain a resultant video.
The method 1000 differs from the foregoing method 800 in that the post-processing in step S1020 of the method 1000 does not include toning, but only clipping and special effect production. That is, in the method 1000, clipping and special effect production are performed at low resolution, while toning is performed at high resolution, i.e., in step S1040, after the low-resolution video has been converted into the high-resolution video in step S1030. The reason is that, among the post-processing steps, toning is highly subjective and its effect on the video image is hard to predict, so suitable training samples reflecting a given toning style may be difficult to find; as a result, when the super-resolution unit enhances the resolution of a toned low-resolution video, a satisfactory effect may be difficult to obtain. For reliability, the toning processing is therefore performed at high resolution, ensuring the visual effect of the result video.
Fig. 11 is a diagram illustrating a movie production process implemented by the movie video processing method 1000 of the present invention. As shown in fig. 11, a movie acquisition device (e.g., a camera or video camera) first captures a high-resolution original movie video. The original movie video is then downsampled, by the acquisition device itself or by other movie production equipment (e.g., a high-performance workstation or digital intermediate system), to generate a low-resolution first video. Post-processing that excludes toning, such as clipping and special effect production, is then performed on the low-resolution first video to generate a low-resolution second video. Performing these steps at low resolution saves post-processing time and software and hardware cost and improves processing efficiency. The low-resolution second video is then input into the resolution increasing component, which increases its resolution and outputs a high-resolution third video. The high-resolution third video is then subjected to toning processing to generate a result video. Finally, post-processing steps that are unrelated or only weakly related to video resolution, such as sound effect synthesis and subtitle addition, are performed on the result video to obtain the high-resolution film.
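The Fig. 11 flow differs from the Fig. 9 sketch only in where toning happens. A minimal variant, again with hypothetical helpers (reusing `downsample_video_frames` and `add_audio_and_subtitles` from the sketches above):

```python
def edit_and_effects(frames):
    # Hypothetical stand-in for clipping and special effect production (no toning).
    return frames

def tone_grade(frames):
    # Hypothetical stand-in for a colorist's toning pass.
    return frames

def produce_film_v2(original_frames, booster):
    """Illustrative end-to-end flow of Fig. 11 (method 1000)."""
    first = downsample_video_frames(original_frames)  # low-resolution working copy
    second = edit_and_effects(first)                  # post-processing without toning
    third = booster(second)                           # high-resolution third video
    graded = tone_grade(third)                        # toning at high resolution
    return add_audio_and_subtitles(graded)            # audio and subtitles last
```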
The video processing method 300, the video processing method 800 and the video processing method 1000 of the present invention are all executed in a computing device. The computing device may be, for example, any device suitable for image processing, such as a workstation dedicated to processing images and video data, or a personal computer such as a desktop computer and a notebook computer, or a mobile phone, a tablet computer, an internet of things device, and the like, but is not limited thereto.
FIG. 12 shows a schematic diagram of a computing device 1200, according to one embodiment of the invention. As shown in fig. 12, computing device 1200 includes a processor 1210 and a memory 1220. The memory 1220 has stored therein program instructions, and the processor 1210 is adapted to read and execute the program instructions in the memory 1220.
The program instructions stored in memory 1220 include at least one of program instructions for performing video processing method 300, program instructions for performing video processing method 800, and program instructions for performing video processing method 1000. When read and executed by processor 1210, certain program instructions may cause computing device 1200 to perform corresponding methods. For example, program instructions for performing video processing method 300, when read and executed by processor 1210, may cause computing device 1200 to perform video processing method 300 of the present invention.
Program instructions for performing video processing method 300, program instructions for performing video processing method 800, and program instructions for performing video processing method 1000 of the present invention may also be stored in a readable storage medium. The above program instructions in the readable storage medium, when read and executed by a computing device, may cause the computing device to perform a corresponding method.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code, and the processor is configured to perform the video processing method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense with respect to the scope of the invention, as defined in the appended claims.

Claims (17)

1. A video processing method, comprising the steps of:
acquiring a first video, wherein the first video is a video obtained by down-sampling an original video and has a resolution lower than that of the original video;
processing the first video to obtain a second video, wherein the resolution of the second video is the same as that of the first video;
and inputting the second video into a preset resolution improving component, wherein the resolution improving component improves the resolution of the second video to output a result video, and the resolution of the result video is the same as that of the original video.
2. The method of claim 1, wherein the resolution improving component comprises a conversion unit and a super-resolution unit,
the conversion unit is suitable for preprocessing the second video to output a third video which has the same resolution as the second video and is suitable for being processed by the super-resolution unit;
the super-resolution unit is suitable for performing resolution enhancement processing on the third video to output a result video with the same resolution as the original video.
3. The method of claim 2, wherein the super-resolution unit is a neural network trained by using a first sample video and a first downsampling video of the first sample video, wherein the first downsampling video of the first sample video is a video obtained by downsampling the first sample video by using a first downsampling method;
in the training process of the super-resolution unit, each image frame of a first down-sampling video of the first sample video is an input of the super-resolution unit, and a corresponding image frame of the first sample video is an output target of the super-resolution unit.
4. The method according to claim 3, wherein the conversion unit is a neural network trained by using a first downsampling video and a second downsampling video of a second sample video, wherein the first downsampling video and the second downsampling video of the second sample video are respectively obtained by downsampling the second sample video by using a first downsampling method and a second downsampling method, and the second downsampling method is used for converting the original video into the first video;
in the training process of the conversion unit, each image frame of a second downsampling video of the second sample video is an input of the conversion unit, and a corresponding image frame of a first downsampling video of the second sample video is an output target of the conversion unit.
5. The method of claim 3 or 4, wherein the first downsampling method is a bicubic interpolation method.
6. The method of claim 4 or 5, wherein during training of the conversion unit, the loss value is determined according to the following method:
calculating gradient values of all pixels of the target image of the conversion unit, and forming a gradient image from the gradient values of all pixels;
performing low-pass filtering on the gradient image, and determining the weight of each pixel according to the gradient value of each pixel after filtering, wherein the larger the gradient value of the pixel is, the larger the weight of the pixel is;
calculating an absolute difference image of an output image of the conversion unit and a corresponding target image, wherein a pixel value in the absolute difference image is an absolute value of a difference between pixel values of the same position of the output image and the target image;
and taking the weighted summation result over all pixel values of the absolute difference image as the loss value (an illustrative sketch follows the claims).
7. The method of any one of claims 2-6, wherein the conversion unit comprises, in sequence, an input convolutional layer, an activation layer, a plurality of residual modules, an intermediate convolutional layer, a residual connection layer, and an output convolutional layer, wherein the residual connection layer is adapted to add the output of the activation layer and the output of the intermediate convolutional layer.
8. The method of claim 7, wherein the residual module comprises a plurality of residual channel attention modules connected in series.
9. The method of claim 7 or 8, wherein the activation layer employs a PReLU activation function.
10. The method of any one of claims 1-9, wherein inputting the second video into the preset resolution improving component, which improves the resolution of the second video to output a result video, comprises:
inputting the luminance channel map of the second video into the resolution improving component, wherein the resolution improving component improves the resolution of the luminance channel map to output a luminance result video;
upsampling the chrominance channel map of the second video to obtain a chrominance result video;
and superimposing the luminance result video and the chrominance result video to obtain the result video.
11. The method according to any one of claims 1-10, wherein the original video is a movie video and the first video is a video output by a movie production device downsampling the movie video.
12. A movie video processing method, comprising the following steps:
acquiring a first video output by a movie making device, wherein the first video is a video which is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video;
performing post-processing on the first video to obtain a second video, wherein the resolution of the second video is the same as that of the first video;
and inputting the second video into a preset resolution improving component, wherein the resolution improving component improves the resolution of the second video to output a result video, and the resolution of the result video is the same as that of the original movie video.
13. The method of claim 12, wherein the post-processing comprises clipping, special effect production, and toning.
14. A movie video processing method, comprising the following steps:
acquiring a first video output by a movie making device, wherein the first video is a video which is obtained by down-sampling an original movie video and has a resolution lower than that of the original movie video;
performing post-processing on the first video to obtain a second video, wherein the resolution of the second video is the same as that of the first video, and the post-processing does not involve toning processing;
inputting the second video into a preset resolution improving component, wherein the resolution improving component improves the resolution of the second video to output a third video, and the resolution of the third video is the same as that of the original movie video; and
performing toning processing on the third video to obtain a result video.
15. The method of claim 14, wherein the post-processing comprises clipping and special effect production.
16. A computing device, comprising:
at least one processor; and
a memory storing program instructions;
the program instructions, when read and executed by the processor, cause the computing device to perform at least one of: the video processing method of any one of claims 1-11, the movie video processing method of claim 12 or 13, and the movie video processing method of claim 14 or 15.
17. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform at least one of: the video processing method of any one of claims 1-11, the movie video processing method of claim 12 or 13, and the movie video processing method of claim 14 or 15.
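As an illustration of the loss in claim 6, the following sketch assumes PyTorch, single-channel (N, 1, H, W) tensors, Sobel kernels for the gradient, and a 5x5 box blur for the low-pass filter; the claim specifies none of these choices, nor the exact mapping from filtered gradient to weight, so all of them are assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_l1(output, target):
    """Sketch of claim 6: per-pixel absolute error weighted by the
    low-pass-filtered gradient magnitude of the target image."""
    # Gradient image of the target (Sobel operator assumed).
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]], device=target.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(target, sobel_x, padding=1)
    gy = F.conv2d(target, sobel_y, padding=1)
    grad = torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

    # Low-pass filtering of the gradient image (5x5 box blur assumed).
    box = torch.ones(1, 1, 5, 5, device=target.device) / 25.0
    grad_lp = F.conv2d(grad, box, padding=2)

    # Larger filtered gradient -> larger weight (mapping assumed).
    weights = 1.0 + grad_lp

    # Absolute difference image between output and target.
    abs_diff = (output - target).abs()

    # Weighted summation of the absolute differences as the loss value.
    return (weights * abs_diff).sum()
```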
CN202010147566.2A 2020-03-05 2020-03-05 Video processing method, film and television video processing method and device Active CN113365107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147566.2A CN113365107B (en) 2020-03-05 2020-03-05 Video processing method, film and television video processing method and device


Publications (2)

Publication Number Publication Date
CN113365107A true CN113365107A (en) 2021-09-07
CN113365107B CN113365107B (en) 2024-05-10

Family

ID=77523779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147566.2A Active CN113365107B (en) 2020-03-05 2020-03-05 Video processing method, film and television video processing method and device

Country Status (1)

Country Link
CN (1) CN113365107B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101383898A (en) * 2007-09-07 2009-03-11 索尼株式会社 Image processing device, method and computer program
CN101345870A (en) * 2008-09-04 2009-01-14 上海交通大学 Encoding and decoding device for low-code rate video super-resolution reconstruction
CN106791927A (en) * 2016-12-23 2017-05-31 福建帝视信息科技有限公司 A kind of video source modeling and transmission method based on deep learning
CN106709875A (en) * 2016-12-30 2017-05-24 北京工业大学 Compressed low-resolution image restoration method based on combined deep network
CN109961396A (en) * 2017-12-25 2019-07-02 中国科学院沈阳自动化研究所 A kind of image super-resolution rebuilding method based on convolutional neural networks
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN109087243A (en) * 2018-06-29 2018-12-25 中山大学 A kind of video super-resolution generation method generating confrontation network based on depth convolution
CN109345449A (en) * 2018-07-17 2019-02-15 西安交通大学 A kind of image super-resolution based on converged network and remove non-homogeneous blur method
CN109903225A (en) * 2019-01-29 2019-06-18 中南大学 A kind of medical image Enhancement Method based on deep learning
CN110458758A (en) * 2019-07-29 2019-11-15 武汉工程大学 A kind of image super-resolution rebuilding method, system and computer storage medium
CN110648282A (en) * 2019-09-29 2020-01-03 燕山大学 Image super-resolution reconstruction method and system based on width neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
REWA SOOD ET AL.: "Anisotropic Super Resolution in Prostate MRI Using Super Resolution Generative Adversarial Networks", 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) *
ZHOU HANG; HE XIAOHAI; WANG ZHENGYONG; XIONG SHUHUA; KARN PRADEEP: "Compressed Video Super-Resolution Reconstruction Using a Dual-Network Structure" (in Chinese), Telecommunication Engineering, no. 01 *

Also Published As

Publication number Publication date
CN113365107B (en) 2024-05-10


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40059850

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant