CN113989118A - Video processing method and video processing device

Info

Publication number: CN113989118A
Application number: CN202111275096.9A
Authority: CN
Inventors: 磯部駿; 陶鑫; 戴宇荣
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-28

Abstract

The present disclosure relates to a video processing method and a video processing apparatus. The method includes: for the image frame at each moment in a video, obtaining an input feature for the current moment based on the image frame at the current moment, the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment; inputting the input feature for the current moment into a video super-resolution model to obtain the super-resolution feature of the current moment, output by an output layer of the video super-resolution model, and the hidden layer feature of the current moment, output by a hidden layer of the video super-resolution model, wherein the hidden layer feature of the current moment is used to obtain the input features for the m moments after the current moment; and obtaining a super-resolution image for the current moment based on the super-resolution feature of the current moment and the image frame at the current moment; wherein m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not both equal to 1.

Description

Video processing method and video processing device

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a video processing method and a video processing apparatus.

Background

Transmitting video at low resolution is a way to save cost and reduce bandwidth. A video super-resolution algorithm can complement this transmission strategy by restoring the low-resolution video to high-resolution video before it is delivered to users, so improving the performance of video super-resolution algorithms is important.

In the related art, the video super-resolution algorithm may include an algorithm based on explicit motion compensation. Algorithms based on explicit motion compensation tend to employ optical flow as a representation of motion between video frames, but estimates of optical flow tend to be inaccurate and time consuming.

Disclosure of Invention

The present disclosure provides a video processing method and a video processing apparatus to solve at least the problems in the related art described above.

According to a first aspect of the embodiments of the present disclosure, there is provided a video processing method, including: for the image frame at each moment in a video, obtaining an input feature for the current moment based on the image frame at the current moment, the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment; inputting the input feature for the current moment into a video super-resolution model to obtain the super-resolution feature of the current moment, output by an output layer of the video super-resolution model, and the hidden layer feature of the current moment, output by a hidden layer of the video super-resolution model, wherein the hidden layer feature of the current moment is used to obtain the input features for the m moments after the current moment; and obtaining a super-resolution image for the current moment based on the super-resolution feature of the current moment and the image frame at the current moment; wherein m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not both equal to 1.

Optionally, obtaining the input feature for the current moment based on the image frame at the current moment, the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment includes: for the hidden layer feature at each of the m moments before the current moment, selecting the information in that hidden layer feature that is related to the image frame at the current moment as the filtering result of that hidden layer feature; and/or, for the super-resolution feature at each of the n moments before the current moment, selecting the information in that super-resolution feature that is related to the image frame at the current moment as the filtering result of that super-resolution feature; and obtaining the input feature for the current moment based on the image frame at the current moment, the hidden layer features at the m moments before the current moment or their filtering results, and the super-resolution features at the n moments before the current moment or their filtering results.

Optionally, for the hidden layer feature at each of the m moments before the current moment, selecting the information in that hidden layer feature that is related to the image frame at the current moment as the filtering result of that hidden layer feature includes: performing convolution processing on the image frame at the current moment and the hidden layer features at the m moments before the current moment to obtain a convolution-processed hidden layer features, wherein a is equal to m and the a convolution-processed hidden layer features correspond one-to-one to the m moments before the current moment; using an activation function to activate the part of the a convolution-processed hidden layer features whose correlation with the image frame at the current moment satisfies a first preset condition, to obtain a activated hidden layer features; and performing dot multiplication on the a activated hidden layer features and the hidden layer features at the m moments before the current moment to obtain the filtering results of the hidden layer features at the m moments before the current moment.

Optionally, for the super-resolution feature at each of the n moments before the current moment, selecting the information in that super-resolution feature that is related to the image frame at the current moment as the filtering result of that super-resolution feature includes: performing convolution processing on the image frame at the current moment and the super-resolution features at the n moments before the current moment to obtain b convolution-processed super-resolution features, wherein b is equal to n and the b convolution-processed super-resolution features correspond one-to-one to the n moments before the current moment; using an activation function to activate the part of the b convolution-processed super-resolution features whose correlation with the image frame at the current moment satisfies a second preset condition, to obtain b activated super-resolution features; and performing dot multiplication on the b activated super-resolution features and the super-resolution features at the n moments before the current moment to obtain the filtering results of the super-resolution features at the n moments before the current moment.

Optionally, the hidden layer features at the m moments before the current moment are obtained from a historical hidden layer feature queue used to store hidden layer features; wherein the video processing method further includes: updating the historical hidden layer feature queue according to the hidden layer feature of the current moment.

Optionally, updating the historical hidden layer feature queue according to the hidden layer feature of the current moment includes: deleting the historical hidden layer feature that was stored earliest in the historical hidden layer feature queue, and storing the hidden layer feature of the current moment into the historical hidden layer feature queue; wherein the historical hidden layer feature queue is a first-in first-out queue that stores m hidden layer features.

Optionally, the super-resolution features at the n moments before the current moment are obtained from a historical super-resolution feature queue used to store super-resolution features; wherein the video processing method further includes: updating the historical super-resolution feature queue according to the super-resolution feature of the current moment.

Optionally, updating the historical super-resolution feature queue according to the super-resolution feature of the current moment includes: deleting the historical super-resolution feature that was written earliest into the historical super-resolution feature queue, and writing the super-resolution feature of the current moment into the historical super-resolution feature queue; wherein the historical super-resolution feature queue is a first-in first-out queue that stores n super-resolution features.
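As an illustration of the first-in first-out update described above, the following is a minimal Python sketch and not the implementation of the disclosure. The helper names `make_history_queue` and `update_history_queue` are hypothetical, string labels stand in for feature tensors, and pre-filling the queue with an initial placeholder feature is an assumption, since the text above does not state how the queue is initialized.

```python
from collections import deque


def make_history_queue(capacity, initial_feature):
    """Create a FIFO queue holding exactly `capacity` features.

    Pre-filling with `initial_feature` is an assumption for illustration.
    """
    return deque([initial_feature] * capacity, maxlen=capacity)


def update_history_queue(queue, current_feature):
    """Store the current feature; the earliest-stored feature is dropped.

    A deque created with maxlen evicts from the opposite end when full,
    which gives the delete-oldest / store-newest behaviour in one call.
    """
    queue.append(current_feature)
    return queue


m = 3  # number of historical hidden layer features kept
hidden_queue = make_history_queue(m, initial_feature="h_init")

for t in range(1, 6):
    # ... the model would compute h_t here; a label stands in for the tensor
    update_history_queue(hidden_queue, f"h_{t}")

# Most recent first: the features of the m moments before moment t = 6
latest_first = list(reversed(hidden_queue))
print(latest_first)  # ['h_5', 'h_4', 'h_3']
```

The same two helpers would serve for a historical super-resolution feature queue of capacity n.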

Optionally, the video super-resolution model includes: a first convolutional neural network, at least one residual module, and a second convolutional neural network; wherein inputting the input feature for the current moment into the video super-resolution model to obtain the super-resolution feature of the current moment, output by the output layer of the video super-resolution model, and the hidden layer feature of the current moment, output by the hidden layer of the video super-resolution model, includes: inputting the input feature for the current moment into the first convolutional neural network to obtain a convolution result; inputting the convolution result into the at least one residual module to obtain the hidden layer feature of the current moment; and inputting the hidden layer feature of the current moment into the second convolutional neural network to obtain the super-resolution feature of the current moment.

According to a second aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including: a first acquisition module configured to, for the image frame at each moment in a video, obtain an input feature for the current moment based on the image frame at the current moment, the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment; an input module configured to input the input feature for the current moment into a video super-resolution model to obtain the super-resolution feature of the current moment, output by an output layer of the video super-resolution model, and the hidden layer feature of the current moment, output by a hidden layer of the video super-resolution model, wherein the hidden layer feature of the current moment is used to obtain the input features for the m moments after the current moment; and a second acquisition module configured to obtain a super-resolution image for the current moment based on the super-resolution feature of the current moment and the image frame at the current moment; wherein m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not both equal to 1.

Optionally, the first acquisition module is configured to: for the hidden layer feature at each of the m moments before the current moment, select the information in that hidden layer feature that is related to the image frame at the current moment as the filtering result of that hidden layer feature; and/or, for the super-resolution feature at each of the n moments before the current moment, select the information in that super-resolution feature that is related to the image frame at the current moment as the filtering result of that super-resolution feature; and obtain the input feature for the current moment based on the image frame at the current moment, the hidden layer features at the m moments before the current moment or their filtering results, and the super-resolution features at the n moments before the current moment or their filtering results.

Optionally, the first acquisition module is configured to: perform convolution processing on the image frame at the current moment and the hidden layer features at the m moments before the current moment to obtain a convolution-processed hidden layer features, wherein a is equal to m and the a convolution-processed hidden layer features correspond one-to-one to the m moments before the current moment; use an activation function to activate the part of the a convolution-processed hidden layer features whose correlation with the image frame at the current moment satisfies a first preset condition, to obtain a activated hidden layer features; and perform dot multiplication on the a activated hidden layer features and the hidden layer features at the m moments before the current moment to obtain the filtering results of the hidden layer features at the m moments before the current moment.

Optionally, the first acquisition module is configured to: perform convolution processing on the image frame at the current moment and the super-resolution features at the n moments before the current moment to obtain b convolution-processed super-resolution features, wherein b is equal to n and the b convolution-processed super-resolution features correspond one-to-one to the n moments before the current moment; use an activation function to activate the part of the b convolution-processed super-resolution features whose correlation with the image frame at the current moment satisfies a second preset condition, to obtain b activated super-resolution features; and perform dot multiplication on the b activated super-resolution features and the super-resolution features at the n moments before the current moment to obtain the filtering results of the super-resolution features at the n moments before the current moment.

Optionally, the hidden layer features at the m moments before the current moment are obtained from a historical hidden layer feature queue used to store hidden layer features; wherein the video processing apparatus further includes: a first updating module configured to update the historical hidden layer feature queue according to the hidden layer feature of the current moment.

Optionally, the first updating module is configured to: delete the historical hidden layer feature that was stored earliest in the historical hidden layer feature queue, and store the hidden layer feature of the current moment into the historical hidden layer feature queue; wherein the historical hidden layer feature queue is a first-in first-out queue that stores m hidden layer features.

Optionally, the super-resolution features at the n moments before the current moment are obtained from a historical super-resolution feature queue used to store super-resolution features; wherein the video processing apparatus further includes: a second updating module configured to update the historical super-resolution feature queue according to the super-resolution feature of the current moment.

Optionally, the second updating module is configured to: delete the historical super-resolution feature that was written earliest into the historical super-resolution feature queue, and write the super-resolution feature of the current moment into the historical super-resolution feature queue; wherein the historical super-resolution feature queue is a first-in first-out queue that stores n super-resolution features.

Optionally, the video super-resolution model includes: a first convolutional neural network, at least one residual module, and a second convolutional neural network; wherein the input module is configured to: input the input feature for the current moment into the first convolutional neural network to obtain a convolution result; input the convolution result into the at least one residual module to obtain the hidden layer feature of the current moment; and input the hidden layer feature of the current moment into the second convolutional neural network to obtain the super-resolution feature of the current moment.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement a video processing method according to the present disclosure.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a video processing method according to the present disclosure.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a video processing method according to the present disclosure.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

When the super-resolution image is obtained, the historical motion states of multiple moments before the current moment can be used. Because more temporal information from the video frames preceding the current moment is exploited, the blur in the resulting super-resolution image can be reduced and the image is sharper.

Furthermore, the historical motion states can be filtered: the historical motion states related to the image frame at the current moment are retained and those unrelated to it are filtered out, which avoids introducing noise and makes maximal use of the information in the hidden layer features and the super-resolution features.

Furthermore, a storage strategy of multi-step hidden layer features and/or multi-step super-resolution features is adopted. Updating the historical hidden layer feature queue ensures that the historical hidden layer features in the queue advance over time, so that each video frame in the video uses the historical hidden layer features of moments close to it for motion compensation. This avoids introducing overly old historical motion states that are unrelated to the image frame at the current moment, avoids noise, and safeguards the quality of the super-resolution image. Likewise, updating the historical super-resolution feature queue ensures that the historical super-resolution features in the queue advance over time, so that each video frame uses the historical super-resolution features of moments close to it for motion compensation, with the same benefits.

Furthermore, setting the historical hidden layer feature queue as a first-in first-out queue is enough to make the historical hidden layer features in the queue advance over time, so the implementation is simple and convenient. Similarly, setting the historical super-resolution feature queue as a first-in first-out queue is enough to make the historical super-resolution features in the queue advance over time, so the implementation is simple and convenient.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a flow diagram illustrating a video processing method according to an exemplary embodiment;

FIG. 2 is a diagram illustrating a super-resolution feature at a current time and a hidden layer feature at the current time obtained by a video super-resolution model according to an exemplary embodiment;

FIG. 3 is a diagram comparing the implicit motion compensation results of a video processing method according to an exemplary embodiment with those of the RLSP approach;

FIG. 4 is a block diagram illustrating a video processing device according to an example embodiment;

FIG. 5 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Herein, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. As another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.

The video super-resolution algorithm may also include an algorithm based on implicit motion compensation. Such an algorithm regards the features stored in the hidden layer as motion states at different moments and fuses them with the video frames; the different motion states can then be regarded as implicit motion compensation for the video frames. For example, implicit motion compensation can be performed by Recurrent Latent Space Propagation (RLSP):

(1) Split the video sequence into individual frames.

(2) After splitting, input the first frame and the second frame into the RLSP module to obtain the hidden layer feature h1 of the first moment and the super-resolution image y1.

(3) Input the hidden layer feature h1 of the first moment, the second frame, and the third frame into the RLSP module, which fuses h1 with the two video frames to obtain the hidden layer feature h2 of the second moment and the super-resolution image y2.

(4) Repeat the above process for the subsequent frames until the entire video sequence has been super-resolved.
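The recurrence in the steps above can be sketched as follows. This is an illustrative Python sketch of the data flow only, under stated assumptions: `run_rlsp` and `toy_step` are hypothetical names, `toy_step` is a stand-in that records what it receives rather than a real RLSP module, and frames are represented by labels. Note that only a single hidden layer feature is carried from one step to the next.

```python
def run_rlsp(frames, step_fn, initial_hidden=None):
    """Propagate one hidden state through the sequence, two frames per step.

    step_fn(prev_hidden, frame_a, frame_b) -> (hidden, sr_image)
    """
    hidden = initial_hidden
    sr_images = []
    for k in range(len(frames) - 1):
        # Step k+1 sees the previous hidden state plus frames k+1 and k+2.
        hidden, sr_image = step_fn(hidden, frames[k], frames[k + 1])
        sr_images.append(sr_image)
    return sr_images, hidden


def toy_step(prev_hidden, frame_a, frame_b):
    """Stand-in for the RLSP module (assumption): records its inputs."""
    step_index = 1 if prev_hidden is None else prev_hidden["step"] + 1
    hidden = {"step": step_index, "saw": (frame_a, frame_b)}
    return hidden, f"y{step_index}"


frames = ["f1", "f2", "f3", "f4"]
sr_images, last_hidden = run_rlsp(frames, toy_step)
print(sr_images)    # ['y1', 'y2', 'y3']
print(last_hidden)  # {'step': 3, 'saw': ('f3', 'f4')}
```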

The present disclosure recognizes that the implicit motion compensation algorithm above stores only the motion state of the previous moment and ignores the motion states of earlier moments, so it uses little of the temporal information in the video frames, which results in a blurred super-resolution image. Therefore, when obtaining the super-resolution image, the video processing method provided by the present disclosure can use the historical motion states of multiple moments before the current moment. Because more temporal information from the video frames preceding the current moment is exploited, the blur in the resulting super-resolution image can be reduced and the image is sharper. Furthermore, the historical motion states can be filtered: the historical motion states related to the image frame at the current moment are retained and those unrelated to it are filtered out, which avoids introducing noise and makes maximal use of the information in the hidden layer features and the super-resolution features.

Furthermore, a storage strategy of multi-step hidden layer features and/or multi-step super-resolution features is adopted. Updating the historical hidden layer feature queue ensures that the historical hidden layer features in the queue advance over time, so that each video frame in the video uses the historical hidden layer features of moments close to it for motion compensation. This avoids introducing overly old historical motion states that are unrelated to the image frame at the current moment, avoids noise, and safeguards the quality of the super-resolution image. Likewise, updating the historical super-resolution feature queue ensures that the historical super-resolution features in the queue advance over time, with the same benefits. Furthermore, setting the historical hidden layer feature queue and the historical super-resolution feature queue as first-in first-out queues is enough to make the features in each queue advance over time, so the implementation is simple and convenient.

Fig. 1 is a flow diagram illustrating a video processing method according to an example embodiment.

Referring to FIG. 1, in step 101, for the image frame at each moment in the video, the input feature for the current moment may be obtained based on the image frame X_t at the current moment (that is, taking moment t as the current moment), the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment. It should be understood that the image frame at each moment in the video may be taken in turn as the image frame at the current moment.

Here, m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not both equal to 1. It should be understood that the values of m and n may be chosen according to factors such as the available computing resources and the requirements on computational efficiency and overhead. For example, m and n may be set larger when computing resources are abundant.

For example, referring to FIG. 2, m may be 3 and n may be 3. In this case, the image frame X_t at the current moment, the hidden layer features h_{t-1}, h_{t-2}, h_{t-3} of the 3 most recent moments before the current moment, and the super-resolution features o_{t-1}, o_{t-2}, o_{t-3} of the 3 most recent moments before the current moment may be spliced (concatenated) to obtain the input feature for the current moment. It should be understood that m = 3 and n = 3 is only an example. As another example, m may be 3 and n may be 1, in which case the image frame X_t at the current moment, the hidden layer features h_{t-1}, h_{t-2}, h_{t-3} of the 3 moments before the current moment, and the super-resolution feature o_{t-1} of the moment immediately preceding the current moment may be spliced to obtain the input feature for the current moment.
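The splicing operation in this example can be sketched with NumPy as follows. This is a minimal sketch under assumptions not stated in the text: features are laid out channels-first as (channels, height, width), the concatenation runs along the channel axis, and the channel counts (3, 8, and 12) and the helper name `build_input_feature` are chosen purely for illustration.

```python
import numpy as np


def build_input_feature(frame, hidden_history, sr_history):
    """Concatenate X_t with m hidden layer features and n super-resolution
    features along the channel axis (axis 0 in this assumed layout)."""
    return np.concatenate([frame, *hidden_history, *sr_history], axis=0)


H, W = 4, 4
frame = np.zeros((3, H, W))                              # X_t (3 channels assumed)
hidden_history = [np.zeros((8, H, W)) for _ in range(3)]   # h_{t-1}, h_{t-2}, h_{t-3}
sr_history = [np.zeros((12, H, W)) for _ in range(3)]      # o_{t-1}, o_{t-2}, o_{t-3}

# m = 3, n = 3
x_in = build_input_feature(frame, hidden_history, sr_history)
print(x_in.shape)  # (63, 4, 4)

# m = 3, n = 1: only o_{t-1} is used
x_in_n1 = build_input_feature(frame, hidden_history, sr_history[:1])
print(x_in_n1.shape)  # (39, 4, 4)
```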

According to the exemplary embodiments of the present disclosure, it should be noted that the difference between frames changes as the motion state changes. Therefore, if the motion state at an earlier moment has little correlation with the current frame, it can hardly serve as motion compensation. For example, in a diving competition, if the position of the athlete in one frame is far from the position of the athlete in the next frame, it is difficult to achieve the effect of motion compensation. Moreover, introducing historical information that is unrelated to the current frame may instead interfere with the learning of the neural network and inevitably introduce noise. Therefore, for the hidden layer feature at each of the m moments before the current moment, the information in that hidden layer feature that is related to the image frame at the current moment may be selected as the filtering result of that hidden layer feature; and/or, for the super-resolution feature at each of the n moments before the current moment, the information in that super-resolution feature that is related to the image frame at the current moment may be selected as the filtering result of that super-resolution feature. Next, the input feature for the current moment may be obtained based on the image frame X_t at the current moment, the hidden layer features at the m moments before the current moment or their filtering results, and the super-resolution features at the n moments before the current moment or their filtering results. In this way, the historical motion states can be filtered: the historical motion states related to the image frame at the current moment are retained and those unrelated to it are filtered out, which avoids noise and makes maximal use of the information in the hidden layer features and the super-resolution features.
For example, the image frame X_t at the current moment, the filtering results of the hidden layer features at the m moments before the current moment, and the super-resolution features at the n moments before the current moment may be spliced to obtain the input feature for the current moment. As another example, the image frame X_t at the current moment, the filtering results of the hidden layer features at the m moments before the current moment, and the filtering results of the super-resolution features at the n moments before the current moment may be spliced to obtain the input feature for the current moment. As yet another example, the image frame X_t at the current moment, the hidden layer features at the m moments before the current moment, and the filtering results of the super-resolution features at the n moments before the current moment may be spliced to obtain the input feature for the current moment.

According to an exemplary embodiment of the present disclosure, convolution processing may be performed on the image frame X_t at the current moment and the hidden layer features at the m moments before the current moment to obtain a convolution-processed hidden layer features, where a is equal to m and the a convolution-processed hidden layer features correspond one-to-one to the m moments before the current moment. Then, the a convolution-processed hidden layer features may be spliced together along the feature dimension to obtain a hidden layer feature splicing result. Next, an activation function (e.g., Softmax) may be applied along the feature dimension to activate the part of the hidden layer feature splicing result whose correlation with the image frame X_t at the current moment satisfies the first preset condition, yielding a activated hidden layer features. Then, dot multiplication may be performed on the a activated hidden layer features and the hidden layer features at the m moments before the current moment to obtain the filtering results of the hidden layer features at the m moments before the current moment. Specifically, each activated hidden layer feature is dot-multiplied with the hidden layer feature of its corresponding moment. For example, the first preset condition may be that the correlation exceeds a first preset threshold.
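The convolve, activate, and multiply sequence described above can be sketched in NumPy as follows. This is one possible reading under loudly stated assumptions, not the disclosed implementation: the convolution is simplified to a 1x1 convolution applied to the channel-wise concatenation of X_t with each historical feature, random matrices stand in for learned weights, features are channels-first, and the Softmax weights act as a soft selection in place of an explicit threshold test for the preset condition. All function names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def filter_history(frame, history, weights):
    """Return one filtered feature per historical feature."""
    # 1) Convolve each historical feature together with the current frame.
    conv_feats = [conv1x1(np.concatenate([frame, h], axis=0), w)
                  for h, w in zip(history, weights)]
    # 2) Splice along the feature dimension and apply Softmax along it.
    activated = softmax(np.concatenate(conv_feats, axis=0), axis=0)
    # 3) Split back into one gate per moment and multiply elementwise
    #    with the historical feature of the corresponding moment.
    gates = np.split(activated, len(history), axis=0)
    return [g * h for g, h in zip(gates, history)]


rng = np.random.default_rng(0)
C, H, W, m = 8, 4, 4, 3                      # illustrative sizes (assumed)
frame = rng.standard_normal((3, H, W))       # X_t
history = [rng.standard_normal((C, H, W)) for _ in range(m)]
weights = [rng.standard_normal((C, C + 3)) for _ in range(m)]  # stand-in weights

filtered = filter_history(frame, history, weights)
print(len(filtered), filtered[0].shape)  # 3 (8, 4, 4)
```

Because every Softmax weight lies between 0 and 1, each filtered feature is an attenuated copy of the original. The same function could be applied to the super-resolution features at the n previous moments with a separate set of weights.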

According to an exemplary embodiment of the present disclosure, convolution processing may also be performed on the image frame X_t at the current time and the super-resolution features at the n times before the current time to obtain a number b of convolution-processed super-resolution features, where b is equal to n and the b convolution-processed super-resolution features correspond one-to-one to the n times before the current time. Then, the b convolution-processed super-resolution features may be spliced together in the feature dimension to obtain a super-resolution feature splicing result. Next, an activation function (e.g., Softmax) may be applied in the feature dimension to activate the parts of the super-resolution feature splicing result whose correlation with the image frame X_t at the current time satisfies a second preset condition, yielding b activated super-resolution features. Then, the b activated super-resolution features may be dot-multiplied with the super-resolution features at the n times before the current time to obtain the filtering results of the super-resolution features at those n times. Specifically, each activated super-resolution feature is dot-multiplied with the super-resolution feature at its corresponding time. For example, the second preset condition may be that the correlation exceeds a second preset threshold.
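The filtering steps described above (convolution, splicing, Softmax activation, dot multiplication) can be sketched for a single pixel position as follows. This is only an illustrative sketch under stated assumptions, not the disclosed implementation: the convolution is assumed to be a 1x1 convolution with one output channel per historical time step, the Softmax is assumed to run across the spliced time steps, and the names `softmax`, `filter_history`, and `weights` are hypothetical. The preset threshold is not modelled; the Softmax simply gives larger weights to time steps with larger scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def filter_history(frame_px, history_px, weights):
    """Filter historical features at one pixel position.

    frame_px:   channel values of X_t at this pixel.
    history_px: list of feature vectors, one per earlier time step.
    weights:    hypothetical 1x1-convolution weights applied to the
                concatenation [frame_px, feature], giving one score.
    """
    scores = []
    for feature in history_px:
        joined = list(frame_px) + list(feature)
        scores.append(sum(w * v for w, v in zip(weights, joined)))
    gates = softmax(scores)  # one activation per historical time step
    # Multiply each historical feature by its own activation.
    return [[g * v for v in feature] for g, feature in zip(gates, history_px)]

frame = [0.5]                                   # toy one-channel frame pixel
history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps, two channels
filtered = filter_history(frame, history, weights=[0.0, 1.0, 1.0])
```

The same function applies unchanged to the super-resolution features, since both branches follow the same four steps; only the inputs and the learned weights differ.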

Referring back to FIG. 1, in step 102, the input feature at the current time may be input into the video super-resolution model to obtain the super-resolution feature o_t at the current time, output by the output layer of the video super-resolution model, and the hidden layer feature h_t at the current time, output by a hidden layer of the video super-resolution model. The hidden layer feature h_t at the current time is used to obtain the input features at the m times after the current time.

According to an exemplary embodiment of the present disclosure, the video super-resolution model may include: a first convolutional neural network (Conv2D), at least one residual module (Res Block), and a second convolutional neural network. FIG. 2 is a schematic diagram illustrating how the super-resolution feature at the current time and the hidden layer feature at the current time are obtained through the video super-resolution model according to an exemplary embodiment.

Referring to FIG. 2, the input feature at the current time may be input into the first convolutional neural network, which may serve to fuse the inputs; the resulting convolution result may be a feature of size h×w×128. Then, the convolution result output by the first convolutional neural network, that is, the h×w×128 feature, may be input into the at least one residual module to obtain the hidden layer feature h_t at the current time. It should be noted that the convolution result may be input into the first residual module of the at least one residual module, and the output of the first residual module may be used as the input of the second residual module. By analogy, the last residual module of the at least one residual module outputs the hidden layer feature h_t at the current time. Next, the hidden layer feature h_t at the current time may be input into the second convolutional neural network to obtain the super-resolution feature o_t at the current time output by the second convolutional neural network.
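The data flow through the model (first convolution, chain of residual modules, second convolution) can be sketched at a single pixel as follows. This is a toy sketch under stated assumptions: each convolution is replaced by a per-pixel weight matrix (a 1x1 convolution stand-in), the residual module is simplified to y = x + ReLU(Wx), and the names `linear`, `residual_block`, and `super_resolution_model` as well as the tiny channel widths are hypothetical. The disclosure itself uses an h×w×128 hidden feature.

```python
def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector: a 1x1 convolution at one pixel."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, v) for v in vec]

def residual_block(weights, vec):
    """Simplified residual module: y = x + relu(W x)."""
    return [x + y for x, y in zip(vec, relu(linear(weights, vec)))]

def super_resolution_model(input_feature, w_first, w_blocks, w_second):
    """Return (o_t, h_t) for one pixel of the input feature."""
    hidden = linear(w_first, input_feature)  # first convolution fuses the inputs
    for w in w_blocks:                       # each module feeds the next one
        hidden = residual_block(w, hidden)
    h_t = hidden                             # output of the last residual module
    o_t = linear(w_second, h_t)              # second convolution
    return o_t, h_t

# Toy widths: 3 input channels, 2 hidden channels, 1 output channel.
o_t, h_t = super_resolution_model(
    [1.0, 2.0, 3.0],
    w_first=[[1, 0, 0], [0, 1, 0]],
    w_blocks=[[[1, 0], [0, 1]]],
    w_second=[[1, 1]],
)
```

Returning both values from one forward pass mirrors the text: h_t is kept for later time steps, while o_t goes on to form the output image.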

It should be understood that the hidden layer feature at each time is the feature output by a specific hidden layer of the video super-resolution model when the input feature at that time is input into the model. For example, in the above embodiment, the specific hidden layer is the at least one residual module.

Referring back to FIG. 1, in step 103, the super-resolution image at the current time may be obtained based on the super-resolution feature o_t at the current time and the image frame X_t at the current time.

It should be understood that the image frame at each moment in the video can be taken as the image frame at the current moment in turn, and a super-resolution image thereof can be obtained.

As an example, the image frame X_t at the current time may first be upsampled to obtain an upsampling result. The upsampling result may then be superimposed on the super-resolution feature o_t at the current time to obtain the super-resolution image at the current time.
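The upsample-then-superimpose step can be illustrated with a minimal sketch. The interpolation method is not specified in this passage, so nearest-neighbour upsampling is an assumption made here, as are the function names `upsample_nearest` and `superimpose`. The sketch also assumes that o_t has already been arranged at the target resolution; how its channels are rearranged is outside the scope of this example.

```python
def upsample_nearest(image, scale):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out

def superimpose(upsampled, residual):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[u + r for u, r in zip(u_row, r_row)]
            for u_row, r_row in zip(upsampled, residual)]

frame = [[1.0, 2.0]]                 # 1 x 2 low-resolution frame X_t
detail = [[0.1, 0.2, 0.3, 0.4],      # o_t, assumed already arranged at 2 x 4
          [0.5, 0.6, 0.7, 0.8]]
sr_image = superimpose(upsample_nearest(frame, 2), detail)
```

With this arrangement the model only has to predict the detail that plain upsampling misses, which is the usual motivation for a residual output of this kind.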

According to an exemplary embodiment of the present disclosure, hidden layer features at m times before the current time may be obtained from a historical hidden layer feature queue for storing the hidden layer features.

The video processing method may further update the historical hidden layer feature queue according to the hidden layer feature h_t at the current time. Updating the queue ensures that the historical hidden layer features in it advance with time, so that each video frame in the video uses the historical hidden layer features at times close to that frame for motion compensation. This avoids introducing historical motion states that are too old and unrelated to the image frame at the current time, avoids introducing noise, and helps ensure the quality of the super-resolution image.

According to an exemplary embodiment of the present disclosure, the historical hidden layer feature that was stored first in the historical hidden layer feature queue may be deleted, and the hidden layer feature h_t at the current time may be stored in the queue. The historical hidden layer feature queue may be a first-in first-out queue for storing m hidden layer features. For example, as described above, m may be 3, and the hidden layer features at the 3 times before the current time may be h_{t-3}, h_{t-2}, h_{t-1}. After the hidden layer feature h_t at the current time is obtained, the historical hidden layer feature that was stored first in the queue, namely the oldest historical hidden layer feature h_{t-3}, may be deleted, and the hidden layer feature h_t at the current time may be stored in the queue, giving the updated historical hidden layer feature queue: h_{t-2}, h_{t-1}, h_t. Thus, setting the historical hidden layer feature queue as a first-in first-out queue allows the historical hidden layer features in the queue to be updated in a simple and convenient manner.
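The first-in first-out update in the m = 3 example can be reproduced with a bounded queue. The sketch below uses Python's `collections.deque` with `maxlen`, which is one convenient way to get this behaviour and is not stated in the disclosure; the string entries are hypothetical placeholders for the actual feature tensors.

```python
from collections import deque

m = 3
# Placeholders standing in for h_{t-3}, h_{t-2}, h_{t-1}.
hidden_queue = deque(["h_t-3", "h_t-2", "h_t-1"], maxlen=m)

# Appending to a full bounded deque discards the oldest entry,
# so deletion and storage happen in a single call.
hidden_queue.append("h_t")
```

After the append, the queue holds the features for times t-2, t-1, and t, matching the updated queue in the example.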

According to an exemplary embodiment of the present disclosure, the super-resolution features at the n times before the current time may be obtained from a historical super-resolution feature queue for storing super-resolution features.

The video processing method may further update the historical super-resolution feature queue according to the super-resolution feature o_t at the current time. Updating the queue ensures that the historical super-resolution features in it advance with time, so that each video frame in the video uses the historical super-resolution features at times close to that frame for motion compensation. This avoids introducing historical motion states that are too old and unrelated to the image frame at the current time, avoids introducing noise, and helps ensure the quality of the super-resolution image.

According to an exemplary embodiment of the present disclosure, the historical super-resolution feature that was written first into the historical super-resolution feature queue may be deleted, and the super-resolution feature o_t at the current time may be written into the queue. The historical super-resolution feature queue may be a first-in first-out queue for storing n super-resolution features. For example, as described above, n may be 3, and the super-resolution features at the 3 times before the current time may be o_{t-3}, o_{t-2}, o_{t-1}. After the super-resolution feature o_t at the current time is obtained, the historical super-resolution feature that was stored first in the queue, namely the oldest historical super-resolution feature o_{t-3}, may be deleted, and the super-resolution feature o_t at the current time may be stored in the queue, giving the updated historical super-resolution feature queue: o_{t-2}, o_{t-1}, o_t. Thus, setting the historical super-resolution feature queue as a first-in first-out queue allows the historical super-resolution features in the queue to be updated in a simple and convenient manner.

After the super-resolution image at time t is obtained, the next time t+1 may be taken as the current time. For the image frame X_{t+1} at time t+1, the updated historical hidden layer feature queue (h_{t-2}, h_{t-1}, h_t) and the updated historical super-resolution feature queue (o_{t-2}, o_{t-1}, o_t) may then be used to obtain the corresponding super-resolution image at time t+1. These steps are repeated until a corresponding super-resolution image has been obtained for each image frame in the video.
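The frame-by-frame recurrence described above can be sketched as a loop that carries both queues forward. This is an illustrative sketch only: `run_video`, `toy_model`, and the idea of pre-filling the queues with placeholder values for the first frames are assumptions made here (this passage does not describe queue initialisation), and the model is a stand-in that just labels its outputs.

```python
from collections import deque

def run_video(frames, model, m, n, zero_hidden, zero_sr):
    """Process frames in order, carrying m hidden and n super-resolution features."""
    hidden_queue = deque([zero_hidden] * m, maxlen=m)  # assumed initial fill
    sr_queue = deque([zero_sr] * n, maxlen=n)          # assumed initial fill
    outputs = []
    for frame in frames:
        model_input = (frame, tuple(hidden_queue), tuple(sr_queue))
        o_t, h_t = model(model_input)
        outputs.append((frame, o_t))  # later combined into the output image
        hidden_queue.append(h_t)      # oldest hidden feature is dropped
        sr_queue.append(o_t)          # oldest super-resolution feature is dropped
    return outputs, list(hidden_queue), list(sr_queue)

def toy_model(model_input):
    """Hypothetical stand-in: labels outputs with the frame index."""
    frame, _hidden_history, _sr_history = model_input
    return "o%d" % frame, "h%d" % frame

outputs, last_hidden, last_sr = run_video(
    [0, 1, 2, 3], toy_model, m=2, n=1, zero_hidden="h_init", zero_sr="o_init"
)
```

Because the two queues have independent lengths, m and n can be chosen separately, which is what allows the trade-off between sharpness and efficiency discussed in the text.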

It should be noted that the hidden layer features may be 128-dimensional, which is relatively large, while the super-resolution features may be 48-dimensional, much smaller than the 128-dimensional hidden layer features. If the number m of historical hidden layer features in the historical hidden layer feature queue is made smaller (for example, m equal to 1) and the number n of historical super-resolution features in the historical super-resolution feature queue is made larger (for example, n greater than 1), computation is more efficient but performance may drop; for example, the resulting super-resolution image may be blurry and not sharp enough. If m is made larger (for example, m greater than 1) and n is made smaller (for example, n equal to 1), the blur in the resulting super-resolution image can be reduced and the image is sharper. Therefore, the lengths of the historical hidden layer feature queue and the historical super-resolution feature queue can be adjusted flexibly according to actual conditions and needs, so that the sharpness of the super-resolution image and the computational efficiency are well balanced.
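The efficiency side of this trade-off can be made concrete by counting the channels of the spliced input feature. The sketch below is a rough proxy under stated assumptions: the frame is assumed to have 3 channels (RGB), the queue lengths 1 and 3 are illustrative values, and the helper name `input_channels` is hypothetical. The 128 and 48 channel widths are the ones given in the text.

```python
def input_channels(m, n, frame_channels=3, hidden_channels=128, sr_channels=48):
    """Channels of the spliced input feature for queue lengths m and n."""
    return frame_channels + m * hidden_channels + n * sr_channels

small_hidden_history = input_channels(m=1, n=3)  # 3 + 128 + 144
large_hidden_history = input_channels(m=3, n=1)  # 3 + 384 + 48
```

Under these assumptions, keeping three hidden features and one super-resolution feature gives 435 input channels, versus 275 for the reverse choice, so the first convolution has noticeably more work to do when m is large.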

The present disclosure was verified on the academic video super-resolution dataset Vid4, which contains four long-video test scenes: the Foliage (leaf) sequence, the Walk sequence, the City sequence, and the Calendar sequence, which have different resolutions and motion patterns. Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Metric (SSIM) can be used to evaluate the advantage of the video processing method of the present disclosure over RLSP in implicit motion compensation; the larger the PSNR and SSIM, the better the effect of implicit motion compensation. FIG. 3 is a diagram illustrating a comparison between the implicit motion compensation results of the video processing method and of the RLSP approach according to an exemplary embodiment.
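For reference, PSNR follows the standard definition 10·log10(peak² / MSE). The sketch below computes it over flat sequences of pixel values; the helper name `psnr` and the default peak of 255 (8-bit images) are assumptions made here, and the exact evaluation protocol used for the reported results (for example, colour space or border cropping) is not specified in this passage.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)

close = psnr([10.0, 20.0], [11.0, 21.0])  # small error
far = psnr([10.0, 20.0], [15.0, 25.0])    # larger error
```

A smaller reconstruction error gives a higher PSNR, which is why larger values indicate a better result in the comparison.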

Referring to FIG. 3, it can be seen that, compared with the existing RLSP approach, the implicit motion compensation effect is improved in all four test scenes, and that it improves markedly once the filtering strategy is added. For example, in the City test scene, the PSNR of the video processing method of the present disclosure using the filtering strategy is 28.39 dB, while the PSNR of the existing RLSP approach is 27.89 dB, an improvement of 0.5 dB.

Fig. 4 is a block diagram illustrating a video processing apparatus 400 according to an example embodiment.

Referring to FIG. 4, the video processing apparatus 400 may include a first obtaining module 401, an input module 402, and a second obtaining module 403.

For the image frame at each time in the video, the first obtaining module 401 may obtain the input feature at the current time based on the image frame X_t at the current time (that is, taking time t as the current time), the hidden layer features at the m times before the current time, and the super-resolution features at the n times before the current time. It should be understood that the image frame at each time in the video may be taken as the image frame at the current time in turn.

According to an exemplary embodiment of the present disclosure, it should be noted that the difference between frames changes as the motion state changes. Therefore, if the motion state at a previous time is only weakly correlated with the current frame, it is difficult for it to contribute to motion compensation. For example, in a diving event, if the position of the athlete in the previous frame is far from the position in the next frame, it is difficult to achieve the effect of motion compensation. Moreover, introducing historical information unrelated to the current frame may instead interfere with the learning of the neural network and introduce unavoidable noise. Therefore, the first obtaining module 401 may further, for the hidden layer feature at each of the m times before the current time, retain through filtering the information in that hidden layer feature that is related to the image frame at the current time, as the filtering result of the hidden layer feature at that time; and/or, for the super-resolution feature at each of the n times before the current time, retain through filtering the information in that super-resolution feature that is related to the image frame at the current time, as the filtering result of the super-resolution feature at that time.

Then, the first obtaining module 401 may obtain the input feature at the current time based on the image frame X_t at the current time, the hidden layer features (or their filtering results) at the m times before the current time, and the super-resolution features (or their filtering results) at the n times before the current time. In this way, the historical motion states can be filtered: those related to the image frame at the current time are retained and those unrelated to it are filtered out, which avoids introducing noise and makes maximal use of the information in the hidden layer features and the super-resolution features. For example, the image frame X_t at the current time, the filtering results of the hidden layer features at the m times before the current time, and the super-resolution features at the n times before the current time may be spliced to obtain the input feature at the current time. As another example, the image frame X_t at the current time, the filtering results of the hidden layer features at the m times before the current time, and the filtering results of the super-resolution features at the n times before the current time may be spliced to obtain the input feature at the current time. As a further example, the image frame X_t at the current time, the hidden layer features at the m times before the current time, and the filtering results of the super-resolution features at the n times before the current time may be spliced to obtain the input feature at the current time.

According to an exemplary embodiment of the present disclosure, the first obtaining module 401 may perform convolution processing on the image frame X_t at the current time and the hidden layer features at the m times before the current time to obtain a number a of convolution-processed hidden layer features, where a is equal to m and the a convolution-processed hidden layer features correspond one-to-one to the m times before the current time. Then, the first obtaining module 401 may splice the a convolution-processed hidden layer features together in the feature dimension to obtain a hidden layer feature splicing result. Next, the first obtaining module 401 may apply an activation function (e.g., Softmax) in the feature dimension to activate the parts of the hidden layer feature splicing result whose correlation with the image frame X_t at the current time satisfies a first preset condition, yielding a activated hidden layer features. Then, the first obtaining module 401 may dot-multiply the a activated hidden layer features with the hidden layer features at the m times before the current time to obtain the filtering results of the hidden layer features at those m times. Specifically, each activated hidden layer feature is dot-multiplied with the hidden layer feature at its corresponding time. For example, the first preset condition may be that the correlation exceeds a first preset threshold.

According to an exemplary embodiment of the present disclosure, the first obtaining module 401 may also perform convolution processing on the image frame X_t at the current time and the super-resolution features at the n times before the current time to obtain a number b of convolution-processed super-resolution features, where b is equal to n and the b convolution-processed super-resolution features correspond one-to-one to the n times before the current time. Then, the first obtaining module 401 may splice the b convolution-processed super-resolution features together in the feature dimension to obtain a super-resolution feature splicing result. Next, the first obtaining module 401 may apply an activation function (e.g., Softmax) in the feature dimension to activate the parts of the super-resolution feature splicing result whose correlation with the image frame X_t at the current time satisfies a second preset condition, yielding b activated super-resolution features. Then, the first obtaining module 401 may dot-multiply the b activated super-resolution features with the super-resolution features at the n times before the current time to obtain the filtering results of the super-resolution features at those n times. Specifically, each activated super-resolution feature is dot-multiplied with the super-resolution feature at its corresponding time. For example, the second preset condition may be that the correlation exceeds a second preset threshold.

The input module 402 may input the input feature at the current time into the video super-resolution model to obtain the super-resolution feature o_t at the current time, output by the output layer of the video super-resolution model, and the hidden layer feature h_t at the current time, output by a hidden layer of the video super-resolution model. The hidden layer feature h_t at the current time is used to obtain the input features at the m times after the current time.

According to an exemplary embodiment of the present disclosure, the video super-resolution model may include: a first convolutional neural network (Conv2D), at least one residual module (Res Block), and a second convolutional neural network. The input module 402 may input the input feature at the current time into the first convolutional neural network, which may serve to fuse the inputs; the resulting convolution result may be a feature of size h×w×128. Then, the input module 402 may input the convolution result, that is, the h×w×128 feature, into the at least one residual module to obtain the hidden layer feature h_t at the current time. It should be noted that the convolution result may be input into the first residual module of the at least one residual module, and the output of the first residual module may be used as the input of the second residual module. By analogy, the last residual module of the at least one residual module outputs the hidden layer feature h_t at the current time.

Next, the input module 402 may input the hidden layer feature h_t at the current time into the second convolutional neural network to obtain the super-resolution feature o_t at the current time output by the second convolutional neural network.

The second obtaining module 403 may obtain the super-resolution image at the current time based on the super-resolution feature o_t at the current time and the image frame X_t at the current time.

As an example, the second obtaining module 403 may first upsample the image frame X_t at the current time to obtain an upsampling result. The upsampling result may then be superimposed on the super-resolution feature o_t at the current time to obtain the super-resolution image at the current time.

According to an exemplary embodiment of the present disclosure, the hidden layer features at the m times before the current time may be obtained from a historical hidden layer feature queue for storing hidden layer features. The video processing apparatus of the present disclosure may further include a first updating module, which may be configured to update the historical hidden layer feature queue according to the hidden layer feature h_t at the current time. Updating the queue ensures that the historical hidden layer features in it advance with time, so that each video frame in the video uses the historical hidden layer features at times close to that frame for motion compensation. This avoids introducing historical motion states that are too old and unrelated to the image frame at the current time, avoids introducing noise, and helps ensure the quality of the super-resolution image.

According to an exemplary embodiment of the present disclosure, the first updating module may delete the historical hidden layer feature that was stored first in the historical hidden layer feature queue and store the hidden layer feature h_t at the current time in the queue. The historical hidden layer feature queue may be a first-in first-out queue for storing m hidden layer features. For example, as described above, m may be 3, and the hidden layer features at the 3 times before the current time may be h_{t-3}, h_{t-2}, h_{t-1}. After the hidden layer feature h_t at the current time is obtained, the historical hidden layer feature that was stored first in the queue, namely the oldest historical hidden layer feature h_{t-3}, may be deleted, and the hidden layer feature h_t at the current time may be stored in the queue, giving the updated historical hidden layer feature queue: h_{t-2}, h_{t-1}, h_t. Thus, setting the historical hidden layer feature queue as a first-in first-out queue allows the historical hidden layer features in the queue to be updated in a simple and convenient manner.

According to an exemplary embodiment of the present disclosure, the super-resolution features at the n times before the current time may be obtained from a historical super-resolution feature queue for storing super-resolution features. The video processing apparatus of the present disclosure may further include a second updating module, which may be configured to update the historical super-resolution feature queue according to the super-resolution feature o_t at the current time. Updating the queue ensures that the historical super-resolution features in it advance with time, so that each video frame in the video uses the historical super-resolution features at times close to that frame for motion compensation. This avoids introducing historical motion states that are too old and unrelated to the image frame at the current time, avoids introducing noise, and helps ensure the quality of the super-resolution image.

According to an exemplary embodiment of the present disclosure, the second updating module may delete the historical super-resolution feature that was written first into the historical super-resolution feature queue and write the super-resolution feature o_t at the current time into the queue. The historical super-resolution feature queue may be a first-in first-out queue for storing n super-resolution features. For example, as described above, n may be 3, and the super-resolution features at the 3 times before the current time may be o_{t-3}, o_{t-2}, o_{t-1}. After the super-resolution feature o_t at the current time is obtained, the historical super-resolution feature that was stored first in the queue, namely the oldest historical super-resolution feature o_{t-3}, may be deleted, and the super-resolution feature o_t at the current time may be stored in the queue, giving the updated historical super-resolution feature queue: o_{t-2}, o_{t-1}, o_t. Thus, setting the historical super-resolution feature queue as a first-in first-out queue allows the historical super-resolution features in the queue to be updated in a simple and convenient manner.

Fig. 5 is a block diagram illustrating an electronic device 500 in accordance with an example embodiment.

Referring to fig. 5, the electronic device 500 includes at least one memory 501 and at least one processor 502, the at least one memory 501 having instructions stored therein, which when executed by the at least one processor 502, perform a video processing method according to an exemplary embodiment of the present disclosure.

By way of example, the electronic device 500 may be a personal computer (PC), a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the instructions described above. Here, the electronic device 500 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 500 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).

In the electronic device 500, the processor 502 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

The processor 502 may execute instructions or code stored in the memory 501, wherein the memory 501 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.

The memory 501 may be integrated with the processor 502, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 501 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 501 and the processor 502 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 502 is able to read files stored in the memory.

In addition, the electronic device 500 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 500 may be connected to each other via a bus and/or a network.

According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the above-described video processing method. Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or other optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (xD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer such that the processor or computer can execute the computer program.
The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, a server, and the like, and further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.

According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, comprising a computer program which, when executed by a processor, implements a video processing method according to the present disclosure.

According to the video processing method and the video processing apparatus of the present disclosure, historical motion states from a plurality of moments before the current moment can be used when the super-resolution image is obtained. Because more temporal information from video frames before the current moment is used, the resulting super-resolution image is less blurred and sharper. Furthermore, the historical motion states can be filtered: those related to the image frame at the current moment are retained and those unrelated to it are filtered out, so that noise is avoided and the information in the hidden layer features and the super-resolution features is used to the fullest extent. Furthermore, a storage strategy of multi-step hidden layer features and/or multi-step super-resolution features is adopted. Furthermore, by updating the historical hidden layer feature queue, the historical hidden layer features in the queue advance with time, so that each video frame in the video uses, for motion compensation, the historical hidden layer features from moments close to that frame; this avoids introducing overly old historical motion states unrelated to the image frame at the current moment, avoids noise, and ensures the quality of the super-resolution image.
Furthermore, by updating the historical super-resolution feature queue, the historical super-resolution features in the queue advance with time, so that each video frame in the video uses, for motion compensation, the historical super-resolution features from moments close to that frame; this likewise avoids introducing overly old historical motion states unrelated to the image frame at the current moment, avoids noise, and ensures the quality of the super-resolution image. Furthermore, setting the historical hidden layer feature queue as a first-in first-out queue allows the historical hidden layer features in the queue to advance with time, and the implementation is simple and convenient. Furthermore, setting the historical super-resolution feature queue as a first-in first-out queue allows the historical super-resolution features in the queue to advance with time, and the implementation is likewise simple and convenient.
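By way of a non-limiting illustration, the first-in first-out behaviour described above can be sketched in Python as follows. The class name `HistoryQueue`, its methods, and the placeholder strings are hypothetical names introduced only for this sketch, and filling the queue with an initial placeholder is an assumption rather than something the disclosure specifies.

```python
from collections import deque


class HistoryQueue:
    """Fixed-length first-in first-out store for per-frame features (hypothetical helper)."""

    def __init__(self, capacity, initial):
        # Assumption: the queue starts full of a placeholder (for example zeros)
        # so that `capacity` entries are always available.
        self._items = deque([initial] * capacity, maxlen=capacity)

    def update(self, feature):
        # Appending to a full deque with maxlen set discards the oldest entry,
        # i.e. the earliest-stored feature is deleted and the new one is stored.
        self._items.append(feature)

    def snapshot(self):
        # Oldest entry first, newest entry last.
        return list(self._items)


hidden_queue = HistoryQueue(capacity=3, initial="h_init")
for t in range(5):
    hidden_queue.update(f"h_{t}")

print(hidden_queue.snapshot())  # ['h_2', 'h_3', 'h_4']
```

With a capacity of 3, only the features of the three most recent moments remain after five updates, which is the sense in which the stored history advances with time.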

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A video processing method, comprising:

for the image frame at each moment in a video, obtaining an input feature of the current moment based on the image frame at the current moment, hidden layer features at m moments before the current moment, and super-resolution features at n moments before the current moment;

inputting the input feature of the current moment into a video super-resolution model to obtain the super-resolution feature of the current moment output by an output layer of the video super-resolution model and the hidden layer feature of the current moment output by a hidden layer of the video super-resolution model, wherein the hidden layer feature of the current moment is used for obtaining the input features of m moments after the current moment;

obtaining a super-resolution image at the current moment based on the super-resolution feature of the current moment and the image frame at the current moment;

wherein m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not simultaneously 1.
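By way of a non-limiting illustration, the per-frame data flow recited above may be sketched in Python as follows. Everything named here is hypothetical: `toy_model` is a stand-in and not the disclosed network, frames are flattened lists of numbers, and the zero-initialized history, the concatenation used to form the input feature, and the residual addition onto a nearest-neighbour upsampled frame are assumptions made only so that the sketch runs.

```python
def toy_model(input_feature, low_len, scale):
    """Stand-in for the video super-resolution model (hypothetical, not the disclosed network)."""
    mean = sum(input_feature) / len(input_feature)
    hidden = [mean] * low_len                      # hidden layer feature of the current moment
    sr_feature = [0.1 * mean] * (low_len * scale)  # super-resolution feature of the current moment
    return sr_feature, hidden


def process_video(frames, m, n, scale=2):
    if m < 1 or n < 1 or (m == 1 and n == 1):
        raise ValueError("m and n must be >= 1 and must not both equal 1")
    low_len = len(frames[0])
    # Assumption: histories start as zeros so the first frames have m and n entries.
    hidden_hist = [[0.0] * low_len for _ in range(m)]
    sr_hist = [[0.0] * (low_len * scale) for _ in range(n)]
    outputs = []
    for frame in frames:
        # Input feature: the current frame joined with the m hidden layer
        # features and the n super-resolution features of earlier moments.
        input_feature = list(frame)
        for past in hidden_hist + sr_hist:
            input_feature.extend(past)
        sr_feature, hidden = toy_model(input_feature, low_len, scale)
        # Assumption: the image is the nearest-neighbour upsampled frame
        # plus the super-resolution feature used as a residual.
        upsampled = [p for p in frame for _ in range(scale)]
        outputs.append([u + r for u, r in zip(upsampled, sr_feature)])
        # Keep only the most recent m and n entries for later moments.
        hidden_hist = (hidden_hist + [hidden])[-m:]
        sr_hist = (sr_hist + [sr_feature])[-n:]
    return outputs


images = process_video([[1.0, 2.0], [3.0, 4.0]], m=2, n=1)
print(len(images), len(images[0]))  # 2 4
```

The sketch rejects m = n = 1 to mirror the condition that m and n are not simultaneously 1; each hidden layer feature produced at one moment is visible to the following m moments through `hidden_hist`.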

2. The method of claim 1, wherein obtaining the input feature of the current moment based on the image frame at the current moment, the hidden layer features at m moments before the current moment, and the super-resolution features at n moments before the current moment comprises:

for the hidden layer feature at each of the m moments before the current moment, filtering out, from the hidden layer feature at that moment, information related to the image frame at the current moment, as a filtering result of the hidden layer feature at that moment; and/or, for the super-resolution feature at each of the n moments before the current moment, filtering out, from the super-resolution feature at that moment, information related to the image frame at the current moment, as a filtering result of the super-resolution feature at that moment;

and obtaining the input feature of the current moment based on the image frame at the current moment, the hidden layer features at the m moments before the current moment or the filtering results thereof, and the super-resolution features at the n moments before the current moment or the filtering results thereof.

3. The method as claimed in claim 2, wherein the filtering, for the hidden layer feature at each time instant m time instants before the current time instant, information related to the image frame at the current time instant from the hidden layer feature at the time instant, as a filtering result of the hidden layer feature at the time instant, comprises:

performing convolution processing on the image frame at the current moment and the hidden layer features at the m moments before the current moment to obtain a convolution-processed hidden layer features, wherein a is equal to m, and the a convolution-processed hidden layer features correspond one-to-one to the m moments before the current moment;

activating a part, of the a convolution-processed hidden layer features, of which the correlation with the image frame at the current moment meets a first preset condition by using an activation function to obtain a activated hidden layer features;

and performing point multiplication on the a activated hidden layer features and the hidden layer features at the m moments before the current moment to obtain the filtering results of the hidden layer features at the m moments before the current moment.
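By way of a non-limiting illustration, the three steps recited above (convolution, activation, point multiplication) may be sketched in Python on one-dimensional features as follows. The function names, the kernels, and the use of a sigmoid as the activation function are assumptions of this sketch; the claim does not fix the activation function or the first preset condition. Summing the two convolutions is equivalent to a single two-channel convolution over the frame stacked with the hidden layer feature.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D sliding-window convolution (odd kernel) that keeps the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def filter_hidden_features(frame, hidden_history, frame_kernel, hidden_kernel):
    """Return one filtered feature per past moment (hypothetical helper)."""
    filtered = []
    for hidden in hidden_history:  # one pass per past moment, so a == m
        # Convolution over the current frame and this past hidden layer feature.
        logits = [
            f + h
            for f, h in zip(conv1d_same(frame, frame_kernel),
                            conv1d_same(hidden, hidden_kernel))
        ]
        # Activation (assumed sigmoid): values near 1 mark positions treated
        # as related to the current frame, values near 0 suppress the rest.
        gate = [sigmoid(v) for v in logits]
        # Point (element-wise) multiplication with the original feature.
        filtered.append([g * h for g, h in zip(gate, hidden)])
    return filtered


frame = [1.0, 0.0, -1.0]
history = [[2.0, 2.0, 2.0], [0.0, 4.0, 0.0]]
result = filter_hidden_features(frame, history, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
print(len(result))  # 2
```

The same structure would apply to the super-resolution features of the n earlier moments, with a second set of kernels.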

4. The method as claimed in claim 2, wherein the filtering out, for the super-resolution feature at each of the n moments before the current moment, information related to the image frame at the current moment from the super-resolution feature at that moment, as a filtering result of the super-resolution feature at that moment, comprises:

performing convolution processing on the image frame at the current moment and the super-resolution features at the n moments before the current moment to obtain b convolution-processed super-resolution features, wherein b is equal to n, and the b convolution-processed super-resolution features correspond one-to-one to the n moments before the current moment;

activating, by using an activation function, a part of the b convolution-processed super-resolution features whose correlation with the image frame at the current moment meets a second preset condition, to obtain b activated super-resolution features;

and performing point multiplication on the b activated super-resolution features and the super-resolution features at the n moments before the current moment to obtain the filtering results of the super-resolution features at the n moments before the current moment.

5. The method of claim 1, wherein hidden layer features at m times before a current time are obtained from a historical hidden layer feature queue for storing hidden layer features;

wherein the video processing method further comprises:

and updating the historical hidden layer feature queue according to the hidden layer feature at the current moment.

6. The method of claim 5, wherein the updating the historical hidden layer feature queue according to the hidden layer feature at the current time comprises:

deleting the earliest-stored historical hidden layer feature from the historical hidden layer feature queue, and storing the hidden layer feature at the current moment into the historical hidden layer feature queue;

wherein the historical hidden layer feature queue is a first-in first-out queue used for storing m hidden layer features.

7. A video processing apparatus, comprising:

the first acquisition module is configured to obtain an input feature of the current moment based on an image frame of the current moment, hidden layer features of m moments before the current moment and super-resolution features of n moments before the current moment for the image frame of each moment in the video;

the input module is configured to input the input features of the current moment into a video super-resolution model, and obtain the super-resolution features of the current moment output by an output layer of the video super-resolution model and the hidden layer features of the current moment output by a hidden layer of the video super-resolution model, wherein the hidden layer features of the current moment are used for obtaining the input features of m moments after the current moment;

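the second acquisition module is configured to obtain a super-resolution image at the current moment based on the super-resolution features at the current moment and the image frame at the current moment;

wherein m is an integer greater than or equal to 1, n is an integer greater than or equal to 1, and m and n are not simultaneously 1.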

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the video processing method of any of claims 1 to 6.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video processing method of any of claims 1 to 6.

10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the video processing method of any of claims 1 to 6.