CN115243044A - Reference frame selection method and device, equipment and storage medium - Google Patents

Reference frame selection method and device, equipment and storage medium

Info

Publication number
CN115243044A
Authority
CN
China
Prior art keywords: frame, video, model, processed, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210892757.0A
Other languages
Chinese (zh)
Inventor
陈志波
符军
刘森
杨智尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
University of Science and Technology of China USTC
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical University of Science and Technology of China USTC
Priority to CN202210892757.0A
Publication of CN115243044A
Priority to PCT/CN2023/105721 (published as WO2024022047A1)
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements using adaptive coding
    • H04N19/134 - Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 - Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements using adaptive coding
    • H04N19/169 - Adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Adaptive coding wherein the coding unit is an image region, e.g. an object
    • H04N19/172 - Adaptive coding wherein the coded region is a picture, frame or field

Abstract

The application provides a reference frame selection method, a device, equipment and a storage medium; wherein the method comprises the following steps: acquiring E first adjacent frames of a video frame to be processed; selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the image content of the E first adjacent frames; wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.

Description

Reference frame selection method and device, equipment and storage medium
Technical Field
The present application relates to video image technology, and relates to, but is not limited to, a method and apparatus for selecting a reference frame, a device, and a storage medium.
Background
Video has become the most popular form of content consumption today. According to industry reports, video viewing accounted for 82% of all internet traffic by 2022. To reduce transmission bandwidth and storage costs, video service providers typically compress video. However, because they use block-transform-based coding schemes, common video compression algorithms tend to produce visually objectionable compression artifacts. Video enhancement algorithms are therefore needed.
In the related art, a reference frame is selected from frames adjacent to the current frame, and the current frame is then enhanced according to that reference frame to obtain a reconstructed frame with better image quality than the current frame. However, the image quality of reconstructed frames obtained with the related art cannot satisfy quality requirements.
Disclosure of Invention
In view of this, the reference frame selection method, apparatus, device, and storage medium provided in the present application aim to select a better reference frame, so as to help better enhance the image quality of a video frame to be processed and improve the image quality of a reconstructed frame.
According to an aspect of an embodiment of the present application, there is provided a reference frame selection method, including: acquiring E first adjacent frames of a video frame to be processed, wherein E is greater than 1; and selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames, wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
In this way, the image contents of the video frame to be processed and of its E first adjacent frames are considered when selecting the reference frame, rather than simply selecting the reference frame based on the positional relationship between the adjacent frames and the video frame to be processed (for example, taking the previous frame and the next frame of the video frame to be processed as its reference frames). The selected reference frame is therefore better adapted to the image content of the video frame to be processed, so the image quality of the video frame to be processed can be better enhanced and the image enhancement quality is improved.
According to an aspect of an embodiment of the present application, there is provided a video enhancement method, including: acquiring E first adjacent frames of a video frame to be processed; wherein E is greater than 1; selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the image content of the E first adjacent frames; and enhancing the image quality of the video frame to be processed according to the first reference frame.
According to an aspect of an embodiment of the present application, there is provided a reference frame selection apparatus, including: an acquisition module configured to acquire E first adjacent frames of a video frame to be processed, wherein E is greater than 1; and a selection module configured to select a first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames, wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
According to an aspect of an embodiment of the present application, there is provided a video enhancement apparatus, including: an acquisition module configured to acquire E first adjacent frames of a video frame to be processed, wherein E is greater than 1; a selection module configured to select a first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames; and an enhancement module configured to enhance the image quality of the video frame to be processed according to the first reference frame.
According to an aspect of the embodiments of the present application, there is provided an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor, and the processor implementing the method of the embodiments of the present application when executing the program.
According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method provided by the embodiments of the present application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 is a schematic flowchart illustrating an implementation process of a reference frame selection method according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of another reference frame selection method according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a relationship between a first adjacent frame and a video frame to be processed according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a process of determining a trained image enhancement model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a process for determining a trained reference frame selection model according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a process of determining a trained target image enhancement model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a video decompression distortion removing process according to an embodiment of the present application;
FIG. 8 is a schematic flowchart illustrating an operation of an adaptive reference frame selection module according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a training process provided by an embodiment of the present application;
FIG. 10 is a schematic diagram comparing the subjective compression-distortion-removal performance of an embodiment of the present application with that of a heuristic reference frame selection method;
fig. 11 is a schematic structural diagram of a reference frame selecting apparatus according to an embodiment of the present application;
fig. 12 is a hardware entity diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, specific technical solutions of the present application will be described in further detail below with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application but are not intended to limit the scope of the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In the following description, references to "some embodiments," "the present embodiment," "embodiments of the present application," and examples, etc., describe a subset of all possible embodiments, but it is understood that "some embodiments" can be the same subset or different subsets of all possible embodiments, and can be combined with each other without conflict.
It should be noted that the terms "first/second/third/fourth/fifth" and the like in the embodiments of the present application are used to distinguish similar or different objects and do not represent a specific ordering of the objects. It should be understood that "first/second/third/fourth/fifth" and the like may, where permitted, be interchanged in a specific order or sequence, so that the embodiments of the present application described herein can be implemented in an order other than that shown or described.
The embodiment of the application provides a reference frame selection method, which is applied to an electronic device, and the electronic device may be various types of devices with information processing capability in the implementation process, for example, the electronic device may include a mobile phone, a tablet computer, a desktop, a television, a projection device, and the like. The functions implemented by the method can be implemented by calling program code by a processor in an electronic device, and the program code can be stored in a computer storage medium.
Fig. 1 is a schematic flow chart of an implementation of a reference frame selection method provided in an embodiment of the present application, and as shown in fig. 1, the method may include the following steps 101 to 102:
step 101, obtaining E first adjacent frames of a video frame to be processed; wherein E is greater than 1;
step 102, selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the image contents of the video frame to be processed and the E first adjacent frames; wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
In the embodiment of the application, the image contents of the video frame to be processed and of its E first adjacent frames are considered when selecting the reference frame, rather than simply selecting the reference frame based on the positional relationship between the adjacent frames and the video frame to be processed (for example, taking the previous frame and the next frame of the video frame to be processed as its reference frames). The selected reference frame is therefore better adapted to the image content of the video frame to be processed, so the image quality of the video frame to be processed can be better enhanced and the image enhancement quality is improved.
In the embodiment of the present application, the processor implementing step 101 and step 102 and the processor for enhancing the image quality of the video frame to be processed may be the same processor or different processors, and this is not limited thereto.
Fig. 2 is a schematic flow chart illustrating an implementation of another reference frame selection method provided in the embodiment of the present application, and as shown in fig. 2, the method includes the following steps 201 to 203:
step 201, acquiring E first adjacent frames of a video frame to be processed;
step 202, processing the image contents of the video frame to be processed and the E first adjacent frames through a reference frame selection model obtained through pre-training to obtain a first reference frame of the video frame to be processed; wherein the reference frame selection model is an AI model;
step 203, inputting the video frame to be processed and the corresponding first reference frame into a target image enhancement model obtained by pre-training, so as to obtain a fifth reconstructed frame of the video frame to be processed.
Further alternative embodiments of the above steps and related terms are described below.
In step 201, E first neighboring frames of a video frame to be processed are obtained.
The term "adjacent frame" as used herein (whether the first adjacent frame, the second adjacent frame, the third adjacent frame, or any other adjacent frame mentioned below) is a broad concept: it refers to a video frame within a certain range of the video frame or sample frame in question, and includes video frames at one or more times before that frame and/or video frames at one or more times after it.
For example, taking a video frame to be processed as an example, as shown in FIG. 3, assume X_t is the video frame to be processed. Then video frames X_{t-1} to X_{t-N} (i.e., the N frames preceding X_t) are all first adjacent frames of X_t, and video frames X_{t+1} to X_{t+N} (i.e., the N frames following X_t) are all first adjacent frames of X_t, where N is greater than 0.
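This windowing can be sketched in plain Python (the helper name `first_adjacent_indices` is ours, not from the patent), clamping the window to the bounds of the video:

```python
def first_adjacent_indices(t, n, num_frames):
    """Indices of the n frames before and n frames after frame t,
    skipping any that fall outside [0, num_frames)."""
    before = [i for i in range(t - n, t) if i >= 0]
    after = [i for i in range(t + 1, t + n + 1) if i < num_frames]
    return before + after

# For a 10-frame video, the first adjacent frames of X_5 with N = 2:
print(first_adjacent_indices(5, 2, 10))  # [3, 4, 6, 7]
```

At the sequence boundaries the window simply shrinks, so the first frame of a video has only following neighbors and the last frame only preceding ones.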
In step 202, processing the image contents of the video frame to be processed and the E first adjacent frames through a reference frame selection model obtained through pre-training to obtain a first reference frame of the video frame to be processed; wherein the reference frame selection model is an AI model.
In the embodiment of the present application, the structure of the reference frame selection model is not limited, and various AI models may be used. For example, linear regression, logistic regression, linear discriminant analysis, decision trees, bayes, K-nearest neighbors, learning vector quantization, support vector machines, bagging, and random forest or deep neural networks, and the like may be used.
Further, in some embodiments, the reference frame selection model is a convolutional neural network, comprising at least: the device comprises a convolution layer, a pooling layer, a full-connection layer and an output layer; the convolution layer is used for carrying out convolution operation on image contents of the video frame to be processed and E first adjacent frames of the video frame to be processed to obtain a feature map; the pooling layer is used for pooling the characteristic diagram to obtain a pooled characteristic diagram; the full connection layer is used for determining the probability that the first adjacent frame is selected as the first reference frame according to the pooled feature map; the output layer is used for selecting the first reference frame according to the probability that the E first adjacent frames are selected as the first reference frame.
In some embodiments, the video frame to be processed and the E first adjacent frames may be input into the convolutional layer one by one, and the convolutional layer performs a convolution operation on the input image content one by one; in other embodiments, the video frames may be merged and then input to the convolutional layer, that is, after the video frame to be processed and the E first adjacent frames are merged along the channel dimension, a merged video frame is obtained, and then the merged video frame is input to the convolutional layer, and the features of the merged video frame are extracted by using the convolution operation.
The merging operation simply splices or combines the video frames to obtain a larger-sized image. For example, assume the images to be merged include image A and image B, where image A is represented as

[ a11  a12 ]
[ a21  a22 ]

and image B is represented as

[ b11  b12 ]
[ b21  b22 ].

Splicing the two frames then yields the merged video frame

[ a11  a12  b11  b12 ]
[ a21  a22  b21  b22 ].
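Both merging styles can be sketched with NumPy (the array values are illustrative): a side-by-side splice producing one larger image, and the channel-dimension merge described above that is fed to the convolution layer:

```python
import numpy as np

# Two 2x2 single-channel "frames", standing in for images A and B.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Spatial splice: place B beside A, giving one larger 2x4 image.
spliced = np.concatenate([a, b], axis=1)

# Channel-dimension merge: stack the frames as channels of one array,
# giving shape (height, width, channels) = (2, 2, 2).
merged = np.stack([a, b], axis=-1)

print(spliced.shape, merged.shape)  # (2, 4) (2, 2, 2)
```

The channel-dimension form is the usual input to a convolutional layer, since the convolution can then mix information across all the merged frames at every spatial position.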
In some embodiments, the full-link layer is further configured to determine, according to the pooled feature map, probabilities that F different numbers of frames are respectively used as the number of the first reference frames; wherein F is greater than 1; correspondingly, the output layer is used for determining a target number according to the probability that the F different frame numbers are respectively used as the number of the first reference frames; and selecting a target number of first reference frames according to the probability that the E first adjacent frames are selected as the first reference frames.
Further, in some embodiments, the number of first reference frames with the highest probability or with a probability greater than a number threshold may be taken as the target number.
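A minimal sketch of how the two outputs combine at the output layer. The function name and the mapping of count_probs[k] to a count of k+1 frames are our assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def select_reference_frames(frame_probs, count_probs):
    """Pick the target number of reference frames (argmax over the F
    candidate counts, assuming count_probs[k] is the probability that
    k+1 frames are used), then take that many adjacent frames with the
    highest selection probability."""
    target_number = int(np.argmax(count_probs)) + 1
    ranked = np.argsort(frame_probs)[::-1]  # most probable first
    return sorted(ranked[:target_number].tolist())

# E = 4 adjacent frames, F = 3 candidate counts (1, 2 or 3 frames):
frame_probs = np.array([0.1, 0.5, 0.3, 0.1])
count_probs = np.array([0.2, 0.7, 0.1])
print(select_reference_frames(frame_probs, count_probs))  # [1, 2]
```

A probability-threshold rule could replace the argmax, per the alternative mentioned above, by taking the largest count whose probability exceeds the threshold.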
In step 203, the video frame to be processed and the corresponding first reference frame are input into a target image enhancement model obtained by pre-training, so as to obtain a fifth reconstructed frame of the video frame to be processed.
The target image enhancement model is determined in three stages, performed in sequence: a first stage, a second stage, and a third stage. The trained reference frame selection model is obtained through the first and second stages; in the third stage, the image enhancement model is retrained based on the reference frame selection model obtained in the second stage.
In the first stage, an initial model of the image enhancement model is pre-trained to obtain a robust and reliable image enhancement model, i.e., one that achieves a good enhancement effect for a wide variety of reference frames. In the second stage, the pre-trained image enhancement model is fixed and an initial model of the reference frame selection model is trained; this allows rapid convergence and yields the trained reference frame selection model. In the third stage, the image enhancement model is retrained based on the reference frame selection model obtained in the second stage, producing a target image enhancement model with better performance.
The detailed description of these three phases is as follows:
the first stage is as follows: obtaining a trained image enhancement model, comprising:
as shown in fig. 4, according to a plurality of fourth adjacent frames of the third sample frame and the third sample frame, performing a second adjustment process on the model parameters of the second initial model of the image enhancement model 401 to obtain an adjusted second initial model; wherein the second adjustment processing includes: sampling at least one third reference frame from the plurality of fourth adjacent frames; inputting the third sample frame and the at least one third reference frame into a second initial model to obtain a third reconstructed frame of the third sample frame; determining a second loss of the third reconstructed frame based at least on the third reconstructed frame and a standard frame of the third sample frame; adjusting the model parameters of the second initial model according to the second loss;
and performing second adjustment processing on the adjusted model parameters of the second initial model according to a plurality of fifth adjacent frames and fourth sample frames of the fourth sample frame until the corresponding obtained second loss or iteration number meets a cutoff condition, so as to obtain an image enhancement model 401. That is, model parameters of the second initial model are continuously trained through a large number of sample frames and corresponding adjacent frames, and finally, when a training result meets a cutoff condition, an image enhancement model, namely the trained second initial model, is obtained.
It is to be understood that each time the second adjustment processing is performed again on the model parameters of the second initial model, it is based on new sample data. For example, in the step of "performing the second adjustment processing on the adjusted model parameters of the second initial model according to a plurality of fifth adjacent frames of the fourth sample frame and the fourth sample frame", the processing is based on the fourth sample frame and its fifth adjacent frames, not on the third sample frame and its fourth adjacent frames.
The standard frame of the third sample frame mentioned herein refers to an image frame whose image quality meets the index requirement, for example, the standard frame is a lossless frame. The standard frames of the other sample frames mentioned below can be understood with reference to the description herein of the standard frame of the third sample frame.
In this embodiment of the application, the sampling manner is not limited: the third reference frame may be obtained by randomly sampling from the plurality of fourth adjacent frames, or according to another predetermined sampling strategy. In any case, the reference frames sampled across the many iterations differ in position relative to the sample frame; therefore, the image enhancement model obtained by training can output a reconstructed frame of good image quality for any input reference frame.
In an embodiment of the present application, the second loss meeting the cutoff condition includes the second loss being less than a first threshold; the number of iterations meeting the cutoff condition includes the number of iterations reaching a second threshold.
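The first-stage loop described above (random reference sampling, loss computation, parameter update, cutoff on loss or iteration count) can be sketched end-to-end with scalar stand-ins for frames. Everything here, including the one-weight "model" and the update rule, is a schematic illustration, not the patent's implementation:

```python
import random

def charbonnier(x, y, eps=1e-6):
    """Scalar stand-in for the frame-level second loss."""
    return ((x - y) ** 2 + eps) ** 0.5

def train_stage_one(samples, loss_threshold=0.05, max_iters=100):
    """samples: list of (sample, standard, neighbors) scalar stand-ins
    for frames; the 'model' is one weight w applied to the mean of the
    sample frame and its randomly drawn reference frames."""
    w = 0.5
    for it in range(max_iters):
        sample, standard, neighbors = random.choice(samples)
        # Random sampling: reference positions differ across iterations,
        # so the trained model is robust to any input reference frame.
        refs = random.sample(neighbors, k=random.randint(1, len(neighbors)))
        recon = w * (sample + sum(refs)) / (1 + len(refs))
        loss = charbonnier(recon, standard)
        w += 0.1 * (standard - recon)  # crude gradient-like update
        if loss < loss_threshold:      # second-loss cutoff condition
            break
    return w, it

random.seed(0)
samples = [(1.0, 1.0, [0.9, 1.1, 1.0, 1.05])]
w, iters = train_stage_one(samples)
```

The point of the sketch is the loop structure: a fresh sample and a freshly sampled reference set on every iteration, with training stopped by either cutoff condition.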
For the step of determining a second loss of the third reconstructed frame at least according to the third reconstructed frame and the standard frame of the third sample frame: in some embodiments, the second loss may be determined based on the difference between the third reconstructed frame and the standard frame of the third sample frame. For example, the second loss L_2 can be calculated by the following formula (1):

L_2 = sqrt( || Y'_t - Y_t ||^2 + ε )    (1)

In formula (1), Y'_t is the third reconstructed frame, Y_t is the standard frame of the third sample frame, and ε is set to 1e-6.
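Read with its surrounding definitions, formula (1) is a Charbonnier-style distance between the reconstructed frame and the standard frame. A NumPy sketch under that assumption (the published formula is reproduced as an image, so this exact form, a root of the total squared difference plus ε, is our reading):

```python
import numpy as np

def second_loss(recon, standard, eps=1e-6):
    """Charbonnier-style loss: root of the total squared difference
    between the reconstructed and standard frames, plus a small eps
    that keeps the gradient well-behaved near zero difference."""
    return float(np.sqrt(np.sum((recon - standard) ** 2) + eps))

recon = np.array([[0.5, 0.6], [0.7, 0.8]])
standard = np.array([[0.5, 0.5], [0.7, 0.9]])
loss = second_loss(recon, standard)
```

With identical frames the loss bottoms out at sqrt(ε) rather than zero, which is the usual reason this variant is preferred over a plain L1 or L2 distance during training.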
It can be understood that, after the second initial model is trained, that is, the image enhancement model is obtained, the image enhancement model can be used as an evaluator to evaluate the quality of the reference frame selected by the first initial model of the reference frame selection model, so as to implement the training of the first initial model. In particular, reference is made to the following detailed description of the second stage.
And a second stage: obtaining a trained reference frame selection model, comprising:
as shown in fig. 5, according to a plurality of second adjacent frames of a first sample frame and the first sample frame, performing a first adjustment process on a model parameter of a first initial model of the reference frame selection model 501 to obtain an adjusted first initial model;
wherein the first adjustment processing includes: inputting the plurality of second adjacent frames and the first sample frame into the first initial model to obtain a second reference frame of the first sample frame; acquiring a first reconstructed frame of the first sample frame, where the first reconstructed frame is obtained by an image enhancement model 401 obtained by pre-training based on the first sample frame and a corresponding second reference frame; determining a first loss of the first reconstructed frame at least according to the first reconstructed frame and a standard frame of the first sample frame; adjusting model parameters of the first initial model according to the first loss;
and according to a plurality of third adjacent frames of the second sample frame and the second sample frame, performing the first adjustment processing on the adjusted model parameters of the first initial model until the corresponding obtained first loss or iteration times meet a cutoff condition, so as to obtain a reference frame selection model 501.
Thus, on the one hand, the second stage takes into account the influence of the reference frame selection on the enhancement effect of the image enhancement model, so the trained reference frame selection model can obtain a better reference frame, which helps to better enhance the image quality of the video frame to be processed. On the other hand, using the pre-trained image enhancement model to train the reference frame selection model makes the training process converge rapidly and saves computation.
It will be appreciated that the reference frame selection model is structurally identical to the first initial model except for the values of the model parameters. In some embodiments, the reference frame selection model is a convolutional neural network, and inputting the plurality of second adjacent frames and the first sample frame into the first initial model to obtain a second reference frame of the first sample frame includes: performing a convolution operation on the input image content of the first sample frame and its second adjacent frames through the convolution layer to obtain a feature map; pooling the feature map output by the convolution layer through the pooling layer and passing the result to the fully connected layer, which determines the probability that each second adjacent frame is selected as a second reference frame based on the pooled feature map; the output layer then selects a second reference frame based on the probabilities of the plurality of second adjacent frames being selected as second reference frames.
It should be noted that the first reconstructed frame is obtained by the image enhancement model 401 obtained in the first stage based on the input first sample frame and the corresponding second reference frame.
The step of "determining the first loss of the first reconstructed frame at least based on the first reconstructed frame and the standard frame of the first sample frame" in the first adjustment processing may be implemented as in embodiment 1 or embodiment 2 below; of course, the first loss may also be determined by other methods.
In embodiment 1, the first loss $L_1$ is determined in the same manner as the second loss, by the following formula (2):

$$L_1 = \frac{1}{CHW}\sum \sqrt{(\hat{Y}_t - Y_t)^2 + \varepsilon^2} \qquad (2)$$

In formula (2), $\hat{Y}_t$ is the first reconstructed frame, $Y_t$ is the standard frame of the first sample frame, $\varepsilon$ is set to 1e-6, and $C$, $H$ and $W$ are the number of channels, height and width of the frame, respectively.
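For illustration, a minimal plain-Python sketch of the Charbonnier-style loss of formula (2) is given below, with frames flattened to lists of pixel values; the function name and the flat-list representation are assumptions, and $\varepsilon$ is taken to enter as $\varepsilon^2$ under the square root, the common Charbonnier form:

```python
import math

def charbonnier_loss(recon, target, eps=1e-6):
    """Charbonnier-style loss: mean over all pixel values of
    sqrt((recon - target)^2 + eps^2). recon and target are the
    reconstructed frame and the standard frame, flattened to lists."""
    assert len(recon) == len(target)
    total = sum(math.sqrt((r - t) ** 2 + eps ** 2)
                for r, t in zip(recon, target))
    return total / len(recon)
```

Dividing by the number of pixel values plays the role of the $1/(CHW)$ normalization when the frame is flattened.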
In embodiment 2, the first loss may also be determined as follows: determining a first reward of the first reconstructed frame according to the first reconstructed frame and the standard frame of the first sample frame; taking the first sample frame as a starting point, taking M1 consecutive frames before the first sample frame and M2 consecutive frames after it as fifth reference frames, and inputting the fifth reference frames and the first sample frame into the image enhancement model to obtain a second reconstructed frame of the first sample frame, where M1 and M2 are greater than 0 and less than or equal to half of the number of the plurality of second adjacent frames; determining a second reward of the second reconstructed frame according to the second reconstructed frame and the standard frame of the first sample frame; and determining the first loss according to the first reward, the second reward, and the probability that the second reference frame is selected as a reference frame. In this way, in the embodiment of the present application, the first loss is calculated not only from the reward of the first reconstructed frame (i.e., the reconstructed frame computed by the image enhancement model based on the reference frame output by the first initial model of the reference frame selection model), but also from the reward of the second reconstructed frame (i.e., the reconstructed frame computed by the image enhancement model based on the reference frames selected by the reference method), so that the reference frame output by the finally trained reference frame selection model is better than that selected by the reference method.
Further, in some embodiments, the first reward may be determined according to the following formula (3):

$$r_t = f(\hat{Y}_t, Y_t) \qquad (3)$$

In formula (3), the function $f(\cdot)$ is used to calculate the PSNR of the first reconstructed frame $\hat{Y}_t$ with respect to the standard frame $Y_t$ of the first sample frame.
Based on this, in order to maximize the expected reward, the first loss $L_1$ is calculated using the loss function shown in the following formula (4):

$$L_1 = -\frac{1}{K}\sum_{k=1}^{K}\log(p_k)\,(r_t - \tilde{r}_t) \qquad (4)$$

In formula (4), $K$ represents the total number of second reference frames, $p_k$ represents the probability that the $k$-th second reference frame is selected as a reference frame, $r_t$ refers to the first reward, and $\tilde{r}_t$ refers to the second reward.
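The loss of formula (4) can be sketched as follows; this is an illustrative plain-Python version operating on the selection probabilities of the K chosen reference frames, with hypothetical function and argument names:

```python
import math

def first_loss(probs, reward, baseline_reward):
    """Formula (4): L1 = -(1/K) * sum_k log(p_k) * (r - r_tilde), where
    probs holds the selection probabilities p_k of the K chosen reference
    frames, reward is the first reward r, and baseline_reward is the
    second reward r_tilde from the reference selection method."""
    k = len(probs)
    advantage = reward - baseline_reward
    return -sum(math.log(p) for p in probs) * advantage / k
```

When the learned selection beats the reference method (positive advantage), minimizing this loss raises the log-probabilities of the chosen frames; when it does worse, the loss is negative and the gradient pushes those probabilities down.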
And a third stage: in order to obtain an image enhancement model with better performance, the image enhancement model also needs to be retrained based on the reference frame selection model obtained in the second-stage training. In some embodiments, the method includes:
as shown in fig. 6, according to a plurality of sixth adjacent frames of the fifth sample frame and the fifth sample frame, performing a third adjustment process on the model parameters of the image enhancement model 401 to obtain an adjusted image enhancement model;
wherein the third adjustment processing includes: inputting the sixth adjacent frames and the fifth sample frame into the reference frame selection model 501 to obtain a fourth reference frame of the fifth sample frame; acquiring a fourth reconstructed frame of the fifth sample frame; the fourth reconstructed frame is obtained by the image enhancement model 401 based on the fifth sample frame and the corresponding fourth reference frame; determining a third loss of the fourth reconstructed frame based on at least the fourth reconstructed frame and a standard frame of the fifth sample frame; adjusting model parameters of the image enhancement model 401 according to the third loss;
and performing the third adjustment processing on the model parameters of the adjusted image enhancement model according to a plurality of seventh adjacent frames of a sixth sample frame and the sixth sample frame, until the correspondingly obtained third loss or number of iterations meets a cutoff condition, so as to obtain the target image enhancement model.
Therefore, with the trained reference frame selection model fixed, the image enhancement model is trained again, which can further improve the performance of the image enhancement model, so that a reconstructed frame with better image quality is obtained through the target image enhancement model in the online use stage.
It should be noted that, as for the method for determining the third loss, the method for determining the first loss or the second loss may be referred to, and the description thereof will not be repeated.
In the embodiment of the present application, the network structure of the image enhancement model is not limited and may be any video enhancement network. For example, the network structure of the image enhancement model is EDVR.
An embodiment of the present application further provides a video enhancement method, including: acquiring E first adjacent frames of a video frame to be processed, where E is greater than 1; selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames; and enhancing the image quality of the video frame to be processed according to the first reference frame.
In some embodiments, the enhancing the image quality of the video frame to be processed according to the first reference frame includes: and inputting the video frame to be processed and the corresponding first reference frame into the target image enhancement model to obtain a fifth reconstruction frame of the video frame to be processed.
It should be noted that the above description of the embodiment of the video enhancement method is similar to the description of the embodiment of the reference frame selection method, and has similar beneficial effects to the embodiment of the reference frame selection method. For technical details not disclosed in the embodiments of the video enhancement method of the present application, please refer to the description of the embodiments of the frame selection method of the present application for understanding.
The reference frame selection method and the video enhancement method provided by the embodiment of the application are applicable to scenes such as video decompression and video deblurring.
Some video compression algorithms are prone to producing visually objectionable compression artifacts because they use block-transform-based coding schemes; it is therefore very necessary to develop video compression-artifact removal algorithms. Considering the temporal redundancy in video, a video decompression distortion algorithm mines spatio-temporal information from reference frames to remove the compression distortion of the current frame. As shown in fig. 7, the video decompression distortion algorithm first selects reference frames from adjacent frames with a reference frame selection module, and then inputs the reference frames and the current frame to a decompression distortion removal module to obtain a reconstructed frame. To improve the quality of the reconstructed frame, researchers have focused on designing better decompression distortion removal modules and have paid less attention to the design of the reference frame selection module. The embodiments of the present application instead focus on optimizing the reference frame selection module.
The related reference frame selection modules are designed based on heuristic rules. For example, in example 1, the two nearest peak-quality frames are taken as reference frames; in example 2, the previous frame is taken as the reference frame; in example 3, the adjacent preceding and succeeding frames are taken as reference frames; in example 4, the adjacent video frame or I/P frame with the lower quantization parameter is taken as the reference frame.
However, the above reference frame selection modules designed based on heuristic rules cannot adaptively select reference frames according to video content and easily settle on sub-optimal reference frames. Specifically, the method described in example 1 ignores high-quality detail information in low-quality frames. The method described in example 2 ignores information from subsequent frames. The method described in example 3 is not necessarily optimal for every frame, because the quality fluctuation of the adjacent frames of each frame is usually different. The method described in example 4 likewise ignores high-quality detail information in low-quality frames and is highly dependent on decoding-end information.
Based on this, an exemplary application of the embodiments of the present application in a practical application scenario will be described below.
Aiming at the problems of the reference frame selection module, such as the inability to adaptively select reference frames according to video content and the tendency to produce sub-optimal solutions, the embodiment of the application provides a video decompression distortion removal method based on adaptive reference frame selection. The method includes two modules: an adaptive reference frame selection module (i.e., an example of a reference frame selection model) and a decompression distortion removal module (i.e., an example of an image enhancement model). First, the adaptive reference frame selection module selects reference frames according to the information of the current frame and its adjacent frames; then, the decompression distortion removal module removes the compression distortion of the current frame based on the selected reference frames.
The working flow of the adaptive reference frame selection module is shown in fig. 8, and the module selects the reference frame according to the information of the current frame and its neighboring frames. The specific steps include the following steps 801 to 805:
step 801, first, given a current video frame, its previous N video frames and its subsequent N video frames, 2N+1 video frames in total, merge these video frames along the channel dimension;
step 802, extract the features of the merged video frames using convolution operations, and output a feature map with smaller spatial resolution and more channels;
step 803, convert the feature map into a 1-dimensional vector using average pooling;
step 804, convert the 1-dimensional vector into a probability distribution using a fully connected layer, i.e., the probability of each of $X_{t-N}$ to $X_{t-1}$ and $X_{t+1}$ to $X_{t+N}$ being selected as a reference frame;
step 805, select K frames as reference frames from the 2N adjacent frames (i.e., $X_{t-N}$ to $X_{t-1}$ and $X_{t+1}$ to $X_{t+N}$) according to the probability distribution.
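Step 805, drawing K distinct reference frames according to the probability distribution produced in step 804, might be sketched as sequential sampling without replacement; the scheme below is one plausible reading of the step, not the patented implementation:

```python
import random

def sample_reference_frames(probs, k, rng=None):
    """Draw k distinct adjacent-frame indices according to the probability
    distribution probs (one entry per adjacent frame). Sampling is
    sequential without replacement: after each draw, the drawn frame is
    removed and the remaining weights are renormalised implicitly."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(k):
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        chosen.append(remaining.pop(j))
        weights.pop(j)
    return sorted(chosen)
```

During training this stochastic sampling lets the module explore different reference sets; at inference one would typically switch to a deterministic top-K choice.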
The input of the decompression distortion removal module is the current frame and the selected reference frames, and its output is the current frame with compression distortion removed. The network structure of the decompression distortion removal module can adopt any video enhancement network; in this example, the video enhancement network EDVR is adopted.
The following describes the training mode of the embodiment of the present application. The training is divided into three stages: in stage one, the decompression distortion removal module is trained based on a random-sampling reference frame selection strategy; in stage two, the decompression distortion removal module is fixed and the adaptive reference frame selection module is trained; in stage three, the adaptive reference frame selection module is fixed and the decompression distortion removal module is retrained. The three stages are described in detail below.
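The three-stage schedule can be sketched as the following skeleton; the two training callbacks and their keyword arguments are placeholders for the actual stage-one/two/three training procedures, not APIs from the disclosure:

```python
def three_stage_training(train_enhancer, train_selector, epochs=(1, 1, 1)):
    """Skeleton of the three-stage schedule: (1) train the decompression
    distortion removal (enhancement) module with randomly sampled
    references, (2) freeze it and train the adaptive reference frame
    selection module, (3) freeze the selector and retrain the enhancement
    module with the learned selection strategy."""
    log = []
    for _ in range(epochs[0]):
        train_enhancer(reference_policy="random")
        log.append("stage1")
    for _ in range(epochs[1]):
        train_selector(enhancer_frozen=True)
        log.append("stage2")
    for _ in range(epochs[2]):
        train_enhancer(reference_policy="learned")
        log.append("stage3")
    return log
```

The point of the skeleton is the ordering and freezing pattern, not the training internals, which are supplied by the callbacks.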
In stage one, the decompression distortion removal module is trained based on a random-sampling reference frame selection strategy. As shown in fig. 9, K frames are uniformly sampled from the 2N adjacent frames as reference frames, then the reference frames and the current frame are input to the decompression distortion removal module, and finally the decompression distortion removal module is optimized with the Charbonnier loss function shown in the following formula (5):

$$L = \frac{1}{CHW}\sum \sqrt{(\hat{Y}_t - Y_t)^2 + \varepsilon^2} \qquad (5)$$

In formula (5), $\hat{Y}_t$ is the output of the decompression distortion removal module, $Y_t$ is the lossless picture, and $\varepsilon$ is set to 1e-6. $C$, $H$ and $W$ correspond to the number of channels, height and width of the output image, respectively.
In stage two, the decompression distortion removal module is fixed and the adaptive reference frame selection module is trained. As shown in fig. 9, a training mode based on reinforcement learning is adopted, i.e., the adaptive reference frame selection module is optimized according to the quality of the reconstructed image. The state, action and reward in this training mode are defined as follows.

State: $s_t = X_{[t-N:t+N]}$ is defined as the input 2N+1 consecutive frames.
Action: $a_t$ is sampled from the probability distribution $p \in \mathbb{R}^{2N}$. $p$ is the output of the adaptive reference frame selection module and satisfies $\sum_i p_i = 1$; $p_{0:N}$ and $p_{N:2N}$ correspond to the selection probabilities of the previous frames $X_{[t-N:t-1]}$ and the subsequent frames $X_{[t+1:t+N]}$, respectively.
Reward: $r_t$ reflects the value of taking action $a_t$ in state $s_t$. As shown in the following formula (6), the quality of the reconstructed image is used as the reward:

$$r_t = f(\hat{Y}_t, Y_t) \qquad (6)$$

In formula (6), $f$ is used to calculate the PSNR of the reconstructed image.
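A plain-Python version of the PSNR reward $f$ of formula (6), on frames flattened to lists of pixel values, could look as follows; the peak value of 255 is an assumption for 8-bit content, not stated in the disclosure:

```python
import math

def psnr(recon, target, max_val=255.0):
    """PSNR between the reconstructed frame and the lossless frame,
    10 * log10(MAX^2 / MSE), used as the reward in formula (6)."""
    mse = sum((r - t) ** 2 for r, t in zip(recon, target)) / len(recon)
    if mse == 0:
        return float("inf")  # identical frames: reward is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)
```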
To maximize the expected reward, the loss function shown in the following formula (7) is used:

$$L = -\frac{1}{K}\sum_{k=1}^{K}\log(p_k)\,(r_t - \tilde{r}_t) \qquad (7)$$

In formula (7), $\tilde{r}_t$ is the reconstruction quality obtained with the adjacent preceding and succeeding frames $\{X_{t-K/2}, \ldots, X_{t-1}, X_{t+1}, \ldots, X_{t+K/2}\}$ as reference frames.
In stage three, the adaptive reference frame selection module is fixed and the decompression distortion removal module is retrained. As shown in fig. 9, after the adaptive reference frame selection module has been trained, the decompression distortion removal module is retrained based on the learned reference frame selection strategy.
To verify the effectiveness of the adaptive reference frame selection module in the above method, it is compared with three common heuristic reference frame selection methods: Adjacent, MQF and PQF. The Adjacent method takes the adjacent preceding and succeeding frames $\{X_{t-K/2}, \ldots, X_{t-1}, X_{t+1}, \ldots, X_{t+K/2}\}$ as reference frames; the MQF method takes the K highest-quality frames among the adjacent frames as reference frames; the PQF method takes the two nearest peak-quality frames as reference frames. The reference frame search radius N in the adaptive reference frame selection module is set to 10.
The test data are 18 test sequences from a public data set. The resolution of these test sequences varies from 352×240 to 2560×1680, and the sequences are compressed by the HEVC encoder in a certain handset.
Table 1 below quantitatively compares the above method provided in the present application with the heuristic reference frame selection methods, where ΔPSNR and ΔSSIM respectively represent the average PSNR and SSIM improvement of each reference frame selection method over the 18 test sequences. As can be seen from Table 1, the method provided in the present application achieves higher ΔPSNR and ΔSSIM than the heuristic reference frame selection methods under different numbers of reference frames, which verifies the effectiveness of the method provided in the present application.
TABLE 1
(Table 1 is rendered as an image in the original publication; its contents are not reproduced in this text.)
Fig. 10 qualitatively compares the above method provided in the present application with the heuristic reference frame selection methods. As shown in fig. 10, the method provided in the present application recovers more detail information than the heuristic methods, mainly because it provides higher-quality reference information, i.e., better reference frame selection.
In the embodiment of the application, a video decompression distortion removal method based on adaptive reference frame selection is provided. Compared with other video decompression distortion removal methods, the method provided in the present application has two advantages in reference frame selection. First, it can adaptively select reference frames according to video content and is independent of decoding-end information. Second, it learns reference frame selection in a data-driven manner and can find better solutions than heuristic reference frame selection methods. That is, the embodiment of the present application considers the influence of reference frame selection on the enhancement effect, and can therefore find a better solution than heuristic reference frame selection methods, which is conducive to better enhancing the image quality of the video frame to be processed.
In this embodiment of the application, the EDVR network in the distortion removal module may also be replaced by another video enhancement network, and the network structure of the distortion removal module is not limited.
The algorithm of the self-adaptive reference frame selection module is also suitable for tasks such as video compression, video deblurring and the like.
On the basis of the self-adaptive reference frame selection module, a reference frame number selection branch is added, so that the number of the reference frames and the positions of the reference frames are determined in a self-adaptive manner according to the space-time information of the current frame.
It should be noted that although the steps of the methods in this application are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order or that all of the depicted steps must be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step execution, and/or one step may be broken down into multiple step executions, etc.; or, the steps in different embodiments are combined into a new technical solution.
Based on the foregoing embodiments, the present application provides a reference frame selecting apparatus, which includes modules and units included in the modules, and can be implemented by a processor; of course, the implementation can also be realized through a specific logic circuit; in implementation, the processor may be an AI acceleration engine (e.g., NPU, etc.), a GPU, a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 11 is a schematic structural diagram of a reference frame selecting apparatus according to an embodiment of the present application, and as shown in fig. 11, the reference frame selecting apparatus 110 includes:
an obtaining module 1101 configured to obtain E first adjacent frames of a video frame to be processed;
a selecting module 1102 configured to select a first reference frame of the to-be-processed video frame from the E first adjacent frames according to the to-be-processed video frame and the image content of the E first adjacent frames; wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
In some embodiments, the selection module 1102 is configured to: processing the image contents of the video frame to be processed and the E first adjacent frames through a reference frame selection model obtained through pre-training to obtain a first reference frame of the video frame to be processed; wherein the reference frame selection model is an AI model.
In some embodiments, the reference frame selection model is a convolutional neural network, comprising at least: the device comprises a convolution layer, a pooling layer, a full-connection layer and an output layer; the convolution layer is used for carrying out convolution operation on the input video frame to be processed and the image content of the E first adjacent frames to obtain a feature map; the pooling layer is used for pooling the characteristic diagram to obtain a pooled characteristic diagram; the full connection layer is used for determining the probability that the first adjacent frame is selected as a first reference frame according to the pooled feature map; the output layer is used for selecting the first reference frame according to the probability that the E first adjacent frames are selected as the first reference frame.
In some embodiments, the full-link layer is further configured to determine, according to the pooled feature map, probabilities that F different numbers of frames are respectively used as the number of first reference frames; correspondingly, the output layer is used for determining a target number according to the probability that the F different frame numbers are respectively used as the number of the first reference frames; and selecting the first reference frames with the target number according to the probability that the E first adjacent frames are selected as the first reference frames.
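The count-selection branch described above might be sketched as follows; here `count_probs[i]` is read as the probability that i+1 frames should be used as references, which is an assumption about how the F candidate frame counts are indexed, and the top-K choice is again a simplification:

```python
def select_with_adaptive_count(frame_probs, count_probs):
    """Sketch of the adaptive-count branch: the fully connected layer also
    outputs count_probs over F candidate reference-frame counts; the output
    layer picks the most probable count (target number), then selects that
    many of the highest-probability adjacent frames."""
    target = max(range(len(count_probs)), key=lambda i: count_probs[i]) + 1
    ranked = sorted(range(len(frame_probs)),
                    key=lambda i: frame_probs[i], reverse=True)
    return sorted(ranked[:target])
```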
In some embodiments, the determining of the reference frame selection model comprises: according to a plurality of second adjacent frames of a first sample frame and the first sample frame, carrying out first adjustment processing on model parameters of a first initial model of the reference frame selection model to obtain an adjusted first initial model; wherein the first adjustment processing includes: inputting the plurality of second adjacent frames and the first sample frame into the first initial model to obtain a second reference frame of the first sample frame; acquiring a first reconstruction frame of the first sample frame, wherein the first reconstruction frame is obtained by an image enhancement model obtained by pre-training based on the first sample frame and a corresponding second reference frame; determining a first loss of the first reconstructed frame based on at least the first reconstructed frame and a standard frame of the first sample frame; adjusting model parameters of the first initial model according to the first loss;
and according to a plurality of third adjacent frames of a second sample frame and the second sample frame, performing the first adjustment processing on the adjusted model parameters of the first initial model until the corresponding obtained first loss or iteration times meet a cutoff condition, and obtaining the reference frame selection model.
It should be noted that the determination process of the reference frame selection model may be executed by the reference frame selection apparatus 110, or may be executed by another apparatus, which is not limited here.
In some embodiments, the determining a first loss of the first reconstructed frame from at least the first reconstructed frame and a standard frame of the first sample frame includes: determining a first reward of the first reconstructed frame according to the first reconstructed frame and a standard frame of the first sample frame; taking the first sample frame as a starting point, taking a continuous M1 frame before the first sample frame and a continuous M2 frame after the first sample frame as fifth reference frames, and inputting the fifth reference frames and the first sample frame into the image enhancement model to obtain a second reconstructed frame of the first sample frame; wherein M1 and M2 are greater than 0 and less than or equal to half of the number of the second plurality of adjacent frames; determining a second reward of the second reconstructed frame according to the second reconstructed frame and the standard frame of the first sample frame; determining the first loss based on the first reward, the second reward, and a probability of the second reference frame being selected as a reference frame.
In some embodiments, the determining of the image enhancement model comprises: according to a plurality of fourth adjacent frames of a third sample frame and the third sample frame, carrying out second adjustment processing on model parameters of a second initial model of the image enhancement model to obtain an adjusted second initial model; wherein the second adjustment processing includes: sampling at least one third reference frame from the plurality of fourth adjacent frames; inputting the third sample frame and the at least one third reference frame into the second initial model to obtain a third reconstructed frame of the third sample frame; determining a second loss of the third reconstructed frame based on at least the third reconstructed frame and a standard frame of the third sample frame; adjusting model parameters of the second initial model according to the second loss; and performing second adjustment processing on the adjusted model parameters of the second initial model according to a plurality of fifth adjacent frames of a fourth sample frame and the fourth sample frame until a corresponding obtained second loss or iteration number meets a cutoff condition, so as to obtain the image enhancement model.
It should be noted that the determination process of the image enhancement model may be executed by the reference frame selecting device 110, or may be executed by other devices, which is not limited to this.
In some embodiments, the reference frame selection apparatus 110 further comprises an input module configured to: and inputting the video frame to be processed and the corresponding first reference frame into a target image enhancement model obtained by pre-training to obtain a fifth reconstruction frame of the video frame to be processed.
In some embodiments, the determining of the target image enhancement model comprises: performing third adjustment processing on the model parameters of the image enhancement model according to a plurality of sixth adjacent frames of a fifth sample frame and the fifth sample frame to obtain an adjusted image enhancement model; wherein the third adjustment processing includes: inputting the sixth adjacent frames and the fifth sample frame into the reference frame selection model to obtain a fourth reference frame of the fifth sample frame; acquiring a fourth reconstruction frame of the fifth sample frame; the fourth reconstructed frame is obtained by the image enhancement model based on the fifth sample frame and a corresponding fourth reference frame; determining a third loss of the fourth reconstructed frame based on at least the fourth reconstructed frame and a standard frame of the fifth sample frame; adjusting model parameters of the image enhancement model according to the third loss;
and adjusting the model parameters of the adjusted image enhancement model according to a plurality of seventh adjacent frames of a sixth sample frame and the sixth sample frame until the correspondingly obtained third loss or iteration number meets a cutoff condition, so as to obtain the target image enhancement model.
It should be noted that, the determination process of the target image enhancement model may be executed by the reference frame selecting device 110, and may also be executed by other devices, which is not limited in this respect.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
An embodiment of the present application further provides a video enhancement apparatus, including: an acquisition module configured to acquire E first adjacent frames of a video frame to be processed, where E is greater than 1; a selection module configured to select a first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames; and an enhancement module configured to enhance the image quality of the video frame to be processed according to the first reference frame.
In some embodiments, the enhancement module is configured to input the video frame to be processed and the corresponding first reference frame into the target image enhancement model, so as to obtain a fifth reconstructed frame of the video frame to be processed.
It should be noted that the above description of the embodiment of the video enhancement apparatus is similar to the description of the embodiment of the reference frame selection method, and has similar beneficial effects to the embodiment of the reference frame selection method. For technical details not disclosed in the embodiments of the video enhancement apparatus of the present application, please refer to the description of the embodiments of the frame selection method of the present application for understanding.
It should be noted that, the division of the modules by the apparatus described in the foregoing embodiment is illustrative, and is only one logical function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, may exist alone physically, or may be integrated into one unit by two or more units. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. Or in a combination of software and hardware.
It should be noted that, in the embodiment of the present application, if the method described above is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application or portions thereof that contribute to the related art may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
An electronic device according to an embodiment of the present application is provided, fig. 12 is a schematic diagram of a hardware entity of the electronic device according to the embodiment of the present application, and as shown in fig. 12, an electronic device 120 includes a memory 1201 and a processor 1202, where the memory 1201 stores a computer program that can be executed on the processor 1202, and the processor 1202 implements the steps in the method provided in the foregoing embodiment when executing the program.
It is noted that the memory 1201 is configured to store instructions and applications executable by the processor 1202, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 1202 and the modules in the electronic device 120; it may be implemented by a flash memory (FLASH) or a Random Access Memory (RAM).
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps in the methods provided in the above embodiments.
Embodiments of the present application provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the method provided by the above-described method embodiments.
It is to be noted here that the above descriptions of the storage medium and device embodiments are similar to the description of the method embodiments above and have similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of the present application, refer to the description of the method embodiments of the present application.
It should be appreciated that reference throughout this specification to "one embodiment", "an embodiment", or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment", "in an embodiment", or "in some embodiments" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply any order of execution; the order of execution should be determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments. The serial numbers of the embodiments of the present application are merely for description and do not represent the relative merits of the embodiments. The foregoing description of the various embodiments emphasizes the differences between them; for the same or similar parts, the embodiments may be referred to one another, and for brevity these parts are not described again herein.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "object A and/or object B" may mean: object A alone, both object A and object B, or object B alone.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The embodiments described above are merely illustrative; for example, the division into modules is only a division of logical functions, and other divisions are possible in practice. For instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be electrical, mechanical, or in other forms.
The modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional modules in the embodiments of the present application may all be integrated into one processing unit, or each module may serve as a separate unit, or two or more modules may be integrated into one unit; the integrated module may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps of implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer-readable storage medium, and when executed, executes the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit described above may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on this understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disc, or other various media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that can be readily conceived by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method of reference frame selection, the method comprising:
acquiring E first adjacent frames of a video frame to be processed; wherein E is greater than 1;
selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the image content of the E first adjacent frames;
wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
2. The method according to claim 1, wherein selecting the first reference frame of the video frame to be processed from the E first adjacent frames according to the image content of the video frame to be processed and of the E first adjacent frames comprises:
processing the image content of the video frame to be processed and of the E first adjacent frames through a pre-trained reference frame selection model to obtain the first reference frame of the video frame to be processed; wherein the reference frame selection model is an AI model.
3. The method of claim 2, wherein the reference frame selection model is a convolutional neural network comprising at least a convolutional layer, a pooling layer, a fully connected layer, and an output layer; wherein:
the convolutional layer is configured to perform a convolution operation on the image content of the video frame to be processed and of the E first adjacent frames to obtain a feature map;
the pooling layer is configured to pool the feature map to obtain a pooled feature map;
the fully connected layer is configured to determine, according to the pooled feature map, the probability that each first adjacent frame is selected as a first reference frame;
the output layer is configured to select the first reference frame based on the probabilities that the E first adjacent frames are selected as first reference frames.
4. The method of claim 3, wherein the fully connected layer is further configured to determine, according to the pooled feature map, the probability that each of F candidate frame counts is the number of first reference frames, wherein F is greater than 1;
correspondingly, the output layer is configured to determine a target number according to the probabilities that the F candidate frame counts are the number of first reference frames, and to select the target number of first reference frames according to the probabilities that the E first adjacent frames are selected as first reference frames.
5. The method according to any one of claims 2 to 4, wherein the reference frame selection model is determined by:
performing, according to a first sample frame and a plurality of second adjacent frames of the first sample frame, first adjustment processing on model parameters of a first initial model of the reference frame selection model to obtain an adjusted first initial model;
wherein the first adjustment processing comprises: inputting the plurality of second adjacent frames and the first sample frame into the first initial model to obtain a second reference frame of the first sample frame; acquiring a first reconstructed frame of the first sample frame, wherein the first reconstructed frame is obtained by a pre-trained image enhancement model based on the first sample frame and the corresponding second reference frame; determining a first loss of the first reconstructed frame based on at least the first reconstructed frame and a standard frame of the first sample frame; and adjusting the model parameters of the first initial model according to the first loss; and
performing the first adjustment processing on the adjusted model parameters of the first initial model according to a second sample frame and a plurality of third adjacent frames of the second sample frame, until the resulting first loss or the number of iterations satisfies a stop condition, to obtain the reference frame selection model.
6. The method of claim 5, wherein determining the first loss of the first reconstructed frame based on at least the first reconstructed frame and the standard frame of the first sample frame comprises:
determining a first reward of the first reconstructed frame according to the first reconstructed frame and the standard frame of the first sample frame;
taking the first sample frame as a starting point, taking M1 consecutive frames preceding the first sample frame and M2 consecutive frames following the first sample frame as fifth reference frames, and inputting the fifth reference frames and the first sample frame into the image enhancement model to obtain a second reconstructed frame of the first sample frame, wherein M1 and M2 are each greater than 0 and less than or equal to half the number of the plurality of second adjacent frames;
determining a second reward of the second reconstructed frame according to the second reconstructed frame and the standard frame of the first sample frame; and
determining the first loss based on the first reward, the second reward, and the probability of the second reference frame being selected as a reference frame.
7. The method of claim 5, wherein the image enhancement model is determined by:
performing, according to a third sample frame and a plurality of fourth adjacent frames of the third sample frame, second adjustment processing on model parameters of a second initial model of the image enhancement model to obtain an adjusted second initial model;
wherein the second adjustment processing comprises: sampling at least one third reference frame from the plurality of fourth adjacent frames; inputting the third sample frame and the at least one third reference frame into the second initial model to obtain a third reconstructed frame of the third sample frame; determining a second loss of the third reconstructed frame based on at least the third reconstructed frame and a standard frame of the third sample frame; and adjusting the model parameters of the second initial model according to the second loss; and
performing the second adjustment processing on the adjusted model parameters of the second initial model according to a fourth sample frame and a plurality of fifth adjacent frames of the fourth sample frame, until the resulting second loss or the number of iterations satisfies a stop condition, to obtain the image enhancement model.
8. The method of claim 5, further comprising:
performing third adjustment processing on the model parameters of the image enhancement model according to a fifth sample frame and a plurality of sixth adjacent frames of the fifth sample frame to obtain an adjusted image enhancement model;
wherein the third adjustment processing comprises: inputting the plurality of sixth adjacent frames and the fifth sample frame into the reference frame selection model to obtain a fourth reference frame of the fifth sample frame; acquiring a fourth reconstructed frame of the fifth sample frame, wherein the fourth reconstructed frame is obtained by the image enhancement model based on the fifth sample frame and the corresponding fourth reference frame; determining a third loss of the fourth reconstructed frame based on at least the fourth reconstructed frame and a standard frame of the fifth sample frame; and adjusting the model parameters of the image enhancement model according to the third loss; and
adjusting the model parameters of the adjusted image enhancement model according to a sixth sample frame and a plurality of seventh adjacent frames of the sixth sample frame, until the resulting third loss or the number of iterations satisfies a stop condition, to obtain a target image enhancement model.
9. The method of claim 8, further comprising:
inputting the video frame to be processed and the corresponding first reference frame into the target image enhancement model to obtain a fifth reconstructed frame of the video frame to be processed.
10. A method of video enhancement, the method comprising:
acquiring E first adjacent frames of a video frame to be processed; wherein E is greater than 1;
selecting a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the image content of the E first adjacent frames;
and enhancing the image quality of the video frame to be processed according to the first reference frame.
11. An apparatus for reference frame selection, the apparatus comprising:
an acquisition module configured to acquire E first adjacent frames of a video frame to be processed, wherein E is greater than 1; and
a selection module configured to select a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the E first adjacent frames, wherein the first reference frame is used for enhancing the image quality of the video frame to be processed.
12. A video enhancement apparatus, the apparatus comprising:
an acquisition module configured to acquire E first adjacent frames of a video frame to be processed, wherein E is greater than 1;
a selection module configured to select a first reference frame of the video frame to be processed from the E first adjacent frames according to the video content of the video frame to be processed and the E first adjacent frames; and
an enhancement module configured to enhance the image quality of the video frame to be processed according to the first reference frame.
13. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the method of any one of claims 1 to 9, or the method of claim 10, when executing the program.
14. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the method of any one of claims 1 to 9 or the method of claim 10.
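Although the claims describe the selection mechanism only functionally, the output-layer behavior of claims 3 and 4 can be illustrated with a short sketch. The code below is a hypothetical interpretation, not the patented implementation: it assumes the fully connected layer has already produced a selection probability for each of the E adjacent frames and a probability for each of the F candidate frame counts, and that the output layer picks the most probable count and then the highest-ranked frames. The function name and data layout are illustrative only.

```python
def select_reference_frames(frame_probs, count_probs):
    """Hypothetical output-layer logic sketched from claims 3-4.

    frame_probs: list of E probabilities, one per first adjacent frame,
                 as produced by the fully connected layer (claim 3).
    count_probs: dict mapping each of F candidate frame counts to the
                 probability that it is the number of reference frames
                 (claim 4).
    Returns the indices of the selected first reference frames.
    """
    # Target number = the candidate count with the highest probability.
    target = max(count_probs, key=count_probs.get)
    # Rank the E adjacent frames by their selection probability.
    ranked = sorted(range(len(frame_probs)),
                    key=lambda i: frame_probs[i], reverse=True)
    # Keep the top `target` frames, returned in temporal (index) order.
    return sorted(ranked[:target])

selected = select_reference_frames(
    frame_probs=[0.10, 0.85, 0.30, 0.70, 0.05],  # E = 5 neighbors
    count_probs={1: 0.2, 2: 0.6, 3: 0.2},        # F = 3 candidate counts
)
# selects the two most probable neighbors (indices 1 and 3)
```

Under this reading, claim 3 alone corresponds to the special case of a fixed reference-frame count, while claim 4 adds a learned choice of how many reference frames to use per video frame.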
CN202210892757.0A 2022-07-27 2022-07-27 Reference frame selection method and device, equipment and storage medium Pending CN115243044A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210892757.0A CN115243044A (en) 2022-07-27 2022-07-27 Reference frame selection method and device, equipment and storage medium
PCT/CN2023/105721 WO2024022047A1 (en) 2022-07-27 2023-07-04 Method and apparatus for selecting reference frame, and decoder, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210892757.0A CN115243044A (en) 2022-07-27 2022-07-27 Reference frame selection method and device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115243044A true CN115243044A (en) 2022-10-25

Family

ID=83676896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210892757.0A Pending CN115243044A (en) 2022-07-27 2022-07-27 Reference frame selection method and device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115243044A (en)
WO (1) WO2024022047A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024022047A1 (en) * 2022-07-27 2024-02-01 中国科学技术大学 Method and apparatus for selecting reference frame, and decoder, device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8331451B2 (en) * 2007-07-18 2012-12-11 Samsung Electronics Co., Ltd. Method and apparatus for enhancing resolution of video image
CN108307193B (en) * 2018-02-08 2018-12-18 北京航空航天大学 A kind of the multiframe quality enhancement method and device of lossy compression video
CN113556442B (en) * 2020-04-23 2023-03-24 北京金山云网络技术有限公司 Video denoising method and device, electronic equipment and computer readable storage medium
CN115243044A (en) * 2022-07-27 2022-10-25 中国科学技术大学 Reference frame selection method and device, equipment and storage medium


Also Published As

Publication number Publication date
WO2024022047A1 (en) 2024-02-01

Similar Documents

Publication Publication Date Title
US11606560B2 (en) Image encoding and decoding, video encoding and decoding: methods, systems and training methods
US11025907B2 (en) Receptive-field-conforming convolution models for video coding
CN108900848B (en) Video quality enhancement method based on self-adaptive separable convolution
US7271749B2 (en) Context-based denoiser that simultaneously updates probabilities for multiple contexts
CN115514978B (en) Method and apparatus for mixing probabilities of entropy coding in video compression
CN110677651A (en) Video compression method
US20220201316A1 (en) Using Rate Distortion Cost as a Loss Function for Deep Learning
US7498961B2 (en) Context identification using a denoised signal
Pessoa et al. End-to-end learning of video compression using spatio-temporal autoencoders
CN110753225A (en) Video compression method and device and terminal equipment
CN109903351B (en) Image compression method based on combination of convolutional neural network and traditional coding
Kim et al. Efficient deep learning-based lossy image compression via asymmetric autoencoder and pruning
CN115243044A (en) Reference frame selection method and device, equipment and storage medium
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
CN113079378A (en) Image processing method and device and electronic equipment
CN116208807A (en) Video frame processing method and device, and video frame denoising method and device
CN113992914B (en) Inter-frame prediction method and device, equipment and storage medium
US20210150322A1 (en) Neural network processing apparatus, neural network processing method, and program
CN113096019B (en) Image reconstruction method, image reconstruction device, image processing equipment and storage medium
Jeong et al. An overhead-free region-based JPEG framework for task-driven image compression
JP6195404B2 (en) Processing system, pre-processing device, post-processing device, pre-processing program and post-processing program
Munna et al. Complexity Scalable Learning-Based Image Decoding
Zhou et al. Towards theoretically-founded learning-based denoising
EP4231643A1 (en) Method for image compression and apparatus for implementing the same
CN117459737B (en) Training method of image preprocessing network and image preprocessing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination