CN117880512A - B-frame multi-temporal layer hierarchical filtering method, apparatus, device, and medium - Google Patents

B-frame multi-temporal layer hierarchical filtering method, apparatus, device, and medium

Info

Publication number
CN117880512A
CN117880512A (application number CN202311755463.4A)
Authority
CN
China
Prior art keywords
filtering
layer
frames
time layer
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311755463.4A
Other languages
Chinese (zh)
Inventor
马思伟
赵衍琛
贾川民
何汶轩
王苫社
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202311755463.4A priority Critical patent/CN117880512A/en
Publication of CN117880512A publication Critical patent/CN117880512A/en
Pending legal-status Critical Current

Links

Abstract

The present disclosure relates to the field of video coding and filtering technologies, and in particular, to a B-frame multi-temporal layer hierarchical filtering method, apparatus, device, and medium. The method comprises the following steps: performing preset coding configuration on a target video frame, and coding the target video frame after configuration; determining a B frame to be filtered in the process of encoding the target video frame; constructing a multi-temporal layer filtering model, wherein the multi-temporal layer filtering model is obtained by integrating a plurality of iteratively trained hierarchical filtering models; and filtering the B frame to be filtered by using the multi-temporal layer filtering model to obtain a filtered video frame. The method and apparatus enable each B frame to be filtered by the hierarchical filtering model corresponding to its temporal layer level, so that the filtering quality of frames in every layer is improved and the filtering loss is small. Meanwhile, when a filtered frame is used as a reference for a frame to be encoded, over-filtering is avoided, and the filtering performance is improved without affecting the coding efficiency.

Description

B-frame multi-temporal layer hierarchical filtering method, apparatus, device, and medium
Technical Field
The present disclosure relates to the field of video coding and filtering technologies, and in particular, to a B-frame multi-temporal layer hierarchical filtering method, apparatus, device, and medium.
Background
Deep learning has achieved continuous breakthroughs on traditional computer vision tasks, and video codec algorithms that incorporate deep learning have gradually become a key technology in the field of video compression and transmission; against this background, a number of deep-learning-based in-loop filtering techniques have emerged. Deep-learning-based in-loop filtering essentially relies on the strong nonlinear feature extraction and representation capability of neural networks: the filter is trained in a supervised manner with the original image as the label, and directly regresses a high-quality image at the pixel level.
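As an illustration of this supervised setup, a minimal PyTorch sketch is given below; the framework choice, the L1 loss, and the tensor shapes are assumptions made for the example and are not specified by this disclosure.

```python
import torch.nn.functional as F

def training_step(filter_net, optimizer, recon_batch, orig_batch):
    """One supervised step: the original frame is the label, and the network
    regresses a high-quality frame from its compressed reconstruction."""
    optimizer.zero_grad()
    enhanced = filter_net(recon_batch)       # e.g. (N, 1, H, W) luma patches
    loss = F.l1_loss(enhanced, orig_batch)   # pixel-level regression loss (assumed L1)
    loss.backward()
    optimizer.step()
    return loss.item()
```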
However, in-loop filtering is applied to enhance the quality of video frames inside the encoding loop; unlike post-processing quality enhancement, it affects the subsequent encoding process, so that the filtering quality of frames in different temporal layers is uneven and the filtering performance is poor.
Disclosure of Invention
In view of the above technical problems, the present application aims to provide a B-frame multi-temporal-layer hierarchical filtering method, apparatus, device, and medium, so as to solve the problem of poor filtering performance.
The first aspect of the present application provides a multi-temporal layer hierarchical filtering method for B frames, the method comprising:
performing preset coding configuration on a target video frame, and coding the target video frame after configuration;
determining a B frame to be filtered in the process of encoding the target video frame;
constructing a multi-time layer filtering model, wherein the multi-time layer filtering model is obtained by integrating a plurality of iteratively trained layered filtering models;
and filtering the B frame to be filtered by using the multi-temporal layer filtering model to obtain a filtered video frame.
In some embodiments of the present application, the preset encoding configuration includes a random access configuration and a quantization parameter configuration.
In some embodiments of the present application, the quantization parameter configuration includes configuring quantization parameters to quantization parameter values of 27, 32, 38, and 45, respectively, under a preset encoding standard.
In some embodiments of the present application, iteratively training a hierarchical filtering model includes:
according to the different quantization parameter values, iteratively training a plurality of hierarchical filtering models stage by stage from the low temporal layers to the high temporal layers, and stopping training when the preset number of training iterations is reached.
In some embodiments of the present application, iteratively training the plurality of hierarchical filtering models stage by stage from the low temporal layers to the high temporal layers according to the different quantization parameter values, until the preset number of training iterations is reached, includes:
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, and obtaining 4 trained low-time-layer filtering models when training is completed, wherein the low-time-layer comprises a first time layer and a second time layer;
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, and adopting unfiltered B frames of a third time layer and a fourth time layer as data sets to obtain 4 trained middle time layer filtering models when training is completed;
and respectively training the layered filtering models with quantization parameter values of 27, 32, 38 and 45, adopting unfiltered B frames of the fifth time layer and the sixth time layer as data sets, and obtaining 4 trained high-time layer filtering models when training is completed.
In some embodiments of the present application, the method for acquiring a data set includes:
disabling the preset filtering tools in the preset codec, enabling only the low-temporal-layer filtering models, and collecting an original encoded video training set with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration;
determining a reconstruction sequence in the original coded video training set;
extracting, from the reconstruction sequence, unfiltered B frames whose temporal layers are the third temporal layer and the fourth temporal layer, and using the extracted unfiltered B frames together with the original video frames corresponding to them as the data set for training the middle-temporal-layer filtering models;
disabling the preset filtering tools in the preset codec, enabling the low-temporal-layer filtering models and the middle-temporal-layer filtering models, and collecting an original encoded video training set with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration as a first training set;
determining a first reconstruction sequence in the first training set;
extracting, from the first reconstruction sequence, unfiltered B frames whose temporal layers are the fifth temporal layer and the sixth temporal layer, and using the extracted unfiltered B frames together with the original video frames corresponding to them as the data set for training the high-temporal-layer filtering models.
In some embodiments of the present application, the method further comprises:
and integrating the multi-temporal layer filtering model into a preset coder and decoder, and coding and decoding the target video frame based on an integration result.
A second aspect of the present application provides a multi-temporal layer hierarchical filtering apparatus for B frames, the apparatus comprising:
the configuration module is used for carrying out preset coding configuration on the target video frame and coding the target video frame after configuration;
the determining module is used for determining a B frame to be filtered in the process of encoding the target video frame;
the construction module is used for constructing a multi-time-layer filtering model, wherein the multi-time-layer filtering model is obtained by integrating a plurality of iteratively trained layered filtering models;
and the filtering module is used for filtering the B frame to be filtered by using the multi-temporal layer filtering model so as to obtain a filtered video frame.
A third aspect of the present application provides an electronic device, including a memory and a processor, where the memory stores computer readable instructions that, when executed by the processor, cause the processor to perform a multi-temporal layer hierarchical filtering method for B frames as described in embodiments of the present application.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a multi-temporal layer hierarchical filtering method for B frames as described in embodiments of the present application.
The technical scheme provided in the embodiment of the application has at least the following technical effects or advantages:
according to the multi-temporal layered filtering method for the B frames, preset coding configuration is conducted on target video frames, the target video frames are coded after configuration, the B frames to be filtered are determined in the coding process of the target video frames, a multi-temporal layered filtering model is built, the multi-temporal layered filtering model is obtained through integration of a plurality of iterative trained layered filtering models, the multi-temporal layered filtering model is used for filtering the B frames to be filtered, so that filtered video frames are obtained, the B frames are filtered according to the corresponding layered filtering models of the temporal layers, filtering quality of each layer of frames is improved, filtering loss is small, and filtering performance is improved. Meanwhile, when the filtered frame is used as a reference of a frame to be encoded, the condition of excessive filtering is avoided, and the filtering performance is improved under the condition of not affecting the encoding efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram illustrating steps of a B-frame multi-temporal layer hierarchical filtering method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a hierarchical filtering model in an exemplary embodiment of the present application;
FIG. 3 is a flow chart of a multi-temporal layer hierarchical filtering method for B frames in an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of a B-frame multi-temporal-layer hierarchical filtering apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples, it being understood that the examples depicted herein are for purposes of illustration only and are not intended to limit the scope of the present invention. It should be further noted that, for convenience of description, only the part related to the present invention is shown in the drawings.
In-loop filtering is applied to enhance the quality of video frames inside the coding loop and, unlike post-processing enhancement, it affects the subsequent coding process. For example, in the Random Access (RA) configuration, each frame has reference frames in both the forward and backward directions; frames of low temporal layers are encoded first and have higher compressed quality, while frames of high temporal layers are encoded last and have lower quality than low-temporal-layer frames. Consequently, the cause and severity of compression artifacts differ from frame to frame, and so does the filtering strength that a neural network filtering tool actually needs. The prior art therefore yields uneven filtering quality across frames of different temporal layers and poor filtering performance. Furthermore, an enhanced frame is used as a reference for subsequent frames to be encoded, so the same picture area in those subsequent frames may be filtered repeatedly and excessively, which greatly degrades filtering performance.
Thus, in some embodiments of the present application, a multi-temporal layer hierarchical filtering method of B frames is provided, as shown in fig. 1, the method comprising steps S1 to S4.
S1, carrying out preset coding configuration on a target video frame, and coding the target video frame after configuration.
Encoding refers to the process of converting video data into a compressed digital format, and the preset encoding configuration is a set of parameters determined in advance, before the video is processed; the encoding parameters are configured for the target video frame for use in subsequent processing. Here, the preset encoding configuration includes a random access configuration and a Quantization Parameter (QP) configuration. Frames under the random access configuration allow quick access to a specific position at any time without decoding the entire video stream. In some coding standards, such as H.266/VVC, the random access (RA) configuration may employ a hierarchical B-frame structure and allow CRA frames to be inserted at certain positions in the encoded video stream, thereby providing more flexible random access capability.
The quantization parameter is a parameter used to control the quantization process; it determines the accuracy and range with which the signal is mapped to discrete values. In audio and video coding, quantization parameters are important for determining the quality and file size of the compressed data. In this embodiment, the configuration includes setting the quantization parameter to the values 27, 32, 38 and 45, respectively, under the AVS (Audio Video coding Standard) coding standard; these values represent different degrees of quantization.
The target video frame is subjected to RA configuration and quantization parameter configuration, and encoding is started after configuration, so that filtering under the RA configuration and quantization parameter configuration is performed later when a filtering model is trained.
S2, determining a B frame to be filtered in the process of encoding the target video frame.
B frames typically use inter-frame prediction, i.e. the content of the current frame is predicted from reference frames, which involves motion estimation and motion compensation with respect to preceding and following frames. The encoder compares the current frame with previously encoded I or P frames to find the best motion vectors, which represent the motion of the target frame relative to the reference frames. B frames are also commonly introduced in the GOP (Group of Pictures) structure, and the encoder decides, based on the GOP settings, at which positions to insert B frames to achieve the best compression. Therefore, once encoding starts, the B frames that need to be filtered, i.e. the B frames to be filtered, are determined according to the encoder's decisions.
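As a concrete illustration of how a frame's temporal layer can be derived in such a hierarchical-B structure, a small sketch follows; the GOP size of 32 and the 1-based layer indexing are assumptions chosen only so that six layers appear, matching the layers discussed later, and the encoder's actual decision logic is not reproduced here.

```python
def temporal_layer(poc: int, gop_size: int = 32) -> int:
    """Return a 1-based temporal layer index for a hierarchical-B GOP.

    The key picture (poc % gop_size == 0) sits on the lowest layer; every
    halving of the prediction distance adds one layer, so a GOP of 32 gives
    six layers in total.
    """
    max_level = gop_size.bit_length() - 1      # 5 for a GOP of 32
    pos = poc % gop_size
    if pos == 0:
        return 1
    halvings = 0
    while pos % 2 == 0:                        # count how often the position halves evenly
        pos //= 2
        halvings += 1
    return max_level - halvings + 1

# Illustration for one GOP of 32 frames:
#   poc 0 -> layer 1, poc 16 -> 2, poc 8/24 -> 3, poc 4/12/20/28 -> 4,
#   remaining even pocs -> 5, odd pocs -> 6
```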
S3, constructing a multi-time-layer filtering model, wherein the multi-time-layer filtering model is integrated by a plurality of iteratively trained layered filtering models.
In a specific implementation, according to different quantization parameter values, the multiple layered filtering models are iteratively trained step by step from the low time layer to the high time layer until reaching a preset number of iterative training times, and in a preferred implementation, the method comprises the steps of S31-S33.
S31, respectively training the layered filtering models with quantization parameter values of 27, 32, 38 and 45, and obtaining 4 trained low-time layer filtering models when training is completed, wherein the low-time layer comprises a first time layer and a second time layer.
Fig. 2 shows the structure of a hierarchical filtering model, which is a luminance filtering model. As shown in fig. 2, the luminance filtering model includes three modules, part1, part2 and part3. part1 and part3 are both convolution modules; the difference is that part1 includes one 3x3 convolution kernel and a ReLU activation function, while part3 includes two 3x3 convolution kernels and an activation function. Both perform feature extraction, but at different depths. part2 consists of residual blocks (Residual blocks in fig. 2), which learn the residual, i.e. the difference between the input and the desired output, namely the difference between the original luminance frame (Original luma frame in fig. 2) and the reconstructed luminance frame (Reconstructed luma frame in fig. 2). In this embodiment, this luminance filtering model is used as the hierarchical filtering model, but models for the different temporal layer levels at quantization parameter values of 27, 32, 38 and 45 need to be trained separately, yielding 12 trained models in total.
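A minimal PyTorch sketch of this part1/part2/part3 arrangement is shown below; the channel width (64) and the number of residual blocks (8) are assumptions for illustration, since only the overall structure is fixed by the description above.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)              # learn the residual, keep the identity path

class LumaFilterModel(nn.Module):
    """part1: 3x3 conv + ReLU; part2: stacked residual blocks; part3: two 3x3 convs."""
    def __init__(self, ch: int = 64, num_blocks: int = 8):
        super().__init__()
        self.part1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.part2 = nn.Sequential(*[ResidualBlock(ch) for _ in range(num_blocks)])
        self.part3 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, recon_luma):
        # Regress the difference between the original and the reconstructed
        # luma frame, then add it back to the reconstructed input.
        feat = self.part1(recon_luma)
        feat = self.part2(feat)
        return recon_luma + self.part3(feat)
```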
The first temporal layer and the second temporal layer are the temporal layers with index values 1 and 2, and these are regarded as the low temporal layers. Since these layers are low, the original encoded video training set is used directly, without specially extracting frames of temporal layers 1 and 2 as samples. The experimental platform adopted in this application is the AVS intelligent coding reference software HPM14.2-ModAI10.0, the coding configuration is RA, and the training video data set uses the class B, C and D sequences of BVI-DVC, which have different resolutions. Referring to fig. 3, in the training phase, the training video data set is compressed with a conventional video codec at the 4 common quantization parameter (QP) values, and B-frame neural network filtering models for temporal layers 1 and 2 are trained, one for each of the 4 QPs. These 4 models are integrated into the platform, after which the B-frame neural network filtering models for temporal layers 3 and 4 at the 4 QPs, and then the B-frame neural network filtering models for temporal layers 5 and 6 at the 4 QPs, still need to be trained.
It should be noted that, when the data set is acquired, the preset filtering tools in the preset codec are disabled and only the low-temporal-layer filtering models are enabled; an original encoded video training set is collected with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration, where the preset filtering tools refer to filtering tools based on conventional methods, such as DBF, ALF and SAO. A reconstruction sequence is then determined in the original encoded video training set, unfiltered B frames of the third and fourth temporal layers are extracted from it, and the extracted unfiltered B frames together with their corresponding original video frames are used as the data set for training the middle-temporal-layer filtering models. Next, the preset filtering tools in the preset codec are disabled, the low-temporal-layer and middle-temporal-layer filtering models are enabled, and an original encoded video training set is collected with quantization parameter values of 27, 32, 38 and 45 under the random access configuration as a first training set. Finally, a first reconstruction sequence is determined in the first training set, unfiltered B frames of the fifth and sixth temporal layers are extracted from it, and the extracted unfiltered B frames together with their corresponding original video frames are used as the data set for training the high-temporal-layer filtering models.
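A sketch of how such a data set could be assembled is given below; the frame-metadata fields and the way reconstructed and original frames are indexed are assumptions, since the description only fixes which frames (unfiltered B frames of the stated temporal layers, paired with their originals) enter the data set.

```python
from dataclasses import dataclass

@dataclass
class FrameRecord:
    poc: int                 # picture order count within the sequence
    temporal_layer: int      # 1..6 in the hierarchy discussed above
    is_b_frame: bool
    nn_filtered: bool        # True if a neural network filter model already touched it

def collect_training_pairs(recon_frames, orig_frames, records, wanted_layers):
    """Pair unfiltered reconstructed B frames of the wanted temporal layers
    with the corresponding original frames (the label side of the data set).
    recon_frames / orig_frames are assumed to be indexable by poc."""
    pairs = []
    for rec in records:
        if rec.is_b_frame and not rec.nn_filtered and rec.temporal_layer in wanted_layers:
            pairs.append((recon_frames[rec.poc], orig_frames[rec.poc]))
    return pairs

# e.g. data set for the middle-temporal-layer models (layers 3 and 4):
# pairs_mid = collect_training_pairs(recon, orig, records, wanted_layers={3, 4})
```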
S32, respectively training a layered filtering model under quantization parameter values of 27, 32, 38 and 45, adopting unfiltered B frames of a third time layer and a fourth time layer as a data set, and obtaining 4 trained middle time layer filtering models when training is completed.
When this data set is acquired, the preset filtering tools in the preset codec are disabled and only the low-temporal-layer filtering models are enabled; an original encoded video training set is collected with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration, still using the class B, C and D sequences of BVI-DVC. A reconstruction sequence is determined in the original encoded video training set, unfiltered B frames of the third and fourth temporal layers (the temporal layers with index values 3 and 4) are extracted from the reconstruction sequence, and finally the unfiltered B frames and the original video frames corresponding to them are used as the data set. The B-frame neural network filtering models for temporal layers 3 and 4 at the 4 QPs are thus obtained, these 4 models are integrated into the platform, and training then continues for the higher temporal layer frames.
S33, respectively training a layered filtering model under quantization parameter values of 27, 32, 38 and 45, adopting unfiltered B frames of a fifth time layer and a sixth time layer as a data set, and obtaining 4 trained high-time layer filtering models when training is completed.
When this data set is acquired, the conventional filtering tools in the preset codec, such as DBF, ALF and SAO, are disabled, while the neural network filtering tools for the first and second temporal layers (the low layers) and for the third and fourth temporal layers (the middle layers) are enabled. The original encoded video training set is encoded with quantization parameter values of 27, 32, 38 and 45 under the random access configuration, a reconstruction sequence is determined in it, unfiltered B frames of the fifth and sixth temporal layers are extracted from the reconstruction sequence, and finally the unfiltered B frames and the original video frames corresponding to them are used as the data set.
It should be noted that training for frames of different temporal layers optimizes the model parameters through multiple iterations so as to gradually improve model performance. The compressed higher-temporal-layer frames are extracted and fed into a neural network filtering model with the same structure for training and fine-tuning, yielding filtering models with the same structure but different weights. The number of training iterations can be set according to the number of temporal layers, the upper limit on storage space, and so on. A filtering model can be trained separately for each temporal layer, which improves filtering performance more but makes the total model size larger; alternatively, two or more adjacent temporal layers can share one filtering model for training and use, which improves filtering performance moderately with a moderate total model size. In this way, the present application effectively improves the hit rate of the filtering models at the middle and high temporal layers and improves the overall coding performance, while only increasing the number of models and adding no extra coding complexity on top of the original models, i.e. the filtering performance is improved without affecting the coding efficiency.
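The staged procedure described above can be summarized by the following orchestration sketch; the callback names (make_model, encode_with, build_dataset, train) and the initialization of each stage from the previously trained stage's weights are assumptions used to keep the example compact, not details mandated by this description.

```python
QPS = (27, 32, 38, 45)
STAGES = (                       # trained in order, low temporal layers first
    ("low",  (1, 2)),
    ("mid",  (3, 4)),
    ("high", (5, 6)),
)

def train_all_models(make_model, encode_with, build_dataset, train, max_iters):
    """Iteratively train 3 stages x 4 QPs = 12 hierarchical filtering models."""
    model_bank = {}
    for stage_idx, (name, layers) in enumerate(STAGES):
        for qp in QPS:
            # Encode the training sequences with every previously trained stage
            # enabled, so the new stage is trained on realistic references.
            recon = encode_with(model_bank, qp=qp, config="random_access")
            dataset = build_dataset(recon, wanted_layers=set(layers))
            model = make_model()
            if stage_idx > 0:    # assumed: fine-tune from the stage below
                prev_name = STAGES[stage_idx - 1][0]
                model.load_state_dict(model_bank[(prev_name, qp)].state_dict())
            train(model, dataset, max_iters=max_iters)
            model_bank[(name, qp)] = model
    return model_bank
```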
As further shown in fig. 3, the B-frame neural network filtering models for temporal layers 5 and 6 at the 4 QPs are obtained. At this point, 12 hierarchical filtering models have been trained altogether, and all 12 are integrated into the platform to form the multi-temporal-layer filtering model, as shown in Table 1.
Table 1 Hierarchical filtering models added under the RA configuration
Temporal layer \ QP    ≤31             [32,37]         [38,44]         ≥45
1, 2                   Original model  Original model  Original model  Original model
3, 4                   New model       New model       New model       New model
5, 6                   New model       New model       New model       New model
As can be seen from Table 1, the present application trains models for frames of the third, fourth, fifth and sixth temporal layers, learns the characteristics of more temporal layers, and adds 8 new hierarchical filtering models, so that each B frame is filtered by the hierarchical filtering model corresponding to its temporal layer level and the filtering quality of frames in every layer is improved.
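At filtering time, the model to apply follows directly from the frame's temporal layer and slice QP, as in Table 1; the lookup below is a sketch whose QP thresholds mirror the table, while the bank keys are the illustrative ones used in the earlier training sketch.

```python
def select_filter_model(model_bank, temporal_layer: int, qp: int):
    """Pick the hierarchical filtering model for a B frame following Table 1."""
    if temporal_layer <= 2:
        stage = "low"            # layers 1 and 2 keep the original low-layer model
    elif temporal_layer <= 4:
        stage = "mid"            # new models for layers 3 and 4
    else:
        stage = "high"           # new models for layers 5 and 6
    if qp <= 31:
        qp_bucket = 27
    elif qp <= 37:
        qp_bucket = 32
    elif qp <= 44:
        qp_bucket = 38
    else:
        qp_bucket = 45
    return model_bank[(stage, qp_bucket)]
```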
And S4, filtering the B frame to be filtered by using the multi-temporal layer filtering model so as to obtain a filtered video frame.
In a specific implementation manner, the method further includes integrating the multi-temporal layer filtering model into a preset codec, and encoding and decoding the target video frame based on an integration result. The preset codec may be a conventional codec or a codec currently used in the prior art.
The integration test is carried out on the AVS3 intelligent coding reference software HPM14.2-ModAI10.0 under the AVS3 intelligent-coding common test conditions. Neural network inference is performed on the CPU during testing, the batch size is set to 16, and the coding configuration is RA. On the common test sequences, the luminance component achieves an average coding gain of 0.97% compared with the original ModAI10.0; the results are shown in Table 2.
Table 2 Coding performance under the RA configuration after filtering with the hierarchical filtering models of the present application
Random Access in Table 2 indicates performance testing under the random access configuration; the 1080P and 720P rows represent the 1080P-resolution and 720P-resolution common test sequences, respectively, and Overall gives the combined evaluation of overall performance. Y denotes the luminance component of the image, and U and V denote the chrominance components, where U is the blue chrominance component and V is the red chrominance component; "EncT" and "DecT" denote the encoding time and decoding time of ModAI10.0 with the filtering method of the present application integrated, as a percentage of those of the original ModAI10.0. As can be seen from Table 2, the filtering method of the present application achieves an average coding gain of 0.97% on the luminance component. In addition, the iterative training method only increases the number and total size of the filtering models and adds no extra encoding complexity; the decoding complexity increases because different filtering models have to be switched and loaded, which can be addressed by preloading the models, so that the method adapts to the original in-loop filtering method and to the codec scenario to the greatest extent.
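The model-preloading idea mentioned above can be sketched as follows; the weight-file layout and the model factory are assumptions, the point being that all twelve weight sets are resident before encoding or decoding starts, so per-frame model switching is a dictionary lookup rather than a disk load.

```python
import torch

def preload_model_bank(weight_paths, make_model, device="cpu"):
    """Load all 12 hierarchical filtering models once, before encoding/decoding,
    so that per-frame model switching costs only a dictionary lookup."""
    bank = {}
    for (stage, qp), path in weight_paths.items():   # e.g. ("mid", 32) -> "mid_qp32.pth"
        model = make_model()                         # e.g. the LumaFilterModel sketched above
        model.load_state_dict(torch.load(path, map_location=device))
        model.eval()
        bank[(stage, qp)] = model
    return bank
```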
In some embodiments of the present application, there is further provided a multi-temporal-layer hierarchical filtering apparatus for B frames, which performs the multi-temporal-layer hierarchical filtering method for B frames described in the above embodiments. As shown in fig. 4, the apparatus includes:
the configuration module 401 is configured to perform preset encoding configuration on a target video frame, and encode the target video frame after configuration;
a determining module 402, configured to determine a B frame to be filtered during encoding of the target video frame;
a building module 403, configured to build a multi-temporal filtering model, where the multi-temporal filtering model is obtained by integrating a plurality of iteratively trained layered filtering models;
and the filtering module 404 is configured to filter the B frame to be filtered using the multi-temporal layer filtering model to obtain a filtered video frame.
The filtering module comprises a training module, and the training module specifically executes the following steps:
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, and obtaining 4 trained low-time-layer filtering models when training is completed, wherein the low-time-layer comprises a first time layer and a second time layer;
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, and adopting unfiltered B frames of a third time layer and a fourth time layer as data sets to obtain 4 trained middle time layer filtering models when training is completed;
and respectively training the layered filtering models with quantization parameter values of 27, 32, 38 and 45, adopting unfiltered B frames of the fifth time layer and the sixth time layer as data sets, and obtaining 4 trained high-time layer filtering models when training is completed.
The multi-temporal-layer hierarchical filtering apparatus enables each B frame to be filtered by the hierarchical filtering model corresponding to its temporal layer level, so that the filtering quality of frames in every layer is improved, the filtering loss is small, and the filtering performance is improved. Meanwhile, when a filtered frame is used as a reference for a subsequent frame to be encoded, over-filtering is avoided.
When acquiring the data sets, the training module performs: collecting original encoded video training sets with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration; determining a reconstruction sequence in the original encoded video training set; extracting, from the reconstruction sequence, unfiltered B frames of the third and fourth temporal layers, and likewise unfiltered B frames of the fifth and sixth temporal layers; and using the unfiltered B frames together with the original video frames corresponding to them as the data sets.
Reference is now made to fig. 5, which is a schematic diagram of an electronic device according to some embodiments of the present application. As shown in fig. 5, the electronic device 2 includes: a processor 200, a memory 201, a bus 202 and a communication interface 203, the processor 200, the communication interface 203 and the memory 201 being connected by the bus 202; the memory 201 stores a computer program executable on the processor 200, and the processor 200 executes a multi-temporal layer hierarchical filtering method of B frames according to any one of the embodiments of the present application when the computer program is executed. The method comprises the following steps: performing preset coding configuration on a target video frame, and coding the target video frame after configuration; determining a B frame to be filtered in the process of encoding the target video frame; constructing a multi-time layer filtering model, wherein the multi-time layer filtering model is obtained by integrating a plurality of iteratively trained layered filtering models; and filtering the B frame to be filtered by using the multi-temporal layer filtering model to obtain a filtered video frame.
The memory 201 may include a high-speed Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 203 (wired or wireless); the Internet, a wide area network, a local area network, a metropolitan area network, etc. may be used.
Bus 202 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 201 is configured to store a program, and the processor 200 executes the program after receiving an execution instruction, and the multi-temporal layer filtering method of B frames disclosed in any of the foregoing embodiments of the present application may be applied to the processor 200 or implemented by the processor 200.
The processor 200 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 200 or by instructions in the form of software. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the methods disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in the memory 201; the processor 200 reads the information in the memory 201 and, in combination with its hardware, performs the steps of the multi-temporal-layer hierarchical filtering method for B frames. The steps include: performing preset coding configuration on a target video frame, and coding the target video frame after configuration; determining a B frame to be filtered in the process of encoding the target video frame; constructing a multi-temporal-layer filtering model, wherein the multi-temporal-layer filtering model is obtained by integrating a plurality of iteratively trained hierarchical filtering models; and filtering the B frame to be filtered by using the multi-temporal-layer filtering model to obtain a filtered video frame.
The present application also provides a computer readable storage medium corresponding to the multi-temporal layer hierarchical filtering method of B frames provided in the foregoing embodiments, on which a computer program is stored, which when executed by a processor, performs the multi-temporal layer hierarchical filtering method of B frames provided in any of the foregoing embodiments. Moreover, examples of the computer readable storage medium may include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage medium, which will not be described in detail herein.
In addition, the implementation manner of the application also provides a computer program product, which comprises a computer program, and the computer program is executed by a processor to realize the multi-temporal layer hierarchical filtering method of any B frame in the previous embodiments.
Those skilled in the art will appreciate that the various component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will also appreciate that some or all of the functions of some or all of the components of the B-frame multi-temporal-layer hierarchical filtering apparatus according to embodiments of the present application may be implemented in practice using a microprocessor or a Digital Signal Processor (DSP).
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-temporal layer hierarchical filtering method for B frames, the method comprising:
performing preset coding configuration on a target video frame, and coding the target video frame after configuration;
determining a B frame to be filtered in the process of encoding the target video frame;
constructing a multi-time layer filtering model, wherein the multi-time layer filtering model is obtained by integrating a plurality of iteratively trained layered filtering models;
and filtering the B frame to be filtered by using the multi-temporal layer filtering model to obtain a filtered video frame.
2. The multi-temporal layered filtering method of B-frames according to claim 1, wherein the preset coding configuration comprises a random access configuration and a quantization parameter configuration.
3. The multi-temporal layered filtering method of B-frames according to claim 2, wherein the quantization parameter configuration comprises configuring quantization parameters to quantization parameter values of 27, 32, 38 and 45, respectively, under a preset coding standard.
4. A multi-temporal layered filtering method for B frames according to claim 3, characterized by iteratively training a layered filtering model comprising:
according to the different quantization parameter values, iteratively training a plurality of layered filtering models stage by stage from the low time layers to the high time layers, and stopping training when the preset number of training iterations is reached.
5. The method for multi-temporal layer hierarchical filtering of B-frames according to claim 4, wherein the step of iteratively training the plurality of hierarchical filtering models from the low temporal layer to the high temporal layer according to the difference of quantization parameter values until reaching a preset number of iterative training times, comprises:
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, obtaining 4 trained low-time layer filtering models when training is completed, and integrating the low-time layer filtering models into a preset coder-decoder, wherein the low-time layer comprises a first time layer and a second time layer;
respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, taking unfiltered B frames of a third time layer and a fourth time layer as data sets, obtaining 4 trained middle time layer filtering models when training is completed, and integrating the 4 trained middle time layer filtering models into a preset coder-decoder;
and respectively training layered filtering models with quantization parameter values of 27, 32, 38 and 45, taking unfiltered B frames of a fifth time layer and a sixth time layer as data sets, obtaining 4 trained high-time layer filtering models when training is completed, and integrating the 4 trained high-time layer filtering models into a preset coder-decoder.
6. The multi-temporal layered filtering method of B-frames according to claim 5, wherein the acquisition method of the data set comprises:
disabling a preset filtering tool in the preset coder-decoder, enabling only the low-time layer filtering models, and collecting an original coded video training set with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration;
determining a reconstruction sequence in the original coded video training set;
extracting, from the reconstruction sequence, unfiltered B frames whose time layers are the third time layer and the fourth time layer, and using the extracted unfiltered B frames together with the original video frames corresponding to them as the data set for training the middle time layer filtering models;
disabling the preset filtering tool in the preset coder-decoder, enabling the low-time layer filtering models and the middle time layer filtering models, and collecting an original coded video training set with quantization parameter values of 27, 32, 38 and 45 under the random access coding configuration as a first training set;
determining a first reconstruction sequence in the first training set;
extracting, from the first reconstruction sequence, unfiltered B frames whose time layers are the fifth time layer and the sixth time layer, and using the extracted unfiltered B frames together with the original video frames corresponding to them as the data set for training the high-time layer filtering models.
7. The multi-temporal layer hierarchical filtering method of B frames according to claim 1, further comprising:
and integrating the multi-temporal layer filtering model into a preset coder and decoder, and coding and decoding the target video frame based on an integration result.
8. A multi-temporal layered filtering apparatus for B frames, the apparatus comprising:
the configuration module is used for carrying out preset coding configuration on the target video frame and coding the target video frame after configuration;
the determining module is used for determining a B frame to be filtered in the process of encoding the target video frame;
the construction module is used for constructing a multi-time-layer filtering model, wherein the multi-time-layer filtering model is obtained by integrating a plurality of iteratively trained layered filtering models;
and the filtering module is used for filtering the B frame to be filtered by using the multi-temporal layer filtering model so as to obtain a filtered video frame.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the multi-temporal layer hierarchical filtering method of the B-frame of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements a multi-temporal layer hierarchical filtering method of B frames according to any of claims 1-7.
CN202311755463.4A 2023-12-19 2023-12-19 B frame multi-time layer layered filtering method, device, equipment and medium Pending CN117880512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311755463.4A CN117880512A (en) 2023-12-19 2023-12-19 B frame multi-time layer layered filtering method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311755463.4A CN117880512A (en) 2023-12-19 2023-12-19 B frame multi-time layer layered filtering method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN117880512A true CN117880512A (en) 2024-04-12

Family

ID=90589295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311755463.4A Pending CN117880512A (en) 2023-12-19 2023-12-19 B frame multi-time layer layered filtering method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117880512A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination