CN117615129A - Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium

Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium

Info

Publication number
CN117615129A
CN117615129A
Authority
CN
China
Prior art keywords
reference frames
frame
reference frame
prediction
inter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410093936.7A
Other languages
Chinese (zh)
Other versions
CN117615129B (en)
Inventor
Zhang Hongshun (张宏顺)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202410093936.7A
Publication of CN117615129A
Application granted
Publication of CN117615129B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/107Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/567Motion estimation based on rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an inter-frame prediction method, an inter-frame prediction device, computer equipment and a storage medium, and relates to the technical field of video encoding and decoding. The method comprises the following steps: in the process of encoding a target encoding unit, obtaining target rate distortion cost of each reference frame in a reference frame list; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame; determining available reference frames in the reference frames based on respective target rate distortion costs of the reference frames; inter-prediction of the target coding unit is performed based on available ones of the respective reference frames.

Description

Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of video encoding and decoding, in particular to an inter-frame prediction method, an inter-frame prediction device, computer equipment and a storage medium.
Background
In current video compression techniques, such as VVC (Versatile Video Coding), a corresponding reference frame needs to be selected during inter prediction.
In the related art, the encoding end may set a reference frame list, which includes a plurality of reference frames obtained by reconstructing the encoded image frame, and in the inter-frame prediction process, each reference frame in the reference frame list is traversed to select a reference frame corresponding to the current encoding unit from the reference frame list.
However, in the above scheme, traversing the reference frames in the reference frame list consumes considerable time and processing resources, which affects coding efficiency.
Disclosure of Invention
The embodiment of the application provides an inter-frame prediction method, an inter-frame prediction device, computer equipment and a storage medium, which can reduce the time length and processing resources required for selecting a reference frame and improve the coding efficiency. The technical scheme is as follows:
according to an aspect of embodiments of the present application, there is provided an inter prediction method, the method including:
in the process of encoding a target encoding unit, obtaining target rate distortion cost of each reference frame in a reference frame list; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame;
determining available reference frames in the reference frames based on respective target rate distortion costs of the reference frames;
inter-prediction of the target coding unit is performed based on available ones of the respective reference frames.
According to an aspect of embodiments of the present application, there is provided an inter prediction apparatus, the apparatus including:
the cost acquisition module is used for acquiring the target rate distortion cost of each reference frame in the reference frame list in the process of encoding the target encoding unit; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame;
A reference frame determining module, configured to determine available reference frames in the respective reference frames based on respective target rate distortion costs of the respective reference frames;
and the prediction module is used for carrying out inter-frame prediction on the target coding unit based on available reference frames in the reference frames.
According to an aspect of embodiments of the present application, there is provided a computer apparatus including a processor and a memory, in which at least one computer instruction, at least one program, a set of codes, or a set of instructions is stored, the at least one computer instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the inter prediction method described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored therein at least one computer instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the inter prediction method described above.
In yet another aspect, embodiments of the present application provide a computer program product or computer program comprising at least one computer instruction stored in a computer-readable storage medium. The processor of the computer device reads the at least one computer instruction from the computer-readable storage medium, and the processor executes the at least one computer instruction to cause the computer device to perform the inter-frame prediction method described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
in the process of coding a target coding unit, firstly acquiring target rate distortion costs corresponding to the optimal MVP of each reference frame in a reference frame list, then determining available reference frames from each reference frame based on the respective target rate distortion costs of each reference frame, and carrying out inter-frame prediction on the target coding unit based on the available reference frames; in the scheme, firstly, the available reference frames are screened from the reference frame list through the rate distortion cost corresponding to the optimal MVP of each reference frame, and only the screened available reference frames are required to be traversed during the subsequent inter-frame prediction, and all the reference frames in the reference frame list are not required to be traversed, so that the number of the reference frames required to be traversed in the inter-frame prediction process is reduced, the time length and processing resources required for selecting the reference frames are reduced, and the coding efficiency is improved.
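The screening flow above (score each reference frame by the rate-distortion cost of its optimal MVP, then keep only the "available" frames for the later traversal) can be sketched as a small helper. The threshold rule used here, keeping frames whose cost is within a factor `alpha` of the cheapest frame's cost, and all names are illustrative assumptions; the embodiment only states that availability is determined from the per-frame target rate-distortion costs.

```python
def filter_available_reference_frames(mvp_rdcosts, alpha=1.5):
    """Keep only reference frames whose optimal-MVP rate-distortion cost
    is within a factor `alpha` of the cheapest frame's cost.

    `mvp_rdcosts` maps a reference-frame index to the rdcost of that
    frame's optimal MVP.  The `alpha` threshold is a hypothetical rule
    for illustration, not the patent's concrete criterion.
    """
    best = min(mvp_rdcosts.values())
    # Frames far costlier than the best candidate are skipped in the
    # subsequent inter-prediction traversal.
    return [idx for idx, cost in sorted(mvp_rdcosts.items())
            if cost <= alpha * best]
```

Only the returned frames would then be traversed during inter prediction of the target coding unit, instead of the full reference frame list.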
Drawings
FIG. 1 is a basic flow chart of a video encoding process illustratively shown herein;
FIG. 2 is a schematic view of CU partitioning according to the present application;
FIG. 3 is a schematic diagram of an inter prediction mode provided by one embodiment of the present application;
FIG. 4 is a schematic diagram of candidate motion vectors provided by one embodiment of the present application;
FIG. 5 is a schematic diagram of an intra block copy mode provided by one embodiment of the present application;
FIG. 6 is a schematic diagram of an intra-string copy mode provided by one embodiment of the present application;
FIG. 7 is a simplified block diagram of a communication system provided in one embodiment of the present application;
FIG. 8 is a schematic diagram of the placement of video encoders and video decoders in a streaming environment as exemplarily shown herein;
FIG. 9 is a flow chart of an inter prediction method provided by one embodiment of the present application;
FIG. 10 is a flow chart of an inter prediction method provided in another embodiment of the present application;
FIG. 11 is a flowchart of an inter prediction method provided in yet another embodiment of the present application;
FIG. 12 is a flow chart of an inter prediction method provided in yet another embodiment of the present application;
FIG. 13 is a flow chart of an inter prediction method provided by yet another embodiment of the present application;
FIG. 14 is a coding schematic diagram according to the present application;
fig. 15 is a schematic view of temporal candidate MVP scaling according to the present application;
FIG. 16 is a schematic diagram of the positional relationship between a current block and a neighboring block according to the present application;
FIG. 17 is a schematic diagram of prediction orders of different CU partitions referred to herein;
FIG. 18 is a schematic diagram of an inter prediction order according to the present application;
FIG. 19 is a flow chart of initial reference frame template construction in accordance with the present application;
FIG. 20 is a flow chart of a master template construction according to the present application;
FIG. 21 is a reference relationship diagram corresponding to different frame types referred to herein;
FIG. 22 is a reference relationship diagram of the GOP16 related to the present application;
FIG. 23 is a reference frame template application diagram in accordance with the present application;
FIG. 24 is a block diagram of an inter prediction apparatus provided by one embodiment of the present application;
fig. 25 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before describing the embodiments of the present application, a brief description of video encoding techniques is first provided in connection with fig. 1. Fig. 1 illustrates a basic flow chart of a video encoding process.
A video signal refers to an image sequence comprising a plurality of frames. A frame is a representation of the spatial information of a video signal. Taking the YUV format as an example, a frame includes one luminance sample matrix (Y) and two chrominance sample matrices (Cb and Cr). In terms of how the video signal is acquired, two cases can be distinguished: capture by a camera and generation by a computer. Because their statistical properties differ, the corresponding compression coding modes may also differ.
Some mainstream video coding technologies, such as the H.265/HEVC (High Efficiency Video Coding) and H.266/VVC (Versatile Video Coding) standards, and AVS (Audio Video coding Standard), e.g., AVS3, use a hybrid coding framework to perform the following series of operations and processes on the input original video signal.
1. Block division structure (Block Partition Structure): the input image is divided into several non-overlapping processing units, each of which undergoes similar compression operations. This processing unit is called a CTU (Coding Tree Unit) or LCU (Largest Coding Unit). The CTU may continue to be divided more finely, further down, to obtain one or more basic coding units, called CUs (Coding Units). Each CU is the most basic element in an encoding pass.
CTUs may be divided down into different CUs in a quadtree manner. For example, fig. 2 shows a CU partitioning diagram related to the present application. As shown in fig. 2, CTUs in VVC are first divided into different CUs according to a quadtree, and the CUs at the quadtree's child nodes may then be divided according to a multi-type tree comprising four split types: vertical binary tree partitioning (SPLIT_BT_VER), horizontal binary tree partitioning (SPLIT_BT_HOR), vertical ternary tree partitioning (SPLIT_TT_VER), and horizontal ternary tree partitioning (SPLIT_TT_HOR), where the ternary tree splits in a 1:2:1 ratio. The leaf nodes of the multi-type tree are also called CUs.
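As a sketch of the partitioning geometry just described (quadtree plus the four multi-type-tree splits), the hypothetical helper below computes the child rectangles produced by one split step; the mode-name strings are assumptions chosen to mirror the split names in the text, not an actual codec API.

```python
def split_cu(x, y, w, h, mode):
    """Return child CU rectangles (x, y, w, h) for one partitioning step."""
    if mode == "QT":                 # quadtree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "SPLIT_BT_VER":       # vertical binary: two halves side by side
        hw = w // 2
        return [(x, y, hw, h), (x + hw, y, hw, h)]
    if mode == "SPLIT_BT_HOR":       # horizontal binary: top and bottom halves
        hh = h // 2
        return [(x, y, w, hh), (x, y + hh, w, hh)]
    if mode == "SPLIT_TT_VER":       # vertical ternary, widths in 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "SPLIT_TT_HOR":       # horizontal ternary, heights in 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(mode)
```

For example, a vertical ternary split of a 64x64 CU yields 16-, 32-, and 16-sample-wide children, matching the 1:2:1 ratio above.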
Each CU is evaluated with both intra prediction and inter prediction. Different prediction modes within the same prediction type are compared, and the best intra and inter results are then compared, to find the optimal prediction mode for the current CU; meanwhile, the CU is transformed by Transform Units (TUs) to find the optimal transform type. Finally, a frame of image is partitioned into CUs.
Among the intra prediction modes there are the DC mode, the Planar mode, and 65 angular prediction modes, as well as Intra Sub-Partitions (ISP), Cross-Component Linear Model prediction (CCLM), the Most Probable Mode (MPM) for the luminance component, the Derived Mode (DM) for the chrominance component, Multiple Reference Line intra prediction (MRLP), and the like.
Inter prediction builds on H.265 and introduces Combined Inter-Intra Prediction (CIIP), the geometric partitioning inter prediction technique GPM, Symmetric MVD Coding (SMVD), Decoder-Side Motion Vector Refinement (DMVR), Bi-Directional Optical Flow (BDOF), affine transformation, and the like.
2. Predictive coding (Predictive Coding): the method comprises modes of intra-frame prediction, inter-frame prediction and the like, and the original video signal is predicted by the selected reconstructed video signal to obtain a residual video signal. The encoding side needs to decide one of the most suitable prediction coding modes among many possible prediction coding modes for the current CU and inform the decoding side. Intra prediction refers to the fact that the predicted signal comes from a region that has been encoded and reconstructed within the same image. Inter prediction refers to a predicted signal from an already encoded other picture (referred to as a reference picture) than the current picture.
3. Transform and Quantization (Transform & Quantization): the residual video signal undergoes transform operations such as the DFT (Discrete Fourier Transform) or DCT (Discrete Cosine Transform), converting the signal into the transform domain as so-called transform coefficients. A lossy quantization operation is then performed on the transform-domain signal, discarding some information so that the quantized signal is easier to compress. In some video coding standards there may be more than one transform mode to choose from, so the encoding end must also choose one of the transforms for the current CU and inform the decoding end. The degree of refinement of quantization is usually determined by the quantization parameter (QP): a larger QP value means that coefficients over a larger range of values are quantized to the same output, which usually brings more distortion and a lower code rate; conversely, a smaller QP value means that coefficients over a smaller range are quantized to the same output, which usually brings less distortion at a correspondingly higher code rate.
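To make the QP discussion concrete, here is a minimal sketch of uniform scalar quantization, assuming the H.265-style convention that the quantization step roughly doubles for every increase of 6 in QP (Qstep ≈ 2^((QP-4)/6)); the exact integer scaling tables of a real codec are omitted.

```python
def quantization_step(qp):
    """Approximate quantization step size: doubles for every +6 in QP.
    This is the well-known H.264/H.265-style relation, used here only
    to illustrate the QP/distortion trade-off described in the text."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeff, qp):
    """Uniform scalar quantization of one transform coefficient (sketch).
    A larger QP maps a wider range of coefficients to the same output."""
    return round(coeff / quantization_step(qp))
```

For instance, at QP 28 the step is 16, so all coefficients from roughly 88 to 104 quantize to the same level, while at a smaller QP far fewer coefficients share one output.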
4. Entropy Coding (Entropy Coding) or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed code stream is output. Encoding also produces other information, such as the selected mode and motion vectors, which likewise requires entropy coding to reduce the code rate. Statistical coding is a lossless coding mode that can effectively reduce the code rate required to express the same signal. Common statistical coding methods are Variable Length Coding (VLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC).
5. Loop Filtering (Loop Filtering): the encoded image undergoes inverse quantization, inverse transformation and prediction compensation (the inverse of operations 2-4 above) to obtain a reconstructed decoded image. Compared with the original image, part of the reconstructed image's information differs due to the quantization effect, resulting in distortion. Performing filtering operations on the reconstructed image, such as deblocking filtering, SAO (Sample Adaptive Offset) or ALF (Adaptive Loop Filter), can effectively reduce the degree of distortion produced by quantization. Since these filtered reconstructed images will serve as references for subsequently encoded images to predict future signals, the above filtering operations are also called loop filtering, i.e., filtering operations within the encoding loop.
That is, in encoding, a frame of image is fed to the encoder and divided into CTUs, which are depth-divided to obtain CUs; each CU involves multiple prediction modes and TUs. Each CU is predicted to obtain a predicted value, which is subtracted from the input data to obtain a residual; the residual is transformed and quantized to obtain residual coefficients, which are sent to the entropy coding module to output the code stream. At the same time, the residual coefficients are inverse-quantized and inverse-transformed to obtain the residual values of the reconstructed image, which are added to the predicted values to obtain the reconstructed image; after filtering, the reconstructed image enters the reference frame queue to serve as a reference image for the next frame, and encoding proceeds frame by frame.
According to the above encoding process, at the decoding end, for each CU the decoder obtains the compressed bitstream and performs entropy decoding to obtain the various mode information and the quantized transform coefficients. Each coefficient is inverse-quantized and inverse-transformed to obtain the residual signal. On the other hand, according to the known coding mode information, the prediction signal corresponding to the CU can be obtained; adding the residual signal and the prediction signal yields the reconstructed signal. Finally, the reconstructed values of the decoded image undergo a loop filtering operation to produce the final output signal.
Some mainstream video coding standards, such as HEVC, VVC, AVS3, use a block-based hybrid coding framework. The method divides the original video data into a series of coding blocks, and combines video coding methods such as prediction, transformation, entropy coding and the like to realize the compression of the video data. Among them, motion compensation is a type of prediction method commonly used for video coding, and motion compensation derives a prediction value of a current coding block from a coded region based on redundancy characteristics of video content in a time domain or a space domain. Such prediction methods include: inter prediction, intra block copy prediction, intra string copy prediction, etc., these prediction methods may be used alone or in combination in a particular coding implementation. For coded blocks using these prediction methods, it is often necessary to explicitly or implicitly encode one or more two-dimensional displacement vectors in the code stream, indicating the displacement of the current block (or co-located blocks of the current block) relative to its one or more reference blocks.
It should be noted that, in different prediction modes and in different implementations, the displacement vectors may have different names, and are described herein in the following manner: 1) The displacement Vector in the inter prediction mode is called a Motion Vector (MV); 2) The displacement Vector in the IBC (Intra Block Copy) prediction mode is called a Block Vector (BV); 3) The displacement Vector in the ISC (Intra String Copy, intra-frame String copy) prediction mode is called a String Vector (SV). Intra-string replication is also known as "string prediction" or "string matching", etc.
MV refers to a displacement vector for an inter prediction mode, which points from a current picture to a reference picture, with a value of a coordinate offset between the current block and the reference block, where the current block and the reference block are in two different pictures. In the inter prediction mode, motion vector prediction can be introduced, and a predicted motion vector corresponding to the current block is obtained by predicting the motion vector of the current block, and the difference value between the predicted motion vector corresponding to the current block and the actual motion vector is coded and transmitted, so that compared with the direct coding and transmitting of the actual motion vector corresponding to the current block, the bit cost is saved. In the embodiment of the present application, the predicted motion vector refers to a predicted value of a motion vector of a current block obtained by a motion vector prediction technique.
BV refers to a displacement vector for IBC prediction mode, which has a value of a coordinate offset between a current block and a reference block, wherein the current block and the reference block are both in a current picture. In the IBC prediction mode, block vector prediction may be introduced, and a block vector of the current block is predicted to obtain a predicted block vector corresponding to the current block, and a difference value between the predicted block vector corresponding to the current block and an actual block vector is encoded and transmitted, which is advantageous for saving bit overhead compared with directly encoding and transmitting the actual block vector corresponding to the current block. In the embodiment of the present application, the predicted block vector refers to a predicted value of a block vector of a current block obtained by a block vector prediction technique.
SV refers to a displacement vector for the ISC prediction mode, which has a value of a coordinate offset between a current string and a reference string, both of which are in a current image. In the ISC prediction mode, string vector prediction can be introduced, a predicted string vector corresponding to the current string is obtained by predicting the string vector of the current string, and the difference value between the predicted string vector corresponding to the current string and the actual string vector is coded and transmitted, so that compared with the direct coding and transmitting of the actual string vector corresponding to the current string, the bit cost is saved. In the embodiment of the present application, the predicted string vector refers to a predicted value of a string vector of the current string obtained by a string vector prediction technique.
Several different prediction modes are described below.
1. Inter prediction mode
As shown in fig. 3, inter prediction utilizes the temporal correlation of video: pixels of the current image are predicted from pixels of neighboring encoded images, so as to effectively remove temporal redundancy and save the bits needed to encode residual data. Here P is the current frame, P_r is the reference frame, B is the current block to be coded, and B_r is the reference block of B. B' has the same coordinate position within its image as B; the coordinates of B_r are (x_r, y_r), and the coordinates of B' are (x, y). The displacement between the current block to be coded and its reference block is called the Motion Vector (MV), the vector that marks the positional relationship between the current block and the reference block in inter prediction, namely:
MV = (x_r - x, y_r - y)
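The MV definition above translates directly into code; this tiny illustrative helper simply computes the coordinate offset between the reference block and the co-located position of the current block.

```python
def motion_vector(ref_block_pos, cur_block_pos):
    """MV = (x_r - x, y_r - y): offset of the reference block B_r in the
    reference frame relative to the co-located position B' of the
    current block B."""
    (xr, yr), (x, y) = ref_block_pos, cur_block_pos
    return (xr - x, yr - y)
```
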
Considering that temporally or spatially neighboring blocks have strong correlation, MV prediction techniques can be used to further reduce the bits required to encode MVs. In H.265/HEVC, inter prediction includes two MV prediction techniques: Merge and AMVP (Advanced Motion Vector Prediction).
The Merge mode creates an MV candidate list for the current PU (Prediction Unit), containing 5 candidate MVs (and their corresponding reference pictures). The 5 candidate MVs are traversed, and the one with the minimum rate-distortion cost is selected as the optimal MV. Provided the encoder and decoder build the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the candidate list. Note that HEVC's MV prediction also includes a skip mode, which is a special case of the Merge mode: after Merge finds the optimal MV, if the current block is essentially identical to the reference block, no residual data need be transmitted; only the index of the MV and a skip flag are sent.
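A minimal sketch of the Merge decision: score each candidate MV with a caller-supplied rate-distortion cost function and return the index to be signalled. The cost callable stands in for the encoder's real rdcost evaluation and is an assumption for illustration.

```python
def select_merge_candidate(candidates, rdcost):
    """Pick the merge index whose candidate MV minimises the
    rate-distortion cost; the encoder then signals only this index."""
    costs = [rdcost(mv) for mv in candidates]
    best_idx = min(range(len(candidates)), key=costs.__getitem__)
    return best_idx, candidates[best_idx]
```

The decoder, having built the same candidate list, recovers the MV from the signalled index alone.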
The rate distortion cost (Rate Distortion Cost, rdcost) is used to choose among multiple options and is calculated by the following formula:
rdcost = dist + λ × rate
where dist represents the distortion, recording the difference between the original input pixels and the predicted pixels, λ is the Lagrange multiplier, and rate is the number of bits required. dist can be obtained via the Sum of Absolute Transformed Differences (SATD), the Sum of Square Errors (SSE), and so on. SATD is one way to calculate distortion: the residual signal is Hadamard-transformed and the absolute values of all elements are summed. SSE is the sum of squared errors between the original pixels and the reconstructed pixels; it requires the residual signal to be transformed, quantized, inverse-quantized and inverse-transformed, so the estimated cost matches the true coding result.
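The distortion measures just described can be sketched as follows. `hadamard_1d` is a plain fast Walsh-Hadamard butterfly (output ordering is irrelevant for SATD, which only sums absolute values), and `rdcost` follows the Lagrangian form dist + λ × rate; real encoders add normalisation factors that are omitted here.

```python
def hadamard_1d(v):
    """Recursive fast Walsh-Hadamard transform of a length-2^k list."""
    if len(v) == 1:
        return v[:]
    half = len(v) // 2
    a = hadamard_1d([v[i] + v[i + half] for i in range(half)])
    b = hadamard_1d([v[i] - v[i + half] for i in range(half)])
    return a + b


def satd(residual):
    """SATD: 2-D Hadamard transform of the residual block, then the sum
    of absolute transformed coefficients."""
    rows = [hadamard_1d(r) for r in residual]
    cols = [hadamard_1d([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    return sum(abs(c) for col in cols for c in col)


def sse(orig, recon):
    """Sum of squared errors between original and reconstructed pixels."""
    return sum((o - r) ** 2 for ro, rr in zip(orig, recon)
               for o, r in zip(ro, rr))


def rdcost(dist, rate, lam):
    """Lagrangian rate-distortion cost: dist + lambda * rate."""
    return dist + lam * rate
```
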
The MV candidate list established by the Merge mode includes two cases of space domain and time domain, and also includes a mode of combining the list for B Slice (B frame image). Wherein the spatial domain provides at most 4 candidate MVs, the establishment of which is shown in part (a) of fig. 4. The airspace list is established according to the sequence of A1-B0-A0-B2, wherein B2 is an alternative, namely when one or more of A1, B1, B0 and A0 are not existed, motion information of B2 is needed to be used; the time domain provides at most 1 candidate MV, and its establishment is shown in part (b) of fig. 4, and is obtained by scaling MVs of co-located PUs according to the following formula:
curMV = td × colMV / tb
where curMV denotes the MV of the current PU, colMV denotes the MV of the co-located PU, td denotes the distance between the current picture and its reference picture, and tb denotes the distance between the co-located picture and its reference picture. If the PU at position D0 of the co-located block is unavailable, the co-located PU at position D1 is used instead. A PU in a B Slice has two MVs, so its MV candidate list must provide two MVPs (Motion Vector Predictors), which represent the initial MV positions derived from neighboring blocks. HEVC generates the combined list for a B Slice by pairwise combining the first 4 candidate MVs in the MV candidate list.
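The temporal scaling above can be sketched as follows (a hypothetical helper; real codecs clip the distances and use fixed-point arithmetic rather than floating-point division):

```python
def scale_temporal_mv(col_mv, td, tb):
    """curMV = td * colMV / tb, where td is the distance from the current
    picture to its reference and tb is the distance from the co-located
    picture to its reference (rounded to the nearest integer here)."""
    return (round(td * col_mv[0] / tb), round(td * col_mv[1] / tb))
```

For example, halving the temporal distance halves the scaled vector.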
Similarly, AMVP mode uses the MV correlation of spatial and temporal neighboring blocks to build an MV candidate list for the current PU. Unlike the Merge mode, the encoder selects the optimal predicted MV from the AMVP candidate list and differentially encodes it against the optimal MV obtained by motion search for the current block to be coded, i.e., it encodes MVD = MV − MVP, where MVD is the motion vector difference (Motion Vector Difference). By building the same list, the decoder can compute the MV of the current block from only the MVD and the index of the MVP in the list. The MV candidate list in AMVP mode likewise contains spatial and temporal candidates, but its length is only 2.
As described above, in AMVP mode the MVD must be encoded. The resolution of the MVD is controlled by use_integer_mv_flag in the slice header: when the flag is 0, the MVD is encoded at 1/4 (luma) pixel resolution; when the flag is 1, the MVD is encoded at full (luma) pixel resolution. VVC uses Adaptive Motion Vector Resolution (AMVR), which allows each CU to adaptively select the resolution at which its MV is coded. In normal AMVP mode, the selectable resolutions are 1/4, 1/2, 1, and 4 pixels. For a CU with at least one non-zero MVD component, a first flag is encoded to indicate whether quarter-luma-sample MVD precision is used. If the flag is 0, the MVD of the current CU is encoded at 1/4 pixel resolution. Otherwise, a second flag is encoded to indicate whether 1/2 pixel resolution or another MVD resolution is used; if not 1/2 pixel, a third flag is encoded to indicate whether 1 or 4 pixel resolution is used for the CU.
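The flag cascade described above can be sketched as follows (illustrative only; the actual VVC syntax element names and binarization differ — this merely mirrors the three-flag decision tree in the text, returning the MVD resolution in luma pixels):

```python
def amvr_mvd_resolution(flag1, flag2=None, flag3=None):
    """Interpret the cascaded AMVR flags described above."""
    if flag1 == 0:
        return 0.25          # quarter-luma-sample MVD precision
    if flag2 == 0:
        return 0.5           # half-pel resolution
    return 1 if flag3 == 0 else 4   # integer-pel or 4-pel resolution
```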
2. IBC prediction mode
IBC is an intra-frame coding tool adopted in the HEVC Screen Content Coding (SCC) extension, which significantly improves the coding efficiency of screen content. AVS3 and VVC also adopt IBC to improve screen content coding performance. Exploiting the spatial correlation of screen-content video, IBC predicts the pixels of the current block to be coded from already-coded pixels of the current picture, effectively saving the bits needed to encode those pixels. As shown in fig. 5, the displacement between the current block and its reference block in IBC is called the BV (block vector). H.266/VVC employs BV prediction techniques similar to inter prediction to further save the bits required to encode BVs: it predicts the BV using an AMVP-like mode and allows the BVD to be encoded at 1 or 4 pixel resolution.
3. ISC prediction mode
The ISC technique divides a coded block into a series of pixel strings or unmatched pixels in some scanning order (e.g., raster scan, round-trip scan, Zig-Zag scan). Similar to IBC, each string searches for a reference string of the same shape in the already-coded region of the current picture and derives the predicted value of the current string; by encoding the residual between the pixel values of the current string and the predicted values instead of encoding the pixel values directly, bits can be effectively saved. Fig. 6 gives a schematic diagram of intra-frame string copy, showing the coded region, string 1 (28 pixels), string 2 (35 pixels), and an unmatched pixel (1 pixel). The displacement between string 1 and its reference string is string vector 1 in fig. 6; the displacement between string 2 and its reference string is string vector 2 in fig. 6.
The intra-frame string copy technique requires, for each string in the current coding block, encoding the SV, the string length, and a flag indicating whether a matching string exists. SV denotes the displacement from the string to be encoded to its reference string; the string length is the number of pixels the string contains. Different implementations encode the string length in several ways; some examples (which may be used in combination) are given below:
The length of the string is encoded directly in the code stream.
The number of pixels still to be processed after the string is encoded in the code stream; the decoder computes the length of the current string from the size N of the current block, the number of already-processed pixels N1, and the decoded number of remaining pixels N2, obtaining the length L = N − N1 − N2;
A flag is encoded in the code stream to indicate whether the string is the last string; if it is, the length of the current string is computed from the block size N and the number of processed pixels N1 as L = N − N1. If a pixel finds no corresponding reference in the referenceable region, the value of that unmatched pixel is encoded directly.
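The two length derivations above can be sketched as follows (hypothetical helper names; N, N1 and N2 follow the definitions in the text, using the block of fig. 6 — 64 pixels split into strings of 28 and 35 plus 1 unmatched pixel — as the example):

```python
def string_len_from_remaining(N, n_processed, n_remaining):
    # Decoder derives L = N - N1 - N2 from the coded
    # "pixels still to be processed" value N2.
    return N - n_processed - n_remaining

def string_len_if_last(N, n_processed):
    # When the "last string" flag is set: L = N - N1.
    return N - n_processed
```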
As shown in fig. 7, a simplified block diagram of a communication system provided in one embodiment of the present application is shown. Communication system 200 includes a plurality of devices that may communicate with each other through, for example, network 250. For example, the communication system 200 includes a first device 210 and a second device 220 interconnected by a network 250. In the embodiment of fig. 7, the first device 210 and the second device 220 perform unidirectional data transmission. For example, the first apparatus 210 may encode video data, such as a stream of video pictures acquired by the first apparatus 210, for transmission to the second apparatus 220 over the network 250. The encoded video data is transmitted in one or more encoded video code streams. The second device 220 may receive the encoded video data from the network 250, decode the encoded video data to recover the video data, and display the video pictures according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bi-directional transmission of encoded video data, which may occur, for example, during a video conference. For bi-directional data transmission, each of the third device 230 and the fourth device 240 may encode video data (e.g., a stream of video pictures collected by the device) for transmission over the network 250 to the other of the third device 230 and the fourth device 240. Each of the third device 230 and the fourth device 240 may also receive encoded video data transmitted by the other of the third device 230 and the fourth device 240, and may decode the encoded video data to recover the video data, and may display video pictures on an accessible display device according to the recovered video data.
In the embodiment of fig. 7, the first device 210, the second device 220, the third device 230, and the fourth device 240 may be computer devices such as a server, a personal computer, and a smart phone, but the principles disclosed herein may not be limited thereto. The embodiment of the application is applicable to a PC (Personal Computer ), a mobile phone, a tablet computer, a media player or special video conference equipment. Network 250 represents any number of networks that transfer encoded video data between first device 210, second device 220, third device 230, and fourth device 240, including, for example, a wired network or a wireless communication network. Communication network 250 may exchange data in circuit-switched or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, or the internet. For the purposes of this application, the architecture and topology of network 250 may be irrelevant to the operation disclosed herein, unless explained below.
As an example, fig. 8 shows the placement of video encoders and video decoders in a streaming environment. The subject matter disclosed herein is equally applicable to other video-enabled applications including, for example, video conferencing, digital TV (television), storing compressed video on digital media including CD (Compact Disc), DVD (Digital Versatile Disc ), memory stick, etc.
The streaming system may include an acquisition subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In an embodiment, the video picture stream 302 includes samples taken by a digital camera. The video picture stream 302 is depicted as a bold line, compared to the encoded video data 304 (or encoded video code stream), to emphasize its high data volume. The video picture stream 302 may be processed by an electronic device 320 comprising a video encoder 303 coupled to the video source 301. The video encoder 303 may include hardware, software, or a combination of hardware and software to implement aspects of the disclosed subject matter as described in more detail below. The encoded video data 304 (or encoded video stream 304) is depicted as a thin line, compared to the video picture stream 302, to emphasize its lower data volume, and may be stored on the streaming server 305 for future use. One or more streaming client subsystems, such as client subsystem 306 and client subsystem 308 in fig. 8, may access the streaming server 305 to retrieve copies 307 and 309 of the encoded video data 304. The client subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. The video decoder 310 decodes the incoming copy 307 of the encoded video data and generates an output video picture stream 311 that can be presented on a display 312 (e.g., a display screen) or another presentation device (not depicted). In some streaming systems, the encoded video data 304, 307, and 309 (e.g., video streams) may be encoded according to some video encoding/compression standard.
It should be noted that electronic device 320 and electronic device 330 may include other components (not shown). For example, electronic device 320 may include a video decoder (not shown), and electronic device 330 may also include a video encoder (not shown). Wherein the video decoder is configured to decode the received encoded video data; video encoders are used to encode video data.
It should be noted that, the technical solution provided in the embodiment of the present application may be applied to the h.266/VVC standard, the h.265/HEVC standard, AVS (such as AVS 3) or the next-generation video coding and decoding standard, which is not limited in this embodiment of the present application.
In inter prediction, motion Estimation (ME) is performed for all reference frames in the reference frame list. As the number of reference frames increases, the encoder needs to perform motion estimation on more reference frames to find the best motion vector to predict the content of the current frame. This involves more searches and computations, resulting in a multiple increase in coding complexity.
In VVC, the maximum number of reference frames is 16, and by increasing the number of reference frames, the encoder can better exploit temporal and spatial correlation to provide higher quality video coding, but the complexity involved can seriously impact coding efficiency.
In the methods shown in subsequent embodiments of the present application, during encoding of the target coding unit, the target rate-distortion cost of each reference frame in the reference frame list may be obtained, where the target rate-distortion cost is the cost corresponding to the optimal MVP of that reference frame; the available reference frames are then determined based on the respective target rate-distortion costs; and inter prediction is performed on the target coding unit based on the available reference frames. That is, during encoding of the target coding unit, the available reference frames are first screened from the reference frame list, and inter prediction then only needs to be performed based on those frames, thereby improving coding efficiency.
In the method provided by the embodiment of the present application, the execution body of each step may be an encoding end device. In the video encoding process, the technical scheme provided by the embodiment of the application can be adopted to encode the image. The encoding end device may be a computer device, which refers to an electronic device with data computing, processing and storage capabilities, such as a PC, a cell phone, a tablet computer, a media player, a dedicated video conferencing device, a server, etc.
In addition, the methods provided herein may be used alone or in any order in combination with other methods. An encoder based on the methods provided herein may be implemented by 1 or more processors or 1 or more integrated circuits.
Referring to fig. 9, a flowchart of an inter prediction method according to an embodiment of the present application is shown. For convenience of explanation, only the execution subject of each step will be described as a computer device. The method may comprise the following steps.
Step 910: in the process of encoding a target encoding unit, obtaining target rate distortion cost of each reference frame in a reference frame list; the target rate-distortion cost is the rate-distortion cost corresponding to the optimal MVP of the reference frame.
In the embodiment of the present application, in the process of encoding a target encoding unit in a current frame, before motion estimation is performed on reference frames in a reference frame list, a rate distortion cost corresponding to an optimal MVP of each reference frame in the reference frame list is first determined and used as a target rate distortion cost of each reference frame.
In some embodiments, the process of obtaining the target rate distortion cost of each reference frame in the reference frame list may include:
Acquiring an advanced motion vector prediction AMVP candidate list of a first reference frame; the first reference frame is any one of the respective reference frames;
calculating the rate distortion cost of each candidate MVP in the AMVP candidate list;
and obtaining the minimum value in the rate distortion cost of each candidate MVP as the target rate distortion cost of the first reference frame.
In this embodiment, when applied to the AMVP process, the computer device may first obtain the AMVP candidate list of each reference frame, calculate the rate-distortion cost of each candidate MVP in that list, and select the smallest as the target rate-distortion cost; the candidate MVP corresponding to the target rate-distortion cost is the optimal MVP of that reference frame.
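The per-reference-frame selection above can be sketched as follows (cost_of is a hypothetical stand-in for the encoder's rdcost evaluation of a candidate MVP; the real evaluation involves prediction and bit estimation):

```python
def target_rdcost(amvp_list, cost_of):
    """For one reference frame: evaluate every candidate MVP in its AMVP
    candidate list and return the minimum-cost candidate together with
    its cost (the reference frame's target rate-distortion cost)."""
    best_mvp = min(amvp_list, key=cost_of)
    return best_mvp, cost_of(best_mvp)
```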
Step 920: the available ones of the respective reference frames are determined based on the respective target rate distortion costs for the respective reference frames.
In one possible implementation, the computer device may determine one or more available reference frames from the respective reference frames in ascending order of target rate-distortion cost, so as to screen out one or more reference frames with smaller target rate-distortion costs.
For example, the computer device may sort the reference frames from small to large according to the target rate-distortion cost, and then select one or more reference frames arranged in front as the available reference frames according to the sorting order.
Step 930: inter-prediction is performed on the target coding unit based on available ones of the respective reference frames.
That is, in the embodiment of the present application, when performing motion estimation in the inter-frame prediction process, the computer device only needs to perform motion estimation on available reference frames in each reference frame, and does not need to perform motion estimation on all reference frames in each reference frame.
In one possible implementation manner, the acquiring the target rate distortion cost of each reference frame in the reference frame list further includes: the optimal MVP for each reference frame is recorded. Accordingly, the inter-frame prediction of the target coding unit based on the available reference frames in the reference frames includes:
and traversing available reference frames in the reference frames, and performing motion estimation based on the optimal MVP of the available reference frames so as to perform inter-frame prediction on the target coding unit.
In this embodiment, since the optimal MVP of each reference frame is calculated before the available reference frames are screened, the optimal MVPs can be recorded at screening time; when inter prediction is subsequently performed on the target coding unit based on the available reference frames, the recorded optimal MVPs of those frames are used directly for prediction without being recomputed, ensuring the efficiency of each inter prediction.
The reference frame list may include a forward reference frame list and a backward reference frame list, and in the embodiment of the present application, the above scheme may be performed separately for a single reference frame list.
For example, for the forward reference frame list and the backward reference frame list, the computer device calculates the target rate distortion cost for each reference frame in the forward reference frame list and the backward reference frame list, respectively, in step 910; in step 920, the available reference frames in the forward reference frame list are determined based on the respective target rate distortion costs of the reference frames in the forward reference frame list, and the available reference frames in the backward reference frame list are determined based on the respective target rate distortion costs of the reference frames in the backward reference frame list; in step 930, forward inter-prediction is performed on the target coding unit based on the available reference frames in the forward reference frame list, and backward inter-prediction is performed on the target coding unit based on the available reference frames in the backward reference frame list.
The process of determining the available reference frames based on the respective target rate-distortion costs may be implemented with a reference frame template. For example, the reference frame template contains one flag bit (representable by 1 bit) per reference frame, and setting a reference frame as available means setting its flag bit in the template to a specified value (such as 0 or 1). Accordingly, when the available reference frames are subsequently traversed and motion estimation is performed based on their optimal MVPs to inter-predict the target coding unit, each flag bit in the template can be traversed, and motion estimation via the optimal MVP is performed only for reference frames whose flag bit holds the specified value (i.e., the available reference frames), to determine whether each is selected as the reference frame of the target coding unit in the inter prediction process.
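The flag-bit template described above can be sketched as a bitmask (hypothetical helper names; here the specified value marking an available frame is 1):

```python
def build_ref_template(target_costs, threshold):
    """One flag bit per reference frame; bit i set to 1 marks reference
    frame i as available (its target rdcost is below the threshold)."""
    mask = 0
    for i, cost in enumerate(target_costs):
        if cost < threshold:
            mask |= 1 << i
    return mask

def available_indices(mask, num_refs):
    # Traverse the template; motion estimation runs only for set bits.
    return [i for i in range(num_refs) if (mask >> i) & 1]
```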
In summary, in the scheme shown in the embodiment of the present application, during encoding a target encoding unit, a target rate-distortion cost corresponding to an optimal MVP of each reference frame in a reference frame list is first obtained, then an available reference frame is determined from each reference frame based on the target rate-distortion cost of each reference frame, and inter-frame prediction is performed on the target encoding unit based on the available reference frame; in the scheme, firstly, the available reference frames are screened from the reference frame list through the rate distortion cost corresponding to the optimal MVP of each reference frame, and only the screened available reference frames are required to be traversed during the subsequent inter-frame prediction, and all the reference frames in the reference frame list are not required to be traversed, so that the number of the reference frames required to be traversed in the inter-frame prediction process is reduced, the time length and processing resources required for selecting the reference frames are reduced, and the coding efficiency is improved.
Referring to fig. 10, a flowchart of an inter prediction method according to another embodiment of the present application is shown. As shown in fig. 10, the above step 920 may be implemented as step 920a or step 920b.
Step 920a: and setting the reference frames with the target rate distortion cost smaller than the cost threshold value as available reference frames in the reference frames.
In the embodiment of the application, the computer device may set all reference frames with the target rate distortion cost smaller than the cost threshold as available reference frames in each reference frame.
For example, the computer device may traverse each reference frame in the forward reference frame list, and if the target rate distortion cost for the currently traversed reference frame is less than the first cost threshold, set the reference frame as an available reference frame in the forward reference frame list; accordingly, the computer device may traverse each reference frame in the backward reference frame list and set the reference frame as an available reference frame in the backward reference frame list if the target rate distortion cost of the currently traversed reference frame is less than the second cost threshold.
The first cost threshold and the second cost threshold may be preset thresholds.

The first cost threshold and the second cost threshold may be the same, or they may be different.
Step 920b: and setting a maximum of m reference frames with the target rate distortion cost smaller than a cost threshold value in each reference frame as available reference frames, wherein m is an integer greater than or equal to 1.
Wherein the value of m is smaller than the maximum number of reference frames contained in the reference frame list.
In the embodiment of the application, the computer device may set up at most m reference frames as available reference frames in reference frames with a target rate distortion cost smaller than a cost threshold value.
Wherein for the forward reference frame list and the backward reference frame list, the computer device may determine a maximum of m reference frames, respectively.
For example, when the number of reference frames in each reference frame for which the target rate-distortion cost is less than the cost threshold is not greater than m, the computer device sets all reference frames in each reference frame for which the target rate-distortion cost is less than the cost threshold as available reference frames.
For another example, when the number of reference frames in each reference frame for which the target rate-distortion cost is smaller than the cost threshold is greater than m, the computer device sets the first m reference frames in each reference frame, in which the target rate-distortion cost is arranged from small to large, as available reference frames.
Specifically, for either the forward or the backward reference frame list, after ranking the reference frames in ascending order of target rate-distortion cost, the computer device traverses them in order starting from the first. If the cost of the currently traversed frame is below the threshold and the frame lies within the first m positions of the ranked queue, the frame is determined to be an available reference frame; if its cost is not below the threshold, or the frame is at the m-th position of the queue, the computer device stops traversing the remaining frames in the queue.
In this embodiment, by combining the cost threshold, the computer device can select from the reference frame list the subset of reference frames with smaller costs as available reference frames, ensuring the accuracy of available-reference-frame selection.
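Step 920b can be sketched as follows (hypothetical helper; assumes the target rate-distortion costs are already computed per reference frame, and stops — as described above — once the threshold is violated or m frames have been taken):

```python
def select_available(target_costs, threshold, m):
    """Rank reference frames by target rate-distortion cost ascending,
    then keep at most m frames whose cost is below the threshold."""
    order = sorted(range(len(target_costs)), key=lambda i: target_costs[i])
    available = []
    for idx in order:
        if target_costs[idx] >= threshold or len(available) == m:
            break
        available.append(idx)
    return available
```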
In one possible implementation, the cost threshold is determined by the product of the minimum rate-distortion cost and a specified coefficient; the minimum rate-distortion cost is the minimum of the respective target rate-distortion costs for the respective reference frames.
For example, the cost threshold may be a product of a minimum value of the respective target rate-distortion costs of the respective reference frames and a specified coefficient.
The above specified coefficient may be a coefficient set in advance by a developer/user.
In the embodiment of the application, the computer equipment can determine the cost threshold according to the minimum value in the respective target rate distortion cost of each reference frame, so that the proper cost threshold can be flexibly selected according to the respective target rate distortion cost of each reference frame, and the flexibility and the accuracy of cost threshold determination can be ensured.
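The adaptive threshold derivation above can be sketched as (hypothetical helper names; coeff is the specified coefficient set by the developer/user):

```python
def adaptive_cost_threshold(target_costs, coeff):
    # cost threshold = minimum target rdcost among reference frames * coeff
    return min(target_costs) * coeff

def available_by_adaptive_threshold(target_costs, coeff):
    # Frames whose target rdcost falls below the adaptive threshold.
    thr = adaptive_cost_threshold(target_costs, coeff)
    return [i for i, c in enumerate(target_costs) if c < thr]
```

Note that the frame achieving the minimum cost is always below the threshold whenever coeff > 1, so at least one reference frame remains available.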
Referring to fig. 11, a flowchart of an inter prediction method according to another embodiment of the present application is shown. As shown in fig. 11, step 921 and step 922 may also be included before step 930.
Step 921: and acquiring the selected times of each reference frame.
The selected count of a reference frame refers to the number of times the frame was selected as the reference frame used for inter prediction in other inter prediction processes before the current inter prediction.
In one possible implementation, the selected number of times may include at least one of the following.
1) The number of times a reference frame is selected by an adjacent coding block; the neighboring coding block is a coding block that is adjacent to the target coding unit and has completed coding, and the optimal mode of the neighboring coding block is an inter prediction mode.
In this embodiment, for coding blocks that are adjacent to the currently coded target coding unit and have completed coding, if the optimal mode of such a coding block is an inter prediction mode, the selected count of the reference frame used to finally encode that block is incremented by 1.
2) The number of times the reference frame was selected during prediction under other partition modes, before prediction under the current partition mode is performed on the target coding unit.
In this embodiment, before prediction under a given partition mode is performed on the target coding unit, if prediction under one or more other partition modes has already been completed, then for each completed partition mode, the selected count of the reference frame chosen by the target coding unit during that mode's prediction is incremented by 1.
3) The number of times the reference frame was selected during inter prediction in other inter prediction modes, before the current inter prediction is performed under the current partition mode of the target coding unit.
In this embodiment, before prediction in a given inter prediction mode under a given partition mode is performed on the target coding unit, if prediction in one or more other inter prediction modes under the current partition mode has already been completed, then for each such completed inter prediction mode, the selected count of the reference frame determined during that mode's prediction is incremented by 1.
For a coded block, the encoding process predicts multiple partition modes (such as no partition, quadtree partition, binary tree partition, ternary tree partition, etc.). During the prediction of each partition mode, multiple prediction modes (including inter prediction and intra prediction modes) are evaluated, and the inter prediction process may in turn evaluate multiple inter prediction modes (such as Merge prediction and AMVP prediction). After the computer device goes through this prediction process, it can select the optimal partition mode and optimal prediction mode, and finally encode the block with them. That is, for a coded block whose optimal prediction mode is an inter prediction mode, the block corresponds to one reference frame in the reference frame list used for its final coding; in addition, during the coding of one block, each prediction mode within the inter prediction process of each partition mode selects one reference frame for prediction.
In one possible implementation, the computer device may obtain the respective selected times of each reference frame before predicting the first partition of the current coding unit, where the respective selected times of each reference frame may include the times that the reference frame is selected by the adjacent coding block of the current coding unit.
In another possible implementation, the computer device may obtain/update the selected counts of the reference frames before predicting a given partition mode of the current coding unit; in this case the selected count of a reference frame may include the number of times it was selected by neighboring coded blocks of the current coding unit, plus the number of times it was selected during prediction under other partition modes completed before the current one.
In yet another possible implementation, the computer device may obtain/update the selected counts of the reference frames before predicting a given inter prediction mode under a given partition mode of the current coding unit; in this case the selected count of a reference frame may be the sum of the number of times it was selected by neighboring coded blocks of the current coding unit, the number of times it was selected during prediction under other partition modes completed before the current one, and the number of times it was selected during other inter prediction modes (such as Merge mode) completed before the current inter prediction (such as AMVP mode) under the current partition mode.
Step 922: the available ones of the respective reference frames are determined based on the respective selected times of the respective reference frames.
In the embodiment of the application, the computer device may preferentially select, in descending order of the selected times of each reference frame, the reference frames whose selected times are greater than a times threshold, and set them as available reference frames.
That is, in the scheme shown in the embodiment of the present application, in addition to determining the available reference frames in ascending order of the target rate-distortion cost of the reference frames, the computer device may also screen the available reference frames according to the selected times of the reference frames in the reference frame list during the previous encoding process. This expands the ways of determining the available reference frames, and ensures the number and accuracy of the available reference frames while reducing the number of reference frames that need to be traversed in the subsequent encoding process, thereby ensuring the accuracy of subsequent encoding.
In one possible implementation, determining available reference frames in each reference frame based on the respective selected times of each reference frame includes:
setting the reference frames with the selected times larger than the first time threshold value in the reference frames as available reference frames; or,
And setting a maximum of n reference frames, of which the selected times are greater than a first time threshold, as available reference frames, wherein n is an integer greater than or equal to 1.
In the embodiment of the application, the computer device may set all reference frames with the number of times greater than the first time threshold as available reference frames in each reference frame.
In the embodiment of the present application, the computer device may also set up to n reference frames as available reference frames among the reference frames selected for a number of times greater than the first time threshold.
For example, when the number of reference frames selected for a number of times greater than the first time threshold value in each reference frame is not greater than n, the computer device sets all reference frames selected for a number of times greater than the first time threshold value in each reference frame as available reference frames.
For another example, when the number of reference frames whose selected times are greater than the first time threshold is greater than n, the computer device sets the first n reference frames, arranged in descending order of selected times, as the available reference frames.
In one possible implementation, the first time threshold may be set by a developer or user.
Specifically, for example, for either of the forward reference frame list and the backward reference frame list, the computer device arranges the reference frames in the list in descending order of selected times and then traverses them starting from the first one: if the selected times of the currently traversed reference frame are greater than the first time threshold and the reference frame is within the first n positions of the sorted queue, it is determined as an available reference frame; if its selected times are not greater than the threshold, or it is at the n-th position of the sorted queue, the computer device stops traversing the subsequent reference frames in the sorted queue.
In the embodiment of the application, the computer equipment can select part of the reference frames with more selected times from the reference frame list as the available reference frames by combining the frequency threshold value, so that the accuracy of selecting the available reference frames is ensured.
Referring to fig. 12, a flowchart of an inter prediction method according to still another embodiment of the present application is shown. As shown in fig. 12, step 923 may also be included prior to step 930.
Step 923: if there is no reference frame whose selected times are greater than the first time threshold among the reference frames, setting the p reference frames nearest to the current frame among the reference frames as available reference frames, where p is an integer greater than or equal to 1.
For example, for a forward reference frame list, when no reference frame with the selected times greater than the first time threshold exists in the forward reference frame list, setting p reference frames closest to the current frame in the forward reference frame list as available reference frames in the forward reference frame list; and for the backward reference frame list, when no reference frame with the selected times larger than the first time threshold exists in the backward reference frame list, setting p reference frames closest to the current frame in the backward reference frame list as available reference frames in the backward reference frame list.
The value of p may be preset by a developer. The values of p corresponding to the forward reference frame list and the backward reference frame list may be the same or different.
In this embodiment of the present application, there may be no reference frame whose selected times are greater than the first time threshold; in that case, no available reference frame would be selected by the times-based screening. Through the scheme shown in step 923, the computer device may then select the p reference frames closest to the current frame and set them as available reference frames, so as to ensure the number of available reference frames and thereby the accuracy of subsequent encoding.
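The selection and fallback logic of steps 922-923 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the `sorted_idx` array (reference frame indexes pre-sorted by selected times in descending order), and the `poc_dist` array (absolute POC distance of each frame to the current frame) are assumptions introduced here.

```c
#include <stdint.h>
#include <string.h>

#define MAX_REF 16

/* Sketch of steps 922-923 for one reference frame list.
 * ref_time[]   : selected times of each reference frame
 * sorted_idx[] : frame indexes sorted by selected times, descending
 * poc_dist[]   : POC distance of each frame to the current frame
 * thr1, n, p   : the first time threshold, the cap n, the fallback count p */
int pick_available_refs(const int *ref_time, const int *sorted_idx,
                        const int *poc_dist, int num_ref,
                        int thr1, int n, int p, uint8_t *avail /*[MAX_REF]*/)
{
    int count = 0;
    memset(avail, 0, MAX_REF);
    /* Step 922: at most n frames whose selected times exceed thr1. */
    for (int i = 0; i < num_ref && count < n; i++) {
        int idx = sorted_idx[i];
        if (ref_time[idx] <= thr1)
            break;              /* sorted descending: the rest are smaller */
        avail[idx] = 1;
        count++;
    }
    if (count > 0)
        return count;
    /* Step 923: nothing passed the threshold -> fall back to the p
     * frames nearest to the current frame (smallest POC distance). */
    for (int k = 0; k < p && k < num_ref; k++) {
        int best = -1;
        for (int i = 0; i < num_ref; i++)
            if (!avail[i] && (best < 0 || poc_dist[i] < poc_dist[best]))
                best = i;
        if (best >= 0) { avail[best] = 1; count++; }
    }
    return count;
}
```

The same routine would be run once for the forward list and once for the backward list, possibly with different `thr1`, `n`, and `p` values, as the text allows.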
Referring to fig. 13, a flowchart of an inter prediction method according to still another embodiment of the present application is shown based on the embodiment shown in fig. 11 or fig. 12. As shown in fig. 13, the above method may further include steps 924 and 925.
Step 924: and acquiring the maximum selected times, wherein the maximum selected times are the maximum value of the selected times of each reference frame.
For example, for a forward reference frame list and a backward reference frame list, the computer device may determine a maximum number of selections of the forward reference frame list (i.e., a maximum number of selections of each reference frame in the forward reference frame list), and a maximum number of selections of the backward reference frame list (i.e., a maximum number of selections of each reference frame in the backward reference frame list).
Step 925: and under the condition that the maximum selected times is smaller than the second time threshold, determining available reference frames in the reference frames according to the frame types of the current frames.
For example, for the forward reference frame list, when the maximum selected number of times of the forward reference frame list is smaller than the second number of times threshold, determining available reference frames in each reference frame in the forward reference frame list according to the frame type of the current frame; and for the backward reference frame list, determining available reference frames in all reference frames in the backward reference frame list according to the frame type of the current frame under the condition that the maximum selected times of the backward reference frame list are smaller than a second time threshold.
The second time threshold may be preset by a developer.
The second time thresholds corresponding to the forward reference frame list and the backward reference frame list may be the same or different.
In the embodiment of the application, the computer equipment can supplement or enhance the available reference frames according to the frame type of the current frame, so that the number of the available reference frames is further improved, and the accuracy of subsequent encoding is further ensured.
In one possible implementation manner, in the case that the maximum selected number of times is smaller than the second number of times threshold, determining the available reference frame in each reference frame according to the frame type of the current frame may include:
Setting q reference frames closest to the current frame in the reference frames as available reference frames under the condition that the maximum selected times are smaller than a second time threshold, wherein q is an integer greater than or equal to 1;
wherein the value of q is determined by the frame type of the current frame. The relationship between the above-described value of q and the frame type may be preset by a developer.
For example, for the forward reference frame list, when the maximum selected number of times of the forward reference frame list is smaller than the second number of times threshold, q reference frames closest to the current frame in the forward reference frame list are set as available reference frames in the forward reference frame list according to the frame type of the current frame; and setting q reference frames closest to the current frame in the backward reference frame list as available reference frames in the backward reference frame list according to the frame type of the current frame when the maximum selected times of the backward reference frame list is smaller than a second time threshold.
The q values corresponding to the forward reference frame list and the backward reference frame list may be the same or different.
In the embodiment of the application, when the computer equipment supplements or enhances the available reference frames according to the frame type of the current frame, a certain number of reference frames closest to the current frame can be used as the available reference frames according to the frame type of the current frame, so that the accuracy of determining the available reference frames is ensured, and the accuracy of subsequent encoding is further ensured.
In one possible implementation, the value of q is positively correlated with the weight corresponding to the frame type of the current frame.
The weights corresponding to the frame types may be preset by a developer.
For example, the weights corresponding to the frame types may be determined by a developer in advance according to a reference relationship between frames of different types.
For example, in the weights corresponding to the frame types of the current frame, the weight of the I frame > the weight of the P frame > the weight of the B frame > the weights of the other frames.
In the embodiment of the application, when the computer device supplements or enhances the available reference frames according to the frame type of the current frame, q reference frames closest to the current frame can be used as the available reference frames according to the frame type of the current frame, wherein the higher the weight corresponding to the frame type of the current frame is, the larger the value of q is, so that the accuracy of determining the available reference frames is ensured, and the accuracy of subsequent encoding is further ensured.
In the encoding process in the related art, all reference frames in the 2 reference frame lists are traversed to find the optimal MVP for motion estimation, and subsequent motion estimation is then performed. In the solution shown in the foregoing embodiments of the present application, the search for the optimal MVP can be moved before the traversal of the reference frame lists and reference frames, so that the optimal MVP and the corresponding rate-distortion cost of each reference frame are obtained in advance; when the MVP is used later, it can be fetched according to the reference frame list direction and the reference frame index. The target rate-distortion cost of each reference frame is thus obtained without increasing the amount of calculation, and the available reference frames can be selected according to these costs, which is equivalent to a first screening of reference frames that reduces the number of reference frames requiring subsequent motion estimation. In this way, the reference frame template can be constructed from the rate-distortion costs; furthermore, it can be compensated and augmented by combining the reference frame information of adjacent blocks, the prediction information of different CU partitions, and the weight information of the current frame, to obtain the final reference frame template. In application, only the reference frames selected by the template are used, greatly accelerating encoding.
Referring to fig. 14, a coding scheme is shown in accordance with the present application. As shown in fig. 14, the scheme provided in the present application may be divided into 4 parts.
S1401, obtaining an optimal cost of each reference frame MVP.
S1402, obtaining the reference frame information under the adjacent blocks and different CUs.
S1403, generating a reference frame template.
S1404, template application (using template guided prediction).
Each of the above portions is explained below.
S1401, obtaining the optimal cost of each reference frame MVP
The AMVP technique encodes motion information by efficiently expressing the motion vector difference MVD. The protocol prescribes a derivation method for motion vector prediction (AMVP); during encoding, only the difference between the optimal motion vector MV of the current block, obtained through motion estimation, and the MVP is encoded, which reduces the number of bits and thus improves compression performance.
Wherein mvd=mv-MVP.
The AMVP candidate list sequentially comprises a spatial domain candidate MVP, a time domain candidate MVP, a history-based candidate MVP and a zero value MVP. The AMVP list has a length of 2 and is divided into a forward list and a backward list, which are established for each reference image, and each item in each list only contains unidirectional motion information.
After determining each direction reference picture of the current CU, the protocol specifies a specific construction method of the corresponding AMVP candidate list as follows.
1) Spatial domain candidate MVP
The conventional AMVP-mode MVP candidate list preferentially selects spatial candidate MVPs, and at most 2 candidate MVPs are selected from the spatial domain.
As shown in part (a) of fig. 4, spatial candidate MVPs are generated from the left side and the top of the current CU (left first, then top); for example, the left-side selection order is A0->A1, and the top selection order is B0->B1->B2. A0, A1, B0, B1 and B2 may be referred to as neighboring image blocks of the current CU, and the MV of each neighboring image block is called a neighboring candidate MV. When the reference picture of a neighboring candidate MV is the same as the reference picture of the current CU, the neighboring candidate MV is marked as "available", i.e., the MVP corresponding to the neighboring candidate MV is determined as a spatial candidate MVP. If 2 spatial candidate MVPs are obtained, a further redundancy check is needed, i.e., checking whether the 2 spatial candidate MVPs are equal; if they are equal, only one of them is retained.
2) Time domain candidate MVP
After the space domain candidate MVP is determined, if the AMVP list is not full, selecting the time domain candidate MVP for list filling. The temporal candidate co-located CU locations are shown in part (b) of fig. 4.
If the CU at the B_r position in the co-located picture (i.e., the D0 position in fig. 4) is not available, or uses an intra coding mode, IBC mode, or palette (Palette) mode, then the CU at the Ctr position (i.e., the D1 position in fig. 4) is used as the co-located CU.
The temporal candidate MVP cannot generally directly use MV information of the co-located CU, but should be scaled according to the positional relationship with the reference picture.
Please refer to fig. 15, which illustrates a schematic view of temporal candidate MVP scaling according to the present application. As shown in fig. 15, t_b and t_d respectively represent the temporal distance between the current picture and its reference picture and the temporal distance between the co-located picture and its reference picture, i.e., the corresponding differences between picture order counts (Picture Order Count, POC). The temporal candidate MVP of the current CU is:

MVP = (t_b / t_d) × MV_col

where MV_col is the motion vector of the co-located CU.
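The POC-distance scaling above can be sketched in integer arithmetic as follows. This is a minimal illustration assuming the standard scaling relation MVP = (t_b / t_d) × MV_col; the `MV` type and function name are introduced here and are not from the original text, and real codecs use fixed-point scaling with rounding rather than plain integer division.

```c
/* Temporal MVP scaling sketch: scale the co-located CU's MV by the
 * ratio of POC distances t_b (current picture -> its reference) and
 * t_d (co-located picture -> its reference). */
typedef struct { int x, y; } MV;

MV scale_temporal_mvp(MV col_mv, int t_b, int t_d)
{
    MV out = col_mv;
    if (t_b != t_d && t_d != 0) {   /* equal distances: use MV as-is */
        out.x = col_mv.x * t_b / t_d;
        out.y = col_mv.y * t_b / t_d;
    }
    return out;
}
```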
3) History-based candidate MVP
After the derivation of the spatial candidate MVP and the temporal candidate MVP is finished, if the AMVP list is not filled, the latest 4 candidates in the history-based candidate MVP (HMVP) are checked one by one, and non-repeated candidates are added to the list until the AMVP list is filled or candidates satisfying the condition in the HMVP are exhausted.
HMVP stores the motion information of previously encoded blocks in an HMVP list with a maximum length of 5. The HMVP list is updated continuously during encoding: after each inter-coded CU completes encoding, its motion information is added as a new candidate to the end of the HMVP list. Then, a redundancy check is performed on the candidates already in the list; if the newly inserted candidate repeats an existing candidate, the existing candidate in the list is deleted. Finally, the maximum length of the list is kept at 5 on a first-in first-out (First In First Out, FIFO) basis, i.e., if the number of candidates in the list after adding a new candidate is greater than 5, the foremost candidate is deleted. The life cycle of the HMVP list is one CTU row, i.e., the HMVP candidate list is emptied before the first CTU of each CTU row is encoded.
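The HMVP list update just described (redundancy check, deletion of the duplicate, append at the end, FIFO cap of 5) can be sketched as follows. The `MotionInfo` type and `mi_equal()` helper are assumptions for illustration, not fields from the original text.

```c
#define HMVP_MAX 5

/* Minimal motion information record for the sketch. */
typedef struct { int mv_x, mv_y; int ref_idx; } MotionInfo;

static int mi_equal(MotionInfo a, MotionInfo b)
{
    return a.mv_x == b.mv_x && a.mv_y == b.mv_y && a.ref_idx == b.ref_idx;
}

/* Update the HMVP list with a new candidate; returns the new length. */
int hmvp_update(MotionInfo *list, int len, MotionInfo cand)
{
    /* Redundancy check: delete an existing identical candidate. */
    for (int i = 0; i < len; i++) {
        if (mi_equal(list[i], cand)) {
            for (int j = i; j < len - 1; j++) list[j] = list[j + 1];
            len--;
            break;
        }
    }
    /* FIFO: if the list is full, drop the foremost (oldest) candidate. */
    if (len == HMVP_MAX) {
        for (int j = 0; j < len - 1; j++) list[j] = list[j + 1];
        len--;
    }
    list[len++] = cand;   /* append the new candidate at the end */
    return len;
}
```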
4) Zero value MVP
If the length of the final AMVP list is less than 2, then the empty positions in the AMVP list need to be filled with (0, 0).
After the construction of the AMVP candidate list is completed, the cost of each candidate MVP is calculated using a rate-distortion cost function with SATD as the distortion term:

cost = dist + λ × bit

where dist denotes the distortion, bit denotes the number of coding bits, and λ is the Lagrangian constant; the distortion records the difference between the original input pixels and the predicted pixels and is obtained through SATD.
The minimum cost minrdcost is found and taken as the cost of the corresponding reference frame in the current reference list; the corresponding MVP is the optimal MVP of the current reference frame.
The optimal MVP and cost minrdcost corresponding to each reference frame in the forward and backward lists are found in turn and recorded as:
list_cost[n][ref_index]= minrdcost
where n is 0 or 1, indicating the forward list or the backward list respectively, and ref_index indicates the current reference frame index.
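The recording of `list_cost[n][ref_index] = minrdcost` can be sketched as below. This is an illustrative shape only: the per-candidate costs are passed in as a precomputed array standing in for the SATD-based evaluation, and `best_mvp`, `cand_cost`, and `AMVP_LEN` are names introduced here (the AMVP list length of 2 is from the text).

```c
#define NUM_LISTS 2
#define MAX_REF   16
#define AMVP_LEN  2   /* the AMVP list length is 2 */

double list_cost[NUM_LISTS][MAX_REF];  /* minrdcost per list/reference frame */
int    best_mvp [NUM_LISTS][MAX_REF];  /* which candidate MVP was optimal */

/* cand_cost[list][ref][c] holds cost = dist + lambda*bit of candidate
 * MVP c for that reference frame (precomputed here; in the encoder it
 * would be evaluated on the fly with SATD). */
void collect_mvp_costs(double cand_cost[NUM_LISTS][MAX_REF][AMVP_LEN],
                       const int num_ref[NUM_LISTS])
{
    for (int list = 0; list < NUM_LISTS; list++) {
        for (int ref = 0; ref < num_ref[list]; ref++) {
            double minrdcost = cand_cost[list][ref][0];
            int best = 0;
            for (int c = 1; c < AMVP_LEN; c++) {
                if (cand_cost[list][ref][c] < minrdcost) {
                    minrdcost = cand_cost[list][ref][c];
                    best = c;
                }
            }
            list_cost[list][ref] = minrdcost;  /* cost of this reference frame */
            best_mvp [list][ref] = best;       /* its optimal MVP */
        }
    }
}
```

When the MVP is needed later, it is fetched by list direction and reference frame index, exactly as the text describes, so no cost is recomputed.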
S1402, acquiring reference frame information under adjacent blocks and different CU partitions
The information contains 3 parts. Assume that the selected times of the reference frames are stored in a two-dimensional array ref_time[2][16], with each element of ref_time initialized to 0. Namely:
ref_time[2][16]={{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}}。
(1) Acquiring adjacent block reference frame information
Please refer to fig. 16, which illustrates a schematic diagram of the positional relationship between the current block and the neighboring block according to the present application. As shown in fig. 16, each position of the adjacent block is sequentially determined, and if the adjacent block exists and the optimal mode of the adjacent block is the inter mode, the reference frame information of the corresponding adjacent block is recorded. Respectively denoted as (a_ref_index 0, a_ref_index 1), (b_ref_index 0, b_ref_index 1), (c_ref_index 0, c_ref_index 1), (d_ref_index 0, d_ref_index 1), (e_ref_index 0, e_ref_index 1), each group corresponding to a current position forward reference frame index and backward reference frame index.
Taking the A position as an example: if A_ref_Index0 is greater than -1, the forward reference frame of the A position is valid, and the count of the reference frame corresponding to index A_ref_Index0 in the forward list is incremented by one, i.e., ref_time[0][A_ref_Index0]++; if A_ref_Index1 is greater than -1, the backward reference frame of the A position is valid, and the count of the reference frame corresponding to index A_ref_Index1 in the backward list is incremented by one, i.e., ref_time[1][A_ref_Index1]++. The other positions are handled in the same way as the A position.
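The per-position accumulation above can be sketched as one small helper, called once for each of the A-E neighbor positions. The function name is an assumption; the `ref_time[2][16]` array and the "-1 means invalid" convention are from the text.

```c
/* ref_time[0][i]: selected times of forward reference frame i
 * ref_time[1][i]: selected times of backward reference frame i */
int ref_time[2][16];

/* Accumulate one neighbor position's forward/backward reference frame
 * indexes (ref_index0 / ref_index1, -1 meaning invalid). */
void count_neighbor(int ref_index0, int ref_index1)
{
    if (ref_index0 > -1)           /* forward reference frame valid */
        ref_time[0][ref_index0]++;
    if (ref_index1 > -1)           /* backward reference frame valid */
        ref_time[1][ref_index1]++;
}
```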
(2) Acquiring reference frame information under different CU partitions
There are various partition types for the current prediction block; fig. 2 shows the binary tree and ternary tree partition modes, and in addition there are the no-partition and quadtree partition modes. The coding prediction process is performed sequentially, so when one partition type is being predicted, other CU partitions may already be done. For example, please refer to fig. 17, which illustrates a schematic view of the prediction order of different CU partitions related to the present application. As shown in fig. 17, the current partition type is TT_HOR (ternary tree horizontal partition), and the previous CU partitions (no partition (NONE), binary tree horizontal partition (BT_HOR), and binary tree vertical partition (BT_VER)) have already been predicted; thus the reference frames selected in the already-determined CU partition information can be used as available reference frames in the current partition manner.
The prediction order of the different CU partitions shown is only one example to aid understanding; in fact, the order is not fixed here.
The reference frame information of each block under each determined CU partition is taken. The judging method is similar to that of the adjacent blocks: if the optimal mode is an inter mode, the reference frame information of the current block is recorded, and the count of the corresponding reference frame in the corresponding reference frame list is incremented by one.
(3) Optimal reference frame index after current CU has done merge prediction
Referring to fig. 18, a schematic diagram of an inter prediction order according to the present application is shown. As shown in fig. 18, the merge prediction is performed before the AMVP prediction, so that the optimum information of the merge prediction can be obtained.
Here, the prediction order is only one example to aid understanding; in fact, the order of merge prediction and AMVP prediction is fixed here, but the position of intra prediction is not fixed.
(4) Data arrangement
The reference frames are sorted in descending order of selected times, so that a reference frame stored nearer the front has a higher probability of being selected, which makes it possible to skip low-probability reference frames.
Definition of the structure: struct RefInfo {
int num;
int8_t ref_idx;
};
The structure represents the selected times num corresponding to the current reference frame index ref_idx.
A two-dimensional array RefInfo ref_info[2][16] is used to record the reference frame information sorted in descending order of selected times.
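The descending sort of the `RefInfo` entries can be sketched with the standard `qsort`; the comparator and wrapper names are assumptions introduced for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Structure from the text: selected times num under reference frame
 * index ref_idx. */
struct RefInfo {
    int    num;
    int8_t ref_idx;
};

/* Comparator: larger num first (descending order of selected times). */
static int cmp_refinfo_desc(const void *a, const void *b)
{
    const struct RefInfo *x = a, *y = b;
    return y->num - x->num;
}

/* Sort one list's RefInfo entries in place. */
void sort_ref_info(struct RefInfo *info, int count)
{
    qsort(info, (size_t)count, sizeof(struct RefInfo), cmp_refinfo_desc);
}
```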
S1403, generating a reference frame template
The template is denoted as a two-dimensional array mask[2], corresponding to the templates of the forward reference frame list and the backward reference frame list respectively, and each consists of 3 parts: an initialization template, a main template and an enhancement template, namely:
mask[0] = mask_init [0]| mask_main [0] | mask_add[0];
mask[1] = mask_init [1]| mask_main [1] | mask_add[1]。
(1) Initializing templates
According to the collected reference frame information, initialization templates are generated, denoted as mask_init[2], corresponding to the forward list and the backward list respectively. Initialize mask_init[2] = {0,0}.
And step 1, constructing an initial template according to the ordered reference frame information.
The reference frames are sorted in descending order of selected times; all the reference frames in the current list are cycled through, the selected times of the current reference frame are judged in turn, and then whether the current reference frame is added into the initial template is decided according to its selected times.
Referring to fig. 19, a flow chart of initial reference frame template construction is shown in accordance with the present application. As shown in fig. 19: the direction of each reference frame (i.e. whether it belongs to the forward reference frame list or the backward reference frame list) and the corresponding reference frame index are determined one by one, and the threshold thr is used to cut off the reference frame with low probability of selection.
As shown in fig. 19, the initial reference frame template construction flow is as follows.
S1901, initializing the index of the current reference frame list to 0 (i.e., list=0).
S1902, judging whether the index of the current reference frame list is smaller than 2 (i.e., list < 2); if yes, proceed to S1904, otherwise proceed to S1903.
S1903, the process ends.
S1904, traversing each reference frame in the current reference frame list in turn to perform the following processing:
s1904a, judging whether the index of the currently traversed reference frame in the reference frame list is smaller than the maximum index value (for example, a reference frame list contains at most 16 reference frames, and the index of each reference frame increases from 0 to 15); if not, entering S1905, otherwise, executing the following step S1904b;
s1904b, judging whether the current reference frame satisfies the condition: the number of times selected is greater than 0 and the index of the reference frame in the list is less than a threshold thr (here threshold thr corresponds to n above); if not, go to S1905, otherwise, execute the subsequent step S1904c;
s1904c, the value of the current reference frame in the corresponding mask_init [ ] is set to 1, and after the index +1 of the currently traversed reference frame in the reference frame list, the process returns to step S1904a.
S1905, index +1 of the current reference frame list, and returns to S1902.
The reference frame list here is sorted in descending order of the selected times of the reference frames; in the front-to-back judgment, later entries have smaller selected times and smaller reference value. If they were all added to the initial reference frame template, unnecessary reference frames would be added and the encoding speed would be compromised.
And step 2, correcting and supplementing.
After the above operation, mask_init may still be 0; in this case, mask_init is set to 0x7. The binary number corresponding to hexadecimal 0x7 is 111, that is, the bits of the template corresponding to the 3 reference frames nearest to the current frame are set to 1. When the template is applied later, whether the current bit is 1 is judged, i.e., whether the current reference frame participates in prediction.
Here, mask_init is 0 because no useful information was collected; if mask_init is 0, the 3 nearest reference frames are forcibly added to the reference frame template.
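Step 1 and step 2 of the initialization template can be sketched together as below. This assumes, as the flow of fig. 19 describes, that the `RefInfo` entries are already sorted in descending order of selected times, that `thr` is the position cap n, and that the 0x7 fallback forces the 3 nearest reference frames; the function name is an assumption.

```c
#include <stdint.h>

struct RefInfo { int num; int8_t ref_idx; };

/* Build mask_init for one reference frame list.
 * info[]: entries sorted by selected times, descending
 * thr   : position cutoff (corresponds to n above) */
uint16_t build_mask_init(const struct RefInfo *info, int count, int thr)
{
    uint16_t mask_init = 0;
    for (int i = 0; i < count && i < thr; i++) {
        if (info[i].num <= 0)
            break;                   /* sorted: the remaining are 0 too */
        mask_init |= (uint16_t)(1u << info[i].ref_idx);
    }
    if (mask_init == 0)
        mask_init = 0x7;  /* binary 111: force the 3 nearest reference frames */
    return mask_init;
}
```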
(2) Main template
And generating according to the optimal cost of the first part MVP.
There are 2 reference frame lists, the forward list and the backward list, so the main template also has forward and backward directions. It is denoted as mask_main[2] and initialized to 0, i.e., mask_main[2] = {0,0}.
The minimum MVP cost over all the reference frames in the current reference frame list is found and recorded as mincost; then the rate-distortion cost of every reference frame is compared with mincost in turn, and reference frames are added into the main reference frame template corresponding to the current reference frame list by combining a threshold. Please refer to fig. 20, which shows a flow chart of the main template construction according to the present application.
As shown in fig. 20, the main template construction flow is as follows.
S2001, the index of the current reference frame list is initialized to 0 (i.e., list=0).
S2002, judging whether the index of the current reference frame list is smaller than 2 (i.e., list < 2); if yes, proceed to S2004, otherwise proceed to S2003.
S2003, the process ends.
S2004, traversing each reference frame in the current reference frame list in turn to perform the following processing:
s2004a, judging whether the index of the currently traversed reference frame in the reference frame list is smaller than the maximum index value, if not, entering S2005, otherwise, executing step S2004b;
s2004b, judging whether the MVP cost corresponding to the currently traversed reference frame is smaller than the threshold thr (here the threshold thr is obtained by combining mincost with a specified coefficient); if not, entering S2005, otherwise, executing step S2004c;
s2004c, the value of the currently traversed reference frame in the corresponding mask_main [ ] is set to 1, and after the index of the currently traversed reference frame in the reference frame list +1, the process returns to step S2004a.
S2005, index +1 of the current reference frame list is returned to S2002.
The reference frame list here is sorted in ascending order of the MVP cost corresponding to the reference frames; in the front-to-back judgment, later entries have larger MVP costs and smaller reference value. If they were all added to the main reference frame template, unnecessary reference frames would be added and the encoding speed would be compromised.
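The main template construction can be sketched as below. It assumes, per the description around fig. 20, that the threshold is mincost scaled by a specified coefficient, and that bit i of the mask corresponds to the reference frame at position i of the (cost-sorted) list; the function name and `coef` parameter are assumptions.

```c
#include <stdint.h>

/* Build mask_main for one reference frame list.
 * cost[]: the optimal-MVP rate-distortion cost of each reference frame
 * coef  : the specified coefficient; thr = mincost * coef */
uint16_t build_mask_main(const double *cost, int count, double coef)
{
    /* Find mincost over all reference frames in the list. */
    double mincost = cost[0];
    for (int i = 1; i < count; i++)
        if (cost[i] < mincost)
            mincost = cost[i];

    /* Add every reference frame whose cost is below the threshold. */
    double thr = mincost * coef;
    uint16_t mask_main = 0;
    for (int i = 0; i < count; i++)
        if (cost[i] < thr)
            mask_main |= (uint16_t)(1u << i);
    return mask_main;
}
```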
(3) Enhancement template
The enhancement template is related to the frame type weight and the maximum selected times of the collected reference frames. It is denoted as mask_add[2] and initialized to 0, i.e., mask_add[2] = {0,0}.
First, a threshold thr is generated in conjunction with the current frame type weight.
Please refer to fig. 21, which illustrates a reference relationship diagram corresponding to different frame types related to the present application, and as shown in fig. 21, the importance of the frames of different types is ordered from large to small, I frame > P frame > B frame > non-reference B frame.
In the embodiment of the present application, there may be different structures of groups of pictures (Group of Pictures, GOP). Taking GOP16 as an example, please refer to fig. 22, which shows a schematic diagram of the reference relationship of GOP16 related to the present application.
As can be seen from the reference relationship of fig. 22, poc16 references poc0; poc8 references poc0 and poc16; poc4 references poc0 and poc8; poc2 references poc0 and poc4; while poc1, poc3, poc5, poc7, poc9, poc11, poc13 and poc15 are not referenced.
Thus, the weights are shown in table 1 below.
As shown in table 1, the weights are ordered from big to small: poc0> poc16> poc8> poc4> poc2> poc1.
The selection of the reference frame may be adjusted according to the weight corresponding to the current frame type. Since the frame type is determined before prediction, the current weight level can be obtained before prediction; it is denoted slice_level.
The param values may be predefined by the developer; for example, in the embodiment of the present application, param[6] = {5, 5, 5, 5, 4, 4} may be defined.
If the maximum selected count of the collected reference frames is less than the threshold thr, compensation of the reference frame template is forced.
The forward list and the backward list are judged in sequence. If replenishment is required, compensation is performed according to the following rules:
if the current frame weight slice_level is less than 3, the nearest 3 reference frames are added to the template, i.e., mask_add = 0x7;
otherwise, if the current frame weight slice_level is greater than 4, the nearest 1 reference frame is added to the template, i.e., mask_add = 0x1;
otherwise, the nearest 2 reference frames are added to the template, i.e., mask_add = 0x3.
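The reinforcement rules above can be sketched in Python as follows; the lookup thr = param[slice_level] and the function name are assumptions for illustration, not taken verbatim from this application.

```python
def reinforce_template(slice_level, max_selected, param=(5, 5, 5, 5, 4, 4)):
    """Compute the reinforcement mask mask_add for one reference frame list.

    Bit k set means "add the (k+1)-th nearest reference frame to the template".
    Assumes thr is looked up as param[slice_level].
    """
    thr = param[slice_level]      # assumed mapping from weight level to threshold
    if max_selected >= thr:       # enough statistics collected: no compensation
        return 0x0
    if slice_level < 3:           # important frame: add the nearest 3 frames
        return 0x7
    if slice_level > 4:           # unimportant frame: add only the nearest frame
        return 0x1
    return 0x3                    # otherwise: add the nearest 2 frames
```

With param[2] = 5, a frame of weight level 2 whose collected maximum count is below 5 receives mask 0x7, i.e. its 3 nearest reference frames are forced into the template.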
S1404, guiding prediction by using template
Loop over the two reference frame lists: obtain the reference frame template mask under the current reference frame list and judge all the reference frames under the current list in sequence. If the bit of the mask corresponding to the current reference frame is 1, the current reference frame is available; otherwise, the current reference frame is skipped. Please refer to fig. 23, which shows an application diagram of the reference frame template.
As shown in fig. 23, the reference frame template application flow is as follows.
S2301, an index of the current reference frame list is initialized to 0 (i.e., list=0).
S2302, judging whether the index of the current reference frame list is smaller than 2 (i.e., list < 2); if yes, proceed to S2304, otherwise proceed to S2303.
S2303, the process ends.
S2304, traversing each reference frame in the current reference frame list in order to perform the following processing:
s2304a, judging whether the index of the currently traversed reference frame in the reference frame list is smaller than the maximum index value; if not, enter S2305, otherwise execute step S2304b;
s2304b, judging whether the bit of mask_main corresponding to the currently traversed reference frame is 1; if not, increment the index of the currently traversed reference frame in the reference frame list by 1 and return to step S2304a, otherwise execute step S2304c;
s2304c, performing inter-frame prediction with the currently traversed reference frame, then incrementing the index of the currently traversed reference frame in the reference frame list by 1 and returning to step S2304a.
S2305, incrementing the index of the current reference frame list by 1, and returning to S2302.
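Steps S2301 to S2305 amount to a double loop over the two reference frame lists. The following Python sketch is a hedged rendering of that flow; ref_lists, masks and inter_predict are placeholder names, not identifiers from this application.

```python
def apply_reference_templates(ref_lists, masks, inter_predict):
    """Traverse both reference frame lists and predict only with frames
    whose corresponding mask bit is set (cf. steps S2301-S2305).

    ref_lists: two lists of reference frames; masks: one bitmask per list.
    """
    used = []
    for list_idx in range(2):                                  # S2302: list < 2
        mask = masks[list_idx]
        for ref_idx, frame in enumerate(ref_lists[list_idx]):  # S2304 traversal
            if (mask >> ref_idx) & 1:                          # S2304b: bit set
                inter_predict(frame)                           # S2304c: predict
                used.append(frame)
            # otherwise the current reference frame is skipped
    return used
```

For example, with masks [0x5, 0x2], only the first and third frames of list 0 and the second frame of list 1 take part in prediction.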
With the above method, the reference frame information of encoded CU blocks can be fully exploited. Meanwhile, the process of obtaining the optimal MVP is moved ahead of the point at which the current block would ordinarily traverse the reference frames, so reference frame selection can be guided without additional computation; the reference frame template is then generated adaptively in combination with the frame type weight information, achieving the goal of accelerating encoding.
The scheme is applicable to compression protocols such as H.265, H.266, AVS, AV1, or more advanced compression protocols.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to fig. 24, a block diagram of an inter prediction apparatus according to an embodiment of the present application is shown. The apparatus has the function of implementing the above method example; the function may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above, or may be provided on a computer device. The apparatus may include the following modules.
The cost obtaining module 2401 is configured to obtain a target rate distortion cost of each reference frame in the reference frame list in a process of encoding the target encoding unit; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame;
a reference frame determining module 2402, configured to determine available reference frames in the respective reference frames based on respective target rate distortion costs of the respective reference frames;
a prediction module 2403, configured to perform inter-prediction on the target coding unit based on available reference frames in the respective reference frames.
In some embodiments, the cost acquisition module 2401 is configured to,
acquiring an advanced motion vector prediction AMVP candidate list of a first reference frame; the first reference frame is any one of the respective reference frames;
calculating the rate distortion cost of each candidate MVP in the AMVP candidate list;
and obtaining the minimum value among the rate distortion costs of the candidate MVPs as the target rate distortion cost of the first reference frame.
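The three operations above reduce to taking a minimum over the candidate MVP costs. A Python sketch under that reading, with mvp_cost standing in for the encoder's rate-distortion evaluation of one candidate (an assumption, not an API from this application):

```python
def target_rd_cost(amvp_candidates, mvp_cost):
    """Return (best_mvp, target_cost): the candidate MVP from the AMVP list
    with the smallest rate-distortion cost, and that minimum cost itself."""
    costs = [(mvp_cost(mvp), mvp) for mvp in amvp_candidates]  # cost each MVP
    best_cost, best_mvp = min(costs, key=lambda c: c[0])       # take the minimum
    return best_mvp, best_cost
```

The recorded best_mvp is the value later reused to seed motion estimation, while best_cost is the target rate distortion cost of this reference frame.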
In some embodiments, the reference frame determining module 2402 is configured to,
setting, among the respective reference frames, the reference frames whose target rate distortion cost is smaller than a cost threshold as the available reference frames; or,
setting, among the respective reference frames, at most m reference frames whose target rate distortion cost is smaller than the cost threshold as the available reference frames, where m is an integer greater than or equal to 1.
In some embodiments, the cost threshold is determined by the product of a minimum rate distortion cost and a specified coefficient;
the minimum rate-distortion cost is the minimum value of the target rate-distortion cost of each of the reference frames.
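Putting the two embodiments together: a reference frame is available when its target cost is below min_cost × coefficient, optionally capped at m frames. A Python sketch under those assumptions (all names are illustrative):

```python
def select_available_frames(target_costs, coeff, m=None):
    """target_costs: list of (frame, cost) pairs.

    A frame is available if its cost is below min_cost * coeff; if m is
    given, keep at most the m cheapest qualifying frames.
    """
    min_cost = min(cost for _, cost in target_costs)   # minimum rate-distortion cost
    threshold = min_cost * coeff                       # cost threshold
    # sort by cost so that, when capped, the cheapest frames are kept
    qualifying = sorted((c, f) for f, c in target_costs if c < threshold)
    if m is not None:
        qualifying = qualifying[:m]
    return [f for _, f in qualifying]
```

Note that with a coefficient greater than 1 the minimum-cost frame always qualifies, so the available set is never empty.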
In some embodiments, the apparatus further comprises:
a number acquisition module, configured to acquire the respective selected times of the respective reference frames before the prediction module 2403 performs inter-frame prediction on the target coding unit based on available reference frames in the respective reference frames;
The reference frame determining module 2402 is further configured to determine available reference frames in the reference frames based on the respective selected times of the reference frames;
wherein the selected times include at least one of the following times:
the number of times the reference frame is selected by adjacent coding blocks, where an adjacent coding block is a coding block that is adjacent to the target coding unit and has completed coding, and whose optimal mode is an inter prediction mode;
the number of times the reference frame is selected during prediction division with other prediction division modes, before the current prediction division of the target coding unit;
the number of times the reference frame is selected during inter-frame prediction with other inter-frame prediction modes, before the current inter-frame prediction of the target coding unit.
In some embodiments, the reference frame determination module 2402 is configured to,
setting, among the respective reference frames, the reference frames whose selected times are greater than a first time threshold as the available reference frames; or,
setting, among the respective reference frames, at most n reference frames whose selected times are greater than the first time threshold as the available reference frames, where n is an integer greater than or equal to 1.
In some embodiments, the reference frame determining module 2402 is further configured to, in the case where no reference frame among the respective reference frames has selected times greater than the first time threshold, set the p reference frames closest to the current frame as the available reference frames, where p is an integer greater than or equal to 1.
In some embodiments, the number acquisition module is further configured to acquire a maximum number of selected times, where the maximum number of selected times is a maximum value of the number of selected times of the respective reference frames;
the reference frame determining module 2402 is further configured to determine available reference frames in the reference frames according to a frame type of the current frame if the maximum selected number of times is less than a second number of times threshold.
In some embodiments, the reference frame determining module 2402 is configured to set q reference frames closest to the current frame among the reference frames as the available reference frames if the maximum selected number of times is less than a second number of times threshold, where q is an integer greater than or equal to 1;
wherein the value of q is determined by the frame type of the current frame.
In some embodiments, the value of q is positively correlated with the weight corresponding to the frame type of the current frame.
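The count-based selection of the embodiments above, including the cap of n frames and the fallback to the p nearest frames, can be sketched as follows in Python; all names are illustrative placeholders:

```python
def select_by_count(frames, counts, threshold, n, p):
    """frames: reference frames ordered nearest-first relative to the current frame.
    counts[f]: number of times f was selected by neighbours, by other
    prediction division modes, or by other inter prediction modes.
    """
    # frames selected more than threshold times are available, capped at n
    available = [f for f in frames if counts[f] > threshold][:n]
    if not available:          # no frame exceeds the threshold:
        available = frames[:p]  # fall back to the p nearest reference frames
    return available
```

The fallback guarantees that prediction is never left without a reference frame even when the collected statistics are sparse.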
In some embodiments, the cost obtaining module 2401 is further configured to record an optimal MVP of each of the reference frames;
the prediction module 2403 is configured to traverse available reference frames in the reference frames, and perform motion estimation based on an optimal MVP of the available reference frames, so as to perform inter-frame prediction on the target coding unit.
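The recorded optimal MVP then seeds motion estimation for each available frame, so no second MVP search is needed. A Python sketch, with motion_estimate standing in for the encoder's search routine (an assumed placeholder):

```python
def predict_with_templates(available_frames, best_mvp, motion_estimate):
    """best_mvp[f]: the optimal MVP recorded for frame f while collecting
    target rate-distortion costs. Motion estimation for each available
    reference frame starts from that recorded MVP."""
    results = {}
    for frame in available_frames:
        results[frame] = motion_estimate(frame, start=best_mvp[frame])
    return results
```

Because the MVPs were already computed when the costs were collected, this step adds no extra MVP derivation work to the traversal.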
Referring to fig. 25, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be the encoding end device described above or the decoding end device described above. The computer device 2500 may include: a processor 2501, a memory 2502, a communication interface 2503, an encoder/decoder 2504, and a bus 2505.
The processor 2501 includes one or more processing cores, and the processor 2501 executes various functional applications and information processing by running software programs and modules.
The memory 2502 may be used to store a computer program, and the processor 2501 is used to execute the computer program to implement the inter-prediction method described above.
Communication interface 2503 may be used to communicate with other devices, such as to transmit and receive audiovisual data.
The encoder/decoder 2504 may be used to implement encoding and decoding functions, such as encoding and decoding audio-video data.
The memory 2502 is connected to the processor 2501 by a bus 2505.
Further, the memory 2502 may be implemented by any type of volatile or nonvolatile memory device or combination thereof, including but not limited to: magnetic or optical disks, EEPROMs (Electrically Erasable Programmable Read-Only Memory), EPROMs (Erasable Programmable Read-Only Memory), SRAMs (Static Random-Access Memory), ROMs (Read-Only Memory), magnetic memories, flash memories, PROMs (Programmable Read-Only Memory).
Those skilled in the art will appreciate that the architecture shown in fig. 25 is not limiting and that computer device 2500 may include more or less components than illustrated, or combine certain components, or employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which when executed by a processor, implement the inter prediction method described above.
In an exemplary embodiment, a computer program product or a computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the inter prediction method described above.
The foregoing description of the exemplary embodiments of the present application is not intended to limit the application to the particular embodiments disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present application.

Claims (15)

1. An inter prediction method, the method comprising:
in the process of encoding a target encoding unit, obtaining target rate distortion cost of each reference frame in a reference frame list; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame;
determining available reference frames in the reference frames based on respective target rate distortion costs of the reference frames;
performing inter-prediction on the target coding unit based on available reference frames in the respective reference frames.
2. The method of claim 1, wherein the obtaining the target rate distortion costs for each reference frame in the list of reference frames comprises:
acquiring an advanced motion vector prediction AMVP candidate list of a first reference frame; the first reference frame is any one of the respective reference frames;
calculating the rate distortion cost of each candidate MVP in the AMVP candidate list;
and obtaining the minimum value among the rate distortion costs of the candidate MVPs as the target rate distortion cost of the first reference frame.
3. The method of claim 1, wherein said determining available ones of the respective reference frames based on their respective target rate distortion costs comprises:
setting, among the respective reference frames, the reference frames whose target rate distortion cost is smaller than a cost threshold as the available reference frames; or,
setting, among the respective reference frames, at most m reference frames whose target rate distortion cost is smaller than the cost threshold as the available reference frames, wherein m is an integer greater than or equal to 1.
4. A method according to claim 3, wherein the cost threshold is determined by the product of a minimum rate distortion cost and a specified coefficient;
The minimum rate-distortion cost is the minimum value of the target rate-distortion cost of each of the reference frames.
5. The method of claim 1, wherein prior to inter-predicting the target coding unit based on available ones of the respective reference frames, further comprising:
acquiring the selected times of each reference frame;
determining available reference frames in the reference frames based on the selected times of the reference frames;
wherein the selected times include at least one of the following times:
the number of times the reference frame is selected by adjacent coding blocks, wherein an adjacent coding block is a coding block that is adjacent to the target coding unit and has completed coding, and whose optimal mode is an inter prediction mode;
the number of times the reference frame is selected during prediction division with other prediction division modes, before the current prediction division of the target coding unit;
the number of times the reference frame is selected during inter-frame prediction with other inter-frame prediction modes, before the current inter-frame prediction of the target coding unit.
6. The method of claim 5, wherein said determining available ones of said respective reference frames based on respective selected times of said respective reference frames comprises:
setting, among the respective reference frames, the reference frames whose selected times are greater than a first time threshold as the available reference frames; or,
setting, among the respective reference frames, at most n reference frames whose selected times are greater than the first time threshold as the available reference frames, wherein n is an integer greater than or equal to 1.
7. The method of claim 6, wherein the method further comprises:
in the case where no reference frame among the respective reference frames has selected times greater than the first time threshold, setting the p reference frames closest to the current frame as the available reference frames, wherein p is an integer greater than or equal to 1.
8. The method of claim 5, wherein the method further comprises:
obtaining the maximum selected times, wherein the maximum selected times are the maximum value of the selected times of each reference frame;
and under the condition that the maximum selected times is smaller than a second time threshold, determining available reference frames in the reference frames according to the frame type of the current frame.
9. The method of claim 8, wherein the determining available ones of the respective reference frames based on the frame type of the current frame if the maximum selected number is less than a second number threshold comprises:
setting q reference frames closest to a current frame in the reference frames as the available reference frames under the condition that the maximum selected times are smaller than a second time threshold, wherein q is an integer greater than or equal to 1;
wherein the value of q is determined by the frame type of the current frame.
10. The method of claim 9, wherein the value of q is positively correlated with a weight corresponding to the frame type of the current frame.
11. The method of claim 1, wherein obtaining the target rate distortion cost for each reference frame in the list of reference frames further comprises:
recording the optimal MVP of each reference frame;
the inter-predicting the target coding unit based on available reference frames in the respective reference frames includes:
and traversing available reference frames in the reference frames, and performing motion estimation based on the optimal MVP of the available reference frames so as to perform inter-frame prediction on the target coding unit.
12. An inter prediction apparatus, the apparatus comprising:
the cost acquisition module is used for acquiring the target rate distortion cost of each reference frame in the reference frame list in the process of encoding the target encoding unit; the target rate distortion cost is the rate distortion cost corresponding to the optimal MVP of the reference frame;
a reference frame determining module, configured to determine available reference frames in the respective reference frames based on respective target rate distortion costs of the respective reference frames;
and the prediction module is used for carrying out inter-frame prediction on the target coding unit based on available reference frames in the reference frames.
13. A computer device comprising a processor and a memory having stored therein at least one computer instruction that is loaded and executed by the processor to implement the method of any of claims 1 to 11.
14. A computer readable storage medium having stored therein at least one computer instruction that is loaded and executed by a processor in a computer device to implement the method of any one of claims 1 to 11.
15. A computer program product, the computer program product comprising at least one computer instruction stored in a computer readable storage medium; the at least one computer instruction being loaded and executed by a processor in a computer device to implement the method of any one of claims 1 to 11.
CN202410093936.7A 2024-01-23 2024-01-23 Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium Active CN117615129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410093936.7A CN117615129B (en) 2024-01-23 2024-01-23 Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117615129A true CN117615129A (en) 2024-02-27
CN117615129B CN117615129B (en) 2024-04-26

Family

ID=89958335


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106507106A (en) * 2016-11-08 2017-03-15 中国科学技术大学 Video interprediction encoding method based on reference plate
CN110149512A (en) * 2018-09-14 2019-08-20 腾讯科技(深圳)有限公司 Inter-prediction accelerated method, control device, electronic device, computer storage medium and equipment
WO2020034921A1 (en) * 2018-08-17 2020-02-20 北京金山云网络技术有限公司 Motion estimation method, device, electronic apparatus, and computer readable storage medium
CN114286089A (en) * 2021-09-26 2022-04-05 腾讯科技(深圳)有限公司 Reference frame selection method, device, equipment and medium
CN116193126A (en) * 2023-02-24 2023-05-30 上海哔哩哔哩科技有限公司 Video coding method and device
WO2023173809A1 (en) * 2022-03-16 2023-09-21 腾讯科技(深圳)有限公司 Video encoding method and apparatus, video decoding method and apparatus, and storage medium, electronic device and computer program product




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant