CN113923455B - Bidirectional inter-frame prediction method and device - Google Patents

Bidirectional inter-frame prediction method and device

Info

Publication number
CN113923455B
CN113923455B (application CN202111040982.3A)
Authority
CN
China
Prior art keywords
current image
image block
block
prediction
motion
Prior art date
Legal status
Active
Application number
CN202111040982.3A
Other languages
Chinese (zh)
Other versions
CN113923455A (en)
Inventor
符婷
陈焕浜
杨海涛
张昊
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202111040982.3A
Publication of CN113923455A
Application granted
Publication of CN113923455B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of this application disclose a bidirectional inter-frame prediction method and apparatus, relating to the field of video encoding and decoding, and address the problem of how to select a bi-prediction motion compensation technique for bidirectional inter prediction so as to achieve the best trade-off between compression ratio and computational complexity. The scheme is as follows: first, motion information of the current image block is obtained, and an initial prediction block of the current image block is obtained from the motion information; then, a motion compensation mode for the current image block is determined either from attribute information of the initial prediction block, or from the motion information together with attribute information of the current image block; finally, motion compensation is performed on the current image block according to the determined motion compensation mode and the initial prediction block. The motion compensation mode is either bi-prediction-based weighted prediction or bi-prediction-based optical flow. The embodiments of this application are used in the bidirectional inter prediction process.

Description

Bidirectional inter-frame prediction method and device
Technical Field
The embodiments of this application relate to the field of video encoding and decoding, and in particular to a bidirectional inter-frame prediction method and apparatus.
Background
Video compression mainly adopts block-based hybrid video coding: a video frame is divided into blocks, and compression is achieved block by block through the steps of intra prediction, inter prediction, transform, quantization, entropy coding, and in-loop filtering (mainly de-blocking filtering). Inter prediction may also be referred to as motion compensated prediction (MCP): motion information is obtained for a block, and the block's predicted pixel values are then determined from that motion information. The process of computing a block's motion information is called motion estimation (ME), and the process of determining the block's predicted pixel values from the motion information is called motion compensation (MC). By prediction direction, inter prediction includes forward prediction, backward prediction, and bidirectional prediction.
For bi-prediction, a forward prediction block of the current image block is first obtained by forward prediction from the motion information, and a backward prediction block is obtained by backward prediction from the motion information. Then, either a bi-prediction-based weighted prediction technique combines the pixel values at corresponding pixel positions in the forward and backward prediction blocks to obtain the prediction block of the current image block, or a bi-directional optical flow (BIO) technique determines the prediction block of the current image block from the forward and backward prediction blocks.
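As a hedged illustration (not the patent's normative procedure), the weighted-prediction branch above can be sketched as a per-pixel combination of the two blocks; the equal 1/2 weights and floating-point arithmetic are simplifying assumptions, since real codecs work on integer samples with rounding offsets:

```python
def weighted_bi_prediction(fwd_block, bwd_block, w0=0.5, w1=0.5):
    """Combine forward and backward prediction blocks pixel by pixel.

    Equal weights are assumed here; a codec may signal other weights.
    """
    assert len(fwd_block) == len(bwd_block)
    pred = []
    for fwd_row, bwd_row in zip(fwd_block, bwd_block):
        pred.append([w0 * f + w1 * b for f, b in zip(fwd_row, bwd_row)])
    return pred

fwd = [[100, 104], [96, 98]]
bwd = [[102, 100], [100, 102]]
print(weighted_bi_prediction(fwd, bwd))  # [[101.0, 102.0], [98.0, 100.0]]
```

The BIO branch would instead refine each pixel with a per-pixel optical-flow correction, which is why its complexity is much higher than this simple averaging.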
Weighted prediction has the advantage of simple computation, but when applied to block-level motion compensation it predicts complex textures poorly, so compression efficiency is low. BIO improves the compression ratio through pixel-level motion refinement, but its computational complexity is high and it significantly slows encoding and decoding; in some cases, weighted prediction can even match or exceed the compression performance of BIO. How to select the motion compensation technique for bidirectional inter prediction so as to achieve the best trade-off between compression ratio and computational complexity is therefore a challenge.
Disclosure of Invention
The embodiments of this application provide a bidirectional inter-frame prediction method and apparatus, which solve the problem of how to select a bi-prediction motion compensation technique for bidirectional inter prediction so as to achieve the optimal trade-off between compression ratio and computational complexity.
In order to achieve the above purpose, the embodiment of the application adopts the following technical scheme:
in a first aspect of an embodiment of the present application, a bi-directional inter prediction method is provided, including: after the motion information of the current image block is acquired, an initial prediction block of the current image block is acquired according to the motion information, then a motion compensation mode of the current image block is determined according to the attribute information of the initial prediction block, or a motion compensation mode of the current image block is determined according to the motion information and the attribute information of the current image block, and finally the motion compensation is performed on the current image block according to the determined motion compensation mode and the initial prediction block. The current image block is an image block to be encoded or an image block to be decoded. The motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow technology based on bidirectional prediction.
According to the bidirectional inter prediction method provided by the embodiments of this application, when motion compensation is performed on the current image block, an appropriate motion compensation mode is determined from the characteristics of the current image block and of its initial prediction block. This accounts for both a high compression ratio and low encoding/decoding complexity, effectively achieving the optimal balance between compression ratio and complexity.
The motion information described in the embodiments of this application may include a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. With reference to the first aspect, in one possible implementation, obtaining the initial prediction block of the current image block from the motion information specifically includes: determining a first initial prediction block of the current image block from the first reference frame index and the first motion vector, and determining a second initial prediction block of the current image block from the second reference frame index and the second motion vector. The first reference frame index indicates the frame containing the forward reference block of the current image block, and the first motion vector represents the motion displacement of the current image block relative to the forward reference block; the second reference frame index indicates the frame containing the backward reference block, and the second motion vector represents the motion displacement of the current image block relative to the backward reference block. The attribute information of each initial prediction block includes the pixel values of its M x N pixel points, where M and N are integers greater than or equal to 1.
With reference to the foregoing possible implementation manners, in one possible implementation manner, the determining, according to the attribute information of the initial prediction block, a motion compensation manner of the current image block according to the embodiment of the present application specifically includes: obtaining M x N pixel difference values according to the pixel values of M x N pixel points of the first initial prediction block and the pixel values of M x N pixel points of the second initial prediction block, determining texture complexity of the current image block according to the M x N pixel difference values, and determining a motion compensation mode according to the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, determining the texture complexity of the current image block according to the m×n pixel differences includes: calculating the sum of absolute values of M x N pixel difference values; the sum of the absolute values of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, determining the texture complexity of the current image block according to the m×n pixel differences includes: calculating an average value of the M x N pixel difference values; the average value of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, determining the texture complexity of the current image block according to the m×n pixel differences includes: calculating standard deviation of M x N pixel difference values; the standard deviation of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, the determining a motion compensation manner according to the texture complexity of the current image block specifically includes: judging whether the texture complexity of the current image block is smaller than a first threshold value, wherein the first threshold value is any real number larger than 0; if the texture complexity of the current image block is smaller than a first threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction; if the texture complexity of the current image block is greater than or equal to a first threshold, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction.
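The three texture measures above (sum of absolute differences, average, standard deviation) and the first-threshold decision can be sketched as follows. The threshold value in the example is an arbitrary illustration, not one specified by the patent:

```python
import math

def pixel_differences(block0, block1):
    """Per-position differences between two M x N initial prediction blocks."""
    return [p0 - p1 for row0, row1 in zip(block0, block1)
            for p0, p1 in zip(row0, row1)]

def texture_complexity(diffs, measure="sad"):
    """Texture complexity of the current block from its M*N pixel differences."""
    if measure == "sad":       # sum of absolute values of the differences
        return sum(abs(d) for d in diffs)
    if measure == "mean":      # average of the differences
        return sum(diffs) / len(diffs)
    # standard deviation of the differences
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))

def select_mode(complexity, first_threshold):
    """Weighted prediction below the first threshold, otherwise BIO."""
    return "weighted" if complexity < first_threshold else "bio"

diffs = pixel_differences([[100, 104], [96, 98]], [[102, 100], [100, 102]])
print(texture_complexity(diffs, "sad"))                   # 14
print(select_mode(texture_complexity(diffs, "sad"), 32))  # weighted
```

Low texture complexity means the two initial prediction blocks already agree closely, so simple weighted prediction suffices; high complexity triggers the more expensive BIO refinement.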
With reference to the foregoing possible implementation manner, in one possible implementation manner, the motion amplitude of the current image block in the embodiment of the present application is determined by motion information, and the motion compensation manner is determined according to the motion information and attribute information of an initial prediction block, which specifically includes: determining a first motion amplitude of the current image block according to the first motion vector, and determining a second motion amplitude of the current image block according to the second motion vector; and determining a motion compensation mode according to the first motion amplitude, the second motion amplitude and the attribute information of the initial prediction block.
Optionally, in another possible implementation manner of the present application, the motion compensation manner is determined according to the first motion amplitude, the second motion amplitude, and attribute information of the initial prediction block, where the attribute information of the initial prediction block may be a pixel value of a pixel point. The manner in which the attribute information of the initial prediction block is obtained may be referred to the possible implementations described above. The method for determining the motion compensation mode comprises the following steps: obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block; determining texture complexity of the current image block according to the M x N pixel difference values; determining a selection probability according to the texture complexity, the first motion amplitude, the second motion amplitude and the first mathematical model of the current image block; or, determining a selection probability according to the texture complexity of the current image block, the first motion amplitude and the second motion amplitude, and inquiring a first mapping table, wherein the first mapping table comprises the corresponding relation between the selection probability and the texture complexity of the current image block, the first motion amplitude and the second motion amplitude; and determining a motion compensation mode according to the selection probability.
With reference to the first aspect, in one possible implementation, the motion information includes a first motion vector and a second motion vector, and determining the motion compensation mode of the current image block from the motion information and the attribute information of the current image block includes: determining a selection probability from the size of the current image block, the horizontal and vertical components of the first motion vector, the horizontal and vertical components of the second motion vector, and a second mathematical model; or, determining the selection probability by querying a second mapping table with the size of the current image block and the horizontal and vertical components of the first and second motion vectors, where the second mapping table contains the correspondence between the selection probability and the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, and the vertical component of the second motion vector; and determining the motion compensation mode from the selection probability.
Optionally, in another possible implementation manner of the present application, the determining a motion compensation manner according to the selection probability specifically includes: judging whether the selection probability is larger than a second threshold value, wherein the second threshold value is any real number which is larger than or equal to 0 and smaller than or equal to 1; if the selection probability is larger than a second threshold value, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction; and if the selection probability is smaller than or equal to the second threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction.
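A hedged sketch of the probability-based decision. The patent does not specify the form of the first mathematical model, so the logistic function below is purely an illustrative stand-in, with made-up coefficients; only the second-threshold comparison mirrors the text:

```python
import math

def selection_probability(texture, amp0, amp1, w=(0.02, 0.1, 0.1), bias=-2.0):
    """Hypothetical stand-in for the unspecified 'first mathematical model':
    a logistic function of texture complexity and the two motion amplitudes.
    The weights and bias are illustrative, not from the patent."""
    z = bias + w[0] * texture + w[1] * amp0 + w[2] * amp1
    return 1.0 / (1.0 + math.exp(-z))

def select_mode_by_probability(prob, second_threshold):
    """BIO when the probability exceeds the second threshold (a real number
    in [0, 1]), otherwise weighted prediction."""
    return "bio" if prob > second_threshold else "weighted"

p = selection_probability(texture=120, amp0=3.0, amp1=2.5)
print(select_mode_by_probability(p, 0.5))  # bio
```

In practice the model or mapping table would be fitted offline so that the selection probability reflects how likely BIO is to outperform weighted prediction for blocks with those characteristics.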
In a second aspect of embodiments of the present application, there is provided an encoding method, including: the bidirectional inter prediction method according to any of the above aspects is used in an encoding process, where a current image block is an image block to be encoded.
In a third aspect of the embodiments of the present application, a decoding method is provided, including: the bi-directional inter prediction method according to any of the above aspects is used in a decoding process, where a current image block is an image block to be decoded.
In a fourth aspect of embodiments of the present application, there is provided a bidirectional inter prediction apparatus, including: a motion estimation unit, a determination unit and a motion compensation unit.
Specifically, the motion estimation unit is configured to obtain motion information of a current image block, where the current image block is an image block to be encoded or an image block to be decoded; the determining unit is used for obtaining an initial prediction block of the current image block according to the motion information; the determining unit is further configured to determine a motion compensation mode of the current image block according to the attribute information of the initial prediction block, or according to the motion information and the attribute information of the current image block, where the motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow technology based on bidirectional prediction; the motion compensation unit is used for performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
According to the bidirectional inter prediction apparatus provided by the embodiments of this application, when motion compensation is performed on the current image block, an appropriate motion compensation mode is determined from the characteristics of the current image block and of its initial prediction block. This accounts for both a high compression ratio and low encoding/decoding complexity, effectively achieving the optimal balance between compression ratio and complexity.
The motion information described in the embodiments of the present application includes a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. With reference to the fourth aspect, in one possible implementation manner, the determining unit is specifically configured to: determining a first initial prediction block of the current image block according to a first reference frame index and a first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1; and determining a second initial prediction block of the current image block according to a second reference frame index and a second motion vector, wherein the second reference frame index is used for indicating the index of a frame where a backward reference block of the current image block is located, the second motion vector is used for indicating the motion displacement of the current image block relative to the backward reference block, and the attribute information of the second initial prediction block comprises pixel values of M x N pixel points.
With reference to the foregoing possible implementation manner, in one possible implementation manner, the foregoing determining unit is specifically configured to: obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block; determining texture complexity of the current image block according to the M x N pixel difference values; and determining a motion compensation mode according to the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: calculating the sum of absolute values of M x N pixel difference values; the sum of the absolute values of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: calculating an average value of the M x N pixel difference values; the average value of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: calculating standard deviation of M x N pixel difference values; the standard deviation of the M x N pixel differences is determined as the texture complexity of the current image block.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: judging whether the texture complexity of the current image block is smaller than a first threshold value, wherein the first threshold value is any real number larger than 0; if the texture complexity of the current image block is smaller than a first threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction; if the texture complexity of the current image block is greater than or equal to a first threshold, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction.
With reference to the foregoing possible implementation manner, in one possible implementation manner, the motion amplitude of the current image block in the embodiment of the present application is determined by motion information, and the determining unit is specifically configured to: determining a first motion amplitude of the current image block according to the first motion vector, and determining a second motion amplitude of the current image block according to the second motion vector; and determining a motion compensation mode according to the first motion amplitude, the second motion amplitude and the attribute information of the initial prediction block.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block; determining texture complexity of the current image block according to the M x N pixel difference values; determining a selection probability according to the texture complexity, the first motion amplitude, the second motion amplitude and the first mathematical model of the current image block; or, determining a selection probability according to the texture complexity of the current image block, the first motion amplitude and the second motion amplitude, and inquiring a first mapping table, wherein the first mapping table comprises the corresponding relation between the selection probability and the texture complexity of the current image block, the first motion amplitude and the second motion amplitude; and determining a motion compensation mode according to the selection probability.
With reference to the fourth aspect, in one possible implementation, the motion information includes a first motion vector and a second motion vector, and the determining unit is specifically configured to: determine a selection probability from the size of the current image block, the horizontal and vertical components of the first motion vector, the horizontal and vertical components of the second motion vector, and a second mathematical model; or, determine the selection probability by querying a second mapping table with the size of the current image block and the horizontal and vertical components of the first and second motion vectors, where the second mapping table contains the correspondence between the selection probability and the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, and the vertical component of the second motion vector; and determine the motion compensation mode from the selection probability.
Optionally, in another possible implementation manner of the present application, the determining unit is specifically configured to: judging whether the selection probability is larger than a second threshold value, wherein the second threshold value is any real number which is larger than or equal to 0 and smaller than or equal to 1; if the selection probability is larger than a second threshold value, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction; and if the selection probability is smaller than or equal to the second threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction.
In a fifth aspect of the embodiments of the present application, there is provided a terminal, including: one or more processors, memory, and communication interfaces; the memory and the communication interface are connected with one or more processors; the terminal communicates with other devices via a communication interface, and the memory is configured to store computer program code comprising instructions that, when executed by the one or more processors, perform the bi-directional inter-prediction method of any of the aspects described above.
In a sixth aspect of embodiments of the present application, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the bi-directional inter prediction method of any of the above aspects.
A seventh aspect of embodiments of the present application provides a computer readable storage medium comprising instructions that, when executed on a terminal, cause the terminal to perform the bi-directional inter prediction method of any of the above aspects.
An eighth aspect of the embodiments of the present application provides a video encoder, including a nonvolatile storage medium and a central processing unit, where the nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium, and when the central processing unit executes the executable program, the video encoder executes the bidirectional inter prediction method of any aspect.
A ninth aspect of the embodiments of the present application provides a video decoder, including a nonvolatile storage medium and a central processing unit, where the nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium, and when the central processing unit executes the executable program, the video decoder executes the bidirectional inter prediction method of any aspect.
In addition, for the technical effects brought by the design manners in any of the above aspects, reference may be made to the technical effects brought by the corresponding design manners in the first aspect, and details are not described herein again.
In the embodiments of the present application, the names of the bidirectional inter-frame prediction apparatus and the terminal do not limit the devices themselves; in actual implementation, these devices may appear under other names. As long as the function of each device is similar to that in the embodiments of the present application, it falls within the scope of the claims of the present application and their equivalents.
Drawings
Fig. 1 is a simplified schematic diagram of a video transmission system architecture according to an embodiment of the present application;
FIG. 2 is a simplified schematic diagram of a video encoder according to an embodiment of the present application;
FIG. 3 is a simplified schematic diagram of a video decoder according to an embodiment of the present application;
fig. 4 is a flowchart of a bidirectional inter prediction method according to an embodiment of the present application;
fig. 5 is a schematic motion diagram of a current image block according to an embodiment of the present application;
FIG. 6 is a flowchart of another bi-directional inter prediction method according to an embodiment of the present application;
FIG. 7 is a flowchart of yet another bi-directional inter prediction method according to an embodiment of the present application;
fig. 8 is a schematic diagram of obtaining m×n pixel differences according to an embodiment of the present application;
FIG. 9 is a flowchart of yet another bi-directional inter prediction method according to an embodiment of the present application;
FIG. 10 is a flowchart of yet another bi-directional inter prediction method according to an embodiment of the present application;
fig. 11 is a schematic diagram of a bidirectional inter prediction apparatus according to an embodiment of the present application;
fig. 12 is a schematic diagram illustrating a composition of another bi-directional inter prediction apparatus according to an embodiment of the present application.
Detailed Description
The terms "first" and "second" and the like in the description and in the claims of the present application are used for distinguishing between different objects and not for limiting a particular order.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In order to facilitate understanding of the embodiments of the present application, the relevant concepts involved in the embodiments of the present application are first described herein.
Video encoding (video encoding): a process of compressing video (image sequences) into a bitstream.
Video decoding (video decoding): a process of recovering a reconstructed image from a bitstream according to specific syntax rules and processing methods.
In most coding frameworks, a video consists of a sequence of pictures (pictures); one picture is called a frame. A picture is divided into at least one slice, and each slice is in turn divided into image blocks (blocks). Video encoding and video decoding are performed in units of image blocks. For example, the encoding or decoding process may proceed from the upper-left corner of the picture, from left to right and from top to bottom, row by row. Here, an image block may be a macroblock (MB) in the video codec standard H.264, or a coding unit (CU) in the high efficiency video coding (HEVC) standard, which is not specifically limited in the embodiments of the present application.
In the embodiment of the present application, an image block that is being subjected to encoding processing or decoding processing is referred to as a current image block (current block), and an image in which the current image block is located is referred to as a current frame (current image).
In video coding, a current frame may be classified into an I frame, a P frame, and a B frame according to the prediction type of a current image block. I-frames are frames encoded as independent still images, providing random access points in the video stream. A P frame is a frame predicted from a previous I frame or P frame adjacent thereto, and can be used as a reference frame for a next P frame or B frame. The B frame is a frame obtained by bi-directionally predicting using the two nearest preceding and following frames (I frame or P frame) as reference frames. In the embodiment of the present application, the current frame refers to a bi-directional predicted frame (B-frame).
Because there is a strong temporal correlation between consecutive frames of a video, that is, adjacent frames contain much redundancy, the temporal correlation between frames is often used when encoding a video to reduce inter-frame redundancy and thereby compress the data. At present, motion-compensated inter-frame prediction technology is mainly used to encode video so as to improve the compression ratio.
Inter prediction refers to prediction performed by using correlation between a current frame and its reference frames in units of encoded image blocks or decoded image blocks, and the current frame may have one or more reference frames. Specifically, a prediction block of the current image block is generated from pixels in a reference frame of the current image block.
Specifically, when the encoding end encodes a current image block in the current frame, it first selects one or more reference frames from the frames of the video image that have already been encoded, obtains the prediction block corresponding to the current image block from the reference frames, then calculates the residual values between the prediction block and the current image block, and quantizes and encodes the residual values. When the decoding end decodes a current image block in the current frame, it first obtains the prediction block corresponding to the current image block, then obtains the residual values of the prediction block and the current image block from the received bitstream, and decodes and reconstructs the current image block according to the residual values and the prediction block.
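The residual relationship between the encoding end and the decoding end described above can be illustrated with a minimal sketch (NumPy assumed; quantization and entropy coding of the residual are omitted, and the function names are this example's own):

```python
import numpy as np

def encode_residual(current_block: np.ndarray,
                    prediction_block: np.ndarray) -> np.ndarray:
    # Encoder side: residual = original - prediction. Widen to a signed
    # type so negative differences are representable.
    return current_block.astype(np.int32) - prediction_block.astype(np.int32)

def reconstruct_block(prediction_block: np.ndarray,
                      residual: np.ndarray) -> np.ndarray:
    # Decoder side: reconstruction = prediction + residual.
    return (prediction_block.astype(np.int32) + residual).astype(np.uint8)
```

With a lossless residual (no quantization), the decoder's reconstruction is bit-exact with the original block.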
The temporal correlation between the current frame and other frames in the video is expressed not only in that there is a temporal correlation between the current frame and the frame encoded before it, but also in that there is a temporal correlation between the current frame and the frame encoded after it. Based on this, bi-directional inter prediction can be considered in video coding to obtain a better coding effect.
In general, for a current image block, a prediction block of the current image block may be generated from only one reference block, or may be generated from two reference blocks. The above-described generation of the prediction block of the current image block from one reference block is referred to as unidirectional inter prediction, and the above-described generation of the prediction block of the current image block from two reference blocks is referred to as bidirectional inter prediction. The two reference image blocks in the bi-directional inter prediction may be from the same reference frame or different reference frames.
Alternatively, bi-directional inter-prediction may refer to inter-prediction using the correlation between a current video frame and a video frame encoded before and played before it, and the correlation between a current video frame and a video frame encoded before and played after it.
It can be seen that the bi-directional inter prediction described above involves inter prediction in two directions, generally referred to as: forward inter prediction and backward inter prediction. Forward inter prediction refers to inter prediction using correlation between a current video frame and a video frame encoded before it and played before it. Backward inter prediction refers to inter prediction using the correlation between a current video frame and video frames encoded before and played after it.
Motion compensation is a method of describing the difference between adjacent frames ("adjacent" here means adjacent in the coding relationship; the two frames are not necessarily adjacent in the playing order). It is a process of finding the reference block of the current image block according to the motion information and obtaining the prediction block of the current image block by processing that reference block, and it is one step in the inter-frame prediction process.
For bidirectional inter prediction, the prediction block of the current image block may be obtained by performing weighted prediction on the pixel values at the same pixel positions in the forward prediction block and the backward prediction block of the current image block using a weighted prediction technique based on bidirectional prediction, or the prediction block of the current image block may be determined from the forward prediction block and the backward prediction block of the current image block using an optical flow technique based on bidirectional prediction. However, the weighted prediction technique based on bidirectional prediction is computationally simple but has low compression efficiency, while the optical flow technique based on bidirectional prediction has high compression efficiency but high computational complexity. Therefore, how to select a motion compensation technique for bidirectional prediction so as to achieve the best trade-off between compression ratio and computational complexity is a problem to be solved.
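As an illustration of the first of these two techniques, the following is a minimal weighted-prediction sketch (NumPy assumed; equal weights by default, and none of the rounding offsets or bit-depth handling of a real codec):

```python
import numpy as np

def weighted_bi_prediction(forward_pred: np.ndarray,
                           backward_pred: np.ndarray,
                           w0: float = 0.5,
                           w1: float = 0.5) -> np.ndarray:
    # Pixel-wise weighted combination of co-located samples from the
    # forward and backward prediction blocks; equal weights give a
    # simple average. Clamp back to the 8-bit sample range.
    p = w0 * forward_pred.astype(np.float64) + w1 * backward_pred.astype(np.float64)
    return np.clip(np.rint(p), 0, 255).astype(np.uint8)
```

The optical-flow technique contrasted with it would instead refine each sample using per-pixel gradients of the two prediction blocks, which is where its extra computational cost comes from.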
In view of the above problems, an embodiment of the present application provides a bidirectional inter prediction method, which has the following basic principles: after the motion information of the current image block is acquired, an initial prediction block of the current image block is acquired according to the motion information, then a motion compensation mode of the current image block is determined according to the attribute information of the initial prediction block, or according to the motion information and the attribute information of the current image block, and then the current image block is subjected to motion compensation according to the determined motion compensation mode and the initial prediction block. The current image block is an image block to be encoded or an image block to be decoded. The motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow technology based on bidirectional prediction. Therefore, when the current image block is subjected to motion compensation, a proper motion compensation mode is determined according to the characteristics of the current image block and the characteristics of the initial prediction block of the current image block, so that the characteristics of high compression ratio and low coding and decoding complexity are considered, and the optimal balance of the compression ratio and the complexity is effectively achieved.
The implementation of the examples of the present application will be described in detail below with reference to the accompanying drawings.
The bidirectional inter-frame prediction method provided by the embodiment of the application is suitable for a video transmission system. Fig. 1 shows a simplified schematic diagram of a video transmission system 100 architecture to which embodiments of the present application may be applied. As shown in fig. 1, the video transmission system includes a source device and a destination device.
The source device includes a video source 101, a video encoder 102, and an output interface 103.
In some examples, video source 101 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of the above video data sources. The video source 101 is configured to collect video data, perform pre-encoding processing on the collected video data, convert the optical signal into a digitized image sequence, and transmit the digitized image sequence to the video encoder 102.
The video encoder 102 is configured to encode a sequence of images from the video source 101 to obtain a code stream.
Output interface 103 may include a modulator/demodulator (modem) and/or a transmitter. The output interface 103 is used for sending the code stream obtained by encoding by the video encoder 102.
In some examples, the source device transmits the encoded bitstream directly to the destination device via the output interface 103. The encoded bitstream may also be stored on a storage medium or a file server, such as the storage 107, for later access by the destination device for decoding and/or playback.
The destination device includes an input interface 104, a video decoder 105, and a display device 106.
In some examples, the input interface 104 includes a receiver and/or a modem. The input interface 104 may receive, via the network 108, the bitstream sent by the output interface 103, and transmit the bitstream to the video decoder 105. The network 108 may be an IP network including routers, switches, and the like.
The video decoder 105 is configured to decode the code stream received by the input interface 104 and reconstruct the image sequence. The video encoder 102 and the video decoder 105 may operate in accordance with a video compression standard, such as the high efficiency video codec h.265 standard.
The display device 106 may be integral with the destination device or may be external to the destination device. In general, the display device 106 displays the decoded video data. The display device 106 may include a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or other types of display devices.
The destination device may further include a rendering module for rendering the reconstructed image sequence decoded by the video decoder 105, so as to improve the display effect of the video.
Specifically, the bi-directional inter prediction method according to the embodiments of the present application may be performed by the video encoder 102 and the video decoder 105 in the video transmission system shown in fig. 1.
The video encoder and video decoder will be briefly described below in conjunction with fig. 2 and 3.
Fig. 2 is a simplified schematic diagram of a video encoder 200 according to an embodiment of the present application. The video encoder 200 includes an inter predictor 201, an intra predictor 202, a summer 203, a transformer 204, a quantizer 205, and an entropy encoder 206. For image block reconstruction, the video encoder 200 also includes an inverse quantizer 207, an inverse transformer 208, a summer 209, and a filter unit 210. The inter predictor 201 includes a motion estimation unit and a motion compensation unit. The intra predictor 202 includes an intra prediction mode selection unit and an intra prediction unit. The filter unit 210 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 210 is shown in fig. 2 as an in-loop filter, in other implementations the filter unit 210 may be implemented as a post-loop filter. In one example, the video encoder 200 may further include a video data memory and a segmentation unit (not shown). The video data memory may store video data to be encoded by the components of the video encoder 200; the video data stored in the video data memory may be obtained from a video source. The decoded picture buffer (DPB) may be a reference picture memory that stores reference video data used by the video encoder 200 to encode video data in intra or inter coding modes. The video data memory and the DPB may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and the DPB may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with the other components of the video encoder 200, or off-chip relative to those components.
The video encoder 200 receives video data and stores it in the video data memory. The segmentation unit segments the video data into image blocks, and these image blocks may be further segmented into smaller blocks, for example based on a quadtree structure or a binary tree structure. Such partitioning may also include partitioning into slices (slices), tiles (tiles), or other larger units. The video encoder 200 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into a plurality of image blocks (and possibly into sets of image blocks referred to as tiles).
After the video data is divided to obtain the current image block, the current image block may be inter-predicted by the inter predictor 201. Inter prediction refers to searching the reconstructed images for a reference block that matches the current image block in the current image, thereby obtaining motion information of the current image block, and then calculating the prediction information (prediction block) of the pixel values of the pixel points in the current image block according to the motion information. The process of calculating the motion information is called motion estimation. The motion estimation process requires trying multiple reference blocks in the reference image for the current image block, and which reference block or blocks are ultimately used for prediction is determined by rate-distortion optimization (RDO) or other methods. The process of calculating the prediction block of the current image block is called motion compensation. Specifically, the bi-directional inter prediction method described in the embodiments of the present application may be performed by the inter predictor 201.
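Motion estimation as described above can be illustrated with a toy full-search sketch that minimizes the sum of absolute differences (SAD). A real encoder would use RDO and faster search patterns; all names here are this example's own.

```python
import numpy as np

def full_search_sad(current_block: np.ndarray, ref_frame: np.ndarray,
                    top: int, left: int, search_range: int):
    # Try every integer displacement within +/- search_range of the
    # block's nominal position (top, left) and keep the candidate
    # reference block with the smallest SAD.
    h, w = current_block.shape
    H, W = ref_frame.shape
    best_mv, best_sad = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > H or x + w > W:
                continue  # candidate falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w]
            sad = int(np.abs(current_block.astype(np.int32)
                             - cand.astype(np.int32)).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```

The returned (dx, dy) plays the role of the motion vector's horizontal and vertical components for this block.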
After the video data is segmented to obtain the current image block, the current image block may also be intra-predicted by the intra-predictor 202. Intra prediction refers to predicting pixel values of pixels in a current image block by using pixel values of pixels in a reconstructed image block in an image in which the current image block is located.
After a prediction block of the current image block is generated by the inter predictor 201 or the intra predictor 202, the video encoder 200 forms a residual image block by subtracting the prediction block from the current image block to be encoded. The summer 203 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 204. The transformer 204 transforms the residual video data into residual transform coefficients using a transform such as the discrete cosine transform or a conceptually similar transform. The transformer 204 may convert the residual video data from the pixel-value domain to a transform domain, such as the frequency domain.
The transformer 204 may send the resulting transform coefficients to the quantizer 205. The quantizer 205 quantizes the transform coefficients to further reduce bit rate. In some examples, quantizer 205 may then perform a scan of a matrix including quantized transform coefficients. Alternatively, the entropy encoder 206 may perform the scanning.
After quantization, the quantized transform coefficients are entropy encoded by the entropy encoder 206. For example, the entropy encoder 206 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 206, the encoded bitstream may be transmitted to the video decoder 300, or archived for later transmission or retrieval by the video decoder 300. The entropy encoder 206 may also entropy encode the syntax elements of the current image block to be encoded.
The inverse quantizer 207 and the inverse transformer 208 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image. The summer 209 adds the reconstructed residual block to the prediction block generated by the inter predictor 201 or the intra predictor 202 to generate a reconstructed image block. The filter unit 210 may be applied to the reconstructed image block to reduce distortion, such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer, where it may be used by the inter predictor 201 as a reference block for inter prediction of a block in a subsequent video frame or image.
It should be appreciated that other structural variations of the video encoder 200 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 200 may directly quantize the residual signal without processing by the transformer 204, and accordingly without processing by the inverse transformer 208; alternatively, for some image blocks or image frames, the video encoder 200 does not generate residual data and accordingly does not need processing by the transformer 204, the quantizer 205, the inverse quantizer 207, and the inverse transformer 208; alternatively, the video encoder 200 may store the reconstructed image block directly as a reference block without processing by the filter unit 210; alternatively, the quantizer 205 and the inverse quantizer 207 in the video encoder 200 may be combined.
The video encoder 200 is used to output video to a post-processing entity 211. Post-processing entity 211 represents an example of a video entity, such as a Media Aware Network Element (MANE) or a stitching/editing device, that may process encoded video data from video encoder 200. In some cases, post-processing entity 211 may be an instance of a network entity. In some video coding systems, the post-processing entity 211 and the video encoder 200 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 211 may be performed by the same device that includes the video encoder 200. In one example, post-processing entity 211 is an example of storage 107 of FIG. 1.
Fig. 3 is a simplified schematic diagram of a video decoder 300 according to an embodiment of the present application. The video decoder 300 includes an entropy decoder 301, an inverse quantizer 302, an inverse transformer 303, a summer 304, a filter unit 305, an inter predictor 306, and an intra predictor 307. The video decoder 300 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to the video encoder 200 of fig. 2. First, the residual information is obtained using the entropy decoder 301, the inverse quantizer 302, and the inverse transformer 303, and whether intra prediction or inter prediction is used for the current image block is determined from the decoded bitstream. If intra prediction is used, the intra predictor 307 constructs the prediction information from the pixel values of the pixel points in the surrounding reconstructed region according to the intra prediction method used. If inter prediction is used, the inter predictor 306 needs to parse out the motion information, determine a reference block in the reconstructed image using the parsed motion information, use the pixel values of the pixel points in that block as the prediction information, and obtain the reconstruction information from the prediction information and the residual information through a filtering operation.
The bidirectional inter-frame prediction method disclosed in the embodiments of the present application is not only applicable to wireless application scenarios, but can also be applied to video encoding and decoding supporting a variety of multimedia applications, such as: over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, a video codec system may be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The bidirectional inter prediction method provided in the embodiment of the present application may be performed by a bidirectional inter prediction apparatus, or may be performed by a video codec, or may be performed by another device having a video codec function, which is not specifically limited in the embodiment of the present application.
For convenience of explanation, a bidirectional inter prediction method will be explained below taking a bidirectional inter prediction apparatus as an execution subject.
Fig. 4 is a flowchart of a bidirectional inter prediction method according to an embodiment of the present application. The bi-directional inter prediction method shown in fig. 4 may occur in either the encoding process or the decoding process. For example, the bi-directional inter prediction method shown in fig. 4 may occur in an inter prediction process at the time of encoding and decoding. As shown in fig. 4, the bi-directional inter prediction method includes:
s401, the bidirectional inter-frame prediction device acquires motion information of a current image block.
The current image block is an image block to be encoded or an image block to be decoded. If the current image block is an image block to be encoded, the motion information of the current image block may be obtained through motion estimation. If the current image block is an image block to be decoded, the motion information of the current image block may be obtained by decoding the bitstream.
The motion information mainly includes prediction direction information of the current image block, a reference frame index of the current image block, and a motion vector of the current image block. The prediction direction information of the current image block includes forward prediction, backward prediction, and bi-prediction. The reference frame index of the current image block indicates the index of the frame in which the reference block of the current image block is located. The reference frame index of the current image block includes a forward reference frame index of the current image block and a backward reference frame index of the current image block according to the prediction direction. The motion vector of the current image block represents the motion displacement of the current image block relative to the reference block.
The motion vector includes a horizontal component (denoted as MVx) and a vertical component (denoted as MVy). The horizontal component represents the motion displacement of the current image block in the horizontal direction relative to the reference block. The vertical component represents the motion displacement of the current image block in the vertical direction relative to the reference block. If the prediction direction information indicates forward prediction or backward prediction, there is only one motion vector; if the prediction direction information indicates bidirectional prediction, there are two motion vectors. For example, bi-directionally predicted motion information includes a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. The first reference frame index is used to indicate the index of the frame in which the forward reference block of the current image block is located. The first motion vector is used to represent the motion displacement of the current image block relative to the forward reference block. The second reference frame index is used to indicate the index of the frame in which the backward reference block of the current image block is located. The second motion vector is used to represent the motion displacement of the current image block relative to the backward reference block.
For example, as shown in fig. 5, B represents the current image block, and the frame in which the current image block is located is the current frame. A denotes the forward reference block; the frame in which the forward reference block is located is the forward reference frame. C denotes the backward reference block; the frame in which the backward reference block is located is the backward reference frame. 0 denotes the forward direction, and 1 denotes the backward direction. MV0 represents the forward motion vector, MV0 = (MV0x, MV0y), where MV0x is the horizontal component of the forward motion vector and MV0y is the vertical component of the forward motion vector. MV1 represents the backward motion vector, MV1 = (MV1x, MV1y), where MV1x is the horizontal component of the backward motion vector and MV1y is the vertical component of the backward motion vector. The dashed line represents the motion trajectory of the current image block B.
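A minimal container for the bi-prediction motion information enumerated above might look like this (the class and field names are illustrative, not the patent's):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class BiPredMotionInfo:
    first_ref_idx: int       # first reference frame index (forward reference frame)
    second_ref_idx: int      # second reference frame index (backward reference frame)
    mv0: Tuple[int, int]     # (MV0x, MV0y), forward motion vector
    mv1: Tuple[int, int]     # (MV1x, MV1y), backward motion vector
```

For uni-directional prediction, only one reference index and one motion vector would be populated, matching the single-motion-vector case described above.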
S402, the bidirectional inter-frame prediction device acquires an initial prediction block of the current image block according to the motion information.
The process of obtaining the initial prediction block of the current image block according to the motion information may specifically refer to the prior art, where the initial prediction block of the current image block includes a forward prediction block and a backward prediction block. By way of example, as shown in fig. 6, S402 may be implemented by the following detailed steps.
S601, the bidirectional inter-frame prediction device determines a first initial prediction block of the current image block according to a first reference frame index and a first motion vector.
First, the bidirectional inter-frame prediction device can determine, according to the first reference frame index, the first reference frame in which the first reference block of the current image block is located; it then determines the first reference block in the first reference frame according to the first motion vector, and performs sub-pixel interpolation on the first reference block to obtain the first initial prediction block. The first initial prediction block may refer to the forward prediction block of the current image block.
Assume that the first reference frame index is a forward reference frame index. For example, as shown in fig. 5, the forward reference frame in which the forward reference block A of the current image block B is located is first determined according to the forward reference frame index. The same coordinate point (i', j') is then found in the forward reference frame according to the coordinates (i, j) of the current image block, and a block B' in the forward reference frame is determined according to the length and width of the current image block B. The block B' is moved to the forward reference block A by the forward motion vector MV0 = (MV0x, MV0y) of the current image block B, and the forward prediction block of the current image block B is obtained from the forward reference block A through sub-pixel interpolation. (i, j) represents the coordinates of the top-left corner point of the current image block B in the current frame; the origin of coordinates of the current frame is the top-left corner point of the current frame in which the current image block B is located. (i', j') represents the coordinates of the top-left corner point of block B' in the forward reference frame; the origin of coordinates of the forward reference frame is the top-left corner point of the forward reference frame in which block B' is located.
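The coordinate arithmetic of S601 (find the collocated point (i', j') = (i, j), then shift block B' by the motion vector to reach reference block A) can be sketched as follows. This is a minimal integer-pel illustration that omits sub-pixel interpolation, and the function name is an assumption:

```python
def locate_reference_block(i, j, mv):
    """Given the top-left corner (i, j) of the current block and a motion
    vector mv = (MVx, MVy), return the top-left corner of the reference
    block in the reference frame (integer-pel; interpolation omitted)."""
    mvx, mvy = mv
    return (i + mvx, j + mvy)

# Current block B at (8, 8), forward motion vector MV0 = (-4, 2):
top_left_a = locate_reference_block(8, 8, (-4, 2))  # (4, 10)
```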
S602, the bidirectional inter-frame prediction device determines a second initial prediction block of the current image block according to the second reference frame index and the second motion vector.
First, the bidirectional inter-frame prediction device can determine, according to the second reference frame index, the second reference frame in which the second reference block of the current image block is located; it then determines the second reference block in the second reference frame according to the second motion vector, and performs sub-pixel interpolation on the second reference block to obtain the second initial prediction block. The second initial prediction block may refer to the backward prediction block of the current image block.
It should be noted that the process of determining the backward prediction block of the current image block is the same as that of determining the forward prediction block, except that the reference direction differs; for the specific method, refer to the description of S601. If the current image block is not bidirectionally predicted, the obtained forward prediction block or backward prediction block is itself the prediction block of the current image block.
S403a, the bidirectional inter prediction device determines a motion compensation mode of the current image block according to the attribute information of the initial prediction block.
The attribute information of the initial prediction block includes the size of the initial prediction block, the number of pixels included in the initial prediction block, and the pixel values of those pixels. In addition, since the method described in the embodiments of the present application is directed to bidirectional inter-frame prediction, the initial prediction block here includes a first initial prediction block and a second initial prediction block, which may be obtained in the manner described in S402. The embodiments of the present application describe how to determine the motion compensation mode of the current image block according to the attribute information of the initial prediction block, taking the pixel values of the pixels included in the initial prediction block as an example.
For example, assume that the current image block includes M×N pixels, the first initial prediction block includes M×N pixels, and the second initial prediction block includes M×N pixels. M is an integer greater than or equal to 1, N is an integer greater than or equal to 1, and M and N may or may not be equal. As shown in fig. 7, S403a may be implemented by the following detailed steps.
S701, the bidirectional inter-frame prediction device obtains M×N pixel difference values according to the pixel values of the M×N pixel points of the first initial prediction block and the pixel values of the M×N pixel points of the second initial prediction block.
The bidirectional inter-frame prediction device may obtain the M×N pixel differences from the differences between the pixel values of the M×N pixels of the first initial prediction block and those of the second initial prediction block. It should be understood that the M×N pixel differences are obtained by subtracting, position by position, the pixel value at each position of the second initial prediction block from the pixel value at the corresponding position of the first initial prediction block. Corresponding positions here are positions relative to the same coordinate point in the same coordinate system. The M×N pixel differences can also be regarded as forming an intermediate prediction block.
For example, as shown in fig. 8, assume that M = 4 and N = 4. The current image block includes 4×4 pixels, i.e., b0,0, b0,1, b0,2, b0,3, ..., b3,0, b3,1, b3,2, b3,3. The first initial prediction block includes 4×4 pixels, i.e., a0,0, a0,1, a0,2, a0,3, ..., a3,0, a3,1, a3,2, a3,3. The second initial prediction block includes 4×4 pixels, i.e., c0,0, c0,1, c0,2, c0,3, ..., c3,0, c3,1, c3,2, c3,3. Taking a0,0, b0,0, and c0,0 as the origin of coordinates, a two-dimensional rectangular coordinate system is established with i as the abscissa and j as the ordinate. For example, pixel a0,0 in the first initial prediction block corresponds, at the same coordinate point (0, 0), to pixel c0,0 in the second initial prediction block; subtracting c0,0 from a0,0 gives the pixel difference at coordinate point (0, 0). In this way, 4×4 pixel differences are obtained from the differences between the pixel values of the 4×4 pixels of the first initial prediction block and those of the second initial prediction block. Expressed as a formula, the pixel difference is D(i, j) = abs(A(i, j) - B(i, j)), where (i, j) represents the coordinates of a pixel point within a block. D(i, j) represents the pixel difference of the pixel with coordinates (i, j), i.e., of the pixel in row i and column j. A(i, j) represents the pixel value of the pixel with coordinates (i, j) in the first initial prediction block, and B(i, j) represents the pixel value of the pixel with coordinates (i, j) in the second initial prediction block. abs() denotes the absolute value operation.
i is an integer from 0 to M-1, and j is an integer from 0 to N-1. The 4×4 pixels corresponding to the 4×4 pixel differences can form an intermediate prediction block, which includes 4×4 pixels, i.e., d0,0, d0,1, d0,2, d0,3, ..., d3,0, d3,1, d3,2, d3,3.
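The per-position difference D(i, j) = abs(A(i, j) - B(i, j)) and the resulting intermediate prediction block can be sketched in a few lines of Python (illustrative names; blocks are plain nested lists):

```python
def pixel_differences(pred_fwd, pred_bwd):
    """Compute D(i, j) = abs(A(i, j) - B(i, j)) for every pixel position,
    yielding the M*N differences that form the intermediate prediction block."""
    M, N = len(pred_fwd), len(pred_fwd[0])
    return [[abs(pred_fwd[i][j] - pred_bwd[i][j]) for j in range(N)]
            for i in range(M)]

A = [[100, 102], [98, 101]]   # first (forward) initial prediction block
B = [[97, 104], [98, 90]]     # second (backward) initial prediction block
D = pixel_differences(A, B)   # [[3, 2], [0, 11]]
```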
S702, the bidirectional inter-frame prediction device determines the texture complexity of the current image block according to the M×N pixel difference values.
After obtaining the M×N pixel differences from the pixel values of the M×N pixels of the first initial prediction block and those of the second initial prediction block, the bidirectional inter-frame prediction device may determine the texture complexity of the current image block according to these M×N pixel differences.
In one possible implementation, the texture complexity of the current image block may be determined from the sum of the M×N pixel differences; it should be understood that this may also refer to the sum of the absolute values of the M×N pixel differences. That is, the texture complexity of the current image block is the sum of the M×N pixel differences. Expressed as a formula: SAD = Σi Σj |D(i, j)|, with i from 0 to M-1 and j from 0 to N-1, where the sum of absolute differences (Sum of Absolute Differences, SAD) represents the texture complexity, i.e., the sum of the absolute values of the M×N pixel differences.
In another possible implementation, the texture complexity of the current image block may be determined from the average of the M×N pixel differences; that is, the texture complexity is the average of the M×N pixel differences. Expressed as a formula: μ = SAD / (M×N), where μ represents the average of the M×N pixel differences and M×N indicates the number of pixel points.
In a third possible implementation, the texture complexity of the current image block may be determined from the standard deviation of the M×N pixel differences; that is, the texture complexity is the standard deviation of the M×N pixel differences. Expressed as a formula: σ = sqrt( (1/(M×N)) × Σi Σj (D(i, j) - μ)^2 ), with i from 0 to M-1 and j from 0 to N-1, where σ represents the standard deviation of the M×N pixel differences.
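The three texture-complexity measures above (SAD, mean, and standard deviation of the pixel differences) can be computed as in the following minimal sketch (illustrative names):

```python
import math

def texture_complexity(D):
    """Return (SAD, mean, standard deviation) of the M*N pixel differences."""
    flat = [d for row in D for d in row]
    n = len(flat)                          # n = M*N
    sad = sum(abs(d) for d in flat)        # sum of absolute differences
    mu = sad / n                           # average pixel difference
    sigma = math.sqrt(sum((d - mu) ** 2 for d in flat) / n)
    return sad, mu, sigma

sad, mu, sigma = texture_complexity([[3, 2], [0, 11]])  # sad = 16, mu = 4.0
```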
S703, the bidirectional inter prediction device determines a motion compensation mode according to the texture complexity of the current image block.
The bidirectional inter-frame prediction device may determine the motion compensation mode by comparing the texture complexity of the current image block with a preset threshold. For example, it judges whether the texture complexity of the current image block is smaller than a first threshold: if so, the motion compensation mode is determined to be the weighted prediction technique based on bidirectional prediction; if the texture complexity is greater than or equal to the first threshold, the motion compensation mode is determined to be the optical flow technique based on bidirectional prediction. The first threshold is any real number greater than 0, such as 150 or 200. In practical applications, the first threshold may be adjusted according to the codec parameters, the specific codec, and the target codec time. The value of the first threshold may be preset or set in the high-level syntax, for example specified in a sequence parameter set (sequence parameter set, SPS), a picture parameter set (picture parameter set, PPS), or a slice header (slice header).
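The first-threshold comparison can then be sketched as follows (the threshold value 150 is one of the example values mentioned above; the mode names are illustrative):

```python
def decide_mode_by_texture(texture, first_threshold=150):
    """Weighted prediction for low-texture blocks, optical flow otherwise."""
    if texture < first_threshold:
        return "weighted_prediction"   # weighted prediction based on bi-prediction
    return "optical_flow"              # optical flow based on bi-prediction

mode_smooth = decide_mode_by_texture(16)    # weighted_prediction
mode_busy = decide_mode_by_texture(300)     # optical_flow
```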
S403b, the bidirectional inter prediction device determines a motion compensation mode of the current image block according to the motion information and the attribute information of the initial prediction block.
In addition to determining the motion compensation mode of the current image block according to the attribute information of the initial prediction block, the bidirectional inter-frame prediction device may determine the motion compensation mode according to the motion amplitude of the current image block together with the attribute information of the initial prediction block. The motion amplitude of the current image block may be determined from the motion information. The attribute information of the initial prediction block may be obtained according to S701 and S702 described above, which is not repeated here.
For example, as shown in fig. 9, after the bi-directional inter-prediction apparatus determines the texture complexity of the current image block according to m×n pixel differences, i.e. S702, the embodiments of the present application may further include the following detailed steps.
S901, the bidirectional inter-frame prediction device determines a first motion amplitude of a current image block according to a first motion vector, and determines a second motion amplitude of the current image block according to a second motion vector.
Illustratively, the first motion amplitude is dist0 = sqrt(MV0x^2 + MV0y^2), where MV0x represents the horizontal component of the first motion vector (forward motion vector) and MV0y represents its vertical component. The second motion amplitude is dist1 = sqrt(MV1x^2 + MV1y^2), where MV1x represents the horizontal component of the second motion vector (backward motion vector) and MV1y represents its vertical component.
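A direct transcription of the two amplitude formulas dist0 = sqrt(MV0x^2 + MV0y^2) and dist1 = sqrt(MV1x^2 + MV1y^2):

```python
import math

def motion_amplitude(mv):
    """Euclidean length of a motion vector: sqrt(MVx^2 + MVy^2)."""
    mvx, mvy = mv
    return math.sqrt(mvx * mvx + mvy * mvy)

dist0 = motion_amplitude((3, -4))   # 5.0
dist1 = motion_amplitude((6, 8))    # 10.0
```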
It should be noted that the order of the steps of the bidirectional inter-frame prediction method provided in the embodiments of the present application may be adjusted appropriately, and steps may be added or removed as the situation requires. For example, the order of S901 relative to S701 and S702 may be interchanged, i.e., S901 may be performed first, followed by S701 and S702. Any variation readily conceivable by a person skilled in the art falls within the protection scope of the present application and is not described further.
S902, the bidirectional inter-frame prediction device determines a selection probability according to texture complexity, a first motion amplitude, a second motion amplitude and a first mathematical model of the current image block.
The first mathematical model may be, for example, a first logistic regression model. The first logistic regression model is as follows:

y = 1/(1+exp(-1×(ω0 + ω1·SAD + ω2·dist0 + ω3·dist1)))

where SAD denotes the texture complexity of the current image block obtained in S702, and ω0, ω1, ω2, and ω3 are parameters of the first logistic regression model. ω0 is typically 2.06079643, ω1 is typically -0.01175306, ω2 is typically -0.00122516, and ω3 is typically -0.0008786. Substituting the texture complexity, dist0, and dist1 into the first logistic regression model yields the selection probability y. It should be noted that the parameters of the first logistic regression model may be set in advance or in the high-level syntax, for example specified in a parameter set such as the SPS or PPS, or in the slice header.
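As a sketch, the first logistic regression model y = 1/(1+exp(-1×(ω0 + ω1·T + ω2·dist0 + ω3·dist1))) can be evaluated with the typical parameter values quoted above. Treating the texture-complexity input T as the SAD measure of S702 is an assumption of this sketch:

```python
import math

# Typical parameter values of the first logistic regression model
# (they may instead be signaled in the SPS, PPS, or slice header).
W0, W1, W2, W3 = 2.06079643, -0.01175306, -0.00122516, -0.0008786

def selection_probability(texture, dist0, dist1):
    """y = 1 / (1 + exp(-(W0 + W1*texture + W2*dist0 + W3*dist1)))."""
    z = W0 + W1 * texture + W2 * dist0 + W3 * dist1
    return 1.0 / (1.0 + math.exp(-z))

y = selection_probability(texture=16, dist0=5.0, dist1=10.0)  # roughly 0.87
```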
Alternatively, a first mapping table may be predefined at the time of encoding, in addition to the selection probability y calculated by the logistic regression model. The texture complexity of the current image block, the possible values of the first motion amplitude and the second motion amplitude and the corresponding values of the selection probability y are stored in the first mapping table. The value of the selection probability y can be obtained in a table look-up mode during encoding.
S903, the bidirectional inter prediction device determines a motion compensation mode according to the selection probability.
The motion compensation mode can be determined by comparing the selection probability with a preset threshold value. For example, whether the selection probability is greater than a second threshold is determined, and if the selection probability is greater than the second threshold, the motion compensation mode is determined to be an optical flow technology based on bidirectional prediction; and if the selection probability is smaller than or equal to the second threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction. The second threshold is any real number that is equal to or greater than 0 and equal to or less than 1. For example, the second threshold may take a value of 0.7.
S403c, the bidirectional inter prediction device determines a motion compensation mode of the current image block according to the motion information and the attribute information of the current image block.
The attribute information of the current image block includes the size of the current image block, the number of pixels included in the current image block, and the pixel values of those pixels. With reference to the accompanying drawings, the following describes how the bidirectional inter-frame prediction device determines the motion compensation mode according to the motion information and the attribute information of the current image block, taking the size of the current image block as an example. Because the current image block is composed of an array of pixel points, the bidirectional inter-frame prediction device can obtain the size of the current image block from the pixel points; it is understood that the size of the current image block is its width and height. As shown in fig. 10, S403c may be implemented by the following detailed steps.
S1001, the bi-directional inter prediction apparatus determines a selection probability according to the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, the vertical component of the second motion vector, and the second mathematical model.
The second mathematical model may be, for example, a second logistic regression model. The second logistic regression model is as follows:
y = 1/(1+exp(-1×(ω0 + ω1·H + ω2·W + ω3·MV0x + ω4·MV0y + ω5·MV1x + ω6·MV1y)))
where ω0, ω1, ω2, ω3, ω4, ω5, and ω6 are parameters of the second logistic regression model. ω0 is typically -0.18929861, ω1 is typically 4.81715386e-03, ω2 is typically 4.66279123e-03, ω3 is typically -7.46496930e-05, ω4 is typically 1.23565538e-04, ω5 is typically -4.25855176e-05, and ω6 is typically 1.44069088e-04. W represents the width of the prediction block of the current image block, and H represents its height. MV0x and MV0y represent the horizontal and vertical components of the first motion vector (forward motion vector); MV1x and MV1y represent the horizontal and vertical components of the second motion vector (backward motion vector). The selection probability y can be obtained by substituting the size of the current image block and the horizontal and vertical components of the first and second motion vectors into the second logistic regression model. It should be noted that the parameters of the second logistic regression model may be set in advance or in the high-level syntax, for example specified in a parameter set such as the SPS or PPS, or in the slice header.
Alternatively, a second mapping table may be predefined at the time of encoding, in addition to the selection probability y calculated by the second logistic regression model. The second mapping table stores the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, and the possible values of the vertical component of the second motion vector, and the corresponding values of the selection probabilities y. The value of the selection probability y can be obtained in a table look-up mode during encoding.
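Under the same caveats (typical parameter values; illustrative names), the second logistic regression model can be evaluated as:

```python
import math

# Typical parameter values of the second logistic regression model
# (they may instead be signaled in the SPS, PPS, or slice header).
PARAMS = (-0.18929861, 4.81715386e-03, 4.66279123e-03, -7.46496930e-05,
          1.23565538e-04, -4.25855176e-05, 1.44069088e-04)

def selection_probability_2(H, W, mv0, mv1):
    """y = 1/(1+exp(-(w0 + w1*H + w2*W + w3*MV0x + w4*MV0y
                      + w5*MV1x + w6*MV1y)))."""
    w0, w1, w2, w3, w4, w5, w6 = PARAMS
    z = (w0 + w1 * H + w2 * W + w3 * mv0[0] + w4 * mv0[1]
         + w5 * mv1[0] + w6 * mv1[1])
    return 1.0 / (1.0 + math.exp(-z))

y = selection_probability_2(H=64, W=64, mv0=(3, -4), mv1=(6, 8))
```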
S1002, the bidirectional inter-frame prediction device determines a motion compensation mode according to the selection probability.
For a detailed explanation of S1002, reference may be made to the explanation in S903, and the embodiments of the present application are not described herein.
S404, the bidirectional inter-frame prediction device performs motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
The initial prediction block includes a first initial prediction block and a second initial prediction block. The motion compensation of the current image block by the weighted prediction technique based on bi-directional prediction and the initial prediction block, and the motion compensation of the current image block by the optical flow technique based on bi-directional prediction and the initial prediction block can refer to the specific implementation manner of the prior art, and the embodiments of the present application are not described herein.
Further, after the bidirectional inter-frame prediction device determines, by the bidirectional inter-frame prediction method described in the above embodiments, the motion compensation mode used for bidirectional motion compensation, the selected motion compensation mode may be written into a syntax element of the current image block. The decision then does not need to be repeated during decoding; the decoder simply selects the motion compensation mode directly according to the syntax element.
For example, a syntax element (Bio_flag) is allocated to the current image block, occupying 1 bit in the bitstream. When Bio_flag takes the value 0, the motion compensation mode is the weighted prediction technique based on bidirectional prediction; when Bio_flag takes the value 1, the motion compensation mode is the optical flow technique based on bidirectional prediction. The initial value of Bio_flag is 0. When parsing the bitstream, the decoding end obtains the value of the Bio_flag syntax element of the current decoding block and determines the motion compensation mode used for bidirectional motion compensation according to that value: if Bio_flag is 0, the motion compensation mode is the weighted prediction technique based on bidirectional prediction; if Bio_flag is 1, it is the optical flow technique based on bidirectional prediction.
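The 1-bit signaling can be sketched as a pair of encoder/decoder helpers (names illustrative; the actual entropy coding of the bit is omitted):

```python
def write_bio_flag(mode):
    """Encoder side: map the chosen mode to the 1-bit Bio_flag value."""
    return 1 if mode == "optical_flow" else 0

def read_bio_flag(bio_flag):
    """Decoder side: select the mode directly from the parsed flag,
    without repeating the encoder's decision."""
    return "optical_flow" if bio_flag == 1 else "weighted_prediction"

# Round trip: the decoder recovers exactly the mode the encoder chose.
assert read_bio_flag(write_bio_flag("optical_flow")) == "optical_flow"
```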
Alternatively, the decision method used by the bi-directional inter prediction apparatus to determine the motion compensation mode may also be determined by setting a syntax element at a high level syntax. For example, the decision method is a first decision method, a second decision method, or a third decision method. The first decision method is to determine the motion compensation mode of the current image block according to the attribute information of the initial prediction block. The second decision method is to determine the motion compensation mode of the current image block according to the motion information and the attribute information of the initial prediction block. And the third judging method determines the motion compensation mode of the current image block according to the motion information and the attribute information of the current image block. Specific implementation manners of the first decision method, the second decision method and the third decision method may be described in detail with reference to the foregoing embodiments, which are not described herein. Syntax elements may be set in parameter sets such as SPS, PPS, slice header, etc.
By way of example, the syntax element may be a selection mode (select_mode), which occupies 2 bits in the bitstream. The initial value of the syntax element select_mode is 0. The values of select_mode and the decision methods they indicate are shown in Table 1:
TABLE 1
Value of select_mode | Decision method
0 | First decision method
1 | Second decision method
2 | Third decision method
After the bidirectional inter prediction device acquires the motion information of the current image block, a motion compensation mode is determined according to a specified decision method. If the determined decision method is the first decision method, the bidirectional inter prediction device performs bidirectional inter prediction according to the first decision method. And if the determined judging method is a second judging method, the bidirectional inter-frame prediction device carries out bidirectional inter-frame prediction according to the second judging method. And if the determined judging method is a third judging method, the bidirectional inter-frame prediction device carries out bidirectional inter-frame prediction according to the third judging method.
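The dispatch on select_mode from Table 1 can be sketched as follows (illustrative names):

```python
def decision_method(select_mode):
    """Map the 2-bit select_mode syntax element (Table 1) to the decision
    method used to choose the motion compensation mode."""
    table = {
        0: "first",   # attribute information of the initial prediction block
        1: "second",  # motion information + initial prediction block attributes
        2: "third",   # motion information + current image block attributes
    }
    return table[select_mode]

method = decision_method(1)  # "second"
```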
According to the bidirectional inter-frame prediction method, when motion compensation is performed on the current image block, a suitable motion compensation mode is determined according to the characteristics of the current image block and of its prediction block, so that both a high compression ratio and low codec complexity are taken into account, effectively striking a balance between compression ratio and complexity.
The above description has been presented mainly from the perspective of interaction between the network elements. It will be appreciated that each network element, e.g. the bidirectional inter-frame prediction apparatus, includes corresponding hardware structures and/or software modules for performing each of the functions described above. Those skilled in the art will readily appreciate that the algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is implemented as hardware or as computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide the functional modules of the bidirectional inter prediction apparatus according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
In the case of dividing the functional modules according to the respective functions, fig. 11 shows a schematic diagram of one possible composition of the bidirectional inter-frame prediction apparatus referred to in the above embodiments. As shown in fig. 11, the bidirectional inter-frame prediction apparatus may include: a motion estimation unit 1101, a determination unit 1102, and a motion compensation unit 1103.
The motion estimation unit 1101 is configured to support the bidirectional inter-frame prediction device to perform S401 in the bidirectional inter-frame prediction method shown in fig. 4, S401 in the bidirectional inter-frame prediction method shown in fig. 6, S401 in the bidirectional inter-frame prediction method shown in fig. 7, S401 in the bidirectional inter-frame prediction method shown in fig. 9, and S401 in the bidirectional inter-frame prediction method shown in fig. 10.
A determination unit 1102 for supporting the bi-directional inter prediction apparatus to perform S402, S403a, S403b, and S403c in the bi-directional inter prediction method shown in fig. 4, S601, S602, S403a, S403b, and S403c in the bi-directional inter prediction method shown in fig. 6, S601, S602, S701-S703, S403b, and S403c in the bi-directional inter prediction method shown in fig. 7, S601, S602, S701-S703, S901-S903, and S403c in the bi-directional inter prediction method shown in fig. 9, and S601, S602, S403a, S403b, S1001, and S1002 in the bi-directional inter prediction method shown in fig. 10.
The motion compensation unit 1103 is configured to support the bidirectional inter prediction apparatus to perform S404 in the bidirectional inter prediction method shown in fig. 4, S404 in the bidirectional inter prediction method shown in fig. 6, S404 in the bidirectional inter prediction method shown in fig. 7, S404 in the bidirectional inter prediction method shown in fig. 9, and S404 in the bidirectional inter prediction method shown in fig. 10.
It should be noted that all relevant details of the steps in the above method embodiments may be found in the functional descriptions of the corresponding functional modules and are not repeated here.
The bidirectional inter prediction device provided by the embodiment of the application is used for executing the bidirectional inter prediction method, so that the same effect as the bidirectional inter prediction method can be achieved.
In case of using an integrated unit, fig. 12 shows another possible composition diagram of the bi-directional inter prediction apparatus involved in the above-described embodiment. As shown in fig. 12, the bi-directional inter prediction apparatus includes: a processing module 1201 and a communication module 1202.
The processing module 1201 is configured to control and manage the operation of the bi-directional inter-frame prediction device, for example, the processing module 1201 is configured to support the bi-directional inter-frame prediction device to perform S402, S403a, S403b, and S403c in the bi-directional inter-frame prediction method shown in fig. 4, S601, S602, S403a, S403b, and S403c in the bi-directional inter-frame prediction method shown in fig. 6, S601, S602, S701-S703, S403b, and S403c in the bi-directional inter-frame prediction method shown in fig. 7, S601, S602, S701-S703, S901-S903, and S403c in the bi-directional inter-frame prediction method shown in fig. 9, S601, S602, S403a, S1001, and S1002 in the bi-directional inter-frame prediction method shown in fig. 10, and/or other processes for the techniques described herein. The communication module 1202 is configured to support communication between the bi-directional inter-prediction apparatus and other network entities, such as communication with the functional modules or network entities shown in fig. 1 or 3. The bi-directional inter prediction apparatus may further comprise a storage module 1203 for storing program code and data of the bi-directional inter prediction apparatus.
Wherein the processing module 1201 may be a processor or a controller. Which may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination that performs computing functions, e.g., including one or more microprocessors, a combination of a DSP and a microprocessor, and so forth. The communication module 1202 may be a transceiver circuit or a communication interface, etc. The memory module 1203 may be a memory.
All relevant details of the scenarios in the above method embodiments may be found in the functional descriptions of the corresponding functional modules and are not repeated here.
The bidirectional inter prediction apparatus 11 and the bidirectional inter prediction apparatus 12 may each perform the bidirectional inter prediction method shown in any one of fig. 4, 6, 7, 9 and 10, and the bidirectional inter prediction apparatus 11 and the bidirectional inter prediction apparatus 12 may be specifically a video encoding apparatus, a video decoding apparatus or other devices having a video encoding and decoding function. The bi-directional inter prediction means 11 and the bi-directional inter prediction means 12 may be used for motion compensation both during encoding and during decoding.
The application further provides a terminal, including one or more processors, a memory, and a communication interface. The memory and the communication interface are connected to the one or more processors; the memory is configured to store computer program code including instructions which, when executed by the one or more processors, cause the terminal to perform the bidirectional inter prediction method of the embodiments of the present application.
The terminal herein may be a video display device, a smart phone, a laptop computer, or another device that can process or play video.
The application also provides a video encoder, which comprises a nonvolatile storage medium and a central processing unit, wherein the nonvolatile storage medium stores an executable program, and the central processing unit is connected with the nonvolatile storage medium and executes the executable program to realize the bidirectional inter prediction method of the embodiment of the application.
The application also provides a video decoder, which comprises a nonvolatile storage medium and a central processing unit, wherein the nonvolatile storage medium stores an executable program, and the central processing unit is connected with the nonvolatile storage medium and executes the executable program to realize the bidirectional inter prediction method of the embodiment of the application.
Another embodiment of the present application further provides a computer-readable storage medium storing one or more programs, the one or more programs including instructions which, when executed by a processor in a terminal, cause the terminal to perform the bidirectional inter prediction method shown in any one of the above-described fig. 4, 6, 7, 9, and 10.
In another embodiment of the present application, a computer program product is further provided, the computer program product including computer-executable instructions stored in a computer-readable storage medium; at least one processor of the terminal may read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the terminal to perform the steps performed by the bidirectional inter prediction apparatus in the bidirectional inter prediction method shown in any one of the above-described figures 4, 6, 7, 9 and 10.
The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the division into the above functional modules is illustrated by example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be embodied, in essence or in the part contributing to the prior art, or in whole or in part, in the form of a software product stored in a storage medium, including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing program code.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A bi-directional inter prediction method, comprising:
obtaining motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
acquiring an initial prediction block of the current image block according to the motion information; the initial prediction block includes a first initial prediction block and a second initial prediction block;
determining a motion compensation mode of the current image block according to attribute information of the initial prediction block, wherein the motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow (BIO) technology based on bidirectional prediction; the determining the motion compensation mode of the current image block according to the attribute information of the initial prediction block comprises: obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block; determining a texture complexity of the current image block according to the M x N pixel difference values; and determining the motion compensation mode according to the texture complexity of the current image block;
performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block;
reconstructing the image block according to the motion-compensated current image block.
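Purely as an illustrative sketch of the flow recited in claim 1 (not part of the claims; the function name, the equal weighting, and the placeholder for the BIO branch are assumptions introduced here for illustration):

```python
import numpy as np

def motion_compensate(pred0: np.ndarray, pred1: np.ndarray, threshold: float) -> np.ndarray:
    """Sketch of claim 1: derive the M x N pixel difference values from the
    two initial prediction blocks, measure texture complexity, and select a
    motion compensation mode accordingly."""
    diffs = pred0.astype(np.int64) - pred1.astype(np.int64)
    complexity = np.abs(diffs).sum()  # sum-of-absolute-differences measure (claim 3)
    if complexity < threshold:
        # Low texture complexity: weighted prediction based on bidirectional
        # prediction (equal weights assumed here for illustration).
        return (pred0 + pred1) / 2.0
    # High texture complexity: a BIO refinement would run here; it is omitted
    # in this sketch, and plain averaging is used as a placeholder.
    return (pred0 + pred1) / 2.0

# Identical blocks have zero complexity, so the weighted branch is taken.
p = np.full((4, 4), 100.0)
assert np.allclose(motion_compensate(p, p, threshold=16.0), 100.0)
```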
2. The method of claim 1, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector;
the obtaining the initial prediction block of the current image block according to the motion information includes:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
and determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for indicating the index of a frame where a backward reference block of the current image block is located, the second motion vector is used for indicating the motion displacement of the current image block relative to the backward reference block, and the attribute information of the second initial prediction block comprises pixel values of M x N pixel points.
3. The method according to claim 1 or 2, wherein said determining the texture complexity of the current image block from the M x N pixel differences comprises:
calculating the sum of absolute values of the M x N pixel difference values;
and determining the sum of absolute values of the M x N pixel difference values as the texture complexity of the current image block.
4. The method according to claim 1 or 2, wherein said determining the texture complexity of the current image block from the M x N pixel differences comprises:
calculating an average value of the M x N pixel difference values;
and determining the average value of the M x N pixel difference values as the texture complexity of the current image block.
5. The method according to claim 1 or 2, wherein said determining the texture complexity of the current image block from the M x N pixel differences comprises:
calculating standard deviation of the M x N pixel difference values;
and determining the standard deviation of the M x N pixel differences as the texture complexity of the current image block.
6. The method according to claim 1 or 2, wherein said determining the motion compensation mode according to the texture complexity of the current image block comprises:
judging whether the texture complexity of the current image block is smaller than a first threshold value, wherein the first threshold value is any real number larger than 0;
if the texture complexity of the current image block is smaller than the first threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction;
and if the texture complexity of the current image block is greater than or equal to the first threshold value, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction.
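The three texture-complexity measures of claims 3-5 and the threshold test of claim 6 can be sketched together as follows (a hedged illustration only; the function names and the sample threshold are assumptions, not claimed elements):

```python
import numpy as np

def texture_complexity(diffs: np.ndarray, measure: str = "sad") -> float:
    """Map the M x N pixel difference values to a scalar complexity value."""
    if measure == "sad":    # claim 3: sum of absolute values of the differences
        return float(np.abs(diffs).sum())
    if measure == "mean":   # claim 4: average of the differences
        return float(diffs.mean())
    if measure == "std":    # claim 5: standard deviation of the differences
        return float(diffs.std())
    raise ValueError(f"unknown measure: {measure}")

def choose_mode(complexity: float, first_threshold: float) -> str:
    """Claim 6: below the first threshold -> weighted prediction, else BIO."""
    return "weighted" if complexity < first_threshold else "bio"

# Toy 2x2 difference block: |1| + |-1| + |2| + |-2| = 6 for the SAD measure.
d = np.array([[1, -1], [2, -2]])
assert texture_complexity(d, "sad") == 6.0
assert choose_mode(texture_complexity(d, "sad"), first_threshold=10.0) == "weighted"
assert choose_mode(texture_complexity(d, "sad"), first_threshold=4.0) == "bio"
```

Note that under this sketch the mean of a difference block with symmetric positive and negative values is 0, so in practice a magnitude-based measure such as SAD or standard deviation discriminates texture more reliably.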
7. A method of encoding, comprising:
the bi-directional inter prediction method according to any of claims 1-6, used in an encoding process, the current picture block being a picture block to be encoded.
8. A decoding method, comprising:
the bi-directional inter prediction method according to any of claims 1-6, used in a decoding process, the current picture block being a picture block to be decoded.
9. A bi-directional inter prediction apparatus, comprising:
a motion estimation unit, configured to obtain motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
a determining unit, configured to obtain an initial prediction block of the current image block according to the motion information, wherein the initial prediction block includes a first initial prediction block and a second initial prediction block;
the determining unit is further configured to determine a motion compensation mode of the current image block according to attribute information of the initial prediction block, wherein the motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow (BIO) technology based on bidirectional prediction; the determining unit is specifically configured to: obtain M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block; determine a texture complexity of the current image block according to the M x N pixel difference values; and determine the motion compensation mode according to the texture complexity of the current image block;
a motion compensation unit, configured to perform motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block;
and the reconstruction unit is used for reconstructing the image block according to the current image block after the motion compensation.
10. The apparatus of claim 9, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector;
The determining unit is specifically configured to:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
and determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for indicating the index of a frame where a backward reference block of the current image block is located, the second motion vector is used for indicating the motion displacement of the current image block relative to the backward reference block, and the attribute information of the second initial prediction block comprises pixel values of M x N pixel points.
11. The apparatus according to claim 9 or 10, wherein the determining unit is specifically configured to:
calculating the sum of absolute values of the M x N pixel difference values;
and determining the sum of absolute values of the M x N pixel difference values as the texture complexity of the current image block.
12. The apparatus according to claim 9 or 10, wherein the determining unit is specifically configured to:
calculating an average value of the M x N pixel difference values;
and determining the average value of the M x N pixel difference values as the texture complexity of the current image block.
13. The apparatus according to claim 9 or 10, wherein the determining unit is specifically configured to:
calculating standard deviation of the M x N pixel difference values;
and determining the standard deviation of the M x N pixel differences as the texture complexity of the current image block.
14. The apparatus according to claim 9 or 10, wherein the determining unit is specifically configured to:
judging whether the texture complexity of the current image block is smaller than a first threshold value, wherein the first threshold value is any real number larger than 0;
if the texture complexity of the current image block is smaller than the first threshold value, determining that the motion compensation mode is a weighted prediction technology based on bidirectional prediction;
and if the texture complexity of the current image block is greater than or equal to the first threshold value, determining that the motion compensation mode is an optical flow technology based on bidirectional prediction.
15. A terminal, the terminal comprising: one or more processors, memory, and communication interfaces;
the memory, the communication interface, and the one or more processors are connected; the terminal communicates with other devices via the communication interface; and the memory is configured to store computer program code including instructions which, when executed by the one or more processors, cause the terminal to perform the bi-directional inter prediction method as claimed in any of claims 1-6.
16. A computer readable storage medium comprising instructions which, when run on a terminal, cause the terminal to perform the bi-directional inter prediction method of any of claims 1-6.
17. A video encoder comprising a non-volatile storage medium and a central processor, wherein the non-volatile storage medium stores an executable program, the central processor being coupled to the non-volatile storage medium, the video encoder performing the bi-directional inter prediction method of any of claims 1-6 when the executable program is executed by the central processor.
18. A video decoder comprising a non-volatile storage medium and a central processor, wherein the non-volatile storage medium stores an executable program, the central processor being coupled to the non-volatile storage medium, the video decoder performing the bi-directional inter prediction method of any of claims 1-6 when the executable program is executed by the central processor.
CN202111040982.3A 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device Active CN113923455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111040982.3A CN113923455B (en) 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810276300.0A CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device
CN202111040982.3A CN113923455B (en) 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810276300.0A Division CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device

Publications (2)

Publication Number Publication Date
CN113923455A CN113923455A (en) 2022-01-11
CN113923455B true CN113923455B (en) 2023-07-18

Family

ID=68060915

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111040982.3A Active CN113923455B (en) 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device
CN201810276300.0A Active CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810276300.0A Active CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device

Country Status (2)

Country Link
CN (2) CN113923455B (en)
WO (1) WO2019184639A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112770113A (en) * 2019-11-05 2021-05-07 杭州海康威视数字技术股份有限公司 Encoding and decoding method, device and equipment
CN112804534B (en) * 2019-11-14 2022-03-01 杭州海康威视数字技术股份有限公司 Encoding and decoding method, device and equipment
CN111145151B (en) * 2019-12-23 2023-05-26 维沃移动通信有限公司 Motion area determining method and electronic equipment
CN111050168B (en) * 2019-12-27 2021-07-13 浙江大华技术股份有限公司 Affine prediction method and related device thereof
CN111754429A (en) * 2020-06-16 2020-10-09 Oppo广东移动通信有限公司 Motion vector post-processing method and device, electronic device and storage medium
CN114071159B (en) * 2020-07-29 2023-06-30 Oppo广东移动通信有限公司 Inter prediction method, encoder, decoder, and computer-readable storage medium
CN114501010B (en) * 2020-10-28 2023-06-06 Oppo广东移动通信有限公司 Image encoding method, image decoding method and related devices
CN115037933B (en) * 2022-08-09 2022-11-18 浙江大华技术股份有限公司 Method and equipment for inter-frame prediction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102934444A (en) * 2010-04-06 2013-02-13 三星电子株式会社 Method and apparatus for video encoding and method and apparatus for video decoding
WO2017036399A1 (en) * 2015-09-02 2017-03-09 Mediatek Inc. Method and apparatus of motion compensation for video coding based on bi prediction optical flow techniques
WO2018048265A1 (en) * 2016-09-11 2018-03-15 엘지전자 주식회사 Method and apparatus for processing video signal by using improved optical flow motion vector

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101408698B1 (en) * 2007-07-31 2014-06-18 삼성전자주식회사 Method and apparatus for encoding/decoding image using weighted prediction
KR101934277B1 (en) * 2011-11-28 2019-01-04 에스케이텔레콤 주식회사 Video Coding Method and Apparatus using Improved Merge
US20180242004A1 (en) * 2015-08-23 2018-08-23 Lg Electronics Inc. Inter prediction mode-based image processing method and apparatus therefor
WO2017035831A1 (en) * 2015-09-06 2017-03-09 Mediatek Inc. Adaptive inter prediction
US10375413B2 (en) * 2015-09-28 2019-08-06 Qualcomm Incorporated Bi-directional optical flow for video coding
US10944963B2 (en) * 2016-05-25 2021-03-09 Arris Enterprises Llc Coding weighted angular prediction for intra coding


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BIO performance complexity trade-off; Alexander Alshin; 2016 Picture Coding Symposium (PCS); 2017-04-24; full text *
Research on motion estimation based on spatio-temporal information in video coding; Yang Chengjun et al.; High Technology Letters; 2002-04-28 (No. 04); full text *

Also Published As

Publication number Publication date
CN110324623B (en) 2021-09-07
CN113923455A (en) 2022-01-11
WO2019184639A1 (en) 2019-10-03
CN110324623A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
JP7368414B2 (en) Image prediction method and device
CN113923455B (en) Bidirectional inter-frame prediction method and device
JP7233218B2 (en) Merging filters for multiple classes of blocks for video coding
TWI766844B (en) Improved video intra-prediction using position-dependent prediction combination for video coding
WO2020068599A1 (en) Adaptive multiple transform coding
WO2019197674A1 (en) Block-level super-resolution based video coding
WO2019147826A1 (en) Advanced motion vector prediction speedups for video coding
JP7143512B2 (en) Video decoding method and video decoder
CN111480338B (en) Inter-frame prediction method and device of video data
JP7410236B2 (en) Position-dependent spatial variation transform for video coding
CN111385572A (en) Prediction mode determining method and device, coding equipment and decoding equipment
CN112740663B (en) Image prediction method, device and corresponding encoder and decoder
KR20230150284A (en) Efficient video encoder architecture
TW201921938A (en) Adaptive GOP structure with future reference frame in random access configuration for video coding
WO2023092256A1 (en) Video encoding method and related apparatus therefor
CN112055970B (en) Construction method of candidate motion information list, inter-frame prediction method and device
CN112055211A (en) Video encoder and QP setting method
CN117957838A (en) Intra prediction mode signaling
WO2023154359A1 (en) Methods and devices for multi-hypothesis-based prediction
WO2023081322A1 (en) Intra prediction modes signaling
WO2024039803A1 (en) Methods and devices for adaptive loop filter
WO2021188876A1 (en) Spatial neighbor based affine motion derivation
WO2023049292A1 (en) Methods and devices for decoder-side intra mode derivation
CN118101966A (en) Position dependent spatially varying transform for video coding
CN118101967A (en) Position dependent spatially varying transform for video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant