CN110324623B - Bidirectional interframe prediction method and device - Google Patents

Info

Publication number
CN110324623B
CN110324623B (granted publication of application CN201810276300.0A)
Authority
CN
China
Prior art keywords
image block
current image
motion
block
motion vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810276300.0A
Other languages
Chinese (zh)
Other versions
CN110324623A (en)
Inventor
符婷
陈焕浜
杨海涛
张昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202111040982.3A (CN113923455B)
Priority to CN201810276300.0A (CN110324623B)
Priority to PCT/CN2019/076086 (WO2019184639A1)
Publication of CN110324623A
Application granted
Publication of CN110324623B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Abstract

The embodiments of this application disclose a bidirectional inter-frame prediction method and apparatus, relate to the field of video coding and decoding, and address the problem of achieving an optimal trade-off between compression ratio and computational complexity when selecting a bidirectional-prediction motion compensation technique for bidirectional inter-frame prediction. The specific scheme is as follows: first, motion information of a current image block is acquired, and an initial prediction block of the current image block is obtained according to the motion information; then, a motion compensation mode of the current image block is determined according to attribute information of the initial prediction block, or according to the motion information and attribute information of the current image block; finally, motion compensation is performed on the current image block according to the determined motion compensation mode and the initial prediction block. The motion compensation mode is either a weighted prediction technique based on bidirectional prediction or an optical flow technique based on bidirectional prediction. The embodiments of this application are used in the bidirectional inter-frame prediction process.

Description

Bidirectional interframe prediction method and device
Technical Field
The embodiment of the application relates to the technical field of video coding and decoding, in particular to a bidirectional inter-frame prediction method and device.
Background
Video compression technology mainly adopts block-based hybrid video coding: a frame of a video image is divided into blocks, and compression is achieved block by block through intra-frame prediction (intra-prediction), inter-frame prediction (inter-prediction), transform, quantization, entropy encoding, and in-loop filtering (mainly de-blocking filtering). Inter prediction is also referred to as motion-compensated prediction (MCP): the motion information of a block is obtained first, and the predicted pixel values of the block are then determined from that motion information. Computing the motion information of a block is called motion estimation (ME), and determining the predicted pixel values of the block from its motion information is called motion compensation (MC). Depending on the prediction direction, inter prediction includes forward prediction, backward prediction, and bidirectional prediction.
For bidirectional prediction, a forward prediction block of the current image block is obtained by forward prediction according to the motion information, and a backward prediction block is obtained by backward prediction according to the motion information. The prediction block of the current image block is then obtained in one of two ways: by a weighted prediction technique based on bidirectional prediction, which performs weighted prediction on the pixel values at the same pixel position in the forward prediction block and the backward prediction block; or by a bidirectional-prediction-based bi-directional optical flow (BIO) technique, which determines the prediction block of the current image block from the forward prediction block and the backward prediction block.
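The first of these two options can be sketched as follows. This is an illustrative example only, not the patent's normative algorithm; equal weights of 0.5 are assumed here, whereas a real codec may derive the weights from reference-picture distances or signaled parameters.

```python
# Illustrative sketch of bidirectional weighted prediction: combine the
# forward and backward prediction blocks pixel by pixel with fixed weights.
def weighted_bi_prediction(fwd_block, bwd_block, w0=0.5, w1=0.5):
    """Weighted average of two equally sized prediction blocks (lists of rows)."""
    return [
        [int(round(w0 * f + w1 * b)) for f, b in zip(frow, brow)]
        for frow, brow in zip(fwd_block, bwd_block)
    ]

fwd = [[100, 102], [98, 101]]
bwd = [[104, 100], [102, 103]]
print(weighted_bi_prediction(fwd, bwd))  # [[102, 101], [100, 102]]
```

Each output pixel is simply a weighted average of the co-located pixels in the two initial prediction blocks, which is why this path is computationally cheap.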
The weighted prediction technique is computationally simple, but when applied to block-level motion compensation it predicts images with complex textures poorly and yields low compression efficiency. The BIO technique can improve the compression ratio through pixel-level motion refinement, but it has high computational complexity and greatly affects the encoding and decoding speed; moreover, in some cases the weighted prediction technique can match or even exceed the compression effect of the BIO technique. Therefore, achieving the optimal trade-off between compression ratio and computational complexity in the motion compensation technique for bidirectional inter-frame prediction is an urgent problem.
Disclosure of Invention
The embodiments of this application provide a bidirectional inter-frame prediction method and apparatus, which address the problem of achieving an optimal trade-off between compression ratio and computational complexity when selecting a bidirectional-prediction motion compensation technique for bidirectional inter-frame prediction.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect of the embodiments of this application, a bidirectional inter-frame prediction method is provided, including: after motion information of a current image block is acquired, an initial prediction block of the current image block is obtained according to the motion information; then a motion compensation mode of the current image block is determined according to attribute information of the initial prediction block, or according to the motion information and attribute information of the current image block; finally, motion compensation is performed on the current image block according to the determined motion compensation mode and the initial prediction block. The current image block is an image block to be encoded or an image block to be decoded. The motion compensation mode is either a weighted prediction technique based on bidirectional prediction or an optical flow technique based on bidirectional prediction.
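The four steps above can be sketched as a high-level driver. All helper names here are hypothetical placeholders introduced for illustration; the patent does not define such an API.

```python
# Hypothetical sketch of the described flow. The four callables are
# placeholders for motion estimation, initial-block derivation, mode
# selection, and motion compensation respectively.
def bidirectional_inter_prediction(block, get_motion_info, get_initial_blocks,
                                   choose_mode, compensate):
    motion_info = get_motion_info(block)              # step 1: acquire motion information
    initial = get_initial_blocks(block, motion_info)  # step 2: initial prediction block(s)
    mode = choose_mode(block, motion_info, initial)   # step 3: weighted prediction vs. BIO
    return compensate(block, mode, initial)           # step 4: motion compensation

# Trivial usage with stub callables, just to show the control flow:
out = bidirectional_inter_prediction(
    "blk",
    lambda b: "mi",
    lambda b, mi: ("fwd", "bwd"),
    lambda b, mi, init: "weighted_prediction",
    lambda b, mode, init: (mode, init),
)
print(out)  # ('weighted_prediction', ('fwd', 'bwd'))
```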
According to the bidirectional inter-frame prediction method provided by the embodiments of this application, a suitable motion compensation mode is determined from the characteristics of the current image block and of its initial prediction block before motion compensation is performed, so that both a high compression ratio and low encoding/decoding complexity are taken into account, effectively achieving the optimal balance between compression ratio and complexity.
The motion information in the embodiments of this application may include a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. With reference to the first aspect, in a possible implementation, obtaining the initial prediction block of the current image block according to the motion information specifically includes: determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, and determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector. The first reference frame index represents the index of the frame in which a forward reference block of the current image block is located, and the first motion vector represents the motion displacement of the current image block relative to the forward reference block; the attribute information of the first initial prediction block includes the pixel values of M × N pixel points. The second reference frame index represents the index of the frame in which a backward reference block of the current image block is located, and the second motion vector represents the motion displacement of the current image block relative to the backward reference block; the attribute information of the second initial prediction block includes the pixel values of M × N pixel points. M and N are integers greater than or equal to 1.
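Deriving an initial prediction block from a reference frame and a motion vector can be sketched as below. This is a simplified, hypothetical example: it assumes an integer-pixel motion vector and ignores the fractional-pel interpolation and boundary padding that real codecs perform.

```python
# Hypothetical sketch: copy an M x N block from a reference frame,
# displaced by an integer-pixel motion vector (dx, dy). Fractional-pel
# interpolation and frame-boundary handling are deliberately omitted.
def fetch_prediction_block(ref_frame, x, y, mv, m, n):
    """Return the m x n block of ref_frame at position (x, y) shifted by mv."""
    dx, dy = mv
    return [row[x + dx : x + dx + n] for row in ref_frame[y + dy : y + dy + m]]

# An 8x8 toy reference frame whose pixel at (row r, col c) is 10*r + c:
ref = [[r * 10 + c for c in range(8)] for r in range(8)]
print(fetch_prediction_block(ref, x=2, y=2, mv=(1, -1), m=2, n=2))
# [[13, 14], [23, 24]]
```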
With reference to the foregoing possible implementations, in a possible implementation, determining the motion compensation mode of the current image block according to the attribute information of the initial prediction block specifically includes: obtaining M × N pixel difference values from the pixel values of the M × N pixel points of the first initial prediction block and those of the second initial prediction block, determining the texture complexity of the current image block according to the M × N pixel difference values, and determining the motion compensation mode according to the texture complexity of the current image block.
Optionally, in another possible implementation of this application, determining the texture complexity of the current image block according to the M × N pixel difference values includes: calculating the sum of the absolute values of the M × N pixel difference values, and taking that sum as the texture complexity of the current image block.
Optionally, in another possible implementation of this application, determining the texture complexity of the current image block according to the M × N pixel difference values includes: calculating the average of the M × N pixel difference values, and taking that average as the texture complexity of the current image block.
Optionally, in another possible implementation of this application, determining the texture complexity of the current image block according to the M × N pixel difference values includes: calculating the standard deviation of the M × N pixel difference values, and taking that standard deviation as the texture complexity of the current image block.
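The three alternative texture-complexity measures above (sum of absolute differences, mean, and standard deviation of the per-pixel differences between the two initial prediction blocks) can be sketched in a few lines. This is an illustrative pure-Python rendering, not the patent's normative computation.

```python
# Sketch of the three texture-complexity measures over the M x N pixel
# differences between the first and second initial prediction blocks.
def pixel_differences(block0, block1):
    """Flattened per-pixel differences between two equally sized blocks."""
    return [p0 - p1 for r0, r1 in zip(block0, block1) for p0, p1 in zip(r0, r1)]

def complexity_sad(diffs):   # sum of absolute values of the differences
    return sum(abs(d) for d in diffs)

def complexity_mean(diffs):  # average of the (signed) differences
    return sum(diffs) / len(diffs)

def complexity_std(diffs):   # standard deviation of the differences
    mean = sum(diffs) / len(diffs)
    return (sum((d - mean) ** 2 for d in diffs) / len(diffs)) ** 0.5

diffs = pixel_differences([[100, 102], [98, 101]], [[104, 100], [102, 103]])
print(diffs)                 # [-4, 2, -4, -2]
print(complexity_sad(diffs)) # 12
```

Intuitively, when the two initial prediction blocks agree closely (small differences), the block is easy to predict and the cheap weighted-prediction path suffices; large disagreement suggests complex texture or motion where BIO's pixel-level refinement pays off.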
Optionally, in another possible implementation of this application, determining the motion compensation mode according to the texture complexity of the current image block specifically includes: determining whether the texture complexity of the current image block is smaller than a first threshold, where the first threshold is any real number greater than 0; if the texture complexity of the current image block is smaller than the first threshold, determining the motion compensation mode to be the weighted prediction technique based on bidirectional prediction; and if the texture complexity of the current image block is greater than or equal to the first threshold, determining the motion compensation mode to be the optical flow technique based on bidirectional prediction.
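The threshold rule above reduces to a single comparison. The threshold value used below is purely illustrative; the patent only requires it to be a positive real number.

```python
# Sketch of the first-threshold decision: low texture complexity selects
# the cheap weighted prediction; high complexity selects BIO.
def choose_mode_by_texture(texture_complexity, first_threshold):
    if texture_complexity < first_threshold:
        return "weighted_prediction"
    return "bio"

print(choose_mode_by_texture(5.0, first_threshold=10.0))   # weighted_prediction
print(choose_mode_by_texture(25.0, first_threshold=10.0))  # bio
```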
With reference to the foregoing possible implementations, in a possible implementation, the motion amplitude of the current image block in the embodiments of this application is determined from the motion information, and determining the motion compensation mode according to the motion information and the attribute information of the initial prediction block specifically includes: determining a first motion amplitude of the current image block according to the first motion vector, and determining a second motion amplitude of the current image block according to the second motion vector; and determining the motion compensation mode according to the first motion amplitude, the second motion amplitude, and the attribute information of the initial prediction block.
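The text does not fix a formula for the motion amplitude; one plausible choice, shown here only as an assumption, is the Euclidean length of the motion vector.

```python
# Illustrative motion-amplitude measure (Euclidean norm of the motion
# vector); the patent text leaves the exact measure unspecified.
def motion_amplitude(mv):
    dx, dy = mv
    return (dx * dx + dy * dy) ** 0.5

print(motion_amplitude((3, 4)))  # 5.0
```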
Optionally, in another possible implementation of this application, the motion compensation mode is determined according to the first motion amplitude, the second motion amplitude, and the attribute information of the initial prediction block, where the attribute information of the initial prediction block may be the pixel values of its pixel points; this attribute information may be acquired as in the possible implementations above. Determining the motion compensation mode includes: obtaining M × N pixel difference values from the pixel values of the M × N pixel points of the first initial prediction block and those of the second initial prediction block; determining the texture complexity of the current image block according to the M × N pixel difference values; determining a selection probability according to the texture complexity of the current image block, the first motion amplitude, the second motion amplitude, and a first mathematical model, or determining the selection probability by querying a first mapping table according to the texture complexity of the current image block, the first motion amplitude, and the second motion amplitude, where the first mapping table contains the correspondence between the selection probability and the texture complexity of the current image block, the first motion amplitude, and the second motion amplitude; and determining the motion compensation mode according to the selection probability.
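The mapping-table variant can be sketched as below. The quantization steps, table keys, and probability values here are entirely invented for the example; the patent specifies only that such a table maps (texture complexity, first motion amplitude, second motion amplitude) to a selection probability.

```python
# Illustrative sketch of the first-mapping-table lookup. Bucket sizes and
# table entries are hypothetical placeholders, not values from the patent.
def quantize(value, step):
    return int(value // step)

FIRST_MAPPING_TABLE = {
    (0, 0, 0): 0.1,  # smooth texture, small motion -> weighted prediction likely
    (1, 1, 1): 0.8,  # complex texture, large motion -> BIO likely
}

def selection_probability(texture, amp0, amp1, default=0.5):
    key = (quantize(texture, 16), quantize(amp0, 4), quantize(amp1, 4))
    return FIRST_MAPPING_TABLE.get(key, default)

print(selection_probability(8.0, 2.0, 3.0))   # 0.1
print(selection_probability(20.0, 5.0, 6.0))  # 0.8
```

A trained mathematical model (the "first mathematical model") would play the same role as the table: a function from the three inputs to a probability of selecting the BIO path.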
With reference to the first aspect, in a possible implementation, the motion information includes a first motion vector and a second motion vector, and determining the motion compensation mode of the current image block according to the motion information and the attribute information of the current image block includes: determining a selection probability according to the size of the current image block, the horizontal and vertical components of the first motion vector, the horizontal and vertical components of the second motion vector, and a second mathematical model, where the first motion vector consists of its horizontal and vertical components and the second motion vector consists of its horizontal and vertical components; or determining the selection probability by querying a second mapping table according to the size of the current image block and the horizontal and vertical components of the first and second motion vectors, where the second mapping table contains the correspondence between the selection probability and the size of the current image block and the horizontal and vertical components of the first and second motion vectors; and determining the motion compensation mode according to the selection probability.
Optionally, in another possible implementation of this application, determining the motion compensation mode according to the selection probability specifically includes: determining whether the selection probability is greater than a second threshold, where the second threshold is any real number between 0 and 1 inclusive; if the selection probability is greater than the second threshold, determining the motion compensation mode to be the optical flow technique based on bidirectional prediction; and if the selection probability is less than or equal to the second threshold, determining the motion compensation mode to be the weighted prediction technique based on bidirectional prediction.
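This second-threshold rule is again a single comparison; the default threshold of 0.5 below is illustrative only.

```python
# Sketch of the second-threshold rule on the selection probability.
def choose_mode_by_probability(p, second_threshold=0.5):
    return "bio" if p > second_threshold else "weighted_prediction"

print(choose_mode_by_probability(0.8))  # bio
print(choose_mode_by_probability(0.3))  # weighted_prediction
```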
In a second aspect of the embodiments of the present application, there is provided an encoding method, including: the bidirectional inter-frame prediction method in any aspect is used in an encoding process, and a current image block is an image block to be encoded.
In a third aspect of the embodiments of the present application, there is provided a decoding method, including: the bidirectional inter-frame prediction method in any aspect is used in a decoding process, and a current image block is an image block to be decoded.
In a fourth aspect of the embodiments of the present application, there is provided a bidirectional inter-frame prediction apparatus, including: a motion estimation unit, a determination unit and a motion compensation unit.
Specifically, the motion estimation unit is configured to obtain motion information of a current image block, where the current image block is an image block to be encoded or an image block to be decoded; the determining unit is configured to obtain an initial prediction block of the current image block according to the motion information; the determining unit is further configured to determine a motion compensation mode of the current image block according to the attribute information of the initial prediction block, or according to the motion information and the attribute information of the current image block, where the motion compensation mode is a weighted prediction technology based on bidirectional prediction or an optical flow technology based on bidirectional prediction; the motion compensation unit is configured to perform motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
According to the bidirectional inter-frame prediction method provided by the embodiments of this application, a suitable motion compensation mode is determined from the characteristics of the current image block and of its initial prediction block before motion compensation is performed, so that both a high compression ratio and low encoding/decoding complexity are taken into account, effectively achieving the optimal balance between compression ratio and complexity.
The motion information described in the embodiments of this application includes a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. With reference to the fourth aspect, in a possible implementation, the determining unit is specifically configured to: determine a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, where the first reference frame index represents the index of the frame in which a forward reference block of the current image block is located, the first motion vector represents the motion displacement of the current image block relative to the forward reference block, the attribute information of the first initial prediction block includes the pixel values of M × N pixel points, and M and N are integers greater than or equal to 1; and determine a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, where the second reference frame index represents the index of the frame in which a backward reference block of the current image block is located, the second motion vector represents the motion displacement of the current image block relative to the backward reference block, and the attribute information of the second initial prediction block includes the pixel values of M × N pixel points.
With reference to the foregoing possible implementation, in a possible implementation, the determining unit is specifically configured to: obtain M × N pixel difference values from the pixel values of the M × N pixel points of the first initial prediction block and those of the second initial prediction block; determine the texture complexity of the current image block according to the M × N pixel difference values; and determine the motion compensation mode according to the texture complexity of the current image block.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: calculate the sum of the absolute values of the M × N pixel difference values, and take that sum as the texture complexity of the current image block.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: calculate the average of the M × N pixel difference values, and take that average as the texture complexity of the current image block.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: calculate the standard deviation of the M × N pixel difference values, and take that standard deviation as the texture complexity of the current image block.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: determine whether the texture complexity of the current image block is smaller than a first threshold, where the first threshold is any real number greater than 0; if the texture complexity of the current image block is smaller than the first threshold, determine the motion compensation mode to be the weighted prediction technique based on bidirectional prediction; and if the texture complexity of the current image block is greater than or equal to the first threshold, determine the motion compensation mode to be the optical flow technique based on bidirectional prediction.
With reference to the foregoing possible implementation, in a possible implementation, the motion amplitude of the current image block in the embodiments of this application is determined from the motion information, and the determining unit is specifically configured to: determine a first motion amplitude of the current image block according to the first motion vector, and determine a second motion amplitude of the current image block according to the second motion vector; and determine the motion compensation mode according to the first motion amplitude, the second motion amplitude, and the attribute information of the initial prediction block.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: obtain M × N pixel difference values from the pixel values of the M × N pixel points of the first initial prediction block and those of the second initial prediction block; determine the texture complexity of the current image block according to the M × N pixel difference values; determine a selection probability according to the texture complexity of the current image block, the first motion amplitude, the second motion amplitude, and a first mathematical model, or determine the selection probability by querying a first mapping table according to the texture complexity of the current image block, the first motion amplitude, and the second motion amplitude, where the first mapping table contains the correspondence between the selection probability and the texture complexity of the current image block, the first motion amplitude, and the second motion amplitude; and determine the motion compensation mode according to the selection probability.
With reference to the fourth aspect, in a possible implementation, the motion information includes a first motion vector and a second motion vector, and the determining unit is specifically configured to: determine a selection probability according to the size of the current image block, the horizontal and vertical components of the first motion vector, the horizontal and vertical components of the second motion vector, and a second mathematical model, where the first motion vector consists of its horizontal and vertical components and the second motion vector consists of its horizontal and vertical components; or determine the selection probability by querying a second mapping table according to the size of the current image block and the horizontal and vertical components of the first and second motion vectors, where the second mapping table contains the correspondence between the selection probability and the size of the current image block and the horizontal and vertical components of the first and second motion vectors; and determine the motion compensation mode according to the selection probability.
Optionally, in another possible implementation of this application, the determining unit is specifically configured to: determine whether the selection probability is greater than a second threshold, where the second threshold is any real number between 0 and 1 inclusive; if the selection probability is greater than the second threshold, determine the motion compensation mode to be the optical flow technique based on bidirectional prediction; and if the selection probability is less than or equal to the second threshold, determine the motion compensation mode to be the weighted prediction technique based on bidirectional prediction.
In a fifth aspect of the embodiments of the present application, a terminal is provided, where the terminal includes: one or more processors, memory, and a communication interface; the memory and the communication interface are connected with the one or more processors; the terminal communicates with other devices through a communication interface, and the memory is configured to store computer program code comprising instructions which, when executed by the one or more processors, cause the terminal to perform the bidirectional inter prediction method of any of the above aspects.
In a sixth aspect of the embodiments of the present application, there is provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the bidirectional inter prediction method of any of the above aspects.
A seventh aspect of embodiments of the present application provides a computer-readable storage medium, which includes instructions that, when executed on a terminal, cause the terminal to perform the bidirectional inter-frame prediction method of any of the above aspects.
In an eighth aspect of the embodiments of the present application, a video encoder is provided, where the video encoder includes a nonvolatile storage medium and a central processing unit, where the nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium, and when the central processing unit executes the executable program, the video encoder executes the bidirectional inter-frame prediction method in any aspect.
In a ninth aspect of the embodiments of the present application, there is provided a video decoder, including a nonvolatile storage medium and a central processing unit, where the nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium, and when the central processing unit executes the executable program, the video decoder executes the bidirectional inter-frame prediction method in any aspect.
In addition, the technical effects brought by the design manners of any aspect can be referred to the technical effects brought by the different design manners in the first aspect, and are not described herein again.
In the embodiment of the present application, the names of the bidirectional inter-frame prediction apparatus and the terminal do not limit the devices themselves; in practical implementations, the devices may appear under other names. As long as the function of each device is similar to that in the embodiments of the present application, it falls within the scope of the claims of the present application and their equivalents.
Drawings
Fig. 1 is a simplified schematic diagram of a video transmission system architecture according to an embodiment of the present application;
FIG. 2 is a simplified schematic diagram of a video encoder according to an embodiment of the present application;
FIG. 3 is a simplified schematic diagram of a video decoder according to an embodiment of the present application;
fig. 4 is a flowchart of a bidirectional inter-frame prediction method according to an embodiment of the present application;
fig. 5 is a schematic diagram of a motion of a current image block according to an embodiment of the present application;
FIG. 6 is a flowchart of another bi-directional inter prediction method according to an embodiment of the present application;
FIG. 7 is a flowchart of another bi-directional inter prediction method according to an embodiment of the present application;
fig. 8 is a schematic diagram of obtaining M × N pixel difference values according to an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating a method for bi-directional inter prediction according to an embodiment of the present application;
FIG. 10 is a flowchart illustrating a method for bi-directional inter prediction according to an embodiment of the present application;
fig. 11 is a schematic diagram illustrating a bidirectional inter-frame prediction apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic diagram illustrating another bi-directional inter prediction apparatus according to an embodiment of the present application.
Detailed Description
The terms "first", "second", and the like in the specification and claims of the present application are used for distinguishing between different objects and not for indicating a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
For the convenience of understanding the embodiments of the present application, relevant elements related to the embodiments of the present application will be described first.
Video encoding (video encoding): the process of compressing a video (an image sequence) into a code stream.
Video decoding (video decoding): the process of restoring a code stream into reconstructed images according to specific syntax rules and processing methods.
In most coding frameworks, a video comprises a series of pictures (pictures), and one picture is called a frame (frame). An image is divided into at least one slice, and each slice is in turn divided into image blocks (blocks). Video encoding and video decoding operate in units of image blocks. For example, the encoding process or the decoding process may be performed from left to right and from top to bottom, one row after another, starting from the upper-left corner of the image. Here, an image block may be a macroblock (MB) in the video coding and decoding standard H.264, or a coding unit (CU) in the High Efficiency Video Coding (HEVC) standard, which is not specifically limited in this embodiment of the present application.
In the embodiment of the present application, an image block that is being encoded or decoded is referred to as a current image block, and the image in which the current image block is located is referred to as the current frame (current image).
In video coding, frames may be classified into I frames, P frames, and B frames according to the prediction type of the current image block. An I frame is a frame encoded as an independent still picture and provides random access points in the video stream. A P frame is a frame predicted from the adjacent I frame or P frame preceding it, and can serve as a reference frame for a subsequent P frame or B frame. A B frame is a frame obtained by bidirectional prediction using its two nearest neighboring frames, one before and one after (each of which may be an I frame or a P frame), as reference frames. In the embodiment of the present application, the current frame refers to a bidirectionally predicted frame (B frame).
Because strong temporal correlation exists between consecutive frames in a video, that is, adjacent frames contain a great deal of redundancy, video coding typically exploits the temporal correlation between frames to reduce inter-frame redundancy and thereby compress the data. At present, video is mainly encoded using motion-compensated inter-frame prediction to improve the compression ratio.
Inter prediction refers to prediction performed using correlation between a current frame, which may have one or more reference frames, and its reference frame in units of encoded image blocks or decoded image blocks. Specifically, a prediction block for a current image block is generated based on pixels in a reference frame for the current image block.
Specifically, when the encoding end encodes a current image block in the current frame, it first selects one or more reference frames from the encoded frames of the video image, obtains a prediction block corresponding to the current image block from the reference frame(s), calculates a residual value between the prediction block and the current image block, and performs quantization and coding on the residual value. When decoding a current image block in the current frame, the decoding end first obtains the prediction block corresponding to the current image block, then obtains the residual value between the prediction block and the current image block from the received code stream, and reconstructs the current image block from the residual value and the prediction block.
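The residual round trip described above can be sketched as follows. This is an illustrative sketch; real codecs transform and entropy-code the residual, and here a plain uniform quantizer with an assumed step size stands in for that pipeline.

```python
import numpy as np

# Assumed quantization step for illustration only.
QSTEP = 4

def encode_residual(current_block: np.ndarray,
                    prediction_block: np.ndarray) -> np.ndarray:
    """Encoder side: residual = current - prediction, then quantize."""
    residual = current_block.astype(np.int32) - prediction_block.astype(np.int32)
    return np.round(residual / QSTEP).astype(np.int32)  # quantized residual

def decode_block(prediction_block: np.ndarray,
                 quantized_residual: np.ndarray) -> np.ndarray:
    """Decoder side: reconstruct = prediction + dequantized residual."""
    return prediction_block.astype(np.int32) + quantized_residual * QSTEP
```

The better the prediction block matches the current image block, the smaller the residual, and the fewer bits the quantized residual costs, which is why prediction quality drives compression efficiency.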
Temporal correlation between a current frame and other frames in a video is manifested not only by the presence of temporal correlation between the current frame and frames encoded before it, but also by the presence of temporal correlation between the current frame and frames encoded after it. Therefore, when video coding is performed, bidirectional inter-frame prediction can be considered to obtain a better coding effect.
In general, for a current image block, a prediction block for the current image block may be generated from only one reference block, or may be generated from two reference blocks. The above-described generation of a prediction block of a current image block from one reference block is called unidirectional inter prediction, and the above-described generation of a prediction block of a current image block from two reference blocks is called bidirectional inter prediction. The two reference image blocks in bi-directional inter prediction may be from the same reference frame or different reference frames.
Alternatively, bidirectional inter prediction may refer to inter prediction using correlation between a current video frame and a video frame encoded before and played before it, and correlation between a current video frame and a video frame encoded before and played after it.
It can be seen that the above bi-directional inter prediction involves inter prediction in two directions, commonly referred to as: forward inter prediction and backward inter prediction. Forward inter prediction refers to inter prediction that exploits the correlation between a current video frame and a video frame that was encoded before and played before it. Backward inter prediction refers to inter prediction that exploits the correlation between a current video frame and a video frame that was encoded before and played after it.
Motion compensation is a method for describing the difference between adjacent frames ("adjacent" here means adjacent in coding order; the two frames are not necessarily adjacent in playing order). It is the process of finding the reference block of the current image block according to the motion information and processing that reference block to obtain the prediction block of the current image block, and it is one step in the inter-frame prediction process.
For bidirectional inter-frame prediction, either a weighted prediction technique based on bidirectional prediction is used, which performs weighted prediction on the pixel values at the same pixel positions in the forward prediction block and the backward prediction block of the current image block to obtain the prediction block of the current image block, or an optical flow technique based on bidirectional prediction is used, which determines the prediction block of the current image block from the forward prediction block and the backward prediction block of the current image block. However, the weighted prediction technique based on bidirectional prediction is computationally simple but has low compression efficiency, while the optical flow technique based on bidirectional prediction has high compression efficiency but high computational complexity. Therefore, how to select the motion compensation technique in bidirectional prediction so as to achieve the optimal tradeoff between compression ratio and computational complexity is a problem that urgently needs to be solved.
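The weighted prediction technique mentioned above can be sketched as a per-pixel weighted mean of the co-located forward and backward prediction pixels. Equal weights (0.5/0.5) are assumed here purely for illustration; the actual weights used by a codec may differ.

```python
import numpy as np

# Minimal sketch of bi-prediction weighted averaging: each pixel of the
# prediction block is a weighted mean of the co-located pixels in the
# forward and backward prediction blocks. Equal weights are an assumption.
def weighted_bi_prediction(forward_pred: np.ndarray,
                           backward_pred: np.ndarray,
                           w0: float = 0.5, w1: float = 0.5) -> np.ndarray:
    assert forward_pred.shape == backward_pred.shape
    pred = w0 * forward_pred.astype(np.float64) + w1 * backward_pred.astype(np.float64)
    return np.round(pred).astype(np.int32)
```

This is one multiply-add per pixel, which is why the text describes the weighted technique as computationally simple compared with the per-pixel gradient computations of the optical flow technique.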
In view of the foregoing problems, an embodiment of the present application provides a bidirectional inter-frame prediction method whose basic principle is as follows: after the motion information of the current image block is acquired, an initial prediction block of the current image block is acquired according to the motion information; a motion compensation mode of the current image block is then determined according to attribute information of the initial prediction block, or according to the motion information and attribute information of the current image block; and motion compensation is then performed on the current image block according to the determined motion compensation mode and the initial prediction block. The current image block is an image block to be encoded or an image block to be decoded. The motion compensation mode is either the weighted prediction technique based on bidirectional prediction or the optical flow technique based on bidirectional prediction. In this way, when motion compensation is performed on the current image block, a suitable motion compensation mode is determined according to the characteristics of the current image block and of its initial prediction block, taking into account both high compression ratio and low encoding/decoding complexity, which effectively achieves an optimal balance between compression ratio and complexity.
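The basic principle above can be sketched as a pipeline in which every step is supplied by the caller. All function parameters here are placeholders standing in for the steps of the method, not APIs of any real codec.

```python
# Hypothetical sketch of the prediction flow summarized above; every
# callable is a placeholder for one step of the method.
def bidirectional_inter_predict(current_block,
                                get_motion_info,
                                get_initial_prediction,
                                decide_mode,
                                weighted_compensate,
                                optical_flow_compensate):
    motion_info = get_motion_info(current_block)          # from estimation or code stream
    initial_pred = get_initial_prediction(motion_info)    # forward + backward blocks
    mode = decide_mode(current_block, motion_info, initial_pred)
    if mode == "bio":                                     # optical flow technique
        return optical_flow_compensate(initial_pred, motion_info)
    return weighted_compensate(initial_pred)              # weighted prediction technique
```

The point of the design is that the mode decision sits between fetching the initial prediction block and the final compensation, so the cheap weighted path is taken whenever the block's characteristics do not justify the costlier optical flow path.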
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
The bidirectional interframe prediction method provided by the embodiment of the application is suitable for a video transmission system. Fig. 1 is a simplified schematic diagram illustrating an architecture of a video transmission system 100 to which embodiments of the present application may be applied. As shown in fig. 1, the video transmission system includes a source device and a destination device.
The source device includes a video source 101, a video encoder 102, and an output interface 103.
In some examples, video source 101 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of the aforementioned video data sources. The video source 101 is configured to collect video data, perform pre-encoding processing on the collected video data, convert an optical signal into a digitized image sequence, and transmit the digitized image sequence to the video encoder 102.
The video encoder 102 is configured to encode a sequence of images from the video source 101 to obtain a code stream.
Output interface 103 may include a modulator/demodulator (modem) and/or a transmitter. The output interface 103 is configured to send out a code stream encoded by the video encoder 102.
In some examples, the source device transmits the encoded code stream directly to the destination device via output interface 103. The encoded code stream may also be stored on a storage medium or a file server, such as storage device 107, for later access by the destination device for decoding and/or playback.
The destination device includes an input interface 104, a video decoder 105, and a display device 106.
In some examples, input interface 104 includes a receiver and/or a modem. The input interface 104 may receive the code stream transmitted via the network 108 sent by the output interface 103 and transmit the code stream to the video decoder 105. The network 108 may be an IP network including routers and switches, among others.
The video decoder 105 is configured to decode the code stream received by the input interface 104, and reconstruct an image sequence. The video encoder 102 and the video decoder 105 may operate according to a video compression standard, such as the high efficiency video codec h.265 standard.
The display device 106 may be integrated with the destination device or may be external to the destination device. In general, the display device 106 displays decoded video data. The display device 106 may include a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or other types of display devices.
The destination device may further include a rendering module for rendering the reconstructed image sequence decoded by the video decoder 105 to improve the display effect of the video.
Specifically, the bidirectional inter-frame prediction method according to the embodiment of the present application can be performed by the video encoder 102 and the video decoder 105 in the video transmission system shown in fig. 1.
A brief description of the video encoder and video decoder is provided below in conjunction with fig. 2 and 3.
Fig. 2 is a simplified schematic diagram of a video encoder 200 according to an embodiment of the present application. The video encoder 200 includes an inter predictor 201, an intra predictor 202, a summer 203, a transformer 204, a quantizer 205, and an entropy encoder 206. For image block reconstruction, the video encoder 200 further includes an inverse quantizer 207, an inverse transformer 208, a summer 209, and a filter unit 210. The inter predictor 201 includes a motion estimation unit and a motion compensation unit. The intra predictor 202 includes an intra prediction selection unit and an intra prediction unit. Filter unit 210 is intended to represent one or more loop filters, such as deblocking filters, adaptive loop filters (ALF), and sample adaptive offset (SAO) filters. Although filter unit 210 is shown in fig. 2 as an in-loop filter, in other implementations, filter unit 210 may be implemented as a post-loop filter. In one example, the video encoder 200 may further include a video data memory and a partitioning unit (not shown). The video data memory may store video data to be encoded by the components of the video encoder 200. The video data stored in the video data memory may be obtained from a video source. DPB 107 may be a reference picture memory that stores reference video data used by video encoder 200 to encode video data in intra or inter coding modes. The video data memory and DPB 107 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with the other components of video encoder 200, or off-chip relative to those components.
The video encoder 200 receives video data and stores the video data in the video data memory. The partitioning unit partitions the video data into image blocks, and these image blocks may be further partitioned into smaller blocks, e.g., image block partitions based on a quadtree structure or a binary tree structure. This segmentation may also include segmentation into slices (slices), tiles (tiles), or other larger units. Video encoder 200 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into a plurality of image blocks (and possibly into sets of image blocks called tiles).
After the video data is segmented to obtain a current image block, inter prediction may be performed on the current image block by the inter predictor 201. Inter-frame prediction refers to finding, in a reconstructed image, a matched reference block for the current image block in the current image, so as to obtain motion information of the current image block, and then calculating the prediction information (prediction block) of the pixel values of the pixel points in the current image block according to the motion information. The process of calculating motion information is called motion estimation. The motion estimation process tries multiple reference blocks in the reference picture for the current image block, and which reference block or blocks are ultimately used for prediction is determined using rate-distortion optimization (RDO) or other methods. The process of calculating the prediction block of the current image block is called motion compensation. Specifically, the bidirectional inter-frame prediction method according to the embodiment of the present application may be executed by the inter predictor 201.
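The motion estimation step above can be sketched as a full search minimizing the sum of absolute differences (SAD). This is an illustrative sketch only: real encoders use faster search patterns plus rate-distortion optimization rather than exhaustive SAD search, and the integer-pel accuracy and search range here are assumptions.

```python
import numpy as np

# Illustrative full-search motion estimation by sum of absolute differences.
def full_search_motion_estimation(cur_block: np.ndarray, ref_frame: np.ndarray,
                                  top: int, left: int, search_range: int = 4):
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + h, x:x + w]
            sad = int(np.abs(cand.astype(np.int32) - cur_block.astype(np.int32)).sum())
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad  # motion vector (MVx, MVy) and its matching cost
```

A production encoder would replace the SAD cost with a rate-distortion cost that also accounts for the bits needed to code the motion vector, as the text notes.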
After the video data is segmented to obtain the current image block, intra prediction may be performed on the current image block by the intra predictor 202. Intra-frame prediction refers to predicting the pixel values of the pixel points in the current image block by using the pixel values of pixel points in reconstructed image blocks of the image in which the current image block is located.
After the inter predictor 201 or the intra predictor 202 generates the prediction block of the current image block, the video encoder 200 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 203 represents the one or more components that perform this subtraction operation. The residual video data in the residual block may be included in one or more transform units (TUs) and applied to transformer 204. The transformer 204 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform or a conceptually similar transform. Transformer 204 may convert the residual video data from the pixel value domain to a transform domain, such as the frequency domain.
Transformer 204 may send the resulting transform coefficients to quantizer 205. Quantizer 205 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 205 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, the entropy encoder 206 may perform the scan.
After quantization, the entropy encoder 206 entropy encodes the quantized transform coefficients. For example, the entropy encoder 206 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by the entropy encoder 206, the encoded codestream may be transmitted to the video decoder 300, or archived for later transmission or retrieval by the video decoder 300. The entropy encoder 206 may also entropy encode syntax elements of the current image block to be encoded.
Inverse quantizer 207 and inverse transformer 208 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference picture. The summer 209 adds the reconstructed residual block to the prediction block produced by the inter predictor 201 or the intra predictor 202 to produce a reconstructed image block. The filter unit 210 may be applied to the reconstructed image block to reduce distortions such as blocking artifacts. This reconstructed image block is then stored as a reference block in the decoded picture buffer, where it may be used by the inter predictor 201 as a reference block for inter prediction of blocks in subsequent video frames or images.
It should be understood that other structural variations of the video encoder 200 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 200 may quantize the residual signal directly without processing by the transformer 204 and, correspondingly, without processing by the inverse transformer 208. Alternatively, for some image blocks or image frames, the video encoder 200 does not generate residual data and accordingly does not need processing by the transformer 204, the quantizer 205, the inverse quantizer 207, and the inverse transformer 208. Alternatively, the video encoder 200 may store the reconstructed image block directly as a reference block without processing by the filter unit 210. Alternatively, the quantizer 205 and the inverse quantizer 207 in the video encoder 200 may be combined together.
The video encoder 200 is used to output video to a post-processing entity 211. Post-processing entity 211 represents an example of a video entity, such as a media-aware network element (MANE) or a splicing/editing device, that may process encoded video data from video encoder 200. In some cases, post-processing entity 211 may be an instance of a network entity. In some video encoding systems, post-processing entity 211 and video encoder 200 may be parts of separate devices, while in other cases the functionality described with respect to post-processing entity 211 may be performed by the same device that includes video encoder 200. In some examples, the post-processing entity 211 is an example of the storage device 107 of fig. 1.
Fig. 3 is a simplified schematic diagram of a video decoder 300 according to an embodiment of the present application. The video decoder 300 includes an entropy decoder 301, an inverse quantizer 302, an inverse transformer 303, a summer 304, a filter unit 305, an inter predictor 306, and an intra predictor 307. Video decoder 300 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 200 in fig. 2. First, the entropy decoder 301, the inverse quantizer 302, and the inverse transformer 303 are used to obtain the residual information, and the decoded code stream indicates whether the current image block uses intra prediction or inter prediction. If intra prediction is used, the intra predictor 307 constructs the prediction information from the pixel values of pixel points in the surrounding reconstructed region according to the intra prediction method used. If inter prediction is used, the inter predictor 306 parses the motion information, determines a reference block in the reconstructed image using the parsed motion information, takes the pixel values of the pixel points in that block as prediction information, adds the prediction information to the residual information, and performs a filtering operation to obtain the reconstruction information.
The bidirectional interframe prediction method disclosed by the embodiment of the application is not only suitable for wireless application scenes, but also can be applied to video coding and decoding supporting various multimedia applications such as the following applications: over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, a video codec system may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The bidirectional inter-frame prediction method provided in the embodiment of the present application may be executed by a bidirectional inter-frame prediction apparatus, a video encoding and decoding apparatus, a video encoder and decoder, or other devices with video encoding and decoding functions, which is not specifically limited in this embodiment of the present application.
For convenience of explanation, the following description will be made of a bidirectional inter-prediction method using a bidirectional inter-prediction apparatus as an execution subject.
Fig. 4 is a flowchart illustrating a bidirectional inter-frame prediction method according to an embodiment of the present disclosure. The bi-directional inter prediction method shown in fig. 4 may occur in either an encoding process or a decoding process. For example, the bi-directional inter prediction method shown in fig. 4 may occur in an inter prediction process at the time of encoding and decoding. As shown in fig. 4, the bidirectional inter prediction method includes:
s401, the bidirectional inter-frame prediction device obtains motion information of the current image block.
The current image block is an image block to be encoded or an image block to be decoded. If the current image block is an image block to be encoded, the motion information of the current image block can be obtained according to motion estimation. If the current image block is the image block to be decoded, the motion information of the current image block can be obtained by decoding according to the code stream.
The motion information mainly includes the prediction direction information of the current image block, the reference frame index of the current image block, and the motion vector of the current image block. The prediction direction information of the current image block includes forward prediction, backward prediction, and bi-directional prediction. The reference frame index of the current image block indicates an index of a frame in which the reference block of the current image block is located. The reference frame index of the current image block comprises a forward reference frame index of the current image block and a backward reference frame index of the current image block according to different prediction directions. The motion vector of the current image block represents the motion displacement of the current image block relative to the reference block.
The motion vector comprises a horizontal component (denoted as MVx) and a vertical component (denoted as MVy). The horizontal component represents the motion displacement of the current image block relative to the reference block in the horizontal direction. The vertical component represents the motion displacement of the current image block relative to the reference block in the vertical direction. If the prediction direction information indicates forward prediction or backward prediction, there is only one motion vector; if the prediction direction information indicates bidirectional prediction, there are two motion vectors. For example, the motion information for bidirectional prediction includes a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector. The first reference frame index is used to indicate the index of the frame in which the forward reference block of the current image block is located. The first motion vector is used to represent the motion displacement of the current image block relative to the forward reference block. The second reference frame index is used to indicate the index of the frame in which the backward reference block of the current image block is located. The second motion vector is used to represent the motion displacement of the current image block relative to the backward reference block.
For example, as shown in fig. 5, B represents the current image block. The frame in which the current image block is located is the current frame. A denotes the forward reference block; the frame in which the forward reference block is located is the forward reference frame. C denotes the backward reference block; the frame in which the backward reference block is located is the backward reference frame. 0 denotes the forward direction and 1 denotes the backward direction. MV0 denotes the forward motion vector, MV0 = (MV0x, MV0y), where MV0x represents the horizontal component of the forward motion vector and MV0y represents the vertical component of the forward motion vector. MV1 denotes the backward motion vector, MV1 = (MV1x, MV1y), where MV1x represents the horizontal component of the backward motion vector and MV1y represents the vertical component of the backward motion vector. The dotted line represents the motion trajectory of the current image block B.
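The bi-prediction motion information described above can be gathered into a simple container. The field names below are hypothetical and chosen for illustration; they are not the codec's syntax elements.

```python
from dataclasses import dataclass

# Illustrative container for the bi-prediction motion information
# described above; all field names are hypothetical.
@dataclass
class MotionVector:
    x: int  # horizontal component (MVx)
    y: int  # vertical component (MVy)

@dataclass
class BiPredMotionInfo:
    forward_ref_idx: int   # first reference frame index (forward)
    backward_ref_idx: int  # second reference frame index (backward)
    mv0: MotionVector      # first (forward) motion vector
    mv1: MotionVector      # second (backward) motion vector
```

For forward-only or backward-only prediction, only one reference frame index and one motion vector would be populated, matching the single-motion-vector case above.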
S402, the bidirectional inter-frame prediction device acquires an initial prediction block of the current image block according to the motion information.
The process of obtaining an initial prediction block of a current image block according to motion information may refer to the prior art, and the initial prediction block of the current image block includes a forward prediction block and a backward prediction block. For example, as shown in fig. 6, S402 may be implemented by the following detailed steps.
S601, the bidirectional inter-frame prediction device determines a first initial prediction block of the current image block according to the first reference frame index and the first motion vector.
First, the bidirectional inter-frame prediction device determines, according to the first reference frame index, the first reference frame in which the first reference block of the current image block is located; then it determines the first reference block of the current image block in the first reference frame according to the first motion vector, and performs sub-pixel interpolation on the first reference block to obtain the first initial prediction block. The first initial prediction block may refer to the forward prediction block of the current image block.
Assume that the first reference frame index is a forward reference frame index. For example, as shown in fig. 5, the forward reference frame in which the forward reference block A of the current image block B is located is first determined according to the forward reference frame index. Then the coordinate point (i', j') corresponding to the coordinate (i, j) of the current image block is found in the forward reference frame, and a block B' in the forward reference frame is determined according to the length and width of the current image block B. The block B' is moved to the forward reference block A according to the forward motion vector MV0 = (MV0x, MV0y) of the current image block B, and sub-pixel interpolation is performed on the forward reference block A to obtain the forward prediction block of the current image block B. (i, j) represents the coordinates of the upper left corner point of the current image block B in the current frame; the origin of coordinates of the current frame is the upper left corner point of the current frame in which the current image block B is located. (i', j') represents the coordinates of the upper left corner point of block B' in the forward reference frame; the origin of coordinates of the forward reference frame is the upper left corner point of the forward reference frame in which block B' is located.
S602, the bidirectional inter-frame prediction device determines a second initial prediction block of the current image block according to the second reference frame index and the second motion vector.
First, the bidirectional inter-frame prediction device determines, according to the second reference frame index, the second reference frame in which the second reference block of the current image block is located; then it determines the second reference block of the current image block in the second reference frame according to the second motion vector, and performs sub-pixel interpolation on the second reference block to obtain the second initial prediction block. The second initial prediction block may refer to the backward prediction block of the current image block.
It should be noted that the process of determining the backward prediction block of the current image block is the same as the process of determining the forward prediction block of the current image block, and only the reference direction is different, and the specific method may refer to the explanation in S601. If the current image block is not bi-predicted, the forward prediction block or the backward prediction block obtained at this time is the prediction block of the current image block.
S403a, the bi-directional inter-prediction apparatus determines the motion compensation mode of the current image block according to the attribute information of the initial prediction block.
The attribute information of the initial prediction block comprises the size of the initial prediction block, the number of pixel points included by the initial prediction block and the pixel values of the pixel points included by the initial prediction block. In addition, since the method described in the embodiments of the present application is directed to bi-directional inter prediction, the initial prediction block herein includes a first initial prediction block and a second initial prediction block. The manner of obtaining the first initial prediction block and the second initial prediction block may refer to the explanation of S402. The embodiment of the present application takes the pixel value of the pixel included in the initial prediction block as an example to describe how to determine the motion compensation mode of the current image block according to the attribute information of the initial prediction block.
For example, assume that the current image block includes M × N pixels, the first initial prediction block includes M × N pixels, and the second initial prediction block includes M × N pixels. N is an integer greater than or equal to 1, M is an integer greater than or equal to 1, and M and N may be equal or unequal. As shown in fig. 7, S403a may be implemented by the following detailed steps.
S701, the bidirectional inter-frame prediction device obtains M × N pixel difference values according to the pixel values of the M × N pixel points of the first initial prediction block and the pixel values of the M × N pixel points of the second initial prediction block.
The bidirectional inter-frame prediction device can obtain the M × N pixel difference values by subtracting the pixel values of the M × N pixel points of the second initial prediction block from the pixel values of the M × N pixel points of the first initial prediction block. It should be understood that the M × N pixel difference values are obtained by sequentially subtracting, from the pixel value of each pixel point included in the first initial prediction block, the pixel value of the pixel point at the corresponding position in the second initial prediction block. The corresponding positions referred to herein are positions relative to the same coordinate point in the same coordinate system. The M × N pixel difference values also constitute an intermediate prediction block.
For example, as shown in fig. 8, assume that M = 4 and N = 4. The current image block includes 4 × 4 pixel points, namely b0,0, b0,1, b0,2, b0,3, ..., b3,0, b3,1, b3,2, b3,3. The first initial prediction block includes 4 × 4 pixel points, namely a0,0, a0,1, a0,2, a0,3, ..., a3,0, a3,1, a3,2, a3,3. The second initial prediction block includes 4 × 4 pixel points, namely c0,0, c0,1, c0,2, c0,3, ..., c3,0, c3,1, c3,2, c3,3. A two-dimensional rectangular coordinate system is established with a0,0, b0,0 and c0,0 as the origins of coordinates, i as the abscissa and j as the ordinate. For example, the pixel point at the same position in the second initial prediction block as pixel point a0,0 of the first initial prediction block is the pixel point c0,0 at the same coordinate (0, 0); subtracting c0,0 from a0,0 gives the pixel difference value at coordinate (0, 0). In this way, 4 × 4 pixel difference values are obtained from the differences between the pixel values of the 4 × 4 pixel points of the first initial prediction block and the pixel values of the 4 × 4 pixel points of the second initial prediction block. The pixel difference value is expressed by the formula D(i, j) = abs(A(i, j) − B(i, j)), where (i, j) represents the coordinates of a pixel point within the block. D(i, j) represents the pixel difference value of the pixel point with coordinate (i, j), that is, the pixel difference value of the pixel point at the i-th row and j-th column. A(i, j) represents the pixel value of the pixel point with coordinate (i, j) included in the first initial prediction block. B(i, j) represents the pixel value of the pixel point with coordinate (i, j) included in the second initial prediction block. abs() represents the absolute value operation. i is an integer from 0 to M−1, and j is an integer from 0 to N−1.
The 4 × 4 pixel difference values correspond to 4 × 4 pixel points that form an intermediate prediction block; the intermediate prediction block includes 4 × 4 pixel points, namely d0,0, d0,1, d0,2, d0,3, ..., d3,0, d3,1, d3,2, d3,3.
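The per-position difference computation of S701 can be sketched as follows. This is a minimal illustration: the function name, the 4 × 4 block sizes, and the pixel values are all hypothetical, not from the patent.

```python
def intermediate_block(first_block, second_block):
    """Compute D(i, j) = abs(A(i, j) - B(i, j)) for every pixel position.

    first_block / second_block are M x N lists of pixel values: the first
    (forward) and second (backward) initial prediction blocks.
    """
    return [[abs(a - c) for a, c in zip(row_a, row_c)]
            for row_a, row_c in zip(first_block, second_block)]

# Illustrative 4 x 4 example: A is the first initial prediction block,
# C the second; D is the resulting intermediate prediction block.
A = [[100, 102, 101, 99],
     [100, 103, 104, 98],
     [101, 100, 102, 97],
     [ 99, 101, 100, 96]]
C = [[ 98, 102, 105, 99],
     [101, 100, 104, 90],
     [101, 102, 102, 99],
     [ 97, 101, 103, 96]]
D = intermediate_block(A, C)
```

Each entry of D corresponds to one pixel difference value d(i, j) of the intermediate prediction block.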
And S702, the bidirectional inter-frame prediction device determines the texture complexity of the current image block according to the M x N pixel difference values.
After obtaining the M × N pixel difference values according to the pixel values of the M × N pixel points of the first initial prediction block and the pixel values of the M × N pixel points of the second initial prediction block, the bidirectional inter-frame prediction device can determine the texture complexity of the current image block according to the M × N pixel difference values.
In one possible implementation, the texture complexity of the current image block may be determined according to the sum of the M × N pixel difference values. It should be understood that the sum of the M × N pixel difference values here may also refer to the sum of the absolute values of the M × N pixel difference values. The texture complexity of the current image block is then the sum of the M × N pixel difference values, expressed by the formula

SAD = Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} D(i, j)

where SAD (Sum of Absolute Differences) represents the sum of the absolute values of the M × N pixel difference values.
In another possible implementation, the texture complexity of the current image block may be determined according to the average of the M × N pixel difference values. The texture complexity of the current image block is then the average of the M × N pixel difference values, expressed by the formula

μ = SAD / (M × N) = (1 / (M × N)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} D(i, j)

where μ represents the average of the M × N pixel difference values, and M × N represents the number of pixel points.
In a third possible implementation, the texture complexity of the current image block may be determined according to the standard deviation of the M × N pixel difference values. The texture complexity of the current image block is then the standard deviation of the M × N pixel difference values, expressed by the formula

σ = sqrt( (1 / (M × N)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} (D(i, j) − μ)^2 )

where σ represents the standard deviation of the M × N pixel difference values.
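The three texture complexity measures above (sum, average, and standard deviation of the pixel difference values) can be sketched in one helper. The function name, the `mode` parameter, and the sample D are illustrative assumptions, not from the patent.

```python
import math

def texture_complexity(D, mode="sad"):
    """Texture complexity of the intermediate prediction block D (an M x N
    array of pixel difference values), using one of the three measures:
    "sad" (sum of absolute differences), "mean" (average), or "std"
    (standard deviation)."""
    values = [abs(d) for row in D for d in row]
    n = len(values)              # n = M x N, the number of pixel points
    sad = sum(values)
    if mode == "sad":
        return sad
    mu = sad / n
    if mode == "mean":
        return mu
    if mode == "std":
        return math.sqrt(sum((d - mu) ** 2 for d in values) / n)
    raise ValueError("unknown mode: %s" % mode)

# Illustrative 4 x 4 intermediate prediction block of pixel differences.
D = [[2, 0, 4, 0],
     [1, 3, 0, 8],
     [0, 2, 0, 2],
     [2, 0, 3, 0]]
```

For this D, the SAD measure gives 27 and the average gives 27 / 16; which of the three measures is used can be chosen per implementation.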
And S703, the bidirectional inter-frame prediction device determines a motion compensation mode according to the texture complexity of the current image block.
The bidirectional inter-frame prediction device may determine the motion compensation mode by comparing the texture complexity of the current image block with a preset threshold. For example, it is judged whether the texture complexity of the current image block is smaller than a first threshold; if the texture complexity of the current image block is smaller than the first threshold, the motion compensation mode is determined to be the weighted prediction technology based on bidirectional prediction, and if the texture complexity of the current image block is greater than or equal to the first threshold, the motion compensation mode is determined to be the optical flow technology based on bidirectional prediction. The first threshold is any real number greater than 0, such as 150 or 200. In practical applications, the first threshold needs to be adjusted according to the codec parameters, the specific codec, and the target codec time. It should be noted that the value of the first threshold may be set in advance or in the high level syntax. The high level syntax may be specified in parameter sets such as the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), or the slice header.
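The threshold comparison of S703 amounts to a two-way branch. A minimal sketch, assuming the first threshold takes the example value 150 quoted above (the function name and the mode labels are illustrative):

```python
def motion_compensation_mode(texture_complexity, first_threshold=150):
    """First decision method (S703): choose the motion compensation mode by
    comparing the texture complexity against the preset first threshold."""
    if texture_complexity < first_threshold:
        # weighted prediction technology based on bidirectional prediction
        return "weighted-prediction"
    # optical flow technology based on bidirectional prediction
    return "bio"
```

In a real codec the threshold would come from a preset value or from the high level syntax (SPS, PPS, or slice header) rather than a default argument.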
S403b, the bi-directional inter prediction device determines the motion compensation mode of the current image block according to the motion information and the attribute information of the initial prediction block.
In addition to determining the motion compensation mode of the current image block according to the attribute information of the initial prediction block, the bidirectional inter-frame prediction device may determine the motion compensation mode according to the motion amplitude of the current image block together with the attribute information of the initial prediction block. The motion amplitude of the current image block may be determined from the motion information. The attribute information of the initial prediction block may be obtained according to S701 and S702 above, and is not described again here.
For example, as shown in fig. 9, after the bidirectional inter-frame prediction apparatus determines the texture complexity of the current image block according to the M × N pixel difference values, that is, S702, the embodiment of the present application may further include the following detailed steps.
S901, the bidirectional inter-frame prediction device determines a first motion amplitude of the current image block according to the first motion vector and determines a second motion amplitude of the current image block according to the second motion vector.
Illustratively, the first motion amplitude is expressed by the formula

dist0 = sqrt(MV0x^2 + MV0y^2)

where MV0x represents the horizontal component of the first motion vector (forward motion vector), and MV0y represents the vertical component of the first motion vector (forward motion vector). The second motion amplitude is expressed by the formula

dist1 = sqrt(MV1x^2 + MV1y^2)

where MV1x represents the horizontal component of the second motion vector (backward motion vector), and MV1y represents the vertical component of the second motion vector (backward motion vector).
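The two motion amplitudes are Euclidean magnitudes of the motion vectors. A minimal sketch with illustrative vector values (the function name is an assumption):

```python
import math

def motion_amplitude(mv):
    """Euclidean magnitude of a motion vector given as a pair (mv_x, mv_y)."""
    mv_x, mv_y = mv
    return math.sqrt(mv_x ** 2 + mv_y ** 2)

dist0 = motion_amplitude((3, 4))   # first (forward) motion vector MV0
dist1 = motion_amplitude((6, 8))   # second (backward) motion vector MV1
```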
It should be noted that the order of the steps of the bidirectional inter-frame prediction method provided in the embodiments of the present application may be adjusted appropriately, and steps may also be added or removed as appropriate. For example, the order of S901 relative to S701 and S702 may be interchanged, that is, S901 may be executed first, followed by S701 and S702. Any variation readily conceivable by those skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application, and is therefore not described further.
And S902, the bidirectional inter-frame prediction device determines the selection probability according to the texture complexity, the first motion amplitude, the second motion amplitude and the first mathematical model of the current image block.
For example, the first mathematical model may be a first logistic regression model. The first logistic regression model is as follows:

y = 1 / (1 + exp(−1 × (ω0 + ω1·S + ω2·dist0 + ω3·dist1)))

where S represents the texture complexity of the current image block, and ω0, ω1, ω2 and ω3 are parameters of the first logistic regression model. A typical value of ω0 is 2.06079643, of ω1 is −0.01175306, of ω2 is −0.00122516, and of ω3 is −0.0008786. Substituting the texture complexity, dist0, and dist1 into the first logistic regression model yields the selection probability y. It should be noted that the parameters of the first logistic regression model may be set in advance or in the high level syntax. The high level syntax may be specified in parameter sets such as the SPS, the PPS, or the slice header.
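A sketch of the first logistic regression model, assuming it combines the texture complexity and the two motion amplitudes linearly inside the sigmoid, and using the typical parameter values quoted above (the function name and variable names are illustrative):

```python
import math

# Typical parameter values quoted in the text; in practice they may be set
# in advance or signalled in the high level syntax (SPS / PPS / slice header).
W0, W1, W2, W3 = 2.06079643, -0.01175306, -0.00122516, -0.0008786

def selection_probability(texture_complexity, dist0, dist1):
    """First logistic regression model: maps the texture complexity and the
    two motion amplitudes to a selection probability y in (0, 1)."""
    z = W0 + W1 * texture_complexity + W2 * dist0 + W3 * dist1
    return 1.0 / (1.0 + math.exp(-z))
```

With these parameter signs, a larger texture complexity or larger motion amplitudes push y downward, i.e. toward the weighted prediction branch of the decision in S903.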
Optionally, in addition to the selection probability y calculated by the logistic regression model, a first mapping table may be predefined during encoding. And storing each possible value of the texture complexity, the first motion amplitude and the second motion amplitude of the current image block and the corresponding value of the selection probability y in the first mapping table. The value of the selection probability y can be obtained in a table look-up mode during encoding.
And S903, determining a motion compensation mode by the bidirectional inter-frame prediction device according to the selection probability.
The motion compensation mode may be determined by comparing the selection probability with a preset threshold. For example, it is judged whether the selection probability is greater than a second threshold; if the selection probability is greater than the second threshold, the motion compensation mode is determined to be the optical flow technology based on bidirectional prediction, and if the selection probability is less than or equal to the second threshold, the motion compensation mode is determined to be the weighted prediction technology based on bidirectional prediction. The second threshold is any real number greater than or equal to 0 and less than or equal to 1. For example, the second threshold may take the value 0.7.
S403c, the bidirectional inter-frame prediction apparatus determines the motion compensation mode of the current image block according to the motion information and the attribute information of the current image block.
The attribute information of the current image block comprises the size of the current image block, the number of pixel points included in the current image block and the pixel values of the pixel points included in the current image block. The following explains the determination of the motion compensation mode by the bidirectional inter-frame prediction apparatus according to the motion information and the attribute information of the current image block in detail by taking the size of the current image block as an example and combining the drawings. Because the current image block is a pixel array consisting of pixels, the bidirectional inter-frame prediction device can obtain the size of the current image block according to the pixels. Understandably, the size of the current image block is the width and height of the current image block. As shown in fig. 10, S403c may be implemented by the following detailed steps.
And S1001, the bidirectional inter-frame prediction device determines the selection probability according to the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, the vertical component of the second motion vector and the second mathematical model.
For example, the second mathematical model may be a second logistic regression model. The second logistic regression model is as follows:

y = 1 / (1 + exp(−1 × (ω0 + ω1·H + ω2·W + ω3·MV0x + ω4·MV0y + ω5·MV1x + ω6·MV1y)))

where ω0, ω1, ω2, ω3, ω4, ω5 and ω6 are parameters of the second logistic regression model. A typical value of ω0 is −0.18929861, of ω1 is 4.81715386e−03, of ω2 is 4.66279123e−03, of ω3 is −7.46496930e−05, of ω4 is 1.23565538e−04, of ω5 is −4.25855176e−05, and of ω6 is 1.44069088e−04. W denotes the width of the prediction block of the current image block, and H denotes the height of the prediction block of the current image block. MV0x represents the horizontal component of the first motion vector (forward motion vector), and MV0y represents its vertical component. MV1x represents the horizontal component of the second motion vector (backward motion vector), and MV1y represents its vertical component. Substituting the size of the current image block and the horizontal and vertical components of the first and second motion vectors into the second logistic regression model yields the selection probability y. It should be noted that the parameters of the second logistic regression model may be set in advance or in the high level syntax. The high level syntax may be specified in parameter sets such as the SPS, the PPS, or the slice header.
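A sketch of the second logistic regression model, using the typical parameter values quoted above (the function name and the tuple layout are illustrative assumptions):

```python
import math

# Typical parameter values (omega_0 .. omega_6) from the text; in practice
# they may instead be signalled in the high level syntax.
OMEGA = (-0.18929861, 4.81715386e-03, 4.66279123e-03, -7.46496930e-05,
         1.23565538e-04, -4.25855176e-05, 1.44069088e-04)

def selection_probability2(h, w, mv0, mv1):
    """Second logistic regression model: the block size (height h, width w)
    and the components of the two motion vectors determine the selection
    probability y."""
    w0, w1, w2, w3, w4, w5, w6 = OMEGA
    z = (w0 + w1 * h + w2 * w
         + w3 * mv0[0] + w4 * mv0[1]    # MV0x, MV0y
         + w5 * mv1[0] + w6 * mv1[1])   # MV1x, MV1y
    return 1.0 / (1.0 + math.exp(-z))
```

Since ω1 and ω2 are positive, larger blocks raise y, pushing the decision in S1002 toward the optical flow branch.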
Optionally, in addition to calculating the selection probability y through the second logistic regression model, a second mapping table may be predefined during encoding. The second mapping table stores the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector, and each possible value of the vertical component of the second motion vector, and the corresponding value of the selection probability y. The value of the selection probability y can be obtained in a table look-up mode during encoding.
S1002, the bidirectional interframe prediction device determines a motion compensation mode according to the selection probability.
For a detailed explanation of S1002, refer to the explanation in S903, and the embodiments of the present application are not described herein again.
And S404, the bidirectional inter-frame prediction device performs motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
The initial prediction block includes a first initial prediction block and a second initial prediction block. The specific implementation manner in the prior art can be referred to for motion compensation of the current image block based on the weighted prediction technology of the bi-directional prediction and the initial prediction block, and for motion compensation of the current image block based on the optical flow technology of the bi-directional prediction and the initial prediction block, which is not described herein again.
Further, after the bidirectional inter-frame prediction apparatus determines the motion compensation method used for bidirectional motion compensation by using the bidirectional inter-frame prediction method described in the above embodiment, the selected motion compensation method may be written into the syntax element of the current image block. When decoding, the motion compensation mode is directly selected according to the syntax element without repeatedly performing judgment action.
For example, a syntax element Bio_flag is allocated to the current image block; this syntax element occupies 1 bit in the bitstream. When the value of Bio_flag is 0, the motion compensation mode is the weighted prediction technology based on bidirectional prediction; when the value of Bio_flag is 1, the motion compensation mode is the optical flow technology based on bidirectional prediction. The initial value of Bio_flag is 0. After parsing the bitstream, the decoding end obtains the value of the syntax element Bio_flag of the current decoding block and determines the motion compensation mode used for bidirectional motion compensation according to that value: if the value of Bio_flag is 0, the motion compensation mode is the weighted prediction technology based on bidirectional prediction; if the value of Bio_flag is 1, the motion compensation mode is the optical flow technology based on bidirectional prediction.
Alternatively, a syntax element may be set in the high level syntax to specify the decision method used by the bidirectional inter-frame prediction apparatus to determine the motion compensation mode, for example, a first decision method, a second decision method, or a third decision method. The first decision method determines the motion compensation mode of the current image block according to the attribute information of the initial prediction block. The second decision method determines the motion compensation mode of the current image block according to the motion information and the attribute information of the initial prediction block. The third decision method determines the motion compensation mode of the current image block according to the motion information and the attribute information of the current image block. For detailed descriptions of the first, second, and third decision methods, refer to the foregoing embodiments, which are not described again here. The syntax element may be placed in parameter sets such as the SPS, the PPS, or the slice header.
For example, the syntax element may be a selection mode (select_mode), which occupies 2 bits in the bitstream. The initial value of the syntax element select_mode is 0. The values of select_mode and the decision methods they indicate are shown in Table 1:
TABLE 1

  Value of select_mode | Decision method
  ---------------------|-----------------------
  0                    | First decision method
  1                    | Second decision method
  2                    | Third decision method
After the bidirectional inter-frame prediction device acquires the motion information of the current image block, it determines the motion compensation mode according to the specified decision method. If the specified decision method is the first decision method, the bidirectional inter-frame prediction device performs bidirectional inter-frame prediction according to the first decision method; if it is the second decision method, according to the second decision method; and if it is the third decision method, according to the third decision method.
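The dispatch on the select_mode syntax element can be sketched as a simple lookup, per Table 1 (the function name and the string labels are illustrative; the 2-bit field admits a fourth, reserved value):

```python
def decision_method(select_mode):
    """Map the 2-bit select_mode syntax element (Table 1) to the decision
    method used to determine the motion compensation mode."""
    methods = {0: "first", 1: "second", 2: "third"}
    if select_mode not in methods:
        raise ValueError("reserved select_mode value: %d" % select_mode)
    return methods[select_mode]
```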
According to the bidirectional inter-frame prediction method described above, before motion compensation is performed on the current image block, an appropriate motion compensation mode is determined according to the characteristics of the current image block and of its prediction blocks, taking into account both high compression ratio and low encoding and decoding complexity, thereby effectively achieving an optimal balance between compression ratio and complexity.
The scheme provided by the embodiments of the present application has been introduced above mainly from the perspective of interaction between network elements. It will be appreciated that each network element, for example the bidirectional inter-frame prediction apparatus, comprises corresponding hardware structures and/or software modules for performing each of the above-described functions. Those skilled in the art will readily appreciate that the various illustrative algorithm steps described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the functional modules of the bidirectional inter-frame prediction apparatus may be divided according to the above method, for example, each functional module may be divided according to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
In the case of dividing each functional module by corresponding functions, fig. 11 shows a possible composition diagram of the bidirectional inter-frame prediction apparatus mentioned above and in the embodiment, as shown in fig. 11, the bidirectional inter-frame prediction apparatus may include: motion estimation unit 1101, determination unit 1102, motion compensation unit 1103.
Among them, the motion estimation unit 1101 is configured to support the bidirectional inter-frame prediction apparatus to perform S401 in the bidirectional inter-frame prediction method shown in fig. 4, S401 in the bidirectional inter-frame prediction method shown in fig. 6, S401 in the bidirectional inter-frame prediction method shown in fig. 7, S401 in the bidirectional inter-frame prediction method shown in fig. 9, and S401 in the bidirectional inter-frame prediction method shown in fig. 10.
A determination unit 1102 for supporting the bidirectional inter-prediction apparatus to perform S402, S403a, S403b, and S403c in the bidirectional inter-prediction method shown in fig. 4, S601, S602, S403a, S403b, and S403c in the bidirectional inter-prediction method shown in fig. 6, S601, S602, S701-S703, S403b, and S403c in the bidirectional inter-prediction method shown in fig. 7, S601, S602, S701-S703, S901-S903, and S403c in the bidirectional inter-prediction method shown in fig. 9, and S601, S602, S403a, S403b, S1001, and S1002 in the bidirectional inter-prediction method shown in fig. 10.
A motion compensation unit 1103 for supporting the bidirectional inter-prediction apparatus to perform S404 in the bidirectional inter-prediction method shown in fig. 4, S404 in the bidirectional inter-prediction method shown in fig. 6, S404 in the bidirectional inter-prediction method shown in fig. 7, S404 in the bidirectional inter-prediction method shown in fig. 9, and S404 in the bidirectional inter-prediction method shown in fig. 10.
It should be noted that all relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.
The bidirectional inter-frame prediction device provided by the embodiment of the application is used for executing the bidirectional inter-frame prediction method, so that the same effect as the bidirectional inter-frame prediction method can be achieved.
In the case of using an integrated unit, fig. 12 shows another possible composition diagram of the bidirectional inter-frame prediction apparatus in the above embodiment. As shown in fig. 12, the bidirectional inter prediction apparatus includes: a processing module 1201 and a communication module 1202.
The processing module 1201 is configured to control and manage the operation of the bidirectional inter-frame prediction apparatus, for example, the processing module 1201 is configured to support the bidirectional inter-frame prediction apparatus to perform S402, S403a, S403b and S403c in the bidirectional inter-frame prediction method shown in fig. 4, S601, S602, S403a, S403b and S403c in the bidirectional inter-frame prediction method shown in fig. 6, S601, S602, S701-S703, S403b and S403c in the bidirectional inter-frame prediction method shown in fig. 7, S601, S602, S701-S703, S901-S903 and S403c in the bidirectional inter-frame prediction method shown in fig. 9, S601, S602, S403a, S403b, S1001 and S1002 in the bidirectional inter-frame prediction method shown in fig. 10, and/or other processes used in the techniques described herein. The communication module 1202 is configured to support communication between the bidirectional inter-frame prediction apparatus and other network entities, such as the functional modules or network entities shown in fig. 1 or fig. 3. The bi-directional inter-frame prediction apparatus may further comprise a storage module 1203 for storing program code and data of the bi-directional inter-frame prediction apparatus.
The processing module 1201 may be a processor or a controller. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like. The communication module 1202 may be a transceiver circuit or a communication interface, etc. The storage module 1203 may be a memory.
For all relevant content of the steps in the foregoing method embodiments, refer to the function descriptions of the corresponding functional modules; details are not described herein again.
The bidirectional inter-frame prediction apparatus 11 and the bidirectional inter-frame prediction apparatus 12 may each perform the bidirectional inter-frame prediction method shown in any one of fig. 4, 6, 7, 9, and 10. Each may specifically be a video encoding apparatus, a video decoding apparatus, or another device with video encoding and decoding functions, and may be used for motion compensation during encoding or during decoding.
The present application further provides a terminal, including one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled to the one or more processors. The memory is configured to store computer program code comprising instructions that, when executed by the one or more processors, cause the terminal to perform the bidirectional inter-frame prediction method in the embodiments of the present application.
The terminal may be a video display device, a smartphone, a portable computer, or another device capable of processing or playing video.
The present application further provides a video encoder, including a nonvolatile storage medium and a central processing unit. The nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium and executes the executable program to implement the bidirectional inter-frame prediction method in the embodiments of the present application.
The present application further provides a video decoder, including a nonvolatile storage medium and a central processing unit. The nonvolatile storage medium stores an executable program, and the central processing unit is connected to the nonvolatile storage medium and executes the executable program to implement the bidirectional inter-frame prediction method in the embodiments of the present application.
Another embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs comprise instructions that, when executed by a processor in a terminal, cause the terminal to perform the bidirectional inter-frame prediction method shown in any one of fig. 4, 6, 7, 9, and 10.
Another embodiment of the present application further provides a computer program product comprising computer-executable instructions stored in a computer-readable storage medium. At least one processor of a terminal may read the computer-executable instructions from the computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the terminal to implement the steps performed by the bidirectional inter-frame prediction apparatus in the method shown in any one of fig. 4, 6, 7, 9, and 10.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, radio, or microwave) connection. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
From the foregoing description of the embodiments, a person skilled in the art will clearly understand that, for convenience and brevity of description, division into the foregoing functional modules is merely used as an example. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or some of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or some of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (30)

1. A bi-directional inter prediction method, comprising:
acquiring motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
acquiring an initial prediction block of the current image block according to the motion information;
determining the texture complexity of the current image block according to the attribute information of the initial prediction block;
determining a motion compensation mode of the current image block according to the texture complexity of the current image block; if the texture complexity of the current image block is smaller than a first threshold, the motion compensation mode is a weighted prediction technique based on bidirectional prediction; if the texture complexity of the current image block is greater than or equal to the first threshold, the motion compensation mode is a bi-directional optical flow (BIO) technique based on bidirectional prediction; the first threshold is any real number greater than 0;
and performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
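For illustration only, the threshold decision in claim 1 can be sketched as follows. The function name, the string mode labels, and the behavior of the sketch are assumptions chosen for clarity; they are not part of the claims:

```python
def select_motion_compensation_mode(texture_complexity, first_threshold):
    """Choose the motion compensation mode for the current image block
    per claim 1: texture complexity below the first threshold selects
    bidirectional weighted prediction; otherwise the bi-directional
    optical flow (BIO) technique is selected."""
    if texture_complexity < first_threshold:
        return "weighted_prediction"
    return "BIO"
```

A smooth, low-texture block would thus take the cheaper weighted-prediction path, while a block at or above the threshold falls through to BIO.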
2. The method of claim 1, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector;
the obtaining the initial prediction block of the current image block according to the motion information includes:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
and determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for representing an index of a frame where a backward reference block of the current image block is located, the second motion vector is used for representing motion displacement of the current image block relative to the backward reference block, and attribute information of the second initial prediction block comprises pixel values of M x N pixel points.
3. The method according to claim 2, wherein said determining the texture complexity of the current image block according to the attribute information of the initial prediction block comprises:
obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block;
and determining the texture complexity of the current image block according to the M x N pixel difference values.
4. The method according to claim 3, wherein said determining the texture complexity of the current image block from the M x N pixel difference values comprises:
calculating the sum of the absolute values of the M x N pixel difference values;
and determining the sum of the absolute values of the M x N pixel difference values as the texture complexity of the current image block.
5. The method according to claim 3, wherein said determining the texture complexity of the current image block from the M x N pixel difference values comprises:
calculating an average of the M x N pixel difference values;
and determining the average value of the M x N pixel difference values as the texture complexity of the current image block.
6. The method according to claim 3, wherein said determining the texture complexity of the current image block from the M x N pixel difference values comprises:
calculating a standard deviation of the M x N pixel difference values;
and determining the standard deviation of the M x N pixel difference values as the texture complexity of the current image block.
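The three texture-complexity measures of claims 4-6, computed over the pixel difference values of claim 3, can be sketched as below. This is a hypothetical illustration; the function names and the metric selector are assumptions, not language from the claims:

```python
import statistics

def pixel_differences(pred0, pred1):
    """Element-wise differences of two M x N initial prediction
    blocks, flattened to a list of M*N values (claim 3)."""
    return [a - b
            for row0, row1 in zip(pred0, pred1)
            for a, b in zip(row0, row1)]

def texture_complexity(diffs, metric="sad"):
    """Texture complexity of the current image block from the pixel
    difference values: sum of absolute values (claim 4), average
    (claim 5), or standard deviation (claim 6)."""
    if metric == "sad":
        return sum(abs(d) for d in diffs)
    if metric == "mean":
        return sum(diffs) / len(diffs)
    if metric == "std":
        return statistics.pstdev(diffs)
    raise ValueError("unknown metric: " + metric)
```

The three metrics trade off cost against robustness: the sum of absolute differences is cheapest, while the standard deviation discounts a constant brightness offset between the two prediction blocks.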
7. A bi-directional inter prediction method, comprising:
acquiring motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
determining the selection probability according to the texture complexity of the current image block and the motion information;
determining a motion compensation mode of the current image block according to the selection probability; if the selection probability is larger than a second threshold, determining that the motion compensation mode is a bi-directional optical flow (BIO) technique based on bidirectional prediction; if the selection probability is less than or equal to the second threshold, determining that the motion compensation mode is a weighted prediction technique based on bidirectional prediction; the second threshold is any real number greater than or equal to 0 and less than or equal to 1;
and performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block of the current image block.
8. The method of claim 7, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector; the method further comprises the following steps:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for representing an index of a frame where a backward reference block of the current image block is located, the second motion vector is used for representing motion displacement of the current image block relative to the backward reference block, and attribute information of the second initial prediction block comprises pixel values of M x N pixel points;
obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block;
and determining the texture complexity of the current image block according to the M x N pixel difference values.
9. The method according to claim 8, wherein the motion amplitude of the current image block is determined by the motion information, and the determining a selection probability according to the texture complexity of the current image block and the motion information comprises:
determining a first motion amplitude of the current image block according to the first motion vector, and determining a second motion amplitude of the current image block according to the second motion vector;
and determining the selection probability according to the first motion amplitude, the second motion amplitude and the texture complexity of the current image block.
10. The method according to claim 9, wherein determining the selection probability according to the first motion amplitude, the second motion amplitude and the texture complexity of the current image block comprises:
determining a selection probability according to the texture complexity of the current image block, the first motion amplitude, the second motion amplitude and a first mathematical model; or querying a first mapping table according to the texture complexity of the current image block, the first motion amplitude and the second motion amplitude to determine a selection probability, where the first mapping table includes a correspondence between the selection probability and the texture complexity of the current image block, the first motion amplitude and the second motion amplitude.
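One plausible reading of claims 9-10, sketched for illustration: the motion amplitude as the magnitude of a motion vector, and the "first mathematical model" as a logistic function of texture complexity and the two amplitudes. The model form and its coefficient are assumptions; the claims do not disclose a specific model:

```python
import math

def motion_amplitude(mv):
    """Magnitude of a motion vector given as (horizontal, vertical)
    components -- one reading of 'motion amplitude' in claim 9."""
    return math.hypot(mv[0], mv[1])

def selection_probability(complexity, amp0, amp1, k=0.01):
    """Hypothetical 'first mathematical model' of claim 10: maps
    texture complexity and the two motion amplitudes to a value in
    (0, 1); larger values favour selecting BIO (claim 7)."""
    return 1.0 / (1.0 + math.exp(-k * complexity * (amp0 + amp1)))
```

Under this sketch, blocks with both high texture complexity and large motion yield a probability near 1, so the claim-7 comparison against the second threshold would select BIO for them.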
11. A bi-directional inter prediction method, comprising:
acquiring motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
determining a selection probability according to the motion information and the attribute information of the current image block;
determining a motion compensation mode of the current image block according to the selection probability; if the selection probability is larger than a second threshold, determining that the motion compensation mode is a bi-directional optical flow (BIO) technique based on bidirectional prediction; if the selection probability is less than or equal to the second threshold, determining that the motion compensation mode is a weighted prediction technique based on bidirectional prediction; the second threshold is any real number greater than or equal to 0 and less than or equal to 1;
and performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block of the current image block.
12. The method of claim 11, wherein the motion information comprises a first motion vector and a second motion vector, and wherein determining the selection probability according to the motion information and the attribute information of the current image block comprises:
determining a selection probability according to the size of the current image block, a horizontal component of the first motion vector, a vertical component of the first motion vector, a horizontal component of the second motion vector, a vertical component of the second motion vector and a second mathematical model, wherein the first motion vector comprises the horizontal component of the first motion vector and the vertical component of the first motion vector, and the second motion vector comprises the horizontal component of the second motion vector and the vertical component of the second motion vector; or,
querying a second mapping table according to the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector and the vertical component of the second motion vector to determine a selection probability, wherein the second mapping table comprises a correspondence between the selection probability and the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector and the vertical component of the second motion vector.
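The second-mapping-table alternative of claim 12 can be sketched as a dictionary lookup keyed by block size and motion vector components, combined with the threshold decision of claim 11. The table entries and the threshold value below are invented for illustration; a real codec would derive the table offline:

```python
# Hypothetical second mapping table:
# (width, height, mv0x, mv0y, mv1x, mv1y) -> selection probability
SECOND_MAPPING_TABLE = {
    (16, 16, 1, 0, -1, 0): 0.2,
    (64, 64, 4, 3, -4, -3): 0.8,
}

def motion_compensation_mode(size, mv0, mv1, second_threshold=0.5):
    """Look up the selection probability for the current image block
    and apply the claim-11 decision: probability above the second
    threshold selects BIO, otherwise bidirectional weighted
    prediction."""
    key = (size[0], size[1], mv0[0], mv0[1], mv1[0], mv1[1])
    probability = SECOND_MAPPING_TABLE[key]
    return "BIO" if probability > second_threshold else "weighted_prediction"
```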
13. A method of encoding, comprising:
the bi-directional inter-prediction method of any of claims 1-12 is used in an encoding process, wherein the current image block is an image block to be encoded.
14. A method of decoding, comprising:
the bi-directional inter prediction method of any of claims 1-12 is used in a decoding process, wherein the current image block is an image block to be decoded.
15. A bi-directional inter prediction apparatus, comprising:
a motion estimation unit, configured to acquire motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
a determining unit, configured to obtain an initial prediction block of the current image block according to the motion information;
the determining unit is further configured to determine the texture complexity of the current image block according to the attribute information of the initial prediction block, and determine a motion compensation mode of the current image block according to the texture complexity of the current image block; if the texture complexity of the current image block is smaller than a first threshold, the motion compensation mode is a weighted prediction technique based on bidirectional prediction; if the texture complexity of the current image block is greater than or equal to the first threshold, the motion compensation mode is a bi-directional optical flow (BIO) technique based on bidirectional prediction; the first threshold is any real number greater than 0;
and the motion compensation unit is used for performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block.
16. The apparatus of claim 15, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector;
the determining unit is specifically configured to:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
and determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for representing an index of a frame where a backward reference block of the current image block is located, the second motion vector is used for representing motion displacement of the current image block relative to the backward reference block, and attribute information of the second initial prediction block comprises pixel values of M x N pixel points.
17. The apparatus according to claim 16, wherein the determining unit is specifically configured to:
obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block;
and determining the texture complexity of the current image block according to the M x N pixel difference values.
18. The apparatus according to claim 17, wherein the determining unit is specifically configured to:
calculating the sum of the absolute values of the M x N pixel difference values;
and determining the sum of the absolute values of the M x N pixel difference values as the texture complexity of the current image block.
19. The apparatus according to claim 17, wherein the determining unit is specifically configured to:
calculating an average of the M x N pixel difference values;
and determining the average value of the M x N pixel difference values as the texture complexity of the current image block.
20. The apparatus according to claim 17, wherein the determining unit is specifically configured to:
calculating a standard deviation of the M x N pixel difference values;
and determining the standard deviation of the M x N pixel difference values as the texture complexity of the current image block.
21. A bi-directional inter prediction apparatus, comprising:
a motion estimation unit, configured to acquire motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
a determining unit, configured to determine a selection probability according to the texture complexity of the current image block and the motion information, and determine a motion compensation mode of the current image block according to the selection probability; if the selection probability is greater than a second threshold, the motion compensation mode is determined to be a bi-directional optical flow (BIO) technique based on bidirectional prediction; if the selection probability is less than or equal to the second threshold, the motion compensation mode is determined to be a weighted prediction technique based on bidirectional prediction; the second threshold is any real number greater than or equal to 0 and less than or equal to 1;
and the motion compensation unit is used for performing motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block of the current image block.
22. The apparatus of claim 21, wherein the motion information comprises a first reference frame index, a second reference frame index, a first motion vector, and a second motion vector; the determining unit is further configured to:
determining a first initial prediction block of the current image block according to the first reference frame index and the first motion vector, wherein the first reference frame index is used for representing an index of a frame where a forward reference block of the current image block is located, the first motion vector is used for representing motion displacement of the current image block relative to the forward reference block, attribute information of the first initial prediction block comprises pixel values of M x N pixel points, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 1;
determining a second initial prediction block of the current image block according to the second reference frame index and the second motion vector, wherein the second reference frame index is used for representing an index of a frame where a backward reference block of the current image block is located, the second motion vector is used for representing motion displacement of the current image block relative to the backward reference block, and attribute information of the second initial prediction block comprises pixel values of M x N pixel points;
obtaining M x N pixel difference values according to the pixel values of the M x N pixel points of the first initial prediction block and the pixel values of the M x N pixel points of the second initial prediction block;
and determining the texture complexity of the current image block according to the M x N pixel difference values.
23. The apparatus according to claim 22, wherein the motion amplitude of the current image block is determined by the motion information, and the determining unit is specifically configured to:
determining a first motion amplitude of the current image block according to the first motion vector, and determining a second motion amplitude of the current image block according to the second motion vector;
and determining the selection probability according to the first motion amplitude, the second motion amplitude and the texture complexity of the current image block.
24. The apparatus according to claim 23, wherein the determining unit is specifically configured to:
determining a selection probability according to the texture complexity of the current image block, the first motion amplitude, the second motion amplitude and a first mathematical model; or querying a first mapping table according to the texture complexity of the current image block, the first motion amplitude and the second motion amplitude to determine a selection probability, where the first mapping table includes a correspondence between the selection probability and the texture complexity of the current image block, the first motion amplitude and the second motion amplitude.
25. A bi-directional inter prediction apparatus, comprising:
a motion estimation unit, configured to acquire motion information of a current image block, wherein the current image block is an image block to be encoded or an image block to be decoded;
a determining unit, configured to determine a selection probability according to the motion information and the attribute information of the current image block;
the determining unit is further configured to determine a motion compensation mode of the current image block according to the selection probability; if the selection probability is greater than a second threshold, the motion compensation mode is a bi-directional optical flow (BIO) technique based on bidirectional prediction; if the selection probability is less than or equal to the second threshold, the motion compensation mode is a weighted prediction technique based on bidirectional prediction; the second threshold is any real number greater than or equal to 0 and less than or equal to 1;
and a motion compensation unit, configured to perform motion compensation on the current image block according to the determined motion compensation mode and the initial prediction block of the current image block.
26. The apparatus according to claim 25, wherein the motion information comprises a first motion vector and a second motion vector, and wherein the determining unit is specifically configured to:
determining a selection probability according to the size of the current image block, a horizontal component of the first motion vector, a vertical component of the first motion vector, a horizontal component of the second motion vector, a vertical component of the second motion vector and a second mathematical model, wherein the first motion vector comprises the horizontal component of the first motion vector and the vertical component of the first motion vector, and the second motion vector comprises the horizontal component of the second motion vector and the vertical component of the second motion vector; or,
querying a second mapping table according to the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector and the vertical component of the second motion vector to determine a selection probability, wherein the second mapping table comprises a correspondence between the selection probability and the size of the current image block, the horizontal component of the first motion vector, the vertical component of the first motion vector, the horizontal component of the second motion vector and the vertical component of the second motion vector.
27. A terminal, characterized in that the terminal comprises: one or more processors, memory, and a communication interface;
the memory, the communication interface and the one or more processors; the terminal communicates with other devices through the communication interface, the memory for storing computer program code comprising instructions which, when executed by the one or more processors, cause the terminal to perform the bi-directional inter prediction method of any of claims 1-12.
28. A computer-readable storage medium comprising instructions that, when executed on a terminal, cause the terminal to perform the bi-directional inter prediction method of any of claims 1-12.
29. A video encoder comprising a non-volatile storage medium and a central processor, wherein the non-volatile storage medium stores an executable program, wherein the central processor is coupled to the non-volatile storage medium, and wherein the video encoder performs the bi-directional inter prediction method of any of claims 1-12 when the executable program is executed by the central processor.
30. A video decoder comprising a non-volatile storage medium and a central processor, wherein the non-volatile storage medium stores an executable program, wherein the central processor is coupled to the non-volatile storage medium, wherein the video decoder performs the bi-directional inter-frame prediction method of any of claims 1-12 when the executable program is executed by the central processor.
CN201810276300.0A 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device Active CN110324623B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111040982.3A CN113923455B (en) 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device
CN201810276300.0A CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device
PCT/CN2019/076086 WO2019184639A1 (en) 2018-03-30 2019-02-25 Bi-directional inter-frame prediction method and apparatus

Publications (2)

Publication Number Publication Date
CN110324623A CN110324623A (en) 2019-10-11
CN110324623B true CN110324623B (en) 2021-09-07

Family

ID=68060915

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111040982.3A Active CN113923455B (en) 2018-03-30 2018-03-30 Bidirectional inter-frame prediction method and device
CN201810276300.0A Active CN110324623B (en) 2018-03-30 2018-03-30 Bidirectional interframe prediction method and device

Country Status (2)

Country Link
CN (2) CN113923455B (en)
WO (1) WO2019184639A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112135145B (en) * 2019-11-14 2022-01-25 杭州海康威视数字技术股份有限公司 Encoding and decoding method, device and equipment
CN111145151B (en) * 2019-12-23 2023-05-26 维沃移动通信有限公司 Motion area determining method and electronic equipment
CN111050168B (en) * 2019-12-27 2021-07-13 浙江大华技术股份有限公司 Affine prediction method and related device thereof
CN111754429A (en) * 2020-06-16 2020-10-09 Oppo广东移动通信有限公司 Motion vector post-processing method and device, electronic device and storage medium
CN114071159B (en) * 2020-07-29 2023-06-30 Oppo广东移动通信有限公司 Inter prediction method, encoder, decoder, and computer-readable storage medium
CN114501010B (en) * 2020-10-28 2023-06-06 Oppo广东移动通信有限公司 Image encoding method, image decoding method and related devices
CN115037933B (en) * 2022-08-09 2022-11-18 浙江大华技术股份有限公司 Method and equipment for inter-frame prediction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101766030A (en) * 2007-07-31 2010-06-30 三星电子株式会社 Method and apparatus for encoding and decoding video using weighted prediction
CN102934444A (en) * 2010-04-06 2013-02-13 三星电子株式会社 Method and apparatus for video encoding and method and apparatus for video decoding
WO2017036417A1 (en) * 2015-09-06 2017-03-09 Mediatek Inc. Method and apparatus of adaptive inter prediction in video coding
WO2018048265A1 (en) * 2016-09-11 2018-03-15 엘지전자 주식회사 Method and apparatus for processing video signal by using improved optical flow motion vector

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101934277B1 (en) * 2011-11-28 2019-01-04 에스케이텔레콤 주식회사 Video Coding Method and Apparatus using Improved Merge
EP3340620A4 (en) * 2015-08-23 2019-04-17 LG Electronics Inc. Inter prediction mode-based image processing method and apparatus therefor
US20180249172A1 (en) * 2015-09-02 2018-08-30 Mediatek Inc. Method and apparatus of motion compensation for video coding based on bi prediction optical flow techniques
US10375413B2 (en) * 2015-09-28 2019-08-06 Qualcomm Incorporated Bi-directional optical flow for video coding
US10944963B2 (en) * 2016-05-25 2021-03-09 Arris Enterprises Llc Coding weighted angular prediction for intra coding

Also Published As

Publication number Publication date
WO2019184639A1 (en) 2019-10-03
CN113923455B (en) 2023-07-18
CN113923455A (en) 2022-01-11
CN110324623A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
JP7368414B2 (en) Image prediction method and device
CN110324623B (en) Bidirectional interframe prediction method and device
RU2705428C2 (en) Outputting motion information for sub-blocks during video coding
TW201729595A (en) Improved video intra-prediction using position-dependent prediction combination for video coding
JP2019508971A (en) Predicting filter coefficients from fixed filters for video coding
TW201924345A (en) Coding affine prediction motion information for video coding
WO2019062544A1 (en) Inter frame prediction method and device and codec for video images
CN111480338B (en) Inter-frame prediction method and device of video data
JP7143512B2 (en) Video decoding method and video decoder
KR20140008499A (en) Real-time encoding system of multiple spatially scaled video based on shared video coding information
WO2020006969A1 (en) Motion vector prediction method and related device
CN112866721A (en) Bidirectional interframe prediction method and device
CN111385572A (en) Prediction mode determining method and device, coding equipment and decoding equipment
CN115668915A (en) Image encoding method, image decoding method and related devices
CN112740663B (en) Image prediction method, device and corresponding encoder and decoder
CN110876065A (en) Construction method of candidate motion information list, and inter-frame prediction method and device
CN113315975A (en) Bidirectional interframe prediction method and device
KR102562209B1 (en) Efficient patch rotation in point cloud coding
WO2023092256A1 (en) Video encoding method and related apparatus therefor
WO2022022299A1 (en) Method, apparatus, and device for constructing motion information list in video coding and decoding
TW201921938A (en) Adaptive GOP structure with future reference frame in random access configuration for video coding
CN112055970B (en) Construction method of candidate motion information list, inter-frame prediction method and device
CN110958452B (en) Video decoding method and video decoder
CN110971899B (en) Method for determining motion information, and inter-frame prediction method and device
RU2783337C2 (en) Method for video decoding and video decoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant