WO2019128716A1

WO2019128716A1 - Image prediction method, apparatus, and codec

Info

Publication number: WO2019128716A1
Application number: PCT/CN2018/120681
Authority: WO
Inventors: 高山; 马祥; 陈焕浜; 杨海涛
Original assignee: 华为技术有限公司
Priority date: 2017-12-31
Filing date: 2018-12-12
Publication date: 2019-07-04
Also published as: CN109996080A; CN109996080B

Abstract

The present invention provides an image prediction method. Said method comprises: obtaining initial prediction motion information of a current image block, determining a first reference block from a forward reference image, and determining a second reference block from a backward reference image; and performing, by means of completely new class mirroring, search around the first reference block and the second reference block, so as to determine whether there is a pair of target reference blocks having less image block matching cost, said pair of target reference blocks having a correlation in space; and obtaining, according to the pixel value of the target reference blocks at a first precision, a pixel prediction value of the current image block, said pixel prediction value of the current image block having a code stream precision. By means of the present invention, the image block matching cost calculation is performed at a high accuracy, and an optimal reference block pair is found, reducing the complexity of inter-frame prediction of a video image in the prior art, and improving the accuracy.

Description

Image prediction method, device and codec

Technical field

The present application relates to the field of video codec technology, and in particular, to an interframe prediction method and apparatus for video images, and a corresponding encoder and decoder.

Background technique

Digital video capabilities can be incorporated into a wide variety of devices, including digital television, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, Digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio phones (so-called "smart phones"), video teleconferencing devices, video streaming devices and the like . Digital video devices implement video compression techniques, for example, standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), The video coding standard H.265/High Efficiency Video Coding (HEVC) standard and the video compression techniques described in the extension of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.

Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (ie, a video frame or a portion of a video frame) may be partitioned into several image blocks, which may also be referred to as tree blocks, coding units (CUs), and/or coding nodes. . The image block in the intra-coded (I) slice of the image is encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. An image block in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image or temporal prediction with respect to reference samples in other reference images. An image may be referred to as a frame, and a reference image may be referred to as a reference frame.

Among them, various video coding standards including the High Efficiency Video Coding (HEVC) standard propose a predictive coding mode for an image block, that is, predict a block to be currently coded based on an already encoded video data block. In the intra prediction mode, the current image block is predicted based on one or more previously decoded neighboring blocks in the same image as the current image block; in the inter prediction mode, based on the already decoded blocks in the different images Predict the current image block.

There are several inter-prediction modes, such as Merge mode, Skip mode, and Advanced Motion Vector Prediction mode (AMVP mode), but the conventional image prediction method has more processes and more complexity. High, accuracy is not high.

Summary of the invention

An embodiment of the present application provides an image prediction method and apparatus, and a corresponding encoder and decoder, in particular, an inter-frame prediction method for video images, which improves the prediction accuracy of motion information of an image block to a certain extent, thereby improving coding and decoding performance. .

In a first aspect, an embodiment of the present application provides an image prediction method, which includes: acquiring initial predicted motion information of a current image block; and determining, according to the initial predicted motion information, the current image in a first reference image. a first reference block corresponding to the block, and determining, in the second reference image, a second reference block corresponding to the current image block; wherein the first reference block includes a first search base point, and the second reference block includes a second search base point; determining N third reference blocks in the first reference image; and for any one of the N third reference blocks, according to the first search base point, Determining, in the second reference image, a fourth reference block corresponding to a position of the any one of the third reference block and the second search base; obtaining N reference block groups, wherein the reference block group includes a first a third reference block and a fourth reference block; N is greater than or equal to 1; increasing pixel values of the obtained third reference block and fourth reference block to a first pixel precision, and calculating at the first pixel precision An image block matching cost of the N reference block groups; determining, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion, the target reference block group including a target third reference block and a fourth reference block; a pixel predicted value of the current image block according to a pixel value of the target third reference block at a first precision and a pixel value of the target fourth reference block at a first precision, wherein The pixel prediction value of the current image block has a second pixel precision; the second pixel precision is less than the first pixel precision

In a second aspect, an embodiment of the present application provides an image prediction apparatus, including a plurality of functional units for implementing any one of the methods of the first aspect. For example, the apparatus may include: an acquiring unit, configured to acquire initial predicted motion information of a current image block; and a determining unit, configured to determine the current image block in the first reference image according to the initial predicted motion information Corresponding first reference block, and determining, in the second reference image, a second reference block corresponding to the current image block; wherein the first reference block includes a first search base point, and the second reference block includes a first reference block a search base unit, configured to determine N third reference blocks in the first reference image, and a mapping unit, configured to use, for any one of the N third reference blocks, a third reference block, according to Determining, in the second reference image, a fourth reference block corresponding to the first search base point, the location of the any one of the third reference blocks, and the second search base point; obtaining N reference block groups, Wherein, one reference block group includes a third reference block and a fourth reference block; N is greater than or equal to 1; and a calculation unit is configured to increase pixel values of the obtained third reference block and fourth reference block to a pixel precision, and calculating an image block matching cost of the N reference block groups at the first pixel precision; a selecting unit, configured to determine, in the N reference block groups, an image block matching cost criterion a target reference block group, the target reference block group including a target third reference block and a target fourth reference block; a prediction unit, configured to: according to the pixel value of the target third reference block at the first precision, and the target Obtaining a pixel prediction value of the current image block by a pixel value of the fourth reference block at a first precision, wherein the pixel prediction value of the current image block has a second pixel precision; the second pixel precision is less than the first Pixel accuracy.

According to the first aspect or the second aspect, in a possible implementation, the initial predicted motion information includes a reference image index for indicating that the two reference images include one forward reference image and one backward reference image.

According to the first aspect or the second aspect, in a possible implementation manner, the N third reference blocks include the first reference block; and the obtained N fourth reference blocks include the second reference a block; wherein the first reference block and the second reference block belong to one reference block group, that is, there is a corresponding relationship in space. It can also be understood that, according to any one of the N third reference blocks, according to the first search base point, the location of the any one third reference block, and the second search base point Correspondingly determining a fourth reference block in the second reference image includes: if the first reference block is a third reference block; the second reference block is correspondingly a fourth reference block .

According to the first aspect or the second aspect, in a possible implementation, for any one of the N third reference blocks, according to the first search base, the any one of the third Correspondingly determining a fourth reference block in the second reference image includes: determining, according to the location of the block and the second search base, the first reference block according to the any one of the third reference block and the first search base An i vector; determining, according to a time domain interval t1 of the current image block relative to the first reference image, a time domain interval t2 of the current image block relative to the second reference image, and the ith vector a jth vector, wherein the jth vector is opposite to a direction of the ith vector; i and j are both positive integers not greater than N; and one is determined according to the second search base point and the jth vector Fourth reference block. Accordingly, the method can be performed by a mapping unit.

According to the first aspect or the second aspect, in a possible implementation, the any one of the N third reference blocks, according to the first search base point, the any one Determining, in the second reference image, a fourth reference block corresponding to the location of the third reference block and the second search base, comprising: determining, according to the any one of the third reference block and the first search base point An ith vector; determining, according to the ith vector, a jth vector, wherein the jth vector is inversely different from the ith vector; i and j are positive integers not greater than N; The second search base point and the jth vector determine a fourth reference block. Accordingly, the method can be performed by a mapping unit.

According to the first aspect or the second aspect, in a possible implementation, the pixel values of the obtained third reference block and the fourth reference block are improved to a first pixel precision, and the first pixel precision is Calculating the image block matching cost of the N reference block groups includes: interpolating or subtracting pixel values of the obtained third reference block and the fourth reference block for at least one of the N reference block groups Up shifting to a first pixel precision; and calculating an image block matching cost at the first pixel precision; determining, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion includes: The first occurrence of the reference block group in the at least one reference block group that satisfies the image block matching cost less than the preset threshold is determined as the target reference block group. For example, two reference block groups are calculated, and the image block matching cost is not less than a preset threshold. When the third reference block group is calculated, the image block matching cost is less than a preset threshold, and the third reference block group is used as the third reference block group. The target references the block group and no further reference block groups are calculated. Accordingly, the method can be performed jointly by the computing unit and the selection unit.

According to the first aspect or the second aspect, in a possible implementation, the pixel values of the obtained third reference block and the fourth reference block are improved to a first pixel precision, and the first pixel precision is Calculating an image block matching cost of the N reference block groups includes: increasing pixel values of the obtained third reference block and fourth reference block by interpolation or shifting to a first pixel precision; for the N Calculating an image block matching cost for each of the reference block groups in the reference block group; determining, in the N reference block groups, the target reference block group that satisfies the image block matching cost criterion includes: imaging the N reference block groups A reference block group having the smallest block matching cost is determined as the target reference block group. For example, six reference block groups are calculated, wherein the fourth reference block group image block matching cost is the smallest, and the fourth reference block group is used as the target reference block group. Accordingly, the method can be performed jointly by the computing unit and the selection unit.

According to the first aspect or the second aspect, in a possible implementation, the pixel value according to the target third reference block at the first precision and the target fourth reference block are at the first precision Pixel values to obtain pixel prediction values for the current image block include:

Obtaining a pixel value predSamplesL0'[x][y] of the target third reference block at the first precision;

Obtaining a pixel value predSamplesL1'[x][y] of the target fourth reference block at the first precision;

The pixel prediction value of the current image block predSamples'[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0'[x][y]+predSamplesL1'[x][y]+ Offset2)>>shift2), where bitDepth is the second pixel precision, Shift2 is the shift parameter, and offset2 is equal to 1<<(shift2-1); wherein the second pixel precision can be the code stream precision. Accordingly, the method can be performed by a prediction unit.

According to the first aspect or the second aspect, in a possible implementation manner, the first motion vector and the second motion vector included in the initial predicted motion information; according to the initial predicted motion information, at the first Determining, in the reference image, the first reference block corresponding to the current image block, and determining, in the second reference image, that the second reference block corresponding to the current image block comprises: according to the location of the current image block and the The first motion vector obtains the first reference block, and obtains the second reference block according to the location of the current image block and the second motion vector. Accordingly, the method can be performed by the determining unit.

According to the first aspect or the second aspect, in a possible implementation, the motion search may be performed in a preset step with reference to the search base point of the first reference block, and the N third reference blocks are searched for.

According to the first aspect or the second aspect, in a possible implementation manner, the method further includes: determining a motion vector corresponding to the target third reference block and the target fourth reference block as the forward optimal motion vector sum The backward optimal motion vector provides a motion vector reference for the prediction of subsequent image blocks.

The above methods and apparatus can be implemented by a processor calling a program and instructions in a memory.

In a third aspect, an embodiment of the present application provides a video encoder, where the video encoder is used to encode an image block, and includes any possible image prediction apparatus and a code reconstruction module, where the image prediction apparatus is configured to obtain a current image block. a prediction value of the pixel value; the code reconstruction module is configured to obtain the reconstructed pixel value of the current image block according to the predicted value of the pixel value of the current image block. Accordingly, the video encoder can perform any of the possible design methods described above.

In a fourth aspect, an embodiment of the present application provides a video decoder, where the video decoder is used to decode an image block, and includes any possible image prediction apparatus and a decoding reconstruction module, where the image prediction apparatus is configured to obtain a current image block. a prediction value of the pixel value; a decoding reconstruction module, configured to obtain a reconstructed pixel value of the current image block according to a predicted value of a pixel value of the current image block. Accordingly, the video decoder can perform any of the possible design methods described above.

In a fifth aspect, an embodiment of the present application provides an apparatus for encoding video data, where the apparatus includes:

a memory for storing video data, the video data comprising one or more image blocks;

A video encoder for encoding an image, and the inter prediction method in the encoding process may adopt any of the above possible design methods.

In a sixth aspect, an embodiment of the present application provides an apparatus for decoding video data, where the device includes:

A video decoder for decoding an image, and the inter prediction method in the decoding process may adopt any of the above possible design methods.

In a seventh aspect, an embodiment of the present application provides an encoding device, including: a non-volatile memory and a processor coupled to each other, the processor calling a program code stored in the memory to perform any one of the first aspects. Part or all of the steps of the method.

In an eighth aspect, an embodiment of the present application provides a decoding apparatus, including: a non-volatile memory and a processor coupled to each other, the processor calling a program code stored in the memory to perform any one of the first aspects. Part or all of the steps of the method.

In a ninth aspect, the embodiment of the present application provides a computer readable storage medium, where the program code stores program code, where the program code includes a part for performing any one of the methods of the first aspect or Instructions for all steps.

In a tenth aspect, the embodiment of the present application provides a computer program product, when the computer program product is run on a computer, causing the computer to perform some or all of the steps of any one of the first aspects.

It should be understood that the foregoing solutions are only possible implementation forms in the present application, and the implementation manners can be freely combined without departing from the natural laws.

DRAWINGS

1 is a schematic diagram of a video encoding process in an embodiment of the present application;

2 is a schematic diagram of a video decoding process in an embodiment of the present application;

FIG. 3 is a schematic diagram of an image prediction method according to an embodiment of the present application;

4 is a schematic diagram of an inter prediction mode in an embodiment of the present application;

FIG. 5 is a schematic diagram of another inter prediction mode in the embodiment of the present application; FIG.

6 is a schematic diagram of a search reference block in an embodiment of the present application;

FIG. 7 is a schematic diagram of an image prediction apparatus according to an embodiment of the present application; FIG.

8 is a schematic block diagram of a video encoder in an embodiment of the present application;

9 is a schematic block diagram of a video decoder in an embodiment of the present application;

FIG. 10 is a schematic block diagram of a video transmission system in an embodiment of the present application; FIG.

11 is a schematic diagram of a video codec apparatus in an embodiment of the present application;

FIG. 12 is a schematic block diagram of a video codec system in an embodiment of the present application.

Detailed ways

The technical solutions in the present application will be described below with reference to the accompanying drawings.

The image prediction method in the present application can be applied to the field of video codec technology. In order to better understand the image prediction method of the present application, the video codec is first introduced below.

A video generally consists of a number of frame images in a certain order. Generally speaking, there is a large amount of repeated information (redundant information) in one frame of image or between different frame images. For example, there is often a large amount of space in one frame of image. The same or similar structure, that is to say, there is a large amount of spatial redundancy information in the video file. In addition, there is a large amount of time redundant information in the video file, which is caused by the composition of the video. For example, the frame rate of video sampling is generally 25 frames/second to 60 frames/second, that is, the sampling interval between adjacent frames is 1/60 second to 1/25 second, in such a short period of time, There are basically a lot of similar information in the sampled image, and there is a huge correlation between the images.

In addition, related research shows that from the perspective of the psychological characteristics of the human eye's visual sensitivity, there is also a part of the video information that can be used for compression, that is, visual redundancy. The so-called visual redundancy refers to the proper compression of the video bit stream by using the human eye to be sensitive to changes in luminance and relatively less sensitive to changes in chrominance. For example, in high-brightness areas, the sensitivity of human vision to brightness changes tends to decrease, and is more sensitive to the edges of objects; in addition, the human eye is relatively insensitive to internal areas and sensitive to the overall structure. Since the final target of the video image is our human population, we can make full use of these characteristics of the human eye to compress the original video image to achieve better compression. In addition to the above mentioned spatial redundancy, temporal redundancy and visual redundancy, video image information also has redundancy in information entropy redundancy, structural redundancy, knowledge redundancy, importance redundancy, etc. information. The purpose of video coding (also referred to as video compression coding) is to use various technical methods to remove redundant information in a video sequence to reduce storage space and save transmission bandwidth.

At present, in the internationally accepted range, there are four mainstream compression coding methods in video compression coding standards: chroma sampling, predictive coding, transform coding, and quantization coding. The following describes the coding methods in detail.

Chroma sampling: This method makes full use of the visual psychological characteristics of the human eye, and tries to minimize the amount of data described by a single element from the underlying data representation. For example, most of the television systems use luminance-chrominance-chrominance (YUV) color coding, which is a widely adopted standard in European television systems. The YUV color space includes a luminance signal Y and two color difference signals U and V, and the three components are independent of each other. The YUV color space is more flexible in representation, and the transmission occupies less bandwidth, which is superior to the traditional red, green and blue (RGB) color model. For example, the YUV 4:2:0 form indicates that the two chrominance components U and V are only half of the luminance Y component in both the horizontal direction and the vertical direction, that is, there are four luminance components Y among the four sampled pixels, and the chrominance component There is only one U and V. When expressed in this form, the amount of data is further reduced, only about 33% of the original. Therefore, chroma sampling makes full use of the physiological visual characteristics of the human eye, and the purpose of video compression by means of such chroma sampling is one of the widely used video data compression methods.

Predictive coding: Predictive coding uses the data information of the previously encoded frame to predict the frame currently to be encoded. A predicted value is obtained by prediction, which is not completely equivalent to the actual value, and there is a certain residual value between the predicted value and the actual value. The more accurate the prediction, the closer the predicted value is to the actual value, and the smaller the residual value, so that the residual value can be encoded to greatly reduce the amount of data, and the residual value plus the predicted value is used when decoding at the decoding end. It is possible to restore and reconstruct the matching image, which is the basic idea of predictive coding. In the mainstream coding standard, predictive coding is divided into two basic types: intra prediction and inter prediction. Intra Prediction refers to predicting the pixel value of the pixel in the current coding unit by using the pixel value of the pixel in the reconstructed region in the current image; Inter Prediction is the reconstructed image. Searching for a matching reference block for the current coding unit in the current image, using the pixel value of the pixel in the reference block as the prediction information or the predicted value of the pixel value of the pixel in the current coding unit, and transmitting the motion of the current coding unit. information.

Transform coding: This coding method does not directly encode the original spatial domain information, but converts the information sample values from the current domain to another artificial domain according to some form of transformation function (commonly called transform domain). ), and then compression coding according to the distribution characteristics of the information in the transform domain. Since video image data tends to have very large data correlation in the spatial domain, there is a large amount of redundant information, and if it is directly encoded, a large amount of bits is required. When the information sample value is converted into the transform domain, the correlation of the data is greatly reduced, so that the amount of data required for encoding is greatly reduced due to the reduction of redundant information during encoding, so that high compression can be obtained. Than, and can achieve better compression. Typical transform coding methods include Kalo (K-L) transform, Fourier transform, and the like.

Quantization coding: The above-mentioned transform coding does not compress the data itself, and the quantization process can effectively achieve the compression of the data. The quantization process is also the main reason for the loss of data in the lossy compression. The process of quantification is the process of "force planning" a large dynamic input value into fewer output values. Since the range of quantized input values is large, more bit number representation is needed, and the range of output values after "forced planning" is small, so that only a small number of bits can be expressed.

In the coding algorithm based on the hybrid coding architecture, the above several compression coding modes may be used in combination, and the encoder control module selects the coding mode adopted by the image block according to the local characteristics of different image blocks in the video frame. The intra-predictive coded block is subjected to frequency domain or spatial domain prediction, and the inter-predictive coded block is subjected to motion compensation prediction, and the predicted residual is further transformed and quantized to form a residual coefficient, and finally the final code is generated by the entropy encoder. flow. In order to avoid the accumulation of prediction errors, the intra or inter prediction reference signals are obtained by the decoding module at the encoding end. The transformed and quantized residual coefficients are reconstructed by inverse quantization and inverse transform, and then added to the predicted reference signal to obtain a reconstructed image. In addition, the loop filtering performs pixel correction on the reconstructed image to improve the encoding quality of the reconstructed image.

The whole process of video codec is briefly introduced in conjunction with FIG. 1 and FIG.

Figure 1 is a schematic diagram of a video encoding process.

As shown in FIG. 1 , when performing prediction on the current image block in the current frame Fn, either intra prediction or inter prediction may be used. Specifically, whether intra coding or intraframe coding can be selected according to the type of the current frame Fn. Inter-frame coding, for example, intra prediction is used when the current frame Fn is an I frame, and inter prediction is used when the current frame Fn is a P frame or a B frame. When the intra prediction is adopted, the pixel value of the pixel of the current image block may be predicted by using the pixel value of the pixel of the reconstructed area in the current frame Fn, and the reference frame F'n _-1 may be used when inter prediction is adopted. The pixel value of the pixel of the reference block that matches the current image block predicts the pixel value of the pixel of the current image block.

After obtaining the prediction block of the current image block according to inter prediction or intra prediction, the pixel value of the pixel point of the current image block is compared with the pixel value of the pixel point of the prediction block to obtain residual information, and the residual information is obtained. The change, quantization, and entropy coding are performed to obtain an encoded code stream. In addition, in the encoding process, the residual information of the current frame Fn and the prediction information of the current frame Fn are superimposed, and a filtering operation is performed to obtain a reconstructed frame F' _{n of the} current frame, and is used as a reference frame for subsequent encoding. .

2 is a schematic diagram of a video decoding process.

The video decoding process shown in FIG. 2 is equivalent to the inverse process of the video decoding process shown in FIG. 1. When decoding, the residual information is obtained by using entropy decoding and inverse quantization and inverse transform, and the current image block is determined according to the decoded code stream. Intra prediction is also inter prediction. If it is intra prediction, the prediction information is constructed according to the intra prediction method using the pixel values of the pixels in the reconstructed region in the current frame; if it is inter prediction, the motion information needs to be parsed, and the parsed motion information is used. The reference block is determined in the reconstructed image, and the pixel value of the pixel in the reference block is used as prediction information. Next, the prediction information is superimposed with the residual information, and the reconstruction information is obtained through the filtering operation.

Please refer to FIG. 3. FIG. 3 is a schematic flowchart of an image prediction method according to an embodiment of the present application. The method shown in FIG. 3 can be performed by a video codec device, a video codec, a video codec system, and other devices having video codec functions. The method shown in FIG. 3 can occur both in the encoding process and in the decoding process. More specifically, the method shown in FIG. 3 can occur in the interframe prediction process at the time of encoding and decoding.

The method shown in FIG. 3 includes steps 301 to 308, and steps 301 to 308 are described in detail below.

301. Acquire initial predicted motion information of a current image block.

302. Determine, according to the initial predicted motion information, a first reference block corresponding to the current image block in the first reference image, and determine a second reference block corresponding to the current image block in the second reference image; where, the first The reference block includes a first search base point, and the second reference block includes a second search base point; a pixel value of the first reference block and a pixel value of the second reference block have a first pixel precision.

The image block here may be one image block in the image to be processed, or may be one sub-image in the image to be processed. In addition, the image block herein may be an image block to be encoded in the encoding process, or may be an image block to be decoded in the decoding process.

Optionally, the foregoing initial predicted motion information includes indication information of a prediction direction (usually forward prediction, backward prediction, or bidirectional prediction), a motion vector directed to the reference image block (usually a motion vector of a neighboring block), and a reference. The indication information of the image (generally understood as reference image information for determining the reference image), wherein the motion vector comprises a forward motion vector and/or a backward motion vector, the reference image information comprising a forward reference image block and/or a backward direction Reference frame index information of the reference image block. The position of the forward reference block and the position of the backward reference block can be determined by the motion vector information.

Optionally, the first reference image is a forward reference image, and the second reference image is a backward reference image;

In a possible implementation, the first motion vector and the second motion vector included in the motion information are initially predicted; the position of the first reference block may be obtained according to the position of the current image block and the first motion vector, that is, the first a reference block; and obtaining a second reference block according to the position of the current image block and the second motion vector, that is, determining the second reference block.

In a possible implementation, if the location of the first reference block and/or the second reference block may be an equivalent position of the current image block, it may also be obtained according to the equivalent position and the motion vector.

There are various ways to obtain the initial predicted motion information of the image block. For example, the following method 1 and mode 2 may be used to obtain the initial predicted motion information of the image block.

method one:

In the merge mode of the inter prediction, the candidate prediction motion information list is constructed according to the motion information of the neighboring block of the current image block, and a candidate prediction motion information is selected from the candidate prediction motion information list as an initial prediction of the current image block. Sports information. The candidate predicted motion information list includes a motion vector, reference frame index information of a reference image block, and the like. As shown in FIG. 4, the motion information of the neighboring block A0 is selected as the initial predicted motion information of the current image block. Specifically, the forward motion vector of A0 is used as the forward motion vector of the current image block, and the backward direction of A0 is used. The motion vector is used as the backward predicted motion vector of the current image block.

Method 2:

In the non-merging mode of inter prediction, a motion vector predictor list is constructed according to motion information of neighboring blocks of the current image block, and a motion vector is selected from the motion vector predictor list as motion vector prediction of the current image block. value. In this case, the motion vector of the current image block may be the motion vector value of the adjacent block, or may be the sum of the motion vector of the selected neighboring block and the motion vector difference of the current image block, where the motion vector difference The difference between the motion vector obtained by motion estimation of the current image block and the motion vector of the selected neighboring block. As shown in FIG. 5, the motion vectors corresponding to the indices 1 and 2 in the motion vector predictor list are selected as the forward motion vector and the backward motion vector of the current image block.

It should be understood that the foregoing manners 1 and 2 are only two specific manners for obtaining initial prediction motion information of the current image block. The method for acquiring motion information of the prediction block is not limited, and any initial prediction motion information of the image block may be acquired. The manner of this is within the scope of protection of this application.

The base point can be represented by a coordinate point, which is a kind of position information, which can be used to indicate the position of the image block, and can also be used as a reference in the subsequent image block search. It may be the top left corner of an image block, or the center point of an image block, or a relative position point specified by other rules, which is not limited in this application. The base point of the reference image can be used as a search base point in subsequent search processes. So once the position of a reference block is determined, the search base point is determined. The base points contained in both the first reference block and the second reference block may also be referred to as a first search base point and a second search base point respectively because of subsequent search operations related to the base point; they may be predetermined It can also be specified in the process of codec.

For example, the forward motion vector is (MV0x, MV0y), and the base point of the current image block is (B0x, B0y), and the base point of the forward reference block is (MV0x+B0x, MV0y+B0y). A similar manner can be applied to the backward reference block, which is not described in the present application.

For convenience of clarity, in the next step, with the upper left corner vertex of one image block as a base point, the first reference image may refer to the forward reference image, and the second reference image may refer to the backward reference image, correspondingly, The first reference block may refer to a forward reference block and the second reference block may refer to a backward reference block. It should be understood that this is merely an illustrative example for the convenience of the description, and does not impose any limitation on the implementation of the present invention.

303. Determine N third reference blocks in the first reference image, where the value of N is greater than or equal to 1.

Step 303 includes a search method, and the specific search method can be as follows:

In the forward reference image, a motion search of an integer pixel step is performed around the first reference block with reference to the first reference block (or the first search base). The integer pixel step may refer to a position offset of the position of the candidate search block relative to the first reference block as an integer pixel distance, wherein the size of the candidate search block may be the same as the first reference block, so that the search process may determine The location of the candidate search block, and then the third reference block is determined according to the search rule. It should be pointed out that no matter whether the search base point is an integer pixel (the starting point can be an integer pixel, or a sub-pixel, such as 1/2, 1/4, 1/8, 1/16, etc.), the whole pixel can be performed. The step motion search obtains the position of the forward reference block of the current image block, that is, the third reference block is determined correspondingly. After searching for some third reference blocks in integer pixel steps, optionally, a sub-pixel search can be performed, and then some third reference blocks are obtained, and if there is still a search requirement, the finer sub-pixels can be continued. Search... For the search method, see Figure 6, where (0,0) is the search base point, you can use the cross search: search for (0,-1), (0,1), (-1,0) and (1,0); or square search: search for (-1,-1), (1,-1), (-1,1) and (1,1) in sequence; these points are the upper left corner of the candidate search block Vertex, these base points are determined, then the reference block corresponding to the base points, that is, the third reference block is also determined. It should be noted that, in the present invention, the search method is not limited, and any prior art search may be adopted. Methods, such as fractional pixel step size search, can be used in addition to integer pixel step search. For example, a search for a fractional pixel step size is directly performed, and a specific search method is not limited herein.

304. For any one of the N third reference blocks, according to the first search base point, the location of the any one third reference block, and the second search base point, in the A fourth reference block is correspondingly determined in the two reference pictures; N reference block groups are obtained, wherein one reference block group includes a third reference block and a fourth reference block.

Optionally, if the image of the current image block is the same as the time domain interval of the forward reference image and the backward reference image, for example, the time interval between the forward reference image and the backward reference image from the image of the current image block is ±0.04 s, the reference block group can be found using the motion vector difference MVD (motion vector difference) mirroring constraint. The MVD image constraint here is that if the positional offset of a third image block (base point) relative to the first reference block (first search base point) is Offset0 (deltaX0, deltaY0), it is found in the backward image. The candidate image block whose position offset is Offset1 (deltaX1, deltaY1) with respect to the second reference block (second search base point) is determined as a fourth reference block, where deltaX1=-deltaX0;deltaY1=-deltaY0; corresponding, Offset0 (deltaX0, deltaY0) can represent the ith vector, and Offset1 (deltaX1, deltaY1) can represent the jth vector. As another optional implementation manner, the image of the current current image block is different from the time domain interval of the forward reference image and the backward reference image, and the motion vector difference (MVD) image constraint can still be used to find the reference. Block group. In this implementation method, the ith vector and the jth vector are inversely large.

It should be understood that the "group" referred to in the present application is intended to express a corresponding relationship and does not constitute any form of limitation.

As an extension, if the current image block is located at a time interval different from the forward reference image and the backward reference image, the time interval between the forward reference image and the backward reference image respectively from the image of the current image block is For t1 and t2, the following constraint can be adopted: if the positional offset of the block position (base point) of a third image block relative to the first reference block (first search base point) is Offset00 (deltaX00, deltaY00), then in the backward direction An image block in the image in which the positional offset with respect to the second reference block (second search base point) is Offset01 (deltaX01, deltaY01) is determined as a fourth reference block. Among them, deltaX01=-deltaX00*t2/t1; deltaY01=-deltaY00*t2/t1. Correspondingly, Offset0 (deltaX0, deltaY0) can represent the ith vector, and Offset1 (deltaX1, deltaY1) can represent the jth vector.

It should be understood that in calculating the time domain interval, the first image block, the first base point and the first reference image have equivalent functions, and the current image block, the base point of the current image block, and the image of the current image block have equivalent functions. The essence is to calculate the time domain interval of the image where the first reference image and the current image block are located; the same applies to calculating the time domain interval of the second reference image and the image of the current image block. That is, the time interval between frames.

It can be concluded from the foregoing implementation that, as long as a third reference block is searched, a fourth reference block is correspondingly determined; that is, the determined third reference block and the fourth reference block are equal in number and in spatial position. It is one-to-one correspondence. Finally, N reference block groups can be obtained, wherein one reference block group includes one third reference block and one fourth reference block.

It should be understood that the above-mentioned positional offset may refer to the offset between the base point and the base point, and may also refer to the offset between the image block and the image block, representing a relative position.

Optionally, the N reference block groups may include the foregoing first reference block and the foregoing second reference block; that is, the first reference block may be a third reference block, and correspondingly, the second reference block may be a fourth Reference block. In the implementation method, both the ith vector and the jth vector are 0. In particular, when the first reference block is the third reference block, the second reference block is its corresponding fourth reference block.

As a supplementary description, the third reference block and/or the fourth reference block mentioned in the present application are not limited to image blocks of a specific location, but may represent a type of reference block, which may be a specific image. The block may also be a plurality of image blocks; for example, the third reference block may be based on any one of the image blocks searched around the first base point, and the fourth reference block may be an image block corresponding to any one of the image blocks described above, thus The fourth reference block may be a specific image block or a plurality of image blocks.

Step 305: Increase the pixel values of the obtained third reference block and the fourth reference block to a first pixel precision, and calculate an image block matching cost of the N reference block groups under the first pixel precision.

First, a reference block group is taken as a specific example to illustrate how to calculate an image block matching cost of a reference block group. The reference block group includes a third reference block and a fourth reference block determined corresponding thereto. First, the pixel values of the third reference block and the fourth reference block are referred to as high first pixel precision, wherein the third reference block and the fourth reference block are both image blocks that have been coded, so their pixels have a code stream. The accuracy, such as the code stream precision is 8 bits, the pixel precision of the pixel values of the third reference block and the fourth reference block is 8 bits. In order to find a reference block that is more similar in image, it is necessary to improve the precision of the pixels of the third reference block and the fourth reference block. Increasing the accuracy can be achieved by prior art methods of difference or shift, which are not described in this application. In order to better facilitate the subsequent calculation of the image block matching cost, the accuracy of the image block that needs to calculate the image block matching cost needs to be increased to the same precision, for example, 14 bits. After the above operation, the 14-bit pixel value of the third reference block is obtained, denoted as pi[x, y], and the 14-bit pixel value of the fourth reference block is recorded as pj[x, y], where x, y represents the coordinates. An image block matching cost eij is calculated from pi[x, y] and pj[x, y], which may also be referred to as an image block matching error eij. There are many ways to calculate the image block matching error, such as the SAD criterion, the MR-SAD criterion, and other evaluation criteria in the prior art, and the calculation method of the image block matching error is not limited in the present invention.

If there are more than two sets of reference block groups, the image block matching cost calculation described above can be performed for a plurality of reference block groups.

In an implementation manner, the first reference block is a third reference block, and the second reference block is a fourth reference block; and the pixel values of the first reference block and the second reference block are obtained by a motion compensation method. Motion compensation refers to pointing to a reconstructed reference image (with pixel precision of the code stream) according to the motion vector, and obtaining the pixel value (having the first pixel precision) of the reference block of the current image block. For example, the position pointed by the motion vector is a sub-pixel position, and the pixel value of the entire pixel position of the reference image needs to be interpolated by using an interpolation filter to obtain a pixel value of the sub-pixel position as the pixel value of the reference block of the current image block; The position pointed by the motion vector is a whole pixel position, and a moving operation can be employed. The coefficient sum of the interpolation filter, that is, the interpolation filter gain, is 2 to the power of N. If N is 6, it means that the interpolation filter gain is 6 bits. In the interpolation operation, since the interpolation filter gain is usually greater than 1, the accuracy of the pixel values of the obtained forward reference block and backward reference block is higher than that of the code stream. In order to reduce the loss of precision, no shifting and/or limiting operations are performed at this time to preserve the pixel values of the high-precision forward reference block and the backward reference block. For example, the pixel value precision bitDepth of the predicted image is 8 bits, the interpolation filter gain is 6 bits, and the predicted pixel value with a precision of 14 bits is obtained; for example, the pixel value precision bitDepth of the predicted image is 10 bits, and the interpolation filter gain is 6 bits. A predicted pixel value with a precision of 16 bits is obtained; if the pixel value precision bitDepth of the predicted image is 10 bits, the interpolation filter gain is 6 bits, and then 2 bits are shifted right, and a predicted pixel value with a precision of 14 bits is obtained. Commonly used interpolation filters are 4 taps, 6 taps, 8 taps, and so on. There are many motion compensation methods in the prior art, which are not described in this application.

In addition, as a supplementary explanation, the pixels of the image block referred to in the present application may include a luminance component sample, or a luma sample; correspondingly, the pixel point is a luminance component sampling point; and the pixel value is a luminance component sampling value.

306. Determine, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion, where the target reference block group includes a target third reference block and a target fourth reference block.

Optionally, the image block matching cost criterion comprises: determining a reference block group with the smallest image block matching cost as the target reference block group.

Optionally, the image block matching cost criterion further includes: determining, as the target reference block group, the first occurrence of the reference block group that satisfies the image block matching cost less than the preset threshold.

It should be understood that step 304, step 305, and step 306 may be performed after step 303, or may be performed in synchronization with step 303. The step numbers do not constitute any limitation on the order in which the methods are executed.

For example, each time a third reference block is determined, a fourth reference block is correspondingly determined, and an image block matching cost of the third set of reference blocks and the fourth reference block is calculated, if the Nth reference is calculated. In the block group, if the image block matching cost result satisfies a preset condition, such as less than a preset threshold, or even 0, the Nth reference block group is used as the target reference block group. It is not necessary to determine and calculate more third reference blocks and fourth reference blocks, which can reduce the computational complexity, where N is greater than or equal to 1.

If N third reference blocks are determined first, and N fourth reference blocks are determined one-to-one, N reference block groups are formed, and then images corresponding to each reference block group are calculated for the N reference block groups. The block matching error is compared and compared, and the image block matching cost result satisfies a preset condition. For example, if the image block matching error is the smallest, the reference block group with the smallest image block matching cost is selected (if there are at least a plurality, the random one is optional) ) as the target reference block group.

Correspondingly, the third reference block and the fourth reference block in the target reference block group, that is, the target third reference block and the target fourth reference block, may also be respectively called the optimal forward reference block of the current image block and the current image block respectively. The optimal backward reference block; for step 306 to perform the calculation.

As a supplementary explanation, the third reference block is determined based on the first reference block having the first pixel precision; the fourth reference block is determined based on the second reference block having the first pixel precision; thus the third reference block and the fourth reference The pixel precision of the block is also the first pixel precision, ie higher than the pixel precision of the code stream.

307. Obtain a pixel prediction value of the current image block according to the pixel value of the target third reference block at the first precision and the pixel value of the target fourth reference block at the first precision, where the pixel prediction value of the current image block has a Two pixel precision; the second pixel precision is less than the first pixel precision described above.

Performing "average weighting + shifting" on the obtained pixel value of the target third reference block (first pixel precision) and the pixel value (first pixel precision) of the target fourth reference block to obtain a pixel prediction value of the current image block. (second pixel accuracy).

Alternatively, the second pixel precision is the same as the pixel precision (bitDepth) of the code stream.

In a specific implementation process, the pixel value of the target third reference block at the first precision is predSamplesL0'[x][y]; and the pixel value of the target fourth reference block at the first precision is predSamplesL1'[x][ y]; the pixel prediction value of the current image block predSamples'[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0'[x][y]+predSamplesL1'[x][y] +offset2)>>shift2), where bitDepth is the pixel precision of the code stream, Shift2 is the shift parameter, and offset2 is equal to 1<<(shift2-1). x and y are the coordinates of the horizontal and vertical directions of each pixel in the image block, and for each pixel in the image block, the operation is as shown in the above formula. For example, the precision of the pixel value of the target third reference block is 14 bits, the precision of the pixel value of the target fourth reference block is 14 bits, and the shift2 is 15-bitDepth, and the precision of the pixel prediction value of the current image block is 14+1. -shift2=bitDepth.

In summary, since the first reference block and the second reference block obtained from the initial motion information are not necessarily able to accurately predict the current image block, a completely new method will be adopted in the present application to find a more suitable target. The third reference block and the target fourth reference block, and the current image block is predicted by the pixel values of the target third reference block and the target fourth reference block. Through the prediction method provided by the invention, the high-precision pixel value can be continuously maintained in the matching process, and the repeated limit operation and the motion compensation operation are not required, thereby reducing the complexity of the codec.

It should be understood that the image prediction method in the embodiment of the present application may occur in the inter prediction process shown in FIG. 1 and FIG. 2, and the image prediction method in the embodiment of the present application may be specifically implemented by an inter prediction module in an encoder or a decoder. To execute. Additionally, the image prediction method of embodiments of the present application can be implemented in any electronic device or device that may require encoding and/or decoding of a video image.

Based on the prediction method provided by the foregoing embodiment, an embodiment of the present invention provides an image prediction apparatus. The image prediction apparatus of the embodiment of the present application is described below with reference to FIG. Here, the image predicting apparatus shown in FIG. 6 corresponds to the method shown in FIG. 3, and each step in the method shown in FIG. 3 can be executed. For the sake of brevity, the repeated description is appropriately omitted below.

Referring to FIG. 7, an image prediction apparatus 700, the apparatus 700 includes:

The obtaining unit 701 is configured to acquire initial predicted motion information of the current image block. This unit can be implemented by the processor invoking code in memory.

a determining unit 702, configured to determine, according to the initial predicted motion information, a first reference block corresponding to the current image block in the first reference image, and determine, in the second reference image, the current image block a second reference block; wherein the first reference block includes a first search base point and the second reference block includes a second search base point. This unit can be implemented by the processor invoking code in memory.

The searching unit 703 is configured to determine N third reference blocks in the first reference image. This unit can be implemented by the processor invoking code in memory.

The mapping unit 704 is configured to: according to the first search base point, the location of the any one of the third reference blocks, and the second search base point, for any one of the N third reference blocks, Correspondingly determining a fourth reference block in the second reference image; obtaining N reference block groups, wherein one reference block group includes a third reference block and a fourth reference block; N is greater than or equal to 1. This unit can be implemented by the processor invoking code in memory.

The calculating unit 705 is configured to increase the obtained pixel values of the third reference block and the fourth reference block to a first pixel precision, and calculate an image block matching cost of the N reference block groups at the first pixel precision. This unit can be implemented by the processor invoking code in memory.

The selecting unit 706 is configured to determine, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion, where the target reference block group includes a target third reference block and a target fourth reference block. This unit can be implemented by the processor invoking code in memory.

a prediction unit 707, configured to obtain a pixel prediction value of the current image block according to a pixel value of the target third reference block at a first precision and a pixel value of the target fourth reference block at a first precision, where The pixel prediction value of the current image block has a second pixel precision; the second pixel precision is less than the first pixel precision. This unit can be implemented by the processor invoking code in memory.

In a specific implementation process, the obtaining unit 701 is specifically configured to perform the method mentioned in the foregoing step 301 and the method that can be replaced by the same; the determining unit 702 is specifically configured to perform the method mentioned in the foregoing step 302 and can be equivalently replaced. The search unit 703 is specifically configured to perform the method mentioned in the above step 303 and the method that can be replaced by the same; the mapping unit 704 is specifically configured to perform the method mentioned in the above step 304 and the method that can be replaced equally; 705 is specifically configured to perform the method mentioned in step 305 and the method that can be replaced by the same; the selecting unit 706 is specifically configured to perform the method mentioned in step 306 and the method that can be replaced equally; the prediction unit 707 is specifically configured to perform the step The method mentioned in 307 and the method which can be equivalently replaced. The corresponding method embodiments and corresponding explanations, representations, refinements, and alternative alternative embodiments are also applicable to the method in the device.

The device 700 may specifically be a video encoding device, a video decoding device, a video codec system, or other device having a video codec function. The apparatus 700 can be used for both image prediction in the encoding process and image prediction in the decoding process, especially inter-frame prediction in video images. Apparatus 700 includes a number of functional units for implementing any of the foregoing methods

The present application further provides a terminal device, the terminal device includes: a memory for storing a program; a processor, configured to execute the program stored by the memory, when the program is executed, the processor is configured to execute the program The image prediction method of the application embodiment includes steps 301-307.

The terminal devices here may be video display devices, smart phones, portable computers, and other devices that can process video or play video.

The present application also provides a video encoder, including a non-volatile storage medium, and a central processing unit, the non-volatile storage medium storing an executable program, the central processing unit and the non-volatile storage The medium is connected, and the executable program is executed to implement the image prediction method of the embodiment of the present application, including steps 301-307.

The present application also provides a video decoder, including a non-volatile storage medium, and a central processing unit, the non-volatile storage medium storing an executable program, the central processing unit and the non-volatile storage The medium is connected, and the executable program is executed to implement the image prediction method of the embodiment of the present application, including steps 301-307.

The present application also provides a video encoding system including a non-volatile storage medium, and a central processing unit, the non-volatile storage medium storing an executable program, the central processing unit and the non-volatile storage The medium is connected, and the executable program is executed to implement the image prediction method of the embodiment of the present application, including steps 301-307.

The present application also provides a computer readable medium storing program code for device execution, the program code comprising instructions for performing an image prediction method of an embodiment of the present application, including steps for implementing Program code of 301-307.

The present application also provides a decoder, which includes an image prediction apparatus in the embodiment of the present application, such as 700, and a decoding reconstruction module, wherein the decoding reconstruction module is configured to obtain according to the image prediction apparatus. The predicted value of the pixel value of the image block results in a reconstructed pixel value of the image block.

The present application also provides an encoder, which includes an image prediction apparatus in the embodiment of the present application, such as 700, and a code reconstruction module, wherein the code reconstruction module is configured to obtain according to the image prediction device. The predicted value of the pixel value of the image block results in a reconstructed pixel value of the image block.

FIG. 8 is a schematic block diagram of a video encoder according to an embodiment of the present application. The video encoder 1000 shown in FIG. 8 includes an encoding end prediction module 1001, a transform quantization module 1002, an entropy encoding module 1003, a code reconstruction module 1004, and an encoding end filtering module.

The video encoder 1000 shown in FIG. 8 can encode a video. Specifically, the video encoder 1000 can perform the video encoding process shown in FIG. 1 to implement encoding of a video. In addition, the video encoder 1000 can also perform the image prediction method of the embodiment of the present application, and the video encoder 1000 can perform various steps of the image prediction method shown in FIG. 3, including refinement of each step and an alternative implementation. The image prediction apparatus in the embodiment of the present application may also be the encoding end prediction module 1001 in the video encoder 1000.

9 is a schematic block diagram of a video decoder of an embodiment of the present application. The video decoder 2000 shown in FIG. 9 includes an entropy decoding module 2001, an inverse transform inverse quantization module 2002, a decoding end prediction module 2003, a decoding reconstruction module 2004, and a decoding end filtering module 2005.

The video decoder 2000 shown in FIG. 9 can encode the video. Specifically, the video decoder 2000 can perform the video decoding process shown in FIG. 2 to implement decoding of the video. In addition, the video decoder 2000 can also perform the image prediction method of the embodiment of the present application, and the video decoder 2000 can perform various steps of the image prediction method shown in FIG. 3, including refinement of each step and an alternative implementation. The image prediction apparatus 700 in the embodiment of the present application may also be the decoding side prediction module 2003 in the video decoder 2000.

The application scenario of the image prediction method in the embodiment of the present application is described below with reference to FIG. 10 to FIG. 12 . The image prediction method in the embodiment of the present application may be implemented by the video transmission system, the codec device, and the editing device shown in FIG. 10 to FIG. 12 . The decoding system is executed.

FIG. 10 is a schematic block diagram of a video transmission system according to an embodiment of the present application.

As shown in FIG. 10, the video transmission system includes an acquisition module 3001, an encoding module 3002, a transmitting module 3003, a network transmission 3004, a receiving module 3005, a decoding module 3006, and a rendering module 3007.

Among them, the specific functions of each module in the video transmission system are as follows:

The acquisition module 3001 includes a camera or a camera group for collecting video images, and performing pre-encoding processing on the collected video images to convert the optical signals into digitized video sequences;

The encoding module 3002 is configured to encode the video sequence to obtain a code stream;

The sending module 3003 is configured to send the coded code stream.

The receiving module 3005 is configured to receive the code stream sent by the sending module 3003.

The network 3004 is configured to transmit the code stream sent by the sending module 3003 to the receiving module 3005;

The decoding module 3006 is configured to decode the code stream received by the receiving module 3005 to reconstruct a video sequence.

The rendering module 3007 is configured to render the reconstructed video sequence decoded by the decoding module 3006 to improve the display effect of the video.

The video transmission system shown in FIG. 10 can perform the image prediction method in the embodiment of the present application. Specifically, the encoding module 3002 and the decoding module 3006 in the video transmission system shown in FIG. 10 can perform image prediction in the embodiment of the present application. The method, including steps 301-307, also includes refinement and alternative implementations of each step. In addition, the acquisition module 3001, the encoding module 3002, and the transmission module 3003 in the video transmission system shown in FIG. 10 correspond to the video encoder 1000 shown in FIG. The receiving module 3005, the decoding module 3006, and the rendering module 3007 in the video transmission system shown in FIG. 10 correspond to the video decoder 2000 shown in FIG.

A codec system composed of a codec device and a codec device will be described in detail below with reference to FIGS. 11 and 12. It should be understood that the codec device and the codec system shown in FIGS. 11 and 12 are capable of performing the method of image prediction of the embodiment of the present application.

FIG. 11 is a schematic diagram of a video codec apparatus according to an embodiment of the present application. The video codec device 50 may be a device dedicated to encoding and/or decoding a video image, or may be an electronic device having a video codec function. Further, the codec device 50 may be a mobile communication system. Terminal or user equipment.

Codec device 50 may include the following modules or units: controller 56, codec 54, radio interface 52, antenna 44, smart card 46, card reader 48, keypad 34, memory 58, infrared port 42, display 32. In addition to the modules and units shown in FIG. 11, the codec device 50 may also include a microphone or any suitable audio input module, which may be a digital or analog signal input, and the codec device 50 may also include an audio output. Module, the audio output module can be a headset, a speaker or an analog audio or digital audio output connection. The codec device 50 may also include a battery, which may be a solar cell, a fuel cell, or the like. The codec device 50 may also include an infrared port for short-range line-of-sight communication with other devices, and the codec device 50 may also communicate with other devices using any suitable short-range communication method, for example, a Bluetooth wireless connection, USB / Firewire wired connection.

The memory 58 can store data in the form of data and audio in the form of images, as well as instructions for execution on the controller 56.

Codec 54 may implement encoding and decoding of audio and/or video data or enable auxiliary and auxiliary decoding of audio and/or video data under the control of controller 56.

The smart card 46 and the card reader 48 can provide user information as well as network authentication and authentication information for authorized users. The specific implementation form of the smart card 46 and the card reader 48 may be a Universal Integrated Circuit Card (UICC) and a UICC reader.

The radio interface circuit 52 can generate a wireless communication signal, which can be a communication signal generated during a cellular communication network, a wireless communication system, or a wireless local area network communication.

The antenna 44 is used to transmit radio frequency signals generated by the radio interface circuit 52 to other devices (the number of devices may be one or more), and may also be used for other devices (the number of devices may be one or more Receive RF signals.

In some embodiments of the present application, codec device 50 may receive video image data to be processed from another device prior to transmission and/or storage. In still other embodiments of the present application, the codec device 50 may receive images over a wireless or wired connection and encode/decode the received images.

FIG. 12 is a schematic block diagram of a video codec system 7000 according to an embodiment of the present application.

As shown in FIG. 12, the video codec system 7000 includes a source device 4000 and a destination device 5000. The source device 4000 generates encoded video data, the source device 4000 may also be referred to as a video encoding device or a video encoding device, and the destination device 5000 may decode the encoded video data generated by the source device 4000, the destination device 5000 may also be referred to as a video decoding device or a video decoding device.

The specific implementation form of the source device 4000 and the destination device 5000 may be any one of the following devices: a desktop computer, a mobile computing device, a notebook (eg, a laptop) computer, a tablet computer, a set top box, a smart phone, a handset, TV, camera, display device, digital media player, video game console, on-board computer, or other similar device.

Destination device 5000 can receive video data encoded by source device 4000 via channel 6000. Channel 6000 can include one or more media and/or devices capable of moving encoded video data from source device 4000 to destination device 5000. In one example, channel 6000 can include one or more communication media that enable source device 4000 to transmit encoded video data directly to destination device 5000 in real time, in which case source device 4000 can be based on communication standards ( For example, a wireless communication protocol) modulates the encoded video data, and the modulated video data can be transmitted to the destination device 5000. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media described above may form part of a packet-based network (eg, a local area network, a wide area network, or a global network (eg, the Internet)). The one or more communication media described above may include a router, a switch, a base station, or other device that enables communication from the source device 4000 to the destination device 5000.

In another example, channel 6000 can include a storage medium that stores encoded video data generated by source device 4000. In this example, destination device 5000 can access the storage medium via disk access or card access. The storage medium may include a variety of locally accessible data storage media, such as Blu-ray Disc, High Density Digital Video Disc (DVD), Compact Disc Read-Only Memory (CD-ROM), flash memory. Or other suitable digital storage medium for storing encoded video data.

In another example, channel 6000 can include a file server or another intermediate storage device that stores encoded video data generated by source device 4000. In this example, destination device 5000 can access the encoded video data stored at a file server or other intermediate storage device via streaming or download. The file server may be a server type capable of storing encoded video data and transmitting the encoded video data to the destination device 5000. For example, the file server may include a World Wide Web (Web) server (for example, for a website), a File Transfer Protocol (FTP) server, a Network Attached Storage (NAS) device, and a local disk. driver.

Destination device 5000 can access the encoded video data via a standard data connection (e.g., an internet connection). The instance type of the data connection includes a wireless channel, a wired connection (e.g., a cable modem, etc.), or a combination of both, suitable for accessing the encoded video data stored on the file server. The transmission of the encoded video data from the file server may be streaming, downloading, or a combination of both.

The image prediction method of the present application is not limited to a wireless application scenario. Illustratively, the image prediction method of the present application can be applied to video codec supporting multiple multimedia applications such as the following applications: aerial television broadcasting, cable television transmission, satellite television transmission, Streaming video transmission (e.g., via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other application. In some examples, video codec system 7000 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In FIG. 12, the source device 4000 includes a video source 4001, a video encoder 4002, and an output interface 4003. In some examples, output interface 4003 can include a modulator/demodulator (modem) and/or a transmitter. Video source 4001 can include a video capture device (eg, a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer for generating video data A graphics system, or a combination of the above video data sources.

Video encoder 4002 can encode video data from video source 4001. In some examples, source device 4000 transmits the encoded video data directly to destination device 5000 via output interface 4003. The encoded video data may also be stored on a storage medium or file server for later access by the destination device 5000 for decoding and/or playback.

In the example of FIG. 12, the destination device 5000 includes an input interface 5003, a video decoder 5002, and a display device 5001. In some examples, input interface 5003 includes a receiver and/or a modem. The input interface 5003 can receive the encoded video data via the channel 6000. Display device 5001 may be integrated with destination device 5000 or may be external to destination device 5000. Generally, the display device 5001 displays the decoded video data. Display device 5001 can include a variety of display devices, such as liquid crystal displays, plasma displays, organic light emitting diode displays, or other types of display devices.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.

The functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the present application, which is essential or contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium, including The instructions are used to cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present application. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes. .

The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto, and any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application. It should be covered by the scope of protection of this application. Therefore, the scope of protection of the present application should be determined by the scope of the claims.

Claims

An image prediction method, the method comprising:

Obtaining initial predicted motion information of the current image block;

Determining, according to the initial predicted motion information, a first reference block corresponding to the current image block in the first reference image, and determining, in the second reference image, a second reference block corresponding to the current image block; The first reference block includes a first search base point, and the second reference block includes a second search base point;

Determining N third reference blocks in the first reference image;

For the third reference block of any one of the N third reference blocks, according to the first search base point, the location of the any one of the third reference blocks, and the second search base point, in the second reference Correspondingly determining a fourth reference block in the image; obtaining N reference block groups, wherein one reference block group includes a third reference block and a fourth reference block; N is greater than or equal to 1;

And increasing the pixel values of the obtained third reference block and the fourth reference block to a first pixel precision, and calculating an image block matching cost of the N reference block groups under the first pixel precision;

Determining, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion, the target reference block group including a target third reference block and a target fourth reference block;

Obtaining a pixel prediction value of the current image block according to a pixel value of the target third reference block at a first precision and a pixel value of the target fourth reference block at a first precision, wherein the current image block The pixel prediction value has a second pixel precision; the second pixel precision is less than the first pixel precision.
The method according to claim 1, wherein said initial predicted motion information includes a reference image index for indicating that said two reference images comprise a forward reference image and a backward reference image.
The method according to claim 1 or 2, wherein said any one of said N third reference blocks is based on said first search base point, said any one of said third reference Positioning the block and the second search base point, correspondingly determining, in the second reference image, a fourth reference block includes:

If the first reference block is a third reference block;

The second reference block is correspondingly a fourth reference block; wherein the first reference block and the second reference block belong to one reference block group.
The method according to claim 1 or 2, wherein said any one of said N third reference blocks is based on said first search base point, said any one of said third reference Positioning the block and the second search base point, correspondingly determining, in the second reference image, a fourth reference block includes:

Determining an ith vector according to the any one of the third reference block and the first search base;

Determining the jth according to a time domain interval t1 of the current image block relative to the first reference image, a time domain interval t2 of the current image block relative to the second reference image, and the ith vector a vector, wherein the jth vector is opposite to a direction of the ith vector; i and j are both positive integers not greater than N;

Determining a fourth reference block according to the second search base point and the jth vector.
The method according to claim 1 or 2, wherein said any one of said N third reference blocks is based on said first search base point, said any one of said third reference Positioning the block and the second search base point, correspondingly determining, in the second reference image, a fourth reference block includes:

Determining an ith vector according to the any one of the third reference block and the first search base;

Determining, according to the ith vector, a jth vector, wherein the jth vector is inversely different from the ith vector; i and j are positive integers not greater than N;

Determining a fourth reference block according to the second search base point and the jth vector.
The method according to any one of claims 1 to 5, wherein the pixel values of the obtained third reference block and fourth reference block are increased to a first pixel precision, and the first pixel precision is Calculating the image block matching cost of the N reference block groups includes:

And increasing pixel values of the third reference block and the fourth reference block to a first pixel precision by interpolation or shifting for at least one of the N reference block groups, and under the first pixel precision Calculate the image block matching cost;

Determining, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion includes:

And determining, by the first one of the at least one reference block group, a reference block group that satisfies an image block matching cost less than a preset threshold as the target reference block group.
The method according to any one of claims 1 to 5, wherein the pixel values of the obtained third reference block and fourth reference block are increased to a first pixel precision, and the first pixel precision is Calculating the image block matching cost of the N reference block groups includes:

And increasing pixel values of the obtained third reference block and fourth reference block to first pixel precision by interpolation or shifting;

Calculating an image block matching cost for each of the N reference block groups;

Determining, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion includes:

A reference block group having the smallest image block matching cost among the N reference block groups is determined as the target reference block group.
The method according to any one of claims 1 to 7, wherein said pixel value according to said target third reference block at a first precision and said target fourth reference block are at a first precision Pixel values to obtain pixel prediction values for the current image block include:

Obtaining a pixel value predSamplesL0'[x][y] of the target third reference block at the first precision;

Obtaining a pixel value predSamplesL1'[x][y] of the target fourth reference block at the first precision;

The pixel prediction value of the current image block predSamples'[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0'[x][y]+predSamplesL1'[x][y]+ Offset2)>>shift2), where bitDepth is the second pixel precision, Shift2 is the shift parameter, and offset2 is equal to 1<<(shift2-1).
An image prediction apparatus, characterized in that the apparatus comprises:

An acquiring unit, configured to acquire initial predicted motion information of a current image block;

a determining unit, configured to determine, according to the initial predicted motion information, a first reference block corresponding to the current image block in the first reference image, and determine, in the second reference image, a corresponding one of the current image block a second reference block, wherein the first reference block includes a first search base point, and the second reference block includes a second search base point;

a searching unit, configured to determine N third reference blocks in the first reference image;

a mapping unit, configured to: according to the first search base point, the location of the any one of the third reference blocks, and the second search base point, for any one of the N third reference blocks Determining, in the second reference image, a fourth reference block; obtaining N reference block groups, wherein one reference block group includes a third reference block and a fourth reference block; N is greater than or equal to 1;

a calculating unit, configured to increase a pixel value of the obtained third reference block and the fourth reference block to a first pixel precision, and calculate an image block matching cost of the N reference block groups under the first pixel precision;

a selecting unit, configured to determine, in the N reference block groups, a target reference block group that satisfies an image block matching cost criterion, where the target reference block group includes a target third reference block and a target fourth reference block;

a prediction unit, configured to obtain a pixel prediction value of the current image block according to a pixel value of the target third reference block at a first precision and a pixel value of the target fourth reference block at a first precision, where The pixel prediction value of the current image block has a second pixel precision; the second pixel precision is less than the first pixel precision.
The apparatus according to claim 9, wherein said initial predicted motion information includes a reference image index for indicating that said two reference images comprise a forward reference image and a backward reference image.
The device according to claim 9 or 10, wherein the mapping unit is specifically configured to:

If the first reference block is a third reference block;

The second reference block is correspondingly a fourth reference block; wherein the first reference block and the second reference block belong to one reference block group.
The device according to claim 9 or 10, wherein the mapping unit is specifically configured to:

Determining an ith vector according to the any one of the third reference block and the first search base;

Determining the jth according to a time domain interval t1 of the current image block relative to the first reference image, a time domain interval t2 of the current image block relative to the second reference image, and the ith vector a vector, wherein the jth vector is opposite to a direction of the ith vector; i and j are both positive integers not greater than N;

Determining a fourth reference block according to the second search base point and the jth vector.
The device according to claim 9 or 10, wherein the mapping unit is specifically configured to:

Determining an ith vector according to the any one of the third reference block and the first search base;

Determining, according to the ith vector, a jth vector, wherein the jth vector is inversely different from the ith vector; i and j are positive integers not greater than N;

Determining a fourth reference block according to the second search base point and the jth vector.
The apparatus according to any one of claims 9 to 13, wherein the calculating unit is specifically configured to: obtain, for the at least one reference block group of the N reference block groups, the obtained third reference block and The pixel value of the fourth reference block is increased to the first pixel precision by interpolation or shifting; and the image block matching cost is calculated under the first pixel precision;

The selecting unit is specifically configured to: determine, as the target reference block group, a reference block group that meets the first occurrence of the image block matching error that is less than a preset threshold in the at least one reference block group.
The apparatus according to any one of claims 9 to 13, wherein the calculating unit is specifically configured to: increase pixel values of the obtained third reference block and fourth reference block by interpolation or shifting to First pixel precision; calculating an image block matching cost for each of the N reference block groups;

The selecting unit is specifically configured to: determine, as the target reference block group, a reference block group that minimizes an image block matching error in the N reference block groups.
The apparatus according to any one of claims 9 to 15, wherein the prediction unit is specifically configured to:

Obtaining a pixel value predSamplesL0'[x][y] of the target third reference block at the first precision;

Obtaining a pixel value predSamplesL1'[x][y] of the target fourth reference block at the first precision;

The pixel prediction value of the current image block predSamples'[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0'[x][y]+predSamplesL1'[x][y]+ Offset2)>>shift2), where bitDepth is the second pixel precision, Shift2 is the shift parameter, and offset2 is equal to 1<<(shift2-1).
An encoder for encoding an image block, comprising:

The image prediction apparatus according to any one of claims 9 to 16, wherein the image prediction means is configured to obtain a predicted value of a pixel value of the current image block;

And a code reconstruction module, configured to obtain a reconstructed pixel value of the current image block according to a predicted value of a pixel value of the current image block.
A decoder, the video decoder for decoding an image block, comprising:

The image prediction apparatus according to any one of claims 9 to 16, wherein the image prediction means is configured to obtain a predicted value of a pixel value of the current image block;

And a decoding reconstruction module, configured to obtain a reconstructed pixel value of the current image block according to a predicted value of a pixel value of the current image block.
A computer readable storage medium, characterized in that the computer readable storage medium stores program code, wherein the program code comprises instructions for performing the method of any of claims 1-8.
A terminal, the terminal comprising a memory, a processor;

Program instructions are stored in the memory,

The processor is configured to invoke the program instruction to perform the method of any one of claims 1-8.