CN112437299B - Inter-frame prediction method, device and storage medium


Info

Publication number
CN112437299B
CN112437299B (application CN202010846274.8A)
Authority
CN
China
Prior art keywords
prediction
pixel point
motion vector
candidate
target pixel
Prior art date
Legal status
Active
Application number
CN202010846274.8A
Other languages
Chinese (zh)
Other versions
CN112437299A (en)
Inventor
徐巍炜 (Xu Weiwei)
杨海涛 (Yang Haitao)
赵寅 (Zhao Yin)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Priority claimed from PCT/CN2018/109233 (WO2020056798A1)
Application filed by Huawei Technologies Co Ltd
Priority claimed from PCT/CN2019/107060 (WO2020057648A1)
Publication of CN112437299A
Application granted
Publication of CN112437299B

Classifications

    • H04N 19/117: Filters, e.g. for pre-processing or post-processing
    • H04N 19/107: Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N 19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N 19/176: Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or a macroblock
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/82: Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop
    • H04N 21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Abstract

The embodiments of the application disclose an inter-frame prediction method and apparatus, relate to the technical field of video encoding and decoding, and solve the problem that the prediction pixels obtained with existing inter-frame prediction modes show a certain discontinuity in the spatial domain, which reduces prediction efficiency and results in large prediction residual energy. The specific scheme is as follows: parsing the code stream to obtain motion information of the image block to be processed; performing motion compensation on the image block to be processed based on the motion information to obtain a prediction block of the image block to be processed, where the prediction block includes a predicted value of a target pixel point; and performing weighted calculation on the reconstruction values of one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point, where the reference pixel points have a preset spatial position relationship with the target pixel point.

Description

Inter-frame prediction method, device and storage medium
This application is a divisional application of the Chinese patent application with application number 201980011364.0, entitled 'a method and a device for inter-frame prediction' and filed on July 31, 2020. Application 201980011364.0 is the Chinese national-phase application of international application PCT/CN2019/107060. PCT/CN2019/107060 claims priority to the Chinese patent application with application number 201811109950.2, entitled 'a method and a device for video encoding and decoding' and filed on September 21, 2018, to the Chinese patent application with application number 201811303754.9, entitled 'a method and a device for inter-frame prediction' and filed on November 2, 2018, and to the international patent application with application number PCT/CN2018/109233, entitled 'a method and a device for video encoding and decoding' and filed on October 1, 2018, the entire contents of which are incorporated into this application by reference.
Technical Field
The embodiment of the application relates to the technical field of video coding and decoding, in particular to an inter-frame prediction method and device.
Background
Digital video techniques are widely used in a variety of digital video devices, which implement video coding techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265 (also known as High Efficiency Video Coding, HEVC), and extensions of these standards. By implementing these video coding techniques, digital video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.
At present, video encoding and decoding mainly use inter-frame prediction and intra-frame prediction to remove temporal and spatial redundancy in video. However, inter-frame prediction only exploits the temporal correlation between the same objects in adjacent frames and does not consider spatial correlation. As a result, the prediction pixels obtained with existing inter-frame prediction modes show a certain discontinuity in the spatial domain, which reduces prediction efficiency and leads to large prediction residual energy.
Disclosure of Invention
The embodiments of the application provide an inter-frame prediction method and apparatus, which can perform spatial filtering on an inter-coded prediction block and improve coding efficiency.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
in a first aspect of the embodiments of the present application, a method for inter-frame prediction is provided, where the method includes: analyzing the code stream to obtain motion information of the image block to be processed; performing motion compensation on the image block to be processed based on the motion information to obtain a prediction block of the image block to be processed, wherein the prediction block of the image block to be processed comprises a prediction value of a target pixel point; and performing weighted calculation on the reconstruction values of one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point, wherein the reference pixel points and the target pixel point have a preset spatial position relationship. Based on the scheme, the spatial filtering processing is carried out on the predicted value of the target pixel point by utilizing the adjacent reconstructed pixels around, so that the coding compression efficiency can be improved.
With reference to the first aspect, in a possible implementation manner, the one or more reference pixels include a reconstructed pixel having the same abscissa as the target pixel and a preset ordinate difference, or a reconstructed pixel having the same ordinate as the target pixel and a preset abscissa difference. Based on the scheme, the target pixel point is subjected to filtering processing through the reference pixel point which has the preset spatial domain position relation with the target pixel point, and compared with the prior art, the coding efficiency is improved.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
Figure GDA0003438690880000021
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP) and recon(xP, yN-M2) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP) and (xP, yN-M2), w1, w2, w3, w4, w5, w6, w7 are preset constants, and M1 and M2 are preset positive integers. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the possible implementations described above, in another possible implementation, w1 + w2 = R1, or w3 + w4 = R2, or w5 + w6 + w7 = R3, where R1, R2, and R3 are each the n-th power of 2, and n is a non-negative integer. Based on the scheme, the coding efficiency can be further improved.
It should be understood that R1, R2, and R3 are each 2 to the n-th power; this does not require R1, R2, and R3 to be the same or different. For example, R1, R2, and R3 may all be 8, or R1, R2, and R3 may be 2, 4, and 16, respectively.
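The combining formula referenced above is reproduced only as an image in the published text. Purely as an illustration, the following C sketch assumes that the three normalization constraints w1+w2, w3+w4 and w5+w6+w7 correspond to using only the left reference recon(xN-M1, yP), only the top reference recon(xP, yN-M2), or both; the function name and the availability flags are hypothetical.

    #include <stdio.h>

    /* Hedged sketch: weighted spatial filtering of one predicted sample with the
     * left reference recon(xN-M1, yP) and/or the top reference recon(xP, yN-M2).
     * The piecewise structure is an assumption inferred from the three weight
     * normalisation constraints w1+w2=R1, w3+w4=R2 and w5+w6+w7=R3. */
    static int update_pred_sample(int predP, int reconLeft, int reconTop,
                                  int haveLeft, int haveTop,
                                  int w1, int w2, int w3, int w4,
                                  int w5, int w6, int w7)
    {
        if (haveLeft && haveTop)
            return (w5 * predP + w6 * reconLeft + w7 * reconTop
                    + (w5 + w6 + w7) / 2) / (w5 + w6 + w7);
        if (haveLeft)
            return (w1 * predP + w2 * reconLeft + (w1 + w2) / 2) / (w1 + w2);
        if (haveTop)
            return (w3 * predP + w4 * reconTop + (w3 + w4) / 2) / (w3 + w4);
        return predP;          /* no usable reference: keep the motion-compensated value */
    }

    int main(void)
    {
        /* w5 + w6 + w7 = 8, a power of two, as suggested by the text */
        printf("%d\n", update_pred_sample(100, 80, 90, 1, 1, 6, 2, 6, 2, 6, 1, 1));
        return 0;
    }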
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
[Formula image GDA0003438690880000031: weighted combination of predP(xP, yP) with recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3) and recon(xP, yN-M4) using the weights w1 to w11 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11 are preset constants, and M1, M2, M3, M4 are preset positive integers. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, w1 + w2 + w3 = S1, or w4 + w5 + w6 = S2, or w7 + w8 + w9 + w10 + w11 = S3, where S1, S2, and S3 are each the n-th power of 2, and n is a non-negative integer. Based on the scheme, the coding efficiency can be further improved.
It should be understood that S1, S2, and S3 are each 2 to the n-th power; this does not require S1, S2, and S3 to be the same or different. For example, S1, S2, and S3 may all be 8, or S1, S2, and S3 may be 2, 4, and 16, respectively.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
[Formula image GDA0003438690880000032: weighted combination of predP(xP, yP) with recon(xN-M1, yP) and recon(xP, yN-M2) using the weights w1 to w3 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper left pixel point in the image block to be processed are (xN, yN), predP (xP, yP) is a predicted value before updating of the target pixel point, predQ (xP, yP) is an updated predicted value of the target pixel point, recon (xN-M1, yP), recon (xP, yN-M2) are reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xP, yN-M2), w1, w2, w3 are preset constants, and M1, M2 are preset positive integers. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, w1+ w2+ w3 is equal to R, where R is an n-th power of 2, and n is a non-negative integer.
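Because the weight sum R is a power of two, the normalization division can be implemented as a rounding right shift. The short C sketch below illustrates this for the three-weight case; the way the two reference samples enter the weighted sum is an assumption, since the formula itself appears only as an image.

    #include <stdio.h>

    /* Illustrative only: with w1 + w2 + w3 = R = 2^n, the division by R in the
     * weighted update can be replaced by adding R/2 and shifting right by n. */
    static int weighted_update_shift(int predP, int reconLeft, int reconTop,
                                     int w1, int w2, int w3, int n)
    {
        int R = 1 << n;                          /* R = w1 + w2 + w3 */
        return (w1 * predP + w2 * reconLeft + w3 * reconTop + (R >> 1)) >> n;
    }

    int main(void)
    {
        /* w1 = 6, w2 = 1, w3 = 1, R = 8 = 2^3 */
        printf("%d\n", weighted_update_shift(100, 80, 90, 6, 1, 1, 3));
        return 0;
    }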
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
[Formula image GDA0003438690880000041: weighted combination of predP(xP, yP) with recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3) and recon(xP, yN-M4) using the weights w1 to w5 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4, w5 are preset constants, and M1, M2, M3, M4 are preset positive integers. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, w1+ w2+ w3+ w4+ w5 is S, where S is an n-th power of 2, and n is a non-negative integer. Based on the scheme, the coding efficiency can be further improved.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the one or more reference pixel points include one or more of the following pixel points: reconstructed pixel points which have the same horizontal coordinates with the target pixel points and are adjacent to the upper edge of the image block to be processed; or, a reconstructed pixel point which has the same vertical coordinate with the target pixel point and is adjacent to the left edge of the image block to be processed; or, the reconstructed pixel point at the upper right corner of the image block to be processed; or, the reconstructed pixel point at the lower left corner of the image block to be processed; or, the reconstructed pixel point at the upper left corner of the image block to be processed. Based on the scheme, the target pixel point is subjected to filtering processing through the reference pixel point which has the preset spatial domain position relation with the target pixel point, and compared with the prior art, the coding efficiency is improved.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
where
predP1(xP,yP)=(predV(xP,yP)+predH(xP,yP)+nTbW*nTbH)>>(Log2(nTbW)+Log2(nTbH)+1), predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
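A minimal C sketch of the implementation described above, assuming the block's upper-left pixel is at (0,0) as stated, that pTop[] holds the reconstructed row p(0..nTbW, -1), that pLeft[] holds the reconstructed column p(-1, 0..nTbH), and that nTbW and nTbH are powers of two; the function names are hypothetical.

    #include <stdio.h>

    static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

    /* predV/predH form a planar-style prediction predP1 from the top row and
     * left column of reconstructed reference samples, which is then blended
     * with the motion-compensated value predP using the preset weights w1, w2. */
    static int update_pred_planar(int predP, int xP, int yP,
                                  const int *pTop, const int *pLeft,
                                  int nTbW, int nTbH, int w1, int w2)
    {
        int predV = ((nTbH - 1 - yP) * pTop[xP] + (yP + 1) * pLeft[nTbH]) << ilog2(nTbW);
        int predH = ((nTbW - 1 - xP) * pLeft[yP] + (xP + 1) * pTop[nTbW]) << ilog2(nTbH);
        int predP1 = (predV + predH + nTbW * nTbH) >> (ilog2(nTbW) + ilog2(nTbH) + 1);
        return (w1 * predP + w2 * predP1 + (w1 + w2) / 2) / (w1 + w2);
    }

    int main(void)
    {
        int pTop[5]  = { 60, 62, 64, 66, 68 };   /* p(0..nTbW, -1), nTbW = 4 */
        int pLeft[5] = { 60, 61, 62, 63, 64 };   /* p(-1, 0..nTbH), nTbH = 4 */
        printf("%d\n", update_pred_planar(100, 1, 2, pTop, pLeft, 4, 4, 3, 1));
        return 0;
    }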
In a possible implementation manner of the first aspect, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP) +w2*predV(xP,yP) +w3*predH(xP,yP)+((w1+w2+w3)/2))/(w1+w2+w3)
where
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH)+nTbH/2)>>Log2(nTbH),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1)+nTbW/2)>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1, w2, w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In a possible implementation manner of the first aspect, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(((w1*predP(xP,yP))<<(Log2(nTbW)+Log2(nTbH)+1)) +w2*predV(xP,yP) +w3*predH(xP,yP) +(((w1+w2+w3)/2)<<(Log2(nTbW)+Log2(nTbH)+1))) /(((w1+w2+w3)<<(Log2(nTbW)+Log2(nTbH)+1)))
where
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1, w2, w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
where predP1(xP,yP)=(predV(xP,yP)+predH(xP,yP)+1)>>1,
predV(xP,yP)=((nTbH-1-(yP-yN))*recon(xP,yN-1)+(yP-yN+1)*recon(xN-1,yN+nTbH)+(nTbH>>1))>>Log2(nTbH), predH(xP,yP)=((nTbW-1-(xP-xN))*recon(xN-1,yP)+(xP-xN+1)*recon(xN+nTbW,yN-1)+(nTbW>>1))>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP, yN-1), recon(xN-1, yN+nTbH), recon(xN-1, yP), recon(xN+nTbW, yN-1) are the reconstruction values of the reference pixel points located at coordinate positions (xP, yN-1), (xN-1, yN+nTbH), (xN-1, yP), (xN+nTbW, yN-1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed, respectively. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
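A comparable C sketch of the variant described above, which reads the neighbouring samples directly from the reconstructed picture with the block's upper-left pixel at (xN, yN); the reconAt() accessor and the toy picture are assumptions used only to make the example self-contained.

    #include <stdio.h>

    static int recPic[16][16];                      /* toy reconstructed picture */
    static int reconAt(int x, int y) { return recPic[y + 1][x + 1]; }  /* allows x, y = -1 */

    static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

    /* Rounded planar prediction predP1 built from recon(xP, yN-1), recon(xN-1, yN+nTbH),
     * recon(xN-1, yP) and recon(xN+nTbW, yN-1), then blended with predP using w1, w2. */
    static int update_pred_planar_recon(int predP, int xP, int yP, int xN, int yN,
                                        int nTbW, int nTbH, int w1, int w2)
    {
        int predV = ((nTbH - 1 - (yP - yN)) * reconAt(xP, yN - 1)
                   + (yP - yN + 1) * reconAt(xN - 1, yN + nTbH)
                   + (nTbH >> 1)) >> ilog2(nTbH);
        int predH = ((nTbW - 1 - (xP - xN)) * reconAt(xN - 1, yP)
                   + (xP - xN + 1) * reconAt(xN + nTbW, yN - 1)
                   + (nTbW >> 1)) >> ilog2(nTbW);
        int predP1 = (predV + predH + 1) >> 1;
        return (w1 * predP + w2 * predP1 + (w1 + w2) / 2) / (w1 + w2);
    }

    int main(void)
    {
        for (int y = 0; y < 16; y++)                /* fill the toy picture */
            for (int x = 0; x < 16; x++)
                recPic[y][x] = 64 + x + y;
        printf("%d\n", update_pred_planar_recon(100, 1, 2, 0, 0, 4, 4, 3, 1));
        return 0;
    }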
With reference to the first aspect and the possible implementations described above, in another possible implementation, the sum of w1 and w2 is the nth power of 2, where n is a non-negative integer. Based on the scheme, the coding efficiency can be further improved.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
[Formula image GDA0003438690880000061: clip1Cmp-clamped combination of predP(xP, yP) with refL, refT and recon(xN-1, yN-1) using the position-dependent weights wL, wT and wTL defined below]
where
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), wTL(xP,yP)=((wL(xP)>>4)+(wT(yP)>>4)), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-1, yP), recon(xP, yN-1), recon(xN-1, yN-1) are the reconstruction values of the reference pixel points located at coordinate positions (xN-1, yP), (xP, yN-1), (xN-1, yN-1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
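The final combining formula is reproduced only as an image in the published text, while the weight definitions above are given explicitly. The sketch below computes those weights and, purely as an assumption, blends them in a standard position-dependent (PDPC-style) way with a 6-bit normalization and a final clamp; the block origin is taken as (xN, yN) = (0, 0) for simplicity, and the function names are hypothetical.

    #include <stdio.h>

    static int clip1Cmp(int v, int bitDepth)        /* clamp to the valid sample range */
    {
        int maxVal = (1 << bitDepth) - 1;
        return v < 0 ? 0 : (v > maxVal ? maxVal : v);
    }

    /* Position-dependent weights wL, wT, wTL as defined above (block origin at (0,0));
     * the final blend is an assumed PDPC-style combination, not a verbatim copy of the
     * image formula in the source. refL, refT, refTL are the reconstructed samples
     * recon(-1, yP), recon(xP, -1) and recon(-1, -1). */
    static int update_pred_pdpc(int predP, int xP, int yP, int nTbW, int nTbH,
                                int refL, int refT, int refTL, int bitDepth)
    {
        int log2W = 0, log2H = 0, t;
        for (t = nTbW; t > 1; t >>= 1) log2W++;
        for (t = nTbH; t > 1; t >>= 1) log2H++;

        int nScale = (log2W + log2H - 2) >> 2;
        int wT  = 32 >> ((yP << 1) >> nScale);
        int wL  = 32 >> ((xP << 1) >> nScale);
        int wTL = (wL >> 4) + (wT >> 4);

        int v = refL * wL + refT * wT - refTL * wTL
              + (64 - wL - wT + wTL) * predP + 32;   /* assumed 6-bit normalisation */
        return clip1Cmp(v >> 6, bitDepth);
    }

    int main(void)
    {
        printf("%d\n", update_pred_pdpc(100, 1, 2, 4, 4, 70, 72, 68, 8));
        return 0;
    }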
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the updating the predicted value of the target pixel point includes: performing weighted calculation according to the predicted value of the target pixel point before updating and the reconstructed value of the reference pixel point to obtain the updated predicted value of the target pixel point, wherein the updated predicted value of the target pixel point is obtained through the following formula:
[Formula image GDA0003438690880000062: clip1Cmp-clamped combination of predP(xP, yP) with refL and refT using the position-dependent weights wL and wT defined below]
where
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-1, yP) and recon(xP, yN-1) are the reconstruction values of the reference pixel points located at coordinate positions (xN-1, yP) and (xP, yN-1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation. Based on the scheme, the updated predicted value of the target pixel point can be obtained through filtering processing.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the performing weighted calculation on the reconstruction value of one or more reference pixel points and the prediction value of the target pixel point includes: when the reconstruction value of the reference pixel point is unavailable, determining the availability of pixel points adjacent to the upper edge and the left edge of the image block to be processed according to a preset sequence until a preset number of available reference pixel points are obtained; and carrying out weighted calculation on the reconstruction value of the available reference pixel point and the predicted value of the target pixel point. Based on the scheme, when the reconstruction value of the reference pixel point is unavailable, the reference pixel points with available reconstruction values on the left side and the upper side of the image block to be processed are searched in the preset sequence, and the prediction value of the target pixel point is updated by using the reconstruction value of the available reference pixel points.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, the determining, according to a preset sequence, the availability of the pixel points adjacent to the upper edge and the left edge of the image block to be processed until obtaining a preset number of available reference pixel points includes: and acquiring available reference pixel points according to the sequence from the coordinate (xN-1, yN + nTbH-1) to the coordinate (xN-1, yN-1) and then from the coordinate (xN, yN-1) to the coordinate (xN + nTbW-1, yN-1). Based on the scheme, the reconstruction value of the available reference pixel point can be obtained.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, when at least one reference pixel point of all reference pixel points is available: if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is not available, an available pixel point is searched for in the preset order, from the coordinates (xN-1, yN+nTbH-1) to the coordinates (xN-1, yN-1) and then from the coordinates (xN, yN-1) to the coordinates (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if that pixel point is (x, y), the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstruction value of the pixel point (x, y). If the reconstruction value of a reference pixel point (x, y) in the set of reference pixel points (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 2 and less than or equal to nTbH+1, the reconstruction value of the reference pixel point (x, y) is set to the reconstruction value of the pixel point (x, y+1). If the reconstruction value of a reference pixel point (x, y) in the set of reference pixel points (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, the reconstruction value of the reference pixel point (x, y) is set to the reconstruction value of the reference pixel point (x-1, y). Based on the scheme, the reconstruction values of the available reference pixel points can be obtained.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, if the reconstruction value of a reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 1 and less than or equal to nTbH+1, an available reference pixel point can be searched for in the preset order starting from the coordinate (xN-1, yN+nTbH-M); if the available reference pixel point is B, the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) can be set to the reconstruction value of the reference pixel point B. If the reconstruction value of a reference pixel point (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, an available reference pixel point can be searched for in the preset order starting from the coordinate (xN+N, yN-1); if the available reference pixel point is C, the reconstruction value of the reference pixel point (xN+N, yN-1) can be set to the reconstruction value of the reference pixel point C. Based on the scheme, the reconstruction values of the available reference pixel points can be obtained.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is not available, an available pixel point is searched for in the preset order, from the coordinates (xN-1, yN+nTbH-1) to the coordinates (xN-1, yN-1) and then from the coordinates (xN, yN-1) to the coordinates (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if that pixel point is (x, y), the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstruction value of the pixel point (x, y). If the reconstruction value of a reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than 1 and less than or equal to nTbH+1, an available reference pixel point can be searched for, starting from the coordinate (xN-1, yN+nTbH-M), in the order opposite to the preset order; if the available reference pixel point is C, the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) can be set to the reconstruction value of the reference pixel point C. If the reconstruction value of a reference pixel point (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, an available reference pixel point can be searched for, starting from the coordinate (xN+N, yN-1), in the order opposite to the preset order; if the available reference pixel point is D, the reconstruction value of the reference pixel point (xN+N, yN-1) can be set to the reconstruction value of the reference pixel point D. Based on the scheme, the reconstruction values of the available reference pixel points can be obtained.
With reference to the first aspect and the possible implementation manners, in another possible implementation manner, if it is determined that none of the pixel points adjacent to the upper edge and the left edge of the to-be-processed image block are available, the reconstruction value of the reference pixel point is set to 1<<(bitDepth-1), where bitDepth is the bit depth of the sampling values of the reference pixel points. Based on the scheme, when no available reference pixel point can be found, the reconstruction value of the reference pixel point can be set based on the bit depth.
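A minimal C sketch of the availability handling described above: the candidate reference samples are visited in the stated scan order, from (xN-1, yN+nTbH-1) up the left column to (xN-1, yN-1) and then along the top row to (xN+nTbW-1, yN-1); unavailable samples are filled from the previously visited neighbour, and every sample defaults to 1<<(bitDepth-1) when nothing is available. The array layout and the function name are assumptions.

    #include <stdio.h>

    /* ref[] holds the nTbH + 1 + nTbW candidate reference samples in scan order,
     * avail[] marks which of them have a reconstructed value. */
    static void pad_reference_samples(int *ref, const int *avail, int count, int bitDepth)
    {
        int i, first = -1;
        for (i = 0; i < count; i++)
            if (avail[i]) { first = i; break; }

        if (first < 0) {                              /* nothing available at all */
            for (i = 0; i < count; i++)
                ref[i] = 1 << (bitDepth - 1);
            return;
        }
        if (!avail[0])                                /* start of the scan: copy the first hit */
            ref[0] = ref[first];
        for (i = 1; i < count; i++)                   /* then propagate along the scan order */
            if (!avail[i])
                ref[i] = ref[i - 1];
    }

    int main(void)
    {
        int ref[9]   = { 0, 0, 55, 56, 0, 60, 61, 0, 0 };   /* nTbW = nTbH = 4 */
        int avail[9] = { 0, 0, 1,  1,  0, 1,  1,  0, 0 };
        pad_reference_samples(ref, avail, 9, 8);
        for (int i = 0; i < 9; i++) printf("%d ", ref[i]);
        printf("\n");
        return 0;
    }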
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, before performing weighted calculation on the reconstruction value of the one or more reference pixel points and the prediction value of the target pixel point, the method includes: when the reference pixel point is positioned above the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the left and right adjacent pixel points of the reference pixel point; when the reference pixel point is positioned at the left side of the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of upper and lower adjacent pixel points of the reference pixel point; and updating the reconstruction value of the reference pixel point by adopting the result of the weighted calculation. Based on the scheme, before the filtering processing is carried out on the target pixel point, the filtering processing is carried out on the reconstruction value of the reference pixel point, so that the coding efficiency can be further improved, and the prediction residual error is reduced.
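The weights of this pre-filtering step are not specified in the text. The following is a minimal sketch assuming a [1, 2, 1]/4 smoothing kernel applied in place along the top reference row; the left reference column would be handled the same way with its upper and lower neighbours, and the function name is hypothetical.

    #include <stdio.h>

    /* In-place [1, 2, 1]/4 smoothing of one line of reference samples; the two end
     * samples are left unchanged because they lack a neighbour on one side. */
    static void smooth_reference_line(int *line, int count)
    {
        int prev = line[0];
        for (int i = 1; i + 1 < count; i++) {
            int cur = line[i];
            line[i] = (prev + 2 * cur + line[i + 1] + 2) >> 2;
            prev = cur;
        }
    }

    int main(void)
    {
        int topRow[6] = { 60, 80, 60, 80, 60, 80 };
        smooth_reference_line(topRow, 6);
        for (int i = 0; i < 6; i++) printf("%d ", topRow[i]);
        printf("\n");
        return 0;
    }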
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, before performing motion compensation on the to-be-processed image block based on the motion information, the method further includes: performing initial update on the motion information through a first preset algorithm; correspondingly, the motion compensation of the image block to be processed based on the motion information includes: and performing motion compensation on the image block to be processed based on the initially updated motion information. Based on the scheme, the prediction residual error can be reduced by updating the motion information before the motion compensation is carried out on the current block and carrying out the motion compensation based on the updated motion information.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, after obtaining the prediction block of the to-be-processed image block, the method further includes: pre-updating the prediction block through a second preset algorithm; correspondingly, the performing weighted calculation on the reconstruction value of one or more reference pixel points and the prediction value of the target pixel point includes: and performing weighted calculation on the reconstruction value of the one or more reference pixel points and the pre-updated prediction value of the target pixel point. Based on the scheme, the prediction residual can be reduced by pre-updating the prediction block of the current block and performing weighted calculation according to the updated prediction value and the reconstruction value of the reference pixel.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, after performing weighted calculation on the reconstruction value of one or more reference pixel points and the prediction value of the target pixel point to update the prediction value of the target pixel point, the method further includes: and updating the predicted value of the target pixel point through a second preset algorithm. Based on the scheme, the predicted value of the target pixel point after the spatial domain filtering processing can be updated by adopting a preset algorithm, and the prediction residual error is reduced.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, before performing weighted calculation on the reconstruction value of the one or more reference pixel points and the prediction value of the target pixel point, the method further includes: analyzing the code stream to obtain a prediction mode of the image block to be processed; determining the prediction mode as a merge mode (merge) and/or an inter advanced motion vector prediction mode (inter AMVP); it is understood that the inter advanced motion vector prediction mode (inter AMVP) may also be referred to as an inter motion vector prediction mode (inter MVP). Based on the scheme, the prediction mode of the image block to be processed can be determined before the filtering processing.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, before performing weighted calculation on the reconstruction value of the one or more reference pixel points and the prediction value of the target pixel point, the method further includes: analyzing the code stream to obtain the updating judgment identification information of the image block to be processed; and determining that the updating judgment identification information indicates to update the prediction block of the image block to be processed. Based on the scheme, the updating judgment identification information of the image block to be processed can be obtained by analyzing the code stream, and the prediction block for updating the image block to be processed is determined.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, before performing weighted calculation on the reconstruction value of the one or more reference pixel points and the prediction value of the target pixel point, the method further includes: acquiring preset updating judgment identification information of the image block to be processed; and determining that the updating judgment identification information indicates to update the prediction block of the image block to be processed. Based on the scheme, the updating discrimination identification information of the image block to be processed can be obtained, and the prediction block of the image block to be processed is determined to be updated according to the updating discrimination identification information.
In a second aspect of the embodiments of the present application, an inter prediction apparatus is provided, including: the analysis module is used for analyzing the code stream to obtain the motion information of the image block to be processed; the compensation module is used for carrying out motion compensation on the image block to be processed based on the motion information so as to obtain a prediction block of the image block to be processed, wherein the prediction block of the image block to be processed comprises a prediction value of a target pixel point; and the calculation module is used for carrying out weighted calculation on the reconstruction values of one or more reference pixel points and the predicted value of the target pixel point so as to update the predicted value of the target pixel point, wherein the reference pixel point and the target pixel point have a preset spatial position relationship.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the one or more reference pixels include a reconstructed pixel having the same abscissa as the target pixel and having a preset ordinate difference, or a reconstructed pixel having the same ordinate as the target pixel and having a preset abscissa difference.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
[Formula image GDA0003438690880000091: weighted combination of predP(xP, yP) with recon(xN-M1, yP) and recon(xP, yN-M2) using the weights w1 to w7 described below]
The coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP) and recon(xP, yN-M2) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP) and (xP, yN-M2), w1, w2, w3, w4, w5, w6, w7 are preset constants, and M1, M2 are preset positive integers.
With reference to the second aspect and the possible implementations described above, in another possible implementation, w1 + w2 = R1, or w3 + w4 = R2, or w5 + w6 + w7 = R3, where R1, R2, and R3 are each the n-th power of 2, and n is a non-negative integer.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
[Formula image GDA0003438690880000101: weighted combination of predP(xP, yP) with recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3) and recon(xP, yN-M4) using the weights w1 to w11 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11 are preset constants, and M1, M2, M3, M4 are preset positive integers.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, w1 + w2 + w3 = S1, or w4 + w5 + w6 = S2, or w7 + w8 + w9 + w10 + w11 = S3, where S1, S2, and S3 are each the n-th power of 2, and n is a non-negative integer.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
[Formula image GDA0003438690880000102: weighted combination of predP(xP, yP) with recon(xN-M1, yP) and recon(xP, yN-M2) using the weights w1 to w3 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper left pixel point in the image block to be processed are (xN, yN), predP (xP, yP) is a predicted value before updating of the target pixel point, predQ (xP, yP) is an updated predicted value of the target pixel point, recon (xN-M1, yP), recon (xP, yN-M2) are reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xP, yN-M2), w1, w2, w3 are preset constants, and M1, M2 are preset positive integers.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, w1+ w2+ w3 is equal to R, where R is an n-th power of 2, and n is a non-negative integer.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
[Formula image GDA0003438690880000111: weighted combination of predP(xP, yP) with recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3) and recon(xP, yN-M4) using the weights w1 to w5 described below]
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4, w5 are preset constants, and M1, M2, M3, M4 are preset positive integers.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, w1+ w2+ w3+ w4+ w5 is S, where S is an n-th power of 2, and n is a non-negative integer.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the one or more reference pixel points include one or more of the following pixel points: reconstructed pixel points which have the same horizontal coordinates with the target pixel points and are adjacent to the upper edge of the image block to be processed; or, a reconstructed pixel point which has the same vertical coordinate with the target pixel point and is adjacent to the left edge of the image block to be processed; or, the reconstructed pixel point at the upper right corner of the image block to be processed; or, the reconstructed pixel point at the lower left corner of the image block to be processed; or, the reconstructed pixel point at the upper left corner of the image block to be processed.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
where
predP1(xP,yP)=(predV(xP,yP)+predH(xP,yP)+nTbW*nTbH)>>(Log2(nTbW)+Log2(nTbH)+ 1),predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In a possible implementation manner of the second aspect, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP) +w2*predV(xP,yP) +w3*predH(xP,yP)+((w1+w2+w3)/2))/(w1+w2+w3)
where
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH)+nTbH/2)>>Log2(nTbH),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1)+nTbW/2)>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1, w2, w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In a possible implementation manner of the second aspect, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(((w1*predP(xP,yP))<<(Log2(nTbW)+Log2(nTbH)+1)) +w2*predV(xP,yP) +w3*predH(xP,yP) +(((w1+w2+w3)/2)<<(Log2(nTbW)+Log2(nTbH)+1))) /(((w1+w2+w3)<<(Log2(nTbW)+Log2(nTbH)+1)))
where
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0,0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP,-1), p(-1,nTbH), p(-1,yP), p(nTbW,-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,-1), (-1,nTbH), (-1,yP), (nTbW,-1), w1, w2, w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP)+((w1+w2)/2))/(w1+w2)
where predP1(xP,yP)=(predV(xP,yP)+predH(xP,yP)+1)>>1,
predV(xP,yP)=((nTbH-1-(yP-yN))*recon(xP,yN-1)+(yP-yN+1)*recon(xN-1,yN+nTbH)+(nTbH>>1))>>Log2(nTbH), predH(xP,yP)=((nTbW-1-(xP-xN))*recon(xN-1,yP)+(xP-xN+1)*recon(xN+nTbW,yN-1)+(nTbW>>1))>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP, yN-1), recon(xN-1, yN+nTbH), recon(xN-1, yP), recon(xN+nTbW, yN-1) are the reconstruction values of the reference pixel points located at coordinate positions (xP, yN-1), (xN-1, yN+nTbH), (xN-1, yP), (xN+nTbW, yN-1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed, respectively.
With reference to the second aspect and the possible implementations described above, in another possible implementation, the sum of w1 and w2 is the nth power of 2, where n is a non-negative integer.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
predQ(xP,yP)=clip1Cmp((refL(xP,yP)*wL(xP)+refT(xP,yP)*wT(yP)-recon(xN-1,yN-1)*wTL(xP,yP)+(64-wL(xP)-wT(yP)+wTL(xP,yP))*predP(xP,yP)+32)>>6)
wherein,
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), wTL(xP,yP)=((wL(xP)>>4)+(wT(yP)>>4)), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP,yN-1), recon(xN-1,yP), recon(xN-1,yN-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP,yN-1), (xN-1,yP), (xN-1,yN-1), respectively, nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is the clamping operation.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is specifically further configured to perform weighted calculation according to the predicted value of the target pixel before updating and the reconstructed value of the reference pixel, and obtain an updated predicted value of the target pixel, where the updated predicted value of the target pixel is obtained through the following formula:
predQ(xP,yP)=clip1Cmp((refL(xP,yP)*wL(xP)+refT(xP,yP)*wT(yP)+(64-wL(xP)-wT(yP))*predP(xP,yP)+32)>>6)
wherein,
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-1,yP) and recon(xP,yN-1) are the reconstructed values of the reference pixel points located at coordinate positions (xN-1,yP) and (xP,yN-1), respectively, nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is the clamping operation.
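As a minimal sketch only, the weight derivation defined above can be written as follows. The accessor recon() is hypothetical, and the final combination line assumes a normalization to 64, a right shift of 6 and clamping to the sample range, in the style of position-dependent prediction combination; it is not a quotation of the referenced formula.

    /* Illustrative sketch of the position-dependent update without the top-left
     * weight wTL. recon(x, y) is a hypothetical accessor for reconstructed samples. */
    extern int recon(int x, int y);

    static int log2_int(int n) { int k = 0; while ((1 << k) < n) k++; return k; }

    static int clip1(int v, int bitDepth) {      /* clip1Cmp: clamp to the sample range */
        int maxVal = (1 << bitDepth) - 1;
        return v < 0 ? 0 : (v > maxVal ? maxVal : v);
    }

    int update_pred_pd(int predP, int xP, int yP, int xN, int yN,
                       int nTbW, int nTbH, int bitDepth) {
        int nScale = (log2_int(nTbW) + log2_int(nTbH) - 2) >> 2;
        int refL = recon(xN - 1, yP);            /* left reference sample */
        int refT = recon(xP, yN - 1);            /* top reference sample */
        int wT = 32 >> ((yP << 1) >> nScale);    /* xP, yP used exactly as in the formulas above */
        int wL = 32 >> ((xP << 1) >> nScale);
        /* assumed combination: weights normalized to 64, rounding offset 32 */
        int v = (refL * wL + refT * wT + (64 - wL - wT) * predP + 32) >> 6;
        return clip1(v, bitDepth);
    }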
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the calculating module is further configured to determine, according to a preset sequence, availability of pixels adjacent to an upper edge and a left edge of the to-be-processed image block until a preset number of available reference pixels are obtained, when the reconstruction value of the reference pixel is unavailable; and carrying out weighted calculation on the reconstruction value of the available reference pixel point and the predicted value of the target pixel point.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is specifically configured to obtain the reconstruction values of the available reference pixels in an order from the coordinate (xN-1, yN + nTbH-1) to the coordinate (xN-1, yN-1), and then from the coordinate (xN, yN-1) to the coordinate (xN + nTbW-1, yN-1).
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, when at least one reference pixel point of all the reference pixel points is available: if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is unavailable, an available pixel point is searched for according to the preset order, from the coordinate (xN-1, yN+nTbH-1) to the coordinate (xN-1, yN-1) and then from the coordinate (xN, yN-1) to the coordinate (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if that pixel point is (x, y), the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstruction value of the pixel point (x, y); if the reconstruction value of a reference pixel point (x, y) in the set of reference pixel points (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 2 and less than or equal to nTbH+1, the reconstruction value of the reference pixel point (x, y) is set to the reconstruction value of the pixel point (x, y+1); and if the reconstruction value of a reference pixel point (x, y) in the set of reference pixel points (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, the reconstruction value of the reference pixel point (x, y) is set to the reconstruction value of the reference pixel point (x-1, y).
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, the computing module is specifically configured to: if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 1 and less than or equal to nTbH+1, search for an available reference pixel point according to the preset order starting from the coordinate (xN-1, yN+nTbH-M), and if the available reference pixel point is B, the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) may be set to the reconstruction value of the reference pixel point B; if the reconstruction value of the reference pixel point (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, search for an available reference pixel point according to the preset order starting from the coordinate (xN+N, yN-1), and if the available reference pixel point is C, the reconstruction value of the reference pixel point (xN+N, yN-1) may be set to the reconstruction value of the reference pixel point C.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner: if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is unavailable, an available pixel point is searched for according to the preset order, from the coordinate (xN-1, yN+nTbH-1) to the coordinate (xN-1, yN-1) and then from the coordinate (xN, yN-1) to the coordinate (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if that pixel point is (x, y), the reconstruction value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstruction value of the pixel point (x, y); if the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than 1 and less than or equal to nTbH+1, an available reference pixel point may be searched for starting from the coordinate (xN-1, yN+nTbH-M) in the order opposite to the preset order, and if the available reference pixel point is C, the reconstruction value of the reference pixel point (xN-1, yN+nTbH-M) may be set to the reconstruction value of the reference pixel point C; if the reconstruction value of the reference pixel point (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, an available reference pixel point may be searched for starting from the coordinate (xN+N, yN-1) in the order opposite to the preset order, and if the available reference pixel point is D, the reconstruction value of the reference pixel point (xN+N, yN-1) may be set to the reconstruction value of the reference pixel point D.
With reference to the second aspect and the possible implementation manners, in another possible implementation manner, if it is determined that none of the pixels adjacent to the upper edge and the left edge of the to-be-processed image block is available, the reconstruction value of the reference pixel is set to 1< < (bitDepth-1), where bitDepth is a bit depth of a sampling value of the reference pixel.
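A minimal sketch of the reference sample preparation described in the preceding implementations is given below; sample_available() and sample_value() are hypothetical accessors into the decoder's picture buffer, and the sentinel-based bookkeeping is an implementation choice made for the example only.

    #include <limits.h>
    #include <stdbool.h>

    /* Hypothetical accessors: whether a neighbouring sample has already been
     * reconstructed, and its value. */
    extern bool sample_available(int x, int y);
    extern int  sample_value(int x, int y);

    /* Fill ref[] with the nTbH left samples (bottom to top), the top-left corner
     * sample and the nTbW top samples (left to right) of the block whose top-left
     * corner is (xN, yN), substituting unavailable samples as described above.
     * Returns the number of samples written, nTbH + 1 + nTbW. */
    int build_reference(int *ref, int xN, int yN, int nTbW, int nTbH, int bitDepth) {
        int total = nTbH + 1 + nTbW, i, anyAvail = 0;
        for (i = 0; i < total; i++) {   /* preset order: (xN-1, yN+nTbH-1) .. (xN-1, yN-1), then (xN, yN-1) .. (xN+nTbW-1, yN-1) */
            int x = (i <= nTbH) ? xN - 1 : xN + (i - nTbH - 1);
            int y = (i <= nTbH) ? yN + nTbH - 1 - i : yN - 1;
            if (sample_available(x, y)) { ref[i] = sample_value(x, y); anyAvail = 1; }
            else                        { ref[i] = INT_MIN; }          /* mark as missing */
        }
        if (!anyAvail) {                        /* no neighbour available: mid-grey default */
            for (i = 0; i < total; i++) ref[i] = 1 << (bitDepth - 1);
            return total;
        }
        if (ref[0] == INT_MIN) {                /* first sample: first available one in the preset order */
            for (i = 1; ref[i] == INT_MIN; i++) ;
            ref[0] = ref[i];
        }
        for (i = 1; i < total; i++)             /* remaining samples: copy the previously processed neighbour */
            if (ref[i] == INT_MIN) ref[i] = ref[i - 1];
        return total;
    }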
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is further configured to: when the reference pixel point is positioned above the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the left and right adjacent pixel points of the reference pixel point; when the reference pixel point is positioned at the left side of the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of upper and lower adjacent pixel points of the reference pixel point; and updating the reconstruction value of the reference pixel point by adopting the result of the weighted calculation.
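The filtering of the reference samples is described above only as a weighted calculation with the two neighbouring samples; the three-tap (1, 2, 1)/4 weights in the sketch below are an assumption chosen for illustration and are not specified by the embodiment.

    /* Smooth one reference sample with its two neighbours: for a sample above the
     * block these are its left and right neighbours, for a sample to the left of
     * the block its upper and lower neighbours. The (1, 2, 1)/4 weights with
     * rounding are illustrative only. */
    static int smooth_ref(int neighbour_a, int center, int neighbour_b) {
        return (neighbour_a + 2 * center + neighbour_b + 2) >> 2;
    }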
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is further configured to: performing initial update on the motion information through a first preset algorithm; correspondingly, the compensation module is specifically configured to: and performing motion compensation on the image block to be processed based on the initially updated motion information.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is further configured to: pre-updating the prediction block through a second preset algorithm; correspondingly, the calculating module is specifically configured to: and performing weighted calculation on the reconstruction value of the one or more reference pixel points and the pre-updated prediction value of the target pixel point.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is further configured to: and updating the predicted value of the target pixel point through a second preset algorithm.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the parsing module is further configured to: parse the code stream to obtain a prediction mode of the image block to be processed; and determine that the prediction mode is a merge mode (merge) and/or an inter-frame advanced motion vector prediction mode (inter AMVP); it is understood that the inter advanced motion vector prediction mode (inter AMVP) may also be referred to as an inter motion vector prediction mode (inter MVP).
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the parsing module is further configured to: parse the code stream to obtain updating judgment identification information of the image block to be processed; and determine that the updating judgment identification information indicates to update the prediction block of the image block to be processed.
With reference to the second aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing calculation module is further configured to: acquiring preset updating judgment identification information of the image block to be processed; and determining that the updating judgment identification information indicates to update the prediction block of the image block to be processed.
In a third aspect of the present application, there is provided a prediction apparatus of motion information, including: a processor and a memory coupled to the processor; the processor is configured to perform the method of the first aspect.
In a fourth aspect of the present application, a computer-readable storage medium is provided, having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a fifth aspect of the present application, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
It should be understood that the second to fifth aspects of the present application are consistent with the technical solutions of the first aspect of the present application, and the beneficial effects obtained by the aspects and the corresponding implementable design manners are similar, and are not repeated.
Drawings
FIG. 1 is a block diagram of an exemplary video coding system that may be configured for use with embodiments of the present application;
FIG. 2 is a block diagram of an exemplary video encoder that may be configured for use with embodiments of the present application;
FIG. 3 is a block diagram of an exemplary video decoder that may be configured for use with embodiments of the present application;
FIG. 4 is a block diagram of an exemplary inter-prediction module that may be configured for use with embodiments of the present application;
FIG. 5 is a flowchart illustrating an exemplary implementation of merge prediction mode;
FIG. 6 is a flowchart illustrating an exemplary implementation of an advanced motion vector prediction mode;
FIG. 7 is a flowchart illustrating an exemplary implementation of motion compensation by a video decoder that may be configured for use with embodiments of the present application;
FIG. 8 is a diagram of an exemplary coding unit and neighboring tiles associated therewith;
FIG. 9 is a flowchart of an exemplary implementation of constructing a list of candidate predicted motion vectors;
FIG. 10 is a diagram illustrating an exemplary implementation of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list;
FIG. 11 is a diagram illustrating an exemplary implementation of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list;
FIG. 12 is a diagram illustrating an exemplary implementation of adding a zero motion vector to the merge mode candidate prediction motion vector list;
FIG. 13 is a schematic flow chart of an inter-frame prediction method according to an embodiment of the present application;
FIG. 14 is a first schematic diagram of an application of an inter-frame prediction method according to an embodiment of the present application;
FIG. 15 is a second schematic diagram of an application of an inter-frame prediction method according to an embodiment of the present application;
FIG. 16 is a third schematic diagram of an application of an inter-frame prediction method according to an embodiment of the present application;
FIG. 17 is a fourth schematic diagram of an application of an inter-frame prediction method according to an embodiment of the present application;
FIG. 18 is a schematic block diagram of an inter-frame prediction apparatus according to an embodiment of the present application;
FIG. 19 is a schematic block diagram of another inter-frame prediction apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
FIG. 1 is a block diagram of a video coding system of one example described in an embodiment of the present application. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system are configured to predict motion information, such as motion vectors, of a currently coded image block or a sub-block thereof according to various method examples described in any one of a plurality of new inter prediction modes proposed in the present application, such that the predicted motion vectors are maximally close to the motion vectors obtained using the motion estimation method, thereby eliminating the need to transmit motion vector differences when encoding, and further improving the coding and decoding performance.
As shown in fig. 1, a video coding system includes a source device 10 and a destination device 20. Source device 10 generates encoded video data. Accordingly, source device 10 may be referred to as a video encoding device. Destination device 20 may decode the encoded video data generated by source device 10. Destination device 20 may therefore be referred to as a video decoding device. Various implementations of source device 10, destination device 20, or both may include one or more processors and memory coupled to the one or more processors. The memory can include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures that can be accessed by a computer, as described herein.
Source device 10 and destination device 20 may comprise a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 20 may receive encoded video data from source device 10 over link 30. Link 30 may comprise one or more media or devices capable of moving encoded video data from source device 10 to destination device 20. In one example, link 30 may comprise one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real-time. In this example, source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 20. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include a router, switch, base station, or other apparatus that facilitates communication from source device 10 to destination device 20.
In another example, encoded data may be output from output interface 140 to storage device 40. Similarly, encoded data may be accessed from storage device 40 through input interface 240. Storage device 40 may comprise any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray discs, Digital Versatile Discs (DVDs), compact disc read-only memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
In another example, storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 10. Destination device 20 may access the stored video data from storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 20. Example file servers include web servers (e.g., for websites), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 20 may access the encoded video data over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a wireless fidelity (Wi-Fi) connection), a wired connection (e.g., a Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
The motion vector prediction techniques of the present application may be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding for video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video coding systems may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system illustrated in fig. 1 is merely an example, and the techniques of this application may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, the data is retrieved from local storage, streamed over a network, and so forth. A video encoding device may encode and store data to a memory, and/or a video decoding device may retrieve and decode data from a memory. In many examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
In the example of fig. 1, source device 10 includes video source 120, video encoder 100, and output interface 140. In some examples, output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. Video source 120 may comprise a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
Video encoder 100 may encode video data from video source 120. In some examples, source device 10 transmits the encoded video data directly to destination device 20 via output interface 140. In other examples, encoded video data may also be stored onto storage device 40 for later access by destination device 20 for decoding and/or playback.
In the example of fig. 1, destination device 20 includes input interface 240, video decoder 200, and display device 220. In some examples, input interface 240 includes a receiver and/or a modem. Input interface 240 may receive encoded video data via link 30 and/or from storage device 40. Display device 220 may be integrated with destination device 20 or may be external to destination device 20. In general, display device 220 displays decoded video data. The display device 220 may include various display devices, such as a Liquid Crystal Display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
Although not shown in fig. 1, in some aspects, video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. In some examples, the multiplexer-demultiplexer (MUX-DEMUX) unit may conform to an International Telecommunications Union (ITU) H.223 multiplexer protocol, or other protocols such as User Datagram Protocol (UDP), if applicable.
Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits such as: one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the present application is implemented in part in software, a device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may execute the instructions in hardware using one or more processors to implement the techniques of the present application. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.
This application may generally refer to video encoder 100 as "signaling" or "transmitting" certain information to another device, such as video decoder 200. The terms "signaling" or "transmitting" may generally refer to the communication of syntax elements and/or other data used to decode compressed video data. This transfer may occur in real time or near real time. Alternatively, such communication may occur over a period of time, such as may occur when, at the time of encoding, syntax elements are stored in the encoded codestream to a computer-readable storage medium, which the decoding device may then retrieve at any time after the syntax elements are stored to such medium.
The H.265 High Efficiency Video Coding (HEVC) standard was developed by JCT-VC. HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM). The latest standard document for H.265 is available from http://www.itu.int/REC/T-REC-H.265, the latest version of the standard document being H.265 (12/16), which is incorporated herein by reference in its entirety. The HM assumes that the video decoding device has several additional capabilities with respect to existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while the HM may provide up to 35 intra-prediction encoding modes.
JVET is dedicated to developing the H.266 standard. The process of H.266 standardization is based on an evolving model of the video decoding apparatus called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/JVET, with the latest algorithm description contained in JVET-F1001-v2, which is incorporated herein by reference in its entirety. Also, reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is incorporated herein by reference in its entirety.
In general, the working model of HM describes that a video frame or image may be divided into a sequence of treeblocks or Largest Coding Units (LCUs), also referred to as Coding Tree Units (CTUs), that contain both luma and chroma samples. Treeblocks have a similar purpose as macroblocks of the h.264 standard. A slice includes a number of consecutive treeblocks in decoding order. A video frame or image may be partitioned into one or more slices. Each treeblock may be split into coding units according to a quadtree. For example, a treeblock that is the root node of a quadtree may be split into four child nodes, and each child node may in turn be a parent node and split into four other child nodes. The final non-fragmentable child node, which is a leaf node of the quadtree, comprises a decoding node, e.g., a decoded video block. Syntax data associated with the decoded codestream may define a maximum number of times the treeblock may be split, and may also define a minimum size of the decoding node.
A coding unit (CU) includes a decoding node and a prediction unit (PU) and a transform unit (TU) associated with the decoding node. The size of a CU corresponds to the size of the decoding node and must be square in shape. The size of a CU may range from 8 x 8 pixels up to a maximum treeblock size of 64 x 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe a situation in which the CU is partitioned into one or more PUs. The partitioning mode may differ depending on whether the CU is skip or direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. The PU may be partitioned into shapes other than square. For example, syntax data associated with a CU may also describe a situation in which the CU is partitioned into one or more TUs according to a quadtree. The TU may be square or non-square in shape.
The HEVC standard allows for transform according to TUs, which may be different for different CUs. A TU is typically sized based on the size of a PU within a given CU defined for a partitioned LCU, although this may not always be the case. The size of a TU is typically the same as or smaller than a PU. In some possible implementations, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
In general, a PU includes data related to a prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode of the PU. As another possible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. For example, the data defining the motion vector for the PU may describe a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list of the motion vector (e.g., list 0, list 1, or list C).
In general, TUs use a transform and quantization process. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 100 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using TUs to produce serialized transform coefficients for entropy decoding. The term "video block" is generally used herein to refer to a decoding node of a CU. In some particular applications, the present application may also use the term "video block" to refer to a treeblock that includes a decoding node as well as PUs and TUs, e.g., an LCU or CU.
A video sequence typically comprises a series of video frames or images. A group of pictures (GOP) illustratively comprises a series of one or more video pictures. The GOP may include syntax data in header information of the GOP, header information of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes the coding mode of the respective picture. Video encoder 100 typically operates on video blocks within individual video stripes in order to encode the video data. The video block may correspond to a decoding node within the CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified decoding standard.
As a possible implementation, the HM supports prediction of various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction of PU sizes of 2N × 2N or N × N, and inter prediction of symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction for PU sizes of 2N × nU, 2N × nD, nL × 2N and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% section is indicated by "n" followed by "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a horizontally split 2N × 2N CU, with a 2N × 0.5N PU on top and a 2N × 1.5N PU on the bottom.
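For example, the asymmetric partition sizes can be computed as in the following sketch; the structure and function names are illustrative only and are not part of the HM.

    /* Illustrative only: PU widths/heights for the asymmetric inter partitions of a
     * 2Nx2N CU (sizes in samples; cuSize = 2N). */
    typedef struct { int w0, h0, w1, h1; } PuSizes;

    PuSizes amp_sizes(int cuSize, char mode /* 'U', 'D', 'L', 'R' */) {
        PuSizes s;
        int quarter = cuSize / 4, rest = cuSize - quarter;     /* 25% and 75% of the CU */
        switch (mode) {
        case 'U': s = (PuSizes){cuSize, quarter, cuSize, rest}; break; /* 2NxnU: small PU on top */
        case 'D': s = (PuSizes){cuSize, rest, cuSize, quarter}; break; /* 2NxnD: small PU at the bottom */
        case 'L': s = (PuSizes){quarter, cuSize, rest, cuSize}; break; /* nLx2N: small PU on the left */
        default : s = (PuSizes){rest, cuSize, quarter, cuSize}; break; /* nRx2N: small PU on the right */
        }
        return s;
    }

With cuSize = 32, amp_sizes(32, 'U') yields a 32 × 8 PU above a 32 × 24 PU, matching the 2N × 0.5N and 2N × 1.5N example above.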
In this application, "N × N" and "N by N" are used interchangeably to refer to the pixel size of a video block in both the vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y ═ 16) and 16 pixels in the horizontal direction (x ═ 16). Likewise, an nxn block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After using intra-predictive or inter-predictive decoding of PUs of the CU, video encoder 100 may calculate residual data for TUs of the CU. A PU may comprise pixel data in a spatial domain (also referred to as a pixel domain), and a TU may comprise coefficients in a transform domain after applying a transform (e.g., a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 100 may form TUs that include residual data of a CU, and then transform the TUs to generate transform coefficients for the CU.
After any transform to generate transform coefficients, video encoder 100 may perform quantization of the transform coefficients. Quantization exemplarily refers to a process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be reduced to an m-bit value during quantization, where n is greater than m.
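A minimal sketch of uniform scalar quantization illustrating this bit-depth reduction is shown below; the fixed shift and the rounding offset are assumptions made for the example and do not represent the quantization scheme of any particular standard.

    /* Minimal uniform quantizer: dividing by 2^shift (shift >= 1) reduces an n-bit
     * coefficient to roughly an (n - shift)-bit level; the rounding offset is
     * illustrative only. */
    static int quantize(int coeff, int shift) {
        int offset = 1 << (shift - 1);
        return coeff >= 0 ? (coeff + offset) >> shift : -((-coeff + offset) >> shift);
    }

    static int dequantize(int level, int shift) {
        return level << shift;       /* inverse scaling used for reconstruction */
    }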
The JEM model further improves the coding structure of video images; in particular, a block coding structure called "quadtree combined binary tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU and TU in HEVC and supports more flexible CU partition shapes, where a CU can be square or rectangular. A CTU is first partitioned by a quadtree, and the leaf nodes of the quadtree are further partitioned by a binary tree. There are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning. The leaf nodes of the binary tree are called CUs, and a JEM CU cannot be further divided during prediction and transform, i.e., the CU, PU and TU have the same block size. In JEM at the present stage, the maximum size of the CTU is 256 × 256 luminance pixels.
In some possible implementations, video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other possible implementations, video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector according to context-based adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding methods. Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
To perform CABAC, video encoder 100 may assign a context within the context model to a symbol to be transmitted. A context may relate to whether adjacent values of a symbol are non-zero. To perform CAVLC, video encoder 100 may select a variable length code of a symbol to be transmitted. Codewords in variable-length decoding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, the use of VLC may achieve a code rate saving goal with respect to using equal length codewords for each symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
In embodiments of the present application, a video encoder may perform inter prediction to reduce temporal redundancy between pictures. As described previously, a CU may have one or more prediction units, PUs, according to the specifications of different video compression codec standards. In other words, multiple PUs may belong to a CU, or the PUs and the CU are the same size. When the CU and PU sizes are the same, the partition mode of the CU is not partitioned, or is partitioned into one PU, and is expressed by using the PU collectively herein. When the video encoder performs inter prediction, the video encoder may signal the video decoder with motion information for the PU. For example, the motion information of the PU may include: reference picture index, motion vector and prediction direction identification. The motion vector may indicate a displacement between an image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference block of the PU. The reference block of the PU may be a portion of a reference picture that is similar to the image block of the PU. The reference block may be located in a reference picture indicated by the reference picture index and the prediction direction identification.
To reduce the number of coding bits needed to represent the Motion information of the PU, the video encoder may generate a list of candidate prediction Motion Vectors (MVs) for each of the PUs according to a merge prediction mode or advanced Motion Vector prediction mode process. Each candidate predictive motion vector in the list of candidate predictive motion vectors for the PU may indicate motion information. The motion information indicated by some candidate predicted motion vectors in the candidate predicted motion vector list may be based on motion information of other PUs. The present application may refer to a candidate predicted motion vector as an "original" candidate predicted motion vector if the candidate predicted motion vector indicates motion information that specifies one of a spatial candidate predicted motion vector position or a temporal candidate predicted motion vector position. For example, for merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate predicted motion vector positions and one original temporal candidate predicted motion vector position. In some examples, the video encoder may generate additional candidate predicted motion vectors by combining partial motion vectors from different original candidate predicted motion vectors, modifying the original candidate predicted motion vectors, or inserting only zero motion vectors as candidate predicted motion vectors. These additional candidate predicted motion vectors are not considered as original candidate predicted motion vectors and may be referred to as artificially generated candidate predicted motion vectors in this application.
The techniques of this application generally relate to techniques for generating a list of candidate predictive motion vectors at a video encoder and techniques for generating the same list of candidate predictive motion vectors at a video decoder. The video encoder and the video decoder may generate the same candidate prediction motion vector list by implementing the same techniques for constructing the candidate prediction motion vector list. For example, both the video encoder and the video decoder may construct a list with the same number of candidate predicted motion vectors (e.g., five candidate predicted motion vectors). Video encoders and decoders may first consider spatial candidate predictive motion vectors (e.g., neighboring blocks in the same picture), then temporal candidate predictive motion vectors (e.g., candidate predictive motion vectors in different pictures), and finally may consider artificially generated candidate predictive motion vectors until a desired number of candidate predictive motion vectors are added to the list. According to the techniques of this application, a pruning operation may be utilized during candidate predicted motion vector list construction for certain types of candidate predicted motion vectors in order to remove duplicates from the candidate predicted motion vector list, while for other types of candidate predicted motion vectors, pruning may not be used in order to reduce decoder complexity. For example, for a set of spatial candidate predicted motion vectors and for temporal candidate predicted motion vectors, a pruning operation may be performed to exclude candidate predicted motion vectors with duplicate motion information from the list of candidate predicted motion vectors. However, when the artificially generated candidate predicted motion vector is added to the list of candidate predicted motion vectors, the artificially generated candidate predicted motion vector may be added without performing a pruning operation on the artificially generated candidate predicted motion vector.
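As an illustration of this construction order, the following sketch adds spatial and temporal candidates with pruning and fills the remaining entries with unpruned artificial candidates; the data structures and the fixed list length of five are assumptions for the example only.

    #include <stdbool.h>

    typedef struct { int mvx, mvy, refIdx; } MvCand;

    #define MAX_CANDS 5

    typedef struct { MvCand c[MAX_CANDS]; int n; } CandList;

    static bool same_cand(const MvCand *a, const MvCand *b) {
        return a->mvx == b->mvx && a->mvy == b->mvy && a->refIdx == b->refIdx;
    }

    /* Add with pruning: skip the candidate if identical motion information is
     * already in the list (used for spatial and temporal candidates). */
    static void add_pruned(CandList *l, MvCand c) {
        for (int i = 0; i < l->n; i++)
            if (same_cand(&l->c[i], &c)) return;
        if (l->n < MAX_CANDS) l->c[l->n++] = c;
    }

    /* Add without pruning (used for artificially generated candidates, e.g. the
     * zero motion vector), keeping decoder complexity low. */
    static void add_unpruned(CandList *l, MvCand c) {
        if (l->n < MAX_CANDS) l->c[l->n++] = c;
    }

    /* Illustrative construction order: spatial candidates first, then temporal
     * candidates, then artificial candidates until the list holds MAX_CANDS entries. */
    void build_merge_list(CandList *l, const MvCand *spatial, int nSpatial,
                          const MvCand *temporal, int nTemporal) {
        l->n = 0;
        for (int i = 0; i < nSpatial && l->n < MAX_CANDS; i++) add_pruned(l, spatial[i]);
        for (int i = 0; i < nTemporal && l->n < MAX_CANDS; i++) add_pruned(l, temporal[i]);
        while (l->n < MAX_CANDS) add_unpruned(l, (MvCand){0, 0, 0});   /* zero MV filler */
    }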
After generating the candidate predictive motion vector list for the PU of the CU, the video encoder may select a candidate predictive motion vector from the candidate predictive motion vector list and output a candidate predictive motion vector index in the codestream. The selected candidate predictive motion vector may be the candidate predictive motion vector having a motion vector that yields the predictor that most closely matches the target PU being decoded. The candidate predicted motion vector index may indicate a position in the candidate predicted motion vector list where the candidate predicted motion vector is selected. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. The motion information for the PU may be determined based on the motion information indicated by the selected candidate predictive motion vector. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector. The video encoder may generate one or more residual tiles for the CU based on the predictive tiles of the PUs of the CU and the original tiles for the CU. The video encoder may then encode the one or more residual image blocks and output the one or more residual image blocks in the code stream.
The codestream may include data identifying a selected candidate predictive motion vector in a candidate predictive motion vector list for the PU. The video decoder may determine the motion information for the PU based on the motion information indicated by the selected candidate predictive motion vector in the candidate predictive motion vector list for the PU. The video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on the one or more reference blocks of the PU. The video decoder may reconstruct the tiles for the CU based on the predictive tiles for the PUs of the CU and the one or more residual tiles for the CU.
For ease of explanation, this application may describe locations or image blocks as having various spatial relationships with CUs or PUs. This description may be interpreted to mean that the locations or tiles have various spatial relationships with the tiles associated with the CU or PU. Furthermore, the present application may refer to a PU that is currently being decoded by the video decoder as a current PU, also referred to as a current pending image block. This application may refer to a CU that a video decoder is currently decoding as the current CU. The present application may refer to a picture that is currently being decoded by a video decoder as a current picture. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used uniformly for representation.
As briefly described above, video encoder 100 may use inter prediction to generate predictive image blocks and motion information for PUs of a CU. In many instances, the motion information for a given PU may be the same as or similar to the motion information of one or more nearby PUs (i.e., PUs whose tiles are spatially or temporally nearby to the tiles of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs. Encoding motion information for a given PU with reference to motion information for nearby PUs may reduce the number of encoding bits required in the codestream to indicate the motion information for the given PU.
Video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs. This application may use merge mode to refer to indicating that the motion information of a given PU is the same as or derivable from the motion information of nearby PUs. In another possible implementation, video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU. The MVD indicates the difference between the motion vector of a given PU and the motion vectors of nearby PUs. Video encoder 100 may include the motion vector for the MVD in the motion information for the given PU instead of the given PU. Fewer coding bits are required to represent the MVD in the codestream than to represent the motion vector for a given PU. The present application may use advanced motion vector prediction mode to refer to signaling motion information of a given PU to a decoding end by using an MVD and an index value identifying a candidate motion vector.
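The difference between the two signalling styles can be illustrated as follows; the structure and function names are assumptions introduced for the example.

    #include <stdbool.h>

    typedef struct { int x, y; } Mv;

    /* Encoder side (advanced motion vector prediction): only the difference between
     * the actual motion vector and the selected candidate predictor is written to
     * the codestream. */
    static Mv mvd_encode(Mv actual, Mv predictor) {
        return (Mv){actual.x - predictor.x, actual.y - predictor.y};
    }

    /* Decoder side: merge mode reuses the predictor directly; AMVP adds the parsed
     * MVD back onto the predictor. */
    static Mv mv_decode(Mv predictor, Mv mvd, bool is_merge) {
        return is_merge ? predictor : (Mv){predictor.x + mvd.x, predictor.y + mvd.y};
    }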
To signal motion information for a given PU at a decoding end using merge mode or AMVP mode, video encoder 100 may generate a list of candidate predictive motion vectors for the given PU. The candidate predictive motion vector list may include one or more candidate predictive motion vectors. Each of the candidate predictive motion vectors in the candidate predictive motion vector list for a given PU may specify motion information. The motion information indicated by each candidate predicted motion vector may include a motion vector, a reference picture index, and a prediction direction identification. The candidate predicted motion vectors in the candidate predicted motion vector list may comprise "original" candidate predicted motion vectors, where each indicates motion information for one of the specified candidate predicted motion vector positions within a PU that is different from the given PU.
After generating the list of candidate predictive motion vectors for the PU, video encoder 100 may select one of the candidate predictive motion vectors from the list of candidate predictive motion vectors for the PU. For example, the video encoder may compare each candidate predictive motion vector to the PU being decoded and may select a candidate predictive motion vector with the desired rate-distortion cost. Video encoder 100 may output the candidate prediction motion vector index for the PU. The candidate predicted motion vector index may identify a position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
Furthermore, video encoder 100 may generate the predictive picture block for the PU based on the reference block indicated by the motion information of the PU. The motion information for the PU may be determined based on motion information indicated by a selected candidate predictive motion vector in a list of candidate predictive motion vectors for the PU. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In AMVP mode, the motion information for the PU may be determined based on the motion vector difference for the PU and the motion information indicated by the selected candidate prediction motion vector. Video encoder 100 may process the predictive image blocks for the PU as described previously.
When video decoder 200 receives the codestream, video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU. The candidate prediction motion vector list generated by video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by video encoder 100 for the PU. The syntax element parsed from the codestream may indicate a location in the candidate predicted motion vector list for the PU where the candidate predicted motion vector is selected. After generating the list of candidate prediction motion vectors for the PU, video decoder 200 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU. Video decoder 200 may determine the motion information for the PU based on the motion information indicated by the selected candidate predictive motion vector in the list of candidate predictive motion vectors for the PU. Video decoder 200 may reconstruct the tiles for the CU based on the predictive tiles for the PU and the residual tiles for the CU.
It should be understood that, in a possible implementation manner, at the decoding end, the construction of the candidate predicted motion vector list and the parsing of the selected candidate predicted motion vector from the code stream in the candidate predicted motion vector list are independent of each other, and may be performed in any order or in parallel.
In another possible implementation manner, at a decoding end, the position of a candidate predicted motion vector in a candidate predicted motion vector list is firstly analyzed and selected from a code stream, and the candidate predicted motion vector list is constructed according to the analyzed position. For example, when the selected candidate predicted motion vector obtained by analyzing the code stream is a candidate predicted motion vector with an index of 3 in the candidate predicted motion vector list, the candidate predicted motion vector with the index of 3 can be determined only by constructing the candidate predicted motion vector list from the index of 0 to the index of 3, so that the technical effects of reducing complexity and improving decoding efficiency can be achieved.
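A sketch of this behaviour is shown below; next_candidate() stands in for the normal derivation order (spatial, then temporal, then artificial candidates) and is an assumption for the example.

    typedef struct { int mvx, mvy, refIdx; } Cand;

    /* Hypothetical derivation of the i-th list entry in the normal order. */
    extern Cand next_candidate(int position);

    /* After the candidate index has been parsed from the codestream, only entries
     * 0..parsedIndex need to be derived; later entries are never constructed. */
    Cand derive_selected_candidate(int parsedIndex) {
        Cand cand = {0, 0, 0};
        for (int i = 0; i <= parsedIndex; i++)
            cand = next_candidate(i);
        return cand;
    }

For the example in the text, a parsed index of 3 means only the entries with indices 0 to 3 are ever derived.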
Fig. 2 is a block diagram of a video encoder 100 of one example described in an embodiment of the present application. The video encoder 100 is used to output video to the post-processing entity 41. Post-processing entity 41 represents an example of a video entity, such as a media-aware network element (MANE) or a splicing/editing device, that may process the encoded video data from video encoder 100. In some cases, post-processing entity 41 may be an instance of a network entity. In some video encoding systems, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100. In some examples, post-processing entity 41 is an example of storage device 40 of FIG. 1.
In the example of fig. 2, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a Decoded Picture Buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. Filter unit 106 is intended to represent one or more loop filters, such as deblocking filters, Adaptive Loop Filters (ALFs), and Sample Adaptive Offset (SAO) filters. Although filter unit 106 is shown in fig. 2 as an in-loop filter, in other implementations, filter unit 106 may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory, a partitioning unit (not shown).
The video data memory may store video data to be encoded by components of video encoder 100. The video data stored in the video data memory may be obtained from video source 120. DPB 107 may be a reference picture memory that stores reference video data used to encode video data by video encoder 100 in intra, inter coding modes. The video data memory and DPB 107 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including Synchronous Dynamic Random Access Memory (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or separate memory devices. In various examples, the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
As shown in fig. 2, video encoder 100 receives video data and stores the video data in a video data memory. The partitioning unit partitions the video data into image blocks and these image blocks may be further partitioned into smaller blocks, e.g. image block partitions based on a quadtree structure or a binary tree structure. This segmentation may also include segmentation into stripes (slices), slices (tiles), or other larger units. Video encoder 100 generally illustrates components that encode image blocks within a video slice to be encoded. The slice may be divided into a plurality of tiles (and possibly into a set of tiles called a slice). Prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes. Prediction processing unit 108 may provide the resulting intra, inter coded block to summer 112 to generate a residual block and to summer 111 to reconstruct the encoded block used as the reference picture.
An intra predictor 109 within prediction processing unit 108 may perform intra-predictive encoding of the current block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy. Inter predictor 110 within prediction processing unit 108 may perform inter-predictive encoding of the current block relative to one or more prediction blocks in one or more reference pictures to remove temporal redundancy.
In particular, the inter predictor 110 may be used to determine an inter prediction mode for encoding a current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for various inter prediction modes in the set of candidate inter prediction modes and select the inter prediction mode having the best rate-distortion characteristics therefrom. Rate distortion analysis typically determines the amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine an inter prediction mode with the smallest rate-distortion cost for encoding the current image block in the candidate inter prediction mode set as the inter prediction mode for inter predicting the current image block.
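As an illustration of this selection, the following sketch picks the mode with the smallest rate-distortion cost J = D + λ × R; the distortion and rate callbacks are placeholders and not part of the embodiment.

    #include <float.h>

    /* Hypothetical per-mode measurements: distortion between the original block and
     * the block encoded with the mode, and the number of bits needed for the mode. */
    extern double mode_distortion(int mode);
    extern double mode_bits(int mode);

    /* Pick the candidate inter prediction mode with the smallest rate-distortion
     * cost J = D + lambda * R. */
    int select_inter_mode(const int *modes, int numModes, double lambda) {
        int best = modes[0];
        double bestCost = DBL_MAX;
        for (int i = 0; i < numModes; i++) {
            double cost = mode_distortion(modes[i]) + lambda * mode_bits(modes[i]);
            if (cost < bestCost) { bestCost = cost; best = modes[i]; }
        }
        return best;
    }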
The inter predictor 110 is configured to predict motion information (e.g., a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and acquire or generate a prediction block of the current image block using the motion information (e.g., the motion vector) of the one or more sub-blocks in the current image block. The inter predictor 110 may locate the prediction block to which the motion vector points in one of the reference picture lists. The inter predictor 110 may also generate syntax elements associated with the image block and the video slice for use by the video decoder 200 in decoding the image block of the video slice. Or, in an example, the inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block of each sub-block, so as to obtain a prediction block of the current image block; it should be understood that the inter predictor 110 herein performs motion estimation and motion compensation processes.
Specifically, after selecting the inter prediction mode for the current image block, the inter predictor 110 may provide information indicating the selected inter prediction mode for the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
The intra predictor 109 may perform intra prediction on the current image block. In particular, the intra predictor 109 may determine an intra prediction mode used to encode the current block. For example, the intra predictor 109 may calculate rate-distortion values for various intra prediction modes to be tested using rate-distortion analysis and select an intra prediction mode having the best rate-distortion characteristics from among the modes to be tested. In any case, after selecting the intra prediction mode for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode for the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After prediction processing unit 108 generates a prediction block for the current image block via inter prediction or intra prediction, video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 112 represents one or more components that perform this subtraction operation. The residual video data in the residual block may be included in one or more TUs and applied to transformer 101. The transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. Transformer 101 may convert the residual video data from a pixel value domain to a transform domain, e.g., the frequency domain.
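For illustration, a minimal sketch of the subtraction performed by summer 112 is given below; the function name formResidual and the 8-bit sample assumption are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Subtract the prediction block from the current block to form the residual
// block; 8-bit samples are assumed and both blocks are assumed to have the
// same size, stored in raster-scan order.
std::vector<int16_t> formResidual(const std::vector<uint8_t>& cur,
                                  const std::vector<uint8_t>& pred) {
    std::vector<int16_t> residual(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        residual[i] = static_cast<int16_t>(static_cast<int>(cur[i]) -
                                           static_cast<int>(pred[i]));
    }
    return residual;
}
```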
The transformer 101 may send the resulting transform coefficients to the quantizer 102. Quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 102 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, the entropy encoder 103 may perform a scan.
After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by the entropy encoder 103, the encoded codestream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200. The entropy encoder 103 may also entropy encode syntax elements of the current image block to be encoded.
Inverse quantizer 104 and inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference image. The summer 111 adds the reconstructed residual block to the prediction block produced by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as blocking artifacts. This reconstructed image block is then stored as a reference block in the decoded picture buffer 107 and may be used by the inter predictor 110 as a reference block to inter-predict a block in a subsequent video frame or picture.
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101 and correspondingly without processing by the inverse transformer 105; alternatively, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need to be processed by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; alternatively, the video encoder 100 may store the reconstructed picture block directly as a reference block without processing by the filter unit 106; alternatively, the quantizer 102 and the dequantizer 104 in the video encoder 100 may be combined together.
Fig. 3 is a block diagram of a video decoder 200 of one example described in an embodiment of the present application. In the example of fig. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a DPB 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from fig. 2.
In the decoding process, video decoder 200 receives from video encoder 100 an encoded video bitstream representing image blocks of an encoded video slice and the associated syntax elements. Video decoder 200 may receive video data from network entity 42 and, optionally, may store the video data in a video data memory (not shown). The video data memory may store video data, such as an encoded video bitstream, to be decoded by components of video decoder 200. The video data stored in the video data memory may be obtained, for example, from storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Although the video data memory is not illustrated in fig. 3, the video data memory and the DPB 207 may be the same memory or may be separately provided memories. The video data memory and DPB 207 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of video decoder 200, or disposed off-chip with respect to those components.
Network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or other such device for implementing one or more of the techniques described above. Network entity 42 may or may not include a video encoder, such as video encoder 100. Network entity 42 may implement portions of the techniques described in this application before network entity 42 sends the encoded video bitstream to video decoder 200. In some video decoding systems, network entity 42 and video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200. In some cases, network entity 42 may be an example of storage 40 of fig. 1.
The entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. Video decoder 200 may receive syntax elements at the video slice level and/or the picture block level.
When a video slice is decoded as an intra-decoded (I) slice, intra predictor 209 of prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video slice is decoded as an inter-decoded (i.e., B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (e.g., perform inter prediction) based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether the current image block of the current video slice is predicted using a new inter prediction mode. If the syntax elements indicate that the current image block is predicted using a new inter prediction mode (e.g., a new inter prediction mode designated by a syntax element or a default new inter prediction mode), the inter predictor 210 predicts the motion information of the current image block or a sub-block of the current image block based on that mode, and then uses the predicted motion information to obtain or generate the prediction block of the current image block or of the sub-block through a motion compensation process. The motion information here may include reference picture information and motion vectors, where the reference picture information may include, but is not limited to, uni-/bi-directional prediction information, a reference picture list number, and a reference picture index corresponding to the reference picture list. For inter prediction, the prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 200 may construct the reference picture lists, namely list 0 and list 1, based on the reference pictures stored in DPB 207. The reference frame index of the current picture may be included in one or both of reference frame list 0 and list 1. In some examples, video encoder 100 may signal a particular syntax element indicating whether a new inter prediction mode is employed to decode a particular block, or may signal a particular syntax element indicating both whether a new inter prediction mode is employed and which new inter prediction mode is specifically employed to decode the particular block. It should be understood that the inter predictor 210 here performs a motion compensation process.
The inverse quantizer 204 inversely quantizes, i.e., dequantizes, the quantized transform coefficients provided in the codestream and decoded by the entropy decoder 203. The inverse quantization process may include: the quantization parameter calculated by the video encoder 100 for each image block in the video slice is used to determine the degree of quantization that should be applied and likewise the degree of inverse quantization that should be applied. Inverse transformer 205 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to generate a block of residues in the pixel domain.
After the inter predictor 210 generates a prediction block for the current image block or a sub-block of the current image block, the video decoder 200 obtains a reconstructed block, i.e., a decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210. Summer 211 represents the component that performs this summation operation. A loop filter (in or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality, if desired. Filter unit 206 may represent one or more loop filters, such as deblocking filters, adaptive loop filters (ALF), and sample adaptive offset (SAO) filters. Although the filter unit 206 is shown in fig. 3 as an in-loop filter, in other implementations, the filter unit 206 may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed block to reduce blocking distortion, and the result is output as a decoded video stream. Decoded image blocks in a given frame or picture may also be stored in the DPB 207, which stores reference pictures used for subsequent motion compensation. DPB 207 may be part of a memory that may also store decoded video for later presentation on a display device (e.g., display device 220 of fig. 1), or may be separate from such memory.
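For illustration, a minimal sketch of the summation performed by summer 211 is given below, assuming 8-bit samples and a hypothetical function name reconstructBlock.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Add the residual block from the inverse transform to the prediction block
// and clip to the 8-bit sample range; a loop filter may then be applied
// before the reconstructed block is stored in the DPB.
std::vector<uint8_t> reconstructBlock(const std::vector<int16_t>& residual,
                                      const std::vector<uint8_t>& pred) {
    std::vector<uint8_t> recon(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const int v = residual[i] + pred[i];
        recon[i] = static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
    return recon;
}
```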
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video stream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; alternatively, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients and accordingly does not need to be processed by the inverse quantizer 204 and the inverse transformer 205.
As noted previously, the techniques of this application illustratively relate to inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video coding devices described in this application, including, for example, video encoder 100 and video decoder 200 as shown and described with respect to fig. 1-3. That is, in one possible implementation, the inter predictor 110 described with respect to fig. 2 may perform certain techniques described below when performing inter prediction during encoding of a block of video data. In another possible implementation, the inter predictor 210 described with respect to fig. 3 may perform certain techniques described below when performing inter prediction during decoding of a block of video data. Thus, a reference to a generic "video encoder" or "video decoder" may include video encoder 100, video decoder 200, or another video encoding or decoding unit.
It should be understood that, in the encoder 100 and the decoder 200 of the present application, the processing result of a certain stage may be further processed and then output to the next stage; for example, after stages such as interpolation filtering, motion vector derivation, or loop filtering, the processing result of the corresponding stage is further subjected to operations such as clipping (Clip) or shifting.
For example, the value range of the motion vector is constrained to be within a certain bit width. Assuming that the allowed bit width of the motion vector is bitDepth, the motion vector ranges from -2^(bitDepth-1) to 2^(bitDepth-1)-1, where the "^" symbol represents exponentiation. If bitDepth is 16, the value range is -32768 to 32767. If bitDepth is 18, the value range is -131072 to 131071. The constraint can be applied in either of the following two ways:
Mode 1: remove the high-order bits resulting from motion vector overflow:

ux = ( vx + 2^bitDepth ) % 2^bitDepth

vx = ( ux ≥ 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux

uy = ( vy + 2^bitDepth ) % 2^bitDepth

vy = ( uy ≥ 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy
For example, if the value of vx is -32769, the above formulas yield 32767. In a computer the value is stored in two's complement; the two's complement representation of -32769 is 1,0111,1111,1111,1111 (17 bits). The computer handles the overflow by discarding the high-order bit, so the value of vx becomes 0111,1111,1111,1111, that is, 32767, which is consistent with the result obtained from the formulas.
Mode 2: clip the motion vector, as shown in the following formulas:

vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )

vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )
where Clip3 is defined as clamping the value of z to the interval [x, y]:

Clip3( x, y, z ) = x if z < x; y if z > y; z otherwise
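For illustration, both constraint modes can be sketched as follows; the function names wrapMv and clipMv are hypothetical and the code is not part of the claimed method. With bitDepth equal to 16, wrapMv maps -32769 to 32767, matching the example above.

```cpp
#include <cstdint>

// Mode 1: wrap the component into bitDepth bits, discarding the overflowing
// high-order bits (ux = (vx + 2^bitDepth) % 2^bitDepth, then fold back).
int32_t wrapMv(int32_t v, int bitDepth) {
    const int64_t range = static_cast<int64_t>(1) << bitDepth;
    const int64_t u = ((static_cast<int64_t>(v) % range) + range) % range;
    return static_cast<int32_t>(u >= (range >> 1) ? u - range : u);
}

// Clip3(x, y, z): clamp z to the interval [x, y].
int32_t clip3(int32_t x, int32_t y, int32_t z) {
    return z < x ? x : (z > y ? y : z);
}

// Mode 2: clip the component into [-2^(bitDepth-1), 2^(bitDepth-1) - 1].
int32_t clipMv(int32_t v, int bitDepth) {
    const int32_t lo = -(1 << (bitDepth - 1));
    const int32_t hi = (1 << (bitDepth - 1)) - 1;
    return clip3(lo, hi, v);
}
```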
Fig. 4 is a schematic block diagram of the inter-prediction module 121 in the embodiment of the present application. The inter prediction module 121 may include, for example, a motion estimation unit and a motion compensation unit. The relationship between PU and CU varies among different video compression codec standards. Inter prediction module 121 may partition the current CU into PUs according to a plurality of partition modes. For example, inter prediction module 121 may partition the current CU into PUs according to the 2N×2N, 2N×N, N×2N, and N×N partition modes. In other embodiments, the current CU is the current PU, which is not limited herein.
The inter prediction module 121 may perform Integer Motion Estimation (IME) and then Fractional Motion Estimation (FME) for each of the PUs. When the inter prediction module 121 performs IME on the PU, the inter prediction module 121 may search one or more reference pictures for a reference block for the PU. After finding the reference block for the PU, inter prediction module 121 may generate a motion vector that indicates, with integer precision, a spatial displacement between the PU and the reference block for the PU. When the inter prediction module 121 performs FME on a PU, the inter prediction module 121 may refine a motion vector generated by performing IME on the PU. The motion vectors generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.). After generating the motion vectors for the PU, inter prediction module 121 may use the motion vectors for the PU to generate a predictive image block for the PU.
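By way of illustration, a simplified full-search integer motion estimation over a symmetric search window is sketched below; the function integerMotionEstimation is hypothetical, and it assumes the reference pointer addresses the position co-located with the current block inside a padded reference picture so that every tested offset is valid. Fractional refinement (FME) would then test half- and quarter-pel offsets around the returned vector using interpolation filters, which is not shown here.

```cpp
#include <cstdint>

struct Mv { int x; int y; };

// Full-search integer-pel motion estimation: test every displacement in a
// +/- searchRange window and keep the one with the smallest SAD against the
// reference. curStride/refStride are row strides; blockW/blockH the PU size.
Mv integerMotionEstimation(const uint8_t* cur, int curStride,
                           const uint8_t* ref, int refStride,
                           int blockW, int blockH, int searchRange) {
    Mv best = {0, 0};
    long bestSad = -1;
    for (int dy = -searchRange; dy <= searchRange; ++dy) {
        for (int dx = -searchRange; dx <= searchRange; ++dx) {
            long sad = 0;
            for (int y = 0; y < blockH; ++y) {
                for (int x = 0; x < blockW; ++x) {
                    const int diff = static_cast<int>(cur[y * curStride + x]) -
                                     static_cast<int>(ref[(y + dy) * refStride + (x + dx)]);
                    sad += diff < 0 ? -diff : diff;
                }
            }
            if (bestSad < 0 || sad < bestSad) {
                bestSad = sad;
                best = {dx, dy};
            }
        }
    }
    return best;  // integer-precision motion vector; FME would refine it further
}
```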
In some possible implementations where the inter prediction module 121 signals the motion information of the PU to the decoding end using AMVP mode, the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate predicted motion vectors may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors. After generating the candidate prediction motion vector list for the PU, inter prediction module 121 may select a candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference (MVD) for the PU. The MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate prediction motion vector and the motion vector generated for the PU using the IME and FME. In these possible implementations, the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. The inter prediction module 121 may also output the MVD of the PU. A possible implementation of the advanced motion vector prediction (AMVP) mode in the embodiment of the present application is described in detail below with reference to fig. 6.
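For illustration, a hypothetical sketch of this encoder-side AMVP step is given below: the candidate whose motion vector is closest to the estimated motion vector is selected, and the candidate index and the MVD are returned. The cost measure |dx| + |dy| is a simplification, and a non-empty candidate list is assumed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Mv { int x; int y; };

// Choose the AMVP candidate predictor whose difference from the estimated
// motion vector is cheapest to code (approximated here by |dx| + |dy|), and
// return the candidate index together with the motion vector difference (MVD).
std::pair<int, Mv> chooseAmvpCandidate(const std::vector<Mv>& candidates, Mv mv) {
    int bestIdx = 0;
    long bestCost = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const long dx = mv.x - candidates[i].x;
        const long dy = mv.y - candidates[i].y;
        const long cost = (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
        if (bestCost < 0 || cost < bestCost) {
            bestCost = cost;
            bestIdx = static_cast<int>(i);
        }
    }
    const Mv mvd = { mv.x - candidates[bestIdx].x, mv.y - candidates[bestIdx].y };
    return { bestIdx, mvd };  // the index is signalled and the MVD is coded in the bitstream
}
```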
In addition to generating motion information for the PUs by performing IME and FME on the PUs, inter prediction module 121 may also perform a merge (Merge) operation on each of the PUs. When inter prediction module 121 performs a merge operation on a PU, inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate predictive motion vectors for the PU may include one or more original candidate predictive motion vectors and one or more additional candidate predictive motion vectors derived from the original candidate predictive motion vectors. The original candidate predicted motion vectors in the list may include one or more spatial candidate predicted motion vectors and a temporal candidate predicted motion vector. The spatial candidate prediction motion vectors may indicate motion information of other PUs in the current picture. The temporal candidate prediction motion vector may be based on motion information of a corresponding PU in a picture different from the current picture. The temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
After generating the candidate prediction motion vector list, the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. Inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. FIG. 5, described below, illustrates an exemplary flow diagram for Merge.
After generating the predictive image block for the PU based on the IME and FME and the predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block generated by the FME operation or the predictive image block generated by the merge operation. In some possible implementations, the inter prediction module 121 may select the predictive image block for the PU based on rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
After inter prediction module 121 has selected the predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes (in some implementations, after the coding tree unit CTU is divided into CUs, the CU is not further divided into smaller PUs, in which case the PU is equivalent to the CU), inter prediction module 121 may select the partition mode for the current CU. In some implementations, the inter-prediction module 121 may select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes. Inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to residual generation module 102. Inter prediction module 121 may output, to an entropy encoding module, syntax elements indicating the motion information of the PUs belonging to the selected partition mode.
In the diagram of fig. 4, the inter-frame prediction module 121 includes IME modules 180A-180N (collectively referred to as "IME module 180"), FME modules 182A-182N (collectively referred to as "FME module 182"), merging modules 184A-184N (collectively referred to as "merging modules 184"), PU mode decision modules 186A-186N (collectively referred to as "PU mode decision modules 186"), and a CU mode decision module 188 (which may also include performing a mode decision process from the CTU to the CU).
The IME module 180, FME module 182, and merge module 184 may perform IME operations, FME operations, and merge operations on PUs of the current CU. The inter prediction module 121 is illustrated in the schematic diagram of fig. 4 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU. In other possible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merging module 184 for each PU of each partition mode of the CU.
As illustrated in the schematic diagram of fig. 4, IME module 180A, FME module 182A and merge module 184A may perform IME operations, FME operations, and merge operations on the PU generated by partitioning the CU according to the 2N×2N partitioning mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, FME module 182A and the merge module 184A.
IME module 180B, FME module 182B and merge module 184B may perform IME, FME, and merge operations on the left PU resulting from partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, FME module 182B and the merge module 184B.
IME module 180C, FME module 182C and merge module 184C may perform IME, FME, and merge operations on the right PU resulting from partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, FME module 182C and the merge module 184C.
IME module 180N, FME module 182N and merge module 184N may perform IME, FME, and merge operations on the bottom-right PU resulting from partitioning the CU according to the N×N partitioning mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, FME module 182N and the merge module 184N.
The PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of a plurality of possible predictive image blocks, selecting the predictive image block that provides the best rate-distortion cost for a given decoding scenario. For example, for bandwidth-limited applications, the PU mode decision module 186 may be biased towards selecting predictive image blocks that increase the compression ratio, while for other applications, the PU mode decision module 186 may be biased towards selecting predictive image blocks that increase the reconstructed video quality. After PU mode decision module 186 selects the predictive image blocks for the PUs of the current CU, CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partition mode.
Fig. 5 is a flowchart of an implementation of the merge mode in the embodiment of the present application. A video encoder, such as video encoder 20, may perform merge operation 201. The merge operation 201 may include: S202, generating a candidate list for the current prediction unit. S204, generating a predictive video block associated with a candidate in the candidate list. S206, selecting a candidate from the candidate list. S208, outputting the candidate. The candidate refers to a candidate motion vector or candidate motion information.
In other possible implementations, the video encoder may perform a merge operation that is different from merge operation 201. For example, in other possible implementations, the video encoder may perform a merge operation, where the video encoder performs more, fewer, or different steps than merge operation 201. In other possible implementations, the video encoder may perform the steps of the merge operation 201 in a different order or in parallel. The encoder may also perform a merge operation 201 on PUs encoded in skip mode.
After the video encoder starts the merge operation 201, the video encoder may generate a candidate prediction motion vector list for the current PU (S202). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the list of candidate prediction motion vectors for the current PU according to one of the example techniques described below with respect to fig. 8-12.
As previously described, the candidate prediction motion vector list for the current PU may include a temporal candidate prediction motion vector. The temporal candidate prediction motion vector may indicate motion information of a temporally corresponding (co-located) PU. The co-located PU may be spatially at the same position in the image frame as the current PU, but in the reference image instead of the current image. The present application may refer to the reference picture that includes the temporally corresponding PU as the relevant reference picture, and to the reference picture index of the relevant reference picture as the relevant reference picture index. As described previously, the current picture may be associated with one or more reference picture lists (e.g., list 0, list 1, etc.). The reference picture index may indicate a reference picture by indicating its position in a reference picture list. In some possible implementations, the current picture may be associated with a combined reference picture list.
In some video encoders, the relevant reference picture index is the reference picture index of the PU that encompasses the reference index source location associated with the current PU. In these video encoders, the reference index source location associated with the current PU is adjacent to the left of the current PU or adjacent above the current PU. In this application, a PU may "cover" a particular location if the image block associated with the PU includes the particular location. In these video encoders, if a reference index source location is not available, the video encoder may use a reference picture index of zero.
However, the following examples may exist: the reference index source location associated with the current PU is within the current CU. In these examples, a PU that covers the reference index source location associated with the current PU may be deemed available if the PU is above or to the left of the current CU. However, the video encoder may need to access motion information of another PU of the current CU in order to determine a reference picture that contains the co-located PU. Thus, these video encoders may use the motion information (i.e., reference picture indices) of PUs belonging to the current CU to generate temporal candidate prediction motion vectors for the current PU. In other words, these video encoders may generate temporal candidate prediction motion vectors using motion information of PUs belonging to the current CU. Thus, the video encoder may not be able to generate candidate prediction motion vector lists for the current PU and the PU that encompasses the reference index source location associated with the current PU in parallel.
In accordance with the techniques of this application, a video encoder may explicitly set a relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some possible implementations where the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed predefined preset reference picture index (e.g., 0). In this way, the video encoder may generate a temporal candidate prediction motion vector based on motion information of a co-located PU in a reference frame indicated by a preset reference picture index, and may include the temporal candidate prediction motion vector in a candidate prediction motion vector list of the current CU.
In a possible implementation where the video encoder explicitly sets the relevant reference picture index, the video encoder may explicitly signal the relevant reference picture index in a syntax structure (e.g., a picture header, a slice header, an APS, or another syntax structure). In this possible implementation, the video encoder may signal the decoding end the relevant reference picture index for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, a video encoder may signal: the associated reference picture index for each PU of the CU is equal to "1".
In some possible implementations, the relevant reference picture index may be set implicitly rather than explicitly. In these possible implementations, the video encoder may generate each temporal candidate predictive motion vector in the list of candidate predictive motion vectors for a PU of the current CU using motion information for PUs in the reference picture indicated by reference picture indices for PUs covering locations outside the current CU, even if these locations are not strictly adjacent to the current PU.
After generating the candidate predictive motion vector list for the current PU, the video encoder may generate a predictive image block associated with each candidate predictive motion vector in the candidate predictive motion vector list (S204). The video encoder may generate a predictive image block associated with a candidate predictive motion vector by determining the motion information of the current PU based on the motion information indicated by that candidate predictive motion vector and then generating the predictive image block based on the one or more reference blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidate predictive motion vectors from the list of candidate predictive motion vectors (S206). The video encoder may select the candidate prediction motion vector in various ways. For example, the video encoder may select one of the candidate predictive motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate predictive motion vectors.
After selecting the candidate prediction motion vector, the video encoder may output a candidate prediction motion vector index (S208). The candidate predicted motion vector index may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list. In some possible embodiments, the candidate prediction motion vector index may be denoted as "merge_idx".
Fig. 6 is a flowchart of an implementation of the advanced motion vector prediction (AMVP) mode in the embodiment of the present application. A video encoder, such as video encoder 20, may perform AMVP operation 210. The AMVP operation 210 may include: S211, generating one or more motion vectors for the current prediction unit. S212, generating a predictive video block for the current prediction unit. S213, generating a candidate list for the current prediction unit. S214, generating a motion vector difference. S215, selecting a candidate from the candidate list. S216, outputting a reference picture index, a candidate index, and a motion vector difference for the selected candidate. The candidate refers to a candidate motion vector or candidate motion information.
After the video encoder starts AMVP operation 210, the video encoder may generate one or more motion vectors for the current PU (S211). The video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU. As described previously, the current picture may be associated with two reference picture lists (list 0 and list 1). If the current PU is uni-directionally predicted, the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU. The list 0 motion vector may indicate a spatial displacement between the image block of the current PU and a reference block in a reference picture in list 0. The list 1 motion vector may indicate a spatial displacement between the image block of the current PU and a reference block in a reference picture in list 1. If the current PU is bi-predicted, the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
After generating the one or more motion vectors for the current PU, the video encoder may generate a predictive image block for the current PU (S212). The video encoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by one or more motion vectors for the current PU.
In addition, the video encoder may generate a candidate prediction motion vector list for the current PU (S213). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the list of candidate prediction motion vectors for the current PU according to one or more of the possible implementations described below with respect to fig. 8-12. In some possible embodiments, when the video encoder generates the candidate prediction motion vector list in the AMVP operation 210, the candidate prediction motion vector list may be limited to two candidate prediction motion vectors. In contrast, when the video encoder generates the candidate prediction motion vector list in the merge operation, the candidate prediction motion vector list may include more candidate prediction motion vectors (e.g., five candidate prediction motion vectors).
After generating the candidate prediction motion vector list for the current PU, the video encoder may generate one or more Motion Vector Differences (MVDs) for each candidate prediction motion vector in the candidate prediction motion vector list (S214). The video encoder may generate a motion vector difference for the candidate prediction motion vector by determining a difference between the motion vector indicated by the candidate prediction motion vector and a corresponding motion vector of the current PU.
If the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bi-predicted, the video encoder may generate two MVDs for each candidate prediction motion vector. The first MVD may indicate a difference between a motion vector of the candidate prediction motion vector and a list 0 motion vector of the current PU. The second MVD may indicate a difference between a motion vector of the candidate prediction motion vector and a list 1 motion vector of the current PU.
The video encoder may select one or more of the candidate predicted motion vectors from the list of candidate predicted motion vectors (S215). The video encoder may select the one or more candidate predictive motion vectors in various ways. For example, the video encoder may select the candidate predictive motion vector that matches the motion vector to be encoded with the least error, which may reduce the number of bits required to represent the motion vector difference for the candidate predictive motion vector.
After selecting the one or more candidate predictive motion vectors, the video encoder may output one or more reference picture indices for the current PU, one or more candidate predictive motion vector indices, and one or more motion vector differences for the one or more selected candidate predictive motion vectors (S216).
In examples where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is uni-directionally predicted, the video encoder may output either the reference picture index for list 0 ("ref_idx_l0") or the reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predictive motion vector index ("mvp_l0_flag") indicating the position in the candidate predictive motion vector list of the selected candidate predictive motion vector for the list 0 motion vector of the current PU. Alternatively, the video encoder may output a candidate predictive motion vector index ("mvp_l1_flag") indicating the position in the candidate predictive motion vector list of the selected candidate predictive motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector or the list 1 motion vector of the current PU.
In an example where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bi-directionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predictive motion vector index ("mvp_l0_flag") indicating the position in the candidate predictive motion vector list of the selected candidate predictive motion vector for the list 0 motion vector of the current PU. In addition, the video encoder may output a candidate predictive motion vector index ("mvp_l1_flag") indicating the position in the candidate predictive motion vector list of the selected candidate predictive motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector of the current PU and the MVD for the list 1 motion vector of the current PU.
Fig. 7 is a flowchart of an implementation of motion compensation performed by a video decoder (e.g., video decoder 30) in the embodiment of the present application.
When the video decoder performs motion compensation operation 221, the video decoder may receive an indication of the selected candidate predictive motion vector for the current PU (S222). For example, the video decoder may receive a candidate predictive motion vector index indicating a position of the selected candidate predictive motion vector within a candidate predictive motion vector list of the current PU.
The video decoder may receive a first candidate prediction motion vector index and a second candidate prediction motion vector index if the motion information of the current PU is encoded using AMVP mode and the current PU is bi-directionally predicted. The first candidate predicted motion vector index indicates the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU. The second candidate predicted motion vector index indicates a position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU. In some possible implementations, a single syntax element may be used to identify two candidate predictive motion vector indices.
In addition, the video decoder may generate a candidate prediction motion vector list for the current PU (S224). The video decoder may generate this list of candidate prediction motion vectors for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to fig. 8-12 to generate a list of candidate prediction motion vectors for the current PU. When the video decoder generates a temporal candidate prediction motion vector for the candidate prediction motion vector list, the video decoder may explicitly or implicitly set a reference picture index that identifies the reference picture that includes the co-located PU, as described above with respect to fig. 5.
After generating the candidate predictive motion vector list for the current PU, the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidate predictive motion vectors in the candidate predictive motion vector list of the current PU (S225). For example, if the motion information of the current PU is encoded using merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using AMVP mode, the video decoder may reconstruct the one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidate predicted motion vectors and the one or more MVDs indicated in the codestream. The reference picture index and the prediction direction identifier of the current PU may be the same as the reference picture index and the prediction direction identifier of the one or more selected candidate predictive motion vectors. After determining the motion information of the current PU, the video decoder may generate a predictive picture block for the current PU based on the one or more reference blocks indicated by the motion information of the current PU (S226).
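By way of illustration, the decoder-side reconstruction of motion information described above can be sketched as follows; the types Mv and MotionInfo and the prediction-direction convention are hypothetical, and a single motion vector per candidate is assumed for brevity.

```cpp
struct Mv { int x; int y; };

struct MotionInfo {
    Mv  mv;       // one motion vector (per reference list in a full implementation)
    int refIdx;   // reference picture index
    int predDir;  // assumed convention: 0 = list 0, 1 = list 1, 2 = bi-directional
};

// Reconstruct the motion information of the current PU from the selected
// candidate: in merge mode the candidate's motion information is reused as-is,
// while in AMVP mode the parsed MVD is added to the selected predictor.
MotionInfo reconstructMotionInfo(const MotionInfo& selectedCandidate,
                                 bool isMergeMode, Mv mvd) {
    MotionInfo out = selectedCandidate;  // reference index and direction follow the candidate
    if (!isMergeMode) {
        out.mv.x += mvd.x;               // AMVP: mv = mvp + mvd
        out.mv.y += mvd.y;
    }
    return out;
}
```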
Fig. 8 is an exemplary diagram of a coding unit (CU) and its associated neighboring image blocks in an embodiment of the present application, illustrating a CU 250 and exemplary candidate predicted motion vector locations 252A-252E associated with CU 250. Candidate predicted motion vector locations 252A-252E may be collectively referred to herein as candidate predicted motion vector locations 252. Candidate predicted motion vector locations 252 represent spatial candidate predicted motion vectors in the same image as CU 250. Candidate predicted motion vector position 252A is located to the left of CU 250. Candidate predicted motion vector position 252B is located above CU 250. Candidate predicted motion vector position 252C is located to the upper right of CU 250. Candidate predicted motion vector position 252D is located to the lower left of CU 250. Candidate predicted motion vector position 252E is located to the upper left of CU 250. Fig. 8 provides a schematic implementation of one way in which the inter prediction module 121 and the motion compensation module may generate a candidate prediction motion vector list. The implementation will be explained below with reference to inter prediction module 121, but it should be understood that the motion compensation module may implement the same techniques and thus generate the same candidate prediction motion vector list.
Fig. 9 is a flowchart of an implementation of constructing a candidate prediction motion vector list in an embodiment of the present application. The technique of fig. 9 will be described with reference to a list including five candidate predicted motion vectors, although the techniques described herein may also be used with lists having other sizes. The five candidate predicted motion vectors may each have an index (e.g., 0 to 4). The technique of fig. 9 will be described with reference to a general video decoder. A generic video decoder may illustratively be a video encoder (e.g., video encoder 20) or a video decoder (e.g., video decoder 30).
To construct the candidate prediction motion vector list according to the embodiment of fig. 9, the video decoder first considers four spatial candidate prediction motion vectors (902). The four spatial candidate predicted motion vectors may include candidate predicted motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate predicted motion vectors correspond to the motion information of four PUs in the same picture as the current CU (e.g., CU 250). The video decoder may consider the four spatial candidate predictive motion vectors in the list in a particular order. For example, candidate predicted motion vector position 252A may be considered first. If candidate predicted motion vector position 252A is available, it may be assigned to index 0. If candidate predicted motion vector position 252A is not available, the video decoder may not include candidate predicted motion vector position 252A in the candidate predicted motion vector list. A candidate predicted motion vector position may be unavailable for various reasons. For example, the candidate predicted motion vector position may not be available if it is not within the current picture. In another possible implementation, the candidate predicted motion vector position may not be available if it is intra predicted. In another possible implementation, the candidate predicted motion vector position may not be available if it is in a different slice than the current CU.
After considering candidate predicted motion vector position 252A, the video decoder may next consider candidate predicted motion vector position 252B. If candidate predicted motion vector position 252B is available and different from candidate predicted motion vector position 252A, the video decoder may add candidate predicted motion vector position 252B to the candidate predicted motion vector list. In this particular context, the terms "same" and "different" refer to the motion information associated with the candidate predicted motion vector positions. Thus, two candidate predicted motion vector positions are considered the same if they have the same motion information, and are considered different if they have different motion information. If candidate predicted motion vector position 252A is not available, the video decoder may assign candidate predicted motion vector position 252B to index 0. If candidate predicted motion vector position 252A is available, the video decoder may assign candidate predicted motion vector position 252B to index 1. If candidate predicted motion vector position 252B is not available or is the same as candidate predicted motion vector position 252A, the video decoder skips candidate predicted motion vector position 252B and does not include it in the candidate predicted motion vector list.
Candidate predicted motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If candidate predicted motion vector position 252C is available and is not the same as candidate predicted motion vector positions 252B and 252A, the video decoder assigns candidate predicted motion vector position 252C to the next available index. If candidate predicted motion vector position 252C is not available or is not different from at least one of candidate predicted motion vector positions 252A and 252B, the video decoder does not include candidate predicted motion vector position 252C in the candidate predicted motion vector list. Next, the video decoder considers candidate predicted motion vector position 252D. If candidate predicted motion vector position 252D is available and is not the same as candidate predicted motion vector positions 252A, 252B, and 252C, the video decoder assigns candidate predicted motion vector position 252D to the next available index. If candidate predicted motion vector position 252D is not available or is not different from at least one of candidate predicted motion vector positions 252A, 252B, and 252C, the video decoder does not include candidate predicted motion vector position 252D in the candidate predicted motion vector list. The above embodiments generally describe that candidate predicted motion vectors 252A-252D are exemplarily considered for inclusion in the candidate predicted motion vector list, but in some implementations all candidate predicted motion vectors 252A-252D may be added to the candidate predicted motion vector list first, with duplicates later removed from the candidate predicted motion vector list.
After the video decoder considers the first four spatial candidate predicted motion vectors, the list of candidate predicted motion vectors may include four spatial candidate predicted motion vectors, or it may include fewer than four spatial candidate predicted motion vectors. If the list includes four spatial candidate predicted motion vectors (904, yes), the video decoder considers the temporal candidate predicted motion vector (906). The temporal candidate prediction motion vector may correspond to the motion information of a co-located PU of a picture different from the current picture. If the temporal candidate predicted motion vector is available and different from the first four spatial candidate predicted motion vectors, the video decoder assigns the temporal candidate predicted motion vector to index 4. If the temporal candidate predicted motion vector is not available or is the same as one of the first four spatial candidate predicted motion vectors, the video decoder does not include the temporal candidate predicted motion vector in the list of candidate predicted motion vectors. Thus, after the video decoder considers the temporal candidate predicted motion vector (906), the candidate predicted motion vector list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the temporal candidate predicted motion vector considered at block 906) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902). If the candidate predicted motion vector list includes five candidate predicted motion vectors (908, yes), the video decoder completes building the list.
If the candidate predicted motion vector list includes four candidate predicted motion vectors (908, no), the video decoder may consider a fifth spatial candidate predicted motion vector (910). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vectors at locations 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate predicted motion vector to the list of candidate predicted motion vectors, the fifth spatial candidate predicted motion vector being assigned to index 4. If the candidate predicted motion vector at location 252E is not available or is not different from the candidate predicted motion vectors at candidate predicted motion vector locations 252A, 252B, 252C, and 252D, the video decoder may not include the candidate predicted motion vector at location 252E in the list of candidate predicted motion vectors. Thus, after considering the fifth spatial candidate predicted motion vector (910), the list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the fifth spatial candidate predicted motion vector considered at block 910) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902).
If the candidate predicted motion vector list includes five candidate predicted motion vectors (912, yes), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes four candidate predicted motion vectors (912, no), the video decoder adds the artificially generated candidate predicted motion vector (914) until the list includes five candidate predicted motion vectors (916, yes).
If the list includes less than four spatial candidate predicted motion vectors after the video decoder considers the first four spatial candidate predicted motion vectors (904, no), the video decoder may consider a fifth spatial candidate predicted motion vector (918). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vector already included in the candidate predicted motion vector list, the video decoder may add a fifth spatial candidate predicted motion vector to the candidate predicted motion vector list, the fifth spatial candidate predicted motion vector being assigned to the next available index. If the candidate predicted motion vector at location 252E is not available or is not different from one of the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may not include the candidate predicted motion vector at location 252E in the candidate predicted motion vector list. The video decoder may then consider the temporal candidate prediction motion vector (920). If a temporal candidate predictive motion vector is available and different from a candidate predictive motion vector already included in the list of candidate predictive motion vectors, the video decoder may add the temporal candidate predictive motion vector to the list of candidate predictive motion vectors, the temporal candidate predictive motion vector being assigned to the next available index. The video decoder may not include the temporal candidate predictive motion vector in the list of candidate predictive motion vectors if the temporal candidate predictive motion vector is not available or is not different from one of the candidate predictive motion vectors already included in the list of candidate predictive motion vectors.
If the candidate predicted motion vector list includes five candidate predicted motion vectors after considering the fifth spatial candidate predicted motion vector (block 918) and the temporal candidate predicted motion vector (block 920) (922, yes), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes less than five candidate predicted motion vectors (922, no), the video decoder adds the artificially generated candidate predicted motion vector (914) until the list includes five candidate predicted motion vectors (916, yes).
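For illustration only, a simplified sketch of building a fixed-size candidate list with availability checks and duplicate pruning is given below. It approximates the flow of fig. 9 by considering the candidates in a single fixed order (spatial A/B/C/D, temporal, spatial E) and filling any remaining slots with artificially generated candidates; the exact order in fig. 9 depends on how many spatial candidates were available, and the types used here are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Mv { int x; int y; };

struct Candidate {
    Mv   mv;
    int  refIdx;
    bool available;
};

// Two candidates are duplicates if they carry the same motion information.
bool sameMotion(const Candidate& a, const Candidate& b) {
    return a.mv.x == b.mv.x && a.mv.y == b.mv.y && a.refIdx == b.refIdx;
}

// Build a candidate list of fixed size maxCount (five in the flow above):
// 'ordered' holds the candidates in the order they are considered; each is
// added only if available and not a duplicate, and any remaining slots are
// filled with artificial candidates.
std::vector<Candidate> buildMergeList(const std::vector<Candidate>& ordered,
                                      const std::vector<Candidate>& artificial,
                                      std::size_t maxCount) {
    std::vector<Candidate> list;
    for (const Candidate& c : ordered) {
        if (list.size() == maxCount) break;
        if (!c.available) continue;
        bool duplicate = false;
        for (const Candidate& in : list) {
            if (sameMotion(in, c)) { duplicate = true; break; }
        }
        if (!duplicate) list.push_back(c);
    }
    for (std::size_t i = 0; list.size() < maxCount && i < artificial.size(); ++i) {
        list.push_back(artificial[i]);
    }
    return list;
}
```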
According to the techniques of this application, additional merge candidate predicted motion vectors may be artificially generated after the spatial candidate predicted motion vectors and the temporal candidate predicted motion vector to fix the size of the merge candidate predicted motion vector list to a specified number of merge candidate predicted motion vectors (e.g., five in the possible implementation of fig. 9 above). The additional merge candidate predicted motion vectors may include, for example, a combined bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 1), a scaled bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 2), and a zero-vector merge/AMVP candidate predicted motion vector (candidate predicted motion vector 3).
Fig. 10 is an exemplary diagram illustrating adding a combined candidate motion vector to a merge mode candidate prediction motion vector list according to an embodiment of the present application. The combined bi-predictive merge candidate prediction motion vector may be generated by combining original merge candidate prediction motion vectors. Specifically, two of the original candidate predicted motion vectors (which have mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate the bi-predictive merge candidate predicted motion vector. In fig. 10, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list 0 uni-prediction and the prediction type of the other candidate predicted motion vector is list 1 uni-prediction. In this possible implementation, mvL0_A and ref0 are picked from list 0 and mvL1_B and ref0 are picked from list 1, and then a bi-predictive merge candidate predicted motion vector (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may be generated and checked for a difference from the candidate predicted motion vectors already included in the candidate predicted motion vector list. If it is different, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
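As an illustration of the combination in fig. 10, the following is a minimal Python sketch and not the normative decoder process; the field names (mv_l0, ref_l0, ...) are hypothetical and used here only for illustration.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MergeCand:
    mv_l0: Optional[Tuple[int, int]] = None   # (mvx, mvy) for list 0, or None if unused
    ref_l0: Optional[int] = None
    mv_l1: Optional[Tuple[int, int]] = None   # (mvx, mvy) for list 1, or None if unused
    ref_l1: Optional[int] = None

def combine_bi_pred(cand_a: MergeCand, cand_b: MergeCand) -> Optional[MergeCand]:
    """Take list-0 motion data from cand_a and list-1 motion data from cand_b."""
    if cand_a.mv_l0 is None or cand_b.mv_l1 is None:
        return None
    return MergeCand(mv_l0=cand_a.mv_l0, ref_l0=cand_a.ref_l0,
                     mv_l1=cand_b.mv_l1, ref_l1=cand_b.ref_l1)

# Example matching fig. 10: mvL0_A/ref0 from list 0 and mvL1_B/ref0 from list 1.
a = MergeCand(mv_l0=(5, -2), ref_l0=0)             # list-0 uni-prediction
b = MergeCand(mv_l1=(-3, 7), ref_l1=0)             # list-1 uni-prediction
cand_list = [a, b]
combined = combine_bi_pred(a, b)
if combined is not None and combined not in cand_list:
    cand_list.append(combined)                      # added only if not a duplicate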
Fig. 11 is an exemplary diagram illustrating adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list according to an embodiment of the present disclosure. The scaled bi-predictive merge candidate prediction motion vector may be generated by scaling an original merge candidate prediction motion vector. Specifically, one candidate predicted motion vector from the original candidate predicted motion vectors (which may have mvLX and refIdxLX) may be used to generate the bi-predictive merge candidate predicted motion vector. In the possible implementation of fig. 11, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list 0 uni-prediction and the prediction type of the other candidate predicted motion vector is list 1 uni-prediction. In this possible implementation, mvL0_A and ref0 can be picked from list 0, and ref0 can be copied to the reference index ref0' in list 1. Then, mvL0'_A can be calculated by scaling mvL0_A according to ref0 and ref0'. The scaling may depend on the POC (Picture Order Count) distance. Then, a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) may be generated and checked for duplication. If it is not a duplicate, it may be added to the merge candidate prediction motion vector list.
Fig. 12 is an exemplary diagram illustrating adding a zero motion vector to the merge mode candidate prediction motion vector list in the embodiment of the present application. The zero-vector merge candidate prediction motion vector may be generated by combining the zero vector with a reference index that can be referenced. If the zero-vector candidate predicted motion vector is not a duplicate, it may be added to the merge candidate predicted motion vector list. For each generated merge candidate predicted motion vector, the motion information may be compared with the motion information of the previous candidate predicted motion vectors in the list.
In a possible implementation, the generated candidate predicted motion vector is added to the merge candidate predicted motion vector list if the newly generated candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list. The process of determining whether a candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list is sometimes referred to as pruning. By pruning, each newly generated candidate predicted motion vector may be compared with the existing candidate predicted motion vectors in the list. In some possible implementations, the pruning operation may include comparing one or more new candidate predicted motion vectors with the candidate predicted motion vectors already in the candidate predicted motion vector list and not adding a new candidate predicted motion vector that is a duplicate of a candidate predicted motion vector already in the list. In other possible implementations, the pruning operation may include adding one or more new candidate predicted motion vectors to the candidate predicted motion vector list and later removing duplicate candidate predicted motion vectors from the list.
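As an illustration only, the following sketch shows the two pruning variants described above on candidates represented as hashable motion data tuples; it is not tied to any particular codec implementation.

def prune_before_add(cand_list, new_cands):
    """Variant 1: compare each new candidate against the list and skip duplicates."""
    for cand in new_cands:
        if cand not in cand_list:
            cand_list.append(cand)
    return cand_list

def prune_after_add(cand_list, new_cands):
    """Variant 2: add everything first, then remove duplicates while keeping order."""
    merged = list(cand_list) + list(new_cands)
    seen, deduped = set(), []
    for cand in merged:
        if cand not in seen:
            seen.add(cand)
            deduped.append(cand)
    return deduped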
Several embodiments of inter prediction are described below, and the first preset algorithm and the second preset algorithm in the present application may include one or more of them.
Inter-picture prediction exploits temporal correlation between pictures to derive motion-compensated prediction (MCP) for blocks of image samples.
For such block-based MCP, video pictures are divided into rectangular blocks. Assuming uniform motion within a block and a moving object is larger than a block, for each block, a corresponding block in a previously decoded picture can be found as a prediction value. Using a translational motion model, the position of a block in a previously decoded picture is represented by a motion vector (Δ x, Δ y), where Δ x specifies the horizontal displacement relative to the position of the current block and Δ y specifies the vertical displacement relative to the position of the current block. The motion vectors (Δ x, Δ y) may have fractional sample precision to more accurately capture the movement of the underlying object. When the corresponding motion vector has fractional sample precision, interpolation is applied to the reference picture to obtain the prediction signal. The previously decoded picture is referred to as a reference picture and is indicated by a reference index Δ t corresponding to a reference picture list. These translational motion model parameters, i.e. motion vectors and reference indices, are further referred to as motion data. Modern video coding standards allow two kinds of inter-picture prediction, namely unidirectional prediction and bi-directional prediction.
In the case of bi-prediction, two sets of motion data (Δ x0, Δ y0, Δ t0 and Δ x1, Δ y1, Δ t1) are used to generate two MCPs (possibly from different pictures) which are then combined to obtain the final MCP. By default this is done by averaging, but in the case of weighted prediction, different weights may be applied to each MCP, e.g. to compensate for scene fades. Reference pictures that can be used in bi-prediction are stored in two separate lists, list 0 and list 1. To limit memory bandwidth in slices that allow bi-prediction, the HEVC standard limits PUs with 4 × 8 and 8 × 4 luma prediction blocks to use only uni-directional prediction. Motion data is obtained at the encoder using a motion estimation process. Motion estimation is not specified in video standards, so different encoders may use different complexity and quality tradeoffs in their implementation.
The motion data of a block is correlated with that of its neighboring blocks. To exploit this correlation, motion data is not coded directly in the codestream but predictively coded based on neighboring motion data. In HEVC, two concepts are used for this. The predictive coding of motion vectors is improved in HEVC by introducing a new tool called Advanced Motion Vector Prediction (AMVP), where the best predictor for each motion block is signaled to the decoder. In addition, a new technique called inter-prediction block merging derives all motion data of a block from neighboring blocks, replacing the direct and skip modes of H.264/AVC.
Advanced motion vector prediction
As in previous video coding standards, HEVC motion vectors are encoded according to horizontal (x) and vertical (y) components as a difference from a so-called Motion Vector Predictor (MVP). The calculation of the two Motion Vector Difference (MVD) components is shown in equations (1.1) and (1.2).
MVDx = Δx − MVPx (1.1)
MVDy = Δy − MVPy (1.2)
The motion vector of the current block is typically correlated with the motion vectors of neighboring blocks in the current picture or in earlier coded pictures. This is because neighboring blocks are likely to correspond to the same moving object with similar motion, and the motion of an object is unlikely to change abruptly over time. Consequently, using the motion vectors of neighboring blocks as predictors reduces the size of the signaled motion vector difference. The MVP is usually derived from already decoded motion vectors of spatially neighboring blocks or of temporally neighboring blocks in the co-located picture. In some cases, a zero motion vector can also be used as the MVP. In H.264/AVC, this is done by computing the component-wise median of three spatially neighboring motion vectors. With this approach, no signaling of the predictor is required. Temporal MVPs from a co-located picture are only considered in the so-called temporal direct mode of H.264/AVC. The H.264/AVC direct mode is also used to derive motion data other than motion vectors.
In HEVC, the method of implicitly deriving the MVP is replaced by a technique called motion vector competition, which explicitly signals which MVP from a list of MVPs is used for motion vector derivation. The variable coding quadtree block structure in HEVC can result in one block having several neighboring blocks with motion vectors as potential MVP candidates. The initial design of Advanced Motion Vector Prediction (AMVP) contained five MVPs from three different classes of predictors: three motion vectors from spatial neighbors, the median of the three spatial predictors, and a scaled motion vector from the co-located temporal neighboring block. Furthermore, the predictor list was modified by reordering to place the most probable motion predictor in the first position and by removing redundant candidates to ensure minimal signaling overhead. Subsequently, important simplifications of the AMVP design were developed, such as removing the median predictor, reducing the number of candidates in the list from five to two, fixing the candidate order in the list, and reducing the number of redundancy checks. The final design of the AMVP candidate list construction contains the following two MVP candidates: a. at most two spatial candidate MVPs obtained from five spatial neighboring blocks; b. one temporal candidate MVP obtained from two temporal co-located blocks when the two spatial candidate MVPs are not available or are identical; c. zero motion vectors when the spatial candidates, the temporal candidate, or both are not available.
As already mentioned, the two spatial MVP candidates A and B are derived from five spatial neighboring blocks. The positions of the spatial candidate blocks are the same for AMVP and inter-prediction block merging. For candidate A, the motion data of the two blocks A0 and A1 in the lower left corner is considered in a two-pass approach. In the first pass, it is checked whether any of the candidate blocks contains a reference index equal to the reference index of the current block. The first motion vector found is taken as candidate A. When all reference indices from A0 and A1 point to a reference picture different from the reference picture of the current block, the associated motion vector cannot be used as it is. Therefore, in the second pass, the motion vector needs to be scaled according to the temporal distance between the candidate reference picture and the current reference picture. Equation (1.3) shows how the candidate motion vector mvcand is scaled by the factor ScaleFactor. The ScaleFactor is calculated based on the temporal distance td between the current picture and the reference picture of the candidate block and the temporal distance tb between the current picture and the reference picture of the current block. The temporal distance is expressed as the difference between the Picture Order Count (POC) values that define the display order of the pictures. The scaling operation is basically the same scheme as used for the temporal direct mode in H.264/AVC. This factoring allows the ScaleFactor to be pre-computed at the slice level, since it depends only on the reference picture list structure signaled in the slice header. It should be noted that MV scaling is only performed when both the current reference picture and the candidate reference picture are short-term reference pictures. The parameter td is defined as the POC difference between the co-located picture and the reference picture of the co-located candidate block.
mv = sign(mvcand·ScaleFactor)·((|mvcand·ScaleFactor| + 2^7) >> 8) (1.3)
ScaleFactor = clip(−2^12, 2^12 − 1, (tb·tx + 2^5) >> 6) (1.4)
tx = (2^14 + (|td| >> 1)) / td (1.5)
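For illustration, a small Python sketch of the scaling in equations (1.3) to (1.5) is given below; the constants follow the equations as written above, and it should be noted that the HEVC specification itself uses a rounding offset of 127 in the final step.

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def scale_spatial_mvp(mv_cand, tb, td):
    """mv_cand: one MV component of the candidate; tb, td: POC distances as defined above."""
    tx = int((16384 + (abs(td) >> 1)) / td)                   # equation (1.5), 2^14 = 16384
    scale_factor = clip3(-4096, 4095, (tb * tx + 32) >> 6)    # equation (1.4), 2^12 = 4096, 2^5 = 32
    prod = scale_factor * mv_cand
    sign = 1 if prod >= 0 else -1
    return sign * ((abs(prod) + 128) >> 8)                    # equation (1.3), 2^7 = 128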
For candidate B, the candidates B0 through B2 are checked sequentially in the same way as A0 and A1 were checked in the first pass. The second pass, however, is only performed when blocks A0 and A1 do not contain any motion information, i.e., when they are not available or are coded using intra-picture prediction. Then, candidate A is set equal to the non-scaled candidate B, if one is found, and candidate B is set equal to a second, non-scaled or scaled variant of candidate B. The second search pass looks for non-scaled as well as scaled MVs derived from candidates B0 through B2. Overall, this design allows A0 and A1 to be processed independently of B0, B1, and B2. The derivation of B only needs to know whether both A0 and A1 are available in order to search for a scaled or an additional non-scaled MV derived from B0 through B2. This dependency is acceptable given that it significantly reduces the complex motion vector scaling operations for candidate B. Reducing the number of motion vector scalings represents a significant complexity reduction in the motion vector predictor derivation process.
In HEVC, the bottom right and center blocks of the current block have been determined to be best suited to provide a good Temporal Motion Vector Predictor (TMVP). Of these candidates, C0 represents the bottom right neighbor and C1 represents the center block. Here again, the motion data of C0 is considered first and if not available, the temporal MVP candidate C is derived using the motion data from the co-located candidate block at the center. The motion data of C0 is also considered unavailable when the associated PU belongs to a CTU other than the current CTU row. This minimizes the memory bandwidth requirements for storing the co-located motion data. In contrast to spatial MVP candidates where motion vectors may refer to the same reference picture, motion vector scaling is mandatory for TMVP. Therefore, the same scaling operation as the spatial MVP is used.
Although the temporal direct mode in H.264/AVC always refers to the first reference picture in the second reference picture list (list 1) and is only allowed in bi-predictive slices, HEVC offers the possibility to indicate for each picture which reference picture is considered as the co-located picture. This is done by signaling the co-located reference picture list and the reference picture index in the slice header and by requiring that these syntax elements in all slices of a picture specify the same reference picture.
Since the temporal MVP candidate introduces additional dependencies, it may need to be disabled for error-robustness reasons. In H.264/AVC it is possible to disable the temporal direct mode of bi-predictive slices in the slice header (direct_spatial_mv_pred_flag). The HEVC syntax extends this signaling by allowing TMVP to be disabled at the sequence level or at the picture level (sps/slice_temporal_mvp_enabled_flag). Although the flag is signaled in the slice header, it is a requirement of codestream conformance that its value be the same for all slices in a picture. Since the signaling of the picture-level flag depends on the SPS flag, signaling it in the PPS would introduce a parsing dependency between the SPS and the PPS. Another advantage of this slice-header signaling is that if one wants to change only the value of this flag and no other PPS parameters, there is no need to transmit a second PPS.
In general, motion data signaling in HEVC is similar to that of H.264/AVC. The inter-picture prediction syntax element inter_pred_idc signals whether reference list 0, list 1, or both are used. For each MCP obtained from one reference picture list, the corresponding reference picture (Δt) is signaled by an index into the reference picture list, ref_idx_l0/1, and the MV (Δx, Δy) is represented by an index into the MVP list, mvp_l0/1_flag, together with its MVD. A flag newly introduced in the slice header, mvd_l1_zero_flag, indicates that the MVD for the second reference picture list is equal to zero and therefore not signaled in the codestream. When the motion vector is fully reconstructed, a final clipping operation ensures that the value of each component of the final motion vector always lies in the range of −2^15 to 2^15 − 1, inclusive.
Inter-picture prediction block merging
The AMVP list contains only motion vectors for one reference list, while a merge candidate contains all the motion data, including the information on whether one or two reference picture lists are used as well as a reference index and a motion vector for each list. Overall, the merge candidate list is constructed based on the following candidates: a. up to four spatial merge candidates derived from five spatial neighboring blocks; b. one temporal merge candidate derived from two temporal co-located blocks; c. additional merge candidates comprising combined bi-predictive candidates and zero motion vector candidates.
The first candidates in the merge candidate list are the spatial neighbors. Up to four candidates can be inserted in the merge list by sequentially checking A1, B1, B0, A0, and B2, in that order.
Instead of just checking whether a neighboring block is available and contains motion information, some additional redundancy checks are performed before all the motion data of a neighboring block is taken as a merge candidate. These redundancy checks can be divided into two categories for two different purposes: a. avoiding candidates with redundant motion data in the list; b. preventing the merging of two partitions that could be expressed otherwise, which would create redundant syntax.
When N is the number of spatial merge candidates, a complete redundancy check would require N·(N−1)/2 motion data comparisons. In the case of five potential spatial merge candidates, ten motion data comparisons would be needed to ensure that all candidates in the merge list have different motion data. During the development of HEVC, the checks for redundant motion data were reduced to a subset, maintaining coding efficiency while significantly reducing the comparison logic. In the final design, no more than two comparisons are performed per candidate, resulting in a total of five comparisons. Given the order {A1, B1, B0, A0, B2}, B0 is only compared with B1, A0 only with A1, and B2 only with A1 and B1. For the partition redundancy check, the bottom PU of a 2N×N partition would be merged with the top PU when candidate B1 is selected. This would result in one CU with two PUs having the same motion data, which could equally be signaled as a 2N×2N CU. Overall, this check applies to all second PUs of the rectangular and asymmetric partitions 2N×N, 2N×nU, 2N×nD, N×2N, nR×2N, and nL×2N. It should be noted that for the spatial merge candidates only these redundancy checks are performed, and the motion data is copied from the candidate blocks as it is. Therefore, no motion vector scaling is required here.
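For illustration, the reduced redundancy checks can be sketched as follows; the mapping of positions to the candidates they are compared against is taken from the description above, and the motion data values are assumed to be hashable objects.

REDUNDANCY_CHECKS = {
    "A1": [],
    "B1": ["A1"],
    "B0": ["B1"],
    "A0": ["A1"],
    "B2": ["A1", "B1"],
}                                              # five comparisons in total

def add_spatial_candidates(motion_data, max_spatial=4):
    """motion_data maps a position name to its motion data, or to None if unavailable."""
    merge_list = []
    for pos in ["A1", "B1", "B0", "A0", "B2"]:
        if len(merge_list) == max_spatial:
            break
        cand = motion_data.get(pos)
        if cand is None:
            continue
        if any(cand == motion_data.get(other) for other in REDUNDANCY_CHECKS[pos]):
            continue                           # redundant with one of the checked neighbors
        merge_list.append((pos, cand))
    return merge_list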
The derivation of the motion vectors for the temporal merge candidate is the same as that of the TMVP. Since a merge candidate comprises all motion data while the TMVP is only one motion vector, the derivation of the whole motion data depends only on the slice type. For bi-predictive slices, a TMVP is derived for each reference picture list. Depending on the availability of the TMVP for each list, the prediction type is set to bi-prediction or to the list for which the TMVP is available. All associated reference picture indices are set equal to zero. Consequently, for uni-predictive slices, only the TMVP for list 0 is derived together with a reference picture index equal to zero.
When at least one TMVP is available and a temporal merge candidate is added to the list, no redundancy check is performed. This makes the merge list construction independent of the co-located pictures, thereby improving the error resilience. Consider the case where a temporal merge candidate would be redundant and therefore not included in the merge candidate list. In case of a missing co-located picture, the decoder cannot get a temporal candidate and therefore does not check if it is redundant. The indices of all subsequent candidates will be affected by this.
The length of the merge candidate list is fixed for parsing-robustness reasons. After the spatial and temporal merge candidates have been added, it can happen that the list has not yet reached the fixed length. To compensate for the coding-efficiency loss that comes with the non-length-adaptive list index signaling, additional candidates are generated. Depending on the slice type, up to two kinds of candidates are used to fully populate the list: a. combined bi-predictive candidates; b. zero motion vector candidates.
In bi-predictive slices, additional candidates can be generated based on existing candidates by combining the reference picture list 0 motion data of one candidate with the list 1 motion data of another candidate. This is done by copying Δx0, Δy0, Δt0 from one candidate, e.g. the first candidate, and copying Δx1, Δy1, Δt1 from another candidate, e.g. the second candidate. The different combinations are predefined and given in Table 1.1.
TABLE 1.1: predefined combinations of candidate pairs for combined bi-predictive merge candidates (table content not reproduced)
After adding the combined bi-predictive candidates, or when the list is still incomplete for uni-predictive slices, zero motion vector candidates are calculated to complete the list. All zero motion vector candidates have one zero-displacement motion vector for uni-predictive slices and two zero-displacement motion vectors for bi-predictive slices. The reference index is set equal to zero and is incremented by one for each additional candidate until the maximum number of reference indices is reached. If candidates are still missing after that, they are created with a reference index equal to zero. For all additional candidates, no redundancy checks are performed, as it turned out that omitting these checks does not cause a coding-efficiency loss.
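A minimal sketch of the zero motion vector filling step, assuming the number of missing candidates and the maximum number of reference indices are known, is given below.

def zero_mv_candidates(num_missing, max_num_ref_idx, bi_predictive):
    """Returns zero-MV candidates; each candidate maps a reference list to a
    (motion vector, reference index) pair, or to None if that list is unused."""
    cands = []
    for i in range(num_missing):
        ref_idx = i if i < max_num_ref_idx else 0    # increment until the maximum, then reuse zero
        zero = ((0, 0), ref_idx)
        cands.append({"list0": zero, "list1": zero if bi_predictive else None})
    return cands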
For each PU coded in inter-picture prediction mode, the so-called merge_flag indicates that block merging is used to derive the motion data. The merge_idx further determines the candidate in the merge list that provides all the motion data needed for the MCP. In addition to this PU-level signaling, the number of candidates in the merge list is signaled in the slice header. Since the default value is five, it is represented as a difference to five (five_minus_max_num_merge_cand). In this way, five is signaled with the short codeword 0, whereas using only one candidate is signaled with the longer codeword 4. Regarding the impact on the merge candidate list construction process, the overall process remains the same, although it terminates after the list contains the maximum number of merge candidates. In the initial design, the maximum value for merge index coding was given by the number of available spatial and temporal candidates in the list. The index could be efficiently coded as a flag when, for example, only two candidates are available. However, in order to parse the merge index, the whole merge candidate list has to be constructed to know the actual number of candidates. If neighboring blocks were unavailable due to transmission errors, it would no longer be possible to parse the merge index.
A key application of the block merging concept in HEVC is its combination with a skip mode. In previous video coding standards, the skip mode was used to indicate for a block that the motion data is inferred instead of explicitly signaled and that the prediction residual is zero, i.e., no transform coefficients are transmitted. In HEVC, a skip_flag is signaled at the beginning of each CU in an inter-picture prediction slice and implies the following: a. the CU contains only one PU (2N×2N partition type); b. the merge mode is used to derive the motion data (merge_flag equal to 1); c. no residual data is present in the codestream.
HEVC introduces a parallel merge estimation level that indicates a region in which merge candidate lists can be derived independently by checking whether a candidate block is located in that Merge Estimation Region (MER). A candidate block that is in the same MER is not included in the merge candidate list. Hence, its motion data does not need to be available at the time of list construction. When this level is, for example, 32, all prediction units in a 32 × 32 region can construct the merge candidate list in parallel, since all merge candidates that are in the same 32 × 32 MER are not inserted into the list. All potential merge candidates for the first PU0 are available because they are outside the first 32 × 32 MER. For the second MER, the merge candidate lists of PUs 2-6 cannot contain motion data from these PUs when the merge estimation inside that MER should be independent. Therefore, for example, when looking at PU5, no merge candidates are available and hence none are inserted into its merge candidate list. In that case, the merge list of PU5 consists only of the temporal candidate (if available) and zero MV candidates. In order to enable an encoder to trade off parallelism and coding efficiency, the parallel merge estimation level is adaptive and signaled as log2_parallel_merge_level_minus2 in the picture parameter set.
Sub-CU-based motion vector prediction
During the development of new video coding techniques, with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered in the encoder by splitting a large CU into sub-CUs and deriving the motion information for all the sub-CUs of the large CU. The Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from blocks smaller than the current CU in the co-located reference picture. In the Spatial-Temporal Motion Vector Prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and the spatial neighboring motion vectors.
In order to preserve more accurate motion fields for sub-CU motion prediction, motion compression of reference frames is currently disabled.
Alternative temporal motion vector prediction
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, temporal motion vector prediction (TMVP) is modified by extracting multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. A sub-CU is a square N × N block (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture using a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain the motion vector and reference index of each sub-CU from the block corresponding to each sub-CU.
In the first step, a reference picture and the corresponding block are determined by the motion information of the spatially neighboring blocks of the current CU. To avoid a repeated scanning process of the neighboring blocks, the first merge candidate in the merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set as the temporal vector and the index of the motion source picture. In this way, the corresponding block can be identified more accurately in ATMVP than in TMVP, where the corresponding block (sometimes called a co-located block) is always located at the bottom-right or center position relative to the current CU.
In the second step, a corresponding block of each sub-CU is identified by the temporal vector in the motion source picture by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) is used to derive the motion information of the sub-CU. After the motion information of the corresponding N × N block is identified, it is converted into a motion vector and reference index of the current sub-CU in the same way as the TMVP of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is met (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses motion vector MVx (the motion vector corresponding to reference picture list X) to predict motion vector MVy for each sub-CU (where X is equal to 0 or 1 and Y is equal to 1−X).
Spatio-temporal motion vector prediction
In this method, the motion vectors of the sub-CUs are derived recursively in raster scan order. Consider an 8 × 8 CU containing four 4 × 4 sub-CUs A, B, C, and D. The neighboring 4 × 4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation for sub-CU A starts by identifying its two spatial neighbors. The first neighbor is the N × N block above sub-CU A (block c). If this block c is not available or is intra coded, the other N × N blocks above sub-CU A are checked (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU A (block b). If block b is not available or is intra coded, the other blocks to the left of sub-CU A are checked (from top to bottom, starting at block b). For each list, the motion information obtained from the neighboring blocks is scaled to the first reference frame of that list. Next, the Temporal Motion Vector Predictor (TMVP) of sub-block A is derived by following the same procedure as the TMVP derivation specified in HEVC: the motion information of the co-located block at position D is fetched and scaled accordingly. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged separately for each reference list. The averaged motion vector is assigned as the motion vector of the current sub-CU.
Combining merge modes
The sub-CU mode is enabled as a further merge candidate and no further syntax elements are needed to signal the mode. Two additional merge candidates are added to the merge candidate list of each CU to represent ATMVP mode and STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, a maximum of seven merge candidates are used. The coding logic of the further merge candidates is the same as the coding logic of the merge candidates in the HM, which means that two more RD checks are needed for two further merge candidates for each CU in a P or B slice.
Affine motion compensated prediction
The affine motion field of the block is described by two control point motion vectors.
The Motion Vector Field (MVF) of a block is described by the following equation:
vx = ((v1x − v0x)/w)·x − ((v1y − v0y)/w)·y + v0x
vy = ((v1y − v0y)/w)·x + ((v1x − v0x)/w)·y + v0y (1.6)
where (v0x, v0y) is the motion vector of the top-left control point and (v1x, v1y) is the motion vector of the top-right control point.
To further simplify motion compensated prediction, sub-block based affine transform prediction is applied. The sub-block size M × N is derived as in equation (1.7), where MvPre is the motion vector fractional precision (e.g., 1/16) and (v2x, v2y) is the motion vector of the bottom-left control point, calculated according to equation (1.6):
M = clip3(4, w, (w·MvPre)/max(|v1x − v0x|, |v1y − v0y|))
N = clip3(4, h, (h·MvPre)/max(|v2x − v0x|, |v2y − v0y|)) (1.7)
After being obtained by equation (1.7), M and N should be adjusted downward as necessary to be divisors of w and h, respectively.
To obtain the motion vector for each M × N sub-block, the motion vector for the center sample of each sub-block is calculated according to equation (1.6) and rounded to a fractional precision of 1/16.
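For illustration, the sub-block MV derivation can be sketched as follows, assuming the control point motion vectors are already expressed in 1/16-sample units and that the sub-block size M × N has been derived as in equation (1.7).

def affine_subblock_mvs(v0, v1, w, h, m, n):
    """v0, v1: (x, y) MVs of the top-left and top-right control points, in 1/16-sample units.
    w, h: block width/height in samples; m, n: sub-block width/height."""
    v0x, v0y = v0
    v1x, v1y = v1
    mv_field = {}
    for y0 in range(0, h, n):
        for x0 in range(0, w, m):
            cx, cy = x0 + m / 2.0, y0 + n / 2.0                       # center sample position
            vx = (v1x - v0x) / w * cx - (v1y - v0y) / w * cy + v0x    # equation (1.6)
            vy = (v1y - v0y) / w * cx + (v1x - v0x) / w * cy + v0y
            mv_field[(x0, y0)] = (round(vx), round(vy))               # rounded 1/16-sample MV
    return mv_field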
Affine inter mode
For CUs with a width and height larger than 8, the AF_INTER mode can be applied. An affine flag at the CU level is signaled in the codestream to indicate whether AF_INTER mode is used. In this mode, neighboring blocks are used to construct a candidate list of motion vector pairs {(v0, v1) | v0 = {vA, vB, vC}, v1 = {vD, vE}}. v0 is selected from the motion vectors of block A, B or C. The motion vector from a neighboring block is scaled according to the reference list and the relationship between the POC referenced by the neighboring block, the POC referenced by the current CU, and the POC of the current CU. The approach for selecting v1 from the neighboring blocks D and E is similar. If the number of candidates in the list is less than 2, the list is populated with motion vector pairs composed by duplicating each AMVP candidate. When the candidate list is larger than 2, the candidates are first sorted according to the consistency of the neighboring motion vectors (the similarity of the two motion vectors in a candidate pair) and only the first two candidates are kept. An RD cost check is used to determine which motion vector pair candidate is selected as the Control Point Motion Vector Prediction (CPMVP) of the current CU, and an index indicating the position of the CPMVP in the candidate list is signaled in the codestream. The difference between the CPMV and the CPMVP is signaled in the codestream.
Affine merge mode
When a CU is coded in AF_MERGE mode, it obtains the first block coded in affine mode from the valid neighboring reconstructed blocks. The candidate blocks are selected in order from the left, above-right, below-left to the above-left. If the neighboring below-left block A is coded in affine mode, the motion vectors v2, v3 and v4 of the top-left corner, top-right corner and bottom-left corner of the CU containing block A are obtained. The motion vector v0 of the top-left corner of the current CU is then calculated from v2, v3 and v4. Next, the motion vector v1 of the top-right corner of the current CU is calculated.
To identify whether the current CU is encoded with AF _ MERGE mode, an affine flag is signaled in the codestream when there is at least one neighboring block encoded with affine mode.
Pattern matching motion vector derivation
The pattern matched motion vector derivation (PMMVD) mode is based on the Frame-Rate Up Conversion (FRUC) technique. In this mode, the motion information of a block is not signaled but derived at the decoder side.
When the merging flag of a CU is true, its FRUC flag is signaled. When the FRUC flag is false, the merge index is signaled and the normal merge mode is used. When the FRUC flag is true, an additional FRUC mode flag is signaled to indicate which method (bilateral matching or template matching) will be used to derive motion information for the block.
At the encoder side, the decision on whether to use FRUC merge mode for a CU is based on RD cost selection, as done for the normal merge candidates. Both matching modes (bilateral matching and template matching) are checked for the CU by using RD cost selection. The one leading to the minimal cost is further compared with the other CU modes. If a FRUC matching mode is the most efficient one, the FRUC flag of the CU is set to true and the related matching mode is used.
The motion acquisition process in FRUC merge mode has two steps. CU-level motion search is performed first, followed by sub-CU-level motion refinement. At the CU level, an initial motion vector of the entire CU is derived based on bilateral matching or template matching. First, a list of MV candidates is generated and the candidate that minimizes the matching cost is selected as the starting point for further CU level refinement. Then, a local search based on bilateral matching around the starting point or template matching is performed, and the MV that minimizes the matching cost is taken as the MV of the entire CU. Subsequently, the motion information is further refined at the sub-CU level, with the resulting CU motion vector as a starting point.
For example, the following derivation process is performed for W × H CU motion information derivation. In the first stage, the MVs of the entire W × H CUs are obtained. In the second stage, the CU is further divided into M × M sub-CUs. The value of M is calculated as equation (1.8), D is a predefined depth of segmentation, and is set to 3 by default in JEM. The MV of each sub-CU is then obtained.
M = max{4, min{W/2^D, H/2^D}} (1.8)
The motion information of the current CU is obtained using bilateral matching by finding the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. On the premise of a continuous motion trajectory, the motion vectors MV0 and MV1 pointing to the two reference blocks should be proportional to the temporal distance between the current picture and the two reference pictures, i.e., TD0 and TD 1. Bilateral matching becomes a mirror-based bi-directional MV when the current picture is temporally between two reference pictures and the temporal distance from the current picture to the two reference pictures is the same.
In the bilateral matching merging mode, bi-prediction is always applied since the motion information of a CU is derived based on the closest match between two blocks along the motion trajectory of the current CU in two different reference pictures. There is no such restriction on the template matching merge pattern. In template matching merge mode, the encoder may select between unidirectional prediction from list0, unidirectional prediction from list1, or bi-directional prediction for a CU. The selection is based on the template matching cost as follows:
if costBi <= factor × min(cost0, cost1)
bi-directional prediction is used;
otherwise, if cost0 <= cost1
uni-directional prediction from list0 is used;
otherwise,
uni-directional prediction from list1 is used;
where cost0 is the SAD of the list0 template matching, cost1 is the SAD of the list1 template matching, and costBi is the SAD of the bi-prediction template matching. The value of factor is equal to 1.25, which means that the selection process is biased towards bi-directional prediction. The inter prediction direction selection is only applied to the CU-level template matching process.
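A minimal sketch of this selection rule is given below; the return values are illustrative labels only.

def select_fruc_direction(cost0, cost1, cost_bi, factor=1.25):
    """cost0/cost1: template matching SAD for list0/list1; cost_bi: bi-prediction template SAD."""
    if cost_bi <= factor * min(cost0, cost1):
        return "bi"                                        # bi-directional prediction
    return "uni_list0" if cost0 <= cost1 else "uni_list1"  # uni-directional prediction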
The motion information of the current CU is derived using template matching by finding the closest match between the template (the top and/or left neighboring block of the current CU) in the current picture and the block (same size as the template) in the reference picture. In addition to the FRUC merge mode mentioned above, template matching is also applicable to AMVP mode. Using a template matching method, new candidates are obtained. If the newly obtained candidate by template matching is different from the first existing AMVP candidate, it is inserted into the very beginning of the AMVP candidate list and then the list size is set to 2 (meaning the second existing AMVP candidate is removed). When applied to AMVP mode, only CU level search is applied.
The MV candidate set at the CU level consists of: a. the original AMVP candidates if the current CU is in AMVP mode; b. all merge candidates; c. several MVs from the interpolated MV field; d. the top and left neighboring motion vectors.
It should be noted that the interpolated MV field mentioned above is generated for the whole picture based on unilateral ME before the picture is encoded. The motion field can then be used later as CU-level or sub-CU-level MV candidates. First, the motion field of each reference picture in both reference lists is traversed at the 4 × 4 block level. For each 4 × 4 block, if the motion associated with the block passes through a 4 × 4 block in the current picture and that block has not been assigned any interpolated motion, the motion of the reference block is scaled to the current picture according to the temporal distances TD0 and TD1 (in the same way as the MV scaling of TMVP in HEVC) and the scaled motion is assigned to the block in the current frame. If no scaled MV is assigned to a 4 × 4 block, the motion of that block is marked as unavailable in the interpolated motion field.
When using bilateral matching, each valid MV of the merge candidate is used as an input to generate MV pairs assuming bilateral matching. For example, one valid MV of the merge candidate is (MVa, refa) at reference list a. Then, the reference picture refb of its paired bilateral MV is found in another reference list B, so that refa and refb are temporally at different sides of the current picture. If such refb is not available in reference list B, refb is determined to be a different reference than refa and its temporal distance to the current picture is the minimum in list B. After determining refb, MVb is obtained by scaling MVa based on the temporal distance between the current picture and refa, refb.
The four MVs from the interpolated MV field are also added to the CU level candidate list. More specifically, interpolated MVs at positions (0,0), (W/2,0), (0, H/2), and (W/2, H/2) of the current CU are added.
When FRUC is applied in AMVP mode, the original AMVP candidate is also added to the CU-level MV candidate set.
At the CU level, up to 15 MVs of AMVP CUs and up to 13 MVs of merged CUs are added to the candidate list.
The MV candidate set at the sub-CU level consists of: a. the MV determined by the CU-level search; b. the top, left, top-left and top-right neighboring MVs; c. scaled versions of the co-located MVs from the reference pictures; d. up to 4 ATMVP candidates; e. up to 4 STMVP candidates.
The scaled MV from the reference picture is obtained as follows. All reference pictures in both lists are traversed. The MVs at the co-located positions of the sub-CUs in the reference picture are scaled to the reference of the starting CU level MV.
ATMVP and STMVP candidates are limited to the first four.
At the sub-CU level, a maximum of 17 MVs are added to the candidate list.
Motion vector refinement
The motion vectors can be refined by different methods in combination with different inter prediction modes.
MV refinement in FRUC
MV refinement is a pattern-based MV search with a criterion of bilateral matching cost or template matching cost. In the current development, two search patterns are supported: an unrestricted center-biased diamond search (UCBDS) and an adaptive cross search for MV refinement, at the CU level and the sub-CU level, respectively. For both CU-level and sub-CU-level MV refinement, the MV is searched directly at quarter luma sample MV precision, followed by one-eighth luma sample MV refinement. The search range of the MV refinement for the CU and sub-CU steps is set equal to 8 luma samples.
Decoder-side motion vector refinement
In the bi-directional prediction operation, for the prediction of one block region, the two prediction blocks formed using the MV of list0 and the MV of list1, respectively, are combined to form a single prediction signal. In the decoder-side motion vector refinement (DMVR) method, the two motion vectors of the bi-directional prediction are further refined by a two-sided template matching process. Two-sided template matching is applied in the decoder to perform a distortion-based search between the two-sided template and the reconstructed samples in the reference pictures in order to obtain refined MVs without transmitting additional motion information.
In DMVR, the two-sided template is generated as a weighted combination (i.e., average) of the two prediction blocks obtained from the initial MV0 of list0 and MV1 of list1, respectively. The template matching operation consists of calculating cost measures between the generated template and the sample region (around the initial prediction block) in the reference picture. For each of the two reference pictures, the MV that yields the minimum template cost is considered as the updated MV of that list, replacing the original one. In the current development, nine MV candidates are searched for each list. The nine MV candidates comprise the original MV and eight surrounding MVs that are offset from the original MV by one luma sample in the horizontal direction, the vertical direction, or both. Finally, the two new MVs, i.e., MV0' and MV1', are used to generate the final bi-directional prediction result. The Sum of Absolute Differences (SAD) is used as the cost measure.
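For illustration, the DMVR candidate search can be sketched as follows; get_prediction is a hypothetical helper that returns the motion compensated block (as a NumPy array) for a given reference list and MV, and the offsets are expressed in whole luma samples.

import numpy as np

OFFSETS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]          # original MV plus 8 one-sample shifts

def sad(a, b):
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def dmvr_refine(mv0, mv1, get_prediction):
    pred0 = get_prediction(0, mv0)                      # prediction block from list 0, MV0
    pred1 = get_prediction(1, mv1)                      # prediction block from list 1, MV1
    template = (pred0.astype(np.int32) + pred1.astype(np.int32) + 1) >> 1   # average

    def refine(list_idx, mv):
        best_mv, best_cost = mv, None
        for dx, dy in OFFSETS:
            cand = (mv[0] + dx, mv[1] + dy)
            cost = sad(template, get_prediction(list_idx, cand))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = cand, cost
        return best_mv

    return refine(0, mv0), refine(1, mv1)               # refined MV0' and MV1'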
DMVR is applied to bi-directionally predicted merge modes, where one MV is from a past reference picture and another MV is from a future reference picture, without sending additional syntax elements.
Motion data precision and storage
Motion data storage reduction
Using the TMVP in AMVP as well as in merge mode requires storing the motion data (including motion vectors, reference indices and coding modes) in co-located reference pictures. Considering the granularity of the motion representation, the memory size needed for storing the motion data can be significant. HEVC employs Motion Data Storage Reduction (MDSR) to reduce the size of the motion data buffer and the associated memory access bandwidth by sub-sampling the motion data in the reference pictures. Whereas H.264/AVC stores this information on a 4 × 4 block basis, HEVC uses 16 × 16 blocks where, in the case of sub-sampling the 4 × 4 grid, the information of the top-left 4 × 4 block is stored. Due to this sub-sampling, MDSR affects the quality of the temporal prediction.
Furthermore, there is a close correlation between the position of the MV used in the co-located picture and the position of the MV stored by the MDSR. In the standardization process of HEVC, it was shown that storing the motion data of the top left block together with the bottom right and center TMVP candidates within the 16 × 16 region provides the best compromise between coding efficiency and memory bandwidth reduction.
Higher motion vector storage accuracy
In HEVC, the motion vector precision is one-quarter pixel (one-quarter luma sample and one-eighth chroma sample for 4:2:0 video). In the current development, the precision of the internal motion vector storage and of the merge candidates is increased to 1/16 pixel. The higher motion vector precision (1/16 pixel) is used in motion compensated inter prediction for CUs coded in skip/merge mode. For CUs coded in normal AMVP mode, either integer-pixel or quarter-pixel motion is used.
Adaptive motion vector difference resolution
In HEVC, when use _ integer _ mv _ flag is equal to 0 in a slice header, a Motion Vector Difference (MVD) is signaled in units of quarter luma samples. In the current development, a Locally Adaptive Motion Vector Resolution (LAMVR) is introduced. The MVD may be encoded in units of quarter luma samples, integer luma samples, or four luma samples. MVD resolution is controlled at the Coding Unit (CU) level and an MVD resolution flag is signaled conditionally for each CU having at least one non-zero MVD component.
For a CU with at least one non-zero MVD component, a first flag is signaled to indicate whether quarter luma sample MV precision is used in the CU. When a first flag (equal to 1) indicates that quarter luma sample MV precision is not used, another flag is signaled to indicate whether integer luma sample MV precision or four luma sample MV precision is used.
When the first MVD resolution flag of a CU is zero or not coded for the CU (meaning that all MVDs in the CU are zero), a quarter luma sample MV resolution is used for the CU. When a CU uses integer luma sample MV precision or four luma sample MV precision, the MVPs in the CU's AMVP candidate list are rounded to the respective precision.
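For illustration, rounding an MVP stored in quarter-luma-sample units to the selected MVD resolution could look as follows; the rounding convention (round half away from zero) is an assumption made for this sketch.

def round_mvp_to_precision(mv, precision):
    """mv: (x, y) in quarter-sample units. precision: 'quarter', 'integer' or 'four'."""
    shift = {"quarter": 0, "integer": 2, "four": 4}[precision]
    if shift == 0:
        return mv                                   # quarter-sample precision: unchanged

    def rnd(v):
        offset = 1 << (shift - 1)                   # rounding offset (assumption)
        return ((abs(v) + offset) >> shift << shift) * (1 if v >= 0 else -1)

    return (rnd(mv[0]), rnd(mv[1]))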
In the encoder, a CU level RD check is used to determine which MVD resolution to use for a CU. That is, for each MVD resolution, three CU-level RD checks are performed.
Fractional sample interpolation module
When the motion vector points to a fractional sample position, motion compensated interpolation is required. For luma interpolation filtering, as shown in table 1.2, for 2/4 precision samples, an 8-tap separable DCT-based interpolation filter was used, and for 1/4 precision samples, a 7-tap separable DCT-based interpolation filter was used.
TABLE 1.2: DCT-based luma interpolation filter coefficients (table content not reproduced)
Similarly, for the chroma interpolation filter, a 4-tap separable DCT-based interpolation filter is used, as shown in table 1.3.
TABLE 1.3: DCT-based chroma interpolation filter coefficients (table content not reproduced)
For vertical interpolation of 4:2:2 and horizontal and vertical interpolation of 4:4:4 chroma channels, the odd positions in table 1.3 are not used, resulting in 1/4 chroma interpolation.
For bi-directional prediction, the output of the interpolation filter is kept to 14-bit precision, regardless of the source bit depth, before averaging the two prediction signals. The actual averaging process is implicit in the bit depth reduction process as follows:
predSamples[x,y]=(predSamplesL0[x,y]+predSamplesL1[x,y]+offset)>>shift (1.9)
shift=15-BitDepth (1.10)
offset=1<<(shift-1) (1.11)
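A minimal sketch of this combination is given below; the final clipping to the output sample range is an addition that is not part of equation (1.9).

import numpy as np

def combine_bi_prediction(pred_l0, pred_l1, bit_depth=8):
    """pred_l0/pred_l1: integer arrays holding the 14-bit interpolation filter output."""
    shift = 15 - bit_depth                               # equation (1.10)
    offset = 1 << (shift - 1)                            # equation (1.11)
    out = (pred_l0.astype(np.int32) + pred_l1.astype(np.int32) + offset) >> shift   # equation (1.9)
    return np.clip(out, 0, (1 << bit_depth) - 1)         # clip to the output sample range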
to reduce complexity, bilateral linear interpolation is used for bilateral matching and template matching instead of the conventional 8-tap HEVC interpolation.
The matching cost is calculated somewhat differently in different steps. When selecting a candidate from the candidate set of the CU level, the matching cost is a SAD of the bilateral matching or the template matching. After determining the starting MV, the matching cost C of the bilateral matching at the sub-CU level search is calculated as follows:
C = SAD + w·(|MVx − MVsx| + |MVy − MVsy|) (1.12)
where w is a weighting factor that is empirically set to 4, and MV and MVs indicate the current MV and the starting MV, respectively. SAD is still used as the matching cost of template matching at the sub-CU level search.
In FRUC mode, the MV is obtained by using only the luma samples. The resulting motion will be used for the luminance and chrominance of the MC inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luminance and a 4-tap interpolation filter for chrominance.
Motion compensation module
Overlapped block motion compensation
Overlapped Block Motion Compensation (OBMC) is performed for all Motion Compensation (MC) block boundaries except the right and bottom boundaries of a CU in the current development. Moreover, it is applied to both the luma and chroma components. An MC block corresponds to a coding block. When a CU is coded in a sub-CU mode (including sub-CU merge, affine and FRUC modes), each sub-block of the CU is an MC block. To process CU boundaries in a uniform manner, OBMC is performed at the sub-block level for all MC block boundaries, with the sub-block size set equal to 4 × 4.
When OBMC is applied to the current sub-block, in addition to the current motion vector, the motion vectors of the four connected neighboring sub-blocks, if available and not identical to the current motion vector, are also used to derive prediction blocks for the current sub-block. These multiple prediction blocks based on multiple motion vectors are combined to generate the final prediction signal of the current sub-block.
The prediction block based on the motion vector of a neighboring sub-block is denoted as PN, with N indicating an index for the neighboring above, below, left and right sub-blocks, and the prediction block based on the motion vector of the current sub-block is denoted as PC. When PN is based on the motion information of a neighboring sub-block that contains the same motion information as the current sub-block, OBMC is not performed from PN. Otherwise, every sample of PN is added to the same sample in PC, i.e., four rows/columns of PN are added to PC. The weighting factors {1/4, 1/8, 1/16, 1/32} are used for PN and the weighting factors {3/4, 7/8, 15/16, 31/32} are used for PC. The exception are small MC blocks (i.e., when the height or width of the coding block is equal to 4 or the CU is coded with a sub-CU mode), for which only two rows/columns of PN are added to PC. In this case, the weighting factors {1/4, 1/8} are used for PN and the weighting factors {3/4, 7/8} are used for PC. For a PN generated based on the motion vector of a vertically (horizontally) neighboring sub-block, the samples in the same row (column) of PN are added to PC with the same weighting factor.
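For illustration, the row blending for a PN obtained from the vertically neighboring sub-block above can be sketched in floating point as follows; actual implementations use integer arithmetic, and the weights are those listed above.

import numpy as np

OBMC_WEIGHTS = [1 / 4, 1 / 8, 1 / 16, 1 / 32]          # weights for PN; PC receives 1 - weight

def obmc_blend_from_above(pc, pn, num_rows=4):
    """pc: current prediction PC; pn: prediction PN from the above neighbor's MV.
    Both are arrays of the same sub-block size; returns the blended block."""
    out = pc.astype(np.float64).copy()
    for row, w in enumerate(OBMC_WEIGHTS[:num_rows]):
        out[row, :] = (1.0 - w) * pc[row, :] + w * pn[row, :]   # blend one row of samples
    return out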
In the current development, for a CU with a size less than or equal to 256 luma samples, a CU-level flag is signaled to indicate whether OBMC is applied to the current CU. For CUs with a size larger than 256 luma samples or not coded with AMVP mode, OBMC is applied by default. At the encoder, when OBMC is applied to a CU, its impact is taken into account during the motion estimation stage. The prediction signal formed by OBMC using the motion information of the top neighboring block and the left neighboring block is used to compensate the top and left boundaries of the original signal of the current CU, and then the normal motion estimation process is applied.
Optimizing tool
Local illumination compensation
Local Illumination Compensation (LIC) is based on a linear model of illumination variation, using a scaling factor a and an offset b. It is enabled or disabled adaptively for each inter-mode coded Coding Unit (CU).
When LIC is applied to a CU, the parameters a and b are derived using the least squares error method by using neighboring samples of the current CU and their corresponding reference samples. The subsampled (2:1 subsampled) neighboring samples of the CU and corresponding samples in the reference picture (identified by the motion information of the current CU or sub-CU) are used. IC parameters are obtained and applied separately for each prediction direction.
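For illustration, deriving a and b by least squares from the subsampled neighboring samples could look as follows; the handling of the degenerate (flat reference) case is an assumption of this sketch.

import numpy as np

def derive_lic_params(cur, ref):
    """cur, ref: 1-D arrays of neighboring samples of the current block and the
    corresponding reference samples; returns (a, b) minimizing sum((a*ref + b - cur)^2)."""
    cur = np.asarray(cur, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    n = ref.size
    denom = n * np.dot(ref, ref) - ref.sum() ** 2
    if denom == 0:                                   # degenerate case: flat reference samples
        return 1.0, float(cur.mean() - ref.mean())
    a = (n * np.dot(ref, cur) - ref.sum() * cur.sum()) / denom
    b = (cur.sum() - a * ref.sum()) / n
    return float(a), float(b)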
When a CU is coded in merge mode, the LIC flag is copied from the neighboring blocks in a manner similar to the motion information copying in merge mode; otherwise, an LIC flag is signaled for the CU to indicate whether LIC applies.
When LIC is enabled for a picture, an additional CU level RD check is needed to determine if LIC applies to a CU. When LIC is enabled for a CU, mean-removed sum of absolute differences (MR-SAD) and mean-removed sum of absolute Hadamard-transformed differences (MR-SATD) are used instead of SAD and SATD for integer-pixel motion search and fractional-pixel motion search, respectively.
Bi-directional optical flow
Bi-directional Optical flow (BIO) is a sample-wise motion refinement performed on top of Bi-directionally predicted block-wise motion compensation. Sample level motion refinement does not use signaling.
Let I(k) be the luminance value from reference k (k = 0, 1) after block motion compensation, and let ∂I(k)/∂x and ∂I(k)/∂y be the horizontal and vertical components of the I(k) gradient, respectively. Assuming the optical flow is valid, the motion vector field (vx, vy) is given by equation (1.13):
∂I(k)/∂t + vx·∂I(k)/∂x + vy·∂I(k)/∂y = 0 (1.13)
Combining this optical flow equation with Hermite interpolation of the motion trajectory of each sample yields a unique third-order polynomial that matches both the function values I(k) and the derivatives ∂I(k)/∂x, ∂I(k)/∂y at its ends. The value of this polynomial at t = 0 is the BIO prediction value:
predBIO = 1/2·(I(0) + I(1) + vx/2·(τ1·∂I(1)/∂x − τ0·∂I(0)/∂x) + vy/2·(τ1·∂I(1)/∂y − τ0·∂I(0)/∂y)) (1.14)
here, τ0And τ1Indicating the distance to the reference frame. Calculating distance τ from POC of Ref0 and Ref10And τ1:τ0POC (current) -POC (Ref0), τ1POC (Ref1) -POC (current). If both predictions are from the same time direction (whether both are from the past or the future), the signs are different (i.e., τ0·τ1< 0). In this case, only if the predictions are not from the same time instant (i.e., τ)0≠τ1) Both reference areas have non-zero motion (MVx)0,MVy0,MVx1,MVy1Not equal to 0) and block motion vector-to-temporal distance (MVx) 0/MVx1=MVy0/MVy1=-τ01) BIO is applied only when proportional.
The motion vector field $(v_x, v_y)$ is determined by minimizing the difference $\Delta$ between the values at points A and B (the intersections of the motion trajectory with the reference frame planes). The model uses only the first linear term of the local Taylor expansion of $\Delta$:

$$\Delta = I^{(0)} - I^{(1)} + v_x\left(\tau_1 \frac{\partial I^{(1)}}{\partial x} + \tau_0 \frac{\partial I^{(0)}}{\partial x}\right) + v_y\left(\tau_1 \frac{\partial I^{(1)}}{\partial y} + \tau_0 \frac{\partial I^{(0)}}{\partial y}\right) \qquad (1.15)$$

All values in equation (1.15) depend on the sample position $(i', j')$, which is omitted from the notation for now. Assuming that the motion is consistent in the local surrounding area, $\Delta$ is minimized within a $(2M+1) \times (2M+1)$ square window $\Omega$ centered on the current prediction point $(i, j)$, where $M$ equals 2:

$$(v_x, v_y) = \arg\min_{v_x, v_y} \sum_{[i', j'] \in \Omega} \Delta^2[i', j'] \qquad (1.16)$$

For this optimization problem, the current development uses a simplified approach, first minimizing in the vertical direction and then minimizing in the horizontal direction. This yields:

$$v_x = \begin{cases} \mathrm{clip3}\!\left(-thBIO,\ thBIO,\ -\dfrac{s_3}{s_1 + r}\right) & \text{if } s_1 + r > m \\ 0 & \text{otherwise} \end{cases} \qquad (1.17)$$

$$v_y = \begin{cases} \mathrm{clip3}\!\left(-thBIO,\ thBIO,\ -\dfrac{s_6 - v_x s_2/2}{s_5 + r}\right) & \text{if } s_5 + r > m \\ 0 & \text{otherwise} \end{cases} \qquad (1.18)$$

where

$$s_1 = \sum_{[i',j'] \in \Omega} \psi_x^2, \quad s_2 = \sum_{[i',j'] \in \Omega} \psi_x \psi_y, \quad s_3 = \sum_{[i',j'] \in \Omega} \theta\, \psi_x, \quad s_5 = \sum_{[i',j'] \in \Omega} \psi_y^2, \quad s_6 = \sum_{[i',j'] \in \Omega} \theta\, \psi_y \qquad (1.19)$$

$$\psi_x = \tau_1\,\partial I^{(1)}/\partial x + \tau_0\,\partial I^{(0)}/\partial x, \quad \psi_y = \tau_1\,\partial I^{(1)}/\partial y + \tau_0\,\partial I^{(0)}/\partial y, \quad \theta = I^{(0)} - I^{(1)}.$$
to avoid division by zero or very small values, regularization parameters r and m are introduced in equations (1.17) and (1.18).
r=500·4d-8 (1.20)
m=700·4d-8 (1.21)
Where d is the bit depth of the video sample.
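To make equations (1.15)-(1.21) concrete, the following floating-point sketch computes the window sums and the clipped refinement (vx, vy) for one sample, using the equations as reconstructed above; the fixed-point shifts, exact sign conventions and thresholds of the reference software may differ, and the helper names are ours.

```python
import numpy as np

def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def bio_refine(i0, i1, gx0, gy0, gx1, gy1, tau0, tau1, d=10):
    """Per-sample BIO refinement, a floating-point sketch of eqs. (1.15)-(1.21).

    i0, i1           : (2M+1)x(2M+1) windows of the two motion-compensated predictions
    gx0/gy0, gx1/gy1 : matching windows of horizontal/vertical gradients of I(0), I(1)
    tau0, tau1       : POC distances to Ref0 and Ref1
    d                : bit depth of the video samples
    """
    r = 500.0 * 4 ** (d - 8)        # regularisation, eq. (1.20)
    m = 700.0 * 4 ** (d - 8)        # eq. (1.21)
    th_bio = 12 * 2 ** (14 - d)     # clipping threshold (all references from one direction)

    psi_x = tau1 * gx1 + tau0 * gx0   # horizontal term of eq. (1.15)
    psi_y = tau1 * gy1 + tau0 * gy0   # vertical term of eq. (1.15)
    theta = i0 - i1

    s1 = np.sum(psi_x * psi_x)
    s2 = np.sum(psi_x * psi_y)
    s3 = np.sum(theta * psi_x)
    s5 = np.sum(psi_y * psi_y)
    s6 = np.sum(theta * psi_y)

    vx = clip3(-th_bio, th_bio, -s3 / (s1 + r)) if (s1 + r) > m else 0.0
    vy = clip3(-th_bio, th_bio, -(s6 - vx * s2 / 2) / (s5 + r)) if (s5 + r) > m else 0.0
    return vx, vy
```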
To keep the memory access for BIO the same as for conventional bi-predictive motion compensation, all prediction and gradient values $I^{(k)}$, $\partial I^{(k)}/\partial x$, $\partial I^{(k)}/\partial y$ are computed only for positions inside the current block. In equation (1.19), the $(2M+1) \times (2M+1)$ square window $\Omega$ centered on a prediction point on the boundary of the prediction block needs to access positions outside the block. In the current development, the values of $I^{(k)}$, $\partial I^{(k)}/\partial x$, $\partial I^{(k)}/\partial y$ outside the block are set equal to the nearest available value inside the block. This may be implemented as padding, for example.
When BIO is used, the motion field may be refined for each sample, but to reduce computational complexity a block-based BIO design may be used. Motion refinement is calculated based on 4×4 blocks. In block-based BIO, the values $s_n$ in equation (1.19) of all samples in a 4×4 block are aggregated, and the aggregated values $s_n$ are then used to derive the BIO motion vector offset for the 4×4 block. Block-based BIO derivation is performed using the following formula:

$$s_{n,b_k} = \sum_{(x,y) \in b_k} s_n[x, y] \qquad (1.22)$$

where $b_k$ denotes the set of samples belonging to the k-th 4×4 block of the prediction block. $s_n$ in equations (1.17) and (1.18) is replaced by $(s_{n,b_k} \gg 4)$ to derive the associated motion vector offset.
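A small sketch of the 4×4 aggregation in equation (1.22) is shown below; the array layout and the integer-input assumption are ours.

```python
import numpy as np

def aggregate_sn_4x4(sn: np.ndarray) -> np.ndarray:
    """Sum a per-sample quantity s_n over each 4x4 block (eq. 1.22) and shift by 4,
    as used in place of the per-sample s_n in eqs. (1.17)/(1.18).
    Assumes an integer array whose height and width are multiples of 4."""
    h, w = sn.shape
    agg = sn.reshape(h // 4, 4, w // 4, 4).sum(axis=(1, 3))
    return agg >> 4
```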
In some cases, the MV refinement of BIO may be unreliable due to noise or irregular motion. Therefore, in BIO, the magnitude of the MV refinement is clipped to a threshold thBIO. The threshold is determined based on whether the reference pictures of the current picture all come from one direction. If all reference pictures of the current picture come from one direction, the threshold is set to $12 \times 2^{14-d}$; otherwise it is set to $12 \times 2^{13-d}$.
The gradients of BIO are computed simultaneously with the motion-compensated interpolation, using operations consistent with the HEVC motion compensation process (2D separable FIR). The input to this 2D separable FIR is the same reference frame samples as for the motion compensation process, and the fractional position (fracX, fracY) according to the fractional part of the block motion vector. For the horizontal gradient $\partial I/\partial x$, the signal is first interpolated vertically using BIOfilterS corresponding to the fractional position fracY with de-scaling shift d-8, and then the gradient filter BIOfilterG is applied in the horizontal direction corresponding to the fractional position fracX with de-scaling shift 18-d. For the vertical gradient $\partial I/\partial y$, the gradient filter BIOfilterG is first applied vertically corresponding to the fractional position fracY with de-scaling shift d-8, and then signal displacement is performed in the horizontal direction using BIOfilterS corresponding to the fractional position fracX with de-scaling shift 18-d. To keep the complexity reasonable, the lengths of the interpolation filters for gradient calculation (BIOfilterG) and signal displacement (BIOfilterF) are short (6 taps). Table 1.4 shows the filters used for gradient calculation for different fractional positions of the block motion vector in BIO. Table 1.5 shows the interpolation filters used for prediction signal generation in BIO.
TABLE 1.4
Fractional pixel position / Interpolation filter for gradient (BIOfilterG)
0 {8,-39,-3,46,-17,5}
1/16 {8,-32,-13,50,-18,5}
1/8 {7,-27,-20,54,-19,5}
3/16 {6,-21,-29,57,-18,5}
1/4 {4,-17,-36,60,-15,4}
5/16 {3,-9,-44,61,-15,4}
3/8 {1,-4,-48,61,-13,3}
7/16 {0,1,-54,60,-9,2}
1/2 {-1,4,-57,57,-4,1}
TABLE 1.5
Interpolation filters used for prediction signal generation in BIO (filter coefficients not reproduced here)
In the current development, when the two predictions are from different reference pictures, the BIO is applied to all bi-predicted blocks. When LIC is enabled for a CU, BIO is disabled. OBMC is applied for a block after the normal MC procedure. To reduce computational complexity, no BIO is applied in the OBMC process. This means that BIO is applied in the MC process only when using its own MV, and not when using the MVs of the neighboring blocks in the OBMC process.
Weighted sample prediction module
As an optional tool, HEVC provides a Weighted Prediction (WP) tool. The principle of WP is to replace the inter prediction signal P with a linear weighted prediction signal P' (with weight w and offset o):
Uni-directional prediction: P' = w × P + o (1.23)
Bi-directional prediction: P' = (w0 × P0 + o0 + w1 × P1 + o1) / 2 (1.24)
The applicable weights and offsets are selected by the encoder and transmitted within the codestream. The L0 and L1 suffixes define List0 and List1, respectively, of the reference picture List. For the interpolation filter, the bit depth is kept to 14-bit precision before averaging the prediction signal.
In the case of bi-prediction with at least one reference picture available in each of the lists L0 and L1, the following formula applies to the explicit signaling of the weighted prediction parameters for the luma channel. Corresponding formulas apply to the chroma channels and to uni-directional prediction.

predSample = Clip3(0, (1 << bitDepth) - 1, (predSampleL0 * w0 + predSampleL1 * w1 + ((o0 + o1 + 1) << log2WD)) >> (log2WD + 1))

where
log2WD=luma_log2_weight_denom+14-bitDepth
w0=LumaWeightL0[refIdxL0],w1=LumaWeightL1[refIdxL1]
o0=luma_offset_l0[refIdxL0]*highPrecisionScaleFactor
o1=luma_offset_l1[refIdxL1]*highPrecisionScaleFactor
highPrecisionScaleFactor=(1<<(bitDepth-8))
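The following sketch applies the explicit weighted bi-prediction formula above to luma sample arrays; it assumes the offsets are already scaled by highPrecisionScaleFactor and that log2WD is non-negative, and it is a simplified illustration rather than the normative process.

```python
import numpy as np

def weighted_bipred_luma(pred_l0, pred_l1, w0, w1, o0, o1,
                         luma_log2_weight_denom, bit_depth):
    """Explicit weighted bi-prediction for luma, following the formula above.

    pred_l0 / pred_l1 : intermediate 14-bit predictions from List0 / List1
    o0, o1            : offsets, assumed already scaled by highPrecisionScaleFactor
    """
    log2_wd = luma_log2_weight_denom + 14 - bit_depth   # assumed non-negative here
    acc = (pred_l0.astype(np.int64) * w0 + pred_l1.astype(np.int64) * w1
           + ((o0 + o1 + 1) << log2_wd))
    samples = acc >> (log2_wd + 1)
    return np.clip(samples, 0, (1 << bit_depth) - 1)

# Toy usage: default weights w0 = w1 = 1 << luma_log2_weight_denom reduce to plain averaging.
p0 = np.array([[4000, 4096]], dtype=np.int64)
p1 = np.array([[4096, 4096]], dtype=np.int64)
print(weighted_bipred_luma(p0, p1, w0=1, w1=1, o0=0, o1=0,
                           luma_log2_weight_denom=0, bit_depth=10))   # [[253 256]]
```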
Boundary prediction filtering (boundary prediction filters) is an intra-coding method that further adjusts the leftmost column or topmost row of prediction pixels. In HEVC, after an intra-prediction block has been generated for a vertical or horizontal intra mode, the leftmost column or the topmost row of prediction samples, respectively, is further adjusted. This method can be further extended to several diagonal intra modes, and further extended to up to four columns or four rows of boundary samples, using two-tap filters (for intra modes 2 and 34) or three-tap filters (for intra modes 3-6 and 30-33).
In HEVC and previous standards, reference frames are divided into two groups, forward and backward, and placed in two reference frame lists (reference picture lists), commonly named list0 and list1. The inter prediction direction of the current block indicates which of forward prediction, backward prediction, or bi-prediction is used, and accordingly list0, list1, or both list0 and list1 are selected. For the selected reference frame list, the reference frame is indicated by a reference frame index. Within the selected reference frame, the position offset of the reference block of the current block's prediction block, relative to the current block in the current frame, is indicated by a motion vector. The final prediction block is then generated, according to the prediction direction, from the prediction block taken from the reference frame in list0, list1, or list0 and list1. When the prediction direction is uni-directional, the prediction block obtained from the reference frame in list0 or list1 is used directly; when the prediction direction is bi-directional, the prediction blocks obtained from the reference frames in list0 and list1 are combined into the final prediction block by weighted averaging.
In the prior art, the prediction pixels obtained by inter prediction exhibit a certain spatial discontinuity, which leads to large prediction residual energy and affects prediction efficiency. To solve this problem, embodiments of the present application provide an inter-frame prediction method that filters the prediction pixels with neighboring reconstructed pixels after the prediction pixels are generated, thereby improving coding efficiency.
Fig. 13 is a schematic flowchart of an inter-frame prediction method according to an embodiment of the present application, and as shown in fig. 13, the method includes steps S1301 to S1307.
S1301, parsing the code stream to obtain the motion information of the image block to be processed.
The to-be-processed image block may be referred to as a current block or a current CU.
It is understood that step S1301 can be performed by the video decoder 200 in fig. 1.
For example, embodiments of the present application may use a block-based motion compensation technique to find, among the coded blocks, a best matching block for the current coding block, so that the residual between the prediction block and the current block is as small as possible, and to calculate the motion vector (MV) offset of the current block.
For example, the image block to be processed may be any block in the image, and the size of the image block to be processed may be 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, or 128 × 128, which is not limited in this embodiment of the application.
For example, if the to-be-processed image block (current block) is encoded in merge mode at the encoding end, the spatial candidates and the temporal candidates of the current block may be added to the merge motion information candidate list of the current block, in the same way as in HEVC. For example, any of the techniques described with respect to fig. 8-12 may be used to construct the merge motion information candidate list.
Exemplarily, if the current block is in merge mode, the motion information of the current block is determined according to the merge index carried in the code stream. If the current block is in inter MVP mode, the motion information of the current block is determined according to the inter prediction direction, the reference frame index, the motion vector predictor index, and the motion vector residual value transmitted in the code stream.
Step S1301 may be performed by a method in HEVC or VTM, or may also be performed by another method for generating a motion vector prediction candidate list, which is not limited in this embodiment of the present application.
S1302, it is (optionally) determined to update the prediction block of the image block to be processed.
It is understood that step S1302 may be performed by the video decoder 200 in fig. 1.
The prediction block of the to-be-processed image block is a prediction block of the current block, and can be obtained based on prediction of one or more coded blocks.
For example, whether to update the prediction block of the to-be-processed image block may be determined according to an update discrimination flag of the to-be-processed image block, that is, whether to perform spatial filtering on the to-be-processed image block may be determined according to the update discrimination flag of the to-be-processed image block.
In a possible implementation manner, update discrimination flag information of the to-be-processed image block may be obtained by parsing the code stream, where the update discrimination flag information is used to indicate whether to update the prediction block of the to-be-processed image block; whether to update the prediction block of the to-be-processed image block is then determined according to this update discrimination flag information.
In another possible implementation manner, preset update discrimination flag information of the to-be-processed image block may be obtained, where the preset update discrimination flag information is used to indicate whether to update the prediction block of the to-be-processed image block; whether to update the prediction block of the to-be-processed image block is then determined according to this preset update discrimination flag.
For example, if the update discrimination flag is true, it may be determined to update the prediction block of the to-be-processed image block, that is, to perform spatial filtering on the prediction block of the to-be-processed image block; if the update discrimination flag is false, it is determined that the prediction block of the to-be-processed image block does not need to be updated. The specific form of the update discrimination flag is not limited in the embodiments of the present application; the values true and false here are merely examples.
S1303, a prediction mode of the image block to be processed is (optionally) determined.
It is understood that step S1303 may be performed by the video decoder 200 in fig. 1.
For example, the prediction mode of the to-be-processed image block may be a merge mode (merge) and/or an inter advanced motion vector prediction mode (inter AMVP), which is not limited in this embodiment of the present application. It can be understood that the prediction mode of the to-be-processed image block may be merge only, or inter AMVP only, or a prediction mode combining two modes, i.e., merge and inter AMVP.
The inter advanced motion vector prediction mode (inter AMVP) may also be referred to as an inter motion vector prediction mode (inter MVP).
For example, the prediction mode of the to-be-processed image block may be obtained by parsing the code stream, and it is thereby determined whether the prediction mode of the to-be-processed image block is the merge and/or inter AMVP mode.
It can be understood that, in the embodiments of the present application, a spatial filtering method may be performed on a block encoded in a merge and/or inter AMVP mode in an inter-coded block, that is, a filtering process may be performed on a block encoded in the merge and/or inter AMVP mode at a decoding end.
And S1304, performing motion compensation on the image block to be processed based on the motion information to obtain a prediction block of the image block to be processed.
And the prediction block of the image to be processed comprises a prediction value of the target pixel point.
It is understood that step S1304 may be performed by the video decoder 200 in fig. 1.
Illustratively, motion compensation predicts and compensates the current local image with reference to a reference image, so as to reduce the redundant information of the frame sequence.
For example, when performing motion compensation based on motion information, a prediction block of an image block to be processed may be obtained from a reference frame by using a reference frame direction, a reference frame number, and a motion vector, where the reference frame direction may be forward prediction, backward prediction, or bidirectional prediction, which is not limited in this application.
For example, when the reference frame direction is forward prediction, the current Coding Unit (CU) may select one reference picture from the forward reference picture set to obtain a reference block; when the reference frame direction is backward prediction, the current Coding Unit (CU) may select one reference image from a backward reference image set to obtain a reference block; when the reference frame direction is bi-prediction, the current Coding Unit (CU) may select one reference picture from each of the forward and backward reference picture sets to obtain a reference block.
It should be noted that, in the step S1304, a method for performing motion compensation on the to-be-processed image block based on the motion information may adopt a method in HEVC or VTM, or may also adopt other methods to obtain a prediction block of the to-be-processed image block, which is not limited in this application.
S1306, carrying out weighted calculation on the reconstruction values of the one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point.
The reference pixel point and the target pixel point have a preset spatial position relationship.
It is understood that step S1306 may be performed by the video decoder 200 in fig. 1.
Illustratively, the target pixel point is a pixel point in a prediction block of the image block to be processed, and the prediction value of the target pixel point may be determined according to a pixel value of a pixel point in a reference block.
For example, the reference pixel point may be a reconstructed pixel point adjacent to a current CU (image block to be processed) in a spatial domain, and specifically, the reference pixel point may be a reconstructed pixel point in a block other than the current CU block in the image, for example, the reference pixel point may be a reconstructed pixel point in a CU block above or on the left side of the current CU, which is not limited in this embodiment of the present invention.
It can be understood that, in the step S1306, spatial filtering is performed on the prediction pixel of the target pixel point by using the reconstructed pixel point adjacent to the current CU in the spatial domain, specifically, weighting calculation is performed on the pixel value of the reconstructed pixel point adjacent to the current CU in the spatial domain and the prediction pixel of the target pixel point in the current block, so as to obtain the updated prediction pixel of the target pixel point.
In a possible embodiment, the one or more reference pixel points may include a reconstructed pixel point that has the same abscissa as the target pixel point and a preset ordinate difference from it, or a reconstructed pixel point that has the same ordinate as the target pixel point and a preset abscissa difference from it.
Exemplarily, as shown in fig. 14, with the upper-left corner of the picture as the origin of the coordinate system, the x-axis extends rightward along the upper edge of the picture and the y-axis extends downward along the left edge of the picture. Let the coordinates of the target pixel point in the image block to be processed (the current CU) be (xP, yP), and let the coordinates of the upper-left pixel point of the image block to be processed be (xN, yN). The reference pixel point of the target pixel point may be a reconstructed pixel point in a block above or to the left of the image block to be processed. If the reference pixel point is a reconstructed pixel point in a block above the image block to be processed, then, since the reference pixel point lies in a block other than the image block to be processed, its vertical coordinate is the vertical coordinate of the upper edge of the image block to be processed minus the preset positional relationship N, and its horizontal coordinate is the same as that of the target pixel point; that is, the coordinates of the reference pixel point are (xP, yN-N). If the reference pixel point is a reconstructed pixel point in a block to the left of the image block to be processed, then its horizontal coordinate is the leftmost horizontal coordinate of the image block to be processed minus the preset positional relationship M, and its vertical coordinate is the same as that of the target pixel point; that is, the coordinates of the reference pixel point are (xN-M, yP).
In a possible implementation, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + ((w1 + w2) / 2)) / (w1 + w2), when xN is greater than zero and yN is equal to zero;
predQ(xP, yP) = (w3 × predP(xP, yP) + w4 × recon(xP, yN - M2) + ((w3 + w4) / 2)) / (w3 + w4), when xN is equal to zero and yN is greater than zero;
predQ(xP, yP) = (w5 × predP(xP, yP) + w6 × recon(xN - M1, yP) + w7 × recon(xP, yN - M2) + ((w5 + w6 + w7) / 2)) / (w5 + w6 + w7), when xN is greater than zero and yN is greater than zero;
where the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN - M1, yP) and recon(xP, yN - M2) are the reconstructed values of the reference pixel points located at coordinate positions (xN - M1, yP) and (xP, yN - M2), w1, w2, w3, w4, w5, w6 and w7 are preset constants, and M1 and M2 are preset positive integers.
A specific method for calculating the updated predicted value of the target pixel point under different conditions of the coordinates (xN, yN) of the upper-left pixel point in the image block to be processed is described below.
In the first case: if xN is greater than zero and yN is equal to zero, the pixel at the position of the reference pixel (xN-M1, yP) is already encoded and reconstructed, and the updated predicted value of the target pixel can be obtained by the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + ((w1 + w2) / 2)) / (w1 + w2)
For example, as shown in fig. 15, taking a to-be-processed image block of size 16 × 16 as an example, if the to-be-processed image block is CU1, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU1) is (16, 0), and the coordinates (xP, yP) of the target pixel point in the to-be-processed image block are (18, 3). Since the abscissa xN of the upper-left corner of the current CU (CU1) is greater than zero and the ordinate yN is equal to zero, it can be determined that the current CU is located at the upper edge of the picture. When spatial filtering is performed on the target pixel point in the current CU, since the current CU is at the upper edge and there is no reconstructed pixel point above it, the reference pixel point in this case is a reconstructed pixel point to the left of the current CU with the same ordinate as the target pixel point, and can be denoted as (16 - M1, 3), where M1 is the preset spatial positional relationship between the reference pixel point and the target pixel point. Taking M1 = 1 as an example, the reference pixel point of the target pixel point (18, 3) may be (15, 3); similarly, the reference pixel point of the target pixel point (xP, yP) may be (xN - 1, yP).
If the pixel at the position of the reference pixel point (xN-1, yP) is coded and reconstructed, weighting calculation can be carried out through the reconstructed value recan (xN-1, yP) of the reference pixel point and the predicted value predP (xP, yP) of the target pixel point, and the updated predicted value predQ (xP, yP) of the target pixel point is obtained.
In the second case: if xN is equal to zero and yN is greater than zero, when the pixel at the reference pixel (xP, yN-M2) position is encoded and reconstructed, the updated predicted value of the target pixel can be obtained by the following formula:
predQ(xP, yP) = (w3 × predP(xP, yP) + w4 × recon(xP, yN - M2) + ((w3 + w4) / 2)) / (w3 + w4)
For example, as shown in fig. 15, if the to-be-processed image block is CU2, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU2) is (0, 32), and the coordinates (xP, yP) of the target pixel point are (8, 35). Since the abscissa xN of the upper-left corner of the current CU (CU2) is equal to zero and the ordinate yN is greater than zero, it can be determined that the current CU is located at the left edge of the picture. When spatial filtering is performed on the target pixel point in the current CU, since the current CU is at the left edge of the picture and there is no reconstructed pixel point to its left, the reference pixel point in this case is a reconstructed pixel point above the current CU with the same abscissa as the target pixel point, and can be denoted as (8, 32 - M2), where M2 is the preset spatial positional relationship between the reference pixel point and the target pixel point. Taking M2 = 1 as an example, the reference pixel point of the target pixel point (8, 35) may be (8, 31); similarly, the reference pixel point of the target pixel point (xP, yP) may be (xP, yN - M2).
If the pixel at the position of the reference pixel point (xP, yN-M2) is coded and reconstructed, weighting calculation can be carried out on a reconstruction value recon (xP, yN-M2) of the reference pixel point and a predicted value predP (xP, yP) of the target pixel point, so that an updated predicted value predQ (xP, yP) of the target pixel point is obtained.
In the third case: if xN is greater than zero and yN is greater than zero, when the pixels at the reference pixel (xN-M1, yP) and (xP, yN-M2) positions are encoded and reconstructed, the updated predicted value of the target pixel can be obtained by the following formula:
predQ(xP, yP) = (w5 × predP(xP, yP) + w6 × recon(xN - M1, yP) + w7 × recon(xP, yN - M2) + ((w5 + w6 + w7) / 2)) / (w5 + w6 + w7)
For example, as shown in fig. 15, if the to-be-processed image block is CU3, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU3) is (48, 32), and the coordinates (xP, yP) of the target pixel point are (56, 33). Since the abscissa xN of the upper-left pixel point of the current CU (CU3) is greater than zero and the ordinate yN is greater than zero, it can be determined that the current CU is not at an edge of the picture. When spatial filtering is performed on the target pixel point in the current CU, the reference pixel points can be a reconstructed pixel point above the current CU and a reconstructed pixel point to the left of the current CU. When the reference pixel point is a reconstructed pixel point to the left of the current CU, its ordinate is the same as that of the target pixel point, and it can be denoted as (xN - M1, 33); when the reference pixel point is a reconstructed pixel point above the current CU, its abscissa is the same as that of the target pixel point, and it can be denoted as (56, yN - M2). Here M1 and M2 are the preset spatial positional relationships between the reference pixel points and the target pixel point. Taking M1 = M2 = 1 as an example, the reference pixel points of the target pixel point (56, 33) may be (47, 33) and (56, 31); similarly, the reference pixel points of the target pixel point (xP, yP) may be (xN - M1, yP) and (xP, yN - M2).
If the pixels at the positions of the reference pixel point (xN-M1, yP) and the reference pixel point (xP, yN-M2) are coded and reconstructed, the reconstruction values of the reference pixel point, namely recon (xN-M1, yP) and recon (xP, yN-M2), and the predicted value predP (xP, yP) of the target pixel point are weighted and calculated to obtain the updated predicted value predQ (xP, yP) of the target pixel point.
In the embodiments of the present application, the values of the weighting coefficients w1 to w7 and the values of M1 and M2 are not limited; the above description merely takes M1 = M2 = 1 as an example.
For example, the weighting coefficient sets (w1, w2), (w3, w4) or (w5, w6, w7) may use value combinations in which w1 + w2, w3 + w4 or w5 + w6 + w7 equals an integer power of 2, so as to reduce division operations. For example, the value combinations may be (6, 2), (5, 3), (4, 4), or (6, 1, 1), (5, 2, 1), etc., which is not limited in the embodiments of the present application and is merely an exemplary description.
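The case analysis above can be summarised in a short sketch; the example weight sets and the recon(x, y) accessor are illustrative assumptions, and integer arithmetic with rounding is used as in the formulas.

```python
def update_pred_sample(pred_p, recon, xP, yP, xN, yN, M1=1, M2=1,
                       w_left=(6, 2), w_top=(6, 2), w_both=(6, 1, 1)):
    """Case-wise weighted update of one predicted sample, a sketch of the first
    implementation above. `recon(x, y)` is an assumed accessor for reconstructed
    samples; the weight sets are example values whose sums are powers of two."""
    if xN > 0 and yN == 0:            # block at the upper edge: left neighbour only
        w1, w2 = w_left
        return (w1 * pred_p + w2 * recon(xN - M1, yP) + (w1 + w2) // 2) // (w1 + w2)
    if xN == 0 and yN > 0:            # block at the left edge: top neighbour only
        w3, w4 = w_top
        return (w3 * pred_p + w4 * recon(xP, yN - M2) + (w3 + w4) // 2) // (w3 + w4)
    if xN > 0 and yN > 0:             # interior block: both neighbours
        w5, w6, w7 = w_both
        s = w5 + w6 + w7
        return (w5 * pred_p + w6 * recon(xN - M1, yP)
                + w7 * recon(xP, yN - M2) + s // 2) // s
    return pred_p                     # no neighbouring reconstruction available
```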
In another possible implementation, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + w3 × recon(xP, yN - M2) + ((w1 + w2 + w3) / 2)) / (w1 + w2 + w3)
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper left pixel point in the image block to be processed are (xN, yN), predP (xP, yP) is a predicted value of the target pixel point before updating, predQ (xP, yP) is an updated predicted value of the target pixel point, recon (xN-M1, yP), recon (xP, yN-M2) are reconstructed values of reference pixel points located at coordinate positions (xN-M1, yP), (xP, yN-M2), w1, w2, w3 are preset constants, and M1, M2 are preset positive integers.
For example, the weighting coefficient set (w1, w2, w3) may use a value combination in which w1 + w2 + w3 equals an integer power of 2, so as to reduce division operations. For example, value combinations such as (6, 1, 1), (5, 2, 1), etc. may be adopted; this is not limited in the embodiments of the present application and is merely an exemplary description.
It should be noted that this implementation differs from the previous one in that it does not take into account whether reconstructed pixels exist to the left of and above the image block to be processed. If this implementation is used to update the predicted value of the target pixel point and the reconstructed value of a reference pixel point is unavailable, a new reference pixel point may be obtained by the method in steps S13061-S13062 described below, and the predicted value of the target pixel point is then updated according to the new reference pixel point.
In another possible implementation, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + w3 × recon(xN - M2, yP) + ((w1 + w2 + w3) / 2)) / (w1 + w2 + w3), when xN is greater than zero and yN is equal to zero;
predQ(xP, yP) = (w4 × predP(xP, yP) + w5 × recon(xP, yN - M3) + w6 × recon(xP, yN - M4) + ((w4 + w5 + w6) / 2)) / (w4 + w5 + w6), when xN is equal to zero and yN is greater than zero;
predQ(xP, yP) = (w7 × predP(xP, yP) + w8 × recon(xN - M1, yP) + w9 × recon(xN - M2, yP) + w10 × recon(xP, yN - M3) + w11 × recon(xP, yN - M4) + ((w7 + w8 + w9 + w10 + w11) / 2)) / (w7 + w8 + w9 + w10 + w11), when xN is greater than zero and yN is greater than zero;
where the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN - M1, yP), recon(xN - M2, yP), recon(xP, yN - M3) and recon(xP, yN - M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN - M1, yP), (xN - M2, yP), (xP, yN - M3) and (xP, yN - M4), w1 to w11 are preset constants, and M1, M2, M3 and M4 are preset positive integers.
The following method for calculating the updated predicted value of the target pixel point when the coordinates (xN, yN) of the upper-left pixel point in the image block to be processed are different is specifically described.
In the first case: if xN is greater than zero and yN is equal to zero, when the pixels at the reference pixel (xN-M1, yP) and (xN-M2, yP) positions have been encoded and reconstructed, the updated predicted value of the target pixel can be obtained by the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + w3 × recon(xN - M2, yP) + ((w1 + w2 + w3) / 2)) / (w1 + w2 + w3)
It should be understood that, unlike the first case of the first implementation, there are two reference pixel points in this implementation. If the to-be-processed image block is CU1, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU1) is (16, 0), and the coordinates (xP, yP) of the target pixel point in the to-be-processed image block are (18, 3). The reference pixel points are reconstructed pixel points to the left of the current CU with the same ordinate as the target pixel point, and can be denoted as (16 - M1, 3) and (16 - M2, 3), where M1 and M2 are the preset spatial positional relationships between the reference pixel points and the target pixel point. Taking M1 = 1 and M2 = 2 as an example, the reference pixel points of the target pixel point (18, 3) can be (15, 3) and (14, 3); similarly, the reference pixel points of the target pixel point (xP, yP) can be (xN - 1, yP) and (xN - 2, yP).
If the pixels at the positions of the reference pixel points (xN-1, yP) and (xN-2, yP) are coded and reconstructed, weighted calculation can be carried out on the reconstruction values of the reference pixel points, namely recon (xN-1, yP) and recon (xN-2, yP), and the predicted value predP (xP, yP) of the target pixel point, so that the updated predicted value predQ (xP, yP) of the target pixel point is obtained.
In the second case: if xN is equal to zero and yN is greater than zero, when the pixels at the reference pixel (xP, yN-M3) and (xP, yN-M4) positions have been encoded and reconstructed, the updated predicted value of the target pixel can be obtained by the following formula:
predQ(xP, yP) = (w4 × predP(xP, yP) + w5 × recon(xP, yN - M3) + w6 × recon(xP, yN - M4) + ((w4 + w5 + w6) / 2)) / (w4 + w5 + w6)
Illustratively, unlike the second case of the first implementation, there are two reference pixel points in this implementation. If the to-be-processed image block is CU2, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU2) is (0, 32), and the coordinates (xP, yP) of the target pixel point are (8, 35). The reference pixel points are reconstructed pixel points above the current CU with the same abscissa as the target pixel point, and can be denoted as (8, 32 - M3) and (8, 32 - M4), where M3 and M4 are the preset spatial positional relationships between the reference pixel points and the target pixel point. Taking M3 = 1 and M4 = 2 as an example, the reference pixel points of the target pixel point (8, 35) can be (8, 31) and (8, 30); similarly, the reference pixel points of the target pixel point (xP, yP) can be (xP, yN - 1) and (xP, yN - 2).
If the pixels at the reference pixel points (xP, yN-1) and (xP, yN-2) are coded and reconstructed, weighted calculation can be carried out on reconstructed values recon (xP, yN-1) and recon (xP, yN-2) of the reference pixel points and a predicted value predP (xP, yP) of a target pixel point, and an updated predicted value predQ (xP, yP) of the target pixel point is obtained.
In the third case: if xN is greater than zero and yN is greater than zero, the updated predicted value of the target pixel point when the pixels at the reference pixel point (xN-M1, yP), (xN-M2, yP), (xP, yN-M3) and (xP, yN-M4) locations have been encoded and reconstructed can be obtained by the following formula:
predQ(xP, yP) = (w7 × predP(xP, yP) + w8 × recon(xN - M1, yP) + w9 × recon(xN - M2, yP) + w10 × recon(xP, yN - M3) + w11 × recon(xP, yN - M4) + ((w7 + w8 + w9 + w10 + w11) / 2)) / (w7 + w8 + w9 + w10 + w11)
Illustratively, unlike the third case of the first implementation, there are two reference pixel points above and two to the left of the current CU. If the to-be-processed image block is CU3, the upper-left pixel point (xN, yN) of the to-be-processed image block (CU3) is (48, 32), and the coordinates (xP, yP) of the target pixel point are (56, 33). When the reference pixel points are reconstructed pixel points to the left of the current CU, their ordinates are the same as that of the target pixel point, and they can be (48 - M1, 33) and (48 - M2, 33); when the reference pixel points are reconstructed pixel points above the current CU, their abscissas are the same as that of the target pixel point, and they can be (56, 32 - M3) and (56, 32 - M4). Here M1, M2, M3 and M4 are the preset spatial positional relationships between the reference pixel points and the target pixel point. Taking M1 = M3 = 1 and M2 = M4 = 2 as an example, the reference pixel points of the target pixel point (56, 33) may be (47, 33), (46, 33), (56, 31) and (56, 30); similarly, the reference pixel points of the target pixel point (xP, yP) may be (xN - M1, yP), (xN - M2, yP), (xP, yN - M3) and (xP, yN - M4).
If the pixels at the positions of the reference pixel points (xN-M1, yP), (xN-M2, yP), (xP, yN-M3) and (xP, yN-M4) are coded and reconstructed, the reconstructed values of the reference pixel points, namely, recon (xN-M1, yP), recon (xN-M2, yP), recon (xP, yN-M3) and recon (xP, yN-M4) and the predicted values predP (xP, yP) of the target pixel points can be weighted and calculated, so that the updated predicted values predQ (xP, yP) of the target pixel points are obtained.
It should be noted that, in the embodiments of the present application, the values of the weighting coefficients w1 to w11 and the values of M1, M2, M3 and M4 are not limited; the above description merely takes M1 = M3 = 1 and M2 = M4 = 2 as an example. It can be understood that, in practical applications, the values of M1 and M3 may be the same as or different from each other, the values of M2 and M4 may be the same as or different from each other, and the values of M1 and M2, as well as M3 and M4, may differ.
For example, the weighting coefficient sets (w1, w2, w3), (w4, w5, w6) or (w7, w8, w9, w10, w11) may use value combinations in which w1 + w2 + w3, w4 + w5 + w6 or w7 + w8 + w9 + w10 + w11 equals an integer power of 2, so as to reduce division operations. For example, the value combinations may be (6, 1, 1), (5, 2, 1), or (3, 2, 1, 1, 1), etc., which is not limited in the embodiments of the present application and is merely an exemplary description.
In another embodiment, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP, yP) = (w1 × predP(xP, yP) + w2 × recon(xN - M1, yP) + w3 × recon(xN - M2, yP) + w4 × recon(xP, yN - M3) + w5 × recon(xP, yN - M4) + ((w1 + w2 + w3 + w4 + w5) / 2)) / (w1 + w2 + w3 + w4 + w5)
where the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN - M1, yP), recon(xN - M2, yP), recon(xP, yN - M3) and recon(xP, yN - M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN - M1, yP), (xN - M2, yP), (xP, yN - M3) and (xP, yN - M4), w1, w2, w3, w4 and w5 are preset constants, and M1, M2, M3 and M4 are preset positive integers.
For example, the weighting coefficient set (w1, w2, w3, w4, w5) may use a value combination in which w1 + w2 + w3 + w4 + w5 equals an integer power of 2, so as to reduce division operations. For example, a value combination such as (3, 2, 1, 1, 1) may be adopted; this is not limited in the embodiments of the present application and is merely an exemplary description. It should be noted that this implementation differs from the previous one in that it does not take into account whether reconstructed pixels exist to the left of and above the image block to be processed; if this implementation is used to update the predicted value of the target pixel point and the reconstructed value of a reference pixel point is unavailable, an available reference pixel point may be obtained by the method in steps S13061-S13062 described below, and the predicted value of the target pixel point is then updated according to the available reference pixel point.
In a possible implementation, the one or more reference pixels include one or more of the following pixels: reconstructed pixel points which have the same horizontal coordinate with the target pixel point and are adjacent to the upper edge of the image block to be processed; or, a reconstructed pixel point which has the same vertical coordinate with the target pixel point and is adjacent to the left edge of the image block to be processed; or, a reconstructed pixel point at the upper right corner of the image block to be processed; or, a reconstructed pixel point at the lower left corner of the image block to be processed; or, the reconstructed pixel point at the upper left corner of the image block to be processed.
In another possible implementation, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
the coordinates of the target pixel point are (xP, yP), predP (xP, yP) is a predicted value of the target pixel point before updating, predQ (xP, yP) is a predicted value of the target pixel point after updating, and w1 and w2 are preset constants.
Specifically, the method comprises the following steps:
the second predicted pixel value predP1(xP, yP) may be first derived from spatial neighboring pixel predictions using PLANAR mode in intra Prediction (PLANAR). It can be understood that the PLANAR mode uses two linear filters in the horizontal and vertical directions, and uses the average of the two as the prediction value of the current block pixel.
Illustratively, this second predicted pixel value predP1(xP, yP) may be obtained using the PLANAR mode:
predP1(xP, yP) = (predV(xP, yP) + predH(xP, yP) + nTbW × nTbH) >> (Log2(nTbW) + Log2(nTbH) + 1), where
predV(xP, yP) = ((nTbH - 1 - (yP - yN)) × recon(xP, yN - 1) + (yP - yN + 1) × recon(xN - 1, yN + nTbH)) << Log2(nTbW), and
predH(xP, yP) = ((nTbW - 1 - (xP - xN)) × recon(xN - 1, yP) + (xP - xN + 1) × recon(xN + nTbW, yN - 1)) << Log2(nTbH).
The coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), recon(xP, yN - 1), recon(xN - 1, yN + nTbH), recon(xN - 1, yP) and recon(xN + nTbW, yN - 1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, yN - 1), (xN - 1, yN + nTbH), (xN - 1, yP) and (xN + nTbW, yN - 1), and nTbW and nTbH are the width and height of the current CU (the image block to be processed).
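A sketch of this PLANAR-based update for a single sample is given below, following the formulas above; the recon(x, y) accessor and the example weights are assumptions, and Log2 is computed via bit_length on the assumption that nTbW and nTbH are powers of two.

```python
def planar_blend_sample(pred_p, recon, xP, yP, xN, yN, nTbW, nTbH, w1=1, w2=1):
    """Blend the inter prediction predP with a PLANAR-style second prediction predP1
    built from the row/column interpolation above. `recon(x, y)` is an assumed
    accessor for reconstructed samples; the weights w1, w2 are example values."""
    log2w, log2h = nTbW.bit_length() - 1, nTbH.bit_length() - 1   # assumes power-of-two sizes
    pred_v = ((nTbH - 1 - (yP - yN)) * recon(xP, yN - 1)
              + (yP - yN + 1) * recon(xN - 1, yN + nTbH)) << log2w
    pred_h = ((nTbW - 1 - (xP - xN)) * recon(xN - 1, yP)
              + (xP - xN + 1) * recon(xN + nTbW, yN - 1)) << log2h
    pred_p1 = (pred_v + pred_h + nTbW * nTbH) >> (log2w + log2h + 1)
    return (w1 * pred_p + w2 * pred_p1 + (w1 + w2) // 2) // (w1 + w2)
```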
In another possible implementation, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP) +w2*predV(xP,yP) +w3*predH(xP,yP)+((w1+w2+w3)/2))/(w1+w2+w3)
where
predV(xP, yP) = ((nTbH - 1 - yP) × p(xP, -1) + (yP + 1) × p(-1, nTbH) + nTbH/2) >> Log2(nTbH),
predH(xP, yP) = ((nTbW - 1 - xP) × p(-1, yP) + (xP + 1) × p(nTbW, -1) + nTbW/2) >> Log2(nTbW),
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0, 0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP, -1), p(-1, nTbH), p(-1, yP) and p(nTbW, -1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, -1), (-1, nTbH), (-1, yP) and (nTbW, -1), w1, w2 and w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In another possible implementation, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(((w1*predP(xP,yP))<<(Log2(nTbW)+Log2(nTbH)+1)) +w2*predV(xP,yP) +w3*predH(xP,yP) +(((w1+w2+w3)/2)<<(Log2(nTbW)+Log2(nTbH)+1))) /(((w1+w2+w3)<<(Log2(nTbW)+Log2(nTbH)+1)))
where
predV(xP, yP) = ((nTbH - 1 - yP) × p(xP, -1) + (yP + 1) × p(-1, nTbH)) << Log2(nTbW),
predH(xP, yP) = ((nTbW - 1 - xP) × p(-1, yP) + (xP + 1) × p(nTbW, -1)) << Log2(nTbH),
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0, 0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP, -1), p(-1, nTbH), p(-1, yP) and p(nTbW, -1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, -1), (-1, nTbH), (-1, yP) and (nTbW, -1), w1, w2 and w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
It should be noted that the PLANAR mode (PLANAR) algorithm used for generating the second predicted pixel value predP1(xP, yP) is not limited to the algorithm in VTM, and the PLANAR algorithm in HEVC and h.264 may also be used, which is not limited in the embodiment of the present application.
It should be noted that, in the embodiment of the present application, values of the weighting coefficients w1 and w2 are not limited, and for example, the weighting coefficient group (w1, w2) may adopt a combination of w1+ w2 equal to an integer power of 2 to reduce division operations. For example, the numerical combinations such as (6, 2), (5, 3), (4, 4), etc. may be adopted, and this is not limited in the embodiments of the present application and is only an exemplary description here.
In another possible implementation, the predicted value of the target pixel point may be updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP)+((w1+w2)/2))/(w1+w2)
where predP1(xP, yP) = (predV(xP, yP) + predH(xP, yP) + 1) >> 1,
predV(xP, yP) = ((nTbH - 1 - (yP - yN)) × recon(xP, yN - 1) + (yP - yN + 1) × recon(xN - 1, yN + nTbH) + (nTbH >> 1)) >> Log2(nTbH),
predH(xP, yP) = ((nTbW - 1 - (xP - xN)) × recon(xN - 1, yP) + (xP - xN + 1) × recon(xN + nTbW, yN - 1) + (nTbW >> 1)) >> Log2(nTbW),
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP, yN - 1), recon(xN - 1, yN + nTbH), recon(xN - 1, yP) and recon(xN + nTbW, yN - 1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, yN - 1), (xN - 1, yN + nTbH), (xN - 1, yP) and (xN + nTbW, yN - 1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed, respectively.
In another possible implementation, the inter-prediction block may be processed by using a Position-dependent intra prediction combination process (Position-dependent intra prediction combination process) in intra prediction, and the updated prediction value predQ (xP, yP) of the target pixel may be obtained by using a DC mode of the intra prediction combination process in VTM. It will be appreciated that, in the DC mode, the prediction value of the current block can be derived from the average value of the reference pixels to the left and above.
For example, the updated predicted value predQ (xP, yP) of the target pixel point may be obtained by the following formula:
predQ(xP, yP) = clip1Cmp((refL(xP, yP) × wL(xP) + refT(xP, yP) × wT(yP) - recon(xN - 1, yN - 1) × wTL(xP, yP) + (64 - wL(xP) - wT(yP) + wTL(xP, yP)) × predP(xP, yP) + 32) >> 6)
where
refL(xP, yP) = recon(xN - 1, yP), refT(xP, yP) = recon(xP, yN - 1),
wT(yP) = 32 >> ((yP << 1) >> nScale), wL(xP) = 32 >> ((xP << 1) >> nScale),
wTL(xP, yP) = ((wL(xP) >> 4) + (wT(yP) >> 4)),
nScale = ((Log2(nTbW) + Log2(nTbH) - 2) >> 2).
As shown in fig. 16, the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN - 1, yP), recon(xP, yN - 1) and recon(xN - 1, yN - 1) are the reconstructed values of the reference pixel points located at coordinate positions (xN - 1, yP), (xP, yN - 1) and (xN - 1, yN - 1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation.
It should be noted that the updated predicted value predQ (xP, yP) of the target pixel may use not only the intra prediction joint processing technique in the VTM, but also the algorithm in the JEM.
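For illustration, the DC-style position-dependent update above can be sketched as follows; clip1Cmp is modelled as a clip to the sample range, recon(x, y) is an assumed accessor, and the weights use (xP, yP) exactly as written in the formula above, whereas a practical implementation would typically use coordinates relative to the block.

```python
def pdpc_dc_update(pred_p, recon, xP, yP, xN, yN, nTbW, nTbH, bit_depth=10):
    """DC-style position-dependent update of one predicted sample (sketch of the
    formula above). `recon(x, y)` is an assumed accessor for reconstructed samples;
    clip1Cmp is modelled as a clip to [0, 2**bit_depth - 1]."""
    log2w, log2h = nTbW.bit_length() - 1, nTbH.bit_length() - 1   # assumes power-of-two sizes
    n_scale = (log2w + log2h - 2) >> 2
    ref_l = recon(xN - 1, yP)
    ref_t = recon(xP, yN - 1)
    ref_tl = recon(xN - 1, yN - 1)
    # Weights taken literally from the formula above (they use xP, yP directly).
    w_t = 32 >> ((yP << 1) >> n_scale)
    w_l = 32 >> ((xP << 1) >> n_scale)
    w_tl = (w_l >> 4) + (w_t >> 4)
    val = (ref_l * w_l + ref_t * w_t - ref_tl * w_tl
           + (64 - w_l - w_t + w_tl) * pred_p + 32) >> 6
    return max(0, min((1 << bit_depth) - 1, val))
```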
In a possible implementation, the inter-prediction block may be processed by using a location-based intra prediction joint processing technique in intra prediction, and the updated prediction value predQ (xP, yP) of the target pixel may be obtained by using a method of the planet mode of the intra prediction joint processing technique in VTM, as follows:
predQ(xP, yP) = clip1Cmp((refL(xP, yP) × wL(xP) + refT(xP, yP) × wT(yP) + (64 - wL(xP) - wT(yP)) × predP(xP, yP) + 32) >> 6)
where
refL(xP, yP) = recon(xN - 1, yP), refT(xP, yP) = recon(xP, yN - 1),
wT(yP) = 32 >> ((yP << 1) >> nScale), wL(xP) = 32 >> ((xP << 1) >> nScale),
nScale = ((Log2(nTbW) + Log2(nTbH) - 2) >> 2).
As shown in fig. 16, the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN - 1, yP) and recon(xP, yN - 1) are the reconstructed values of the reference pixel points located at coordinate positions (xN - 1, yP) and (xP, yN - 1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation.
It should be noted that the updated predicted value predQ (xP, yP) of the target pixel may use not only the algorithm in the VTM but also the algorithm in the JEM.
In a possible implementation manner, a boundary filtering technique in intra-frame prediction may be used to perform filtering processing on inter-frame prediction pixels, where the boundary filtering technique may be performed with reference to the HEVC method, and details are not described here.
It should be noted that, when the predicted value of the target pixel is updated according to any of the above manners, if the reconstructed value of the reference pixel is not available, the step S1306 may further include the following steps S13061-S13062.
And S13061, when the reconstructed values of the reference pixel points are unavailable, determining the availability of the pixel points adjacent to the upper edge and the left edge of the image block to be processed according to a preset sequence until obtaining a preset number of available reference pixel points.
It can be understood that the cases in which the reconstructed value of a reference pixel point is unavailable may include: when the to-be-processed image block is located at the upper edge of the picture, the reconstructed value of the reference pixel point at coordinate position (xP, yN-M) does not exist; when the to-be-processed image block is located at the left edge of the picture, the reconstructed value of the reference pixel point at coordinate position (xN-N, yP) does not exist; or the reconstructed value of the reference pixel point cannot be obtained, and so on.
In one implementation, as shown in fig. 17, the predetermined sequence may be a sequence from coordinates (xN-1, yN + nTbH-1) to coordinates (xN-1, yN-1), and then from coordinates (xN, yN-1) to coordinates (xN + nTbW-1, yN-1). For example, all the pixels can be traversed from the coordinate (xN-1, yN + nTbH-1) to the coordinate (xN-1, yN-1), and then from the coordinate (xN, yN-1) to the coordinate (xN + nTbW-1, yN-1), and the available reference pixels in the pixels adjacent to the upper edge and the left edge of the image block to be processed are found. It should be noted that, in the embodiment of the present application, a specific sequence of the preset sequence is not limited, and is only an exemplary description here.
Illustratively, suppose at least one of the reference pixel points is available. If the reconstructed value of the reference pixel point (xN-1, yN+nTbH-1) is unavailable, an available pixel point is searched for in the above preset order, from coordinate (xN-1, yN+nTbH-1) to coordinate (xN-1, yN-1) and then from coordinate (xN, yN-1) to coordinate (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if the available pixel point is (x, y), the reconstructed value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstructed value of the pixel point (x, y). If the reconstructed value of a reference pixel point (x, y) in the set (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 2 and less than or equal to nTbH+1, its reconstructed value is set to the reconstructed value of the pixel point (x, y+1). If the reconstructed value of a reference pixel point (x, y) in the set (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, its reconstructed value is set to the reconstructed value of the reference pixel point (x-1, y).
Exemplarily, if the reconstructed value of a reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than or equal to 1 and less than or equal to nTbH+1, an available reference pixel point can be searched for in the preset order starting from coordinate (xN-1, yN+nTbH-M); if the available reference pixel point is B, the reconstructed value of the reference pixel point (xN-1, yN+nTbH-M) can be set to the reconstructed value of the reference pixel point B. If the reconstructed value of the reference pixel point at coordinate (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, an available reference pixel point can be searched for in the preset order starting from coordinate (xN+N, yN-1); if the available reference pixel point is C, the reconstructed value of the reference pixel point (xN+N, yN-1) can be set to the reconstructed value of the reference pixel point C.
For example, if the reconstructed value of the reference pixel point (xN-1, yN+nTbH-3) is unavailable, the availability of the pixel points adjacent to the upper edge and the left edge of the image block to be processed may be determined in order from coordinate (xN-1, yN+nTbH-3) to coordinate (xN-1, yN-1) until a preset number of available reference pixel points are obtained, and the reconstructed value of the reference pixel point (xN-1, yN+nTbH-3) may use the reconstructed value of an available reference pixel point. If the reconstructed value of the reference pixel point at coordinate (xN+3, yN-1) is unavailable, an available pixel point is searched for from coordinate (xN+3, yN-1) to coordinate (xN+nTbW-1, yN-1), and the reconstructed value of the reference pixel point (xN+3, yN-1) may use the reconstructed value of the available reference pixel point.
Illustratively, if the reconstructed value of the reference pixel point (xN-1, yN+nTbH-1) is unavailable, an available pixel point is searched for in the above preset order, from coordinate (xN-1, yN+nTbH-1) to coordinate (xN-1, yN-1) and then from coordinate (xN, yN-1) to coordinate (xN+nTbW-1, yN-1); the search terminates as soon as an available pixel point is found, and if the available pixel point is (x, y), the reconstructed value of the reference pixel point (xN-1, yN+nTbH-1) is set to the reconstructed value of the pixel point (x, y). If the reconstructed value of a reference pixel point (xN-1, yN+nTbH-M) is unavailable, where M is greater than 1 and less than or equal to nTbH+1, an available reference pixel point can be searched for starting from coordinate (xN-1, yN+nTbH-M) in the order opposite to the preset order; if the available reference pixel point is C, the reconstructed value of the reference pixel point (xN-1, yN+nTbH-M) can be set to the reconstructed value of the reference pixel point C. If the reconstructed value of the reference pixel point at coordinate (xN+N, yN-1) is unavailable, where N is greater than or equal to 0 and less than or equal to nTbW-1, an available reference pixel point can be searched for starting from coordinate (xN+N, yN-1) in the order opposite to the preset order; if the available reference pixel point is D, the reconstructed value of the reference pixel point (xN+N, yN-1) can be set to the reconstructed value of the reference pixel point D.
For example, if the reconstructed value of the reference pixel point (xN-1, yN+nTbH-3) is unavailable, the availability of the pixel points adjacent to the upper edge and the left edge of the image block to be processed may be determined in order from coordinate (xN-1, yN+nTbH-3) to coordinate (xN-1, yN+nTbH-1) until a preset number of available reference pixel points are obtained, and the reconstructed value of the reference pixel point (xN-1, yN+nTbH-3) may use the reconstructed value of an available reference pixel point. If the reconstructed value of the reference pixel point at coordinate (xN+3, yN-1) is unavailable, an available pixel point is searched for from coordinate (xN+3, yN-1) to coordinate (xN, yN-1), and the reconstructed value of the reference pixel point (xN+3, yN-1) may use the reconstructed value of the available reference pixel point.
It should be noted that the new reference pixel may be a first available reference pixel searched according to the preset sequence, or may be any available reference pixel searched according to the preset sequence, which is not limited in this embodiment of the application.
It can be understood that, with this method, for an unavailable reference pixel, an available reference pixel among the pixels adjacent to the upper edge and the left edge of the image block to be processed can be searched for according to the preset order, and the reconstruction value of the available reference pixel is used as the reconstruction value of the unavailable reference pixel.
S13062, performing weighted calculation on the reconstruction value of the available reference pixel and the predicted value of the target pixel to update the predicted value of the target pixel.
Illustratively, the predicted value of the target pixel point may be updated according to any one of the above embodiments according to the reconstructed value of the new reference pixel point.
It should be noted that, if the reconstruction value of the reference pixel is not available and it is determined according to step S13061 that none of the pixels adjacent to the upper edge and the left edge of the image block to be processed is available, the reconstruction value of the reference pixel may be set to 1<<(bitDepth-1), where bitDepth is the bit depth of the sample value of the reference pixel. For example, when the image block to be processed is located at the upper-left corner of the picture and the coordinates of its upper-left corner are (0, 0), the pixels adjacent to the upper edge and the left edge of the image block to be processed are unavailable, and the reconstruction value of the reference pixel corresponding to the target pixel of the image block to be processed can be set to 1<<(bitDepth-1).
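For illustration only (not part of the patent text), the substitution rule above can be sketched in C as follows; is_available() and recon_value() are hypothetical helpers, the scan shown is one possible choice of the preset order (left column from bottom to top, then top row from left to right), and the mid-grey fallback 1<<(bitDepth-1) covers the case where no adjacent pixel is available:

#include <stdbool.h>

extern bool is_available(int x, int y);   /* hypothetical availability test           */
extern int  recon_value(int x, int y);    /* hypothetical reconstructed-sample access  */

/* Return a usable reconstruction value for the reference pixel (xRef, yRef) of a
 * block whose top-left pixel is (xN, yN) and whose size is nTbW x nTbH. */
int substitute_reference(int xN, int yN, int nTbW, int nTbH, int bitDepth,
                         int xRef, int yRef)
{
    if (is_available(xRef, yRef))
        return recon_value(xRef, yRef);
    if (xRef == xN - 1) {                               /* left-column reference pixel */
        for (int y = yRef; y >= yN - 1; y--)            /* walk up the left column     */
            if (is_available(xN - 1, y))
                return recon_value(xN - 1, y);
        for (int x = xN; x <= xN + nTbW - 1; x++)       /* then along the top row      */
            if (is_available(x, yN - 1))
                return recon_value(x, yN - 1);
    } else {                                            /* top-row reference pixel     */
        for (int x = xRef; x <= xN + nTbW - 1; x++)     /* walk to the right           */
            if (is_available(x, yN - 1))
                return recon_value(x, yN - 1);
    }
    /* Simplified fallback: nothing usable was found, use the mid-grey value. */
    return 1 << (bitDepth - 1);
}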
In the foregoing embodiments, spatial filtering is performed on the inter-predicted pixels in the process of generating them, so that coding efficiency is improved compared with the prior art.
In a possible implementation manner, step S1305 may be further included before step S1306.
S1305, (optionally), filtering the reference pixel.
It is understood that step S1305 may be performed by the filter unit 206 of the video decoder in fig. 3.
For example, the filtering the reference pixel point may include: when the reference pixel point is positioned above the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the left and right adjacent pixel points of the reference pixel point; when the reference pixel point is positioned at the left side of the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the upper and lower adjacent pixel points of the reference pixel point; and updating the reconstruction value of the reference pixel point by adopting the result of the weighted calculation.
It can be understood that, after the reference pixel is filtered in step S1305, when step S1306 is executed, the reconstructed value of the reference pixel and the predicted value of the target pixel, which are updated after the filtering, may be used to perform weighting calculation, so as to update the predicted value of the target pixel.
It should be noted that, the specific method for performing filtering processing on the reference pixel point may refer to the filtering method in step S1306, and details are not repeated here.
It can be understood that filtering the reference pixel updates its reconstruction value, and using the updated reconstruction value to filter the target pixel can further improve coding efficiency and reduce the prediction residual.
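As a sketch only: one way to realize the weighted calculation of step S1305 is a three-tap smoothing of the reference pixel with its two neighbours; the (1, 2, 1)/4 weights below are an assumption made for illustration, since the text only requires some weighted combination of the reference pixel and its adjacent pixels.

/* Smooth one reference pixel with its two neighbours along the reference row
 * (left/right neighbours) or the reference column (upper/lower neighbours).
 * line[] holds the reconstructed values of that row or column, idx is the
 * position of the reference pixel, count is the number of samples in line[]. */
int filter_reference(const int *line, int idx, int count)
{
    int a = line[idx > 0 ? idx - 1 : idx];          /* clamp at the ends of the line */
    int b = line[idx];
    int c = line[idx < count - 1 ? idx + 1 : idx];
    return (a + 2 * b + c + 2) >> 2;                /* rounded (1,2,1)/4 average     */
}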
In a possible implementation manner, step S1307 may be further included before step S1306 or after step S1306.
S1307, continuing inter-frame prediction by using inter-frame coding techniques other than the method described above, according to the motion information and (optionally) the code stream information.
It is understood that step S1307 may be performed by the interframe predictor 210 of the video decoder in fig. 3.
For example, techniques in HEVC or VTM may be used, including but not limited to the bi-directional optical flow method, the decoder-side motion vector refinement method, local illumination compensation (LIC), generalized weighted prediction (GBI), overlapped block motion compensation (OBMC), and decoder-side motion vector derivation (DMVD). The method in HEVC or VTM may be adopted, and other methods for generating a motion vector prediction candidate list may also be adopted, which is not limited in the embodiments of the present application.
It should be noted that, in the embodiment of the present application, the execution sequence of the above method steps S1301 to S1307 is not limited. For example, step S1305 may be executed before step S1307 or after step S1307, which is not limited in this embodiment of the application.
In a possible implementation manner, before the motion compensation is performed on the image block to be processed based on the motion information, the method may further include: initially updating the motion information through a first preset algorithm; correspondingly, the performing motion compensation on the image block to be processed based on the motion information includes: and performing motion compensation on the image blocks to be processed based on the initially updated motion information.
In another possible embodiment, after obtaining the prediction block of the to-be-processed image block, the method may further include: pre-updating the prediction block through a second preset algorithm; correspondingly, the performing weighted calculation on the reconstruction value of one or more reference pixel points and the prediction value of the target pixel point includes: and carrying out weighted calculation on the reconstruction values of the one or more reference pixel points and the pre-updated prediction value of the target pixel point.
In another possible implementation manner, after performing weighted calculation on the reconstruction value of the one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point, the method further includes: and updating the predicted value of the target pixel point through a second preset algorithm.
It should also be understood that, after the updated predicted value of the target pixel point is obtained, the method may further include: adding the final inter-prediction image and the residual image to obtain a reconstructed image of the current block. Specifically, if the current block has a residual, the residual information is added to the predicted image to obtain the reconstructed image of the current block; if the current block has no residual, the predicted image is the reconstructed image of the current block. The above process may adopt the same method as HEVC or VTM, and may also adopt other motion compensation and image reconstruction methods, without limitation.
According to the inter-frame prediction method provided in the embodiments of the present application, the motion information of the image block to be processed is obtained by parsing the code stream; motion compensation is performed on the image block to be processed based on the motion information to obtain a prediction block of the image block to be processed; and weighted calculation is performed on the reconstruction values of one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point, where the reference pixel points and the target pixel point have a preset spatial position relationship. After the predicted value of the target pixel point of the image block to be processed is obtained, the predicted value is filtered by using the neighbouring reconstructed pixels, so that coding compression efficiency can be improved, and the PSNR BD-rate can be improved by 0.5%.
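The decoder-side flow just summarized can be pictured with the following C sketch; every type and function name here (Bitstream, Block, parse_motion_info and so on) is a placeholder introduced for illustration and does not come from the patent text:

typedef struct Bitstream Bitstream;                     /* opaque code-stream handle    */
typedef struct Block Block;                             /* image block to be processed  */
typedef struct { int mv_x, mv_y, ref_idx; } MotionInfo;

extern MotionInfo parse_motion_info(Bitstream *bs, Block *blk);
extern void motion_compensate(Block *blk, const MotionInfo *mi);
extern int  update_flag_is_true(const Block *blk);
extern void update_with_neighbours(Block *blk);
extern void add_residual_and_clip(Block *blk);

void decode_inter_block(Bitstream *bs, Block *blk)
{
    MotionInfo mi = parse_motion_info(bs, blk);   /* parse the code stream                  */
    motion_compensate(blk, &mi);                  /* obtain the prediction block predP      */
    if (update_flag_is_true(blk))                 /* update-judgment identification         */
        update_with_neighbours(blk);              /* weighted spatial filtering (S1306)     */
    add_residual_and_clip(blk);                   /* add the residual, if any, and clamp    */
}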
The embodiment of the present application provides an inter-frame prediction apparatus, which may be a video decoder, and specifically, the inter-frame prediction apparatus is configured to perform the steps performed by the decoding apparatus in the above inter-frame prediction method. The inter-frame prediction apparatus provided in the embodiment of the present application may include modules corresponding to the respective steps.
In the embodiment of the present application, the inter-frame prediction apparatus may be divided into functional modules according to the above method, for example, each functional module may be divided according to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The division of the modules in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 18 is a schematic diagram showing a possible structure of the inter prediction apparatus according to the above embodiment, in a case where each functional module is divided according to each function. As shown in fig. 18, the inter prediction apparatus 1800 may include a parsing module 1801, a compensation module 1802, and a calculation module 1803. Specifically, the functions of the modules are as follows:
The parsing module 1801 is configured to parse the code stream to obtain motion information of the image block to be processed.
A compensation module 1802, configured to perform motion compensation on the to-be-processed image block based on the motion information to obtain a prediction block of the to-be-processed image block, where the prediction block of the to-be-processed image block includes a prediction value of a target pixel.
A calculating module 1803, configured to perform weighted calculation on the reconstructed values of one or more reference pixel points and the predicted value of the target pixel point to update the predicted value of the target pixel point, where the reference pixel point and the target pixel point have a preset spatial position relationship.
In a possible embodiment, the one or more reference pixels include reconstructed pixels having the same abscissa as the target pixel and a preset ordinate difference, or reconstructed pixels having the same ordinate as the target pixel and a preset abscissa difference.
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*recon(xN-M1,yP)+((w1+w2)/2))/(w1+w2), or
predQ(xP,yP)=(w3*predP(xP,yP)+w4*recon(xP,yN-M2)+((w3+w4)/2))/(w3+w4), or
predQ(xP,yP)=(w5*predP(xP,yP)+w6*recon(xN-M1,yP)+w7*recon(xP,yN-M2)+((w5+w6+w7)/2))/(w5+w6+w7)
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xP, yN-M2) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xP, yN-M2), w1, w2, w3, w4, w5, w6 and w7 are preset constants, and M1 and M2 are preset positive integers.
In one possible embodiment, w1+w2=R, or w3+w4=R, or w5+w6+w7=R, where R is the n-th power of 2 and n is a non-negative integer.
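A minimal C sketch of this possible implementation, for the branch that uses both a left reference pixel and an above reference pixel, is given below; recon() is a hypothetical accessor for reconstructed samples:

extern int recon(int x, int y);   /* hypothetical accessor to reconstructed pixels */

/* Update predP(xP, yP) with recon(xN-M1, yP) and recon(xP, yN-M2). */
int update_left_and_top(int predP, int xP, int yP, int xN, int yN,
                        int M1, int M2, int w5, int w6, int w7)
{
    int left = recon(xN - M1, yP);
    int top  = recon(xP, yN - M2);
    int sum  = w5 * predP + w6 * left + w7 * top;
    /* Since w5+w6+w7 = R = 2^n, the division could equally be a right shift by n. */
    return (sum + ((w5 + w6 + w7) / 2)) / (w5 + w6 + w7);
}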
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*recon(xN-M1,yP)+w3*recon(xN-M2,yP)+((w1+w2+w3)/2))/(w1+w2+w3), or
predQ(xP,yP)=(w4*predP(xP,yP)+w5*recon(xP,yN-M3)+w6*recon(xP,yN-M4)+((w4+w5+w6)/2))/(w4+w5+w6), or
predQ(xP,yP)=(w7*predP(xP,yP)+w8*recon(xN-M1,yP)+w9*recon(xN-M2,yP)+w10*recon(xP,yN-M3)+w11*recon(xP,yN-M4)+((w7+w8+w9+w10+w11)/2))/(w7+w8+w9+w10+w11)
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4, w5, w6, w7, w8, w9, w10 and w11 are preset constants, and M1, M2, M3 and M4 are preset positive integers.
In one possible embodiment, w1+w2+w3=S, or w4+w5+w6=S, or w7+w8+w9+w10+w11=S, where S is the n-th power of 2 and n is a non-negative integer.
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*recon(xN-M1,yP)+w3*recon(xP,yN-M2)+((w1+w2+w3)/2))/(w1+w2+w3)
The coordinates of the target pixel point are (xP, yP), the coordinates of an upper left pixel point in the image block to be processed are (xN, yN), predP (xP, yP) is a predicted value before updating of the target pixel point, predQ (xP, yP) is an updated predicted value of the target pixel point, recon (xN-M1, yP), recon (xP, yN-M2) are reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xP, yN-M2), w1, w2, w3 are preset constants, and M1, M2 are preset positive integers.
In one possible embodiment, w1+w2+w3=R, where R is the n-th power of 2 and n is a non-negative integer.
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*recon(xN-M1,yP)+w3*recon(xN-M2,yP)+w4*recon(xP,yN-M3)+w5*recon(xP,yN-M4)+((w1+w2+w3+w4+w5)/2))/(w1+w2+w3+w4+w5)
the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-M1, yP), recon(xN-M2, yP), recon(xP, yN-M3), recon(xP, yN-M4) are the reconstructed values of the reference pixel points located at coordinate positions (xN-M1, yP), (xN-M2, yP), (xP, yN-M3), (xP, yN-M4), w1, w2, w3, w4 and w5 are preset constants, and M1, M2, M3 and M4 are preset positive integers.
In one possible embodiment, w1+w2+w3+w4+w5=S, where S is the n-th power of 2 and n is a non-negative integer.
In one possible embodiment, the one or more reference pixels include one or more of the following pixels: reconstructed pixel points which have the same horizontal coordinates with the target pixel points and are adjacent to the upper edge of the image block to be processed; or, a reconstructed pixel point which has the same vertical coordinate with the target pixel point and is adjacent to the left edge of the image block to be processed; or, the reconstructed pixel point at the upper right corner of the image block to be processed; or, the reconstructed pixel point at the lower left corner of the image block to be processed; or, the reconstructed pixel point at the upper left corner of the image block to be processed.
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
wherein,
predP1(xP,yP)=(predV(xP,yP)+predH(xP,yP)+nTbW*nTbH)>>(Log2(nTbW)+Log2(nTbH)+ 1),predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0, 0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP, -1), p(-1, nTbH), p(-1, yP), p(nTbW, -1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, -1), (-1, nTbH), (-1, yP), (nTbW, -1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
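For illustration, the formulas of this possible implementation can be written out in C as below; p() is a hypothetical accessor for the reference samples (block top-left at (0, 0)), and ilog2() stands in for Log2 of the power-of-two block dimensions:

extern int p(int x, int y);                    /* hypothetical reference-sample accessor */

static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

int planar_blend(int predP, int xP, int yP, int nTbW, int nTbH, int w1, int w2)
{
    int predV  = ((nTbH - 1 - yP) * p(xP, -1) + (yP + 1) * p(-1, nTbH)) << ilog2(nTbW);
    int predH  = ((nTbW - 1 - xP) * p(-1, yP) + (xP + 1) * p(nTbW, -1)) << ilog2(nTbH);
    int predP1 = (predV + predH + nTbW * nTbH) >> (ilog2(nTbW) + ilog2(nTbH) + 1);
    /* Blend the inter prediction predP with the PLANAR-style prediction predP1. */
    return (w1 * predP + w2 * predP1 + ((w1 + w2) / 2)) / (w1 + w2);
}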
In a possible implementation, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP) +w2*predV(xP,yP) +w3*predH(xP,yP)+((w1+w2+w3)/2))/(w1+w2+w3)
wherein,
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH)+nTbH/2)>>Log2(nTbH),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1)+nTbW/2)>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0, 0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP, -1), p(-1, nTbH), p(-1, yP), p(nTbW, -1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, -1), (-1, nTbH), (-1, yP), (nTbW, -1), w1, w2 and w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In a possible implementation, the predicted value of the target pixel point is updated according to the following formula:
predQ(xP,yP)=(((w1*predP(xP,yP))<<(Log2(nTbW)+Log2(nTbH)+1)) +w2*predV(xP,yP) +w3*predH(xP,yP) +(((w1+w2+w3)/2)<<(Log2(nTbW)+Log2(nTbH)+1))) /(((w1+w2+w3)<<(Log2(nTbW)+Log2(nTbH)+1)))
wherein,
predV(xP,yP)=((nTbH-1-yP)*p(xP,-1)+(yP+1)*p(-1,nTbH))<<Log2(nTbW),
predH(xP,yP)=((nTbW-1-xP)*p(-1,yP)+(xP+1)*p(nTbW,-1))<<Log2(nTbH), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (0, 0), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, p(xP, -1), p(-1, nTbH), p(-1, yP), p(nTbW, -1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, -1), (-1, nTbH), (-1, yP), (nTbW, -1), w1, w2 and w3 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed.
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=(w1*predP(xP,yP)+w2*predP1(xP,yP) +((w1+w2)/2))/(w1+w2)
where predP1(xP, yP)=(predV(xP, yP)+predH(xP, yP)+1)>>1,
predV(xP,yP)=((nTbH-1-(yP-yN))*recon(xP,yN-1)+(yP-yN+1)*recon(xN-1,yN+nTbH)+(nTbH>>1))>>Log2(nTbH), predH(xP,yP)=((nTbW-1-(xP-xN))*recon(xN-1,yP)+(xP-xN+1)*recon(xN+nTbW,yN-1)+(nTbW>>1))>>Log2(nTbW), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP, yN-1), recon(xN-1, yN+nTbH), recon(xN-1, yP), recon(xN+nTbW, yN-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, yN-1), (xN-1, yN+nTbH), (xN-1, yP), (xN+nTbW, yN-1), w1 and w2 are preset constants, and nTbW and nTbH are the width and height of the image block to be processed, respectively.
In one possible embodiment, the sum of w1 and w2 is the n-th power of 2, where n is a non-negative integer.
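For illustration, the variant above can be sketched in C as follows; recon() and ilog2() are the same kind of hypothetical helpers as in the earlier sketches:

extern int recon(int x, int y);                /* hypothetical accessor to reconstructed pixels */

static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

int planar_blend_with_origin(int predP, int xP, int yP, int xN, int yN,
                             int nTbW, int nTbH, int w1, int w2)
{
    int predV = ((nTbH - 1 - (yP - yN)) * recon(xP, yN - 1)
               + (yP - yN + 1) * recon(xN - 1, yN + nTbH)
               + (nTbH >> 1)) >> ilog2(nTbH);
    int predH = ((nTbW - 1 - (xP - xN)) * recon(xN - 1, yP)
               + (xP - xN + 1) * recon(xN + nTbW, yN - 1)
               + (nTbW >> 1)) >> ilog2(nTbW);
    int predP1 = (predV + predH + 1) >> 1;     /* second prediction from the four boundary neighbours */
    return (w1 * predP + w2 * predP1 + ((w1 + w2) / 2)) / (w1 + w2);
}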
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=clip1Cmp((refL(xP,yP)*wL(xP)+refT(xP,yP)*wT(yP)-recon(xN-1,yN-1)*wTL(xP,yP)+(64-wL(xP)-wT(yP)+wTL(xP,yP))*predP(xP,yP)+32)>>6)
wherein,
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), wTL(xP,yP)=((wL(xP)>>4)+(wT(yP)>>4)), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xP, yN-1), recon(xN-1, yP), recon(xN-1, yN-1) are the reconstructed values of the reference pixel points located at coordinate positions (xP, yN-1), (xN-1, yP), (xN-1, yN-1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation.
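As a sketch only, this position-dependent weighting can be written in C as below; recon() is a hypothetical accessor for reconstructed samples and clip1() stands for the clip1Cmp clamping to the valid sample range:

extern int recon(int x, int y);                /* hypothetical accessor to reconstructed pixels */

static int ilog2(int v) { int n = 0; while (v > 1) { v >>= 1; n++; } return n; }

static int clip1(int v, int bitDepth)          /* clamp to [0, 2^bitDepth - 1] */
{
    int maxVal = (1 << bitDepth) - 1;
    return v < 0 ? 0 : (v > maxVal ? maxVal : v);
}

int weighted_update_with_corner(int predP, int xP, int yP, int xN, int yN,
                                int nTbW, int nTbH, int bitDepth)
{
    int nScale = (ilog2(nTbW) + ilog2(nTbH) - 2) >> 2;
    int refL = recon(xN - 1, yP);              /* left neighbour of the target pixel   */
    int refT = recon(xP, yN - 1);              /* above neighbour of the target pixel  */
    int wT  = 32 >> ((yP << 1) >> nScale);     /* weights as defined in the text above */
    int wL  = 32 >> ((xP << 1) >> nScale);
    int wTL = (wL >> 4) + (wT >> 4);
    int v = (refL * wL + refT * wT - recon(xN - 1, yN - 1) * wTL
           + (64 - wL - wT + wTL) * predP + 32) >> 6;
    return clip1(v, bitDepth);
}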
In a possible implementation manner, the calculating module 1803 updates the predicted value of the target pixel point according to the following formula:
predQ(xP,yP)=clip1Cmp((refL(xP,yP)*wL(xP)+refT(xP,yP)*wT(yP)+(64-wL(xP)-wT(yP))*predP(xP,yP)+32)>>6)
wherein,
refL(xP,yP)=recon(xN-1,yP), refT(xP,yP)=recon(xP,yN-1), wT(yP)=32>>((yP<<1)>>nScale), wL(xP)=32>>((xP<<1)>>nScale), nScale=((Log2(nTbW)+Log2(nTbH)-2)>>2), the coordinates of the target pixel point are (xP, yP), the coordinates of the upper-left pixel point in the image block to be processed are (xN, yN), predP(xP, yP) is the predicted value of the target pixel point before updating, predQ(xP, yP) is the updated predicted value of the target pixel point, recon(xN-1, yP), recon(xP, yN-1) are the reconstructed values of the reference pixel points located at coordinate positions (xN-1, yP), (xP, yN-1), nTbW and nTbH are the width and height of the image block to be processed, and clip1Cmp is a clamping operation.
In a feasible implementation manner, the calculating module 1803 is further configured to, when the reconstruction value of the reference pixel is unavailable, determine, according to a preset sequence, the availability of pixels adjacent to the upper edge and the left edge of the image block to be processed until a preset number of available reference pixels are obtained; and carrying out weighted calculation on the reconstruction value of the available reference pixel point and the predicted value of the target pixel point.
In a possible implementation, the calculating module 1803 is further configured to: when the reference pixel point is positioned above the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the left and right adjacent pixel points of the reference pixel point; when the reference pixel point is positioned at the left side of the image block to be processed, carrying out weighted calculation on the reconstruction value of the reference pixel point and the reconstruction values of the upper and lower adjacent pixel points of the reference pixel point; and updating the reconstruction value of the reference pixel point by adopting the result of the weighted calculation.
In a possible implementation, the calculating module 1803 is further configured to: performing initial updating on the motion information through a first preset algorithm; correspondingly, the compensation module 1802 is specifically configured to: and performing motion compensation on the image block to be processed based on the initially updated motion information.
In a possible implementation, the calculating module 1803 is further configured to: pre-updating the prediction block through a second preset algorithm; correspondingly, the calculating module 1803 is specifically configured to: and carrying out weighted calculation on the reconstruction values of the one or more reference pixel points and the pre-updated prediction value of the target pixel point.
In a possible implementation, the calculating module 1803 is further configured to: and updating the predicted value of the target pixel point through a second preset algorithm.
In a possible implementation manner, the parsing module 1801 is further configured to: analyzing the code stream to obtain a prediction mode of the image block to be processed; determining the prediction mode as a merge mode (merge) and/or an inter advanced motion vector prediction mode (inter AMVP).
In a possible implementation manner, the parsing module 1801 is further configured to: analyzing the code stream to obtain the updating judgment identification information of the image block to be processed; and determining that the updating judgment identification information indicates to update the prediction block of the image block to be processed.
In a possible implementation, the calculating module 1803 is further configured to: acquiring preset updating judgment identification information of the image block to be processed; and determining that the updating judgment identification information indicates to update the prediction block of the image block to be processed.
Fig. 19 is a schematic block diagram of an inter-frame prediction apparatus 1900 according to an embodiment of the present application. Specifically, the apparatus includes: a processor 1901 and a memory 1902 coupled to the processor; the processor 1901 is configured to execute the embodiment shown in fig. 13 and its various possible implementations.
The processor 1901 may be a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination of devices implementing computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
For all relevant content of each scenario related to the above method embodiment, reference may be made to the functional description of the corresponding functional module; details are not described herein again.
Although particular aspects of the present application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of the present application may be applied with many other video encoding and/or decoding units, processors, processing units, hardware-based encoding units such as encoders/decoders (CODECs), and the like. Moreover, it should be understood that the steps shown and described with respect to fig. 13 are provided as only possible implementations. That is, the steps shown in the possible implementation of fig. 13 need not necessarily be performed in the order shown in fig. 13, and fewer, additional, or alternative steps may be performed.
Moreover, it is to be understood that certain actions or events of any of the methods described herein can be performed in a different sequence, added, combined, or omitted altogether (e.g., not all described actions or events are necessary for the practice of the methods), depending on the possible implementations. Further, in certain possible implementations, acts or events may be performed concurrently, e.g., via multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially. Additionally, although specific aspects of the disclosure are described as being performed by a single module or unit for purposes of clarity, it should be understood that the techniques of this disclosure may be performed by a combination of units or modules associated with a video decoder.
In one or more possible implementations, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol.
In this manner, the computer-readable medium illustratively may correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
Such computer-readable storage media may include, as a possible implementation and not limitation, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that may be used to store desired code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, as used herein, the term "processor" may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Likewise, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described previously, the various units may be combined in a codec hardware unit or provided by an interoperative hardware unit (including one or more processors as described previously) in conjunction with a collection of suitable software and/or firmware.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An inter-frame prediction method, comprising:
obtaining a first predicted value of a target pixel point of an image block to be processed through interframe prediction;
when the update judgment identification of the image block to be processed is true, obtaining a second predicted value of the target pixel point from the prediction of the spatially adjacent pixels through a PLANAR mode (PLANAR) in intra-frame prediction;
performing weighted calculation on the first predicted value and the second predicted value to obtain an updated predicted value of the target pixel point, wherein a weighting coefficient of the first predicted value is different from a weighting coefficient of the second predicted value;
obtaining an updated predicted value predQ (xP, yP) of the target pixel point according to w1 × predP (xP, yP) and w2 × predP1(xP, yP);
wherein (xP, yP) is a coordinate of the target pixel point, predP (xP, yP) is the first predicted value, predP1(xP, yP) is the second predicted value, w1 is a weighting coefficient of the first predicted value, w2 is a weighting coefficient of the second predicted value, w1 and w2 are preset constants, and w1 and w2 are not equal.
2. The method of claim 1, wherein obtaining the first prediction value of the target pixel point through inter prediction comprises:
analyzing the code stream to obtain motion information of the image block to be processed;
and performing motion compensation on the image block to be processed based on the motion information to obtain a prediction block of the image block to be processed, wherein the prediction block of the image block to be processed comprises a first prediction value of the target pixel point.
3. The method according to any of claims 1-2, wherein the set of weighting factors (w1, w2) is (6, 2) or (5, 3).
4. The method of any one of claims 1-2, wherein the sum of w1 and w2 is the nth power of 2, wherein n is a non-negative integer.
5. An inter-frame prediction apparatus, comprising:
the analysis module is used for analyzing the code stream to obtain the motion information of the image block to be processed;
the compensation module is used for performing motion compensation on the image block to be processed based on the motion information so as to obtain a prediction block of the image block to be processed, wherein the prediction block of the image block to be processed comprises a prediction value of a target pixel point;
the calculation module is used for carrying out weighted calculation on the reconstruction values of one or more reference pixel points and the predicted value of the target pixel point so as to update the predicted value of the target pixel point, wherein the reference pixel points and the target pixel point have a preset spatial position relationship, and when the update judgment identification of the image block to be processed is true, a PLANAR mode (PLANAR) in intra-frame prediction is used for obtaining the reconstruction values of the one or more reference pixel points from the reference pixel point prediction;
Wherein the updated predicted value predQ (xP, yP) is obtained from w1 × predP (xP, yP) and w2 × predP1(xP, yP);
(xP, yP) are coordinates of the target pixel point, predP (xP, yP) is a predicted value of the target pixel point, predP1(xP, yP) is a reconstruction value of the one or more reference pixel points, w1 is a weighting coefficient of the reconstruction value of the one or more reference pixel points, w2 is a weighting coefficient of the predicted value of the target pixel point, w1 and w2 are preset constants, and w1 is not equal to w 2.
6. The apparatus of claim 5, wherein the one or more reference pixels comprise one or more of:
reconstructed pixel points which have the same abscissa as the target pixel point and have a preset ordinate difference; or
reconstructed pixel points which have the same ordinate as the target pixel point and have a preset abscissa difference; or
reconstructed pixel points which have the same abscissa as the target pixel point and are adjacent to the upper edge of the image block to be processed; or
reconstructed pixel points which have the same ordinate as the target pixel point and are adjacent to the left edge of the image block to be processed; or
the reconstructed pixel point at the upper right corner of the image block to be processed; or
the reconstructed pixel point at the lower left corner of the image block to be processed; or
the reconstructed pixel point at the upper left corner of the image block to be processed.
7. The apparatus of claim 5,
the calculation module is further configured to determine, according to a preset sequence, the availability of pixels adjacent to the upper edge and the left edge of the image block to be processed until a preset number of available reference pixels are obtained, when the reconstruction value of the reference pixel is unavailable; and carrying out weighted calculation on the reconstruction value of the available reference pixel point and the predicted value of the target pixel point.
8. The apparatus of claim 5, wherein the computing module is further configured to: performing initial updating on the motion information through a first preset algorithm;
correspondingly, the compensation module is specifically configured to:
and performing motion compensation on the image block to be processed based on the initially updated motion information.
9. The apparatus of claim 5, wherein the computing module is further configured to: pre-updating the prediction block through a second preset algorithm;
Correspondingly, the calculation module is specifically configured to:
and carrying out weighted calculation on the reconstruction values of the one or more reference pixel points and the pre-updated prediction value of the target pixel point.
10. The apparatus of claim 5, wherein the parsing module is further configured to:
analyzing the code stream to obtain a prediction mode of the image block to be processed;
determining the prediction mode as a merge mode (merge) and/or an inter advanced motion vector prediction mode (inter AMVP).
11. The apparatus according to claim 5, wherein the set of weighting coefficients (w1, w2) is (6, 2) or (5, 3).
12. The apparatus of claim 5, wherein the sum of w1 and w2 is the n-th power of 2, where n is 2 or 3.
13. A computer storage medium having computer program code stored therein, which when run on a processor causes the processor to perform the prediction method according to any one of claims 1-4.
CN202010846274.8A 2018-09-21 2019-09-20 Inter-frame prediction method, device and storage medium Active CN112437299B (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN2018111099502 2018-09-21
CN201811109950 2018-09-21
PCT/CN2018/109233 WO2020056798A1 (en) 2018-09-21 2018-10-01 Method and device for video encoding and decoding
CNPCT/CN2018/109233 2018-10-01
CN2018113037549 2018-11-02
CN201811303754.9A CN110944172B (en) 2018-09-21 2018-11-02 Inter-frame prediction method and device
CN201980011364.0A CN112655218B (en) 2018-09-21 2019-09-20 Inter-frame prediction method and device
PCT/CN2019/107060 WO2020057648A1 (en) 2018-09-21 2019-09-20 Inter-frame prediction method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980011364.0A Division CN112655218B (en) 2018-09-21 2019-09-20 Inter-frame prediction method and device

Publications (2)

Publication Number Publication Date
CN112437299A CN112437299A (en) 2021-03-02
CN112437299B true CN112437299B (en) 2022-03-29

Family

ID=74717048

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010846274.8A Active CN112437299B (en) 2018-09-21 2019-09-20 Inter-frame prediction method, device and storage medium
CN202210435828.4A Pending CN115695782A (en) 2018-09-21 2019-09-20 Inter-frame prediction method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210435828.4A Pending CN115695782A (en) 2018-09-21 2019-09-20 Inter-frame prediction method and device

Country Status (1)

Country Link
CN (2) CN112437299B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134600B (en) * 2022-09-01 2022-12-20 浙江大华技术股份有限公司 Encoding method, encoder, and computer-readable storage medium
CN114900691B (en) * 2022-07-14 2022-10-28 浙江大华技术股份有限公司 Encoding method, encoder, and computer-readable storage medium
CN116095322B (en) * 2023-04-10 2023-07-25 深圳传音控股股份有限公司 Image processing method, processing apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877785A (en) * 2009-04-29 2010-11-03 祝志怡 Hybrid predicting-based video encoding method
CN102238391A (en) * 2011-05-25 2011-11-09 深圳市融创天下科技股份有限公司 Predictive coding method and device
CN102595124A (en) * 2011-01-14 2012-07-18 华为技术有限公司 Image coding and decoding method and method for processing image data and equipment thereof
CN104104961A (en) * 2013-04-10 2014-10-15 华为技术有限公司 Video encoding method, decoding method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180041211A (en) * 2015-09-10 2018-04-23 엘지전자 주식회사 Image processing method based on inter-intra merging prediction mode and apparatus therefor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877785A (en) * 2009-04-29 2010-11-03 祝志怡 Hybrid predicting-based video encoding method
CN102595124A (en) * 2011-01-14 2012-07-18 华为技术有限公司 Image coding and decoding method and method for processing image data and equipment thereof
CN102238391A (en) * 2011-05-25 2011-11-09 深圳市融创天下科技股份有限公司 Predictive coding method and device
CN104104961A (en) * 2013-04-10 2014-10-15 华为技术有限公司 Video encoding method, decoding method and apparatus

Also Published As

Publication number Publication date
CN112437299A (en) 2021-03-02
CN115695782A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN112655218B (en) Inter-frame prediction method and device
US11297340B2 (en) Low-complexity design for FRUC
CN110431845B (en) Constraining motion vector information derived by decoder-side motion vector derivation
US10701366B2 (en) Deriving motion vector information at a video decoder
CN110915214B (en) Template matching based on partial reconstruction for motion vector derivation
US10979732B2 (en) Adaptive motion vector precision for video coding
CN109155855B (en) Affine motion prediction method, device and storage medium for video coding
CN107409225B (en) Motion information derivation mode determination in video coding
CN112437299B (en) Inter-frame prediction method, device and storage medium
RU2785725C2 (en) Device and method for external prediction
WO2020057648A1 (en) Inter-frame prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant