WO2019062544A1 - Inter-frame prediction method and apparatus for video images, and codec - Google Patents


Info

Publication number
WO2019062544A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
motion information
current
image block
current image
Prior art date
Application number
PCT/CN2018/105148
Other languages
English (en)
French (fr)
Inventor
张娜
郑建铧
安基程
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP18863727.6A (EP3672249B1)
Publication of WO2019062544A1
Priority to US16/832,707 (US11252436B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a block, e.g. a macroblock
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • The present application relates to the field of video coding and decoding technologies, and in particular to an inter-frame prediction method and apparatus for video images, and to a corresponding encoder and decoder.
  • Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques, for example, those defined by the MPEG-2, MPEG-4, ITU-T H.263, and ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC) standards, the H.265/High Efficiency Video Coding (HEVC) standard, and the extensions of such standards.
  • Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
  • Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
  • A video slice (i.e., a video frame or a portion of a video frame) may be partitioned into image blocks.
  • The image blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
  • An image block in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image or temporal prediction with respect to reference samples in other reference images.
  • An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
  • Various video coding standards, including the High Efficiency Video Coding (HEVC) standard, propose predictive coding modes for image blocks, that is, a block to be coded is predicted based on already encoded blocks of video data.
  • In the intra prediction mode, the current block is predicted based on one or more previously decoded neighboring blocks in the same picture as the current block; in the inter prediction mode, the current block is predicted based on already decoded blocks in different pictures.
  • An embodiment of the present invention provides an inter-frame prediction method and apparatus for a video image, and a corresponding encoder and decoder, which improve the prediction accuracy of the motion information of an image block to a certain extent, thereby improving encoding and decoding performance.
  • In a first aspect, an embodiment of the present application provides an inter prediction method for a video image, including: determining an inter prediction mode for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes a plurality of inter prediction modes for non-directional motion fields and/or a plurality of inter prediction modes for directional motion fields; and performing inter prediction on the current image block based on the determined inter prediction mode.
  • In some possible implementations, performing inter prediction on the current image block based on the determined inter prediction mode includes: predicting motion information of one or more sub-blocks in the current image block based on the determined inter prediction mode, and performing inter prediction on the current image block by using the predicted motion information of the one or more sub-blocks.
  • It should be noted that the candidate inter prediction mode set herein may contain one mode or multiple modes.
  • When the candidate inter prediction mode set contains a single mode (e.g., a first inter prediction mode for a non-directional motion field, also referred to as a planar mode for inter prediction), that mode may be determined as the inter prediction mode for performing inter prediction on the current image block.
  • When the candidate inter prediction mode set contains multiple modes, the mode with the highest priority or the foremost position in the set may be determined by default as the inter prediction mode for performing inter prediction on the current image block; or the mode indicated by a second identifier may be determined as that inter prediction mode; or the first inter prediction mode for the non-directional motion field may be determined as that inter prediction mode.
  • The plurality of inter prediction modes for non-directional motion fields mentioned in the embodiments of the present application may include, for example, a first inter prediction mode for a non-directional motion field (also referred to as a planar mode for inter prediction, or an interpolation-based inter prediction mode) and a second inter prediction mode for a non-directional motion field (also referred to as a DC mode for inter prediction).
  • the plurality of inter prediction modes for the directional motion field mentioned in the embodiments of the present application may include, for example, various directional prediction modes for inter prediction.
  • In the embodiments of the present application, the new inter prediction modes are divided, according to the characteristics of the motion field, into inter prediction modes for non-directional motion fields and/or inter prediction modes for directional motion fields. Regardless of which directional or non-directional inter prediction mode is used, the motion information (e.g., the motion vector) of one or more sub-blocks in the current image block can be predicted, so that the predicted motion vector of the current image block is obtained from these results.
  • In this way, the prediction accuracy of the motion vector is improved, no motion vector difference (MVD) needs to be transmitted during encoding, the code rate is saved for the same video quality, and the coding and decoding performance is further improved.
  • In some possible implementations, the determined inter prediction mode is the first inter prediction mode for a non-directional motion field (also referred to as a planar mode for inter prediction). Correspondingly, predicting the motion vector of the current sub-block in the current image block includes:
  • Determining a weighted value of first motion information of the left-side spatial neighboring block of the current image block on the same row as the current sub-block and third motion information of the right-side spatial neighboring block of the current image block on the same row as the current sub-block as a first predicted value of the motion information of the current sub-block, where the ratio between the weighting factor of the third motion information and the weighting factor of the first motion information is determined based on the ratio between a first distance, between the right-side spatial neighboring block and the current sub-block, and a second distance, between the current sub-block and the left-side spatial neighboring block;
  • Determining a weighted value of second motion information of the upper-side spatial neighboring block of the current image block on the same column as the current sub-block and fourth motion information of the lower-side spatial neighboring block of the current image block on the same column as the current sub-block as a second predicted value of the motion information of the current sub-block, where the ratio between the weighting factor of the fourth motion information and the weighting factor of the second motion information is determined based on the ratio between a third distance, between the lower-side spatial neighboring block and the current sub-block, and a fourth distance, between the current sub-block and the upper-side spatial neighboring block; and
  • Determining the motion vector of the current sub-block by using the first predicted value and the second predicted value of the motion vector of the current sub-block; in a possible implementation, for example, the first predicted value and the second predicted value of the motion vector of the current sub-block are weighted to obtain the motion vector of the current sub-block.
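The distance-weighted horizontal/vertical interpolation described above can be sketched as follows. This is an illustrative reconstruction in Python, not the patent's normative process; the function name, the `(mvx, mvy)` tuple representation of a motion vector, the uniform sub-block grid, and the equal-weight final combination are all assumptions of this example.

```python
def planar_mv(left, right, above, below, x, y, w, h):
    """Predict the MV of sub-block (x, y) in a w-by-h grid of sub-blocks.

    left/above: MVs of the left/above spatial neighbours on the same row/
    column as the sub-block; right/below: the derived first/second motion
    information for the right/below neighbours. Each MV is (mvx, mvy).
    """
    # Horizontal interpolation: the closer neighbour gets the larger weight.
    horiz = tuple(((w - 1 - x) * l + (x + 1) * r) / w
                  for l, r in zip(left, right))
    # Vertical interpolation, same principle along the column.
    vert = tuple(((h - 1 - y) * a + (y + 1) * b) / h
                 for a, b in zip(above, below))
    # Final prediction: combine the two interpolated values.
    return tuple((hv + vv) / 2 for hv, vv in zip(horiz, vert))
```

For instance, a sub-block at (0, 0) of a 4x4 grid leans towards the MVs of its nearer left and above neighbours.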
  • The first motion information, of the right-side spatial neighboring block of the current image block on the same row as the current sub-block, may be predicted or derived in various manners, for example:
  • Manner 1: deriving the first motion information based on linear interpolation of fifth motion information of the lower-right spatial neighboring block of the first collocated block of the current image block and sixth motion information of the upper-right spatial neighboring block of the current image block, where the first collocated block is an image block of the reference image that has the same size, shape, and coordinates as the current image block; or
  • Manner 2: determining motion information of a first right-side spatial neighboring block of the first collocated block of the current image block as the first motion information, where the row in which the first right-side spatial neighboring block is located in the first collocated block is the same as the row in which the current sub-block is located in the current image block; or
  • Manner 3: determining motion information of a second right-side spatial neighboring block of a second collocated block of the current image block as the first motion information, where the second collocated block is an image block in the reference image that has a specified position offset from the current image block, the motion vector of a representative spatial neighboring block of the current image block is used to indicate the specified position offset, and the row in which the second right-side spatial neighboring block is located in the second collocated block is the same as the row in which the current sub-block is located in the current image block; or
  • Manner 4: determining the sixth motion information of the upper-right spatial neighboring block of the current image block as the first motion information; or determining the mean value of the motion information of multiple spatial neighboring blocks on the upper-right side of the current image block as the first motion information.
  • The above four derivation manners may be combined according to certain logic. For example, if the first motion information cannot be derived using Manner 1, Manner 4 is then used to derive it; or Manners 1, 2, 3, and 4 are tried in sequence to derive the first motion information.
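The "combined according to certain logic" fallback can be sketched as an ordered attempt of the four manners, with Manner 1's linear interpolation shown separately. Everything here (function names, the `None` convention for "this manner could not derive a result", tuple MVs) is an illustrative assumption, not the patent's normative procedure.

```python
def interpolate_mv(mv_a, mv_b, dist_a, dist_b):
    """Linear interpolation of two MVs; the nearer source gets more weight."""
    total = dist_a + dist_b
    return tuple((dist_b * a + dist_a * b) / total for a, b in zip(mv_a, mv_b))

def derive_first_motion_info(manner1, manner2, manner3, manner4):
    """Try the four derivation manners in sequence; return the first MV that
    is actually available (None marks an unavailable derivation)."""
    for mv in (manner1, manner2, manner3, manner4):
        if mv is not None:
            return mv
    return None
```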
  • The second motion information, of the lower-side spatial neighboring block of the current image block on the same column as the current sub-block, may be predicted or derived in various manners, for example:
  • Manner 1: deriving the second motion information based on linear interpolation of the fifth motion information of the lower-right spatial neighboring block of the first collocated block of the current image block and seventh motion information of the lower-left spatial neighboring block of the current image block, where the first collocated block is an image block of the reference image that has the same size, shape, and coordinates as the current image block; or
  • Manner 2: determining motion information of a first lower-side spatial neighboring block of the first collocated block of the current image block as the second motion information, where the first collocated block is an image block of the reference image that has the same size, shape, and coordinates as the current image block, and the column in which the first lower-side spatial neighboring block is located in the first collocated block is the same as the column in which the current sub-block is located in the current image block; or
  • Manner 3: determining motion information of a second lower-side spatial neighboring block of the second collocated block of the current image block as the second motion information, where the second collocated block is an image block in the reference image that has a specified position offset from the current image block, the motion vector of a representative spatial neighboring block of the current image block is used to indicate the specified position offset, and the column in which the second lower-side spatial neighboring block is located in the second collocated block is the same as the column in which the current sub-block is located in the current image block; or
  • Manner 4: determining the seventh motion information of the lower-left spatial neighboring block of the current image block as the second motion information; or determining the mean value of the motion information of multiple spatial neighboring blocks on the lower-left side of the current image block as the second motion information.
  • The above four derivation manners may be combined according to certain logic. For example, if the second motion information cannot be derived using Manner 1, Manner 4 is then used to derive it; or Manners 1, 2, 3, and 4 are tried in sequence to derive the second motion information.
  • In this manner, the weighted values of horizontal and vertical linear interpolation are used to derive the motion vector of the current sub-block, which can better predict the motion vectors of image blocks (or their sub-blocks) whose motion field is gradual, thereby improving the prediction accuracy of the motion vector.
  • In some possible implementations, the determined inter prediction mode is the second inter prediction mode for a non-directional motion field (also referred to as a DC mode for inter prediction). Correspondingly, predicting the motion information of the current sub-block in the current image block includes:
  • Determining the mean value of the motion information of the left-side spatial neighboring blocks and the upper-side spatial neighboring blocks of the current image block as the motion information of the current sub-block;
  • In other words, the mean of the motion information of the plurality of left-side spatial neighboring blocks and the plurality of upper-side spatial neighboring blocks of the current image block is used as the motion information of one or more sub-blocks (e.g., all sub-blocks) of the current image block.
  • In this manner, the motion information of the directly-left and directly-above spatial neighboring blocks of the current image block is used to derive the motion vector of the current sub-block, which can better predict the motion vectors of image blocks (or their sub-blocks) with a smooth motion field, thereby improving the prediction accuracy of the motion vector.
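The DC mode for inter prediction described above amounts to one averaged motion vector shared by the sub-blocks of the block. The sketch below is an illustrative assumption (names, tuple MVs, unweighted component-wise mean), not the patent's normative definition.

```python
def dc_mv(left_mvs, above_mvs):
    """Mean MV of the left-column and above-row neighbour MVs; under the DC
    mode this single value serves as the motion information for one or more
    (e.g., all) sub-blocks of the current block."""
    mvs = list(left_mvs) + list(above_mvs)
    n = len(mvs)
    # Component-wise mean over all contributing neighbour MVs.
    return (sum(mv[0] for mv in mvs) / n, sum(mv[1] for mv in mvs) / n)
```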
  • In some possible implementations, the determined inter prediction mode is an inter prediction mode for a directional motion field (also referred to as a directional prediction mode for inter prediction). Correspondingly, predicting the motion information of the current sub-block in the current image block includes:
  • Determining motion information of a target reference block (also referred to as a projected reference block) as the motion information of the current sub-block of the current image block, where the target reference block is the reference block, on a reference row or a reference column, that corresponds to the current sub-block according to the prediction direction (angle) of the directional prediction mode.
  • The reference row or reference column does not belong to the current image block: the reference blocks on the reference row are the row of upper spatial neighboring blocks adjacent to the first row of the current image block, and the reference blocks on the reference column are the column of left spatial neighboring blocks adjacent to the first column of the current image block.
  • In this manner, the motion vectors of the one or more sub-blocks along the prediction direction are identical to one another, and their value depends on the motion vector of the target reference block. Therefore, the motion vectors of image blocks (or their sub-blocks) with a directional motion field can be better predicted, improving the prediction accuracy of the motion vector.
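The projection onto the reference row or reference column can be sketched as follows. Only three representative directions are shown; actual directional modes would use many (possibly fractional) angles. The indexing convention (`ref_row[0]` and `ref_col[0]` both hold the top-left corner neighbour) and all names are assumptions of this example.

```python
def directional_mv(ref_row, ref_col, x, y, direction):
    """Copy the MV of the reference block that sub-block (x, y) projects
    onto along the mode's direction; every sub-block on one projection
    line therefore shares the same MV."""
    if direction == "vertical":        # project straight up to the reference row
        return ref_row[x + 1]
    if direction == "horizontal":      # project straight left to the reference column
        return ref_col[y + 1]
    if direction == "diag_down_right": # 45-degree projection towards the top-left
        return ref_row[x - y] if x >= y else ref_col[y - x]
    raise ValueError(direction)
```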
  • The motion information mentioned in the embodiments of the present application mainly refers to a motion vector, but it should be understood that the motion information may further include reference image information, where the reference image information may include, but is not limited to, a reference image list and a reference image index for that list.
  • In some possible implementations, the method of each embodiment of the present application may further include: scaling, based on temporal distance, the motion vector corresponding to the specified reference image list included in the motion information, to obtain a motion vector that points to the reference frame indicated by the target reference image index.
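The temporal-distance scaling step can be sketched as below: when a motion vector refers to a different reference frame than the target one, it is scaled by the ratio of picture-order-count (POC) distances. Plain floating-point arithmetic and all names are assumptions of this example; a real codec would use fixed-point scaling with clipping.

```python
def scale_mv(mv, cur_poc, ref_poc, target_ref_poc):
    """Scale an MV so that it points to the target reference frame.

    cur_poc: POC of the current picture; ref_poc: POC of the frame the MV
    currently points to; target_ref_poc: POC of the frame indicated by the
    target reference image index.
    """
    d_cur = cur_poc - ref_poc          # temporal distance to the MV's own reference
    d_tgt = cur_poc - target_ref_poc   # temporal distance to the target reference
    return tuple(c * d_tgt / d_cur for c in mv)
```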
  • In some possible implementations, the method of the various embodiments of the present application may further include: selectively filtering the motion information of the target reference block based on the prediction direction or angle corresponding to the directional prediction mode.
  • In some possible implementations, the method of the various embodiments of the present application may further include: filtering the motion information of one or more boundary sub-blocks of the current image block, where a boundary sub-block is a sub-block located at the boundary of the current image block.
  • In particular, the motion information of the boundary sub-blocks of the current image block may be filtered under specific inter prediction modes (e.g., the second inter prediction mode, the vertical prediction mode, or the horizontal prediction mode).
  • In some possible implementations, determining an inter prediction mode for performing inter prediction on the current image block includes: when inter prediction data indicates that the current image block is predicted by using the candidate inter prediction mode set, determining, from the candidate inter prediction mode set, the inter prediction mode for performing inter prediction on the current image block.
  • When the inter prediction data further includes a second identifier indicating the inter prediction mode of the current image block, determining the inter prediction mode for inter prediction of the current image block includes: determining the inter prediction mode indicated by the second identifier as the inter prediction mode for performing inter prediction on the current image block; or
  • When the inter prediction data does not include such a second identifier, determining the inter prediction mode for performing inter prediction on the current image block includes: determining the first inter prediction mode for a non-directional motion field (also referred to as a planar mode for inter prediction) as the inter prediction mode for inter prediction of the current image block.
  • On the encoder side, determining an inter prediction mode for performing inter prediction on the current image block includes: determining the inter prediction mode in the candidate inter prediction mode set whose rate-distortion cost for encoding the current image block is the lowest as the inter prediction mode for inter prediction of the current image block. It should be understood that if the first inter prediction mode for a smooth or gradual motion field (also referred to as the planar mode for inter prediction) encodes the current image block with the lowest rate-distortion cost, the first inter prediction mode is determined as the inter prediction mode for inter prediction of the current image block.
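The encoder-side determination above is a standard rate-distortion decision, which can be sketched as follows. `rd_cost` stands in for "encode the block with this mode and measure distortion plus lambda times rate"; it and all other names are assumptions of this example.

```python
def choose_mode(candidate_modes, rd_cost):
    """Return the candidate inter prediction mode with the lowest
    rate-distortion cost for encoding the current image block."""
    return min(candidate_modes, key=rd_cost)
```

For example, with toy costs `{"planar": 10.0, "dc": 12.5, "vertical": 11.0}`, the planar mode would be chosen.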
  • In some possible implementations, the inter prediction method in each embodiment of the present application may further include encoding or parsing inter prediction data, where:
  • the inter prediction data includes a first identifier indicating whether to perform inter prediction on the current image block by using the candidate inter prediction mode set; or
  • the inter prediction data includes both the first identifier, indicating whether to use the candidate inter prediction mode set for inter prediction of the current image block, and a second identifier, indicating the inter prediction mode of the current image block.
  • an embodiment of the present application provides an inter prediction apparatus for a video image, including a plurality of functional units for implementing any one of the methods of the first aspect.
  • the inter prediction device of the video image may include:
  • An inter prediction mode determining unit, configured to determine an inter prediction mode for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes: a plurality of inter prediction modes for non-directional motion fields, and/or a plurality of inter prediction modes for directional motion fields;
  • An inter prediction processing unit configured to perform inter prediction on the current image block based on the determined inter prediction mode.
  • In some possible implementations, the inter prediction processing unit is specifically configured to: determine the motion information of the current sub-block by using a first predicted value and a second predicted value of the motion information of the current sub-block; in a possible implementation, for example, the first predicted value and the second predicted value of the motion vector of the current sub-block are weighted to obtain the motion vector of the current sub-block.
  • the inter-prediction apparatus of the video image is applied, for example, to a video encoding apparatus (video encoder) or a video decoding apparatus (video decoder).
  • an embodiment of the present application provides a video encoder, where the video encoder is used to encode an image block, including:
  • inter predictor configured to predict a prediction block of an image block to be encoded based on an inter prediction mode, wherein the inter prediction mode is one of a set of candidate inter prediction modes;
  • An entropy encoder configured to encode a first identifier into a code stream, where the first identifier is used to indicate whether to perform inter prediction on the to-be-coded image block by using the candidate inter prediction mode set, in other words, the first An identifier is used to indicate whether to adopt a new inter prediction mode for the current image block to be encoded;
  • a reconstructor configured to reconstruct the image block according to the prediction block, and store the reconstructed image block in a memory.
  • the entropy encoder is further configured to encode a second identifier into the code stream, where the second identifier is used to indicate the inter prediction mode of the image block to be encoded; in other words, the second identifier is used to indicate which new inter prediction mode is to be used for inter prediction of the image block to be encoded.
  • the embodiment of the present application provides a video decoder, where the video decoder is used to decode an image block from a code stream, including:
  • An entropy decoder configured to decode a first identifier from a code stream, where the first identifier is used to indicate whether to use the candidate inter prediction mode set for inter prediction of the image block to be decoded; in other words, the first identifier is used to indicate whether to apply a new inter prediction mode to the image block to be decoded;
  • An inter predictor configured to predict a prediction block of the image block to be decoded based on an inter prediction mode, where the inter prediction mode is one of the candidate inter prediction mode set;
  • the entropy decoder is further configured to decode a second identifier from the code stream, where the second identifier is used to indicate the inter prediction mode of the image block to be decoded; in other words, the second identifier is used to indicate which new inter prediction mode is used by the image block to be decoded.
  • an embodiment of the present application provides an apparatus for decoding video data, where the apparatus includes:
  • a memory for storing video data in the form of a code stream
  • a video decoder configured to decode, from the code stream, inter prediction data including a first identifier, where the first identifier is related to a current image block to be decoded; and when the first identifier is true, perform inter prediction on the current image block to be decoded based on an inter prediction mode, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes: a plurality of inter prediction modes for non-directional motion fields, and/or a plurality of inter prediction modes for directional motion fields.
  • an embodiment of the present application provides an apparatus for encoding video data, where the apparatus includes:
  • a memory for storing video data, the video data comprising one or more image blocks;
  • a video encoder configured to encode inter prediction data including a first identifier, where the first identifier is related to a current image block to be encoded; when the first identifier is true, the first identifier is used to indicate that inter prediction is performed on the current image block to be encoded according to an inter prediction mode, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes: a plurality of inter prediction modes for non-directional motion fields, and/or a plurality of inter prediction modes for directional motion fields.
  • an embodiment of the present application provides an apparatus for decoding video data, where the device includes:
  • a memory for storing encoded video data
  • a video decoder configured to: predict first motion information of a right-side spatially neighboring block of the image block that is located on the same row as a current sub-block of the current image block to be decoded, and second motion information of a lower-side spatially neighboring block of the image block that is located on the same column as the current sub-block; perform linear interpolation based on the first motion information and third motion information of a left neighboring block of the image block located on the same row as the current sub-block, to obtain a first predicted value of the motion information of the current sub-block; perform linear interpolation based on the second motion information and fourth motion information of an upper neighboring block of the image block located on the same column as the current sub-block, to obtain a second predicted value of the motion information of the current sub-block; obtain the motion information of the current sub-block by using the first predicted value and the second predicted value of the motion information of the current sub-block; and decode the image block by using the motion information.
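The horizontal/vertical interpolation procedure just described can be sketched as follows. The function name, the tuple representation of motion vectors, and the equal-distance linear weights are illustrative assumptions rather than details fixed by the application.

```python
def planar_subblock_mv(left, right, above, below, x, y, width, height):
    """Return (mvx, mvy) for the sub-block at column x, row y of a block
    that is `width` sub-blocks wide and `height` sub-blocks tall.
    Each neighbor argument is an (mvx, mvy) motion-vector tuple:
    left/right lie on the same row, above/below on the same column."""
    def interp(a, b, pos, size):
        # Linear interpolation: weight grows toward b as pos increases.
        return tuple(((size - 1 - pos) * ai + (pos + 1) * bi) / size
                     for ai, bi in zip(a, b))
    horiz = interp(left, right, x, width)    # first predicted value
    vert = interp(above, below, y, height)   # second predicted value
    # Combine the two predicted values (simple average here).
    return tuple((h + v) / 2 for h, v in zip(horiz, vert))
```

For a sub-block midway between a zero-motion left neighbor and a right neighbor moving 8 samples horizontally, the horizontal predictor falls roughly halfway between the two, as expected for a gradually changing motion field.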
  • the embodiment of the present application provides a motion information prediction method, where the method includes:
  • an embodiment of the present application provides an encoding apparatus, including: a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform part or all of the steps of any method of the first aspect.
  • the embodiment of the present application provides a decoding apparatus, including: a non-volatile memory and a processor coupled to each other, where the processor calls program code stored in the memory to perform part or all of the steps of any method of the first aspect.
  • the embodiment of the present application provides a computer readable storage medium, where the computer readable storage medium stores program code, and the program code includes instructions for performing part or all of the steps of any method of the first aspect.
  • the embodiment of the present application provides a computer program product, which, when run on a computer, causes the computer to perform part or all of the steps of any method of the first aspect.
  • an embodiment of the present application provides an inter prediction method for a video image, including: determining an inter prediction mode for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes: a first inter prediction mode for a smooth or gradual motion field; predicting motion information of each sub-block in the current image block based on the determined inter prediction mode; and performing inter prediction on the current image block by using the motion information of each sub-block in the current image block.
  • the embodiment of the present application provides an inter prediction apparatus for a video image, including a plurality of functional units for implementing any method of the thirteenth aspect.
  • the inter prediction device of the video image may include:
  • An inter prediction mode determining unit configured to determine an inter prediction mode for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes: a first inter prediction mode for a smooth or gradual motion field;
  • An inter prediction processing unit configured to predict motion information of each sub-block in the current image block based on the determined inter prediction mode, and to perform inter prediction on the current image block by using the motion information of each sub-block in the current image block.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application
  • FIG. 2A is a schematic block diagram of a video encoder in an embodiment of the present application.
  • FIG. 2B is a schematic block diagram of a video decoder in an embodiment of the present application.
  • FIG. 3 is a flowchart of a method for encoding inter prediction of a video image according to an embodiment of the present application
  • FIG. 4 is a flowchart of a method for decoding inter prediction of a video image according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of multiple candidate inter prediction modes in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of motion information of an exemplary current image block and a neighboring reference block in an embodiment of the present application
  • FIG. 7 is a flowchart of a method for acquiring motion information of a current sub-block in a current image block based on a first inter prediction mode for a non-directional motion field in an embodiment of the present application;
  • FIG. 8A to FIG. 8D are schematic diagrams showing the principles of four exemplary first inter prediction modes in the embodiments of the present application.
  • FIG. 9 is a schematic diagram of an exemplary second inter prediction mode in an embodiment of the present application.
  • FIG. 10A to FIG. 10E are schematic diagrams showing the principles of five exemplary inter-frame direction prediction modes in the embodiments of the present application.
  • FIG. 11 is a schematic block diagram of an inter prediction apparatus for a video image according to an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of an encoding device or a decoding device according to an embodiment of the present application.
  • An encoded video stream, or a portion thereof, such as a video frame or an image block, may exploit temporal and spatial similarities in the video stream to improve coding performance.
  • For a current image block of the video stream, motion information can be predicted based on a previously encoded block in the video stream, and the difference between the prediction block and the current image block (i.e., the original block), known as the residual, can be identified, so that the current image block is encoded based on the previously encoded block.
  • the residuals and some parameters used to generate the current image block are included in the digital video output bitstream, rather than including the entirety of the current image block in the digital video output bitstream. This technique can be called inter prediction.
  • a motion vector is an important parameter in the inter prediction process that represents the spatial displacement of a previously coded block relative to the current coded block.
  • Motion estimation methods, such as motion search, can be used to obtain motion vectors.
  • bits representing motion vectors are included in the encoded bitstream to allow the decoder to reproduce the prediction block, thereby obtaining a reconstructed block.
  • It has been proposed to encode the motion vector differentially using a reference motion vector; that is, instead of coding the motion vector as a whole, only the difference between the motion vector and the reference motion vector is encoded.
  • The reference motion vector may be selected from motion vectors previously used in the video stream; selecting a previously used motion vector to encode the current motion vector can further reduce the number of bits included in the encoded video bitstream.
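The differential coding idea above can be sketched in a few lines. The function names are hypothetical, and the entropy coding of the difference is omitted; the point is simply that the encoder writes the difference and the decoder adds it back to the reference.

```python
def mv_difference(mv, ref_mv):
    """Encoder side: compute the difference that is actually coded
    instead of the full motion vector (mv and ref_mv are (x, y) tuples)."""
    return (mv[0] - ref_mv[0], mv[1] - ref_mv[1])

def mv_reconstruct(mvd, ref_mv):
    """Decoder side: reproduce the motion vector from the coded
    difference and the same reference motion vector."""
    return (mvd[0] + ref_mv[0], mvd[1] + ref_mv[1])
```

When the reference predictor is close to the true motion vector, the difference is near zero and costs few bits, which is exactly why a well-chosen predictor saves rate.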
  • There are two inter prediction modes for a prediction unit (PU), referred to respectively as the merge mode (skip being considered a special case of merge) and the advanced motion vector prediction (AMVP) mode. To further improve codec performance, for example so that encoding does not need to transmit the difference between the motion vector of the current coding block and the reference motion vector, and to reduce the residual to be transmitted as much as possible,
  • the present application further proposes a variety of new inter prediction modes, including multiple inter prediction modes for (predicting) non-directional motion fields and/or multiple inter prediction modes for (predicting) directional motion fields, which together form a candidate inter prediction mode set.
  • The multiple inter prediction modes for (predicting) non-directional motion fields herein may include: a first inter prediction mode for non-directional motion fields (for example, a first inter prediction mode for smooth and gradual motion fields, simply referred to as mode 0), and a second inter prediction mode for non-directional motion fields (for example, an inter prediction mode mainly for smooth motion fields, referred to as mode 1).
  • The multiple inter prediction modes for (predicting) directional motion fields herein (referred to as inter-frame direction prediction modes) may correspond to different prediction directions or angles. The number of inter-frame direction prediction modes in the present application is not limited to 9 (i.e., modes 2 to 10 shown in Table 2) or 32 (i.e., modes 2 to 33 shown in Table 3); the number may be increased or decreased according to the required prediction accuracy of the motion vector.
  • Mode 0 can be understood as a Planar mode for inter prediction;
  • mode 1 can be understood as a DC mode for inter prediction;
  • mode N can be understood as a direction prediction mode for inter prediction.
  • The Planar mode for inter prediction takes the mean of the horizontal and vertical linear interpolations of the motion vector of the image block/sub-block. It combines the characteristics of horizontal and vertical motion field changes and makes the predicted motion of the block/sub-block change gently, so it is applicable to image blocks, or sub-blocks thereof, whose motion field changes slowly.
  • The DC mode for inter prediction uses the mean of the motion vectors of the left-neighboring and above-neighboring blocks of the current image block as the motion vector of the current image block or its sub-blocks, and is suitable for image blocks, or sub-blocks thereof, with a smooth motion field.
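A minimal sketch of the DC-style averaging just described, assuming motion vectors are (mvx, mvy) tuples; the function name and the simple arithmetic mean over all listed neighbors are illustrative choices, not details fixed by the application.

```python
def dc_mode_mv(left_mvs, above_mvs):
    """Average the motion vectors of the left-neighboring and
    above-neighboring blocks; each entry is an (mvx, mvy) tuple.
    The result serves as the predicted motion vector of the current
    image block or its sub-blocks."""
    mvs = list(left_mvs) + list(above_mvs)
    n = len(mvs)
    return (sum(mv[0] for mv in mvs) / n,
            sum(mv[1] for mv in mvs) / n)
```

Because every sub-block receives the same averaged vector, this mode is naturally suited to blocks whose motion field is smooth rather than directional.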
  • the inter-frame direction prediction mode is applied to an image block of a directional motion field to predict a motion vector of the image block or its sub-blocks.
  • The angle parameter A of the inter-frame direction prediction modes (2 to 10) has the following correspondence:
  • The angle parameter A of the inter-frame direction prediction modes (2 to 33) has the following correspondence:
  • This application introduces a candidate inter prediction mode set including, but not limited to, the above modes 0, 1, 2, ..., 10. During the encoding process of a video data sequence, this facilitates the video encoder in determining or selecting, from the candidate inter prediction mode set, an inter prediction mode for performing inter prediction on the current image block (e.g., the video encoder encodes the video data using a plurality of inter prediction modes and selects the inter prediction mode with the best rate-distortion performance for encoding the image block), and in performing inter prediction on the current image block based on the determined inter prediction mode, thereby completing the encoding of the current image block.
  • the candidate inter prediction mode set including the above modes 0, 1, 2, . . . 10 is described herein, but the candidate inter prediction mode set of the present application is not limited thereto.
  • Video encoder 100 and video decoder 200 of video coding system 1 are configured to predict, in accordance with the various method examples described for any of the new inter prediction modes proposed herein, the motion information of the current coded image block or its sub-blocks, so that the predicted motion vector is close to the motion vector obtained by a motion estimation method; thus the motion vector difference does not need to be transmitted during encoding, further improving codec performance.
  • video coding system 1 includes source device 10 and destination device 20.
  • Source device 10 produces encoded video data.
  • source device 10 may be referred to as a video encoding device.
  • Destination device 20 may decode the encoded video data produced by source device 10.
  • destination device 20 may be referred to as a video decoding device.
  • Various implementations of source device 10, destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
  • the memory can include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of an instruction or data structure accessible by the computer, as described herein.
  • Source device 10 and destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (eg, laptop) computers, tablet computers, set top boxes, telephone handsets such as so-called “smart” phones, and the like.
  • televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
  • Link 30 may include one or more media or devices capable of moving encoded video data from source device 10 to destination device 20.
  • link 30 can include one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real time.
  • source device 10 may modulate the encoded video data in accordance with a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 20.
  • the one or more communication media can include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media can include a router, a switch, a base station, or other device that facilitates communication from the source device 10 to the destination device 20.
  • the encoded data can be output from output interface 140 to storage device 40.
  • encoded data can be accessed from storage device 40 via input interface 240.
  • Storage device 40 may comprise any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray Disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, Or any other suitable digital storage medium for storing encoded video data.
  • storage device 40 may correspond to a file server or another intermediate storage device that may maintain encoded video produced by source device 10.
  • Destination device 20 can access the stored video data from storage device 40 via streaming or download.
  • the file server can be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 20.
  • Example file servers include a web server (e.g., for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
  • Destination device 20 can access the encoded video data over any standard data connection, including an Internet connection.
  • This may include a wireless channel (eg, a Wi-Fi connection), a wired connection (eg, DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
  • the transmission of encoded video data from storage device 40 may be streaming, downloading, or a combination of both.
  • The motion vector prediction techniques of the present application are applicable to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • The video coding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application are applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
  • data is retrieved from local storage, streamed over a network, and the like.
  • the video encoding device can encode the data and store the data to a memory, and/or the video decoding device can retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other but only encode data to and/or retrieve data from memory and decode the data.
  • source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
  • Output interface 140 can include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
  • Video encoder 100 may encode video data from video source 120.
  • source device 10 transmits encoded video data directly to destination device 20 via output interface 140.
  • the encoded video data may also be stored on storage device 40 for later access by destination device 20 for decoding and/or playback.
  • destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
  • input interface 240 includes a receiver and/or a modem.
  • Input interface 240 may receive encoded video data via link 30 and/or from storage device 40.
  • Display device 220 may be integrated with destination device 20 or may be external to destination device 20. In general, display device 220 displays the decoded video data.
  • Display device 220 can include a variety of display devices, such as liquid crystal displays (LCDs), plasma displays, organic light emitting diode (OLED) displays, or other types of display devices.
  • Video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
  • Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more of a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), discrete logic, hardware, or any combination thereof. If the techniques are implemented partially in software, the device can store the instructions for the software in a suitable non-transitory computer-readable storage medium and can execute the instructions in hardware using one or more processors, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) can be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.
  • The present application may generally refer to video encoder 100 "signaling" or "transmitting" certain information to another device, such as video decoder 200.
  • The term "signaling" or "transmitting" may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transfer can occur in real time or almost in real time. Alternatively, this communication may occur over a period of time, such as when, at the time of encoding, the syntax elements are stored in an encoded bitstream to a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they are stored to the medium.
  • Video encoder 100 and video decoder 200 may operate in accordance with a video compression standard such as High Efficiency Video Coding (HEVC) or an extension thereof, and may conform to the HEVC Test Model (HM).
  • video encoder 100 and video decoder 200 may also operate in accordance with other industry standards, such as the ITU-T H.264, H.265 standard, or an extension of such standards.
  • the techniques of this application are not limited to any particular codec standard.
  • The video encoder 100 is configured to encode syntax elements related to the image block currently to be encoded into a digital video output bitstream (referred to as a bitstream or a code stream). Here, the syntax elements used for inter prediction of the current image block are simply referred to as inter prediction data, where the inter prediction data may include a first identifier used to indicate whether to perform inter prediction on the current image block by using the candidate inter prediction mode set (in other words, a first identifier used to indicate whether to apply a new inter prediction mode proposed by the present application to the current image block for inter prediction); or the inter prediction data may include: a first identifier used to indicate whether to perform inter prediction on the current image block to be encoded by using the candidate inter prediction mode set, and a second identifier used to indicate the inter prediction mode of the current image block. To determine the inter prediction mode used for encoding the current image block, the video encoder 100 is further configured to determine or select (S301) an inter prediction mode for performing inter prediction on the current image block.
  • The encoding process herein may include predicting motion information (specifically, motion information of each sub-block or all sub-blocks) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and performing inter prediction on the current image block by using the motion information of the one or more sub-blocks in the current image block;
  • The video decoder 200 is configured to decode, from a bitstream, syntax elements related to the image block currently to be decoded (S401). Here, the syntax elements used for inter prediction of the current image block are simply referred to as inter prediction data. The inter prediction data includes a first identifier used to indicate whether to use the candidate inter prediction mode set for inter prediction of the current image block to be decoded (i.e., used to indicate whether to apply a new inter prediction mode proposed by the present application to the image block to be decoded for inter prediction). When the inter prediction data indicates that the candidate inter prediction mode set (i.e., a new inter prediction mode) is used, the video decoder 200 performs the corresponding decoding process.
  • The decoding process herein may include predicting motion information of one or more sub-blocks in the current image block based on the determined inter prediction mode, and performing inter prediction on the current image block by using the motion information of the one or more sub-blocks.
  • If the inter prediction data includes a second identifier used to indicate the inter prediction mode of the current image block, the video decoder 200 is configured to determine that the inter prediction mode indicated by the second identifier is the inter prediction mode for performing inter prediction on the current image block; or, if the inter prediction data does not include a second identifier indicating which inter prediction mode is used by the current image block,
  • the video decoder 200 is configured to determine a first inter prediction mode for the non-directional motion field as an inter prediction mode for inter prediction of the current image block.
  • the candidate inter prediction mode set herein may be one mode or multiple modes.
  • When the candidate inter prediction mode set includes only one mode, that mode may be determined to be the inter prediction mode for encoding or decoding the current image block.
  • When the candidate inter prediction mode set includes multiple modes, the mode with the highest priority, or at the first position of the set, may be determined by default to be the inter prediction mode for encoding or decoding the current image block; or the mode indicated by the second identifier may be determined to be the inter prediction mode for decoding the current image block.
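The mode-resolution logic described above might be sketched as follows. Treating the second identifier as an index into the candidate set, and the function name itself, are assumptions for illustration; the application does not fix a concrete representation.

```python
def resolve_inter_mode(first_id, candidate_modes, second_id=None):
    """Return the inter prediction mode for the current block, or None
    when the first identifier indicates the new candidate set is unused."""
    if not first_id:
        return None                        # fall back to legacy inter modes
    if len(candidate_modes) == 1:
        return candidate_modes[0]          # single candidate: nothing more to signal
    if second_id is not None:
        return candidate_modes[second_id]  # explicitly indicated mode
    return candidate_modes[0]              # default: highest priority / first in set
```

Note that when the set holds a single mode, no second identifier needs to be coded at all, which is one way such a design can save bits.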
  • The embodiment of the present application divides the new inter prediction modes into inter prediction modes for non-directional motion fields and/or inter prediction modes for directional motion fields by considering the characteristics of the motion field. Regardless of whether the video encoder 100 and the video decoder 200 of the video coding system 1 adopt an inter prediction mode for a directional motion field or an inter prediction mode for a non-directional motion field from the candidate inter prediction mode set to encode or decode the image block, the motion information (for example, the motion vector) of one or more sub-blocks in the current image block can be predicted by using the motion vectors of the available reference blocks of the current image block (referred to as reference motion vectors), such that the predicted motion vector of the current image block is basically close to the motion vector obtained by a motion estimation method. Thus, the motion vector difference does not need to be transmitted, and the code rate is saved at the same video quality; therefore, the codec performance of the video coding system of the embodiment of the present application is further improved.
  • Post-processing entity 41 represents an instance of a video entity that can process encoded video data from video encoder 100, such as a media aware network element (MANE) or a stitching/editing device. In some cases, post-processing entity 41 may be an instance of a network entity. In some video coding systems, post-processing entity 41 and video encoder 100 may be portions of separate devices, while in other cases the functionality described with respect to post-processing entity 41 may be carried out by the same device that includes video encoder 100. In one example, post-processing entity 41 is an instance of storage device 40 of FIG. 1.
  • The video encoder 100 may perform encoding of a video image block according to any one of the candidate inter prediction modes including modes 0, 1, 2, ..., or 10 proposed by the present application, for example, performing inter prediction on the video image block.
  • video encoder 100 includes prediction processing unit 108, filter unit 106, decoded image buffer (DPB) 107, summer 112, transformer 101, quantizer 102, and entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
  • video encoder 100 also includes inverse quantizer 104, inverse transformer 105, and summer 111.
  • Filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • Although filter unit 106 is illustrated as an in-loop filter in FIG. 2A, in other implementations filter unit 106 can be implemented as a post-loop filter.
  • video encoder 100 may also include a video data store, a splitting unit (not shown).
  • the video data store can store video data to be encoded by components of video encoder 100.
  • Video data stored in the video data store can be obtained from video source 120.
  • the DPB 107 can be a reference image memory that stores reference video data for encoding video data in the intra, inter coding mode by the video encoder 100.
• the video data memory and DPB 107 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the video data store and DPB 107 may be provided by the same memory device or a separate memory device.
  • the video data store can be on-chip with other components of video encoder 100, or off-chip relative to those components.
  • video encoder 100 receives video data and stores the video data in a video data store.
• the segmentation unit divides the video data into a plurality of image blocks, and these image blocks may be further divided into smaller blocks, for example by image block partitioning based on a quadtree structure or a binary tree structure. Such partitioning may also include partitioning into slices, tiles, or other larger units.
• Video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded.
• the slice may be divided into a plurality of image blocks (and may be further divided into sets of image blocks called tiles).
• Prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, where the plurality of inter coding modes include, but are not limited to, one or more of modes 0, 1, 2, 3...10 as set forth herein. Prediction processing unit 108 may provide the resulting intra- or inter-coded block to summer 112 to generate a residual block, and to summer 111 to reconstruct the encoded block for use as a reference image.
• the intra predictor 109 within the prediction processing unit 108 may perform intra-predictive coding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to remove spatial redundancy.
• Inter predictor 110 within prediction processing unit 108 may perform inter-predictive encoding of the current image block relative to one or more prediction blocks in one or more reference images, to remove temporal redundancy.
  • inter predictor 110 can be used to determine an inter prediction mode for encoding a current image block.
• inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in a set of candidate inter prediction modes, and select from among them the inter prediction mode with the optimal rate-distortion characteristics.
• Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original unencoded block from which it was produced, as well as the bit rate (that is, the number of bits) used to produce the encoded block.
• the inter predictor 110 may determine, as the inter prediction mode for inter prediction of the current image block, the inter prediction mode in the candidate inter prediction mode set with the smallest rate-distortion cost of encoding the current image block.
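The rate-distortion based mode selection described above can be illustrated with a minimal Python sketch. The SSD distortion metric, the lambda value, and the candidate structure are assumptions for illustration only and are not taken from the application:

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b))

def select_inter_mode(original, candidates, lam=10.0):
    """Return the candidate mode with the smallest cost J = D + lam * R.

    `candidates` maps a mode id to (predicted_block, bit_cost).
    """
    best_mode, best_cost = None, float("inf")
    for mode, (prediction, bits) in candidates.items():
        cost = ssd(original, prediction) + lam * bits
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

original = [100, 102, 98, 101]
candidates = {
    0: ([100, 100, 100, 100], 4),   # cheap to signal, slightly inaccurate
    1: ([100, 102, 98, 101], 40),   # exact prediction, expensive to signal
}
mode, cost = select_inter_mode(original, candidates, lam=10.0)
```

Here mode 0 wins because its small distortion penalty (9) is outweighed by the much lower signaling cost, which is precisely the trade-off the rate-distortion cost captures.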
• the inter-predictive coding process will be described in detail below, in particular the process of predicting motion information of one or more sub-blocks in the current image block (specifically, of each sub-block or of all sub-blocks) under the inter prediction modes of the present application for various non-directional or directional motion fields.
• the inter predictor 110 is configured to predict motion information (e.g., motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to acquire or produce a prediction block of the current image block using the motion information (e.g., motion vectors) of the one or more sub-blocks in the current image block.
  • Inter predictor 110 may locate the predicted block to which the motion vector points in one of the reference picture lists.
  • Inter predictor 110 may also generate syntax elements associated with image blocks and video slices for use by video decoder 200 in decoding image blocks of video slices.
• the inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block for each sub-block, thereby obtaining the prediction block of the current image block; it should be understood that the inter predictor 110 here performs motion estimation and motion compensation processes.
• the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes this information. The information about the selected inter prediction mode may be included as inter prediction data associated with the current image block in the transmitted bitstream, and may include a first identifier block_based_enable_flag to indicate whether inter prediction is performed on the current image block using the newly proposed inter prediction modes; optionally, a second identifier block_based_index may also be included to indicate which new inter prediction mode is used for the current image block.
  • the process of predicting the motion vector of the current image block or its sub-blocks using motion vectors of a plurality of reference blocks under different modes 0, 1, 2...10 will be described in detail below.
  • the intra predictor 109 can perform intra prediction on the current image block.
  • intra predictor 109 may determine an intra prediction mode used to encode the current block.
• the intra predictor 109 may use rate-distortion analysis to calculate rate-distortion values of the various intra prediction modes to be tested, and select from among the modes to be tested the intra prediction mode with the optimal rate-distortion characteristics.
• the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • Summer 112 represents one or more components that perform this subtraction.
  • the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
  • the transformer 101 transforms the residual video data into residual transform coefficients using transforms such as discrete cosine transform (DCT) or conceptually similar transforms.
  • DCT discrete cosine transform
  • the transformer 101 can convert residual video data from a pixel value domain to a transform domain, such as the frequency domain.
  • the transformer 101 can send the resulting transform coefficients to the quantizer 102.
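The transform step above can be illustrated with a plain orthonormal 2-D DCT-II in Python. This is a textbook floating-point DCT, not the exact integer transform of any codec standard; it shows how a flat residual block concentrates all of its energy into the single DC coefficient:

```python
import math

def dct2_1d(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct2_2d(block):
    """Separable 2-D DCT: transform rows first, then columns."""
    rows = [dct2_1d(r) for r in block]
    n = len(rows)
    cols = [dct2_1d([rows[i][j] for i in range(n)]) for j in range(len(rows[0]))]
    # transpose back so result[i][j] indexes (row, column)
    return [[cols[j][i] for j in range(len(cols))] for i in range(n)]

# a constant 4x4 residual block: after the DCT only the DC coefficient is non-zero
coeffs = dct2_2d([[4] * 4 for _ in range(4)])
```

This energy compaction is what makes the subsequent quantization and entropy coding effective: most transform coefficients of a well-predicted block are near zero.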
  • Quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
• quantizer 102 may then perform a scan of the matrix containing the quantized transform coefficients.
• Alternatively, the entropy encoder 103 may perform the scan.
• After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, entropy encoder 103 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding method or technique.
  • CAVLC context adaptive variable length coding
  • CABAC context adaptive binary arithmetic coding
  • SBAC syntax based context adaptive binary arithmetic coding
  • PIPE probability interval partition entropy
  • the encoded bitstream may be transmitted to video decoder 200, or archived for later transmission or retrieved by video decoder 200.
  • the entropy encoder 103 may also entropy encode the syntax elements of the current image block to be encoded.
• the inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image.
  • Summer 111 adds the reconstructed residual block to the prediction block produced by inter predictor 110 or intra predictor 109 to produce a reconstructed image block.
• Filter unit 106 may be applied to reconstructed image blocks to reduce distortion, such as block artifacts.
  • the reconstructed image block is then stored as a reference block in the decoded image buffer 107 and can be used as a reference block by the inter predictor 110 to inter-predict the blocks in subsequent video frames or images.
• For some image blocks or image frames, video encoder 100 may quantize the residual signal directly without processing by transformer 101, and accordingly without processing by inverse transformer 105; alternatively, for some image blocks or image frames, video encoder 100 does not generate residual data, and accordingly does not need processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; alternatively, video encoder 100 may store the reconstructed image block directly as a reference block without processing by filter unit 106; alternatively, the quantizer 102 and inverse quantizer 104 in video encoder 100 may be combined.
  • video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207.
  • Prediction processing unit 208 can include inter predictor 210 and intra predictor 209.
• video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 of FIG. 2A.
• video decoder 200 receives from video encoder 100 an encoded video bitstream representing the image blocks of an encoded video slice and associated syntax elements.
  • Video decoder 200 may receive video data from network entity 42 and, optionally, may store the video data in a video data store (not shown).
  • the video data store may store video data to be decoded by components of video decoder 200, such as an encoded video bitstream.
• the video data stored in the video data store can be obtained, for example, from storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
• the video data store can function as a coded picture buffer (CPB) that stores encoded video data from an encoded video bitstream.
• CPB coded picture buffer
  • the video data memory and DPB 207 may be the same memory or may be separately provided memories.
• the video data memory and DPB 207 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • DRAM dynamic random access memory
  • SDRAM synchronous DRAM
  • MRAM magnetoresistive RAM
  • RRAM resistive RAM
  • the video data store can be integrated on-chip with other components of video decoder 200, or off-chip with respect to those components.
  • Network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or other such device for implementing one or more of the techniques described above.
  • Network entity 42 may or may not include a video encoder, such as video encoder 100.
  • network entity 42 may implement portions of the techniques described in this application.
• network entity 42 and video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200.
  • network entity 42 may be an example of storage device 40 of FIG.
  • Entropy decoder 203 of video decoder 200 entropy decodes the bitstream to produce quantized coefficients and some syntax elements. Entropy decoder 203 forwards the syntax elements to prediction processing unit 208.
• Video decoder 200 may receive syntax elements at the video slice level and/or the image block level.
• the syntax elements here may include inter prediction data related to the current image block, and the inter prediction data may include a first identifier block_based_enable_flag to indicate whether inter prediction is performed on the current image block using the candidate inter prediction mode set (in other words, to indicate whether inter prediction is performed on the current image block using a new inter prediction mode proposed by the present application); optionally, a second identifier block_based_index may further be included to indicate which new inter prediction mode is used for the current image block.
• the intra predictor 209 of the prediction processing unit 208 can produce a prediction block for the image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
• the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, the inter prediction mode used to decode the current image block of the current video slice, and decode the current image block (e.g., perform inter prediction) based on the determined inter prediction mode.
• the inter predictor 210 may determine whether a new inter prediction mode is used to predict the current image block of the current video slice; if the syntax elements indicate that a new inter prediction mode is used, it predicts the motion information of the current image block of the current video slice, or of a sub-block of the current image block, based on that new inter prediction mode (e.g., a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), and then uses the predicted motion information of the current image block or of the sub-block of the current image block in a motion compensation process to acquire or generate a prediction block of the current image block or of the sub-block of the current image block.
  • the motion information herein may include reference image information and motion vectors, wherein the reference image information may include, but is not limited to, unidirectional/bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
  • a prediction block may be generated from one of the reference pictures within one of the reference picture lists.
• the video decoder 200 may construct reference image lists, namely list 0 and list 1, based on the reference images stored in the DPB 207.
  • the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
• video encoder 100 may signal a particular syntax element indicating whether a particular inter prediction mode is employed to decode a particular block, or may signal syntax elements indicating whether a new inter prediction mode is adopted and, if so, which new inter prediction mode is specifically used to decode a particular block.
  • the inter predictor 210 herein performs a motion compensation process. The inter prediction process of predicting motion information of a current image block or a sub-block of a current image block using motion information of a reference block in various new inter prediction modes will be explained in detail below.
• the inverse quantizer 204 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203.
• the inverse quantization process may include using the quantization parameter calculated by video encoder 100 for each of the video slices to determine the degree of quantization that was applied and, likewise, the degree of inverse quantization that should be applied.
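The relation between the quantization parameter and the degree of inverse quantization can be illustrated with a simplified sketch in which, as in H.264/HEVC-style codecs, the quantization step size roughly doubles for every increase of 6 in QP. The base step values below are illustrative assumptions, not normative tables from any standard:

```python
def quant_step(qp):
    """Simplified quantization step: doubles for every increase of 6 in QP."""
    base = [0.625, 0.703, 0.797, 0.891, 1.0, 1.125]  # illustrative base steps
    return base[qp % 6] * (2 ** (qp // 6))

def dequantize(levels, qp):
    """Inverse quantization: scale each decoded level by the step size."""
    step = quant_step(qp)
    return [lvl * step for lvl in levels]
```

For example, raising QP from 4 to 10 doubles the step size from 1.0 to 2.0, so the same decoded levels reconstruct to coefficients twice as large, which is why the decoder must know the encoder's QP to apply the matching degree of inverse quantization.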
  • the inverse transformer 205 applies an inverse transform to transform coefficients, such as inverse DCT, inverse integer transform, or a conceptually similar inverse transform process, to generate residual blocks in the pixel domain.
• After the inter predictor 210 generates the prediction block for the current image block or a sub-block of the current image block, the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210 to obtain a reconstructed block, i.e., a decoded image block. Summer 211 represents the component that performs this summation operation.
  • a loop filter (either in the decoding loop or after the decoding loop) can also be used to smooth pixel transitions or otherwise improve video quality, if desired.
  • Filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
• Although filter unit 206 is illustrated as an in-loop filter in FIG. 2B, in other implementations filter unit 206 can be implemented as a post-loop filter.
• filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as the decoded video stream.
  • decoded image blocks in a given frame or image may be stored in decoded image buffer 207, which stores reference images for subsequent motion compensation.
  • the decoded image buffer 207 can be part of a memory, which can also store decoded video for later presentation on a display device (eg, display device 220 of FIG. 1), or can be separate from such memory.
• video decoder 200 may generate an output video stream without processing by filter unit 206; or, for some image blocks or image frames, the entropy decoder 203 of video decoder 200 does not decode quantized coefficients, and accordingly processing by inverse quantizer 204 and inverse transformer 205 is not required.
  • FIG. 6 is a schematic diagram showing motion information of an exemplary current image block 600 and a reference block in the embodiment of the present application.
  • W and H are the width and height of the current image block 600 and the co-located block (abbreviated as collocated block) 600' of the current image block 600.
  • the reference block of the current image block 600 includes an upper spatial adjacent block and a left spatial adjacent block of the current image block 600, and a lower spatial adjacent block and a right spatial adjacent block of the collocated block 600', wherein the collocated block 600 ' is an image block of the reference image having the same size, shape, and coordinates as the current image block 600.
• the motion information of the lower spatial neighboring blocks and the right spatial neighboring blocks of the current image block does not exist, as these blocks have not yet been encoded.
  • the current image block 600 and the collocated block 600' can be of any block size.
  • current image block 600 and collocated block 600' may include, but are not limited to, 16x16 pixels, 32x32 pixels, 32x16 pixels, and 16x32 pixels, and the like.
  • each image frame can be divided into image blocks for encoding.
• image blocks may be further divided into smaller blocks; for example, the current image block 600 and the collocated block 600' may each be divided into a plurality of MxN sub-blocks, that is, each sub-block has a size of MxN pixels, and each reference block also has a size of MxN pixels, the same size as the sub-blocks of the current image block.
• the coordinates in FIG. 6 are measured in MxN blocks.
• “MxN” and “M by N” are used interchangeably to refer to the pixel size of an image block in terms of its horizontal and vertical dimensions, that is, having M pixels in the horizontal direction and N pixels in the vertical direction, where M and N represent non-negative integer values.
  • the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • an image block described in this application may be understood as, but not limited to, a prediction unit (PU) or a coding unit (CU) or a transform unit (TU) or the like.
• a CU may include one or more prediction units (PUs) according to different video compression codec standards, or a PU and a CU may have the same size.
  • Image blocks may have fixed or variable sizes and differ in size according to different video compression codec standards.
  • the current image block refers to an image block currently to be encoded or decoded, such as a prediction unit to be encoded or decoded.
• it may be determined sequentially along direction 1 whether each left spatial neighboring block of the current image block 600 is available, and sequentially along direction 2 whether each upper spatial neighboring block of the current image block 600 is available, for example by checking whether the neighboring block (also referred to as a reference block; the terms are used interchangeably) is inter coded: if a neighboring block exists and is inter coded, the neighboring block is available; if a neighboring block does not exist or is intra coded, the neighboring block is not available. If one neighboring block is intra coded, the motion information of another available neighboring reference block is copied as the motion information of that neighboring block.
• the lower spatial neighboring blocks and right spatial neighboring blocks of the collocated block 600' are detected in a similar manner, which is not described again here.
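The availability check and copying behavior described above can be sketched in Python as follows. The dictionary representation of a block and the rule of copying from the most recently found available neighbour are illustrative assumptions; the text only states that an unavailable neighbour borrows motion information from some other available neighbouring block:

```python
def is_available(block):
    """A neighboring block is available if it exists and is inter coded."""
    return block is not None and block.get("mode") == "inter"

def scan_neighbors(blocks):
    """Scan neighbors in order (e.g., along direction 1 or direction 2).

    An unavailable neighbor (missing or intra coded) copies the motion
    information of the most recently found available neighbor, or gets
    None if no available neighbor has been seen yet.
    """
    results = []
    last_mv = None
    for b in blocks:
        if is_available(b):
            last_mv = b["mv"]
        results.append(last_mv)
    return results
```

Running the scan over a row with one intra-coded block and one missing block fills both gaps with the motion vector of the preceding inter-coded neighbour.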
• the motion information of an available reference block may be obtained directly; if the size of the available reference block is, for example, 8x4 or 8x8, the motion information of its center 4x4 block may be obtained, or the motion information of the upper-left 4x4 block of the reference block may be acquired, as the motion information of the available reference block, although the application is not limited thereto.
• in the following, an MxN sub-block is referred to simply as a sub-block, and a neighboring MxN block as a neighboring block.
• A flow diagram illustrates process 700 for predicting motion information of a current sub-block in a current image block based on a first inter prediction mode, in accordance with an embodiment of the present application.
• Process 700 may be performed by video encoder 100 or video decoder 200, and in particular by the inter predictor 110 of video encoder 100 or the inter predictor 210 of video decoder 200.
• Process 700 is described as a series of steps or operations, but it should be understood that process 700 can be performed in various orders and/or concurrently, and is not limited to the order of execution shown in FIG. Assuming that a video data stream having multiple video frames is being encoded or decoded by a video encoder or video decoder, process 700, comprising the following steps, is performed to predict motion information of a current sub-block of a current image block of the current video frame:
• Step 701: use motion information of a plurality of reference blocks to predict (derive) first motion information of the right spatial neighboring block 806 of the current image block 600 in the same row as the current sub-block 604 of the current image block 600, and second motion information of the lower spatial neighboring block 808 of the current image block 600 in the same column as the current sub-block 604 of the current image block 600; the reference blocks here may include image blocks spatially and/or temporally adjacent to the current image block 600.
  • the current image block refers to an image block currently to be encoded or decoded.
• Step 703: obtain a first predicted value P h (x, y) of the motion information of the current sub-block 604 by linear (horizontal) interpolation based on the first motion information of the right spatial neighboring block 806 and third motion information of the left spatial neighboring block 802 of the current image block 600 in the same row as the current sub-block 604;
• In an embodiment of step 703, the weighted value of the first motion information of the right spatial neighboring block 806 of the current image block 600 in the same row as the current sub-block 604 and the third motion information of the left spatial neighboring block 802 of the current image block 600 in the same row as the current sub-block 604 is determined as the first predicted value P h (x, y) of the motion information of the current sub-block 604, wherein the ratio between the weighting factor of the third motion information and the weighting factor of the first motion information is determined by the ratio between a first distance, between the current sub-block 604 and the right spatial neighboring block 806 of the current image block 600 in the same row as the current sub-block 604, and a second distance, between the current sub-block 604 and the left spatial neighboring block 802 of the current image block 600 in the same row as the current sub-block 604;
• Step 705: obtain a second predicted value P v (x, y) of the motion information of the current sub-block 604 by linear (vertical) interpolation based on the derived second motion information of the lower spatial neighboring block 808 and fourth motion information of the upper spatial neighboring block 809 of the current image block 600 in the same column as the current sub-block 604;
• In an embodiment of step 705, the weighted value of the fourth motion information of the upper spatial neighboring block 809 of the current image block 600 in the same column as the current sub-block 604 and the second motion information of the lower spatial neighboring block 808 of the current image block 600 in the same column as the current sub-block 604 is determined as the second predicted value P v (x, y) of the motion information of the current sub-block 604, wherein the ratio between the weighting factor of the fourth motion information and the weighting factor of the second motion information is determined by the ratio between a third distance, between the current sub-block 604 and the lower spatial neighboring block 808 of the current image block 600 in the same column as the current sub-block 604, and a fourth distance, between the current sub-block 604 and the upper spatial neighboring block 809 of the current image block 600 in the same column as the current sub-block 604.
• Step 707: determine the motion information P(x, y) of the current sub-block 604 by using the first predicted value P h (x, y) and the second predicted value P v (x, y) of the motion information of the current sub-block 604.
• In an embodiment of step 707, the first predicted value P h (x, y) of the motion information of the current sub-block 604 and the second predicted value P v (x, y) of the motion information of the current sub-block 604 are weighted to obtain the motion information P(x, y) of the current sub-block 604. It should be understood that, in the weighting process, the case where the weighting factors are equal is equivalent to averaging, that is, the mean of the first predicted value and the second predicted value of the motion information of the current sub-block 604 is determined as the motion information of the current sub-block 604.
  • step 701 may include:
• Step 701A-1: obtain the first motion information of the right spatial neighboring block 806 of the current image block 600 in the same row as the current sub-block 604 of the current image block 600 by linear (vertical) interpolation based on fifth motion information of the lower-right spatial neighboring block (referred to as the lower-right temporal neighboring block) 807 of the first collocated block 600' of the current image block 600 and sixth motion information of the upper-right spatial neighboring block 805 of the current image block 600;
• Step 701A-2: obtain the second motion information of the lower spatial neighboring block 808 of the current image block 600 in the same column as the current sub-block 604 of the current image block 600 by linear (horizontal) interpolation based on the fifth motion information of the lower-right spatial neighboring block (referred to as the lower-right temporal neighboring block) 807 of the first collocated block 600' of the current image block 600 and seventh motion information of the lower-left spatial neighboring block 801 of the current image block 600, where the collocated block 600' is an image block of the reference image having the same size, shape, and coordinates as the current image block 600;
• In an embodiment of step 701A-1, the fifth motion information of the lower-right spatial neighboring block 807 of the first collocated block 600' and the sixth motion information of the upper-right spatial neighboring block 805 of the current image block 600 are vertically interpolated according to formula (1) to obtain the first motion information of the right spatial neighboring block 806;
• In an embodiment of step 701A-2, the fifth motion information of the lower-right spatial neighboring block 807 of the first collocated block 600' and the seventh motion information of the lower-left spatial neighboring block 801 of the current image block 600 are horizontally interpolated according to formula (2) to obtain the second motion information of the lower spatial neighboring block 808.
• Here, (x, y) represents the coordinates of the current sub-block 604 relative to the upper-left sub-block of the current image block 600, where x is an integer between 0 and W-1 and y is an integer between 0 and H-1, and W and H represent the width and height of the current image block 600 (measured in sub-blocks); AR represents the sixth motion information of the upper-right spatial neighboring block 805, BR represents the fifth motion information of the lower-right temporal neighboring block 807, and BL represents the seventh motion information of the lower-left spatial neighboring block 801.
• In an embodiment of step 703, the first motion information of the right spatial neighboring block 806 and the third motion information of the left spatial neighboring block 802 are horizontally interpolated according to formula (3) to obtain the first predicted value of the motion information of the current sub-block 604;
• In an embodiment of step 705, the second motion information of the lower spatial neighboring block 808 and the fourth motion information of the upper spatial neighboring block 809 are vertically interpolated according to formula (4) to obtain the second predicted value of the motion information of the current sub-block 604;
• In an embodiment of step 707, the mean of the horizontally and vertically interpolated motion vectors (formula (5)) is taken as the motion vector of the current sub-block;
  • L(-1, y) represents the third motion vector of the left-side spatial neighboring block 802 on the row in which the current sub-block 604 is located;
  • R(W, y) represents the first motion vector of the right-side spatial neighboring block 806 on the row in which the current sub-block 604 is located;
  • A(x, -1) represents the fourth motion vector of the upper-side spatial neighboring block 809 on the column in which the current sub-block 604 is located;
  • B(x, H) represents the second motion vector of the lower-side spatial neighboring block 808 on the column in which the current sub-block 604 is located;
  • P_h(x, y) represents the horizontally interpolated motion vector (i.e., the first predicted value), P_v(x, y) represents the vertically interpolated motion vector (i.e., the second predicted value), and P(x, y) represents the motion vector of the current sub-block 604.
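Steps 703 to 707 can be sketched as follows. This is a simplified floating-point illustration: formulas (3) to (5) are not reproduced in this excerpt, so HEVC-planar-style linear weights consistent with the distance ratios described here are assumed, and all names are illustrative:

```python
def planar_mv(x, y, W, H, L, R, A, B):
    """Planar-mode motion vector for sub-block (x, y) in a W x H block.
    L, R, A, B are the MVs (mvx, mvy) of the left, right, above and below
    neighboring blocks on the current sub-block's row/column."""
    def interp(a, b, pos, size):
        # linear interpolation between two MVs, per component
        return tuple(((size - 1 - pos) * ca + (pos + 1) * cb) / size
                     for ca, cb in zip(a, b))
    ph = interp(L, R, x, W)   # horizontal prediction, cf. formula (3)
    pv = interp(A, B, y, H)   # vertical prediction, cf. formula (4)
    # cf. formula (5): mean of the two predictions
    return tuple((h + v) / 2 for h, v in zip(ph, pv))
```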
  • In another implementation of obtaining the first motion information and the second motion information, step 701 may include:
  • Step 701B-1: determining that the sixth motion information of the upper-right corner spatial neighboring block 805 of the current image block 600 is the first motion information of the right-side spatial neighboring block 806 on the same row as the current sub-block 604; or determining that the mean of the motion information of a plurality of spatial neighboring blocks on the upper-right side of the current image block 600 is the first motion information of the right-side spatial neighboring block 806;
  • Step 701B-2: determining that the seventh motion information of the lower-left spatial neighboring block 801 of the current image block 600 is the second motion information of the lower spatial neighboring block 808 on the same column as the current sub-block 604; or determining that the mean of the motion information of a plurality of spatial neighboring blocks on the lower-left side of the current image block 600 is the second motion information of the lower spatial neighboring block 808.
  • In another implementation, step 701 may include:
  • Step 701C-1: determining the motion information of the first right-side block (referred to as the right-side temporal neighboring reference block) of the first collocated block 600' of the current image block 600 as the first motion information of the right-side spatial neighboring block 806, where the row of the first right-side block in the first collocated block 600' is the same as the row of the current sub-block 604 in the current image block 600; and
  • Step 701C-2: determining the motion information of the first lower-side block (referred to as the lower-side temporal neighboring reference block) of the first collocated block 600' of the current image block 600 as the second motion information of the lower spatial neighboring block 808, where the column of the first lower-side block in the first collocated block 600' is the same as the column of the current sub-block 604 in the current image block 600.
  • In yet another implementation, step 701 may include:
  • Step 701D-1: determining the motion information of the second right-side block (referred to as the right-side temporal neighboring reference block) of the second collocated block of the current image block 600 as the first motion information of the right-side spatial neighboring block 806, where the second collocated block is an image block in the reference image having a specified position offset from the current image block 600, the motion vector of a representative spatial neighboring block of the current image block 600 is used to indicate the specified position offset, and the row of the second right-side block in the second collocated block is the same as the row of the current sub-block 604 in the current image block 600; and
  • Step 701D-2: determining the motion information of the second lower-side block (referred to as the lower-side temporal neighboring reference block) of the second collocated block of the current image block 600 as the second motion information of the lower spatial neighboring block 808, where the second collocated block is an image block in the reference image having a specified position offset from the current image block 600, the motion vector of a representative spatial neighboring block of the current image block 600 is used to indicate the specified position offset, and the column of the second lower-side block in the second collocated block is the same as the column of the current sub-block 604 in the current image block 600.
  • The representative spatial neighboring block here may be one of the left-side spatial neighboring blocks or the upper-side spatial neighboring blocks shown in FIG. 6; for example, it may be the first available left-side spatial neighboring block detected along direction 1, or the first available upper-side spatial neighboring block detected along direction 2; for example, it may be the first available spatial neighboring block obtained by sequentially detecting a plurality of designated spatial neighboring location points of the current image block in merge mode, as shown in FIG. 8A, in the order L→A→AR→BL→AL; for example, it may also be a representative spatial neighboring block randomly selected, or selected according to a predetermined rule, from a plurality of available spatial neighboring blocks obtained by sequential detection. Embodiments of the present application are not limited thereto.
  • In this embodiment, the current sub-block 604 in the current image block 600 is used as a representative to describe the prediction process of its motion vector; for the prediction process of the motion vector of each other sub-block in the current image block 600, reference may be made to this embodiment, and details are not described again.
  • It can be seen that, in the inter prediction process based on the first inter prediction mode for a non-directional motion field (also referred to as the planar mode for inter prediction) of the embodiment of the present application, the mean of horizontal and vertical linear interpolation is used to derive the motion vector of the current sub-block, which can better predict the motion vector of an image block, or of its sub-blocks, having a gradually changing motion field, thereby improving the prediction accuracy of the motion vector.
  • FIG. 9 is a schematic diagram of predicting motion information of a current sub-block in a current image block based on a second inter prediction mode, according to another embodiment of the present application.
  • Based on the second inter prediction mode, the mean of the third motion information of the left-side spatial neighboring block 802 of the current image block 600 on the same row as the current sub-block 604 and the fourth motion information of the upper-side spatial neighboring block 809 of the current image block 600 on the same column as the current sub-block 604 is determined as the motion information of the current sub-block 604; or
  • the mean of the motion information of a plurality of left-side spatial neighboring blocks of the current image block 600 and the motion information of a plurality of upper-side spatial neighboring blocks of the current image block 600 is determined as the motion information of one or more sub-blocks (specifically, all sub-blocks) of the current image block 600.
  • In this embodiment, the current sub-block 604 in the current image block 600 is used as a representative to describe the prediction process of its motion vector; for the prediction process of the motion vector of each sub-block in the current image block 600, reference may be made to this embodiment, and details are not described herein again.
  • It can be seen that, in the inter prediction process based on the second inter prediction mode, the motion vector of the current sub-block is derived directly from the mean of the motion vectors of the left-side spatial neighboring block and the upper-side spatial neighboring block of the current image block, which can better predict the motion vector of an image block, or of its sub-blocks, having a smooth motion field, thereby improving the prediction accuracy of the motion vector.
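The averaging used by the second inter prediction mode can be sketched as a component-wise mean (illustrative code, not the patent's normative definition):

```python
def mean_mv(mvs):
    """Component-wise mean of a list of (mvx, mvy) motion vectors, e.g. the
    left-side and upper-side spatial neighbors of the current sub-block."""
    n = len(mvs)
    return tuple(sum(comp) / n for comp in zip(*mvs))
```

The same helper covers both variants described above: a two-entry list for the left/above pair, or a longer list for the mean over several left-side and upper-side neighbors.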
  • FIGS. 10A-10E are schematic diagrams showing the principle of predicting motion information of a current sub-block in a current image block based on an inter-frame direction prediction mode, according to another embodiment of the present application.
  • the process by which the video encoder 100 or the video decoder 200 (specifically, the inter predictor 110 or 210) predicts the motion information of the current sub-block of the current image block is as follows:
  • a plurality of sub-blocks in the current image block are projected onto the reference row 1010 or the reference column 1020, where:
  • the reference row 1010 does not belong to the current image block 600; it is a row of upper-side spatial neighboring blocks adjacent to the first row of sub-blocks of the current image block. The first column of the reference row 1010 may be aligned with the first column of the current image block, or may be misaligned with it (e.g., extend beyond the first column of the current image block, expanding to the left);
  • the reference column 1020 does not belong to the current image block 600; it is a column of left-side spatial neighboring blocks adjacent to the first column of the current image block. The first row of the reference column 1020 may be aligned with the first row of the current image block; the last row of the reference column 1020 may be aligned with the last row of the current image block, or may be misaligned with it (e.g., extend beyond the last row of the current image block 600, expanding downward). The reference block at which the reference row 1010 and the reference column 1020 intersect is the upper-left corner spatial neighboring block of the current image block;
  • the motion information of the two target reference blocks is weighted to obtain the motion information of the current sub-block, or the motion information of the two target reference blocks and of their left and right neighboring blocks is weighted to obtain the motion information of the current sub-block; it should be understood that the former weights two motion vectors, while the latter weights four motion vectors.
  • The weighting factor here is determined according to the distance between the reference block and the projection point: the closer the distance, the larger the weight.
  • The target reference block (also referred to as a projected reference block) referred to herein is the reference block corresponding to the current sub-block, determined on the reference row 1010 or the reference column 1020 according to the prediction direction (angle) corresponding to the inter-frame direction prediction mode; this correspondence can be understood as the target reference block lying along the current prediction direction from the current sub-block.
  • FIG. 10C illustrates the projection of another inter-frame direction prediction mode (the direct horizontal mode, such as mode 4 in FIG. 5 or Table 2, or mode 10 in Table 3); the dotted arrow indicates the prediction direction corresponding to such an inter-frame direction prediction mode, i.e., the horizontal direction. R(-1, 0) represents the motion vector of the left-side spatial neighboring block (-1, 0) of the current image block, and so on by analogy, which is not repeated here;
  • FIG. 10D illustrates the projection of another inter-frame direction prediction mode (the direct vertical mode, such as mode 8 in FIG. 5 or Table 2, or mode 26 in Table 3); the dashed arrow indicates the prediction direction corresponding to such an inter-frame direction prediction mode, i.e., the vertical direction. R(0, -1) represents the motion vector of the upper-side spatial neighboring block (0, -1) of the current image block, and so on by analogy, which is not repeated here;
  • FIG. 10E illustrates the projection of another inter-frame direction prediction mode (mode 23 in Table 3); the bold arrow indicates the prediction direction corresponding to inter-frame direction prediction mode 23. For example, the weighted value of the motion vectors of the two adjacent upper-side spatial neighboring blocks 809 and 810 is the motion vector of the current sub-block 604. It should be noted that, in FIG. 10E, as shown by the thin arrows, the motion vectors of some left-side neighboring reference blocks are projected to extend the reference row; for example, the motion vector of the left-side spatial neighboring block 802 is projected or mapped onto the upper-left spatial neighboring block 811 in the extended portion of the reference row.
  • In a first example, the motion information weighting in the embodiment of the present application may use the 4-tap cubic intra-frame interpolation filtering in JVET, where:
  • i is the integer part of the projection displacement, and f is the fractional part of the projection displacement;
  • A is the angle parameter;
  • x is the abscissa of the current sub-block, and y is the ordinate of the current sub-block;
  • (x, y) represents the coordinates of the current sub-block relative to the upper-left sub-block of the current image block;
  • R(x+i-1, -1), R(x+i, -1), R(x+i+1, -1), and R(x+i+2, -1) represent the motion vectors of four reference blocks adjacent to each other; correspondingly, w[0], w[1], w[2], and w[3] represent the weighting factors of the aforementioned four reference blocks;
  • P(x, y) represents the motion vector of the current sub-block.
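The 4-tap weighting of the first example can be sketched as follows. The displacement computation (disp = (y+1)·A in 1/32-sample units, i = disp >> 5, f = disp & 31) follows HEVC-style angular prediction and is an assumption here, and the cubic weight table itself is left to the caller rather than reproduced; all names are illustrative:

```python
def directional_mv_4tap(x, y, A, ref_row, cubic_weights):
    """Weighted 4-tap MV interpolation along a prediction angle.
    ref_row maps a column index to the (mvx, mvy) of the reference block on
    row -1; cubic_weights(f) returns the four weights w[0..3] for fractional
    phase f (the actual JVET cubic table is assumed available to the caller)."""
    disp = (y + 1) * A           # projection displacement (1/32-sample units)
    i, f = disp >> 5, disp & 31  # integer and fractional parts
    w = cubic_weights(f)
    taps = [ref_row[x + i + k] for k in (-1, 0, 1, 2)]
    # weighted sum per MV component, normalized by the sum of the weights
    return tuple(sum(wk * mv[c] for wk, mv in zip(w, taps)) / sum(w)
                 for c in range(2))
```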
  • In a second example, the motion information weighting in the embodiment of the present application may use the 4-tap Gaussian intra-frame interpolation filtering in JVET, replacing the cubic intra-frame interpolation filtering in the first example with the Gaussian intra-frame interpolation filtering, which is not repeated here.
  • In a third example, the motion information weighting in the embodiment of the present application may use the 2-tap intra-frame interpolation filtering in HEVC, where:
  • (x, y) represents the coordinates of the current sub-block relative to the upper-left sub-block of the current image block;
  • R(x+i, -1) and R(x+i+1, -1) represent the motion vectors of the two target reference blocks adjacent to each other; correspondingly, (32-f) and f represent the weighting factors of the two target reference blocks, and P(x, y) represents the motion vector of the current sub-block.
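The 2-tap weighting of the third example can be sketched as follows, applying the HEVC-style ((32-f)·R(x+i,-1) + f·R(x+i+1,-1) + 16) >> 5 interpolation to each motion vector component; the displacement computation (disp = (y+1)·A, i = disp >> 5, f = disp & 31) is an assumed HEVC-style projection, and the names are illustrative:

```python
def directional_mv_2tap(x, y, A, ref_row):
    """HEVC-style 2-tap interpolation applied to motion vectors.
    ref_row maps a column index to the (mvx, mvy) of the reference block."""
    disp = (y + 1) * A           # projection displacement (1/32-sample units)
    i, f = disp >> 5, disp & 31  # integer and fractional parts
    a, b = ref_row[x + i], ref_row[x + i + 1]
    # P(x, y) = ((32 - f) * R(x+i, -1) + f * R(x+i+1, -1) + 16) >> 5
    return tuple(((32 - f) * ca + f * cb + 16) >> 5 for ca, cb in zip(a, b))
```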
  • It can be seen that, in the inter prediction process based on the inter-frame direction prediction mode, the motion vectors of one or more sub-blocks along the prediction direction are identical to each other, and the value of the motion vector depends on the motion vector of the target reference block; this can better predict the motion vector of an image block, or of its sub-blocks, having a directional motion field, improving the prediction accuracy of the motion vector.
  • It should be noted that the motion information (i.e., each set of motion information) of each reference block may include a motion vector, a reference image list, and a reference image index corresponding to the reference image list. The reference image index is used to identify the reference image pointed to by the motion vector in the specified reference image list (RefPicList0 or RefPicList1). The motion vector (MV) refers to the positional offset in the horizontal and vertical directions, i.e., the horizontal component and the vertical component of the motion vector.
  • The video encoder 100 or the video decoder 200 may be further configured to (or the methods of various embodiments of the present application may further include): before performing linear interpolation, weighting, or averaging of multiple sets of motion information, unifying the multiple sets of motion information to a target reference image index of a specified reference image list.
  • The specified reference image list here may be reference image list 0 or list 1; the target reference image index here may be 0, 1, or another value, or may be the reference image index that is used most frequently in the specified reference image list, for example, the reference image index most frequently pointed to/used by the motion vectors of all the reference blocks, or of the reference blocks participating in the weighting.
  • If the reference image index of the current motion information differs from the target reference image index, the motion vector corresponding to the specified reference image list included in the current motion information is scaled based on temporal distance to obtain a motion vector pointing to the reference frame indicated by the target reference image index; specifically, the motion vector is scaled based on the ratio between the temporal distance from the current image to the reference image indicated by the reference image index of the current motion information and the temporal distance from the current image to the reference image indicated by the target reference image index.
  • For example, when the motion vectors of a plurality of reference blocks are interpolated, if the motion information of the plurality of reference blocks includes reference image indexes corresponding to list 0 (for example, the reference image index of the first reference block corresponding to list 0 is 0, and the reference image index of the second reference block corresponding to list 0 is 1), then, assuming that the reference image index of the current image block corresponding to list 0 is 0, the motion vector of the second reference block (corresponding to list 0) is scaled based on temporal distance to obtain a motion vector pointing to the reference frame indicated by reference image index 0.
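The temporal-distance scaling described above can be sketched as follows. This is a simplified floating-point version (codecs such as HEVC use a clipped fixed-point equivalent), with picture order count (POC) assumed as the measure of temporal distance:

```python
def scale_mv(mv, poc_cur, poc_ref, poc_target_ref):
    """Scale motion vector mv (mvx, mvy) by the ratio of temporal distances.
    poc_ref: picture the MV currently points to; poc_target_ref: picture
    indicated by the target reference image index."""
    td = poc_cur - poc_ref          # distance of the original reference
    tb = poc_cur - poc_target_ref   # distance of the target reference
    if td == tb:
        return mv                   # already points to the target picture
    return tuple(c * tb / td for c in mv)
```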
  • In some possible implementations, the video encoder 100 or the video decoder 200 may be further configured to (or the methods of various embodiments of the present application may further include): selectively filtering the motion information of the target reference block based on the prediction direction or angle corresponding to the inter-frame direction prediction mode. For example, when the determined inter prediction mode is an inter-frame direction prediction mode 2, 6, or 10 with a relatively large angle, the motion vector of the target reference block is filtered before the motion vector of the current sub-block is estimated using the motion vector of the target reference block; for example, the motion information of the neighboring reference blocks can be used to filter the motion information of the target reference block with a {1/4, 2/4, 1/4} filter, where the neighboring reference blocks are reference blocks directly adjacent to the target reference block (e.g., adjacent on the left and right, or adjacent above and below).
  • Alternatively, the motion information of the target reference block may be selectively filtered based on the block size and on the prediction direction or angle corresponding to the inter-frame direction prediction mode; for example, the larger the block and the larger the angle of the inter-frame direction prediction mode, the greater the necessity of this pre-filtering before the motion vector of the current sub-block is estimated using the motion vector of the target reference block.
  • In addition, because the prediction blocks obtained from the reference image for adjacent sub-blocks are not necessarily adjacent in the reference image, discontinuities may arise between the prediction blocks of the boundary sub-blocks, resulting in discontinuities in the residual and affecting the coding/decoding performance of the residual; therefore, filtering the motion vectors of the sub-blocks at the image block boundaries may be considered.
  • The video encoder 100 or the video decoder 200 may be further configured to (or the methods of various embodiments of the present application may further include): filtering the motion information of the boundary sub-blocks of the current image block, where a boundary sub-block is one of the one or more sub-blocks located at the boundary of the current image block.
  • In one example, the motion information of the boundary sub-blocks of the current image block is filtered under a specific inter prediction mode (e.g., the second inter prediction mode, the vertical prediction mode, the horizontal prediction mode, etc.); filtering can be performed with a {1/4, 3/4} or {1/4, 2/4, 1/4} filter, so that the motion vector of the boundary sub-block changes more gently. It should be understood that this application is not limited thereto.
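The {1/4, 3/4} boundary filtering can be sketched as follows (illustrative code; which neighbor supplies the 1/4 tap follows the filter description above, with the spatial neighbor just outside the current image block assumed to be that neighbor):

```python
def smooth_boundary_mv(outside_mv, boundary_mv):
    """{1/4, 3/4} filter: blend the boundary sub-block MV with the MV of the
    spatial neighbor just outside the current image block, per component."""
    return tuple(0.25 * o + 0.75 * b for o, b in zip(outside_mv, boundary_mv))
```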
  • FIG. 11 is a schematic block diagram of an inter prediction apparatus 1100 in the embodiment of the present application. It should be noted that the inter prediction apparatus 1100 is applicable both to inter prediction of decoded video images and to inter prediction of encoded video images. It should be understood that the inter prediction apparatus 1100 here may correspond to the inter predictor 110 in FIG. 2A, or may correspond to the inter predictor 210 in FIG. 2B. The inter prediction apparatus 1100 may include:
  • the inter prediction mode determining unit 1101, configured to determine an inter prediction mode used for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes a plurality of inter prediction modes for non-directional motion fields, or a plurality of inter prediction modes for directional motion fields;
  • the inter prediction processing unit 1102 is configured to perform inter prediction on the current image block based on the determined inter prediction mode.
  • Specifically, the inter prediction processing unit 1102 is configured to: predict, according to the determined inter prediction mode, the motion information of one or more sub-blocks (specifically, of each sub-block or of all sub-blocks) in the current image block, and perform inter prediction on the current image block using the motion information of the one or more sub-blocks in the current image block. It should be understood that after the motion vector of one or more sub-blocks in the current image block has been predicted, the prediction block of the corresponding sub-block may be generated by the motion compensation process, thereby obtaining the prediction block of the current image block.
  • It can be seen that the inter prediction apparatus of the embodiment of the present application can predict the motion information (e.g., motion vectors) of one or more sub-blocks (specifically, of each sub-block or of all sub-blocks) in the current image block based on a directional or non-directional inter prediction mode; in this case, the predicted motion vector of the current image block is substantially close to the motion vector obtained by a motion estimation method, so that no motion vector difference (MVD) needs to be transmitted when encoding, which saves code rate at the same video quality; therefore, the codec performance of the inter prediction apparatus of the embodiment of the present application is further improved.
  • When the inter prediction mode determining unit 1101 determines the first inter prediction mode (the planar mode for inter prediction) for a non-directional motion field, in the aspect of acquiring the first motion information of the right-side spatial neighboring block of the current image block on the same row as the current sub-block, the inter prediction processing unit 1102 is specifically configured to:
  • obtain the first motion information based on linear interpolation of the fifth motion information of the lower-right corner block of the first collocated block of the current image block and the sixth motion information of the upper-right corner spatial neighboring block of the current image block, where the first collocated block is an image block of the reference image having the same size, shape, and coordinates as the current image block; or
  • determine that the motion information of the first right-side block (the right-side temporal neighboring reference block) of the first collocated block of the current image block is the first motion information, where the row of the first right-side block in the first collocated block is the same as the row of the current sub-block in the current image block; or
  • determine that the motion information of the second right-side block (the right-side temporal neighboring reference block) of the second collocated block of the current image block is the first motion information, where the second collocated block is an image block in the reference image having a specified position offset from the current image block, the motion vector of a representative spatial neighboring block of the current image block is used to indicate the specified position offset, and the row of the second right-side block in the second collocated block is the same as the row of the current sub-block in the current image block; or
  • determine that the sixth motion information of the upper-right corner spatial neighboring block of the current image block is the first motion information.
  • In the aspect of acquiring the second motion information of the lower-side spatial neighboring block of the current image block on the same column as the current sub-block, the inter prediction processing unit 1102 is specifically configured to:
  • obtain the second motion information based on linear interpolation of the fifth motion information of the lower-right corner block of the first collocated block of the current image block and the seventh motion information of the lower-left corner spatial neighboring block of the current image block, where the first collocated block is an image block of the reference image having the same size, shape, and coordinates as the current image block; or
  • determine that the motion information of the first lower-side block (the lower-side temporal neighboring reference block) of the first collocated block is the second motion information, where the column of the first lower-side block in the first collocated block is the same as the column of the current sub-block in the current image block; or
  • determine that the motion information of the second lower-side block (the lower-side temporal neighboring reference block) of the second collocated block of the current image block is the second motion information, where the second collocated block is an image block in the reference image having a specified position offset from the current image block, the motion vector of a representative spatial neighboring block of the current image block is used to indicate the specified position offset, and the column of the second lower-side block in the second collocated block is the same as the column of the current sub-block in the current image block; or
  • determine that the seventh motion information of the lower-left corner spatial neighboring block of the current image block is the second motion information.
  • In the aspect of obtaining the first predicted value, the inter prediction processing unit 1102 is specifically configured to: determine that the weighted value of the first motion information and the third motion information is the first predicted value of the motion information of the current sub-block, where the ratio between the weighting factor of the third motion information and the weighting factor of the first motion information is determined based on the ratio between the first distance, between the current sub-block and the right-side spatial neighboring block of the current image block on the same row as the current sub-block, and the second distance, between the current sub-block and the left-side spatial neighboring block of the current image block on the same row as the current sub-block.
  • In the aspect of obtaining the second predicted value, the inter prediction processing unit 1102 is specifically configured to: determine that the weighted value of the second motion information and the fourth motion information is the second predicted value of the motion information of the current sub-block, where the ratio between the weighting factor of the fourth motion information and the weighting factor of the second motion information is determined based on the ratio between the third distance, between the current sub-block and the lower-side spatial neighboring block of the current image block on the same column as the current sub-block, and the fourth distance, between the current sub-block and the upper-side spatial neighboring block of the current image block on the same column as the current sub-block.
  • It can be seen that, in the inter prediction process based on the first inter prediction mode for a non-directional motion field (also referred to as the planar mode for inter prediction), the mean of horizontal and vertical linear interpolation is used in the embodiment of the present application to derive the motion vector of the current sub-block, and the motion vector of an image block, or of its sub-blocks, having a gradually changing motion field can be better predicted, thereby improving the prediction accuracy of the motion vector.
  • When the inter prediction mode determining unit 1101 determines the second inter prediction mode for a non-directional motion field, the inter prediction processing unit 1102 is specifically configured to:
  • determine that the mean of the third motion information of the left-side spatial neighboring block and the fourth motion information of the upper-side spatial neighboring block is the motion information of the current sub-block; or
  • determine that the mean of the motion information of a plurality of left-side spatial neighboring blocks of the current image block and the motion information of a plurality of upper-side spatial neighboring blocks of the current image block is the motion information of one or more sub-blocks (e.g., all sub-blocks) of the current image block.
  • It can be seen that, in the inter prediction process based on the second inter prediction mode, the motion vectors of the directly adjacent left-side spatial neighboring block and upper-side spatial neighboring block of the current image block are used to derive the motion vector of the current sub-block, and the motion vector of an image block, or of its sub-blocks, having a smooth motion field can be well predicted, thereby improving the prediction accuracy of the motion vector.
  • When the inter prediction mode determining unit 1101 determines an inter-frame direction prediction mode for a directional motion field, the inter prediction processing unit 1102 is specifically configured to obtain the motion information of the current sub-block based on the motion information of the target reference block, where the target reference block is the reference block corresponding to the current sub-block, determined on the reference row or the reference column according to the prediction direction (angle) corresponding to the inter-frame direction prediction mode.
  • It can be seen that, in the inter prediction process based on the inter-frame direction prediction mode, the motion vectors of one or more sub-blocks along the prediction direction are identical to each other, and the value of the motion vector depends on the motion vector of the target reference block; the motion vector of an image block, or of its sub-blocks, having a directional motion field can thus be better predicted, improving the prediction accuracy of the motion vector.
  • In some possible implementations, the inter prediction processing unit 1102 is further configured to: before performing linear interpolation, weighting, or averaging of the plurality of sets of motion information, if the reference image index of the current motion information differs from the target reference image index, scale the motion vector corresponding to the specified reference image list included in the current motion information based on temporal distance, to obtain a motion vector pointing to the reference frame indicated by the target reference image index.
  • When the apparatus 1100 is configured to decode a video image, the apparatus 1100 may further include:
  • an inter prediction data acquiring unit (not illustrated in the figure), configured to receive inter prediction data including a first identifier for indicating whether to use the candidate inter prediction mode set for inter prediction of the current image block;
  • accordingly, the inter prediction mode determining unit 1101 is specifically configured to determine, from the candidate inter prediction mode set, the inter prediction mode used for performing inter prediction on the current image block when the inter prediction data indicates that the current image block is predicted using the candidate inter prediction mode set;
  • if the inter prediction data further includes a second identifier indicating the inter prediction mode of the current image block, the inter prediction mode determining unit 1101 determines that the inter prediction mode indicated by the second identifier is the inter prediction mode used for performing inter prediction on the current image block;
  • if the inter prediction data does not include the second identifier, the inter prediction mode determining unit 1101 is specifically configured to determine that the first inter prediction mode (also referred to as the planar mode for inter prediction) for a non-directional motion field is the inter prediction mode used for performing inter prediction on the current image block.
  • When the apparatus 1100 is used to encode a video image, the apparatus 1100 may further include:
  • the inter prediction mode determining unit 1101, specifically configured to determine that the inter prediction mode with the smallest rate-distortion cost for encoding the current image block in the candidate inter prediction mode set is the inter prediction mode used for inter prediction of the current image block.
  • It should be noted that each module in the inter prediction apparatus in the embodiment of the present application is a functional body that implements the various execution steps included in the inter prediction method of the present application, that is, a functional body capable of completely realizing the steps of the inter prediction method herein as well as the extensions and variations of these steps; for details, please refer to the description of the inter prediction method herein, which, for the sake of brevity, is not repeated.
  • FIG. 12 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as decoding device 1200 for short) in an embodiment of the present application.
  • the decoding device 1200 can include a processor 1210, a memory 1230, and a bus system 1250.
  • The processor and the memory are connected by the bus system; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory.
  • The memory of the decoding device stores program code, and the processor can invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein, particularly the video encoding or decoding methods in the various new inter prediction modes and the methods of predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here.
  • the processor 1210 may be a central processing unit (CPU), or the processor 1210 may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 1230 can include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 1230.
  • Memory 1230 can include code and data 1231 that are accessed by processor 1210 using bus 1250.
  • the memory 1230 may further include an operating system 1233 and applications 1235. The applications 1235 include at least one program that allows the processor 1210 to perform the video encoding or decoding methods described herein (in particular, the inter prediction method or motion information prediction method described herein).
  • application 1235 can include applications 1 through N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding methods described herein.
  • the bus system 1250 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 1250 in the figure.
  • decoding device 1200 may also include one or more output devices, such as display 1270.
  • display 1270 can be a tactile display that combines the display with a tactile unit that operatively senses a touch input.
  • Display 1270 can be coupled to processor 1210 via bus 1250.
  • the computer readable medium may include a computer readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol).
  • a computer readable medium may generally correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application.
  • the computer program product can comprise a computer readable medium.
  • such computer readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • it should be understood, however, that the computer readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media.
  • disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.
  • instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • the functions described in the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques may be fully implemented in one or more circuits or logic elements.
  • the techniques of the present application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • various components, modules, or units are described herein to emphasize functional aspects of the apparatus configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application disclose an inter prediction method for video images and related products. The inter prediction method includes: determining an inter prediction mode used for performing inter prediction on a current image block, where the inter prediction mode is one of a candidate inter prediction mode set, and the candidate inter prediction mode set includes multiple inter prediction modes for a non-directional motion field and/or multiple inter prediction modes for a directional motion field; and performing inter prediction on the current image block based on the determined inter prediction mode. The embodiments also disclose motion information prediction methods based on the different inter prediction modes. The solutions of the embodiments help improve the prediction accuracy of the motion information (e.g., motion vectors) of an image block, saving bit rate at the same video quality and thereby improving coding and decoding performance.

Description

视频图像的帧间预测方法、装置及编解码器
本申请要求于2017年9月29日提交中国专利局、申请号为201710912607.0、申请名称为“视频图像的帧间预测方法、装置及编解码器”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频编解码技术领域,尤其涉及一种视频图像的帧间预测方法、装置以及相应的编码器和解码器。
背景技术
数字视频能力可并入到多种多样的装置中,包含数字电视、数字直播系统、无线广播系统、个人数字助理(PDA)、膝上型或桌上型计算机、平板计算机、电子图书阅读器、数码相机、数字记录装置、数字媒体播放器、视频游戏装置、视频游戏控制台、蜂窝式或卫星无线电电话(所谓的“智能电话”)、视频电话会议装置、视频流式传输装置及其类似者。数字视频装置实施视频压缩技术,例如,在由MPEG-2、MPEG-4、ITU-T H.263、ITU-T H.264/MPEG-4第10部分高级视频编码(AVC)定义的标准、视频编码标准H.265/高效视频编码(HEVC)标准以及此类标准的扩展中所描述的视频压缩技术。视频装置可通过实施此类视频压缩技术来更有效率地发射、接收、编码、解码和/或存储数字视频信息。
视频压缩技术执行空间(图像内)预测和/或时间(图像间)预测以减少或去除视频序列中固有的冗余。对于基于块的视频编码,视频条带(即,视频帧或视频帧的一部分)可分割成若干图像块,所述图像块也可被称作树块、编码单元(CU)和/或编码节点。使用关于同一图像中的相邻块中的参考样本的空间预测来编码图像的待帧内编码(I)条带中的图像块。图像的待帧间编码(P或B)条带中的图像块可使用相对于同一图像中的相邻块中的参考样本的空间预测或相对于其它参考图像中的参考样本的时间预测。图像可被称作帧,且参考图像可被称作参考帧。
其中,包含高效视频编码(HEVC)标准在内的各种视频编码标准提出了用于图像块的预测性编码模式,即基于已经编码的视频数据块来预测当前待编码的块。在帧内预测模式中,基于与当前块在相同的图像中的一或多个先前经解码相邻块来预测当前块;在帧间预测模式中,基于不同图像中的已经解码块来预测当前块。
然而,现有的几种帧间预测模式,例如合并模式(Merge mode)、跳过模式(Skip mode)和高级运动矢量预测模式(AMVP mode)仍然无法满足实际的不同应用场景对运动矢量的预测准确性的要求。
发明内容
本申请实施例提供一种视频图像的帧间预测方法、装置及相应的编码器和解码器,一定程度上提高图像块的运动信息的预测准确性,从而提高编解码性能。
第一方面,本申请实施例提供了一种视频图像的帧间预测方法,包括:确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式;以及,基于所述确定的帧间预测模式,对所述当前图像块执行帧间预测。
在可行的实施方式下,所述基于确定的帧间预测模式,对所述当前图像块执行帧间预测,包括:
基于所述确定的帧间预测模式,预测所述当前图像块中一个或多个子块(具体可以是每个子块或所有子块)的运动信息,并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测。
应当理解的是,这里的候选帧间预测模式集合可以是一个模式,也可以是多个模式。当候选帧间预测模式集合是一个模式(例如用于非方向性的运动场的第一帧间预测模式,亦称为用于帧间预测的平面planar模式)时,可以确定该模式为用于对当前图像块进行帧间预测的帧间预测模式。当候选帧间预测模式集合是多个模式时,可以缺省确定该集合中排列优先级最高或排列位置最前面的模式为用于对当前图像块进行帧间预测的帧间预测模式;或者,可以确定第二标识指示的模式为用于对当前图像块进行帧间预测的帧间预测模式;或者,可以确定用于非方向性的运动场的第一帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
其中,本申请各实施例中提及的用于非方向性的运动场的多种帧间预测模式,例如可以包括:用于非方向性的运动场的第一帧间预测模式(亦可称为用于帧间预测的平面planar模式,或者插值帧间预测模式)、用于非方向性的运动场的第二帧间预测模式(亦可称为用于帧间预测的直流系数DC模式)。
其中,本申请各实施例中提及的用于方向性的运动场的多种帧间预测模式,例如可以包括用于帧间预测的各种方向预测模式。
可见,通过考虑运动场的特征将新的帧间预测模式分为用于非方向性的运动场的帧间预测模式和/或用于方向性的运动场的帧间预测模式,无论基于哪一种用于方向性或非方向性的帧间预测模式均能预测出当前图像块中一个或多个子块的运动信息(例如运动矢量),这样的话,从结果来看,预测出的当前图像块的运动矢量基本上接近使用运动估算方法得到的运动矢量,提升了运动矢量的预测准确性,当编码时无需传送运动矢量差值MVD,在视频质量相同的情况下节省了码率,编解码性得到进一步的改善。
例如,在一些可能的实施场景下,所述确定的帧间预测模式为用于非方向性的运动场的第一帧间预测模式(亦可称为用于帧间预测的平面planar模式),相应地,所述预测当前图像块中当前子块的运动矢量包括:
利用多个参考块的运动矢量,预测或推导与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动矢量,及与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动矢量,其中所述多个参考块包括当前图像块的空域邻近参考块和/或时域邻近参考块;
基于所述第一运动矢量和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动矢量的水平插值,得到当前子块的运动矢量的第一预测值;在可能的实现方式下,例如,
确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的第一距离,与,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的第二距离之间的比例确定的;
基于所述第二运动矢量和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动矢量的垂直插值,得到当前子块的运动矢量的第二预测值;在可能的实现方式下,例如,
确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的第三距离,与,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的第四距离之间的比例确定的;
利用所述当前子块的运动矢量的第一预测值和所述当前子块的运动矢量的第二预测值,确定所述当前子块的运动矢量;在可能的实施方式下,例如,对所述当前子块的运动矢量的第一预测值和所述当前子块的运动矢量的第二预测值进行加权处理,得到所述当前子块的运动矢量。
其中,所述预测或推导与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动信息的方式可能是多种多样的,如下:
第一种方式:基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
第二种方式:确定所述当前图像块的第一并置块(co-located)的第一右侧空域邻近块的运动信息为所述第一运动信息,其中所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者
第三种方式:确定所述当前图像块的第二并置块(co-located)的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者
第四种方式:确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的多个空域邻近块的运动信息的均值为所述第一运动信息。
应当理解的是,上述四种推导方式可以按照一定逻辑组合使用,例如如果使用上述第一种方式推导不出来第一运动信息的情况下,进一步的使用上述第四种方式来推导第一运动信息;又例如,依序使用第一、第二、第三和第四种方式推导,以得到第一运动信息。
其中,所述预测或推导与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的方式可能是多种多样的,如下:
第一种方式:基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
第二种方式:确定所述当前图像块的第一并置块(co-located)的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
第三种方式:确定所述当前图像块的第二并置块(co-located)的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
第四种方式:确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的多个空域邻近块的运动信息的均值为所述第二运动信息。
应当理解的是,上述四种推导方式可以按照一定逻辑组合使用,例如如果使用上述第一种方式推导不出来第二运动信息的情况下,进一步的使用上述第四种方式来推导第二运动信息;又例如,依序使用第一、第二、第三和第四种方式推导,以得到第二运动信息。
可见,基于用于非方向性的运动场的第一帧间预测模式的帧间预测过程中,使用水平和垂直线性插值的加权值来推导当前子块的运动矢量,能较好的预测具有渐变的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
又例如,在一些可能的实施场景下,所述确定的帧间预测模式为用于非方向性的运动场的第二帧间预测模式(亦称为用于帧间预测的直流系数DC模式),所述预测当前图像块中当前子块的运动信息包括:
确定与当前图像块的当前子块相同行上的、所述当前图像块的左侧空域邻近块的第三运动信息和与所述当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息的均值为所述当前子块的运动信息;或者,
确定当前图像块的多个左侧空域邻近块的运动信息和当前图像块的多个上侧空域邻近块的运动信息的均值为所述当前图像块的一个或多个子块(例如所有子块)的运动信息。
可见,基于用于非方向性的运动场的第二帧间预测模式的帧间预测过程中,使用当前图像块的直接左侧空域邻近块、上侧空域邻近块的运动矢量的均值来推导当前子块的运动矢量,能较好的预测具有平滑的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
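上文所述的用于帧间预测的DC模式可用如下示意性代码概括:取当前图像块多个左侧、上侧空域邻近块运动矢量的均值,作为当前图像块一个或多个子块共用的运动矢量。以下仅为基于上文文字描述的简化示意(假设运动矢量以 (mvx, mvy) 二元组表示;函数名与数据组织方式均为本文为举例引入的假设,并非规范实现):

```python
def dc_inter_mv(left_mvs, above_mvs):
    """用于帧间预测的DC模式的示意:对左侧一列与上侧一行空域邻近块的
    运动矢量逐分量求均值,作为当前图像块(或其所有子块)的运动矢量。"""
    mvs = list(left_mvs) + list(above_mvs)
    n = len(mvs)
    # 逐分量求均值;整数运动矢量场景下可按编解码器约定改为移位/取整
    mvx = sum(mv[0] for mv in mvs) / n
    mvy = sum(mv[1] for mv in mvs) / n
    return (mvx, mvy)
```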
再例如,在一些可能的实施场景下,所述确定的帧间预测模式为用于方向性的运动场的帧间方向预测模式(亦称为用于帧间预测的方向预测模式),所述预测当前图像块中当前子块的运动信息包括:
确定一个目标参考块(亦称为投影参考块)的运动信息为当前图像块的当前子块的运动信息;或者,
确定两个目标参考块的运动信息的加权值为所述当前子块的运动信息,或者
确定所述两个目标参考块及所述两个目标参考块的两个邻近块的运动信息的加权值为所述当前子块的运动信息;
其中所述目标参考块是根据所述帧间方向预测模式对应的预测方向(角度)在参考行或参考列上确定的与当前子块对应的参考块。
其中,所述参考行或参考列均不属于所述当前图像块,且所述参考行上的参考块为与所述当前图像块的第一行子块相邻的一行上侧空域邻近块,所述参考列上的参考块为与所述当前图像块第一列子块相邻的一列左侧空域邻近块。
可见,基于用于方向性的运动场的帧间方向预测模式的帧间预测过程中,沿着预测方向的一个或多个子块的运动矢量彼此相同且运动矢量的值取决于目标参考块的运动矢量,从而能较好的预测具有方向性的运动场的图像块或其子块的运动矢量,提高了运动矢量的预测准确性。
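帧间方向预测模式"沿预测方向在参考行上确定目标参考块、并对两个目标参考块的运动矢量加权"的过程,可用如下示意性代码概括。角度的整数化表示参考了HEVC帧内角度预测的常见做法;表2/表3的具体角度取值本文未能提取,因此 angle、offset、frac_bits 等参数均为本文为举例引入的假设,并非规范实现:

```python
def directional_mv(above_ref_mvs, x, y, angle, offset, frac_bits=5):
    """沿预测方向把当前子块 (x, y) 投影到上侧参考行:
    整数位移确定目标参考块下标,分数位移作为两个相邻参考块的加权因子。"""
    disp = (y + 1) * angle                     # 当前子块沿角度的总位移
    idx = offset + x + (disp >> frac_bits)     # 目标参考块下标(整数部分)
    w = disp & ((1 << frac_bits) - 1)          # 分数部分,作加权因子
    mv0 = above_ref_mvs[idx]
    mv1 = above_ref_mvs[idx + 1]
    mvx = ((1 << frac_bits) - w) * mv0[0] + w * mv1[0]
    mvy = ((1 << frac_bits) - w) * mv0[1] + w * mv1[1]
    r = 1 << (frac_bits - 1)                   # 四舍五入偏置
    return ((mvx + r) >> frac_bits, (mvy + r) >> frac_bits)
```

angle 为 0 时退化为垂直预测:同一列的子块直接复用正上方目标参考块的运动矢量。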
其中,本申请各实施例中提及的运动信息主要指运动矢量,但应当理解的是,运动信息还可以包括参考图像信息,其中参考图像信息可以包括但不限于参考图像列表和参考图像列表对应的参考图像索引。
进一步的,为了提高运动矢量预测的有效性,在执行多组运动信息的线性插值或加权或求均值之前,本申请各个实施例的方法还可以包括:
确定当前图像块的、与指定参考图像列表对应的目标参考图像索引;
判断所述多组运动信息各自包括的与所述指定参考图像列表对应的参考图像索引是否与所述目标参考图像索引相同;
如果当前运动信息包括的与所述指定参考图像列表对应的参考图像索引不同于所述目标参考图像索引,则对当前运动信息包括的与所述指定参考图像列表对应的运动矢量进行基于时域距离的缩放处理,以得到指向所述目标参考图像索引的参考帧的运动矢量。
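上述基于时域距离的缩放处理可用如下示意性代码概括。原文未给出缩放公式,此处采用HEVC中常见的按POC距离比例(tb/td)整数化缩放的做法作为参考性示意,式中常量(16384、32、4096、127 等)均属该参考做法,并非本文给出的内容:

```python
def scale_mv(mv, cur_poc, ref_poc, target_ref_poc):
    """把指向 ref_poc 的运动矢量按时域距离比例缩放为指向 target_ref_poc。"""
    tb = cur_poc - target_ref_poc  # 当前帧到目标参考帧的时域距离
    td = cur_poc - ref_poc         # 当前帧到原参考帧的时域距离
    if tb == td:
        return mv                  # 参考帧相同,无需缩放
    tx = (16384 + abs(td) // 2) // td
    factor = max(-4096, min(4095, (tb * tx + 32) >> 6))

    def s(c):
        p = factor * c
        v = (abs(p) + 127) >> 8    # 幅值四舍五入
        return max(-32768, min(32767, -v if p < 0 else v))

    return (s(mv[0]), s(mv[1]))
```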
进一步的,为了减少由参考块运动信息的边缘edge效应引起的当前块运动信息的轮廓效应,本申请各个实施例的方法还可以包括:基于与帧间方向预测模式对应的预测方向或角度,对目标参考块的运动信息选择性地进行滤波。
进一步的,为了减少块边界运动信息的不连续性,本申请各个实施例的方法还可以包括:对当前图像块的边界子块的运动信息进行滤波,该边界子块为当前图像块中位于边界的一个或多个子块。尤其是,对特定帧间预测模式(例如第二帧间预测模式、垂直预测模式、水平预测模式等)下当前图像块的边界子块的运动信息进行滤波。
其中,当用于解码视频图像,本申请各实施例中的帧间预测方法还可以包括:
解码码流,以得到包括第一标识的帧间预测数据,其中所述第一标识用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测;
相应地,所述确定用于对当前图像块进行帧间预测的帧间预测模式,包括:当所述帧间预测数据指示采用候选帧间预测模式集合来对当前图像块进行预测时,从所述候选帧间预测模式集合中确定用于对当前图像块进行帧间预测的帧间预测模式。
进一步的,如果所述帧间预测数据还包括用于指示所述当前图像块的帧间预测模式的第二标识,则所述确定用于对所述当前图像块进行帧间预测的帧间预测模式包括:确定所述第二标识指示的帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式;或者,
如果所述帧间预测数据未包括用于指示所述当前图像块的帧间预测模式的第二标识,则所述确定用于对所述当前图像块进行帧间预测的帧间预测模式包括:确定用于非方向性的运动场的第一帧间预测模式(亦称为用于帧间预测的平面Planar模式)为用于对所述当前图像块进行帧间预测的帧间预测模式。
其中,当用于编码视频图像,本申请各实施例中的帧间预测方法中,所述确定用于对当前图像块进行帧间预测的帧间预测模式,包括:确定所述候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。应当理解的是,如果用于平滑或渐变的运动场的第一帧间预测模式(亦称为用于帧间预测的平面Planar模式)编码所述当前图像块的码率失真代价最小,则确定用于平滑或渐变的运动场的第一帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
进一步的,当用于编码视频图像,本申请各实施例中的帧间预测方法还可以包括:
将帧间预测数据编入码流,其中所述帧间预测数据包括:用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识;或者,所述帧间预测数据包括:用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识和用于指示当前图像块的帧间预测模式的第二标识。
第二方面,本申请实施例提供一种视频图像的帧间预测装置,包括用于实施第一方面的任意一种方法的若干个功能单元。举例来说,视频图像的帧间预测装置可以包括:
帧间预测模式确定单元,用于确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式;
帧间预测处理单元,用于基于所述确定的帧间预测模式,对所述当前图像块执行帧间预测。
其中,如果所述帧间预测模式确定单元确定用于非方向性的运动场的第一帧间预测模式(亦可称为用于帧间预测的平面planar模式),所述帧间预测处理单元具体用于:
利用多个参考块的运动矢量,预测或推导与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动矢量,及与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动矢量,其中所述多个参考块包括当前图像块的空域邻近参考块和/或时域邻近参考块;
基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息;在可能的实施方式下,例如,对所述当前子块的运动矢量的第一预测值和所述当前子块的运动矢量的第二预测值进行加权处理,得到所述当前子块的运动矢量。
其中,所述视频图像的帧间预测装置例如应用于视频编码装置(视频编码器)或视频解码装置(视频解码器)。
第三方面,本申请实施例提供一种视频编码器,所述视频编码器用于编码图像块,包括:
上述帧间预测器,其中所述帧间预测器用于基于帧间预测模式,预测待编码图像块的预测块,其中所述帧间预测模式是候选帧间预测模式集合中的一种;
熵编码器,用于将第一标识编入码流,所述第一标识用于指示是否对所述待编码图像块采用所述候选帧间预测模式集合进行帧间预测,换言之,所述第一标识用于指示是否对当前待编码图像块采用新的帧间预测模式;
重建器,用于根据所述预测块重建所述图像块,并将所述重建的图像块存储于内存中。
在一些可能的实施方式下,所述熵编码器,还用于将第二标识编入码流,所述第二标识用于指示所述待编码图像块的帧间预测模式,换言之,即所述第二标识用于指示对待编码图像块采用哪一种新的帧间预测模式进行帧间预测。
第四方面,本申请实施例提供一种视频解码器,所述视频解码器用于从码流中解码出图像块,包括:
熵解码器,用于从码流中解码出第一标识,所述第一标识用于指示是否对待解码图像块采用候选帧间预测模式集合进行帧间预测,换言之,所述第一标识用于指示是否对待解码图像块采用新的帧间预测模式;
上述的帧间预测器,其中所述帧间预测器用于基于帧间预测模式预测所述待解码图像块的预测块,其中所述帧间预测模式是所述候选帧间预测模式集合中的一种;
重建器,用于根据所述预测块重建所述图像块。
在一些可能的实施方式下,所述熵解码器还用于从所述码流中解码出第二标识,所述第二标识用于指示所述待解码图像块的帧间预测模式,换言之,所述第二标识用于指示所述待解码图像块采用的是哪一种新的帧间预测模式。
第五方面,本申请实施例提供一种用于解码视频数据的设备,所述设备包括:
存储器,用于存储码流形式的视频数据;
视频解码器,用于从码流中解码出包括第一标识的帧间预测数据,所述第一标识与当前待解码图像块相关;当所述第一标识为真时,基于帧间预测模式对所述当前待解码图像块执行帧间预测,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式。
第六方面,本申请实施例提供一种用于编码视频数据的设备,所述设备包括:
存储器,用于存储视频数据,所述视频数据包括一个或多个图像块;
视频编码器,用于将包括第一标识的帧间预测数据编入码流,所述第一标识与当前待编码图像块相关;其中,当所述第一标识为真时,所述第一标识用于指示基于帧间预测模式对所述当前待编码图像块执行帧间预测,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式。
第七方面,本申请实施例提供一种用于解码视频数据的设备,所述设备包括:
存储器,用于存储经编码的视频数据;
视频解码器,用于预测与当前待解码图像块的当前子块相同行上的、所述图像块的右侧空域邻近块的第一运动信息,及与所述当前待解码图像块的当前子块相同列上的、所述图像块的下侧空域邻近块的第二运动信息;基于所述第一运动信息和与当前子块相同行上的、所述图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;基于所述第二运动信息和与当前子块相同列上的、所述图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,得到所述当前子块的运动信息;并利用当前待解码图像块中包括所述当前子块的一个或多个子块的运动信息解码所述图像块。
第八方面,本申请实施例提供一种运动信息预测方法,所述方法包括:
预测或推导与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动信息;
预测或推导与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息;
基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
第九方面,本申请实施例提供一种编码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行第一方面的任意一种方法的部分或全部步骤。
第十方面,本申请实施例提供一种解码设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行第一方面的任意一种方法的部分或全部步骤。
第十一方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质存储了程序代码,其中,所述程序代码包括用于执行第一方面的任意一种方法的部分或全部步骤的指令。
第十二方面,本申请实施例提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行第一方面的任意一种方法的部分或全部步骤。
第十三方面,本申请实施例提供了一种视频图像的帧间预测方法,包括:确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于平滑或渐变的运动场的第一帧间预测模式;以及,基于所述确定的帧间预测模式,预测所述当前图像块中每个子块的运动信息,并利用所述当前图像块中每个子块的运动信息对所述当前图像块执行帧间预测。
第十四方面,本申请实施例提供一种视频图像的帧间预测装置,包括用于实施第十三方面的任意一种方法的若干个功能单元。举例来说,视频图像的帧间预测装置可以包括:
帧间预测模式确定单元,用于确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于平滑或渐变的运动场的第一帧间预测模式;
帧间预测处理单元,用于基于所述确定的帧间预测模式,预测所述当前图像块中每个子块的运动信息,并利用所述当前图像块中每个子块的运动信息对所述当前图像 块执行帧间预测。
应当理解的是,本申请的第二至十四方面与本申请的第一方面的技术方案一致,各方面及对应的可行实施方式所取得的有益效果相似,不再赘述。
附图说明
为了更清楚地说明本申请实施例或背景技术中的技术方案,下面将对本申请实施例或背景技术中所需要使用的附图进行说明。
图1为本申请实施例中一种视频编码及解码系统的示意性框图;
图2A为本申请实施例中一种视频编码器的示意性框图;
图2B为本申请实施例中一种视频解码器的示意性框图;
图3为本申请实施例中一种用于编码视频图像的帧间预测的方法的流程图;
图4为本申请实施例中一种用于解码视频图像的帧间预测的方法的流程图;
图5为本申请实施例中的多个候选帧间预测模式的示意图;
图6为本申请实施例中一种示例性的当前图像块和邻近参考块的运动信息示意图;
图7为本申请实施例中的基于用于非方向性的运动场的第一帧间预测模式获取当前图像块中当前子块的运动信息的方法的流程图;
图8A至图8D为本申请实施例中四种示例性的第一帧间预测模式的原理示意图;
图9为本申请实施例中一种示例性的第二帧间预测模式的原理示意图;
图10A至图10E为本申请实施例中五种示例性的帧间方向预测模式的原理示意图;
图11为本申请实施例中一种视频图像的帧间预测装置的示意性框图;
图12为本申请实施例中一种编码设备或解码设备的示意性框图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。
编码视频流,或者其一部分,诸如视频帧或者图像块可以使用视频流中的时间和空间相似性以改善编码性能。例如,视频流的当前图像块可以通过基于视频流中的先前已编码块预测用于当前图像块的运动信息,并识别预测块和当前图像块(即原始块)之间的差值(亦称为残差),从而基于先前已编码块对当前图像块进行编码。以这种方法,仅仅将用于产生当前图像块的残差和一些参数包括于数字视频输出位流中,而不是将当前图像块的整体包括于数字视频输出位流。这种技术可以称为帧间预测。
运动矢量是帧间预测过程中的一个重要参数,其表示先前已编码块相对于该当前编码块的空间位移。可以使用运动估算的方法,诸如运动搜索来获取运动矢量。初期的帧间预测技术,将表示运动矢量的位包括在编码的位流中,以允许解码器再现预测块,进而得到重建块。为了进一步的改善编码效率,后来又提出使用参考运动矢量差分地编码运动矢量,即取代编码运动矢量整体,而仅仅编码运动矢量和参考运动矢量之间的差值。在有些情况下,参考运动矢量可以是从在视频流中先前使用的运动矢量中选择出来的,选择先前使用的运动矢量编码当前的运动矢量可以进一步减少包括在编码的视频位流中的位数。
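上段"只编码运动矢量与参考运动矢量之间的差值"的差分编码思想,可用如下极简示意说明(函数与变量命名为本文为举例引入的假设):

```python
def encode_mvd(mv, mv_pred):
    """编码端:只需传送运动矢量差值 MVD = mv - mv_pred。"""
    return (mv[0] - mv_pred[0], mv[1] - mv_pred[1])

def decode_mv(mvd, mv_pred):
    """解码端:用同样的参考运动矢量加回差值,还原运动矢量。"""
    return (mvd[0] + mv_pred[0], mvd[1] + mv_pred[1])
```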
不限于现有标准中的几种帧间预测模式,例如在HEVC标准中,对于预测单元(PU)存在两个帧间预测模式,分别称为合并(跳过被视为合并的特殊情况)和高级运动向量预测(AMVP)模式,为了进一步的改善编解码性能,例如编码时无需传送当前编码块的运动矢量和参考运动矢量之间的差值,而且尽可能地减少需传送的残差值,本申请进一步的提出了多种新的帧间预测模式,包括用于(预测)非方向性的运动场的多种帧间预测模式,和/或用于(预测)方向性的运动场的多种帧间预测模式,以形成候选帧间预测模式集合。在一种示例下,如图5和表-1所示,这里的用于(预测)非方向性的运动场的多种帧间预测模式可以包括用于非方向性的运动场的第一帧间预测模式(例如用于平滑也用于渐变的运动场的第一帧间预测模式,简称为模式0)和用于非方向性的运动场的第二帧间预测模式(例如主要用于平滑的运动场的第二帧间预测模式,简称为模式1)。这里的用于(预测)方向性的运动场的多种帧间预测模式(简称为帧间方向预测模式)可以对应于不同的预测方向或角度,本申请的帧间方向预测模式的数量不限于9个(即表2所示的模式2至10)或32个(即表3所示的模式2至33),其个数可以随着运动矢量的预测精度的要求而增加或者减少。
尤其需要说明的是,本申请中,模式0可以理解为用于帧间预测的平面Planar模式,模式1可以理解为用于帧间预测的DC模式,以及模式N可以理解为用于预测方向性的运动场的帧间方向预测模式,N=2,3…10或者N=2,3…33。具体的,用于帧间预测的Planar模式是图像块/子块的运动矢量的水平和垂直线性插值的均值,综合了水平和垂直运动场变化的特点,能够使预测块/子块变化趋向于平缓,适用于运动场变化缓慢的图像块或其子块。用于帧内预测的DC模式使用当前图像块的左邻近、上邻近块的运动矢量的均值作为当前图像块或其子块的运动矢量,适用于平滑的图像块或其子块。而帧间方向预测模式适用于具有方向性的运动场的图像块,以预测该图像块或其子块的运动矢量。
[表1为图像,内容未能完整提取;结合上文,候选帧间预测模式集合为:模式0——用于帧间预测的Planar模式,模式1——用于帧间预测的DC模式,模式2~10——帧间方向预测模式]
表1
在一种示例下,如下表2所示,帧间方向预测模式(2~10)的角度参数A有以下对应关系:
[表2为图像,帧间方向预测模式2~10与角度参数A的具体对应数值未能提取]
表2
或者,在另一种示例下,如下表3所示,帧间方向预测模式(2~33)的角度参数A有以下对应关系:
[表3为图像,帧间方向预测模式2~33与角度参数A的具体对应数值未能提取]
表3
本申请将介绍包括但不限于上述模式0,1,2…10的候选帧间预测模式集合,便于视频编码器在视频数据序列的编码过程中,从该候选帧间预测模式集合中确定或选择用于对当前图像块进行帧间预测的帧间预测模式(例如视频编码器使用多种帧间预测模式来编码视频数据且选择编码图像块的码率失真折中的帧间预测模式),并基于确定的帧间预测模式对所述当前图像块执行帧间预测,进而完成当前图像块的编码。应当理解的是:为了方便阐述本申请的技术方案,这里以包括上述模式0,1,2…10的候选帧间预测模式集合进行说明,但本申请的候选帧间预测模式集合不限于此。
图1为本申请实施例中所描述的一种实例的视频译码系统1的框图。如本文所使用,术语“视频译码器”一般是指视频编码器和视频解码器两者。在本申请中,术语“视频译码”或“译码”可一般地指代视频编码或视频解码。视频译码系统1的视频编码器100和视频解码器200用于根据本申请提出的多种新的帧间预测模式中的任一种所描述的各种方法实例来预测当前经译码图像块或其子块的运动信息,例如运动矢量,使得预测出的运动矢量最大程度上接近使用运动估算方法得到的运动矢量,从而编码时无需传送运动矢量差值,从而进一步的改善编解码性能。
如图1中所示,视频译码系统1包含源装置10和目的地装置20。源装置10产生经编码视频数据。因此,源装置10可被称为视频编码装置。目的地装置20可对由源装置10所产生的经编码的视频数据进行解码。因此,目的地装置20可被称为视频解码装置。源装置10、目的地装置20或两个的各种实施方案可包含一或多个处理器以及耦合到所述一或多个处理器的存储器。所述存储器可包含但不限于RAM、ROM、 EEPROM、快闪存储器或可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体,如本文所描述。
源装置10和目的地装置20可以包括各种装置,包含桌上型计算机、移动计算装置、笔记型(例如,膝上型)计算机、平板计算机、机顶盒、例如所谓的“智能”电话等电话手持机、电视机、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机或其类似者。
目的地装置20可经由链路30从源装置10接收经编码视频数据。链路30可包括能够将经编码视频数据从源装置10移动到目的地装置20的一或多个媒体或装置。在一个实例中,链路30可包括使得源装置10能够实时将经编码视频数据直接发射到目的地装置20的一或多个通信媒体。在此实例中,源装置10可根据通信标准(例如无线通信协议)来调制经编码视频数据,且可将经调制的视频数据发射到目的地装置20。所述一或多个通信媒体可包含无线和/或有线通信媒体,例如射频(RF)频谱或一或多个物理传输线。所述一或多个通信媒体可形成基于分组的网络的一部分,基于分组的网络例如为局域网、广域网或全球网络(例如,因特网)。所述一或多个通信媒体可包含路由器、交换器、基站或促进从源装置10到目的地装置20的通信的其它设备。
在另一实例中,可将经编码数据从输出接口140输出到存储装置40。类似地,可通过输入接口240从存储装置40存取经编码数据。存储装置40可包含多种分布式或本地存取的数据存储媒体中的任一者,例如硬盘驱动器、蓝光光盘、DVD、CD-ROM、快闪存储器、易失性或非易失性存储器,或用于存储经编码视频数据的任何其它合适的数字存储媒体。
在另一实例中,存储装置40可对应于文件服务器或可保持由源装置10产生的经编码视频的另一中间存储装置。目的地装置20可经由流式传输或下载从存储装置40存取所存储的视频数据。文件服务器可为任何类型的能够存储经编码的视频数据并且将经编码的视频数据发射到目的地装置20的服务器。实例文件服务器包含网络服务器(例如,用于网站)、FTP服务器、网络附接式存储(NAS)装置或本地磁盘驱动器。目的地装置20可通过任何标准数据连接(包含因特网连接)来存取经编码视频数据。这可包含无线信道(例如,Wi-Fi连接)、有线连接(例如,DSL、电缆调制解调器等),或适合于存取存储在文件服务器上的经编码视频数据的两者的组合。经编码视频数据从存储装置40的传输可为流式传输、下载传输或两者的组合。
本申请的运动矢量预测技术可应用于视频编解码以支持多种多媒体应用,例如空中电视广播、有线电视发射、卫星电视发射、串流视频发射(例如,经由因特网)、用于存储于数据存储媒体上的视频数据的编码、存储在数据存储媒体上的视频数据的解码,或其它应用。在一些实例中,视频译码系统1可用于支持单向或双向视频传输以支持例如视频流式传输、视频回放、视频广播和/或视频电话等应用。
图1中所说明的视频译码系统1仅为实例,并且本申请的技术可适用于未必包含编码装置与解码装置之间的任何数据通信的视频译码设置(例如,视频编码或视频解码)。在其它实例中,数据从本地存储器检索、在网络上流式传输等等。视频编码装置可对数据进行编码并且将数据存储到存储器,和/或视频解码装置可从存储器检索数据并且对数据进行解码。在许多实例中,由并不彼此通信而是仅编码数据到存储器和/ 或从存储器检索数据且解码数据的装置执行编码和解码。
在图1的实例中,源装置10包含视频源120、视频编码器100和输出接口140。在一些实例中,输出接口140可包含调节器/解调器(调制解调器)和/或发射器。视频源120可包括视频捕获装置(例如,摄像机)、含有先前捕获的视频数据的视频存档、用以从视频内容提供者接收视频数据的视频馈入接口,和/或用于产生视频数据的计算机图形系统,或视频数据的此些来源的组合。
视频编码器100可对来自视频源120的视频数据进行编码。在一些实例中,源装置10经由输出接口140将经编码视频数据直接发射到目的地装置20。在其它实例中,经编码视频数据还可存储到存储装置40上,供目的地装置20以后存取来用于解码和/或播放。
在图1的实例中,目的地装置20包含输入接口240、视频解码器200和显示装置220。在一些实例中,输入接口240包含接收器和/或调制解调器。输入接口240可经由链路30和/或从存储装置40接收经编码视频数据。显示装置220可与目的地装置20集成或可在目的地装置20外部。一般来说,显示装置220显示经解码视频数据。显示装置220可包括多种显示装置,例如,液晶显示器(LCD)、等离子显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。
尽管图1中未图示,但在一些方面,视频编码器100和视频解码器200可各自与音频编码器和解码器集成,且可包含适当的多路复用器-多路分用器单元或其它硬件和软件,以处置共同数据流或单独数据流中的音频和视频两者的编码。在一些实例中,如果适用的话,那么MUX-DEMUX单元可符合ITU H.223多路复用器协议,或例如用户数据报协议(UDP)等其它协议。
视频编码器100和视频解码器200各自可实施为例如以下各项的多种电路中的任一者:一或多个微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件来实施本申请,那么装置可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一或多个处理器在硬件中执行所述指令从而实施本申请技术。前述内容(包含硬件、软件、硬件与软件的组合等)中的任一者可被视为一或多个处理器。视频编码器100和视频解码器200中的每一者可包含在一或多个编码器或解码器中,所述编码器或解码器中的任一者可集成为相应装置中的组合编码器/解码器(编码解码器)的一部分。
本申请可大体上将视频编码器100称为将某些信息“发信号通知”或“发射”到例如视频解码器200的另一装置。术语“发信号通知”或“发射”可大体上指代用以对经压缩视频数据进行解码的语法元素和/或其它数据的传送。此传送可实时或几乎实时地发生。替代地,此通信可经过一段时间后发生,例如可在编码时在经编码位流中将语法元素存储到计算机可读存储媒体时发生,解码装置接着可在所述语法元素存储到此媒体之后的任何时间检索所述语法元素。
视频编码器100和视频解码器200可根据例如高效视频编码(HEVC)等视频压缩标准或其扩展来操作,并且可符合HEVC测试模型(HM)。或者,视频编码器100和视频解码器200也可根据其它业界标准来操作,所述标准例如是ITU-T H.264、H.265标准,或此类标准的扩展。然而,本申请的技术不限于任何特定编解码标准。
在一个实例中,一并参阅图3,视频编码器100用于:将与当前待编码的图像块相关的语法元素编码入数字视频输出位流(简称为位流或码流),这里将用于当前图像块帧间预测的语法元素简称为帧间预测数据,其中帧间预测数据可以包括用于指示是否对当前图像块采用上述候选帧间预测模式集合进行帧间预测的第一标识(换言之,即用于指示是否对当前图像块采用本申请提出的新的帧间预测模式进行帧间预测的第一标识);或者,帧间预测数据可以包括:用于指示是否对当前待编码图像块采用候选帧间预测模式集合进行帧间预测的第一标识和用于指示当前待图像块的帧间预测模式的第二标识;为了确定用于对当前图像块进行编码的帧间预测模式,视频编码器100还用于确定或选择(S301)上述候选帧间预测模式集合中用于对当前图像块进行帧间预测的帧间预测模式(例如选择多种新的帧间预测模式中编码当前图像块的码率失真代价折中或最小的帧间预测模式);以及基于确定的帧间预测模式,编码所述当前图像块(S303),这里的编码过程可以包括基于确定的帧间预测模式,预测所述当前图像块中一个或多个子块的运动信息(具体可以是每个子块或者所有子块的运动信息),并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测;
应当理解的是,如果由基于本申请提出的新的帧间预测模式预测出的运动信息产生的预测块与当前待编码图像块(即原始块)之间的差值(即残差)为0,则视频编码器100中只需要将与当前待编码的图像块相关的语法元素编入位流(亦称为码流);反之,除了语法元素外,还需要将相应的残差编入位流。
在另一实例中,一并参阅图4,视频解码器200用于:从位流中解码出与当前待解码的图像块相关的语法元素(S401),这里将用于当前图像块帧间预测的语法元素简称为帧间预测数据,所述帧间预测数据包括用于指示是否对当前经解码图像块采用候选帧间预测模式集合进行帧间预测的第一标识(即用于指示是否对当前待解码图像块采用本申请提出的新的帧间预测模式进行帧间预测的第一标识),当所述帧间预测数据指示采用候选帧间预测模式集合(即新的帧间预测模式)来对当前图像块进行预测时,确定所述候选帧间预测模式集合中用于对当前图像块进行帧间预测的帧间预测模式(S403),并基于确定的帧间预测模式解码所述当前图像块(S405),这里的解码过程可以包括基于确定的帧间预测模式,预测所述当前图像块中一个或多个子块的运动信息,并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测。
可选的,如果所述帧间预测数据还包括用于指示所述当前图像块采用何种帧间预测模式的第二标识,视频解码器200用于确定所述第二标识指示的帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式;或者,如果所述帧间预测数据未包括用于指示所述当前图像块采用何种帧间预测模式的第二标识,视频解码器200用于确定用于非方向性的运动场的第一帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式。
应当理解的是,这里的候选帧间预测模式集合可以是一个模式,也可以是多个模式。当候选帧间预测模式集合是一个模式(例如模式0)时,可以确定该模式为用于对当前图像块进行编码或解码的帧间预测模式。当候选帧间预测模式集合是多个模式时,可以缺省确定该集合中排列优先级最高或排列位置最前面的模式为用于对当前图像块进行编码或解码的帧间预测模式;或者,可以确定第二标识指示的模式为用于对当前图像块进行解码的帧间预测模式。
由上可见,本申请实施例通过考虑运动场的特征将新的帧间预测模式分为用于非方向性的运动场的帧间预测模式和/或用于方向性的运动场的帧间预测模式,无论视频译码系统1的视频编码器100和视频解码器200采用候选帧间预测模式集合中用于方向性的运动场的帧间预测模式还是用于非方向性的运动场的帧间预测模式来对当前待编码或解码图像块进行译码,均能利用当前图像块的可用参考块的运动矢量(简称参考运动矢量)预测出当前图像块中一个或多个子块的运动信息(例如运动矢量),这样的话,从结果来看,预测出的当前图像块的运动矢量基本上接近使用运动估算方法得到的运动矢量,从而编码时无需传送运动矢量差值,在视频质量相同的情况下节省了码率,因此本申请实施例的视频译码系统的编解码性能得到进一步的改善。
图2A为本申请实施例中所描述的一种实例的视频编码器100的框图。视频编码器100用于将视频输出到后处理实体41。后处理实体41表示可处理来自视频编码器100的经编码视频数据的视频实体的实例,例如媒体感知网络元件(MANE)或拼接/编辑装置。在一些情况下,后处理实体41可为网络实体的实例。在一些视频编码系统中,后处理实体41和视频编码器100可为单独装置的若干部分,而在其它情况下,相对于后处理实体41所描述的功能性可由包括视频编码器100的相同装置执行。在某一实例中,后处理实体41是图1的存储装置40的实例。
视频编码器100可根据本申请提出的包括模式0,1,2…或10的候选帧间预测模式集合中的任一种新的帧间预测模式执行视频图像块的编码,例如执行视频图像块的帧间预测。
在图2A的实例中,视频编码器100包括预测处理单元108、滤波器单元106、经解码图像缓冲器(DPB)107、求和器112、变换器101、量化器102和熵编码器103。预测处理单元108包括帧间预测器110和帧内预测器109。为了图像块重构,视频编码器100还包含反量化器104、反变换器105和求和器111。滤波器单元106既定表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(ALF)和样本自适应偏移(SAO)滤波器。尽管在图2A中将滤波器单元106示出为环路内滤波器,但在其它实现方式下,可将滤波器单元106实施为环路后滤波器。在一种示例下,视频编码器100还可以包括视频数据存储器、分割单元(图中未示意)。
视频数据存储器可存储待由视频编码器100的组件编码的视频数据。可从视频源120获得存储在视频数据存储器中的视频数据。DPB 107可为参考图像存储器,其存储用于由视频编码器100在帧内、帧间译码模式中对视频数据进行编码的参考视频数据。视频数据存储器和DPB 107可由多种存储器装置中的任一者形成,例如包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM),或其它类型的存储器装置。视频数据存储器和DPB 107可由同一存储器装置或单独存储器装置提供。在各种实例中,视频数据存储器可与视频编码器100的其它组件一起在芯片上,或相对于那些组件在芯片外。
如图2A中所示,视频编码器100接收视频数据,并将所述视频数据存储在视频 数据存储器中。分割单元将所述视频数据分割成若干图像块,而且这些图像块可以被进一步分割为更小的块,例如基于四叉树结构或者二叉树结构的图像块分割。此分割还可包含分割成条带(slice)、片(tile)或其它较大单元。视频编码器100通常说明编码待编码的视频条带内的图像块的组件。所述条带可分成多个图像块(并且可能分成被称作片的图像块集合)。预测处理单元108可选择用于当前图像块的多个可能的译码模式中的一者,例如多个帧内译码模式中的一者或多个帧间译码模式中的一者,其中所述多个帧间译码模式包括但不限于本申请提出的模式0,1,2,3…10中的一个或多个。预测处理单元108可将所得经帧内、帧间译码的块提供给求和器112以产生残差块,且提供给求和器111以重构用作参考图像的经编码块。
预测处理单元108内的帧内预测器109可相对于与待编码当前块在相同帧或条带中的一或多个相邻块执行当前图像块的帧内预测性编码,以去除空间冗余。预测处理单元108内的帧间预测器110可相对于一或多个参考图像中的一或多个预测块执行当前图像块的帧间预测性编码以去除时间冗余。
具体的,帧间预测器110可用于确定用于编码当前图像块的帧间预测模式。举例来说,帧间预测器110可使用速率-失真分析来计算候选帧间预测模式集合中的各种帧间预测模式的速率-失真值,并从中选择具有最佳速率-失真特性的帧间预测模式。速率失真分析通常确定经编码块与经编码以产生所述经编码块的原始的未经编码块之间的失真(或误差)的量,以及用于产生经编码块的位速率(也就是说,位数目)。例如,帧间预测器110可确定候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。下文将详细介绍帧间预测性编码过程,尤其是在本申请各种用于非方向性或方向性的运动场的帧间预测模式下,预测当前图像块中一个或多个子块(具体可以是每个子块或所有子块)的运动信息的过程。
帧间预测器110用于基于确定的帧间预测模式,预测当前图像块中一个或多个子块的运动信息(例如运动矢量),并利用当前图像块中一个或多个子块的运动信息(例如运动矢量)获取或产生当前图像块的预测块。帧间预测器110可在参考图像列表中的一者中定位所述运动向量指向的预测块。帧间预测器110还可产生与图像块和视频条带相关联的语法元素以供视频解码器200在对视频条带的图像块解码时使用。又或者,一种示例下,帧间预测器110利用每个子块的运动信息执行运动补偿过程,以生成每个子块的预测块,从而得到当前图像块的预测块;应当理解的是,这里的帧间预测器110执行运动估计和运动补偿过程。
具体的,在为当前图像块选择帧间预测模式之后,帧间预测器110可将指示当前图像块的所选帧间预测模式的信息提供到熵编码器103,以便于熵编码器103编码指示所选帧间预测模式的信息。在本申请中,视频编码器100可在所发射的位流中包含与当前图像块相关的帧间预测数据,其可包括第一标识block_based_enable_flag,以表示是否对当前图像块采用本申请提出的新的帧间预测模式进行帧间预测;可选的,还可以包括第二标识block_based_index,以指示当前图像块使用的是哪一种新的帧间预测模式。本申请中,在不同的模式0,1,2…10下,利用多个参考块的运动矢量来预测当前图像块或其子块的运动矢量的过程,将在下文详细描述。
帧内预测器109可对当前图像块执行帧内预测。明确地说,帧内预测器109可确定用来编码当前块的帧内预测模式。举例来说,帧内预测器109可使用速率-失真分析来计算各种待测试的帧内预测模式的速率-失真值,并从待测试模式当中选择具有最佳速率-失真特性的帧内预测模式。在任何情况下,在为图像块选择帧内预测模式之后,帧内预测器109可将指示当前图像块的所选帧内预测模式的信息提供到熵编码器103,以便熵编码器103编码指示所选帧内预测模式的信息。
在预测处理单元108经由帧间预测、帧内预测产生当前图像块的预测块之后,视频编码器100通过从待编码的当前图像块减去所述预测块来形成残差图像块。求和器112表示执行此减法运算的一或多个组件。所述残差块中的残差视频数据可包含在一或多个TU中,并应用于变换器101。变换器101使用例如离散余弦变换(DCT)或概念上类似的变换等变换将残差视频数据变换成残差变换系数。变换器101可将残差视频数据从像素值域转换到变换域,例如频域。
变换器101可将所得变换系数发送到量化器102。量化器102量化所述变换系数以进一步减小位速率。在一些实例中,量化器102可接着执行对包含经量化的变换系数的矩阵的扫描。或者,熵编码器103可执行扫描。
在量化之后,熵编码器103对经量化变换系数进行熵编码。举例来说,熵编码器103可执行上下文自适应可变长度编码(CAVLC)、上下文自适应二进制算术编码(CABAC)、基于语法的上下文自适应二进制算术编码(SBAC)、概率区间分割熵(PIPE)编码或另一熵编码方法或技术。在由熵编码器103熵编码之后,可将经编码位流发射到视频解码器200,或经存档以供稍后发射或由视频解码器200检索。熵编码器103还可对待编码的当前图像块的语法元素进行熵编码。
反量化器104和反变化器105分别应用逆量化和逆变换以在像素域中重构所述残差块,例如以供稍后用作参考图像的参考块。求和器111将经重构的残差块添加到由帧间预测器110或帧内预测器109产生的预测块,以产生经重构图像块。滤波器单元106可以适用于经重构图像块以减小失真,诸如方块效应(block artifacts)。然后,该经重构图像块作为参考块存储在经解码图像缓冲器107中,可由帧间预测器110用作参考块以对后续视频帧或图像中的块进行帧间预测。
应当理解的是,视频编码器100的其它的结构变化可用于编码视频流。例如,对于某些图像块或者图像帧,视频编码器100可以直接地量化残差信号而不需要经变换器101处理,相应地也不需要经反变换器105处理;或者,对于某些图像块或者图像帧,视频编码器100没有产生残差数据,相应地不需要经变换器101、量化器102、反量化器104和反变换器105处理;或者,视频编码器100可以将经重构图像块作为参考块直接地进行存储而不需要经滤波器单元106处理;或者,视频编码器100中量化器102和反量化器104可以合并在一起。
图2B为本申请实施例中所描述的一种实例的视频解码器200的框图。在图2B的实例中,视频解码器200包括熵解码器203、预测处理单元208、反量化器204、反变换器205、求和器211、滤波器单元206以及经解码图像缓冲器207。预测处理单元208可以包括帧间预测器210和帧内预测器209。在一些实例中,视频解码器200可执行 大体上与相对于来自图2A的视频编码器100描述的编码过程互逆的解码过程。
在解码过程中,视频解码器200从视频编码器100接收表示经编码视频条带的图像块和相关联的语法元素的经编码视频位流。视频解码器200可从网络实体42接收视频数据,可选的,还可以将所述视频数据存储在视频数据存储器(图中未示意)中。视频数据存储器可存储待由视频解码器200的组件解码的视频数据,例如经编码视频位流。存储在视频数据存储器中的视频数据,例如可从存储装置40、从相机等本地视频源、经由视频数据的有线或无线网络通信或者通过存取物理数据存储媒体而获得。视频数据存储器可作为用于存储来自经编码视频位流的经编码视频数据的经解码图像缓冲器(CPB)。因此,尽管在图2B中没有示意出视频数据存储器,但视频数据存储器和DPB 207可以是同一个的存储器,也可以是单独设置的存储器。视频数据存储器和DPB 207可由多种存储器装置中的任一者形成,例如:包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM),或其它类型的存储器装置。在各种实例中,视频数据存储器可与视频解码器200的其它组件一起集成在芯片上,或相对于那些组件设置在芯片外。
网络实体42可例如为服务器、MANE、视频编辑器/剪接器,或用于实施上文所描述的技术中的一或多者的其它此装置。网络实体42可包括或可不包括视频编码器,例如视频编码器100。在网络实体42将经编码视频位流发送到视频解码器200之前,网络实体42可实施本申请中描述的技术中的部分。在一些视频解码系统中,网络实体42和视频解码器200可为单独装置的部分,而在其它情况下,相对于网络实体42描述的功能性可由包括视频解码器200的相同装置执行。在一些情况下,网络实体42可为图1的存储装置40的实例。
视频解码器200的熵解码器203对位流进行熵解码以产生经量化的系数和一些语法元素。熵解码器203将语法元素转发到预测处理单元208。视频解码器200可接收在视频条带层级和/或图像块层级处的语法元素。本申请中,在一种示例下,这里的语法元素可以包括与当前图像块相关的帧间预测数据,该帧间预测数据可以包括第一标识block_based_enable_flag,以表示是否对当前图像块采用上述候选帧间预测模式集合进行帧间预测(换言之,即以表示是否对当前图像块采用本申请提出的新的帧间预测模式进行帧间预测);可选的,还可以包括第二标识block_based_index,以指示当前图像块使用的是哪一种新的帧间预测模式。
当视频条带被解码为经帧内解码(I)条带时,预测处理单元208的帧内预测器209可基于发信号通知的帧内预测模式和来自当前帧或图像的先前经解码块的数据而产生当前视频条带的图像块的预测块。当视频条带被解码为经帧间解码(即,B或P)条带时,预测处理单元208的帧间预测器210可基于从熵解码器203接收到的语法元素,确定用于对当前视频条带的当前图像块进行解码的帧间预测模式,基于确定的帧间预测模式,对所述当前图像块进行解码(例如执行帧间预测)。具体的,帧间预测器210可确定是否对当前视频条带的当前图像块采用新的帧间预测模式进行预测,如果语法元素指示采用新的帧间预测模式来对当前图像块进行预测,基于新的帧间预测模式(例如通过语法元素指定的一种新的帧间预测模式或默认的一种新的帧间预测模式)预测当前视频条带的当前图像块或当前图像块的子块的运动信息,从而通过运动补偿过程 使用预测出的当前图像块或当前图像块的子块的运动信息来获取或生成当前图像块或当前图像块的子块的预测块。这里的运动信息可以包括参考图像信息和运动矢量,其中参考图像信息可以包括但不限于单向/双向预测信息,参考图像列表号和参考图像列表对应的参考图像索引。对于帧间预测,可从参考图像列表中的一者内的参考图像中的一者产生预测块。视频解码器200可基于存储在DPB 207中的参考图像来建构参考图像列表,即列表0和列表1。当前图像的参考帧索引可包含于参考帧列表0和列表1中的一或多者中。在一些实例中,可以是视频编码器100发信号通知指示是否采用新的帧间预测模式来解码特定块的特定语法元素,或者,也可以是发信号通知指示是否采用新的帧间预测模式,以及指示具体采用哪一种新的帧间预测模式来解码特定块的特定语法元素。应当理解的是,这里的帧间预测器210执行运动补偿过程。下文将详细的阐述在各种新的帧间预测模式下,利用参考块的运动信息来预测当前图像块或当前图像块的子块的运动信息的帧间预测过程。
反量化器204将在位流中提供且由熵解码器203解码的经量化变换系数逆量化,即去量化。逆量化过程可包括:使用由视频编码器100针对视频条带中的每个图像块计算的量化参数来确定应施加的量化程度以及同样地确定应施加的逆量化程度。反变换器205将逆变换应用于变换系数,例如逆DCT、逆整数变换或概念上类似的逆变换过程,以便产生像素域中的残差块。
在帧间预测器210产生用于当前图像块或当前图像块的子块的预测块之后,视频解码器200通过将来自反变换器205的残差块与由帧间预测器210产生的对应预测块求和以得到重建的块,即经解码图像块。求和器211表示执行此求和操作的组件。在需要时,还可使用环路滤波器(在解码环路中或在解码环路之后)来使像素转变平滑或者以其它方式改进视频质量。滤波器单元206可以表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(ALF)以及样本自适应偏移(SAO)滤波器。尽管在图2B中将滤波器单元206示出为环路内滤波器,但在其它实现方式中,可将滤波器单元206实施为环路后滤波器。在一种示例下,滤波器单元206适用于重建块以减小块失真,并且该结果作为经解码视频流输出。并且,还可以将给定帧或图像中的经解码图像块存储在经解码图像缓冲器207中,经解码图像缓冲器207存储用于后续运动补偿的参考图像。经解码图像缓冲器207可为存储器的一部分,其还可以存储经解码视频,以供稍后在显示装置(例如图1的显示装置220)上呈现,或可与此类存储器分开。
应当理解的是,视频解码器200的其它结构变化可用于解码经编码视频位流。例如,视频解码器200可以不经滤波器单元206处理而生成输出视频流;或者,对于某些图像块或者图像帧,视频解码器200的熵解码器203没有解码出经量化的系数,相应地不需要经反量化器204和反变换器205处理。
下文将详细的阐述本申请在各种新的帧间预测模式下,利用多个参考块的运动信息来预测当前图像块或当前图像块的子块的运动信息的过程。
图6是示出本申请实施例中一种示例性的当前图像块600和参考块的运动信息示意图。如图6所示,W和H是当前图像块600以及当前图像块600的同位置co-located 块(简称为并置块)600’的宽度和高度。当前图像块600的参考块包括:当前图像块600的上侧空域邻近块和左侧空域邻近块,以及并置块600’的下侧空域邻近块和右侧空域邻近块,其中并置块600’为参考图像中与当前图像块600具有相同的大小、形状和坐标的图像块。应当注意的是,当前图像块的下侧空域邻近块和右侧空域邻近块的运动信息不存在,还没编码。应当理解的是,当前图像块600和并置块600’可以是任意块大小。例如,当前图像块600和并置块600’可以包括但不限于16x16像素,32x32像素,32x16像素和16x32像素等。如上所述,每个图像帧可以被分割为用于编码的图像块。这些图像块可以被进一步分割为更小的块,例如当前图像块600和并置块600’可以被分割成多个MxN子块,即每个子块的大小均为MxN像素,而且,每个参考块的大小也为MxN像素,即与当前图像块的子块的大小相同。图6中的坐标以MxN块为衡量单位。“M×N”与“M乘N”可互换使用以指依照水平维度及垂直维度的图像块的像素尺寸,即在水平方向上具有M个像素,且在垂直方向上具有N个像素,其中M、N表示非负整数值。此外,块未必需要在水平方向上与在垂直方向上具有相同数目个像素。举例说明,这里的M=N=4,当然当前图像块的子块大小和参考块的大小也可以是8x8像素,8x4像素,或4x8像素,或者最小的预测块大小。此外,本申请描述的图像块可以理解为但不限于:预测单元(prediction unit,PU)或者编码单元(coding unit,CU)或者变换单元(transform unit,TU)等。根据不同视频压缩编解码标准的规定,CU可包含一个或多个预测单元PU,或者PU和CU的尺寸相同。图像块可具有固定或可变的大小,且根据不同视频压缩编解码标准而在大小上不同。此外,当前图像块是指当前待编码或解码的图像块,例如待编码或解码的预测单元。
在一种示例下,可以沿着方向1依序判断当前图像块600的每个左侧空域邻近块是否可用,以及可以沿着方向2依序判断当前图像块600的每个上侧空域邻近块是否可用,例如判断邻近块(亦称为参考块,可互换使用)是否帧间编码,如果邻近块存在且是帧间编码,则所述邻近块可用;如果邻近块不存在或者是帧内编码,则所述邻近块不可用。如果一个邻近块是帧内编码,则复制邻近的其它参考块的运动信息作为该邻近块的运动信息。按照类似方法检测并置块600’的下侧空域邻近块和右侧空域邻近块是否可用,在此不再赘述。
进一步的,如果可用参考块的大小与当前图像块的子块的大小是4x4,可以直接获取fetch可用参考块的运动信息;如果可用参考块的大小例如是8x4,8x8,可以获取其中心4x4块的运动信息作为该可用参考块的运动信息,该中心4x4块的左上角顶点相对于该参考块的左上角顶点的坐标为((W/4)/2*4,(H/4)/2*4),这里除运算为整除运算,若M=8,N=4,则中心4x4块的左上角顶点相对于该参考块的左上角顶点的坐标为(4,0)。可选地,也可以获取该参考块的左上角4x4块的运动信息作为该可用参考块的运动信息,但本申请并不限于此。
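上段中心4x4块左上角坐标的计算可用如下示意性代码表达(按原文,式中除法为整除运算;函数名为本文为举例引入的假设):

```python
def center_4x4_offset(M, N):
    """返回 MxN 参考块内中心4x4块左上角相对该参考块左上角的坐标,
    对应原文公式 ((W/4)/2*4, (H/4)/2*4),其中除法为整除。"""
    return ((M // 4) // 2 * 4, (N // 4) // 2 * 4)
```

例如 M=8、N=4 时返回 (4, 0),与原文给出的结果一致。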
为了简化描述,下文以子块表示MxN子块,以邻近块表示邻近MxN块来进行说明。
图7是示出根据本申请一种实施例的基于第一帧间预测模式预测当前图像块中当前子块的运动信息的过程700的流程图。过程700可由视频编码器100或视频解码器 200执行,具体的,可以由视频编码器100或视频解码器200的帧间预测器110、210来执行。过程700描述为一系列的步骤或操作,应当理解的是,过程700可以以各种顺序执行和/或同时发生,不限于图7所示的执行顺序。假设具有多个视频帧的视频数据流正在使用视频编码器或者视频解码器,执行包括如下步骤的过程700来预测当前视频帧的当前图像块的当前子块的运动信息;
步骤701,利用多个参考块的运动信息来预测(推导)与当前图像块600的当前子块604相同行上的、当前图像块600的右侧空域邻近块806的第一运动信息,以及,与当前图像块600的当前子块604相同列上的、所述当前图像块600的下侧空域邻近块808的第二运动信息;这里的参考块可以包括与当前图像块600在空间上和/或时间上相邻的图像块。在本说明书中,当前图像块是指当前待编码或解码的图像块。
步骤703,基于所述推导得到的右侧空域邻近块806的第一运动信息和与当前子块604相同行上的、当前图像块600的左侧邻近块802的第三运动信息的线性(水平)插值,得到当前子块604的运动信息的第一预测值P_h(x,y);
在一种实现方式下,步骤703中,确定与当前子块604相同行上的、所述当前图像块600的右侧邻近块806的第一运动信息和与当前子块604相同行上的、所述当前图像块600的左侧邻近块802的第三运动信息的加权值,为当前子块604的运动信息的第一预测值P_h(x,y),其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块600的当前子块604相同行上的、所述当前图像块600的右侧空域邻近块806和当前子块604之间的第一距离,与,当前子块604和与当前子块604相同行上的、所述当前图像块600的左侧空域邻近块802之间的第二距离之间的比例确定的;
步骤705,基于所述推导得到的下侧空域邻近块808的第二运动信息和与当前子块604相同列上的、当前图像块600的上侧邻近块809的第四运动信息的线性(垂直)插值,得到当前子块604的运动信息的第二预测值P_v(x,y);
在一种实现方式下,步骤705中,确定与当前子块604相同列上的、所述当前图像块600的上侧空域邻近块809的第四运动信息和与当前子块604相同列上的、所述当前图像块600的下侧空域邻近块808的第二运动信息的加权值,为当前子块604的运动信息的第二预测值P_v(x,y),其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块600的当前子块604相同列上的、所述当前图像块600的下侧空域邻近块808和当前子块604之间的第三距离,与,当前子块604和与当前子块604相同列上的、所述当前图像块600的上侧空域邻近块809之间的第四距离之间的比例确定的。
步骤707,利用当前子块604的运动信息的第一预测值P_h(x,y)和当前子块604的运动信息的第二预测值P_v(x,y),确定所述当前子块604的运动信息P(x,y)。
在一种实现方式下,步骤707中,对当前子块604的运动信息的第一预测值P_h(x,y)和当前子块604的运动信息的第二预测值P_v(x,y)进行加权处理,得到当前子块604的运动信息P(x,y)。应当理解的是,加权处理中,加权因子相同的情况,就等同于求均值,即确定当前子块604的运动信息的第一预测值P_h(x,y)和当前子块604的运动信息的第二预测值的均值为当前子块604的运动信息。
本申请实施例中有多种不同的实现方式来推导与当前子块604相同行上的右侧空域邻近块806的第一运动信息和与当前子块604相同列上的下侧空域邻近块808的第二运动信息,其中在第一种实现方式下,如图8A所示,步骤701可以包括:
步骤701A-1:基于当前图像块600的第一并置块600’的右下角空域邻近块(简称右下角时域邻近块)807的第五运动信息和当前图像块600的右上角空域邻近块805的第六运动信息的线性(垂直)插值,得到与当前图像块600的当前子块604相同行上的、当前图像块600的右侧空域邻近块806的第一运动信息;和,
步骤701A-2:基于当前图像块600的第一并置块600’的右下角空域邻近块(简称右下角时域邻近块)807的第五运动信息和当前图像块600的左下角空域邻近块801的第七运动信息的线性插值(即水平插值),得到与当前图像块600的当前子块604相同列上的、当前图像块600的下侧空域邻近块808的第二运动信息,其中第一并置块600’为参考图像中与当前图像块600具有相同的大小、形状和坐标的图像块;
在一种具体的实现方式下,步骤701A-1中,第一并置块600’的右下角空域邻近块807的第五运动信息和当前图像块600的右上角空域邻近块805的第六运动信息根据公式(1)垂直插值出右侧空域邻近块806的第一运动信息;
步骤701A-2中,第一并置块600’的右下角空域邻近块807的第五运动信息和当前图像块600的左下角空域邻近块801的第七运动信息根据公式(2)水平插值出下侧空域邻近块808的第二运动信息;
R(W,y)=((H-y-1)×AR+(y+1)×BR)/H     (1)
B(x,H)=((W-x-1)×BL+(x+1)×BR)/W     (2)
其中,(x,y)代表当前子块604相对于当前图像块600的左上角子块的坐标,x是介于0和W-1之间的整数,y是介于0与H-1之间的整数,W和H代表当前图像块600的宽度和高度(以子块为度量单位),AR代表右上角空域邻近块805的第六运动信息,BR代表右下角时域邻近块807的第五运动信息,BL代表左下角空域邻近块801的第七运动信息。
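公式(1)(2)对运动矢量的每个分量独立进行线性插值。下面是一段示意性的C代码草图(非本申请原文内容,函数名为本示例假设),以单个运动矢量分量为例:

```c
/* 公式(1):由右上角空域邻近块 AR 与右下角时域邻近块 BR 的运动矢量分量
 * 垂直插值出右侧空域邻近块 R(W,y) 的对应分量,H 以子块为度量单位。 */
static int derive_right(int y, int H, int AR, int BR) {
    return ((H - y - 1) * AR + (y + 1) * BR) / H;
}

/* 公式(2):由左下角空域邻近块 BL 与右下角时域邻近块 BR 的运动矢量分量
 * 水平插值出下侧空域邻近块 B(x,H) 的对应分量,W 以子块为度量单位。 */
static int derive_bottom(int x, int W, int BL, int BR) {
    return ((W - x - 1) * BL + (x + 1) * BR) / W;
}
```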
在一种具体的实现方式下,步骤703中,右侧空域邻近块806的第一运动信息和左侧空域邻近块802的第三运动信息根据公式(3)水平插值当前子块604的运动信息的第一预测值;
以及,步骤705中,下侧空域邻近块808的第二运动信息和上侧空域邻近块809的第四运动信息根据公式(4)垂直插值当前子块604的运动信息的第二预测值;
以及,步骤707中,水平和垂直线性插值的运动矢量均值(公式5)即为当前子块的运动矢量;
P h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)  (3)
P v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)  (4)
P(x,y)=(H×P h(x,y)+W×P v(x,y)+H×W)/(2×H×W)  (5)
其中,L(-1,y)代表当前子块604所在行的左侧空域邻近块802的第三运动矢量, R(W,y)代表当前子块604所在行的右侧空域邻近块806的第一运动矢量;A(x,-1)代表当前子块604所在列的上侧空域邻近块809的第四运动矢量,B(x,H)代表当前子块604所在列的下侧空域邻近块808的第二运动矢量,P h(x,y)代表水平插值的运动矢量(即第一预测值),P v(x,y)代表垂直插值的运动矢量(即第二预测值),P(x,y)代表当前子块604的运动矢量。
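公式(3)~(5)的组合可以草绘为如下示意性C代码(非本申请原文内容,函数名为本示例假设),以单个运动矢量分量为例,参数L、R、A、B分别对应L(-1,y)、R(W,y)、A(x,-1)、B(x,H):

```c
/* 先按公式(3)(4)进行水平、垂直线性插值,再按公式(5)加权平均,
 * 得到当前子块 (x,y) 的运动矢量分量,W、H 以子块为度量单位。 */
static int planar_mv(int x, int y, int W, int H,
                     int L, int R, int A, int B) {
    int Ph = (W - 1 - x) * L + (x + 1) * R;          /* 公式(3) */
    int Pv = (H - 1 - y) * A + (y + 1) * B;          /* 公式(4) */
    return (H * Ph + W * Pv + H * W) / (2 * H * W);  /* 公式(5),含舍入偏置 H*W */
}
```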
在第二种实现方式下,如图8B所示,其与前述实现方式的区别在于:采用如下方式来预测/推导右侧空域邻近块806的第一运动信息以及下侧空域邻近块808的第二运动信息,即步骤701可以包括:
步骤701B-1:确定当前图像块600的右上角空域邻近块805的第六运动信息为与当前子块604相同行上的右侧空域邻近块806的第一运动信息;或者,确定当前图像块600的右上侧的多个空域邻近块的运动信息的均值为右侧空域邻近块806的第一运动信息;
步骤701B-2:确定当前图像块600的左下角空域邻近块801的第七运动信息为与当前子块604相同列上的下侧空域邻近块808的第二运动信息;或者,确定当前图像块600的左下侧的多个空域邻近块的运动信息的均值为下侧空域邻近块808的第二运动信息。
在第三种实现方式下,如图8C所示,其与前述实现方式的区别在于:采用如下方式来预测/推导右侧空域邻近块806的第一运动信息以及下侧空域邻近块808的第二运动信息,即步骤701可以包括:
步骤701C-1:确定当前图像块600的第一并置块600’(co-located)的第一右侧空域邻近块(简称右侧时域邻近参考块)的运动信息为右侧空域邻近块806的第一运动信息,其中第一右侧空域邻近块位于第一并置块600’的行与当前子块604位于当前图像块600的行相同;和,
步骤701C-2:确定当前图像块600的第一并置块600’(co-located)的第一下侧空域邻近块(简称下侧时域邻近参考块)的运动信息为下侧空域邻近块808的第二运动信息,其中第一下侧空域邻近块位于第一并置块600’的列与当前子块604位于当前图像块600的列相同,其中第一并置块600’为参考图像中与当前图像块600具有相同的大小、形状和坐标的图像块。
在第四种实现方式下,如图8D所示,其与前述实现方式的区别在于:采用如下方式来预测/推导右侧空域邻近块806的第一运动信息以及下侧空域邻近块808的第二运动信息;即步骤701可以包括:
步骤701D-1:确定当前图像块600的第二并置块(co-located)的第二右侧空域邻近块(简称右侧时域邻近参考块)的运动信息为右侧空域邻近块806的第一运动信息,其中第二并置块为参考图像中与当前图像块600具有指定位置偏移的图像块,当前图像块600的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于第二并置块的行与当前子块604位于当前图像块600的行相同;和,
步骤701D-2:确定当前图像块600的第二并置块(co-located)的第二下侧空域邻近块(简称下侧时域邻近参考块)的运动信息为下侧空域邻近块808的第二运动信息,其中第二并置块为参考图像中与当前图像块600具有指定位置偏移的图像块,当前图像块600的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且第二下侧空域邻近块位于第二并置块的列与当前子块604位于当前图像块600的列相同。
需要说明的是,这里的代表性空域邻近块可以是图6所示的左侧空域邻近块或上侧空域邻近块中的某一个可用的空域邻近块,例如,可以是沿着方向1检测到的第一个可用的左侧空域邻近块,或者可以是沿着方向2检测到的第一个可用的上侧空域邻近块;例如,可以是合成模式下对当前图像块的多个指定空域邻近位置点依序检测得到的第一个可用的空域邻近块,如图8A所示的L→A→AR→BL→AL;例如,还可以是从依序检测得到的多个可用的空域邻近块中随机选择或按照预定规则所选择的代表性空域邻近块,本申请实施例不限于此。
应当理解的是,为了便于描述,本申请实施例中以当前图像块600中的当前子块604作为代表来描述当前子块604的运动矢量的预测过程,当前图像块600中的每个子块的运动矢量的预测过程可参见本实施例所述,在此不再赘述。
由上可见,本申请实施例的基于用于非方向性的运动场的第一帧间预测模式(亦称为用于帧间预测的平面planar模式)的帧间预测过程中,使用水平和垂直线性插值的均值来推导当前子块的运动矢量,能较好的预测具有渐变的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
图9是根据本申请另一种实施例的基于第二帧间预测模式预测当前图像块中当前子块的运动信息的示意图。如图9所示,针对当前图像块600的当前子块604而言,确定与当前图像块600的当前子块604相同行上的、当前图像块600的左侧空域邻近块802的第三运动信息和与当前子块604相同列上的、当前图像块600的上侧空域邻近块809的第四运动信息的均值为当前子块604的运动信息;或者,
确定当前图像块600的多个左侧空域邻近块的运动信息和当前图像块600的多个上侧空域邻近块的运动信息的均值为当前图像块600的一个或多个子块(具体可以是所有子块)的运动信息。
应当理解的是,为了便于描述,本申请实施例中以当前图像块600中的当前子块604作为代表来描述当前子块604的运动矢量的预测过程,当前图像块600中的每个子块的运动矢量的预测过程可参见本实施例所述,在此不再赘述。
由上可见,本申请实施例的基于用于非方向性的运动场的第二帧间预测模式(亦称为用于帧间预测的DC模式)的帧间预测过程中,使用当前图像块的直接左侧空域邻近块、上侧空域邻近块的运动矢量的均值来推导当前子块的运动矢量,能较好的预测具有平滑的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
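上述第二帧间预测模式(DC模式)的求均值可草绘为如下示意性C代码(非本申请原文内容,函数名为本示例假设),对运动矢量的水平、垂直分量分别取左侧与上侧邻近块的均值:

```c
/* (Lx,Ly):同行左侧空域邻近块的运动矢量;
 * (Ax,Ay):同列上侧空域邻近块的运动矢量;
 * 输出 (mx,my) 为当前子块的运动矢量(按分量求均值)。 */
static void dc_mv(int Lx, int Ly, int Ax, int Ay, int *mx, int *my) {
    *mx = (Lx + Ax) / 2;
    *my = (Ly + Ay) / 2;
}
```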
图10A-10E是示出根据本申请另一种实施例的基于帧间方向预测模式预测当前图像块中当前子块的运动信息的原理示意图。其中,视频编码器100或视频解码器200 (具体的帧间预测器110、210)预测当前图像块的当前子块的运动信息的过程如下:
根据帧间方向预测模式对应的预测方向,当前图像块内的多个子块被投影至参考行1010或参考列1020上,其中:
所述参考行1010不属于所述当前图像块600,且所述参考行1010是与所述当前图像块的第一行子块相邻的一行上侧空域相邻块,所述参考行1010的第一列可以与所述当前图像块的第一列对齐,也可以与所述当前图像块的第一列未对齐(例如超出所述当前图像块的第一列,并向左边扩展);所述参考列1020不属于所述当前图像块600,且所述参考列1020是与所述当前图像块第一列子块相邻的一列左侧空域邻近块,且所述参考列1020的第一行可以与所述当前图像块的第一行对齐,所述参考列1020的最后一行可以与所述当前图像块的最后一行对齐,也可以与当前图像块600的最后一行不对齐(例如超出所述当前图像块的最后一行,并向下扩展);所述参考行1010和所述参考列1020相交的参考块为所述当前图像块600的左上角空域邻近块。应当理解的是:这里的投影不限定为一种操作,而是方便描述子块与目标参考块的对应关系所引入的一种描述方式。
如果当前子块被投影至参考行1010或参考列1020上的一个目标参考块,确定该目标参考块的运动信息为该当前子块的运动信息;
如果当前子块被投影至参考行1010或参考列1020上的两个目标参考块,对所述两个目标参考块的运动信息进行加权处理,得到当前子块的运动信息,或者,对所述两个目标参考块及所述两个目标参考块的左右或上下邻近块的运动信息进行加权处理,得到当前子块的运动信息;应当理解的是:前者为2个运动矢量的加权,后者为4个运动矢量的加权。其中,这里的加权因子是根据参考块与投影点的距离决定的,距离越近,权值越大。
应当理解的是:这里提到的目标参考块(亦可称为投影参考块)是指根据帧间方向预测模式对应的预测方向(角度)在参考行1010或参考列1020上确定的与当前子块对应的参考块,这种对应性可以理解成当前子块与目标参考块沿着相同预测方向。
其中,图10A示意一种帧间方向预测模式(图5或表2中的模式2)的投影方式,虚线箭头表示帧间方向预测模式2对应的预测方向,举例说明,P (0,0)=R (-1,1),P (0,0)表示当前图像块的左上角子块(0,0)的运动矢量,R (-1,1)表示当前图像块的左侧空域邻近块(-1,1)的运动矢量,类推,这里不再一一说明;
其中,图10B示意另一种帧间方向预测模式(图5或表2中的模式6)的投影方式,虚线箭头表示帧间方向预测模式6对应的预测方向,举例说明,P (0,0)=P (1,1)=…=P (W-1, H-1)=R (-1,-1),P (0,0),P (1,1),…P (W-1,H-1)表示当前图像块的坐标位置(0,0),(1,1)…(W-1,H-1)的几个子块的运动矢量,R (-1,-1)表示当前图像块的左上角空域邻近块(-1,-1)的运动矢量,类推,这里不再一一说明;需要说明的是,如图10B中的实线箭头所示,参考列1020上的一些参考块(即一些左侧邻近参考块)的运动矢量被投影或映射为参考行(尤其是参考行的扩展部分)上的对应的一些参考块的运动矢量;
其中,图10C示意另一种帧间方向预测模式(正水平模式(directly horizontal),例如图5或表2中的模式4,或表3中的模式10)的投影方式,虚线箭头表示此种帧间方向预测模式对应的预测方向(即水平方向),举例说明,P (0,0)=P (1,0)=…=P (W-1,0)= R (-1,0),P (0,0),P (1,0),…P (W-1,0)分别表示当前图像块的首行子块的运动矢量,R (-1,0)表示当前图像块的左侧空域邻近块(-1,0)的运动矢量,类推,这里不再一一说明;
其中,图10D示意另一种帧间方向预测模式(即垂直模式(directly vertical),例如图5或表2中的模式8,或表3中的模式26)的投影方式,虚线箭头表示此种帧间方向预测模式对应的预测方向(即垂直方向),举例说明,P (0,0)=P (0,1)=…=P (0,H-1)=R (0,-1),P (0,0),P (0,1),…P (0,H-1)分别表示当前图像块的首列子块的运动矢量,R (0,-1)表示当前图像块的上侧空域邻近块(0,-1)的运动矢量,类推,这里不再一一说明;
其中,图10E示意再一种帧间方向预测模式(表3中的模式23)的投影方式,粗体箭头表示帧间方向预测模式23对应的预测方向,举例说明,两个相邻的上侧空域邻近块809、810的运动矢量的加权值为当前子块604的运动矢量。需要说明的是,图10E中,如细箭头所示,一些左侧邻近参考块的运动矢量被投影以扩展参考行,例如左侧空域邻近块802的运动矢量被投影或映射为参考行的扩展部分中的左上空域邻近块811的运动矢量。
在第一种示例下,本申请实施例的运动信息加权方式可以采用JVET里的4tap的Cubic帧内插值滤波。
以帧间垂直方向预测模式为例(图5或表2中的模式6~10),i是投影位移的整数部分,f代表投影位移的小数部分,A是角度参数,y是当前子块的纵坐标,x是当前子块的横坐标。
i=((y+1)*A)>>5     (6)
f=((y+1)*A)&31     (7)
Int*w=CubicFilter[f];
Int CubicFilter[32][4]={
{0,256,0,0},//0Integer-Pel
{-3,252,8,-1},//1
{-5,247,17,-3},//2
{-7,242,25,-4},//3
{-9,236,34,-5},//4
{-10,230,43,-7},//5
{-12,224,52,-8},//6
{-13,217,61,-9},//7
{-14,210,70,-10},//8
{-15,203,79,-11},//9
{-16,195,89,-12},//10
{-16,187,98,-13},//11
{-16,179,107,-14},//12
{-16,170,116,-14},//13
{-17,162,126,-15},//14
{-16,153,135,-16},//15
{-16,144,144,-16},//16Half-Pel
};
P(x,y)=(w[0]*R(x+i-1,-1)+w[1]*R(x+i,-1)+w[2]*R(x+i+1,-1)+w[3]*R(x+i+2,-1)+128)>>8  (8)
其中,(x,y)代表当前子块相对于当前图像块的左上角子块的坐标,R(x+i-1,-1),R(x+i,-1),R(x+i+1,-1),R(x+i+2,-1)代表4个彼此相邻的参考块的运动矢量;相应的,w[0],w[1],w[2],w[3]代表前述4个参考块的加权因子,P(x,y)代表当前子块的运动矢量。
需要说明的是,帧间水平方向预测模式下,只需将公式(6)、公式(7)和公式(8)中的x和y坐标互换即可,这里不再赘述。
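以帧间垂直方向预测为例,公式(6)~(8)的计算流程可草绘为如下示意性C代码(非本申请原文内容,函数名为本示例假设;滤波表仅摘录正文表格中的部分条目,其余条目从略):

```c
static const int kCubicFilter[32][4] = {
    [0]  = {0, 256, 0, 0},       /* f=0,整像素位置 */
    [1]  = {-3, 252, 8, -1},     /* f=1 */
    [16] = {-16, 144, 144, -16}, /* f=16,半像素位置;其余条目见正文表格 */
};

/* R:参考行上的运动矢量分量数组(按 x+i-1..x+i+2 下标访问);
 * A:角度参数;返回当前子块 (x,y) 的运动矢量分量 P(x,y)。 */
static int dir_predict_4tap(const int *R, int x, int y, int A) {
    int i = ((y + 1) * A) >> 5;   /* 公式(6):投影位移的整数部分 */
    int f = ((y + 1) * A) & 31;   /* 公式(7):投影位移的小数部分 */
    const int *w = kCubicFilter[f];
    return (w[0] * R[x + i - 1] + w[1] * R[x + i]
          + w[2] * R[x + i + 1] + w[3] * R[x + i + 2] + 128) >> 8;  /* 公式(8) */
}
```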
在第二种示例下,本申请实施例的运动信息加权方式可以采用JVET里的4tap的Gaussian帧内插值滤波,将第一种示例中的Cubic帧内插值滤波替换为Gaussian帧内插值滤波即可,这里不再赘述。
Int*w=GaussFilter[f];
Int GaussFilter[32][4]={
{47,161,47,1},//0Integer-Pel
{43,161,51,1},//1
{40,160,54,2},//2
{37,159,58,2},//3
{34,158,62,2},//4
{31,156,67,2},//5
{28,154,71,3},//6
{26,151,76,3},//7
{23,149,80,4},//8
{21,146,85,4},//9
{19,142,90,5},//10
{17,139,94,6},//11
{16,135,99,6},//12
{14,131,104,7},//13
{13,127,108,8},//14
{11,123,113,9},//15
{10,118,118,10},//16Half-Pel
};
在第三种示例下,本申请实施例的运动信息加权方式可以采用HEVC里的2tap的帧内插值滤波,
将第一种示例中的公式(8)替换为如下公式(9)即可,这里不再赘述。
P(x,y)=((32-f)*R(x+i,-1)+f*R(x+i+1,-1)+16)>>5     (9)
其中,(x,y)代表当前子块相对于当前图像块的左上角子块的坐标,R(x+i,-1),R(x+i+1,-1)代表两个彼此相邻的目标参考块的运动矢量;相应的,(32-f),f代表两个目标参考块的加权因子,P(x,y)代表当前子块的运动矢量。
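公式(9)的2抽头加权可草绘为如下示意性C代码(非本申请原文内容,函数名为本示例假设),其中i、f的计算同公式(6)(7):

```c
/* R:参考行上的运动矢量分量数组;A:角度参数;
 * 返回当前子块 (x,y) 的运动矢量分量 P(x,y)。 */
static int dir_predict_2tap(const int *R, int x, int y, int A) {
    int i = ((y + 1) * A) >> 5;  /* 投影位移的整数部分 */
    int f = ((y + 1) * A) & 31;  /* 投影位移的小数部分 */
    return ((32 - f) * R[x + i] + f * R[x + i + 1] + 16) >> 5;  /* 公式(9) */
}
```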
由上可见,本申请实施例的基于用于方向性的运动场的帧间方向预测模式的帧间预测过程中,沿着预测方向的一个或多个子块的运动矢量彼此相同且运动矢量的值取决于目标参考块的运动矢量,从而能较好的预测具有方向性的运动场的图像块或其子块的运动矢量,提高了运动矢量的预测准确性。
在本申请的一个实例中,每个参考块(例如空域邻近参考块,时域邻近参考块)的运动信息(即每组运动信息)可包括运动矢量、参考图像列表和与参考图像列表对应的参考图像索引。参考图像索引用于识别指定参考图像列表(RefPicList0或RefPicList1)中的运动矢量所指向的参考图像。运动矢量(MV)是指水平和竖直方向的位置偏移,即运动向量的水平分量和运动向量的垂直分量。
为了提高运动矢量预测的有效性,无论基于哪一种新的帧间预测模式预测当前图像块中一个或多个子块的运动信息,在执行多组运动信息的线性插值或加权或求均值之前,视频编码器100或者视频解码器200(具体的,帧间预测器110、210)可进一步用于(或者本申请各个实施例的方法还可以包括):
确定当前图像块的、与指定参考图像列表对应的目标参考图像索引;这里的指定参考图像列表可以是参考图像列表0或列表1;这里的目标参考图像索引可以是0,1或其它,也可以是指定参考图像列表中使用频率最高的参考图像索引,例如是所有参考块的运动矢量或者经加权的参考块的运动矢量指向/使用次数最多的参考图像索引。
判断所述多组运动信息各自包括的与所述指定参考图像列表对应的参考图像索引是否与所述目标参考图像索引相同;
如果当前运动信息包括的与所述指定参考图像列表对应的参考图像索引不同于所述目标参考图像索引,则对当前运动信息包括的与所述指定参考图像列表对应的运动矢量进行基于时域距离的缩放处理,以得到指向所述目标参考图像索引的参考帧的运动矢量。
在一种实例中,基于当前图像与由当前运动信息的参考图像索引指示的参考图像之间的时间距离与当前图像与由目标参考图像索引指示的参考图像之间的时间距离,来按比例缩放运动矢量。
举例说明,对多个参考块的运动矢量进行插值之前,如果多个参考块的运动信息各自包括的与列表0对应的参考图像索引不同,例如第一参考块的与列表0对应的参考图像索引为0,而第二参考块的与列表0对应的参考图像索引为1,而且在假定当前图像块与列表0对应的参考图像索引为0的情况下,则对第二参考块的运动矢量(对应列表0)进行基于时域距离的缩放,以得到指向由参考图像索引0指示的参考帧的运动矢量。
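基于时域距离的运动矢量缩放可草绘为如下示意性C代码(非本申请原文公式,而是按HEVC中类似思路给出的简化示例,未包含定点化与裁剪处理):

```c
/* mv:当前运动信息中与指定参考图像列表对应的运动矢量分量;
 * td_cur:当前图像与该运动矢量所指参考图像之间的时域(POC)距离;
 * td_tgt:当前图像与目标参考图像索引所指参考图像之间的时域距离。 */
static int scale_mv(int mv, int td_tgt, int td_cur) {
    return mv * td_tgt / td_cur;  /* 简化的按时域距离比例缩放 */
}
```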
在本申请的另一个实例中,针对某些帧间方向预测模式,在利用目标参考块的运动矢量推算当前子块的运动矢量之前,为了减少由参考块运动信息的边缘edge效应引起的当前块运动信息的轮廓效应,视频编码器100或者视频解码器200可进一步用于(或者本申请各个实施例的方法还可以包括):基于与帧间方向预测模式对应的预测方向或角度,对目标参考块的运动信息选择性地进行滤波,例如,当确定的帧间预测模式为角度比较大的帧间方向预测模式2,6或10,在利用目标参考块的运动矢量推算当前子块的运动矢量之前,对目标参考块的运动矢量进行滤波;例如,可以利用邻近参考块的运动信息通过{1/4,2/4,1/4}的滤波器对目标参考块的运动信息进行滤波,其中邻近参考块为与目标参考块直接相邻(比如左右相邻,或者上下相邻)的邻近参考块。
尤其是,可以基于块大小和与帧间方向预测模式对应的预测方向或角度,对目标参考块的运动信息选择性地进行滤波,例如,块越大,角度较大的帧间方向预测模式,在利用目标参考块的运动矢量推算当前子块的运动矢量之前,提前滤波处理的必要性就越大。
在本申请的再一实例中,考虑到处于当前图像块边界的多个子块(例如上下相邻的子块)的运动矢量可能不同(不连续性),那么从参考图像中取得的预测块在参考图像中不相邻,这可能会导致边界子块的预测块之间的不连续,从而导致残差的不连续,影响残差的图像编码/解码性能,因此考虑对图像块边界处的子块的运动矢量进行滤波。
相应地,视频编码器100或者视频解码器200可进一步用于(或者本申请各个实施例的方法还可以包括):对当前图像块的边界子块的运动信息进行滤波,该边界子块为当前图像块中位于边界的一个或多个子块。尤其是,对特定帧间预测模式(例如第二帧间预测模式、垂直预测模式、水平预测模式等)下当前图像块的边界子块的运动信息进行滤波。可选地,可以通过{1/4,3/4}或者{1/4,2/4,1/4}的滤波器进行滤波,使得边界子块的运动矢量变化的更平缓,应当理解的是本申请并不限于此。
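对边界子块运动矢量的{1/4,2/4,1/4}滤波可草绘为如下示意性C代码(非本申请原文内容,函数名与端点处理方式为本示例假设,端点处重复边界值):

```c
/* in:一行(或一列)边界子块的运动矢量分量,长度为 n;
 * out:滤波后的分量;使用 {1/4, 2/4, 1/4} 权值并四舍五入,
 * 使相邻边界子块的运动矢量变化更平缓。 */
static void smooth_boundary_mv(const int *in, int *out, int n) {
    for (int k = 0; k < n; k++) {
        int l = in[k > 0 ? k - 1 : 0];         /* 端点处重复边界值 */
        int r = in[k < n - 1 ? k + 1 : n - 1];
        out[k] = (l + 2 * in[k] + r + 2) / 4;
    }
}
```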
图11为本申请实施例中的帧间预测装置1100的一种示意性框图。需要说明的是,帧间预测装置1100既适用于解码视频图像的帧间预测,也适用于编码视频图像的帧间预测,应当理解的是,这里的帧间预测装置1100可以对应于图2A中的帧间预测器110,或者可以对应于图2B中的帧间预测器210,该帧间预测装置1100可以包括:
帧间预测模式确定单元1101,用于确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,或用于方向性的运动场的多种帧间预测模式;
帧间预测处理单元1102,用于基于所述确定的帧间预测模式,对所述当前图像块执行帧间预测。
在一种可行的实施方式中,帧间预测处理单元1102具体用于:基于所述确定的帧间预测模式,预测所述当前图像块中一个或多个子块(具体可以是每个子块或所有子块)的运动信息,并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测。应当理解的是,在预测得到当前图像块中一个或多个子块的运动矢量之后,可以通过运动补偿过程生成对应子块的预测块,进而得到当前图像块的预测块。
由上可见,本申请实施例的帧间预测装置无论基于哪一种用于方向性或非方向性的帧间预测模式均能预测出当前图像块中一个或多个子块(具体可以是每个或所有子块)的运动信息(例如运动矢量),这样的话,从结果来看,预测出的当前图像块的运动矢量基本上接近使用运动估算方法得到的运动矢量,从而当编码时无需传送运动矢量差值MVD,在视频质量相同的情况下节省了码率,因此本申请实施例的帧间预测装置的编解码性能得到进一步的改善。
在一些可能的实施场景下,如果所述帧间预测模式确定单元1101确定用于非方向性的运动场的第一帧间预测模式(用于帧间预测的平面planar模式),所述帧间预测处理单元1102具体用于:
预测或推导与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息;
预测或推导与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息;
基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
在一些可行的实施方式中,在所述预测(推导)与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息的方面,所述帧间预测处理单元1102具体用于:
基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
确定所述当前图像块的第一并置块(co-located)的第一右侧空域邻近块的运动信息为所述第一运动信息,其中所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者,
确定所述当前图像块的第二并置块(co-located)的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者,
确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的两个空域邻近块的运动信息的平均值为所述第一运动信息。
在一些可行的实施方式中,在所述获取与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息的方面,所述帧间预测处理单元1102具体用于:
基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
确定所述当前图像块的第一并置块(co-located)的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
确定所述当前图像块的第二并置块(co-located)的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的两个空域邻近块的运动信息的平均值为所述第二运动信息。
在一种可行的实施方式中,在所述基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值的方面,所述帧间预测处理单元1102具体用于:
确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的第一距离,与,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的第二距离之间的比例确定的;
在所述基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值的方面,所述帧间预测处理单元1102具体用于:
确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的第三距离,与,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的第四距离之间的比例确定的。
可见,本申请实施例的基于用于非方向性的运动场的第一帧间预测模式(亦称为用于帧间预测的平面planar模式)的帧间预测过程中,使用水平和垂直线性插值的均值来推导当前子块的运动矢量,能较好的预测具有渐变的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
在一些可能的实施场景下,如果所述帧间预测模式确定单元确定用于非方向性的运动场的第二帧间预测模式(用于帧间预测的DC模式),所述帧间预测处理单元1102具体用于:
确定与当前图像块的当前子块相同行上的、所述当前图像块的左侧空域邻近块的第三运动信息和与所述当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息的均值为所述当前子块的运动信息;或者,
确定当前图像块的多个左侧空域邻近块的运动信息和当前图像块的多个上侧空域邻近块的运动信息的均值为所述当前图像块的一个或多个子块(例如可以是所有子块)的运动信息。
可见,本申请实施例的基于用于非方向性的运动场的第二帧间预测模式(亦称为用于帧间预测的DC模式)的帧间预测过程中,使用当前图像块的直接左侧空域邻近块、上侧空域邻近块的运动矢量的均值来推导当前子块的运动矢量,能较好的预测具有平滑的运动场的图像块或其子块的运动矢量,从而提高了运动矢量的预测准确性。
在一些可能的实施场景下,如果所述帧间预测模式确定单元确定用于方向性的运动场的帧间方向预测模式(用于帧间预测的方向预测模式),所述帧间预测处理单元1102具体用于:
确定一个目标参考块的运动信息为当前图像块的当前子块的运动信息;或者,
确定两个目标参考块的运动信息的加权值为所述当前子块的运动信息,或者
确定所述两个目标参考块及所述两个目标参考块的两个邻近块的运动信息的加权值为所述当前子块的运动信息;
其中所述目标参考块是根据所述帧间方向预测模式对应的预测方向(角度)在参考行或参考列上确定的与当前子块对应的参考块。
可见,本申请实施例的基于用于方向性的运动场的帧间方向预测模式的帧间预测过程中,沿着预测方向的一个或多个子块的运动矢量彼此相同且运动矢量的值取决于目标参考块的运动矢量,从而能较好的预测具有方向性的运动场的图像块或其子块的运动矢量,提高了运动矢量的预测准确性。
在一些可行的实施方式中,为了提高运动矢量预测的有效性,在执行多组运动信息的线性插值或加权或求均值之前,所述帧间预测处理单元1102进一步用于:
确定当前图像块的、与指定参考图像列表对应的目标参考图像索引;
判断所述多组运动信息各自包括的与所述指定参考图像列表对应的参考图像索引是否与所述目标参考图像索引相同;
如果当前运动信息包括的与所述指定参考图像列表对应的参考图像索引不同于所述目标参考图像索引,对当前运动信息包括的与所述指定参考图像列表对应的运动矢量进行基于时域距离的缩放处理,以得到指向所述目标参考图像索引的参考帧的运动矢量。
当所述装置1100用于解码视频图像,所述装置1100还可以包括:
帧间预测数据获取单元(图中未示意),用于接收包括用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识的帧间预测数据;
相应的,帧间预测模式确定单元1101具体用于当所述帧间预测数据指示采用所述候选帧间预测模式集合来对当前图像块进行预测时,从所述候选帧间预测模式集合中确定用于对当前图像块进行帧间预测的帧间预测模式。
进一步的,在所述帧间预测数据获取单元接收的帧间预测数据还包括用于指示所述当前图像块的帧间预测模式的第二标识的情况下,所述帧间预测模式确定单元1101具体用于确定所述第二标识指示的帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式;
在所述帧间预测数据获取单元接收的帧间预测数据不包括用于指示所述当前图像块的帧间预测模式的第二标识的情况下,所述帧间预测模式确定单元1101具体用于确定用于非方向性的运动场的第一帧间预测模式(亦称为用于帧间预测的平面Planar模式)为用于对所述当前图像块进行帧间预测的帧间预测模式。
当所述装置1100用于编码视频图像,所述装置1100还可以包括:
所述帧间预测模式确定单元1101具体用于确定所述候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
需要说明的是,本申请实施例的帧间预测装置中的各个模块为实现本申请帧间预测方法中所包含的各种执行步骤的功能主体,即具备完整实现本申请帧间预测方法中的各个步骤以及这些步骤的扩展及变形的功能主体,具体请参见本文中对帧间预测方法的介绍,为简洁起见,本文将不再赘述。
图12为本申请实施例的编码设备或解码设备(简称为译码设备1200)的一种实现方式的示意性框图。其中,译码设备1200可以包括处理器1210、存储器1230和总线系统1250。其中,处理器和存储器通过总线系统相连,该存储器用于存储指令,该处理器用于执行该存储器存储的指令。编码设备的存储器存储程序代码,且处理器可以调用存储器中存储的程序代码执行本申请描述的各种视频编码或解码方法,尤其是在各种新的帧间预测模式下的视频编码或解码方法,以及在各种新的帧间预测模式下预测运动信息的方法。为避免重复,这里不再详细描述。
在本申请实施例中,该处理器1210可以是中央处理单元(Central Processing Unit,简称为“CPU”),该处理器1210还可以是其他通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
该存储器1230可以包括只读存储器(ROM)设备或者随机存取存储器(RAM)设备。任何其他适宜类型的存储设备也可以用作存储器1230。存储器1230可以包括由处理器1210使用总线1250访问的代码和数据1231。存储器1230可以进一步包括操作系统1233和应用程序1235,该应用程序1235包括允许处理器1210执行本申请描述的视频编码或解码方法(尤其是本申请描述的帧间预测方法或运动信息预测方法)的至少一个程序。例如,应用程序1235可以包括应用1至N,其进一步包括执行在本申请描述的视频编码或解码方法的视频编码或解码应用(简称视频译码应用)。
该总线系统1250除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线系统1250。
可选的,译码设备1200还可以包括一个或多个输出设备,诸如显示器1270。在一个示例中,显示器1270可以是触感显示器,其将显示器与可操作地感测触摸输入的触感单元合并。显示器1270可以经由总线1250连接到处理器1210。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的 专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (39)

  1. 一种视频图像的帧间预测方法,其特征在于,包括:
    确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式;
    基于所述确定的帧间预测模式,对所述当前图像块执行帧间预测。
  2. 根据权利要求1所述的方法,其特征在于,所述基于确定的帧间预测模式,对所述当前图像块执行帧间预测,包括:
    基于所述确定的帧间预测模式,预测所述当前图像块中一个或多个子块的运动信息,并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测。
  3. 根据权利要求2所述的方法,其特征在于,所述确定的帧间预测模式为用于非方向性的运动场的第一帧间预测模式,所述预测当前图像块中一个或多个子块的运动信息包括:
    预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动信息;
    预测与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息;
    基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
    基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
    利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
  4. 根据权利要求3所述的方法,其特征在于,所述预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息,包括:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一右侧空域邻近块的运动信息为所述第一运动信息,其中,所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者
    确定所述当前图像块的第二并置块的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的 图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者
    确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的多个空域邻近块的运动信息的均值为所述第一运动信息。
  5. 根据权利要求3或4所述的方法,其特征在于,所述预测与当前图像块的当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息,包括:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的第二并置块的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的多个空域邻近块的运动信息的均值为所述第二运动信息。
  6. 如权利要求3至5任一项所述的方法,其特征在于,
    所述基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值,包括:
    确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于第一距离与第二距离之间的比例确定的;其中,与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的距离为第一距离,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的距离为第二距离;
    或者,
    所述基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值,包括:
    确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于第三距离与第四距离之间的比例确定的;其中,与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的距离为第三距离,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的距离为第四距离。
  7. 根据权利要求2所述的方法,其特征在于,所述确定的帧间预测模式为用于非方向性的运动场的第二帧间预测模式,所述预测当前图像块中一个或多个子块的运动信息包括:
    确定与当前图像块的当前子块相同行上的、所述当前图像块的左侧空域邻近块的第三运动信息和与所述当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息的均值为所述当前子块的运动信息;或者,
    确定当前图像块的多个左侧空域邻近块的运动信息和当前图像块的多个上侧空域邻近块的运动信息的均值为所述当前图像块的一个或多个子块的运动信息。
  8. 根据权利要求2所述的方法,其特征在于,所述确定的帧间预测模式为用于方向性的运动场的帧间方向预测模式,所述预测当前图像块中一个或多个子块的运动信息包括:
    确定一个目标参考块的运动信息为当前图像块的当前子块的运动信息;或者,
    确定两个目标参考块的运动信息的加权值为所述当前子块的运动信息,或者
    确定所述两个目标参考块及所述两个目标参考块的两个邻近块的运动信息的加权值为所述当前子块的运动信息;
    其中所述目标参考块是根据所述帧间方向预测模式对应的预测方向在参考行或参考列上确定的与当前子块对应的参考块。
  9. 如权利要求3至8任一项所述的方法,其特征在于,在执行多组运动信息的线性插值或加权或求均值之前,所述方法还包括:
    确定当前图像块的、与指定参考图像列表对应的目标参考图像索引;
    判断所述多组运动信息各自包括的与所述指定参考图像列表对应的参考图像索引是否与所述目标参考图像索引相同;
    如果一组运动信息包括的与所述指定参考图像列表对应的参考图像索引不同于所述目标参考图像索引,则对该组运动信息包括的与所述指定参考图像列表对应的运动矢量进行基于时域距离的缩放处理,以得到指向所述目标参考图像索引的参考帧的运动矢量。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述方法用于解码视频图像,还包括:
    解码码流,以得到包括第一标识的帧间预测数据,其中所述第一标识用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测;
    所述确定用于对当前图像块进行帧间预测的帧间预测模式,包括:当所述帧间预测数据指示采用候选帧间预测模式集合来对当前图像块进行预测时,从所述候选帧间预测模式集合中确定用于对当前图像块进行帧间预测的帧间预测模式。
  11. 根据权利要求10所述的方法,其特征在于,如果所述帧间预测数据还包括用于指示所述当前图像块的帧间预测模式的第二标识,则所述确定用于对所述当前图像块进行帧间预测的帧间预测模式包括:确定所述第二标识指示的帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式;或者,
    如果所述帧间预测数据未包括用于指示所述当前图像块的帧间预测模式的第二标识,则所述确定用于对所述当前图像块进行帧间预测的帧间预测模式包括:确定用于非方向性的运动场的第一帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式。
  12. 根据权利要求1至9任一项所述的方法,其特征在于,所述方法用于编码视频图像,
    所述确定用于对当前图像块进行帧间预测的帧间预测模式,包括:确定所述候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
  13. 根据权利要求1至9以及权利要求12任一项所述的方法,其特征在于,所述方法用于编码视频图像,还包括:
    将帧间预测数据编入码流,其中所述帧间预测数据包括:用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识;或者,所述帧间预测数据包括:用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识和用于指示当前图像块的帧间预测模式的第二标识。
  14. 一种视频图像的帧间预测装置,其特征在于,包括:
    帧间预测模式确定单元,用于确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于非方向性的运动场的多种帧间预测模式,和/或用于方向性的运动场的多种帧间预测模式;
    帧间预测处理单元,用于基于所述确定的帧间预测模式,对所述当前图像块执行帧间预测。
  15. 根据权利要求14所述的装置,其特征在于,所述帧间预测处理单元用于基于所述确定的帧间预测模式,预测所述当前图像块中一个或多个子块的运动信息,并利用所述当前图像块中一个或多个子块的运动信息对所述当前图像块执行帧间预测。
  16. 根据权利要求15所述的装置,其特征在于,如果所述帧间预测模式确定单元确定用于非方向性的运动场的第一帧间预测模式,在所述预测当前图像块中一个或多个子块的运动信息的方面,所述帧间预测处理单元具体用于:
    预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息;
    预测与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息;
    基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
    基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
    利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
  17. 根据权利要求16所述的装置,其特征在于,在所述预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息的方面,所述帧间预测处理单元具体用于:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一右侧空域邻近块的运动信息为所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者,
    确定所述当前图像块的第二并置块的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者,
    确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的两个空域邻近块的运动信息的平均值为所述第一运动信息。
  18. 根据权利要求16或17所述的装置,其特征在于,在所述预测与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息的方面,所述帧间预测处理单元具体用于:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息, 其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的第二并置块的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的两个空域邻近块的运动信息的平均值为所述第二运动信息。
  19. 如权利要求16至18任一项所述的装置,其特征在于,
    在所述基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值的方面,所述帧间预测处理单元具体用于:确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的第一距离,与,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的第二距离之间的比例确定的;
    在所述基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值的方面,所述帧间预测处理单元具体用于:
    确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,
    其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的第三距离,与,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的第四距离之间的比例确定的。
  20. 根据权利要求15所述的装置,其特征在于,如果所述帧间预测模式确定单元确定用于非方向性的运动场的第二帧间预测模式,在所述预测当前图像块中一个或多个子块的运动信息的方面,所述帧间预测处理单元具体用于:
    确定与当前图像块的当前子块相同行上的、所述当前图像块的左侧空域邻近块的第三运动信息和与所述当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息的均值为所述当前子块的运动信息;或者,
    确定当前图像块的多个左侧空域邻近块的运动信息和当前图像块的多个上侧空域邻近块的运动信息的均值为所述当前图像块的一个或多个子块的运动信息。
  21. 根据权利要求15所述的装置,其特征在于,如果所述帧间预测模式确定单元确定用于方向性的运动场的帧间方向预测模式,在所述预测当前图像块中一个或多个子块的运动信息的方面,所述帧间预测处理单元具体用于:
    确定一个目标参考块的运动信息为当前图像块的当前子块的运动信息;或者,
    确定两个目标参考块的运动信息的加权值为所述当前子块的运动信息,或者
    确定所述两个目标参考块及所述两个目标参考块的两个邻近块的运动信息的加权值为所述当前子块的运动信息;
    其中所述目标参考块是根据所述帧间方向预测模式对应的预测方向在参考行或参考列上确定的与当前子块对应的参考块。
  22. 如权利要求16至21任一项所述的装置,其特征在于,在执行多组运动信息的线性插值或加权或求均值之前,所述帧间预测处理单元进一步用于:
    确定当前图像块的、与指定参考图像列表对应的目标参考图像索引;
    判断所述多组运动信息各自包括的与所述指定参考图像列表对应的参考图像索引是否与所述目标参考图像索引相同;
    如果一组运动信息包括的与所述指定参考图像列表对应的参考图像索引不同于所述目标参考图像索引,对该组运动信息包括的与所述指定参考图像列表对应的运动矢量进行基于时域距离的缩放处理,以得到指向所述目标参考图像索引对应的参考帧的运动矢量。
  23. 根据权利要求14至22任一项所述的装置,其特征在于,所述装置用于解码视频图像,还包括:
    帧间预测数据获取单元,用于接收包括用于指示是否对当前图像块采用所述候选帧间预测模式集合进行帧间预测的第一标识的帧间预测数据;
    所述帧间预测模式确定单元具体用于当所述帧间预测数据指示采用所述候选帧间预测模式集合来对当前图像块进行预测时,从所述候选帧间预测模式集合中确定用于对当前图像块进行帧间预测的帧间预测模式。
  24. 根据权利要求23所述的装置,其特征在于,在所述帧间预测数据获取单元接收的帧间预测数据还包括用于指示所述当前图像块的帧间预测模式的第二标识的情况下,所述帧间预测模式确定单元具体用于确定所述第二标识指示的帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式;
    在所述帧间预测数据获取单元接收的帧间预测数据不包括用于指示所述当前图像块的帧间预测模式的第二标识的情况下,所述帧间预测模式确定单元具体用于确定用于非方向性的运动场的第一帧间预测模式为用于对所述当前图像块进行帧间预测的帧间预测模式。
  25. 根据权利要求14至22任一项所述的装置,其特征在于,所述装置用于编码视频图像,
    所述帧间预测模式确定单元具体用于确定所述候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
  26. 一种视频编码器,其特征在于,所述视频编码器用于编码图像块,包括:
    如权利要求14至22以及25任一项所述的帧间预测器,其中所述帧间预测器用于基于帧间预测模式,预测待编码图像块的预测块,其中所述帧间预测模式是候选帧间预测模式集合中的一种;
    熵编码器,用于将第一标识编入码流,所述第一标识用于指示是否对所述待编码图像块采用所述候选帧间预测模式集合进行帧间预测;
    重建器,用于根据所述预测块重建所述图像块。
  27. 根据权利要求26所述的视频编码器,其特征在于,所述熵编码器,还用于将第二标识编入码流,所述第二标识用于指示所述待编码图像块的帧间预测模式。
  28. 一种视频解码器,其特征在于,所述视频解码器用于从码流中解码出图像块,包括:
    熵解码器,用于从码流中解码出第一标识,所述第一标识用于指示是否对待解码图像块采用候选帧间预测模式集合进行帧间预测;
    如权利要求14至24任一项所述的帧间预测器,其中所述帧间预测器用于基于帧间预测模式,预测所述待解码图像块的预测块,其中所述帧间预测模式是所述候选帧间预测模式集合中的一种;
    重建器,用于根据所述预测块重建所述图像块。
  29. 根据权利要求28所述的视频解码器,其特征在于,所述熵解码器还用于从所述码流中解码出第二标识,所述第二标识用于指示所述待解码图像块的帧间预测模式。
  30. 一种视频图像的帧间预测方法,其特征在于,包括:
    确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于平滑或渐变的运动场的第一帧间预测模式;
    基于所述确定的帧间预测模式,预测所述当前图像块中每个子块的运动信息,并利用所述当前图像块中每个子块的运动信息对所述当前图像块执行帧间预测。
  31. 根据权利要求30所述的方法,其特征在于,所述确定的帧间预测模式为用于平滑或渐变的运动场的第一帧间预测模式,所述预测当前图像块中每个子块的运动信息包括:
    预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块的第一运动信息;
    预测与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息;
    基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
    基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
    利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
  32. 根据权利要求31所述的方法,其特征在于,所述预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息,包括:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一右侧空域邻近块的运动信息为所述第一运动信息,其中,所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者
    确定所述当前图像块的第二并置块的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者
    确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的多个空域邻近块的运动信息的均值为所述第一运动信息。
  33. 根据权利要求31或32所述的方法,其特征在于,所述预测与当前图像块的当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息,包括:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息, 其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的第二并置块的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的多个空域邻近块的运动信息的均值为所述第二运动信息。
  34. 如权利要求31至33任一项所述的方法,其特征在于,
    所述基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值,包括:
    确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,
    其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的第一距离,与,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的第二距离之间的比例确定的;
    或者,
    所述基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值,包括:
    确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,
    其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的第三距离,与,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的第四距离之间的比例确定的。
  35. 一种视频图像的帧间预测装置,其特征在于,包括:
    帧间预测模式确定单元,用于确定用于对当前图像块进行帧间预测的帧间预测模式,其中所述帧间预测模式是候选帧间预测模式集合中的一种,所述候选帧间预测模式集合包括:用于平滑或渐变的运动场的第一帧间预测模式;
    帧间预测处理单元,用于基于所述确定的帧间预测模式,预测所述当前图像块中每个子块的运动信息,并利用所述当前图像块中每个子块的运动信息对所述当前图像块执行帧间预测。
  36. 根据权利要求35所述的装置,其特征在于,如果所述帧间预测模式确定单元确定用于平滑或渐变的运动场的第一帧间预测模式,在所述预测当前图像块中每个子块的运动信息的方面,所述帧间预测处理单元具体用于:
    预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息;
    预测与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息;
    基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值;
    基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值;
    利用所述当前子块的运动信息的第一预测值和所述当前子块的运动信息的第二预测值,确定所述当前子块的运动信息。
  37. 根据权利要求36所述的装置,其特征在于,在所述预测与当前图像块的当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息的方面,所述帧间预测处理单元具体用于:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的右上角空域邻近块的第六运动信息的线性插值,得到所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块(co-located)的第一右侧空域邻近块的运动信息为所述第一运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一右侧空域邻近块位于所述第一并置块的行与当前子块位于所述当前图像块的行相同;或者,
    确定所述当前图像块的第二并置块(co-located)的第二右侧空域邻近块的运动信息为所述第一运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二右侧空域邻近块位于所述第二并置块的行与当前子块位于所述当前图像块的行相同;或者,
    确定所述当前图像块的右上角空域邻近块的第六运动信息为所述第一运动信息;或者,确定所述当前图像块的右上侧的两个空域邻近块的运动信息的平均值为所述第一运动信息。
  38. 根据权利要求36或37所述的装置,其特征在于,在所述预测与当前子块相同列上的、所述当前图像块的下侧邻近块的第二运动信息的方面,所述帧间预测处理单元具体用于:
    基于所述当前图像块的第一并置块的右下角空域邻近块的第五运动信息和所述当前图像块的左下角空域邻近块的第七运动信息的线性插值,得到所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块;或者
    确定所述当前图像块的第一并置块的第一下侧空域邻近块的运动信息为所述第二运动信息,其中所述第一并置块为参考图像中与所述当前图像块具有相同的大小、形状和坐标的图像块,所述第一下侧空域邻近块位于所述第一并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的第二并置块的第二下侧空域邻近块的运动信息为所述第二运动信息,其中所述第二并置块为参考图像中与所述当前图像块具有指定位置偏移的图像块,所述当前图像块的代表性空域邻近块的运动矢量用于表示所述指定位置偏移,且所述第二下侧空域邻近块位于所述第二并置块的列与当前子块位于所述当前图像块的列相同;或者
    确定所述当前图像块的左下角空域邻近块的第七运动信息为所述第二运动信息;或者,确定所述当前图像块的左下侧的两个空域邻近块的运动信息的平均值为所述第二运动信息。
  39. 如权利要求36至38任一项所述的装置,其特征在于,
    在所述基于所述第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的线性插值,得到当前子块的运动信息的第一预测值的方面,所述帧间预测处理单元具体用于:确定与当前子块相同行上的、所述当前图像块的右侧邻近块的第一运动信息和与当前子块相同行上的、所述当前图像块的左侧邻近块的第三运动信息的加权值,为当前子块的运动信息的第一预测值,其中,第三运动信息的加权因子和第一运动信息的加权因子之间比例是基于与当前图像块的当前子块相同行上的、所述当前图像块的右侧空域邻近块和当前子块之间的第一距离,与,当前子块和与当前子块相同行上的、所述当前图像块的左侧空域邻近块之间的第二距离之间的比例确定的;
    在所述基于所述第二运动信息和与当前子块相同列上的、所述当前图像块的上侧邻近块的第四运动信息的线性插值,得到当前子块的运动信息的第二预测值的方面,所述帧间预测处理单元具体用于:
    确定与当前子块相同列上的、所述当前图像块的上侧空域邻近块的第四运动信息和与当前子块相同列上的、所述当前图像块的下侧空域邻近块的第二运动信息的加权值,为当前子块的运动信息的第二预测值,其中,第四运动信息的加权因子和第二运动信息的加权因子之间比例是基于与当前图像块的当前子块相同列上的、所述当前图像块的下侧空域邻近块和当前子块之间的第三距离,与,当前子块和与当前子块相同列上的、所述当前图像块的上侧空域邻近块之间的第四距离之间的比例确定的。
PCT/CN2018/105148 2017-09-29 2018-09-12 视频图像的帧间预测方法、装置及编解码器 WO2019062544A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18863727.6A EP3672249B1 (en) 2017-09-29 2018-09-12 Inter frame prediction method and device for video images
US16/832,707 US11252436B2 (en) 2017-09-29 2020-03-27 Video picture inter prediction method and apparatus, and codec

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710912607.0 2017-09-29
CN201710912607.0A CN109587479B (zh) 2017-09-29 2017-09-29 视频图像的帧间预测方法、装置及编解码器

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/832,707 Continuation US11252436B2 (en) 2017-09-29 2020-03-27 Video picture inter prediction method and apparatus, and codec

Publications (1)

Publication Number Publication Date
WO2019062544A1 true WO2019062544A1 (zh) 2019-04-04

Family

ID=65900774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105148 WO2019062544A1 (zh) 2017-09-29 2018-09-12 视频图像的帧间预测方法、装置及编解码器

Country Status (4)

Country Link
US (1) US11252436B2 (zh)
EP (1) EP3672249B1 (zh)
CN (1) CN109587479B (zh)
WO (1) WO2019062544A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020251419A3 (en) * 2019-10-06 2021-03-11 Huawei Technologies Co., Ltd. Harmonizing weighted prediction with affine model based motion compensation

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020059616A1 (ja) * 2018-09-21 2020-03-26 Japan Broadcasting Corporation (NHK) Image encoding device, image decoding device, and program
US11595662B2 (en) * 2019-02-06 2023-02-28 Tencent America LLC Method and apparatus for neighboring block availability in video coding
KR20220003021A (ko) 2019-04-25 2022-01-07 Huawei Technologies Co., Ltd. Picture prediction method and apparatus, and computer-readable storage medium
CN111866502B * 2019-04-25 2024-10-11 Huawei Technologies Co., Ltd. Picture prediction method and apparatus, and computer-readable storage medium
CN112153389B * 2019-05-17 2021-11-19 Huawei Technologies Co., Ltd. Inter prediction method and apparatus
CN111953995A 2019-05-17 2020-11-17 Huawei Technologies Co., Ltd. Inter prediction method and apparatus
CN112055220B * 2019-06-05 2022-07-29 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding and decoding method, apparatus, and device
CN113347436B * 2019-06-21 2022-03-08 Hangzhou Hikvision Digital Technology Co., Ltd. Prediction mode decoding and encoding method and apparatus
CN112135129B * 2019-06-25 2024-06-04 Huawei Technologies Co., Ltd. Inter prediction method and apparatus
CN110213590B * 2019-06-25 2022-07-12 Zhejiang Dahua Technology Co., Ltd. Method and device for temporal motion vector obtaining, inter prediction, and video coding
CN113794883B * 2019-08-23 2022-12-23 Hangzhou Hikvision Digital Technology Co., Ltd. Encoding and decoding method, apparatus, and device
CN114402591B * 2019-09-13 2024-08-02 Beijing Bytedance Network Technology Co., Ltd. Derivation of collocated motion vectors
CN112543322B 2019-09-20 2022-04-15 Hangzhou Hikvision Digital Technology Co., Ltd. Decoding and encoding method, apparatus, and device
WO2021061027A1 (en) * 2019-09-25 2021-04-01 Huawei Technologies Co., Ltd. Harmonizing triangular merge mode with weighted prediction
CN110691253B * 2019-10-17 2022-03-01 Peking University Shenzhen Graduate School Encoding and decoding method and apparatus based on inter prediction
CN114007082B * 2020-03-25 2022-12-23 Hangzhou Hikvision Digital Technology Co., Ltd. Decoding, encoding, and coding/decoding method, apparatus, and device
CN112218076B * 2020-10-17 2022-09-06 Zhejiang Dahua Technology Co., Ltd. Video encoding method, apparatus, and system, and computer-readable storage medium
CN114119650A * 2021-11-26 2022-03-01 Suzhou Zhendi Intelligent Technology Co., Ltd. Target tracking method, apparatus, device, and medium
CN116962686A * 2022-04-15 2023-10-27 Vivo Mobile Communication Co., Ltd. Inter prediction method and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110135000A1 (en) * 2009-12-09 2011-06-09 Samsung Electronics Co., Ltd. Method and apparatus for encoding video, and method and apparatus for decoding video
CN102215388A (zh) * 2010-04-09 2011-10-12 Huawei Technologies Co., Ltd. Method, apparatus, and system for simplifying a directional transform
CN103098467A (zh) * 2010-07-09 2013-05-08 Samsung Electronics Co., Ltd. Method and apparatus for encoding and decoding motion vectors
CN104539966A (zh) * 2014-09-30 2015-04-22 Huawei Technologies Co., Ltd. Picture prediction method and related apparatus
CN104618714A (zh) * 2015-01-20 2015-05-13 Ningbo University Stereoscopic video frame importance evaluation method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100952340B1 (ko) * 2008-01-24 2010-04-09 SK Telecom Co., Ltd. Method and apparatus for determining an encoding mode using spatiotemporal complexity
US9106910B2 (en) * 2009-06-23 2015-08-11 Orange Method of coding and decoding images, corresponding device for coding and decoding and computer program
CN102387360B (zh) * 2010-09-02 2016-05-11 LG Electronics (China) R&D Center Co., Ltd. Inter picture prediction method for video coding and decoding, and video codec
US9641836B2 (en) * 2012-08-07 2017-05-02 Qualcomm Incorporated Weighted difference prediction under the framework of generalized residual prediction
KR20150010660A (ko) * 2013-07-18 2015-01-28 Samsung Electronics Co., Ltd. Intra prediction method for a depth picture for inter-layer video decoding and encoding apparatus and method
WO2015192353A1 (en) * 2014-06-19 2015-12-23 Microsoft Technology Licensing, Llc Unified intra block copy and inter prediction modes
US10462457B2 (en) * 2016-01-29 2019-10-29 Google Llc Dynamic reference motion vector coding mode

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3672249A4

Also Published As

Publication number Publication date
CN109587479A (zh) 2019-04-05
US20200228830A1 (en) 2020-07-16
US11252436B2 (en) 2022-02-15
EP3672249A1 (en) 2020-06-24
EP3672249B1 (en) 2024-02-14
CN109587479B (zh) 2023-11-10
EP3672249A4 (en) 2020-07-08

Similar Documents

Publication Publication Date Title
US11252436B2 (en) Video picture inter prediction method and apparatus, and codec
TWI535269 Performing motion vector prediction for video coding
TWI786790B Inter prediction method and apparatus for video data
TW201740730A (zh) 用於視訊寫碼之濾波器之幾何轉換
CN110868602B Video encoder, video decoder, and corresponding methods
WO2019154424A1 Video decoding method, video decoder, and electronic device
US11582444B2 (en) Intra-frame coding method and apparatus, frame coder, and frame coding system
WO2020247577A1 (en) Adaptive motion vector resolution for affine mode
WO2020007093A1 Picture prediction method and apparatus
JP7485809B2 Inter prediction method and apparatus, video encoder, and video decoder
CN110677645B Picture prediction method and apparatus
WO2019237287A1 Video picture inter prediction method and apparatus, and codec
WO2020007187A1 Picture block decoding method and apparatus
WO2019227297A1 Inter prediction method and apparatus for video picture, and codec
WO2024039803A1 (en) Methods and devices for adaptive loop filter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18863727

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018863727

Country of ref document: EP

Effective date: 20200320

NENP Non-entry into the national phase

Ref country code: DE