WO2020042758A1 - Inter prediction method and apparatus - Google Patents

Inter prediction method and apparatus

Info

Publication number
WO2020042758A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
image
processed
image block
motion vector
Prior art date
Application number
PCT/CN2019/094666
Other languages
English (en)
French (fr)
Inventor
张娜
郑建铧
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020042758A1

Classifications

All of the following classifications fall under H (ELECTRICITY), H04 (ELECTRIC COMMUNICATION TECHNIQUE), H04N (PICTORIAL COMMUNICATION, e.g. TELEVISION):

    • H04N19/107: Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/124: Quantisation
    • H04N19/13: Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/52: Processing of motion vectors by predictive encoding
    • H04N19/61: Transform coding in combination with predictive coding
    • H04N19/96: Tree coding, e.g. quad-tree coding
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • The present application relates to the technical field of video encoding and decoding, and in particular to a method and an apparatus for inter prediction of video images.
  • Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques, for example those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265/High Efficiency Video Coding (HEVC), and in extensions of such standards.
  • Video devices can implement such video compression techniques to transmit, receive, encode, decode, and/or store digital video information more efficiently.
  • Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
  • For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into image blocks, which may also be referred to as tree blocks, coding units (CUs), and/or coding nodes.
  • Image blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
  • Image blocks in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images.
  • An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
  • Various video coding standards, including the High Efficiency Video Coding (HEVC) standard, have proposed predictive coding modes for image blocks, that is, predicting a currently coded image block based on already coded image blocks.
  • In the intra prediction mode, the current image block is predicted based on one or more previously decoded neighboring blocks in the same image as the current block; in the inter prediction mode, the current image block is predicted based on already decoded blocks in different images.
  • the embodiments of the present application provide a method and an apparatus for inter prediction.
  • The inter prediction mode of the embodiments uses spatial or temporal reference blocks that have a preset positional relationship with the block to be processed.
  • The motion vectors corresponding to these reference blocks are interpolated to obtain the motion vector corresponding to each sub-block inside the block to be processed, which improves the efficiency of inter prediction.
  • Further, by adjusting the size of the sub-blocks and limiting the conditions under which this inter prediction mode applies, coding gain and complexity are balanced.
  • The motion vector of each sub-block of the image block to be processed, obtained by interpolation, can be used directly as the motion vector of that sub-block for motion compensation, or it can be used as a predicted value of the sub-block's motion vector; in the latter case, the actual motion vector is derived from the predicted value before motion compensation is performed.
  • The inter prediction method of the embodiments of the present application may participate in rate-distortion selection at the encoder as one prediction mode alongside other existing prediction modes.
  • When the encoder determines this prediction mode to be the optimal prediction mode, its identification information within the prediction mode set is encoded into the code stream and passed to the decoder, as is done for prior-art prediction modes.
  • The decoder parses the prediction mode from the received code stream information, so that encoding and decoding remain consistent.
  • Each sub-block (i.e., basic prediction block) in the image block to be processed can thus have its own motion vector, so that the motion vector field corresponding to the image block to be processed is more accurate and prediction efficiency is improved.
  • the beneficial effect of this implementation mode is that the reference block used to generate the motion vector corresponding to the basic prediction block is reasonably selected, and the reliability of the generated motion vector is improved.
  • In a feasible implementation, the original reference block having a preset spatial positional relationship with the image block to be processed includes one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed; an image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed; and an image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed, where the original reference block having a preset spatial positional relationship with the image block to be processed is located outside the image block to be processed.
  • the beneficial effect of this implementation mode is that a spatial reference block for generating a motion vector corresponding to a basic prediction block is reasonably selected, and the reliability of the generated motion vector is improved.
  • In a feasible implementation, the original reference block having a preset temporal positional relationship with the image block to be processed includes: an image block in a target reference frame that is located at the lower right corner of a mapped image block and adjacent to the lower right corner point of the mapped image block, where the original reference block having a preset temporal positional relationship with the image block to be processed is located outside the mapped image block, the mapped image block has the same size as the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame where the image block to be processed is located.
  • the beneficial effect of this implementation mode is that the time-domain reference block used to generate the motion vector corresponding to the basic prediction block is reasonably selected, and the reliability of the generated motion vector is improved.
  • the index information and reference frame list information of the target reference frame are obtained by parsing the code stream.
  • the beneficial effect of this implementation mode is: compared with the preset target reference frame in the prior art, the target reference frame can be flexibly selected, so that the corresponding time-domain reference block is more reliable.
  • the index information and reference frame list information of the target reference frame are located in a code stream segment corresponding to a slice header of a slice where the image block to be processed is located.
  • the beneficial effect of this implementation mode is that the identification information of the target reference frame is stored in the slice header, and all time-domain reference blocks of the image blocks in the slice share the same reference frame information, which saves the encoding code stream and improves the encoding efficiency.
  • In a feasible implementation, the performing of a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block, includes obtaining the motion vector corresponding to the basic prediction block according to the following formulas:

    P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W) / (2 × H × W)
    P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y)
    P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H)
    R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H
    B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W

  • where P_h(x, y) is the horizontal interpolation and P_v(x, y) is the vertical interpolation; AR is the motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed; BR is the motion vector corresponding to the image block in the target reference frame located at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block; BL is the motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed; x is the ratio of the horizontal distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the width of the basic prediction block; y is the ratio of the vertical distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the height of the basic prediction block; H is the ratio of the height of the image block to be processed to the height of the basic prediction block; W is the ratio of the width of the image block to be processed to the width of the basic prediction block; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
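  • As an illustration only, the following Python sketch evaluates these formulas for one basic prediction block. The container names (A_row, L_col), the tuple representation of motion vectors, and the use of integer division are assumptions of the sketch, not requirements of the text.

    def planar_mv(x, y, W, H, A_row, L_col, AR, BL, BR):
        """Interpolate the motion vector P(x, y) of one basic prediction block.

        x, y   -- block coordinates inside the image block, in units of
                  basic prediction blocks (0 <= x < W, 0 <= y < H)
        W, H   -- width/height of the image block in basic prediction blocks
        A_row  -- A(x, -1): motion vectors of the above (first) reference row
        L_col  -- L(-1, y): motion vectors of the left (second) reference column
        AR, BL -- spatial corner motion vectors (top-right, bottom-left)
        BR     -- temporal motion vector at the bottom-right of the mapped block
        Motion vectors are (mvx, mvy) tuples; division is applied per component.
        """
        P = []
        for c in (0, 1):  # horizontal and vertical MV components
            R = ((H - y - 1) * AR[c] + (y + 1) * BR[c]) // H      # R(W, y)
            B = ((W - x - 1) * BL[c] + (x + 1) * BR[c]) // W      # B(x, H)
            Ph = (W - 1 - x) * L_col[y][c] + (x + 1) * R          # P_h(x, y)
            Pv = (H - 1 - y) * A_row[x][c] + (y + 1) * B          # P_v(x, y)
            P.append((H * Ph + W * Pv + H * W) // (2 * H * W))    # P(x, y)
        return tuple(P)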
  • The above formula set is only one embodiment of performing a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vectors corresponding to the original reference blocks having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block; the present application is not limited to this embodiment.
  • In a feasible implementation, the determining a size of a basic prediction block in an image block to be processed includes: when the side lengths of two adjacent sides of the basic prediction block are different, determining that the side length of the shorter side of the basic prediction block is 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determining that the side length of the basic prediction block is 4 or 8.
  • This embodiment has the beneficial effect that the size of the basic prediction block is fixed and the complexity is reduced.
  • In a feasible implementation, the determining a size of a basic prediction block in an image block to be processed includes: parsing a first identifier from a code stream, where the first identifier is used to indicate the size of the basic prediction block, and the first identifier is located in a code stream segment corresponding to one of: a sequence parameter set of the sequence where the image block to be processed is located, an image parameter set of the image where the image block to be processed is located, and a slice header of the slice where the image block to be processed is located.
  • the beneficial effect of this implementation mode is that the identification information of the size of the basic prediction block is added to the auxiliary information, which improves the adaptability to the image content.
  • In a feasible implementation, the determining a size of a basic prediction block in an image block to be processed includes: determining the size of the basic prediction block according to the size of planar mode prediction blocks in a previously reconstructed image, where a planar mode prediction block is an image block processed with inter prediction according to any one of the foregoing feasible implementation manners of the first aspect, and the previously reconstructed image is an image that precedes, in encoding order, the image where the image block to be processed is located.
  • In a feasible implementation, the determining the size of the basic prediction block according to the size of planar mode prediction blocks in a previously reconstructed image includes: calculating the average value of the product of width and height of all planar mode prediction blocks in the previously reconstructed image; when the average value is less than a threshold, the size of the basic prediction block is a first size; when the average value is greater than or equal to the threshold, the size of the basic prediction block is a second size, where the first size is smaller than the second size.
  • the beneficial effect of this implementation mode is that the prior prediction information is used to determine the size of the basic prediction block of the current image, and no additional identification information needs to be transmitted, which not only improves the adaptability to the image, but also ensures that the coding rate is not increased.
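  • A minimal sketch of this rule follows. The concrete threshold and the two sizes are illustrative assumptions; the text only requires that the first size be smaller than the second.

    def basic_prediction_block_size(planar_blocks, threshold=256,
                                    first_size=4, second_size=8):
        """Choose the basic prediction block size from the planar mode
        prediction blocks of a previously reconstructed image.

        planar_blocks -- list of (width, height) of planar mode prediction
                         blocks in the previously reconstructed image
        threshold, first_size, second_size -- illustrative assumptions
        """
        if not planar_blocks:
            return first_size  # assumption: fall back to the smaller size
        avg_area = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
        return first_size if avg_area < threshold else second_size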
  • In a feasible implementation, the previously reconstructed image is the reconstructed image whose encoding order is closest to the image where the image block to be processed is located, rather than a reconstructed image whose encoding order is farther from it.
  • the beneficial effect of this implementation mode is that the nearest reference frame is reasonably selected to count the prior information, and the reliability of the statistical information is improved.
  • the threshold is a preset threshold.
  • In this embodiment, the image block to be processed is divided into basic prediction blocks according to the determined size, and the coordinate position of each basic prediction block within the image block to be processed is determined.
  • In a feasible implementation, before the determining of a size of a basic prediction block in an image block to be processed, the method further includes: determining that the first reference block and the second reference block are located within the boundary of the image where the image block to be processed is located.
  • The beneficial effect of this implementation mode is that when the first reference block or the second reference block does not exist, the prediction method of the embodiment of the present application is not adopted; since the accuracy of the prediction method would be reduced in that case, skipping it avoids unnecessary complexity overhead.
  • The beneficial effect of this implementation mode is that when the image block to be processed is too small, the prediction method of the embodiment of the present application is not adopted, balancing coding efficiency and complexity.
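  • The following sketch combines the two applicability checks described above (reference blocks available inside the image, and a minimum block size). How the checks are combined, and the use of the block position to test reference-block availability, are assumptions of the sketch.

    def planar_inter_mode_allowed(block_x, block_y, block_w, block_h,
                                  min_side=16):
        """Check the applicability conditions for the prediction method:
        the above (first) and left (second) reference blocks must exist
        inside the image, and the block must not be too small. min_side=16
        matches the width/height >= 16 condition in the text."""
        has_above = block_y > 0   # a reference row exists above the block
        has_left = block_x > 0    # a reference column exists to the left
        big_enough = block_w >= min_side and block_h >= min_side
        return has_above and has_left and big_enough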
  • the method is used for encoding the image block to be processed, or decoding the image block to be processed.
  • A second aspect of the embodiments of the present application provides an apparatus for inter prediction, including: a determining module, configured to determine a size of a basic prediction block in an image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed; a positioning module, configured to determine a first reference block and a second reference block of the basic prediction block according to the position, where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction block, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction block, the first reference block is adjacent to the upper boundary line of the image block to be processed, and the second reference block is adjacent to the left boundary line of the image block to be processed; and a calculation module, configured to perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
  • In a feasible implementation, the original reference block having a preset positional relationship with the image block to be processed includes: an original reference block having a preset spatial positional relationship with the image block to be processed, and/or an original reference block having a preset temporal positional relationship with the image block to be processed.
  • In a feasible implementation, the original reference block having a preset spatial positional relationship with the image block to be processed includes one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed; an image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed; and an image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed, where the original reference block having a preset spatial positional relationship with the image block to be processed is located outside the image block to be processed.
  • the index information and reference frame list information of the target reference frame are obtained by parsing the code stream.
  • the index information and reference frame list information of the target reference frame are located in a code stream segment corresponding to a slice header of a slice where the image block to be processed is located.
  • In a feasible implementation, the calculation module is specifically configured to obtain the motion vector corresponding to the basic prediction block according to the following formulas:

    P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W) / (2 × H × W)
    P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y)
    P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H)
    R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H
    B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W

  • where the meanings of P_h(x, y), P_v(x, y), AR, BR, BL, x, y, H, W, L(-1, y), A(x, -1), and P(x, y) are as defined for the corresponding implementation of the first aspect above.
  • In a feasible implementation, the determining module is specifically configured to: when the side lengths of two adjacent sides of the basic prediction block are different, determine that the side length of the shorter side of the basic prediction block is 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determine that the side length of the basic prediction block is 4 or 8.
  • In a feasible implementation, the determining module is specifically configured to parse a first identifier from a code stream, where the first identifier is used to indicate the size of the basic prediction block, and the first identifier is located in a code stream segment corresponding to one of: a sequence parameter set of the sequence where the image block to be processed is located, an image parameter set of the image where the image block to be processed is located, and a slice header of the slice where the image block to be processed is located.
  • In a feasible implementation, the determining module is specifically configured to determine the size of the basic prediction block according to the size of planar mode prediction blocks in a previously reconstructed image, where a planar mode prediction block is an image block processed with inter prediction according to any one of the foregoing feasible implementation manners of the second aspect, and the previously reconstructed image is an image whose coding order precedes the image where the image block to be processed is located.
  • In a feasible implementation, the determining module is specifically configured to: calculate the average value of the product of width and height of all planar mode prediction blocks in the previously reconstructed image; when the average value is less than a threshold, the size of the basic prediction block is a first size; when the average value is greater than or equal to the threshold, the size of the basic prediction block is a second size, where the first size is smaller than the second size.
  • In a feasible implementation, the previously reconstructed image is the reconstructed image whose encoding order is closest to the image where the image block to be processed is located, rather than a reconstructed image whose encoding order is farther from it.
  • In a feasible implementation, the previously reconstructed image is multiple images, and the determining module is specifically configured to calculate the average of the product of width and height of all planar mode prediction blocks in the multiple previously reconstructed images.
  • the threshold is a preset threshold.
  • In another feasible implementation, the threshold is a first threshold, or the threshold is a second threshold, where the first threshold and the second threshold are different.
  • In a feasible implementation, the apparatus further includes a dividing module, configured to divide the image block to be processed into a plurality of the basic prediction blocks according to the size, and to determine the position of each basic prediction block in the image block to be processed.
  • In a feasible implementation, the apparatus further includes a determination module, configured to determine that the first reference block and the second reference block are located within the boundary of the image where the image block to be processed is located.
  • the determining module is further configured to determine that a width of the image block to be processed is greater than or equal to 16 and a height of the image block to be processed is greater than or equal to 16; Or, determine that the width of the image block to be processed is greater than or equal to 16; or determine that the height of the image block to be processed is greater than or equal to 16.
  • the apparatus is configured to encode the image block to be processed, or decode the image block to be processed.
  • a third aspect of the embodiments of the present application provides an inter prediction device, including: a processor and a memory coupled to the processor; the processor is configured to execute the method described in the first aspect.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to execute the method described in the first aspect.
  • a fifth aspect of the embodiments of the present application provides a computer program product including instructions. When the instructions are run on a computer, the computer is caused to execute the method described in the first aspect.
  • a sixth aspect of the embodiments of the present application provides a video image encoder, where the video image encoder includes the device described in the second aspect.
  • a seventh aspect of the embodiments of the present application provides a video image decoder, where the video image decoder includes the device described in the second aspect.
  • FIG. 1 is an exemplary block diagram of a video encoding and decoding system according to an embodiment of the present application
  • FIG. 2 is an exemplary block diagram of a video encoder according to an embodiment of the present application.
  • FIG. 3 is an exemplary block diagram of a video decoder according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a positional relationship between an image block to be processed and a reference block thereof in an embodiment of the present application
  • FIG. 6 is an exemplary flowchart of an inter prediction method according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a motion vector corresponding to a weighted basic prediction block according to an embodiment of the present application.
  • FIG. 8 is another schematic diagram of a motion vector corresponding to a weighted basic prediction block according to an embodiment of the present application.
  • FIG. 9 is another schematic diagram of a motion vector corresponding to a weighted basic prediction block according to an embodiment of the present application.
  • FIG. 10 is an exemplary block diagram of an inter prediction apparatus according to an embodiment of the present application.
  • FIG. 11 is an exemplary block diagram of a decoding device according to an embodiment of the present application.
  • FIG. 1 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present application.
  • As used in this application, the term "video coder" refers generically to both video encoders and video decoders.
  • The term "video coding" or "coding" may refer generically to video encoding or video decoding.
  • The video encoder 100 and the video decoder 200 of the video decoding system 1 are configured to predict the motion information of a currently coded image block or of its sub-blocks according to the method examples of any of the various new inter prediction modes proposed in the present application, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; the motion vector difference then need not be transmitted during encoding, which further improves encoding and decoding performance.
  • the video decoding system 1 includes a source device 10 and a destination device 20.
  • the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
  • the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
  • Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other media that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
  • The source device 10 and the destination device 20 may comprise various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
  • the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
  • the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
  • the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
  • the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
  • The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet).
  • the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 10 to the destination device 20.
  • the encoded data may be output from the output interface 140 to the storage device 40.
  • the encoded data can be accessed from the storage device 40 through the input interface 240.
  • The storage device 40 may include any of a variety of distributed or locally-accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
  • the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
  • the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
  • Example file servers include a web server (eg, for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
  • the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
  • This may include a wireless channel (eg, a Wi-Fi connection), a wired connection (eg, DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
  • the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
  • The motion vector prediction technology of the present application can be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
  • The video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application can be applied to video decoding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
  • In other examples, data is retrieved from local storage, streamed over a network, and so on.
  • the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
  • In some examples, encoding and decoding are performed by devices that do not communicate with each other; such devices simply encode data to memory and/or retrieve data from memory and decode it.
  • the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
  • In a feasible implementation, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
  • the video encoder 100 may encode video data from the video source 120.
  • the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
  • the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
  • the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
  • the input interface 240 includes a receiver and / or a modem.
  • the input interface 240 may receive the encoded video data via the link 30 and / or from the storage device 40.
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
  • the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
  • In some examples, video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
  • Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • This application may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
  • the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and / or other data to decode the compressed video data. This transfer can occur in real time or almost real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in a coded stream to a computer-readable storage medium at the time of encoding, and the decoding device may then store the syntax element after the syntax element is stored on this medium. retrieve the syntax element at any time.
  • The video encoder 100 and the video decoder 200 may operate according to a video compression standard such as High Efficiency Video Coding (H.265, also known as HEVC), and may conform to the HEVC test model (HM). The latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and the full text of that standard document is incorporated herein by reference.
  • HM assumes that video decoding devices have several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM provides up to 35 intra-prediction encoding modes.
  • Alternatively, the video encoder 100 and the video decoder 200 may operate according to the H.266 test model (JEM), an evolution model of the video decoding device.
  • the algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet. The latest algorithm description is included in JVET-F1001-v2.
  • the algorithm description document is incorporated herein by reference in its entirety.
  • the reference software for the JEM test model can be obtained from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
  • HM can divide a video frame or image into a sequence of tree blocks or maximum coding units (LCUs) containing both luminance and chrominance samples.
  • LCUs are also known as CTUs.
  • The tree block serves a purpose similar to that of the macroblock of the H.264 standard.
  • a slice contains several consecutive tree blocks in decoding order.
  • a video frame or image can be split into one or more slices.
  • Each tree block can be split into coding units according to a quadtree. For example, a tree block that is a root node of a quad tree may be split into four child nodes, and each child node may be a parent node and split into another four child nodes.
  • the final indivisible child nodes that are leaf nodes of the quadtree include decoding nodes, such as decoded video blocks.
  • The syntax data associated with the coded code stream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
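  • A toy sketch of this recursive quadtree partitioning follows; should_split is a hypothetical decision callback (a real encoder would decide by rate-distortion cost, and a decoder by parsed split flags).

    def quadtree_split(x, y, size, min_size, should_split):
        """Recursively split a square tree block into decoding nodes
        (leaf CUs). Returns the leaf blocks as (x, y, size) tuples."""
        if size <= min_size or not should_split(x, y, size):
            return [(x, y, size)]          # leaf node: a decoding node
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):           # four child nodes of the quadtree
                leaves += quadtree_split(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves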
  • the coding unit includes a decoding node, a prediction unit (PU), and a transformation unit (TU) associated with the decoding node.
  • the size of the CU corresponds to the size of the decoding node and the shape must be square.
  • The size of the CU can range from 8×8 pixels up to a maximum of 64×64 pixels, or larger tree block sizes.
  • Each CU may contain one or more PUs and one or more TUs.
  • the syntax data associated with a CU may describe a case where a CU is partitioned into one or more PUs.
  • the partitioning mode may be different between cases where the CU is skipped or is encoded in direct mode, intra prediction mode, or inter prediction mode.
  • the PU can be divided into non-square shapes.
  • the syntax data associated with a CU may also describe a case where a CU is partitioned into one or more TUs according to a quadtree.
  • the shape of the TU can be square or non-square.
  • the HEVC standard allows transformation based on the TU, which can be different for different CUs.
  • the TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case.
  • the size of the TU is usually the same as or smaller than the PU.
  • a quad-tree structure called "residual quad-tree" (RQT) can be used to subdivide the residual samples corresponding to the CU into smaller units.
  • the leaf node of RQT may be called TU.
  • the pixel difference values associated with the TU may be transformed to produce a transformation coefficient, which may be quantized.
  • the PU contains data related to the prediction process.
  • the PU may include data describing the intra-prediction mode of the PU.
  • the PU may include data defining a motion vector of the PU.
  • The data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or the reference image list for the motion vector (e.g., list 0, list 1, or list C).
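  • These fields can be grouped as in the following sketch; the names and types are illustrative assumptions, not drawn from any codec API.

    from dataclasses import dataclass

    @dataclass
    class MotionInfo:
        """Illustrative container for the PU motion data listed above."""
        mv_x: int            # horizontal motion vector component
        mv_y: int            # vertical motion vector component
        mv_precision: int    # e.g. 4 = quarter-pel, 8 = eighth-pel
        ref_idx: int         # index of the reference image pointed to
        ref_list: str        # reference image list: 'L0', 'L1', or 'LC'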
  • A TU, by contrast, relates to the transform and quantization processes.
  • a given CU with one or more PUs may also contain one or more TUs.
  • video encoder 100 may calculate a residual value corresponding to the PU.
  • The residual values include pixel differences that can be transformed into transform coefficients, quantized, and scanned using the TU to generate serialized transform coefficients for entropy coding.
  • This application generally uses the term "video block" to refer to the decoding node of a CU.
  • the term “video block” may also be used in this application to refer to a tree block including a decoding node and a PU and a TU, such as an LCU or a CU.
  • HM supports prediction with various PU sizes. Assuming the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%.
  • For example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
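  • The eight partition modes map to PU sizes as in the following sketch; the mode names follow the HM convention, and the function itself is an illustration rather than codec code.

    def pu_sizes(mode, n):
        """Return the PU sizes (width, height) of a 2Nx2N CU under the
        given HM partition mode; n is N in samples."""
        two_n = 2 * n
        return {
            'PART_2Nx2N': [(two_n, two_n)],
            'PART_2NxN':  [(two_n, n)] * 2,
            'PART_Nx2N':  [(n, two_n)] * 2,
            'PART_NxN':   [(n, n)] * 4,
            # Asymmetric modes: one direction split 25% / 75%.
            'PART_2NxnU': [(two_n, n // 2), (two_n, 3 * n // 2)],
            'PART_2NxnD': [(two_n, 3 * n // 2), (two_n, n // 2)],
            'PART_nLx2N': [(n // 2, two_n), (3 * n // 2, two_n)],
            'PART_nRx2N': [(3 * n // 2, two_n), (n // 2, two_n)],
        }[mode]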
  • "N×N" and "N by N" are used interchangeably to refer to the pixel size of a video block in terms of vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
  • In general, an N×N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
  • Pixels in a block can be arranged in rows and columns.
  • the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • A block may include N×M pixels, where M is not necessarily equal to N.
  • the video encoder 100 may calculate the residual data of the TU of the CU.
  • A PU may include pixel data in a spatial domain (also referred to as a pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
  • the residual data may correspond to a pixel difference between a pixel of an uncoded image and a prediction value corresponding to a PU.
  • the video encoder 100 may form a TU including residual data of a CU, and then transform the TU to generate a transform coefficient of the CU.
  • video encoder 100 may perform quantization of the transform coefficients.
  • Quantization illustratively refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, providing further compression.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
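  • For example, with n = 9 and m = 8 the following sketch halves the coefficient magnitude range by a plain right shift; real HEVC quantization also involves a quantization parameter and scaling, which this illustration omits.

    def reduce_bit_depth(coeffs, n=9, m=8):
        """Round n-bit coefficient values down to m-bit values (n > m)
        using an arithmetic right shift."""
        shift = n - m
        return [c >> shift for c in coeffs]

    # reduce_bit_depth([511, 256, 37]) returns [255, 128, 18]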
  • the JEM model further improves the coding structure of video images.
  • a block coding structure called "Quad Tree Combined with Binary Tree” (QTBT) is introduced.
  • a CU can be square or rectangular.
  • In the QTBT structure, a CTU is first partitioned by a quadtree, and the quadtree leaf nodes are then further partitioned by a binary tree.
  • There are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
  • the leaf nodes of a binary tree are called CUs.
  • the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
  • In other feasible implementations, the video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy-code the one-dimensional vector using context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method.
  • Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 to decode the video data.
  • the video encoder may perform inter prediction to reduce temporal redundancy between images.
  • a CU may have one or more prediction units PU according to the provisions of different video compression codec standards.
  • multiple PUs may belong to a CU, or PUs and CUs are the same size.
  • When a CU's partitioning mode is "not split", or the CU is split into exactly one PU, "PU" is used uniformly in the description.
  • The video encoder may signal the motion information for the PU to the video decoder.
  • the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
  • a motion vector may indicate a displacement between an image block (also called a video block, a pixel block, a pixel set, etc.) of a PU and a reference block of the PU.
  • the reference block of the PU may be a part of the reference picture similar to the image block of the PU.
  • the reference block may be located in a reference image indicated by a reference image index and a prediction direction identifier.
  • A merge mode is also referred to herein as a merge prediction mode.
  • the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying the original candidate prediction motion vectors, or inserting only zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered as original candidate prediction motion vectors and may be referred to as artificially generated candidate prediction motion vectors in this application.
  • Video encoders and decoders may first consider spatial candidate prediction motion vectors (e.g., from neighboring blocks in the same image), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different images), and finally consider the artificially generated candidate prediction motion vectors, adding candidates until the desired number of candidate prediction motion vectors is reached.
  • A pruning operation may be used for certain types of candidate prediction motion vectors during the construction of the candidate prediction motion vector list, in order to remove duplicates from the list, while for other types of candidate prediction motion vectors pruning may not be used, so as to reduce decoder complexity.
  • a pruning operation may be performed to exclude candidate prediction motion vectors with duplicate motion information from the list of candidate prediction motion vectors.
  • In other examples, artificially generated candidate prediction motion vectors may be added without performing a pruning operation on them.
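  • A simplified sketch of such list construction follows. Real codecs compare full motion information and also insert combined candidates; here candidates are plain (mv_x, mv_y, ref_idx) tuples, and zero motion vectors stand in for all artificially generated candidates.

    def build_candidate_list(spatial, temporal, max_count):
        """Build a candidate prediction motion vector list: spatial
        candidates first, then temporal, both pruned of duplicates; then
        artificial (zero) candidates appended without pruning."""
        candidates = []
        for cand in spatial + temporal:
            if cand not in candidates:         # pruning: skip duplicates
                candidates.append(cand)
            if len(candidates) == max_count:
                return candidates
        while len(candidates) < max_count:     # artificial candidates,
            candidates.append((0, 0, 0))       # added without pruning
        return candidates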
  • the video encoder may select the candidate prediction motion vector from the candidate prediction motion vector list and output the candidate prediction motion vector index in the code stream.
  • the selected candidate prediction motion vector may be a candidate prediction motion vector having a motion vector that most closely matches the predictor of the target PU being decoded.
  • the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
  • the video encoder may also generate a predictive image block for the PU based on a reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector.
  • the video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PU of the CU and the original image blocks for the CU. The video encoder may then encode one or more residual image blocks and output one or more residual image blocks in a code stream.
  • the codestream may include data identifying a selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
  • the video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
  • the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate predictive image blocks for the PU based on the one or more reference blocks of the PU.
  • the video decoder may reconstruct an image block for a CU based on a predictive image block for a PU of the CU and one or more residual image blocks for the CU.
  • the present application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description can be interpreted to mean that the position or image block and the image block associated with the CU or PU have various spatial relationships.
  • a PU currently being decoded by a video decoder may be referred to as a current PU, and may also be referred to as a current image block to be processed.
  • This application may refer to the CU that the video decoder is currently decoding as the current CU.
  • This application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that this application is applicable to the case where the PU and the CU have the same size, or where the PU is the CU; the PU is used uniformly in the description.
  • video encoder 100 may use inter prediction to generate predictive image blocks and motion information for a PU of a CU.
  • the motion information of a given PU may be the same or similar to the motion information of one or more nearby PUs (ie, PUs whose image blocks are spatially or temporally near the image blocks of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may refer to the motion information of nearby PUs to encode motion information for a given PU. Encoding the motion information of a given PU with reference to the motion information of nearby PUs can reduce the number of encoding bits required to indicate the motion information of a given PU in the code stream.
  • Video encoder 100 may refer to motion information of nearby PUs in various ways to encode motion information for a given PU.
  • video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs.
  • This application may use the term merge mode to refer to indicating that the motion information of a given PU is the same as, or may be derived from, the motion information of nearby PUs.
  • the video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU.
  • MVD indicates the difference between the motion vector of a given PU and the motion vector of a nearby PU.
  • Video encoder 100 may include MVD instead of a motion vector of a given PU in the motion information of a given PU. Representing MVD in the codestream requires fewer coding bits than representing the motion vector of a given PU.
  • This application may use the term advanced motion vector prediction mode to refer to signalling the motion information of a given PU by using an MVD and an index value identifying a candidate motion vector.
  • the video encoder 100 may generate a list of candidate predicted motion vectors for a given PU.
  • the candidate prediction motion vector list may include one or more candidate prediction motion vectors.
  • Each of the candidate prediction motion vectors in the candidate prediction motion vector list for a given PU may specify motion information.
  • the motion information indicated by each candidate prediction motion vector may include a motion vector, a reference image index, and a prediction direction identifier.
  • the candidate prediction motion vectors in the candidate prediction motion vector list may include "original" candidate prediction motion vectors, each of which indicates the motion information of one of the specified candidate prediction motion vector positions within a PU different from the given PU.
  • the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, a video encoder may compare each candidate prediction motion vector with the PU being decoded and may select a candidate prediction motion vector with a desired code rate-distortion cost. Video encoder 100 may output a candidate prediction motion vector index for a PU. The candidate prediction motion vector index may identify the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
  • the video encoder 100 may generate a predictive image block for a PU based on a reference block indicated by motion information of the PU.
  • the motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • motion information of a PU may be determined based on a motion vector difference for the PU and the motion information indicated by a selected candidate prediction motion vector; the two derivations are contrasted in the sketch below.
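  • The contrast between inheriting the candidate's motion vector and adding a signalled difference can be sketched as follows (hypothetical helper, motion vectors as (x, y) tuples):

```python
def derive_pu_mv(candidate_mv, mvd=None):
    # Merge-style derivation: the PU inherits the selected candidate's motion vector.
    if mvd is None:
        return candidate_mv
    # MVD-based derivation: the candidate's motion vector plus the signalled difference.
    return (candidate_mv[0] + mvd[0], candidate_mv[1] + mvd[1])
```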
  • Video encoder 100 may process predictive image blocks for a PU as described previously.
  • video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU.
  • the candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU.
  • the syntax element parsed from the bitstream may indicate the position of the candidate prediction motion vector selected in the candidate prediction motion vector list of the PU.
  • the video decoder 200 may generate predictive image blocks for the PU based on one or more reference blocks indicated by the motion information of the PU.
  • Video decoder 200 may determine motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU. Video decoder 200 may reconstruct an image block for a CU based on a predictive image block for a PU and a residual image block for a CU.
  • the construction of the candidate prediction motion vector list and the parsing of the position of the selected candidate prediction motion vector from the code stream are independent of each other, and may be performed in any order or in parallel.
  • in some feasible implementations, the position of the selected candidate prediction motion vector in the candidate prediction motion vector list is first parsed from the code stream, and the candidate prediction motion vector list is constructed based on the parsed position. For example, if parsing the bitstream shows that the selected candidate prediction motion vector is the one with index 3 in the candidate prediction motion vector list, only the list entries from index 0 to index 3 need to be constructed to determine the candidate prediction motion vector with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency, as in the sketch below.
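  • A minimal sketch of this parsing-first order; parse_index and next_candidate are hypothetical placeholders for bitstream parsing and the fixed candidate derivation order (spatial, then temporal, then artificial), not names from this application:

```python
def select_candidate(parse_index, next_candidate):
    # The selected index is parsed from the code stream before list construction,
    # so only entries 0..index of the candidate list ever need to be built.
    index = parse_index()                      # e.g. 3
    partial = []
    while len(partial) <= index:
        partial.append(next_candidate(partial))
    return partial[index]
```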
  • FIG. 2 is a block diagram of a video encoder 100 according to an example described in the embodiment of the present application.
  • the video encoder 100 is configured to output a video to the post-processing entity 41.
  • the post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
  • the post-processing entity 41 may be an instance of a network entity.
  • the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
  • the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
  • the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded image buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
  • the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
  • the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • the filter unit 106 is shown as an in-loop filter in FIG. 2, in other implementations, the filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
  • the video data memory may store video data to be encoded by the components of the video encoder 100.
  • the video data stored in the video data memory may be obtained from the video source 120.
  • the DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame decoding mode.
  • Video data memory and DPB 107 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • Video data memory and DPB 107 can be provided by the same memory device or separate memory devices.
  • the video data memory may be on-chip with other components of video encoder 100 or off-chip relative to those components.
  • the video encoder 100 receives video data and stores the video data in a video data memory.
  • the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
  • Video encoder 100 typically illustrates components that encode image blocks within a video slice to be encoded.
  • a slice can be divided into multiple image blocks (and possibly into collections of image blocks referred to as tiles).
  • the prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes.
  • the prediction processing unit 108 may provide the resulting intra- or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct the encoded block for use as part of a reference image.
  • the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
  • the inter predictor 110 within the prediction processing unit 108 may perform inter predictive coding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
  • the inter predictor 110 may be configured to determine an inter prediction mode for encoding the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate the rate-distortion values of the various inter prediction modes in the set of candidate inter prediction modes, and select the inter prediction mode with the best rate-distortion characteristics from among them.
  • Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce it, as well as the bit rate (that is, the number of bits) used to produce the encoded block.
  • the inter predictor 110 may determine that the inter prediction mode with the lowest rate-distortion cost for encoding the current image block among the candidate inter prediction mode set is the inter prediction mode used for inter prediction of the current image block; see the selection sketch below.
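  • A minimal sketch of this selection, assuming per-mode distortion() and bits() measurements supplied by the encoder (illustrative names only, not from this application):

```python
def best_inter_mode(candidate_modes, distortion, bits, lam):
    # Rate-distortion cost J(m) = D(m) + lambda * R(m); pick the lowest-cost mode.
    return min(candidate_modes, key=lambda m: distortion(m) + lam * bits(m))
```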
  • the inter predictor 110 is configured to predict motion information (such as a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to use the motion information (such as the motion vector) of the one or more sub-blocks to obtain or generate a prediction block of the current image block.
  • the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
  • the inter predictor 110 may also generate syntax elements associated with the image blocks and video slices for use by the video decoder 200 when decoding the image blocks of the video slices.
  • the inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block for each sub-block, thereby obtaining a prediction block of the current image block. It should be understood that the inter predictor 110 here performs both motion estimation and motion compensation processes.
  • the intra predictor 109 may perform intra prediction on the current image block.
  • the intra predictor 109 may determine an intra prediction mode used to encode the current block.
  • the intra predictor 109 may use rate-distortion analysis to calculate the rate-distortion values of the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from the modes tested. In any case, after an intra prediction mode is selected for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • the summer 112 represents one or more components that perform this subtraction operation.
  • the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
  • the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
  • the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
  • After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
  • the encoded code stream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200.
  • the entropy encoder 103 may also perform entropy encoding on other syntax elements of the current video slice.
  • the entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements.
  • the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
  • Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
  • the video encoder 100 may signal, via a specific syntax element of a specific block, whether a new inter prediction mode is used to decode the block, and may also signal which new inter prediction mode is used to decode the specific block. It should be understood that the inter predictor 210 here performs a motion compensation process.
  • the filter unit 206 is shown as an in-loop filter in FIG. 3, in other implementations, the filter unit 206 may be implemented as a post-loop filter.
  • the filter unit 206 is adapted to filter the reconstructed block to reduce blocking distortion, and the result is output as a decoded video stream.
  • a decoded image block in a given frame or image may also be stored in a decoded image buffer 207, and the decoded image buffer 207 stores a reference image for subsequent motion compensation.
  • the decoded image buffer 207 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 220 of FIG. 1, or may be separate from such memory.
  • the techniques of this application exemplarily involve inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video decoders described in this application.
  • the video decoders described in this application include, for example, the video encoder 100 and the video decoder 200 shown and described with respect to FIGS. 1-3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data.
  • a reference to a generic "video encoder" or "video decoder” may include video encoder 100, video decoder 200, or another video encoding or coding unit.
  • FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
  • the inter prediction module 121 may include a motion estimation unit 42 and a motion compensation unit 44.
  • the relationship between PU and CU is different in different video compression codecs.
  • the inter prediction module 121 may partition a current CU into a PU according to a plurality of partitioning modes.
  • the inter prediction module 121 may partition a current CU into PUs according to 2N×2N, 2N×N, N×2N, and N×N partitioning modes.
  • the current CU is the current PU, which is not limited.
  • a motion vector generated by performing FME on a PU may have sub-integer precision (eg, 1/2 pixel precision, 1/4 pixel precision, etc.).
  • the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
  • the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
  • the candidate prediction motion vector list may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
  • the inter prediction module 121 may select the candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference (MVD) for the PU.
  • the MVD for a PU may indicate a difference between a motion vector indicated by a selected candidate prediction motion vector and a motion vector generated for the PU using IME and FME.
  • the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
  • the inter prediction module 121 may also output the MVD of the PU.
  • the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the inter prediction module 121 may select either the predictive image block generated by the FME operation or the predictive image block generated by the merge operation. In some feasible implementations, the inter prediction module 121 may select a predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
  • the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may perform a rate-distortion cost analysis on the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes, and select the partitioning mode accordingly.
  • the inter prediction module 121 may output a predictive image block associated with a PU belonging to the selected partition mode to the residual generation module 102.
  • the inter prediction module 121 may output a syntax element indicating motion information of a PU belonging to the selected partitioning mode to the entropy encoding module 116.
  • the inter prediction module 121 includes IME modules 180A to 180N (collectively referred to as "IME module 180"), FME modules 182A to 182N (collectively referred to as "FME module 182"), merge modules 184A to 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A to 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a mode decision process from CTU to CU).
  • the IME module 180, the FME module 182, and the merge module 184 may perform an IME operation, an FME operation, and a merge operation on a PU of the current CU.
  • the inter prediction module 121 is illustrated in the schematic diagram of FIG. 4 as including a separate IME module 180, an FME module 182, and a merging module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, an FME module 182, and a merge module 184 for each PU of each partitioning mode of the CU.
  • the IME module 180A, the FME module 182A, and the merge module 184A may perform IME operations, FME operations, and merge operations on a PU generated by dividing a CU according to a 2N×2N partitioning mode.
  • the PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
  • the IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on a left PU generated by dividing a CU according to an N×2N partitioning mode.
  • the PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
  • the IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on a right PU generated by dividing a CU according to an N×2N partitioning mode.
  • the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
  • the IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on a lower right PU generated by dividing a CU according to an N×N partitioning mode.
  • the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
  • the PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of multiple possible predictive image blocks, choosing the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-constrained applications, the PU mode decision module 186 may prefer predictive image blocks that increase the compression ratio, while for other applications it may prefer predictive image blocks that increase the quality of the reconstructed video.
  • the CU mode decision module 188 selects a partitioning mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partitioning mode.
  • FIG. 5 shows a schematic diagram of an exemplary image block to be processed and a reference block thereof in an embodiment of the present application.
  • W and H are the width and height of the image block 500 to be processed and the co-located block (referred to simply as a mapped image block) 500 'in the designated reference image.
  • the reference blocks of the to-be-processed image block include: the upper spatial-domain adjacent blocks and the left spatial-domain adjacent blocks of the to-be-processed image block, and the lower spatial-domain adjacent blocks and the right spatial-domain adjacent blocks of the mapped image block, where the mapped image block is an image block in a designated reference image that has the same size and shape as the image block to be processed, and the position of the mapped image block in the designated reference image is the same as the position of the image block to be processed in its own image (generally, the current image to be processed).
  • the lower spatial contiguous block and the right spatial contiguous block of the mapped image block may also be referred to as time-domain reference blocks.
  • Each frame of image can be divided into image blocks for encoding, and these image blocks can be further divided into smaller blocks.
  • the to-be-processed image block and the mapped image block can be divided into multiple MxN sub-blocks, that is, each sub-block is MxN pixels in size. It may be useful to set the size of each reference block to be MxN pixels.
  • the sub-blocks are the same size.
  • "M×N" and "M times N" are used interchangeably to refer to the pixel size of an image sub-block in terms of its horizontal and vertical dimensions, that is, M pixels in the horizontal direction and N pixels in the vertical direction, where M and N represent non-negative integer values. In addition, M and N are not necessarily the same.
  • the sub-block size and reference block size of the image block to be processed may be 4×4, 8×8, 8×4, or 4×8 pixels, or the minimum size of the prediction block allowed by the standard.
  • the measurement units of W and H are the width and height of the sub-block, respectively, that is, W represents the ratio of the width of the image block to be processed and the width of the sub-block in the image block to be processed, and H represents the to-be-processed The ratio of the height of an image block to the height of a sub-block in the image block to be processed.
  • the to-be-processed image blocks described in this application can be understood as, but not limited to, a prediction unit (PU), a coding unit (CU), or a transformation unit (TU).
  • a CU may include one or more prediction units PU, or the PU and the CU have the same size.
  • Image blocks can have fixed or variable sizes and differ in size according to different video compression codec standards.
  • the image block to be processed refers to an image block that is currently to be encoded or to be decoded, such as a prediction unit to be encoded or to be decoded.
  • it is sequentially determined along direction 1 whether each left spatial-domain adjacent block of the image block to be processed is available, and sequentially determined along direction 2 whether each upper spatial-domain adjacent block of the image block to be processed is available; for example, it is determined whether the adjacent block has been inter-coded. If the adjacent block exists and is inter-coded, the adjacent block is available; if the adjacent block does not exist or is intra-coded, the adjacent block is not available. When an adjacent block is not available, the motion information of another neighboring reference block is copied as the motion information of that adjacent block, as in the sketch below. The same check is used to detect whether the lower spatial-domain adjacent blocks and the right spatial-domain adjacent blocks of the mapped image block are available, and details are not described herein again.
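  • A minimal sketch of this availability check and fallback copy, assuming block objects with is_inter_coded and motion_info attributes (illustrative names, not from this application):

```python
def is_available(block):
    # A neighboring block is available only if it exists and was inter-coded.
    return block is not None and block.is_inter_coded

def motion_info_of(block, other_reference_blocks):
    if is_available(block):
        return block.motion_info
    # Not available: copy the motion information of another available reference block.
    for other in other_reference_blocks:
        if is_available(other):
            return other.motion_info
    return None
```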
  • the motion information is based on a 4x4 pixel set as the basic unit for storing motion information.
  • the basic unit that stores motion information may be referred to simply as the basic storage unit.
  • the motion information stored in the basic storage unit corresponding to the reference block may be directly obtained as the motion information corresponding to the reference block.
  • the motion information stored in the corresponding basic storage unit at a predetermined position of the reference block may be obtained.
  • the motion information stored in the basic storage unit corresponding to the upper left corner of the reference block may be obtained, or the motion information stored in the basic storage unit corresponding to the center point of the reference block may be obtained, as the motion information corresponding to the reference block; see the sketch below.
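  • A sketch of reading from the 4x4 basic storage units, assuming a 2D grid of motion information at 4x4 granularity (the grid layout is an assumption for illustration):

```python
UNIT = 4  # motion information is stored per 4x4 pixel set

def fetch_reference_mv(mv_grid, x, y, w, h, use_center=False):
    # (x, y) is the reference block's top-left pixel; (w, h) its size in pixels.
    # mv_grid[j][i] holds the motion information of the 4x4 basic storage unit
    # whose top-left pixel is (i * UNIT, j * UNIT).
    if use_center:
        x, y = x + w // 2, y + h // 2   # use the center point instead of the corner
    return mv_grid[y // UNIT][x // UNIT]
```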
  • a sub-block of an image to be processed is also referred to as a basic prediction block.
  • FIG. 6 exemplarily shows a schematic flowchart of obtaining a motion vector of each basic prediction block inside a to-be-processed image block according to a weighted motion vector corresponding to a reference block of the to-be-processed image block in the embodiment of the present application, including:
  • S601. Determine a size of a basic prediction block in an image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed;
  • the size of the basic prediction block in the image block to be processed may be a preset fixed value, which is determined in advance by the encoding and decoding end, and is respectively fixed at the encoding and decoding end.
  • for example, when the side lengths of two adjacent sides of the basic prediction block are not equal, the side length of the shorter side of the basic prediction block is determined to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, that is, the basic prediction block is a square, the side length of the basic prediction block is determined to be 4 or 8. It should be understood that the above side length of 4 or 8 is only an exemplary value, and other constants such as 16 or 24 may be used.
  • the size of the basic prediction block in the image block to be processed may be obtained by parsing a code stream, specifically: parsing a first identifier from the code stream, where the first identifier is used to indicate the size of the basic prediction block, and where the first identifier is located in a code stream segment corresponding to one of: the sequence parameter set (SPS) of the sequence in which the image block to be processed is located, the picture parameter set (PPS) of the image in which the image block to be processed is located, and the slice header of the slice in which the image block to be processed is located.
  • the corresponding syntax elements can be parsed from the code stream to determine the size of the basic prediction block.
  • the syntax element may be carried in the code stream part corresponding to the SPS, in the code stream part corresponding to the PPS, or in the code stream part corresponding to the slice header.
  • when the identifier is carried in the SPS, the basic prediction blocks in the entire sequence adopt the same size; when it is carried in the PPS, the basic prediction blocks in the image corresponding to the PPS adopt the same size; when it is carried in the slice header, the basic prediction blocks in the entire slice adopt the same size.
  • Images and image frames are different concepts: images include images that exist in the form of entire frames (that is, image frames), images that exist in the form of slices, images that exist in the form of tiles, and images that exist in the form of other sub-images, which is not limited.
  • the slice header of the slice using intra prediction does not have the first identifier described above.
  • the encoding end determines the size of the basic prediction block in an appropriate manner (for example, a rate-distortion selection method or an empirical value method), encodes the determined size of the basic prediction block into the code stream, and the decoding end parses the size of the basic prediction block from the code stream.
  • in another feasible implementation, the size of the basic prediction block in the image block to be processed is determined from historical information, so it can be obtained adaptively at the encoding end and the decoding end respectively; specifically, the size of the basic prediction block is determined according to the sizes of the planar mode prediction blocks in a previously reconstructed image.
  • the planar mode prediction block is an image block on which inter prediction has been performed according to the method described in the embodiments of the present application.
  • the reconstructed image is an image whose encoding order is before the image where the image block to be processed is located.
  • the image block to be processed for inter prediction using the method described in the embodiment of the present application may be referred to as a planar mode prediction block.
  • the size of the basic prediction block in the image (hereinafter simply referred to as the current image) in which the image block to be processed is located can be estimated according to the size of the statistical plane mode prediction block in the previously encoded image.
  • the previously reconstructed image is an image whose encoding order is before the image in which the image block to be processed is located; equivalently, the previously reconstructed image is an image whose decoding order is before the image in which the image block to be processed is located, since the two descriptions identify the same images.
  • the same reconstructed image can be analyzed to obtain the same prior information, and based on that prior information the size of the basic prediction block can be determined; the same result is obtained at the encoding end and the decoding end, that is, an adaptive mechanism for determining the size of the basic prediction block is achieved.
  • when the average size of the planar mode prediction blocks in the previously reconstructed image is smaller than a threshold, the size of the basic prediction block is a first size; otherwise, the size of the basic prediction block is a second size, where the first size is smaller than the second size. When the picture order counts (POCs) of all reference frames of the image in which the image block to be processed is located are smaller than the POC of that image, the threshold is a first threshold; when the POC of at least one reference frame of the image in which the image block to be processed is located is greater than the POC of that image, the threshold is a second threshold, where the first threshold is different from the second threshold.
  • when the POCs of all reference frames of the current image are smaller than the POC of the current image, the threshold is set to the first value, which may for example be set to 75.
  • when the POC of at least one reference frame of the current image is greater than the POC of the current image, the threshold is set to the second value, which may for example be set to 27. It should be understood that the settings of the first value and the second value are not limited.
  • the first size is smaller than the second size.
  • the relationship between the first size and the second size may include: the first size being 4 and the second size being 8 (square side lengths); the first size being 4x4 and the second size being 8x8; the first size being 4x4 and the second size being 4x8; the first size being 4x8 and the second size being 8x8; or the first size being 4x8 and the second size being 8x16, which is not limited. A sketch of this adaptive selection follows.
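  • A sketch of this adaptive size rule; the constants 75, 27, 4x4 and 8x8 follow the examples above, while the comparison direction (a small average area selects the first size) and the fallback for missing statistics are assumed readings:

```python
def basic_prediction_block_size(avg_area, ref_pocs, cur_poc):
    if avg_area is None:               # no valid statistics: fall back to a preset value
        return (4, 4)
    # Threshold per the text: the second value when any reference POC exceeds
    # the current image's POC, the first value otherwise.
    threshold = 27 if any(poc > cur_poc for poc in ref_pocs) else 75
    return (4, 4) if avg_area < threshold else (8, 8)
```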
  • in a feasible implementation, the previously reconstructed image is the reconstructed image whose encoding order is closest to the image in which the image block to be processed is located, that is, the previously reconstructed image is the reconstructed image whose decoding order is closest to the image in which the image block to be processed is located.
  • for example, the size of the basic prediction block in the current image frame is determined according to the statistical information of all planar mode prediction blocks in the previous image frame, or the size of the basic prediction block in the current slice is determined according to the statistical information of all planar mode prediction blocks in the previous slice of the current slice.
  • the image may also include other forms of sub-images, so it is not limited to image frames and bands.
  • the statistical information is updated in units of image frames or slices, that is, updated once per image frame or slice.
  • in another feasible implementation, the previously reconstructed image is the reconstructed image that has the same temporal layer identifier as the image in which the image block to be processed is located and whose encoding order is closest to that image; that is, the previously reconstructed image is the reconstructed image that has the same temporal layer identifier as the image in which the image block to be processed is located and whose decoding order is closest to that image.
  • that is, the reconstructed image closest to the current image in coding order is selected from among the images having the same temporal layer ID (temporal ID) as the current image.
  • in another feasible implementation, the previously reconstructed image is multiple images, and calculating the average value includes: calculating an average value of the products of the width and height of all the planar mode prediction blocks in the multiple previously reconstructed images.
  • the above two feasible implementations each determine the size of the basic prediction block of the current image according to the statistics of a single previously reconstructed image, while this implementation accumulates statistics over multiple previously reconstructed images to determine the basic prediction block size of the current image. That is, in this embodiment the statistical information is updated in units of multiple image frames or multiple slices, that is, once every preset number of image frames or every preset number of slices; alternatively, the statistics may keep accumulating without being reset.
  • calculating the average value of the products of the width and height of all the planar mode prediction blocks in the multiple previously reconstructed images may include: separately computing, for each of the multiple previously reconstructed images, the average value of the products of the width and height of all its planar mode prediction blocks, and taking a weighted average of the per-image statistics to obtain the final average value used for comparison with the threshold in this embodiment; or accumulating the products of the width and height of all the planar mode prediction blocks in the multiple previously reconstructed images and dividing by the total number of planar mode prediction blocks to obtain the average value compared with the above threshold.
  • in the process of calculating the average value of the products of the width and height of all the planar mode prediction blocks in the previously reconstructed image, it is further determined whether the statistical information is valid; for example, if there is no planar mode prediction block in the previously reconstructed image, the average value cannot be calculated and the statistical information is invalid. In this case, the statistical information may not be updated, or the size of the basic prediction block of the current image may be set to a preset value, for example 4x4 for a square block; when no valid statistical information is available, the size of the basic prediction block is likewise set to the preset value.
  • in a feasible implementation, determining the size of the basic prediction block in the image block to be processed further includes determining the shape of the basic prediction block. For example, the basic prediction block may be determined to be a square; or the aspect ratio of the basic prediction block may be the same as that of the image block to be processed; or the width and height of the image block to be processed may each be divided into several equal parts to obtain the width and height of the basic prediction block. In another feasible implementation, the shape of the image block to be processed is not related to the shape of the basic prediction block: the basic prediction block may be fixedly set to a square, or, when the size of the image block to be processed is 32x16, the basic prediction block may be set to 16x8 or 8x4, and so on, which is not limited.
  • the method further includes:
  • S602. Determine the position of each basic prediction block in the image block to be processed: the size of each basic prediction block is the same, and after the size is determined, the position of each basic prediction block within the image block to be processed can be calculated in turn according to that size. It should be understood that the positions of the image block to be processed and of the basic prediction blocks exist in the form of coordinates; this step only needs to determine the coordinates of each basic prediction block, which distinguishes the image block to be processed from the basic prediction blocks, and there is no physical division step.
  • S603. Determine a first reference block and a second reference block of the basic prediction block according to the position, where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction block, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction block, the first reference block is adjacent to the upper boundary line of the image block to be processed, and the second reference block is adjacent to the left boundary line of the image block to be processed.
  • the original reference block having a preset position relationship with the image block to be processed includes: an original reference block having a preset spatial relationship with the image block to be processed and / or An original reference block having a preset time-domain position relationship with the image block to be processed.
  • the original reference block having a preset spatial-domain position relationship with the image block to be processed includes one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed, an image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed, and an image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed, where the original reference block having a preset spatial-domain position relationship with the image block to be processed is located outside the image block to be processed and may be referred to simply as a spatial-domain reference block.
  • the original reference block having a preset time-domain position relationship with the image block to be processed includes: an image block located in the target reference frame at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block, where the mapped image block and the image block to be processed are equal in size, the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame in which it is located, and this original reference block may be referred to simply as a time-domain reference block.
  • the index information and reference frame list information of the target reference frame are obtained by parsing the code stream.
  • the index information of the target reference frame and the reference frame list information are located in a code stream segment corresponding to a slice header of a slice where the image block to be processed is located.
  • steps S603 and S604 will be described below:
  • BR is the motion vector corresponding to the image block located in the target reference frame at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block;
  • W is the ratio of the width of the image block to be processed to the width of the basic prediction block, and x is the ratio of the horizontal distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the width of the basic prediction block.
  • the order of step S702A and step S702B is not limited.
  • S703A. Perform a weighted calculation based on the motion vector corresponding to the first temporary block of the image block to be processed and the motion vector corresponding to the second reference block of the image block to be processed to obtain a first temporary motion vector P_h(x, y), where P_h(x, y) = (W − 1 − x) × L(−1, y) + (x + 1) × R(W, y).
  • S703B. Perform a weighted calculation based on the motion vector corresponding to the second temporary block of the image block to be processed and the motion vector corresponding to the first reference block of the image block to be processed to obtain a second temporary motion vector P_v(x, y), where, symmetrically to S703A, P_v(x, y) = (H − 1 − y) × A(x, −1) + (y + 1) × B(x, H).
  • the order of step S703A and step S703B is not limited.
  • the motion vector P(x, y) corresponding to the basic prediction block may also be obtained by combining the formulas of the above steps into a single formula, as in the sketch below.
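  • A sketch of the combined derivation for one basic prediction block; L, R, A, B stand for the motion vectors of the left, right, above and below reference blocks, and the final normalization (H·Ph + W·Pv + W·H) / (2·W·H) is an assumed planar-style combination, since the excerpt only states that the formulas can be merged:

```python
def planar_mv(x, y, W, H, L, R, A, B):
    # x, y: basic prediction block position; W, H: image block width/height,
    # all measured in basic-prediction-block units. L, R, A, B are callables
    # returning (mvx, mvy) tuples for the corresponding reference blocks.
    Ph = [(W - 1 - x) * l + (x + 1) * r            # horizontal term (S703A)
          for l, r in zip(L(-1, y), R(W, y))]
    Pv = [(H - 1 - y) * a + (y + 1) * b            # vertical term (S703B)
          for a, b in zip(A(x, -1), B(x, H))]
    # Assumed combination; W*H acts as a rounding offset before normalization.
    return tuple((H * ph + W * pv + W * H) // (2 * W * H)
                 for ph, pv in zip(Ph, Pv))
```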
  • S802A Use the motion vector corresponding to the spatial domain reference block 805 in the upper right corner of the image block to be processed as the motion vector corresponding to the first temporary block 806 of the image block to be processed;
  • the order of step S803A and step S803B is not limited.
  • S902. Determine the first temporary block 806 and the second temporary block 809 according to the position of the basic prediction block 604 in the image block 600 to be processed, where the first temporary block is the image block located at position 806 of the mapped image block in the target reference frame, the second temporary block is the image block located at position 808 of the mapped image block in the target reference frame, and the first temporary block and the second temporary block are both time-domain reference blocks.
  • the order of step S903A and step S903B is not limited.
  • the motion vector corresponding to the first reference block is A(x, −1), and the motion vector corresponding to the second reference block is L(−1, y);
  • S0102. Perform motion compensation according to the motion information of any spatial-domain reference block of the image block to be processed, and determine the reference frame information and the position of the motion compensation block.
  • the above "any spatial-domain reference block" may be one of the available left or upper spatial-domain adjacent blocks shown in FIG. 5; it may be the first available spatial-domain adjacent block detected along direction 1, or the first available spatial-domain adjacent block detected in the order L→A→AR→BL→AL shown in FIG. 7; it may also be a spatial-domain adjacent block selected according to a predetermined rule, which is not limited.
  • the order of step S0104A and step S0104B is not limited.
  • S0105 Perform weighted calculation based on the first temporary motion vector and the second temporary motion vector of the image block to be processed to obtain a motion vector P (x, y) corresponding to the basic prediction unit.
  • the relationship between an image block and the basic storage unit storing motion information is mentioned above. It may be useful to call the motion information stored in the basic storage unit corresponding to an image block the actual motion information of the image block; the motion information includes the motion vector and the index information of the reference frame the motion vector points to. It should be understood that the index information of the reference frames of the respective reference blocks used for the weighted calculation of the motion vector of the basic prediction block cannot be guaranteed to be consistent. When the index information of the reference frame of each reference block is consistent, the motion information corresponding to the reference block is the actual motion information of the reference block. When the index information of the reference frames of the reference blocks is inconsistent, the motion vector in the actual motion information of a reference block needs to be scaled according to the distance relationship between the reference frames indicated by the reference frame indexes.
  • the target reference image index is determined.
  • the target reference image index may be fixed to 0, 1, or another index value; it may also be the reference image index used most frequently in the reference image list, for example the reference image index pointed to the greatest number of times by the actual motion vectors or the weighted motion vectors.
  • when the index information of the reference frame of a certain reference block is different from the target reference image index, the actual motion vector is scaled according to the ratio of the temporal distance between the image in which the reference block is located and the reference frame image indicated by the actual motion information (reference frame index information) of the reference block, to the temporal distance between the image in which the reference block is located and the reference image indicated by the target reference image index, to obtain the weighted motion vector, as in the sketch below.
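  • A simplified sketch of this temporal scaling by POC distance (a real codec would use fixed-point arithmetic and clipping, omitted here for clarity):

```python
def scale_mv(mv, cur_poc, actual_ref_poc, target_ref_poc):
    td_actual = cur_poc - actual_ref_poc   # distance to the block's actual reference
    td_target = cur_poc - target_ref_poc   # distance to the target reference image
    if td_actual == 0 or td_actual == td_target:
        return mv                          # degenerate or identical distances: no scaling
    scale = td_target / td_actual
    return (round(mv[0] * scale), round(mv[1] * scale))
```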
  • after step S604, the method further includes:
  • the method includes: first merging adjacent basic prediction blocks having the same motion information, and then performing motion compensation using the merged image block as a motion compensation unit.
  • for each row of basic prediction blocks in the image block to be processed, it is sequentially judged from left to right whether the motion information of a basic prediction block and that of the basic prediction block adjacent to it (exemplarily including the motion vector, the reference frame list, and the reference frame index information) are the same; if so, the two adjacent basic prediction blocks are merged, and it is then judged whether the motion information of the next basic prediction block adjacent to the merged block is the same as the motion information of the merged block. If the motion information differs, merging stops, and the basic prediction block with different motion information is used as a new starting point to continue the step of merging adjacent basic prediction blocks with the same motion information, until the end of the row of basic prediction blocks.
  • motion compensation is performed using the merged basic prediction block as a unit of motion compensation.
  • a merging manner of merging adjacent basic prediction blocks having the same motion information is related to a shape of an image block to be processed.
  • when the width of the image block to be processed is greater than or equal to its height, only the horizontal merging manner described above is used to merge the basic prediction blocks; otherwise, for each column of basic prediction blocks in the image block to be processed, it is sequentially judged whether the motion information of a basic prediction block and that of the basic prediction block adjacent to it (exemplarily including the motion vector, the reference frame list, and the reference frame index information) are the same, and vertical merging is performed in the same way. A sketch of the row-wise merge follows.
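  • A sketch of the row-wise merge; column-wise merging is the transpose of the same procedure, and the run representation is illustrative only:

```python
def merge_row(row_motion_infos):
    # Returns (start_index, run_length) pairs; each run is one motion
    # compensation unit of adjacent blocks with identical motion information.
    runs, start = [], 0
    for i in range(1, len(row_motion_infos) + 1):
        # Close the current run when the row ends or the motion information changes.
        if i == len(row_motion_infos) or row_motion_infos[i] != row_motion_infos[start]:
            runs.append((start, i - start))
            start = i
    return runs
```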
  • before step S601, the method further includes:
  • when the first reference block does not exist (for example, when the image block to be processed is adjacent to the upper boundary of the image), the solution in the embodiment of the present application is not applicable; likewise, when the second reference block does not exist (for example, when the image block to be processed is adjacent to the left boundary of the image), the solution in the embodiment of the present application is not applicable.
  • before step S601, the method further includes:
  • S607. Determine that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or determine that the width of the image block to be processed is greater than or equal to 16 or the height of the image block to be processed is greater than or equal to 16. Correspondingly, when the width of the image block to be processed is less than 16 or the height is less than 16 (or, under the second variant, when the width is less than 16 and the height is less than 16), the solution in the embodiment of the present application is not applicable.
  • 16 is used as the threshold here, and other values such as 8, 24, 32 may also be used.
  • the thresholds corresponding to the width and height may also be different, and are not limited.
  • steps S606 and S607 may be performed in cooperation.
  • when the image block to be processed is located at the left or upper boundary of the image frame, or the width and height of the image block to be processed are both less than 16, the embodiments in this application cannot be adopted; alternatively, when the image block to be processed is located at the left or upper boundary of the image frame, or the width or height of the image block to be processed is less than 16, the embodiments in this application cannot be adopted.
  • FIG. 10 is a schematic block diagram of an inter prediction apparatus 1000 according to an embodiment of the present application. Specifically, the apparatus includes:
  • a determining module 1001 configured to determine a size of a basic prediction block in an image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed;
  • a positioning module 1002, configured to determine a first reference block and a second reference block of the basic prediction block according to the position, where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction block, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction block, the first reference block is adjacent to the upper boundary line of the image block to be processed, and the second reference block is adjacent to the left boundary line of the image block to be processed;
  • a calculation module 1003 is configured to calculate a motion vector corresponding to the first reference block, a motion vector corresponding to the second reference block, and a motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed. Weight calculation is performed to obtain a motion vector corresponding to the basic prediction block.
  • In a feasible implementation, the original reference block having a preset position relationship with the image block to be processed includes: an original reference block having a preset spatial position relationship with the image block to be processed and/or an original reference block having a preset time-domain position relationship with the image block to be processed.
  • In a feasible implementation, the original reference block having a preset spatial position relationship with the image block to be processed includes one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed, an image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed, and an image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed, where the original reference block having the preset spatial position relationship with the image block to be processed is located outside the image block to be processed.
  • In a feasible implementation, the original reference block having a preset time-domain position relationship with the image block to be processed includes: an image block that is located in the target reference frame at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block, where the original reference block having the preset time-domain position relationship with the image block to be processed is located outside the mapped image block.
  • The mapped image block is equal in size to the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame where the image block to be processed is located.
  • the index information and reference frame list information of the target reference frame are obtained by parsing the code stream.
  • the index information of the target reference frame and the reference frame list information are located in a code stream segment corresponding to a slice header of a slice where the image block to be processed is located.
  • the calculation module is specifically configured to obtain a motion vector corresponding to the basic prediction block according to the following formula:
  • P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W), where
  • P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)
  • P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)
  • R(W,y)=((H-y-1)×AR+(y+1)×BR)/H
  • B(x,H)=((W-x-1)×BL+(x+1)×BR)/W
  • AR is the motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point; BR is the motion vector corresponding to the image block located in the target reference frame at the lower right corner of the mapped image block and adjacent to its lower right corner point; BL is the motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point.
  • x is the ratio of the horizontal distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the width of the basic prediction block; y is the ratio of the vertical distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the height of the basic prediction block.
  • H is the ratio of the height of the image block to be processed to the height of the basic prediction block; W is the ratio of the width of the image block to be processed to the width of the basic prediction block.
  • L(-1,y) is the motion vector corresponding to the second reference block; A(x,-1) is the motion vector corresponding to the first reference block; P(x,y) is the motion vector corresponding to the basic prediction block.
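As a concrete reading of these formulas, a minimal Python sketch of the interpolation is given below. It is illustrative only: the names mirror the formula symbols above rather than any codec's identifiers, motion vectors are treated as (horizontal, vertical) integer pairs, and integer division is used as one possible rounding convention, which the text does not fix.

    def interpolate_mv(x, y, W, H, A, L, AR, BL, BR):
        # x, y: position ratios of the basic prediction block's top-left corner
        #       inside the image block, in basic-prediction-block units
        # W, H: width and height of the image block, in basic-prediction-block units
        # A, L: motion vectors of the first (above) and second (left) reference blocks
        # AR, BL: motion vectors of the top-right and bottom-left spatial reference blocks
        # BR: motion vector of the temporal reference block at the lower right corner
        #     of the mapped (co-located) block in the target reference frame
        def wsum(w0, v0, w1, v1, denom=1):
            # component-wise weighted sum of two motion vectors
            return tuple((w0 * a + w1 * b) // denom for a, b in zip(v0, v1))

        R = wsum(H - y - 1, AR, y + 1, BR, H)   # R(W, y): right temporary block
        B = wsum(W - x - 1, BL, x + 1, BR, W)   # B(x, H): bottom temporary block
        Ph = wsum(W - 1 - x, L, x + 1, R)       # horizontal interpolation P_h
        Pv = wsum(H - 1 - y, A, y + 1, B)       # vertical interpolation P_v
        return tuple((H * ph + W * pv + H * W) // (2 * H * W)
                     for ph, pv in zip(Ph, Pv)) # final blend P(x, y)

For example, interpolate_mv(1, 2, 4, 4, A=(4, 0), L=(0, 4), AR=(8, 0), BL=(0, 8), BR=(8, 8)) returns (4, 6); positions nearer the right and bottom edges of the block give the AR, BL and BR terms more weight.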
  • In a feasible implementation, the determining module 1001 is specifically configured to: when the side lengths of two adjacent sides of the basic prediction block are not equal, determine that the side length of the shorter side of the basic prediction block is 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determine that the side length of the basic prediction block is 4 or 8.
  • In a feasible implementation, the determining module 1001 is specifically configured to parse a first identifier from the code stream, where the first identifier is used to indicate the size of the basic prediction block, and the first identifier is located in the code stream segment corresponding to one of: the sequence parameter set of the sequence where the image block to be processed is located, the picture parameter set of the image where the image block to be processed is located, and the slice header of the slice where the image block to be processed is located.
  • In a feasible implementation, the determining module 1001 is specifically configured to determine the size of the basic prediction block according to the size of the planar mode prediction blocks in a previously reconstructed image, where a planar mode prediction block is an image block on which inter prediction is performed according to the foregoing feasible implementations, and the previously reconstructed image is an image whose coding order precedes the image where the image block to be processed is located.
  • In a feasible implementation, the determining module 1001 is specifically configured to: calculate the average value of the products of width and height of all the planar mode prediction blocks in the previously reconstructed image; when the average value is less than a threshold, the size of the basic prediction block is a first size; when the average value is greater than or equal to the threshold, the size of the basic prediction block is a second size, where the first size is smaller than the second size.
  • In a feasible implementation, the previously reconstructed image is, among images having the same temporal layer identifier as the image where the image block to be processed is located, the reconstructed image whose coding order is closest to that image.
  • the previously reconstructed image is a reconstructed image whose encoding order is closest to the image where the image block to be processed is located.
  • the previously reconstructed image is a plurality of images.
  • Correspondingly, the determining module 1001 is specifically configured to calculate the average value of the products of width and height of all the planar mode prediction blocks in the plurality of previously reconstructed images.
  • the threshold is a preset threshold.
  • In a feasible implementation, when the POCs of all reference frames of the image where the image block to be processed is located are less than the POC of that image, the threshold is a first threshold; when the POC of at least one reference frame of the image where the image block to be processed is located is greater than the POC of that image, the threshold is a second threshold, where the first threshold and the second threshold are different.
  • In a feasible implementation, the apparatus further includes a dividing module 1004, configured to divide the image block to be processed into a plurality of the basic prediction blocks according to the size, and to sequentially determine the position of each basic prediction block in the image block to be processed.
  • In a feasible implementation, the apparatus further includes a judging module 1005, configured to determine that the first reference block and the second reference block are located within the boundary of the image where the image block to be processed is located.
  • In a feasible implementation, the judging module 1005 is further configured to: determine that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or, determine that the width of the image block to be processed is greater than or equal to 16; or, determine that the height of the image block to be processed is greater than or equal to 16.
  • the apparatus is configured to encode the image block to be processed, or decode the image block to be processed.
  • FIG. 11 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as a decoding device 1100) according to an embodiment of the present application.
  • the decoding device 1100 may include a processor 1110, a memory 1130, and a bus system 1150.
  • the processor and the memory are connected through a bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory.
  • the memory of the coding device stores program code, and the processor can call the program code stored in the memory to perform the various video encoding or decoding methods described in this application, especially the video encoding or decoding methods in the various new inter prediction modes, and the methods for predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here again.
  • the memory 1130 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 1130.
  • the memory 1130 may include code and data 1131 accessed by the processor 1110 using the bus 1150.
  • the memory 1130 may further include an operating system 1133 and application programs 1135, the application programs 1135 including at least one program that allows the processor 1110 to perform the video encoding or decoding methods described in this application (especially the inter prediction method or the motion information prediction method described in this application).
  • the application programs 1135 may include applications 1 to N, which further include a video encoding or decoding application (referred to as a video decoding application) that executes the video encoding or decoding method described in this application.
  • the bus system 1150 may include a power bus, a control bus, and a status signal bus in addition to a data bus. However, for the sake of clarity, various buses are marked as the bus system 1150 in the figure.
  • the decoding device 1100 may further include one or more output devices, such as a display 1170.
  • the display 1170 may be a touch-sensitive display that merges the display with a tactile unit operable to sense touch input.
  • the display 1170 may be connected to the processor 1110 via a bus 1150.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit.
  • the computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
  • computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) a communication medium such as a signal or carrier wave.
  • a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
  • the computer program product may include a computer-readable medium.
  • the computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, a fiber optic cable, a twisted pair, a digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media.
  • as used herein, magnetic disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where magnetic disks usually reproduce data magnetically and optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
  • the techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Abstract

Embodiments of this application relate to an inter prediction method and apparatus. The method includes: determining a size of a basic prediction block in an image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed; determining, according to the position, a first reference block and a second reference block of the basic prediction block, where a left boundary line of the first reference block is collinear with a left boundary line of the basic prediction unit, an upper boundary line of the second reference block is collinear with an upper boundary line of the basic prediction unit, the first reference block is adjacent to an upper boundary line of the image block to be processed, and the second reference block is adjacent to a left boundary line of the image block to be processed; and performing weighted calculation on one or more of a motion vector corresponding to the first reference block, a motion vector corresponding to the second reference block, and a motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain a motion vector corresponding to the basic prediction block.

Description

An inter prediction method and apparatus
This application claims priority to Chinese Patent Application No. 201810995914.4, filed with the China National Intellectual Property Administration on August 29, 2018 and entitled "一种帧间预测的方法及装置" ("An inter prediction method and apparatus"), which is incorporated herein by reference in its entirety.
技术领域
本申请涉及视频编解码技术领域,尤其涉及一种视频图像的帧间预测方法及装置。
背景技术
数字视频能力可并入到多种多样的装置中,包含数字电视、数字直播系统、无线广播系统、个人数字助理(PDA)、膝上型或桌上型计算机、平板计算机、电子图书阅读器、数码相机、数字记录装置、数字媒体播放器、视频游戏装置、视频游戏控制台、蜂窝式或卫星无线电电话(所谓的“智能电话”)、视频电话会议装置、视频流式传输装置及其类似者。数字视频装置实施视频压缩技术,例如,在由MPEG-2、MPEG-4、ITU-T H.263、ITU-T H.264/MPEG-4第10部分高级视频编码(AVC)定义的标准、视频编码标准H.265/高效视频编码(HEVC)标准以及此类标准的扩展中所描述的视频压缩技术。视频装置可通过实施此类视频压缩技术来更有效率地发射、接收、编码、解码和/或存储数字视频信息。
视频压缩技术执行空间(图像内)预测和/或时间(图像间)预测以减少或去除视频序列中固有的冗余。对于基于块的视频编码,视频条带(即,视频帧或视频帧的一部分)可分割成若干图像块,所述图像块也可被称作树块、编码单元(CU)和/或编码节点。使用关于同一图像中的相邻块中的参考样本的空间预测来编码图像的待帧内编码(I)条带中的图像块。图像的待帧间编码(P或B)条带中的图像块可使用相对于同一图像中的相邻块中的参考样本的空间预测或相对于其它参考图像中的参考样本的时间预测。图像可被称作帧,且参考图像可被称作参考帧。
其中,包含高效视频编码(HEVC)标准在内的各种视频编码标准提出了用于图像块的预测性编码模式,即基于已经编码的图像块来预测当前编码的图像块。在帧内预测模式中,基于与当前块在相同的图像中的一或多个先前经解码相邻块来预测当前解码的图像块;在帧间预测模式中,基于不同图像中的已经解码块来预测当前解码的图像块。
然而,现有的几种帧间预测模式,例如合并模式(Merge mode)、跳过模式(Skip mode)和高级运动矢量预测模式(AMVP mode)仍然无法满足实际的不同应用场景对运动矢量的预测准确性的要求。
发明内容
本申请实施例提供了一种帧间预测的方法与装置,具体的,提供给了一种帧间预测模式,该帧间预测模式利用和待处理块具有相关位置关系的空域或时域参考块对应的运动矢量进行插值,获得待处理块内部各个子块对应的运动矢量,提高了帧间预测的效率,同时 通过调整子块的尺寸以及限制该帧间预测模式的适用条件等方法,进一步实现了编码增益和复杂度的平衡。应理解,插值获得的待处理图像块的各子块对应的运动矢量,可以直接作为待处理图像块各子块的运动矢量参与运动补偿,也可以作为各子块的运动矢量的预测值,进一步根据该预测值获得运动矢量,再进行运动补偿。本申请实施例涉及的帧间预测方法可以作为一种预测模式和其他现有技术中的预测模式一起,在编码端参与率失真选择,当该预测模式被编码端确定为最优预测模式时,和现有技术中的预测模式一样,其在预测模式集合中的标识信息会被编入码流并传递到解码端,解码端会根据接收到的码流信息,解析该预测模式,实现编解码端的一致。
在本申请实施例的第一方面提供了一种帧间预测的方法,包括:确定待处理图像块中的基本预测块的尺寸,所述尺寸用于确定所述基本预测块在所述待处理图像块中的位置;根据所述位置,确定所述基本预测块的第一参考块和第二参考块,其中,所述第一参考块的左边界线和所述基本预测单元的左边界线共线,所述第二参考块的上边界线和所述基本预测单元的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接;对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获得所述基本预测块对应的运动矢量。
该实施方式的有益效果在于:对于待处理图像块中的每一个子块(即基本预测块)分成不同的运动矢量,从而使待处理图像块对应的运动矢量场更精确,提高预测效率。
在第一方面的第一种可行的实施方式中,所述与所述待处理图像块具有预设位置关系的原始参考块,包括:与所述待处理图像块具有预设空域位置关系的原始参考块和/或与所述待处理图像块具有预设时域位置关系的原始参考块。
该实施方式的有益效果在于:合理地选择用于生成基本预测块对应的运动矢量的参考块,提高生成的运动矢量的可靠性。
在第一方面的第二种可行的实施方式中,与所述待处理图像块具有预设空域位置关系的原始参考块,包括:位于所述待处理图像块左上角且与所述待处理图像块的左上角点相邻的图像块、位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块和位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块中的一个或多个,其中,所述与所述待处理图像块具有预设空域位置关系的原始参考块位于所述待处理图像块的外部。
该实施方式的有益效果在于:合理地选择用于生成基本预测块对应的运动矢量的空域参考块,提高生成的运动矢量的可靠性。
在第一方面的第三种可行的实施方式中,与所述待处理图像块具有预设时域位置关系的原始参考块,包括:在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块,其中,所述与所述待处理图像块具有预设时域位置关系的原始参考块位于所述映射图像块的外部,所述映射图像块与所述待处理图像块尺寸相等,所述映射图像块在所述目标参考帧中的位置与所述待处理图像块在所述待处理图像块所在图像帧中的位置相同。
该实施方式的有益效果在于:合理地选择用于生成基本预测块对应的运动矢量的时域参考块,提高生成的运动矢量的可靠性。
在第一方面的第四种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息通过解析所述码流获得。
该实施方式的有益效果在于:与现有技术中预先设定目标参考帧相比,可以灵活地选择目标参考帧,使对应的时域参考块更可靠。
在第一方面的第五种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息位于所述待处理图像块所在的条带的条带头对应的码流段中。
该实施方式的有益效果在于:将目标参考帧的标识信息存储于条带头,条带内的图像块所有时域参考块共享相同的参考帧信息,节省了编码码流,提高了编码效率。
在第一方面的第六种可行的实施方式中,所述对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获得所述基本预测块对应的运动矢量,包括:所述基本预测块对应的运动矢量根据如下公式获得:
P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)，
其中，
P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)
P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)
R(W,y)=((H-y-1)×AR+(y+1)×BR)/H
B(x,H)=((W-x-1)×BL+(x+1)×BR)/W
AR为所述位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块对应的运动矢量,BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,BL为所述位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块对应的运动矢量,x为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的水平距离与所述基本预测块的宽的比值,y为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的竖直距离与所述基本预测块的高的比值,H为所述待处理图像块的高与所述基本预测块的高的比值,W为所述待处理图像块的宽与所述基本预测块的宽的比值,L(-1,y)为所述第二参考块对应的运动矢量,A(x,-1)为所述第一参考块对应的运动矢量,P(x,y)为所述基本预测块对应的运动矢量。
本申请具体实施方式部分提供了多种对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算，以获得所述基本预测块对应的运动矢量的实施方式，并不仅限于本实施方式。
在第一方面的第七种可行的实施方式中,所述确定待处理图像块中的基本预测块的尺寸,包括:当所述基本预测块的两条邻边的边长不等时,确定所述基本预测块的较短的一条边的边长为4或8;当所述基本预测块的两条邻边的边长相等时,确定所述基本预测块的边长为4或8。
该实施方式的有益效果在于:固定基本预测块的尺寸,降低了复杂度。
在第一方面的第八种可行的实施方式中,所述确定待处理图像块中的基本预测块的尺寸,包括:从码流中解析第一标识,所述第一标识用于指示所述基本预测块的尺寸,其中,所述第一标识位于所述待处理图像块所在序列的序列参数集、所述待处理图像块所在图像 的图像参数集和所述待处理图像块所在条带的条带头中的一个所对应的码流段中。
该实施方式的有益效果在于:在辅助信息中加入基本预测块的尺寸的标识信息,提高了对于图像内容的适应性。
在第一方面的第九种可行的实施方式中,所述确定待处理图像块中的基本预测块的尺寸,包括:根据在先已重构图像中平面模式预测块的尺寸,确定所述基本预测块的尺寸,所述平面模式预测块为根据第一方面前述任一项可行的实施方式进行帧间预测的待处理图像块,所述在先已重构图像为编码顺序位于所述待处理图像块所在图像之前的图像。
在第一方面的第十种可行的实施方式中,所述根据所述待处理图像块所在图像的在先已重构图像中平面模式预测块的尺寸,确定所述基本预测块的尺寸,包括:计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值;当所述平均值小于阈值时,所述基本预测块的尺寸为第一尺寸;当所述平均值大于或等于所述阈值时,所述基本预测块的尺寸为第二尺寸,其中,所述第一尺寸小于所述第二尺寸。
该实施方式的有益效果在于:利用先验信息来确定当前图像的基本预测块的尺寸,不需要传递额外的标识信息,既提高了对于图像的适应性,又保证了不增加编码码率。
在第一方面的第十一种可行的实施方式中,所述在先已重构图像为与所述待处理图像块所在的图像具有相同的时域层标识的图像中,编码顺序距离所述待处理图像块所在的图像最近的已重构图像。
该实施方式的有益效果在于:合理地选择同一时域层中的最近参考帧来统计先验信息,提高了统计信息的可靠性。
在第一方面的第十二种可行的实施方式中,所述在先已重构图像为编码顺序距离所述待处理图像块所在的图像最近的已重构图像。
该实施方式的有益效果在于:合理地选择最近的参考帧来统计先验信息,提高了统计信息的可靠性。
在第一方面的第十三种可行的实施方式中,所述在先已重构图像为多个图像,对应的,所述计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值,包括:计算所述多个在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值。
该实施方式的有益效果在于:累计多帧的统计信息来确定当前图像中基本预测块的尺寸,提高了统计的可靠性。
在第一方面的第十四种可行的实施方式中,所述阈值为预设阈值。
在第一方面的第十五种可行的实施方式中,当所述待处理图像块所在的图像的参考帧的POC均小于所述待处理图像块所在的图像的POC时,所述阈值为第一阈值;当所述待处理图像块所在的图像的至少一个参考帧的POC大于所述待处理图像块所在的图像的POC时,所述阈值为第二阈值,其中,所述第一阈值和所述第二阈值不同。
该实施方式的有益效果在于:可以根据不同的编码场景,设置不同的阈值,提高了对应编码场景的适应性。
在第一方面的第十六种可行的实施方式中,在所述确定待处理图像块中的基本预测块的尺寸之后,还包括:根据所述尺寸,将所述待处理图像块划分为多个所述基本预测块;依次确定每个所述基本预测块在所述待处理图像块中的位置。
应理解,该实施方式确定了各个基本预测块在待处理图像块中的坐标位置。
在第一方面的第十七种可行的实施方式中,在所述确定待处理图像块中的基本预测块的尺寸之前,所述方法还包括:确定所述第一参考块和所述第二参考块位于所述待处理图像块所在的图像边界内。
该实施方式的有益效果在于:当待处理图像块不存在第一参考块或第二参考块时不采用本申请实施例中的预测方法,当第一参考块和第二参考块不存在时,该预测方法的准确性会降低,此时不采用本方法,则避免了不必要的复杂度开销。
在第一方面的第十八种可行的实施方式中,在所述确定待处理图像块中的基本预测块的尺寸之前,所述方法还包括:确定所述待处理图像块的宽大于或等于16且所述待处理图像块的高大于或等于16;或者,确定所述待处理图像块的宽大于或等于16;或者,确定所述待处理图像块的高大于或等于16。
该实施方式的有益效果在于:当待处理图像块过小时不采用本申请实施例中的预测方法,平衡了编码效率和复杂度。
在第一方面的第十九种可行的实施方式中,所述方法用于编码所述待处理图像块,或者,解码所述待处理图像块。
应理解,本申请实施例涉及一种帧间预测方法,在混合编码架构下,既属于编码过程的一部分,也属于解码过程的一部分。
在本申请实施例的第二方面提供了一种帧间预测的装置,包括:确定模块,用于确定待处理图像块中的基本预测块的尺寸,所述尺寸用于确定所述基本预测块在所述待处理图像块中的位置;定位模块,用于根据所述位置,确定所述基本预测块的第一参考块和第二参考块,其中,所述第一参考块的左边界线和所述基本预测单元的左边界线共线,所述第二参考块的上边界线和所述基本预测单元的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接;计算模块,用于对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获得所述基本预测块对应的运动矢量。
在第二方面的第一种可行的实施方式中,所述与所述待处理图像块具有预设位置关系的原始参考块,包括:与所述待处理图像块具有预设空域位置关系的原始参考块和/或与所述待处理图像块具有预设时域位置关系的原始参考块。
在第二方面的第二种可行的实施方式中,与所述待处理图像块具有预设空域位置关系的原始参考块,包括:位于所述待处理图像块左上角且与所述待处理图像块的左上角点相邻的图像块、位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块和位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块中的一个或多个,其中,所述与所述待处理图像块具有预设空域位置关系的原始参考块位于所述待处理图像块的外部。
在第二方面的第三种可行的实施方式中,与所述待处理图像块具有预设时域位置关系的原始参考块,包括:在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块,其中,所述与所述待处理图像块具有预设时域位置关系的原始参考块位于所述映射图像块的外部,所述映射图像块与所述待处理图像块尺寸相等,所述映射图像块在所述目标参考帧中的位置与所述待处理图像块在所述待处理图像块所在图像帧中 的位置相同。
在第二方面的第四种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息通过解析所述码流获得。
在第二方面的第五种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息位于所述待处理图像块所在的条带的条带头对应的码流段中。
在第二方面的第六种可行的实施方式中,所述计算模块具体用于根据如下公式获得所述基本预测块对应的运动矢量:
P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)，
其中，
P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)
P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)
R(W,y)=((H-y-1)×AR+(y+1)×BR)/H
B(x,H)=((W-x-1)×BL+(x+1)×BR)/W
AR为所述位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块对应的运动矢量,BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,BL为所述位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块对应的运动矢量,x为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的水平距离与所述基本预测块的宽的比值,y为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的竖直距离与所述基本预测块的高的比值,H为所述待处理图像块的高与所述基本预测块的高的比值,W为所述待处理图像块的宽与所述基本预测块的宽的比值,L(-1,y)为所述第二参考块对应的运动矢量,A(x,-1)为所述第一参考块对应的运动矢量,P(x,y)为所述基本预测块对应的运动矢量。
在第二方面的第七种可行的实施方式中,所述确定模块具体用于:当所述基本预测块的两条邻边的边长不等时,确定所述基本预测块的较短的一条边的边长为4或8;当所述基本预测块的两条邻边的边长相等时,确定所述基本预测块的边长为4或8。
在第二方面的第八种可行的实施方式中,所述确定模块具体用于:从码流中解析第一标识,所述第一标识用于指示所述基本预测块的尺寸,其中,所述第一标识位于所述待处理图像块所在序列的序列参数集、所述待处理图像块所在图像的图像参数集和所述待处理图像块所在条带的条带头中的一个所对应的码流段中。
在第二方面的第九种可行的实施方式中,所述确定模块具体用于:根据在先已重构图像中平面模式预测块的尺寸,确定所述基本预测块的尺寸,所述平面模式预测块为根据第二方面前述任一项可行的实施方式进行帧间预测的待处理图像块,所述在先已重构图像为编码顺序位于所述待处理图像块所在图像之前的图像。
在第二方面的第十种可行的实施方式中,所述确定模块具体用于:计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值;当所述平均值小于阈值时,所述基本预测块的尺寸为第一尺寸;当所述平均值大于或等于所述阈值时,所述基本预测块的尺寸为第二尺寸,其中,所述第一尺寸小于所述第二尺寸。
在第二方面的第十一种可行的实施方式中,所述在先已重构图像为与所述待处理图像块所在的图像具有相同的时域层标识的图像中,编码顺序距离所述待处理图像块所在的图 像最近的已重构图像。
在第二方面的第十二种可行的实施方式中,所述在先已重构图像为编码顺序距离所述待处理图像块所在的图像最近的已重构图像。
在第二方面的第十三种可行的实施方式中,所述在先已重构图像为多个图像,对应的,所述确定模块具体用于:计算所述多个在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值。
在第二方面的第十四种可行的实施方式中,所述阈值为预设阈值。
在第二方面的第十五种可行的实施方式中,当所述待处理图像块所在的图像的参考帧的POC均小于所述待处理图像块所在的图像的POC时,所述阈值为第一阈值;当所述待处理图像块所在的图像的至少一个参考帧的POC大于所述待处理图像块所在的图像的POC时,所述阈值为第二阈值,其中,所述第一阈值和所述第二阈值不同。
在第二方面的第十六种可行的实施方式中,还包括划分模块,用于:根据所述尺寸,将所述待处理图像块划分为多个所述基本预测块;依次确定每个所述基本预测块在所述待处理图像块中的位置。
在第二方面的第十七种可行的实施方式中,还包括判断模块,用于:确定所述第一参考块和所述第二参考块位于所述待处理图像块所在的图像边界内。
在第二方面的第十八种可行的实施方式中,所述判断模块还用于:确定所述待处理图像块的宽大于或等于16且所述待处理图像块的高大于或等于16;或者,确定所述待处理图像块的宽大于或等于16;或者,确定所述待处理图像块的高大于或等于16。
在第二方面的第十九种可行的实施方式中,所述装置用于编码所述待处理图像块,或者,解码所述待处理图像块。
本申请实施例的第三方面提供了一种帧间预测的设备,包括:处理器和耦合于所述处理器的存储器;所述处理器用于执行上述第一方面所述的方法。
本申请实施例的第四方面提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行上述第一方面所述的方法。
本申请实施例的第五方面提供了一种包含指令的计算机程序产品,当所述指令在计算机上运行时,使得计算机执行上述第一方面所述的方法。
本申请实施例的第六方面提供了一种视频图像编码器,所述视频图像编码器包含上述第二方面所述的装置。
本申请实施例的第七方面提供了一种视频图像解码器,所述视频图像解码器包含上述第二方面所述的装置。
应理解,本申请的第二至七方面与本申请的第一方面的技术方案一致,各方面及对应的可实施的设计方式所取得的有益效果相似,不再赘述。
附图说明
为了更清楚地说明本申请实施例或背景技术中的技术方案,下面将对本申请实施例或背景技术中所需要使用的附图进行说明。
FIG. 1 is an exemplary block diagram of a video encoding and decoding system according to an embodiment of this application;
FIG. 2 is an exemplary block diagram of a video encoder according to an embodiment of this application;
FIG. 3 is an exemplary block diagram of a video decoder according to an embodiment of this application;
FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of this application;
FIG. 5 is a schematic diagram of the position relationship between an image block to be processed and its reference blocks according to an embodiment of this application;
FIG. 6 is an exemplary flowchart of an inter prediction method according to an embodiment of this application;
FIG. 7 is a schematic diagram of weighted calculation of the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 8 is another schematic diagram of weighted calculation of the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 9 is still another schematic diagram of weighted calculation of the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 10 is an exemplary block diagram of an inter prediction apparatus according to an embodiment of this application;
FIG. 11 is an exemplary block diagram of a coding device according to an embodiment of this application.
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
图1为本申请实施例中所描述的一种实例的视频译码系统1的框图。如本文所使用,术语“视频译码器”一般是指视频编码器和视频解码器两者。在本申请中,术语“视频译码”或“译码”可一般地指代视频编码或视频解码。视频译码系统1的视频编码器100和视频解码器200用于根据本申请提出的多种新的帧间预测模式中的任一种所描述的各种方法实例来预测当前经译码图像块或其子块的运动信息,例如运动矢量,使得预测出的运动矢量最大程度上接近使用运动估算方法得到的运动矢量,从而编码时无需传送运动矢量差值,从而进一步的改善编解码性能。
如图1中所示,视频译码系统1包含源装置10和目的地装置20。源装置10产生经编码视频数据。因此,源装置10可被称为视频编码装置。目的地装置20可对由源装置10所产生的经编码的视频数据进行解码。因此,目的地装置20可被称为视频解码装置。源装置10、目的地装置20或两个的各种实施方案可包含一或多个处理器以及耦合到所述一或多个处理器的存储器。所述存储器可包含但不限于RAM、ROM、EEPROM、快闪存储器或可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体,如本文所描述。
源装置10和目的地装置20可以包括各种装置,包含桌上型计算机、移动计算装置、笔记型(例如,膝上型)计算机、平板计算机、机顶盒、例如所谓的“智能”电话等电话手持机、电视机、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机或其类似者。
目的地装置20可经由链路30从源装置10接收经编码视频数据。链路30可包括能够将经编码视频数据从源装置10移动到目的地装置20的一或多个媒体或装置。在一个实例中,链路30可包括使得源装置10能够实时将经编码视频数据直接发射到目的地装置20的一或多个通信媒体。在此实例中,源装置10可根据通信标准(例如无线通信协议)来调制经编码视频数据,且可将经调制的视频数据发射到目的地装置20。所述一或多个通信媒体可包含无线和/或有线通信媒体,例如射频(RF)频谱或一或多个物理传输线。所述一或多个通信媒体可形成基于分组的网络的一部分,基于分组的网络例如为 局域网、广域网或全球网络(例如,因特网)。所述一或多个通信媒体可包含路由器、交换器、基站或促进从源装置10到目的地装置20的通信的其它设备。
在另一实例中,可将经编码数据从输出接口140输出到存储装置40。类似地,可通过输入接口240从存储装置40存取经编码数据。存储装置40可包含多种分布式或本地存取的数据存储媒体中的任一者,例如硬盘驱动器、蓝光光盘、DVD、CD-ROM、快闪存储器、易失性或非易失性存储器,或用于存储经编码视频数据的任何其它合适的数字存储媒体。
在另一实例中,存储装置40可对应于文件服务器或可保持由源装置10产生的经编码视频的另一中间存储装置。目的地装置20可经由流式传输或下载从存储装置40存取所存储的视频数据。文件服务器可为任何类型的能够存储经编码的视频数据并且将经编码的视频数据发射到目的地装置20的服务器。实例文件服务器包含网络服务器(例如,用于网站)、FTP服务器、网络附接式存储(NAS)装置或本地磁盘驱动器。目的地装置20可通过任何标准数据连接(包含因特网连接)来存取经编码视频数据。这可包含无线信道(例如,Wi-Fi连接)、有线连接(例如,DSL、电缆调制解调器等),或适合于存取存储在文件服务器上的经编码视频数据的两者的组合。经编码视频数据从存储装置40的传输可为流式传输、下载传输或两者的组合。
本申请的运动矢量预测技术可应用于视频编解码以支持多种多媒体应用,例如空中电视广播、有线电视发射、卫星电视发射、串流视频发射(例如,经由因特网)、用于存储于数据存储媒体上的视频数据的编码、存储在数据存储媒体上的视频数据的解码,或其它应用。在一些实例中,视频译码系统1可用于支持单向或双向视频传输以支持例如视频流式传输、视频回放、视频广播和/或视频电话等应用。
图1中所说明的视频译码系统1仅为实例,并且本申请的技术可适用于未必包含编码装置与解码装置之间的任何数据通信的视频译码设置(例如,视频编码或视频解码)。在其它实例中,数据从本地存储器检索、在网络上流式传输等等。视频编码装置可对数据进行编码并且将数据存储到存储器,和/或视频解码装置可从存储器检索数据并且对数据进行解码。在许多实例中,由并不彼此通信而是仅编码数据到存储器和/或从存储器检索数据且解码数据的装置执行编码和解码。
在图1的实例中,源装置10包含视频源120、视频编码器100和输出接口140。在一些实例中,输出接口140可包含调节器/解调器(调制解调器)和/或发射器。视频源120可包括视频捕获装置(例如,摄像机)、含有先前捕获的视频数据的视频存档、用以从视频内容提供者接收视频数据的视频馈入接口,和/或用于产生视频数据的计算机图形系统,或视频数据的此些来源的组合。
视频编码器100可对来自视频源120的视频数据进行编码。在一些实例中,源装置10经由输出接口140将经编码视频数据直接发射到目的地装置20。在其它实例中,经编码视频数据还可存储到存储装置40上,供目的地装置20以后存取来用于解码和/或播放。
在图1的实例中,目的地装置20包含输入接口240、视频解码器200和显示装置220。在一些实例中,输入接口240包含接收器和/或调制解调器。输入接口240可经由链路30和/或从存储装置40接收经编码视频数据。显示装置220可与目的地装置20集成或可在目的地装置20外部。一般来说,显示装置220显示经解码视频数据。显示装置220可包 括多种显示装置,例如,液晶显示器(LCD)、等离子显示器、有机发光二极管(OLED)显示器或其它类型的显示装置。
尽管图1中未图示,但在一些方面,视频编码器100和视频解码器200可各自与音频编码器和解码器集成,且可包含适当的多路复用器-多路分用器单元或其它硬件和软件,以处置共同数据流或单独数据流中的音频和视频两者的编码。在一些实例中,如果适用的话,那么MUX-DEMUX单元可符合ITU H.223多路复用器协议,或例如用户数据报协议(UDP)等其它协议。
视频编码器100和视频解码器200各自可实施为例如以下各项的多种电路中的任一者:一或多个微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件来实施本申请,那么装置可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一或多个处理器在硬件中执行所述指令从而实施本申请技术。前述内容(包含硬件、软件、硬件与软件的组合等)中的任一者可被视为一或多个处理器。视频编码器100和视频解码器200中的每一者可包含在一或多个编码器或解码器中,所述编码器或解码器中的任一者可集成为相应装置中的组合编码器/解码器(编码解码器)的一部分。
本申请可大体上将视频编码器100称为将某些信息“发信号通知”或“发射”到例如视频解码器200的另一装置。术语“发信号通知”或“发射”可大体上指代用以对经压缩视频数据进行解码的语法元素和/或其它数据的传送。此传送可实时或几乎实时地发生。替代地,此通信可经过一段时间后发生,例如可在编码时在经编码码流中将语法元素存储到计算机可读存储媒体时发生,解码装置接着可在所述语法元素存储到此媒体之后的任何时间检索所述语法元素。
JCT-VC开发了H.265(HEVC)标准。HEVC标准化基于称作HEVC测试模型(HM)的视频解码装置的演进模型。H.265的最新标准文档可从http://www.itu.int/rec/T-REC-H.265获得,最新版本的标准文档为H.265(12/16),该标准文档以全文引用的方式并入本文中。HM假设视频解码装置相对于ITU-TH.264/AVC的现有算法具有若干额外能力。例如,H.264提供9种帧内预测编码模式,而HM可提供多达35种帧内预测编码模式。
JVET致力于开发H.266标准。H.266标准化的过程基于称作H.266测试模型的视频解码装置的演进模型。H.266的算法描述可从http://phenix.int-evry.fr/jvet获得,其中最新的算法描述包含于JVET-F1001-v2中,该算法描述文档以全文引用的方式并入本文中。同时,可从https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/获得JEM测试模型的参考软件,同样以全文引用的方式并入本文中。
一般来说,HM的工作模型描述可将视频帧或图像划分成包含亮度及色度样本两者的树块或最大编码单元(largest coding unit,LCU)的序列,LCU也被称为CTU。树块具有与H.264标准的宏块类似的目的。条带包含按解码次序的数个连续树块。可将视频帧或图像分割成一个或多个条带。可根据四叉树将每一树块分裂成编码单元。例如,可将作为四叉树的根节点的树块分裂成四个子节点,且每一子节点可又为母节点且被分裂成另外四个子节点。作为四叉树的叶节点的最终不可分裂的子节点包括解码节点,例如,经解码视频块。与经解码码流相关联的语法数据可定义树块可分裂的最大次数, 且也可定义解码节点的最小大小。
编码单元包含解码节点及预测块(prediction unit,PU)以及与解码节点相关联的变换单元(transform unit,TU)。CU的大小对应于解码节点的大小且形状必须为正方形。CU的大小的范围可为8×8像素直到最大64×64像素或更大的树块的大小。每一CU可含有一个或多个PU及一个或多个TU。例如,与CU相关联的语法数据可描述将CU分割成一个或多个PU的情形。分割模式在CU是被跳过或经直接模式编码、帧内预测模式编码或帧间预测模式编码的情形之间可为不同的。PU可经分割成形状为非正方形。例如,与CU相关联的语法数据也可描述根据四叉树将CU分割成一个或多个TU的情形。TU的形状可为正方形或非正方形。
HEVC标准允许根据TU进行变换,TU对于不同CU来说可为不同的。TU通常基于针对经分割LCU定义的给定CU内的PU的大小而设定大小,但情况可能并非总是如此。TU的大小通常与PU相同或小于PU。在一些可行的实施方式中,可使用称作“残余四叉树”(residual qualtree,RQT)的四叉树结构将对应于CU的残余样本再分成较小单元。RQT的叶节点可被称作TU。可变换与TU相关联的像素差值以产生变换系数,变换系数可被量化。
一般来说,PU包含与预测过程有关的数据。例如,在PU经帧内模式编码时,PU可包含描述PU的帧内预测模式的数据。作为另一可行的实施方式,在PU经帧间模式编码时,PU可包含界定PU的运动矢量的数据。例如,界定PU的运动矢量的数据可描述运动矢量的水平分量、运动矢量的垂直分量、运动矢量的分辨率(例如,四分之一像素精确度或八分之一像素精确度)、运动矢量所指向的参考图像,和/或运动矢量的参考图像列表(例如,列表0、列表1或列表C)。
一般来说,TU使用变换及量化过程。具有一个或多个PU的给定CU也可包含一个或多个TU。在预测之后,视频编码器100可计算对应于PU的残余值。残余值包括像素差值,像素差值可变换成变换系数、经量化且使用TU扫描以产生串行化变换系数以用于熵解码。本申请通常使用术语“视频块”来指CU的解码节点。在一些特定应用中,本申请也可使用术语“视频块”来指包含解码节点以及PU及TU的树块,例如,LCU或CU。
视频序列通常包含一系列视频帧或图像。图像群组(group of picture,GOP)示例性地包括一系列、一个或多个视频图像。GOP可在GOP的头信息中、图像中的一者或多者的头信息中或在别处包含语法数据,语法数据描述包含于GOP中的图像的数目。图像的每一条带可包含描述相应图像的编码模式的条带语法数据。视频编码器100通常对个别视频条带内的视频块进行操作以便编码视频数据。视频块可对应于CU内的解码节点。视频块可具有固定或变化的大小,且可根据指定解码标准而在大小上不同。
作为一种可行的实施方式,HM支持各种PU大小的预测。假定特定CU的大小为2N×2N,HM支持2N×2N或N×N的PU大小的帧内预测,及2N×2N、2N×N、N×2N或N×N的对称PU大小的帧间预测。HM也支持2N×nU、2N×nD、nL×2N及nR×2N的PU大小的帧间预测的不对称分割。在不对称分割中,CU的一方向未分割,而另一方向分割成25%及75%。对应于25%区段的CU的部分由“n”后跟着“上(Up)”、“下(Down)”、“左(Left)”或“右(Right)”的指示来指示。因此,例如,“2N×nU”指水平分割的2N×2NCU,其中2N×0.5NPU在上部且2N×1.5NPU在底部。
在本申请中，“N×N”与“N乘N”可互换使用以指依照垂直维度及水平维度的视频块的像素尺寸，例如，16×16像素或16乘16像素。一般来说，16×16块将在垂直方向上具有16个像素（y=16），且在水平方向上具有16个像素（x=16）。同样地，N×N块一般在垂直方向上具有N个像素，且在水平方向上具有N个像素，其中N表示非负整数值。可将块中的像素排列成行及列。此外，块未必需要在水平方向上与在垂直方向上具有相同数目个像素。例如，块可包括N×M个像素，其中M未必等于N。
在使用CU的PU的帧内预测性或帧间预测性解码之后,视频编码器100可计算CU的TU的残余数据。PU可包括空间域(也称作像素域)中的像素数据,且TU可包括在将变换(例如,离散余弦变换(discrete cosine transform,DCT)、整数变换、小波变换或概念上类似的变换)应用于残余视频数据之后变换域中的系数。残余数据可对应于未经编码图像的像素与对应于PU的预测值之间的像素差。视频编码器100可形成包含CU的残余数据的TU,且接着变换TU以产生CU的变换系数。
在任何变换以产生变换系数之后,视频编码器100可执行变换系数的量化。量化示例性地指对系数进行量化以可能减少用以表示系数的数据的量从而提供进一步压缩的过程。量化过程可减少与系数中的一些或全部相关联的位深度。例如,可在量化期间将n位值降值舍位到m位值,其中n大于m。
JEM模型对视频图像的编码结构进行了进一步的改进,具体的,被称为“四叉树结合二叉树”(QTBT)的块编码结构被引入进来。QTBT结构摒弃了HEVC中的CU,PU,TU等概念,支持更灵活的CU划分形状,一个CU可以正方形,也可以是长方形。一个CTU首先进行四叉树划分,该四叉树的叶节点进一步进行二叉树划分。同时,在二叉树划分中存在两种划分模式,对称水平分割和对称竖直分割。二叉树的叶节点被称为CU,JEM的CU在预测和变换的过程中都不可以被进一步划分,也就是说JEM的CU,PU,TU具有相同的块大小。在现阶段的JEM中,CTU的最大尺寸为256×256亮度像素。
在一些可行的实施方式中,视频编码器100可利用预定义扫描次序来扫描经量化变换系数以产生可经熵编码的串行化向量。在其它可行的实施方式中,视频编码器100可执行自适应性扫描。在扫描经量化变换系数以形成一维向量之后,视频编码器100可根据上下文自适应性可变长度解码(CAVLC)、上下文自适应性二进制算术解码(CABAC)、基于语法的上下文自适应性二进制算术解码(SBAC)、概率区间分割熵(PIPE)解码或其他熵解码方法来熵解码一维向量。视频编码器100也可熵编码与经编码视频数据相关联的语法元素以供视频解码器200用于解码视频数据。
为了执行CABAC,视频编码器100可将上下文模型内的上下文指派给待传输的符号。上下文可与符号的相邻值是否为非零有关。为了执行CAVLC,视频编码器100可选择待传输的符号的可变长度码。可变长度解码(VLC)中的码字可经构建以使得相对较短码对应于可能性较大的符号,而较长码对应于可能性较小的符号。以这个方式,VLC的使用可相对于针对待传输的每一符号使用相等长度码字达成节省码率的目的。基于指派给符号的上下文可以确定CABAC中的概率。
在本申请实施例中,视频编码器可执行帧间预测以减少图像之间的时间冗余。如前文所描述,根据不同视频压缩编解码标准的规定,CU可具有一个或多个预测单元PU。换句话说,多个PU可属于CU,或者PU和CU的尺寸相同。在本文中当CU和PU 尺寸相同时,CU的分割模式为不分割,或者即为分割为一个PU,且统一使用PU进行表述。当视频编码器执行帧间预测时,视频编码器可用信号通知视频解码器用于PU的运动信息。示例性的,PU的运动信息可以包括:参考图像索引、运动矢量和预测方向标识。运动矢量可指示PU的图像块(也称视频块、像素块、像素集合等)与PU的参考块之间的位移。PU的参考块可为类似于PU的图像块的参考图像的一部分。参考块可定位于由参考图像索引和预测方向标识指示的参考图像中。
为了减少表示PU的运动信息所需要的编码比特的数目,视频编码器可根据合并预测模式或高级运动矢量预测模式过程产生用于PU中的每一者的候选预测运动矢量(Motion Vector,MV)列表。用于PU的候选预测运动矢量列表中的每一候选预测运动矢量可指示运动信息。由候选预测运动矢量列表中的一些候选预测运动矢量指示的运动信息可基于其它PU的运动信息。如果候选预测运动矢量指示指定空间候选预测运动矢量位置或时间候选预测运动矢量位置中的一者的运动信息,则本申请可将所述候选预测运动矢量称作“原始”候选预测运动矢量。举例来说,对于合并模式,在本文中也称为合并预测模式,可存在五个原始空间候选预测运动矢量位置和一个原始时间候选预测运动矢量位置。在一些实例中,视频编码器可通过组合来自不同原始候选预测运动矢量的部分运动矢量、修改原始候选预测运动矢量或仅插入零运动矢量作为候选预测运动矢量来产生额外候选预测运动矢量。这些额外候选预测运动矢量不被视为原始候选预测运动矢量且在本申请中可称作人工产生的候选预测运动矢量。
本申请的技术一般涉及用于在视频编码器处产生候选预测运动矢量列表的技术和用于在视频解码器处产生相同候选预测运动矢量列表的技术。视频编码器和视频解码器可通过实施用于构建候选预测运动矢量列表的相同技术来产生相同候选预测运动矢量列表。举例来说,视频编码器和视频解码器两者可构建具有相同数目的候选预测运动矢量(例如,五个候选预测运动矢量)的列表。视频编码器和解码器可首先考虑空间候选预测运动矢量(例如,同一图像中的相邻块),接着考虑时间候选预测运动矢量(例如,不同图像中的候选预测运动矢量),且最后可考虑人工产生的候选预测运动矢量直到将所要数目的候选预测运动矢量添加到列表为止。根据本申请的技术,可在候选预测运动矢量列表构建期间针对某些类型的候选预测运动矢量利用修剪操作以便从候选预测运动矢量列表移除重复,而对于其它类型的候选预测运动矢量,可能不使用修剪以便减小解码器复杂性。举例来说,对于空间候选预测运动矢量集合和对于时间候选预测运动矢量,可执行修剪操作以从候选预测运动矢量的列表排除具有重复运动信息的候选预测运动矢量。然而,当将人工产生的候选预测运动矢量添加到候选预测运动矢量的列表时,可在不对人工产生的候选预测运动矢量执行修剪操作的情况下添加人工产生的候选预测运动矢量。
在产生用于CU的PU的候选预测运动矢量列表之后,视频编码器可从候选预测运动矢量列表选择候选预测运动矢量且在码流中输出候选预测运动矢量索引。选定候选预测运动矢量可为具有产生最紧密地匹配正被解码的目标PU的预测子的运动矢量的候选预测运动矢量。候选预测运动矢量索引可指示在候选预测运动矢量列表中选定候选预测运动矢量的位置。视频编码器还可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由选定候选预测运动矢量指示的运动信息确定PU的运动 信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,PU的运动信息可基于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定。视频编码器可基于CU的PU的预测性图像块和用于CU的原始图像块产生用于CU的一或多个残余图像块。视频编码器可接着编码一或多个残余图像块且在码流中输出一或多个残余图像块。
码流可包括识别PU的候选预测运动矢量列表中的选定候选预测运动矢量的数据。视频解码器可基于由PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器可基于PU的运动信息识别用于PU的一或多个参考块。在识别PU的一或多个参考块之后,视频解码器可基于PU的一或多个参考块产生用于PU的预测性图像块。视频解码器可基于用于CU的PU的预测性图像块和用于CU的一或多个残余图像块来重构用于CU的图像块。
为了易于解释,本申请可将位置或图像块描述为与CU或PU具有各种空间关系。此描述可解释为是指位置或图像块和与CU或PU相关联的图像块具有各种空间关系。此外,本申请可将视频解码器当前在解码的PU称作当前PU,也称为当前待处理图像块。本申请可将视频解码器当前在解码的CU称作当前CU。本申请可将视频解码器当前在解码的图像称作当前图像。应理解,本申请同时适用于PU和CU具有相同尺寸,或者PU即为CU的情况,统一使用PU来表示。
如前文简短地描述,视频编码器100可使用帧间预测以产生用于CU的PU的预测性图像块和运动信息。在许多例子中,给定PU的运动信息可能与一或多个附近PU(即,其图像块在空间上或时间上在给定PU的图像块附近的PU)的运动信息相同或类似。因为附近PU经常具有类似运动信息,所以视频编码器100可参考附近PU的运动信息来编码给定PU的运动信息。参考附近PU的运动信息来编码给定PU的运动信息可减少码流中指示给定PU的运动信息所需要的编码比特的数目。
视频编码器100可以各种方式参考附近PU的运动信息来编码给定PU的运动信息。举例来说,视频编码器100可指示给定PU的运动信息与附近PU的运动信息相同。本申请可使用合并模式来指代指示给定PU的运动信息与附近PU的运动信息相同或可从附近PU的运动信息导出。在另一可行的实施方式中,视频编码器100可计算用于给定PU的运动矢量差(Motion Vector Difference,MVD)。MVD指示给定PU的运动矢量与附近PU的运动矢量之间的差。视频编码器100可将MVD而非给定PU的运动矢量包括于给定PU的运动信息中。在码流中表示MVD比表示给定PU的运动矢量所需要的编码比特少。本申请可使用高级运动矢量预测模式指代通过使用MVD和识别候选者运动矢量的索引值来用信号通知解码端给定PU的运动信息。
为了使用合并模式或AMVP模式来用信号通知解码端给定PU的运动信息,视频编码器100可产生用于给定PU的候选预测运动矢量列表。候选预测运动矢量列表可包括一或多个候选预测运动矢量。用于给定PU的候选预测运动矢量列表中的候选预测运动矢量中的每一者可指定运动信息。由每一候选预测运动矢量指示的运动信息可包括运动矢量、参考图像索引和预测方向标识。候选预测运动矢量列表中的候选预测运动矢量可包括“原始”候选预测运动矢量,其中每一者指示不同于给定PU的PU内的指定候选预测运动矢量位置中的一者的运动信息。
在产生用于PU的候选预测运动矢量列表之后,视频编码器100可从用于PU的候选预测运动矢量列表选择候选预测运动矢量中的一者。举例来说,视频编码器可比较每一候选预测运动矢量与正被解码的PU且可选择具有所要码率-失真代价的候选预测运动矢量。视频编码器100可输出用于PU的候选预测运动矢量索引。候选预测运动矢量索引可识别选定候选预测运动矢量在候选预测运动矢量列表中的位置。
此外,视频编码器100可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,可基于用于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频编码器100可如前文所描述处理用于PU的预测性图像块。
当视频解码器200接收到码流时,视频解码器200可产生用于CU的PU中的每一者的候选预测运动矢量列表。由视频解码器200针对PU产生的候选预测运动矢量列表可与由视频编码器100针对PU产生的候选预测运动矢量列表相同。从码流中解析得到的语法元素可指示在PU的候选预测运动矢量列表中选定候选预测运动矢量的位置。在产生用于PU的候选预测运动矢量列表之后,视频解码器200可基于由PU的运动信息指示的一或多个参考块产生用于PU的预测性图像块。视频解码器200可基于由用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器200可基于用于PU的预测性图像块和用于CU的残余图像块重构用于CU的图像块。
应理解,在一种可行的实施方式中,在解码端,候选预测运动矢量列表的构建与从码流中解析选定候选预测运动矢量在候选预测运动矢量列表中的位置是相互独立,可以任意先后或者并行进行的。
在另一种可行的实施方式中,在解码端,首先从码流中解析选定候选预测运动矢量在候选预测运动矢量列表中的位置,根据解析出来的位置构建候选预测运动矢量列表,在该实施方式中,不需要构建全部的候选预测运动矢量列表,只需要构建到该解析出来的位置处的候选预测运动矢量列表,即能够确定该位置出的候选预测运动矢量即可。举例来说,当解析码流得出选定的候选预测运动矢量为候选预测运动矢量列表中索引为3的候选预测运动矢量时,仅需要构建从索引为0到索引为3的候选预测运动矢量列表,即可确定索引为3的候选预测运动矢量,可以达到减小复杂度,提高解码效率的技术效果。
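A minimal sketch of this decoder-side shortcut in Python, assuming hypothetical candidate generators for the spatial, temporal and artificial candidates (the names are illustrative, not a real codec API): the decoder parses the index first and stops building the list as soon as enough candidates exist.

    def candidate_at(parsed_index, candidate_generators):
        # candidate_generators: callables producing spatial, temporal, then
        # artificial candidates, in the list-construction order
        candidates = []
        for generate in candidate_generators:
            for cand in generate():
                candidates.append(cand)
                if len(candidates) > parsed_index:
                    return candidates[parsed_index]  # list built only this far
        return candidates[parsed_index]  # worst case: the full list was needed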
图2为本申请实施例中所描述的一种实例的视频编码器100的框图。视频编码器100用于将视频输出到后处理实体41。后处理实体41表示可处理来自视频编码器100的经编码视频数据的视频实体的实例,例如媒体感知网络元件(MANE)或拼接/编辑装置。在一些情况下,后处理实体41可为网络实体的实例。在一些视频编码系统中,后处理实体41和视频编码器100可为单独装置的若干部分,而在其它情况下,相对于后处理实体41所描述的功能性可由包括视频编码器100的相同装置执行。在某一实例中,后处理实体41是图1的存储装置40的实例。
在图2的实例中,视频编码器100包括预测处理单元108、滤波器单元106、经解码 图像缓冲器(DPB)107、求和器112、变换器101、量化器102和熵编码器103。预测处理单元108包括帧间预测器110和帧内预测器109。为了图像块重构,视频编码器100还包含反量化器104、反变换器105和求和器111。滤波器单元106既定表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(ALF)和样本自适应偏移(SAO)滤波器。尽管在图2中将滤波器单元106示出为环路内滤波器,但在其它实现方式下,可将滤波器单元106实施为环路后滤波器。在一种示例下,视频编码器100还可以包括视频数据存储器、分割单元(图中未示意)。
视频数据存储器可存储待由视频编码器100的组件编码的视频数据。可从视频源120获得存储在视频数据存储器中的视频数据。DPB 107可为参考图像存储器,其存储用于由视频编码器100在帧内、帧间译码模式中对视频数据进行编码的参考视频数据。视频数据存储器和DPB 107可由多种存储器装置中的任一者形成,例如包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM),或其它类型的存储器装置。视频数据存储器和DPB 107可由同一存储器装置或单独存储器装置提供。在各种实例中,视频数据存储器可与视频编码器100的其它组件一起在芯片上,或相对于那些组件在芯片外。
如图2所示,视频编码器100接收视频数据,并将所述视频数据存储在视频数据存储器中。分割单元将所述视频数据分割成若干图像块,而且这些图像块可以被进一步分割为更小的块,例如基于四叉树结构或者二叉树结构的图像块分割。此分割还可包含分割成条带(slice)、片(tile)或其它较大单元。视频编码器100通常说明编码待编码的视频条带内的图像块的组件。所述条带可分成多个图像块(并且可能分成被称作片的图像块集合)。预测处理单元108可选择用于当前图像块的多个可能的译码模式中的一者,例如多个帧内译码模式中的一者或多个帧间译码模式中的一者。预测处理单元108可将所得经帧内、帧间译码的块提供给求和器112以产生残差块,且提供给求和器111以重构用作参考图像的经编码块。
预测处理单元108内的帧内预测器109可相对于与待编码当前块在相同帧或条带中的一或多个相邻块执行当前图像块的帧内预测性编码,以去除空间冗余。预测处理单元108内的帧间预测器110可相对于一或多个参考图像中的一或多个预测块执行当前图像块的帧间预测性编码以去除时间冗余。
具体的,帧间预测器110可用于确定用于编码当前图像块的帧间预测模式。举例来说,帧间预测器110可使用码率-失真分析来计算候选帧间预测模式集合中的各种帧间预测模式的码率-失真值,并从中选择具有最佳码率-失真特性的帧间预测模式。码率失真分析通常确定经编码块与经编码以产生所述经编码块的原始的未经编码块之间的失真(或误差)的量,以及用于产生经编码块的位码率(也就是说,位数目)。例如,帧间预测器110可确定候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
帧间预测器110用于基于确定的帧间预测模式,预测当前图像块中一个或多个子块的运动信息(例如运动矢量),并利用当前图像块中一个或多个子块的运动信息(例如运动矢量)获取或产生当前图像块的预测块。帧间预测器110可在参考图像列表中的一者中定位所述运动向量指向的预测块。帧间预测器110还可产生与图像块和视频条带 相关联的语法元素以供视频解码器200在对视频条带的图像块解码时使用。又或者,一种示例下,帧间预测器110利用每个子块的运动信息执行运动补偿过程,以生成每个子块的预测块,从而得到当前图像块的预测块;应当理解的是,这里的帧间预测器110执行运动估计和运动补偿过程。
具体的,在为当前图像块选择帧间预测模式之后,帧间预测器110可将指示当前图像块的所选帧间预测模式的信息提供到熵编码器103,以便于熵编码器103编码指示所选帧间预测模式的信息。
帧内预测器109可对当前图像块执行帧内预测。明确地说,帧内预测器109可确定用来编码当前块的帧内预测模式。举例来说,帧内预测器109可使用码率-失真分析来计算各种待测试的帧内预测模式的码率-失真值,并从待测试模式当中选择具有最佳码率-失真特性的帧内预测模式。在任何情况下,在为图像块选择帧内预测模式之后,帧内预测器109可将指示当前图像块的所选帧内预测模式的信息提供到熵编码器103,以便熵编码器103编码指示所选帧内预测模式的信息。
在预测处理单元108经由帧间预测、帧内预测产生当前图像块的预测块之后,视频编码器100通过从待编码的当前图像块减去所述预测块来形成残差图像块。求和器112表示执行此减法运算的一或多个组件。所述残差块中的残差视频数据可包含在一或多个TU中,并应用于变换器101。变换器101使用例如离散余弦变换(DCT)或概念上类似的变换等变换将残差视频数据变换成残差变换系数。变换器101可将残差视频数据从像素值域转换到变换域,例如频域。
变换器101可将所得变换系数发送到量化器102。量化器102量化所述变换系数以进一步减小位码率。在一些实例中,量化器102可接着执行对包含经量化的变换系数的矩阵的扫描。或者,熵编码器103可执行扫描。
在量化之后,熵编码器103对经量化变换系数进行熵编码。举例来说,熵编码器103可执行上下文自适应可变长度编码(CAVLC)、上下文自适应二进制算术编码(CABAC)、基于语法的上下文自适应二进制算术编码(SBAC)、概率区间分割熵(PIPE)编码或另一熵编码方法或技术。在由熵编码器103熵编码之后,可将经编码码流发射到视频解码器200,或经存档以供稍后发射或由视频解码器200检索。熵编码器103还可对待编码的当前图像块的语法元素进行熵编码。
反量化器104和反变化器105分别应用逆量化和逆变换以在像素域中重构所述残差块,例如以供稍后用作参考图像的参考块。求和器111将经重构的残差块添加到由帧间预测器110或帧内预测器109产生的预测块,以产生经重构图像块。滤波器单元106可以适用于经重构图像块以减小失真,诸如方块效应(block artifacts)。然后,该经重构图像块作为参考块存储在经解码图像缓冲器107中,可由帧间预测器110用作参考块以对后续视频帧或图像中的块进行帧间预测。
应当理解的是,视频编码器100的其它的结构变化可用于编码视频流。例如,对于某些图像块或者图像帧,视频编码器100可以直接地量化残差信号而不需要经变换器101处理,相应地也不需要经反变换器105处理;或者,对于某些图像块或者图像帧,视频编码器100没有产生残差数据,相应地不需要经变换器101、量化器102、反量化器104和反变换器105处理;或者,视频编码器100可以将经重构图像块作为参考块直接地 进行存储而不需要经滤波器单元106处理;或者,视频编码器100中量化器102和反量化器104可以合并在一起。
图3为本申请实施例中所描述的一种实例的视频解码器200的框图。在图3的实例中,视频解码器200包括熵解码器203、预测处理单元208、反量化器204、反变换器205、求和器211、滤波器单元206以及经解码图像缓冲器207。预测处理单元208可以包括帧间预测器210和帧内预测器209。在一些实例中,视频解码器200可执行大体上与相对于来自图2的视频编码器100描述的编码过程互逆的解码过程。
在解码过程中,视频解码器200从视频编码器100接收表示经编码视频条带的图像块和相关联的语法元素的经编码视频码流。视频解码器200可从网络实体42接收视频数据,可选的,还可以将所述视频数据存储在视频数据存储器(图中未示意)中。视频数据存储器可存储待由视频解码器200的组件解码的视频数据,例如经编码视频码流。存储在视频数据存储器中的视频数据,例如可从存储装置40、从相机等本地视频源、经由视频数据的有线或无线网络通信或者通过存取物理数据存储媒体而获得。视频数据存储器可作为用于存储来自经编码视频码流的经编码视频数据的经解码图像缓冲器(CPB)。因此,尽管在图3中没有示意出视频数据存储器,但视频数据存储器和DPB 207可以是同一个的存储器,也可以是单独设置的存储器。视频数据存储器和DPB 207可由多种存储器装置中的任一者形成,例如:包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM),或其它类型的存储器装置。在各种实例中,视频数据存储器可与视频解码器200的其它组件一起集成在芯片上,或相对于那些组件设置在芯片外。
网络实体42可例如为服务器、MANE、视频编辑器/剪接器,或用于实施上文所描述的技术中的一或多者的其它此装置。网络实体42可包括或可不包括视频编码器,例如视频编码器100。在网络实体42将经编码视频码流发送到视频解码器200之前,网络实体42可实施本申请中描述的技术中的部分。在一些视频解码系统中,网络实体42和视频解码器200可为单独装置的部分,而在其它情况下,相对于网络实体42描述的功能性可由包括视频解码器200的相同装置执行。在一些情况下,网络实体42可为图1的存储装置40的实例。
视频解码器200的熵解码器203对码流进行熵解码以产生经量化的系数和一些语法元素。熵解码器203将语法元素转发到预测处理单元208。视频解码器200可接收在视频条带层级和/或图像块层级处的语法元素。
当视频条带被解码为经帧内解码(I)条带时,预测处理单元208的帧内预测器209可基于发信号通知的帧内预测模式和来自当前帧或图像的先前经解码块的数据而产生当前视频条带的图像块的预测块。当视频条带被解码为经帧间解码(即,B或P)条带时,预测处理单元208的帧间预测器210可基于从熵解码器203接收到的语法元素,确定用于对当前视频条带的当前图像块进行解码的帧间预测模式,基于确定的帧间预测模式,对所述当前图像块进行解码(例如执行帧间预测)。具体的,帧间预测器210可确定是否对当前视频条带的当前图像块采用新的帧间预测模式进行预测,如果语法元素指示采用新的帧间预测模式来对当前图像块进行预测,基于新的帧间预测模式(例如通过语法元素指定的一种新的帧间预测模式或默认的一种新的帧间预测模式)预测当 前视频条带的当前图像块或当前图像块的子块的运动信息,从而通过运动补偿过程使用预测出的当前图像块或当前图像块的子块的运动信息来获取或生成当前图像块或当前图像块的子块的预测块。这里的运动信息可以包括参考图像信息和运动矢量,其中参考图像信息可以包括但不限于单向/双向预测信息,参考图像列表号和参考图像列表对应的参考图像索引。对于帧间预测,可从参考图像列表中的一者内的参考图像中的一者产生预测块。视频解码器200可基于存储在DPB 207中的参考图像来建构参考图像列表,即列表0和列表1。当前图像的参考帧索引可包含于参考帧列表0和列表1中的一或多者中。在一些实例中,可以是视频编码器100发信号通知指示是否采用新的帧间预测模式来解码特定块的特定语法元素,或者,也可以是发信号通知指示是否采用新的帧间预测模式,以及指示具体采用哪一种新的帧间预测模式来解码特定块的特定语法元素。应当理解的是,这里的帧间预测器210执行运动补偿过程。
反量化器204将在码流中提供且由熵解码器203解码的经量化变换系数逆量化,即去量化。逆量化过程可包括:使用由视频编码器100针对视频条带中的每个图像块计算的量化参数来确定应施加的量化程度以及同样地确定应施加的逆量化程度。反变换器205将逆变换应用于变换系数,例如逆DCT、逆整数变换或概念上类似的逆变换过程,以便产生像素域中的残差块。
在帧间预测器210产生用于当前图像块或当前图像块的子块的预测块之后,视频解码器200通过将来自反变换器205的残差块与由帧间预测器210产生的对应预测块求和以得到重建的块,即经解码图像块。求和器211表示执行此求和操作的组件。在需要时,还可使用环路滤波器(在解码环路中或在解码环路之后)来使像素转变平滑或者以其它方式改进视频质量。滤波器单元206可以表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(ALF)以及样本自适应偏移(SAO)滤波器。尽管在图3中将滤波器单元206示出为环路内滤波器,但在其它实现方式中,可将滤波器单元206实施为环路后滤波器。在一种示例下,滤波器单元206适用于重建块以减小块失真,并且该结果作为经解码视频流输出。并且,还可以将给定帧或图像中的经解码图像块存储在经解码图像缓冲器207中,经解码图像缓冲器207存储用于后续运动补偿的参考图像。经解码图像缓冲器207可为存储器的一部分,其还可以存储经解码视频,以供稍后在显示装置(例如图1的显示装置220)上呈现,或可与此类存储器分开。
应当理解的是,视频解码器200的其它结构变化可用于解码经编码视频码流。例如,视频解码器200可以不经滤波器单元206处理而生成输出视频流;或者,对于某些图像块或者图像帧,视频解码器200的熵解码器203没有解码出经量化的系数,相应地不需要经反量化器204和反变换器205处理。
如前文所注明,本申请的技术示例性地涉及帧间解码。应理解,本申请的技术可通过本申请中所描述的视频解码器中的任一者进行,视频解码器包含(例如)如关于图1到3所展示及描述的视频编码器100及视频解码器200。即,在一种可行的实施方式中,关于图2所描述的帧间预测器110可在视频数据的块的编码期间在执行帧间预测时执行下文中所描述的特定技术。在另一可行的实施方式中,关于图3所描述的帧间预测器210可在视频数据的块的解码期间在执行帧间预测时执行下文中所描述的特定技术。因此,对一般性“视频编码器”或“视频解码器”的引用可包含视频编码器100、视频解码器 200或另一视频编码或编码单元。
图4为本申请实施例中帧间预测模块的一种示意性框图。帧间预测模块121,示例性的,可以包括运动估计单元42和运动补偿单元44。在不同的视频压缩编解码标准中,PU和CU的关系各有不同。帧间预测模块121可根据多个分割模式将当前CU分割为PU。举例来说,帧间预测模块121可根据2N×2N、2N×N、N×2N和N×N分割模式将当前CU分割为PU。在其他实施例中,当前CU即为当前PU,不作限定。
帧间预测模块121可对PU中的每一者执行整数运动估计(Integer Motion Estimation,IME)且接着执行分数运动估计(Fraction Motion Estimation,FME)。当帧间预测模块121对PU执行IME时,帧间预测模块121可在一个或多个参考图像中搜索用于PU的参考块。在找到用于PU的参考块之后,帧间预测模块121可产生以整数精度指示PU与用于PU的参考块之间的空间位移的运动矢量。当帧间预测模块121对PU执行FME时,帧间预测模块121可改进通过对PU执行IME而产生的运动矢量。通过对PU执行FME而产生的运动矢量可具有子整数精度(例如,1/2像素精度、1/4像素精度等)。在产生用于PU的运动矢量之后,帧间预测模块121可使用用于PU的运动矢量以产生用于PU的预测性图像块。
在帧间预测模块121使用AMVP模式用信号通知解码端PU的运动信息的一些可行的实施方式中,帧间预测模块121可产生用于PU的候选预测运动矢量列表。候选预测运动矢量列表可包括一个或多个原始候选预测运动矢量和从原始候选预测运动矢量导出的一个或多个额外候选预测运动矢量。在产生用于PU的候选预测运动矢量列表之后,帧间预测模块121可从候选预测运动矢量列表选择候选预测运动矢量且产生用于PU的运动矢量差(MVD)。用于PU的MVD可指示由选定候选预测运动矢量指示的运动矢量与使用IME和FME针对PU产生的运动矢量之间的差。在这些可行的实施方式中,帧间预测模块121可输出识别选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引。帧间预测模块121还可输出PU的MVD。
除了通过对PU执行IME和FME来产生用于PU的运动信息外,帧间预测模块121还可对PU中的每一者执行合并(Merge)操作。当帧间预测模块121对PU执行合并操作时,帧间预测模块121可产生用于PU的候选预测运动矢量列表。用于PU的候选预测运动矢量列表可包括一个或多个原始候选预测运动矢量和从原始候选预测运动矢量导出的一个或多个额外候选预测运动矢量。候选预测运动矢量列表中的原始候选预测运动矢量可包括一个或多个空间候选预测运动矢量和时间候选预测运动矢量。空间候选预测运动矢量可指示当前图像中的其它PU的运动信息。时间候选预测运动矢量可基于不同于当前图像的对应的PU的运动信息。时间候选预测运动矢量还可称作时间运动矢量预测(TMVP)。
在产生候选预测运动矢量列表之后,帧间预测模块121可从候选预测运动矢量列表选择候选预测运动矢量中的一个。帧间预测模块121可接着基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。
在基于IME和FME产生用于PU的预测性图像块和基于合并操作产生用于PU的预测性图像块之后,帧间预测模块121可选择通过FME操作产生的预测性图像块或者通 过合并操作产生的预测性图像块。在一些可行的实施方式中,帧间预测模块121可基于通过FME操作产生的预测性图像块和通过合并操作产生的预测性图像块的码率-失真代价分析来选择用于PU的预测性图像块。
在帧间预测模块121已选择通过根据分割模式中的每一者分割当前CU而产生的PU的预测性图像块之后(在一些实施方式中,编码树单元CTU划分为CU后,不会再进一步划分为更小的PU,此时PU等同于CU),帧间预测模块121可选择用于当前CU的分割模式。在一些实施方式中,帧间预测模块121可基于通过根据分割模式中的每一者分割当前CU而产生的PU的选定预测性图像块的码率-失真代价分析来选择用于当前CU的分割模式。帧间预测模块121可将与属于选定分割模式的PU相关联的预测性图像块输出到残差产生模块102。帧间预测模块121可将指示属于选定分割模式的PU的运动信息的语法元素输出到熵编码模块116。
在图4的示意图中,帧间预测模块121包括IME模块180A到180N(统称为“IME模块180”)、FME模块182A到182N(统称为“FME模块182”)、合并模块184A到184N(统称为“合并模块184”)、PU模式决策模块186A到186N(统称为“PU模式决策模块186”)和CU模式决策模块188(也可以包括执行从CTU到CU的模式决策过程)。
IME模块180、FME模块182和合并模块184可对当前CU的PU执行IME操作、FME操作和合并操作。图4的示意图中将帧间预测模块121说明为包括用于CU的每一分割模式的每一PU的单独IME模块180、FME模块182和合并模块184。在其它可行的实施方式中,帧间预测模块121不包括用于CU的每一分割模式的每一PU的单独IME模块180、FME模块182和合并模块184。
如图4的示意图中所说明,IME模块180A、FME模块182A和合并模块184A可对通过根据2N×2N分割模式分割CU而产生的PU执行IME操作、FME操作和合并操作。PU模式决策模块186A可选择由IME模块180A、FME模块182A和合并模块184A产生的预测性图像块中的一者。
IME模块180B、FME模块182B和合并模块184B可对通过根据N×2N分割模式分割CU而产生的左PU执行IME操作、FME操作和合并操作。PU模式决策模块186B可选择由IME模块180B、FME模块182B和合并模块184B产生的预测性图像块中的一者。
IME模块180C、FME模块182C和合并模块184C可对通过根据N×2N分割模式分割CU而产生的右PU执行IME操作、FME操作和合并操作。PU模式决策模块186C可选择由IME模块180C、FME模块182C和合并模块184C产生的预测性图像块中的一者。
IME模块180N、FME模块182N和合并模块184可对通过根据N×N分割模式分割CU而产生的右下PU执行IME操作、FME操作和合并操作。PU模式决策模块186N可选择由IME模块180N、FME模块182N和合并模块184N产生的预测性图像块中的一者。
PU模式决策模块186可基于多个可能预测性图像块的码率-失真代价分析选择预测性图像块,且选择针对给定解码情形提供最佳码率-失真代价的预测性图像块。示例性的,对于带宽受限的应用,PU模式决策模块186可偏向选择增加压缩比的预测性图像块,而对于其它应用,PU模式决策模块186可偏向选择增加经重建视频质量的预测性图像块。在PU模式决策模块186选择用于当前CU的PU的预测性图像块之后,CU模式决策模块188选择用于当前CU的分割模式且输出属于选定分割模式的PU的预测 性图像块和运动信息。
图5示出了本申请实施例中一种示例性的待处理图像块和其参考块的示意图。如图5所示,W和H是待处理图像块500以及待处理图像块在指定参考图像中的co-located块(简称为映射图像块)500’的宽度和高度。待处理图像块的参考块包括:待处理图像块的上侧空域邻接块和左侧空域邻接块,以及映射图像块的下侧空域邻接块和右侧空域邻接块,其中映射图像块为指定参考图像中与待处理图像块具有相同的大小、形状的图像块,并且映射图像块在指定参考图像中的位置和待处理图像块在其所在图像(一般指当前待处理图像)中的位置相同。映射图像块的下侧空域邻接块和右侧空域邻接块也可以被称作时域参考块。每帧图像可以被划分为用于编码的图像块,这些图像块可以被进一步划分为更小的块。例如,待处理图像块和映射图像块可以被分割成多个MxN子块,即每个子块的大小均为MxN像素,不妨设每个参考块的大小也为MxN像素,即与待处理图像块的子块的大小相同。“M×N”与“M乘N”可互换使用以指依照水平维度及垂直维度的图像子块的像素尺寸,即在水平方向上具有M个像素,且在垂直方向上具有N个像素,其中M、N表示非负整数值。此外,M和N不一定相同。举例说明,M可以等于N,M、N均为4,即子块的大小为4x4,M也可以不等于N,比如M=8,N=4,即子块的大小为8x4,在可行的实施方式中,示例性的,待处理图像块的子块大小和参考块的大小可以是4x4,8x8,8x4或4x8像素,或者标准允许的预测块的最小尺寸。在一种可行的实施方式中,W和H的度量单位分别为子块的宽度和高度,即W表示待处理图像块的宽和待处理图像块中子块的宽的比值,H表示待处理图像块的高和待处理图像块中子块的高的比值。此外,本申请描述的待处理图像块可以理解为但不限于:预测单元(prediction unit,PU)或者编码单元(coding unit,CU)或者变换单元(transform unit,TU)等。根据不同视频压缩编解码标准的规定,CU可包含一个或多个预测单元PU,或者PU和CU的尺寸相同。图像块可具有固定或可变的大小,且根据不同视频压缩编解码标准而在大小上不同。此外,待处理图像块是指当前待编码或当前待解码的图像块,例如待编码或待解码的预测单元。
在一种示例下,如图5所示,可以沿着方向1依次判断待处理图像块的每个左侧空域邻接块是否可用,以及可以沿着方向2依次判断待处理图像块的每个上侧空域邻接块是否可用,例如判断上述邻接块是否采用帧间编码,如果邻接块存在且采用帧间编码,则所述邻接块可用;如果邻接块不存在或者采用帧内编码,则所述邻接块不可用。在一种可行的实施方式中,如果一个邻接块采用帧内编码,则复制邻近的其它参考块的运动信息作为该邻接块的运动信息。按照类似方法检测映射图像块的下侧空域邻接块和右侧空域邻接块是否可用,在此不再赘述。
应理解,运动信息的存储可以存在不同的颗粒度,比如在H.264和H.265标准中,运动信息是以4x4像素集合为存储运动信息的基本单元的,示例性的,还可以以2x2,8x8,4x8,8x4等像素集合作为存储运动信息的基本单元。在本文中,不妨存储运动信息的基本单元简称为基本存储单元。
当上述参考块的大小与基本存储单元的大小一致时,可以直接获取该参考块对应的基本存储单元所存储的运动信息作为该参考块对应的运动信息。
或者,当上述参考块的大小小于基本存储单元的大小时,可以直接获取该参考块 对应的基本存储单元所存储的运动信息作为该参考块对应的运动信息。
或者,当上述参考块的大小大于存储运动信息的基本单元的大小时,可以获取参考块预定位置处对应的基本存储单元所存储的运动信息。示例性的,可以获取参考块左上角点处对应的基本存储单元所存储的运动信息,或者,可以获取参考块中心点处对应的基本存储单元所存储的运动信息,作为该参考块对应的运动信息。
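A minimal sketch of this lookup in Python, assuming the 4x4 basic storage unit mentioned above and taking the unit at the top-left corner of the reference block when the block is larger (the center point is the other option described); the grid layout is an illustrative assumption:

    MIN_UNIT = 4  # pixels per side of the basic storage unit, as in H.264/H.265

    def fetch_motion_info(mv_field, ref_x, ref_y):
        # mv_field[r][c] holds the motion info stored for the 4x4 unit (r, c);
        # (ref_x, ref_y) is the top-left pixel position of the reference block
        return mv_field[ref_y // MIN_UNIT][ref_x // MIN_UNIT]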
在本申请实施例中,为了方便描述,待处理图像的子块又被称为基本预测块。
图6示例性的示出了本申请实施例中根据待处理图像块的参考块对应的运动矢量加权获得待处理图像块内部各基本预测块的运动矢量的示意流程图,包括:
S601、确定待处理图像块中的基本预测块的尺寸,所述尺寸用于确定所述基本预测块在所述待处理图像块中的位置;
在一种可行的实施方式中,待处理图像块中的基本预测块的尺寸可以是预设的固定值,由编解码端预先确定,并且分别固化在编解码端。示例性的,当所述基本预测块的两条邻边的边长不等时,即所述基本预测块为非正方形的长方形(non-square),确定所述基本预测块的较短的一条边的边长为4或8;当所述基本预测块的两条邻边的边长相等时,即所述基本预测块为正方形,确定所述基本预测块的边长为4或8。应理解,上述边长为4或8只是一个示例值,也可以是16,24等其它常数。
在一种可行的实施方式中,待处理图像块中的基本预测块的尺寸可以通过解析码流中获得,具体的:从码流中解析第一标识,所述第一标识用于指示所述基本预测块的尺寸,其中,所述第一标识位于所述待处理图像块所在序列的序列参数集(sequence parameter set,SPS)、所述待处理图像块所在图像的图像参数集(picture parameter set,PPS)和所述待处理图像块所在条带的条带头(slice header,或者slice segment header)中的一个所对应的码流段中。
即,可以从码流中解析相应的语法元素,进而确定基本预测块的尺寸。而该语法元素可以携带于码流中对应SPS的码流部分,也可以携带于码流中对应PPS的码流部分,还可以携带于码流中对应条带头的码流部分。
应理解,当从SPS中解析出基本预测块的尺寸时,整个序列中的基本预测块采用相同的尺寸,当从PPS中解析出基本预测块的尺寸时,整个图像帧中的基本预测块采用相同的尺寸,当从条带头中解析出基本预测块的尺寸时,整个条带中的基本预测块采用相同的尺寸。
应理解,在本文中,图像和图像帧是不同的概念,图像包括以整帧形式存在的图像(即图像帧),也包括以条带(slice)形式存在的图像,以片(tile)形式存在的图像,或者以其它子图像的形式存在的图像,不做限定。
应理解,对于采用帧内预测的条带,由于不需要确定基本预测块的尺寸,因此,采用帧内预测的条带的条带头不存在上述第一标识。
具体的,编码端通过适当的方式确定基本预测块的尺寸(比如,率失真选择的方式,或者实验经验值的方式),将确定后的基本预测块的尺寸编入码流,解码端从码流中解析出基本预测块的尺寸。
在一种可行的实施方式中,待处理图像块中的基本预测块的尺寸通过历史信息来确定,因此可以分别在编解码端自适应地获得,具体的,根据在先已重构图像中平面 模式预测块的尺寸,确定所述基本预测块的尺寸,所述平面模式预测块为根据权利要求1至7任一项所述的方法进行帧间预测的待处理图像块,所述在先已重构图像为编码顺序位于所述待处理图像块所在图像之前的图像。
应理解,当确定当前图像的基本预测块时,在先已重构图像中根据权利要求1至7任一项所述的方法进行帧间预测的待处理图像块已经处理完毕,所述平面模式预测块实际为根据权利要求1至7任一项所述的方法完成帧间预测的图像块。本文相关段落均依此解释,不再赘述。
不妨将采用本申请实施例中所述的方法(比如,图6所示的方法)进行帧间预测的待处理图像块称为平面模式预测块。可以根据在先编码的图像中统计平面模式预测块的大小来估计待处理图像块所在图像(在后文中简称为当前图像)中基本预测块的大小。
应理解,在编码端的图像编码顺序和在解码端的图像解码顺序是一致的,因此,所述在先已重构图像为编码顺序位于所述待处理图像块所在图像之前的图像,也可以描述为所述在先已重构图像为解码顺序位于所述待处理图像块所在图像之前的图像。本文对于编码顺序和解码顺序均按上述方式理解,不再赘述。
应理解,当存在于编码端的已重构图像A的编码顺序和存在于解码端的已重构图像B的解码顺序相同时,图像A和图像B是相同的,因此分别在编码端和解码端基于相同的重构图像进行分析,可以得到相同的先验信息,基于该先验信息来确定基本预测块的尺寸,在编解码端可以得到相同的结果,即实现确定基本预测块的尺寸的自适应机制。
具体的,可以按照如下方式确定当前图像基本预测块的尺寸:
计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值;
当所述平均值小于阈值时,所述基本预测块的尺寸为第一尺寸;
当所述平均值大于或等于所述阈值时,所述基本预测块的尺寸为第二尺寸,其中,所述第一尺寸小于所述第二尺寸。
应理解,一般的,上述阈值是预先设置的。
在一种可行的实施方式中,当所述待处理图像块所在的图像的参考帧的POC(显示顺序,picture order count)均小于所述待处理图像块所在的图像的POC时,所述阈值为第一阈值;当所述待处理图像块所在的图像的至少一个参考帧的POC大于所述待处理图像块所在的图像的POC时,所述阈值为第二阈值,其中,所述第一阈值和所述第二阈值不同。
即当以低延时(low delay)的方式进行编码时,此时当前图像的参考帧的POC均小于当前图像的POC,将阈值设置为第一数值,示例性的可以设置为75,当以随机接入(random access)的方式进行编码时,此时当前图像的至少一个参考帧的POC大于当前图像的POC,将阈值设置为第二数值,示例性的可以设置为27。应理解,该第一数值和第二数值的设置不做限定。
应理解,第一尺寸小于第二尺寸,示例性的,第一尺寸和第二尺寸的关系可以包括第一尺寸为4(正方形边长)且第二尺寸为8(正方形边长),也可以包括第一尺寸为4x4且第二尺寸为8x8,也可以包括第一尺寸为4x4且第二尺寸为4x8,也可以包括 第一尺寸为4x8,第二尺寸为8x8,也可以包括第一尺寸为4x8,第二尺寸为8x16,不做限定。
在一种可行的实施方式中,所述在先已重构图像为编码顺序距离所述待处理图像块所在的图像最近的已重构图像,也即所述在先已重构图像为解码顺序距离所述待处理图像块所在的图像最近的已重构图像。
即,根据当前图像帧的前一编码/解码帧中的全部所述平面模式预测块的统计信息(示例性的,全部所述平面模式预测块的宽和高的乘积的平均值),确定当前图像帧中的基本预测块的尺寸,或者,根据当前条带的前一条带中的全部所述平面模式预测块的统计信息,确定当前条带中的基本预测块的尺寸。如前所述,图像也可以包括其他形式的子图像,因此,并不限定于图像帧和条带。
应理解,在此实施方式中,统计信息以图像帧或者条带为单位进行更新,即每图像帧或者每条带进行一次更新。
应理解,在采用帧内预测的图像帧或者条带中不进行统计信息的更新。
在一种可行的实施方式中,所述在先已重构图像为与所述待处理图像块所在的图像具有相同的时域层标识的图像中,编码顺序距离所述待处理图像块所在的图像最近的已重构图像,也即所述在先已重构图像为与所述待处理图像块所在的图像具有相同的时域层标识的图像中,解码顺序距离所述待处理图像块所在的图像最近的已重构图像。
即,从和当前图像具有相同时域层标识(temporal ID)的图像中确定和当前图像编码距离最近的图像。具体方式可参考上一可行的实施方式,不做赘述。
在一种可行的实施方式中,所述在先已重构图像为多个图像,对应的,所述计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值,包括:计算所述多个在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值。
应理解,上述两种可行的实施方式分别是按照单一在先已重构图像的统计数据来确定当前图像的基本预测块的尺寸,而在本实施方式中是累计多个在先已重构图像的统计数据来确定当前图像的基本预测块的尺寸。即,在此实施方式中,统计信息以多个图像帧或者多个条带为单位进行更新,即每预设个数的图像帧或者每预设个数条带进行一次更新,或者统计信息可以一直累计而不做更新。具体的,计算所述多个在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值,可以包括:分别统计所述多个在先已重构图像中每个图像中的全部所述平面模式预测块的宽和高的乘积的平均值,将上述分别统计的平均值再做加权,获得本实施方式中用于和上述阈值进行比较的最终平均值,也可以包括:累加多个在先已重构图像中的全部所述平面模式预测块的宽和高的乘积,再除以全部所述平面模式预测块的个数,以获得本实施方式中用于和上述阈值进行比较的平均值。
在一种可行的实施方式中,在计算所述在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值的过程中,还包括确定统计信息有效,比如,如果所述在先已重构图像中没有所述平面模式预测块,就无法计算所述平均值,此时统计信息无效,对应的,可以不对统计信息进行更新,或者将当前图像的基本预测块的尺寸设置为预设值,示例性的,对于正方形块可以设置为4x4。
应理解,对于第一个采用帧间预测的图像在采用历史信息确定基本预测块的尺寸的实施方式中,基本预测块的尺寸也设置为预设值。
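Taken together, the history-based size decision of this implementation can be sketched as follows. The thresholds 75 and 27 and the sizes 4 and 8 follow the examples given in the text, and empty statistics fall back to the preset size as described above; the function shape and names are illustrative assumptions.

    def decide_basic_block_size(planar_blocks, low_delay, small=4, large=8):
        # planar_blocks: (width, height) of every planar-mode prediction block
        # in the previously reconstructed image(s); low_delay: True when all
        # reference-frame POCs precede the current picture's POC
        if not planar_blocks:
            return small  # statistics invalid: fall back to the preset size
        avg_area = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
        threshold = 75 if low_delay else 27
        return small if avg_area < threshold else large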
在一种可行的实施方式中,确定待处理图像块中的基本预测块的尺寸,还包括确定基本预测块的形状,示例性的,当待处理图像块为正方形时,可以确定基本预测块也为正方形,或者,待处理图像块的宽高比和基本预测块的宽高比一致,或者,将待处理图像块的宽和高分别均分成若干等分以获得基本预测块的宽和高,或者,待处理图像块的形状和基本预测块的形状不相关。比如,可以将基本预测块固定设置为正方形,或者,当待处理图像块的尺寸为32x16时,可以设置基本预测块为16x8或者8x4等,不做限定。
应理解,在一种可行的实施方式中,对基本预测块形状的确定分别固化于编解码端,并保持一致。
在一种可行的实施方式中,在本步骤之后,还包括:
S602、根据所述尺寸,将所述待处理图像块划分为多个所述基本预测块;依次确定每个所述基本预测块在所述待处理图像块中的位置。
应理解,每个基本预测块的尺寸是相同的,在确定了基本预测块的尺寸之后,可以在待处理图像块中按照尺寸依次推算出每一个基本预测块的位置。
应理解,在一种可行的实施方式中,待处理图像块和基本预测块的位置均以坐标的形式存在,该步骤仅需要确定每个基本预测块的坐标即可,或者,将待处理图像块和基本预测块进行区分,不存在实体化的划分步骤。
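Step S602 reduces to coordinate bookkeeping; a minimal illustrative sketch with assumed names:

    def basic_block_positions(block_w, block_h, sub_w, sub_h):
        # top-left coordinates of every basic prediction block, in raster order
        return [(x, y)
                for y in range(0, block_h, sub_h)
                for x in range(0, block_w, sub_w)]

    # e.g. a 16x16 image block with 8x8 basic prediction blocks:
    # basic_block_positions(16, 16, 8, 8) -> [(0, 0), (8, 0), (0, 8), (8, 8)]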
S603、根据所述位置,确定所述基本预测块的第一参考块和第二参考块,其中,所述第一参考块的左边界线和所述基本预测单元的左边界线共线,所述第二参考块的上边界线和所述基本预测单元的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接。
S604、对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获得所述基本预测块对应的运动矢量。
在一种可行的实施方式中,所述与所述待处理图像块具有预设位置关系的原始参考块,包括:与所述待处理图像块具有预设空域位置关系的原始参考块和/或与所述待处理图像块具有预设时域位置关系的原始参考块。
在一种可行的实施方式中,与所述待处理图像块具有预设空域位置关系的原始参考块,包括:位于所述待处理图像块左上角且与所述待处理图像块的左上角点相邻的图像块、位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块和位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块中的一个或多个,其中,所述与所述待处理图像块具有预设空域位置关系的原始参考块位于所述待处理图像块的外部,且不妨简称为空域参考块。
在一种可行的实施方式中,与所述待处理图像块具有预设时域位置关系的原始参考块,包括:在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块,其中,所述与所述待处理图像块具有预设时域位置关系的原始参考块位于所述映射图像块的外部,所述映射图像块与所述待处理图像块尺寸相等,所述映 射图像块在所述目标参考帧中的位置与所述待处理图像块在所述待处理图像块所在图像帧中的位置相同,且不妨简称为时域参考块。
在一种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息通过解析所述码流获得。
在一种可行的实施方式中,所述目标参考帧的索引信息和参考帧列表信息位于所述待处理图像块所在的条带的条带头对应的码流段中。
下面将具体描述步骤S603和S604的具体实现方式:
在一种可行的实施方式中,如图7所示:
S701、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802,其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y);
S702A、基于待处理图像块的右上角的空域参考块805对应的运动矢量和待处理图像块的右下角位置的时域参考块807对应的运动矢量进行加权计算获得第一临时块806对应的运动矢量,示例性的,计算公式为R(W,y)=((H-y-1)×AR+(y+1)×BR)/H,其中,AR为所述位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块对应的运动矢量,BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,H为所述待处理图像块的高与所述基本预测块的高的比值,y为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的竖直距离与所述基本预测块的高的比值。其中,目标参考帧的索引信息和参考帧列表信息从条带头中解析获得。
S702B、基于待处理图像块的左下角的空域参考块801对应的运动矢量和待处理图像块的右下角位置的时域参考块807对应的运动矢量进行加权计算获得第二临时块808对应的运动矢量,示例性的,计算公式为B(x,H)=((W-x-1)×BL+(x+1)×BR)/W,其中,BL为所述位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块对应的运动矢量,BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,W为所述待处理图像块的宽与所述基本预测块的宽的比值,x为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的水平距离与所述基本预测块的宽的比值。
应理解,步骤S702A和步骤S702B不限定先后顺序关系。
S703A、基于待处理图像块的第一临时块对应的运动矢量和待处理图像块的第二参考块对应的运动矢量进行加权计算获得基本预测单元对应的第一临时运动矢量P_h(x,y)，示例性的，计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S703B、基于待处理图像块的第二临时块对应的运动矢量和待处理图像块的第一参考块对应的运动矢量进行加权计算获得基本预测单元对应的第二临时运动矢量P_v(x,y)，示例性的，计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解,步骤S703A和步骤S703B不限定先后顺序关系。
S704、基于待处理图像块的第一临时运动矢量和第二临时运动矢量进行加权计算获得基本预测单元对应的运动矢量P(x,y)，示例性的，计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
应理解，在一种可行的实施方式中，基本预测单元对应的运动矢量P(x,y)也可以通过综合上述步骤的单一公式得出，示例性的，计算公式为P(x,y)=(H×((W-1-x)×L(-1,y)+(x+1)×((H-y-1)×AR+(y+1)×BR)/H)+W×((H-1-y)×A(x,-1)+(y+1)×((W-x-1)×BL+(x+1)×BR)/W)+H×W)/(2×H×W)。
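As a worked instance of steps S701 to S704, with assumed values W = H = 4, x = 1, y = 2, and writing L, A, AR, BL, BR for the five reference motion vectors:

    R(4,2) = \frac{AR + 3\,BR}{4}, \qquad B(1,4) = \frac{2\,BL + 2\,BR}{4}
    P_h(1,2) = 2L + 2\,R(4,2), \qquad P_v(1,2) = A + 3\,B(1,4)
    P(1,2) = \frac{4\,P_h + 4\,P_v + 16}{32} = \frac{8L + 4A + 2\,AR + 6\,BL + 12\,BR + 16}{32}

so the five reference motion vectors enter with integer weights 8, 4, 2, 6 and 12, which sum to 2×H×W = 32, plus the rounding term H×W = 16.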
在另一种可行的实施方式中,如图8所示:
S801、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802,其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y);
S802A、将待处理图像块的右上角的空域参考块805对应的运动矢量作为待处理图像块的第一临时块806对应的运动矢量;
S802B、将待处理图像块的左下角的空域参考块801对应的运动矢量作为待处理图像块的第二临时块808对应的运动矢量;
S803A、基于待处理图像块的第一临时块对应的运动矢量R(W,y)和待处理图像块的第二参考块对应的运动矢量进行加权计算获得基本预测单元对应的第一临时运动矢量P_h(x,y)，示例性的，计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S803B、基于待处理图像块的第二临时块对应的运动矢量B(x,H)和待处理图像块的第一参考块对应的运动矢量进行加权计算获得基本预测单元对应的第二临时运动矢量P_v(x,y)，示例性的，计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解,步骤S803A和步骤S803B不限定先后顺序关系。
S804、基于待处理图像块的第一临时运动矢量和第二临时运动矢量进行加权计算获得基本预测单元对应的运动矢量P(x,y)，示例性的，计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
在另一种可行的实施方式中,如图9所示:
S901、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802,其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y);
S902、根据待处理图像块600中的基本预测块604的位置确定第一临时块806和第二临时块809,其中,第一临时块为在目标参考帧中位于映射图像块的块806位置的图像块,第二临时块为在目标参考帧中位于映射图像块的块808位置的图像块,第一临时块和第二临时块均为时域参考块。
S903A、基于待处理图像块的第一临时块对应的运动矢量R(W,y)和待处理图像块的第二参考块对应的运动矢量进行加权计算获得基本预测单元对应的第一临时运动矢量P_h(x,y)，示例性的，计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S903B、基于待处理图像块的第二临时块对应的运动矢量B(x,H)和待处理图像块的第一参考块对应的运动矢量进行加权计算获得基本预测单元对应的第二临时运动矢量P_v(x,y)，示例性的，计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解,步骤S903A和步骤S903B不限定先后顺序关系。
S904、基于待处理图像块的第一临时运动矢量和第二临时运动矢量进行加权计算获得基本预测单元对应的运动矢量P(x,y)，示例性的，计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
在另一种可行的实施方式中,如图9所示:
S0101、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802,其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y);
S0102、根据待处理图像块的任一空域参考块的运动信息进行运动补偿,确定参考帧信息和运动补偿块的位置。
其中,所述任一可以是图5所示的左侧空域邻接块或上侧空域邻接块中的某一个可用的空域邻接块,示例性的,可以是沿着方向1检测到的第一个可用的左侧空域邻接块,或者可以是沿着方向2检测到的第一个可用的上侧空域邻接块;还可以是对待处理图像块的多个预设空域参考块依照预设的顺序检测得到的第一个可用的空域邻接块,如图7所示的L→A→AR→BL→AL的顺序;还可以是按照预定规则所选择的空域邻接块,不做限定。
S0103、根据待处理图像块600中的基本预测块604的位置确定第一临时块806和第二临时块808,其中,第一临时块为在步骤S0102中根据参考帧信息确定的参考帧中位于运动补偿块的块806位置的图像块,第二临时块为在步骤S0102中根据参考帧信息确定的参考帧中位于运动补偿块的块808位置的图像块,第一临时块和第二临时块均为时域参考块。
S0104A、基于待处理图像块的第一临时块对应的运动矢量R(W,y)和待处理图像块的第二参考块对应的运动矢量进行加权计算获得基本预测单元对应的第一临时运动矢量P_h(x,y)，示例性的，计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S0104B、基于待处理图像块的第二临时块对应的运动矢量B(x,H)和待处理图像块的第一参考块对应的运动矢量进行加权计算获得基本预测单元对应的第二临时运动矢量P_v(x,y)，示例性的，计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解,步骤S0104A和步骤S0104B不限定先后顺序关系。
S0105、基于待处理图像块的第一临时运动矢量和第二临时运动矢量进行加权计算获得基本预测单元对应的运动矢量P(x,y)，示例性的，计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
前文提到了图像块和存储运动信息的基本存储单元的关系,不妨将图像块所对应的基本存储单元所存储的运动信息称为该图像块的实际运动信息,而运动信息包括运动矢量和运动矢量指向的参考帧的索引信息。应理解,用于加权计算出基本预测块的运动矢量的各个参考块的参考帧的索引信息不能保证是一致的。当各参考块的参考帧的索引信息一致时,参考块对应的运动信息就是参考块的实际运动信息。当个参考块的参考帧的索引信息不一致时,首先需要按照参考帧索引指示的参考帧的距离关系对参考块的实际运动矢量进行加权处理,参考块对应的运动信息就是对参考块的实际运动信息中的运动矢量进行加权处理后的运动矢量。
具体的,确定目标参考图像索引,示例性的,可以固定为0,1或其它索引值,也可以是参考图像列表中使用频率最高的参考图像索引,例如是所有参考块的实际运动矢量或者经加权的运动矢量指向次数最多的参考图像索引。
判断各个参考块的参考帧的索引信息是否与目标图像索引相同;
如果某个参考块的参考帧的索引信息与目标图像索引不同,则基于参考块所在图像与参考块的实际运动信息(参考帧索引信息)所指示的参考帧图像之间的时间距离,与参考块所在图像与目标参考图像索引所指示的参考图像之间的时间距离之比,来按比例缩放实际运动矢量,以获得加权处理后的运动矢量。
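A minimal sketch of this proportional scaling, with illustrative names; practical codecs additionally clamp the result and use fixed-point multipliers, which is omitted here:

    def scale_mv(mv, block_poc, actual_ref_poc, target_ref_poc):
        # block_poc: POC of the picture containing the reference block;
        # actual_ref_poc: POC of the frame its stored MV actually points to;
        # target_ref_poc: POC indicated by the target reference picture index
        d_actual = block_poc - actual_ref_poc
        d_target = block_poc - target_ref_poc
        if d_actual == d_target:
            return mv  # already points to the target reference: no scaling
        return tuple(c * d_target // d_actual for c in mv)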
在一种可行的实施方式中,在步骤S604之后,所述方法还包括:
S605、基于所述获得的运动矢量,对所述待处理图像块进行运动补偿。
在一种可行的实施方式中,包括:首先对具有相同运动信息的相邻基本预测块进行合并,然后以合并后的图像块作为运动补偿的单元进行运动补偿。
具体的,首先进行横向合并,即对待处理图像块中的每一行基本预测块,从左向右依次判断基本预测块和与其相邻的基本预测块的运动信息(示例性的,包括运动矢量、参考帧列表、参考帧索引信息)是否相同。当运动信息相同时,合并相邻的两个基本预测块,并继续判断和合并后的基本预测块相邻的下一个基本预测块的运动信息是否与合并后的基本预测块的运动信息相同,直到相邻的基本预测块的运动信息与合并后的基本预测块的运动信息不同时,停止合并,继续以该具有不同运动信息的基本预测块作为起点继续进行具有相同运动信息的相邻基本预测块进行合并的步骤,直到该基本预测块行结束。
然后再进行纵向合并,即对每一个横向合并后的基本预测块或者未合并的基本预测块,判断该块的下边沿是否和另一个横向合并后的基本预测块或者未合并的基本预测块的上边沿完全重合。如果完全重合,合并边沿重合的两个具有相同运动信息的基本预测块(或者横向合并后的基本预测块),继续对纵向合并后的基本预测块进行具有相重合的上下边沿的具有相同运动信息的相邻基本预测块进行合并的步骤,直到该待处理图像块中没有满足上述条件的基本预测块。
最后,以合并后的基本预测块作为运动补偿的单元进行运动补偿。
在一种可行的实施方式中，对具有相同运动信息的相邻基本预测块进行合并的合并方式和待处理图像块的形状有关系。示例性的，当待处理图像块的宽大于或等于待处理图像块的高时，只采用上文所述的横向合并的方式进行基本预测块的合并。当待处理图像块的宽小于待处理图像块的高时，对待处理图像块中的每一列基本预测块，从上向下依次判断基本预测块和与其相邻的基本预测块的运动信息（示例性的，包括运动矢量、参考帧列表、参考帧索引信息）是否相同。当运动信息相同时，合并相邻的两个基本预测块，并继续判断和合并后的基本预测块相邻的下一个基本预测块的运动信息是否与合并后的基本预测块的运动信息相同，直到相邻的基本预测块的运动信息与合并后的基本预测块的运动信息不同时，停止合并，继续以该具有不同运动信息的基本预测块作为起点继续进行具有相同运动信息的相邻基本预测块进行合并的步骤，直到该基本预测块列结束。
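The row-wise merging described above (and, transposed, the column-wise variant) is run-length grouping of identical motion information; a minimal illustrative sketch of one horizontal pass, with assumed names (the separate vertical edge-matching pass is not shown):

    def merge_row(motion_infos):
        # motion_infos: per-block motion info for one row of basic prediction
        # blocks, left to right; entries are comparable tuples such as
        # (mv, ref_list, ref_idx). Returns (start, run_length) units.
        runs, start = [], 0
        for i in range(1, len(motion_infos)):
            if motion_infos[i] != motion_infos[start]:
                runs.append((start, i - start))  # motion info changed: new run
                start = i
        runs.append((start, len(motion_infos) - start))
        return runs

    # merge_row([m, m, n, n, n]) -> [(0, 2), (2, 3)]  (two compensation units)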
在一种可行的实施方式中,在步骤S601之前,所述方法还包括:
S606、确定所述第一参考块和所述第二参考块位于所述待处理图像块所在的图像边界内。
即,当待处理图像块的上边界线和待处理图像块所在图像的上边界线重合时,第 一参考块不存在,此时本申请实施例中的方案不适用。当待处理图像块的左边界线和待处理图像块所在图像的左边界线重合时,第二参考块不存在,此时本申请实施例中的方案也不适用。
在一种可行的实施方式中,在步骤S601之前,所述方法还包括:
S607、确定所述待处理图像块的宽大于或等于16且所述待处理图像块的高大于或等于16;或者,确定所述待处理图像块的宽大于或等于16;或者,确定所述待处理图像块的高大于或等于16。
即,当待处理图像块的宽小于16或高小于16时,本申请实施例中的方案不适用,或者,当待处理图像块的宽小于16且高小于16时,本申请实施例中的方案不适用。
应理解,示例性的,这里以16作为阈值,还可以采用8,24,32等其他数值,宽和高所对应的阈值也可以不相等,均不做限定。
应理解,步骤S606和步骤S607可以配合执行。示例性的,在一种可行的实施方式中,当待处理图像块处于图像帧的左边界,或上边界,或待处理图像块的宽和高都小于16时,不能采用本申请实施例中的帧间预测方案,在另一种可行的实施方式中,当待处理图像块处于图像帧的左边界,或上边界,或待处理图像块的宽或高小于16时,不能采用本申请实施例中的帧间预测方案。
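Combining S606 and S607 under the first variant described here (left or top border, or both width and height below 16, with 16 as the example threshold), a minimal illustrative check with assumed names:

    def planar_inter_mode_allowed(x0, y0, width, height, min_size=16):
        # (x0, y0): top-left position of the image block inside its picture
        if x0 == 0 or y0 == 0:
            return False  # first or second reference block would fall outside
        if width < min_size and height < min_size:
            return False  # block too small for this inter prediction mode
        return True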
Fig. 10 is a schematic block diagram of an inter prediction apparatus 1000 in an embodiment of this application. Specifically, the apparatus includes:
a determining module 1001, configured to determine a size of a basic prediction block in a to-be-processed image block, where the size is used to determine a position of the basic prediction block in the to-be-processed image block;
a locating module 1002, configured to determine, according to the position, a first reference block and a second reference block of the basic prediction block, where a left boundary line of the first reference block is collinear with a left boundary line of the basic prediction unit, a top boundary line of the second reference block is collinear with a top boundary line of the basic prediction unit, the first reference block adjoins a top boundary line of the to-be-processed image block, and the second reference block adjoins a left boundary line of the to-be-processed image block; and
a calculation module 1003, configured to perform a weighted calculation on one or more of a motion vector corresponding to the first reference block, a motion vector corresponding to the second reference block, and motion vectors corresponding to original reference blocks having a preset positional relationship with the to-be-processed image block, to obtain a motion vector corresponding to the basic prediction block.
In a first feasible implementation, the original reference blocks having a preset positional relationship with the to-be-processed image block include: an original reference block having a preset spatial positional relationship with the to-be-processed image block and/or an original reference block having a preset temporal positional relationship with the to-be-processed image block.
In a second feasible implementation, the original reference blocks having a preset spatial positional relationship with the to-be-processed image block include one or more of: an image block located at the top-left corner of the to-be-processed image block and adjacent to its top-left corner point, an image block located at the top-right corner of the to-be-processed image block and adjacent to its top-right corner point, and an image block located at the bottom-left corner of the to-be-processed image block and adjacent to its bottom-left corner point, where the original reference blocks having the preset spatial positional relationship are located outside the to-be-processed image block.
In a third feasible implementation, the original reference block having a preset temporal positional relationship with the to-be-processed image block includes: an image block in a target reference frame located at the bottom-right corner of a mapped image block and adjacent to the bottom-right corner point of the mapped image block, where this original reference block is located outside the mapped image block, the mapped image block is equal in size to the to-be-processed image block, and the position of the mapped image block in the target reference frame is the same as the position of the to-be-processed image block in the picture containing the to-be-processed image block.
In a fourth feasible implementation, the index information and reference frame list information of the target reference frame are obtained by parsing the bitstream.
In a fifth feasible implementation, the index information and reference frame list information of the target reference frame are located in the bitstream segment corresponding to the slice header of the slice containing the to-be-processed image block.
In a sixth feasible implementation, the calculation module is specifically configured to obtain the motion vector corresponding to the basic prediction block according to the following formulas:
P(x,y) = (H×P_h(x,y) + W×P_v(x,y) + H×W) / (2×H×W),
where
P_h(x,y) = (W-1-x)×L(-1,y) + (x+1)×R(W,y)
P_v(x,y) = (H-1-y)×A(x,-1) + (y+1)×B(x,H)
R(W,y) = ((H-y-1)×AR + (y+1)×BR) / H
B(x,H) = ((W-x-1)×BL + (x+1)×BR) / W
AR is the motion vector corresponding to the image block located at the top-right corner of the to-be-processed image block and adjacent to its top-right corner point; BR is the motion vector corresponding to the image block in the target reference frame located at the bottom-right corner of the mapped image block and adjacent to the mapped image block's bottom-right corner point; BL is the motion vector corresponding to the image block located at the bottom-left corner of the to-be-processed image block and adjacent to its bottom-left corner point; x is the ratio of the horizontal distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the width of the basic prediction block; y is the ratio of the vertical distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the height of the basic prediction block; H is the ratio of the height of the to-be-processed image block to the height of the basic prediction block; W is the ratio of the width of the to-be-processed image block to the width of the basic prediction block; L(-1,y) is the motion vector corresponding to the second reference block; A(x,-1) is the motion vector corresponding to the first reference block; and P(x,y) is the motion vector corresponding to the basic prediction block.
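For illustration only, and reusing the Mv struct from the first sketch, the following sketch derives the right-column vector R(W,y) and the bottom-row vector B(x,H) from the corner vectors AR, BR, and BL exactly as the formulas above prescribe; the function names are hypothetical.

// R(W,y) = ((H-y-1)*AR + (y+1)*BR) / H
Mv rightColumnMv(int y, int H, const Mv& AR, const Mv& BR) {
    return { ((H - y - 1) * AR.x + (y + 1) * BR.x) / H,
             ((H - y - 1) * AR.y + (y + 1) * BR.y) / H };
}

// B(x,H) = ((W-x-1)*BL + (x+1)*BR) / W
Mv bottomRowMv(int x, int W, const Mv& BL, const Mv& BR) {
    return { ((W - x - 1) * BL.x + (x + 1) * BR.x) / W,
             ((W - x - 1) * BL.y + (x + 1) * BR.y) / W };
}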
In a seventh feasible implementation, the determining module 1001 is specifically configured to: when the two adjacent sides of the basic prediction block are of unequal length, determine that the length of the shorter side is 4 or 8; and when the two adjacent sides of the basic prediction block are of equal length, determine that the side length of the basic prediction block is 4 or 8.
In an eighth feasible implementation, the determining module 1001 is specifically configured to: parse a first identifier from the bitstream, where the first identifier indicates the size of the basic prediction block and is located in the bitstream segment corresponding to one of the sequence parameter set of the sequence containing the to-be-processed image block, the picture parameter set of the picture containing the to-be-processed image block, and the slice header of the slice containing the to-be-processed image block.
In a ninth feasible implementation, the determining module 1001 is specifically configured to: determine the size of the basic prediction block according to the sizes of planar-mode prediction blocks in a previously reconstructed picture, where a planar-mode prediction block is a to-be-processed image block on which inter prediction is performed according to the feasible implementations described above, and the previously reconstructed picture is a picture that precedes, in coding order, the picture containing the to-be-processed image block.
In a tenth feasible implementation, the determining module 1001 is specifically configured to: calculate the average of the products of the widths and heights of all the planar-mode prediction blocks in the previously reconstructed picture; when the average is less than a threshold, the size of the basic prediction block is a first size; and when the average is greater than or equal to the threshold, the size of the basic prediction block is a second size, where the first size is smaller than the second size.
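For illustration only, a minimal sketch of the adaptive size rule in the tenth implementation; the candidate sizes 4 and 8 are borrowed from the seventh implementation as placeholders, and the threshold is left as a parameter because the text only requires that one exist.

#include <cstdint>
#include <vector>

struct BlockDims { int width; int height; };  // one planar-mode prediction block

// Average the width*height products of all planar-mode prediction blocks in
// the previously reconstructed picture(s), then pick the smaller size below
// the threshold and the larger size otherwise.
int selectBasicBlockSize(const std::vector<BlockDims>& planarBlocks,
                         int64_t threshold,
                         int firstSize = 4, int secondSize = 8) {
    if (planarBlocks.empty()) return secondSize;   // no statistics yet: fall back
    int64_t sum = 0;
    for (const BlockDims& b : planarBlocks)
        sum += static_cast<int64_t>(b.width) * b.height;
    int64_t avg = sum / static_cast<int64_t>(planarBlocks.size());
    return (avg < threshold) ? firstSize : secondSize;
}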
In an eleventh feasible implementation, the previously reconstructed picture is, among the pictures having the same temporal layer identifier as the picture containing the to-be-processed image block, the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
In a twelfth feasible implementation, the previously reconstructed picture is the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
In a thirteenth feasible implementation, there are multiple previously reconstructed pictures; correspondingly, the determining module 1001 is specifically configured to: calculate the average of the products of the widths and heights of all the planar-mode prediction blocks in the multiple previously reconstructed pictures.
In a fourteenth feasible implementation, the threshold is a preset threshold.
In a fifteenth feasible implementation, when the POCs of all reference frames of the picture containing the to-be-processed image block are less than the POC of that picture, the threshold is a first threshold; when the POC of at least one reference frame of that picture is greater than the POC of that picture, the threshold is a second threshold, where the first threshold and the second threshold are different.
In a sixteenth feasible implementation, the apparatus further includes a partitioning module 1004, configured to: partition the to-be-processed image block into multiple basic prediction blocks according to the size, and determine in turn the position of each basic prediction block in the to-be-processed image block.
In a seventeenth feasible implementation, the apparatus further includes a judging module 1005, configured to: determine that the first reference block and the second reference block are located within the boundary of the picture containing the to-be-processed image block.
In an eighteenth feasible implementation, the judging module 1005 is further configured to: determine that the width of the to-be-processed image block is greater than or equal to 16 and that its height is greater than or equal to 16; or determine that the width of the to-be-processed image block is greater than or equal to 16; or determine that the height of the to-be-processed image block is greater than or equal to 16.
In a nineteenth feasible implementation, the apparatus is configured to encode the to-be-processed image block or to decode the to-be-processed image block.
Fig. 11 is a schematic block diagram of one implementation of an encoding device or a decoding device (referred to as coding device 1100) in an embodiment of this application. The coding device 1100 may include a processor 1110, a memory 1130, and a bus system 1150. The processor and the memory are connected via the bus system; the memory stores instructions, and the processor executes the instructions stored in the memory. The memory of the coding device stores program code, and the processor may invoke that code to perform the various video encoding or decoding methods described in this application, especially the video encoding or decoding methods under the various new inter prediction modes and the methods for predicting motion information under those modes. To avoid repetition, details are not described again here.
The memory 1130 may include a read-only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 1130. The memory 1130 may include code and data 1131 that are accessed by the processor 1110 via the bus 1150. The memory 1130 may further include an operating system 1133 and application programs 1135, where the application programs 1135 include at least one program that allows the processor 1110 to perform the video encoding or decoding methods described in this application (in particular, the inter prediction method or motion information prediction method described in this application). For example, the application programs 1135 may include applications 1 through N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding methods described in this application.
In addition to a data bus, the bus system 1150 may include a power bus, a control bus, a status signal bus, and the like. For clarity of description, however, the various buses are all labeled as the bus system 1150 in the figure.
Optionally, the coding device 1100 may further include one or more output devices, such as a display 1170. In one example, the display 1170 may be a touch-sensitive display that merges the display with a touch-sensing unit operable to sense touch input. The display 1170 may be connected to the processor 1110 via the bus 1150.
Although particular aspects of this application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of this application may be applied by many other video encoding and/or decoding units, processors, processing units, hardware-based coding units such as encoder/decoders (CODECs), and the like. Moreover, it should be understood that the steps shown and described with respect to Fig. 6 are provided only as feasible implementations. That is, the steps shown in the implementation of Fig. 6 need not necessarily be performed in the order shown in Fig. 6, and fewer, additional, or alternative steps may be performed.
Moreover, it should be understood that, depending on the feasible implementation, certain acts or events of any of the methods described herein may be performed in a different sequence, and may be added, merged, or left out altogether (for example, not all described acts or events are necessary to practice the methods). Moreover, in certain feasible implementations, acts or events may be performed concurrently rather than sequentially, for example, through multithreaded processing, interrupt processing, or multiple processors. In addition, although particular aspects of this application are, for purposes of clarity, described as being performed by a single module or unit, it should be understood that the techniques of this application may be performed by a combination of units or modules associated with a video decoder.
In one or more feasible implementations, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
In this manner, computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) communication media such as signals or carrier waves. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
By way of feasible implementation, and not limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection may properly be termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, a fiber optic cable, a twisted pair, a digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
The foregoing descriptions are merely exemplary specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (40)

  1. An inter prediction method, comprising:
    determining a size of a basic prediction block in a to-be-processed image block, wherein the size is used to determine a position of the basic prediction block in the to-be-processed image block;
    determining, according to the position, a first reference block and a second reference block of the basic prediction block, wherein a left boundary line of the first reference block is collinear with a left boundary line of the basic prediction unit, a top boundary line of the second reference block is collinear with a top boundary line of the basic prediction unit, the first reference block adjoins a top boundary line of the to-be-processed image block, and the second reference block adjoins a left boundary line of the to-be-processed image block; and
    performing a weighted calculation on one or more of a motion vector corresponding to the first reference block, a motion vector corresponding to the second reference block, and motion vectors corresponding to original reference blocks having a preset positional relationship with the to-be-processed image block, to obtain a motion vector corresponding to the basic prediction block.
  2. The method according to claim 1, wherein the original reference blocks having a preset positional relationship with the to-be-processed image block comprise: an original reference block having a preset spatial positional relationship with the to-be-processed image block and/or an original reference block having a preset temporal positional relationship with the to-be-processed image block.
  3. The method according to claim 2, wherein the original reference blocks having a preset spatial positional relationship with the to-be-processed image block comprise one or more of: an image block located at the top-left corner of the to-be-processed image block and adjacent to the top-left corner point of the to-be-processed image block, an image block located at the top-right corner of the to-be-processed image block and adjacent to the top-right corner point of the to-be-processed image block, and an image block located at the bottom-left corner of the to-be-processed image block and adjacent to the bottom-left corner point of the to-be-processed image block, wherein the original reference blocks having the preset spatial positional relationship with the to-be-processed image block are located outside the to-be-processed image block.
  4. The method according to claim 2 or 3, wherein the original reference block having a preset temporal positional relationship with the to-be-processed image block comprises: an image block in a target reference frame located at the bottom-right corner of a mapped image block and adjacent to the bottom-right corner point of the mapped image block, wherein the original reference block having the preset temporal positional relationship with the to-be-processed image block is located outside the mapped image block, the mapped image block is equal in size to the to-be-processed image block, and the position of the mapped image block in the target reference frame is the same as the position of the to-be-processed image block in the picture containing the to-be-processed image block.
  5. The method according to claim 4, wherein index information and reference frame list information of the target reference frame are obtained by parsing the bitstream.
  6. The method according to claim 5, wherein the index information and reference frame list information of the target reference frame are located in the bitstream segment corresponding to the slice header of the slice containing the to-be-processed image block.
  7. The method according to any one of claims 4 to 6, wherein performing the weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vectors corresponding to the original reference blocks having a preset positional relationship with the to-be-processed image block, to obtain the motion vector corresponding to the basic prediction block, comprises: obtaining the motion vector corresponding to the basic prediction block according to the following formulas:
    P(x,y) = (H×P_h(x,y) + W×P_v(x,y) + H×W) / (2×H×W),
    wherein
    P_h(x,y) = (W-1-x)×L(-1,y) + (x+1)×R(W,y)
    P_v(x,y) = (H-1-y)×A(x,-1) + (y+1)×B(x,H)
    R(W,y) = ((H-y-1)×AR + (y+1)×BR) / H
    B(x,H) = ((W-x-1)×BL + (x+1)×BR) / W
    AR is the motion vector corresponding to the image block located at the top-right corner of the to-be-processed image block and adjacent to its top-right corner point; BR is the motion vector corresponding to the image block in the target reference frame located at the bottom-right corner of the mapped image block and adjacent to the mapped image block's bottom-right corner point; BL is the motion vector corresponding to the image block located at the bottom-left corner of the to-be-processed image block and adjacent to its bottom-left corner point; x is the ratio of the horizontal distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the width of the basic prediction block; y is the ratio of the vertical distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the height of the basic prediction block; H is the ratio of the height of the to-be-processed image block to the height of the basic prediction block; W is the ratio of the width of the to-be-processed image block to the width of the basic prediction block; L(-1,y) is the motion vector corresponding to the second reference block; A(x,-1) is the motion vector corresponding to the first reference block; and P(x,y) is the motion vector corresponding to the basic prediction block.
  8. The method according to any one of claims 1 to 7, wherein determining the size of the basic prediction block in the to-be-processed image block comprises:
    when the two adjacent sides of the basic prediction block are of unequal length, determining that the length of the shorter side of the basic prediction block is 4 or 8; and
    when the two adjacent sides of the basic prediction block are of equal length, determining that the side length of the basic prediction block is 4 or 8.
  9. The method according to any one of claims 1 to 7, wherein determining the size of the basic prediction block in the to-be-processed image block comprises:
    parsing a first identifier from the bitstream, wherein the first identifier indicates the size of the basic prediction block, and the first identifier is located in the bitstream segment corresponding to one of the sequence parameter set of the sequence containing the to-be-processed image block, the picture parameter set of the picture containing the to-be-processed image block, and the slice header of the slice containing the to-be-processed image block.
  10. The method according to claim 7, wherein determining the size of the basic prediction block in the to-be-processed image block comprises:
    determining the size of the basic prediction block according to sizes of planar-mode prediction blocks in a previously reconstructed picture, wherein a planar-mode prediction block is a to-be-processed image block on which inter prediction is performed according to the method of any one of claims 1 to 7, and the previously reconstructed picture is a picture that precedes, in coding order, the picture containing the to-be-processed image block.
  11. The method according to claim 10, wherein determining the size of the basic prediction block according to the sizes of the planar-mode prediction blocks in the previously reconstructed picture of the picture containing the to-be-processed image block comprises:
    calculating an average of the products of the widths and heights of all the planar-mode prediction blocks in the previously reconstructed picture;
    when the average is less than a threshold, the size of the basic prediction block is a first size; and
    when the average is greater than or equal to the threshold, the size of the basic prediction block is a second size, wherein the first size is smaller than the second size.
  12. The method according to claim 11, wherein the previously reconstructed picture is, among pictures having the same temporal layer identifier as the picture containing the to-be-processed image block, the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
  13. The method according to claim 11, wherein the previously reconstructed picture is the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
  14. The method according to any one of claims 11 to 13, wherein the previously reconstructed picture comprises multiple pictures, and correspondingly, calculating the average of the products of the widths and heights of all the planar-mode prediction blocks in the previously reconstructed picture comprises: calculating the average of the products of the widths and heights of all the planar-mode prediction blocks in the multiple previously reconstructed pictures.
  15. The method according to any one of claims 11 to 14, wherein the threshold is a preset threshold.
  16. The method according to any one of claims 11 to 15, wherein when the POCs of all reference frames of the picture containing the to-be-processed image block are less than the POC of that picture, the threshold is a first threshold; and when the POC of at least one reference frame of the picture containing the to-be-processed image block is greater than the POC of that picture, the threshold is a second threshold, wherein the first threshold and the second threshold are different.
  17. The method according to any one of claims 1 to 16, further comprising, after determining the size of the basic prediction block in the to-be-processed image block:
    partitioning the to-be-processed image block into multiple basic prediction blocks according to the size; and
    determining in turn the position of each basic prediction block in the to-be-processed image block.
  18. The method according to any one of claims 1 to 17, wherein before determining the size of the basic prediction block in the to-be-processed image block, the method further comprises:
    determining that the first reference block and the second reference block are located within the boundary of the picture containing the to-be-processed image block.
  19. The method according to any one of claims 1 to 18, wherein before determining the size of the basic prediction block in the to-be-processed image block, the method further comprises:
    determining that the width of the to-be-processed image block is greater than or equal to 16 and that the height of the to-be-processed image block is greater than or equal to 16; or determining that the width of the to-be-processed image block is greater than or equal to 16; or determining that the height of the to-be-processed image block is greater than or equal to 16.
  20. The method according to any one of claims 1 to 19, wherein the method is used to encode the to-be-processed image block or to decode the to-be-processed image block.
  21. An inter prediction apparatus, comprising:
    a determining module, configured to determine a size of a basic prediction block in a to-be-processed image block, wherein the size is used to determine a position of the basic prediction block in the to-be-processed image block;
    a locating module, configured to determine, according to the position, a first reference block and a second reference block of the basic prediction block, wherein a left boundary line of the first reference block is collinear with a left boundary line of the basic prediction unit, a top boundary line of the second reference block is collinear with a top boundary line of the basic prediction unit, the first reference block adjoins a top boundary line of the to-be-processed image block, and the second reference block adjoins a left boundary line of the to-be-processed image block; and
    a calculation module, configured to perform a weighted calculation on one or more of a motion vector corresponding to the first reference block, a motion vector corresponding to the second reference block, and motion vectors corresponding to original reference blocks having a preset positional relationship with the to-be-processed image block, to obtain a motion vector corresponding to the basic prediction block.
  22. The apparatus according to claim 21, wherein the original reference blocks having a preset positional relationship with the to-be-processed image block comprise: an original reference block having a preset spatial positional relationship with the to-be-processed image block and/or an original reference block having a preset temporal positional relationship with the to-be-processed image block.
  23. The apparatus according to claim 22, wherein the original reference blocks having a preset spatial positional relationship with the to-be-processed image block comprise one or more of: an image block located at the top-left corner of the to-be-processed image block and adjacent to the top-left corner point of the to-be-processed image block, an image block located at the top-right corner of the to-be-processed image block and adjacent to the top-right corner point of the to-be-processed image block, and an image block located at the bottom-left corner of the to-be-processed image block and adjacent to the bottom-left corner point of the to-be-processed image block, wherein the original reference blocks having the preset spatial positional relationship with the to-be-processed image block are located outside the to-be-processed image block.
  24. The apparatus according to claim 22 or 23, wherein the original reference block having a preset temporal positional relationship with the to-be-processed image block comprises: an image block in a target reference frame located at the bottom-right corner of a mapped image block and adjacent to the bottom-right corner point of the mapped image block, wherein the original reference block having the preset temporal positional relationship with the to-be-processed image block is located outside the mapped image block, the mapped image block is equal in size to the to-be-processed image block, and the position of the mapped image block in the target reference frame is the same as the position of the to-be-processed image block in the picture containing the to-be-processed image block.
  25. The apparatus according to claim 24, wherein index information and reference frame list information of the target reference frame are obtained by parsing the bitstream.
  26. The apparatus according to claim 25, wherein the index information and reference frame list information of the target reference frame are located in the bitstream segment corresponding to the slice header of the slice containing the to-be-processed image block.
  27. The apparatus according to any one of claims 24 to 26, wherein the calculation module is specifically configured to obtain the motion vector corresponding to the basic prediction block according to the following formulas:
    P(x,y) = (H×P_h(x,y) + W×P_v(x,y) + H×W) / (2×H×W), wherein
    P_h(x,y) = (W-1-x)×L(-1,y) + (x+1)×R(W,y)
    P_v(x,y) = (H-1-y)×A(x,-1) + (y+1)×B(x,H)
    R(W,y) = ((H-y-1)×AR + (y+1)×BR) / H
    B(x,H) = ((W-x-1)×BL + (x+1)×BR) / W
    AR is the motion vector corresponding to the image block located at the top-right corner of the to-be-processed image block and adjacent to its top-right corner point; BR is the motion vector corresponding to the image block in the target reference frame located at the bottom-right corner of the mapped image block and adjacent to the mapped image block's bottom-right corner point; BL is the motion vector corresponding to the image block located at the bottom-left corner of the to-be-processed image block and adjacent to its bottom-left corner point; x is the ratio of the horizontal distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the width of the basic prediction block; y is the ratio of the vertical distance between the top-left corner point of the basic prediction block and the top-left corner point of the to-be-processed image block to the height of the basic prediction block; H is the ratio of the height of the to-be-processed image block to the height of the basic prediction block; W is the ratio of the width of the to-be-processed image block to the width of the basic prediction block; L(-1,y) is the motion vector corresponding to the second reference block; A(x,-1) is the motion vector corresponding to the first reference block; and P(x,y) is the motion vector corresponding to the basic prediction block.
  28. The apparatus according to any one of claims 21 to 27, wherein the determining module is specifically configured to:
    when the two adjacent sides of the basic prediction block are of unequal length, determine that the length of the shorter side of the basic prediction block is 4 or 8; and
    when the two adjacent sides of the basic prediction block are of equal length, determine that the side length of the basic prediction block is 4 or 8.
  29. The apparatus according to any one of claims 21 to 27, wherein the determining module is specifically configured to:
    parse a first identifier from the bitstream, wherein the first identifier indicates the size of the basic prediction block, and the first identifier is located in the bitstream segment corresponding to one of the sequence parameter set of the sequence containing the to-be-processed image block, the picture parameter set of the picture containing the to-be-processed image block, and the slice header of the slice containing the to-be-processed image block.
  30. The apparatus according to claim 27, wherein the determining module is specifically configured to: determine the size of the basic prediction block according to sizes of planar-mode prediction blocks in a previously reconstructed picture, wherein a planar-mode prediction block is a to-be-processed image block on which inter prediction is performed by the apparatus according to any one of claims 21 to 27, and the previously reconstructed picture is a picture that precedes, in coding order, the picture containing the to-be-processed image block.
  31. The apparatus according to claim 30, wherein the determining module is specifically configured to:
    calculate an average of the products of the widths and heights of all the planar-mode prediction blocks in the previously reconstructed picture;
    when the average is less than a threshold, the size of the basic prediction block is a first size; and
    when the average is greater than or equal to the threshold, the size of the basic prediction block is a second size, wherein the first size is smaller than the second size.
  32. The apparatus according to claim 31, wherein the previously reconstructed picture is, among pictures having the same temporal layer identifier as the picture containing the to-be-processed image block, the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
  33. The apparatus according to claim 31, wherein the previously reconstructed picture is the reconstructed picture closest in coding order to the picture containing the to-be-processed image block.
  34. The apparatus according to any one of claims 31 to 33, wherein the previously reconstructed picture comprises multiple pictures, and correspondingly, the determining module is specifically configured to: calculate the average of the products of the widths and heights of all the planar-mode prediction blocks in the multiple previously reconstructed pictures.
  35. The apparatus according to any one of claims 31 to 34, wherein the threshold is a preset threshold.
  36. The apparatus according to any one of claims 31 to 35, wherein when the POCs of all reference frames of the picture containing the to-be-processed image block are less than the POC of that picture, the threshold is a first threshold; and when the POC of at least one reference frame of the picture containing the to-be-processed image block is greater than the POC of that picture, the threshold is a second threshold, wherein the first threshold and the second threshold are different.
  37. The apparatus according to any one of claims 21 to 36, further comprising a partitioning module, configured to:
    partition the to-be-processed image block into multiple basic prediction blocks according to the size; and
    determine in turn the position of each basic prediction block in the to-be-processed image block.
  38. The apparatus according to any one of claims 21 to 37, further comprising a judging module, configured to:
    determine that the first reference block and the second reference block are located within the boundary of the picture containing the to-be-processed image block.
  39. The apparatus according to any one of claims 21 to 38, wherein the judging module is further configured to:
    determine that the width of the to-be-processed image block is greater than or equal to 16 and that the height of the to-be-processed image block is greater than or equal to 16; or determine that the width of the to-be-processed image block is greater than or equal to 16; or determine that the height of the to-be-processed image block is greater than or equal to 16.
  40. The apparatus according to any one of claims 21 to 39, wherein the apparatus is configured to encode the to-be-processed image block or to decode the to-be-processed image block.
PCT/CN2019/094666 2018-08-29 2019-07-04 Inter prediction method and apparatus WO2020042758A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810995914.4 2018-08-29
CN201810995914.4A CN110876057B (zh) Inter prediction method and apparatus

Publications (1)

Publication Number Publication Date
WO2020042758A1 true WO2020042758A1 (zh) 2020-03-05

Family

ID=69643430

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/094666 WO2020042758A1 (zh) Inter prediction method and apparatus

Country Status (2)

Country Link
CN (1) CN110876057B (zh)
WO (1) WO2020042758A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112911284B (zh) * 2021-01-14 2023-04-07 北京博雅慧视智能技术研究院有限公司 Skip mode implementation method and implementation circuit in video coding

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685473A (zh) * 2011-03-10 2012-09-19 华为技术有限公司 Intra prediction method and apparatus
CN103299642A (zh) * 2011-01-07 2013-09-11 Lg电子株式会社 Method for encoding and decoding image information and device using the method
CN103841426A (zh) * 2012-10-08 2014-06-04 华为技术有限公司 Method and apparatus for establishing a motion vector list for motion vector prediction
US20150085931A1 (en) * 2013-09-25 2015-03-26 Apple Inc. Delayed chroma processing in block processing pipelines

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
PL3125560T3 (pl) * 2010-08-17 2019-01-31 M&K Holdings Inc. Apparatus for decoding an intra-prediction mode
EP2704435B1 (en) * 2011-04-25 2019-02-20 LG Electronics Inc. Intra-prediction method, and encoder and decoder using same
US10230980B2 (en) * 2015-01-26 2019-03-12 Qualcomm Incorporated Overlapped motion compensation for video coding
CN107534767A (zh) * 2015-04-27 2018-01-02 Lg电子株式会社 Method for processing a video signal and apparatus therefor
CN110024394B (zh) * 2016-11-28 2023-09-01 韩国电子通信研究院 Method and device for encoding/decoding an image, and recording medium storing a bitstream
WO2018131830A1 (ko) * 2017-01-11 2018-07-19 주식회사 케이티 Video signal processing method and apparatus


Also Published As

Publication number Publication date
CN110876057A (zh) 2020-03-10
CN110876057B (zh) 2023-04-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19855796; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 19855796; Country of ref document: EP; Kind code of ref document: A1