WO2020103593A1 - Inter prediction method and device - Google Patents

Inter prediction method and device (一种帧间预测的方法及装置)

Info

Publication number
WO2020103593A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
processed
image block
prediction
image
Prior art date
Application number
PCT/CN2019/110206
Other languages
English (en)
French (fr)
Inventor
张娜
陈旭
郑建铧
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020103593A1

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 … using adaptive coding
    • H04N19/169 … characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 … the unit being an image region, e.g. an object
    • H04N19/176 … the region being a block, e.g. a macroblock
    • H04N19/102 … characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/134 … characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/50 … using predictive coding
    • H04N19/503 … involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors

Definitions

  • the present application relates to the technical field of video encoding and decoding, and in particular to an inter-frame prediction method and device.
  • Digital video capabilities can be incorporated into a variety of devices, including digital TVs, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio phones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques, for example, those defined in the Moving Picture Experts Group (MPEG)-2 and MPEG-4 standards, the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263 standard, the ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC) standard, the H.265/High Efficiency Video Coding (HEVC) standard, and the video compression techniques described in extensions of such standards.
  • Video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
  • Video compression techniques perform spatial (intra-image) prediction and / or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
  • For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into image blocks.
  • Image blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
  • Image blocks in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images.
  • An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
  • Various video coding standards, including the HEVC standard, propose predictive coding modes for image blocks, that is, predicting the currently coded image block based on already coded image blocks.
  • In intra prediction mode, the currently decoded image block is predicted based on one or more previously decoded neighboring blocks in the same image as the current block; in inter prediction mode, the currently decoded image block is predicted based on already decoded blocks in different images.
  • In the prior art, inter prediction is performed by dividing the image block to be processed into basic prediction blocks according to a specified basic prediction block size, and then performing inter prediction on each basic prediction block.
  • Because the size of the basic prediction block is fixed, coding performance is limited.
  • Embodiments of the present application provide an inter prediction method and device, which can adaptively determine the size of a basic prediction block and perform inter prediction to improve coding performance.
  • In a first aspect, a method for inter prediction is provided, including: determining the size of a basic prediction block in a to-be-processed image block according to size reference information, the size being used to determine the position of the basic prediction block in the to-be-processed image block;
  • then, the first reference block and the second reference block of the basic prediction block are determined according to the position of the basic prediction block, where the left boundary line of the first reference block of the basic prediction block is collinear with the left boundary line of the basic prediction block, the upper boundary line of the second reference block of the basic prediction block is collinear with the upper boundary line of the basic prediction block, the first reference block of the basic prediction block is adjacent to the upper boundary line of the image block to be processed, and the second reference block of the basic prediction block is adjacent to the left boundary of the image block to be processed; finally, one or more of the motion vector corresponding to the first reference block of the basic prediction block, the motion vector corresponding to the second reference block of the basic prediction block, and the motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed are weighted to obtain the motion vector corresponding to the basic prediction block.
  • In this method, the size of the basic prediction block is adaptively determined according to the size reference information; reasonable size reference information yields a more suitable basic prediction block size, so that inter prediction performance during encoding is higher.
  • In a feasible implementation, the original reference block having the preset positional relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed.
  • the reliability of the generated motion vector can be improved by rationally selecting the reference block used to generate the motion vector corresponding to the basic prediction block.
  • The original reference block having the preset spatial position relationship with the image block to be processed may include one or more of: the image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed, the image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed, and the image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed.
  • the original reference block having a preset spatial domain position relationship with the image block to be processed is located outside the image block to be processed.
  • The original reference block having the preset temporal position relationship with the image block to be processed may include: the image block located at the lower right corner of the mapped image block in the target reference frame and adjacent to the lower right corner point of the mapped image block.
  • The original reference block having the preset temporal position relationship with the image block to be processed is located outside the mapped image block; the mapped image block and the image block to be processed are equal in size; and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame where it is located.
  • By rationally selecting the temporal reference block used to generate the motion vector corresponding to the basic prediction block, the reliability of the generated motion vector is improved.
  • the index information and reference frame list information of the target reference frame may be obtained by parsing the code stream.
  • The code stream refers to the code stream transmitted between the encoding end and the decoding end.
  • the target reference frame can be flexibly selected, so that the corresponding time-domain reference block is more reliable.
  • In a feasible implementation, the index information of the target reference frame and the reference frame list information are located in the code stream segment corresponding to the slice header of the slice where the image block to be processed is located.
  • The identification information of the target reference frame is stored in the slice header, and all the temporal reference blocks of the image blocks in the slice share the same reference frame information, which saves bits in the code stream and improves coding efficiency.
  • In a feasible implementation, weighting one or more of the motion vector corresponding to the first reference block of the basic prediction block, the motion vector corresponding to the second reference block of the basic prediction block, and the motion vector corresponding to the original reference block having the preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block, may be specifically implemented as follows: the motion vector corresponding to the basic prediction block is obtained according to the following formulas:
  • R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
  • B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
  • where AR is the motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed;
  • BR is the motion vector corresponding to the image block located at the lower right corner of the mapped image block in the target reference frame and adjacent to the lower right corner point of the mapped image block;
  • BL is the motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed;
  • x and y are the horizontal and vertical coordinates of the upper left corner of the basic prediction block within the image block to be processed;
  • H is the ratio of the height of the image block to be processed to the height of the basic prediction block;
  • W is the ratio of the width of the image block to be processed to the width of the basic prediction block;
  • L(-1, y) is the motion vector corresponding to the second reference block;
  • A(x, -1) is the motion vector corresponding to the first reference block;
  • P(x, y) is the motion vector corresponding to the basic prediction block.
  • The specific implementation section of the present application provides a variety of ways of weighting one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having the preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
  • The implementation is not limited to the content of this embodiment.
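As an illustrative aid (not part of the patent text), the two interpolation formulas above can be sketched in Python. Representing motion vectors as integer tuples and using integer division are assumptions made for the sketch, not requirements of the method:

```python
def interp_right_column(y, H, AR, BR):
    # R(W, y) = ((H - y - 1) * AR + (y + 1) * BR) / H
    # AR: MV of the above-right neighbour; BR: MV of the bottom-right
    # temporal block. y is the row index of the basic prediction block,
    # H the height of the image block in basic-prediction-block units.
    return tuple(((H - y - 1) * a + (y + 1) * b) // H for a, b in zip(AR, BR))


def interp_bottom_row(x, W, BL, BR):
    # B(x, H) = ((W - x - 1) * BL + (x + 1) * BR) / W
    # BL: MV of the below-left neighbour. x is the column index,
    # W the width of the image block in basic-prediction-block units.
    return tuple(((W - x - 1) * l + (x + 1) * b) // W for l, b in zip(BL, BR))


# Example: right-column MV for the top row of a block 4 units high.
print(interp_right_column(0, 4, (8, 0), (0, 8)))  # -> (6, 2)
```

The two helpers only cover the R(W, y) and B(x, H) edge interpolations shown above; the final weighting into P(x, y) combines these with L(-1, y) and A(x, -1) as described in the text.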
  • the size reference information may include a first identifier; the first identifier is used to indicate the size of the basic prediction block.
  • the method for inter prediction provided in the present application may further include: receiving a code stream, parsing and acquiring the first identifier from the code stream, and using the size indicated by the first identifier as the size of the basic prediction block.
  • The first identifier may be located in the code stream segment corresponding to any one of: the sequence parameter set of the sequence where the image block to be processed is located, the picture parameter set of the image where the image block to be processed is located, and the slice header of the slice where the image block to be processed is located.
  • In a feasible implementation, the size reference information may include the size of the plane mode prediction blocks in a previously reconstructed image of the image where the current image block to be processed is located.
  • A plane mode prediction block is an image block on which inter prediction is performed according to any one of the foregoing feasible implementations of the first aspect.
  • The previously reconstructed image is an image that precedes, in coding order, the image in which the current image block to be processed is located.
  • Determining the size of the basic prediction block in the current image block to be processed may be specifically implemented as: calculating the average value of the products of the width and height of all the plane mode prediction blocks in the previously reconstructed image.
  • When the average value is smaller than a threshold, the size of the basic prediction block in the current image block to be processed is a first size; when the average value is greater than or equal to the threshold, the size of the basic prediction block in the current image block to be processed is a second size.
  • The first size is smaller than the second size.
  • In this implementation, prior information is used to determine the size of the basic prediction block of the current image block to be processed, and no additional identification information needs to be transmitted, which not only improves adaptability to the image but also ensures that the coding rate is not increased.
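A minimal Python sketch of the threshold rule above. The concrete sizes (4×4 and 8×8) and the threshold value 256 are illustrative assumptions; the patent text only requires that the first size be smaller than the second:

```python
def basic_block_size(planar_blocks, threshold=256,
                     first_size=(4, 4), second_size=(8, 8)):
    # planar_blocks: (width, height) of each plane mode prediction
    # block in the previously reconstructed image(s).
    if not planar_blocks:
        return first_size  # fallback for an empty list; not specified by the text
    avg = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    # Below the threshold, choose the smaller first size;
    # at or above it, choose the larger second size.
    return first_size if avg < threshold else second_size
```

For example, blocks of 16×16 and 8×8 average to 160, below the assumed threshold, so the smaller size is chosen.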
  • In a feasible implementation, the previously reconstructed image is, among the images with the same temporal layer identifier as the image where the current image block to be processed is located, the reconstructed image whose coding order is closest to that image.
  • In another feasible implementation, the previously reconstructed image is the reconstructed image whose coding order is closest to the image where the current image block to be processed is located.
  • In another feasible implementation, the previously reconstructed image of the image where the current image block to be processed is located may be a plurality of images; correspondingly, calculating the average value of the products of the width and height of all the plane mode prediction blocks in the previously reconstructed image includes: calculating the average value of the products of the width and height of all the plane mode prediction blocks in the plurality of previously reconstructed images.
  • the statistical information of multiple frames is accumulated to determine the size of the basic prediction block in the current image block, which improves the reliability of statistics.
  • the threshold value is a preset threshold value.
  • In a feasible implementation, the threshold may be a first threshold; when the picture order count (POC) of at least one reference frame of the image where the image block to be processed is located is greater than the POC of that image, the threshold may be a second threshold.
  • The first threshold and the second threshold are different; different thresholds can be set for different coding scenarios, which improves adaptability to the corresponding coding scenario.
  • In a feasible implementation, the inter prediction method provided by the present application may further include: determining the prediction direction of the image block to be processed.
  • Determining the prediction direction of the image block to be processed may include: when the first direction prediction is valid and the second direction prediction is invalid, or when the second direction prediction is valid and the first direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when both the first direction prediction and the second direction prediction are valid, the prediction direction of the image block to be processed is bidirectional prediction.
  • In a feasible implementation, when there is a temporary image block in the adjacent area of the image block to be processed that uses the first reference frame image list to obtain a motion vector, the first direction prediction is valid; when there is no temporary image block in the adjacent area that uses the first reference frame image list to obtain a motion vector, the first direction prediction is invalid.
  • Likewise, when there is a temporary image block in the adjacent area of the image block to be processed that uses the second reference frame image list to obtain a motion vector, the second direction prediction is valid; when there is no such temporary image block, the second direction prediction is invalid.
  • In another feasible implementation, the motion vector includes a first motion vector and/or a second motion vector, where the first motion vector corresponds to the first reference frame image list and the second motion vector corresponds to the second reference frame image list. When the first motion vectors of at least two temporary image blocks in the adjacent area of the image block to be processed that use the first reference frame image list to obtain motion vectors are different, the first direction prediction is valid; when the first motion vectors of all the temporary image blocks in the adjacent area that use the first reference frame image list are the same, the first direction prediction is invalid. When the second motion vectors of at least two temporary image blocks in the adjacent area that use the second reference frame image list to obtain motion vectors are different, the second direction prediction is valid; when the second motion vectors of all the temporary image blocks in the adjacent area that use the second reference frame image list are the same, the second direction prediction is invalid.
  • the temporary image block is an image block having a preset size.
  • In a feasible implementation, the adjacent area of the image block to be processed may include: the left spatial area and the upper spatial area of the image block to be processed, the right temporal area, and the lower temporal area, or any combination of these areas.
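A hedged sketch (not the patent's normative procedure) of the second validity rule described above, under which a prediction direction is valid only when at least two temporary image blocks that use the corresponding reference frame image list carry different motion vectors:

```python
def direction_valid(temp_block_mvs):
    # temp_block_mvs: motion vectors (as tuples) of the temporary image
    # blocks in the adjacent area that use this reference frame image list.
    # Valid iff at least two of them differ; invalid when there are no
    # such blocks or all of them share the same motion vector.
    return len(set(temp_block_mvs)) >= 2


def prediction_direction(list0_mvs, list1_mvs):
    # Combine the per-list validity checks into the prediction direction
    # of the image block to be processed (illustrative helper).
    first_valid = direction_valid(list0_mvs)
    second_valid = direction_valid(list1_mvs)
    if first_valid and second_valid:
        return "bidirectional"
    if first_valid or second_valid:
        return "unidirectional"
    return "none"  # neither direction valid; handling not specified here
```

For instance, two identical list-0 motion vectors leave the first direction invalid, while two distinct ones make it valid.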
  • In a feasible implementation, determining the size of the basic prediction block in the image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are not equal, determining the side length of the shorter side of the basic prediction block to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determining the side length of the basic prediction block to be 4 or 8.
  • the size of the basic prediction block is fixed, which reduces complexity.
  • the size reference information includes shape information of the image block to be processed; the shape information includes width and height.
  • Determining the size of the basic prediction block according to the size reference information may include: when the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • In another feasible implementation, the width of the basic prediction block is 8 pixels and the height is 8 pixels.
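The shape-based rule above (a wide or square image block uses an 8×4 basic prediction block, a tall one uses 4×8) can be sketched as follows; the function name is an illustrative choice, not from the patent:

```python
def size_from_shape(width, height):
    # width, height: dimensions in pixels of the image block to be processed.
    # Returns (width, height) of the basic prediction block in pixels.
    if width >= height:
        return (8, 4)  # wide or square block
    return (4, 8)      # tall block
```

So a 16×8 block gets 8×4 basic prediction blocks, while an 8×16 block gets 4×8.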
  • the size reference information includes the prediction direction of the image block to be processed.
  • determining the size of the basic prediction block in the image block to be processed may include: determining the size of the basic prediction block according to the prediction direction of the image block to be processed.
  • In a feasible implementation, determining the size of the basic prediction block according to the prediction direction of the image block to be processed includes: when the prediction direction is unidirectional prediction, the width of the basic prediction block is 4 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 4 pixels, or the width is 4 pixels and the height is 8 pixels.
  • In another feasible implementation: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction and the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • In another feasible implementation: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 8 pixels.
  • In another feasible implementation: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 8 pixels; when the prediction direction is unidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction is unidirectional prediction and the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
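As an illustration, one of the direction-based variants above (unidirectional prediction uses a 4×4 basic prediction block; bidirectional prediction uses 8×4 for wide or square image blocks and 4×8 for tall ones) can be sketched like this; the other variants differ only in the returned sizes:

```python
def size_from_direction(bidirectional, width, height):
    # bidirectional: True if the prediction direction of the image block
    # to be processed is bidirectional prediction, False if unidirectional.
    # width, height: dimensions in pixels of the image block to be processed.
    if not bidirectional:
        return (4, 4)  # unidirectional prediction: 4x4 basic prediction block
    # Bidirectional prediction: shape decides the orientation.
    return (8, 4) if width >= height else (4, 8)
```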
  • the above content provides examples of specific implementation of determining the size of the basic prediction block when the size reference information is different content.
  • It should be noted that the size reference information may include one or more items of content; when the size reference information includes multiple items, the size of the basic prediction block may be determined according to actual needs in combination with the multiple items included in the size reference information, and the determination process is not repeated here.
  • In a feasible implementation, after determining the size of the basic prediction block in the image block to be processed, the method for inter prediction provided in this application may further include: dividing the image block to be processed into a plurality of basic prediction blocks according to the determined size of the basic prediction block, and determining the position of each basic prediction block in the image block to be processed in turn. It should be understood that this embodiment determines the coordinate position of each basic prediction block in the image block to be processed, and then performs inter prediction on each basic prediction block in the current image block to be processed.
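The division step above can be sketched as a simple raster-order enumeration of basic prediction block positions; this is an illustrative helper, not the patent's implementation:

```python
def partition_positions(block_w, block_h, bp_w, bp_h):
    # block_w, block_h: size in pixels of the image block to be processed.
    # bp_w, bp_h: determined size in pixels of the basic prediction block.
    # Returns the top-left (x, y) coordinate of each basic prediction
    # block, in raster order, relative to the image block to be processed.
    return [(x, y)
            for y in range(0, block_h, bp_h)
            for x in range(0, block_w, bp_w)]


# Example: a 16x8 image block with 8x4 basic prediction blocks
# yields four positions.
print(partition_positions(16, 8, 8, 4))  # -> [(0, 0), (8, 0), (0, 4), (8, 4)]
```

Inter prediction is then performed on each enumerated basic prediction block in turn.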
  • In a feasible implementation, the inter prediction method provided by the present application may further include: determining that the first reference block and the second reference block are located within the boundary of the image where the current image block to be processed is located; in other words, determining that the current image block to be processed is not at the boundary of the image in which it is located.
  • Otherwise, the inter prediction method provided in this application is not used.
  • When the image block to be processed is at the image boundary, the accuracy of the prediction method is reduced; not using the method in that case avoids unnecessary complexity overhead.
  • In a feasible implementation, the inter prediction method provided by the present application may further include: determining that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or determining that the width of the image block to be processed is greater than or equal to 16; or determining that the height of the image block to be processed is greater than or equal to 16.
  • In a second aspect, an apparatus for inter prediction is provided, including: a determining module, configured to determine the size of a basic prediction block in an image block to be processed according to size reference information, the size being used to determine the position of the basic prediction block in the image block to be processed; a positioning module, configured to determine the first reference block and the second reference block of the basic prediction block according to the position of the basic prediction block in the image block to be processed, where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction block, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction block, the first reference block is adjacent to the upper boundary of the image block to be processed, and the second reference block is adjacent to the left boundary of the image block to be processed; and a calculation module, configured to weight one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
• the original reference block having a preset positional relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset time-domain position relationship with the image block to be processed.
• the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to the upper left corner point of the image block to be processed; an image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed; and an image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed.
  • the original reference block having a preset spatial domain position relationship with the image block to be processed is located outside the image block to be processed.
• the original reference block having a preset time-domain positional relationship with the image block to be processed may include: an image block located at the lower right corner of a mapped image block in the target reference frame and adjacent to the lower right corner point of the mapped image block.
• the original reference block having the preset time-domain position relationship with the image block to be processed is located outside the mapped image block; the size of the mapped image block is equal to that of the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame in which it is located.
  • the index information and reference frame list information of the target reference frame may be obtained by parsing the code stream.
• the code stream refers to the code stream transmitted between the encoding end and the decoding end.
• the index information of the target reference frame and the reference frame list information are located in the code stream segment corresponding to the slice header of the slice in which the image block to be processed is located.
  • the calculation module is specifically configured to obtain the motion vector corresponding to the basic prediction block according to the following formula:
• R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
• B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
  • AR is the motion vector corresponding to the image block located in the upper right corner of the image block to be processed and adjacent to the upper right corner of the image block to be processed
• BR is the motion vector corresponding to the image block located at the lower right corner of the mapped image block in the target reference frame and adjacent to the lower right corner point of the mapped image block
  • BL is the motion vector corresponding to the image block located in the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed
• x and y are the horizontal and vertical coordinates, respectively, of the upper left corner of the basic prediction block in the image block to be processed.
• H is the ratio of the height of the image block to be processed to the height of the basic prediction block
• W is the ratio of the width of the image block to be processed to the width of the basic prediction block
• L(-1, y) is the motion vector corresponding to the second reference block
• A(x, -1) is the motion vector corresponding to the first reference block
  • P (x, y) is the motion vector corresponding to the basic prediction block.
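The two boundary interpolation formulas above, together with the reference motion vectors they define, can be sketched in Python. This is an illustrative sketch only: the function name is invented, and the final combination step P = (H·Ph + W·Pv) / (2·W·H) follows the conventional planar motion-vector interpolation form and is an assumption, since the summary above quotes only the R(W, y) and B(x, H) formulas.

```python
# Sketch of planar motion-vector interpolation for one basic prediction
# block at position (x, y). AR, BR, BL, L(-1, y) and A(x, -1) are
# 2-component motion vectors; W and H are the width/height ratios
# defined in the text above.
def planar_mv(x, y, W, H, AR, BR, BL, L_left, A_above):
    """All motion vectors are (mvx, mvy) tuples."""
    # Right-column interpolation: R(W, y) = ((H-y-1)*AR + (y+1)*BR) / H
    R = tuple(((H - y - 1) * a + (y + 1) * b) / H for a, b in zip(AR, BR))
    # Bottom-row interpolation: B(x, H) = ((W-x-1)*BL + (x+1)*BR) / W
    B = tuple(((W - x - 1) * bl + (x + 1) * br) / W for bl, br in zip(BL, BR))
    # Horizontal interpolation between the left reference L(-1, y) and R
    Ph = tuple((W - 1 - x) * l + (x + 1) * r for l, r in zip(L_left, R))
    # Vertical interpolation between the above reference A(x, -1) and B
    Pv = tuple((H - 1 - y) * a + (y + 1) * b for a, b in zip(A_above, B))
    # Weighted average of the two interpolations (assumed combination step)
    return tuple((H * ph + W * pv) / (2 * W * H) for ph, pv in zip(Ph, Pv))
```

When every reference carries the same motion vector, the interpolation returns that vector unchanged, which is a quick sanity check on the weights.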
  • the size reference information may include a first identifier; the first identifier is used to indicate the size of the basic prediction block.
  • the device further includes: a receiving unit for receiving the code stream; and a parsing unit for parsing and acquiring the first identifier from the code stream received by the receiving unit.
• the first identifier is located in the code stream segment corresponding to any one of the sequence parameter set of the sequence of the image block to be processed, the image parameter set of the image of the image block to be processed, and the slice header of the slice of the image block to be processed.
  • the size reference information may include the size of the plane mode prediction block in the previously reconstructed image of the current image block to be processed.
• the plane mode prediction block is a to-be-processed image block that performs inter prediction according to any one of the foregoing feasible implementation manners in the second aspect, and the previously reconstructed image is an image whose coding order is before the image where the current to-be-processed image block is located.
• the determination module is specifically configured to: calculate the average value of the products of width and height over all planar mode prediction blocks in the previously reconstructed image; when the average value is less than a threshold, the size of the basic prediction block in the current image block to be processed is a first size; when the average value is greater than or equal to the threshold, the size of the basic prediction block in the current image block to be processed is a second size.
  • the first size is smaller than the second size.
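The size-selection rule above (average of width×height over the planar mode prediction blocks of the previously reconstructed image, compared against a threshold) can be sketched as follows. The function name, the fallback for an empty list, and the concrete candidate sizes are illustrative assumptions, not values fixed by the text.

```python
def select_basic_block_size(planar_blocks, threshold,
                            first_size=(4, 4), second_size=(8, 8)):
    """planar_blocks: list of (width, height) of all planar-mode prediction
    blocks in the previously reconstructed image. Returns the basic
    prediction block size for the current image block to be processed."""
    if not planar_blocks:
        return second_size  # fallback when no planar blocks exist (assumed)
    avg_area = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    # A small average area suggests finer motion, so the smaller (first)
    # size is chosen; otherwise the larger (second) size is used.
    return first_size if avg_area < threshold else second_size
```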
• the previously reconstructed image is, among the images having the same temporal layer identifier as the image in which the current image block to be processed is located, the reconstructed image whose coding order is closest to that image.
• the previously reconstructed image is the reconstructed image whose coding order is closest to the image in which the current image block to be processed is located.
  • the threshold value is a preset threshold value.
• the above threshold may be a first threshold; when the POC of at least one reference frame of the image in which the image block to be processed is located is greater than the POC of that image, the above threshold may be a second threshold, where the first threshold and the second threshold are different.
  • the determination module is further configured to: determine the prediction direction of the image block to be processed.
• the determination module is specifically used to: when the first direction prediction is valid and the second direction prediction is invalid, or the second direction prediction is valid and the first direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when both the first direction prediction and the second direction prediction are valid, the prediction direction of the image block to be processed is bidirectional prediction.
• when there is a temporary image block that uses the first reference frame image list to obtain a motion vector in the adjacent area of the image block to be processed, the first direction prediction is valid; when there is no such temporary image block in the adjacent area of the image block to be processed, the first direction prediction is invalid.
• when there is a temporary image block that uses the second reference frame image list to obtain a motion vector in the adjacent area of the image block to be processed, the second direction prediction is valid; when there is no such temporary image block in the adjacent area of the image block to be processed, the second direction prediction is invalid.
• the motion vector includes the first motion vector and/or the second motion vector. When the first motion vectors of at least two temporary image blocks in the adjacent area of the image block to be processed that use the first reference frame image list to obtain motion vectors are different, the first direction prediction is valid, where the first motion vector corresponds to the first reference frame image list; when the first motion vectors of all temporary image blocks in the adjacent area that use the first reference frame image list are the same, the first direction prediction is invalid. When the second motion vectors of at least two temporary image blocks in the adjacent area of the image block to be processed that use the second reference frame image list to obtain motion vectors are different, the second direction prediction is valid, where the second motion vector corresponds to the second reference frame image list; when the second motion vectors of all temporary image blocks in the adjacent area that use the second reference frame image list are the same, the second direction prediction is invalid.
• when the temporary image block uses only the first reference frame image list to obtain the motion vector, the first motion vector is the same as that motion vector
• when the temporary image block uses only the second reference frame image list to obtain the motion vector, the second motion vector is the same as that motion vector
  • the temporary image block is an image block having a preset size.
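The validity rules above can be sketched as follows, following the variant in which a direction is valid only when at least two temporary image blocks using that reference frame image list carry different motion vectors. The data layout of a temporary image block and the function names are assumptions for illustration.

```python
def direction_valid(temp_blocks, list_idx):
    """temp_blocks: list of dicts like {'lists': {0: (mvx, mvy)}} giving,
    per temporary image block, the motion vector obtained from each
    reference frame image list it uses. A direction is valid when at
    least two blocks using that list carry *different* motion vectors."""
    mvs = [b['lists'][list_idx] for b in temp_blocks if list_idx in b['lists']]
    if not mvs:
        return False           # no block uses this list: invalid
    return len(set(mvs)) > 1   # all identical (or a single block): invalid

def prediction_direction(temp_blocks):
    """Map the two per-list validity flags to the prediction direction."""
    first = direction_valid(temp_blocks, 0)
    second = direction_valid(temp_blocks, 1)
    if first and second:
        return 'bidirectional'
    if first or second:
        return 'unidirectional'
    return None  # neither direction valid; handling of this case is assumed
```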
• the adjacent area of the image block to be processed may include: a left spatial-domain area and an upper spatial-domain area of the image block to be processed, a right time-domain area, and a lower time-domain area, or any combination of these areas.
• determining the size of the basic prediction block in the image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are not equal, determining the side length of the shorter side of the basic prediction block to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determining the side length of the basic prediction block to be 4 or 8.
  • the size reference information includes shape information of the image block to be processed; the shape information includes width and height.
• the determining module is specifically used for: when the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • the width of the basic prediction block is 8 pixels, and the height is 8 pixels.
  • the determination module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels with a height of 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 4 pixels, or the width of the basic prediction block is 4 pixels and the height is 8 pixels .
  • the determination module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels with a height of 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • the determination module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, the height is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height is 8 pixels.
• the determination module is specifically configured to: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 8 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is less than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
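One of the alternative selection rules above (4×4 for unidirectional prediction; for bidirectional prediction, the 8-pixel side follows the longer dimension of the image block) can be sketched as a small helper; the function name is an assumption.

```python
def basic_block_size(direction, block_w, block_h):
    """One of the alternative rules described above: 4x4 for
    unidirectional prediction; for bidirectional prediction the longer
    dimension of the image block receives the 8-pixel side."""
    if direction == 'unidirectional':
        return (4, 4)
    # bidirectional prediction
    return (8, 4) if block_w >= block_h else (4, 8)
```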
  • the apparatus for inter-frame prediction provided by the present application may further include a dividing module, which is used to: according to the size of the basic prediction block The image block to be processed is divided into a plurality of basic prediction blocks; the position of each basic prediction block in the image block to be processed is determined in turn.
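The division step performed by the dividing module can be sketched as a simple raster-order iteration over basic prediction block positions; the generator name and coordinate convention (top-left origin) are assumptions.

```python
def partition_positions(block_w, block_h, bpb_w, bpb_h):
    """Divide the image block to be processed (block_w x block_h pixels)
    into basic prediction blocks of bpb_w x bpb_h pixels and yield each
    block's top-left position in raster order."""
    for y in range(0, block_h, bpb_h):
        for x in range(0, block_w, bpb_w):
            yield (x, y)
```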
• the apparatus for inter-frame prediction provided by the present application may further include a judgment module, configured to: determine that the first reference block and the second reference block are located within the boundary of the image in which the image block to be processed is located.
• the apparatus for inter-frame prediction provided by the present application may further include a judgment module, configured to: determine that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or, determine that the width of the image block to be processed is greater than or equal to 16; or, determine that the height of the image block to be processed is greater than or equal to 16.
• an inter-frame prediction device, including: a processor and a memory coupled to the processor; the processor is configured to execute the method of inter prediction according to the foregoing first aspect or any feasible implementation manner thereof.
• a computer-readable storage medium having instructions stored therein; when the instructions run on a computer, the computer is caused to perform the method of inter-frame prediction according to the first aspect or any one of the above feasible implementation manners.
• a computer program product containing instructions that, when run on a computer, cause the computer to perform the inter prediction method described in the first aspect or any feasible implementation manner thereof.
  • a video image encoder including the apparatus for inter prediction according to the second aspect or any feasible implementation manner described above.
  • FIG. 1 is an exemplary block diagram of a video decoding system in an embodiment of this application
  • FIG. 2 is an exemplary block diagram of a video encoder in an embodiment of this application.
  • FIG. 3 is an exemplary block diagram of a video decoder in an embodiment of this application.
  • FIG. 6 is an exemplary flowchart of an inter prediction method in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a scenario in which motion vectors corresponding to basic prediction blocks are weighted in the embodiment of the present application;
  • FIG. 8 is a schematic diagram of another scenario of weighted calculation of motion vectors corresponding to basic prediction blocks in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another scenario of weighted calculation of a motion vector corresponding to a basic prediction block in an embodiment of this application.
  • FIG. 10 is an exemplary block diagram of an apparatus for inter prediction in an embodiment of the present application.
  • FIG. 11 is an exemplary block diagram of a decoding device in an embodiment of the present application.
  • words such as “exemplary” or “for example” are used as examples, illustrations or explanations. Any embodiments or design solutions described as “exemplary” or “for example” in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or design solutions. Rather, the use of words such as “exemplary” or “for example” is intended to present related concepts in a specific manner.
  • FIG. 1 is a block diagram of an example video coding system 1 described in an embodiment of the present application.
  • video coder generally refers to both video encoders and video decoders.
  • video coding or “coding” may generally refer to video encoding or video decoding.
• the video encoder 100 and the video decoder 200 of the video coding system 1 are used to predict the motion information, such as the motion vector, of the currently coded image block or of its sub-blocks according to the various method examples described in any of the multiple new inter prediction modes proposed in this application, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; in this way, the motion vector difference does not need to be transmitted during encoding, thereby further improving codec performance.
  • the video coding system 1 includes a source device 10 and a destination device 20.
  • Source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
  • Destination device 20 may decode the encoded video data generated by source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
  • Various implementations of source device 10, destination device 20, or both may include one or more processors and memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by the computer, as described herein.
• Source device 10 and destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
  • Link 30 may include one or more media or devices capable of moving encoded video data from source device 10 to destination device 20.
  • link 30 may include one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real time.
  • the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
  • the one or more communication media may include wireless and / or wired communication media, such as radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
  • the encoded data may be output from the output interface 140 to the storage device 40.
  • the encoded data can be accessed from the storage device 40 through the input interface 240.
• the storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a digital video disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by the source device 10.
  • the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
  • the file server may be any type of server capable of storing the encoded video data and transmitting the encoded video data to the destination device 20.
  • Example file servers include network servers (for example, for websites), file transfer protocol (FTP) servers, network attached storage (NAS) devices, or local disk drives. Destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
• this may include a wireless channel (e.g., a wireless fidelity (Wi-Fi) connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of both that is suitable for accessing the encoded video data stored on the file server.
  • the transmission of encoded video data from storage device 40 may be streaming, download transmission, or a combination of both.
  • the method of inter-frame prediction provided in this application can be applied to video encoding and decoding to support a variety of multimedia applications, such as aerial TV broadcasting, cable TV transmission, satellite TV transmission, streaming video transmission (eg, via the Internet), and storage Encoding of video data on data storage media, decoding of video data stored on data storage media, or other applications.
  • video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
  • the source device 10 includes a video source 120, a video encoder 100 and an output interface 140.
• the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
• the video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, a computer graphics system for generating video data, or a combination of these sources of video data.
  • Video encoder 100 may encode video data from video source 120.
  • the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
  • the encoded video data may also be stored on storage device 40 for later access by destination device 20 for decoding and / or playback.
  • the destination device 20 includes an input interface 240, a video decoder 200 and a display device 220.
  • input interface 240 includes a receiver and / or a modem.
  • the input interface 240 may receive encoded video data via the link 30 and / or from the storage device 40.
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. In general, the display device 220 displays decoded video data.
  • the display device 220 may include various display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
  • the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer unit Or other hardware and software to handle the encoding of both audio and video in a common data stream or separate data streams.
• the multiplexer-demultiplexer (MUX-DEMUX) unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
• the video encoder 100 and the video decoder 200 may each be implemented as any one of various circuits, such as one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof. If the application is partially implemented in software, the device may store the instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the corresponding device.
  • the present application may generally refer to video encoder 100 as another device that “signals” or “transmits” certain information to, for example, video decoder 200.
• the terms "signaling" or "transmitting" may generally refer to the transmission of syntax elements and/or other data used to decode compressed video data. This transfer can occur in real time or almost real time. Alternatively, this communication may occur after a period of time; for example, it may occur when the syntax element is stored in the encoded code stream to a computer-readable storage medium at the time of encoding, and the decoding device may then retrieve the syntax element at any time after it is stored to this medium.
• the video encoder 100 and the video decoder 200 may operate according to the ITU-T H.265 high efficiency video coding (HEVC) standard, and may conform to the HEVC test model (HM).
  • the latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265.
• the latest version of the standard document is H.265 (12/16), which is incorporated herein by reference in its entirety.
• HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM can provide up to 35 intra-prediction encoding modes.
  • JVET is committed to developing the H.266 standard.
  • the process of H.266 standardization is based on the evolution model of the video decoding device called the H.266 test model.
  • the algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet.
  • the latest algorithm description is included in JVET-F1001-v2.
• the reference software for the JEM test model can be obtained from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
• in HM, a video frame or image may be divided into a sequence of tree blocks, also referred to as coding tree units (CTU).
  • the tree block has a similar purpose to the macro block of the H.264 standard.
  • a slice contains several consecutive tree blocks in decoding order.
• the video frame or image can be divided into one or more slices.
  • Each tree block can be split into coding units (CU) according to the quadtree. For example, the tree block that is the root node of the quadtree may be split into four child nodes, and each child node may be a parent node and split into four other child nodes.
  • the final indivisible child nodes that are leaf nodes of the quadtree include decoding nodes, for example, decoded video blocks.
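The recursive quadtree split described above can be sketched as follows; the `should_split` callback stands in for the encoder's actual split decision and is an assumption, as is the function name.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square tree block into leaf coding units.
    Returns a list of (x, y, size) leaves. `should_split(x, y, size)`
    is an assumed callback standing in for the split decision."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four child nodes in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves
    return [(x, y, size)]  # indivisible child node: a decoding node
```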
  • the syntax data associated with the decoded code stream may define the maximum number of times the tree block can be split, and may also define the minimum size of the decoding node.
  • the coding unit includes a decoding node and a prediction unit (PU) and a transform unit (TU) associated with the decoding node.
  • the size of the CU corresponds to the size of the decoding node and the shape must be square.
• the size of the CU may range from 8×8 pixels up to the size of a tree block with a maximum of 64×64 pixels or larger.
  • Each CU may contain one or more PUs and one or more TUs.
  • the syntax data associated with a CU may describe a situation where the CU is divided into one or more PUs.
  • the split mode may be different between the cases where the CU is skipped or is encoded in direct mode, intra prediction mode encoding, or inter prediction mode encoding.
  • the PU may be divided into non-square shapes.
  • the syntax data associated with a CU may also describe the case where the CU is divided into one or more TUs according to the quadtree.
  • the shape of the TU may be square or non-square.
  • the PU contains data related to the prediction process.
  • the PU when the PU is encoded in intra mode, the PU may include data describing the intra prediction mode of the PU.
  • the PU when the PU is encoded in inter mode, the PU may include data defining the motion vector of the PU.
  • the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (eg, quarter-pixel accuracy or eighth-pixel accuracy), motion vector The reference image pointed to, and / or the reference image list of the motion vector (eg, list 0, list 1 or list C).
  • TU uses transformation and quantization processes.
  • a given CU with one or more PUs may also contain one or more TUs.
  • the video encoder 100 may calculate the residual value corresponding to the PU. Residual values include pixel difference values, which can be transformed into transform coefficients, quantized, and scanned using TU to generate serialized transform coefficients for entropy decoding.
  • This application generally uses the term "video block” to refer to a CU's decoding node. In some specific applications, the application may also use the term "video block” to refer to a tree block that includes a decoding node and PUs and TUs, such as an LCU or CU.
  • Video sequences usually contain a series of video frames or images.
  • a group of pictures exemplarily includes a series of one or more video images.
  • the GOP may contain syntax data in the header information of the GOP, the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP.
  • Each slice of an image may contain slice syntax data describing the encoding mode of the corresponding image.
  • Video encoder 100 typically operates on video blocks within individual video slices in order to encode video data.
  • the video block may correspond to a decoding node within the CU.
  • Video blocks may have a fixed or varying size, and may differ in size according to the specified decoding standard.
• HM supports prediction of various PU sizes. Assuming that the size of a particular CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, the CU is not divided in one direction, while the other direction is divided into 25% and 75%.
• 2N×nU refers to a horizontally partitioned 2N×2N CU, where a 2N×0.5N PU is at the top and a 2N×1.5N PU is at the bottom.
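The symmetric and asymmetric PU partition sizes above can be tabulated in a small sketch; the mode names mirror the text, while the function itself and its return layout are illustrative.

```python
def pu_sizes(mode, n):
    """PU dimensions (width, height) for a 2Nx2N CU under HM partition
    modes. Asymmetric modes split one direction 25%/75%; e.g. 2NxnU puts
    a 2Nx0.5N PU on top and a 2Nx1.5N PU on the bottom."""
    two_n = 2 * n
    modes = {
        '2Nx2N': [(two_n, two_n)],
        '2NxN':  [(two_n, n)] * 2,
        'Nx2N':  [(n, two_n)] * 2,
        'NxN':   [(n, n)] * 4,
        '2NxnU': [(two_n, n // 2), (two_n, 3 * n // 2)],
        '2NxnD': [(two_n, 3 * n // 2), (two_n, n // 2)],
        'nLx2N': [(n // 2, two_n), (3 * n // 2, two_n)],
        'nRx2N': [(3 * n // 2, two_n), (n // 2, two_n)],
    }
    return modes[mode]
```

For any mode the PU areas sum back to the full 2N×2N CU area, which is a quick consistency check.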
  • the video encoder 100 may calculate the residual data of the TU of the CU.
  • the PU may include pixel data in the spatial domain (also called the pixel domain), and the TU may include coefficients in the transform domain after a transform (for example, a discrete cosine transform (DCT), integer transform, wavelet transform, or conceptually similar transform) is applied to the residual video data.
  • the residual data may correspond to the pixel difference between the pixels of the unencoded image and the prediction values corresponding to the PU.
  • Video encoder 100 may form a TU that includes the residual data of the CU, and then transform the TU to produce the transform coefficients of the CU.
  • video encoder 100 may perform quantization of the transform coefficients.
  • Quantization exemplarily refers to a process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
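The bit-depth reduction described above can be sketched minimally: an n-bit value is rounded down to an m-bit value by discarding the least significant bits. This is an illustrative sketch only; real quantizers use a quantization parameter and rounding offsets.

```python
def reduce_bit_depth(value, n, m):
    """Round an n-bit value down to an m-bit value (n > m)."""
    assert 0 <= value < (1 << n) and n > m
    return value >> (n - m)  # keep the m most significant bits
```

For example, the 8-bit value 255 becomes the 4-bit value 15, reducing the data needed to represent the coefficient at the cost of precision.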
  • the JEM model further improves the coding structure of video images.
  • a block coding structure called "Quadtree Combined Binary Tree" (QTBT) is introduced.
  • a CU can be square or rectangular.
  • a CTU is first partitioned by a quadtree, and the quadtree leaf nodes are further partitioned by a binary tree.
  • there are two partition modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
  • the leaf nodes of the binary tree are called CU.
  • in JEM, a CU cannot be further divided during prediction and transform; that is, in JEM the CU, PU, and TU have the same block size.
  • the maximum size of the CTU is 256×256 luma pixels.
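The two split primitives of QTBT described above can be sketched as follows. This is an illustrative sketch, not the JEM implementation: a block is either quad-split into four equal quadrants, or binary-split symmetrically in one direction, so leaf CUs may be square or rectangular.

```python
def qt_split(x, y, w, h):
    """Quadtree split: four equal quadrants of block (x, y, w, h)."""
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]

def bt_split(x, y, w, h, horizontal):
    """Symmetric binary split, horizontal (top/bottom) or vertical (left/right)."""
    if horizontal:
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
```

A CTU would first be split recursively with `qt_split`; each quadtree leaf may then be split recursively with `bt_split`, and the binary-tree leaves are the CUs.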
  • the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
  • the video encoder 100 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, the video encoder 100 may entropy encode the one-dimensional vector using context-based adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method.
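One possible predefined scan of the kind mentioned above is a zig-zag scan, which serializes a 2-D block of quantized coefficients into a 1-D vector by traversing anti-diagonals with alternating direction. This is an illustrative sketch; codecs use fixed scan tables rather than computing the order at run time.

```python
def zigzag_scan(block):
    """Serialize a square 2-D block into a 1-D vector in zig-zag order."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],              # anti-diagonal index
                                   rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))
    return [block[r][c] for r, c in order]
```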
  • Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
  • the video encoder 100 may assign the context within the context model to the symbol to be transmitted.
  • the context may be related to whether the adjacent value of the symbol is non-zero.
  • the video encoder 100 may select a variable-length code for a symbol to be transmitted. Codewords in variable-length coding (VLC) can be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, using VLC can achieve code-rate savings relative to using equal-length codewords for each symbol to be transmitted.
  • the probability in CABAC can be determined based on the context assigned to the symbol.
  • the video encoder may perform inter prediction to reduce temporal redundancy between images.
  • a CU may have one or more prediction units PU.
  • multiple PUs may belong to the CU, or the PU and CU have the same size.
  • when the partition mode of the CU is not to divide, or the CU is divided into only one PU, the PU and the CU are equivalent, and the term PU is used uniformly.
  • the video encoder may signal the video decoder with motion information for the PU.
  • the motion information of the PU may include: reference image index, motion vector, and prediction direction identification.
  • the motion vector may indicate the displacement between the image block (also called video block, pixel block, pixel set, etc.) of the PU and the reference block of the PU.
  • the reference block of the PU may be a part of the reference image of the image block similar to the PU.
  • the reference block may be located in the reference image indicated by the reference image index and the prediction direction identification.
  • the video encoder may generate a candidate prediction motion vector (Motion Vector, MV) list for each of the PUs according to the merge prediction mode or the advanced motion vector prediction mode process.
  • Each candidate predicted motion vector in the candidate predicted motion vector list for the PU may indicate motion information.
  • the motion information indicated by some candidate prediction motion vectors in the candidate prediction motion vector list may be based on the motion information of other PUs. If a candidate predicted motion vector indicates the motion information of one of the specified spatial or temporal candidate predicted motion vector positions, this application may refer to that candidate predicted motion vector as an "original" candidate predicted motion vector.
  • the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying the original candidate prediction motion vector, or inserting only zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered as original candidate prediction motion vectors and may be referred to as artificially generated candidate prediction motion vectors in this application.
  • the technology of the present application generally relates to a technique for generating a candidate prediction motion vector list at a video encoder and a technique for generating the same candidate prediction motion vector list at a video decoder.
  • the video encoder and video decoder may generate the same candidate prediction motion vector list by implementing the same technique for constructing the candidate prediction motion vector list. For example, both the video encoder and the video decoder may construct a list with the same number of candidate predicted motion vectors (eg, five candidate predicted motion vectors).
  • the video encoder and decoder may first consider spatial candidate prediction motion vectors (e.g., from neighboring blocks in the same image), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different images), and finally consider artificially generated candidate prediction motion vectors until the desired number of candidate prediction motion vectors has been added to the list.
  • during candidate predicted motion vector list construction, a pruning operation can be utilized for certain types of candidate predicted motion vectors in order to remove duplicates from the candidate predicted motion vector list, while for other types of candidate predicted motion vectors pruning may not be used, so as to reduce decoder complexity.
  • a pruning operation may be performed to exclude candidate predicted motion vectors with repeated motion information from the list of candidate predicted motion vectors.
  • the artificially generated candidate predicted motion vectors may be added without performing a pruning operation on them.
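The construction order described above (spatial, then temporal, then artificial candidates, with pruning applied only to the original candidates) can be sketched as follows. The function name and the choice of a zero motion vector as the artificial candidate are illustrative assumptions.

```python
def build_candidate_list(spatial, temporal, max_len=5):
    """Build a fixed-length candidate MV list; motion vectors are (x, y) tuples."""
    candidates = []
    for mv in spatial + temporal:        # original candidates, pruned for duplicates
        if mv not in candidates:
            candidates.append(mv)
        if len(candidates) == max_len:
            return candidates
    while len(candidates) < max_len:     # artificial candidates, no pruning
        candidates.append((0, 0))        # zero motion vector as filler
    return candidates
```

Because encoder and decoder run the same deterministic procedure, both sides arrive at the identical list and only the selected index needs to be signaled.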
  • the video encoder may select the candidate prediction motion vector from the candidate prediction motion vector list and output the candidate prediction motion vector index in the code stream.
  • the selected candidate prediction motion vector may be the candidate prediction motion vector having a motion vector that produces the predictor most closely matching the target PU being decoded.
  • the candidate predicted motion vector index may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
  • the video encoder may also generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate predicted motion vector.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector.
  • the video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PU of the CU and the original image blocks for the CU. The video encoder may then encode one or more residual image blocks and output the one or more residual image blocks in the codestream.
  • the codestream may include data identifying the selected candidate prediction motion vector in the PU's candidate prediction motion vector list.
  • the video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the PU's candidate prediction motion vector list.
  • the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on the one or more reference blocks of the PU.
  • the video decoder may reconstruct the image block for the CU based on the predictive image block for the PU of the CU and the one or more residual image blocks for the CU.
  • the present application may describe the position or image block as having various spatial relationships with the CU or PU. This description can be interpreted to mean that the position or image block and the image block associated with the CU or PU have various spatial relationships.
  • the PU currently being decoded by the video decoder may be referred to as a current PU, which is also referred to as a current image block to be processed.
  • the CU currently being decoded by the video decoder may be referred to as the current CU.
  • the image currently being decoded by the video decoder may be referred to as the current image. It should be understood that this application is also applicable to the case where the PU and the CU have the same size, or the PU is the CU, which is uniformly expressed by the PU.
  • the video encoder 100 may use inter prediction to generate predictive image blocks and motion information for the PU of the CU.
  • the motion information of a given PU may be the same or similar to the motion information of one or more nearby PUs (that is, PUs whose image blocks are spatially or temporally near the image block of a given PU). Because nearby PUs often have similar motion information, video encoder 100 may refer to the motion information of nearby PUs to encode the motion information of a given PU. Encoding the motion information of a given PU with reference to the motion information of nearby PUs can reduce the number of coding bits required in the codestream to indicate the motion information of a given PU.
  • the video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs. The present application may use a merge mode to indicate that the motion information of a given PU is the same as the motion information of a nearby PU or may be derived from the motion information of a nearby PU. In another feasible embodiment, the video encoder 100 may calculate a motion vector difference (MVD) for a given PU. MVD indicates the difference between the motion vector of a given PU and the motion vector of nearby PUs.
  • Video encoder 100 may include the MVD instead of the motion vector of a given PU in the motion information of a given PU. Representing MVD in the code stream requires fewer coding bits than representing the motion vector of a given PU.
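The MVD mechanism above can be sketched in a few lines: the encoder transmits only the difference between the PU's motion vector and the predictor taken from a nearby PU, and the decoder adds it back. The helper names are illustrative assumptions.

```python
def encode_mvd(mv, predictor):
    """Encoder side: motion vector difference between the PU's MV and the predictor."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def decode_mv(mvd, predictor):
    """Decoder side: reconstruct the motion vector from the MVD and the predictor."""
    return (mvd[0] + predictor[0], mvd[1] + predictor[1])
```

When nearby motion is similar, the MVD components are small and cost fewer bits to entropy code than the full motion vector would.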
  • this application may use the advanced motion vector prediction mode to refer to signaling the motion information of a given PU to the decoding side by using the MVD and an index value identifying the candidate motion vector.
  • the video encoder 100 may generate a list of candidate prediction motion vectors for a given PU.
  • the candidate predicted motion vector list may include one or more candidate predicted motion vectors.
  • Each of the candidate prediction motion vectors in the candidate prediction motion vector list for a given PU may specify motion information.
  • the motion information indicated by each candidate predicted motion vector may include a motion vector, a reference image index, and a prediction direction identification.
  • the candidate predicted motion vectors in the candidate predicted motion vector list may include "original" candidate predicted motion vectors, each of which indicates the motion information of one of the specified candidate predicted motion vector positions within a PU different from the given PU.
  • the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, the video encoder may compare each candidate prediction motion vector to the PU being decoded and may select candidate prediction motion vectors with a desired rate-distortion cost. The video encoder 100 may output candidate prediction motion vector indexes for PU. The candidate predicted motion vector index may identify the position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
  • the video encoder 100 may generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU.
  • the motion information of the PU may be determined based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be determined based on the motion vector difference for the PU and the motion information indicated by the selected candidate predicted motion vector.
  • the video encoder 100 may process the predictive image block for the PU as described above.
  • the video decoder 200 may generate a list of candidate prediction motion vectors for each of the PUs of the CU.
  • the candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU.
  • the syntax element parsed from the code stream may indicate the location of the selected candidate prediction motion vector in the PU's candidate prediction motion vector list.
  • the video decoder 200 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU.
  • the video decoder 200 may determine the motion information of the PU based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU.
  • the video decoder 200 may reconstruct the image block for the CU based on the predictive image block for the PU and the residual image block for the CU.
  • the construction of the candidate prediction motion vector list and the parsing, from the code stream, of the position of the selected candidate prediction motion vector in the candidate prediction motion vector list are independent of each other, and may be performed in any order or in parallel.
  • the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the code stream, and the candidate predicted motion vector list is constructed according to the parsed position.
  • the candidate prediction motion vector at that position can be determined. For example, when parsing the code stream reveals that the selected candidate predicted motion vector is the one with index 3 in the candidate predicted motion vector list, only the candidate predicted motion vector list from index 0 to index 3 needs to be constructed to determine the candidate predicted motion vector with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
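The parse-then-build shortcut described above can be sketched as follows: once the index has been parsed, candidate generation stops as soon as the list reaches that index, so later candidates are never computed. The function name and generator interface are illustrative assumptions.

```python
def candidate_at(parsed_index, candidate_generator):
    """Build the candidate list only up to the parsed index and return that entry."""
    partial = []
    for mv in candidate_generator:       # candidates produced lazily, in list order
        partial.append(mv)
        if len(partial) > parsed_index:  # stop as soon as the index is reachable
            return partial[parsed_index]
    raise IndexError("code stream refers past the available candidates")
```

With a parsed index of 3, only candidates 0 through 3 are ever generated, which is the complexity reduction the passage describes.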
  • the post-processing entity 41 represents an example of a video entity that can process encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
  • the post-processing entity 41 may be an instance of a network entity.
  • the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
  • the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
  • the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded picture buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
  • the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
  • the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (adaptive loop filter, ALF), and a sample adaptive offset (sample adaptive offset, SAO) filter.
  • the filter unit 106 is shown as an in-loop filter in FIG. 2, in other implementations, the filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data storage and a division unit (not shown in the figure).
  • the video data storage may store video data to be encoded by the components of the video encoder 100.
  • the video data stored in the video data storage can be obtained from the video source 120.
  • DPB 107 may be a reference image memory, which stores reference video data used for encoding video data by the video encoder 100 in intra-frame and inter-frame coding modes.
  • the video data memory and the DPB 107 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the video data memory and DPB 107 may be provided by the same memory device or separate memory devices.
  • the video data storage may be on-chip along with other components of video encoder 100, or off-chip relative to those components.
  • the video encoder 100 receives video data and stores the video data in a video data storage.
  • the dividing unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, for example, image block division based on a quadtree structure or a binary tree structure. This division may also include division into slices, tiles, or other larger units.
  • the video encoder 100 generally illustrates the components that encode the image blocks within the video slice to be encoded.
  • the slice can be divided into multiple image blocks (and possibly into sets of image blocks called tiles).
  • the prediction processing unit 108 may select one of multiple possible coding modes for the current image block, such as one of multiple intra coding modes or one of multiple inter coding modes.
  • the prediction processing unit 108 may provide the resulting intra- and inter-coded blocks to the summer 112 to generate a residual block, and to the summer 111 to reconstruct the encoded block used as a reference image.
  • the intra-predictor 109 within the prediction processing unit 108 may perform intra-predictive coding of the current image block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to remove spatial redundancy.
  • the inter predictor 110 within the prediction processing unit 108 may perform inter-predictive coding of the current image block relative to one or more prediction blocks in one or more reference images to remove temporal redundancy.
  • the inter predictor 110 may be used to determine the inter prediction mode used to encode the current image block.
  • the inter-predictor 110 may use rate-distortion analysis to calculate the rate-distortion values of the various inter-prediction modes in the set of candidate inter-prediction modes, and select the inter prediction mode with the best rate-distortion characteristics among them.
  • rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original unencoded block that was encoded to produce it, as well as the bit rate (that is, the number of bits) used to generate the encoded block.
  • the inter predictor 110 may determine that the inter prediction mode in the set of candidate inter prediction modes that encodes the current image block with the lowest rate distortion cost is the inter prediction mode used for inter prediction of the current image block.
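The rate-distortion selection described above is conventionally formulated as minimizing a Lagrangian cost J = D + λ·R over the candidate modes. A minimal sketch, with assumed distortion/bit values rather than measured ones:

```python
def best_mode(modes, lam):
    """modes: {name: (distortion, bits)} -> mode name with the lowest RD cost."""
    return min(modes, key=lambda m: modes[m][0] + lam * modes[m][1])
```

With {"merge": (100, 5), "amvp": (80, 30)}, a large λ (bits expensive) favors "merge", while a small λ (distortion expensive) favors "amvp", showing how λ trades bit rate against distortion.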
  • the inter predictor 110 is used to predict the motion information (for example, motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to use the motion information (for example, motion vectors) of the one or more sub-blocks in the current image block to obtain or generate the prediction block of the current image block.
  • the inter predictor 110 may locate the prediction block pointed to by the motion vector in one of the reference image lists.
  • Inter predictor 110 may also generate syntax elements associated with image blocks and video slices for use by video decoder 200 in decoding image blocks of video slices.
  • the inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate the prediction block of each sub-block, thereby obtaining the prediction block of the current image block; it should be understood that the inter predictor 110 here performs motion estimation and motion compensation processes.
  • the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
  • the intra predictor 109 may perform intra prediction on the current image block.
  • the intra predictor 109 may determine the intra prediction mode used to encode the current block.
  • the intra-predictor 109 may use rate-distortion analysis to calculate the rate-distortion values of the various intra-prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from the modes to be tested.
  • the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • Summer 112 represents one or more components that perform this subtraction operation.
  • the residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 101.
  • the transformer 101 transforms the residual video data into residual transform coefficients using transforms such as discrete cosine transform (DCT) or conceptually similar transforms.
  • the transformer 101 may convert the residual video data from the pixel value domain to the transform domain, for example, the frequency domain.
  • Transformer 101 may send the resulting transform coefficient to quantizer 102.
  • the quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
  • quantizer 102 may then perform a scan of a matrix that includes quantized transform coefficients.
  • the entropy encoder 103 may perform scanning.
  • after quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
  • the encoded code stream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200.
  • the entropy encoder 103 may also entropy encode the syntax elements of the current image block to be encoded.
  • the inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image.
  • the summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
  • the filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as block artifacts.
  • This reconstructed image block is then stored as a reference block in the decoded image buffer 107, which can be used as a reference block by the inter predictor 110 to inter predict a block in a subsequent video frame or image.
  • the video encoder 100 can directly quantize the residual signal without processing by the transformer 101 and, accordingly, without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; or, the video encoder 100 can store the reconstructed image blocks directly as reference blocks without processing by the filter unit 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be combined together.
  • FIG. 3 is a block diagram of an example video decoder 200 described in an embodiment of the present application.
  • the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a DPB 207.
  • the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
  • the video decoder 200 may perform a decoding process that is generally reciprocal to the encoding process described with respect to the video encoder 100 of FIG. 2.
  • the video decoder 200 receives from the video encoder 100 an encoded video codestream representing image blocks of the encoded video slice and associated syntax elements.
  • the video decoder 200 may receive video data from the network entity 42, and optionally, the video data may also be stored in a video data storage (not shown in the figure).
  • the video data memory may store video data to be decoded by components of the video decoder 200, such as an encoded video code stream.
  • the video data stored in the video data storage can be obtained, for example, from a local video source such as the storage device 40, from a camera, via wired or wireless network communication of the video data, or by accessing a physical data storage medium.
  • the video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video code stream. Therefore, although the video data memory is not illustrated in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories.
  • the video data memory and the DPB 207 can be formed by any of a variety of memory devices, for example: dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the video data storage may be integrated on the chip with other components of the video decoder 200, or provided off-chip relative to those components.
  • the network entity 42 may be, for example, a server, MANE, video editor / splicer, or other such device for implementing one or more of the techniques described above.
  • the network entity 42 may or may not include a video encoder, such as video encoder 100.
  • the network entity 42 may implement some of the techniques described in this application.
  • the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
  • the network entity 42 may be an example of the storage device 40 of FIG. 1.
  • the entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements.
  • the entropy decoder 203 forwards the syntax element to the prediction processing unit 208.
  • the video decoder 200 may receive syntax elements at the video slice level and / or the image block level.
  • the intra-predictor 209 of the prediction processing unit 208 may generate the prediction block of the image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks.
  • the inter-predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, the inter prediction mode used to decode the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode.
  • the inter predictor 210 may determine whether to use a new inter prediction mode for prediction of the current image block of the current video slice; if the syntax elements indicate that a new inter prediction mode is used to predict the current image block, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element or a default new inter prediction mode), the motion information of the current image block of the current video slice or of a sub-block of the current image block, and then uses the predicted motion information of the current image block or the sub-block in a motion compensation process to obtain or generate the prediction block of the current image block or the sub-block of the current image block.
  • the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, reference image list number and reference image index corresponding to the reference image list.
  • a prediction block may be generated from one of the reference images within one of the reference image lists.
  • the video decoder 200 may construct reference image lists, that is, list 0 and list 1, based on the reference images stored in the DPB 207.
  • the reference frame index of the current image may be included in one or more of reference frame list 0 and list 1.
  • the video encoder 100 may signal, with a specific syntax element of a specific block, whether a new inter prediction mode is adopted to decode that block, or may additionally indicate which new inter prediction mode is used to decode it. It should be understood that the inter predictor 210 here performs a motion compensation process.
  • the inverse quantizer 204 inversely quantizes, i.e., dequantizes, the quantized transform coefficients provided in the code stream and decoded by the entropy decoder 203.
  • the inverse quantization process may include using the quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly determining the degree of inverse quantization that should be applied.
  • the inverse transformer 205 applies an inverse transform to transform coefficients, such as inverse DCT, inverse integer transform, or a conceptually similar inverse transform process, so as to generate a residual block in the pixel domain.
  • After the inter predictor 210 generates the prediction block for the current image block or a sub-block of the current image block, the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210 to obtain the reconstructed block, that is, the decoded image block.
  • Summer 211 represents the component that performs this summation operation.
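The summation performed by summer 211 can be sketched as adding each residual sample to the corresponding prediction sample and clipping to the valid sample range. This is a minimal illustration; the function and variable names are illustrative, not taken from the patent.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Sum prediction and residual samples and clip to the valid range.

    A minimal sketch of the summation step: the clipping convention
    (0 .. 2^bit_depth - 1) is the usual one for video samples.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]

pred = [[120, 130], [140, 150]]
res = [[-5, 10], [200, -160]]
print(reconstruct_block(pred, res))  # [[115, 140], [255, 0]]
```

The clipping matters because the residual can push a sum outside the representable sample range, as the last two samples show.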
  • a loop filter (either in the decoding loop or after the decoding loop) can also be used to smooth pixel transitions or otherwise improve the video quality.
  • the filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • the filter unit 206 is shown as an in-loop filter in FIG. 3, in other implementations, the filter unit 206 may be implemented as a post-loop filter.
  • the filter unit 206 filters the reconstructed block to reduce blocking distortion, and the result is output as a decoded video stream.
  • the DPB 207 may be part of a memory, which may also store decoded video for later presentation on a display device (such as the display device 220 of FIG. 1), or may be separate from such memory.
  • the video decoder 200 may generate an output video stream without processing by the filter unit 206; alternatively, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and accordingly processing by the inverse quantizer 204 and the inverse transformer 205 is not needed.
  • the technology of the present application exemplarily relates to inter-frame decoding.
  • the techniques of this application may be performed by any of the video decoders described in this application, including, for example, the video encoder 100 and the video decoder 200 shown and described with respect to FIGS. 1 to 3. That is, in one possible implementation, the inter predictor 110 described with respect to FIG. 2 may perform specific techniques described below when performing inter prediction during encoding of blocks of video data. In another feasible embodiment, the inter predictor 210 described with respect to FIG. 3 may perform specific techniques described below when performing inter prediction during decoding of blocks of video data. Therefore, a reference to a general "video encoder" or "video decoder" may include the video encoder 100, the video decoder 200, or another video encoding or decoding unit.
  • the inter prediction module 121 may include a motion estimation unit and a motion compensation unit. In different video compression codec standards, the relationship between PU and CU is different.
  • the inter prediction module 121 may divide the current CU into PUs according to multiple division modes. For example, the inter prediction module 121 may partition the current CU into PUs according to 2N×2N, 2N×N, N×2N, and N×N partition modes.
  • the inter prediction module 121 may also divide the current CU into PUs according to the determined size of the basic prediction block; in this scenario, the CU is the image block to be processed and the PU is the basic prediction block. In other embodiments, the current CU is the current PU; this is not limited.
  • the inter prediction module 121 may perform motion estimation on each of the PUs, acquiring its motion vector.
  • motion estimation may include performing integer motion estimation (IME) and then fractional motion estimation (FME).
  • the inter prediction module 121 may search for the reference block for the PU in one or more reference images. After finding the reference block for the PU, the inter prediction module 121 may generate a motion vector indicating the spatial displacement between the PU and the reference block for the PU with integer precision.
  • the inter prediction module 121 may refine the motion vector generated by performing IME on the PU.
  • the motion vector generated by performing FME on the PU may have sub-integer precision (for example, 1/2 pixel precision, 1/4 pixel precision, etc.).
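A common way to represent the precisions mentioned above is to store motion vectors in quarter-pel units, so that an integer-pel MV from IME can be refined by a fractional offset from FME. A minimal sketch; the function names and the quarter-pel convention are illustrative assumptions, not the patent's exact representation.

```python
# Store MVs in quarter-pel units (1 unit = 1/4 pixel), a common codec
# convention consistent with the 1/2- and 1/4-pel precisions above.
def to_quarter_pel(integer_mv):
    # scale an integer-pel MV into quarter-pel units
    return (integer_mv[0] * 4, integer_mv[1] * 4)

def refine(quarter_pel_mv, frac_offset):
    # frac_offset is in quarter-pel units, e.g. (1, -2) = (+1/4, -1/2) pel
    return (quarter_pel_mv[0] + frac_offset[0],
            quarter_pel_mv[1] + frac_offset[1])

mv = to_quarter_pel((3, -1))   # IME result: (12, -4) in quarter-pel units
mv = refine(mv, (2, 1))        # FME refinement: (14, -3), i.e. (3.5, -0.75) pel
print(mv)  # (14, -3)
```

Keeping everything in one integer grid avoids floating-point MVs while still expressing sub-integer displacements.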
  • the inter prediction module 121 may use the motion vector for the PU to generate the predictive image block for the PU.
  • the inter prediction module 121 weights one or more of the motion vector corresponding to the first reference block of the PU, the motion vector corresponding to the second reference block of the PU, and the motion vectors corresponding to original reference blocks having a preset position relationship with the CU, to obtain the motion vector corresponding to the PU.
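The weighting of reference-block motion vectors can be sketched in the style of planar prediction: interpolate horizontally between left and right reference MVs, vertically between above and below reference MVs, and average the two. This is a hedged illustration only; the exact weights and reference positions used by the patent may differ, and all names here are illustrative.

```python
# Planar-style weighting of reference motion vectors for the sub-block
# at position (x, y) in a W x H grid of sub-blocks. Each MV is an
# integer (mvx, mvy) pair, e.g. in quarter-pel units.
def planar_mv(x, y, W, H, left, right, above, below):
    def interp(a, b, wa, wb, scale):
        # weighted average with rounding, applied per MV component
        return tuple((wa * ai + wb * bi + scale // 2) // scale
                     for ai, bi in zip(a, b))
    ph = interp(left, right, W - 1 - x, x + 1, W)    # horizontal interpolation
    pv = interp(above, below, H - 1 - y, y + 1, H)   # vertical interpolation
    return tuple((h + v) // 2 for h, v in zip(ph, pv))

# 4x4 grid of sub-blocks; identical references reproduce the same MV
print(planar_mv(1, 2, 4, 4, (8, 0), (8, 0), (8, 0), (8, 0)))  # (8, 0)
```

With unequal references the result varies smoothly across the block, which is the point of weighting per-sub-block rather than assigning one MV to the whole block.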
  • the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
  • the candidate predicted motion vector list may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vector.
  • the inter prediction module 121 may select the candidate predicted motion vector from the candidate predicted motion vector list and generate a motion vector difference (MVD) for the PU.
  • the MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate prediction motion vector and the motion vector generated for the PU using IME and FME.
  • the inter prediction module 121 may output a candidate predicted motion vector index that identifies the position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
  • the inter prediction module 121 may also output the MVD of the PU.
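The candidate-index-plus-MVD signaling described above can be sketched as follows: the encoder picks a predictor from the candidate list and transmits its index together with the difference to the actual MV; the decoder adds the MVD back. The candidate values and the selection criterion (smallest MVD magnitude) are illustrative.

```python
# Encoder side: MVD = final MV minus the selected predictor.
def encode_mvd(mv, mvp):
    return (mv[0] - mvp[0], mv[1] - mvp[1])

# Decoder side: MV = predictor (looked up by index) plus MVD.
def decode_mv(mvp, mvd):
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

candidates = [(8, -4), (12, 0)]   # candidate predicted motion vector list
mv = (10, -3)                     # MV obtained via IME and FME

# one possible criterion: pick the candidate giving the smallest MVD
idx = min(range(len(candidates)),
          key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
mvd = encode_mvd(mv, candidates[idx])
assert decode_mv(candidates[idx], mvd) == mv   # round-trip check
print(idx, mvd)  # 0 (2, 1)
```

Only `idx` and `mvd` need to be signaled, which is cheaper than transmitting the full MV when the predictor is close.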
  • the inter prediction module 121 may also perform a merge operation on each of the PUs.
  • the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
  • the candidate prediction motion vector list for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vector.
  • the original candidate predicted motion vectors in the candidate predicted motion vector list may include one or more spatial candidate predicted motion vectors and temporal candidate predicted motion vectors.
  • the spatial candidate prediction motion vector may indicate the motion information of other PUs in the current image.
  • the temporal candidate prediction motion vector may be based on motion information of a corresponding PU in an image different from the current image.
  • the temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
  • the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the inter prediction module 121 can select between the predictive image block generated through the FME operation and the predictive image block generated through the merge operation. In some feasible embodiments, the inter prediction module 121 may select the predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
  • the inter prediction module 121 can select a partition mode for the current CU. In some embodiments, the inter prediction module 121 may select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by dividing the current CU according to each of the partition modes. The inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to the residual generation module 102, and may output syntax elements indicating the motion information of those PUs to the entropy encoding module.
  • the inter prediction module 121 includes IME modules 180A to 180N (collectively referred to as "IME module 180"), FME modules 182A to 182N (collectively referred to as "FME module 182"), merge modules 184A to 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A to 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a mode decision process from CTU to CU).
  • the IME module 180, the FME module 182, and the merge module 184 may perform IME operations, FME operations, and merge operations on the PU of the current CU.
  • the inter prediction module 121 is illustrated in the schematic diagram of FIG. 4 as including a separate IME module 180, FME module 182, and merge module 184 for each PU for each partition mode of the CU. In other possible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU.
  • the IME module 180A, the FME module 182A, and the merge module 184A may perform the IME operation, FME operation, and merge operation on the PU generated by dividing the CU according to the 2N×2N division mode.
  • the PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
  • the IME module 180B, the FME module 182B, and the merge module 184B may perform the IME operation, FME operation, and merge operation on the left PU generated by dividing the CU according to the N×2N division mode.
  • PU mode decision module 186B may select one of the predictive image blocks generated by IME module 180B, FME module 182B, and merge module 184B.
  • the IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on the right PU generated by dividing the CU according to the N×2N division mode.
  • the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
  • the IME module 180N, the FME module 182N, and the merge module 184N may perform the IME operation, FME operation, and merge operation on the lower-right PU generated by dividing the CU according to the N×N division mode.
  • the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
  • the PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of multiple possible predictive image blocks, choosing the predictive image block that provides the best rate-distortion cost for a given decoding situation. Exemplarily, for bandwidth-limited applications, the PU mode decision module 186 may prefer predictive image blocks that increase the compression ratio, while for other applications it may prefer predictive image blocks that increase the quality of the reconstructed video. After the PU mode decision module 186 selects the predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partition mode.
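The rate-distortion selection described above is commonly formulated as minimizing J = D + λ·R over the candidate modes. A minimal sketch; the candidate names, distortion values, rate values, and λ are illustrative numbers, not data from the patent.

```python
# Mode decision by rate-distortion cost J = D + lambda * R:
# among the candidate predictive blocks, pick the lowest-cost one.
def select_by_rd_cost(candidates, lmbda):
    # candidates: list of (name, distortion, rate_in_bits) tuples
    return min(candidates, key=lambda c: c[1] + lmbda * c[2])

candidates = [
    ("FME",   100.0, 40),   # lower distortion, more bits (MVD + index)
    ("merge", 130.0, 5),    # higher distortion, very cheap to signal
]
print(select_by_rd_cost(candidates, lmbda=2.0)[0])  # merge
```

A larger λ (bandwidth-limited case) tips the decision toward cheap-to-signal modes such as merge; a smaller λ favors the lower-distortion FME result, matching the trade-off described above.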
  • FIG. 5 shows a schematic diagram of an exemplary image block to be processed and its reference block in the embodiment of the present application.
  • W and H are the width and height of the image block 500 to be processed and of its co-located block (referred to simply as the mapped image block) 500' in the designated reference image of the image block to be processed.
  • the content marked in each block in FIG. 5 is the motion vector corresponding to the block.
  • P (x, y) marked in FIG. 5 is the motion vector corresponding to the basic prediction block 604 in the image block 500 to be processed.
  • the reference blocks of the image block to be processed include: the upper spatial adjacent block and the left spatial adjacent block of the image block to be processed, and the lower spatial adjacent block and the right spatial adjacent block of the mapped image block, where the mapped image block is the co-located block of the image block to be processed in the designated reference image.
  • the lower spatial neighboring block and the right spatial neighboring block of the mapped image block may also be referred to as temporal reference blocks.
  • Each frame of image can be divided into image blocks for encoding, called image blocks to be processed, and these image blocks can be further divided into smaller blocks, called basic prediction blocks.
  • the image block to be processed and the mapped image block can each be divided into multiple M×N sub-blocks, that is, each sub-block is M×N pixels in size, and the size of each reference block may also be set to M×N pixels; that is, each reference block has the same size as a sub-block of the image block to be processed.
  • M×N and M-by-N are used interchangeably to refer to the pixel size of an image sub-block in terms of its horizontal and vertical dimensions, that is, M pixels in the horizontal direction and N pixels in the vertical direction, where M and N are non-negative integers. In addition, M and N are not necessarily the same.
  • the sub-block size of the image block to be processed and the size of the reference block may be 4×4, 8×8, 8×4, or 4×8 pixels, or the smallest prediction block size allowed by the standard.
  • the measurement units of W and H are the width and height of the sub-block, respectively; that is, W represents the ratio of the width of the image block to be processed to the width of a sub-block in the image block to be processed, and H represents the ratio of the height of the image block to be processed to the height of a sub-block in the image block to be processed.
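Measuring W and H in sub-block units can be shown with a short example; the numbers (a 16×8 block with 4×4 sub-blocks) are illustrative.

```python
# W and H measured in sub-blocks: for a 16x8 image block with 4x4
# sub-blocks, W = 16 / 4 = 4 and H = 8 / 4 = 2.
def block_dims_in_subblocks(block_w, block_h, sub_w, sub_h):
    # the block dimensions are assumed to be multiples of the sub-block
    assert block_w % sub_w == 0 and block_h % sub_h == 0
    return block_w // sub_w, block_h // sub_h

W, H = block_dims_in_subblocks(16, 8, 4, 4)
print(W, H)  # 4 2
```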
  • the image blocks to be processed described in the present application may be understood as, but not limited to: a prediction unit (PU), a coding unit (CU), a transform unit (TU), etc.
  • the CU may contain one or more prediction units PU, or the PU and CU have the same size.
  • the image block to be processed may have a fixed or variable size, and differ in size according to different video compression codec standards.
  • the image block to be processed refers to an image block to be encoded or to be decoded currently, for example, a prediction unit to be encoded or to be decoded.
  • the image block to be processed may be a part or all of the image to be processed, which is not specifically limited in this application.
  • In a feasible implementation, it is determined sequentially along direction 1 whether each left-side spatial neighboring block of the image block to be processed is available, and it is determined sequentially along direction 2 whether each upper-side spatial neighboring block of the image block to be processed is available. For example, it is determined whether the neighboring block is inter-coded: if the neighboring block exists and is inter-coded, it is available; if it does not exist or is intra-coded, it is unavailable. In a feasible implementation, if a neighboring block is intra-coded, the motion information of another available neighboring reference block is copied as the motion information of that block. A similar method is used to detect whether the lower and right spatial neighboring blocks of the mapped image block are available, which will not be repeated here.
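The availability scan above can be sketched as follows. The data layout (a list of neighbor records in scan order) and field names are illustrative assumptions; the copy-from-a-neighbor fallback follows the feasible implementation described in the text.

```python
# A neighbor is available if it exists and is inter-coded; an
# intra-coded neighbor gets the motion information of the most
# recently seen available neighbor copied in (one possible policy).
def scan_availability(neighbors):
    last_mv = None
    available = []
    for nb in neighbors:
        if nb["exists"] and nb["inter"]:
            last_mv = nb["mv"]
            available.append(True)
        elif nb["exists"] and last_mv is not None:
            nb["mv"] = last_mv          # copy from an available neighbor
            available.append(True)
        else:
            available.append(False)
    return available

nbs = [
    {"exists": True,  "inter": True,  "mv": (4, 0)},  # inter: available
    {"exists": True,  "inter": False, "mv": None},    # intra: MV copied
    {"exists": False, "inter": False, "mv": None},    # absent: unavailable
]
print(scan_availability(nbs))  # [True, True, False]
```

The same routine would be run once along direction 1 and once along direction 2, and again for the temporal reference blocks of the mapped image block.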
  • the motion information is stored using a 4×4 pixel set as the basic unit for storing motion information.
  • a pixel set of 2×2, 8×8, 4×8, 8×4, etc. may also be used as the basic unit for storing motion information.
  • the basic unit storing motion information may be simply referred to as the basic storage unit.
  • the motion information stored in the basic storage unit corresponding to the reference block can be directly obtained as the motion information corresponding to the reference block.
  • the motion information stored in the corresponding basic storage unit at the predetermined position of the reference block may be acquired.
  • For example, the motion information stored in the basic storage unit corresponding to the upper-left corner of the reference block may be obtained, or the motion information stored in the basic storage unit corresponding to the center point of the reference block may be obtained, as the motion information corresponding to the reference block.
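Fetching motion information from the basic storage unit grid at a predetermined block position can be sketched as indexing a 4×4-granularity grid at either the block's top-left or center sample. The grid layout and function names are illustrative.

```python
# Motion information lives on a grid of basic storage units (here
# 4x4 pixels each); fetch the unit covering a predetermined position
# of the reference block, such as its top-left or center sample.
def fetch_motion_info(mv_grid, block_x, block_y, block_w, block_h,
                      unit=4, position="center"):
    if position == "center":
        x, y = block_x + block_w // 2, block_y + block_h // 2
    else:  # "top_left"
        x, y = block_x, block_y
    return mv_grid[y // unit][x // unit]

# a 16x16 area -> 4x4 grid of storage units, each holding one MV
grid = [[(c, r) for c in range(4)] for r in range(4)]
print(fetch_motion_info(grid, 0, 0, 8, 8))                        # (1, 1)
print(fetch_motion_info(grid, 8, 8, 8, 8, position="top_left"))   # (2, 2)
```

For an 8×8 block at (0, 0), the center sample (4, 4) falls in storage unit (1, 1); choosing the top-left sample instead would give unit (0, 0), which is why the predetermined position must be agreed between encoder and decoder.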
  • the sub-blocks of the image to be processed are also called basic prediction blocks.
  • FIG. 6 exemplarily shows an inter prediction method provided by an embodiment of the present application.
  • the motion vectors of the basic prediction blocks inside the image block to be processed are obtained by weighting the motion vectors corresponding to the reference blocks of the image block to be processed.
  • the schematic flowchart of the method may include:
  • S601. Determine the size of the basic prediction block in the image block to be processed according to the size reference information, where the size is used to determine the position of the basic prediction block in the image block to be processed.
  • the image block to be processed is the image block currently processed by the encoder or the decoder, and is hereinafter referred to as the image block to be processed or the current image block to be processed.
  • the size reference information may be the shape of the basic prediction block, and the size of the basic prediction block in the image block to be processed may be a fixed value predetermined according to the size reference information and configured separately at the encoder side and the decoder side.
  • the correspondence between different shapes and size values can be configured according to actual needs, which is not specifically limited in the embodiments of the present application.
  • When two adjacent sides of the basic prediction block are unequal, the side length of the shorter side of the basic prediction block is determined to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, that is, the basic prediction block is a square, the side length of the basic prediction block is determined to be 4 or 8. It should be understood that the side length of 4 or 8 is just an example value, and may also be another constant such as 16 or 24.
  • the size reference information may be a first identifier, and the first identifier is used to indicate the size of the basic prediction block.
  • the size of the basic prediction block in the image block to be processed is determined by the first identifier obtained from the code stream.
  • S601 may be implemented as: receiving a code stream, and parsing the first identifier from the code stream.
  • the first identifier is located in the code stream segment corresponding to any one of: the sequence parameter set (SPS) of the sequence containing the image block to be processed, the picture parameter set (PPS) of the image containing the image block to be processed, and the slice header of the slice containing the image block to be processed.
  • the first identifier may be a syntax element.
  • S601 may be specifically implemented as: parsing the corresponding syntax element from the code stream, and then determining the size of the basic prediction block.
  • the syntax element may be carried in the code stream part corresponding to the SPS in the code stream, may also be carried in the code stream part corresponding to the PPS in the code stream, and may also be carried in the code stream part corresponding to the stripe header in the code stream.
  • When the first identifier is parsed from the SPS to determine the size of the basic prediction block, the basic prediction blocks in the entire sequence adopt the same size; when the first identifier is parsed from the PPS to determine the size of the basic prediction block, the basic prediction blocks in the entire image frame use the same size; when the first identifier is parsed from the slice header to determine the size of the basic prediction block, the basic prediction blocks in the entire slice use the same size.
  • An image here includes an image that exists in the form of an entire frame (that is, an image frame), an image that exists in the form of a slice, an image that exists in the form of a tile, or an image that exists in the form of another sub-image; this is not limited.
  • the first identifier is not present in the slice header of the slice using intra prediction.
  • the encoding end determines the size of the basic prediction block in an appropriate manner (for example, by rate-distortion selection, by experimental experience values, or in ways other than the first-identifier-based determination of the basic prediction block size in this embodiment), encodes the determined size into the code stream as the first identifier, and the decoder parses the first identifier from the code stream to determine the size of the basic prediction block.
  • the size reference information may include historical information, and the size of the basic prediction block in the image block to be processed is determined by the historical information, so the size can be derived adaptively at the encoder side and the decoder side respectively.
  • the history information refers to the information of the image block that has been coded and decoded before the current image block to be processed.
  • the history information may include the size of the plane mode prediction block in the previously reconstructed image.
  • the size of the basic prediction block may be determined according to the size of the plane mode prediction block in the previously reconstructed image.
  • the plane mode prediction block is an image block on which inter prediction is performed according to the inter prediction method provided by this embodiment, and the previously reconstructed image is an image located before the image containing the current image block to be processed.
  • the image block to be processed using the inter prediction method (for example, the method shown in FIG. 6) described in the embodiments of the present application may be referred to as a planar mode prediction block.
  • the size of the basic prediction block in the image containing the current image block to be processed (hereinafter, the current image) may be estimated according to statistics of the sizes of plane mode prediction blocks in previously encoded images.
  • the image encoding order at the encoding end and the image decoding order at the decoding end are the same. Therefore, the previously reconstructed image, an image whose encoding order is before that of the image containing the image block to be processed, can also be described as an image whose decoding order is before that of the image containing the image block to be processed.
  • the size of the basic prediction block of the current image can be determined as follows: calculate the average of the products of the width and height of all plane mode prediction blocks in the previously reconstructed image; when the average is less than a threshold, determine the size of the basic prediction block as the first size; when the average is greater than or equal to the threshold, determine the size of the basic prediction block as the second size. Here, the first size is smaller than the second size.
  • The first size and the second size can be configured according to actual needs; this embodiment of the present application does not specifically limit this.
  • the first size is smaller than the second size, and it can be understood that the area of the first size is smaller than the area of the second size.
  • the relationship between the first size and the second size may include: a first size of 4 and a second size of 8 (square side lengths), a first size of 4×4 and a second size of 8×8, a first size of 4×4 and a second size of 4×8, a first size of 4×8 and a second size of 8×8, or a first size of 4×8 and a second size of 8×16; this is not limited.
  • the above threshold is set in advance.
  • the threshold value and the determination rule are not specifically limited, and can be configured according to actual needs.
  • When the POCs of all reference frames of the image containing the image block to be processed are smaller than the POC of that image, the above threshold is the first threshold; when the POC of at least one reference frame of that image is greater than the POC of that image, the above threshold is the second threshold.
  • the first threshold and the second threshold are different.
  • In the former case, the threshold is set to the first value.
  • the first value may be set to 75.
  • In the latter case, the threshold is set to a second value; for example, the second value may be set to 27. It should be understood that the settings of the first value and the second value are not limited.
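The adaptive size rule above can be sketched end to end: average the width×height of the planar mode prediction blocks in the previously reconstructed image and compare it against a threshold chosen from the reference-frame POCs. The values 75 and 27 and the sizes 4 and 8 follow the examples in the text; the function and argument names, and the fallback for invalid statistics, are illustrative.

```python
# Pick the basic prediction block size from prior-image statistics.
def pick_basic_block_size(planar_blocks, ref_pocs, cur_poc,
                          first_size=4, second_size=8):
    if not planar_blocks:              # statistics invalid: use a preset
        return first_size
    avg = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    # all reference POCs before the current POC -> first threshold (75),
    # otherwise (at least one reference ahead) -> second threshold (27)
    low_delay = all(poc < cur_poc for poc in ref_pocs)
    threshold = 75 if low_delay else 27
    return first_size if avg < threshold else second_size

blocks = [(8, 8), (4, 4), (16, 8)]    # planar blocks in the prior image
print(pick_basic_block_size(blocks, ref_pocs=[3, 5], cur_poc=8))  # 4
```

Here the average is (64 + 16 + 128) / 3 ≈ 69.3, which is below the low-delay threshold of 75, so the first size is chosen; with a reference POC ahead of the current image, the threshold drops to 27 and the same statistics would select the second size.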
  • the previously reconstructed image is the reconstructed image whose encoding order is closest to the image containing the image block to be processed; that is, the previously reconstructed image is the reconstructed image whose decoding order is closest to the image containing the current image block to be processed.
  • The statistical information of all the plane mode prediction blocks in the previously encoded/decoded frame of the current image frame (exemplarily, the average of the products of the width and height of all plane mode prediction blocks) is used as the size reference information; or, the statistical information of all the plane mode prediction blocks in the previous slice of the current slice is used as the size reference information.
  • The size of the basic prediction block in the current image frame can be determined according to the statistical information of all plane mode prediction blocks in the previously encoded/decoded frame of the current image frame; or, the size of the basic prediction block in the current slice can be determined based on the statistical information of all plane mode prediction blocks in the previous slice.
  • the image may also include other forms of sub-images, so it is not limited to image frames and slices.
  • the statistical information is updated in units of image frames or slices, that is, updated once per image frame or per slice.
  • In a feasible implementation, the previously reconstructed image is, among images having the same temporal layer identifier as the image containing the current image block, the reconstructed image whose encoding order, that is, decoding order, is closest to that image.
  • That is, the image closest to the current image in encoding order is determined from among the images having the same temporal ID as the image containing the current image block.
  • the previously reconstructed image may be multiple images.
  • In this case, calculating the average of the products of the width and height of all the planar mode prediction blocks in the previously reconstructed image may include: calculating the average of the products of the width and height of all plane mode prediction blocks across the multiple previously reconstructed images.
  • The above two feasible implementations each determine the size of the basic prediction block of the current image block to be processed according to the statistics of a single previously reconstructed image; in this implementation, the statistics of multiple previously reconstructed images are accumulated to determine the size of the basic prediction block of the current image block to be processed. That is, in this embodiment, the statistical information is updated in units of multiple image frames or multiple slices, that is, every preset number of image frames or every preset number of slices, or the statistical information may keep accumulating without being updated.
  • Calculating the average of the products of the width and height of all plane mode prediction blocks in the multiple previously reconstructed images may include: separately computing, for each of the multiple previously reconstructed images, the average of the products of the width and height of all plane mode prediction blocks in that image, and then weighting these averages to obtain a final average for comparison with the threshold of this embodiment; or, accumulating the products of the width and height of all plane mode prediction blocks across the previously reconstructed images and dividing by the total number of plane mode prediction blocks to obtain an average for comparison with the above threshold of this embodiment.
  • Before calculating the average of the width-height products of all planar-mode prediction blocks in the previously reconstructed image, it is also determined whether the statistical information is valid. For example, if the previously reconstructed image contains no planar-mode prediction block, the average cannot be calculated and the statistical information is invalid. In that case, the statistical information may be left un-updated, or the size of the basic prediction block of the current image block to be processed may be set to a preset value. Exemplarily, for a square basic prediction block, when the statistical information is invalid, the size of the basic prediction block of the current image block to be processed may be set to 4×4.
  • the size of the basic prediction block may also be set to a preset value.
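As a sketch of the statistics-based size decision above: the helper below averages the width-height products of the planar-mode prediction blocks of a previously reconstructed image, falls back to a preset 4×4 size when the statistics are invalid (no planar-mode blocks), and compares the average against a threshold. The function name, the threshold of 64, and the 8×8/4×4 outcomes are illustrative assumptions, not values taken from this application.

```python
def basic_block_size_from_stats(planar_blocks, default=(4, 4), threshold=64):
    """Derive the basic prediction block size from statistics of a
    previously reconstructed image.  `planar_blocks` is a list of
    (width, height) pairs of all planar-mode prediction blocks."""
    if not planar_blocks:            # no planar blocks: statistics invalid,
        return default               # fall back to the preset 4x4 size
    avg = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    # Compare the average width-height product against a threshold
    # (threshold and resulting sizes are illustrative assumptions).
    return (8, 8) if avg >= threshold else (4, 4)
```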
  • determining the size of the basic prediction block in the image block to be processed in S601 further includes determining the shape of the basic prediction block.
  • When the image block to be processed is square, the basic prediction block may also be square; alternatively, the aspect ratio of the basic prediction block may equal that of the image block to be processed; alternatively, the width and the height of the image block to be processed may each be divided into several equal parts to obtain the width and the height of the basic prediction block; or, the shape of the basic prediction block may be unrelated to the shape of the image block to be processed.
  • For example, the basic prediction block may be fixed as a square; or, when the size of the image block to be processed is 32×16, the basic prediction block may be set to 16×8 or 8×4, etc., without limitation.
  • The inter prediction method provided by the present application may further include: determining the prediction direction of the image block to be processed.
  • Determining the prediction direction of the image block to be processed may include: when first-direction prediction is valid and second-direction prediction is invalid, or second-direction prediction is valid and first-direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when first-direction prediction is valid and second-direction prediction is also valid, the prediction direction of the image block to be processed is bidirectional prediction.
  • The first-direction prediction and the second-direction prediction refer to predictions in two different directions, without specifically limiting which direction each is.
  • the first prediction may be a forward prediction
  • the second prediction may be a backward prediction
  • the first prediction may be a backward prediction
  • the second prediction may be a forward prediction.
  • Regarding validity: when at least one temporary image block in the adjacent region of the image block to be processed obtains a motion vector using the first reference frame image list, first-direction prediction is valid; when no temporary image block in the adjacent region obtains a motion vector using the first reference frame image list, first-direction prediction is invalid. Likewise, when at least one temporary image block in the adjacent region obtains a motion vector using the second reference frame image list, second-direction prediction is valid; when no temporary image block in the adjacent region obtains a motion vector using the second reference frame image list, second-direction prediction is invalid.
  • the temporary image block is an image block with a preset size.
  • the value of the preset size may be determined according to actual requirements, which is not specifically limited in the embodiments of the present application.
  • the motion vector may include a first motion vector and / or a second motion vector.
  • the first motion vector corresponds to the first reference frame image list.
  • When the first motion vectors of all the temporary image blocks in the adjacent region of the image block to be processed that use the first reference frame image list are the same, first-direction prediction is invalid. The second motion vector corresponds to the second reference frame image list; when the second motion vectors of all the temporary image blocks in the adjacent region that use the second reference frame image list are the same, second-direction prediction is invalid.
  • When a temporary image block obtains its motion vector using only the first reference frame image list, its first motion vector is that motion vector; when a temporary image block obtains its motion vector using only the second reference frame image list, its second motion vector is that motion vector.
  • The adjacent region of the image block to be processed may include one or any combination of: the left spatial region, the upper spatial region, the right temporal region, and the lower temporal region of the image block to be processed. Alternatively, the adjacent region of the image block to be processed may include one or any combination of: the left spatial region, the upper spatial region, the lower-left spatial region, the upper-right spatial region, the right temporal region, and the lower temporal region of the image block to be processed.
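The validity rules above can be sketched as follows, assuming each temporary image block is represented as a dict with optional 'mv0'/'mv1' entries for motion vectors obtained from the first and second reference frame image lists (the field names, and the combination of the existence rule with the all-identical refinement, are assumptions):

```python
def prediction_direction(temp_blocks):
    """Determine the prediction direction of the block to be processed
    from the motion vectors of its temporary image blocks."""
    def valid(key):
        mvs = [b[key] for b in temp_blocks if key in b]
        # Invalid when no temporary block uses this list, or when all
        # motion vectors obtained from this list are identical.
        return len(set(mvs)) > 1
    first, second = valid('mv0'), valid('mv1')
    if first and second:
        return 'bidirectional'
    if first or second:
        return 'unidirectional'
    return 'none'   # neither direction valid (case not covered above)
```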
  • The size reference information may include the prediction direction of the image block to be processed and/or the shape information of the image block to be processed.
  • the shape information may include height and width.
  • determining the size of the basic prediction block in the image block to be processed in S601 may include: determining the size of the basic prediction block according to the prediction direction and / or shape information of the image block to be processed.
  • Determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may include: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels and the height is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 4 pixels, or, alternatively, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • Determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may also be implemented as follows: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction is bidirectional prediction and the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may include: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, the height is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height is 8 pixels.
  • Alternatively, determining the size of the basic prediction block according to the prediction direction of the image block to be processed includes: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels and the height is 8 pixels; when the prediction direction is unidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the prediction direction is unidirectional prediction and the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • Determining the size of the basic prediction block according to the shape information of the image block to be processed in S601 may include: when the width of the image block to be processed is greater than or equal to its height, the width of the basic prediction block is 8 pixels and the height is 4 pixels; when the width of the image block to be processed is smaller than its height, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
  • the width of the basic prediction block may be 8 pixels, and the height may be 8 pixels.
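One of the mappings above (unidirectional prediction gives a 4×4 basic block; bidirectional prediction gives an 8×4 or 4×8 block oriented by the shape of the image block to be processed) can be sketched as:

```python
def basic_block_size(direction, width, height):
    """Pick the basic prediction block (width, height) from the
    prediction direction and the shape of the image block to be
    processed.  Implements one of the variants described in the text."""
    if direction == 'uni':
        return (4, 4)                       # unidirectional: 4x4
    # Bidirectional: orient the 8x4 block along the longer side.
    return (8, 4) if width >= height else (4, 8)
```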
  • the content of the size reference information is not specifically limited.
  • The above describes several different kinds of size reference information and provides, for each kind, a corresponding specific implementation of S601. Exemplarily, the size reference information may also be a combination of the above kinds; in that case, the specific implementation of S601 is not repeated.
  • the size reference information may be the first identifier and the prediction method of the image block to be processed, the first identifier indicates the value range of the size of the basic prediction block, and then the value range is determined according to the prediction method of the image block to be processed The size of the basic prediction block is determined within.
  • Alternatively, the size reference information may be the first identifier and the shape of the image block to be processed; the first identifier indicates a value range for the size of the basic prediction block, and the size of the basic prediction block is then determined within that range according to the shape of the image block to be processed.
  • the inter prediction method provided by the present application further includes:
  • S602 Divide the image block to be processed into a plurality of basic prediction blocks according to the size of the basic prediction block; determine the position of each basic prediction block in the image block to be processed in turn.
  • The positions of the image block to be processed and of the basic prediction blocks both exist in the form of coordinates, so this step only needs to determine the coordinates of each basic prediction block; in other words, the image block to be processed and the basic prediction blocks are distinguished logically, and there is no physical division step.
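Since the division in S602 is logical rather than physical, it amounts to enumerating coordinates. A minimal sketch (coordinates are relative to the top-left of the image block to be processed):

```python
def basic_block_positions(block_w, block_h, bw, bh):
    """Enumerate the top-left coordinate of every basic prediction
    block of size (bw, bh) inside an image block of size
    (block_w, block_h), row by row, left to right."""
    return [(x, y)
            for y in range(0, block_h, bh)
            for x in range(0, block_w, bw)]
```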
  • S603 Determine the first reference block and the second reference block of each basic prediction block according to the position of each basic prediction block.
  • the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction unit
  • the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction unit
  • The first reference block is adjacent to the upper boundary line of the image block to be processed.
  • the second reference block is adjacent to the left boundary of the image block to be processed.
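The collinearity and adjacency constraints above pin down the coordinates of both reference blocks. A sketch in frame coordinates, assuming the reference blocks sit in the 1-pixel row above and 1-pixel column left of the image block (an assumption consistent with the A(x, -1) / L(-1, y) notation used later):

```python
def reference_block_positions(block_x, block_y, bp_x, bp_y):
    """(block_x, block_y): top-left of the image block to be processed
    in the frame; (bp_x, bp_y): top-left of a basic prediction block
    relative to that image block.  Returns the positions of the first
    and second reference blocks."""
    # First reference block: shares the basic block's left boundary
    # column, adjacent to the upper boundary of the image block.
    first = (block_x + bp_x, block_y - 1)
    # Second reference block: shares the basic block's upper boundary
    # row, adjacent to the left boundary of the image block.
    second = (block_x - 1, block_y + bp_y)
    return first, second
```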
  • S604: Perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
  • the operation of S604 is performed on each basic prediction block divided in S602, and the motion vector corresponding to each basic prediction block is obtained.
  • the process of performing S604 for each basic prediction block is the same, and will not be repeated one by one.
  • The original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: the image block located at the upper-left corner of the image block to be processed and adjacent to its upper-left corner point; the image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point; and the image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point.
  • the original reference block having a preset spatial domain position relationship with the image block to be processed is located outside the image block to be processed, and may be simply referred to as the spatial reference block.
  • The original reference block having a preset temporal position relationship with the image block to be processed may include: the image block located at the lower-right corner of the mapped image block in the target reference frame and adjacent to the lower-right corner point of the mapped image block.
  • The original reference block having the preset temporal position relationship with the image block to be processed is located outside the mapped image block. The mapped image block has the same size as the image block to be processed, and its position in the target reference frame is the same as the position of the image block to be processed in its own image frame. This original reference block may be referred to simply as the temporal reference block.
  • The index information and reference frame list information of the target reference frame can be obtained by parsing the code stream; that is, the code stream includes the index information of the target reference frame and the reference frame list information, and the target reference frame is determined by looking up its index information in the reference frame list information.
  • the index information and reference frame list information of the target reference frame may be located in the code stream segment corresponding to the slice header of the slice where the image block to be processed is located.
  • S603 and S604 may include the following steps:
  • the motion vector corresponding to the first reference block is A (x, -1), and the motion vector corresponding to the second reference block is L (-1, y).
  • AR is the motion vector corresponding to the image block (spatial reference block 805) located in the upper right corner of the image block 600 to be processed and adjacent to the upper right corner of the image block 600 to be processed
  • BR is the motion vector corresponding to the temporal reference block at the lower-right corner of the mapped image block in the target reference frame.
  • H is the ratio of the height of the image block to be processed 600 to the height of the basic prediction block 604
  • y is the vertical coordinate of the upper-left corner of the basic prediction block 604 within the image block to be processed.
  • the index information and reference frame list information of the target reference frame are parsed and obtained from the slice header, and the target reference frame is determined.
  • step S702A and step S702B do not limit the execution order relationship, and may be sequential or simultaneous.
  • S703B: Perform a weighted calculation based on the motion vector corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
  • step S703A and step S703B do not limit the execution order relationship.
  • the motion vector P (x, y) corresponding to the basic prediction block 604 may also be obtained by a single formula combining the above steps.
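The per-step derivation above can indeed be collapsed into a single formula. The sketch below follows the usual planar motion vector interpolation form, combining a horizontal interpolation P_h and a vertical interpolation P_v with weights H and W; the exact weights used by this application are not quoted in the text, so treat the formula as an assumption:

```python
def planar_mv(A, L, R, B, x, y, W, H):
    """Planar interpolation of a 2-D motion vector for the basic
    prediction block at position (x, y).  A = A(x,-1), L = L(-1,y),
    R = R(W,y), B = B(x,H) are motion vectors; W and H are the block
    width/height in basic-prediction-block units."""
    # Horizontal interpolation between the left and right columns.
    Ph = tuple((W - 1 - x) * l + (x + 1) * r for l, r in zip(L, R))
    # Vertical interpolation between the top and bottom rows.
    Pv = tuple((H - 1 - y) * a + (y + 1) * b for a, b in zip(A, B))
    # Combine, weighting the horizontal term by H and the vertical by W.
    return tuple((H * ph + W * pv) / (2.0 * W * H)
                 for ph, pv in zip(Ph, Pv))
```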
  • S603 and S604 may include the following steps:
  • the motion vector corresponding to the first reference block is A (x, -1), and the motion vector corresponding to the second reference block is L (-1, y).
  • The motion vector corresponding to the spatial reference block 805 at the upper-right corner of the image block 600 to be processed is used as the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed. The motion vector corresponding to the spatial reference block 801 at the lower-left corner of the image block 600 to be processed is used as the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed.
  • step S802A and step S802B do not limit the execution order relationship. It can be executed one after another or at the same time.
  • S803A: Perform a weighted calculation based on the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed and the motion vector corresponding to the second reference block 802 of the image block 600 to be processed, to obtain the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604.
  • S803B: Perform a weighted calculation based on the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
  • S603 and S604 may include the following steps:
  • the motion vector corresponding to the first reference block is A (x, -1), and the motion vector corresponding to the second reference block is L (-1, y).
  • S902. Determine the first temporary block 806 and the second temporary block 808 according to the position of the basic prediction block 604 in the image block 600 to be processed.
  • The first temporary block is the image block at the position of block 806 of the mapped image block in the target reference frame, and the second temporary block is the image block at the position of block 808 of the mapped image block in the target reference frame; both the first temporary block and the second temporary block are temporal reference blocks.
  • step S903A and step S903B do not limit the execution order relationship.
  • S603 and S604 may include the following steps:
  • S0101 Determine the first reference block 809 and the second reference block 802 according to the position of the basic prediction block 604 in the image block 600 to be processed.
  • the motion vector corresponding to the first reference block is A (x, -1), and the motion vector corresponding to the second reference block is L (-1, y).
  • S0102. Perform motion compensation according to the motion information of any spatial reference block of the image block 600 to be processed, and determine the reference frame information and the position of the motion compensation block.
  • The spatial reference block may be any one of the available spatial neighboring blocks among the left spatial neighboring blocks or the upper spatial neighboring blocks shown in FIG. 5. For example, it may be the first available left spatial neighboring block detected along direction 1 in FIG. 5, or the first available upper spatial neighboring block detected along direction 2 in FIG. 5. It may also be the first available spatial neighboring block detected among multiple preset spatial reference blocks of the image block 600 to be processed in a preset order, such as the sequence L→A→AR→BL→AL shown in FIG. 7; or it may be a spatial neighboring block selected according to a predetermined rule, without limitation.
  • S0103 Determine the first temporary block 806 and the second temporary block 808 according to the position of the basic prediction block 604 in the image block 600 to be processed.
  • The first temporary block is the image block located at the position of block 806 of the motion compensation block in the reference frame determined according to the reference frame information in step S0102, and the second temporary block is the image block located at the position of block 808 of the motion compensation block in that reference frame; both the first temporary block and the second temporary block are temporal reference blocks.
  • step S0104A and step S0104B do not limit the execution order relationship.
  • The motion information stored in the basic storage unit corresponding to an image block serves as the actual motion information of that image block; the motion information includes the motion vector and the index information of the reference frame to which the motion vector points.
  • It cannot be guaranteed that the reference frame index information of all the reference blocks used in the weighted calculation of the motion vector of the basic prediction block is consistent.
  • The motion information corresponding to a reference block is the actual motion information of that reference block. Before the weighted calculation, the actual motion vector of the reference block needs to be scaled according to the temporal distance of the reference frame indicated by the reference frame index, and the scaled motion vector is then used in place of the motion vector in the actual motion information of the reference block.
  • the target reference image index is determined.
  • The target reference image index can be fixed to 0, 1, or another index value, or it can be the reference image index that is used most frequently in the reference image list, for example, the reference image index to which the actual or scaled motion vectors of all reference blocks point the greatest number of times.
  • When the reference frame index information of a reference block differs from the target reference image index, the actual motion vector of the reference block is scaled by the ratio of the temporal distance between the image containing the reference block and the reference image indicated by the target reference image index to the temporal distance between the image containing the reference block and the reference frame image indicated by the actual motion information of the reference block (its reference frame index information), to obtain the scaled motion vector.
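A sketch of this temporal scaling, using picture order count (POC) differences as the temporal distances. The rounding here is illustrative; real codecs use fixed-point scaling with clipping, and the variable names are assumptions:

```python
def scale_mv(mv, cur_poc, actual_ref_poc, target_ref_poc):
    """Scale a reference block's actual motion vector when its
    reference frame differs from the target reference frame, by the
    ratio of the two temporal distances (POC differences)."""
    td_actual = cur_poc - actual_ref_poc   # distance to the actual ref
    td_target = cur_poc - target_ref_poc   # distance to the target ref
    scale = td_target / td_actual
    return tuple(round(c * scale) for c in mv)
```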
  • the method for inter prediction provided in the embodiment of the present application may further include:
  • it includes: first merging adjacent basic prediction blocks with the same motion information, and then performing motion compensation using the merged image block as a motion compensation unit.
  • Horizontal merging is performed first. For each row of basic prediction blocks in the image block to be processed, from left to right, it is determined whether the motion information of a basic prediction block and that of the adjacent basic prediction block (exemplarily, including whether the motion vector, reference frame list, and reference frame index information are the same) match. If the motion information is the same, the two adjacent basic prediction blocks are merged, and it is then determined whether the motion information of the next basic prediction block adjacent to the merged block is the same as that of the merged block. If the motion information differs, merging stops, and the basic prediction block with different motion information is used as a new starting point to continue merging adjacent basic prediction blocks with the same motion information, until the end of the row of basic prediction blocks.
  • the combined basic prediction block is used as the motion compensation unit for motion compensation.
  • In a feasible implementation manner, the manner of merging adjacent basic prediction blocks with the same motion information is related to the shape of the image block to be processed. When the width of the image block to be processed is greater than or equal to its height, only the horizontal merging described above is used. When the width of the image block to be processed is smaller than its height, vertical merging is used: the motion information of a basic prediction block and that of the adjacent basic prediction block (exemplarily, including whether the motion vector, reference frame list, and reference frame index information are the same) is compared in order from top to bottom.
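The horizontal merging described above is essentially a run-length grouping of identical motion information along a row; vertical merging is the same logic applied to a column. A sketch for one row, treating motion information as any hashable value:

```python
def merge_row(motion_infos):
    """Merge adjacent basic prediction blocks in one row when their
    motion information (motion vector, reference list, reference
    index) is identical.  Returns (info, run_length) pairs; each run
    acts as one motion compensation unit."""
    runs = []
    for info in motion_infos:
        if runs and runs[-1][0] == info:
            runs[-1][1] += 1        # same motion info: extend the run
        else:
            runs.append([info, 1])  # different: start a new run here
    return [(info, count) for info, count in runs]
```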
  • the method for inter prediction provided in the embodiment of the present application may further include:
  • the first reference block does not exist, and the solution in the embodiment of the present application is not applicable at this time.
  • the second reference block does not exist, and the solution in the embodiments of the present application is not applicable at this time.
  • the method for inter prediction provided in the embodiment of the present application may further include:
  • S607: When it is determined that the shape of the image block to be processed satisfies a preset condition, S601 is executed; otherwise, it is not executed.
  • The preset condition may include: the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or, the width of the image block to be processed is greater than or equal to 16; or, the height of the image block to be processed is greater than or equal to 16. Correspondingly, when the width or the height of the image block to be processed is less than 16, the solution in the embodiments of the present application is not applicable; or, when both the width and the height of the image block to be processed are less than 16, the solution is not applicable.
  • Although 16 is used as the threshold here, other values such as 8, 24, or 32 may also be used, and the thresholds corresponding to the width and the height need not be equal; this is not limited.
  • step S606 and step S607 may be executed in cooperation.
  • In one feasible implementation manner, when the image block to be processed is located at the left boundary or the upper boundary of the image frame, or when both the width and the height of the image block to be processed are less than 16, the inter prediction scheme in the embodiments of this application cannot be used. In another feasible implementation manner, when the image block to be processed is located at the left boundary or the upper boundary of the image frame, or when the width or the height of the image block to be processed is less than 16, the inter prediction scheme in the embodiments of this application cannot be used.
  • Predicting the motion information of one or more sub-blocks in the current image block based on the planar mode includes: when the motion vector corresponding to the first reference image list in the motion information of one available neighboring block differs from the motion vector corresponding to the first reference image list in the motion information of another available neighboring block, and/or the motion vectors corresponding to the second reference image list differ in the same way, the prediction is performed based on the planar mode.
  • When a motion vector corresponding to the first reference image list exists, the first reference image list is valid; when a motion vector corresponding to the second reference image list exists, the second reference image list is valid. When both reference image lists are valid, the current block is bidirectionally predicted; when only one of them is valid, the current block is unidirectionally predicted.
  • The plurality of available neighboring blocks may be: all available left spatial neighboring blocks of the current image block and all available upper spatial neighboring blocks of the current image block; or, all available right temporal neighboring blocks of the current image block and all available lower temporal neighboring blocks of the current image block; or, a combination of all available left spatial neighboring blocks, all available upper spatial neighboring blocks, all available right temporal neighboring blocks, and all available lower temporal neighboring blocks of the current image block.
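A sketch of the applicability test above, assuming each available neighboring block is a dict with optional 'mv0'/'mv1' motion vectors for the two reference image lists (the field names are assumptions):

```python
def planar_mode_applicable(neighbors):
    """Planar prediction of sub-block motion is used when the available
    neighboring blocks do not all share the same motion vector for at
    least one reference image list."""
    for key in ('mv0', 'mv1'):
        mvs = {n[key] for n in neighbors if key in n}
        if len(mvs) > 1:            # at least two neighbors differ
            return True
    return False
```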
  • FIG. 10 is a schematic block diagram of an apparatus 1000 for inter prediction in an embodiment of the present application.
  • The apparatus 1000 for inter prediction may include: a determination module 1001, a positioning module 1002, and a calculation module 1003.
  • the determining module 1001 is configured to determine the size of the basic prediction block in the image block to be processed according to the size reference information, and the size is used to determine the position of the basic prediction block in the image block to be processed.
  • the positioning module 1002 is configured to determine the first reference block and the second reference block of the basic prediction block according to the position of the basic prediction block determined by the determination module 1001. Wherein, the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction unit, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction unit, and the first reference block Adjacent to the upper boundary line of the image block to be processed, and the second reference block is adjacent to the left boundary line of the image block to be processed;
  • The calculation module 1003 is used to perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
  • the determining module 1001 is used to support the inter-frame prediction device 1000 to perform S601 and the like in the above embodiments, and / or other processes used in the technology described herein.
  • the positioning module 1002 is used to support the inter-frame prediction device 1000 to perform S603 and the like in the above embodiments, and / or other processes used in the technology described herein.
  • the calculation module 1003 is used to support the inter-frame prediction device 1000 to perform S604 and S605 in the foregoing embodiments, and / or other processes used in the technology described herein.
  • The apparatus 1000 for inter prediction may further include a dividing module 1004, which is used to support the apparatus 1000 in performing S602 and the like in the foregoing embodiments, and/or other processes used in the technology described herein.
  • The apparatus 1000 for inter prediction may further include a judgment module 1005, which is used to support the apparatus 1000 in performing S606 and S607 in the foregoing embodiments, and/or other processes used in the technology described herein.
  • FIG. 11 is a schematic block diagram of an implementation manner of the apparatus 1100 for inter prediction according to an embodiment of the present application.
  • the apparatus 1100 for inter prediction may include a processor 1110, a memory 1130, and a bus system 1150.
  • the processor and the memory are connected through a bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory.
  • the memory of the coding device stores program code, and the processor may invoke the program code stored in the memory to perform the various video encoding or decoding methods described in this application, especially the video encoding or decoding methods in the various new inter prediction modes, and the methods for predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here again.
  • the memory 1130 may include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device may also be used as the memory 1130.
  • the memory 1130 may include code and data 1131 accessed by the processor 1110 using the bus system 1150.
  • the memory 1130 may further include an operating system 1133 and an application program 1135 including at least one program that allows the processor 1110 to perform the video encoding or decoding method described in the present application (in particular, the inter prediction method described in the present application).
  • the application program 1135 may include applications 1 to N, which further include a video encoding or decoding application (referred to as a video decoding application for short) that performs the video encoding or decoding method described in this application.
  • the bus system 1150 may also include a power bus, a control bus, and a status signal bus. However, for clarity, various buses are marked as the bus system 1150 in the figure.
  • the apparatus 1100 for inter prediction may further include one or more output devices, such as a display 1170.
  • the display 1170 may be a tactile display that merges the display with a tactile unit that operably senses touch input.
  • the display 1170 may be connected to the processor 1110 via the bus system 1150.
  • Both the above-described inter-frame prediction device 1000 and the inter-frame prediction device 1100 may perform the above-described inter-frame prediction method shown in FIG. 6.
  • the inter-frame prediction device 1000 and the inter-frame prediction device 1100 may specifically be video codec devices or other devices with video encoding and decoding capabilities.
  • the apparatus 1000 for inter prediction and the apparatus 1100 for inter prediction may be used to perform image prediction in a codec process.
  • An embodiment of the present application provides a decoding device.
  • the decoding device includes the apparatus for inter prediction described in any of the foregoing embodiments.
  • the decoding device may be a video decoder or a video encoder.
  • the present application also provides a terminal, which includes: one or more processors, a memory, and a communication interface.
  • the memory and the communication interface are coupled to the one or more processors; the memory is used to store computer program code, and the computer program code includes instructions.
  • when the one or more processors execute the instructions, the terminal performs the inter prediction method of the embodiments of this application.
  • the terminal here may be a video display device, a smart phone, a portable computer, and other devices that can process video or play video.
  • the present application further provides a computer-readable storage medium.
  • the computer-readable storage medium includes one or more pieces of program code.
  • the one or more programs include instructions.
  • when a processor of the terminal executes the program code, the terminal performs the inter prediction method shown in FIG. 6.
  • in another embodiment, a computer program product is provided; the computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium.
  • at least one processor of the terminal may read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the terminal to perform the inter prediction method shown in FIG. 6.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), or the like.
  • actions or events of any of the methods described herein may be performed in a different sequence, or may be added, merged, or omitted altogether (e.g., not all described actions or events are necessary for practicing the method).
  • actions or events may be performed simultaneously rather than sequentially, for example, via multi-threaded processing, interrupt processing, or multiple processors.
  • although specific aspects of this application are described as being performed by a single module or unit for clarity, it should be understood that the techniques of this application may be performed by a combination of units or modules associated with a video decoder.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit.
  • the computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol).
  • computer-readable media may exemplarily correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as signals or carrier waves.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and / or data structures for implementation of the techniques described in this application.
  • the computer program product may include a computer-readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • for example, if instructions are transmitted using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • it should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
  • as used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
  • instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein.
  • functionality described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the technology can be fully implemented in one or more circuits or logic elements.
  • the technology of the present application can be implemented in a wide variety of devices or equipment, including wireless handsets, integrated circuits (ICs), or collections of ICs (eg, chipsets).
  • various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.


Abstract

A method and apparatus for inter prediction. The method includes: determining, according to size reference information, the size of a basic prediction block in an image block to be processed, the size being used to determine the position of the basic prediction block in the image block to be processed (S601); and determining, according to the position, a first reference block and a second reference block of the basic prediction block (S603), where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction unit, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction unit, the first reference block adjoins the upper boundary line of the image block to be processed, and the second reference block adjoins the left boundary line of the image block to be processed; and performing a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block (S604).

Description

A method and apparatus for inter prediction
This application claims priority to Chinese Patent Application No. 201811377897.4, filed with the China National Intellectual Property Administration on November 19, 2018 and entitled "Inter Prediction Method and Apparatus", and to Chinese Patent Application No. 201811578340.7, filed with the China National Intellectual Property Administration on December 21, 2018 and entitled "Inter Prediction Method and Apparatus", both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of video coding and decoding technologies, and in particular, to a method and apparatus for inter prediction.
Background
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by Moving Picture Experts Group (MPEG)-2, MPEG-4, International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the video coding standard H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video coding, a video slice (that is, a video frame or a portion of a video frame) may be partitioned into image blocks, which may also be referred to as tree blocks, coding units (CUs), and/or coding nodes. Image blocks in a to-be-intra-coded (I) slice of a picture are coded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Image blocks in a to-be-inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Various video coding standards, including the HEVC standard, propose predictive coding modes for image blocks, that is, predicting the image block currently being coded on the basis of image blocks that have already been coded. In an intra prediction mode, the currently decoded image block is predicted on the basis of one or more previously decoded neighboring blocks in the same picture as the current block; in an inter prediction mode, the currently decoded image block is predicted on the basis of already decoded blocks in different pictures.
Currently, when inter prediction is performed, the image block to be processed is partitioned into basic prediction blocks according to a prescribed basic prediction block size before inter prediction is performed, and the coding performance of this approach is limited.
Summary
Embodiments of this application provide a method and apparatus for inter prediction, which can adaptively determine the size of the basic prediction block before performing inter prediction, thereby improving coding performance.
According to a first aspect of this application, a method for inter prediction is provided, including: determining, according to size reference information, the size of a basic prediction block in an image block to be processed, the size being used to determine the position of the basic prediction block in the image block to be processed; then determining, according to the position of the basic prediction block, a first reference block and a second reference block of the basic prediction block, where the left boundary line of the first reference block of the basic prediction block is collinear with the left boundary line of the basic prediction unit, the upper boundary line of the second reference block of the basic prediction block is collinear with the upper boundary line of the basic prediction unit, the first reference block of the basic prediction block adjoins the upper boundary line of the image block to be processed, and the second reference block of the basic prediction block adjoins the left boundary line of the image block to be processed; and finally performing a weighted calculation on one or more of the motion vector corresponding to the first reference block of the basic prediction block, the motion vector corresponding to the second reference block of the basic prediction block, and the motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
With the inter prediction method provided in this application, the size of the basic prediction block is determined adaptively according to the size reference information; reasonable size reference information determines a more suitable basic prediction block size, so that inter prediction performance during coding is higher.
In combination with the first aspect, in a feasible implementation, the original reference block having a preset position relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed. In this implementation, the reliability of the generated motion vector can be improved by reasonably selecting the reference blocks used to generate the motion vector corresponding to the basic prediction block.
In combination with the first aspect or the foregoing feasible implementation, in another feasible implementation, the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper-left corner of the image block to be processed and adjacent to its upper-left corner point, an image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point, and an image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point, where the original reference block having a preset spatial position relationship with the image block to be processed is located outside the image block to be processed. A beneficial effect of this implementation is that reasonably selecting the spatial reference blocks used to generate the motion vector corresponding to the basic prediction block improves the reliability of the generated motion vector.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the original reference block having a preset temporal position relationship with the image block to be processed may include: an image block in a target reference frame that is located at the lower-right corner of a mapped image block and adjacent to the lower-right corner point of the mapped image block, where the original reference block having a preset temporal position relationship with the image block to be processed is located outside the mapped image block, the mapped image block has the same size as the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame in which it is located. In this implementation, reasonably selecting the temporal reference block used to generate the motion vector corresponding to the basic prediction block improves the reliability of the generated motion vector.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the index information and reference frame list information of the target reference frame may be obtained by parsing a bitstream, where the bitstream refers to the bitstream transmitted between the encoder and the decoder. In this implementation, by configuring the reference frame list and the index information, the target reference frame can be flexibly selected, making the corresponding temporal reference block more reliable.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the index information and reference frame list information of the target reference frame are located in the bitstream segment corresponding to the slice header of the slice in which the image block to be processed is located. Storing the identification information of the target reference frame in the slice header allows all temporal reference blocks of the image blocks within the slice to share the same reference frame information, which saves coded bits and improves coding efficiency.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, performing a weighted calculation on one or more of the motion vector corresponding to the first reference block of the basic prediction block, the motion vector corresponding to the second reference block of the basic prediction block, and the motion vector corresponding to the original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block, may be specifically implemented as follows: the motion vector corresponding to the basic prediction block is obtained according to the following formulas:
P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W) / (2 × H × W); where
P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
AR is the motion vector corresponding to the image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point; BR is the motion vector corresponding to the image block in the target reference frame that is located at the lower-right corner of the mapped image block and adjacent to the lower-right corner point of the mapped image block; BL is the motion vector corresponding to the image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point; x is the ratio of the horizontal distance between the upper-left corner point of the basic prediction block and the upper-left corner point of the image block to be processed to the width of the basic prediction block; y is the ratio of the vertical distance between the upper-left corner point of the basic prediction block and the upper-left corner point of the image block to be processed to the height of the basic prediction block; H is the ratio of the height of the image block to be processed to the height of the basic prediction block; W is the ratio of the width of the image block to be processed to the width of the basic prediction block; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
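The weighted calculation above can be sketched in code. The following is a minimal Python sketch, not part of the patent itself: the function name `planar_mv` and the representation of motion vectors as (mvx, mvy) tuples are assumptions for illustration, and the division is done in floating point rather than with the integer arithmetic and rounding a real codec would use.

```python
def planar_mv(x, y, W, H, L, A, AR, BL, BR):
    """Return P(x, y), the MV of the basic prediction block at unit position (x, y).

    x, y: position of the basic prediction block, in basic-prediction-block units.
    W, H: width/height of the image block to be processed, in the same units.
    L(y): MV of the second (left) reference block L(-1, y).
    A(x): MV of the first (above) reference block A(x, -1).
    AR, BL, BR: MVs of the top-right, bottom-left, and temporal bottom-right
    original reference blocks, each an (mvx, mvy) tuple.
    """
    def scale(mv, s):
        return (mv[0] * s, mv[1] * s)

    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])

    # R(W, y) = ((H - y - 1) x AR + (y + 1) x BR) / H
    R = scale(add(scale(AR, H - y - 1), scale(BR, y + 1)), 1.0 / H)
    # B(x, H) = ((W - x - 1) x BL + (x + 1) x BR) / W
    B = scale(add(scale(BL, W - x - 1), scale(BR, x + 1)), 1.0 / W)
    # Horizontal and vertical linear interpolations P_h and P_v
    Ph = add(scale(L(y), W - 1 - x), scale(R, x + 1))
    Pv = add(scale(A(x), H - 1 - y), scale(B, y + 1))
    # P(x, y) = (H x P_h + W x P_v + H x W) / (2 x H x W); the H x W term
    # is the rounding offset from the formula, applied to both components.
    num = add(add(scale(Ph, H), scale(Pv, W)), (H * W, H * W))
    return (num[0] / (2.0 * H * W), num[1] / (2.0 * H * W))
```

With all reference motion vectors equal, the result differs from that common value only by the rounding offset, which shows how the H × W term acts in the formula.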
It should be noted that the specific implementation part of this application provides multiple implementations of performing a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block; they are not limited to the content of this implementation.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information may include a first identifier, where the first identifier is used to indicate the size of the basic prediction block. The inter prediction method provided in this application may further include: receiving a bitstream, parsing the bitstream to obtain the first identifier, and using the size indicated by the first identifier as the size of the basic prediction block. The first identifier may be located in the bitstream segment corresponding to any one of the sequence parameter set of the sequence in which the image block to be processed is located, the picture parameter set of the picture in which the image block to be processed is located, and the slice header of the slice in which the image block to be processed is located. By adding the identification information of the size of the basic prediction block to the auxiliary information of the bitstream, an adapted size is used when each picture is processed, which improves adaptability to picture content and is simple to implement.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information may include the size of planar-mode prediction blocks in a previously reconstructed image of the current image block to be processed, where a planar-mode prediction block is an image block to be processed on which inter prediction is performed according to any of the foregoing feasible implementations of the first aspect, and a previously reconstructed image is an image that precedes, in coding order, the image in which the current image block to be processed is located.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block in the current image block to be processed according to the size of the planar-mode prediction blocks in the previously reconstructed image may be specifically implemented as follows: calculating the average of the products of the widths and heights of all planar-mode prediction blocks in the previously reconstructed image; when the average is less than a threshold, the size of the basic prediction block in the current image block to be processed is a first size; when the average is greater than or equal to the threshold, the size of the basic prediction block in the current image block to be processed is a second size, where the first size is smaller than the second size. In this implementation, prior information is used to determine the size of the basic prediction block of the current image block to be processed, and no additional identification information needs to be transmitted, which improves adaptability to the picture while ensuring that the coding bit rate is not increased.
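As a rough illustration of the averaging step just described, the following Python sketch computes the mean width × height product over the planar-mode prediction blocks of the previously reconstructed image(s) and compares it against the threshold. The function name, the fallback when no statistics exist, and the concrete first/second sizes are illustrative assumptions, not values fixed by the text.

```python
def choose_basic_block_size(planar_blocks, threshold, first=(4, 4), second=(8, 8)):
    """Pick the basic prediction block size from prior-frame statistics.

    planar_blocks: iterable of (width, height) pairs of the planar-mode
    prediction blocks in the previously reconstructed image(s).
    Returns the first (smaller) size when the mean of width*height is
    below the threshold, otherwise the second (larger) size.
    """
    areas = [w * h for (w, h) in planar_blocks]
    if not areas:  # no prior statistics: assumed fallback to the smaller size
        return first
    mean_area = sum(areas) / len(areas)
    return first if mean_area < threshold else second
```

For example, with blocks of 16×16 and 8×8 the mean area is 160, so a threshold of 200 selects the first size and a threshold of 100 selects the second.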
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located is, among the images having the same temporal layer identifier as the image in which the current image block to be processed is located, the reconstructed image closest in coding order to the image in which the current image block to be processed is located. Reasonably selecting the nearest reference frame in the same temporal layer for collecting prior information improves the reliability of the statistics.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located is the reconstructed image closest in coding order to the image in which the current image block to be processed is located. Reasonably selecting the nearest reference frame for collecting prior information improves the reliability of the statistics.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located consists of multiple images; correspondingly, calculating the average of the products of the widths and heights of all the planar-mode prediction blocks in the previously reconstructed image includes: calculating the average of the products of the widths and heights of all planar-mode prediction blocks in the multiple previously reconstructed images. Accumulating statistics over multiple frames to determine the size of the basic prediction block in the current image block improves the reliability of the statistics.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the threshold is a preset threshold.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, when the POCs of the reference frames of the image in which the image block to be processed is located are all smaller than the POC of that image, the threshold may be a first threshold; when the POC of at least one reference frame of the image in which the image block to be processed is located is greater than the POC of that image, the threshold may be a second threshold, where the first threshold and the second threshold are different. Different thresholds can be set for different coding scenarios, which improves adaptability to the corresponding coding scenarios.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, before the size of the basic prediction block in the image block to be processed is determined according to the size reference information, the inter prediction method provided in this application may further include: determining the prediction direction of the image block to be processed.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the prediction direction of the image block to be processed may include: when first-direction prediction is valid and second-direction prediction is invalid, or second-direction prediction is valid and first-direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when first-direction prediction is valid and second-direction prediction is valid, the prediction direction of the image block to be processed is bidirectional prediction.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, when at least one temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the first reference frame image list, first-direction prediction is valid; when no temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the first reference frame image list, first-direction prediction is invalid. When at least one temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the second reference frame image list, second-direction prediction is valid; when no temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the second reference frame image list, second-direction prediction is invalid.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the motion vector includes a first motion vector and/or a second motion vector. When at least two temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the first reference frame image list have different first motion vectors, first-direction prediction is valid, where the first motion vector corresponds to the first reference frame image list; when all temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the first reference frame image list have identical first motion vectors, first-direction prediction is invalid. When at least two temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the second reference frame image list have different second motion vectors, second-direction prediction is valid, where the second motion vector corresponds to the second reference frame image list; when all temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the second reference frame image list have identical second motion vectors, second-direction prediction is invalid.
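This validity rule (a direction is valid only when at least two neighboring temporary blocks obtain motion vectors from that reference frame image list and those motion vectors differ) can be sketched as follows. The dict-based representation of a temporary block, the function names, and the fallback when neither direction is valid are assumptions made for illustration only.

```python
def direction_valid(neighbor_blocks, list_idx):
    """A direction is valid when at least two temporary blocks in the
    neighboring region use reference frame image list `list_idx` and
    their motion vectors for that list are not all identical."""
    mvs = [b['lists'][list_idx] for b in neighbor_blocks if list_idx in b['lists']]
    return len(set(mvs)) >= 2


def prediction_direction(neighbor_blocks):
    """Each block looks like {'lists': {0: (mvx, mvy)}}; a list index is
    absent when the block does not use that reference frame image list."""
    first = direction_valid(neighbor_blocks, 0)
    second = direction_valid(neighbor_blocks, 1)
    if first and second:
        return 'bidirectional'
    if first or second:
        return 'unidirectional'
    return 'invalid'  # hypothetical fallback; this case is not covered by the text
```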
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, when a temporary image block obtains a motion vector using only the first reference frame image list, the first motion vector is the same as the motion vector; when a temporary image block obtains a motion vector using only the second reference frame image list, the second motion vector is the same as the motion vector.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, a temporary image block is an image block having a preset size.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the neighboring region of the image block to be processed may include: one of, or any combination of, the left spatial region, the upper spatial region, the right temporal region, and the lower temporal region of the image block to be processed.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block in the image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are unequal, determining the side length of the shorter side of the basic prediction block to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determining the side length of the basic prediction block to be 4 or 8. In this implementation, fixing the size of the basic prediction block reduces complexity.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information includes shape information of the image block to be processed, where the shape information includes width and height. Specifically, determining the size of the basic prediction block according to the size reference information may include: when the width of the image block to be processed is greater than or equal to its height, the basic prediction block is 8 pixels wide and 4 pixels high; when the width of the image block to be processed is less than its height, the basic prediction block is 4 pixels wide and 8 pixels high.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the basic prediction block is 8 pixels wide and 8 pixels high.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information includes the prediction direction of the image block to be processed. Determining, according to the size reference information, the size of the basic prediction block in the image block to be processed may include: determining the size of the basic prediction block according to the prediction direction of the image block to be processed.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block according to the prediction direction of the image block to be processed includes: when the prediction direction of the image block to be processed is unidirectional prediction, the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction of the image block to be processed is bidirectional prediction, the basic prediction block is 8 pixels wide and 4 pixels high, or 4 pixels wide and 8 pixels high.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block according to the prediction direction of the image block to be processed may include: when the prediction direction of the image block to be processed is unidirectional prediction, the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the basic prediction block is 8 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction and the width of the image block to be processed is less than its height, the basic prediction block is 4 pixels wide and 8 pixels high.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block according to the prediction direction of the image block to be processed includes: when the prediction direction of the image block to be processed is unidirectional prediction, the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction, the basic prediction block is 8 pixels wide and 8 pixels high.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block according to the prediction direction of the image block to be processed includes: when the prediction direction of the image block to be processed is bidirectional prediction, the basic prediction block is 8 pixels wide and 8 pixels high; when the prediction direction is unidirectional prediction and the width of the image block to be processed is greater than or equal to its height, the basic prediction block is 8 pixels wide and 4 pixels high; when the prediction direction is unidirectional prediction and the width of the image block to be processed is less than its height, the basic prediction block is 4 pixels wide and 8 pixels high.
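One of the mappings above (the variant that combines the prediction direction with the aspect ratio of the image block to be processed) can be written as a small lookup. This is a minimal sketch; the function name and the string constants for the prediction direction are illustrative assumptions.

```python
def basic_block_size(direction, width, height):
    """Variant in which unidirectional prediction always yields 4x4, and
    bidirectional prediction yields 8x4 or 4x8 depending on whether the
    image block to be processed is at least as wide as it is high."""
    if direction == 'unidirectional':
        return (4, 4)
    # bidirectional prediction: orientation follows the block's aspect ratio
    return (8, 4) if width >= height else (4, 8)
```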
It should be noted that the foregoing provides specific implementation examples of determining the size of the basic prediction block for different contents of the size reference information. It should be understood that the size reference information may include one or more items; when the size reference information includes multiple items, the size of the basic prediction block may be determined by combining the multiple items according to actual needs, and the determination process is not described again here.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, after the size of the basic prediction block in the image block to be processed is determined, the inter prediction method provided in this application may further include: partitioning the image block to be processed into multiple basic prediction blocks according to the determined size of the basic prediction block, and determining in turn the position of each basic prediction block in the image block to be processed. It should be understood that this implementation determines the coordinate position of each basic prediction block in the image block to be processed, after which inter prediction is performed on each basic prediction block in the current image block to be processed.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, before the size of the basic prediction block in the image block to be processed is determined according to the size reference information, the inter prediction method provided in this application may further include: determining that the first reference block and the second reference block are located within the boundary of the image in which the current image block to be processed is located; that is, determining that the current image block to be processed is not at a boundary position of the image in which it is located. When the first reference block or the second reference block does not exist for the current image block to be processed, the inter prediction method provided in this application is not used; the accuracy of this prediction method would decrease when the first reference block and the second reference block do not exist, so not using this method in that case avoids unnecessary complexity overhead.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, before the size of the basic prediction block in the image block to be processed is determined according to the size reference information, the inter prediction method provided in this application may further include: determining that the width of the image block to be processed is greater than or equal to 16 and its height is greater than or equal to 16; or determining that the width of the image block to be processed is greater than or equal to 16; or determining that the height of the image block to be processed is greater than or equal to 16. With this implementation, the inter prediction method provided in this application is not used when the image block to be processed is too small, which balances coding efficiency and complexity.
In combination with the first aspect or any of the foregoing feasible implementations, in another feasible implementation, the inter prediction method provided in this application may be used to encode the image block to be processed, or to decode the image block to be processed. It should be understood that the embodiments of this application relate to an inter prediction method that, under a hybrid coding architecture, belongs both to part of the encoding process and to part of the decoding process.
According to a second aspect of this application, an apparatus for inter prediction is provided, including: a determining module, configured to determine, according to size reference information, the size of a basic prediction block in an image block to be processed, the size being used to determine the position of the basic prediction block in the image block to be processed; a positioning module, configured to determine, according to the position of the basic prediction block in the image block to be processed, a first reference block and a second reference block of the basic prediction block, where the left boundary line of the first reference block is collinear with the left boundary line of the basic prediction unit, the upper boundary line of the second reference block is collinear with the upper boundary line of the basic prediction unit, the first reference block adjoins the upper boundary line of the image block to be processed, and the second reference block adjoins the left boundary line of the image block to be processed; and a calculation module, configured to perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset position relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
In combination with the second aspect, in a feasible implementation, the original reference block having a preset position relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper-left corner of the image block to be processed and adjacent to its upper-left corner point, an image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point, and an image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point, where the original reference block having a preset spatial position relationship with the image block to be processed is located outside the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the original reference block having a preset temporal position relationship with the image block to be processed may include: an image block in a target reference frame that is located at the lower-right corner of a mapped image block and adjacent to the lower-right corner point of the mapped image block, where the original reference block having a preset temporal position relationship with the image block to be processed is located outside the mapped image block, the mapped image block has the same size as the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in the image frame in which it is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the index information and reference frame list information of the target reference frame may be obtained by parsing a bitstream, where the bitstream refers to the bitstream transmitted between the encoder and the decoder.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the index information and reference frame list information of the target reference frame are located in the bitstream segment corresponding to the slice header of the slice in which the image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the calculation module is specifically configured to obtain the motion vector corresponding to the basic prediction block according to the following formulas:
P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W) / (2 × H × W); where
P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
AR is the motion vector corresponding to the image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point; BR is the motion vector corresponding to the image block in the target reference frame that is located at the lower-right corner of the mapped image block and adjacent to the lower-right corner point of the mapped image block; BL is the motion vector corresponding to the image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point; x is the ratio of the horizontal distance between the upper-left corner point of the basic prediction block and the upper-left corner point of the image block to be processed to the width of the basic prediction block; y is the ratio of the vertical distance between the upper-left corner point of the basic prediction block and the upper-left corner point of the image block to be processed to the height of the basic prediction block; H is the ratio of the height of the image block to be processed to the height of the basic prediction block; W is the ratio of the width of the image block to be processed to the width of the basic prediction block; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information may include a first identifier, where the first identifier is used to indicate the size of the basic prediction block. The apparatus further includes: a receiving unit, configured to receive a bitstream; and a parsing unit, configured to parse the bitstream received by the receiving unit to obtain the first identifier, where the first identifier is located in the bitstream segment corresponding to any one of the sequence parameter set of the sequence in which the image block to be processed is located, the picture parameter set of the picture in which the image block to be processed is located, and the slice header of the slice in which the image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information may include the size of planar-mode prediction blocks in a previously reconstructed image of the current image block to be processed, where a planar-mode prediction block is an image block to be processed on which inter prediction is performed according to any of the foregoing feasible implementations of the second aspect, and a previously reconstructed image is an image that precedes, in coding order, the image in which the current image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: calculate the average of the products of the widths and heights of all planar-mode prediction blocks in the previously reconstructed image; when the average is less than a threshold, the size of the basic prediction block in the current image block to be processed is a first size; when the average is greater than or equal to the threshold, the size of the basic prediction block in the current image block to be processed is a second size, where the first size is smaller than the second size.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located is, among the images having the same temporal layer identifier as the image in which the current image block to be processed is located, the reconstructed image closest in coding order to the image in which the current image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located is the reconstructed image closest in coding order to the image in which the current image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the previously reconstructed image of the image in which the current image block to be processed is located consists of multiple images; correspondingly, the determining module is specifically configured to: calculate the average of the products of the widths and heights of all planar-mode prediction blocks in the multiple previously reconstructed images.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the threshold is a preset threshold.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, when the POCs of the reference frames of the image in which the image block to be processed is located are all smaller than the POC of that image, the threshold may be a first threshold; when the POC of at least one reference frame of the image in which the image block to be processed is located is greater than the POC of that image, the threshold may be a second threshold, where the first threshold and the second threshold are different.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is further configured to: determine the prediction direction of the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: when first-direction prediction is valid and second-direction prediction is invalid, or second-direction prediction is valid and first-direction prediction is invalid, determine that the prediction direction of the image block to be processed is unidirectional prediction; when first-direction prediction is valid and second-direction prediction is valid, determine that the prediction direction of the image block to be processed is bidirectional prediction.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, when at least one temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the first reference frame image list, first-direction prediction is valid; when no temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the first reference frame image list, first-direction prediction is invalid. When at least one temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the second reference frame image list, second-direction prediction is valid; when no temporary image block in the neighboring region of the image block to be processed obtains a motion vector using the second reference frame image list, second-direction prediction is invalid.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the motion vector includes a first motion vector and/or a second motion vector. When at least two temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the first reference frame image list have different first motion vectors, first-direction prediction is valid, where the first motion vector corresponds to the first reference frame image list; when all temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the first reference frame image list have identical first motion vectors, first-direction prediction is invalid. When at least two temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the second reference frame image list have different second motion vectors, second-direction prediction is valid, where the second motion vector corresponds to the second reference frame image list; when all temporary image blocks in the neighboring region of the image block to be processed that obtain motion vectors using the second reference frame image list have identical second motion vectors, second-direction prediction is invalid.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, when a temporary image block obtains a motion vector using only the first reference frame image list, the first motion vector is the same as the motion vector; when a temporary image block obtains a motion vector using only the second reference frame image list, the second motion vector is the same as the motion vector.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, a temporary image block is an image block having a preset size.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the neighboring region of the image block to be processed may include: one of, or any combination of, the left spatial region, the upper spatial region, the right temporal region, and the lower temporal region of the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, determining the size of the basic prediction block in the image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are unequal, determining the side length of the shorter side of the basic prediction block to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, determining the side length of the basic prediction block to be 4 or 8.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information includes shape information of the image block to be processed, where the shape information includes width and height. The determining module is specifically configured to: when the width of the image block to be processed is greater than or equal to its height, determine that the basic prediction block is 8 pixels wide and 4 pixels high; when the width of the image block to be processed is less than its height, determine that the basic prediction block is 4 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the basic prediction block is 8 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the size reference information includes the prediction direction of the image block to be processed. The determining module is specifically configured to: determine the size of the basic prediction block according to the prediction direction of the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, determine that the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction of the image block to be processed is bidirectional prediction, determine that the basic prediction block is 8 pixels wide and 4 pixels high, or 4 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, determine that the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction and the width of the image block to be processed is greater than or equal to its height, determine that the basic prediction block is 8 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction and the width of the image block to be processed is less than its height, determine that the basic prediction block is 4 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, determine that the basic prediction block is 4 pixels wide and 4 pixels high; when the prediction direction is bidirectional prediction, determine that the basic prediction block is 8 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is bidirectional prediction, determine that the basic prediction block is 8 pixels wide and 8 pixels high; when the prediction direction is unidirectional prediction and the width of the image block to be processed is greater than or equal to its height, determine that the basic prediction block is 8 pixels wide and 4 pixels high; when the prediction direction is unidirectional prediction and the width of the image block to be processed is less than its height, determine that the basic prediction block is 4 pixels wide and 8 pixels high.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the inter prediction apparatus provided in this application may further include a dividing module, configured to: partition the image block to be processed into multiple basic prediction blocks according to the size of the basic prediction block, and determine in turn the position of each basic prediction block in the image block to be processed.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the inter prediction apparatus provided in this application may further include a judgment module, configured to: determine that the first reference block and the second reference block are located within the boundary of the image in which the image block to be processed is located.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the inter prediction apparatus provided in this application may further include a judgment module, configured to: determine that the width of the image block to be processed is greater than or equal to 16 and its height is greater than or equal to 16; or determine that the width of the image block to be processed is greater than or equal to 16; or determine that the height of the image block to be processed is greater than or equal to 16.
In combination with the second aspect or any of the foregoing feasible implementations, in another feasible implementation, the inter prediction apparatus provided in this application is used to encode the image block to be processed, or to decode the image block to be processed.
According to a third aspect of this application, a device for inter prediction is provided, including: a processor and a memory coupled to the processor, where the processor is configured to perform the inter prediction method according to the first aspect or any feasible implementation thereof.
According to a fourth aspect of this application, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the inter prediction method according to the first aspect or any feasible implementation thereof.
According to a fifth aspect of this application, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform the inter prediction method according to the first aspect or any feasible implementation thereof.
According to a sixth aspect of this application, a video image encoder is provided, which includes the inter prediction apparatus according to the second aspect or any feasible implementation thereof.
According to a seventh aspect of this application, a video image decoder is provided, which includes the inter prediction apparatus according to the second aspect or any feasible implementation thereof.
It should be understood that the technical solutions of the second to seventh aspects of this application are consistent with those of the first aspect; the beneficial effects achieved by the aspects and the corresponding implementable designs are similar and are not described again.
Brief Description of the Drawings
FIG. 1 is an exemplary block diagram of a video coding system according to an embodiment of this application;
FIG. 2 is an exemplary block diagram of a video encoder according to an embodiment of this application;
FIG. 3 is an exemplary block diagram of a video decoder according to an embodiment of this application;
FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of this application;
FIG. 5 is a schematic diagram of the position relationship between an image block to be processed and its reference blocks according to an embodiment of this application;
FIG. 6 is an exemplary flowchart of a method for inter prediction according to an embodiment of this application;
FIG. 7 is a schematic diagram of a scenario of calculating, by weighting, the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 8 is a schematic diagram of another scenario of calculating, by weighting, the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 9 is a schematic diagram of yet another scenario of calculating, by weighting, the motion vector corresponding to a basic prediction block according to an embodiment of this application;
FIG. 10 is an exemplary block diagram of an apparatus for inter prediction according to an embodiment of this application;
FIG. 11 is an exemplary block diagram of a coding device according to an embodiment of this application.
Detailed Description
The terms "first", "second", "third", and "fourth" in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to define a particular order.
In the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of this application should not be construed as being preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings in the embodiments of this application.
FIG. 1 is a block diagram of an example video coding system 1 described in an embodiment of this application. As used herein, the term "video codec" generally refers to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system 1 are configured to predict the motion information, such as motion vectors, of a currently coded image block or its sub-blocks according to the various method examples described in any of the multiple new inter prediction modes proposed in this application, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; in this way, the motion vector difference does not need to be transmitted during encoding, which further improves coding and decoding performance.
As shown in FIG. 1, the video coding system 1 includes a source device 10 and a destination device 20. The source device 10 generates encoded video data and may therefore be referred to as a video encoding device. The destination device 20 may decode the encoded video data generated by the source device 10 and may therefore be referred to as a video decoding device. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
The source device 10 and the destination device 20 may include a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
The destination device 20 may receive the encoded video data from the source device 10 via a link 30. The link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may include one or more communication media that enable the source device 10 to transmit the encoded video data directly to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol), and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
In another example, the encoded data may be output from an output interface 140 to a storage device 40. Similarly, the encoded data may be accessed from the storage device 40 through an input interface 240. The storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
In another example, the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by the source device 10. The destination device 20 may access the stored video data from the storage device 40 via streaming or downloading. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20. Example file servers include a web server (e.g., for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive. The destination device 20 may access the encoded video data over any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a wireless-fidelity (Wi-Fi) connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device 40 may be streaming transmission, download transmission, or a combination of both.
The inter prediction method provided in this application may be applied to video coding and decoding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system 1 illustrated in FIG. 1 is merely an example, and the techniques of this application may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store the data to a memory, and/or a video decoding device may retrieve data from a memory and decode the data. In many examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to a memory and/or retrieve data from a memory and decode the data.
In the example of FIG. 1, the source device 10 includes a video source 120, the video encoder 100, and the output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
The video encoder 100 may encode video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and/or playback.
In the example of FIG. 1, the destination device 20 includes the input interface 240, the video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data via the link 30 and/or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. In general, the display device 220 displays the decoded video data. The display device 220 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not shown in FIG. 1, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer unit or other hardware and software to handle the encoding of both audio and video in a common data stream or separate data streams. In some examples, if applicable, the demultiplexer (MUX-DEMUX) unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, such as: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If this application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may execute the instructions in hardware using one or more processors to implement the techniques of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
This application may generally refer to the video encoder 100 as "signaling" or "transmitting" certain information to another device, such as the video decoder 200. The terms "signaling" or "transmitting" may generally refer to the transfer of syntax elements and/or other data used to decode the compressed video data. Such transfer may occur in real time or almost in real time. Alternatively, such communication may occur over a span of time, such as when syntax elements are stored in the encoded bitstream to a computer-readable storage medium at encoding time; the decoding device may then retrieve the syntax elements at any time after they are stored to this medium.
JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolving model of a video decoding device called the HEVC test model (HM). The latest standard document of H.265 is available from http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), which is incorporated herein by reference in its entirety. The HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra prediction coding modes, whereas the HM may provide up to 35 intra prediction coding modes.
JVET is committed to developing the H.266 standard. The H.266 standardization process is based on an evolving model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is contained in JVET-F1001-v2, which is incorporated herein by reference in its entirety. Meanwhile, the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is likewise incorporated herein by reference in its entirety.
In general, the working model description of the HM may partition a video frame or picture into a sequence of tree blocks or largest coding units (LCUs) that include both luma and chroma samples; the LCU is also referred to as a coding tree unit (CTU). A tree block has a purpose similar to that of a macroblock of the H.264 standard. A slice includes a number of consecutive tree blocks in decoding order. A video frame or picture may be partitioned into one or more slices. Each tree block may be split into coding units (CUs) according to a quadtree. For example, a tree block serving as the root node of the quadtree may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. The final unsplittable child nodes, serving as leaf nodes of the quadtree, include decoding nodes, e.g., decoded video blocks. Syntax data associated with the decoded bitstream may define the maximum number of times a tree block may be split, and may also define the minimum size of a decoding node.
A coding unit includes a decoding node, prediction units (PUs), and transform units (TUs) associated with the decoding node. The size of the CU corresponds to the size of the decoding node and its shape must be square. The size of the CU may range from 8×8 pixels up to the size of a tree block of at most 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with the CU may describe the partitioning of the CU into one or more PUs. The partitioning mode may differ depending on whether the CU is skip- or direct-mode coded, intra prediction mode coded, or inter prediction mode coded. A PU may be partitioned into a non-square shape. For example, syntax data associated with the CU may also describe the partitioning of the CU into one or more TUs according to a quadtree. The shape of a TU may be square or non-square.
The HEVC standard allows transforms according to TUs, which may differ for different CUs. A TU is typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The size of a TU is typically the same as or smaller than that of a PU. In some feasible implementations, a quadtree structure known as a "residual quadtree" (RQT) may be used to subdivide the residual samples corresponding to a CU into smaller units. The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode coded, the PU may include data describing the intra prediction mode of the PU. As another feasible implementation, when the PU is inter-mode coded, the PU may include data defining the motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference picture to which the motion vector points, and/or the reference picture list of the motion vector (e.g., list 0, list 1, or list C).
In general, a TU uses transform and quantization processes. A given CU having one or more PUs may also contain one or more TUs. After prediction, the video encoder 100 may calculate residual values corresponding to the PU. The residual values include pixel difference values, which may be transformed into transform coefficients, quantized, and scanned using TUs to produce serialized transform coefficients for entropy decoding. This application typically uses the term "video block" to refer to the decoding node of a CU. In some specific applications, this application may also use the term "video block" to refer to a tree block including a decoding node as well as PUs and TUs, e.g., an LCU or a CU.
A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) exemplarily includes a series of one or more video pictures. The GOP may include syntax data in the header information of the GOP, in the header information of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data describing the coding mode of the corresponding picture. The video encoder 100 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified decoding standard.
As a feasible implementation, the HM supports prediction with various PU sizes. Assuming the size of a particular CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% segment is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU on top and a 2N×1.5N PU on the bottom.
In this application, "N×N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N×M pixels, where M is not necessarily equal to N.
After intra predictive or inter predictive decoding using the PUs of a CU, the video encoder 100 may calculate residual data for the TUs of the CU. The PUs may include pixel data in the spatial domain (also referred to as the pixel domain), and the TUs may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. The video encoder 100 may form TUs including the residual data of the CU, and then transform the TUs to produce transform coefficients for the CU.
After any transform to produce transform coefficients, the video encoder 100 may perform quantization of the transform coefficients. Quantization exemplarily refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
The JEM model further improves the coding structure of video pictures; specifically, a block coding structure called "quadtree plus binary tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC and supports more flexible CU partition shapes: a CU may be square or rectangular. A CTU first undergoes quadtree partitioning, and the leaf nodes of the quadtree further undergo binary tree partitioning. There are two partitioning modes in binary tree partitioning: symmetric horizontal splitting and symmetric vertical splitting. The leaf nodes of the binary tree are called CUs; a JEM CU cannot be further partitioned during prediction and transformation, which means that the JEM CU, PU, and TU have the same block size. In the current JEM, the maximum size of a CTU is 256×256 luma pixels.
In some feasible implementations, the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other feasible implementations, the video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, the video encoder 100 may entropy decode the one-dimensional vector according to context-based adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy decoding method. The video encoder 100 may also entropy encode the syntax elements associated with the encoded video data for use by the video decoder 200 in decoding the video data.
To perform CABAC, the video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to whether neighboring values of the symbol are non-zero. To perform CAVLC, the video encoder 100 may select a variable-length code for the symbol to be transmitted. Codewords in variable-length coding (VLC) may be constructed such that relatively short codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve bit-rate savings relative to using equal-length codewords for each symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
在本申请实施例中,视频编码器可执行帧间预测以减少图像之间的时间冗余。如前文所描述,根据不同视频压缩编解码标准的规定,CU可具有一个或多个预测单元PU。换句话说,多个PU可属于CU,或者PU和CU的尺寸相同。在本文中当CU和PU尺寸相同时,CU的分割模式为不分割,或者即为分割为一个PU,且统一使用PU进行表述。当视频编码器执行帧间预测时,视频编码器可用信号通知视频解码器用于PU的运动信息。示例性的,PU的运动信息可以包括:参考图像索引、运动矢量和预测方向标识。运动矢量可指示PU的图像块(也称视频块、像素块、像素集合等)与PU的参考块之间的位移。PU的参考块可为类似于PU的图像块的参考图像的一部分。参考块可定位于由参考图像索引和预测方向标识指示的参考图像中。
为了减少表示PU的运动信息所需要的编码比特的数目,视频编码器可根据合并预测模式或高级运动矢量预测模式过程产生用于PU中的每一者的候选预测运动矢量(Motion Vector,MV)列表。用于PU的候选预测运动矢量列表中的每一候选预测运动矢量可指示运动信息。由候选预测运动矢量列表中的一些候选预测运动矢量指示的运动信息可基于其它PU的运动信息。如果候选预测运动矢量指示指定空间候选预测运动矢量位置或时间候选预测运动矢量位置中的一者的运动信息,则本申请可将所述候选预测运动矢量称作“原始”候选预测运动矢量。举例来说,对于合并模式,在本文中也称为合并预测模式,可存在五个原始空间候选预测运动矢量位置和一个原始时间候选预测运动矢量位置。在一些实例中,视频编码器可通过组合来自不同原始候选预测运动矢量的部分运动矢量、修改原始候选预测运动矢量或仅插入零运动矢量作为候选预测运动矢量来产生额外候选预测运动矢量。这些额外候选预测运动矢量不被视为原始候选预测运动矢量且在本申请中可称作人工产生的候选预测运动矢量。
本申请的技术一般涉及用于在视频编码器处产生候选预测运动矢量列表的技术和用于在视频解码器处产生相同候选预测运动矢量列表的技术。视频编码器和视频解码器可通过实施用于构建候选预测运动矢量列表的相同技术来产生相同候选预测运动矢量列表。举例来说,视频编码器和视频解码器两者可构建具有相同数目的候选预测运动矢量(例如,五个候选预测运动矢量)的列表。视频编码器和解码器可首先考虑空间候选预测运动矢量(例如,同一图像中的相邻块),接着考虑时间候选预测运动矢量(例如,不同图像中的候选预测运动矢量),且最后可考虑人工产生的候选预测运动矢量直到将所要数目的候选预测运动矢量添加到列表为止。根据本申请的技术,可在候选预测运动矢量列表构建期间针对某些类型的候选预测运动矢量利用修剪操作以便从候选预测运动矢量列表移除重复,而对于其它类型的候选预测运动矢量,可能不使用修剪以便减小解码器复杂性。举例来说,对于空间候选预测运动矢量集合和对于时间候选预测运动矢量,可执行修剪操作以从候选预测运动矢量的列表排除具有重复运动信息的候选预测运动矢量。然而,当将人工产生的候选预测运动矢量添加到候选预测运动矢量的列表时,可在不对人工产生的候选预测运动矢量执行修剪操作的情况下添加人工产生的候选预测运动矢量。
在产生用于CU的PU的候选预测运动矢量列表之后,视频编码器可从候选预测 运动矢量列表选择候选预测运动矢量且在码流中输出候选预测运动矢量索引。选定候选预测运动矢量可为具有产生最紧密地匹配正被解码的目标PU的预测子的运动矢量的候选预测运动矢量。候选预测运动矢量索引可指示在候选预测运动矢量列表中选定候选预测运动矢量的位置。视频编码器还可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,PU的运动信息可基于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定。视频编码器可基于CU的PU的预测性图像块和用于CU的原始图像块产生用于CU的一或多个残余图像块。视频编码器可接着编码一或多个残余图像块且在码流中输出一或多个残余图像块。
码流可包括识别PU的候选预测运动矢量列表中的选定候选预测运动矢量的数据。视频解码器可基于由PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器可基于PU的运动信息识别用于PU的一或多个参考块。在识别PU的一或多个参考块之后,视频解码器可基于PU的一或多个参考块产生用于PU的预测性图像块。视频解码器可基于用于CU的PU的预测性图像块和用于CU的一或多个残余图像块来重构用于CU的图像块。
为了易于解释,本申请可将位置或图像块描述为与CU或PU具有各种空间关系。此描述可解释为是指位置或图像块和与CU或PU相关联的图像块具有各种空间关系。此外,本申请可将视频解码器当前在解码的PU称作当前PU,也称为当前待处理图像块。本申请可将视频解码器当前在解码的CU称作当前CU。本申请可将视频解码器当前在解码的图像称作当前图像。应理解,本申请同时适用于PU和CU具有相同尺寸,或者PU即为CU的情况,统一使用PU来表示。
如前文简短地描述,视频编码器100可使用帧间预测以产生用于CU的PU的预测性图像块和运动信息。在许多例子中,给定PU的运动信息可能与一或多个附近PU(即,其图像块在空间上或时间上在给定PU的图像块附近的PU)的运动信息相同或类似。因为附近PU经常具有类似运动信息,所以视频编码器100可参考附近PU的运动信息来编码给定PU的运动信息。参考附近PU的运动信息来编码给定PU的运动信息可减少码流中指示给定PU的运动信息所需要的编码比特的数目。
视频编码器100可以各种方式参考附近PU的运动信息来编码给定PU的运动信息。举例来说,视频编码器100可指示给定PU的运动信息与附近PU的运动信息相同。本申请可使用合并模式来指代指示给定PU的运动信息与附近PU的运动信息相同或可从附近PU的运动信息导出。在另一可行的实施方式中,视频编码器100可计算用于给定PU的运动矢量差(Motion Vector Difference,MVD)。MVD指示给定PU的运动矢量与附近PU的运动矢量之间的差。视频编码器100可将MVD而非给定PU的运动矢量包括于给定PU的运动信息中。在码流中表示MVD比表示给定PU的运动矢量所需要的编码比特少。本申请可使用高级运动矢量预测模式指代通过使用MVD和识别候选者运动矢量的索引值来用信号通知解码端给定PU的运动信息。
为了使用合并模式或AMVP模式来用信号通知解码端给定PU的运动信息,视频编码器100可产生用于给定PU的候选预测运动矢量列表。候选预测运动矢量列表可 包括一或多个候选预测运动矢量。用于给定PU的候选预测运动矢量列表中的候选预测运动矢量中的每一者可指定运动信息。由每一候选预测运动矢量指示的运动信息可包括运动矢量、参考图像索引和预测方向标识。候选预测运动矢量列表中的候选预测运动矢量可包括“原始”候选预测运动矢量,其中每一者指示不同于给定PU的PU内的指定候选预测运动矢量位置中的一者的运动信息。
在产生用于PU的候选预测运动矢量列表之后,视频编码器100可从用于PU的候选预测运动矢量列表选择候选预测运动矢量中的一者。举例来说,视频编码器可比较每一候选预测运动矢量与正被解码的PU且可选择具有所要码率-失真代价的候选预测运动矢量。视频编码器100可输出用于PU的候选预测运动矢量索引。候选预测运动矢量索引可识别选定候选预测运动矢量在候选预测运动矢量列表中的位置。
此外,视频编码器100可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,可基于用于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频编码器100可如前文所描述处理用于PU的预测性图像块。
当视频解码器200接收到码流时,视频解码器200可产生用于CU的PU中的每一者的候选预测运动矢量列表。由视频解码器200针对PU产生的候选预测运动矢量列表可与由视频编码器100针对PU产生的候选预测运动矢量列表相同。从码流中解析得到的语法元素可指示在PU的候选预测运动矢量列表中选定候选预测运动矢量的位置。在产生用于PU的候选预测运动矢量列表之后,视频解码器200可基于由PU的运动信息指示的一或多个参考块产生用于PU的预测性图像块。视频解码器200可基于由用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器200可基于用于PU的预测性图像块和用于CU的残余图像块重构用于CU的图像块。
应理解，在一种可行的实施方式中，在解码端，候选预测运动矢量列表的构建与从码流中解析选定候选预测运动矢量在候选预测运动矢量列表中的位置是相互独立的，可以以任意先后顺序或者并行进行。
在另一种可行的实施方式中,在解码端,首先从码流中解析选定候选预测运动矢量在候选预测运动矢量列表中的位置,根据解析出来的位置构建候选预测运动矢量列表,在该实施方式中,不需要构建全部的候选预测运动矢量列表,只需要构建到该解析出来的位置处的候选预测运动矢量列表,即能够确定该位置处的候选预测运动矢量即可。举例来说,当解析码流得出选定的候选预测运动矢量为候选预测运动矢量列表中索引为3的候选预测运动矢量时,仅需要构建从索引为0到索引为3的候选预测运动矢量列表,即可确定索引为3的候选预测运动矢量,可以达到减小复杂度,提高解码效率的技术效果。
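上述"仅构建到解析出的位置处"的候选列表构建过程可用如下示意代码说明（build_candidate_list为本示例假设的函数名；按前文所述，对空域与时域候选执行修剪、对人工产生的候选不执行修剪，仅为一种可行实施方式的草图）：

```python
def build_candidate_list(spatial, temporal, artificial, stop_index=None):
    """按 空域 -> 时域 -> 人工产生 的顺序构建候选预测运动矢量列表。
    对空域与时域候选执行修剪(排除重复运动信息)，对人工产生的候选不修剪；
    若给定stop_index(从码流解析出的选定候选索引)，构建到该位置即停止。"""
    cands = []
    for mv in list(spatial) + list(temporal):
        if mv not in cands:                      # 修剪：排除具有重复运动信息的候选
            cands.append(mv)
        if stop_index is not None and len(cands) > stop_index:
            return cands                         # 已能确定该位置处的候选，提前停止
    for mv in artificial:                        # 人工产生的候选：不执行修剪
        cands.append(mv)
        if stop_index is not None and len(cands) > stop_index:
            return cands
    return cands
```

当解析出的索引较小时，提前停止可避免构建完整列表，对应文中所述的减小复杂度、提高解码效率的效果。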
图2为本申请实施例中所描述的一种实例的视频编码器100的框图。视频编码器100用于将视频输出到后处理实体41。后处理实体41表示可处理来自视频编码器100的经编码视频数据的视频实体的实例，例如媒体感知网络元件(MANE)或拼接/编辑装置。在一些情况下，后处理实体41可为网络实体的实例。在一些视频编码系统中，后处理实体41和视频编码器100可为单独装置的若干部分，而在其它情况下，相对于后处理实体41所描述的功能性可由包括视频编码器100的相同装置执行。在某一实例中，后处理实体41是图1的存储装置40的实例。
在图2的实例中,视频编码器100包括预测处理单元108、滤波器单元106、经解码图像缓冲器(decoded picture buffer,DPB)107、求和器112、变换器101、量化器102和熵编码器103。预测处理单元108包括帧间预测器110和帧内预测器109。为了图像块重构,视频编码器100还包含反量化器104、反变换器105和求和器111。滤波器单元106既定表示一或多个环路滤波器,例如去块滤波器、自适应环路滤波器(adaptive loop filter,ALF)和样本自适应偏移(sample adaptive offset,SAO)滤波器。尽管在图2中将滤波器单元106示出为环路内滤波器,但在其它实现方式下,可将滤波器单元106实施为环路后滤波器。在一种示例下,视频编码器100还可以包括视频数据存储器、分割单元(图中未示意)。
视频数据存储器可存储待由视频编码器100的组件编码的视频数据。可从视频源120获得存储在视频数据存储器中的视频数据。DPB 107可为参考图像存储器,其存储用于由视频编码器100在帧内、帧间译码模式中对视频数据进行编码的参考视频数据。视频数据存储器和DPB 107可由多种存储器装置中的任一者形成,例如包含同步动态随机存储器(synchronous dynamic random access memory,SDRAM)的动态随机存取存储器(dynamic random access memory,DRAM)、磁阻式RAM(magnetic random access memory,MRAM)、电阻式RAM(resistive random access memory,RRAM),或其它类型的存储器装置。视频数据存储器和DPB 107可由同一存储器装置或单独存储器装置提供。在各种实例中,视频数据存储器可与视频编码器100的其它组件一起在芯片上,或相对于那些组件在芯片外。
如图2所示,视频编码器100接收视频数据,并将所述视频数据存储在视频数据存储器中。分割单元将所述视频数据分割成若干图像块,而且这些图像块可以被进一步分割为更小的块,例如基于四叉树结构或者二叉树结构的图像块分割。此分割还可包含分割成条带(slice)、片(tile)或其它较大单元。视频编码器100通常说明编码待编码的视频条带内的图像块的组件。所述条带可分成多个图像块(并且可能分成被称作片的图像块集合)。预测处理单元108可选择用于当前图像块的多个可能的译码模式中的一者,例如多个帧内译码模式中的一者或多个帧间译码模式中的一者。预测处理单元108可将所得经帧内、帧间译码的块提供给求和器112以产生残差块,且提供给求和器111以重构用作参考图像的经编码块。
预测处理单元108内的帧内预测器109可相对于与待编码当前块在相同帧或条带中的一或多个相邻块执行当前图像块的帧内预测性编码,以去除空间冗余。预测处理单元108内的帧间预测器110可相对于一或多个参考图像中的一或多个预测块执行当前图像块的帧间预测性编码以去除时间冗余。
具体的,帧间预测器110可用于确定用于编码当前图像块的帧间预测模式。举例来说,帧间预测器110可使用码率-失真分析来计算候选帧间预测模式集合中的各种帧间预测模式的码率-失真值,并从中选择具有最佳码率-失真特性的帧间预测模式。码 率失真分析通常确定经编码块与经编码以产生所述经编码块的原始的未经编码块之间的失真(或误差)的量,以及用于产生经编码块的位码率(也就是说,位数目)。例如,帧间预测器110可确定候选帧间预测模式集合中编码所述当前图像块的码率失真代价最小的帧间预测模式为用于对当前图像块进行帧间预测的帧间预测模式。
帧间预测器110用于基于确定的帧间预测模式,预测当前图像块中一个或多个子块的运动信息(例如运动矢量),并利用当前图像块中一个或多个子块的运动信息(例如运动矢量)获取或产生当前图像块的预测块。帧间预测器110可在参考图像列表中的一者中定位所述运动向量指向的预测块。帧间预测器110还可产生与图像块和视频条带相关联的语法元素以供视频解码器200在对视频条带的图像块解码时使用。又或者,一种示例下,帧间预测器110利用每个子块的运动信息执行运动补偿过程,以生成每个子块的预测块,从而得到当前图像块的预测块;应当理解的是,这里的帧间预测器110执行运动估计和运动补偿过程。
具体的,在为当前图像块选择帧间预测模式之后,帧间预测器110可将指示当前图像块的所选帧间预测模式的信息提供到熵编码器103,以便于熵编码器103编码指示所选帧间预测模式的信息。
帧内预测器109可对当前图像块执行帧内预测。明确地说,帧内预测器109可确定用来编码当前块的帧内预测模式。举例来说,帧内预测器109可使用码率-失真分析来计算各种待测试的帧内预测模式的码率-失真值,并从待测试模式当中选择具有最佳码率-失真特性的帧内预测模式。在任何情况下,在为图像块选择帧内预测模式之后,帧内预测器109可将指示当前图像块的所选帧内预测模式的信息提供到熵编码器103,以便熵编码器103编码指示所选帧内预测模式的信息。
在预测处理单元108经由帧间预测、帧内预测产生当前图像块的预测块之后，视频编码器100通过从待编码的当前图像块减去所述预测块来形成残差图像块。求和器112表示执行此减法运算的一或多个组件。所述残差块中的残差视频数据可包含在一或多个变换单元(transform unit,TU)中，并应用于变换器101。变换器101使用例如离散余弦变换(discrete cosine transform,DCT)或概念上类似的变换等变换将残差视频数据变换成残差变换系数。变换器101可将残差视频数据从像素值域转换到变换域，例如频域。
变换器101可将所得变换系数发送到量化器102。量化器102量化所述变换系数以进一步减小位码率。在一些实例中,量化器102可接着执行对包含经量化的变换系数的矩阵的扫描。或者,熵编码器103可执行扫描。
在量化之后,熵编码器103对经量化变换系数进行熵编码。举例来说,熵编码器103可执行上下文自适应可变长度编码(CAVLC)、上下文自适应二进制算术编码(CABAC)、基于语法的上下文自适应二进制算术编码(SBAC)、概率区间分割熵(PIPE)编码或另一熵编码方法或技术。在由熵编码器103熵编码之后,可将经编码码流发射到视频解码器200,或经存档以供稍后发射或由视频解码器200检索。熵编码器103还可对待编码的当前图像块的语法元素进行熵编码。
反量化器104和反变换器105分别应用逆量化和逆变换以在像素域中重构所述残差块，例如以供稍后用作参考图像的参考块。求和器111将经重构的残差块添加到由帧间预测器110或帧内预测器109产生的预测块，以产生经重构图像块。滤波器单元106可以应用于经重构图像块以减小失真，诸如方块效应(block artifacts)。然后，该经重构图像块作为参考块存储在经解码图像缓冲器107中，可由帧间预测器110用作参考块以对后续视频帧或图像中的块进行帧间预测。
应当理解的是,视频编码器100的其它的结构变化可用于编码视频流。例如,对于某些图像块或者图像帧,视频编码器100可以直接地量化残差信号而不需要经变换器101处理,相应地也不需要经反变换器105处理;或者,对于某些图像块或者图像帧,视频编码器100没有产生残差数据,相应地不需要经变换器101、量化器102、反量化器104和反变换器105处理;或者,视频编码器100可以将经重构图像块作为参考块直接地进行存储而不需要经滤波器单元106处理;或者,视频编码器100中量化器102和反量化器104可以合并在一起。
图3为本申请实施例中所描述的一种实例的视频解码器200的框图。在图3的实例中,视频解码器200包括熵解码器203、预测处理单元208、反量化器204、反变换器205、求和器211、滤波器单元206以及DPB 207。预测处理单元208可以包括帧间预测器210和帧内预测器209。在一些实例中,视频解码器200可执行大体上与相对于来自图2的视频编码器100描述的编码过程互逆的解码过程。
在解码过程中，视频解码器200从视频编码器100接收表示经编码视频条带的图像块和相关联的语法元素的经编码视频码流。视频解码器200可从网络实体42接收视频数据，可选的，还可以将所述视频数据存储在视频数据存储器(图中未示意)中。视频数据存储器可存储待由视频解码器200的组件解码的视频数据，例如经编码视频码流。存储在视频数据存储器中的视频数据，例如可从存储装置40、从相机等本地视频源、经由视频数据的有线或无线网络通信或者通过存取物理数据存储媒体而获得。视频数据存储器可作为用于存储来自经编码视频码流的经编码视频数据的经译码图像缓冲器(coded picture buffer,CPB)。因此，尽管在图3中没有示意出视频数据存储器，但视频数据存储器和DPB 207可以是同一个存储器，也可以是单独设置的存储器。视频数据存储器和DPB 207可由多种存储器装置中的任一者形成，例如：包含同步DRAM(SDRAM)的动态随机存取存储器(DRAM)、磁阻式RAM(MRAM)、电阻式RAM(RRAM)，或其它类型的存储器装置。在各种实例中，视频数据存储器可与视频解码器200的其它组件一起集成在芯片上，或相对于那些组件设置在芯片外。
网络实体42可例如为服务器、MANE、视频编辑器/剪接器,或用于实施上文所描述的技术中的一或多者的其它此装置。网络实体42可包括或可不包括视频编码器,例如视频编码器100。在网络实体42将经编码视频码流发送到视频解码器200之前,网络实体42可实施本申请中描述的技术中的部分。在一些视频解码系统中,网络实体42和视频解码器200可为单独装置的部分,而在其它情况下,相对于网络实体42描述的功能性可由包括视频解码器200的相同装置执行。在一些情况下,网络实体42可为图1的存储装置40的实例。
视频解码器200的熵解码器203对码流进行熵解码以产生经量化的系数和一些语法元素。熵解码器203将语法元素转发到预测处理单元208。视频解码器200可接收在视频条带层级和/或图像块层级处的语法元素。
当视频条带被解码为经帧内解码(I)条带时,预测处理单元208的帧内预测器209可基于发信号通知的帧内预测模式和来自当前帧或图像的先前经解码块的数据而产生当前视频条带的图像块的预测块。当视频条带被解码为经帧间解码(即,B或P)条带时,预测处理单元208的帧间预测器210可基于从熵解码器203接收到的语法元素,确定用于对当前视频条带的当前图像块进行解码的帧间预测模式,基于确定的帧间预测模式,对所述当前图像块进行解码(例如执行帧间预测)。具体的,帧间预测器210可确定是否对当前视频条带的当前图像块采用新的帧间预测模式进行预测,如果语法元素指示采用新的帧间预测模式来对当前图像块进行预测,基于新的帧间预测模式(例如通过语法元素指定的一种新的帧间预测模式或默认的一种新的帧间预测模式)预测当前视频条带的当前图像块或当前图像块的子块的运动信息,从而通过运动补偿过程使用预测出的当前图像块或当前图像块的子块的运动信息来获取或生成当前图像块或当前图像块的子块的预测块。这里的运动信息可以包括参考图像信息和运动矢量,其中参考图像信息可以包括但不限于单向/双向预测信息,参考图像列表号和参考图像列表对应的参考图像索引。对于帧间预测,可从参考图像列表中的一者内的参考图像中的一者产生预测块。视频解码器200可基于存储在DPB 207中的参考图像来建构参考图像列表,即列表0和列表1。当前图像的参考帧索引可包含于参考帧列表0和列表1中的一或多者中。在一些实例中,可以是视频编码器100发信号通知指示是否采用新的帧间预测模式来解码特定块的特定语法元素,或者,也可以是发信号通知指示是否采用新的帧间预测模式,以及指示具体采用哪一种新的帧间预测模式来解码特定块的特定语法元素。应当理解的是,这里的帧间预测器210执行运动补偿过程。
反量化器204将在码流中提供且由熵解码器203解码的经量化变换系数逆量化,即去量化。逆量化过程可包括:使用由视频编码器100针对视频条带中的每个图像块计算的量化参数来确定应施加的量化程度以及同样地确定应施加的逆量化程度。反变换器205将逆变换应用于变换系数,例如逆DCT、逆整数变换或概念上类似的逆变换过程,以便产生像素域中的残差块。
在帧间预测器210产生用于当前图像块或当前图像块的子块的预测块之后，视频解码器200通过将来自反变换器205的残差块与由帧间预测器210产生的对应预测块求和以得到重建的块，即经解码图像块。求和器211表示执行此求和操作的组件。在需要时，还可使用环路滤波器(在解码环路中或在解码环路之后)来使像素转变平滑或者以其它方式改进视频质量。滤波器单元206可以表示一或多个环路滤波器，例如去块滤波器、自适应环路滤波器(ALF)以及样本自适应偏移(SAO)滤波器。尽管在图3中将滤波器单元206示出为环路内滤波器，但在其它实现方式中，可将滤波器单元206实施为环路后滤波器。在一种示例下，滤波器单元206应用于重建块以减小块失真，并且该结果作为经解码视频流输出。并且，还可以将给定帧或图像中的经解码图像块存储在经解码图像缓冲器207中，DPB 207存储用于后续运动补偿的参考图像。DPB 207可为存储器的一部分，其还可以存储经解码视频，以供稍后在显示装置(例如图1的显示装置220)上呈现，或可与此类存储器分开。
应当理解的是,视频解码器200的其它结构变化可用于解码经编码视频码流。例如,视频解码器200可以不经滤波器单元206处理而生成输出视频流;或者,对于某 些图像块或者图像帧,视频解码器200的熵解码器203没有解码出经量化的系数,相应地不需要经反量化器204和反变换器205处理。
如前文所注明,本申请的技术示例性地涉及帧间解码。应理解,本申请的技术可通过本申请中所描述的视频解码器中的任一者进行,视频解码器包含(例如)如关于图1到3所展示及描述的视频编码器100及视频解码器200。即,在一种可行的实施方式中,关于图2所描述的帧间预测器110可在视频数据的块的编码期间在执行帧间预测时执行下文中所描述的特定技术。在另一可行的实施方式中,关于图3所描述的帧间预测器210可在视频数据的块的解码期间在执行帧间预测时执行下文中所描述的特定技术。因此,对一般性“视频编码器”或“视频解码器”的引用可包含视频编码器100、视频解码器200或另一视频编码或编码单元。
图4为本申请实施例中帧间预测模块的一种示意性框图。帧间预测模块121,示例性的,可以包括运动估计单元和运动补偿单元。在不同的视频压缩编解码标准中,PU和CU的关系各有不同。帧间预测模块121可根据多个分割模式将当前CU分割为PU。举例来说,帧间预测模块121可根据2N×2N、2N×N、N×2N和N×N分割模式将当前CU分割为PU。在一种实现方式中,根据本申请实施例的技术方案,帧间预测模块121也可根据确定的基本预测块的尺寸,将当前CU分割为PU;在此场景下,CU为待处理图像块,PU为基本预测块。在其他实施例中,当前CU即为当前PU,不作限定。
帧间预测模块121可对PU中的每一者执行运动估计,获取其运动矢量。一种实现方式中,运动估计可以包括整数运动估计(Integer Motion Estimation,IME)且接着执行分数运动估计(Fraction Motion Estimation,FME)。当帧间预测模块121对PU执行IME时,帧间预测模块121可在一个或多个参考图像中搜索用于PU的参考块。在找到用于PU的参考块之后,帧间预测模块121可产生以整数精度指示PU与用于PU的参考块之间的空间位移的运动矢量。当帧间预测模块121对PU执行FME时,帧间预测模块121可改进通过对PU执行IME而产生的运动矢量。通过对PU执行FME而产生的运动矢量可具有子整数精度(例如,1/2像素精度、1/4像素精度等)。在产生用于PU的运动矢量之后,帧间预测模块121可使用用于PU的运动矢量以产生用于PU的预测性图像块。一种可能的实现方式中,根据本申请实施例的技术方案,帧间预测模块121对PU的第一参考块对应的运动矢量、PU的第二参考块对应的运动矢量以及与CU具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算得到PU对应的运动矢量。
在帧间预测模块121使用AMVP模式用信号通知解码端PU的运动信息的一些可行的实施方式中,帧间预测模块121可产生用于PU的候选预测运动矢量列表。候选预测运动矢量列表可包括一个或多个原始候选预测运动矢量和从原始候选预测运动矢量导出的一个或多个额外候选预测运动矢量。在产生用于PU的候选预测运动矢量列表之后,帧间预测模块121可从候选预测运动矢量列表选择候选预测运动矢量且产生用于PU的运动矢量差(MVD)。用于PU的MVD可指示由选定候选预测运动矢量指示的运动矢量与使用IME和FME针对PU产生的运动矢量之间的差。在这些可行的实施方式中,帧间预测模块121可输出识别选定候选预测运动矢量在候选预测运动矢量列 表中的位置的候选预测运动矢量索引。帧间预测模块121还可输出PU的MVD。
除了通过对PU执行IME和FME来产生用于PU的运动信息外,帧间预测模块121还可对PU中的每一者执行合并(Merge)操作。当帧间预测模块121对PU执行合并操作时,帧间预测模块121可产生用于PU的候选预测运动矢量列表。用于PU的候选预测运动矢量列表可包括一个或多个原始候选预测运动矢量和从原始候选预测运动矢量导出的一个或多个额外候选预测运动矢量。候选预测运动矢量列表中的原始候选预测运动矢量可包括一个或多个空间候选预测运动矢量和时间候选预测运动矢量。空间候选预测运动矢量可指示当前图像中的其它PU的运动信息。时间候选预测运动矢量可基于不同于当前图像的对应的PU的运动信息。时间候选预测运动矢量还可称作时间运动矢量预测(TMVP)。
在产生候选预测运动矢量列表之后,帧间预测模块121可从候选预测运动矢量列表选择候选预测运动矢量中的一个。帧间预测模块121可接着基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。
在基于IME和FME产生用于PU的预测性图像块和基于合并操作产生用于PU的预测性图像块之后,帧间预测模块121可选择通过FME操作产生的预测性图像块或者通过合并操作产生的预测性图像块。在一些可行的实施方式中,帧间预测模块121可基于通过FME操作产生的预测性图像块和通过合并操作产生的预测性图像块的码率-失真代价分析来选择用于PU的预测性图像块。
在帧间预测模块121已选择通过根据分割模式中的每一者分割当前CU而产生的PU的预测性图像块之后(在一些实施方式中,编码树单元CTU划分为CU后,不会再进一步划分为更小的PU,此时PU等同于CU),帧间预测模块121可选择用于当前CU的分割模式。在一些实施方式中,帧间预测模块121可基于通过根据分割模式中的每一者分割当前CU而产生的PU的选定预测性图像块的码率-失真代价分析来选择用于当前CU的分割模式。帧间预测模块121可将与属于选定分割模式的PU相关联的预测性图像块输出到残差产生模块102。帧间预测模块121可将指示属于选定分割模式的PU的运动信息的语法元素输出到熵编码模块。
在图4的示意图中,帧间预测模块121包括IME模块180A到180N(统称为“IME模块180”)、FME模块182A到182N(统称为“FME模块182”)、合并模块184A到184N(统称为“合并模块184”)、PU模式决策模块186A到186N(统称为“PU模式决策模块186”)和CU模式决策模块188(也可以包括执行从CTU到CU的模式决策过程)。
IME模块180、FME模块182和合并模块184可对当前CU的PU执行IME操作、FME操作和合并操作。图4的示意图中将帧间预测模块121说明为包括用于CU的每一分割模式的每一PU的单独IME模块180、FME模块182和合并模块184。在其它可行的实施方式中,帧间预测模块121不包括用于CU的每一分割模式的每一PU的单独IME模块180、FME模块182和合并模块184。
如图4的示意图中所说明,IME模块180A、FME模块182A和合并模块184A可对通过根据2N×2N分割模式分割CU而产生的PU执行IME操作、FME操作和合并操作。PU模式决策模块186A可选择由IME模块180A、FME模块182A和合并模块 184A产生的预测性图像块中的一者。
IME模块180B、FME模块182B和合并模块184B可对通过根据N×2N分割模式分割CU而产生的左PU执行IME操作、FME操作和合并操作。PU模式决策模块186B可选择由IME模块180B、FME模块182B和合并模块184B产生的预测性图像块中的一者。
IME模块180C、FME模块182C和合并模块184C可对通过根据N×2N分割模式分割CU而产生的右PU执行IME操作、FME操作和合并操作。PU模式决策模块186C可选择由IME模块180C、FME模块182C和合并模块184C产生的预测性图像块中的一者。
IME模块180N、FME模块182N和合并模块184N可对通过根据N×N分割模式分割CU而产生的右下PU执行IME操作、FME操作和合并操作。PU模式决策模块186N可选择由IME模块180N、FME模块182N和合并模块184N产生的预测性图像块中的一者。
PU模式决策模块186可基于多个可能预测性图像块的码率-失真代价分析选择预测性图像块,且选择针对给定解码情形提供最佳码率-失真代价的预测性图像块。示例性的,对于带宽受限的应用,PU模式决策模块186可偏向选择增加压缩比的预测性图像块,而对于其它应用,PU模式决策模块186可偏向选择增加经重建视频质量的预测性图像块。在PU模式决策模块186选择用于当前CU的PU的预测性图像块之后,CU模式决策模块188选择用于当前CU的分割模式且输出属于选定分割模式的PU的预测性图像块和运动信息。
图5示出了本申请实施例中一种示例性的待处理图像块和其参考块的示意图。如图5所示,W和H是待处理图像块500以及待处理图像块在指定参考图像中的co-located块(简称为映射图像块)500’的宽度和高度。图5中每个块中标注的内容,为该块对应的运动矢量。例如,图5中标注的P(x,y)为待处理图像块500中基本预测块604对应的运动矢量。待处理图像块的参考块包括:待处理图像块的上侧空域邻接块和左侧空域邻接块,以及映射图像块的下侧空域邻接块和右侧空域邻接块,其中映射图像块为指定参考图像中与待处理图像块具有相同的大小、形状的图像块,并且映射图像块在指定参考图像中的位置和待处理图像块在其所在图像(一般指当前待处理图像)中的位置相同。映射图像块的下侧空域邻接块和右侧空域邻接块也可以被称作时域参考块。每帧图像可以被划分为用于编码的图像块,称之为待处理图像块,这些图像块可以被进一步划分为更小的块,称之为基本预测块。例如,待处理图像块和映射图像块可以被分割成多个M×N子块,即每个子块的大小均为M×N像素,不妨设每个参考块的大小也为M×N像素,即与待处理图像块的子块的大小相同。“M×N”与“M乘N”可互换使用以指依照水平维度及垂直维度的图像子块的像素尺寸,即在水平方向上具有M个像素,且在垂直方向上具有N个像素,其中M、N表示非负整数值。此外,M和N不一定相同。
举例说明,M可以等于N,M、N均为4,即子块的大小为4×4,M也可以不等于N,比如M=8,N=4,即子块的大小为8×4。在可行的实施方式中,示例性的,待处理图像块的子块大小和参考块的大小可以是4×4,8×8,8×4或4×8像素,或者标 准允许的预测块的最小尺寸。在一种可行的实施方式中,W和H的度量单位分别为子块的宽度和高度,即W表示待处理图像块的宽和待处理图像块中子块的宽的比值,H表示待处理图像块的高和待处理图像块中子块的高的比值。此外,本申请描述的待处理图像块可以理解为但不限于:预测单元(prediction unit,PU)或者编码单元(coding unit,CU)或者变换单元(transform unit,TU)等。根据不同视频压缩编解码标准的规定,CU可包含一个或多个预测单元PU,或者PU和CU的尺寸相同。待处理图像块可具有固定或可变的大小,且根据不同视频压缩编解码标准而在大小上不同。此外,待处理图像块是指当前待编码或当前待解码的图像块,例如待编码或待解码的预测单元。待处理图像块可以为待处理的图像中的一部分或者全部,本申请对此不进行具体限定。
在一种示例下,如图5所示,可以沿着方向1依次判断待处理图像块的每个左侧空域邻接块是否可用,以及可以沿着方向2依次判断待处理图像块的每个上侧空域邻接块是否可用。例如,判断上述邻接块是否采用帧间编码,如果邻接块存在且采用帧间编码,则所述邻接块可用;如果邻接块不存在或者采用帧内编码,则所述邻接块不可用。在一种可行的实施方式中,如果一个邻接块采用帧内编码,则复制邻近的其它参考块的运动信息作为该邻接块的运动信息。按照类似方法检测映射图像块的下侧空域邻接块和右侧空域邻接块是否可用,在此不再赘述。
应理解,运动信息的存储可以存在不同的颗粒度,比如在H.264和H.265标准中,运动信息是以4×4像素集合为存储运动信息的基本单元的。示例性的,还可以以2×2,8×8,4×8,8×4等像素集合作为存储运动信息的基本单元。在本文中,不妨将存储运动信息的基本单元简称为基本存储单元。
当上述参考块的大小与基本存储单元的大小一致时,可以直接获取该参考块对应的基本存储单元所存储的运动信息作为该参考块对应的运动信息。
或者,当上述参考块的大小小于基本存储单元的大小时,可以直接获取该参考块对应的基本存储单元所存储的运动信息作为该参考块对应的运动信息。
或者,当上述参考块的大小大于存储运动信息的基本单元的大小时,可以获取参考块预定位置处对应的基本存储单元所存储的运动信息。示例性的,可以获取参考块左上角点处对应的基本存储单元所存储的运动信息,或者,可以获取参考块中心点处对应的基本存储单元所存储的运动信息,作为该参考块对应的运动信息。
在本申请实施例中,为了方便描述,待处理图像的子块又被称为基本预测块。
图6示例性的示出了本申请实施例提供的帧间预测的方法,在该方法中根据待处理图像块的参考块对应的运动矢量加权获得待处理图像块内部各个基本预测块的运动矢量的示意流程图,该方法可以包括:
S601、根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸,该尺寸用于确定基本预测块在待处理图像块中的位置。
其中，待处理图像块即编码器或解码器当前处理的图像块，后文中称之为待处理图像块或者当前待处理图像块。
在一种可行的实施方式中,尺寸参考信息可以为基本预测块的形状,待处理图像块中的基本预测块的尺寸,可以是编解码端根据尺寸参考信息预先确定的固定值,并 分别固化在编解码端。其中,不同的形状与尺寸值的对应关系,可以根据实际需求配置,本申请实施例对此不进行具体限定。
示例性的,当基本预测块的两条邻边的边长不等时,即基本预测块为非正方形的长方形(non-square),确定基本预测块的较短的一条边的边长为4或8;当基本预测块的两条邻边的边长相等时,即基本预测块为正方形,确定所述基本预测块的边长为4或8。应理解,上述边长为4或8只是一个示例值,也可以是16,24等其它常数。
在一种可行的实施方式中,尺寸参考信息可以为第一标识,该第一标识用于指示基本预测块的尺寸,待处理图像块中的基本预测块的尺寸可以通过解析码流,从码流中获得第一标识确定。具体的,S601可以实现为:接收码流,并从码流中解析第一标识。其中,第一标识位于待处理图像块所在序列的序列参数集(sequence parameter set,SPS)、待处理图像块所在图像的图像参数集(picture parameter set,PPS)和待处理图像块所在条带的条带头(slice header,或者slice segment header)中的任一个所对应的码流段中。
一种可能的实现中,第一标识可以为语法元素,此时,S601具体可以实现为:从码流中解析相应的语法元素,进而确定基本预测块的尺寸。而该语法元素可以携带于码流中对应SPS的码流部分,也可以携带于码流中对应PPS的码流部分,还可以携带于码流中对应条带头的码流部分。
应理解,当从SPS中解析出第一标识确定基本预测块的尺寸时,整个序列中的基本预测块采用相同的尺寸;当从PPS中解析出第一标识确定基本预测块的尺寸时,整个图像帧中的基本预测块采用相同的尺寸;当从条带头中解析出第一标识确定基本预测块的尺寸时,整个条带中的基本预测块采用相同的尺寸。
应理解,在本文中,图像和图像帧是不同的概念,图像包括以整帧形式存在的图像(即图像帧),也包括以条带(slice)形式存在的图像,以片(tile)形式存在的图像,或者以其它子图像的形式存在的图像,不做限定。
应理解,对于采用帧内预测的条带,由于不需要确定基本预测块的尺寸,因此,采用帧内预测的条带的条带头不存在上述第一标识。
具体的,编码端通过适当的方式确定基本预测块的尺寸(比如,率失真选择的方式,或者实验经验值的方式,或者本申请实施例中通过第一标识确定基本预测块的尺寸之外的其他方式),将确定后的基本预测块的尺寸作为第一标识编入码流,解码端从码流中解析第一标识确定出基本预测块的尺寸。
在一种可行的实施方式中,尺寸参考信息可以包括历史信息,待处理图像块中的基本预测块的尺寸通过历史信息来确定,因此可以分别在编解码端自适应地获得。其中,历史信息是指在当前待处理图像块之前,已经经过编解码的图像块的信息。例如,历史信息可以包括在先已重构图像中平面模式预测块的尺寸。具体的,可以根据在先已重构图像中平面模式预测块的尺寸,确定基本预测块的尺寸。其中,平面模式预测块为根据本申请实施例提供的帧间预测的方法进行帧间预测的待处理图像块,在先已重构图像为编码顺序位于当前待处理图像块所在图像之前的图像。
应理解,当确定当前待处理图像块的基本预测块尺寸时,在先已重构图像中平面模式预测块已经处理完毕,平面模式预测块实际为根据本申请实施例提供的帧间预测 的方法完成帧间预测的图像块。本文相关段落均依此解释,不再赘述。
不妨将采用本申请实施例中所述的帧间预测的方法(比如,图6所示的方法)进行帧间预测的待处理图像块称为平面模式预测块。可以根据在先编码的图像中统计平面模式预测块的尺寸来估计当前待处理图像块所在图像(在后文中简称为当前图像)中基本预测块的尺寸。
应理解,在编码端的图像编码顺序和在解码端的图像解码顺序是一致的,因此,在先已重构图像为编码顺序位于待处理图像块所在图像之前的图像,也可以描述为在先已重构图像为解码顺序位于待处理图像块所在图像之前的图像。本文对于编码顺序和解码顺序均按上述方式理解,不再赘述。
应理解,当存在于编码端的已重构图像A的编码顺序和存在于解码端的已重构图像B的解码顺序相同时,图像A和图像B是相同的,因此分别在编码端和解码端基于相同的重构图像进行分析,可以得到相同的先验信息(也可以称之为历史信息),基于该先验信息作为尺寸参考信息来确定基本预测块的尺寸,在编解码端可以得到相同的结果,即实现确定基本预测块的尺寸的自适应机制。
具体的,当待处理图像块中的基本预测块的尺寸通过历史信息来确定时,可以按照如下方式确定当前图像基本预测块的尺寸:计算在先已重构图像中全部平面模式预测块的宽和高的乘积的平均值;当该平均值小于阈值时,确定基本预测块的尺寸为第一尺寸;当该平均值大于或等于阈值时,确定基本预测块的尺寸为第二尺寸。其中,第一尺寸小于第二尺寸。
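上述根据历史信息确定基本预测块尺寸的过程可用如下示意代码说明（decide_basic_block_size为本示例假设的函数名；此处假设第一尺寸为4×4、第二尺寸为8×8，对应下文列举的组合之一，阈值作为参数传入）：

```python
def decide_basic_block_size(planar_blocks, threshold):
    """planar_blocks：在先已重构图像中全部平面模式预测块的(宽, 高)列表。
    统计信息无效(列表为空)时回退到预设值4×4；
    否则计算宽×高乘积的平均值并与阈值比较，小于阈值取第一尺寸，
    否则取第二尺寸(此处假设第一尺寸4×4、第二尺寸8×8)。"""
    if not planar_blocks:
        return (4, 4)
    avg = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    return (4, 4) if avg < threshold else (8, 8)
```

由于编解码端基于相同的在先已重构图像得到相同的统计信息，调用该函数可在两端得到一致的尺寸，无需在码流中传输。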
需要说明的是,第一尺寸、第二尺寸的具体取值,可以根据实际需求配置,本申请实施例对此不进行具体限定。
应理解,第一尺寸小于第二尺寸,可以理解为第一尺寸的面积小于第二尺寸的面积。示例性的,第一尺寸和第二尺寸的关系可以包括第一尺寸为4(正方形边长)且第二尺寸为8(正方形边长),也可以包括第一尺寸为4×4且第二尺寸为8×8,也可以包括第一尺寸为4×4且第二尺寸为4×8,也可以包括第一尺寸为4×8,第二尺寸为8×8,也可以包括第一尺寸为4×8,第二尺寸为8×16,不做限定。
应理解,一般的,上述阈值是预先设置的。本申请实施例对于该阈值的取值以及确定规则,均不进行具体限定,可以根据实际需求配置。
在一种可行的实施方式中,当所述待处理图像块所在的图像的参考帧的显示顺序(picture order count,POC)均小于所述待处理图像块所在的图像的POC时,上述阈值为第一阈值;当待处理图像块所在的图像的至少一个参考帧的POC大于待处理图像块所在的图像的POC时,上述阈值为第二阈值。其中,第一阈值和第二阈值不同。
即,当以低延时(low delay)的方式进行编码时,此时当前图像的参考帧的POC均小于当前图像的POC,将阈值设置为第一数值。示例性的,第一数值可以设置为75。当以随机接入(random access)的方式进行编码时,此时当前图像的至少一个参考帧的POC大于当前图像的POC,将阈值设置为第二数值,示例性的,第二数值可以设置为27。应理解,该第一数值和第二数值的设置不做限定。
在一种可行的实施方式中,在先已重构图像为编码顺序距离待处理图像块所在的图像最近的已重构图像,也即在先已重构图像为解码顺序距离当前待处理图像块所在 的图像最近的已重构图像。
即,将当前图像帧的前一编码/解码帧中的全部平面模式预测块的统计信息(示例性的,全部平面模式预测块的宽和高的乘积的平均值)作为尺寸参考信息,或者,将当前条带的前一条带中的全部所述平面模式预测块的统计信息作为尺寸参考信息。对应的,在S601中,可以根据当前图像帧的前一编码/解码帧中的全部平面模式预测块的统计信息,确定当前图像帧中的基本预测块的尺寸,或者,可以根据当前条带的前一条带中的全部所述平面模式预测块的统计信息,确定当前条带中的基本预测块的尺寸。如前所述,图像也可以包括其他形式的子图像,因此,并不限定于图像帧和条带。
应理解,在此实施方式中,统计信息以图像帧或者条带为单位进行更新,即每图像帧或者每条带进行一次更新。
应理解,在采用帧内预测的图像帧或者条带中不进行统计信息的更新。
在一种可行的实施方式中,在先已重构图像为与当前待处理图像块所在的图像具有相同的时域层标识的图像中,编码顺序距离当前待处理图像块所在的图像最近的已重构图像,也即,在先已重构图像为与当前待处理图像块所在的图像具有相同的时域层标识的图像中,解码顺序距离当前待处理图像块所在的图像最近的已重构图像。
即,从和当前待处理图像块所在的图像具有相同时域层标识(temporal ID)的图像中确定和当前图像编码距离最近的图像。具体方式可参考上一可行的实施方式,不做赘述。
在一种可行的实施方式中,在先已重构图像可以为多个图像,对应的,计算在先已重构图像中全部平面模式预测块的宽和高的乘积的平均值,可以包括:计算多个在先已重构图像中全部平面模式预测块的宽和高的乘积的平均值。
应理解,上述两种可行的实施方式分别是按照单一在先已重构图像的统计数据来确定当前待处理图像块的基本预测块的尺寸,而在本实施方式中是累计多个在先已重构图像的统计数据来确定当前待处理图像块的基本预测块的尺寸。即,在此实施方式中,统计信息以多个图像帧或者多个条带为单位进行更新,即每预设个数的图像帧或者每预设个数条带进行一次更新,或者统计信息可以一直累计而不做更新。
具体的,计算多个在先已重构图像中全部平面模式预测块的宽和高的乘积的平均值,可以包括:分别统计多个在先已重构图像中每个图像中的全部平面模式预测块的宽和高的乘积的平均值,将上述分别统计的平均值再做加权,获得本实施方式中用于和上述阈值进行比较的最终平均值;或者,计算多个在先已重构图像中全部平面模式预测块的宽和高的乘积的平均值,也可以包括:累加多个在先已重构图像中的全部平面模式预测块的宽和高的乘积,再除以全部平面模式预测块的个数,以获取本实施方式中用于和上述阈值进行比较的平均值。
在一种可行的实施方式中,在计算在先已重构图像中全部所述平面模式预测块的宽和高的乘积的平均值的过程中,还包括确定统计信息有效。比如,如果在先已重构图像中没有平面模式预测块,就无法计算上述平均值,此时统计信息无效。对应的,可以不对统计信息进行更新,或者将当前待处理图像块的基本预测块的尺寸设置为预设值。示例性的,对于正方形基本预测块,当统计信息无效时,可以将当前待处理图像块的基本预测块的尺寸设置为4×4。
应理解,对于第一个采用帧间预测的图像在采用历史信息确定基本预测块的尺寸的实施方式中,基本预测块的尺寸也可以设置为预设值。
在一种可行的实施方式中,在S601中确定待处理图像块中的基本预测块的尺寸,还包括确定基本预测块的形状。示例性的,当待处理图像块为正方形时,可以确定基本预测块也为正方形,或者,待处理图像块的宽高比和基本预测块的宽高比一致,或者,将待处理图像块的宽和高分别均分成若干等分以获取基本预测块的宽和高,或者,待处理图像块的形状和基本预测块的形状不相关。比如,可以将基本预测块固定设置为正方形,或者,当待处理图像块的尺寸为32×16时,可以设置基本预测块为16×8或者8×4等,不做限定。
应理解,在一种可行的实施方式中,对基本预测块形状的确定分别固化于编解码端,并保持一致。
在一种可行的实施方式中,在S601根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸之前,本申请提供的帧间预测方法还可以包括:确定待处理图像块的预测方向。
在一种可行的实施方式中,确定所述待处理图像块的预测方向,可以包括:当第一向预测有效且第二向预测无效,或者,第二向预测有效且第一向预测无效时,待处理图像块的预测方向为单向预测;当第一向预测有效且第二向预测有效时,待处理图像块的预测方向为双向预测。
其中,第一向预测以及第二向预测是指两个不同方向的预测,并不是对预测方向的具体限定。例如,第一向预测可以为前向预测,第二向预测可以为后向预测;或者,第一向预测可以为后向预测,第二向预测可以为前向预测。
在一种可行的实施方式中,当待处理图像块的相邻区域内至少一个临时图像块采用第一参考帧图像列表获得运动矢量时,第一向预测有效;当待处理图像块的相邻区域内没有临时图像块采用第一参考帧图像列表获得运动矢量时,第一向预测无效;当待处理图像块的相邻区域内至少一个临时图像块采用第二参考帧图像列表获得运动矢量时,第二向预测有效;当待处理图像块的相邻区域内没有临时图像块采用第二参考帧图像列表获得运动矢量时,第二向预测无效。
其中,第一参考帧图像列表是与第一向预测对应的参考帧图像列表,第二参考帧图像列表是与第二向预测对应的参考帧图像列表。
在一种可行的实施方式中,临时图像块为具有预设尺寸的图像块。预设尺寸的取值可以根据实际需求确定,本申请实施例对此不进行具体限定。
在另一种可行的实施方式中,运动矢量可以包括第一运动矢量和/或第二运动矢量,当待处理图像块的相邻区域内至少两个采用第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量不同时,第一向预测有效。其中,第一运动矢量对应第一参考帧图像列表。当待处理图像块的相邻区域内所有采用第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量均相同时,第一向预测无效;当待处理图像块的相邻区域内至少两个采用第二参考帧图像列表获得运动矢量的临时图像块的第二运动矢量不同时,第二向预测有效。其中,第二运动矢量对应第二参考帧图像列表;当待处理图像块的相邻区域内所有采用第二参考帧图像列表获得运动矢量的临时图像块的第二 运动矢量均相同时,第二向预测无效。
在一种可行的实施方式中,当临时图像块仅采用第一参考帧图像列表获得运动矢量时,第一运动矢量和运动矢量相同;当临时图像块仅采用第二参考帧图像列表获得运动矢量时,第二运动矢量和所述运动矢量相同。
在一种可行的实施方式中,待处理图像块的相邻区域,可以包括:待处理图像块的左侧空域区域,上侧空域区域,右侧时域区域,下侧时域区域中的一个区域或者任意区域组合。
在另一种可行的实施方式中,待处理图像块的相邻区域,可以包括:待处理图像块的左侧空域区域,上侧空域区域,左下侧空域区域,右上侧空域区域,右侧时域区域,下侧时域区域中的一个区域或者任意区域组合。
在一种可行的实施方式中，尺寸参考信息可以包括待处理图像块的预测方向和/或待处理图像块的形状信息。其中，形状信息可以包括高度和宽度。相应的，S601中确定待处理图像块中的基本预测块的尺寸，可以包括：根据待处理图像块的预测方向和/或形状信息，确定基本预测块的尺寸。
在一种可行的实施方式中,S601中根据待处理图像块的预测方向,确定基本预测块的尺寸,可以包括:当待处理图像块的预测方向为单向预测时,基本预测块的宽度为4像素,高度为4像素;当待处理图像块的预测方向为双向预测时,基本预测块的宽度为8像素,高度为4像素,或者,当待处理图像块的预测方向为双向预测时,基本预测块的宽度为4像素,高度为8像素。
在一种可行的实施方式中,S601中根据待处理图像块的预测方向,确定基本预测块的尺寸,可以实现为:当待处理图像块的预测方向为单向预测时,基本预测块的宽度为4像素,高度为4像素;当待处理图像块的预测方向为双向预测且待处理图像块的宽度大于或等于待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;当待处理图像块的预测方向为双向预测且待处理图像块的宽度小于待处理图像块的高度时,基本预测块的宽度为4像素,高度为8像素。
在一种可行的实施方式中,S601中根据待处理图像块的预测方向,确定基本预测块的尺寸,可以包括:当待处理图像块的预测方向为单向预测时,基本预测块的宽度为4像素,高度为4像素;当待处理图像块的预测方向为双向预测时,基本预测块的宽度为8像素,高度为8像素。
在一种可行的实施方式中,所述根据所述待处理图像块的预测方向,确定所述基本预测块的尺寸,包括:当待处理图像块的预测方向为双向预测时,基本预测块的宽度为8像素,高度为8像素;当待处理图像块的预测方向为单向预测且待处理图像块的宽度大于或等于所述待处理图像块的高度时,基本预测块的宽度为8像素,高度为4像素;当待处理图像块的预测方向为单向预测且待处理图像块的宽度小于待处理图像块的高度时,基本预测块的宽度为4像素,高度为8像素。
在一种可行的实施方式中，S601中根据待处理图像块的形状信息，确定基本预测块的尺寸，可以包括：当待处理图像块的宽度大于或等于待处理图像块的高度时，基本预测块的宽度为8像素，高度为4像素；当待处理图像块的宽度小于待处理图像块的高度时，基本预测块的宽度为4像素，高度为8像素。
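以上文中"单向预测取4×4、双向预测依宽高关系取8×4或4×8"的一种可行实施方式为例，尺寸确定可用如下示意代码说明（basic_block_size为本示例假设的函数名，返回(宽, 高)，其它实施方式只需改写相应分支）：

```python
def basic_block_size(bidir: bool, width: int, height: int):
    """当预测方向为单向预测时，基本预测块取4×4；
    为双向预测时，依据待处理图像块的宽高关系取8×4或4×8。
    返回(宽, 高)，单位为像素。"""
    if not bidir:
        return (4, 4)
    return (8, 4) if width >= height else (4, 8)
```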
在一种可行的实施方式中,基本预测块的宽度可以为8像素,高度可以为8像素。该实施方式中,对于尺寸参考信息的内容不进行具体限定。
应理解,上述内容描述了几种不同的尺寸参考信息的内容,并提供了不同尺寸参考信息的内容对应的S601的具体实现,尺寸参考信息的内容也可以为上述几种内容的组合,在组合情况下S601的具体实现不再赘述。
示例性的,尺寸参考信息可以为第一标识以及待处理图像块的预测方法,由第一标识指示基本预测块的尺寸的取值范围,再根据待处理图像块的预测方法在该取值范围内确定基本预测块的尺寸。
示例性的,尺寸参考信息可以为第一标识以及待处理图像块的形状,由第一标识指示基本预测块的尺寸的取值范围,再根据待处理图像块的形状在该取值范围内确定基本预测块的尺寸。
在一种可行的实施方式中,在S601步骤之后,本申请提供的帧间预测方法还包括:
S602、根据基本预测块的尺寸,将待处理图像块划分为多个基本预测块;依次确定每个基本预测块在待处理图像块中的位置。
应理解,每个基本预测块的尺寸是相同的,在确定了基本预测块的尺寸之后,可以在待处理图像块中按照尺寸依次推算出每一个基本预测块的位置。
应理解,在一种可行的实施方式中,待处理图像块和基本预测块的位置均以坐标的形式存在,该步骤仅需要确定每个基本预测块的坐标即可,或者,将待处理图像块和基本预测块进行区分,不存在实体化的划分步骤。
S603、根据每个基本预测块的位置,确定每个基本预测块的第一参考块和第二参考块。
其中,第一参考块的左边界线和基本预测单元的左边界线共线,第二参考块的上边界线和基本预测单元的上边界线共线,第一参考块与待处理图像块的上边界线邻接,第二参考块与待处理图像块的左边界线邻接。
S604、对第一参考块对应的运动矢量、第二参考块对应的运动矢量以及与待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,获取基本预测块对应的运动矢量。
具体的,对S602中划分的每个基本预测块均执行S604的操作,获取每个基本预测块对应的运动矢量。对于每个基本预测块执行S604的过程相同,不再一一赘述。
在一种可行的实施方式中,与待处理图像块具有预设位置关系的原始参考块,可以包括:与待处理图像块具有预设空域位置关系的原始参考块和/或与待处理图像块具有预设时域位置关系的原始参考块。当然,与待处理图像块具有预设位置关系的原始参考块也可以根据实际需求定义,本申请实施例对此不进行具体限定。
在一种可行的实施方式中,与待处理图像块具有预设空域位置关系的原始参考块,可以包括:位于待处理图像块左上角且与待处理图像块的左上角点相邻的图像块、位于待处理图像块右上角且与待处理图像块的右上角点相邻的图像块和位于待处理图像块左下角且与待处理图像块的左下角点相邻的图像块中的一个或多个。其中,与待处理图像块具有预设空域位置关系的原始参考块位于待处理图像块的外部,且不妨简称为空域参考块。
在一种可行的实施方式中,与待处理图像块具有预设时域位置关系的原始参考块,可以包括:在目标参考帧中位于映射图像块右下角且与映射图像块的右下角点相邻的图像块。其中,与待处理图像块具有预设时域位置关系的原始参考块位于映射图像块的外部,映射图像块与待处理图像块尺寸相等,映射图像块在目标参考帧中的位置与待处理图像块在待处理图像块所在图像帧中的位置相同,且不妨简称为时域参考块。
在一种可行的实施方式中,目标参考帧的索引信息和参考帧列表信息可以通过解析码流获得。即,码流中包括目标参考帧的索引信息和参考帧列表信息,在参考帧列表信息中,查找目标参考帧的索引信息,即可确定出目标参考帧。
在一种可行的实施方式中,目标参考帧的索引信息和参考帧列表信息可以位于待处理图像块所在的条带的条带头对应的码流段中。
下面通过示例的形式,具体描述步骤S603和S604的具体实现方式。图7、图8、图9分别示意了一种加权计算基本预测块对应的运动矢量的场景。
在一种可行的实施方式中,在图7所示的场景中,S603和S604的具体实现可以包括下述步骤:
S701、根据待处理图像块600中的基本预测块604的位置,确定第一参考块809和第二参考块802。
其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y)。
S702A、基于待处理图像块600的右上角的空域参考块805对应的运动矢量和待处理图像块600的右下角位置的时域参考块807对应的运动矢量进行加权计算,获得第一临时块806对应的运动矢量。
示例性的,第一临时块806对应的运动矢量计算公式为R(W,y)=((H-y-1)×AR+(y+1)×BR)/H。
其中,AR为位于待处理图像块600右上角且与待处理图像块600的右上角点相邻的图像块(空域参考块805)对应的运动矢量,BR为在目标参考帧中位于映射图像块右下角且与映射图像块的右下角点相邻的图像块对应的运动矢量,H为待处理图像块600的高与基本预测块604的高的比值,y为基本预测块604的左上角点相对于待处理图像块600的左上角点的竖直距离与基本预测块604的高的比值。其中,目标参考帧的索引信息和参考帧列表信息从条带头中解析获得,并确定目标参考帧。
S702B、基于待处理图像块600的左下角的空域参考块801对应的运动矢量和待处理图像块600的右下角位置的时域参考块807对应的运动矢量进行加权计算,获得第二临时块808对应的运动矢量。
示例性的,第二临时块808对应的运动矢量计算公式为B(x,H)=((W-x-1)×BL+(x+1)×BR)/W。
其中,BL为位于待处理图像块600左下角且与待处理图像块的左下角点相邻的图像块(空域参考块801)对应的运动矢量,BR为在目标参考帧中位于映射图像块右下角且与映射图像块的右下角点相邻的图像块对应的运动矢量,W为待处理图像块600的宽与基本预测块604的宽的比值,x为基本预测块604的左上角点相对于待处理图像块600的左上角点的水平距离与基本预测块604的宽的比值。
应理解,步骤S702A和步骤S702B不限定执行顺序关系,可以先后也可以同时。
S703A、基于待处理图像块600的第一临时块806对应的运动矢量和待处理图像块600的第二参考块802对应的运动矢量，进行加权计算获得基本预测块604对应的第一临时运动矢量P_h(x,y)。
示例性的，基本预测块604对应的第一临时运动矢量P_h(x,y)的计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S703B、基于待处理图像块600的第二临时块808对应的运动矢量和待处理图像块600的第一参考块809对应的运动矢量进行加权计算，获得基本预测块604对应的第二临时运动矢量P_v(x,y)。
示例性的，基本预测块604对应的第二临时运动矢量P_v(x,y)的计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解，步骤S703A和步骤S703B不限定执行顺序关系。
S704、基于待处理图像块600的第一临时运动矢量P_h(x,y)和第二临时运动矢量P_v(x,y)进行加权计算，获得基本预测块604对应的运动矢量P(x,y)。
示例性的，基本预测块604对应的运动矢量P(x,y)计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
应理解,在一种可行的实施方式中,基本预测块604对应的运动矢量P(x,y)也可以通过综合上述步骤的单一公式得出。
示例性的，基本预测块604对应的运动矢量P(x,y)综合上述步骤的单一公式为：P(x,y)=(H×((W-1-x)×L(-1,y)+(x+1)×((H-y-1)×AR+(y+1)×BR)/H)+W×((H-1-y)×A(x,-1)+(y+1)×((W-x-1)×BL+(x+1)×BR)/W)+H×W)/(2×H×W)。
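步骤S702A至S704的加权计算可用如下示意代码说明（planar_mv为本示例假设的函数名；为便于演示将运动矢量简化为标量，实际应对(mvx, mvy)两个分量分别进行相同的计算）：

```python
def planar_mv(x, y, W, H, A, L, AR, BL, BR):
    """按图7场景加权计算基本预测块(x, y)对应的运动矢量。
    W、H：待处理图像块的宽/高与基本预测块的宽/高的比值；
    A(x)：第一参考块A(x,-1)的运动矢量；L(y)：第二参考块L(-1,y)的运动矢量；
    AR、BL、BR：右上空域、左下空域、右下时域参考块的运动矢量(标量示意)。"""
    R = ((H - y - 1) * AR + (y + 1) * BR) / H       # S702A：第一临时块对应的R(W, y)
    B = ((W - x - 1) * BL + (x + 1) * BR) / W       # S702B：第二临时块对应的B(x, H)
    Ph = (W - 1 - x) * L(y) + (x + 1) * R           # S703A：第一临时运动矢量P_h(x, y)
    Pv = (H - 1 - y) * A(x) + (y + 1) * B           # S703B：第二临时运动矢量P_v(x, y)
    return (H * Ph + W * Pv + H * W) / (2 * H * W)  # S704：P(x, y)
```

注意P(x,y)公式中的"+H×W"为整数除法的舍入偏置项，因此即使各参考块运动矢量完全相同，浮点计算结果也会带有0.5的偏置。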
在另一种可行的实施方式中,在图8所示的场景中,S603和S604的具体实现可以包括下述步骤:
S801、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802。
其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y)。
S802A、将待处理图像块600的右上角的空域参考块805对应的运动矢量作为待处理图像块600的第一临时块806对应的运动矢量R(W,y)。
S802B、将待处理图像块600的左下角的空域参考块801对应的运动矢量作为待处理图像块600的第二临时块808对应的运动矢量B(x,H)。
应理解,步骤S802A和步骤S802B不限定执行顺序关系。可以先后执行,也可以同时执行。
S803A、基于待处理图像块600的第一临时块806对应的运动矢量R(W,y)和待处理图像块600的第二参考块802对应的运动矢量进行加权计算，获得基本预测块604对应的第一临时运动矢量P_h(x,y)。
示例性的，基本预测块604对应的第一临时运动矢量P_h(x,y)的计算公式为：P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S803B、基于待处理图像块600的第二临时块808对应的运动矢量B(x,H)和待处理图像块600的第一参考块809对应的运动矢量进行加权计算，获得基本预测块604对应的第二临时运动矢量P_v(x,y)。
示例性的，基本预测块604对应的第二临时运动矢量P_v(x,y)的计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解，步骤S803A和步骤S803B不限定执行顺序关系。可以先后执行，也可以同时执行。
S804、基于待处理图像块600的第一临时运动矢量和第二临时运动矢量进行加权计算，获得基本预测块604对应的运动矢量P(x,y)。
示例性的，基本预测块604对应的运动矢量P(x,y)的计算公式可以为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
在另一种可行的实施方式中,在图9所示的场景中,S603和S604的具体实现可以包括下述步骤:
S901、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802。
其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y)。
S902、根据待处理图像块600中的基本预测块604的位置确定第一临时块806和第二临时块808。
其中,第一临时块为在目标参考帧中位于映射图像块的块806位置的图像块,第二临时块为在目标参考帧中位于映射图像块的块808位置的图像块,第一临时块和第二临时块均为时域参考块。
S903A、基于待处理图像块600的第一临时块806对应的运动矢量R(W,y)和待处理图像块600的第二参考块802对应的运动矢量进行加权计算，获得基本预测块604对应的第一临时运动矢量P_h(x,y)。
示例性的，基本预测块604对应的第一临时运动矢量P_h(x,y)的计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S903B、基于待处理图像块600的第二临时块808对应的运动矢量B(x,H)和待处理图像块600的第一参考块809对应的运动矢量进行加权计算，获得基本预测块604对应的第二临时运动矢量P_v(x,y)。
示例性的，基本预测块604对应的第二临时运动矢量P_v(x,y)的计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解，步骤S903A和步骤S903B不限定执行顺序关系。
S904、基于待处理图像块600的第一临时运动矢量P_h(x,y)和第二临时运动矢量P_v(x,y)进行加权计算，获得基本预测块604对应的运动矢量P(x,y)。
示例性的，基本预测块604对应的运动矢量P(x,y)计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
在另一种可行的实施方式中,在图9所示的场景中,S603和S604的具体实现可以包括下述步骤:
S0101、根据待处理图像块600中的基本预测块604的位置确定第一参考块809和第二参考块802。
其中,第一参考块所对应的运动矢量为A(x,-1),第二参考块对应的运动矢量为L(-1,y)。
S0102、根据待处理图像块600的任一空域参考块的运动信息进行运动补偿,确定参考帧信息和运动补偿块的位置。
其中,所述任一空域参考块可以是图5所示的左侧空域邻接块或上侧空域邻接块中的某一个可用的空域邻接块。示例性的,可以是图5中沿着方向1检测到的第一个可用的左侧空域邻接块,或者可以是图5中沿着方向2检测到的第一个可用的上侧空域邻接块;还可以是对待处理图像块600的多个预设空域参考块依照预设的顺序检测得到的第一个可用的空域邻接块,如图7所示的L→A→AR→BL→AL的顺序;还可以是按照预定规则所选择的空域邻接块,不做限定。
S0103、根据待处理图像块600中的基本预测块604的位置确定第一临时块806和第二临时块808。
其中,第一临时块为在步骤S0102中根据参考帧信息确定的参考帧中位于运动补偿块的块806位置的图像块,第二临时块为在步骤S0102中根据参考帧信息确定的参考帧中位于运动补偿块的块808位置的图像块,第一临时块和第二临时块均为时域参考块。
S0104A、基于待处理图像块600的第一临时块806对应的运动矢量R(W,y)和待处理图像块600的第二参考块802对应的运动矢量进行加权计算，获得基本预测块604对应的第一临时运动矢量P_h(x,y)。
示例性的，基本预测块604对应的第一临时运动矢量P_h(x,y)的计算公式为P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y)。
S0104B、基于待处理图像块600的第二临时块808对应的运动矢量B(x,H)和待处理图像块600的第一参考块809对应的运动矢量进行加权计算，获得基本预测块604对应的第二临时运动矢量P_v(x,y)。
示例性的，基本预测块604对应的第二临时运动矢量P_v(x,y)的计算公式为P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H)。
应理解，步骤S0104A和步骤S0104B不限定执行顺序关系。
S0105、基于待处理图像块600的第一临时运动矢量P_h(x,y)和第二临时运动矢量P_v(x,y)进行加权计算，获得基本预测块604对应的运动矢量P(x,y)。
示例性的，基本预测块604对应的运动矢量P(x,y)计算公式为P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W)。
需要说明的是,上述结合图7至图9所描述的S603和S604的具体实现,仅为示例性说明,并不是对S603和S604的具体实现的限定。
前文提到了图像块和存储运动信息的基本存储单元的关系，不妨将图像块所对应的基本存储单元所存储的运动信息称为该图像块的实际运动信息，而运动信息包括运动矢量和运动矢量指向的参考帧的索引信息。应理解，用于加权计算出基本预测块的运动矢量的各个参考块的参考帧的索引信息不能保证是一致的。当各参考块的参考帧的索引信息一致时，参考块对应的运动信息就是参考块的实际运动信息。当各参考块的参考帧的索引信息不一致时，首先需要按照参考帧索引指示的参考帧的距离关系对参考块的实际运动矢量进行加权处理，参考块对应的运动信息就是对参考块的实际运动信息中的运动矢量进行加权处理后的运动矢量。
具体的,确定目标参考图像索引,示例性的,可以固定为0,1或其它索引值,也可以是参考图像列表中使用频率最高的参考图像索引,例如是所有参考块的实际运动矢量或者经加权的运动矢量指向次数最多的参考图像索引。
判断各个参考块的参考帧的索引信息是否与目标图像索引相同;
如果某个参考块的参考帧的索引信息与目标图像索引不同,则基于参考块所在图像与参考块的实际运动信息(参考帧索引信息)所指示的参考帧图像之间的时间距离,与参考块所在图像与目标参考图像索引所指示的参考图像之间的时间距离之比,来按比例缩放实际运动矢量,以获取加权处理后的运动矢量。
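上述按时间距离之比缩放运动矢量的过程可用如下示意代码说明（scale_mv为本示例假设的函数名；以浮点计算示意，实际标准中通常采用定点化乘加与数值裁剪，此处省略）：

```python
def scale_mv(mv, cur_poc, ref_poc, target_ref_poc):
    """按时间距离之比缩放参考块的实际运动矢量。
    mv：实际运动矢量(mvx, mvy)；cur_poc：参考块所在图像的POC；
    ref_poc：实际运动信息指示的参考帧的POC；
    target_ref_poc：目标参考图像索引所指示的参考图像的POC。"""
    td = cur_poc - ref_poc           # 原参考帧的时间距离
    tb = cur_poc - target_ref_poc    # 目标参考图像的时间距离
    scale = tb / td
    return (mv[0] * scale, mv[1] * scale)
```

例如，当目标参考图像的时间距离为原时间距离的一半时，运动矢量的两个分量均按0.5缩放。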
在一种可行的实施方式中,在步骤S604之后,本申请实施例提供的帧间预测的方法还可以包括:
S605、基于获得的基本预测块运动矢量,对待处理图像块进行运动补偿。
在一种可行的实施方式中，S605包括：首先对具有相同运动信息的相邻基本预测块进行合并，然后以合并后的图像块作为运动补偿的单元进行运动补偿。
具体的,首先进行横向合并,即对待处理图像块中的每一行基本预测块,从左向右依次判断基本预测块和与其相邻的基本预测块的运动信息(示例性的,包括运动矢量、参考帧列表、参考帧索引信息)是否相同。当运动信息相同时,合并相邻的两个基本预测块,并继续判断和合并后的基本预测块相邻的下一个基本预测块的运动信息是否与合并后的基本预测块的运动信息相同,直到相邻的基本预测块的运动信息与合并后的基本预测块的运动信息不同时,停止合并,继续以该具有不同运动信息的基本预测块作为起点继续进行具有相同运动信息的相邻基本预测块进行合并的步骤,直到该基本预测块行结束。
然后再进行纵向合并,即对每一个横向合并后的基本预测块或者未合并的基本预测块,判断该块的下边沿是否和另一个横向合并后的基本预测块或者未合并的基本预测块的上边沿完全重合。如果完全重合,合并边沿重合的两个具有相同运动信息的基本预测块(或者横向合并后的基本预测块),继续对纵向合并后的基本预测块进行具有相重合的上下边沿的具有相同运动信息的相邻基本预测块进行合并的步骤,直到该待处理图像块中没有满足上述条件的基本预测块。
最后,以合并后的基本预测块作为运动补偿的单元进行运动补偿。
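上文所述的横向合并步骤可用如下示意代码说明（merge_row为本示例假设的函数名，以游程(起始索引, 合并的块数, 运动信息)表示一行内的合并结果，纵向合并再对上下边沿完全重合且运动信息相同的游程做类似处理，此处从略）：

```python
def merge_row(motions):
    """对待处理图像块中一行基本预测块做横向合并。
    motions为该行各基本预测块的运动信息列表(可比较的任意对象)，
    返回游程列表，每个游程为(起始索引, 合并的块数, 运动信息)。"""
    runs = []
    for i, m in enumerate(motions):
        if runs and runs[-1][2] == m:        # 与前一(已合并)块运动信息相同，继续合并
            start, count, info = runs[-1]
            runs[-1] = (start, count + 1, info)
        else:                                # 运动信息不同，以该块为起点开始新的游程
            runs.append((i, 1, m))
    return runs
```

合并后以更大的块为单元做运动补偿，可减少运动补偿的调用次数。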
在一种可行的实施方式中，对具有相同运动信息的相邻基本预测块进行合并的合并方式和待处理图像块的形状有关系。示例性的，当待处理图像块的宽大于或等于待处理图像块的高时，只采用上文所述的横向合并的方式进行基本预测块的合并。当待处理图像块的宽小于待处理图像块的高时，对待处理图像块中的每一列基本预测块，从上向下依次判断基本预测块和与其相邻的基本预测块的运动信息(示例性的，包括运动矢量、参考帧列表、参考帧索引信息)是否相同。当运动信息相同时，合并相邻的两个基本预测块，并继续判断和合并后的基本预测块相邻的下一个基本预测块的运动信息是否与合并后的基本预测块的运动信息相同，直到相邻的基本预测块的运动信息与合并后的基本预测块的运动信息不同时，停止合并，继续以该具有不同运动信息的基本预测块作为起点继续进行具有相同运动信息的相邻基本预测块进行合并的步骤，直到该基本预测块列结束。
在一种可行的实施方式中,在步骤S601之前,如图6所示,本申请实施例提供的帧间预测的方法还可以包括:
S606、确定第一参考块和第二参考块位于待处理图像块所在的图像边界内。
即,当待处理图像块的上边界线和待处理图像块所在图像的上边界线重合时,第一参考块不存在,此时本申请实施例中的方案不适用。当待处理图像块的左边界线和待处理图像块所在图像的左边界线重合时,第二参考块不存在,此时本申请实施例中的方案也不适用。
在一种可行的实施方式中,在步骤S601之前,如图6所示,本申请实施例提供的帧间预测的方法还可以包括:
S607、确定待处理图像块的形状满足预设条件。
在S607中,当确定待处理图像块的形状满足预设条件时,执行S601,否则不执行。
示例性的,预设条件可以包括待处理图像块的宽大于或等于16且待处理图像块的高大于或等于16;或者,确定待处理图像块的宽大于或等于16;或者,确定待处理图像块的高大于或等于16。
即,当待处理图像块的宽小于16或高小于16时,本申请实施例中的方案不适用,或者,当待处理图像块的宽小于16且高小于16时,本申请实施例中的方案不适用。
应理解,示例性的,这里以16作为阈值,还可以采用8,24,32等其他数值,宽和高所对应的阈值也可以不相等,均不做限定。
应理解,步骤S606和步骤S607可以配合执行。示例性的,在一种可行的实施方式中,当待处理图像块处于图像帧的左边界,或上边界,或待处理图像块的宽和高都小于16时,不能采用本申请实施例中的帧间预测方案,在另一种可行的实施方式中,当待处理图像块处于图像帧的左边界,或上边界,或待处理图像块的宽或高小于16时,不能采用本申请实施例中的帧间预测方案。
虽然关于视频编码器100及视频解码器200已描述本申请的特定方面,但应理解,本申请的技术可通过许多其它视频编码和/或编码单元、处理器、处理单元、例如编码器/解码器(CODEC)的基于硬件的编码单元及类似者来应用。此外,应理解,仅作为可行的实施方式而提供关于图6所展示及描述的步骤。即,图6的可行的实施方式中所展示的步骤无需必定按图6中所展示的次序执行,且可执行更少、额外或替代步骤。
在当前图像块的多个可用邻近块的运动信息中任意两个运动信息不相同的情况下,基于平面planar模式预测所述当前图像块中一个或多个子块的运动信息,包括:在所述一个可用邻近块的运动信息中与第一参考图像列表对应的运动矢量与所述另一个可用邻近块的运动信息中与第一参考图像列表对应的运动矢量不相同,和/或,所述一个可用邻近块的运动信息中与第二参考图像列表对应的运动矢量与所述另一个可用邻近块的运动信息中与第二参考图像列表对应的运动矢量不相同的情况下,基于平面 planar模式预测所述当前图像块中一个或多个子块的运动信息中的与第一参考图像列表对应的运动矢量(即第一参考图像列表有效),和/或,与第二参考图像列表对应的运动矢量(即第二参考图像列表有效)。
在第一参考图像列表和第二参考图像列表同时有效的情况下,当前块为双向预测;
在第一参考图像列表和第二参考图像列表仅其中一个有效的情况下,当前块为单向预测。
所述多个可用邻近块可以为:在当前图像块的所有可用左侧空域邻近块和当前图像块的所有可用上侧空域邻近块,或,在当前图像块的所有可用右侧时域邻近块和当前图像块的所有可用下侧时域邻近块,或,在当前图像块的所有可用左侧空域邻近块、当前图像块的所有可用上侧空域邻近块、当前图像块的所有可用右侧时域邻近块和当前图像块的所有可用下侧时域邻近块。
图10为本申请实施例中的帧间预测的装置1000的一种示意性框图。具体的，帧间预测的装置1000可以包括：确定模块1001、定位模块1002以及计算模块1003。
其中,确定模块1001,用于根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸,该尺寸用于确定基本预测块在待处理图像块中的位置。
定位模块1002,用于根据确定模块1001确定的基本预测块的位置,确定基本预测块的第一参考块和第二参考块。其中,第一参考块的左边界线和所述基本预测单元的左边界线共线,所述第二参考块的上边界线和所述基本预测单元的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接;
计算模块1003,用于对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获取所述基本预测块对应的运动矢量。
其中,确定模块1001用于支持该帧间预测的装置1000执行上述实施例中的S601等,和/或用于本文所描述的技术的其它过程。定位模块1002用于支持该帧间预测的装置1000执行上述实施例中的S603等,和/或用于本文所描述的技术的其它过程。计算模块1003用于支持该帧间预测的装置1000执行上述实施例中的S604及S605等,和/或用于本文所描述的技术的其它过程。
进一步的,如图10所示,帧间预测的装置1000还可以包括划分模块1004,用于支持该帧间预测的装置1000执行上述实施例中的S602等,和/或用于本文所描述的技术的其它过程。
进一步的,如图10所示,帧间预测的装置1000还可以包括判断模块1005,用于支持该帧间预测的装置1000执行上述实施例中的S606及S607等,和/或用于本文所描述的技术的其它过程。
其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
在采用集成的单元的情况下,图11为本申请实施例的帧间预测的装置1100的一种实现方式的示意性框图。其中,帧间预测的装置1100可以包括处理器1110、存储器1130和总线系统1150。其中,处理器和存储器通过总线系统相连,该存储器用于存储指令,该处理器用于执行该存储器存储的指令。编码设备的存储器存储程序代码,且处理器可以调用存储器中存储的程序代码执行本申请描述的各种视频编码或解码方法,尤其是在各种新的帧间预测模式下的视频编码或解码方法,以及在各种新的帧间预测模式下预测运动信息的方法。为避免重复,这里不再详细描述。
该存储器1130可以包括只读存储器(ROM)设备或者随机存取存储器(RAM)设备。任何其他适宜类型的存储设备也可以用作存储器1130。存储器1130可以包括由处理器1110使用总线系统1150访问的代码和数据1131。存储器1130可以进一步包括操作系统1133和应用程序1135,该应用程序1135包括允许处理器1110执行本申请描述的视频编码或解码方法(尤其是本申请描述的帧间预测的方法)的至少一个程序。例如,应用程序1135可以包括应用1至N,其进一步包括执行在本申请描述的视频编码或解码方法的视频编码或解码应用(简称视频译码应用)。
该总线系统1150除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线系统1150。
可选的,帧间预测的装置1100还可以包括一个或多个输出设备,诸如显示器1170。在一个示例中,显示器1170可以是触摸显示器,其将显示器与可操作地感测触摸输入的触摸感测单元合并。显示器1170可以经由总线系统1150连接到处理器1110。
其中,上述方法实施例涉及的各场景的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。
上述帧间预测的装置1000和帧间预测的装置1100均可执行上述图6所示的帧间预测的方法,帧间预测的装置1000和帧间预测的装置1100具体可以是视频编解码装置或者其他具有视频编解码功能的设备。帧间预测的装置1000和帧间预测的装置1100可以用于在编解码过程中进行图像预测。
本申请实施例提供一种译码设备,该译码设备包括上述任一实施例描述的帧间预测的装置。该译码设备可以为视频解码器,也可以为视频编码器。
本申请还提供一种终端,该终端包括:一个或多个处理器、存储器、通信接口。该存储器、通信接口与一个或多个处理器耦合;存储器用于存储计算机程序代码,计算机程序代码包括指令,当一个或多个处理器执行指令时,终端执行本申请实施例的帧间预测的方法。
这里的终端可以是视频显示设备,智能手机,便携式电脑以及其它可以处理视频或者播放视频的设备。
本申请另一实施例还提供一种计算机可读存储介质,该计算机可读存储介质包括一个或多个程序代码,该一个或多个程序代码包括指令,当终端中的处理器执行该程序代码时,该终端执行如图6所示的帧间预测的方法。
在本申请的另一实施例中,还提供一种计算机程序产品,该计算机程序产品包括计算机执行指令,该计算机执行指令存储在计算机可读存储介质中;终端的至少一个处理器可以从计算机可读存储介质读取该计算机执行指令,至少一个处理器执行该计算机执行指令使得终端执行如图6所示的帧间预测的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式出现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。
所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质(例如软盘、硬盘、磁带)、光介质(例如DVD)或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
此外,应理解,取决于可行的实施方式,本文中所描述的方法中的任一者的特定动作或事件可按不同序列执行,可经添加、合并或一起省去(例如,并非所有所描述的动作或事件为实践方法所必要的)。此外,在特定可行的实施方式中,动作或事件可(例如)经由多线程处理、中断处理或多个处理器来同时而非顺序地执行。另外,虽然出于清楚的目的将本申请的特定方面描述为通过单一模块或单元执行,但应理解,本申请的技术可通过与视频解码器相关联的单元或模块的组合执行。
在一个或多个可行的实施方式中,所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么功能可作为一个或多个指令或代码而存储于计算机可读媒体上或经由计算机可读媒体来传输,且通过基于硬件的处理单元来执行。计算机可读媒体可包含计算机可读存储媒体或通信媒体,计算机可读存储媒体对应于例如数据存储媒体的有形媒体,通信媒体包含促进计算机程序(例如)根据通信协议从一处传送到另一处的任何媒体。
以这个方式,计算机可读媒体示例性地可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)例如信号或载波的通信媒体。数据存储媒体可为可由一个或多个计算机或一个或多个处理器存取以检索用于实施本申请中所描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为可行的实施方式而非限制,此计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用于存储呈指令或数据结构的形式的所要代码且可由计算机存取的任何其它媒体。同样,任何连接可适当地称作计算机可读媒体。例如,如果使用同轴缆线、光纤缆线、双绞线、数字用户线(DSL),或例如红外线、无线电及微波的无线技术而从网站、服务器或其它远端源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL,或例如红外线、无线电及微波的无线技术包含于媒体的定义中。
然而,应理解,计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体,而替代地针对非暂时性有形存储媒体。如本文中所使用,磁盘及光盘包含紧密光盘(CD)、激光光盘、光盘、数字多功能光盘(DVD)、软盘及蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘通过激光以光学方式再现数据。以上各物的组合也应包含于计算机可读媒体的范围内。
可通过例如一个或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或其它等效集成或离散逻辑电路的一个或多个处理器来执行指令。因此,如本文中所使用,术语“处理器”可指前述结构或适于实施本文中所描述的技术的任何其它结构中的任一者。另外,在一些方面中,可将本文所描述的功能性提供于经配置以用于编码及解码的专用硬件和/或软件模块内,或并入于组合式编码解码器中。同样,技术可完全实施于一个或多个电路或逻辑元件中。
本申请的技术可实施于广泛多种装置或设备中,包含无线手机、集成电路(IC)或IC的集合(例如,芯片组)。本申请中描述各种组件、模块或单元以强调经配置以执行所揭示的技术的装置的功能方面,但未必需要通过不同硬件单元实现。更确切来说,如前文所描述,各种单元可组合于编码解码器硬件单元中或由互操作的硬件单元(包含如前文所描述的一个或多个处理器)结合合适软件和/或固件的集合来提供。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (53)

  1. 一种帧间预测的方法,其特征在于,包括:
    根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸,所述尺寸用于确定所述基本预测块在所述待处理图像块中的位置;
    根据所述位置,确定所述基本预测块的第一参考块和第二参考块;其中,所述第一参考块的左边界线和所述基本预测块的左边界线共线,所述第二参考块的上边界线和所述基本预测块的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接;
    对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获取所述基本预测块对应的运动矢量。
  2. 根据权利要求1所述的方法,其特征在于,所述与所述待处理图像块具有预设位置关系的原始参考块,包括:与所述待处理图像块具有预设空域位置关系的原始参考块,和/或,与所述待处理图像块具有预设时域位置关系的原始参考块。
  3. 根据权利要求2所述的方法,其特征在于,所述与所述待处理图像块具有预设空域位置关系的原始参考块,包括:位于所述待处理图像块左上角且与所述待处理图像块的左上角点相邻的图像块、位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块和位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块中的一个或多个;其中,所述与所述待处理图像块具有预设空域位置关系的原始参考块位于所述待处理图像块的外部。
  4. 根据权利要求2或3所述的方法,其特征在于,所述与所述待处理图像块具有预设时域位置关系的原始参考块,包括:在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块;其中,所述与所述待处理图像块具有预设时域位置关系的原始参考块位于所述映射图像块的外部,所述映射图像块与所述待处理图像块尺寸相等,所述映射图像块在所述目标参考帧中的位置与所述待处理图像块在所述待处理图像块所在图像帧中的位置相同。
  5. 根据权利要求4所述的方法,其特征在于,所述目标参考帧的索引信息和参考帧列表信息通过解析码流获得。
  6. 根据权利要求5所述的方法,其特征在于,所述目标参考帧的索引信息和参考帧列表信息位于所述待处理图像块所在的条带的条带头对应的码流段中。
  7. 根据权利要求4至6任一项所述的方法,其特征在于,所述对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获取所述基本预测块对应的运动矢量,包括:
    所述基本预测块对应的运动矢量根据如下公式获得:
    P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W);
    其中,
    P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y);
    P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H);
    R(W,y)=((H-y-1)×AR+(y+1)×BR)/H;
    B(x,H)=((W-x-1)×BL+(x+1)×BR)/W;
    所述AR为所述位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块对应的运动矢量,所述BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,所述BL为所述位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块对应的运动矢量,所述x为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的水平距离与所述基本预测块的宽的比值,所述y为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的竖直距离与所述基本预测块的高的比值,所述H为所述待处理图像块的高与所述基本预测块的高的比值,所述W为所述待处理图像块的宽与所述基本预测块的宽的比值,所述L(-1,y)为所述第二参考块对应的运动矢量,所述A(x,-1)为所述第一参考块对应的运动矢量,所述P(x,y)为所述基本预测块对应的运动矢量。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述尺寸参考信息包括第一标识;所述第一标识用于指示所述基本预测块的尺寸;
    所述方法还包括:接收码流,从所述码流中解析获取所述第一标识;其中,所述第一标识位于所述待处理图像块所在序列的序列参数集、所述待处理图像块所在图像的图像参数集和所述待处理图像块所在条带的条带头中的任一个所对应的码流段中。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,在所述根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸之前,还包括:
    确定所述待处理图像块的预测方向。
  10. 根据权利要求9所述的方法,其特征在于,所述确定所述待处理图像块的预测方向,包括:
    当第一向预测有效且第二向预测无效,或者,所述第二向预测有效且所述第一向预测无效时,所述待处理图像块的预测方向为单向预测;
    当所述第一向预测有效且所述第二向预测有效时,所述待处理图像块的预测方向为双向预测。
  11. 根据权利要求10所述的方法,其特征在于,
    当所述待处理图像块的相邻区域内至少一个临时图像块采用第一参考帧图像列表获得运动矢量时,所述第一向预测有效;
    当所述待处理图像块的相邻区域内没有所述临时图像块采用所述第一参考帧图像列表获得运动矢量时,所述第一向预测无效;
    当所述待处理图像块的相邻区域内至少一个所述临时图像块采用第二参考帧图像列表获得运动矢量时,所述第二向预测有效;
    当所述待处理图像块的相邻区域内没有所述临时图像块采用所述第二参考帧图像列表获得运动矢量时,所述第二向预测无效。
  12. 根据权利要求10所述的方法,其特征在于,所述运动矢量包括第一运动矢量和/或第二运动矢量,当所述待处理图像块的相邻区域内至少两个采用第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量不同时,所述第一向预测有效,其中,所述第一运动矢量对应所述第一参考帧图像列表;
    当所述待处理图像块的相邻区域内所有采用所述第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量均相同时,所述第一向预测无效;
    当所述待处理图像块的相邻区域内至少两个采用第二参考帧图像列表获得运动矢量的临时图像块的第二运动矢量不同时,所述第二向预测有效,其中,所述第二运动矢量对应所述第二参考帧图像列表;
    当所述待处理图像块的相邻区域内所有采用所述第二参考帧图像列表获得运动矢量的临时图像块的第二运动矢量均相同时,所述第二向预测无效。
  13. 根据权利要求11至12任一项所述的方法,其特征在于,所述临时图像块为具有预设尺寸的图像块。
  14. 根据权利要求11至12任一项所述的方法,其特征在于,所述待处理图像块的相邻区域包括:所述待处理图像块的左侧空域区域,上侧空域区域,右侧时域区域,下侧时域区域中的一个区域或者任意区域组合。
  15. 根据权利要求1至9任一项所述的方法,其特征在于,所述尺寸参考信息包括所述待处理图像块的形状信息;所述形状信息包括宽度和高度;
    所述根据尺寸参考信息,确定所述基本预测块的尺寸,包括:
    当所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  16. 根据权利要求1至9任一项所述的方法,其特征在于,所述基本预测块的宽度为8像素,高度为8像素。
  17. 根据权利要求10至14任一项所述的方法,其特征在于,所述尺寸参考信息包括所述待处理图像块的预测方向。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述尺寸参考信息,确定所述基本预测块的尺寸,包括:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为4像素,或者,所述基本预测块的宽度为4像素,高度为8像素。
  19. 根据权利要求17所述的方法,其特征在于,所述根据尺寸参考信息,确定所述基本预测块的尺寸,包括:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测且所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测且所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  21. 根据权利要求17所述的方法,其特征在于,所述根据尺寸参考信息,确定所述基本预测块的尺寸,包括:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为8像素。
  21. 根据权利要求17所述的方法,其特征在于,所述根据尺寸参考信息,确定所述基本预测块的尺寸,包括:
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为8像素;
    当所述待处理图像块的预测方向为单向预测且所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的预测方向为单向预测且所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  22. 根据权利要求1至21任一项所述的方法,在所述确定待处理图像块中的基本预测块的尺寸之后,还包括:
    根据所述尺寸,将所述待处理图像块划分为多个所述基本预测块;
    依次确定每个所述基本预测块在所述待处理图像块中的位置。
  23. 根据权利要求1至22任一项所述的方法,其特征在于,在所述根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸之前,所述方法还包括:
    确定所述第一参考块和所述第二参考块位于所述待处理图像块所在的图像边界内。
  24. 根据权利要求1至23任一项所述的方法,其特征在于,在所述根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸之前,所述方法还包括:
    确定所述待处理图像块的宽大于或等于16且所述待处理图像块的高大于或等于16;或者,确定所述待处理图像块的宽大于或等于16;或者,确定所述待处理图像块的高大于或等于16。
  25. 根据权利要求1至24任一项所述的方法,其特征在于,所述方法用于编码所述待处理图像块,或者,解码所述待处理图像块。
  26. 一种帧间预测的装置,其特征在于,包括:
    确定模块,用于根据尺寸参考信息,确定待处理图像块中的基本预测块的尺寸,所述尺寸用于确定所述基本预测块在所述待处理图像块中的位置;
    定位模块,用于根据所述位置,确定所述基本预测块的第一参考块和第二参考块;其中,所述第一参考块的左边界线和所述基本预测块的左边界线共线,所述第二参考块的上边界线和所述基本预测块的上边界线共线,所述第一参考块与所述待处理图像块的上边界线邻接,所述第二参考块与所述待处理图像块的左边界线邻接;
    计算模块,用于对所述第一参考块对应的运动矢量、所述第二参考块对应的运动矢量以及与所述待处理图像块具有预设位置关系的原始参考块对应的运动矢量中的一个或多个进行加权计算,以获取所述基本预测块对应的运动矢量。
  27. 根据权利要求26所述的装置,其特征在于,所述与所述待处理图像块具有预设位置关系的原始参考块,包括:与所述待处理图像块具有预设空域位置关系的原始参考块,和/或,与所述待处理图像块具有预设时域位置关系的原始参考块。
  28. 根据权利要求27所述的装置,其特征在于,所述与所述待处理图像块具有预设空域位置关系的原始参考块,包括:位于所述待处理图像块左上角且与所述待处理图像块的左上角点相邻的图像块、位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块和位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块中的一个或多个;其中,所述与所述待处理图像块具有预设空域位置关系的原始参考块位于所述待处理图像块的外部。
  29. 根据权利要求27或28所述的装置,其特征在于,所述与所述待处理图像块具有预设时域位置关系的原始参考块,包括:在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块;其中,所述与所述待处理图像块具有预设时域位置关系的原始参考块位于所述映射图像块的外部,所述映射图像块与所述待处理图像块尺寸相等,所述映射图像块在所述目标参考帧中的位置与所述待处理图像块在所述待处理图像块所在图像帧中的位置相同。
  30. 根据权利要求29所述的装置,其特征在于,所述目标参考帧的索引信息和参考帧列表信息通过解析码流获得。
  31. 根据权利要求30所述的装置,其特征在于,所述目标参考帧的索引信息和参考帧列表信息位于所述待处理图像块所在的条带的条带头对应的码流段中。
  32. 根据权利要求29至31任一项所述的装置,其特征在于,所述计算模块具体用于:
    根据如下公式获得所述基本预测块对应的运动矢量:
    P(x,y)=(H×P_h(x,y)+W×P_v(x,y)+H×W)/(2×H×W);
    其中,
    P_h(x,y)=(W-1-x)×L(-1,y)+(x+1)×R(W,y);
    P_v(x,y)=(H-1-y)×A(x,-1)+(y+1)×B(x,H);
    R(W,y)=((H-y-1)×AR+(y+1)×BR)/H;
    B(x,H)=((W-x-1)×BL+(x+1)×BR)/W;
    所述AR为所述位于所述待处理图像块右上角且与所述待处理图像块的右上角点相邻的图像块对应的运动矢量,所述BR为所述在目标参考帧中位于映射图像块右下角且与所述映射图像块的右下角点相邻的图像块对应的运动矢量,所述BL为所述位于所述待处理图像块左下角且与所述待处理图像块的左下角点相邻的图像块对应的运动矢量,所述x为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的水平距离与所述基本预测块的宽的比值,所述y为所述基本预测块的左上角点相对于所述待处理图像块的左上角点的竖直距离与所述基本预测块的高的比值,所述H为所述待处理图像块的高与所述基本预测块的高的比值,所述W为所述待处理图像块的宽与所述基本预测块的宽的比值,所述L(-1,y)为所述第二参考块对应的运动矢量,所述A(x,-1)为所述第一参考块对应的运动矢量,所述P(x,y)为所述基本预测块对应的运动矢量。
  33. 根据权利要求26至32任一项所述的装置,其特征在于,所述尺寸参考信息包括第一标识;所述第一标识用于指示所述基本预测块的尺寸;
    所述装置还包括:接收单元,用于接收码流;解析单元,用于从所述接收单元接收的所述码流中解析获取所述第一标识;其中,所述第一标识位于所述待处理图像块所在序列的序列参数集、所述待处理图像块所在图像的图像参数集和所述待处理图像块所在条带的条带头中的任一个所对应的码流段中。
  34. 根据权利要求26至33任一项所述的装置,其特征在于,所述确定模块还用于:
    确定所述待处理图像块的预测方向。
  35. 根据权利要求34所述的装置,其特征在于,所述确定模块具体用于:
    当第一向预测有效且第二向预测无效,或者,所述第二向预测有效且所述第一向预测无效时,所述待处理图像块的预测方向为单向预测;
    当所述第一向预测有效且所述第二向预测有效时,所述待处理图像块的预测方向为双向预测。
  36. 根据权利要求35所述的装置,其特征在于,所述确定模块具体用于:
    当所述待处理图像块的相邻区域内至少一个临时图像块采用第一参考帧图像列表获得运动矢量时,所述第一向预测有效;
    当所述待处理图像块的相邻区域内没有所述临时图像块采用所述第一参考帧图像列表获得运动矢量时,所述第一向预测无效;
    当所述待处理图像块的相邻区域内至少一个所述临时图像块采用第二参考帧图像列表获得运动矢量时,所述第二向预测有效;
    当所述待处理图像块的相邻区域内没有所述临时图像块采用所述第二参考帧图像列表获得运动矢量时,所述第二向预测无效。
  37. 根据权利要求35所述的装置,其特征在于,所述确定模块具体用于:
    所述运动矢量包括第一运动矢量和/或第二运动矢量,当所述待处理图像块的相邻区域内至少两个采用第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量不同时,所述第一向预测有效,其中,所述第一运动矢量对应所述第一参考帧图像列表;
    当所述待处理图像块的相邻区域内所有采用所述第一参考帧图像列表获得运动矢量的临时图像块的第一运动矢量均相同时,所述第一向预测无效;
    当所述待处理图像块的相邻区域内至少两个采用第二参考帧图像列表获得运动矢量的临时图像块的第二运动矢量不同时,所述第二向预测有效,其中,所述第二运动矢量对应所述第二参考帧图像列表;
    当所述待处理图像块的相邻区域内所有采用所述第二参考帧图像列表获得运动矢量的临时图像块的第二运动矢量均相同时,所述第二向预测无效。
  38. 根据权利要求36至37任一项所述的装置,其特征在于,所述临时图像块为具有预设尺寸的图像块。
  39. 根据权利要求36至38任一项所述的装置,其特征在于,所述待处理图像块的相邻区域包括:所述待处理图像块的左侧空域区域,上侧空域区域,右侧时域区域,下侧时域区域中的一个区域或者任意区域组合。
  40. 根据权利要求26至34任一项所述的装置,其特征在于,所述尺寸参考信息包括所述待处理图像块的形状信息;所述形状信息包括宽度和高度;
    所述确定模块具体用于:
    当所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  41. 根据权利要求26至34任一项所述的装置,其特征在于,所述基本预测块的宽度为8像素,高度为8像素。
  42. 根据权利要求36至39任一项所述的装置,其特征在于,所述尺寸参考信息包括所述待处理图像块的预测方向。
  43. 根据权利要求42所述的装置,其特征在于,所述确定模块具体用于:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为4像素,或者,所述基本预测块的宽度为4像素,高度为8像素。
  44. 根据权利要求42所述的装置,其特征在于,所述确定模块具体用于:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测且所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测且所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  45. 根据权利要求42所述的装置,其特征在于,所述确定模块具体用于:
    当所述待处理图像块的预测方向为单向预测时,所述基本预测块的宽度为4像素,高度为4像素;
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为8像素。
  46. 根据权利要求42所述的装置,其特征在于,所述确定模块具体用于:
    当所述待处理图像块的预测方向为双向预测时,所述基本预测块的宽度为8像素,高度为8像素;
    当所述待处理图像块的预测方向为单向预测且所述待处理图像块的宽度大于或等于所述待处理图像块的高度时,所述基本预测块的宽度为8像素,高度为4像素;
    当所述待处理图像块的预测方向为单向预测且所述待处理图像块的宽度小于所述待处理图像块的高度时,所述基本预测块的宽度为4像素,高度为8像素。
  47. 根据权利要求26至46任一项所述的装置,其特征在于,所述装置还包括:
    划分模块,用于根据所述尺寸,将所述待处理图像块划分为多个所述基本预测块;依次确定每个所述基本预测块在所述待处理图像块中的位置。
  48. 根据权利要求26至47任一项所述的装置,其特征在于,所述装置还包括判断模块,用于:
    确定所述第一参考块和所述第二参考块位于所述待处理图像块所在的图像边界内。
  49. 根据权利要求26至48任一项所述的装置,其特征在于,所述装置还包括判断模块,用于:
    确定所述待处理图像块的宽大于或等于16且所述待处理图像块的高大于或等于16;或者,确定所述待处理图像块的宽大于或等于16;或者,确定所述待处理图像块的高大于或等于16。
  50. 根据权利要求26至49任一项所述的装置,其特征在于,所述装置用于编码所述待处理图像块,或者,解码所述待处理图像块。
  51. 一种帧间预测的装置,其特征在于,所述装置包括:一个或多个处理器、存储器和通信接口;
    所述存储器、所述通信接口与所述一个或多个处理器连接;所述帧间预测的装置通过所述通信接口与其他设备通信,所述存储器用于存储计算机程序代码,所述计算机程序代码包括指令,当所述一个或多个处理器执行所述指令时,所述装置执行如权利要求1-25任一项所述的帧间预测的方法。
  52. 一种计算机可读存储介质,包括指令,其特征在于,当所述指令在帧间预测的装置上运行时,使得所述帧间预测的装置执行如权利要求1-25任一项所述的帧间预测的方法。
  53. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在帧间预测的装置上运行时,使得所述帧间预测的装置执行如权利要求1-25任一项所述的帧间预测的方法。
PCT/CN2019/110206 2018-11-19 2019-10-09 一种帧间预测的方法及装置 WO2020103593A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201811377897.4 2018-11-19
CN201811377897 2018-11-19
CN201811578340.7A CN111200735B (zh) 2018-11-19 2018-12-21 一种帧间预测的方法及装置
CN201811578340.7 2018-12-21

Publications (1)

Publication Number Publication Date
WO2020103593A1 true WO2020103593A1 (zh) 2020-05-28

Family

ID=70747374

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/110206 WO2020103593A1 (zh) 2018-11-19 2019-10-09 一种帧间预测的方法及装置

Country Status (2)

Country Link
CN (1) CN111200735B (zh)
WO (1) WO2020103593A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022027878A1 (zh) * 2020-08-04 2022-02-10 深圳市精锋医疗科技有限公司 内窥镜的图像处理方法
CN114286100A (zh) * 2020-09-28 2022-04-05 华为技术有限公司 帧间预测方法及装置
CN112966556B (zh) * 2021-02-02 2022-06-10 豪威芯仑传感器(上海)有限公司 一种运动物体检测方法及系统
CN115037933B (zh) * 2022-08-09 2022-11-18 浙江大华技术股份有限公司 一种帧间预测的方法及设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249740A1 (en) * 2010-04-12 2011-10-13 Canon Kabushiki Kaisha Moving image encoding apparatus, method of controlling the same, and computer readable storage medium
CN102685497A (zh) * 2012-05-29 2012-09-19 北京大学 一种avs编码器快速帧间模式选择方法及装置
CN102970526A (zh) * 2011-08-31 2013-03-13 华为技术有限公司 一种获得变换块尺寸的方法和模块
CN108632616A (zh) * 2018-05-09 2018-10-09 电子科技大学 一种基于参考质量做帧间加权预测的方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106060563B (zh) * 2011-01-07 2019-06-21 Lg电子株式会社 编码和解码图像信息的方法和使用该方法的装置
US10230980B2 (en) * 2015-01-26 2019-03-12 Qualcomm Incorporated Overlapped motion compensation for video coding
KR20170058838A (ko) * 2015-11-19 2017-05-29 한국전자통신연구원 화면간 예측 향상을 위한 부호화/복호화 방법 및 장치
WO2017086738A1 (ko) * 2015-11-19 2017-05-26 한국전자통신연구원 영상 부호화/복호화 방법 및 장치


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095322A (zh) * 2023-04-10 2023-05-09 深圳传音控股股份有限公司 图像处理方法、处理设备及存储介质
CN116095322B (zh) * 2023-04-10 2023-07-25 深圳传音控股股份有限公司 图像处理方法、处理设备及存储介质

Also Published As

Publication number Publication date
CN111200735A (zh) 2020-05-26
CN111200735B (zh) 2023-03-17

Similar Documents

Publication Publication Date Title
TWI759389B (zh) 用於視訊寫碼之低複雜度符號預測
CN108605126B (zh) 滤波视频数据的经解码块的方法和装置及存储介质
US9807399B2 (en) Border pixel padding for intra prediction in video coding
JP6284954B2 (ja) イントラ予測のためのモード決定の簡略化
RU2584498C2 (ru) Видеокодирование интра-режима
US20150071357A1 (en) Partial intra block copying for video coding
WO2020103593A1 (zh) 一种帧间预测的方法及装置
JP7407741B2 (ja) ビデオ符号化方法および装置
US11172212B2 (en) Decoder-side refinement tool on/off control
CN112352429A (zh) 利用分组的旁路剩余级别进行系数编码以用于依赖量化
WO2020048180A1 (zh) 运动矢量的获取方法、装置、计算机设备及存储介质
US20180278948A1 (en) Tile-based processing for video coding
JP6224851B2 (ja) 低複雑度符号化および背景検出のためのシステムおよび方法
US11601667B2 (en) Inter prediction method and related apparatus
CN110546957A (zh) 一种帧内预测的方法及装置
JP7294576B2 (ja) 動き情報を予測するための復号方法及び復号装置
CN110876057B (zh) 一种帧间预测的方法及装置
CN110855993A (zh) 一种图像块的运动信息的预测方法及装置
WO2020024275A1 (zh) 一种帧间预测的方法及装置
WO2020038232A1 (zh) 一种图像块的运动信息的预测方法及装置
WO2020052653A1 (zh) 一种预测运动信息的解码方法及装置
RU2574280C2 (ru) Выбор единых кандидатов режима слияния и адаптивного режима предсказания вектора движения
KR20210046777A (ko) 인터 예측 방법 및 장치, 비디오 인코더 및 비디오 디코더
CN118301331A (zh) 一种运动矢量的获取方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19887870

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19887870

Country of ref document: EP

Kind code of ref document: A1