WO2020140331A1 - Video image processing method and device

Video image processing method and device

Info

Publication number
WO2020140331A1
WO2020140331A1 (PCT/CN2019/077893)
Authority
WO
WIPO (PCT)
Prior art keywords
image block
current image block
motion vector
image processing
Application number
PCT/CN2019/077893
Other languages
English (en)
French (fr)
Inventor
郑萧桢
王苏红
王苫社
马思伟
Original Assignee
深圳市大疆创新科技有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 深圳市大疆创新科技有限公司 and 北京大学
Priority to JP2021534799A (JP7224005B2)
Priority to EP19907394.1A (EP3908000A4)
Priority to CN201980005714.2A (CN111357288B)
Priority to KR1020217024546A (KR20210107120A)
Publication of WO2020140331A1
Priority to US17/060,011 (US11206422B2)
Priority to US17/645,143 (US11689736B2)
Priority to JP2023012074A (JP7393061B2)
Priority to US18/341,246 (US20230345036A1)
Priority to JP2023195147A (JP2024012636A)

Classifications

    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/31: Hierarchical techniques, e.g. scalability, in the temporal domain
    • H04N19/52: Processing of motion vectors by encoding, by predictive encoding
    • H04N19/587: Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • This application relates to the field of video encoding and decoding, and in particular to a video image processing method and device.
  • The mainstream video coding standards adopt block-based motion compensation technology in the inter-frame prediction part.
  • The main principle is to find, for the current image block, the most similar block in an already encoded image; this process is called motion compensation.
  • A frame of image is divided into coding tree units (Coding Tree Unit, CTU); each CTU can be further divided into square or rectangular coding units (Coding Unit, CU).
  • Each CU searches the reference frame (usually a reconstructed frame temporally near the current frame) for the most similar block, which serves as the prediction block of the current CU.
  • The relative displacement between the current block (that is, the current CU) and the similar block (that is, the prediction block of the current CU) is called a motion vector (Motion Vector, MV).
  • The process of finding the most similar block in the reference frame as the prediction block of the current block is motion compensation.
  • The motion information candidate list of the current CU is usually constructed in two ways.
  • The first is spatial candidate motion vectors: usually the motion information of the encoded neighboring blocks of the current CU is filled into the candidate list. The second is temporal candidate motion vectors: temporal motion vector prediction (Temporal Motion Vector Prediction, TMVP) uses the motion information of the CU at the corresponding position of the current CU in an adjacent encoded image.
  • The motion vector of the current CU is determined according to one candidate motion vector in the motion information candidate list, and the prediction block of the current CU is determined according to the motion vector of the current CU.
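  • The candidate-selection step above can be sketched as follows. This is a minimal illustration in plain Python; the function names, tuple-based motion vectors, and coordinate handling are assumptions for demonstration, not structures from any codec specification.

```python
# Illustrative sketch: pick a candidate MV from the motion information
# candidate list, then locate the prediction block in the reference frame.

def select_candidate_mv(candidate_list, index):
    """Return the motion vector of the chosen candidate as the current CU's MV."""
    mv_x, mv_y = candidate_list[index]
    return (mv_x, mv_y)

def prediction_block_position(cu_x, cu_y, mv):
    """Top-left position of the prediction block in the reference frame,
    obtained by displacing the current CU's position by the motion vector."""
    return (cu_x + mv[0], cu_y + mv[1])
```

In a real encoder or decoder the index would be signalled in the bitstream; here it is simply a list index.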
  • This application provides a video image processing method and device, which can reduce the complexity of the ATMVP technology while maintaining the performance gain of the existing ATMVP technology.
  • a video image processing method includes:
  • the time domain candidate motion vector of the current image block is determined according to the TMVP operation; wherein,
  • the TMVP operation includes:
  • the ATMVP operation includes:
  • the time domain candidate motion vector of the sub-image block of the current image block is determined according to the motion vector of the sub-related block corresponding to each sub-image block.
  • a video image processing device includes:
  • a memory and a processor the memory is used to store instructions, the processor is used to execute the instructions stored in the memory, and the execution of the instructions stored in the memory causes the processor to:
  • the time domain candidate motion vector of the current image block is determined according to the TMVP operation; wherein,
  • the TMVP operation includes:
  • the ATMVP operation includes:
  • the time domain candidate motion vector of the sub-image block of the current image block is determined according to the motion vector of the sub-related block corresponding to each sub-image block.
  • A computer non-volatile storage medium is provided, on which a computer program is stored; when executed by a computer, the program causes the computer to implement the method in the first aspect or in any possible implementation manner of the first aspect.
  • A computer program product containing instructions is provided; when executed by a computer, the instructions cause the computer to implement the method in the first aspect or in any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic flowchart of a video image processing method provided by an embodiment of the present application.
  • FIG. 2 is a schematic block diagram of a video image processing device provided by an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of an implementation manner of an encoding device or a decoding device provided by an embodiment of the present application.
  • a prediction block refers to a basic unit used for prediction in a frame of image.
  • the prediction block is also called a prediction unit (Prediction Unit, PU).
  • the image is divided into a plurality of image blocks. Further, each image block in the plurality of image blocks can be divided into a plurality of image blocks again, and so on.
  • The number of division levels can differ, and the operations performed at each level can also differ.
  • Image blocks at the same level may have different names.
  • Each image block of the plurality of image blocks into which a frame of image is first divided is called a coding tree unit (Coding Tree Unit, CTU); each coding tree unit may contain one coding unit (Coding Unit, CU) or may be divided again into multiple coding units; a coding unit may be divided into one, two, four, or another number of prediction units according to the prediction method.
  • The coding tree unit is also called the largest coding unit (Largest Coding Unit, LCU).
  • Prediction refers to finding image data similar to the prediction block; this data is also referred to as the reference block of the prediction block. By encoding/compressing the difference between the prediction block and its reference block, redundant information in encoding/compression is reduced.
  • the difference between the prediction block and the reference block may be a residual obtained by subtracting corresponding pixel values of the prediction block and the reference block.
  • the prediction includes intra prediction and inter prediction. Intra prediction refers to searching for the reference block of the prediction block in the frame where the prediction block is located, and inter prediction refers to searching for the reference block of the prediction block in a frame other than the frame where the prediction block is located.
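  • The residual described above can be sketched directly. This is an illustrative pure-Python version, assuming blocks are represented as 2-D lists of pixel values of equal size; real codecs operate on clipped integer sample arrays.

```python
# Illustrative residual computation: subtract the reference block's pixel
# values from the prediction block's corresponding pixel values.

def residual(pred_block, ref_block):
    """Per-pixel difference between a prediction block and its reference block."""
    return [[p - r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred_block, ref_block)]
```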
  • the prediction unit is the smallest unit in the image, and the prediction unit will not continue to be divided into multiple image blocks.
  • the "image block” or “current image block” mentioned below refers to a prediction unit (or a coding unit), and an image block can be further divided into multiple sub-image blocks, and each sub-image block can be further Make predictions.
  • the current image block is an image block to be encoded (or decoded).
  • the image frame where the current image block is located is called the current frame.
  • the current image block is a coding unit (CU) in some video standards.
  • a motion information candidate list is constructed, and the current image block is predicted according to the candidate motion information selected in the motion information candidate list.
  • the motion information mentioned in this article may include motion vectors, or include motion vectors and reference frame information.
  • the motion information candidate list refers to a set of candidate motion information of the current block.
  • the motion information candidates in the motion information candidate list may be stored in the same buffer or in different buffers. There is no restriction here.
  • The index of the motion information in the motion information candidate list mentioned below may be the index of the motion information in the full set of candidate motion information of the current block, or the index of the motion information in the buffer; there is no restriction here.
  • the current image block can be encoded through the following steps.
  • the current image block can be decoded through the following steps.
  • the preset method may be consistent with the method of constructing the motion information candidate list at the encoding end.
  • the motion vector of the current image block is equal to the predicted MV (Motion Vector Prediction, MVP) (for example, the aforementioned motion vector MV1).
  • The first type of mode includes the Merge mode and/or the Affine Merge mode.
  • In the second type of mode, the encoding end selects the optimal motion information from the motion information candidate list, determines the predicted MV of the current image block according to that motion information, and then performs a motion search using the predicted MV as the starting point. The displacement between the final searched position and the search starting point is recorded as the motion vector difference (Motion Vector Difference, MVD). The predicted image block of the current image block is then determined from the reference image according to the predicted MV + MVD. Therefore, in addition to the index number and the residual mentioned in the first type of mode, the code stream sent from the encoding end to the decoding end also includes the MVD. In some video codec standards, this second type of mode may include the Advanced Motion Vector Prediction (AMVP) mode.
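  • The decoder-side reconstruction in the second type of mode reduces to a simple addition, as sketched below. The function name and tuple representation are illustrative assumptions.

```python
# Sketch of AMVP-style MV reconstruction: the decoder adds the signalled
# motion vector difference (MVD) to the predicted MV (MVP) taken from the
# motion information candidate list.

def reconstruct_mv(mvp, mvd):
    """Motion vector of the current image block = predicted MV + MVD."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```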
  • The manner of constructing the motion information candidate list may be the same or different in different types of modes.
  • A motion information candidate list constructed in the same way may be applied to only one type of mode, or to different types of modes.
  • Likewise, the method for determining one of the candidates in the motion information candidate list may be used by only one type of mode or by different types of modes. There is no restriction here.
  • motion information candidate lists in two construction modes are provided.
  • the motion information candidate lists in the two construction modes are hereinafter referred to as a first motion vector candidate list and a second motion vector candidate list.
  • One difference between the two lists is that at least one candidate in the first candidate list of motion vectors includes the motion vector of the sub-image block, and each candidate in the second candidate list of motion vectors includes the motion vector of the image block.
  • The image block here is the same type of concept as the current image block, that is, a prediction unit (or a coding unit), and a sub-image block refers to one of a number of sub-image blocks obtained by dividing the image block.
  • When the adopted candidate is the motion vector of a whole image block, the reference block of the current image block is determined according to the candidate, and the residual between the image block and the reference block is then calculated.
  • When the adopted candidate is the motion vector of a sub-image block, the reference block of each sub-image block in the current image block is determined according to the candidate, the residual of each sub-image block with respect to its reference block is calculated, and the residuals of the sub-image blocks are stitched into the residual of the current image block.
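  • The stitching step for sub-image-block residuals can be sketched as follows. This is a minimal illustration assuming equal-size sub-block residuals stored row-major as 2-D lists; the function name and layout convention are assumptions for demonstration.

```python
# Sketch: assemble per-sub-block residuals into the residual of the whole
# current image block, placing sub-blocks left-to-right, top-to-bottom.

def stitch_subblock_residuals(sub_residuals, blocks_per_row):
    """sub_residuals: row-major list of equal-size 2-D residual blocks.
    Returns the stitched 2-D residual of the current image block."""
    sub_h = len(sub_residuals[0])
    rows = []
    for start in range(0, len(sub_residuals), blocks_per_row):
        band = sub_residuals[start:start + blocks_per_row]  # one row of sub-blocks
        for r in range(sub_h):
            rows.append([px for sub in band for px in sub[r]])
    return rows
```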
  • the second candidate list of motion vectors of the current image block mentioned in this solution may be applied to the above-mentioned first-type mode and/or second-type mode.
  • the second candidate list of motion vectors may be a regular Merge motion information candidate list (Normal Merge Candidate List) in the Merge candidate list.
  • the second candidate list of motion vectors may be an AMVP candidate list (AMVP Candidate List).
  • the first candidate list of motion vectors may be an affine Merge motion information candidate list (Affine Merge Candidate List) in the Merge candidate list. It should be understood that the second candidate list of the motion vector may also have another name.
  • The first candidate list of motion vectors and the second candidate list of motion vectors formed by the construction scheme provided by the present application may be applied at both the encoding end and the decoding end.
  • the execution subject of the method provided by the present application may be an encoding end or a decoding end.
  • When determining the candidates in the first candidate list of motion vectors and/or the second candidate list of motion vectors, the candidates may be determined based on TMVP operations and/or advanced/alternative temporal motion vector prediction (ATMVP) operations.
  • ATMVP operation is a motion vector prediction mechanism.
  • the basic idea of ATMVP technology is to perform motion compensation by acquiring motion information of multiple sub-blocks in the current image block (for example, current CU).
  • the ATMVP operation introduces motion information of multiple sub-blocks in the current image block as candidates in constructing a candidate list (such as merge/affine merge candidate list or AMVP candidate list).
  • The realization of ATMVP technology can be roughly divided into two steps.
  • The first step is to determine a time domain vector by scanning the candidate motion vector list of the current image block or the motion vectors of image blocks adjacent to the current image block. The second step is to divide the current image block into N×N sub-blocks (for example, sub-CUs), determine the corresponding block of each sub-block in the reference frame according to the time domain vector acquired in the first step, and determine the motion vector of each sub-block according to the motion vector of its corresponding block in the reference frame.
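  • The second ATMVP step above can be sketched as follows. This is an illustrative assumption-laden model: the reference frame's motion field is a plain dict mapping positions to motion vectors, the time domain vector is a tuple, and the function name is hypothetical.

```python
# Sketch of ATMVP step 2: split the current image block into N x N sub-blocks,
# use the time domain vector to locate each sub-block's corresponding block in
# the reference frame, and take that block's MV for the sub-block.

def atmvp_subblock_mvs(cu_pos, cu_size, sub_size, temporal_vector, ref_mv_field):
    """Return {sub-block top-left position: motion vector} for the current CU."""
    cu_x, cu_y = cu_pos
    width, height = cu_size
    tv_x, tv_y = temporal_vector
    sub_mvs = {}
    for y in range(cu_y, cu_y + height, sub_size):
        for x in range(cu_x, cu_x + width, sub_size):
            corresponding = (x + tv_x, y + tv_y)  # position in the reference frame
            sub_mvs[(x, y)] = ref_mv_field.get(corresponding, (0, 0))
    return sub_mvs
```

A production implementation would also handle reference-index scaling and unavailable (intra-coded) corresponding blocks; those details are omitted here.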
  • the motion vector determined according to the ATMVP operation may be added to the list as a candidate (for example, as the first candidate).
  • the motion vector determined according to the TMVP operation can be added to the list as a candidate.
  • the time domain candidate motion vector determined according to the TMVP operation can be added as a candidate to both the regular Merge candidate list and the AMVP candidate list.
  • the time-domain candidate motion vector determined according to the TMVP operation may be added as a candidate to the conventional Merge candidate list or to the AMVP candidate list.
  • the TMVP operation includes: determining a relevant block of the current image block in images adjacent in the time domain; and determining a candidate motion vector of the current image block according to the motion vector of the relevant block.
  • the ATMVP operation includes: determining a relevant block of the current image block in images adjacent in the time domain; dividing the current image block into a plurality of sub-image blocks; determining, in the relevant block, the sub-relevant block corresponding to each of the plurality of sub-image blocks; and determining the time domain candidate motion vector of each sub-image block of the current image block according to the motion vector of its corresponding sub-relevant block.
  • The image adjacent in the time domain mentioned in the TMVP operation and the ATMVP operation may be the reference image temporally closest to the image where the current image block is located; or, it may be a reference image preset at the encoding/decoding end; or, it may be a reference image of the current image block specified in the video parameter set, sequence header, sequence parameter set, image header, image parameter set, or slice header.
  • the images adjacent in the time domain may be co-located frames of the current image block.
  • The co-located frame is the frame set in the slice-level information header for acquiring motion information for prediction. In some application scenarios, the co-located frame is also called a collocated picture.
  • the relevant block of the current image block may be a co-located block of the current image block.
  • the relevant block may be called a collocated block or corresponding block.
  • The co-located block may be an image block in the co-located frame that has the same position as the current image block, or an image block in the co-located frame whose position differs from the position of the current image block by a specified offset.
  • the method of determining the relevant block of the current image block in the TMVP operation and the ATMVP operation may be the same or different.
  • In some embodiments, the method of determining the relevant block of the current image block is the same in the TMVP operation and the ATMVP operation, and both include: determining, as the relevant block of the current image block, the image block in the image adjacent in the time domain at the same position as a specified position in the current image block; or, determining, as the relevant block of the current image block, the image block in the image adjacent in the time domain at the same position as a specified position spatially adjacent to the current image block.
  • In some implementations, all the spatial candidate motion vectors already added to the motion vector merge candidate list are scanned in order to determine the relevant block of the current image block.
  • Compared with this, determining the relevant block directly from a single specified position (in the current image block, or spatially adjacent to it) at the same position in the image adjacent in the time domain can simplify redundant operations in the TMVP operation and the ATMVP operation.
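  • The simplified relevant-block rule above can be sketched as a direct coordinate lookup. The position names, function name, and coordinate convention below are illustrative assumptions; the point is that no scan of the candidate list is required.

```python
# Sketch: compute the coordinate of the specified position of the current
# image block; the relevant (co-located) block is taken at this same
# coordinate in the temporally adjacent image.

def relevant_block_anchor(block_x, block_y, block_w, block_h, position="center"):
    """Coordinate at which the relevant block is located in the co-located frame."""
    anchors = {
        "top_left": (block_x, block_y),
        "top_right": (block_x + block_w - 1, block_y),
        "center": (block_x + block_w // 2, block_y + block_h // 2),
        "bottom_left": (block_x, block_y + block_h - 1),
        "bottom_right": (block_x + block_w - 1, block_y + block_h - 1),
    }
    return anchors[position]
```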
  • the size of the relevant block of the current image block may be the same as the size of the current image block, or the size of the relevant block of the current image block is the default value.
  • the specified position in the current image block may be any position in the current image block, for example, may be any one of the upper left corner point, the upper right corner point, the center point, the lower left corner point, and the lower right corner point of the current image block .
  • The specified position spatially adjacent to the current image block refers to a specified position in the current image outside the current image block, for example, a specified position adjacent to the boundary of the current image block.
  • the block serves as a relevant block of the current image block, or an image block with the pixel point as the upper left corner point and having the same size as the current image block or a preset size may also be used as the relevant block of the current image block.
  • the size of the sub-image block is adaptively set at the frame level.
  • the size of the sub-image block is 4 ⁇ 4 by default.
  • the size of the sub-image block is set to 8 ⁇ 8 .
  • For example, when the previously encoded image block in the same time-domain layer was encoded in ATMVP mode, the average size of the sub-image blocks in that CU is calculated. If the average block size is greater than a threshold, the size of the sub-image block of the current image block is set to 8×8; otherwise the default value of 4×4 is used.
  • In the current standard (e.g., VVC), motion vectors are stored at an 8×8 granularity. It should be understood that when the size of the sub-image block is set to 4×4, the size of the motion vector of the sub-image block (also 4×4) does not match the storage granularity of the motion vector in the current standard.
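  • The frame-level adaptive rule described above can be sketched as follows. The threshold value and function name are illustrative assumptions; the document specifies only that a threshold comparison selects between the 4×4 default and 8×8.

```python
# Sketch of the frame-level adaptive sub-image-block size: default 4x4,
# switch to 8x8 when the average sub-block size of the previously encoded
# ATMVP-mode block in the same time-domain layer exceeds a threshold.

def adaptive_subblock_size(avg_prev_block_size, threshold=64):
    """Return the sub-image-block size (width, height) for the current block."""
    if avg_prev_block_size is not None and avg_prev_block_size > threshold:
        return (8, 8)
    return (4, 4)  # default value, also used when no previous size is stored
```

Note that this rule is exactly what motivates the fixed 8×8 alternative discussed below: it requires storing the previous block's sub-block size.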
  • In addition, with this adaptive setting, when encoding the current image block it is also necessary to store information on the size of the sub-image blocks of the previously encoded image block in the same time-domain layer.
  • the current image block is a CU
  • the sub-image block obtained after dividing it may be called a sub-CU.
  • the size of the sub-image block and/or the size of the relevant block of the sub-image block is fixed to be greater than or equal to 64 pixels.
  • the size of the sub-image block and/or the size of the relevant block of the sub-image block are both fixed at 8 ⁇ 8 pixels.
  • When the size of the sub-image block of the current image block is fixedly set to 8×8, on the one hand it can adapt to the storage granularity of the motion vector specified in the video standard VVC; on the other hand, there is no need to store information on the size of the sub-image blocks of the previously encoded image block, which therefore saves storage space.
  • The size of the sub-image block and/or the size of the relevant block of the sub-image block may also be other values, for example, A×B with A ≤ 64 and B ≤ 64, where both A and B are integer multiples of 4.
  • the size of the sub-image block and/or the size of the relevant block of the sub-image block is 4 ⁇ 16 pixels, or 16 ⁇ 4 pixels.
  • the storage granularity of the motion vector may not be 8 ⁇ 8, but other numerical values.
  • In an example, the size of the sub-image block of the current image block is set to be the same as the storage granularity of the motion vector; both are x×y, where x and y are positive integers.
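  • The size constraints listed above (each dimension a multiple of 4 and at most 64, with at least 64 pixels in total) can be collected into one check. The function name and parameter defaults are illustrative; they simply encode the constraints stated in the preceding bullets.

```python
# Sketch: validate a candidate sub-image-block (or relevant-block) size A x B
# against the constraints described above.

def valid_subblock_size(a, b, min_pixels=64, max_dim=64):
    """True if A x B satisfies: multiples of 4, each dimension <= 64,
    and a total of at least `min_pixels` pixels."""
    return (a % 4 == 0 and b % 4 == 0
            and a <= max_dim and b <= max_dim
            and a * b >= min_pixels)
```

Under this check, 8×8, 4×16, and 16×4 are all admissible, while the 4×4 default is not, matching the fixed-size variants discussed above.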
  • When determining the candidates in the first candidate list of motion vectors and/or the second candidate list of motion vectors of the current block, if the size of the current image block satisfies a preset condition, the time domain candidate motion vectors of the current image block in the first and/or second candidate list are determined according to the ATMVP operation.
  • If the size of the current image block does not satisfy the preset condition, the ATMVP operation is turned off; that is, the time domain candidate motion vectors of the current image block in the first and/or second candidate list are not determined according to the ATMVP operation.
  • Similarly, if the size of the current image block satisfies the preset condition, the time domain candidate motion vectors of the current image block in the first and/or second candidate list are determined according to the TMVP operation.
  • If the size of the current image block does not satisfy the preset condition, the TMVP operation is turned off; that is, the time domain candidate motion vectors of the current image block in the first and/or second candidate list are not determined according to the TMVP operation.
  • the preset condition may include one condition, or include a combination of multiple conditions.
  • the time domain candidate motion vector of the current image block is determined according to the ATMVP operation.
  • the time domain candidate motion vector of the current image block is determined according to the TMVP operation.
  • the number of conditions in the first condition group is at least one.
  • the number of conditions in the second condition group is at least one.
  • the first condition group and the second condition group may be completely the same, or completely different, or partially the same.
  • the size of the current image block is x1 ⁇ y1, and the default setting size of the sub-image blocks of the current image block is x2 ⁇ y2, where x1, x2, y1, and y2 are all positive integers;
  • the preset conditions include: x1 is not less than x2, and/or, y1 is not less than y2.
  • the time-domain candidate motion vector of the current image block is determined according to the ATMVP operation and/or TMVP operation.
  • when x1 is less than, or less than or equal to, x2, and/or y1 is less than, or less than or equal to, y2, the ATMVP operation is set not to be performed.
  • the size of the current image block is x1×y1 and the preset size is x3×y3, where x1, x3, y1 and y3 are all positive integers; the preset conditions include: x1 is not less than x3, and/or y1 is not less than y3.
  • the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation.
  • when x1 is less than, or less than or equal to, x3, and/or y1 is less than, or less than or equal to, y3, the TMVP operation is set not to be performed.
  • skipping the TMVP operation: hardware design generally requires processing regions of the same size to take the same time to encode or decode, so a region containing many small blocks needs far more pipeline time than other regions; saving pipeline time on small blocks is therefore very meaningful for parallel hardware processing.
  • the size of the current image block is small, skipping the TMVP operation can save the pipeline time of small blocks.
  • current coding technology exploits temporal correlation more and more, and many temporal prediction techniques, such as ATMVP, have been adopted, so for small blocks the performance impact of skipping TMVP is negligible.
  • when it is mentioned that the TMVP operation is not performed or is skipped, in the case where the motion information candidate lists of both the first-type and second-type modes use the TMVP operation to determine candidates, the TMVP operation may be skipped only in the candidate list of the first-type mode or only in that of the second-type mode; alternatively, it may be skipped in both.
  • the storage granularity of the motion vector is x3 ⁇ y3.
  • the size of the current image block is the same as the first default size
  • only one of the TMVP operation and the ATMVP operation is performed.
  • only ATMVP operations are performed.
  • the first default size may be the same as the size of the storage granularity of the motion vector.
  • the ATMVP and TMVP techniques are somewhat redundant: both derive a set of temporal motion information for the current image block. Setting one of the operations not to be performed skips some redundant operations and can effectively save encoding/decoding time.
  • ATMVP uses candidate motion vectors already in the merge list of the current image block during derivation, while TMVP derives the candidate motion vector directly from a fixed position in a neighbouring coded picture.
  • the motion vectors derived by the ATMVP technique are, to some extent, more effective and adaptive than those of the TMVP technique, so the TMVP operation is set not to be performed.
  • the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation; that is, when the size of the current image block can cover the storage granularity of one motion vector, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation.
  • when the current image block cannot contain one motion-vector storage granule, or the size of the current image block equals the storage granularity of one motion vector, the TMVP operation is set not to be performed.
  • the TMVP technology derives a set of temporal motion information.
  • the derived motion vectors may turn out identical, causing unnecessary division operations.
  • the preset condition includes: the number of pixels of the current image block is greater than, or greater than or equal to, a preset value. In one example, when the number of pixels of the current image block is less than, or less than or equal to, the preset value, the TMVP operation and/or the ATMVP operation is not performed.
  • the preset value may be 32 or 64.
  • in a specific example, when the width or height of the current CU is less than 8, when the width and height of the current CU are both equal to 8, or when the width or height of the current CU is less than 8 or its width and height are both equal to 8, the TMVP operation is set not to be performed; since some redundant operations are skipped, encoding/decoding time can be effectively saved.
  • the TMVP operation process is closed, that is, the TMVP technology is not used to determine the time domain candidate motion vectors added to the second candidate list of motion vectors.
  • the TMVP process is turned off during the process of constructing the merge candidate motion vector list for the current image block. In this case, the relevant information in the time domain is still effectively used, and the merge candidate list construction process is simpler, which can reduce the complexity of the codec to a certain extent.
  • an affine transformation (affine) motion compensation model can be introduced in the codec technology.
  • Affine transformation motion compensation uses a set of control points (control points) to describe the affine motion field of an image block.
  • the affine transformation motion compensation model uses a four-parameter Affine model, and the set of control points includes two control points (such as the upper left corner and upper right corner of the image block).
  • the affine transformation motion compensation model uses a six-parameter Affine model, and the set of control points includes three control points (such as the upper left corner, upper right corner, and lower left corner of the image block).
  • when constructing the first motion-vector candidate list, the added candidate may be the MVs of a set of control points, also called a control point motion vector prediction (CPMVP).
  • CPMVP: control point motion vector prediction
  • the first motion-vector candidate list may be used in the merge mode; specifically, this may be called the affine merge mode, and correspondingly the first motion-vector candidate list may be called the affine merge candidate list. In the affine merge mode, the prediction in the first motion-vector candidate list is used directly as the CPMV of the current image block, i.e. no affine motion-estimation process is needed.
  • CPMV: control point motion vector
  • the candidates determined according to the ATMVP technology may be added to the first candidate list of motion vectors.
  • control point motion vector group of the relevant block of the current image block is added as a candidate to the motion vector first candidate list.
  • when this candidate in the first motion-vector candidate list is used for prediction, the current image block is predicted according to the control-point motion-vector group of the relevant block of the current image block.
  • the representative motion vector of the relevant block of the current image block is added as a candidate to the first candidate list of motion vectors.
  • the candidate is also marked as determined according to the ATMVP technology.
  • the relevant block of the current image block is determined according to the mark and the candidate, and the current image block and the relevant block are divided into multiple sub-image blocks in the same manner.
  • each sub-image block in the current image block corresponds one-to-one with a sub-image block in the relevant block; the motion vector of each sub-image block in the relevant block is used to predict the motion vector of the corresponding sub-image block in the current image block.
  • the representative motion vector of the relevant block is used in place of the unavailable motion vector to predict the corresponding sub-image block in the current image block.
  • the candidate determined according to the ATMVP technique is not added to the second motion-vector candidate list.
  • the sub-image block in the related block is not available, or when the sub-image block in the related block adopts the intra-frame coding mode, it is determined that a sub-image block in which the motion vector is not available appears in the related block.
  • each candidate in the first motion-vector candidate list includes the motion vectors of a set of control points; when the representative motion vector of the relevant block of the current image block is added to the first motion-vector candidate list, to keep the data format consistent, the representative motion vector of the relevant block may be inserted as the motion vector of each control point in the candidate (i.e. the motion vector of each control point in the candidate is assigned the representative motion vector of the relevant block).
  • the representative motion vector of the relevant block of the current image block may refer to the motion vector of the center position of the relevant block, or other motion vectors representing the relevant block, which are not limited herein.
  • an embodiment of the present application further provides a video image processing method.
  • the method includes the following steps.
  • the TMVP operation includes: determining a relevant block of the current image block in images adjacent to the time domain; and determining a candidate motion vector of the current image block according to the motion vector of the relevant block.
  • the ATMVP operation includes: determining a relevant block of the current image block in images adjacent in the time domain; dividing the current image block into a plurality of sub-image blocks; determining the plurality of sub-image blocks in the relevant block The sub-correlation block corresponding to each sub-image block; the time-domain candidate motion vector of the current image block is determined according to the motion vector of the sub-correlation block corresponding to each sub-image block.
  • FIG. 2 is a schematic block diagram of a video image processing apparatus 200 provided by an embodiment of the present application.
  • the apparatus 200 is used to execute the method embodiment shown in FIG. 1.
  • the device 200 includes the following units.
  • the first determining module 210 is used to determine the current image block
  • the second determining module 220 is configured to determine, when the size of the current image block satisfies a preset condition, the temporal candidate motion vector of the current image block according to the temporal motion vector prediction (TMVP) operation and/or the advanced/alternative temporal motion vector prediction (ATMVP) operation.
  • the TMVP operation includes: determining a relevant block of the current image block in images adjacent to the time domain; and determining a candidate motion vector of the current image block according to the motion vector of the relevant block.
  • the ATMVP operation includes: determining a relevant block of the current image block in images adjacent in the time domain; dividing the current image block into a plurality of sub-image blocks; determining the plurality of sub-image blocks in the relevant block The sub-correlation block corresponding to each sub-image block; the time-domain candidate motion vector of the current image block is determined according to the motion vector of the sub-correlation block corresponding to each sub-image block.
  • the first determining module and the second determining module in this embodiment may be implemented by a processor.
  • An embodiment of the present application also provides a video image processing device.
  • the apparatus may be used to perform the method embodiments described above.
  • the apparatus includes a processor and a memory, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory, and the execution of the instructions stored in the memory causes the processor to be used to perform the method according to the above method embodiments.
  • the apparatus may further include a communication interface for communicating with external devices.
  • the processor is used to control the communication interface to receive and/or send signals.
  • the device provided in this application may be applied to an encoder or a decoder.
  • FIG. 3 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to simply as a decoding device 1100) provided by this application.
  • the decoding device 1100 may include a processor 1110, a memory 1130, and a bus system 1150.
  • the processor and the memory are connected through a bus system, the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory.
  • the memory of the encoding device stores program code, and the processor can call the program code stored in the memory to perform the various video encoding or decoding methods described in this application, especially the inter-prediction methods described in this application; to avoid repetition, they are not described in detail here.
  • the processor 1110 may be a CPU, and the processor 1110 may also be other general-purpose processors, DSP, ASIC, FPGA or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. .
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 1130 may include ROM or RAM. Any other suitable type of storage device may also be used as the memory 1130.
  • the memory 1130 may include code and data 1131 accessed by the processor 1110 using the bus 1150.
  • the memory 1130 may further include an operating system 1133 and an application program 1135 including at least one program that allows the processor 1110 to perform the video encoding or decoding method described in the present application (in particular, the inter prediction method described in the present application).
  • the application program 1135 may include applications 1 to N, which further include a video encoding or decoding application (referred to simply as a video decoding application) that performs the video encoding or decoding method described in this application.
  • the bus system 1150 may also include a power bus, a control bus, and a status signal bus. However, for clarity, various buses are marked as the bus system 1150 in the figure.
  • the decoding device 1100 may also include one or more output devices, such as a display 1170.
  • the display 1170 may be a tactile display that merges the display with a tactile unit that operably senses touch input.
  • the display 1170 may be connected to the processor 1110 via the bus 1150.
  • An embodiment of the present application further provides a computer storage medium on which a computer program is stored.
  • the computer program is executed by the computer, the computer is caused to execute the method provided in the foregoing method embodiment.
  • Embodiments of the present application also provide a computer program product containing instructions, which when executed by a computer causes the computer to execute the method provided by the foregoing method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be from a website site, computer, server or data center Transmission to another website, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, digital video disc (DVD)), or semiconductor media (eg, solid state disk (SSD)), etc. .
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division into units is only a logical functional division; there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.


Abstract

A video image processing method and apparatus are provided. The method includes: determining a current image block; when the size of the current image block satisfies a preset condition, determining a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation. On the premise of guaranteeing encoding/decoding performance, complexity can be reduced.

Description

Video image processing method and apparatus
Copyright notice
The disclosure of this patent document contains material subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to anyone reproducing this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
This application relates to the field of video encoding and decoding, and in particular to a video image processing method and apparatus.
Background
At present, the main video coding standards all adopt block-based motion compensation in their inter-prediction part. Its main principle is to find a most similar block in an already coded picture for the current image block, a process called motion compensation. For example, a frame is first divided into coding tree units (CTUs) of equal size, e.g. 64×64 or 128×128. Each CTU can be further divided into square or rectangular coding units (CUs). For each CU, the most similar block is searched for in a reference frame (generally a reconstructed frame near the current frame in the temporal domain) as the prediction block of the current CU. The relative displacement between the current block (i.e. the current CU) and the similar block (i.e. the prediction block of the current CU) is called the motion vector (MV). The process of finding the most similar block in the reference frame as the prediction block of the current block is motion compensation.
In one current prediction mode, the motion information candidate list of the current CU is usually constructed in two ways. First, spatial candidate motion vectors: the motion information of already coded neighbouring blocks of the current CU is usually filled into the candidate list. Second, temporal candidate motion vectors: temporal motion vector prediction (TMVP) uses the motion information of the CU at the corresponding position in a neighbouring coded picture. A motion vector of the current CU is determined from one candidate motion vector in the motion information candidate list, and the prediction block of the current CU is determined from that motion vector.
There is still room for improvement in current prediction modes.
Summary
This application provides a video image processing method and apparatus that can reduce the complexity of the ATMVP technique while maintaining its performance gain.
In a first aspect, a video image processing method is provided, the method including:
determining a current image block;
when the size of the current image block satisfies a preset condition, determining a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation; wherein
the TMVP operation includes:
determining a relevant block of the current image block in a temporally neighbouring picture;
determining the temporal candidate motion vector of the current image block according to the motion vector of the relevant block;
the ATMVP operation includes:
determining a relevant block of the current image block in a temporally neighbouring picture;
dividing the current image block into multiple sub-image blocks;
determining, in the relevant block, the sub-relevant block corresponding to each of the multiple sub-image blocks;
determining the temporal candidate motion vectors of the sub-image blocks of the current image block according to the motion vector of the sub-relevant block corresponding to each sub-image block.
The solution provided in this application can simplify existing redundant operations.
In a second aspect, a video image processing apparatus is provided, the apparatus including:
a memory and a processor, the memory storing instructions, the processor executing the instructions stored in the memory, and the execution of the instructions stored in the memory causing the processor to:
determine a current image block;
when the size of the current image block satisfies a preset condition, determine a temporal candidate motion vector of the current image block according to a TMVP operation and/or an ATMVP operation; wherein
the TMVP operation includes:
determining a relevant block of the current image block in a temporally neighbouring picture;
determining the temporal candidate motion vector of the current image block according to the motion vector of the relevant block;
the ATMVP operation includes:
determining a relevant block of the current image block in a temporally neighbouring picture;
dividing the current image block into multiple sub-image blocks;
determining, in the relevant block, the sub-relevant block corresponding to each of the multiple sub-image blocks;
determining the temporal candidate motion vectors of the sub-image blocks of the current image block according to the motion vector of the sub-relevant block corresponding to each sub-image block.
In a third aspect, a non-volatile computer storage medium is provided, on which a computer program is stored; when the computer program is executed by a computer, the computer implements the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer program product containing instructions is provided; when the instructions are executed by a computer, the computer implements the method in the first aspect or any possible implementation of the first aspect.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a video image processing method provided by an embodiment of this application.
FIG. 2 is a schematic block diagram of a video image processing apparatus provided by an embodiment of this application.
FIG. 3 is a schematic block diagram of an implementation of an encoding device or decoding device provided by an embodiment of this application.
Detailed Description
In video encoding and decoding, the prediction step is used to reduce redundant information in a picture. A prediction block is the basic unit used for prediction within a frame; in some standards it is also called a prediction unit (PU). Before a frame is encoded/compressed, it is divided into multiple image blocks; each of these image blocks can in turn be divided into multiple image blocks, and so on. Different coding methods may use different numbers of partition levels, carrying different operations. In different coding standards, image blocks at the same level may have different names. For example, in some video standards, each of the image blocks into which a frame is first divided is called a coding tree unit (CTU); each CTU may contain one coding unit (CU) or be further divided into multiple CUs; a CU may be divided into one, two, four or another number of prediction units according to the prediction mode. In some video standards, the CTU is also called the largest coding unit (LCU).
Prediction refers to finding image data similar to the prediction block, also called the reference block of the prediction block. Redundancy in encoding/compression is reduced by encoding/compressing the difference between the prediction block and its reference block. The difference may be a residual obtained by subtracting the corresponding pixel values of the reference block from those of the prediction block. Prediction includes intra prediction and inter prediction: intra prediction searches for the reference block within the frame containing the prediction block, while inter prediction searches for it in frames other than the one containing the prediction block.
In some existing video standards, the prediction unit is the smallest unit in a picture and is not further divided into multiple image blocks. However, the "image block" or "current image block" mentioned below refers to a prediction unit (or a coding unit), and an image block may be further divided into multiple sub-image blocks, on each of which prediction may further be performed. The current image block is the image block to be encoded (or decoded), and the frame it belongs to is called the current frame. For example, in some video standards the current image block is a coding unit (CU).
In this solution, before predicting the current image block, a motion information candidate list is constructed, and the current image block is predicted according to the candidate motion information selected from that list. The motion information mentioned herein may include a motion vector, or a motion vector together with reference frame information. The motion information candidate list is the set of candidate motion information of the current block; the candidates in it may be stored in the same buffer or in different buffers, without limitation here. The index of motion information in the candidate list mentioned below may be its index in the whole set of candidate motion information of the current block, or its index within the buffer it resides in, without limitation here.
There are several types of modes for constructing the motion information candidate list. These types of modes are first illustrated below.
In the first type of mode, as a first example, at the encoding end, after the motion information candidate list is constructed, the encoding of the current image block can be completed through the following steps.
1) Select the optimal motion information from the motion information candidate list, determine the motion vector MV1 of the current image block from it, and obtain the index of the selected motion information in the candidate list.
2) According to the motion vector MV1 of the current image block, determine the prediction image block of the current image block from the reference picture (i.e. the reference frame), i.e. determine the position of the prediction image block in the reference frame.
3) Obtain the residual between the current image block and the prediction image block.
4) Send the index obtained in step 1) and the residual obtained in step 3) to the decoding end.
As an example, at the decoding end, the current image block can be decoded through the following steps.
1) Receive the residual and the index from the encoding end.
2) Construct the motion information candidate list by a preset method, which may be consistent with the method used at the encoding end.
3) According to the index, select the motion information in the candidate list and determine the motion vector MV1 of the current image block from it.
4) According to the motion vector MV1, obtain the prediction image block of the current image block, and decode the current image block by combining it with the residual.
That is, in the first type of mode, the motion vector of the current image block equals the motion vector prediction (MVP) (e.g. the motion vector MV1 mentioned above). In some video coding standards, this first type of mode includes the merge mode and/or the affine merge mode.
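The index-based selection in the merge-type mode described above can be sketched as follows. This is a minimal illustration, not the codec's actual implementation; the candidate representation (integer MV pairs) and the cost function are assumptions for the example.

```python
def select_merge_candidate(candidate_list, cost_fn):
    # Encoder side: try every candidate MV in the list, keep the one with
    # the lowest cost, and signal only its index to the decoder.
    best_index = min(range(len(candidate_list)),
                     key=lambda i: cost_fn(candidate_list[i]))
    return best_index, candidate_list[best_index]

def merge_decode_mv(candidate_list, index):
    # Decoder side: rebuild the same list with the same preset method and
    # take the MV at the signalled index; in merge-type modes MV = MVP.
    return candidate_list[index]
```

Because the decoder reconstructs an identical list, only the index (plus the residual) needs to be transmitted.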
In the second type of mode, unlike the first type, after the encoding end selects the optimal motion information from the candidate list and determines the predicted MV of the current image block from it, it further performs a motion search starting from the predicted MV, and records the displacement between the final position found and the search starting point as the motion vector difference (MVD). The prediction image block of the current image block is then determined from the reference picture according to predicted MV + MVD. Hence, besides the index and residual mentioned in the first type of mode, the bitstream sent from the encoding end to the decoding end also includes the MVD. In some video coding standards, this second type of mode may include the advanced motion vector prediction (AMVP) mode.
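The MVD-based reconstruction in the second type of mode reduces to a component-wise addition; a minimal sketch (integer MV pairs are an assumption of the example):

```python
def amvp_reconstruct_mv(mvp, mvd):
    # AMVP-style reconstruction: the decoder adds the signalled motion
    # vector difference (MVD) to the selected predictor (MVP) to obtain
    # the final motion vector of the current image block.
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```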
The construction of the motion information candidate list may be the same or different in different types of modes. A candidate list constructed in one way may apply to only one of the mode types or to different types. The method for determining one of the candidates in the list may likewise be used in only one mode type or in different types; no limitation is imposed here.
This solution provides motion information candidate lists constructed in two ways; for convenience of description, they are called the first motion-vector candidate list and the second motion-vector candidate list below. One difference between the two lists is that at least one candidate in the first motion-vector candidate list includes the motion vector of a sub-image block, while every candidate in the second motion-vector candidate list includes the motion vector of an image block.
As stated above, the image block here is the same type of concept as the current image block, both referring to a prediction unit (or a coding unit), and a sub-image block refers to one of the multiple sub-image blocks obtained by dividing that image block. When a candidate from the first motion-vector candidate list is used for prediction, the reference block of the current image block is determined from the candidate, and the residual between the image block and the reference block is computed. When a candidate from the second motion-vector candidate list is used for prediction, if the adopted candidate is the motion vector of a sub-image block, the reference block of each sub-image block in the current image block is determined from it, the residual of each sub-image block with its reference block is computed, and the residuals of the sub-image blocks are stitched into the residual of the current image block.
The second motion-vector candidate list of the current image block mentioned in this solution may be applied to the first type of mode and/or the second type of mode described above. For example, in some video coding standards, the second motion-vector candidate list may be the Normal Merge Candidate List among the merge candidate lists. In some video coding standards, it may be the AMVP Candidate List. In some video coding standards, the first motion-vector candidate list may be the Affine Merge Candidate List among the merge candidate lists. It should be understood that the second motion-vector candidate list may also have other names.
It should be understood that the first and second motion-vector candidate lists formed by the construction schemes provided in this application may be applied at the encoding end and the decoding end. In other words, the method provided in this application may be executed by the encoding end or by the decoding end.
In one example, when determining the candidates in the first motion-vector candidate list and/or the second motion-vector candidate list, the candidates may be determined according to the TMVP operation and/or the advanced/alternative temporal motion vector prediction (ATMVP) operation.
The ATMVP operation is a motion-vector prediction mechanism. The basic idea of the ATMVP technique is to perform motion compensation by obtaining the motion information of multiple sub-blocks within the current image block (e.g. the current CU). The ATMVP operation introduces the motion information of multiple sub-blocks within the current image block as candidates when constructing a candidate list (e.g. the merge/affine merge candidate list or the AMVP candidate list). The implementation of the ATMVP technique can roughly be divided into two steps. In the first step, a temporal vector is determined by scanning the candidate motion vector list of the current image block or the motion vectors of its neighbouring image blocks. In the second step, the current image block is divided into N×N sub-blocks (e.g. sub-CUs), the corresponding block of each sub-block in the reference frame is determined from the temporal vector obtained in the first step, and the motion vector of each sub-block is determined from the motion vector of its corresponding block in the reference frame.
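The second ATMVP step above can be sketched as follows. This is an illustrative model only: the collocated picture's motion field is represented here as a dictionary keyed by granularity-aligned positions, and the zero-MV fallback for missing entries is an assumption, not the standard's behaviour.

```python
def atmvp_sub_block_mvs(block_x, block_y, block_w, block_h,
                        temporal_vec, collocated_mv_field, sub=8):
    # ATMVP step 2: split the current block into sub x sub sub-blocks,
    # locate each sub-block's corresponding block in the collocated picture
    # using the temporal vector from step 1, and take that block's MV.
    mvs = {}
    for dy in range(0, block_h, sub):
        for dx in range(0, block_w, sub):
            cx = block_x + dx + temporal_vec[0]
            cy = block_y + dy + temporal_vec[1]
            # Align to the motion-vector storage granularity of the
            # collocated motion field before the lookup.
            key = (cx // sub * sub, cy // sub * sub)
            mvs[(dx, dy)] = collocated_mv_field.get(key, (0, 0))
    return mvs
```

Each entry of the returned map is a per-sub-block temporal candidate, which is what makes ATMVP a sub-block-level predictor, unlike TMVP's single block-level MV.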
For example, when constructing the first motion-vector candidate list, the motion vector determined according to the ATMVP operation may be added to the list as a candidate (e.g. as the first candidate). When constructing the second motion-vector candidate list, the motion vector determined according to the TMVP operation may be added to the list as a candidate. For example, the temporal candidate motion vector determined by the TMVP operation may be added as a candidate to both the normal merge candidate list and the AMVP candidate list. As another example, it may be added as a candidate to the normal merge candidate list or to the AMVP candidate list.
In one example, the TMVP operation includes: determining a relevant block of the current image block in a temporally neighbouring picture; and determining the temporal candidate motion vector of the current image block according to the motion vector of the relevant block.
In one example, the ATMVP operation includes: determining a relevant block of the current image block in a temporally neighbouring picture; dividing the current image block into multiple sub-image blocks; determining, in the relevant block, the sub-relevant block corresponding to each of the multiple sub-image blocks; and determining the temporal candidate motion vectors of the sub-image blocks of the current image block according to the motion vector of the sub-relevant block corresponding to each sub-image block.
The temporally neighbouring picture mentioned in the TMVP and ATMVP operations may be the reference picture with the smallest temporal distance to the picture containing the current image block; or it may be a reference picture preset at the encoding/decoding end; or it may be a reference picture of the current image block specified in the video parameter set, sequence header, sequence parameter set, picture header, picture parameter set, or slice header. In one example, the temporally neighbouring picture may be the co-located frame of the current image block. The co-located frame is the frame set in the slice-level information header for obtaining motion information for prediction. In some application scenarios, the co-located frame is also called the collocated picture.
In one example, the relevant block of the current image block may be the co-located block of the current image block. In some video coding standards, the relevant block may be called the collocated block or the corresponding block. The co-located block may be the image block in the co-located frame having the same position as the current image block, or an image block in the co-located frame having the same positional offset from the position of the current image block.
The methods for determining the relevant block of the current image block in the TMVP operation and in the ATMVP operation may be the same or different.
In one example, the methods for determining the relevant block are the same in both the TMVP and ATMVP operations, and both include: determining, as the relevant block of the current image block, the image block located at the same position in the temporally neighbouring picture as a specified position within the current image block; or determining, as the relevant block, the image block located at the same position in the temporally neighbouring picture as a specified spatially adjacent position of the current image block.
In one example of the first step of both the ATMVP and TMVP operations, the relevant block of the current image block is determined by scanning all spatial candidate motion vectors already added to the motion-vector merge candidate list. Compared with that example, the approach of "determining as the relevant block the image block at the same position in the temporally neighbouring picture as a specified position within the current image block, or as a specified spatially adjacent position of the current image block" can simplify the redundant operations in the TMVP and ATMVP operations.
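The simplified relevant-block derivation above amounts to mapping a fixed specified position to coordinates and reading the block covering those coordinates in the temporally neighbouring picture, with no list scanning. A minimal sketch (the set of named positions is taken from the examples in the text; the coordinate convention is an assumption):

```python
def related_block_anchor(x, y, w, h, position="center"):
    # Map a specified position inside the current block (top-left x, y;
    # width w; height h) to picture coordinates. The relevant (collocated)
    # block is the coded block covering the same coordinates in the
    # temporally neighbouring picture.
    anchors = {
        "top_left": (x, y),
        "top_right": (x + w - 1, y),
        "center": (x + w // 2, y + h // 2),
        "bottom_left": (x, y + h - 1),
        "bottom_right": (x + w - 1, y + h - 1),
    }
    return anchors[position]
```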
The size of the relevant block of the current image block may be the same as that of the current image block, or the size of the relevant block may be a default value.
The specified position within the current image block may be any position in the current image block, for example, any one of the top-left corner, top-right corner, centre, bottom-left corner, or bottom-right corner of the current image block. A specified spatially adjacent position of the current image block refers to a specified position in the current picture outside the current image block, for example a specified position adjacent to the current image block.
Taking the top-left corner of the current image block as the specified position as an example, there is a pixel in the temporally neighbouring picture at the same position as the top-left corner of the current image block; the already coded/decoded block containing that pixel may be taken as the relevant block of the current image block, or alternatively, the image block whose top-left corner is that pixel and whose size is the same as that of the current image block, or is a preset size, may be taken as the relevant block of the current image block.
In one example of the ATMVP operation, the size of the sub-image block is set adaptively at the frame level; by default the sub-image-block size is 4×4, and when a certain condition is met it is set to 8×8. For example, at the encoding end, when encoding the current image block, the average block size of the sub-image blocks within the CUs encoded in ATMVP mode in the previously encoded image block of the same temporal layer is computed; when the average block size exceeds a threshold, the sub-image-block size of the current image block is set to 8×8, otherwise the default 4×4 is used. Currently, in the new-generation video coding standard Versatile Video Coding (VVC), motion vectors are stored at 8×8 granularity. It should be understood that when the sub-image-block size is set to 4×4, the size of the sub-image block's motion vector (also 4×4) does not match the motion-vector storage granularity of the current standard. In addition, in this example of the ATMVP operation, when encoding the current image block, the sub-image-block size of the previously encoded image block of the same temporal layer also needs to be stored.
In another example of the ATMVP operation, the current image block is a CU, and the sub-image blocks obtained by dividing it may be called sub-CUs. Optionally, the size of the sub-image block and/or of the relevant block of the sub-image block is fixed to be greater than or equal to 64 pixels. Optionally, the size of the sub-image block and/or of the relevant block of the sub-image block is fixed at 8×8 pixels. Fixing the sub-image-block size of the current image block at 8×8 can, on the one hand, match the motion-vector storage granularity specified in the VVC video standard; on the other hand, there is no need to store the sub-image-block size of the previously encoded image block, which saves storage space.
It should be understood that, provided the size of the sub-image block and/or of the relevant block of the sub-image block is fixed at 64 pixels, the size may also take other dimensions, for example A×B with A ≤ 64, B ≤ 64 and A, B both integer multiples of 4; for example, the size of the sub-image block and/or of its relevant block is 4×16 pixels, or 16×4 pixels.
In some implementations, the motion-vector storage granularity may also not be 8×8 but another value. Optionally, the sub-image-block size of the current image block is set equal to the motion-vector granularity, both being x×y, where x and y are positive integers.
In some implementations, when determining the candidates in the first and/or second motion-vector candidate list of the current block, if the size of the current image block satisfies the preset condition, the temporal candidate motion vectors of the current image block in the first and/or second motion-vector candidate list are determined according to the ATMVP operation.
In some implementations, when determining the candidates in the first and/or second motion-vector candidate list of the current block, if the size of the current image block does not satisfy the preset condition, the ATMVP operation is disabled, i.e. the temporal candidate motion vectors of the current image block in the first and/or second motion-vector candidate list are not determined according to the ATMVP operation.
In some implementations, when determining the candidates in the first and/or second motion-vector candidate list of each image block, if the size of the current image block satisfies the preset condition, the temporal candidate motion vectors of the current image block in the first and/or second motion-vector candidate list are determined according to the TMVP operation.
In some implementations, when determining the candidates in the first and/or second motion-vector candidate list of each image block, if the size of the current image block does not satisfy the preset condition, the TMVP operation is disabled, i.e. the temporal candidate motion vectors of the current image block in the first and/or second motion-vector candidate list are not determined according to the TMVP operation.
In at least one of the four implementations mentioned above, the preset condition may include one condition or a combination of multiple conditions. For example, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation only when the size of the current image block satisfies a first condition group, and according to the TMVP operation only when it satisfies a second condition group. The number of conditions in the first condition group is at least one, and the number of conditions in the second condition group is at least one. The first condition group and the second condition group may be completely the same, completely different, or partly the same.
In one example, the size of the current image block is x1×y1, and the default sub-image-block size of the current image block is x2×y2, where x1, x2, y1 and y2 are all positive integers; the preset condition includes: x1 is not less than x2, and/or y1 is not less than y2. For example, when x1 is not less than x2 and/or y1 is not less than y2, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation.
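The size check above can be sketched as a simple gating predicate. This is one reading of the preset condition, using the "and" combination of the two comparisons (the text also allows "or"), with the 8×8 default sub-block size as an illustrative assumption:

```python
def atmvp_enabled(x1, y1, x2=8, y2=8):
    # ATMVP contributes a temporal candidate only when the current block
    # (x1 x y1) is at least as wide and as tall as the default sub-block
    # size (x2 x y2); smaller blocks skip the operation.
    return x1 >= x2 and y1 >= y2
```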
In one example, when x1 is less than, or less than or equal to, x2, and/or y1 is less than, or less than or equal to, y2, the ATMVP operation is set not to be performed. For example, when constructing the first motion-vector candidate list above, none of the added motion-vector candidates is determined according to the ATMVP operation.
In one example, the size of the current image block is x1×y1 and the preset size is x3×y3, where x1, x3, y1 and y3 are all positive integers; the preset condition includes: x1 is not less than x3, and/or y1 is not less than y3. For example, when x1 is less than, or less than or equal to, x3, and/or y1 is less than, or less than or equal to, y3, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation.
In one example, when x1 is less than, or less than or equal to, x3, and/or y1 is less than, or less than or equal to, y3, the TMVP operation is set not to be performed. Skipping the TMVP operation: since hardware design generally requires processing regions of the same size to take the same time to encode or decode, a region containing many small blocks needs far more pipeline time than other regions; saving pipeline time on small blocks is therefore very meaningful for parallel hardware processing. When the size of the current image block is small, skipping the TMVP operation can save pipeline time on small blocks. Moreover, current coding technology exploits temporal correlation more and more, and many temporal prediction techniques, such as ATMVP, have been adopted, so for small blocks the performance impact of skipping TMVP is negligible.
When this document mentions not performing or skipping the TMVP operation, in the case where the motion information candidate lists of both the first-type and second-type modes use the TMVP operation to determine candidates, the TMVP operation may be skipped only in the candidate list of the first-type mode or only in that of the second-type mode; alternatively, the TMVP operation may be skipped in both the first-type-mode and second-type-mode motion information candidate lists.
In one example, the motion-vector storage granularity is the x3×y3 above.
In one example, when the size of the current image block equals a first default size, only one of the TMVP operation and the ATMVP operation is performed. For example, the ATMVP operation or the TMVP operation is set not to be performed. For example, only the ATMVP operation is performed. For example, the TMVP operation is set not to be performed. In one example, the first default size may equal the motion-vector storage-granularity size. When the size of the current image block equals the first default size, the ATMVP and TMVP techniques are somewhat redundant: both derive a set of temporal motion information for the current image block, so setting one of the operations not to be performed skips some redundant operations and can effectively save encoding/decoding time. In some implementations, ATMVP uses candidate motion vectors already in the merge list of the current image block during derivation, while TMVP derives the candidate motion vector directly from a fixed position in a neighbouring coded picture; in this case, the motion vectors derived by the ATMVP technique are, to some extent, more effective and adaptive than those of the TMVP technique, so the TMVP operation is set not to be performed.
In one example, when the current image block contains one motion-vector storage granule, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation; that is, when the size of the current image block can cover the storage granularity of one motion vector, the temporal candidate motion vector of the current image block is determined according to the ATMVP operation and/or the TMVP operation.
In one example, when the current image block cannot contain one motion-vector storage granule, or the size of the current image block equals the storage granularity of one motion vector, the TMVP operation is set not to be performed. The TMVP technique derives one set of temporal motion information; when the current image block cannot contain one motion-vector storage granule, the derived motion vectors may turn out identical, causing unnecessary division operations.
In one example, the preset condition includes: the number of pixels of the current image block is greater than, or greater than or equal to, a preset value. In one example, when the number of pixels of the current image block is less than, or less than or equal to, the preset value, the TMVP operation and/or the ATMVP operation is not performed. For example, the preset value may be 32 or 64.
In a specific example, when the width or height of the current CU is less than 8, when the width and height of the current CU are both equal to 8, or when the width or height of the current CU is less than 8 or its width and height are both equal to 8, the TMVP operation is set not to be performed; since some redundant operations are skipped, encoding/decoding time can be effectively saved.
In some implementations, the TMVP operation is disabled during the process of constructing the second motion-vector candidate list of the current image block, i.e. the TMVP technique is not used to determine the temporal candidate motion vectors added to the second motion-vector candidate list. Considering the other operations related to temporal motion information that have been added, such as the ATMVP operation and the HMVP operation, the benefit of the TMVP technique in the current construction process is greatly reduced, and it is somewhat redundant with these techniques, i.e. in some cases the same motion information may be derived, making the candidate-list construction process overly redundant and inefficient. In one example, the TMVP process is disabled during the construction of the merge candidate motion-vector list of the current image block. In this case, temporally relevant information is still effectively exploited, and the merge candidate list construction flow is simpler, which can reduce complexity at the encoding/decoding end to a certain extent.
In the motion-compensated prediction stage, previous mainstream video coding standards applied only a translational motion model, whereas in the real world there are many kinds of motion, such as zoom in/out, rotation, perspective motion and other irregular motions. To improve the efficiency of inter prediction, an affine motion-compensation model can be introduced into the codec technology. Affine motion compensation describes the affine motion field of an image block by the MVs of a set of control points. In one example, the affine motion-compensation model is a four-parameter affine model, and the set of control points includes two control points (e.g. the top-left and top-right corners of the image block). In one example, the affine motion-compensation model is a six-parameter affine model, and the set of control points includes three control points (e.g. the top-left, top-right and bottom-left corners of the image block).
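For the four-parameter affine model just mentioned, the per-position MV can be derived from the two control-point MVs. The patent does not give the formula; the sketch below uses the commonly cited four-parameter formulation (an assumption here), with mv0 at the top-left corner, mv1 at the top-right corner and w the block width:

```python
def affine_4param_mv(mv0, mv1, w, x, y):
    # Four-parameter affine model: the MV at position (x, y) inside the
    # block is interpolated from the two control-point MVs. The two
    # parameters a and b encode rotation and scaling.
    a = (mv1[0] - mv0[0]) / w
    b = (mv1[1] - mv0[1]) / w
    mvx = a * x - b * y + mv0[0]
    mvy = b * x + a * y + mv0[1]
    return (mvx, mvy)
```

At (0, 0) this reproduces mv0 and at (w, 0) it reproduces mv1, so the control points pin down the whole motion field.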
In one implementation, when constructing the first motion-vector candidate list, the added candidate may be the MVs of a set of control points, also called a control point motion vector prediction (CPMVP). Optionally, the first motion-vector candidate list may be used in the merge mode; specifically, this may be called the affine merge mode, and correspondingly the first motion-vector candidate list may be called the affine merge candidate list. In the affine merge mode, the prediction in the first motion-vector candidate list is used directly as the control point motion vector (CPMV) of the current image block, i.e. no affine motion-estimation process is needed.
In one implementation, candidates determined according to the ATMVP technique may be added to the first motion-vector candidate list.
In one example, the control-point motion-vector group of the relevant block of the current image block is added as a candidate to the first motion-vector candidate list. When this candidate in the first motion-vector candidate list is used for prediction, the current image block is predicted according to the control-point motion-vector group of the relevant block of the current image block.
In one example, as described above, the representative motion vector of the relevant block of the current image block is added as a candidate to the first motion-vector candidate list. Further, optionally, the candidate is also marked as determined according to the ATMVP technique. When this candidate in the first motion-vector candidate list is used for prediction, the relevant block of the current image block is determined according to the mark and the candidate, and the current image block and the relevant block are divided into multiple sub-image blocks in the same way, with each sub-image block in the current image block corresponding one-to-one with a sub-image block in the relevant block; the motion vector of each sub-image block in the relevant block is then used to predict the motion vector of the corresponding sub-image block in the current image block.
Optionally, when a sub-image block whose motion vector is unavailable appears in the relevant block, the representative motion vector of the relevant block is used in place of the unavailable motion vector to predict the corresponding sub-image block in the current image block. Optionally, when no representative motion vector of the relevant block is available, the candidate determined according to the ATMVP technique is not added to the second motion-vector candidate list. In one example, when a sub-image block of the relevant block is unavailable, or a sub-image block of the relevant block is coded in intra mode, it is determined that a sub-image block with an unavailable motion vector appears in the relevant block.
Optionally, each candidate in the first motion-vector candidate list includes the motion vectors of a set of control points; when the representative motion vector of the relevant block of the current image block is added to the first motion-vector candidate list, to keep the data format consistent, the representative motion vector of the relevant block may be inserted as the motion vector of each control point in the candidate (i.e. the motion vector of each control point in the candidate is assigned the representative motion vector of the relevant block).
Optionally, the representative motion vector of the relevant block of the current image block may be the motion vector at the centre position of the relevant block, or another motion vector representing the relevant block, without limitation here.
As shown in FIG. 1, an embodiment of the present application further provides a video image processing method, which includes the following steps.
S110: Determine a current image block.
S120: When the size of the current image block satisfies a preset condition, determine a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation.
The TMVP operation includes: determining a related block of the current image block in a temporally neighboring image; and determining the temporal candidate motion vector of the current image block according to the motion vector of the related block.
The ATMVP operation includes: determining a related block of the current image block in a temporally neighboring image; dividing the current image block into multiple sub image blocks; determining, in the related block, the sub related block corresponding to each of the multiple sub image blocks; and determining the temporal candidate motion vector of the current image block according to the motion vector of the sub related block corresponding to each sub image block.
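The ATMVP steps above can be sketched as a per-sub-block MV lookup with a fallback: take each sub related block's MV, and substitute the related block's representative MV when a sub related block's MV is unavailable (e.g., it is intra-coded), as described earlier. The grid representation and names below are simplified assumptions for illustration.

```python
# Hedged ATMVP sketch: map each sub image block of the current block to its
# co-located sub related block and take that block's MV, falling back to the
# related block's representative MV when the MV is unavailable (None).
def atmvp_subblock_mvs(related_mvs, representative_mv):
    """related_mvs: 2-D grid of sub related block MVs (None = unavailable)."""
    out = []
    for row in related_mvs:
        out_row = []
        for mv in row:
            # Substitute the representative MV for unavailable sub-blocks.
            out_row.append(mv if mv is not None else representative_mv)
        out.append(out_row)
    return out


grid = [[(1, 0), None],
        [None, (0, 2)]]
print(atmvp_subblock_mvs(grid, (5, 5)))
# [[(1, 0), (5, 5)], [(5, 5), (0, 2)]]
```

Each entry of the resulting grid serves as the temporal candidate motion vector for the corresponding sub image block of the current image block.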
For the video image processing method shown in FIG. 1, reference may be made to the description above, which is not repeated here.
The method embodiment of the present application has been described above with reference to FIG. 1; the apparatus embodiment corresponding to the method embodiment shown in FIG. 1 is described below. It should be understood that the description of the apparatus embodiment corresponds to that of the method embodiment; therefore, for content not described in detail, reference may be made to the preceding method embodiment, which for brevity is not repeated here.
FIG. 2 is a schematic block diagram of a video image processing apparatus 200 provided by an embodiment of the present application. The apparatus 200 is configured to perform the method embodiment shown in FIG. 1. The apparatus 200 includes the following units.
A first determining module 210, configured to determine a current image block.
A second determining module 220, configured to determine, when the size of the current image block satisfies a preset condition, a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation.
The TMVP operation includes: determining a related block of the current image block in a temporally neighboring image; and determining the temporal candidate motion vector of the current image block according to the motion vector of the related block.
The ATMVP operation includes: determining a related block of the current image block in a temporally neighboring image; dividing the current image block into multiple sub image blocks; determining, in the related block, the sub related block corresponding to each of the multiple sub image blocks; and determining the temporal candidate motion vector of the current image block according to the motion vector of the sub related block corresponding to each sub image block.
It should be understood that the first determining module and the second determining module in this embodiment may be implemented by a processor.
An embodiment of the present application further provides a video image processing apparatus. The apparatus may be configured to perform the method embodiments described above. The apparatus includes a processor and a memory, the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory; execution of the instructions stored in the memory causes the processor to perform the method according to the method embodiments above.
Optionally, the apparatus may further include a communication interface for communicating with an external device. For example, the processor is configured to control the communication interface to receive and/or send signals.
The apparatus provided by the present application may be applied to an encoder or to a decoder.
FIG. 3 is a schematic block diagram of one implementation of an encoding device or a decoding device (referred to simply as a coding device 1100) provided by the present application. The coding device 1100 may include a processor 1110, a memory 1130, and a bus system 1150. The processor and the memory are connected through the bus system; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory. The memory of the coding device stores program code, and the processor may invoke the program code stored in the memory to perform the various video encoding or decoding methods described in the present application, in particular the inter prediction methods described herein. To avoid repetition, they are not described in detail here.
In this embodiment of the present application, the processor 1110 may be a CPU, or may be another general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 1130 may include a ROM or a RAM. Any other suitable type of storage device may also be used as the memory 1130. The memory 1130 may include code and data 1131 accessed by the processor 1110 through the bus 1150. The memory 1130 may further include an operating system 1133 and application programs 1135; the application programs 1135 include at least one program that allows the processor 1110 to perform the video encoding or decoding methods described in the present application (in particular the inter prediction methods described herein). For example, the application programs 1135 may include applications 1 to N, which further include a video encoding or decoding application (referred to simply as a video coding application) that performs the video encoding or decoding methods described in the present application.
In addition to a data bus, the bus system 1150 may also include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are labeled as the bus system 1150 in the figure.
Optionally, the coding device 1100 may further include one or more output devices, such as a display 1170. In one example, the display 1170 may be a touch-sensitive display that merges a display with a touch-sensing unit operable to sense touch input. The display 1170 may be connected to the processor 1110 via the bus 1150.
An embodiment of the present application further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer performs the method provided by the method embodiments above.
An embodiment of the present application further provides a computer program product containing instructions; when the instructions are executed by a computer, the computer performs the method provided by the method embodiments above.
In the above embodiments, implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented by software, implementation may be wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Additionally, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit.
The above is only the specific implementation of the present application, but the protection scope of the present application is not limited thereto; any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (48)

  1. A video image processing method, comprising:
    determining a current image block;
    when the size of the current image block satisfies a preset condition, determining a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation; wherein
    the TMVP operation comprises:
    determining a related block of the current image block in a temporally neighboring image;
    determining the temporal candidate motion vector of the current image block according to a motion vector of the related block;
    the ATMVP operation comprises:
    determining a related block of the current image block in a temporally neighboring image;
    dividing the current image block into multiple sub image blocks;
    determining, in the related block, a sub related block corresponding to each of the multiple sub image blocks;
    determining temporal candidate motion vectors of the sub image blocks of the current image block according to the motion vector of the sub related block corresponding to each sub image block.
  2. The video image processing method according to claim 1, wherein, in the TMVP operation and the ATMVP operation, determining the related block of the current image block in the temporally neighboring image comprises:
    determining, as the related block of the current image block, the image block located at the same position in the temporally neighboring image as a specified position in the current image block; or
    determining, as the related block of the current image block, the image block located at the same position in the temporally neighboring image as a specified position spatially adjacent to the current image block.
  3. The video image processing method according to claim 2, wherein the specified position in the current image block comprises one of the following positions:
    the top-left corner point of the current image block;
    the top-right corner point of the current image block;
    the center point of the current image block;
    the bottom-left corner point of the current image block;
    the bottom-right corner point of the current image block.
  4. The video image processing method according to any one of claims 1 to 3, wherein the temporally neighboring image is a co-located frame of the current image block.
  5. The video image processing method according to any one of claims 1 to 4, wherein the related block of the current image block is a co-located block of the current image block.
  6. The video image processing method according to claim 1, wherein the size of the sub image block is set by default to be the same as the size of the motion vector storage granularity.
  7. The video image processing method according to claim 6, wherein the size of the motion vector storage granularity is 8×8, and the size of the sub image block is set by default to 8×8.
  8. The video image processing method according to claim 1, wherein the size of the current image block is x1×y1, and the default size of the sub image block of the current image block is x2×y2, wherein x1, x2, y1, and y2 are all positive integers;
    the preset condition comprises: x1 is not smaller than x2, and/or y1 is not smaller than y2.
  9. The video image processing method according to claim 8, wherein the method further comprises:
    when x1 is smaller than, or smaller than or equal to, x2, and/or y1 is smaller than, or smaller than or equal to, y2, setting the ATMVP operation not to be performed.
  10. The video image processing method according to claim 1, wherein the method further comprises:
    when the size of the current image block is the same as a first default size, performing only one of the TMVP operation and the ATMVP operation.
  11. The video image processing method according to claim 10, wherein the first default size is the same as the size of the motion vector storage granularity.
  12. The video image processing method according to claim 10, wherein, when the size of the current image block is the same as the first default size, only the ATMVP operation of the TMVP operation and the ATMVP operation is performed.
  13. The video image processing method according to claim 1, wherein the size of the current image block is x1×y1, and a preset size is x3×y3, wherein x1, x3, y1, and y3 are all positive integers;
    the preset condition comprises: x1 is not smaller than x3, and/or y1 is not smaller than y3.
  14. The video image processing method according to claim 13, wherein the storage granularity of the motion vector is x3×y3.
  15. The video image processing method according to claim 13, wherein the method further comprises:
    when x1 is smaller than, or smaller than or equal to, x3, and/or y1 is smaller than, or smaller than or equal to, y3, not performing the TMVP operation.
  16. The video image processing method according to claim 1, wherein the preset condition comprises:
    the current image block contains one motion vector storage granule.
  17. The video image processing method according to claim 16, wherein the method further comprises:
    when the current image block does not contain one motion vector storage granule, or the size of the current image block equals one motion vector storage granule, not performing the TMVP operation.
  18. The video image processing method according to claim 1, wherein the preset condition comprises:
    the number of pixels of the current image block is greater than, or greater than or equal to, a preset value.
  19. The video image processing method according to claim 18, wherein the method further comprises:
    when the number of pixels of the current image block is smaller than, or smaller than or equal to, the preset value, not performing the TMVP operation and/or the ATMVP operation.
  20. The video image processing method according to claim 19, wherein the preset value is 32 or 64.
  21. The video image processing method according to claim 1, wherein the method further comprises:
    when the ATMVP operation is used to determine the temporal candidate motion vector of the current image block, and/or an HMVP operation is used to determine a candidate motion vector of the current image block, not performing the TMVP operation.
  22. The video image processing method according to any one of claims 1 to 21, wherein the temporal candidate motion vector of the current image block determined according to the TMVP operation is a candidate motion vector in a Merge candidate list and/or an advanced motion vector prediction (AMVP) candidate list.
  23. The video image processing method according to claim 22, wherein the temporal candidate motion vector of the current image block determined according to the TMVP operation is a candidate motion vector in a normal Merge candidate list and/or an advanced motion vector prediction (AMVP) candidate list.
  24. A video image processing apparatus, comprising: a memory and a processor, wherein the memory is configured to store instructions, the processor is configured to execute the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to:
    determine a current image block;
    when the size of the current image block satisfies a preset condition, determine a temporal candidate motion vector of the current image block according to a temporal motion vector prediction (TMVP) operation and/or an advanced/alternative temporal motion vector prediction (ATMVP) operation; wherein
    the TMVP operation comprises:
    determining a related block of the current image block in a temporally neighboring image;
    determining the temporal candidate motion vector of the current image block according to a motion vector of the related block;
    the ATMVP operation comprises:
    determining a related block of the current image block in a temporally neighboring image;
    dividing the current image block into multiple sub image blocks;
    determining, in the related block, a sub related block corresponding to each of the multiple sub image blocks;
    determining temporal candidate motion vectors of the sub image blocks of the current image block according to the motion vector of the sub related block corresponding to each sub image block.
  25. The video image processing apparatus according to claim 24, wherein, in the TMVP operation and the ATMVP operation, determining the related block of the current image block in the temporally neighboring image comprises:
    determining, as the related block of the current image block, the image block located at the same position in the temporally neighboring image as a specified position in the current image block; or
    determining, as the related block of the current image block, the image block located at the same position in the temporally neighboring image as a specified position spatially adjacent to the current image block.
  26. The video image processing apparatus according to claim 25, wherein the specified position in the current image block comprises one of the following positions:
    the top-left corner point of the current image block;
    the top-right corner point of the current image block;
    the center point of the current image block;
    the bottom-left corner point of the current image block;
    the bottom-right corner point of the current image block.
  27. The video image processing apparatus according to any one of claims 24 to 26, wherein the temporally neighboring image is a co-located frame of the current image block.
  28. The video image processing apparatus according to any one of claims 24 to 27, wherein the related block of the current image block is a co-located block of the current image block.
  29. The video image processing apparatus according to claim 24, wherein the size of the sub image block is set by default to be the same as the size of the motion vector storage granularity.
  30. The video image processing apparatus according to claim 29, wherein the size of the motion vector storage granularity is 8×8, and the size of the sub image block is set by default to 8×8.
  31. The video image processing apparatus according to claim 24, wherein the size of the current image block is x1×y1, and the default size of the sub image block of the current image block is x2×y2, wherein x1, x2, y1, and y2 are all positive integers;
    the preset condition comprises: x1 is not smaller than x2, and/or y1 is not smaller than y2.
  32. The video image processing apparatus according to claim 31, wherein the processor is further configured to:
    when x1 is smaller than, or smaller than or equal to, x2, or y1 is smaller than, or smaller than or equal to, y2, set the ATMVP operation not to be performed.
  33. The video image processing apparatus according to claim 24, wherein the processor is further configured to:
    when the size of the current image block is the same as a first default size, perform only one of the TMVP operation and the ATMVP operation.
  34. The video image processing apparatus according to claim 33, wherein the first default size is the same as the size of the motion vector storage granularity.
  35. The video image processing apparatus according to claim 33, wherein the processor is further configured to:
    when the size of the current image block is the same as the first default size, perform only the ATMVP operation of the TMVP operation and the ATMVP operation.
  36. The video image processing apparatus according to claim 24, wherein the size of the current image block is x1×y1, and a preset size is x3×y3, wherein x1, x3, y1, and y3 are all positive integers;
    the preset condition comprises: x1 is not smaller than x3, and/or y1 is not smaller than y3.
  37. The video image processing apparatus according to claim 36, wherein the storage granularity of the motion vector is x3×y3.
  38. The video image processing apparatus according to claim 36, wherein the processor is further configured to:
    when x1 is smaller than, or smaller than or equal to, x3, and/or y1 is smaller than, or smaller than or equal to, y3, not perform the TMVP operation.
  39. The video image processing apparatus according to claim 24, wherein the preset condition comprises:
    the current image block contains one motion vector storage granule.
  40. The video image processing apparatus according to claim 39, wherein the processor is further configured to:
    when the current image block does not contain one motion vector storage granule, or the size of the current image block equals one motion vector storage granule, not perform the TMVP operation.
  41. The video image processing apparatus according to claim 24, wherein the preset condition comprises:
    the number of pixels of the current image block is greater than, or greater than or equal to, a preset value.
  42. The video image processing apparatus according to claim 41, wherein the processor is further configured to:
    when the number of pixels of the current image block is smaller than, or smaller than or equal to, the preset value, not perform the TMVP operation and/or the ATMVP operation.
  43. The video image processing apparatus according to claim 41, wherein the preset value is 32 or 64.
  44. The video image processing apparatus according to claim 24, wherein the processor is further configured to:
    when the ATMVP operation is used to determine the temporal candidate motion vector of the current image block, and/or an HMVP operation is used to determine a candidate motion vector of the current image block, not perform the TMVP operation.
  45. The video image processing apparatus according to any one of claims 24 to 44, wherein the temporal candidate motion vector of the current image block determined according to the TMVP operation is a candidate motion vector in a Merge candidate list and/or an advanced motion vector prediction (AMVP) candidate list.
  46. The video image processing apparatus according to claim 45, wherein the temporal candidate motion vector of the current image block determined according to the TMVP operation is a candidate motion vector in a normal Merge candidate list and/or an advanced motion vector prediction (AMVP) candidate list.
  47. A non-volatile computer storage medium having a computer program stored thereon, wherein, when the computer program is executed by a computer, the computer performs the method according to any one of claims 1 to 23.
  48. A computer program product containing instructions, wherein, when the instructions are executed by a computer, the computer performs the method according to any one of claims 1 to 23.
PCT/CN2019/077893 2019-01-03 2019-03-12 Video image processing method and apparatus WO2020140331A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
JP2021534799A JP7224005B2 (ja) 2019-01-03 2019-03-12 Video processing method, apparatus, non-volatile storage medium, and computer program
EP19907394.1A EP3908000A4 (en) 2019-01-03 2019-03-12 VIDEO IMAGE PROCESSING METHOD AND DEVICE
CN201980005714.2A CN111357288B (zh) 2019-01-03 2019-03-12 Video image processing method and apparatus
KR1020217024546A KR20210107120A (ko) 2019-01-03 2019-03-12 Video image processing method and apparatus
US17/060,011 US11206422B2 (en) 2019-01-03 2020-09-30 Video image processing method and device
US17/645,143 US11689736B2 (en) 2019-01-03 2021-12-20 Video image processing method and device
JP2023012074A JP7393061B2 (ja) 2019-01-03 2023-01-30 Video processing method and encoder
US18/341,246 US20230345036A1 (en) 2019-01-03 2023-06-26 Video image processing method and device
JP2023195147A JP2024012636A (ja) 2019-01-03 2023-11-16 Video processing method and encoder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2019/070315 2019-01-03
PCT/CN2019/070315 WO2020140243A1 (zh) 2019-01-03 2019-01-03 Video image processing method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/060,011 Continuation US11206422B2 (en) 2019-01-03 2020-09-30 Video image processing method and device

Publications (1)

Publication Number Publication Date
WO2020140331A1 true WO2020140331A1 (zh) 2020-07-09

Family

ID=71196656

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2019/070315 WO2020140243A1 (zh) 2019-01-03 2019-01-03 Video image processing method and apparatus
PCT/CN2019/077893 WO2020140331A1 (zh) 2019-01-03 2019-03-12 Video image processing method and apparatus

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/070315 WO2020140243A1 (zh) 2019-01-03 2019-01-03 Video image processing method and apparatus

Country Status (6)

Country Link
US (5) US11206422B2 (zh)
EP (1) EP3908000A4 (zh)
JP (3) JP7224005B2 (zh)
KR (1) KR20210107120A (zh)
CN (4) CN111357290B (zh)
WO (2) WO2020140243A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020140243A1 (zh) * 2019-01-03 2020-07-09 北京大学 Video image processing method and apparatus
CN110809161B (zh) * 2019-03-11 2020-12-29 杭州海康威视数字技术股份有限公司 Motion information candidate list construction method and apparatus
EP4062635A4 (en) 2019-12-26 2022-12-28 ByteDance Inc. CONSTRAINTS ON SIGNALING VIDEO LAYERS IN ENCODED BITSTREAMS
WO2021134018A1 (en) 2019-12-26 2021-07-01 Bytedance Inc. Signaling of decoded picture buffer parameters in layered video
KR20220113404A (ko) 2019-12-27 2022-08-12 바이트댄스 아이엔씨 Syntax for signaling video subpictures
WO2022146507A1 (en) 2020-12-30 2022-07-07 Arris Enterprises Llc System to dynamically detect and enhance classifiers for low latency traffic
WO2023092256A1 (zh) * 2021-11-23 2023-06-01 华为技术有限公司 Video encoding method and related apparatus
WO2023132679A1 (ko) * 2022-01-06 2023-07-13 엘지전자 주식회사 Inter prediction method and apparatus using a secondary list
CN115633216B (zh) * 2022-09-05 2024-05-28 北京智源人工智能研究院 Training method for a temporal-motion-consistency video generation model and video generation method

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101573985A (zh) * 2006-11-03 2009-11-04 三星电子株式会社 Method and apparatus for video predictive encoding and method and apparatus for video predictive decoding
CN102215395A (zh) * 2010-04-09 2011-10-12 华为技术有限公司 Video encoding and decoding method and apparatus
US20170034512A1 (en) * 2011-07-18 2017-02-02 Zii Labs Inc. Ltd. Systems and Methods with Early Variance Measure Used to Optimize Video Encoding
CN109076236A (zh) * 2016-05-13 2018-12-21 高通股份有限公司 Merge candidates for motion vector prediction for video coding

Family Cites Families (24)

Publication number Priority date Publication date Assignee Title
KR100556450B1 (ko) * 1998-05-28 2006-05-25 엘지전자 주식회사 Error recovery method by motion vector estimation
CN101188772B (zh) * 2006-11-17 2010-09-22 中兴通讯股份有限公司 Temporal error concealment method for video decoding
CN101350928A (zh) 2008-07-29 2009-01-21 北京中星微电子有限公司 Motion estimation method and apparatus
CN101873500B (zh) * 2009-04-24 2012-05-23 华为技术有限公司 Inter prediction encoding method, inter prediction decoding method, and devices
CN101841712A (zh) * 2010-04-28 2010-09-22 广西大学 B-frame extended direct mode for panoramic video coding
CN102148990B (zh) * 2011-04-28 2012-10-10 北京大学 Motion vector prediction apparatus and method
CN104079944B (zh) 2014-06-30 2017-12-01 华为技术有限公司 Motion vector list construction method and system for video coding
WO2016008157A1 (en) 2014-07-18 2016-01-21 Mediatek Singapore Pte. Ltd. Methods for motion compensation using high order motion model
US11477477B2 (en) * 2015-01-26 2022-10-18 Qualcomm Incorporated Sub-prediction unit based advanced temporal motion vector prediction
US11330284B2 (en) * 2015-03-27 2022-05-10 Qualcomm Incorporated Deriving motion information for sub-blocks in video coding
US10271064B2 (en) 2015-06-11 2019-04-23 Qualcomm Incorporated Sub-prediction unit motion vector prediction using spatial and/or temporal motion information
CN104935938B (zh) 2015-07-15 2018-03-30 哈尔滨工业大学 Inter prediction method in a hybrid video coding standard
US10368083B2 (en) * 2016-02-15 2019-07-30 Qualcomm Incorporated Picture order count based motion vector pruning
US10721489B2 (en) 2016-09-06 2020-07-21 Qualcomm Incorporated Geometry-based priority for the construction of candidate lists
US10812791B2 (en) * 2016-09-16 2020-10-20 Qualcomm Incorporated Offset vector identification of temporal motion vector predictor
KR102448635B1 (ko) 2016-09-30 2022-09-27 후아웨이 테크놀러지 컴퍼니 리미티드 Video encoding method, video decoding method, and terminal
US20180199057A1 (en) * 2017-01-12 2018-07-12 Mediatek Inc. Method and Apparatus of Candidate Skipping for Predictor Refinement in Video Coding
JP7382332B2 (ja) 2017-11-01 2023-11-16 ヴィド スケール インコーポレイテッド Sub-block motion derivation and decoder-side motion vector refinement for merge mode
WO2020004990A1 (ko) 2018-06-27 2020-01-02 엘지전자 주식회사 Inter-prediction-mode-based image processing method and apparatus therefor
US10362330B1 (en) * 2018-07-30 2019-07-23 Tencent America LLC Combining history-based motion vector prediction and non-adjacent merge prediction
CN117956139A (zh) * 2018-08-17 2024-04-30 寰发股份有限公司 Inter prediction method and apparatus for video encoding and decoding
US11184635B2 (en) * 2018-08-31 2021-11-23 Tencent America LLC Method and apparatus for video coding with motion vector constraints
US10904549B2 (en) 2018-12-13 2021-01-26 Tencent America LLC Method and apparatus for signaling of multi-hypothesis for skip and merge mode and signaling of distance offset table in merge with motion vector difference
WO2020140243A1 (zh) * 2019-01-03 2020-07-09 北京大学 Video image processing method and apparatus


Also Published As

Publication number Publication date
EP3908000A1 (en) 2021-11-10
JP7224005B2 (ja) 2023-02-17
US20230345036A1 (en) 2023-10-26
US11743482B2 (en) 2023-08-29
CN116996683A (zh) 2023-11-03
CN111357290B (zh) 2023-08-22
US11178420B2 (en) 2021-11-16
US20220078466A1 (en) 2022-03-10
JP7393061B2 (ja) 2023-12-06
US20210021858A1 (en) 2021-01-21
CN113905234A (zh) 2022-01-07
JP2022515995A (ja) 2022-02-24
US11689736B2 (en) 2023-06-27
US20210021857A1 (en) 2021-01-21
JP2023052767A (ja) 2023-04-12
JP2024012636A (ja) 2024-01-30
CN111357290A (zh) 2020-06-30
EP3908000A4 (en) 2022-09-28
KR20210107120A (ko) 2021-08-31
US20220116644A1 (en) 2022-04-14
CN113905235A (zh) 2022-01-07
WO2020140243A1 (zh) 2020-07-09
US11206422B2 (en) 2021-12-21

Similar Documents

Publication Publication Date Title
WO2020140331A1 (zh) Video image processing method and apparatus
US11949912B2 (en) Method and device for video image processing
JP2023053028A (ja) Method and apparatus for processing a video signal based on inter prediction
WO2020140915A1 (zh) Video processing method and apparatus
US20220232208A1 (en) Displacement vector prediction method and apparatus in video encoding and decoding and device
CN114071135A (zh) Method and apparatus for signaling merge mode in video coding
CN111357288B (zh) Video image processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907394

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021534799

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20217024546

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2019907394

Country of ref document: EP

Effective date: 20210803