WO2019194502A1 - Image processing method based on an inter-prediction mode, and apparatus therefor - Google Patents

Image processing method based on an inter-prediction mode, and apparatus therefor

Info

Publication number
WO2019194502A1
Authority
WO
WIPO (PCT)
Prior art keywords
block
motion vector
picture
prediction
current block
Prior art date
Application number
PCT/KR2019/003810
Other languages
English (en)
Korean (ko)
Inventor
장형문 (Hyeongmoon Jang)
Original Assignee
엘지전자 주식회사 (LG Electronics Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 엘지전자 주식회사 (LG Electronics Inc.)
Publication of WO2019194502A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the present invention relates to a still image or moving image processing method, and more particularly, to a method for encoding / decoding a still image or moving image based on an inter prediction mode and an apparatus supporting the same.
  • Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or for storing in a form suitable for a storage medium.
  • Media such as video, images, and audio may be the target of compression encoding. In particular, a technique of performing compression encoding on video is called video image compression.
  • Next generation video content will be characterized by high spatial resolution, high frame rate and high dimensionality of scene representation. Processing such content will result in a tremendous increase in terms of memory storage, memory access rate, and processing power.
  • An object of the present invention is to propose a restriction method at the CTU boundary for deriving a temporal motion vector.
  • In an aspect of the present invention, a method of processing an image based on an inter prediction mode may include: deriving a motion vector of an available spatial neighboring block of the current block; deriving a collocated block of the current block based on the motion vector of the spatial neighboring block in a collocated picture of the current block; deriving a motion vector in units of sub-blocks in the current block based on the motion vector of the collocated block; and generating a prediction block of the current block by using the motion vector derived in units of sub-blocks.
  • Preferably, the deriving of the collocated block may include scaling the motion vector of the spatial neighboring block based on a picture order count (POC).
  • Preferably, the motion vector of the spatial neighboring block may be scaled based on a picture order count (POC) difference between a first reference picture of the spatial neighboring block and a second reference picture of the block specified by the motion vector of the spatial neighboring block, and a POC difference between the current picture and the collocated picture.
  • Preferably, the spatial neighboring block may be selected from neighboring blocks based on a boundary of an upper node block of the current block, where the upper node block represents an ancestor node block of the current block having a partition depth smaller than that of the current block in the block division structure.
  • Preferably, the specific size threshold may be a preset value or a value signaled from an encoder via a sequence parameter set, a picture parameter set, or a tile group header.
  • In another aspect of the present invention, an apparatus for processing an image based on an inter prediction mode may include: a spatial candidate derivation unit for deriving a motion vector of an available spatial neighboring block of the current block; a collocated block derivation unit for deriving a collocated block of the current block based on the motion vector of the spatial neighboring block in a collocated picture of the current block; a sub-block motion vector derivation unit for deriving a motion vector in units of sub-blocks within the current block based on the motion vector of the collocated block; and a prediction block generator for generating a prediction block of the current block using the motion vector derived in units of sub-blocks.
  • Preferably, the collocated block derivation unit may scale the motion vector of the spatial neighboring block based on a picture order count (POC).
  • Preferably, the collocated block derivation unit may scale the motion vector of the spatial neighboring block based on a POC difference between the first reference picture of the spatial neighboring block and the second reference picture of the block specified by the motion vector of the spatial neighboring block, and a POC difference between the current picture and the collocated picture.
  • Preferably, the spatial neighboring block may be selected from neighboring blocks based on a boundary of an upper node block of the current block, where the upper node block represents an ancestor node block of the current block having a partition depth smaller than that of the current block in the block division structure.
  • Preferably, the specific size threshold may be a preset value or a value signaled from an encoder via a sequence parameter set, a picture parameter set, or a tile group header.
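  • As an illustration of the POC-based scaling summarized above, the following is a minimal sketch assuming hypothetical helper names (the text defines no API): the spatial neighbor's motion vector is scaled by a ratio of POC distances and then used to locate the collocated block in the collocated picture.

```python
def scale_mv_by_poc(mv, poc_cur, poc_col, poc_neigh_ref):
    """Scale a spatial neighbor's motion vector so it points into the
    collocated picture. Generic POC-distance scaling in the spirit of the
    description above; real codecs use clipped integer arithmetic."""
    td = poc_cur - poc_neigh_ref  # POC distance covered by the neighbor's MV
    tb = poc_cur - poc_col        # POC distance from current picture to colPic
    if td == 0:
        return mv                 # nothing to scale
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

def locate_collocated_block(cur_x, cur_y, mv_scaled):
    # The scaled MV offsets the current block's position into colPic.
    return (cur_x + mv_scaled[0], cur_y + mv_scaled[1])
```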
  • FIG. 1 is a schematic block diagram of an encoding apparatus in which an encoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.
  • FIG. 2 is a schematic block diagram of a decoding apparatus in which an embodiment of the present invention is applied and decoding of a video / image signal is performed.
  • FIG. 3 is a diagram illustrating an example of a multi-type tree structure as an embodiment to which the present invention may be applied.
  • FIG. 4 is a diagram illustrating a signaling mechanism of partition information of a quadtree with nested multi-type tree structure according to an embodiment to which the present invention may be applied.
  • FIG. 5 is a diagram illustrating a method of dividing a CTU into multiple CUs based on a quadtree and an accompanying multi-type tree structure as an embodiment to which the present invention may be applied.
  • FIG. 6 is a diagram illustrating a method of limiting ternary-tree splitting as an embodiment to which the present invention may be applied.
  • FIG. 7 is a diagram illustrating redundant division patterns that may occur in binary tree division and ternary tree division, as an embodiment to which the present invention may be applied.
  • FIGS. 8 and 9 illustrate an inter prediction based video / image encoding method and an inter prediction unit in an encoding apparatus according to an embodiment of the present invention.
  • FIGS. 10 and 11 illustrate an inter prediction based video / image decoding method and an inter prediction unit in a decoding apparatus according to an embodiment of the present invention.
  • FIG. 12 is a diagram for describing a neighboring block used in a merge mode or a skip mode as an embodiment to which the present invention is applied.
  • FIG. 13 is a flowchart illustrating a merge candidate list construction method according to an embodiment to which the present invention is applied.
  • FIG. 14 is a flowchart illustrating a merge candidate list construction method according to an embodiment to which the present invention is applied.
  • FIGS. 15 and 16 are diagrams for describing a method of deriving an advanced temporal motion vector prediction (ATMVP) candidate as an embodiment to which the present invention is applied.
  • FIG. 17 is a diagram illustrating a method of deriving an Advanced Temporal Motion Vector Prediction (ATMVP) candidate according to an embodiment to which the present invention is applied.
  • FIG. 20 is a diagram for describing a problem occurring in a sub-block unit prediction method using a conventional temporal motion vector as an embodiment to which the present invention can be applied.
  • FIG. 21 is a diagram illustrating a method of deriving a motion vector of a sub-block unit using motion vectors of a spatial candidate and a temporal candidate as an embodiment to which the present invention is applied.
  • FIG. 22 is a diagram illustrating a memory fetch method for using a motion vector of a time candidate according to an embodiment to which the present invention is applied.
  • FIG. 23 is a diagram illustrating a method of deriving a motion vector of a sub-block unit using motion vectors of a spatial candidate and a temporal candidate as an embodiment to which the present invention is applied.
  • FIG. 24 is a diagram illustrating temporal motion vector prediction based on sub-blocks as an embodiment to which the present invention is applied.
  • FIGS. 25 and 26 are diagrams illustrating a pipeline structure for performing motion compensation using a temporal motion vector as an embodiment to which the present invention is applied.
  • FIG. 27 is a diagram illustrating a method of setting a minimum size block for using a motion vector of a spatial neighboring block as an embodiment to which the present invention is applied.
  • FIG. 28 is a diagram for describing a method of configuring a motion information candidate based on a minimum size block for using a motion vector of a spatial neighboring block according to an embodiment to which the present invention is applied.
  • FIGS. 29 and 30 are diagrams illustrating sub-block based TMVP (temporal motion vector prediction) according to an embodiment to which the present invention is applied.
  • FIG. 31 is a diagram illustrating a sub-block based TMVP (temporal motion vector predictor) according to an embodiment to which the present invention is applied.
  • FIG. 32 is a flowchart illustrating a method of generating an inter prediction block according to an embodiment to which the present invention is applied.
  • FIG. 33 is a diagram illustrating an inter prediction apparatus according to an embodiment to which the present invention is applied.
  • FIG. 35 shows a structure diagram of a content streaming system according to an embodiment to which the present invention is applied.
  • Hereinafter, the 'processing unit' may also be referred to as a 'processing block' or a 'block'. In addition, the processing unit may be interpreted to include a unit for the luminance (luma) component and a unit for the color difference (chroma) component.
  • For example, the processing unit may correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), or a Transform Unit (TU).
  • the processing unit may be interpreted as a unit for the luma component or a unit for the chroma component.
  • For example, the processing unit may correspond to a Coding Tree Block (CTB), a Coding Block (CB), a Prediction Block (PB), or a Transform Block (TB) for the luma component, or may correspond to a CTB, CB, PB, or TB for the chroma component.
  • the present invention is not limited thereto, and the processing unit may be interpreted to include a unit for a luma component and a unit for a chroma component.
  • processing unit is not necessarily limited to square blocks, but may also be configured in a polygonal form having three or more vertices.
  • Hereinafter, in this specification, a pixel, a pel, and the like are collectively referred to as a sample. In addition, using a sample may mean using a pixel value or the like.
  • FIG. 1 is a schematic block diagram of an encoding apparatus in which an encoding of a video / image signal is performed, according to an embodiment to which the present invention is applied.
  • the encoding apparatus 100 may include an image splitter 110, a subtractor 115, a transformer 120, a quantizer 130, an inverse quantizer 140, an inverse transformer 150, an adder 155, a filter 160, a memory 170, an inter predictor 180, an intra predictor 185, and an entropy encoder 190.
  • the inter predictor 180 and the intra predictor 185 may be collectively referred to as a predictor. In other words, the predictor may include the inter predictor 180 and the intra predictor 185.
  • the transformer 120, the quantizer 130, the inverse quantizer 140, and the inverse transformer 150 may be included in a residual processing unit, and the residual processing unit may further include the subtractor 115. According to an embodiment, the above-described image splitter 110, subtractor 115, transformer 120, quantizer 130, inverse quantizer 140, inverse transformer 150, adder 155, filter 160, inter predictor 180, intra predictor 185, and entropy encoder 190 may be configured by one hardware component (eg, an encoder or a processor).
  • the memory 170 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium.
  • the image divider 110 may divide an input image (or a picture or a frame) input to the encoding apparatus 100 into one or more processing units.
  • the processing unit may be called a coding unit (CU).
  • the coding unit may be recursively divided according to a quad-tree binary-tree (QTBT) structure from a coding tree unit (CTU) or a largest coding unit (LCU).
  • one coding unit may be divided into a plurality of coding units of deeper depth based on a quad tree structure and / or a binary tree structure.
  • In this case, for example, the quad tree structure may be applied first and the binary tree structure may be applied later.
  • the binary tree structure may be applied first.
  • the coding procedure according to the present invention may be performed based on the final coding unit that is no longer split.
  • For example, the maximum coding unit may be used as the final coding unit based on the coding efficiency according to the image characteristics, or, if necessary, the coding unit may be recursively divided into coding units of deeper depth, so that a coding unit of an optimal size may be used as the final coding unit.
  • the coding procedure may include a procedure of prediction, transform, and reconstruction, which will be described later.
  • the processing unit may further include a Prediction Unit (PU) or a Transform Unit (TU).
  • the prediction unit and the transform unit may each be split or partitioned from the aforementioned final coding unit.
  • the prediction unit may be a unit of sample prediction
  • the transformation unit may be a unit for deriving a transform coefficient and / or a unit for deriving a residual signal from the transform coefficient.
  • an M ⁇ N block may represent a set of samples or transform coefficients composed of M columns and N rows.
  • a sample may generally represent a pixel or a value of a pixel, and may only represent pixel / pixel values of the luma component or only pixel / pixel values of the chroma component.
  • a sample may generally be used as a term corresponding to a pixel or a pel of one picture (or image).
  • the encoding apparatus 100 may subtract the prediction signal (predicted block, prediction sample array) output from the inter prediction unit 180 or the intra prediction unit 185 from the input image signal (original block, original sample array) to generate a residual signal (residual block, residual sample array), and the generated residual signal is transmitted to the transformer 120. In this case, as shown, the unit that subtracts the prediction signal (prediction block, prediction sample array) from the input video signal (original block, original sample array) in the encoder 100 may be referred to as the subtractor 115.
  • the prediction unit may perform prediction on a block to be processed (hereinafter, referred to as a current block) and generate a predicted block including prediction samples for the current block.
  • the prediction unit may determine whether intra prediction or inter prediction is applied on a current block or CU basis.
  • the prediction unit may generate various information about the prediction, such as prediction mode information, and transmit it to the entropy encoding unit 190. The information about the prediction may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.
  • the intra predictor 185 may predict the current block by referring to samples in the current picture.
  • the referenced samples may be located in the neighborhood of the current block or may be located apart according to the prediction mode.
  • prediction modes may include a plurality of non-directional modes and a plurality of directional modes.
  • The non-directional modes may include, for example, a DC mode and a planar mode.
  • the directional mode may include, for example, 33 directional prediction modes or 65 directional prediction modes according to the degree of detail of the prediction direction. However, as an example, more or less directional prediction modes may be used depending on the setting.
  • the intra predictor 185 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.
  • the inter prediction unit 180 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on the reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index.
  • the motion information may further include inter prediction directions (L0 prediction, L1 prediction, Bi prediction, etc.)
  • the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be called a co-located reference block or a co-located CU (colCU), and the reference picture including the temporal neighboring block may be called a collocated picture (colPic). For example, the inter prediction unit 180 may construct a motion information candidate list based on neighboring blocks, and may generate information indicating which candidate is used to derive the motion vector and / or the reference picture index of the current block. Inter prediction may be performed based on various prediction modes.
  • For example, in the case of the skip mode and the merge mode, the inter prediction unit 180 may use the motion information of the neighboring block as the motion information of the current block.
  • In the case of the skip mode, unlike the merge mode, the residual signal may not be transmitted.
  • In the case of the motion vector prediction (MVP) mode, the motion vector of the neighboring block may be used as a motion vector predictor, and the motion vector of the current block may be indicated by signaling a motion vector difference.
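  • To make the merge/skip behavior above concrete, here is a minimal sketch with hypothetical data structures (not the normative process of this document): in merge mode, only an index is signaled and the selected candidate's motion information is reused as-is.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MotionInfo:
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference picture index
    pred_dir: str        # 'L0', 'L1', or 'BI'

def merge_mode(candidates: List[MotionInfo], merge_index: int) -> MotionInfo:
    # The current block reuses the motion information of the candidate
    # selected by the signaled merge index; no motion vector difference
    # is transmitted (and in skip mode, no residual either).
    return candidates[merge_index]
```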
  • the prediction signal generated by the inter predictor 180 or the intra predictor 185 may be used to generate a reconstruction signal or to generate a residual signal.
  • the transform unit 120 may generate transform coefficients by applying a transformation technique to the residual signal.
  • the transformation technique may include at least one of Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loeve Transform (KLT), Graph-Based Transform (GBT), or Conditionally Non-linear Transform (CNT).
  • GBT means a transform obtained from a graph when the relationship information between pixels is represented by the graph.
  • CNT means a transform obtained based on a prediction signal generated using all previously reconstructed pixels.
  • In addition, the transform process may be applied to square pixel blocks of the same size, or may be applied to blocks of variable size rather than square.
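  • As a concrete illustration of one of the transforms listed above, the sketch below applies a separable 2D DCT-II to a square residual block. This is an orthonormal floating-point version for illustration only; actual codecs use scaled integer approximations.

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis matrix (rows are basis vectors).
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def forward_transform(block: np.ndarray) -> np.ndarray:
    # Separable 2D transform on a square block: C @ B @ C^T.
    c = dct2_matrix(block.shape[0])
    return c @ block @ c.T
```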
  • the quantization unit 130 quantizes the transform coefficients and transmits the quantized transform coefficients to the entropy encoding unit 190.
  • the entropy encoding unit 190 encodes the quantized signal (information about the quantized transform coefficients) and outputs it as a bitstream.
  • the information about the quantized transform coefficients may be referred to as residual information.
  • the quantization unit 130 may rearrange the block-form quantized transform coefficients into a one-dimensional vector form based on a coefficient scan order, and may generate information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form.
  • the entropy encoding unit 190 may perform various encoding methods such as exponential Golomb, context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), and the like.
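  • For illustration, a minimal sketch of order-0 exponential Golomb coding, the first of the methods named above (CAVLC and CABAC are considerably more involved):

```python
def exp_golomb_encode(v: int) -> str:
    """Unsigned order-0 exp-Golomb code: M zero bits, then the
    (M+1)-bit binary representation of v + 1."""
    code = bin(v + 1)[2:]
    return '0' * (len(code) - 1) + code

def exp_golomb_decode(bits: str) -> int:
    m = bits.index('1')               # number of leading zeros
    return int(bits[m:2 * m + 1], 2) - 1

# exp_golomb_encode(0) -> '1'; exp_golomb_encode(3) -> '00100'
```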
  • the entropy encoding unit 190 may encode information necessary for video / image reconstruction other than quantized transform coefficients (eg, values of syntax elements, etc.) together or separately.
  • the encoded information (eg, encoded video / picture information) may be transmitted or stored in units of NALs (network abstraction layer) in the form of a bitstream.
  • the bitstream may be transmitted over a network or may be stored in a digital storage medium.
  • the network may include a broadcasting network and / or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like.
  • the signal output from the entropy encoding unit 190 may be transmitted by a transmitting unit (not shown) and / or stored by a storing unit (not shown) configured as internal / external elements of the encoding apparatus 100, or the transmitting unit may be a component of the entropy encoding unit 190.
  • the quantized transform coefficients output from the quantization unit 130 may be used to generate a prediction signal. For example, the residual signal may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients through the inverse quantization unit 140 and the inverse transform unit 150 in the loop.
  • the adder 155 adds the reconstructed residual signal to the prediction signal output from the inter predictor 180 or the intra predictor 185, so that a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) can be generated. When there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block.
  • the adder 155 may be called a restoration unit or a restoration block generation unit.
  • the generated reconstruction signal may be used for intra prediction of a next processing target block in a current picture, and may be used for inter prediction of a next picture through filtering as described below.
  • the filtering unit 160 may improve subjective / objective image quality by applying filtering to the reconstruction signal.
  • the filtering unit 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and may store the modified reconstructed picture in the memory 170, specifically, in the DPB of the memory 170.
  • the various filtering methods may include, for example, deblocking filtering, a sample adaptive offset, an adaptive loop filter, a bilateral filter, and the like.
  • the filtering unit 160 may generate various information about the filtering and transmit the information to the entropy encoding unit 190.
  • the filtering information may be encoded in the entropy encoding unit 190 and output in the form of a bitstream.
  • the modified reconstructed picture transmitted to the memory 170 may be used as the reference picture in the inter predictor 180.
  • Through this, when inter prediction is applied, a prediction mismatch between the encoding apparatus 100 and the decoding apparatus can be avoided, and the encoding efficiency can also be improved.
  • the DPB of the memory 170 may store the modified reconstructed picture for use as a reference picture in the inter predictor 180.
  • the memory 170 may store the motion information of the block from which the motion information in the current picture is derived (or encoded) and / or the motion information of the blocks in the picture that have already been reconstructed.
  • the stored motion information may be transmitted to the inter predictor 180 in order to use the motion information of the spatial neighboring block or the motion information of the temporal neighboring block.
  • the memory 170 may store reconstructed samples of reconstructed blocks in the current picture, and may transfer them to the intra prediction unit 185.
  • FIG. 2 is a schematic block diagram of a decoding apparatus in which decoding of a video / image signal is performed according to an embodiment to which the present invention is applied.
  • the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an adder 235, a filtering unit 240, a memory 250, an inter prediction unit 260, and an intra prediction unit 265.
  • the inter predictor 260 and the intra predictor 265 may be collectively referred to as a predictor.
  • the inverse quantization unit 220 and the inverse transform unit 230 may be collectively referred to as a residual processing unit. That is, the residual processing unit may include the inverse quantization unit 220 and the inverse transform unit 230. According to an embodiment, the entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the adder 235, the filtering unit 240, the inter prediction unit 260, and the intra prediction unit 265 may be configured by one hardware component (for example, a decoder or a processor).
  • the memory 250 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium.
  • the decoding apparatus 200 may perform decoding using a processing unit applied in the encoding apparatus. Accordingly, the processing unit of decoding may be, for example, a coding unit, and the coding unit may be divided from a coding tree unit or a largest coding unit according to a quad tree structure and / or a binary tree structure.
  • the reconstructed video signal decoded and output through the decoding apparatus 200 may be reproduced through the reproducing apparatus.
  • the decoding apparatus 200 may receive a signal output from the encoding apparatus of FIG. 1 in the form of a bitstream, and the received signal may be decoded through the entropy decoding unit 210.
  • the entropy decoding unit 210 may parse the bitstream to derive information (eg, video / image information) necessary for image reconstruction (or picture reconstruction). For example, the entropy decoding unit 210 may decode the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output values of syntax elements required for image reconstruction and quantized values of transform coefficients for the residuals.
  • In the entropy decoding method, after the context model for a symbol / bin is determined and the symbol / bin is decoded, the context model may be updated with the information of the decoded symbol / bin and used to determine the context model of the next symbol / bin.
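  • The following is a deliberately simplified sketch of the adaptation principle described above. The actual CABAC engine uses table-driven finite-state probability estimation, so treat this only as an illustration of the idea, not the codec's update rule.

```python
class ContextModel:
    def __init__(self, p_one: float = 0.5, rate: float = 1 / 16):
        self.p_one = p_one  # estimated probability that the next bin is 1
        self.rate = rate    # adaptation rate

    def update(self, decoded_bin: int) -> None:
        # Move the estimate toward the bin just decoded, so the context
        # model used for the next symbol/bin reflects past decisions.
        self.p_one += self.rate * (decoded_bin - self.p_one)
```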
  • the information related to prediction among the information decoded by the entropy decoding unit 210 is provided to the prediction unit (the inter prediction unit 260 and the intra prediction unit 265), and the residual values on which entropy decoding has been performed by the entropy decoding unit 210, that is, the quantized transform coefficients and related parameter information, may be input to the inverse quantization unit 220.
  • a receiver (not shown) for receiving a signal output from the encoding apparatus may be further configured as an internal / external element of the decoding apparatus 200, or the receiver may be a component of the entropy decoding unit 210.
  • the inverse quantization unit 220 may dequantize the quantized transform coefficients and output the transform coefficients.
  • the inverse quantization unit 220 may rearrange the quantized transform coefficients in the form of a two-dimensional block. In this case, the reordering may be performed based on the coefficient scan order performed by the encoding apparatus.
  • the inverse quantization unit 220 may perform inverse quantization on the quantized transform coefficients using a quantization parameter (for example, quantization step size information), and may obtain transform coefficients.
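  • A minimal sketch of the inverse quantization step described above; real codecs derive the scale from the quantization parameter via integer tables, so the plain multiplication here is only illustrative.

```python
def dequantize(levels, q_step):
    # Reconstruct transform coefficients from quantized levels using the
    # quantization step size (simplified, floating-point illustration).
    return [level * q_step for level in levels]
```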
  • the inverse transformer 230 inversely transforms the transform coefficients to obtain a residual signal (a residual block, a residual sample array).
  • the prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block.
  • the prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on the information about the prediction output from the entropy decoding unit 210, and may determine a specific intra / inter prediction mode.
  • the intra predictor 265 may predict the current block by referring to the samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart according to the prediction mode.
  • prediction modes may include a plurality of non-directional modes and a plurality of directional modes.
  • the intra predictor 265 may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring block.
  • the inter prediction unit 260 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on the reference picture.
  • In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index.
  • the motion information may further include inter prediction directions (L0 prediction, L1 prediction, Bi prediction, etc.)
  • the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. For example, the inter prediction unit 260 may construct a motion information candidate list based on neighboring blocks, and may derive a motion vector and / or a reference picture index of the current block based on received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information about the prediction may include information indicating a mode of inter prediction for the current block.
  • the adder 235 adds the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the inter prediction unit 260 or the intra prediction unit 265 to generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array). When there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block.
  • the adder 235 may be called a restoration unit or a restoration block generation unit.
  • the generated reconstruction signal may be used for intra prediction of a next processing target block in a current picture, and may be used for inter prediction of a next picture through filtering as described below.
  • the filtering unit 240 may apply filtering to the reconstructed signal to improve subjective / objective picture quality.
  • the filtering unit 240 may apply various filtering methods to the reconstructed picture to generate a modified reconstructed picture.
  • the modified reconstruction picture may be transmitted to a memory 250, specifically, a DPB of the memory 250.
  • the various filtering methods may include, for example, deblocking filtering, a sample adaptive offset, an adaptive loop filter, a bilateral filter, and the like.
  • the (modified) reconstructed picture stored in the DPB of the memory 250 may be used as the reference picture in the inter predictor 260.
  • the memory 250 may store the motion information of the block from which the motion information in the current picture is derived (or decoded) and / or the motion information of the blocks in the picture that have already been reconstructed.
  • the stored motion information may be transmitted to the inter prediction unit 260 so as to be used as the motion information of a spatial neighboring block or the motion information of a temporal neighboring block.
  • the memory 250 may store reconstructed samples of the blocks reconstructed in the current picture, and may transfer them to the intra prediction unit 265.
  • the embodiments described for the filtering unit 160, the inter prediction unit 180, and the intra prediction unit 185 of the encoding apparatus 100 may be applied in the same or corresponding manner to the filtering unit 240, the inter prediction unit 260, and the intra prediction unit 265 of the decoding apparatus 200, respectively.
  • the video / image coding method according to this document may be performed based on various detailed techniques, and each detailed technique is schematically described as follows. It will be apparent to those skilled in the art that the techniques described below may be involved in related procedures such as prediction, residual processing ((inverse) transform, (inverse) quantization, etc.), syntax element coding, filtering, and partitioning / division in the video / image encoding / decoding procedures described above and / or below.
  • the block partitioning procedure according to this document may be performed by the image partition unit 110 of the encoding apparatus described above, and the partitioning related information may be processed (encoded) by the entropy encoding unit 190 and transmitted to the decoding apparatus in the form of a bitstream. The entropy decoding unit 210 of the decoding apparatus may derive the block partitioning structure of the current picture based on the partitioning related information obtained from the bitstream, and based on this, may perform a series of procedures for image decoding (eg, prediction, residual processing, block reconstruction, in-loop filtering, etc.).
  • Pictures may be divided into a sequence of coding tree units (CTUs).
  • the CTU may correspond to a coding tree block (CTB).
  • the CTU may include a coding tree block of luma samples and two coding tree blocks of corresponding chroma samples.
  • the CTU may include an NxN block of luma samples and two corresponding blocks of chroma samples.
  • the maximum allowable size of the CTU for coding and prediction may be different from the maximum allowable size of the CTU for transform.
  • the maximum allowable size of the luma block in the CTU may be 128x128.
  • the CTU may be divided into CUs based on a quad-tree (QT) structure.
  • the quadtree structure may be called a quaternary tree structure. This is to reflect various local characteristics.
  • the CTU may be divided based on a multitype tree structure partition including a binary tree (BT) and a ternary tree (TT) as well as a quad tree.
  • the QTBT structure may include a quadtree and binary tree based partition structure
  • the QTBTTT may include a quadtree, binary tree, and ternary tree based partition structure.
  • hereinafter, the QTBT structure may also be used to include the quadtree, binary tree, and ternary tree based partition structure.
  • the CTU may have a square or rectangular shape.
  • the CTU may first be divided into quadtree structures.
  • the leaf nodes of the quadtree structure may then be further divided by the multitype tree structure.
  • FIG. 3 is a diagram illustrating an example of a multi-type tree structure as an embodiment to which the present invention can be applied.
  • the multitype tree structure may include four split types, as shown in FIG. 3.
  • the four split types may include vertical binary splitting (SPLIT_BT_VER), horizontal binary splitting (SPLIT_BT_HOR), vertical ternary splitting (SPLIT_TT_VER), and horizontal ternary splitting (SPLIT_TT_HOR).
  • Leaf nodes of the multitype tree structure may be called CUs. These CUs may be used for the prediction and transform procedures.
  • In this document, CU, PU, and TU may have the same block size. However, when the maximum supported transform length is smaller than the width or height of the color component of the CU, the CU and the TU may have different block sizes.
  • FIG. 4 is a diagram illustrating a signaling mechanism of partition information of a quadtree with nested multi-type tree structure according to an embodiment to which the present invention may be applied.
  • the CTU is treated as the root of the quadtree and is first partitioned into a quadtree structure.
  • Each quadtree leaf node may then be further partitioned into a multitype tree structure.
  • A first flag (ex. mtt_split_cu_flag) is signaled to indicate whether the corresponding node is additionally partitioned. If the node is additionally partitioned, a second flag (ex. mtt_split_cu_vertical_flag) may be signaled to indicate the splitting direction. Then, a third flag (ex. mtt_split_cu_binary_flag) may be signaled to indicate whether the split type is binary split or ternary split.
  • Based on the mtt_split_cu_vertical_flag and the mtt_split_cu_binary_flag, the multi-type tree splitting mode (MttSplitMode) of a CU may be derived as shown in Table 1 below.
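  • Table 1 itself is not reproduced in this text. The sketch below shows the derivation it describes, using the flag-to-mode mapping of the corresponding VVC draft table, which is assumed here rather than quoted from this document.

```python
def mtt_split_mode(mtt_split_cu_vertical_flag: int,
                   mtt_split_cu_binary_flag: int) -> str:
    # (vertical flag, binary flag) -> multi-type tree split mode
    table = {
        (0, 0): 'SPLIT_TT_HOR',
        (0, 1): 'SPLIT_BT_HOR',
        (1, 0): 'SPLIT_TT_VER',
        (1, 1): 'SPLIT_BT_VER',
    }
    return table[(mtt_split_cu_vertical_flag, mtt_split_cu_binary_flag)]
```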
  • FIG. 5 is a diagram illustrating a method for dividing a CTU into multiple CUs based on a quadtree and an accompanying multi-type tree structure according to an embodiment to which the present invention may be applied.
  • Quadtree partitions involving a multitype tree can provide a content-adapted coding tree structure.
  • the CU may correspond to a coding block (CB). The CU may include a coding block of luma samples and two coding blocks of corresponding chroma samples. The size of the CU may be as large as the CTU, or may be as small as 4x4 in luma sample units.
  • the maximum chroma CB size may be 64x64 and the minimum chroma CB size may be 2x2.
  • the maximum allowable luma TB size may be 64x64 and the maximum allowable chroma TB size may be 32x32. If the width or height of a CB split according to the tree structure is larger than the maximum transform width or height, the CB may be automatically (or implicitly) split until the TB size limits in the horizontal and vertical directions are satisfied.
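  • A minimal sketch of the implicit TB split just described: the coding block is halved along an oversized dimension until every resulting transform block satisfies the maximum TB size in both directions.

```python
def implicit_tb_split(cb_w: int, cb_h: int, max_tb: int = 64):
    """Recursively halve a coding block until each resulting
    transform block is at most max_tb in width and height."""
    if cb_w <= max_tb and cb_h <= max_tb:
        return [(cb_w, cb_h)]
    if cb_w > max_tb:   # too wide: split vertically
        return 2 * implicit_tb_split(cb_w // 2, cb_h, max_tb)
    return 2 * implicit_tb_split(cb_w, cb_h // 2, max_tb)  # too tall

# implicit_tb_split(128, 64) -> [(64, 64), (64, 64)]
```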
  • In addition, the following parameters may be defined and identified as SPS syntax elements.
  • CTU size: the root node size of a quaternary tree
  • MinQTSize: the minimum allowed quaternary tree leaf node size
  • MinBtSize: the minimum allowed binary tree leaf node size
  • As one example, the CTU size may be set to 128x128 luma samples and two corresponding 64x64 blocks of chroma samples (4:2:0 chroma format).
  • MinQTSize can be set to 16x16, MaxBtSize to 128x128, MaxTtSize to 64x64, MinBtSize and MinTtSize (for both width and height) to 4x4, and MaxMttDepth to 4.
  • Quadtree partitioning may be applied to the CTU to create quadtree leaf nodes.
  • the quadtree leaf node may be called a leaf QT node.
  • The quadtree leaf nodes may have a size from 16x16 (i.e., the MinQTSize) to 128x128 (i.e., the CTU size). If the leaf QT node is 128x128, it may not be further split into the binary tree / ternary tree. This is because, even if it were split, it would exceed MaxBtSize and MaxTtSize (i.e., 64x64). Otherwise, the leaf QT node may be further split into the multi-type tree. Therefore, the leaf QT node is the root node for the multi-type tree, and the leaf QT node may have a multi-type tree depth (mttDepth) value of 0. If the multi-type tree depth reaches MaxMttDepth (ex. 4), no further splitting may be considered.
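  • Under the example parameter values above, a simple check of whether a node may still be split in the multi-type tree could look as follows; this is a sketch of the constraints as summarized here, not a normative availability test.

```python
# Illustrative values taken from the example above.
MinQTSize, MaxBtSize, MaxTtSize = 16, 128, 64
MinBtSize = MinTtSize = 4
MaxMttDepth = 4

def can_mtt_split(width: int, height: int, mtt_depth: int) -> bool:
    """Simplified check of whether a node may be split further
    in the multi-type tree."""
    if mtt_depth >= MaxMttDepth:
        return False  # multi-type tree depth limit reached
    bt_ok = MinBtSize < max(width, height) <= MaxBtSize
    tt_ok = 2 * MinTtSize < max(width, height) <= MaxTtSize
    return bt_ok or tt_ok

# can_mtt_split(64, 64, 0) -> True; can_mtt_split(4, 4, 0) -> False
```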
  • FIG. 6 is a diagram illustrating a method of limiting ternary-tree splitting as an embodiment to which the present invention can be applied.
  • TT partitioning may be limited in certain cases. For example, when the width or height of the luma coding block is greater than a predetermined specific value (eg, 32, 64), TT partitioning may be limited as shown in FIG. 6.
  • the coding tree scheme may support that the luma and chroma blocks have separate block tree structures.
  • For P and B slices, the luma and chroma CTBs in one CTU may be limited to have the same coding tree structure.
  • However, for I slices, the luma and chroma blocks may have separate block tree structures from each other. If the individual block tree mode is applied, the luma CTB may be split into CUs based on a particular coding tree structure, and the chroma CTB may be split into chroma CUs based on another coding tree structure.
  • This means that a CU in an I slice may consist of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice may consist of blocks of three color components.
  • a quadtree coding tree structure involving a multitype tree has been described, but a structure in which a CU is divided is not limited thereto.
  • For example, the BT structure and the TT structure may be interpreted as concepts included in a multiple partitioning tree (MPT) structure, and the CU may be interpreted as being divided through the QT structure and the MPT structure.
  • When the leaf node of the QT structure is split into the MPT structure, a syntax element (for example, MPT_split_type) containing information about how many blocks the leaf node is divided into and a syntax element (for example, MPT_split_mode) containing information about the direction (vertical or horizontal) in which the leaf node of the QT structure is divided may be signaled, and thereby the partition structure may be determined.
  • In some embodiments, the CU may be partitioned in a different way than the QT structure, BT structure, or TT structure. That is, unlike the QT structure in which the CU of the lower depth is divided into 1/4 the size of the CU of the upper depth, the BT structure in which the CU of the lower depth is divided into 1/2 the size of the CU of the upper depth, or the TT structure in which the CU of the lower depth is divided into 1/4 or 1/2 the size of the CU of the upper depth, the CU of the lower depth may in some cases be divided into 1/5, 1/3, 3/8, 3/5, 2/3, or 5/8 the size of the CU of the upper depth; the way in which the CU is divided is not limited to this.
  • If a portion of the tree node block exceeds the bottom or right picture boundary, the tree node block may be restricted so that all samples of all coded CUs are located within the picture boundaries. In this case, for example, the following split rules may apply:
  • If a portion of the tree node block exceeds both the bottom and the right picture boundaries: if the block is a QT node and the size of the block is larger than the minimum QT size, the block is forced to be split with QT split mode; otherwise, the block is forced to be split with SPLIT_BT_HOR mode.
  • Otherwise, if a portion of the tree node block exceeds the bottom picture boundary: if the block is a QT node, and the size of the block is larger than the minimum QT size, and the size of the block is larger than the maximum BT size, the block is forced to be split with QT split mode; otherwise, if the block is a QT node, and the size of the block is larger than the minimum QT size and the size of the block is smaller than or equal to the maximum BT size, the block is forced to be split with QT split mode or SPLIT_BT_HOR mode; otherwise, the block is forced to be split with SPLIT_BT_HOR mode.
  • Otherwise, if a portion of the tree node block exceeds the right picture boundary: if the block is a QT node, and the size of the block is larger than the minimum QT size, and the size of the block is larger than the maximum BT size, the block is forced to be split with QT split mode; otherwise, if the block is a QT node, and the size of the block is larger than the minimum QT size and the size of the block is smaller than or equal to the maximum BT size, the block is forced to be split with QT split mode or SPLIT_BT_VER mode; otherwise, the block is forced to be split with SPLIT_BT_VER mode.
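  • The boundary rules above can be condensed into code. The sketch below assumes the conditions as reconstructed in the list and is only illustrative.

```python
def forced_boundary_split(x, y, w, h, pic_w, pic_h,
                          is_qt_node, min_qt, max_bt):
    """Choose the forced split for a tree node block that crosses
    picture boundaries, following the rules listed above."""
    beyond_bottom = y + h > pic_h
    beyond_right = x + w > pic_w
    size = max(w, h)
    if beyond_bottom and beyond_right:
        return 'QT' if is_qt_node and size > min_qt else 'SPLIT_BT_HOR'
    if beyond_bottom:
        if is_qt_node and size > min_qt and size > max_bt:
            return 'QT'
        if is_qt_node and size > min_qt:
            return 'QT or SPLIT_BT_HOR'  # either mode may be chosen
        return 'SPLIT_BT_HOR'
    if beyond_right:
        if is_qt_node and size > min_qt and size > max_bt:
            return 'QT'
        if is_qt_node and size > min_qt:
            return 'QT or SPLIT_BT_VER'
        return 'SPLIT_BT_VER'
    return 'NO_FORCED_SPLIT'
```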
  • The quadtree coded block structure with the multi-type tree described above can provide a very flexible block partitioning structure. Because of the split types supported in a multi-type tree, different split patterns can potentially result in the same coding block structure. By limiting the occurrence of such redundant split patterns, the data amount of partitioning information can be reduced. This is described with reference to the following drawings.
  • FIG. 7 is a diagram illustrating redundant division patterns that may occur in binary tree division and ternary tree division as an embodiment to which the present invention may be applied.
  • As shown in FIG. 7, two levels of consecutive binary splits in one direction have the same coding block structure as a binary split for the center partition after a ternary split. In this case, the binary tree split (in the given direction) for the center partition of the ternary tree split may be prevented. This restriction can be applied to CUs of all pictures. When this particular split is restricted, the signaling of the corresponding syntax elements may be modified to reflect such a restricted case, thereby reducing the number of bits signaled for partitioning.
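  • A minimal sketch of such a redundancy restriction, with hypothetical flag names: the binary split that would recreate the two-level binary pattern is disallowed for the center partition of a ternary split.

```python
def bt_split_allowed(direction: str, parent_split: str,
                     is_center_partition: bool) -> bool:
    # Disallow a binary split in the same direction as the ternary split
    # that produced this block as its center partition, since the result
    # would duplicate two consecutive binary splits in that direction.
    if is_center_partition:
        if parent_split == 'SPLIT_TT_VER' and direction == 'vertical':
            return False
        if parent_split == 'SPLIT_TT_HOR' and direction == 'horizontal':
            return False
    return True
```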
  • The decoded parts of the current picture or of other pictures containing the current processing unit may be used to reconstruct the current processing unit on which decoding is performed.
  • A picture (slice) that uses only the current picture for reconstruction, i.e., performs only intra-picture prediction, may be referred to as an intra picture or an I picture (slice); a picture (slice) that uses at most one motion vector and one reference index to predict each unit may be referred to as a predictive picture or a P picture (slice); and a picture (slice) that uses up to two motion vectors and two reference indices may be referred to as a bi-predictive picture or a B picture (slice).
  • Intra prediction means a prediction method that derives the current processing block from data elements (eg, sample values, etc.) of the same decoded picture (or slice). That is, it means a method of predicting pixel values of the current processing block by referring to reconstructed regions in the current picture.
  • Inter prediction means a prediction method of deriving the current processing block based on data elements (eg, sample values or motion vectors, etc.) of pictures other than the current picture. That is, it means a method of predicting pixel values of the current processing block by referring to reconstructed regions in reconstructed pictures other than the current picture.
  • Inter prediction (or inter picture prediction) is a technique for removing redundancy existing between pictures, and is mostly performed through motion estimation and motion compensation.
  • Hereinafter, the present invention describes in detail the inter prediction method described above with reference to FIGS. 1 and 2. In the case of the decoder, it may be represented by the inter prediction-based video / image decoding method of FIG. 10 described later and the inter prediction unit in the decoding apparatus of FIG. 11.
  • the encoder may be represented by the inter prediction based video / video encoding method of FIG. 8 and the inter prediction unit in the encoding apparatus of FIG. 9.
  • the data encoded by FIGS. 8 and 9 may be stored in the form of a bitstream.
  • the prediction unit of the encoding apparatus / decoding apparatus may derive the prediction sample by performing inter prediction on a block basis.
  • Inter prediction may represent a prediction derived in a manner dependent on data elements (eg, sample values, motion information, etc.) of picture(s) other than the current picture. When inter prediction is applied to the current block, a predicted block (prediction sample array) for the current block may be derived based on a reference block (reference sample array) specified by a motion vector on the reference picture indicated by the reference picture index.
  • the motion information of the current block may be predicted in units of blocks, subblocks, or samples.
  • the motion information may include a motion vector and a reference picture index.
  • the motion information may further include inter prediction type (L0 prediction, L1 prediction, Bi prediction, etc.) information.
  • the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block present in the reference picture.
  • the reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different.
  • the temporal neighboring block may be called a co-located reference block, a co-located CU (colCU), or the like, and a reference picture including the temporal neighboring block may be called a collocated picture (colPic).
  • For example, a motion information candidate list may be constructed based on neighboring blocks of the current block, and flag or index information indicating which candidate is selected (used) to derive the motion vector and / or reference picture index of the current block may be signaled.
  • Inter prediction may be performed based on various prediction modes.
  • For example, in the case of the skip mode and the merge mode, the motion information of the current block may be the same as the motion information of the selected neighboring block.
  • In the case of the skip mode, unlike the merge mode, the residual signal may not be transmitted.
  • In the case of the motion vector prediction (MVP) mode, the motion vector of the selected neighboring block is used as a motion vector predictor, and a motion vector difference may be signaled. In this case, the motion vector of the current block may be derived using the sum of the motion vector predictor and the motion vector difference.
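  • In code form, the decoder-side MVP derivation just described reduces to an index lookup plus an addition (a sketch with hypothetical names):

```python
def derive_mv_mvp_mode(mvp_candidates, mvp_index, mvd):
    # The signaled index selects the motion vector predictor; the
    # signaled motion vector difference is added back to it.
    mvp = mvp_candidates[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

# Example: candidates [(4, -2), (0, 0)], index 0, MVD (1, 1) -> (5, -1)
```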
  • FIGS. 8 and 9 illustrate an inter prediction based video / image encoding method and an inter prediction unit in an encoding apparatus according to an embodiment of the present invention.
  • S801 may be performed by the inter prediction unit 180 of the encoding apparatus, and S802 may be performed by the residual processing unit of the encoding apparatus.
  • S802 may be performed by the subtracting unit 115 of the encoding apparatus.
  • prediction information may be derived by the inter prediction unit 180 and encoded by the entropy encoding unit 190.
  • the residual information may be derived by the residual processing unit and encoded by the entropy encoding unit 190.
  • the residual information is information about the residual samples.
  • the residual information may include information about quantized transform coefficients for the residual samples.
  • the residual samples may be derived as transform coefficients through the transform unit 120 of the encoding apparatus, and the transform coefficients may be derived as quantized transform coefficients through the quantization unit 130. Information about the quantized transform coefficients may be encoded in the entropy encoding unit 190 through a residual coding procedure.
  • the encoding apparatus performs inter prediction on the current block (S801).
  • the encoding apparatus may derive the inter prediction mode and the motion information of the current block, and may generate prediction samples of the current block.
  • the inter prediction mode determination, the motion information derivation, and the prediction samples generation procedure may be performed simultaneously, or one procedure may be performed before the other.
  • the inter prediction unit 180 of the encoding apparatus may include a prediction mode determination unit 181, a motion information derivation unit 182, and a prediction sample derivation unit 183.
  • the prediction mode determination unit 181 may determine the prediction mode for the current block
  • the motion information derivation unit 182 may derive the motion information of the current block
  • the prediction sample derivation unit 183 may derive the prediction samples of the current block.
  • the inter prediction unit 180 of the encoding apparatus searches for a block similar to the current block in a predetermined area (search area) of reference pictures through motion estimation, and may derive a reference block whose difference from the current block is minimum or below a certain criterion.
  • a reference picture index indicating a reference picture in which the reference block is located may be derived, and a motion vector may be derived based on a position difference between the reference block and the current block.
  • the encoding apparatus may determine a mode applied to the current block among various prediction modes.
  • the encoding apparatus may compare RD costs for the various prediction modes and determine an optimal prediction mode for the current block.
  • the encoding apparatus constructs a merge candidate list to be described later, and may derive, among the reference blocks indicated by the merge candidates included in the merge candidate list, a reference block whose difference from the current block is minimum or below a predetermined criterion.
  • in this case, a merge candidate associated with the derived reference block is selected, and merge index information indicating the selected merge candidate may be generated and signaled to the decoding apparatus.
  • the motion information of the current block may be derived using the motion information of the selected merge candidate.
  • the encoding apparatus constructs an (A)MVP candidate list to be described later, and may use the motion vector of an mvp (motion vector predictor) candidate selected from among the mvp candidates included in the (A)MVP candidate list as the mvp of the current block.
  • in this case, a motion vector indicating the reference block derived by the above-described motion estimation may be used as the motion vector of the current block, and the mvp candidate having the motion vector with the smallest difference from the motion vector of the current block may be the selected mvp candidate.
  • a motion vector difference (MVD), which is the difference obtained by subtracting the mvp from the motion vector of the current block, may be derived.
  • the information about the MVD may be signaled to the decoding apparatus.
  • the value of the reference picture index may be configured with reference picture index information and separately signaled to the decoding apparatus.
  • the encoding apparatus may derive residual samples based on the above-described prediction samples (S802).
  • the encoding apparatus may derive the residual samples by comparing the original samples of the current block with the prediction samples.
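  • A minimal sketch of this comparison, using plain nested lists in place of real sample planes (an assumption for illustration), is:

```python
# Residual derivation at the encoder: each residual sample is the
# original sample minus the corresponding prediction sample.

def derive_residual(original, prediction):
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

orig = [[120, 121], [119, 118]]
pred = [[118, 120], [119, 117]]
print(derive_residual(orig, pred))  # [[2, 1], [0, 1]]
```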
  • the encoding apparatus encodes image information including prediction information and residual information (S803).
  • the encoding apparatus may output the encoded image information in the form of a bitstream.
  • the prediction information is information related to the prediction procedure and may include prediction mode information (e.g., skip flag, merge flag, or mode index) and information about the motion information.
  • the information about the motion information may include candidate selection information (e.g., merge index, mvp flag, or mvp index), which is information for deriving a motion vector.
  • the information about the motion information may include the information about the MVD described above and/or reference picture index information.
  • the information about the motion information may include information indicating whether L0 prediction, L1 prediction, or bi (B) prediction is applied.
  • the residual information is information about the residual samples.
  • the residual information may include information about quantized transform coefficients for the residual samples.
  • the output bitstream may be stored in a (digital) storage medium and delivered to the decoding device, or may be delivered to the decoding device via a network.
  • the encoding apparatus may generate a reconstructed picture (including reconstructed samples and a reconstructed block) based on the prediction samples and the residual samples. This is for the encoding apparatus to derive the same prediction result as that performed in the decoding apparatus, and through this the coding efficiency can be increased. Therefore, the encoding apparatus may store the reconstructed picture (or reconstructed samples, reconstructed block) in a memory and use it as a reference picture for inter prediction. As described above, an in-loop filtering procedure may be further applied to the reconstructed picture.
  • FIGS. 10 and 11 illustrate an inter prediction based video / image decoding method and an inter prediction unit in a decoding apparatus according to an embodiment of the present invention.
  • the decoding apparatus may perform an operation corresponding to the operation performed by the encoding apparatus.
  • the decoding apparatus may perform prediction on the current block based on the received prediction information and derive prediction samples.
  • S1001 to S1003 may be performed by the inter prediction unit 260 of the decoding apparatus, and the residual information of S1004 may be obtained from the bitstream by the entropy decoding unit 210 of the decoding apparatus.
  • the residual processing unit of the decoding apparatus may derive residual samples for the current block based on the residual information.
  • the inverse quantization unit 220 of the residual processing unit may derive transform coefficients by performing dequantization on the quantized transform coefficients derived based on the residual information, and the inverse transform unit 230 of the residual processing unit may derive residual samples for the current block by performing an inverse transform on the transform coefficients.
  • S1005 may be performed by the adder 235 of the decoding apparatus.
  • the decoding apparatus may determine the prediction mode for the current block based on the received prediction information (S1001).
  • the decoding apparatus may determine which inter prediction mode is applied to the current block based on the prediction mode information in the prediction information.
  • the inter prediction mode candidates may include a skip mode, a merge mode, and / or (A) an MVP mode, or may include various inter prediction modes described below.
  • the decoding apparatus derives the motion information of the current block based on the determined inter prediction mode (S1002). For example, when a skip mode or a merge mode is applied to the current block, the decoding apparatus may construct a merge candidate list to be described later and select one merge candidate from among the merge candidates included in the merge candidate list. The selection may be performed based on the above-described merge index information.
  • the motion information of the current block may be derived using the motion information of the selected merge candidate.
  • the motion information of the selected merge candidate may be used as motion information of the current block.
  • as another example, when the (A)MVP mode is applied to the current block, the decoding apparatus constructs an (A)MVP candidate list to be described later, and may use the motion vector of an mvp (motion vector predictor) candidate selected from among the mvp candidates included in the (A)MVP candidate list as the mvp of the current block.
  • the selection may be performed based on the above-described selection information (mvp flag or mvp index).
  • the MVD of the current block may be derived based on the information about the MVD
  • the motion vector of the current block may be derived based on mvp and the MVD of the current block.
  • a reference picture index of the current block may be derived based on the reference picture index information.
  • the picture indicated by the reference picture index in the reference picture list for the current block may be derived as a reference picture referred for inter prediction of the current block.
  • motion information of the current block may be derived without constructing a candidate list, and in this case, motion information of the current block may be derived according to a procedure disclosed in a prediction mode described later.
  • in this case, the candidate list construction as described above may be omitted.
  • the decoding apparatus may generate prediction samples for the current block based on the motion information of the current block (S1003).
  • the reference picture may be derived based on the reference picture index of the current block, and the prediction samples of the current block may be derived using the samples of the reference block indicated by the motion vector of the current block on the reference picture.
  • a prediction sample filtering procedure for all or some of the prediction samples of the current block may be further performed.
  • the inter prediction unit 260 of the decoding apparatus may include a prediction mode determination unit 261, a motion information derivation unit 262, and a prediction sample derivation unit 263; the prediction mode determination unit 261 may determine the prediction mode for the current block based on the received prediction mode information, the motion information derivation unit 262 may derive the motion information (motion vector and/or reference picture index, etc.) of the current block based on the received information about the motion information, and the prediction sample derivation unit 263 may derive the prediction samples of the current block.
  • the decoding apparatus generates residual samples for the current block based on the received residual information (S1004).
  • the decoding apparatus may generate reconstructed samples for the current block based on the prediction samples and the residual samples, and generate a reconstructed picture based thereon. After that, the in-loop filtering procedure may be further applied to the reconstructed picture as described above.
  • the inter prediction procedure may include an inter prediction mode determination step, a motion information derivation step according to the determined prediction mode, and a prediction execution (prediction sample generation) step based on the derived motion information.
  • Determining the inter prediction mode: various inter prediction modes may be used for prediction of the current block in the picture. For example, a merge mode, a skip mode, an MVP mode, an affine mode, etc. may be used.
  • decoder side motion vector refinement (DMVR), adaptive motion vector resolution (AMVR), etc. may be further used as ancillary modes.
  • the affine mode may also be called an affine motion prediction mode.
  • the MVP mode may also be called an advanced motion vector prediction (AMVP) mode.
  • Prediction mode information indicating the inter prediction mode of the current block may be signaled from an encoding device to a decoding device, and the prediction mode information may be included in a bitstream and received by the decoding device.
  • the prediction mode information may include index information indicating one of a number of candidate modes, or may indicate the inter prediction mode through hierarchical signaling of flag information. In this case, the prediction mode information may include one or more flags.
  • for example, a skip flag is signaled to indicate whether to apply the skip mode; if the skip mode is not applied, a merge flag is signaled to indicate whether to apply the merge mode; and if the merge mode is not applied, it is indicated that the MVP mode is applied, or a flag for further classification may be further signaled.
  • the affine mode may be signaled as an independent mode, or may be signaled as a mode dependent on the merge mode or the MVP mode.
  • for example, the affine mode may be configured as one candidate of a merge candidate list or an MVP candidate list as described below.
  • Inter prediction may be performed using motion information of the current block.
  • the encoding apparatus may derive optimal motion information for the current block through a motion estimation procedure. For example, the encoding apparatus may search for a similar reference block having a high correlation, using the original block in the original picture for the current block, in fractional pixel units within a predetermined search range in the reference picture, thereby deriving motion information. The similarity of blocks can be derived based on the difference of phase-based sample values. For example, the similarity of blocks may be calculated based on the SAD (sum of absolute differences) between the current block (or template of the current block) and the reference block (or template of the reference block). In this case, motion information may be derived based on the reference block having the smallest SAD in the search area. The derived motion information may be signaled to the decoding apparatus according to various methods based on the inter prediction mode.
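  • The following is a simplified full-search sketch of such SAD-based motion estimation, restricted to integer-pel offsets (real encoders add fractional-pel refinement and fast search patterns; all names here are illustrative assumptions):

```python
import random

def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized blocks.
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def motion_estimation(cur_block, ref_pic, x0, y0, search_range=4):
    # Exhaustively test every integer offset within the search window
    # and keep the one whose candidate block has the smallest SAD.
    h, w = len(cur_block), len(cur_block[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > len(ref_pic) or x + w > len(ref_pic[0]):
                continue  # candidate block must lie inside the reference picture
            cost = sad(cur_block, [row[x:x + w] for row in ref_pic[y:y + h]])
            if cost < best_sad:
                best_sad, best = cost, (dx, dy)
    return best, best_sad  # motion vector and its SAD

random.seed(7)
ref = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
cur = [row[5:9] for row in ref[3:7]]  # 4x4 block copied from (5, 3)
print(motion_estimation(cur, ref, x0=4, y0=4))  # ((1, -1), 0)
```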
  • FIG. 12 is a diagram for describing a neighboring block used in a merge mode or a skip mode as an embodiment to which the present invention is applied.
  • when the merge mode is applied, the motion information of the current prediction block is not directly transmitted, and the motion information of the current prediction block is derived using the motion information of a neighboring prediction block. Therefore, the motion information of the current prediction block can be indicated by transmitting flag information indicating that the merge mode has been used and a merge index indicating which neighboring prediction block has been used.
  • the encoder may search for merge candidate blocks used to derive the motion information of the current prediction block in order to perform the merge mode. For example, up to five merge candidate blocks may be used, but the present invention is not limited thereto.
  • the maximum number of merge candidate blocks may be transmitted in a slice header (or a tile group header), but the present invention is not limited thereto.
  • after finding the merge candidate blocks, the encoder may generate a merge candidate list, and may select the merge candidate block having the smallest cost among them as the final merge candidate block.
  • the present invention provides various embodiments of a merge candidate block constituting the merge candidate list.
  • the merge candidate list may use, for example, five merge candidate blocks. For example, four spatial merge candidates and one temporal merge candidate can be used.
  • the blocks shown in FIG. 12 may be used as the spatial merge candidate.
  • FIG. 13 is a flowchart illustrating a merge candidate list construction method according to an embodiment to which the present invention is applied.
  • the coding apparatus searches for the spatial neighboring blocks of the current block and inserts the spatial merge candidates derived therefrom into the merge candidate list (S1301).
  • the spatial neighboring blocks may include a lower left corner neighboring block, a left neighboring block, an upper right corner neighboring block, an upper neighboring block, and an upper left corner neighboring block of the current block.
  • in addition, neighboring blocks such as a right neighboring block, a lower neighboring block, and a lower right neighboring block may be further used as the spatial neighboring blocks.
  • the coding apparatus may search for the spatial neighboring blocks based on priority, detect available blocks, and derive motion information of the detected blocks as the spatial merge candidates.
  • the encoder and the decoder may search the five blocks shown in FIG. 12 in the order of Al, Bl, BO, AO, and B2, and index the available candidates sequentially to form a merge candidate list.
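  • A sketch of this scan order (with a hypothetical `get_motion_info` accessor returning None for unavailable or intra-coded positions, and a full duplicate check used here for simplicity, whereas real codecs compare only specific candidate pairs) could look like:

```python
# Probe the five spatial positions in the order A1, B1, B0, A0, B2 and
# append available, non-duplicate candidates to the merge candidate list.

SPATIAL_ORDER = ["A1", "B1", "B0", "A0", "B2"]

def build_spatial_merge_candidates(get_motion_info, max_spatial=4):
    candidates = []
    for pos in SPATIAL_ORDER:
        mi = get_motion_info(pos)
        if mi is not None and mi not in candidates:
            candidates.append(mi)
        if len(candidates) == max_spatial:
            break
    return candidates

table = {"A1": (1, 0), "B1": (1, 0), "B0": None, "A0": (0, 2), "B2": (3, 3)}
print(build_spatial_merge_candidates(table.get))
# [(1, 0), (0, 2), (3, 3)] - B1 is dropped as a duplicate of A1
```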
  • the coding apparatus inserts the temporal merge candidate derived by searching the temporal neighboring block of the current block into the merge candidate list (S1302).
  • the temporal neighboring block may be located on a reference picture that is a picture different from the current picture in which the current block is located.
  • the reference picture in which the temporal neighboring block is located may be called a collocated picture or a col picture.
  • the temporal neighboring block may be searched in the order of the lower right corner neighboring block and the center lower right block of the co-located block with respect to the current block on the col picture.
  • when motion data compression is applied, specific motion information may be stored as representative motion information for each predetermined storage unit in the col picture. In this case, it is not necessary to store the motion information for all the blocks in the predetermined storage unit, and through this a motion data compression effect can be obtained.
  • the predetermined storage unit may be, for example, 16x16 sample units or 8x8 sample units.
  • the size information for the predetermined storage unit may be signaled from the encoder to the decoder or the like.
  • motion information of the temporal neighboring block may be replaced with representative motion information of the predetermined storage unit in which the temporal neighboring block is located.
  • the temporal merge candidate may be derived based on the motion information of the prediction block covering the modified position. For example, when the predetermined storage unit is 2^n x 2^n sample units and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> n) << n, (yTnb >> n) << n) may be used for the temporal merge candidate.
  • specifically, for example, when the predetermined storage unit is 16x16 sample units and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> 4) << 4, (yTnb >> 4) << 4) may be used for the temporal merge candidate.
  • when the predetermined storage unit is 8x8 sample units and the coordinates of the temporal neighboring block are (xTnb, yTnb), the motion information of the prediction block located at the modified position ((xTnb >> 3) << 3, (yTnb >> 3) << 3) may be used for the temporal merge candidate.
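  • The position rounding used by both storage-unit sizes reduces to the same bit operation; a minimal sketch (the function name is illustrative):

```python
# Snap a temporal position (xTnb, yTnb) to the top-left corner of its
# 2^n x 2^n motion storage unit, whose representative motion information
# is then used (n = 4 for 16x16 units, n = 3 for 8x8 units).

def snap_to_storage_unit(x_tnb, y_tnb, n):
    return ((x_tnb >> n) << n, (y_tnb >> n) << n)

print(snap_to_storage_unit(41, 27, 4))  # 16x16 units -> (32, 16)
print(snap_to_storage_unit(41, 27, 3))  # 8x8 units   -> (40, 24)
```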
  • the coding apparatus may check whether the number of current merge candidates is smaller than the maximum number of merge candidates (S1303).
  • the maximum number of merge candidates may be predefined, or may be signaled from the encoder to the decoder. For example, the encoder may generate information about the maximum number of merge candidates, encode the information, and transmit the encoded information to the decoder in the form of a bitstream. If the maximum number of merge candidates is filled, the subsequent candidate addition process may not proceed.
  • if the number of current merge candidates is smaller than the maximum number of merge candidates, the coding apparatus inserts an additional merge candidate into the merge candidate list (S1304).
  • the additional merge candidate may include, for example, ATMVP, combined bi-predictive merge candidate (when the slice type of the current slice is B type) and / or zero vector merge candidate.
  • the coding apparatus may terminate the construction of the merge candidate list.
  • the encoder may select an optimal merge candidate among the merge candidates constituting the merge candidate list based on a rate-distortion (RD) cost, and may signal selection information (e.g., merge index) indicating the selected merge candidate to the decoder.
  • the decoder may select the optimal merge candidate based on the merge candidate list and the selection information.
  • the motion information of the selected merge candidate may be used as the motion information of the current block, and the prediction samples of the current block may be derived based on the motion information of the current block.
  • the encoder may derive residual samples of the current block based on the prediction samples, and may signal residual information about the residual samples to the decoder.
  • the decoder may generate reconstructed samples based on the prediction samples and the residual samples derived based on the residual information, and may generate a reconstructed picture based thereon.
  • when the skip mode is applied, the motion information of the current block may be derived in the same manner as when the merge mode is applied. However, when the skip mode is applied, the residual signal for the corresponding block is omitted, and thus the prediction samples may be used directly as reconstructed samples.
  • FIG. 14 is a flowchart illustrating a motion vector predictor candidate list construction method according to an embodiment to which the present invention is applied.
  • when the Motion Vector Prediction (MVP) mode is applied, a motion vector predictor (mvp) candidate list may be generated using the motion vector of a reconstructed spatial neighboring block (for example, the neighboring blocks described above with reference to FIG. 12) and/or the motion vector corresponding to a temporal neighboring block (or col block). That is, the motion vector of the reconstructed spatial neighboring block and/or the motion vector corresponding to the temporal neighboring block may be used as a motion vector predictor candidate.
  • the information about the prediction may include selection information (ex. MVP flag or MVP index) indicating an optimal motion vector predictor candidate selected from the motion vector predictor candidates included in the list.
  • the prediction unit may select the motion vector predictor of the current block from among the motion vector predictor candidates included in the motion vector predictor candidate list using the selection information.
  • the prediction unit of the encoding apparatus may obtain a motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, encode the MVD, and output it in a bitstream form. That is, the MVD may be obtained by subtracting the motion vector predictor from the motion vector of the current block.
  • MVD motion vector difference
  • the prediction unit of the decoding apparatus may obtain a motion vector difference included in the information about the prediction, and derive the motion vector of the current block by adding the motion vector difference and the motion vector predictor.
  • the prediction unit of the decoding apparatus may obtain or derive a reference picture index or the like indicating the reference picture from the information about the prediction.
  • the motion vector predictor candidate list may be constructed as shown in FIG. 14.
  • ATMVP (Advanced Temporal Motion Vector Prediction)
  • an ATMVP is a method of deriving motion information for subblocks of a coding unit based on motion information of collocated blocks of a neighboring picture in time. This can improve the performance of Temporal Motion Vector Prediction (TMVP) and reduce the complexity of common or worst case.
  • TMVP Temporal Motion Vector Prediction
  • SbTMVP subblock-based temporal merging candidate
  • ATMVP may be derived by the following process.
  • the encoder / decoder may add motion vectors from spatial neighboring coding units if a neighboring coding unit is available and the motion vector of the available coding unit is different from the motion vector in the current candidate list.
  • the above-described process may be performed in the order of Al, Bl, BO, AO, and B2.
  • the above-described process may derive ATMVP using only the motion vector of the fixed position (eg, A1 position) block.
  • the encoder/decoder may use the first motion vector candidate among the available N0 spatial candidates to determine the position in the collocated picture from which the motion information of each subblock is derived, where N0 is the number of available spatial candidates. If N0 is 0, a collocated picture and a collocated position with zero motion may be used to derive the motion information of each subblock.
  • in ATMVP, the collocated pictures of different coding units may not be the same.
  • having different collocated pictures for ATMVP derivation means that the motion information fields of multiple reference pictures must be derived, which is undesirable because it increases memory bandwidth.
  • therefore, a simplified design may be used such that the same collocated picture is used when deriving ATMVP.
  • for example, a method of using the same collocated picture may be defined in a slice (or tile group) header, but the present invention is not limited thereto.
  • for example, if the reference picture of the neighboring block A is different from the collocated picture, the motion vector of the neighboring block A may be scaled based on a temporal motion vector scaling method.
  • the scaled motion vector of the neighboring block A can be used in ATMVP.
  • FIG. 17 is a diagram illustrating a method of deriving an Advanced Temporal Motion Vector Prediction (ATMVP) candidate according to an embodiment to which the present invention is applied.
  • ATMVP Advanced Temporal Motion Vector Prediction
  • referring to FIG. 17, the encoder/decoder may find the motion vector of the first available spatial neighboring block while checking blocks in the merge candidate construction order shown in FIG. 17.
  • the position indicated by the motion vector in the reference picture can be derived as col-PB (ie, ATMVP candidate).
  • the motion vector may be used as a motion vector of a corresponding block in each subblock unit.
  • if a subblock is not available, the motion vector of the center block located at the center of the corresponding block may be used as the motion vector for the unavailable subblock and used as a representative motion vector.
  • a method of reducing temporal motion vector storage based on motion vector data of spatial candidates is proposed for temporal motion vector data compression.
  • FIGS. 18 and 19 are diagrams illustrating a method of compressing temporal motion vector data and positions of spatial candidates used therein according to an embodiment to which the present invention is applied.
  • a motion vector of the spatial candidate may be set as a basic motion vector for compression.
  • up to five spatial candidates may be used as reference temporal motion information for deriving a fundamental temporal motion vector.
  • the five spatial candidates may be set as shown in FIG. 19.
  • temporal motion vector data may be compressed based on the motion vectors of the spatial candidates.
  • the order of searching for spatial candidates may be as shown in FIG. 18.
  • spatial candidates may be checked in the order of the center block (C), the upper left block (TL), the upper right block (TR), the lower left block (BL), and the lower right block (BR), as shown in the drawing. This is only an example; the present invention is not limited thereto, and other combinable orders may be applied.
  • first, the encoder/decoder may check whether the center block C is inter predicted. If the center block C is inter predicted, the encoder/decoder may set the motion vector of the center block C as a default for motion vector prediction.
  • the encoder / decoder may check whether the upper left block TL is inter predicted. If the upper left block TL is inter predicted, the encoder / decoder may set the motion vector of the upper left block TL as a default for motion vector prediction.
  • in the same manner, the encoder/decoder may check whether the lower right block BR is inter predicted. If the lower right block BR is inter predicted, the encoder/decoder may set the motion vector of the lower right block BR as a default for motion vector prediction. If no candidate is inter predicted, the encoder/decoder may set the intra mode as default. Through the above process, the encoder/decoder can compress the default motion vector into the motion information.
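  • A sketch of this scan, assuming a hypothetical mapping from position name to a motion vector (None standing for an intra-coded or unavailable position), is:

```python
# Probe the spatial positions in the order C, TL, TR, BL, BR and take
# the first inter-predicted one as the default motion vector used for
# temporal motion vector data compression.

SCAN_ORDER = ["C", "TL", "TR", "BL", "BR"]

def select_default_motion_vector(candidates):
    for pos in SCAN_ORDER:
        mv = candidates.get(pos)
        if mv is not None:  # position exists and is inter predicted
            return mv
    return None  # all candidates intra coded: no default motion vector

print(select_default_motion_vector({"C": None, "TL": (3, -1), "BR": (0, 2)}))
# (3, -1): C is intra coded, so TL supplies the default
```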
  • a method for performing ATMVP based on an adaptive subblock size is proposed.
  • the sub block size used for ATMVP derivation may be adaptively applied at the slice level.
  • the encoder can signal one default sub-block size used for ATMVP motion derivation to the decoder at the sequence level.
  • additionally, a flag may be signaled at the picture or slice level to indicate whether the default subblock size is used. If the flag is false, the ATMVP subblock size may be additionally signaled in the slice header.
  • in another embodiment, the area of the collocated block for ATMVP may be restricted to within the current CTU and one column of NxN blocks. For example, the NxN block may be a 4x4 block, but the present invention is not limited thereto.
  • if the ATMVP collocated block identified by the motion vector of the merge candidate is located outside the restricted area, it may be moved to be located within the restricted area. For example, it may be moved to be located at the nearest boundary within the restricted area.
  • in another embodiment, the encoder/decoder may derive motion information from a collocated block (or collocated subblock) in a collocated picture specified based on the motion information of a spatially neighboring block.
  • the derived motion information may be added to a subblock merging candidate list as a subblock-based temporal merging candidate.
  • motion information of a spatially neighboring block may be referred to as a temporal motion vector.
  • the encoder/decoder may derive a subblock-based temporal merge candidate when the width and height of the current coding block are greater than or equal to a predetermined specific size.
  • the predetermined specific size may be eight.
  • the encoder / decoder may set the motion information of the first spatial candidate among the available spatial candidates as a temporal motion vector.
  • the encoder / decoder may search for available spatial candidates in the order of Al, Bl, BO, and A0.
  • alternatively, the encoder/decoder may set, as the temporal motion vector, the motion vector of a spatial candidate whose reference picture is the same as the collocated picture among the available spatial candidates.
  • the encoder / decoder may check whether a spatial candidate of one fixed position is available and, if available, set the motion vector of the spatial candidate as a temporal motion vector.
  • the spatial candidate at one fixed position may be set to a block at position A1.
  • the encoder / decoder may specify the position of a collocated block in a collocated picture using the temporal motion vector. For example, the following Equation 1 may be used.
  • xColCb = Clip3( xCtb, Min( CurPicWidthInSamplesY - 1, xCtb + ( 1 << CtbLog2SizeY ) + 3 ), xColCtrCb + ( tempMv[0] >> 4 ) )
  • yColCb = Clip3( yCtb, Min( CurPicHeightInSamplesY - 1, yCtb + ( 1 << CtbLog2SizeY ) - 1 ), yColCtrCb + ( tempMv[1] >> 4 ) )
  • here, (xColCtrCb, yColCtrCb) represents the top-left sample position of the collocated coding block including the bottom-right sample of the center position
  • tempMv represents the temporal motion vector
  • the encoder / decoder may determine a position for deriving motion information of each subblock in the current coding block on a subblock basis.
  • the location of the collocated subblock in the collocated picture may be derived using Equation 2 below.
  • xColSb = Clip3( xCtb, Min( CurPicWidthInSamplesY - 1, xCtb + ( 1 << CtbLog2SizeY ) + 3 ), xSb + ( tempMv[0] >> 4 ) )
  • yColSb = Clip3( yCtb, Min( CurPicHeightInSamplesY - 1, yCtb + ( 1 << CtbLog2SizeY ) - 1 ), ySb + ( tempMv[1] >> 4 ) )
  • (xSb, ySb) represents the position of the current subblock.
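  • The following sketch transcribes the two clamping equations directly (Equation 2 is reconstructed above on the assumption that it mirrors Equation 1 with the subblock position in place of the center position; helper names are illustrative):

```python
# Clamp a collocated position for the current CTU: horizontally the
# position may extend up to 3 samples past the CTU (and the picture
# bounds), vertically it must stay within the CTU row. tempMv is in
# 1/16-pel units, hence the >> 4 to integer sample precision.

def clip3(lo, hi, v):
    return max(lo, min(hi, v))

def col_position(x_base, y_base, temp_mv, x_ctb, y_ctb,
                 pic_w, pic_h, ctb_log2_size):
    x = clip3(x_ctb,
              min(pic_w - 1, x_ctb + (1 << ctb_log2_size) + 3),
              x_base + (temp_mv[0] >> 4))
    y = clip3(y_ctb,
              min(pic_h - 1, y_ctb + (1 << ctb_log2_size) - 1),
              y_base + (temp_mv[1] >> 4))
    return x, y

# Equation 1 uses (xColCtrCb, yColCtrCb) as the base; Equation 2 uses
# each subblock position (xSb, ySb). Example with a 128x128 CTB:
print(col_position(200, 64, temp_mv=(4000, -320), x_ctb=128, y_ctb=0,
                   pic_w=1920, pic_h=1080, ctb_log2_size=7))
# (259, 44): x is clamped to the CTU-range limit 128 + 128 + 3
```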
  • the encoder / decoder may use the motion information of the collocated block specified using a temporal motion vector.
  • an object of the present invention is to propose a method of deriving a temporal motion vector suitable for hardware design. Since typical hardware decoders are designed to use pipelines, in deriving motion vectors, the use of temporal motion vectors has a direct impact on memory bandwidth, throughput, and latency. Therefore, a restricted temporal motion vector derivation method for effective pipeline processing in hardware is proposed.
  • TMVP subblock-based temporal motion vector prediction
  • Embodiment 1: advanced temporal motion vector prediction (ATMVP) and its extension (ATMVP-ext) are prediction modes that use temporal motion vectors in subblock units.
  • the temporal motion data may be derived from a collocated block of the spatial neighboring block including the lower right position.
  • a collocated block may be referred to as a col block.
  • the lower right position beyond the CTU row may cause memory bandwidth and line buffer problems in a hardware decoder performing the CTU-unit motion data fetching process.
  • a method of limiting candidate derivation for a temporal motion vector at the CTU boundary is proposed.
  • in order to use temporal motion information, the temporal motion vector information must be obtained before decoding the motion vector of the current block, and the obtained temporal motion vector information must be stored in the line buffer. Therefore, in terms of hardware, the fetch size and location are very important.
  • Conventional hardware decoders fetch motion data from a reference picture in units of CTUs.
  • the encoder/decoder may perform motion estimation/compensation using only regions of the same CTU line when fetching memory, in connection with reducing the storage memory of the hardware line buffer.
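  • A minimal sketch of such a restriction, clamping a temporal candidate's vertical position into the current CTU row (the names and the clamping policy are illustrative assumptions):

```python
# Keep a temporal candidate position inside the current CTU row so that
# motion data fetched from the reference picture never crosses the row
# boundary and stays within the line buffer.

def clamp_to_ctu_row(y_cand, y_ctu, ctu_size):
    return max(y_ctu, min(y_cand, y_ctu + ctu_size - 1))

print(clamp_to_ctu_row(140, 0, 128))  # 127: pulled back inside the CTU row
print(clamp_to_ctu_row(100, 0, 128))  # 100: already inside, unchanged
```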
  • FIG. 20 is a diagram for describing a problem occurring in a conventional subblock unit prediction method using a temporal motion vector, as an embodiment to which the present invention can be applied.
  • FIG. 20 illustrates the conventional ATMVP-Ext mode, which is one of the subblock-based temporal motion vector derivation methods.
  • the ATMVP-Ext mode is not limited to its name, and refers to an inter prediction mode (or motion prediction candidate) that combines a motion vector of a temporal candidate and a spatial candidate to derive a motion vector on a sub-block basis.
  • the motion vector of the current block (ie, the current subblock) may be generated by an average of the motion vector of the spatial candidate and the motion vector of the temporal candidate.
  • the temporal motion vector may be derived from a subblock located at the lower right corner.
  • referring to FIG. 20, the encoder/decoder can derive the motion vectors of the left and upper spatial candidates of the current subblock, and the subblock at the lower right position beyond the CTU line may be used as a temporal candidate.
  • FIG. 21 is a diagram illustrating a method of deriving a motion vector in sub-block units using motion vectors of a spatial candidate and a temporal candidate according to an embodiment to which the present invention is applied.
  • referring to FIG. 21, the encoder/decoder may derive the motion vectors of the left and upper spatial candidates of the current subblock and derive the motion vectors of the right and lower temporal candidates of the current subblock. In this case, in an embodiment of the present invention, the encoder/decoder may not use the temporal motion vector of the lower right position outside the CTU line, as shown in FIG. 21, in order to solve the above problem.
  • the left spatial candidate may be a block having the same vertical coordinate (or row) as the current subblock among the left neighboring blocks of the current block, and the upper spatial candidate of the current subblock may be a block having the same horizontal coordinate (or column) as the current subblock among the upper neighboring blocks of the current block. A sketch of this per-subblock combination is shown below.
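  • The sketch averages whichever of the left/upper spatial and right/lower temporal candidate motion vectors are available (the skipping of unavailable candidates is an assumption for illustration):

```python
# Derive a subblock motion vector as the average of the available
# candidate motion vectors (left, upper, right, lower); None marks an
# unavailable candidate, e.g. one excluded by the CTU-line restriction.

def derive_subblock_mv(candidate_mvs):
    avail = [mv for mv in candidate_mvs if mv is not None]
    if not avail:
        return None
    n = len(avail)
    return (sum(mv[0] for mv in avail) // n,
            sum(mv[1] for mv in avail) // n)

# left, upper, right (temporal), lower (temporal); lower unavailable here
print(derive_subblock_mv([(4, 0), (2, 2), (6, -2), None]))  # (4, 0)
```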
  • the encoder / decoder may apply a PU-based memory fetch method as shown in FIG. 22.
  • in this case, it is required to fetch a minimum motion vector storage unit represented by a block of (block width + 1) x (block height + 1) size, as shown in FIG. 22.
  • for a 4x4 block, known as the worst case, a 5x5 area needs to be fetched. This can be a fatal bottleneck in hardware pipeline processing.
  • FIG. 23 is a diagram for describing a method of deriving a motion vector in sub-block units using motion vectors of a spatial candidate and a temporal candidate according to an embodiment to which the present invention is applied.
  • in order to solve the above-described problem, the encoder/decoder may use temporal motion vectors of various alternative positions, as shown in FIG. 23.
  • the encoder / decoder may use a block at the same position as the current subblock as a time candidate.
  • as shown in FIG. 23, when the current subblock is located in the last column or row in the current block (i.e., CU), the encoder/decoder may use a block at the same position as the current subblock as a temporal candidate.
  • in one embodiment, the encoder/decoder may use a block located to the right of the current subblock as a temporal candidate when the current subblock is located in the last row of the current block (i.e., CU).
  • in another embodiment, the encoder/decoder may use a block at the same position as the current subblock as a temporal candidate when the current subblock is located in the last row in the current block (i.e., CU).
  • the memory bandwidth can be reduced and additional line buffer problems can be solved.
  • Embodiment 2
  • in an embodiment of the present invention, a temporal motion vector derivation method for efficient pipeline processing is proposed.
  • FIG. 24 is a diagram illustrating an example of a method of deriving a subblock-based temporal motion vector according to an embodiment to which the present invention is applied.
  • ATMVP may be referred to as a subblock-based temporal merging candidate, SbTMVP.
  • referring to FIG. 24, the encoder/decoder derives a reference block (or collocated block) using the motion vector of a spatial candidate (or spatial neighboring block) of the current block.
  • the encoder / decoder derives the motion vector of the current block in sub-block units using the motion vector of the reference block.
  • the motion vector of the reference block may be scaled to the col picture based on a picture order count (POC).
  • POC picture order count
  • specifically, the motion vector of the reference block may be scaled based on the POC difference between the picture including the reference block and the reference picture of the reference block, and the POC difference between the current picture and the col picture.
  • the encoder/decoder may use the motion vector of the reference block scaled to the col picture as the subblock-based temporal motion vector (or temporal candidate) of the current block.
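  • A sketch of such POC-based scaling (the integer rounding of a real codec is simplified here; names are illustrative):

```python
# Scale a neighbouring block's motion vector by the ratio of the POC
# distance it must span (current picture -> col picture) to the POC
# distance it originally spans (neighbour's picture -> its reference).

def scale_mv(mv, poc_cur, poc_col, poc_nb, poc_nb_ref):
    tb = poc_cur - poc_col      # distance the scaled vector must span
    td = poc_nb - poc_nb_ref    # distance the original vector spans
    if td == 0:
        return mv
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))

# Neighbour MV (8, -4) spans 2 pictures; the scaled vector spans 1.
print(scale_mv((8, -4), poc_cur=16, poc_col=15, poc_nb=16, poc_nb_ref=14))
# (4, -2)
```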
  • the motion vector of the temporal candidate may be derived after decoding the motion vector, the reference picture list, and the reference picture index of the temporal candidate. Because of this, the use of temporal motion vectors can cause delays in pipeline processing and reduce the throughput of the decoder.
  • FIGS. 25 and 26 are diagrams illustrating a pipeline structure for performing motion compensation using a temporal motion vector, as an embodiment to which the present invention is applied.
  • the hardware decoder may be designed to use a pipeline structure as shown in FIG. 25. Referring to FIG. 25, a fetch operation of a temporal motion vector is required for motion compensation, and the temporal motion vector may be derived only after decoding the motion vector, the reference picture list, and the reference picture index of the temporal candidate. Thus, the use of a temporal motion vector can cause delays in pipeline processing and reduce the throughput of the decoder. The worst case is a situation where all blocks consist of 4x4 blocks. In this environment, if the temporal neighboring motion vector has not yet been decoded, the pipeline process operating in units of 4x4 to obtain the temporal motion vector causes time lag and throughput reduction.
  • FIG. 26 illustrates an example of a pipeline structure of a motion vector derivation process to which a motion information candidate selection step is added.
  • as described above, using the temporal motion vector may cause delays in the pipeline processing and reduce the throughput of the decoder.
  • FIG. 27 is a diagram illustrating a method of setting a minimum size block for using a motion vector of a spatial neighboring block as an embodiment to which the present invention is applied.
  • in an embodiment of the present invention, the encoder/decoder may use candidate positions around the boundary of the minimum size block to refer to the spatial motion vector candidates. That is, the spatial candidates may be configured at positions defined as shown in FIG. 27 based on the boundary of the minimum size block. For example, the motion information candidate list can be constructed with spatial candidates at the left block (or sample), lower left block (or sample), upper block (or sample), upper right block (or sample), and upper left block (or sample) positions of the minimum size block.
  • in this case, the encoder/decoder may use the motion vector of the spatial candidate as described above with reference to FIG. 24, and the spatial candidate may include blocks around the boundary of the minimum size block.
  • the minimum size block of the present invention is not limited in name thereto.
  • the minimum size block may be referred to as an ancestor node block, a merge shared node block, a candidate shared node block, and the like.
  • lower coding units may share the candidate list of a higher node block (i.e., a higher node coding unit).
  • the candidate list may be referred to as, for example, a shared candidate list, a shared merge candidate list, a shared subblock merge candidate list, and the like.
  • in one embodiment, the minimum size block may be defined by a specific size threshold. If the current block is smaller than the specific size threshold, the current block may share the motion information candidate list of the ancestor node block of the current block.
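  • A sketch of this sharing rule (the threshold semantics and value are assumptions for illustration):

```python
# Blocks smaller than the threshold reuse the motion information
# candidate list built once for their ancestor (candidate-shared) node
# block instead of building their own.

def uses_shared_candidate_list(width, height, threshold=8):
    return width < threshold or height < threshold

print(uses_shared_candidate_list(4, 8))    # True: shares the ancestor list
print(uses_shared_candidate_list(16, 16))  # False: builds its own list
```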
  • the minimum size block may be preset at the encoder and decoder, or may be signaled from the encoder to the decoder.
  • a syntax element indicating the minimum size of a block for deriving a motion information candidate may be signaled as shown in Table 2 below.
  • the decoder may parse the syntax element and, if applicable, apply the signaled minimum block size to the current block.
  • FIG. 1 is a diagram illustrating a method of configuring motion information transmission based on a minimum size block for using a motion vector of a spatial neighboring block as an embodiment to which the present invention is applied.
  • every CU may generate a candidate list using spatial candidates outside the boundary of the minimum size block (ie, ancestor node block).
  • in this case, the encoder/decoder may use the motion vector of the spatial candidate as described in FIG. 24, and the spatial candidate may include a block around the boundary of the minimum size block as described in FIG. 27.
  • subblock-based TMVP (temporal motion vector prediction)
  • the conventional TMVP derives one motion vector corresponding to the current block from the col picture.
  • ATMVP derives the motion vector of the block specified by the motion vector of the spatial candidate in units of 4x4 subblocks.
  • in the subblock-based TMVP, the motion vector of each subblock of the current block may be derived from the motion vector of the block at the same position as the current block (i.e., the col block) in the col picture.
  • TMVP temporal motion vector prediction
  • that is, the encoder/decoder may specify the block corresponding to the current block in the col picture and derive the motion vector of the specified block in subblock units.
  • the subblock-based TMVP can support simple temporal motion vector fetching and efficient pipelining compared with the conventional ATMVP, so that there is no burden in terms of hardware, while enabling more efficient motion prediction than the conventional TMVP.
  • in one embodiment, if the motion vector of a corresponding subblock in the col picture is not available, the encoder/decoder may use the zero motion vector as the motion vector for that subblock, as shown in the drawing.
  • the encoder / decoder may use the default motion vector as the motion vector for that subblock.
  • FIG. 31 is a diagram illustrating subblock-based temporal motion vector prediction (TMVP) according to an embodiment to which the present invention is applied.
  • TMVP subblock-based temporal motion vector prediction
  • in another embodiment, the encoder/decoder may set a default motion vector for the corresponding subblock.
  • the default motion vector may be derived according to the conventional TMVP determination method as shown in FIG. 31.
  • the default motion vector may be determined as a motion vector of a temporal neighboring block including a pixel at the center lower right side of the current block or a temporal neighboring block including a pixel at a position adjacent to the lower right side of the current block.
  • in FIG. 31, C0 represents the lower right block and C1 represents the center lower right block.
  • the C0 block may be considered first. If the C0 block is not available, the C1 block can be used to derive a TMVP candidate (i.e., a default motion vector). In one embodiment, only a C0 block located in the same CTU row may be used, in order to reduce memory bandwidth.
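  • A sketch of this selection, including the optional CTU-row restriction (the flag names are illustrative assumptions):

```python
# Derive the TMVP default motion vector: prefer the lower right block C0,
# fall back to the centre block C1; optionally reject a C0 that lies
# outside the current CTU row to reduce memory bandwidth.

def derive_tmvp(c0_mv, c1_mv, c0_in_ctu_row=True, restrict_to_ctu_row=True):
    if c0_mv is not None and (c0_in_ctu_row or not restrict_to_ctu_row):
        return c0_mv
    return c1_mv  # may also be None when C1 is unavailable

print(derive_tmvp(None, (2, 1)))                         # (2, 1): C1 fallback
print(derive_tmvp((5, 0), (2, 1), c0_in_ctu_row=False))  # (2, 1)
```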
  • TMVP candidate ie, a default motion vector
  • the embodiments of the present invention described above have been described separately for convenience of description, but the present invention is not limited thereto. That is, Embodiments 1 to 3 described above may each be performed independently, or one or more of them may be combined and performed.
  • FIG. 32 is a flowchart illustrating a method of generating an inter prediction block according to an embodiment to which the present invention is applied.
  • a decoder is mainly described for convenience of description, but the present invention is not limited thereto, and the method of generating an inter prediction block according to an embodiment of the present invention may be performed in the same manner in the encoder and the decoder.
  • when subblock-based temporal motion vector prediction is applied to the current block, the decoder derives the motion vector of an available spatial neighboring block of the current block (S3201).
  • the decoder derives a collocated block of the current block based on the motion vector of the spatial neighboring block in the collocated picture of the current block (S3202).
  • the decoder may scale the motion vector of the spatial neighboring block based on a picture order count (POC).
  • POC picture order count
  • specifically, the decoder may scale the motion vector of the spatial neighboring block based on a picture order count (POC) difference between the first reference picture of the spatial neighboring block and the second reference picture of the block specified by the motion vector of the spatial neighboring block, and a POC difference between the current picture and the collocated picture, and may derive the collocated block within the collocated picture using the scaled motion vector.
  • POC picture order count
  • the decoder derives a motion vector in sub-block units within the current block based on the motion vector of the collocated block (S3203).
  • the decoder generates a prediction block of the current block by using the motion vector derived in the sub-block unit (S3204).
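  • An end-to-end sketch of steps S3201 to S3204 under simplifying assumptions (one spatial neighbor, whole-pel motion vectors, 8x8 subblocks, and hypothetical helper names throughout):

```python
# Subblock-based temporal motion vector prediction, condensed:
# S3202 locates the collocated block with the spatial neighbour's MV,
# S3203 reads the col picture's motion field once per subblock,
# S3204 would then motion-compensate each subblock with its own MV.

def subblock_tmvp(cur_x, cur_y, cur_w, cur_h, spatial_mv, col_mv_field,
                  sub=8):
    col_x, col_y = cur_x + spatial_mv[0], cur_y + spatial_mv[1]
    sub_mvs = {}
    for oy in range(0, cur_h, sub):
        for ox in range(0, cur_w, sub):
            sub_mvs[(cur_x + ox, cur_y + oy)] = \
                col_mv_field(col_x + ox, col_y + oy)
    return sub_mvs

field = lambda x, y: (x % 4 - 2, y % 4 - 2)  # toy collocated motion field
print(subblock_tmvp(0, 0, 16, 16, spatial_mv=(8, 0), col_mv_field=field))
```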
  • as described above, the spatial neighboring block may be selected from among neighboring blocks located around a boundary of an upper node block of the current block.
  • the upper node block may represent an ancestor node block of the current block having a partition depth smaller by 1 than that of the current block in the block partition structure.
  • the specific size threshold may be a preset value, or may be a value signaled from the encoder through a sequence parameter set, a picture parameter set, or a tile group header.
  • FIG. 33 is a diagram illustrating an inter prediction apparatus according to an embodiment to which the present invention is applied.
  • the inter prediction unit is illustrated as one block, but the inter prediction unit may be implemented in a configuration included in the encoder and / or the decoder.
  • the inter prediction unit implements the functions, processes, and / or methods proposed in FIGS. 8 to 32.
  • the inter prediction unit may include a spatial candidate derivation unit 3301, a collocated block derivation unit 3302, a sub block motion vector derivation unit 3303, and a prediction block generation unit 3304.
  • the spatial candidate derivation unit 3301 derives the motion vector of an available spatial neighboring block of the current block when subblock-based temporal motion vector prediction is applied to the current block.
  • the collocated block derivation unit 3302 derives a collocated block of the current block based on the motion vector of the spatial neighboring block in a collocated picture of the current block.
  • the collocated block derivation unit 3302 may scale the motion vector of the spatial neighboring block based on a picture order count (POC).
  • POC picture order count
  • specifically, the collocated block derivation unit 3302 may scale the motion vector of the spatial neighboring block based on a picture order count (POC) difference between the first reference picture of the spatial neighboring block and the second reference picture of the block specified by the motion vector of the spatial neighboring block, and a POC difference between the current picture and the collocated picture, and may derive the collocated block within the collocated picture using the scaled motion vector.
  • POC picture order count
  • the sub block motion vector derivation unit 3303 derives a motion vector in units of sub blocks in the current block based on the motion vector of the collocated block.
  • the prediction block generation unit 3304 generates a prediction block of the current block by using the motion vector derived in the subblock units.
  • as described above, the spatial neighboring block may be selected from among neighboring blocks located around a boundary of an upper node block of the current block.
  • the upper node block may represent an ancestor node block of the current block having a partition depth smaller than that of the current block in the block partition structure.
  • the specific size threshold may be a preset value, or may be a value signaled from the encoder through a sequence parameter set, a picture parameter set, or a tile group header.
  • the video coding system can include a source device and a receiving device.
  • the source device may transmit the encoded video / image information or data to a receiving device through a digital storage medium or a network in the form of a file or streaming.
  • the source device may include a video source, an encoding apparatus, and a transmitter.
  • the receiving device may include a receiver, a decoding apparatus, and a renderer.
  • the encoding device may be called a video / video encoding device, and the decoding device may be called a video / video decoding device.
  • the transmitter may be included in the encoding device.
  • the receiver may be included in the decoding device.
  • the renderer may include a display unit, and the display unit may be configured as a separate device or an external component.
  • the video source may acquire the video / image through a process of capturing, synthesizing, or generating the video / image.
  • the video source may comprise a video / image capture device and / or a video / image generation device.
  • the video / image capture device may include, for example, one or more cameras, video / image archives including previously captured video / images, and the like.
  • the video/image generation device may include, for example, a computer, a tablet, a smartphone, and the like, and may (electronically) generate video/images.
  • a virtual video / image may be generated through a computer.
  • the video / image capturing process may be replaced by a process of generating related data.
  • the encoding device may encode the input video / image.
  • the encoding apparatus may perform a series of procedures such as prediction, transform, and quantization for compression and coding efficiency.
  • the encoded data (encoded video / image information) may be output in the form of bitstreams.
  • the transmitter may transmit the encoded video / video information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming.
  • the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, and the like.
  • the transmission unit may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcast / communication network.
  • the receiver extracts the bitstream and delivers it to the decoding apparatus.
  • the decoding apparatus may decode the video / image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding apparatus.
  • the renderer may render the decoded video / image.
  • the rendered video / image may be displayed through the display unit.
  • FIG. 35 shows a structure diagram of a content streaming system according to an embodiment to which the present invention is applied.
  • the content streaming system to which the present invention is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.
  • the encoding server compresses content input from multimedia input devices such as a smart phone, a camera, a camcorder, etc. into digital data to generate a bitstream and transmit the bitstream to the streaming server.
  • when multimedia input devices such as smart phones, cameras, camcorders, etc. directly generate a bitstream, the encoding server may be omitted.
  • the bitstream may be generated by an encoding method or a bitstream generation method to which the present invention is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.
  • the streaming server transmits multimedia data to the user device based on the user's request through the web server, and the web server serves as a medium informing the user of which services are available.
  • when the user requests a desired service from the web server, the web server delivers the request to the streaming server, and the streaming server transmits multimedia data to the user.
  • the content streaming system may include a separate control server, in which case the control server serves to control the command / response between each device in the content streaming system.
  • the streaming server may receive content from a media storage and/or an encoding server. For example, when the content is received from the encoding server, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.
  • examples of the user device include a mobile phone, a smart phone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, a head mounted display), a digital TV, a desktop computer, digital signage, and the like.
  • Each server in the content streaming system may operate as a distributed server, in which case data received from each server may be distributed.
  • the embodiments described herein may be implemented and performed on a processor, microprocessor, controller, or chip.
  • the functional units shown in each drawing may be implemented and performed on a computer, processor, microprocessor, controller, or chip.
  • the decoder and encoder to which the present invention is applied include a multimedia broadcasting transmitting and receiving device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chat device, a real time communication device such as video communication, a mobile streaming device, Storage media, camcorders, video on demand (VoD) service providers, OTT video (Over the top video) devices, Internet streaming service providers, 3D (3D) video devices, video telephony video devices, and medical video devices. It can be used to process video signals or data signals.
  • an OTT video (over the top video) device may include a game console, a Blu-ray player, an Internet-connected TV, a home theater system, a smartphone, a tablet PC, a digital video recorder (DVR), and the like.
  • DVR Digital Video Recorder
  • the processing method to which the present invention is applied can be produced in the form of a program executed by a computer, and stored in a computer-readable recording medium.
  • Multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium.
  • the computer readable recording medium includes all kinds of storage devices and distributed storage devices in which computer readable data is stored.
  • the computer-readable recording medium may include, for example, a Blu-ray disc (BD), a universal serial bus (USB), a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.
  • the computer-readable recording medium also includes media embodied in the form of a carrier wave (for example, transmission over the Internet).
  • the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted through a wired or wireless communication network.
  • an embodiment of the present invention may be implemented as a computer program product using program code, and the program code may be executed on a computer according to an embodiment of the present invention.
  • the program code may be stored on a carrier readable by a computer.
  • Embodiments according to the present invention can be implemented by various means, for example, hardware, firmware, software, or a combination thereof.
  • in the case of a hardware implementation, an embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.
  • an embodiment of the present invention may be implemented in the form of a module, procedure, function, etc. that performs the functions or operations described above.
  • the software code may be stored in memory and driven by the processor.
  • the memory may be located inside or outside the processor, and may exchange data with the processor by various known means.
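
The content streaming flow described in the list above (input device or encoding server → streaming server → web server → user device) can be made concrete with a small sketch. The following Python code is a minimal illustration under assumed names: EncodingServer, StreamingServer, and WebServer, together with their methods, are hypothetical and are not an API defined in this document.

    # A minimal sketch of the content streaming flow described above.
    # Every class and method name here is a hypothetical illustration,
    # not an API defined by the present invention.

    class EncodingServer:
        """Compresses input content into a bitstream. May be omitted when
        the input device (e.g., a smartphone) generates the bitstream."""

        def encode(self, raw_content: str) -> bytes:
            return f"bitstream({raw_content})".encode()


    class StreamingServer:
        """Temporarily stores bitstreams and serves them on request."""

        def __init__(self) -> None:
            self._store: dict[str, bytes] = {}  # stands in for the media storage

        def ingest(self, name: str, bitstream: bytes) -> None:
            # The bitstream may be stored for a predetermined time
            # in order to provide a smooth streaming service.
            self._store[name] = bitstream

        def available(self) -> list[str]:
            return list(self._store)

        def stream(self, name: str) -> bytes:
            return self._store[name]


    class WebServer:
        """Mediates between the user and the streaming server: informs the
        user of available services and forwards service requests."""

        def __init__(self, streaming_server: StreamingServer) -> None:
            self._streaming = streaming_server

        def list_services(self) -> list[str]:
            return self._streaming.available()

        def request(self, name: str) -> bytes:
            # The user's request is delivered to the streaming server,
            # which transmits the multimedia data back to the user.
            return self._streaming.stream(name)


    if __name__ == "__main__":
        encoder = EncodingServer()
        streaming = StreamingServer()
        web = WebServer(streaming)
        streaming.ingest("clip", encoder.encode("camera input"))
        print(web.list_services())  # ['clip']
        print(web.request("clip"))  # b'bitstream(camera input)'

In this sketch the streaming server's in-memory dictionary stands in for the media storage, and holding the ingested bitstream mirrors the temporary storage used to provide a smooth streaming service.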

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed are a method for decoding a video signal and an apparatus therefor. A method of decoding an image based on an inter-prediction mode comprises the steps of: deriving a motion vector of an available spatially neighboring block of a current block when subblock-based temporal motion vector prediction is applied to the current block; deriving a collocated block of the current block based on the motion vector of the spatially neighboring block within a collocated picture of the current block; deriving a motion vector in units of subblocks within the current block based on the motion vector of the collocated block; and generating a prediction block of the current block by using the motion vector derived in units of subblocks.
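
The four decoding steps in the abstract can be illustrated with a short sketch. The following Python code is a minimal, non-normative illustration: the types (MV, Neighbor, CollocatedPicture) and helpers (sub_mv, decode_subblock_tmvp) are hypothetical names chosen for this example, and details such as availability derivation order, motion vector scaling, and position clipping are deliberately omitted.

    # A minimal, self-contained sketch of the four decoding steps in the
    # abstract. All type and helper names are hypothetical illustrations;
    # this is not the normative decoding process.
    from dataclasses import dataclass, field

    @dataclass
    class MV:
        # motion vector in integer luma samples (precision simplified)
        x: int = 0
        y: int = 0

    @dataclass
    class Neighbor:
        # a spatially neighboring block: availability flag plus its MV
        available: bool
        mv: MV = field(default_factory=MV)

    @dataclass
    class CollocatedPicture:
        # stores one motion vector per 4x4 grid position (simplified)
        mv_field: dict

        def sub_mv(self, x: int, y: int) -> MV:
            return self.mv_field.get((x // 4, y // 4), MV())

    def decode_subblock_tmvp(x, y, w, h, neighbors, col_pic, sub=4):
        # Step 1: derive a motion vector from an available spatially
        # neighboring block of the current block.
        spatial_mv = next((n.mv for n in neighbors if n.available), MV())

        # Step 2: the collocated block is located inside the collocated
        # picture at the current position displaced by that vector.
        col_x, col_y = x + spatial_mv.x, y + spatial_mv.y

        # Step 3: derive one motion vector per sub-block of the current
        # block from the collocated block's stored motion field.
        sub_mvs = {
            (sx, sy): col_pic.sub_mv(col_x + sx, col_y + sy)
            for sy in range(0, h, sub)
            for sx in range(0, w, sub)
        }

        # Step 4: the prediction block would be generated by motion-
        # compensating each sub-block with its derived vector; here we
        # return the per-sub-block vectors that would drive that step.
        return sub_mvs

    if __name__ == "__main__":
        col = CollocatedPicture({(1, 1): MV(2, -1)})
        mvs = decode_subblock_tmvp(
            0, 0, 8, 8, [Neighbor(False), Neighbor(True, MV(4, 4))], col
        )
        print(mvs[(0, 0)])  # MV(x=2, y=-1), taken from collocated grid (1, 1)

Here the motion vector of the first available spatial neighbor displaces the current block's position into the collocated picture, and each sub-block then inherits the motion vector stored at its displaced position, which would drive per-sub-block motion compensation.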
PCT/KR2019/003810 2018-04-01 2019-04-01 Procédé de traitement d'image basé sur un mode d'inter-prédiction, et appareil associé WO2019194502A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862651231P 2018-04-01 2018-04-01
US62/651,231 2018-04-01

Publications (1)

Publication Number Publication Date
WO2019194502A1 (fr) 2019-10-10

Family

ID=68101362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/003810 WO2019194502A1 (fr) 2018-04-01 2019-04-01 Procédé de traitement d'image basé sur un mode d'inter-prédiction, et appareil associé

Country Status (1)

Country Link
WO (1) WO2019194502A1 (fr)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160006250A (ko) * 2010-09-27 2016-01-18 엘지전자 주식회사 블록 분할 방법 및 복호화 장치
US20180077425A1 (en) * 2011-02-09 2018-03-15 Lg Electronics Inc. Method for storing motion information and method for inducing temporal motion vector predictor using same
KR20170108010A (ko) * 2015-01-26 2017-09-26 퀄컴 인코포레이티드 서브-예측 유닛 기반 어드밴스드 시간 모션 벡터 예측
WO2016165069A1 (fr) * 2015-04-14 2016-10-20 Mediatek Singapore Pte. Ltd. Prédiction de vecteurs mouvement temporelle avancée en codage vidéo

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, JIANLE ET AL.: "Algorithm Description of Joint Exploration Test Model 7 (JEM 7)", JVET-G1001-v1, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3, 19 August 2017 (2017-08-19), Torino, IT, pages 1-44, XP030150980 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021072795A1 (fr) * 2019-10-17 2021-04-22 北京大学深圳研究生院 Procédé et appareil d'encodage et de décodage sur la base d'une prédiction inter-trame
WO2021206479A1 (fr) * 2020-04-08 2021-10-14 삼성전자 주식회사 Procédé et appareil de décodage vidéo pour obtenir un vecteur de mouvement, et procédé et appareil de codage vidéo pour obtenir un vecteur de mouvement
CN116233464A (zh) * 2020-04-08 2023-06-06 北京达佳互联信息技术有限公司 用于视频编码的方法、装置和非暂态计算机可读存储介质
CN116233464B (zh) * 2020-04-08 2024-03-19 北京达佳互联信息技术有限公司 用于视频编码的方法、装置和非暂态计算机可读存储介质
CN112218076A (zh) * 2020-10-17 2021-01-12 浙江大华技术股份有限公司 一种视频编码方法、装置、系统及计算机可读存储介质

Similar Documents

Publication Publication Date Title
JP7141463B2 (ja) インター予測モードに基づいた映像処理方法およびそのための装置
KR102502175B1 (ko) 인터 예측 모드 기반 영상 처리 방법 및 이를 위한 장치
KR102510771B1 (ko) 영상 코딩 시스템에서 어파인 mvp 후보 리스트를 사용하는 어파인 움직임 예측에 기반한 영상 디코딩 방법 및 장치
KR102545728B1 (ko) 서브블록 단위의 시간적 움직임 정보 예측을 위한 인터 예측 방법 및 그 장치
KR102658929B1 (ko) 인터 예측 모드 기반 영상 처리 방법 및 이를 위한 장치
WO2019194502A1 (fr) Procédé de traitement d'image basé sur un mode d'inter-prédiction, et appareil associé
WO2019194497A1 (fr) Procédé de traitement d'image basé sur un mode d'inter-prédiction et appareil associé
CN113508583A (zh) 基于帧内块编译的视频或图像编译
KR20200078647A (ko) 인터 예측 모드 기반 영상 처리 방법 및 이를 위한 장치
KR102594692B1 (ko) 크로마 성분에 대한 영상 디코딩 방법 및 그 장치
KR20240032153A (ko) 크로마 성분에 대한 영상 디코딩 방법 및 그 장치
WO2019194499A1 (fr) Procédé de traitement d'image basé sur un mode de prédiction inter et dispositif associé
WO2019194498A1 (fr) Procédé de traitement d'image basé sur un mode d'inter-prédiction et dispositif associé
KR102640264B1 (ko) 크로마 양자화 파라미터 데이터에 대한 영상 디코딩 방법 및 그 장치
KR102670935B1 (ko) 크로마 성분에 대한 영상 디코딩 방법 및 그 장치

Legal Events

Date Code Title Description

121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 19781999; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: PCT application non-entry in European phase
    Ref document number: 19781999; Country of ref document: EP; Kind code of ref document: A1