WO2020175904A1 - Method and apparatus for picture partitioning on basis of signaled information - Google Patents

Method and apparatus for picture partitioning on basis of signaled information

Info

Publication number
WO2020175904A1
Authority
WO
WIPO (PCT)
Prior art keywords
tile
information
tiles
picture
video
Prior art date
Application number
PCT/KR2020/002729
Other languages
French (fr)
Korean (ko)
Inventor
Seethal Paluri
Seunghwan Kim
Original Assignee
LG Electronics Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Inc. filed Critical LG Electronics Inc.
Publication of WO2020175904A1 publication Critical patent/WO2020175904A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/129Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • This disclosure relates to video coding technology, and more specifically, to a picture partitioning method and apparatus based on information signaled in a video coding system.
  • A technical task of this disclosure is to provide a method and apparatus for increasing the efficiency of image coding.
  • Another technical task of this disclosure is to provide a method and apparatus for signaling partitioning information.
  • Another technical task of this disclosure is to provide a method and apparatus for partitioning a picture based on signaled information.
  • Another technical task of this disclosure is to provide a method and apparatus for partitioning a current picture based on partition information for the current picture.
  • Another technical task of the present disclosure is to provide a method and apparatus for partitioning a current picture based on a tile group including tiles that are not adjacent to each other in the current picture.
  • According to an embodiment of the present disclosure, a video decoding method performed by a decoding apparatus includes the step of deriving a partitioning structure of the current picture based on a plurality of tiles, based on the division information for the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
  • According to another embodiment of the present disclosure, a decoding apparatus for performing image decoding is provided.
  • The decoding apparatus includes an entropy decoder that derives a partitioning structure of the current picture based on a plurality of tiles, based on the division information for the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
  • According to another embodiment of the present disclosure, a method for encoding an image performed by an encoding apparatus includes the step of generating partition information for a current picture based on a plurality of tiles, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
  • According to another embodiment of the present disclosure, an encoding apparatus for performing image encoding divides a current picture into a plurality of tiles and includes an image partitioner that generates partition information for the current picture based on the plurality of tiles, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
  • a computer-readable digital storage medium for storing encoded image information that causes an image decoding method to be performed by a decoding device.
  • The decoding method according to an embodiment of the present disclosure includes the step of deriving a partitioning structure of the current picture based on a plurality of tiles, based on the partitioning information on the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
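The grouping described above — tile groups that may collect tiles which are not adjacent to each other — can be illustrated with a small sketch. This is a hypothetical toy model, not the patent's signaling syntax: `group_map` stands in for the signaled tile-to-group assignment, and tile indices follow raster order over the tile grid.

```python
# Hypothetical sketch: grouping tile indices into tile groups, where a
# group may contain tiles that are NOT adjacent to each other in the
# picture. Tile indices are raster-scan indices over a 4x2 tile grid.

def group_tiles(num_tiles, group_map):
    """Partition tile indices 0..num_tiles-1 into groups.

    group_map: dict mapping tile index -> group id (stand-in for the
    signaled tile-group information). Returns group id -> list of tiles.
    """
    groups = {}
    for tile_idx in range(num_tiles):
        gid = group_map[tile_idx]
        groups.setdefault(gid, []).append(tile_idx)
    return groups

# Example: tiles 0 and 5 form group 0 even though they are not adjacent
# (tile 0 is top-left, tile 5 is in the second row of a 4x2 grid).
mapping = {0: 0, 1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1}
print(group_tiles(8, mapping))  # {0: [0, 5], 1: [1, 2, 3, 4, 6, 7]}
```

The point of the example is only that the group id, not spatial adjacency, determines membership — which is what allows a tile group to contain non-adjacent tiles.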
  • FIG. 1 schematically shows an example of a video/video coding system to which this disclosure can be applied.
  • FIG. 2 is a diagram schematically illustrating a configuration of a video/video encoding apparatus to which the present disclosure can be applied.
  • FIG. 3 is a diagram schematically illustrating a configuration of a video/video decoding apparatus to which the present disclosure can be applied.
  • FIG. 4 exemplarily shows a hierarchical structure for coded data.
  • FIG. 5 is a diagram showing an example of partitioning a picture.
  • FIG. 6 is a flowchart illustrating a procedure for encoding a picture based on a tile and/or a tile group according to an embodiment.
  • FIG. 7 is a flowchart illustrating a picture decoding procedure based on a tile and/or a tile group according to an embodiment.
  • FIG. 8 is a diagram showing an example of partitioning a picture into a plurality of tiles.
  • FIG. 9 is a block diagram showing the configuration of an encoding apparatus according to an embodiment.
  • FIG. 10 is a block diagram showing the configuration of a decoding apparatus according to an embodiment.
  • FIG. 11 is a diagram showing an example of a tile and a tile group unit constituting a current picture.
  • FIG. 12 is a diagram schematically showing an example of a signaling structure of tile group information.
  • FIG. 13 is a diagram illustrating an example of a picture in a video conferencing program.
  • FIG. 14 is a diagram illustrating an example of partitioning pictures into tiles or tile groups in a video conferencing program.
  • FIG. 15 is a diagram illustrating an example of partitioning a picture into tiles or tile groups based on MCTS (Motion Constrained Tile Set).
  • MCTS Motion Constrained Tile Set
  • FIG. 16 is a diagram illustrating an example of dividing a picture based on an ROI area.
  • FIG. 17 is a diagram showing an example of partitioning a picture into a plurality of tiles.
  • FIG. 18 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
  • FIG. 19 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
  • FIG. 20 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
  • FIG. 21 is a flow chart showing the operation of the decoding apparatus according to an embodiment.
  • FIG. 22 is a block diagram showing a configuration of a decoding apparatus according to an embodiment.
  • FIG. 23 is a flow chart showing the operation of the encoding device according to an embodiment.
  • FIG. 24 is a block diagram showing the configuration of an encoding apparatus according to an embodiment.
  • FIG. 25 shows an example of a content streaming system to which the disclosure of this document can be applied.
  • each configuration is implemented with separate hardware or separate software; for example, two or more of each configuration may be combined to form a single configuration.
  • One configuration may be divided into a plurality of configurations.
  • Embodiments in which each configuration is incorporated and/or separated are also included in the scope of the rights of this disclosure, as long as it does not depart from the essence of this disclosure.
  • In this document, "A or B" may mean "only A", "only B", or "both A and B". In other words, "A or B" in this document may be interpreted as "A and/or B".
  • For example, in this document, "A, B or C" may mean "only A", "only B", "only C", or "any combination of A, B and C".
  • A slash (/) or a comma used in this document may mean "and/or". For example, "A/B" may mean "A and/or B". Accordingly, "A/B" may mean "only A", "only B", or "both A and B". For example, "A, B, C" may mean "A, B or C".
  • For example, when indicated as "prediction (intra prediction)", "intra prediction" may be proposed as an example of "prediction". In other words, "prediction" in this document is not limited to "intra prediction", and "intra prediction" may be proposed as an example of "prediction". In addition, even when indicated as "prediction (i.e., intra prediction)", "intra prediction" may be proposed as an example of "prediction".
  • FIG. 1 schematically shows an example of a video/video coding system to which this disclosure can be applied.
  • a video/video coding system may include a first device (source device) and a second device (receive device).
  • The source device can transmit the encoded video/image information or data to the receiving device through a digital storage medium or a network in the form of a file or streaming.
  • the source device may include a video source, an encoding device, and a transmission unit.
  • the receiving device may include a receiver, a decoding device, and a renderer.
  • the encoding device may be referred to as a video/image encoding device, and the decoding device may be referred to as a video/image decoding device.
  • the transmitter may be included in the encoding device.
  • the receiver may be included in the decoding device.
  • the renderer may include a display unit, and the display unit may be composed of separate devices or external components.
  • A video source can acquire a video/image through a process of capturing, synthesizing, or generating a video/image.
  • The video source can include a video/image capture device and/or a video/image generation device. A video/image capture device can include, for example, one or more cameras, video/image archives containing previously captured video/images, and the like. A video/image generation device can include, for example, computers, tablets, and smartphones, and can (electronically) generate video/images. For example, a virtual video/image can be generated through a computer or the like, in which case the video/image capture process may be replaced by a process in which related data is generated.
  • the encoding device can encode the input video/video.
  • the encoding device can perform a series of procedures such as prediction, transformation, and quantization for compression and coding efficiency.
  • The encoded data (encoded video/image information) can be output in the form of a bitstream.
  • The transmitter can transmit the encoded video/image information or data output in the form of a bitstream to the receiver of the receiving device through a digital storage medium or a network in the form of a file or streaming.
  • the digital storage medium can include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc.
  • the transmission unit may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcasting/communication network.
  • The receiver may receive/extract the bitstream and transmit it to the decoding apparatus.
  • The decoding apparatus can decode the video/image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding apparatus.
  • The renderer can render the decoded video/image.
  • the rendered video/video can be displayed through the display unit.
  • This document relates to video/image coding. For example, the methods/embodiments disclosed in this document may be applied to methods disclosed in the versatile video coding (VVC) standard, the essential video coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the 2nd generation of audio video coding standard (AVS2), or a next-generation video/image coding standard (e.g., H.267 or H.268, etc.).
  • VVC versatile video coding
  • EVC essential video coding
  • AV1 AOMedia Video 1
  • AVS2 2nd generation of audio video coding standard
  • next-generation video/image coding standard (e.g., H.267 or H.268, etc.)
  • video refers to a series of images over time.
  • a picture generally refers to a unit representing an image in a specific time period, and a slice/tile is a unit constituting a part of a picture in coding.
  • A tile can contain one or more CTUs (coding tree units); a picture can consist of one or more slices/tiles.
  • A tile is a rectangular region of CTUs within a particular tile column and a particular tile row in a picture.
  • The tile column is a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set.
  • The tile row is a rectangular region of CTUs having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture.
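The tile column and tile row definitions above can be sketched numerically. This is an illustrative toy, not the actual picture-parameter-set syntax: the per-column widths and per-row heights stand in for the values the PPS syntax elements would specify, and everything is measured in CTU units.

```python
# Illustrative sketch (not the actual VVC/PPS syntax): deriving each
# tile's position and size in CTU units from the per-column widths and
# per-row heights that the picture parameter set would signal.

def tile_boundaries(col_widths_ctu, row_heights_ctu):
    """Return (x, y, w, h) in CTU units for each tile, row by row."""
    tiles = []
    y = 0
    for h in row_heights_ctu:
        x = 0
        for w in col_widths_ctu:
            tiles.append((x, y, w, h))
            x += w
        y += h
    return tiles

# A picture 6 CTUs wide and 4 CTUs tall, split into 2 tile columns
# (widths 4 and 2) and 2 tile rows (heights 3 and 1):
print(tile_boundaries([4, 2], [3, 1]))
# [(0, 0, 4, 3), (4, 0, 2, 3), (0, 3, 4, 1), (4, 3, 2, 1)]
```

Note how every tile in one column shares its width and every tile in one row shares its height, which is exactly what the definitions above require.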
  • A tile scan is a specific sequential ordering of CTUs partitioning the picture, in which the CTUs are ordered consecutively in a CTU raster scan within a tile, and tiles within a picture are ordered consecutively in a raster scan of the tiles of the picture.
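The two-level ordering just described (raster scan of CTUs inside each tile, raster scan of tiles inside the picture) can be made concrete with a small sketch. All names and the example grid are illustrative assumptions, not the standard's derivation.

```python
# Sketch of a tile scan: tiles are visited in raster order over the
# picture, and within each tile the CTUs are visited in raster order.
# Returns the CTU raster addresses (y * picture_width_in_ctus + x) in
# tile-scan order.

def tile_scan_order(pic_w_ctu, pic_h_ctu, col_widths, row_heights):
    order = []
    y0 = 0
    for th in row_heights:          # raster scan over tile rows
        x0 = 0
        for tw in col_widths:       # raster scan over tile columns
            for y in range(y0, y0 + th):      # CTU raster scan
                for x in range(x0, x0 + tw):  # within the tile
                    order.append(y * pic_w_ctu + x)
            x0 += tw
        y0 += th
    assert len(order) == pic_w_ctu * pic_h_ctu  # every CTU visited once
    return order

# A 4x2-CTU picture with two 2-CTU-wide tile columns and one tile row:
print(tile_scan_order(4, 2, [2, 2], [2]))  # [0, 1, 4, 5, 2, 3, 6, 7]
```

The output differs from a plain picture raster scan ([0..7]) precisely because the left tile's CTUs (0, 1, 4, 5) are exhausted before the right tile's begin.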
  • Tile groups and slices can be used interchangeably in this document. For example, in this document, a tile group/tile group header may be called a slice/slice header.
  • a picture can be divided into two or more subpictures.
  • A subpicture can be a rectangular region of one or more slices within a picture.
  • a pixel or pel may mean the smallest unit constituting a picture (or image).
  • 'Sample' may be used as a term corresponding to a pixel.
  • A sample can generally represent a pixel or a pixel value; it can represent only the pixel/pixel value of the luma component, or only the pixel/pixel value of the chroma component.
  • a unit can represent the basic unit of image processing.
  • a unit can contain at least one of a specific area of a picture and information related to that area.
  • A unit may include one luma block and two chroma (e.g., cb, cr) blocks.
  • a unit may be used interchangeably with terms such as block or area in some cases.
  • the MxN block may include a set (or array) of samples (or sample array) or transform coefficients consisting of M columns and N rows.
  • FIG. 2 is a diagram schematically illustrating the configuration of a video/image encoding apparatus to which the present disclosure can be applied.
  • the video encoding device may include an image encoding device.
  • The encoding apparatus 200 may include an image partitioner 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270.
  • The predictor 220 may include an inter predictor 221 and an intra predictor 222.
  • the residual processing unit 230 includes a transform unit 232, a quantizer 233, an inverse quantizer 234, and an inverse transform unit ( An inverse transformer 235 may be included.
  • the residual processing unit 230 may further include a subtractor 231.
  • The adder 250 may be called a reconstructor or a reconstructed block generator.
  • The entropy encoder 240, the adder 250, and the filter 260 may be configured by one hardware component (e.g., an encoder chipset or processor) according to an embodiment.
  • the memory 270 may include a decoded picture buffer (DPB), and may be configured by a digital storage medium.
  • DPB decoded picture buffer
  • The hardware component may further include the memory 270 as an internal/external component.
  • The image partitioner 210 can divide an input image (or picture, frame) input to the encoding apparatus 200 into one or more processing units.
  • For example, the processing unit may be referred to as a coding unit (CU). In this case, the coding unit can be recursively divided from the coding tree unit (CTU) or the largest coding unit (LCU) according to the quad-tree binary-tree ternary-tree (QTBTTT) structure.
  • LCU largest coding unit
  • QTBTTT Quad-tree binary-tree ternary-tree
  • One coding unit can be divided into a plurality of coding units of deeper depth based on a quad tree structure, a binary tree structure, and/or a ternary tree structure. In this case, for example, the quad tree structure is applied first, and the binary tree structure and/or ternary tree structure may be applied later. Alternatively, the binary tree structure may be applied first.
  • The coding procedure according to this disclosure may be performed based on the final coding unit that is no longer divided. In this case, based on coding efficiency according to image characteristics and the like, the largest coding unit may be used directly as the final coding unit, or, if necessary, the coding unit may be recursively divided into coding units of deeper depth so that a coding unit of an optimal size is used as the final coding unit.
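The recursive splitting described above can be sketched in code. This is a toy model of the QTBTTT idea, not an encoder: the `decide` function stands in for the encoder's split decision (which in practice comes from a rate-distortion search), and the split geometries follow the quad, binary, and ternary patterns named in the text.

```python
# Toy sketch of QTBTTT-style recursive splitting: each block is either
# kept as a final coding unit (leaf) or split by a quad, binary, or
# ternary split. `decide` is a stand-in for the encoder's RD decision.

def split_block(x, y, w, h, decide):
    """Recursively split block (x, y, w, h). decide() returns one of
    None (leaf), 'qt', 'bt_h', 'bt_v', 'tt_h', 'tt_v'."""
    mode = decide(x, y, w, h)
    if mode is None:
        return [(x, y, w, h)]
    if mode == 'qt':           # quad split: four half-size blocks
        hw, hh = w // 2, h // 2
        parts = [(x, y, hw, hh), (x + hw, y, hw, hh),
                 (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif mode == 'bt_v':       # vertical binary split: two half widths
        parts = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    elif mode == 'bt_h':       # horizontal binary split: two half heights
        parts = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    elif mode == 'tt_v':       # vertical ternary split: 1/4, 1/2, 1/4
        q = w // 4
        parts = [(x, y, q, h), (x + q, y, w // 2, h), (x + 3 * q, y, q, h)]
    else:                      # 'tt_h': 1/4, 1/2, 1/4 heights
        q = h // 4
        parts = [(x, y, w, q), (x, y + q, w, h // 2), (x, y + 3 * q, w, q)]
    leaves = []
    for p in parts:
        leaves.extend(split_block(*p, decide))
    return leaves

# Toy decision: quad-split the 64x64 CTU once, then binary-split the
# top-left 32x32 quadrant vertically; everything else is a leaf.
def decide(x, y, w, h):
    if (w, h) == (64, 64):
        return 'qt'
    if (x, y, w, h) == (0, 0, 32, 32):
        return 'bt_v'
    return None

print(split_block(0, 0, 64, 64, decide))
# [(0, 0, 16, 32), (16, 0, 16, 32), (32, 0, 32, 32),
#  (0, 32, 32, 32), (32, 32, 32, 32)]
```

The leaves are the "final coding units" in the sense of the paragraph above: blocks the recursion no longer divides.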
  • The processing unit may further include a prediction unit (PU) or a transform unit (TU).
  • PU Prediction Unit
  • TU Transform Unit
  • the prediction unit and the transformation unit are each divided from the final coding unit described above.
  • the prediction unit may be a unit of sample prediction
  • The transform unit may be a unit for deriving a transform coefficient and/or a unit for deriving a residual signal from the transform coefficient.
  • The unit may be used interchangeably with terms such as block or area in some cases.
  • an MxN block can represent a set of samples or transform coefficients consisting of M columns and N rows.
  • A sample can typically represent a pixel or a pixel value; it can represent only the pixel/pixel value of the luma component, or only the pixel/pixel value of the chroma component.
  • A sample may be used as a term corresponding to one picture (or image), a pixel, or a pel.
  • The encoding apparatus 200 can generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, prediction sample array) output from the inter predictor 221 or the intra predictor 222 from the input video signal (original block, original sample array), and the generated residual signal is transmitted to the transformer 232.
  • In this case, the unit that subtracts the prediction signal (predicted block, prediction sample array) from the input video signal (original block, original sample array) may be called a subtractor 231.
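The subtractor's operation is simple arithmetic and can be shown directly. A minimal numeric sketch (toy 2x2 blocks, plain lists rather than any codec data structure):

```python
# Minimal sketch of the subtractor: the residual sample array is the
# element-wise difference between the original block and the predicted
# block.

def residual(original, predicted):
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, predicted)]

orig = [[100, 102], [98, 101]]
pred = [[99, 100], [100, 100]]
print(residual(orig, pred))  # [[1, 2], [-2, 1]]
```

The small magnitudes of the residual relative to the original samples illustrate why coding the residual (after transform and quantization) is cheaper than coding the samples themselves.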
  • The predictor may perform prediction on a block to be processed (hereinafter referred to as the current block) and generate a predicted block including prediction samples for the current block.
  • the prediction unit may determine whether intra prediction or inter prediction is applied in units of the current block or CU. .
  • the prediction unit may generate various types of information related to prediction, such as prediction mode information, as described later in the description of each prediction mode, and transmit it to the entropy encoding unit 240.
  • The information on prediction may be encoded in the entropy encoder 240 and output in the form of a bitstream.
  • the intra prediction unit 222 may predict the current block by referring to samples in the current picture.
  • The referenced samples may be located in the neighborhood of the current block or may be located apart from it, according to the prediction mode.
  • the prediction modes can include a plurality of non-directional modes and a plurality of directional modes.
  • The non-directional modes may include, for example, a DC mode and a planar mode.
  • The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes. However, this is an example, and more or fewer directional prediction modes may be used depending on the setting.
  • the intra prediction unit 222 may determine a prediction mode to be applied to the current block by using the prediction mode applied to the surrounding block.
  • The inter predictor 221 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on the reference picture.
  • In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block.
  • the motion information may include a motion vector and a reference picture index.
  • In the case of inter prediction, the motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information.
  • In the case of inter prediction, the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture.
  • The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different.
  • The temporal neighboring block may be called a collocated reference block, a co-located CU (colCU), or the like, and the reference picture including the temporal neighboring block may be called a collocated picture (colPic).
  • colPic collocated picture
  • For example, the inter predictor 221 may construct a motion information candidate list based on the neighboring blocks, and generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block.
  • Inter prediction may be performed based on various prediction modes.
  • For example, in the case of the skip mode and the merge mode, the inter predictor 221 may use the motion information of the neighboring block as the motion information of the current block.
  • In the case of the skip mode, unlike the merge mode, the residual signal may not be transmitted.
  • In the case of the motion vector prediction (MVP) mode, the motion vector of the neighboring block is used as a motion vector predictor, and the motion vector of the current block can be indicated by signaling the motion vector difference.
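The MVP relationship just described reduces to one addition per component. A minimal sketch (the variable names are illustrative, not codec syntax elements):

```python
# Sketch of the MVP mode relationship: the decoder reconstructs the
# current block's motion vector from the neighboring block's motion
# vector (used as the predictor) plus the signaled motion vector
# difference.

def reconstruct_mv(mvp, mvd):
    """mvp, mvd: (x, y) tuples; returns the current block's motion vector."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mvp = (4, -2)   # motion vector predictor taken from a neighboring block
mvd = (-1, 3)   # signaled motion vector difference
print(reconstruct_mv(mvp, mvd))  # (3, 1)
```

Since motion vectors of neighboring blocks are usually close to the current block's, the difference (mvd) is typically small and therefore cheap to signal.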
  • the prediction unit 220 may generate a prediction signal based on various prediction methods to be described later.
  • The predictor may apply intra prediction or inter prediction for prediction of one block, and may also apply intra prediction and inter prediction at the same time. This may be referred to as combined inter and intra prediction (CIIP).
  • IBC intra block copy
  • The predictor may also perform prediction based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or the palette mode can be used for content video/image coding such as games, for example, screen content coding (SCC).
  • IBC basically performs prediction within the current picture, but can be performed similarly to inter prediction in that it derives a reference block within the current picture. That is, IBC can use at least one of the inter prediction techniques described in this document.
  • The palette mode can be seen as an example of intra coding or intra prediction.
  • When the palette mode is applied, a sample value in the picture can be signaled based on information about the palette table and the palette index.
  • The prediction signal generated through the predictor may be used to generate a reconstructed signal or may be used to generate a residual signal.
  • the transform unit 232 may generate transform coefficients by applying a transform method to the residual signal.
  • The transform method may include at least one of DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), KLT (Karhunen-Loève Transform), GBT (Graph-Based Transform), and CNT (Conditionally Non-linear Transform).
  • Here, CNT refers to a transform obtained based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks of the same size, or may be applied to blocks of variable size that are not square.
  • The quantizer 233 may quantize the transform coefficients and transmit them to the entropy encoder 240.
  • The entropy encoder 240 may encode the quantized signal (information on the quantized transform coefficients) and output it as a bitstream.
  • the information on the quantized transformation coefficients may be referred to as residual information.
  • The quantizer 233 may rearrange the quantized transform coefficients of the block form into a one-dimensional vector form based on a coefficient scan order, and may generate information on the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form.
  • The entropy encoder 240 may perform various encoding methods such as, for example, exponential Golomb, CAVLC (context-adaptive variable length coding), and CABAC (context-adaptive binary arithmetic coding). The entropy encoder 240 may encode information necessary for video/image reconstruction (e.g., values of syntax elements) together with or separately from the quantized transform coefficients. Encoded information (e.g., encoded video/image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream.
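The block-to-vector rearrangement mentioned above depends on a coefficient scan order. As an illustration only (the actual scan order is defined by the codec, not by this sketch), here is a simple anti-diagonal scan of a quantized coefficient block:

```python
# Illustrative coefficient scan: rearranging a 2D block of quantized
# transform coefficients into a one-dimensional vector. A simple
# anti-diagonal scan is shown; real codecs define their own scan orders.

def diagonal_scan(block):
    """Scan an NxN block along anti-diagonals into a 1D list."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):      # s = x + y indexes each anti-diagonal
        for y in range(n):
            x = s - y
            if 0 <= x < n:
                out.append(block[y][x])
    return out

# Typical post-quantization block: large values near the top-left
# (low frequencies), zeros toward the bottom-right.
blk = [[9, 5, 1],
       [4, 2, 0],
       [1, 0, 0]]
print(diagonal_scan(blk))  # [9, 5, 4, 1, 2, 1, 0, 0, 0]
```

Note how the scan front-loads the significant coefficients and pushes the trailing zeros to the end of the vector, which is what makes the subsequent entropy coding of the 1D vector efficient.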
  • The video/image information may further include information about various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS).
  • the video/video information may further include general constraint information.
  • In this document, information and/or syntax elements transmitted/signaled from the encoding apparatus to the decoding apparatus may be included in the video/image information.
  • the video/video information may be encoded through the above-described encoding procedure and included in the bitstream.
  • the bitstream may be transmitted through a network or may be stored in a digital storage medium.
  • the network is a broadcasting network and/or
  • the digital storage medium may include a variety of storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc.
  • A transmitter (not shown) that transmits the signal output from the entropy encoder 240 and/or a storage unit (not shown) that stores it may be configured as internal/external elements of the encoding apparatus 200, or the transmitter may be included in the entropy encoder 240.
  • the quantized transformation coefficients output from the quantization unit 233 can be used to generate a predicted signal.
  • For example, the inverse quantizer 234 and the inverse transformer 235 can reconstruct the residual signal (residual block or residual samples) by applying inverse quantization and inverse transform to the quantized transform coefficients.
  • The adder 250 can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter predictor 221 or the intra predictor 222.
  • If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block can be used as a reconstructed block.
  • The adder 250 may be referred to as a reconstructor or a reconstructed block generator.
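The adder's operation, including the skip-mode case described above, can be shown with a minimal numeric sketch (toy 2x2 blocks, plain lists):

```python
# Sketch of the adder: the reconstructed block is the predicted block
# plus the reconstructed residual; when no residual is transmitted
# (e.g., skip mode), the predicted block itself is the reconstructed
# block.

def reconstruct(predicted, residual=None):
    if residual is None:            # skip mode: no residual transmitted
        return [row[:] for row in predicted]
    return [[p + r for p, r in zip(rp, rr)]
            for rp, rr in zip(predicted, residual)]

pred = [[99, 100], [100, 100]]
res  = [[1, 2], [-2, 1]]
print(reconstruct(pred, res))   # [[100, 102], [98, 101]]
print(reconstruct(pred))        # [[99, 100], [100, 100]]
```

With the residual applied, the output matches the original samples up to quantization error; in the skip-mode case the prediction is used as-is.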
  • the generated restoration signal may be used for intra prediction of the next processing target block in the current picture, and inter prediction of the next picture through filtering as described below. It can also be used for
  • LMCS luma mapping with chroma scaling
  • The filter 260 can improve subjective/objective image quality by applying filtering to the reconstructed signal.
  • For example, the filter 260 may apply various filtering methods to the reconstructed picture to generate a modified reconstructed picture, and store the modified reconstructed picture in the memory 270, specifically, the DPB of the memory 270.
  • the various filtering methods include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, and bilateral filter.
  • the filtering unit 260 may generate a variety of filtering information and transmit it to the entropy encoding unit 240 as described later in the description of each filtering method.
  • the filtering information may be encoded by the entropy encoding unit 240 and output in the form of a bitstream.
  • the modified reconstructed picture transmitted to the memory 270 may be used as a reference picture in the inter prediction unit 221.
  • Through this, when inter prediction is applied, prediction mismatch between the encoding apparatus 200 and the decoding apparatus can be avoided, and coding efficiency can be improved.
  • the DPB of the memory 270 may store the modified reconstructed picture to be used as a reference picture in the inter prediction unit 221.
  • the memory 270 may store motion information of a block from which motion information in the current picture was derived (or encoded) and/or motion information of blocks in a picture that has already been reconstructed.
  • the stored motion information may be transmitted to the inter prediction unit 221 to be used as motion information of a spatial neighboring block or motion information of a temporal neighboring block.
  • the memory 270 may store reconstructed samples of the reconstructed blocks in the current picture and may transfer them to the intra prediction unit 222.
  • FIG. 3 schematically shows the configuration of a video/image decoding apparatus to which this disclosure can be applied.
  • the decoding apparatus 300 may include an entropy decoding unit 310, a residual processing unit 320, a prediction unit 330, an addition unit 340, a filtering unit 350, and a memory 360.
  • the prediction unit 330 may include an intra prediction unit 331 and an inter prediction unit 332.
  • the residual processing unit 320 may include a dequantizer 321 and an inverse transformer 322.
  • the addition unit 340 and the filtering unit 350 may be configured by one hardware component (for example, a decoder chipset or processor) according to an exemplary embodiment.
  • the memory 360 may include a decoded picture buffer (DPB). In addition, it may be configured by a digital storage medium.
  • the hardware component may further include the memory 360 as an internal/external component.
  • When a bitstream including video/image information is input, the decoding apparatus 300 may reconstruct an image corresponding to the process in which the video/image information was processed by the encoding apparatus of FIG. 2.
  • For example, the decoding apparatus 300 may derive units/blocks based on the block partitioning related information obtained from the bitstream.
  • the decoding apparatus 300 can perform decoding using a processing unit applied in the encoding apparatus. Therefore, the processing unit of decoding may be, for example, a coding unit, and the coding unit can be divided from a coding tree unit or a largest coding unit according to a quadtree structure, a binary tree structure, and/or a ternary tree structure. One or more transform units may be derived from the coding unit. The reconstructed image signal decoded and output through the decoding apparatus 300 can be played back through a playback device.
  • the decoding apparatus 300 may receive the signal output from the encoding apparatus of FIG. 2 in the form of a bitstream.
  • the received signal can be decoded through the entropy decoding unit 310.
  • the entropy decoding unit 310 may parse the bitstream to derive the information (e.g., video/image information) required for image reconstruction (or picture reconstruction).
  • the video/image information may further include information on various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS).
  • the video/image information may further include general constraint information.
  • the decoding apparatus may further decode the picture based on the information on the parameter sets and/or the general constraint information.
  • the signaled/received information and/or syntax elements described later in this document may be decoded through the decoding procedure and obtained from the bitstream.
  • For example, the entropy decoding unit 310 decodes the information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and may output the values of syntax elements required for image reconstruction and the quantized values of transform coefficients related to the residual.
  • More specifically, the CABAC entropy decoding method receives a bin corresponding to each syntax element in the bitstream, determines a context model using the decoding target syntax element information, the decoding information of neighboring blocks and the decoding target block, or the information of symbols/bins decoded in a previous step, predicts the occurrence probability of a bin according to the determined context model, and performs arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element.
  • the CABAC entropy decoding method can update the context model using information of the decoded symbol/bin for the context model of the next symbol/bin after determining the context model.
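The adaptive behavior described above can be loosely illustrated as follows. Note this is a hypothetical exponential-decay update for illustration only, not the actual CABAC state machine, which uses standardized state-transition tables defined in the specification.

```python
# Simplified illustration of an adaptive context model, NOT the actual
# CABAC state machine. A context tracks an estimated probability that
# the next bin is 1 and is updated after each decoded bin, as the text
# describes for the context model of the next symbol/bin.

class ContextModel:
    def __init__(self, p_one=0.5, rate=0.05):
        self.p_one = p_one  # estimated probability that the next bin is 1
        self.rate = rate    # adaptation rate (hypothetical constant)

    def update(self, bin_value):
        # Move the estimate toward the observed bin value.
        self.p_one += self.rate * (bin_value - self.p_one)

ctx = ContextModel()
for b in [1, 1, 1, 1, 1, 1, 1, 1]:
    ctx.update(b)
# After a run of 1s, the model predicts 1 with higher probability.
assert ctx.p_one > 0.5
```

After many identical bins the estimated probability approaches 1, which is what makes arithmetic coding of such bins cheap.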
  • Among the information decoded by the entropy decoding unit 310, information about prediction is provided to the prediction unit (inter prediction unit 332 and intra prediction unit 331), and the residual values on which entropy decoding was performed by the entropy decoding unit 310, that is, the quantized transform coefficients and related parameter information, may be input to the residual processing unit 320.
  • the residual processing unit 320 may derive a residual signal (residual block, residual samples, and residual sample array).
  • In addition, information about filtering among the information decoded by the entropy decoding unit 310 may be provided to the filtering unit 350.
  • Meanwhile, a receiving unit (not shown) that receives the signal output from the encoding apparatus may be further configured as an internal/external element of the decoding apparatus 300, or the receiving unit may be a component of the entropy decoding unit 310.
  • Meanwhile, the decoding apparatus according to this document may be called a video/image/picture decoding apparatus, and the decoding apparatus may be divided into an information decoder (video/image/picture information decoder) and a sample decoder (video/image/picture sample decoder). The information decoder may include the entropy decoding unit 310, and the sample decoder may include at least one of the inverse quantization unit 321, the inverse transform unit 322, the addition unit 340, the filtering unit 350, the memory 360, the inter prediction unit 332, and the intra prediction unit 331.
  • the inverse quantization unit 321 may inverse quantize the quantized transformation coefficients and output the transformation coefficients.
  • the inverse quantization unit 321 may rearrange the quantized transform coefficients into a two-dimensional block form. In this case, the rearrangement may be performed based on the coefficient scan order performed by the encoding apparatus.
  • the inverse quantization unit 321 may perform inverse quantization on the quantized transform coefficients using a quantization parameter (for example, quantization step size information) and obtain transform coefficients.
  • the inverse transform unit 322 obtains a residual signal (residual block, residual sample array) by inverse transforming the transform coefficients.
  • the prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block.
  • the prediction unit can determine whether intra prediction or inter prediction is applied to the current block based on the information about prediction output from the entropy decoding unit 310, and can determine a specific intra/inter prediction mode.
  • the prediction unit 330 may generate a prediction signal based on various prediction methods to be described later.
  • For example, the prediction unit may apply intra prediction or inter prediction for the prediction of one block, and may also apply intra prediction and inter prediction at the same time. This can be referred to as combined inter and intra prediction (CIIP).
  • In addition, the prediction unit may be based on an intra block copy (IBC) prediction mode or a palette mode for the prediction of a block.
  • The IBC prediction mode or the palette mode can be used, for example, for coding of content video/images such as games, e.g., screen content coding (SCC).
  • IBC basically performs prediction within the current picture, but can be performed similarly to inter prediction in that a reference block is derived within the current picture. That is, IBC can use at least one of the inter prediction techniques described in this document.
  • The palette mode can be seen as an example of intra coding or intra prediction. When the palette mode is applied, information on the palette table and the palette index may be included in the video/image information and signaled.
  • the intra prediction unit 331 may predict the current block by referring to samples in the current picture.
  • the referenced samples may be located in the neighborhood of the current block or may be located apart from it, depending on the prediction mode.
  • In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes.
  • the intra prediction unit 331 may determine the prediction mode applied to the current block by using the prediction mode applied to the surrounding block.
  • the inter prediction unit 332 can derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture.
  • In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information can be predicted in units of blocks, sub-blocks, or samples based on the correlation of motion information between a neighboring block and the current block.
  • the motion information may include a motion vector and a reference picture index.
  • In the case of inter prediction, the motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information.
  • the neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture.
  • For example, the inter prediction unit 332 may construct a motion information candidate list based on the neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information about prediction may include information indicating the inter prediction mode for the current block.
  • the addition unit 340 can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the prediction unit (including the inter prediction unit 332 and/or the intra prediction unit 331). If there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block can be used as the reconstructed block.
  • the addition unit 340 may be referred to as a restoration unit or a restoration block generation unit.
  • the generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, may be output through filtering as described later, or may be used for inter prediction of the next picture.
  • Meanwhile, luma mapping with chroma scaling (LMCS) may be applied during the picture decoding process.
  • the filtering unit 350 can improve subjective/objective image quality by applying filtering to the reconstructed signal.
  • For example, the filtering unit 350 may apply various filtering methods to the reconstructed picture to generate a modified reconstructed picture, and store the modified reconstructed picture in the memory 360, specifically in the DPB of the memory 360.
  • the various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like.
  • the (modified) restored picture stored in the DPB of the memory 360 can be used as a reference picture in the inter prediction unit 332.
  • the memory 360 may store motion information of a block from which motion information in the current picture was derived (or decoded) and/or motion information of blocks in a picture that has already been reconstructed.
  • the stored motion information may be transmitted to the inter prediction unit 332 to be used as motion information of a spatial neighboring block or motion information of a temporal neighboring block.
  • the memory 360 can store reconstructed samples of the restored blocks in the current picture and can transfer them to the intra prediction unit 331.
  • In this document, the embodiments described for the filtering unit 260, the inter prediction unit 221, and the intra prediction unit 222 of the encoding apparatus 200 may be applied identically or correspondingly to the filtering unit 350, the inter prediction unit 332, and the intra prediction unit 331 of the decoding apparatus 300, respectively.
  • As described above, prediction is performed to increase compression efficiency, and through this, a predicted block including prediction samples for the current block, which is the block to be coded, can be generated. Here, the predicted block includes prediction samples in the spatial domain (or pixel domain).
  • the predicted block is derived identically in the encoding apparatus and the decoding apparatus, and the encoding apparatus can increase image coding efficiency by signaling to the decoding apparatus not the original sample values of the original block itself but information about the residual between the original block and the predicted block (residual information).
  • the decoding apparatus may derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block and the predicted block, and generate a reconstructed picture including the reconstructed blocks.
  • the residual information may be generated through a transformation and quantization procedure.
  • For example, the encoding apparatus derives a residual block between the original block and the predicted block, derives transform coefficients by performing a transform procedure on the residual samples (residual sample array) included in the residual block, derives quantized transform coefficients by performing a quantization procedure on the transform coefficients, and can signal the related residual information to the decoding apparatus (through a bitstream).
  • the residual information may include information such as value information of the quantized transformation coefficients, location information, transformation technique, transformation kernel, quantization parameter, etc.
  • the decoding apparatus can perform an inverse quantization/inverse transform procedure based on the residual information and derive residual samples (or a residual block).
  • the decoding device can generate a reconstructed picture based on the predicted block and the residual block.
  • the encoding apparatus can also derive a residual block by inverse quantizing/inverse transforming the quantized transform coefficients, for reference for inter prediction of a later picture, and generate a reconstructed picture based on this.
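The residual round trip described above can be sketched numerically. This is a minimal illustration using uniform scalar quantization with a hypothetical step size; the actual codec additionally applies a transform, a coefficient scan, and a standardized quantizer design.

```python
# Minimal sketch of the residual round trip: quantize residual samples,
# inverse-quantize ("dequantize") them, and reconstruct by adding the
# approximate residual back to the prediction. Uniform scalar
# quantization with a hypothetical step size of 2; no transform.

def quantize(residual, step):
    return [round(r / step) for r in residual]

def dequantize(levels, step):
    return [l * step for l in levels]

pred     = [100, 102, 98, 97]                    # prediction samples
orig     = [103, 101, 99, 95]                    # original samples
residual = [o - p for o, p in zip(orig, pred)]   # [3, -1, 1, -2]

levels = quantize(residual, step=2)              # levels actually transmitted
recon_residual = dequantize(levels, step=2)
recon = [p + r for p, r in zip(pred, recon_residual)]

# Reconstruction matches the original within the quantization error
# (at most half the step size here).
assert all(abs(o - r) <= 1 for o, r in zip(orig, recon))
```

The reconstructed samples differ from the originals by at most the quantization error, which is the lossy part of the coding loop.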
  • the coded data can be divided into a video coding layer (VCL) that handles the video/image coding processing itself, and a network abstraction layer (NAL) that exists between the VCL and a sub-system that stores and transmits the coded video/image data.
  • In the VCL, a parameter set corresponding to headers such as a sequence and a picture (picture parameter set (PPS), sequence parameter set (SPS), video parameter set (VPS), etc.) and a supplemental enhancement information (SEI) message additionally required for the coding process of the video/image information can be generated.
  • the SEI message is separated from the video/image information (slice data).
  • the VCL containing the video/image information consists of the slice data and a slice header.
  • the slice header may be referred to as a tile group header, and the slice data may be referred to as tile group data.
  • In the NAL, a NAL unit can be created by adding header information (NAL unit header) to a raw byte sequence payload (RBSP) generated in the VCL.
  • Here, the RBSP refers to the slice data, parameter set, SEI message, etc. generated in the VCL.
  • the NAL unit header may include NAL unit type information specified according to RBSP data included in the corresponding NAL unit.
  • the NAL unit, which is the basic unit of the NAL, plays the role of mapping the coded image onto the bit stream of sub-systems such as a file format, RTP (Real-time Transport Protocol), TS (Transport Stream), etc. according to a predetermined standard.
  • the NAL unit can be divided into a VCL NAL unit and a Non-VCL NAL unit according to the RBSP generated in the VCL.
  • the VCL NAL unit can mean a NAL unit that contains information about the video (slice data), and the Non-VCL NAL unit is a NAL unit that contains the information (parameter set or SEI message) necessary for decoding the video.
  • the VCL NAL unit and the Non-VCL NAL unit may be transmitted through a network with header information attached according to the data standard of the sub-system.
  • For example, the NAL unit can be transformed into a data format of a predetermined standard, such as an H.266/VVC file format, RTP (Real-time Transport Protocol), or TS (Transport Stream), and transmitted through various networks.
  • As described above, the NAL unit type may be specified according to the RBSP data structure included in the NAL unit, and information on the NAL unit type may be stored in the NAL unit header and signaled.
  • the VCL NAL unit type can be classified according to the properties and types of the pictures included in the VCL NAL unit, and the Non-VCL NAL unit type can be classified according to the type of parameter set.
  • The following is an example of NAL unit types specified according to the type of parameter set included in the Non-VCL NAL unit type.
  • APS (Adaptation Parameter Set) NAL unit: a type for a NAL unit including an APS
  • DPS (Decoding Parameter Set) NAL unit: a type for a NAL unit including a DPS
  • VPS (Video Parameter Set) NAL unit: a type for a NAL unit including a VPS
  • SPS (Sequence Parameter Set) NAL unit: a type for a NAL unit including an SPS
  • PPS (Picture Parameter Set) NAL unit: a type for a NAL unit including a PPS
  • The NAL unit type may be specified as any one of the above.
  • NAL unit types have syntax information for the NAL unit type, and the syntax information may be stored in the NAL unit header and signaled.
  • For example, the syntax information may be nal_unit_type, and NAL unit types can be specified by a nal_unit_type value.
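Reading nal_unit_type from a NAL unit header can be sketched as follows. This assumes a two-byte, VVC-style header layout (forbidden_zero_bit, a reserved bit, and nuh_layer_id in the first byte; nal_unit_type and nuh_temporal_id_plus1 in the second); the exact bit layout is defined by the standard revision in use.

```python
# Sketch of parsing a two-byte NAL unit header, assuming a VVC-style
# bit layout. The field positions below are an assumption for
# illustration; consult the applicable specification for the real
# layout of nal_unit_type and the other header fields.

def parse_nal_header(b0: int, b1: int) -> dict:
    return {
        "forbidden_zero_bit": (b0 >> 7) & 0x1,
        "nuh_layer_id":       b0 & 0x3F,         # low 6 bits of byte 0
        "nal_unit_type":      (b1 >> 3) & 0x1F,  # high 5 bits of byte 1
        "temporal_id":        (b1 & 0x07) - 1,   # nuh_temporal_id_plus1 - 1
    }

hdr = parse_nal_header(0x00, (16 << 3) | 1)  # nal_unit_type=16, tid_plus1=1
assert hdr["nal_unit_type"] == 16 and hdr["temporal_id"] == 0
```

A decoder dispatches on the parsed nal_unit_type to decide whether the RBSP carries slice data, a parameter set, or another payload.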
  • Meanwhile, one picture can contain a plurality of slices, and one slice can contain a slice header and slice data. In this case, one picture header may be further added for the plurality of slices (the set of slice headers and slice data) within one picture.
  • the picture header may include information/parameters commonly applicable to the picture.
  • the slice header (slice header syntax) may include information/parameters commonly applicable to the slice.
  • the APS (APS syntax) may include information/parameters commonly applicable to one or more slices or pictures.
  • the PPS (PPS syntax) may include information/parameters commonly applicable to one or more pictures.
  • the SPS (SPS syntax) may include information/parameters commonly applicable to one or more sequences.
  • the VPS may contain information/parameters that are commonly applicable to multiple layers.
  • the DPS (DPS syntax) may include information/parameters commonly applicable to the entire video.
  • the DPS may include information/parameters related to the concatenation of coded video sequences (CVSs).
  • In this document, the high level syntax (HLS) may include at least one of the APS syntax, the PPS syntax, the SPS syntax, the VPS syntax, the DPS syntax, the picture header syntax, and the slice header syntax.
  • In this document, the image/video information that is encoded by the encoding apparatus and signaled to the decoding apparatus in the form of a bitstream includes not only intra-picture partitioning information, intra/inter prediction information, residual information, and in-loop filtering information, but may also include information included in the slice header, information included in the picture header, information included in the APS, information included in the PPS, information included in the SPS, information included in the VPS, and/or information included in the DPS. In addition, the image/video information may further include information of the NAL unit header.
  • FIG. 5 is a diagram showing an example of partitioning a picture.
  • Pictures can be divided into a sequence of coding tree units (CTUs).
  • the CTU can include a coding tree block of luma samples and two coding tree blocks of chroma samples corresponding thereto.
  • the maximum allowable size of a CTU for coding and prediction may be different from the maximum allowable size of a CTU for transform.
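The number of CTUs covering a picture follows from ceiling division of the picture dimensions by the CTU size. A short sketch (the 128x128 CTU size is purely illustrative):

```python
# Sketch: how many CTUs cover a picture. Pictures are conceptually
# padded up to whole CTUs, so the counts are ceiling divisions of the
# picture dimensions by the CTU size (128 is an illustrative size).

def ctu_grid(pic_width: int, pic_height: int, ctu_size: int = 128):
    ctus_x = -(-pic_width // ctu_size)   # ceiling division
    ctus_y = -(-pic_height // ctu_size)
    return ctus_x, ctus_y, ctus_x * ctus_y

# A 1920x1080 picture with 128x128 CTUs: 15 columns and 9 rows; the
# bottom row of CTUs is only partially covered by picture samples.
assert ctu_grid(1920, 1080) == (15, 9, 135)
```

The same ceiling-division idea underlies the CTB-based tile dimensions discussed below.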
  • a tile can correspond to a series of CTUs covering a rectangular area of the picture.
  • a picture can be divided into one or more tile rows and one or more tile columns.
  • a slice may consist of an integer number of complete tiles, or an integer number of consecutive complete CTU rows within a tile, of the picture.
  • Two slice modes, including the raster-scan slice mode and the rectangular slice mode, can be supported.
  • In the rectangular slice mode, a slice can include either a number of complete tiles that collectively form a rectangular region of the picture, or a number of consecutive complete CTU rows of a single tile that collectively form a rectangular region of the picture. Tiles within a rectangular slice are scanned in tile raster scan order within the rectangular region corresponding to that slice.
  • FIG. 5 shows an example of dividing a picture into tiles and rectangular slices.
  • For example, a picture can be divided into 24 tiles (6 tile columns and 4 tile rows) and 9 rectangular slices.
  • Alternatively, for example, a picture can be divided into 4 tiles (2 tile columns and 2 tile rows) and 4 rectangular slices.
  • FIG. 6 is a flowchart illustrating a tile and/or tile group-based picture encoding procedure according to an embodiment.
  • In FIG. 6, the step of deriving a tile/tile group by partitioning a picture (S610) can be performed by the image partitioning unit 210 of the encoding apparatus, and the step of encoding the video/image information including the information about the tiles/tile groups (S620) can be performed by the entropy encoding unit 240 of the encoding apparatus.
  • the encoding apparatus may perform picture partitioning for encoding an input picture (S600).
  • the picture may include one or more tiles/tile groups.
  • the encoding apparatus can partition the picture into various types in consideration of the image characteristics and coding efficiency of the picture, and information indicating the partitioning type with the optimal coding efficiency can be generated and signaled to the decoding apparatus.
  • the encoding apparatus may determine the tile/tile group applied to the picture and generate information about the tile/tile group (S610).
  • the information about the tile/tile group may include information indicating the structure of the tile/tile group for the picture.
  • the information about the tile/tile group can be signaled through various parameter sets and/or a tile group header, as described later. A specific example is described below.
  • the encoding apparatus may encode video/image information including information on the tile/tile group and output it in the form of a bitstream (S620).
  • the bitstream can be delivered to the decoding apparatus through a digital storage medium or a network.
  • the video/image information may include the HLS and/or tile group header syntax described in this document, and may further include prediction information, residual information, and (in-loop) filtering information.
  • For example, the encoding apparatus may apply in-loop filtering after reconstructing the current picture, encode parameters related to the in-loop filtering, and output them in the form of a bitstream.
  • FIG. 7 is a flowchart illustrating a tile and/or tile group-based picture decoding procedure according to an exemplary embodiment.
  • In FIG. 7, the step of obtaining information about a tile/tile group from a bitstream (S700) can be performed by the entropy decoding unit 310 of the decoding apparatus, and the step of deriving a tile/tile group within a picture (S710) and the step of performing tile/tile group-based picture decoding (S720) can be performed by a sample decoder of the decoding apparatus.
  • the decoding apparatus can obtain the information about the tile/tile group from the bitstream (S700).
  • the information on the tile/tile group can be obtained through various parameter sets and/or tile group headers as described later. A specific example will be described later.
  • the decoding apparatus may derive the tile/tile group in the current picture based on the information about the tile/tile group (S710).
  • the decoding apparatus may decode the current picture based on the tile/tile group (S720). For example, the decoding apparatus may derive the CTUs/CUs located in a tile and perform an inter/intra prediction procedure, a residual processing procedure, a reconstructed block (picture) generation procedure, and/or an in-loop filtering procedure based on them. In this case, for example, the decoding apparatus may initialize the context model/information in units of tiles/tile groups. In addition, if a neighboring block or a neighboring sample referenced during inter/intra prediction is located in a tile different from the current tile in which the current block is located, the decoding apparatus may treat the neighboring block or neighboring sample as not available.
  • FIG. 8 is a diagram showing an example of partitioning a picture into a plurality of tiles.
  • tiles may refer to areas within a picture that are defined by a set of vertical and/or horizontal boundaries that divide the picture into a plurality of rectangles.
  • FIG. 8 shows an example of dividing one picture 700 into multiple tiles based on multiple column boundaries 810 and row boundaries 820.
  • Each tile can contain an integer number of CTUs (coding tree units) to be processed in raster scan order within that tile.
  • Likewise, the multiple tiles within a picture, including each of the above tiles, can also be processed in raster scan order within the picture.
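This two-level ordering, CTUs in raster order within each tile and tiles in raster order within the picture, can be sketched as follows; the 4x2-CTU picture split into 2x2-CTU tiles is purely illustrative.

```python
# Sketch of the two-level scan described above: tiles are visited in
# raster order within the picture, and within each tile the CTUs are
# visited in raster order. CTU addresses are (x, y) positions in the
# picture's CTU grid; the tile dimensions here are illustrative.

def tile_scan_order(pic_w_ctus, pic_h_ctus, tile_w, tile_h):
    order = []
    for ty in range(0, pic_h_ctus, tile_h):      # tile rows, top to bottom
        for tx in range(0, pic_w_ctus, tile_w):  # tile columns, left to right
            for y in range(ty, min(ty + tile_h, pic_h_ctus)):
                for x in range(tx, min(tx + tile_w, pic_w_ctus)):
                    order.append((x, y))
    return order

# 4x2 CTUs split into two 2x2-CTU tiles: the whole left tile is
# processed before any CTU of the right tile.
assert tile_scan_order(4, 2, 2, 2) == [
    (0, 0), (1, 0), (0, 1), (1, 1),   # tile 0
    (2, 0), (3, 0), (2, 1), (3, 1),   # tile 1
]
```

Note that this differs from a plain picture-wide raster scan, which would interleave CTUs of horizontally adjacent tiles.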
  • the tiles can be grouped to form tile groups, and tiles within a single tile group can be raster scanned. Dividing a picture into tiles can be defined based on the syntax and the semantics of the picture parameter set (PPS).
  • the information about tiles derived from the PPS may be used to check (or read) the following items. First, it can be checked whether only one tile exists in the picture or more than one. If more than one tile is present, it can be checked whether the tiles are uniformly distributed, the dimensions of the tiles can be checked, and it can be checked whether the loop filter is enabled.
  • the PPS may signal a syntax element single_tile_in_pic_flag first.
  • the single_tile_in_pic_flag may indicate whether only one tile exists in the picture or a plurality of tiles exist. When a plurality of tiles exist in the picture, the decoding apparatus can parse information about the number of tile columns and tile rows using the syntax elements num_tile_columns_minus1 and num_tile_rows_minus1.
  • num_tile_columns_minus1 and num_tile_rows_minus1 can specify the process of dividing the picture into tile columns and tile rows.
  • The heights of the tile rows and the widths of the tile columns can be expressed in units of CTBs (coding tree blocks).
  • An additional flag can be parsed to check whether the tiles in the picture are uniformly spaced. If they are not, the number of CTBs per tile can be explicitly signaled for each tile row and tile column boundary (i.e., the number of CTBs in each tile column and the number of CTBs in each tile row can be signaled). If the tiles are spaced uniformly, the tiles can have the same width and height.
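When the tiles are uniformly spaced, the decoder can derive the tile column widths (and, analogously, the row heights) in CTBs without explicit signaling. The sketch below uses the usual HEVC/VVC-style integer derivation, in which any remainder CTBs are spread across the columns:

```python
# Sketch of deriving uniformly spaced tile column widths in CTBs when
# only the number of columns is signaled. The same derivation applies
# to tile row heights. Integer division spreads any remainder CTBs
# across the columns, so widths differ by at most one CTB.

def uniform_tile_widths(pic_width_in_ctbs: int, num_cols: int):
    return [
        (i + 1) * pic_width_in_ctbs // num_cols
        - i * pic_width_in_ctbs // num_cols
        for i in range(num_cols)
    ]

widths = uniform_tile_widths(10, 3)
assert widths == [3, 3, 4]   # widths differ by at most one CTB
assert sum(widths) == 10     # the columns exactly cover the picture width
```

When the spacing is not uniform, these widths are instead read directly from the explicitly signaled per-column/per-row syntax elements.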
  • Another flag (e.g., the syntax element loop_filter_across_tiles_enabled_flag) can be parsed to determine whether the loop filter is enabled across tile boundaries.
  • Table 1 summarizes examples of main information about tiles that can be derived by parsing the PPS.
  • Table 1 can represent the PPS RBSP syntax.
  • Table 2 below shows an example of semantics for the syntax elements described in Table 1 above.
  • FIG. 9 is a block diagram showing a configuration of an encoding apparatus according to an embodiment.
  • FIG. 10 is a block diagram showing a configuration of a decoding apparatus according to an embodiment.
  • the encoding apparatus 900 shown in FIG. 9 includes a partitioning module 910 and an encoding module 920.
  • The partitioning module 910 can perform the same and/or similar operations as the image partitioning unit 210 of the encoding apparatus shown in FIG. 2, and the encoding module 920 can perform the same and/or similar operations as the entropy encoding unit 240 of the encoding apparatus shown in FIG. 2.
  • The input video can be partitioned in the partitioning module 910 and then encoded in the encoding module 920, after which the encoded video can be output from the encoding apparatus 900.
  • An example of a block diagram of the decoding apparatus is shown in FIG. 10.
  • the decoding apparatus 1000 shown in FIG. 10 includes a decoding module 1010 and a deblocking filter 1020.
  • the decoding module 1010 can perform the same and/or similar operations as the entropy decoding unit 310 of the decoding apparatus shown in FIG. 3, and the deblocking filter 1020 can perform the same and/or similar operations as the filtering unit 350 of the decoding apparatus shown in FIG. 3.
  • the decoding module 1010 can decode the input received from the encoding apparatus 900 to derive information about tiles and determine a processing unit based on the decoded information.
  • the deblocking filter 1020 may apply an in-loop deblocking filter to the processing unit.
  • In-loop filtering may be applied to remove coding artifacts generated during the partitioning process.
  • the in-loop filtering operation may include an adaptive loop filter (ALF), a deblocking filter (DF), a sample adaptive offset (SAO), etc. After the in-loop filtering, the decoded picture can be output.
  • FIG. 11 is a diagram showing an example of a tile and a tile group unit constituting a current picture.
  • tiles can be grouped to form tile groups.
  • FIG. 11 shows an example in which one picture is divided into tiles and tile groups. In FIG. 11, the picture includes 9 tiles and 3 tile groups.
  • Each tile group has a tile group header. A tile group can have a meaning similar to a slice group. Each tile group can be independently coded.
  • a tile group can contain one or more tiles.
  • a tile group header can refer to a PPS, and the PPS can in turn refer to an SPS (Sequence Parameter Set).
  • the tile group header can be used to determine the following information. First, if more than one tile exists per picture, the tile group address and the number of tiles in the tile group can be determined. Next, the tile group type, such as intra/predictive/bi-directional, can be determined. Next, the least significant bits (LSB) of the picture order count (POC) can be determined. Next, if there is more than one tile in one picture, the offset length and the entry points into the tiles can be determined.
  • Table 4 below shows an example of the syntax of the tile group header.
  • the tile group header (tile_group_header) can be replaced by a slice header.
  • Table 5 below shows an example of English semantics for the syntax of the tile group header.
• The tile group header syntax elements tile_group_pic_parameter_set_id and tile_group_pic_order_cnt_lsb shall be the same in all tile group headers of a coded picture.
• tile_group_pic_parameter_set_id specifies the value of pps_pic_parameter_set_id for the PPS in use.
• the value of tile_group_pic_parameter_set_id shall be in the range of 0 to 63, inclusive.
• The TemporalId of the current picture shall be greater than or equal to the value of TemporalId of the PPS that has pps_pic_parameter_set_id equal to tile_group_pic_parameter_set_id.
• tile_group_address specifies the tile address of the first tile in the tile group, where the tile address is the tile ID as specified by Equation c-7.
• the length of tile_group_address is Ceil(Log2(NumTilesInPic)) bits.
• the value of tile_group_address shall be in the range of 0 to NumTilesInPic - 1, inclusive. When tile_group_address is not present, it is inferred to be equal to 0.
• num_tiles_in_tile_group_minus1 plus 1 specifies the number of tiles in the tile group. The value of num_tiles_in_tile_group_minus1 shall be in the range of 0 to NumTilesInPic - 1, inclusive.
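• As an illustration of the bit-length rule above, the following Python sketch computes the number of bits used to code tile_group_address, i.e. Ceil(Log2(NumTilesInPic)); the function name is chosen for illustration only.

```python
import math

def tile_group_address_bits(num_tiles_in_pic: int) -> int:
    """Number of bits used to code tile_group_address:
    Ceil(Log2(NumTilesInPic)), per the semantics above."""
    assert num_tiles_in_pic >= 1
    if num_tiles_in_pic == 1:
        return 0  # Log2(1) = 0 bits; the address is then inferred to be 0
    return math.ceil(math.log2(num_tiles_in_pic))
```

For the 9-tile picture of FIG. 11 this gives 4 bits per tile group address.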
• tile_group_type specifies the coding type of the tile group according to Table 6.
• When nal_unit_type is equal to IRAP_NUT, i.e., the picture is an IRAP picture, tile_group_type shall be equal to 2.
• tile_group_pic_order_cnt_lsb specifies the picture order count modulo MaxPicOrderCntLsb for the current picture.
• the length of the tile_group_pic_order_cnt_lsb syntax element is log2_max_pic_order_cnt_lsb_minus4 + 4 bits.
• the value of tile_group_pic_order_cnt_lsb shall be in the range of 0 to MaxPicOrderCntLsb - 1, inclusive.
• offset_len_minus1 plus 1 specifies the length, in bits, of the entry_point_offset_minus1[i] syntax elements.
• the value of offset_len_minus1 shall be in the range of 0 to 31, inclusive.
• entry_point_offset_minus1[i] plus 1 specifies the i-th entry point offset in bytes, and is represented by offset_len_minus1 plus 1 bits.
• the tile group data that follows the tile group header consists of num_tiles_in_tile_group_minus1 + 1 subsets, with subset index values ranging from 0 to num_tiles_in_tile_group_minus1, inclusive.
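• The subset structure above can be illustrated with a short Python sketch that splits tile group data into per-tile byte ranges from the entry point offsets. The convention that entry_point_offset_minus1[i] + 1 gives the byte size of subset i, with the last subset taking the remaining bytes, follows the HEVC-style entry-point design and is an assumption of this sketch.

```python
def subset_byte_ranges(entry_point_offset_minus1, total_bytes):
    """Split tile group data of total_bytes into per-tile subsets.
    entry_point_offset_minus1[i] + 1 is assumed to be the size in bytes
    of subset i; the final subset takes the remaining bytes."""
    ranges, start = [], 0
    for off in entry_point_offset_minus1:
        size = off + 1
        ranges.append((start, start + size))
        start += size
    ranges.append((start, total_bytes))  # last subset: remainder
    return ranges
```

With two signaled offsets, three tiles of a tile group can be located and decoded in parallel from their byte ranges.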
  • the tile group may include a tile group header and tile group data.
• once the tile group address is known, the position of each individual CTU in each tile group can be mapped and decoded.
• Table 7 shows an example of the syntax of tile group data. In Table 7, tile group data can be replaced with slice data.
  • Table 8 below shows an example of English semantics for the syntax of the tile group data.
• ctbY = ctbAddrRs / PicWidthInCtbsY
• NumCtusInTile[tileIdx] = ColWidth[i] * RowHeight[j]
• tileStartFlag = 0
• tileStartFlag = 1
  • Some implementations running on CPUs require dividing the source picture into tiles and tile groups, where each tile group can be processed in parallel on a separate core.
• such parallel processing enables high-resolution real-time encoding of video.
• the above parallel processing can reduce the sharing of information between tile groups, thereby reducing the memory constraint. Tiles can be distributed to different threads while processing in parallel; therefore, a parallel architecture can benefit from this partitioning mechanism.
• next, maximum transmission unit (MTU) size matching is reviewed.
• coded pictures transmitted through a network are subject to fragmentation when they are larger than the MTU size. Conversely, if the coded segments are small, the overhead of the IP (Internet Protocol) header can become significant. Packet fragmentation can lead to a loss of error resiliency. To mitigate the effects of packet fragmentation, the picture can be divided into tiles and each tile/tile group packed as a separate packet, so that each packet is smaller than the MTU size.
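• A minimal Python sketch of the MTU-matching idea above: coded tiles are greedily packed into packets that stay at or below an assumed MTU payload size. The greedy policy and all names are illustrative, not part of the disclosure.

```python
def pack_tiles_into_packets(tile_sizes, mtu_payload):
    """Greedy packetization: each packet carries whole coded tiles and
    stays at or below the MTU payload size, avoiding IP fragmentation.
    A single tile larger than the MTU still gets its own packet (and
    would need fragmentation anyway)."""
    packets, current, used = [], [], 0
    for i, size in enumerate(tile_sizes):
        if current and used + size > mtu_payload:
            packets.append(current)   # close the current packet
            current, used = [], 0
        current.append(i)             # tile index i goes in this packet
        used += size
    if current:
        packets.append(current)
    return packets
```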
• accordingly, a method for efficiently signaling the structure of tiles for partitioning pictures is required, which will be described in detail with reference to FIGS. 13 to 21.
  • FIG. 13 is a diagram showing an example of a picture in a video conference video program.
  • flexible tiling can be achieved by using a predefined rectangular area.
• FIG. 13 shows an example of a picture in a video program for video conferencing in which a participant holds a video conference with several other participants.
• the participants can be represented as Speaker 1, Speaker 2, Speaker 3, and Speaker 4.
• the area corresponding to each participant in the picture can correspond to each of the preset areas, and each of the preset areas can be coded as a single tile or a group of tiles.
  • the single tile or group of tiles corresponding to the participant may also change.
  • FIG. 14 is a diagram showing an example of partitioning a picture into tiles or tile groups in a video conference video program.
  • an area allocated to Speaker 1 participating in a video conference may be coded as a single tile.
  • the areas assigned to each of Speaker 2, Speaker 3, and Speaker 4 can be coded as a single tile.
  • FIG. 15 is a diagram illustrating an example of partitioning a picture into tiles or tile groups based on MCTS (Motion Constrained Tile Set).
  • a picture can be obtained from 360 degree video data.
• 360 video can mean video or image content that is captured or played back in all directions (360 degrees) at the same time, as required to provide VR (Virtual Reality).
• 360 video can refer to a video or image that appears in various types of 3D space according to the 3D model.
  • a 360 video can be displayed on a spherical surface.
• a two-dimensional (2D) picture obtained from 360-degree video data can be encoded with at least one spatial resolution.
• a picture can be encoded with a first resolution and a second resolution, and the first resolution may be higher than the second resolution.
• a picture can be encoded in two spatial resolutions, each having a size of 1536x1536 and 768x768, but the spatial resolution is not limited thereto and may correspond to various sizes.
  • a 6x4 size tile grid may be used for the bitstreams encoded at each of the two spatial resolutions.
• a motion constrained tile set (MCTS) for each position of the tiles can be coded and used.
  • each of the MCTSs may include tiles positioned in respective areas set in a picture.
• an MCTS may contain at least one tile to form a rectangular tile set.
  • a tile can represent a rectangular area composed of coding tree blocks (CTBs) of a two-dimensional picture.
  • a tile can be classified based on a specific tile row and tile column within a picture.
• when inter prediction is performed on the blocks inside a specific MCTS in the encoding/decoding process, the blocks in the specific MCTS may be restricted to refer only to the corresponding MCTS of the reference picture for motion estimation/motion compensation.
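• The motion constraint above can be illustrated with a Python sketch that clamps a motion vector so the referenced block stays inside an MCTS rectangle. The clamping policy (an encoder-side choice) and all names are illustrative; a real encoder would also account for sub-pel interpolation margins.

```python
def clamp_mv_to_mcts(bx, by, bw, bh, mv_x, mv_y, mcts):
    """Clamp a motion vector (integer samples) so that the block at
    (bx, by) of size bw x bh, displaced by (mv_x, mv_y), stays inside
    the MCTS rectangle (x0, y0, x1, y1) of the reference picture."""
    x0, y0, x1, y1 = mcts
    mv_x = max(x0 - bx, min(mv_x, x1 - (bx + bw)))
    mv_y = max(y0 - by, min(mv_y, y1 - (by + bh)))
    return mv_x, mv_y
```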
• the 12 first MCTSs 1510 may be obtained from the bitstream encoded at the 1536x1536 resolution.
• the first MCTSs 1510 may correspond to a region having the first resolution in the same picture
  • second MCTSs 1520 may correspond to a region having a second resolution in the same picture.
  • the first MCTSs may correspond to a viewport area within a picture.
  • the viewport area may mean an area that the user is viewing in a 360-degree video.
  • the first MCTSs may correspond to an ROI (Region of Interest).
  • the ROI area can refer to the area of interest of users suggested by the 360 content provider.
  • the MCTSs 1520 can be combined and merged into a 1920x4708 merge picture 1530, and the merge picture 1530 can have 4 tile groups.
• tile_addr_val[i][j] specifies the tile_group_address value of the tile of the i-th tile row and the j-th tile column.
• the length of tile_addr_val[i][j] is tile_addr_len_minus1 + 1 bits.
• tile_addr_val[i][j] shall not be equal to tile_addr_val[m][n] when i is not equal to m or j is not equal to n.
• num_mcts_in_pic_minus1 plus 1 specifies the number of MCTSs in the picture.
• a syntax element uniform_tile_spacing_flag indicating whether tiles having the same width and height are derived by dividing the picture uniformly may be signaled/parsed.
• the syntax element uniform_tile_spacing_flag can be used to indicate whether or not the tiles in a picture are divided non-uniformly. If the syntax element uniform_tile_spacing_flag is not enabled, the width of each tile column and the height of each tile row can be signaled/parsed; in other words, syntax elements representing the width of each tile column and the height of each tile row can be signaled and/or parsed.
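• For the uniform case above, the decoder derives the tile column widths by dividing the picture evenly instead of parsing them. The following Python sketch uses the HEVC/VVC-style derivation formula as an assumption; an analogous function applies to row heights.

```python
def uniform_tile_col_widths(pic_width_in_ctbs, num_tile_columns):
    """Derive tile column widths (in CTBs) for uniform spacing, using
    the HEVC/VVC-style formula ((i+1)*W)//N - (i*W)//N so the widths
    differ by at most one CTB and sum to the picture width."""
    n = num_tile_columns
    return [((i + 1) * pic_width_in_ctbs) // n - (i * pic_width_in_ctbs) // n
            for i in range(n)]
```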
• in addition, it may be indicated whether the tiles in the picture form a rectangular tile set, and whether the use of sample values or variables outside the rectangular tile set is restricted or unrestricted. In this case, it can be indicated that the picture is divided into MCTSs.
• in addition, when the above flag is 1, that is, when the picture is divided into MCTSs, the syntax element num_mcts_in_pic_minus1, which is related to the number of MCTSs in the picture, can be signaled/parsed.
• for the i-th MCTS, the tile_group_address value, which is the position of the tile located at the top-left, can be indicated.
• the syntax element bottom_right_tile_addr[i] can indicate the tile_group_address value, which is the position of the tile located at the bottom-right of the i-th MCTS.
  • tile group data can be replaced with slice data.
  • Table 12 below shows English semantics for the tile group data syntax.
• FIG. 16 is a diagram showing an example of dividing a picture based on an ROI region.
• a syntax element tile_group_info_in_pps_flag indicating whether the tile group information related to the tiles included in a tile group exists in the PPS or in the tile group headers referring to the PPS may be signaled/parsed.
• if tile_group_info_in_pps_flag is 1, it can indicate that the tile group information exists in the PPS and does not exist in the tile group headers referring to the PPS.
• the syntax element num_tile_groups_in_pic_minus1 may indicate the number of tile groups in the picture referring to the PPS.
• the syntax element pps_first_tile_id[i] can represent the tile ID of the first tile of the i-th tile group
• the syntax element pps_last_tile_id[i] can represent the tile ID of the last tile of the i-th tile group.
• FIG. 17 is a diagram showing an example of partitioning a picture into a plurality of tiles.
• this embodiment relates to a coding method for tiling that divides a picture into a plurality of tiles.
• the tiling structure according to this method can be usefully applied to the latest video applications such as video conferencing programs.
• a picture may be partitioned into a plurality of tiles, and at least one of the plurality of tiles may be smaller than the size of a coding tree unit (CTU).
• for example, the picture may be partitioned into tile 1 (Tile 1), tile 2 (Tile 2), tile 3 (Tile 3), and tile 4 (Tile 4), among which tile 1 (Tile 1), tile 2 (Tile 2), and tile 4 (Tile 4) may be smaller than the CTU size.
  • Table 17 below shows an example of English semantics for the PPS syntax.
• the syntax element tile_size_unit_idc may represent the unit size of a tile. For example, if tile_size_unit_idc is 0, 1, 2, ..., the height and width of a tile can be defined in units of 4, 8, 16, ... relative to the coding tree block (CTB).
  • a plurality of tiles in a picture can be grouped into a plurality of tile groups, and flexible tiling can be achieved by applying a tile group index to the plurality of tile groups.
• each tile group can contain tiles arranged in a non-raster scan order.
• a picture can be partitioned into a plurality of tiles, and the plurality of tiles can be grouped into tile group 1 (Tile Group 1), tile group 2 (Tile Group 2), and tile group 3 (Tile Group 3), where each of tile group 1, tile group 2, and tile group 3 can contain tiles arranged in a non-raster scan order.
  • Table 18 below shows an example of the syntax of the tile group header (tile_group_header).
  • tile group headers can be replaced with slice headers.
• a syntax element tile_group_idx specifying an index of each of a plurality of tile groups within a picture may be signaled/parsed.
• the value of tile_group_idx is not the same as the value of tile_group_idx of the other tile group NAL units in the same picture.
• Table 20 below shows an example of the syntax of the tile group header (tile_group_header).
  • tile group headers can be replaced with slice headers.
• when single_tile_per_tile_group_flag is equal to 1, the value of single_tile_in_tile_group_flag is inferred to be equal to 1.
• first_tile_id specifies the tile ID of the first tile of the tile group.
• the length of first_tile_id is Ceil(Log2(NumTilesInPic)) bits.
• the value of first_tile_id of a tile group shall not be equal to the value of first_tile_id of any other tile group of the same picture.
• the value of first_tile_id is inferred to be equal to the tile ID of the first tile of the current picture.
• last_tile_id specifies the tile ID of the last tile of the tile group.
• the length of last_tile_id is Ceil(Log2(NumTilesInPic)) bits. When NumTilesInPic is equal to 1 or single_tile_in_tile_group_flag is equal to 1, the value of last_tile_id is inferred to be equal to first_tile_id. When tile_group_info_in_pps_flag is equal to 1, the value of last_tile_id is inferred to be equal to the value of pps_last_tile_id.
• for each of the plurality of tile groups in the picture, a syntax element first_tile_id that designates the tile ID of the first tile may be signaled/parsed.
• the first_tile_id may correspond to the tile ID of the tile located at the top-left of the tile group. In this case, the tile ID of the first tile of a tile group is not the same as the tile ID of the first tile of any other tile group in the same picture.
• for each of the plurality of tile groups in the picture, a syntax element last_tile_id specifying the tile ID of the last tile can be signaled/parsed.
  • the last_tile_id may correspond to the tile ID of the tile located at the bottom-right of the tile group.
• if NumTilesInPic is 1 or single_tile_in_tile_group_flag is 1, the value of last_tile_id can be the same as first_tile_id.
• if tile_group_info_in_pps_flag is 1, the value of last_tile_id can be the same as the value of pps_last_tile_id.
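• The first/last tile ID pair above identifies a rectangular tile group. The Python sketch below enumerates the group's tiles, assuming tile IDs are assigned in raster order over the picture's tile grid (an assumption of the sketch, matching the default ordering described in this document).

```python
def tiles_in_rect_group(first_tile_id, last_tile_id, num_tile_columns):
    """Enumerate, in raster scan order, the tile IDs of a rectangular
    tile group whose top-left tile is first_tile_id and bottom-right
    tile is last_tile_id, assuming raster-order tile IDs."""
    top, left = divmod(first_tile_id, num_tile_columns)
    bottom, right = divmod(last_tile_id, num_tile_columns)
    return [r * num_tile_columns + c
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]
```

For the 9-tile (3x3) picture of FIG. 11, first_tile_id = 0 and last_tile_id = 4 describe the 2x2 group of tiles 0, 1, 3, 4.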
• FIG. 19 shows an example of partitioning a picture into a plurality of tiles and tile groups.
  • a picture can be first partitioned into three tile groups, and Tile group #2, which is a second tile group, can be additionally partitioned into secondary tile groups.
• a syntax element num_tile_groups_in_pic_minus1 can be signaled/parsed.
• the value of the syntax element num_tile_groups_in_pic_minus1 plus 1 can represent the number of tile groups in the picture.
• syntax elements indicating the start address and the end address of each tile group can be signaled/parsed, and their values are not the same as the corresponding values of the other tile group units in the same picture.
• when a picture is partitioned into a plurality of tiles, each tile ID can be explicitly signaled.
  • a syntax element tile_id_val[i] that designates the tile ID of the i-th tile in the picture referencing the PPS may be signaled/parsed.
  • Table 27 shows an example of the syntax of the tile group header.
• the tile group header can be replaced by a slice header.
  • Table 28 below shows an example of English semantics for the syntax of the tile group header.
  • tile_group_address specifying a tile ID of the first tile of a tile group in a picture may be signaled/parsed.
  • the value of tile_group_address is not the same as the value of tile_group_address of other tile group NAL units in the same picture.
• a MANE (Media-Aware Network Element) or video editor can identify the tile group carried by NAL units, and can remove the corresponding NAL units or provide a sub-bitstream including the NAL units belonging to a target tile group.
• for this, the syntax element nuh_tile_group_id may be proposed in the NAL unit header.
• a network element or video editor can easily identify the tile group carried by NAL units by parsing and interpreting only the NAL unit headers. In addition, the network element or video editor can remove the corresponding NAL units, and accordingly, a sub-bitstream including the NAL units belonging to the target tile group can be extracted.
  • Table 29 below shows an example of the syntax of the NAL unit header.
• Table 30 below shows an example of English semantics for the syntax of the NAL unit header.
• Table 31 below shows an example of the syntax of the tile group header (tile_group_header).
  • the tile group header can be replaced by a slice header.
• a syntax element tile_group_id can be signaled/parsed.
• the value of tile_group_id is not equal to the value of tile_group_id of the NAL units of the other tile groups in the same picture.
  • the wrap-around tile group is a group of tiles located on both borders of a picture.
• a wrap-around tile group can contain tiles that are not contiguous in the current picture, but are contiguous with each other in 3D space.
• by grouping areas with similar characteristics within the picture into the same tile group, the wrap-around tile group can improve coding efficiency.
• a picture may be partitioned into a plurality of tiles, and the plurality of tiles can be grouped into tile group #0 (TG #0), tile group #1 (TG #1), tile group #2 (TG #2), and tile group #3 (TG #3).
  • Grouping tiles may depend on the position of the first tile and the position of the last tile. Within a tile group, the order of the tiles may be sorted sequentially from the first tile to the last tile in the raster scan order.
• if the first tile is located at the top-left corner of a rectangular area and the last tile is located at the bottom-right corner, the general rectangular tile group (normal rectangle tile group) can be applied. Otherwise, a wrap-around tile group that groups tiles located in both border areas of the picture can be applied.
• each of tile group #0, tile group #1, and tile group #2 may correspond to a wrap-around tile group, and tile group #3 may correspond to a general rectangular tile group.
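• One way to enumerate the tiles of a wrap-around tile group is to let the tile-column range wrap modulo the number of tile columns when the last tile's column lies to the left of the first tile's column. The Python sketch below is illustrative only (raster-order tile IDs are assumed) and is not the normative derivation of the disclosure.

```python
def tiles_in_wraparound_group(first_tile_id, last_tile_id, num_tile_columns):
    """Enumerate tile IDs of a tile group, letting the column range wrap
    around the right picture border when the last tile's column is to
    the left of the first tile's column (illustrative sketch only)."""
    top, left = divmod(first_tile_id, num_tile_columns)
    bottom, right = divmod(last_tile_id, num_tile_columns)
    if right >= left:                      # ordinary rectangular group
        cols = list(range(left, right + 1))
    else:                                  # wrap-around across the border
        cols = list(range(left, num_tile_columns)) + list(range(right + 1))
    return [r * num_tile_columns + c
            for r in range(top, bottom + 1) for c in cols]
```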
  • Table 33 below shows the tile order of each tile group in Fig. 20.
• last_pred_tile_id[i] specifies the tile ID of the last tile of the i-th prediction tile group.
• the length of last_pred_tile_id is tile_id_len_minus1 + 1 bits.
• the value of last_pred_tile_id is inferred to be equal to first_pred_tile_id.
• tile_offset_len_minus1 plus 1 specifies the length, in bits, of the entry_point_offset_minus1[i] syntax elements in the tile group headers referring to the PPS.
• the value of tile_offset_len_minus1 shall be in the range of 0 to 31, inclusive.
• tile_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element tile_id_val[i][j], when present in the PPS, and the syntax elements first_tile_id and last_tile_id in tile group headers referring to the PPS.
• the value of tile_id_len_minus1 shall be in the range of Ceil(Log2(NumTilesInPic)) to 15, inclusive.
• explicit_tile_id_flag equal to 1 specifies that the tile ID for each tile is explicitly signaled.
• explicit_tile_id_flag equal to 0 specifies that tile IDs are not explicitly signaled.
• tile_id_val[i][j] specifies the tile ID of the tile of the i-th tile row and the j-th tile column.
• the length of tile_id_val[i][j] is tile_id_len_minus1 + 1 bits.
• tile_id_val[i][j] shall not be equal to tile_id_val[m][n] when i is not equal to m or j is not equal to n, and tile_id_val[i][j] shall be less than tile_id_val[m][n] when j * (num_tile_columns_minus1 + 1) + i is less than n * (num_tile_columns_minus1 + 1) + m.
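• The two constraints quoted above (all tile IDs distinct, and IDs increasing with the raster position j * (num_tile_columns_minus1 + 1) + i) can be checked mechanically. The Python sketch below keeps the index roles exactly as written in the quoted semantics.

```python
def check_tile_id_vals(tile_id_val, num_tile_columns_minus1):
    """Check the bitstream constraints quoted above: every
    tile_id_val[i][j] is distinct, and tile_id_val increases with the
    raster position j * (num_tile_columns_minus1 + 1) + i."""
    n_cols = num_tile_columns_minus1 + 1
    entries = [(j * n_cols + i, tile_id_val[i][j])
               for i in range(len(tile_id_val))
               for j in range(len(tile_id_val[i]))]
    ids = [v for _, v in entries]
    if len(set(ids)) != len(ids):          # uniqueness constraint
        return False
    entries.sort(key=lambda e: e[0])       # raster position order
    return all(a < b for (_, a), (_, b) in zip(entries, entries[1:]))
```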
• a syntax element prediction_tile_group_flag indicating whether each of a plurality of tiles in a picture corresponds to a tile set may be signaled/parsed.
• if the value of the syntax element prediction_tile_group_flag is 0, it can indicate that each of the tiles in the picture corresponds to a tile set.
• if the value of the syntax element prediction_tile_group_flag is 1, it can indicate that a tile group containing multiple tiles exists. In the case of such a tile group, the position of each tile in the tile group must be predicted; therefore, a tile group containing a plurality of tiles may be referred to as a prediction tile group.
• in addition, the syntax element prediction_tile_group_flag can indicate that the syntax elements num_pred_tile_groups_in_pic_minus1, first_pred_tile_id[i], and last_pred_tile_id[i], which are described later, are explicitly specified.
• a syntax element num_pred_tile_groups_in_pic_minus1 related to the number of prediction tile groups within a picture may be signaled/parsed.
• for each of the prediction tile groups in the picture, a syntax element first_pred_tile_id[i] designating the tile ID of the first tile may be signaled/parsed.
• the first tile may correspond to the first tile in the raster scan order within the tile group.
• a syntax element last_pred_tile_id[i] that designates the tile ID of the last tile for each of the prediction tile groups in the picture may be signaled/parsed.
• the last tile may correspond to the last tile in the raster scan order within the tile group.
• a syntax element explicit_tile_id_flag indicating whether the tile ID of each of the plurality of tiles in the picture is explicitly signaled can be signaled/parsed. For example, if explicit_tile_id_flag is 0, the tile IDs are not explicitly signaled.
• a syntax element tile_id_val[i][j] designating the tile ID of each of the plurality of tiles in the picture may be signaled/parsed.
• the syntax element tile_id_val[i][j] can specify the tile ID of the tile located in the i-th row and the j-th column in the picture.
  • Table 37 shows an example of the syntax of the tile group header.
  • the tile group header can be replaced by a slice header.
• tileIdx = tileIdx % NumTilesInPic
• for( currTileIdx = tileIdx; i < numTileColumnsInTileGroup; i++, currTileIdx++, cIdx++ )
• a syntax element explicit_tile_id_flag indicating whether the tile ID of each of the plurality of tiles in the picture is explicitly signaled may be signaled/parsed.
  • a syntax element num_tiles_in_tile_groups_minusl[i] related to the number of tiles included in the tile group may be signaled/parsed.
  • a syntax element tile_id_val[i][j] that designates a tile ID of each of a plurality of tiles in a picture may be signaled/parsed.
• the syntax element tile_id_val[i][j] can specify the tile ID of the tile located in the i-th row and the j-th column in the picture.
  • This signaling method can be useful when the tile group consists of tile IDs that do not have a monotonically increasing order.
  • the signaling method can be usefully applied in 360-degree video.
  • it can be usefully applied when the tile IDs have a monotonically increasing order or a non-increasing order.
• Table 41 below shows an example of the PPS syntax.
• for each of the prediction tile groups in a picture, a syntax element tile_id_val_delta_abs[i][j] designating the absolute value of the delta corresponding to each tile included in the prediction tile group can be signaled/parsed.
• the syntax element tile_id_val_delta_abs[i][j] can specify the absolute value of the delta corresponding to the j-th tile ID in the i-th prediction tile group.
• for each of the prediction tile groups in a picture, a syntax element tile_id_val_delta_sign[i][j] designating the sign of the delta corresponding to each tile included in the prediction tile group can be signaled/parsed.
• the syntax element tile_id_val_delta_sign[i][j] can specify the sign of the delta corresponding to the j-th tile ID in the i-th prediction tile group. For example, if the value of the syntax element tile_id_val_delta_sign[i][j] is 0, the delta between the corresponding tile IDs corresponds to a positive value; otherwise, the delta between the corresponding tile IDs may correspond to a negative value.
• based on the syntax element tile_id_val_delta_abs[i][j] and the syntax element tile_id_val_delta_sign[i][j] signaled/parsed as described above, tile_id_val[i][j], i.e., the j-th tile ID in the i-th tile group, can be determined as shown in Table 43 below.
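• A Python sketch of the delta-based derivation described above: each tile ID in a prediction tile group is obtained by applying the signaled absolute delta and sign (sign flag 0 meaning positive) to the previous tile ID. Starting the chain from the group's first tile ID is an assumption of this sketch.

```python
def derive_tile_id_vals(first_tile_id, delta_abs, delta_sign):
    """Reconstruct the tile IDs of one prediction tile group from the
    per-tile deltas: delta_sign[j] == 0 means a positive delta, any
    other value a negative delta, applied to the previous tile ID."""
    ids = [first_tile_id]
    for a, s in zip(delta_abs, delta_sign):
        delta = a if s == 0 else -a
        ids.append(ids[-1] + delta)
    return ids
```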
• the first tile and the last tile can be identified. At this time, in order to signal the order of the tiles within each tile group, an offset for each tile group can be signaled.
• for example, the number of tiles included in tile group #0 can be signaled first. After that, the values 0, 4, 18, and 22, corresponding to the IDs of the leading tiles in tile group #0, can be signaled.
• the IDs of the subsequent tiles following the leading tiles can be derived using additional information.
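• The derivation of subsequent tile IDs from the leading tile IDs can be sketched as follows, under the assumption (made for illustration only) that the "additional information" is a run length per leading tile, i.e., the number of consecutive tile IDs starting at each leading tile.

```python
def expand_tile_group(leading_ids, run_lengths):
    """Derive the full tile-ID list of a tile group from its leading
    tile IDs plus a run length per leading tile. The run-length form of
    the 'additional information' is an assumption of this sketch."""
    ids = []
    for lead, run in zip(leading_ids, run_lengths):
        ids.extend(range(lead, lead + run))  # consecutive IDs per run
    return ids
```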
  • FIG. 21 is a flow chart showing an operation of a decoding apparatus according to an embodiment
  • FIG. 22 is a block diagram showing a configuration of a decoding apparatus according to an embodiment.
• Each step disclosed in FIG. 21 may be performed by the decoding apparatus 300 disclosed in FIG. 3. More specifically, S2100 and S2110 may be performed by the entropy decoding unit 310 disclosed in FIG. 3,
• S2120 may be performed by the prediction unit 330 disclosed in FIG. 3,
• and S2130 may be performed by the addition unit 340 disclosed in FIG. 3.
• In addition, the operations according to S2100 to S2130 are based on some of the contents described above in FIGS. 1 to 20. Therefore, specific contents overlapping with the contents described above in FIGS. 1 to 20 will be omitted or simplified.
• As shown in FIG. 22, the decoding apparatus according to an embodiment may include an entropy decoding unit 310, a prediction unit 330, and an addition unit 340. However, in some cases, all of the components shown in FIG. 22 may not be essential components of the decoding apparatus, and the decoding apparatus may be implemented with more or fewer components than those shown in FIG. 22.
• the entropy decoding unit 310, the prediction unit 330, and the addition unit 340 may each be implemented as a separate chip, or at least two or more of the components may be implemented through a single chip.
• the decoding apparatus may obtain, from the bitstream, image information including partition information for the current picture and prediction information for the current block included in the current picture (S2100).
• More specifically, the entropy decoding unit 310 of the decoding apparatus may obtain, from the bitstream, the image information including the partition information for the current picture and the prediction information for the current block included in the current picture.
• the decoding apparatus may derive a partitioning structure of the current picture based on a plurality of tiles, based on the partitioning information for the current picture (S2110). More specifically, the entropy decoding unit 310 of the decoding apparatus may derive the partitioning structure of the current picture based on the plurality of tiles, based on the partitioning information for the current picture. In one example, the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups may contain tiles that are not adjacent to each other in the current picture but are adjacent to each other in 3D space.
  • the decoding apparatus may derive prediction samples for the current block based on the prediction information for the current block included in one of the plurality of tiles (S2120). More specifically, the prediction unit 330 of the decoding apparatus may derive prediction samples for the current block based on the prediction information for the current block included in one of the plurality of tiles.
• the decoding apparatus can reconstruct the current picture based on the prediction samples (S2130).
• More specifically, the addition unit 340 of the decoding apparatus can reconstruct the current picture based on the prediction samples.
• the partition information for the current picture may include at least one of information on the number of the plurality of tile groups, ID information of the first tile in raster scan order for each of the plurality of tile groups, and ID information of the last tile in raster scan order for each of the plurality of tile groups.
  • information on the number of the plurality of tile groups, ID information of the first tile in the raster scan order for each of the plurality of tile groups, the ID of the last tile in the raster scan order for each of the plurality of tile groups At least one of the information may be included in a picture parameter set (PPS) of the image information.
• the partition information for the current picture may further include at least one of flag information on whether the ID information of each of the plurality of tiles is explicitly signaled, and the ID information of each of the plurality of tiles.
• at least one of the flag information and the ID information of each of the plurality of tiles may be included in a picture parameter set (PPS) of the image information.
• the partition information for the current picture may include, for each of the plurality of tile groups, absolute value and sign information of the deltas between the tile IDs included in the tile group.
• as described above, a picture can be partitioned into a plurality of tiles, and the plurality of tiles can be grouped into a plurality of tile groups.
  • FIG. 23 is a flow chart showing an operation of an encoding device according to an embodiment
  • FIG. 24 is a block diagram showing a configuration of an encoding device according to an embodiment.
• the encoding apparatus according to FIGS. 23 and 24 can perform operations corresponding to those of the decoding apparatus according to FIGS. 21 and 22. Accordingly, the operations of the encoding apparatus described later with reference to FIGS. 23 and 24 can be equally applied to the decoding apparatus according to FIGS. 21 and 22.
• Each step disclosed in FIG. 23 may be performed by the encoding apparatus 200 disclosed in FIG. 2. More specifically, S2300 and S2310 may be performed by the image dividing unit 210 disclosed in FIG. 2, S2320 and S2330 may be performed by the prediction unit 220 disclosed in FIG. 2, and S2340 may be performed by the entropy encoding unit 240 disclosed in FIG. 2. In addition, the operations according to S2300 to S2340 are based on some of the contents described above in FIGS. 1 to 20. Therefore, specific contents overlapping with the contents described above in FIGS. 1 to 20 will be omitted or simplified.
• the encoding apparatus according to an embodiment may include an image dividing unit 210, a prediction unit 220, and an entropy encoding unit 240. However, in some cases, all of the components shown in FIG. 24 may not be essential components of the encoding apparatus, and the encoding apparatus may be implemented with more or fewer components than those shown in FIG. 24.
• the image dividing unit 210, the prediction unit 220, and the entropy encoding unit 240 may each be implemented as a separate chip, or at least two or more of the components may be implemented through a single chip.
  • the encoding apparatus may divide the current picture into a plurality of tiles (S2300). More specifically, the image division unit 210 of the encoding apparatus may divide the current picture into a plurality of tiles.
  • the encoding apparatus may generate division information for the current picture based on the plurality of tiles (S2310). More specifically, the image division unit 210 of the encoding apparatus may generate division information for the current picture based on the plurality of tiles. In one example, the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups may include tiles that are not adjacent to each other in the current picture.
  • the encoding apparatus may derive prediction samples for a current block included in one of the plurality of tiles (S2320). More specifically, the prediction unit 220 of the encoding apparatus may derive prediction samples for the current block included in one of the plurality of tiles.
  • the encoding apparatus may generate prediction information for the current block based on the prediction samples (S2330). More specifically, the prediction unit 220 of the encoding apparatus may generate prediction information for the current block based on the prediction samples.
  • the encoding apparatus may encode image information including the division information for the current picture and the prediction information for the current block (S2340). More specifically, the entropy encoding unit 240 of the encoding apparatus may encode image information including at least one of the division information for the current picture and the prediction information for the current block.
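The steps S2300 to S2340 above can be sketched as follows. This is a minimal, hypothetical Python illustration — the helper names and the uniform tile grid are assumptions, not the actual encoder implementation; prediction (S2320/S2330) and entropy coding (S2340) are omitted:

```python
# Minimal sketch of the encoding flow of FIG. 23 (S2300-S2340).
# Helper names and the uniform tile grid are illustrative assumptions;
# prediction (S2320/S2330) and entropy coding (S2340) are omitted.

def split_into_tiles(w_ctus, h_ctus, cols, rows):
    # S2300: divide the picture into a cols x rows tile grid.
    # Each tile is (x0, y0, w, h), measured in CTUs, in raster order.
    xs = [w_ctus * i // cols for i in range(cols + 1)]
    ys = [h_ctus * j // rows for j in range(rows + 1)]
    return [(xs[i], ys[j], xs[i + 1] - xs[i], ys[j + 1] - ys[j])
            for j in range(rows) for i in range(cols)]

def build_division_info(tiles, groups):
    # S2310: division information records, per tile group, the IDs of
    # the tiles it contains; a group may hold non-adjacent tiles.
    return {"num_tiles": len(tiles), "tile_groups": groups}

def encode_picture(w_ctus, h_ctus, cols, rows, groups):
    tiles = split_into_tiles(w_ctus, h_ctus, cols, rows)   # S2300
    return build_division_info(tiles, groups)              # S2310

# 12x8-CTU picture, 3x2 tile grid; group [0, 2] holds non-adjacent tiles.
info = encode_picture(12, 8, 3, 2, groups=[[0, 2], [1, 3, 4, 5]])
print(info["num_tiles"])  # 6
```

The group `[0, 2]` in the usage line illustrates the case the summary emphasizes: a tile group whose tiles are not adjacent to each other in the current picture.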
  • the division information for the current picture may include at least one of: information on the number of tiles in the plurality of tile groups, ID information of the first tile in raster order for each of the plurality of tile groups, and ID information of the last tile in raster order for each of the plurality of tile groups.
  • at least one of the information on the number of tiles in the plurality of tile groups, the ID information of the first tile in raster-scan order for each of the plurality of tile groups, and the ID information of the last tile in raster-scan order for each of the plurality of tile groups may be included in the image information.
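As a minimal sketch of how a decoder could use the first-tile and last-tile ID information described above to derive the tiles of a rectangular tile group (an illustration under the assumption of raster-order tile IDs and a rectangular group; the actual syntax and derivation process may differ):

```python
# Hedged sketch: reconstruct a rectangular tile group from the
# raster-order IDs of its first (top-left) and last (bottom-right)
# tiles, given the number of tile columns in the picture.
# Names are illustrative, not actual codec syntax.

def tiles_in_group(first_id, last_id, num_tile_cols):
    top, left = divmod(first_id, num_tile_cols)
    bottom, right = divmod(last_id, num_tile_cols)
    return [r * num_tile_cols + c
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]

# 4x3 tile grid (12 tiles, IDs 0..11 in raster order):
print(tiles_in_group(1, 6, 4))  # [1, 2, 5, 6]
```

Signaling only two IDs per group in this way is what makes the first/last-tile scheme compact compared with listing every tile ID.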
  • the division information for the current picture may further include at least one of flag information on whether the ID information of each of the plurality of tiles is explicitly signaled, and the ID information of each of the plurality of tiles.
  • at least one of the flag information on whether the ID information is explicitly signaled and the ID information of each of the plurality of tiles may be included in a picture parameter set (PPS) of the image information.
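A parsing sketch of the PPS-level flag described above: when the flag is set, each tile's ID is read explicitly; otherwise the IDs default to raster-scan order. The `Reader` class and the parsing function are hypothetical illustrations, not actual PPS syntax:

```python
# Hedged sketch of flag-conditional tile-ID parsing. The Reader and
# the syntax layout are assumptions for illustration only.

class Reader:
    def __init__(self, symbols):
        self.symbols = list(symbols)
    def read_flag(self):
        return self.symbols.pop(0)
    def read_uint(self):
        return self.symbols.pop(0)

def parse_tile_ids(reader, num_tiles):
    if reader.read_flag():                      # explicit signaling
        return [reader.read_uint() for _ in range(num_tiles)]
    return list(range(num_tiles))               # implicit raster-scan IDs

print(parse_tile_ids(Reader([1, 10, 11, 20, 21]), 4))  # [10, 11, 20, 21]
print(parse_tile_ids(Reader([0]), 4))                  # [0, 1, 2, 3]
```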
  • the division information for the current picture may include, for each of the plurality of tile groups, the absolute difference between the tile IDs of the tiles included in the tile group and sign information for the difference.
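The absolute-difference-and-sign representation above can be sketched as a simple delta mapping over consecutive tile IDs. This is an illustrative assumption about how the two signaled values relate to the IDs; entropy coding of the values is omitted:

```python
# Hedged sketch: represent the tile IDs of a tile group as the absolute
# difference and sign between consecutive IDs, as described above.
# Entropy coding of (abs, sign) pairs is omitted.

def encode_tile_ids(tile_ids):
    deltas, prev = [], 0
    for tid in tile_ids:
        diff = tid - prev
        deltas.append((abs(diff), 0 if diff >= 0 else 1))  # (abs, sign)
        prev = tid
    return deltas

def decode_tile_ids(deltas):
    ids, prev = [], 0
    for mag, sign in deltas:
        prev += -mag if sign else mag
        ids.append(prev)
    return ids

ids = [3, 1, 7]            # tiles of one group (need not be adjacent)
coded = encode_tile_ids(ids)
assert decode_tile_ids(coded) == ids
print(coded)  # [(3, 0), (2, 1), (6, 0)]
```

Because differences between nearby tile IDs tend to be small, coding (absolute difference, sign) pairs can cost fewer bits than coding each ID directly.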
  • the above-described method according to the present disclosure may be implemented in the form of software, and the encoding apparatus and/or decoding apparatus according to the present disclosure may be included in a device that performs image processing, such as a TV, a computer, a smartphone, a set-top box, or a display device.
  • Modules are stored in memory and can be executed by the processor.
  • the memory may be inside or outside the processor, and may be connected to the processor by various well-known means.
  • Processors may include application-specific integrated circuits (ASICs), other chipsets, logic circuits and/or data processing devices.
  • the memory may include read-only memory (ROM), random access memory (RAM), flash memory, memory cards, storage media, and/or other storage devices.
  • the embodiments described in this disclosure may be implemented and performed on a processor, microprocessor, controller, or chip.
  • the functional units shown in each drawing may be implemented and performed on a computer, processor, microprocessor, controller, or chip. In this case, information on instructions or algorithms for implementation may be stored in a digital storage medium.
  • the decoding apparatus and the encoding apparatus to which the present disclosure is applied may be included in a multimedia broadcasting transmission/reception device, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chat device, a real-time communication device such as video communication, a mobile streaming device, a storage medium, a camcorder, a video-on-demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a virtual reality (VR) device, an augmented reality (AR) device, a video telephony device, a transportation terminal (e.g., a vehicle (including autonomous vehicle) terminal, an airplane terminal, or a ship terminal), a medical video device, and the like, and may be used to process video signals or data signals. For example, OTT video devices may include game consoles, Blu-ray players, Internet-access TVs, home theater systems, smartphones, tablet PCs, and digital video recorders (DVRs).
  • the processing method to which this disclosure is applied can be produced in the form of a program executed by a computer, and can be stored in a recording medium that can be read by a computer.
  • multimedia data having a data structure according to the present disclosure may also be stored in a computer-readable recording medium.
  • the computer-readable recording medium includes all types of storage devices and distributed storage devices in which computer-readable data is stored.
  • the computer-readable recording medium may include, for example, a Blu-ray disc (BD), a universal serial bus (USB), a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.
  • the computer-readable recording medium also includes media implemented in the form of a carrier wave (for example, transmission via the Internet).
  • the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted over a wired or wireless communication network.
  • an embodiment of the present disclosure may be implemented as a computer program product using a program code, and the program code may be executed in a computer by an embodiment of the present disclosure.
  • the program code may be stored on a computer-readable carrier.
  • Fig. 25 shows an example of a content streaming system to which the disclosure of this document can be applied
  • the content streaming system to which this disclosure is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.
  • the encoding server compresses content input from multimedia input devices, such as a smartphone, a camera, or a camcorder, into digital data, generates a bitstream, and transmits it to the streaming server. As another example, when multimedia input devices such as a smartphone, a camera, or a camcorder directly generate a bitstream, the encoding server may be omitted.
  • the bitstream may be generated by an encoding method or a bitstream generation method to which the present disclosure is applied, and the streaming server may temporarily store the bitstream while transmitting or receiving the bitstream.
  • the streaming server transmits multimedia data to a user device based on a user request through a web server, and the web server serves as a medium that informs the user of what kind of service is available.
  • when a user requests a desired service from the web server, the web server transmits the request to the streaming server, and the streaming server transmits multimedia data to the user.
  • the content streaming system may include a separate control server; in this case, the control server serves to control commands/responses between the devices in the content streaming system.
  • the streaming server may receive content from a media storage and/or an encoding server. For example, when receiving content from the encoding server, the content may be received in real time. In this case, in order to provide a seamless streaming service, the streaming server may store the bitstream for a predetermined time.
  • examples of the user device may include a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (for example, a smartwatch, smart glasses, or a head mounted display (HMD)), a digital TV, a desktop computer, digital signage, and the like.
  • Each of the servers in the content streaming system may be operated as a distributed server, and in this case, data received from each server may be distributed and processed.

Abstract

An image decoding method performed by a decoding apparatus, according to the present disclosure, comprises the steps of: obtaining, from a bitstream, image information comprising partition information for a current picture and prediction information for a current block included in the current picture; deriving, on the basis of the partition information for the current picture, a partitioning structure of the current picture based on a plurality of tiles; deriving prediction samples for the current block on the basis of the prediction information for the current block included in one of the plurality of tiles; and reconstructing the current picture on the basis of the prediction samples.

Description

WO 2020/175904 PCT/KR2020/002729

Specification
Title of Invention: Method and apparatus for picture partitioning on basis of signaled information
Technical Field
[1] The present disclosure relates to video coding technology, and more specifically, to a picture partitioning method and apparatus based on signaled information in a video coding system.
Background Art
[2] Recently, demand for high-resolution, high-quality images/videos, such as 4K or 8K or higher ultra high definition (UHD) images/videos, is increasing in various fields. As image/video data becomes higher resolution and higher quality, the amount of information or bits to be transmitted increases relative to existing image/video data. Therefore, when image data is transmitted using a medium such as an existing wired/wireless broadband line, or image/video data is stored using an existing storage medium, transmission and storage costs increase.
[3] In addition, interest in and demand for immersive media, such as virtual reality (VR) and augmented reality (AR) content and holograms, are increasing recently, and broadcasting of images/videos having image characteristics different from those of real images, such as game images, is increasing.
[4] Accordingly, a flexible picture partitioning method that can be applied to efficiently compress and play back images/videos in image/video application programs having various characteristics is required.
Detailed Description of the Invention
Technical Problem
[5] A technical problem of the present disclosure is to provide a method and an apparatus for increasing image coding efficiency.
[6] Another technical problem of the present disclosure is to provide a method and an apparatus for signaling partitioning information.
[7] Still another technical problem of the present disclosure is to provide a method and an apparatus for flexibly partitioning a picture based on signaled information.
[8] Still another technical problem of the present disclosure is to provide a method and an apparatus for partitioning a current picture based on partition information for the current picture.
[9] Still another technical problem of the present disclosure is to provide a method and an apparatus for partitioning a current picture based on a tile group including tiles that are not adjacent to each other in the current picture.
Technical Solution
[10] According to an embodiment of the present disclosure, an image decoding method performed by a decoding apparatus is provided. The method includes deriving a partitioning structure of a current picture based on a plurality of tiles, based on partition information for the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
[11] According to another embodiment of the present disclosure, a decoding apparatus for performing image decoding is provided. The decoding apparatus includes an entropy decoding unit that derives a partitioning structure of a current picture based on a plurality of tiles, based on partition information for the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
[12] According to still another embodiment of the present disclosure, an image encoding method performed by an encoding apparatus is provided. The method includes generating partition information for a current picture based on a plurality of tiles, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
[13] According to still another embodiment of the present disclosure, an encoding apparatus for performing image encoding is provided. The encoding apparatus includes an image division unit that divides a current picture into a plurality of tiles and generates partition information for the current picture based on the plurality of tiles, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
[14] According to still another embodiment of the present disclosure, a computer-readable digital storage medium storing encoded image information that causes a decoding apparatus to perform an image decoding method is provided. The image decoding method according to the embodiment includes deriving a partitioning structure of a current picture based on a plurality of tiles, based on partition information for the current picture, wherein the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups includes tiles that are not adjacent to each other in the current picture.
Effects of the Invention
[15] According to the present specification, overall image/video compression efficiency can be increased.
[16] According to the present specification, the efficiency of picture partitioning can be increased.
[17] According to the present specification, the flexibility of picture partitioning can be increased based on partition information for the current picture.
[18] According to the present specification, by partitioning the current picture based on a tile group including tiles that are not adjacent to each other in the current picture, the efficiency of signaling for picture partitioning can be increased.
Brief Description of the Drawings
FIG. 1 schematically shows an example of a video/image coding system to which the present disclosure can be applied.
FIG. 2 is a diagram schematically illustrating a configuration of a video/image encoding apparatus to which the present disclosure can be applied.
FIG. 3 is a diagram schematically illustrating a configuration of a video/image decoding apparatus to which the present disclosure can be applied.
[22] FIG. 4 exemplarily shows a hierarchical structure for coded data.
[23] FIG. 5 is a diagram showing an example of partitioning a picture.
[24] FIG. 6 is a flowchart illustrating a picture encoding procedure based on tiles and/or tile groups according to an embodiment.
FIG. 7 is a flowchart illustrating a picture decoding procedure based on tiles and/or tile groups according to an embodiment.
FIG. 8 is a diagram showing an example of partitioning a picture into a plurality of tiles.
FIG. 9 is a block diagram showing a configuration of an encoding apparatus according to an embodiment.
FIG. 10 is a block diagram showing a configuration of a decoding apparatus according to an embodiment.
FIG. 11 is a diagram showing an example of tile and tile group units constituting a current picture.
FIG. 12 is a diagram schematically showing an example of a signaling structure of tile group information.
FIG. 13 is a diagram showing an example of a picture in a video program for video conferencing.
FIG. 14 is a diagram showing an example of partitioning a picture into tiles or tile groups in a video program for video conferencing.
FIG. 15 is a diagram showing an example of partitioning a picture into tiles or tile groups based on a motion constrained tile set (MCTS).
FIG. 16 is a diagram showing an example of dividing a picture based on an ROI region.
FIG. 17 is a diagram showing an example of partitioning a picture into a plurality of tiles.
FIG. 18 is a diagram showing an example of partitioning a picture into a plurality of tiles and tile groups.
FIG. 19 is a diagram showing an example of partitioning a picture into a plurality of tiles and tile groups.
FIG. 20 is a diagram showing an example of partitioning a picture into a plurality of tiles and tile groups.
FIG. 21 is a flowchart showing an operation of a decoding apparatus according to an embodiment.
FIG. 22 is a block diagram showing a configuration of a decoding apparatus according to an embodiment.
[41] FIG. 23 is a flowchart showing an operation of an encoding apparatus according to an embodiment.
[42] FIG. 24 is a block diagram showing a configuration of an encoding apparatus according to an embodiment.
[43] FIG. 25 shows an example of a content streaming system to which the disclosure of this document can be applied.
Modes for Carrying Out the Invention
[44] Since the present disclosure may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present disclosure to the specific embodiments. The terms used in this specification are used only to describe specific embodiments and are not intended to limit the technical idea of the present disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the existence of features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding in advance the existence or possibility of adding one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
[45] Meanwhile, the components in the drawings described in the present disclosure are shown independently for convenience of description of their different characteristic functions, and this does not mean that each component is implemented as separate hardware or separate software. For example, two or more of the components may be combined to form one component, or one component may be divided into a plurality of components. Embodiments in which each component is integrated and/or separated are also included in the scope of the present disclosure, as long as they do not depart from the essence of the present disclosure.
[46] In the present specification, "A or B" may mean "only A", "only B", or "both A and B". In other words, "A or B" in the present specification may be interpreted as "A and/or B". For example, in the present specification, "A, B or C" may mean "only A", "only B", "only C", or "any combination of A, B and C".
[47] A slash (/) or a comma used in the present specification may mean "and/or". For example, "A/B" may mean "A and/or B". Accordingly, "A/B" may mean "only A", "only B", or "both A and B". For example, "A, B, C" may mean "A, B or C".
[48] In the present specification, "at least one of A and B" may mean "only A", "only B", or "both A and B". In addition, in the present specification, the expressions "at least one of A or B" and "at least one of A and/or B" may be interpreted the same as "at least one of A and B".
[49] In addition, in the present specification, "at least one of A, B and C" may mean "only A", "only B", "only C", or "any combination of A, B and C". In addition, "at least one of A, B or C" or "at least one of A, B and/or C" may mean "at least one of A, B and C".
[50] In addition, parentheses used in the present specification may mean "for example". Specifically, when indicated as "prediction (intra prediction)", "intra prediction" may be proposed as an example of "prediction". In other words, "prediction" in the present specification is not limited to "intra prediction", and "intra prediction" may be proposed as an example of "prediction". In addition, even when indicated as "prediction (i.e., intra prediction)", "intra prediction" may be proposed as an example of "prediction".
[51] Technical features that are individually described within one drawing in the present specification may be implemented individually or simultaneously.
[52] Hereinafter, preferred embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. Hereinafter, the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components may be omitted.
[53] FIG. 1 schematically shows an example of a video/image coding system to which the present disclosure can be applied.
[54] Referring to FIG. 1, a video/image coding system may include a first device (a source device) and a second device (a receive device). The source device may transmit encoded video/image information or data to the receive device through a digital storage medium or a network in the form of a file or streaming.
[55] The source device may include a video source, an encoding apparatus, and a transmitter. The receive device may include a receiver, a decoding apparatus, and a renderer. The encoding apparatus may be called a video/image encoding apparatus, and the decoding apparatus may be called a video/image decoding apparatus. The transmitter may be included in the encoding apparatus. The receiver may be included in the decoding apparatus. The renderer may include a display unit, and the display unit may be configured as a separate device or an external component.
[56] The video source may acquire a video/image through a process of capturing, synthesizing, or generating the video/image. The video source may include a video/image capture device and/or a video/image generating device. The video/image capture device may include, for example, one or more cameras and video/image archives including previously captured videos/images. The video/image generating device may include, for example, a computer, a tablet, and a smartphone, and may (electronically) generate a video/image. For example, a virtual video/image may be generated through a computer or the like, in which case the video/image capturing process may be replaced by a process of generating related data.
[57] The encoding apparatus may encode the input video/image. The encoding apparatus may perform a series of procedures, such as prediction, transform, and quantization, for compression and coding efficiency. The encoded data (encoded video/image information) may be output in the form of a bitstream.
[58] The transmitter may transmit the encoded video/image information or data output in the form of a bitstream to the receiver of the receive device through a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. The transmitter may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcasting/communication network. The receiver may receive/extract the bitstream and transmit it to the decoding apparatus.
[59] The decoding apparatus may decode the video/image by performing a series of procedures, such as dequantization, inverse transform, and prediction, corresponding to the operations of the encoding apparatus.
[60] The renderer may render the decoded video/image. The rendered video/image may be displayed through the display unit.
[61] This document relates to video/image coding. For example, the methods/embodiments disclosed in this document may be applied to methods disclosed in the versatile video coding (VVC) standard, the essential video coding (EVC) standard, the AOMedia Video 1 (AV1) standard, the 2nd generation of audio video coding standard (AVS2), or a next-generation video/image coding standard (e.g., H.267 or H.268).
[62] This document presents various embodiments of video/image coding, and unless otherwise stated, the embodiments may be performed in combination with each other.
[63] In this document, a video may mean a set of a series of images over time. A picture generally means a unit representing one image in a specific time period, and a slice/tile is a unit constituting a part of a picture in coding. A slice/tile may include one or more coding tree units (CTUs). One picture may consist of one or more slices/tiles.
[64] A tile is a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. The tile column is a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set. The tile row is a rectangular region of CTUs having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture. A tile scan is a specific sequential ordering of the CTUs partitioning a picture, in which the CTUs are ordered consecutively in CTU raster scan within a tile, whereas the tiles in a picture are ordered consecutively in a raster scan of the tiles of the picture. A slice may include multiple complete tiles, or multiple consecutive CTU rows within one tile of a picture, that may be contained in one NAL unit. In this document, tile group and slice may be used interchangeably. For example, in this document, a tile group/tile group header may be called a slice/slice header.
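The tile-scan ordering described above can be sketched in a few lines. The picture size and the tile column widths/row heights below are hypothetical stand-ins; in the codec they would be derived from syntax elements signaled in the picture parameter set.

```python
# Sketch of the tile scan in [64]: CTUs are ordered in raster scan within
# each tile, and tiles are ordered in raster scan over the picture.
# Sizes are in CTU units; the tile grid values here are illustrative only.

def tile_scan_order(pic_w_ctus, pic_h_ctus, col_widths, row_heights):
    assert sum(col_widths) == pic_w_ctus and sum(row_heights) == pic_h_ctus
    order = []
    y0 = 0
    for th in row_heights:            # tile rows, top to bottom
        x0 = 0
        for tw in col_widths:         # tile columns, left to right
            for y in range(y0, y0 + th):        # CTU raster scan in tile
                for x in range(x0, x0 + tw):
                    order.append(y * pic_w_ctus + x)  # CTU address
            x0 += tw
        y0 += th
    return order

# A 4x2-CTU picture split into two 2-CTU-wide tile columns, one tile row
print(tile_scan_order(4, 2, [2, 2], [2]))
# -> [0, 1, 4, 5, 2, 3, 6, 7]
```

Note how addresses 4 and 5 (second CTU row of the left tile) come before 2 and 3 (first CTU row of the right tile), unlike a plain picture raster scan.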
[65] Meanwhile, one picture may be divided into two or more subpictures. A subpicture may be a rectangular region of one or more slices within a picture.
[66] A pixel or a pel may mean the smallest unit constituting one picture (or image). Also, 'sample' may be used as a term corresponding to a pixel. A sample may generally represent a pixel or a value of a pixel, may represent only a pixel/pixel value of a luma component, or may represent only a pixel/pixel value of a chroma component.
[67] A unit may represent a basic unit of image processing. A unit may include at least one of a specific region of a picture and information related to the region. One unit may include one luma block and two chroma (e.g., cb, cr) blocks. A unit may be used interchangeably with terms such as block or area in some cases. In a general case, an MxN block may include a set (or array) of samples (or a sample array) or transform coefficients consisting of M columns and N rows.
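As a rough illustration of a unit bundling one luma block and two chroma blocks, the sketch below builds MxN sample arrays. The 4:2:0 chroma subsampling (chroma at half resolution in each direction) is an assumption of the example, not something the paragraph above specifies.

```python
# Illustrative sketch of [67]: a "unit" holding one MxN luma block and two
# chroma (cb, cr) blocks. 4:2:0 subsampling is assumed for the example;
# M = number of sample columns, N = number of sample rows.

def make_unit(m, n, fill=0):
    def block(w, h):
        return [[fill] * w for _ in range(h)]   # h rows of w samples
    return {
        "luma": block(m, n),
        "cb":   block(m // 2, n // 2),
        "cr":   block(m // 2, n // 2),
    }

u = make_unit(16, 8)
print(len(u["luma"]), len(u["luma"][0]))  # 8 rows, 16 columns
print(len(u["cb"]), len(u["cb"][0]))      # 4 rows, 8 columns
```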
[68] FIG. 2 is a diagram schematically illustrating the configuration of a video/image encoding apparatus to which the present disclosure may be applied. Hereinafter, a video encoding apparatus may include an image encoding apparatus.
[69] Referring to FIG. 2, the encoding apparatus 200 may be configured to include an image partitioner 210, a predictor 220, a residual processor 230, an entropy encoder 240, an adder 250, a filter 260, and a memory 270. The predictor 220 may include an inter predictor 221 and an intra predictor 222. The residual processor 230 may include a transformer 232, a quantizer 233, a dequantizer 234, and an inverse transformer 235. The residual processor 230 may further include a subtractor 231. The adder 250 may be called a reconstructor or a reconstructed block generator. The above-described image partitioner 210, predictor 220, residual processor 230, entropy encoder 240, adder 250, and filter 260 may be configured by one or more hardware components (e.g., an encoder chipset or a processor) according to an embodiment. Also, the memory 270 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium. The hardware component may further include the memory 270 as an internal/external component.
[70] The image partitioner 210 may divide an input image (or picture, frame) input to the encoding apparatus 200 into one or more processing units. As an example, the processing unit may be called a coding unit (CU). In this case, the coding unit may be split recursively from a coding tree unit (CTU) or a largest coding unit (LCU) according to a quad-tree binary-tree ternary-tree (QTBTTT) structure. For example, one coding unit may be split into a plurality of coding units of deeper depth based on a quad-tree structure, a binary-tree structure, and/or a ternary-tree structure. In this case, for example, the quad-tree structure may be applied first, and the binary-tree structure and/or the ternary-tree structure may be applied later. Alternatively, the binary-tree structure may be applied first. A coding procedure according to the present disclosure may be performed based on a final coding unit that is no longer split. In this case, based on coding efficiency according to image characteristics and the like, the largest coding unit may be used directly as the final coding unit, or, if necessary, the coding unit may be recursively split into coding units of deeper depth so that a coding unit of an optimal size may be used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transform, and reconstruction described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). In this case, the prediction unit and the transform unit may each be split or partitioned from the above-described final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving a residual signal from the transform coefficients.
[71] A unit may be used interchangeably with terms such as block or area in some cases. In a general case, an MxN block may represent a set of samples or transform coefficients consisting of M columns and N rows. A sample may generally represent a pixel or a value of a pixel, may represent only a pixel/pixel value of a luma component, or may represent only a pixel/pixel value of a chroma component. A sample may be used as a term corresponding to a pixel or a pel of one picture (or image).
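The recursive splitting idea can be sketched for the quad-tree part of the QTBTTT structure alone. The split decision below is a hypothetical maximum depth; in an actual encoder/decoder it would come from rate-distortion decisions or from parsed split flags.

```python
# Minimal sketch of the recursive partitioning in [70], quad-tree part
# only (binary/ternary splits omitted). Each leaf is a final coding unit,
# returned as (x, y, size). The max_depth stop rule is illustrative.

def quad_split(x, y, size, depth, max_depth):
    if depth == max_depth or size == 4:      # no further split: final CU
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus += quad_split(x + dx, y + dy, half, depth + 1, max_depth)
    return cus

# Split a 64x64 CTU one level deep -> four 32x32 coding units
print(quad_split(0, 0, 64, 0, 1))
# -> [(0, 0, 32), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```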
[72] The encoding apparatus 200 may subtract a prediction signal (predicted block, prediction sample array) output from the inter predictor 221 or the intra predictor 222 from an input image signal (original block, original sample array) to generate a residual signal (residual block, residual sample array), and the generated residual signal is transmitted to the transformer 232. In this case, as shown, the unit that subtracts the prediction signal (prediction block, prediction sample array) from the input image signal (original block, original sample array) within the encoding apparatus 200 may be called a subtractor 231. The predictor may perform prediction on a block to be processed (hereinafter referred to as the current block) and generate a predicted block including prediction samples for the current block. The predictor may determine whether intra prediction or inter prediction is applied in units of the current block or CU. The predictor may generate various kinds of information about prediction, such as prediction mode information, as described later in the description of each prediction mode, and deliver the information to the entropy encoder 240. The information about prediction may be encoded by the entropy encoder 240 and output in the form of a bitstream.
[73] The intra predictor 222 may predict the current block by referring to samples in the current picture. Depending on the prediction mode, the referenced samples may be located in the neighborhood of the current block or may be located apart from it. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes according to the fineness of the prediction direction. However, this is an example, and more or fewer directional prediction modes may be used depending on the setting. The intra predictor 222 may determine the prediction mode applied to the current block by using the prediction mode applied to a neighboring block.
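As a simplified illustration of a non-directional mode, the sketch below implements a DC-style prediction that fills the block with the rounded average of the reconstructed neighboring samples above and to the left. Boundary handling and weighting in an actual standard's DC mode are more involved; this only shows the basic idea.

```python
# Simplified DC-mode sketch for [73]: every prediction sample of the
# current block is the rounded mean of the neighboring reference samples.
# top  = reconstructed samples of the row above the block
# left = reconstructed samples of the column to the left of the block

def dc_predict(top, left):
    n = len(top) + len(left)
    dc = (sum(top) + sum(left) + n // 2) // n   # integer rounding
    return [[dc] * len(top) for _ in range(len(left))]

pred = dc_predict(top=[100, 102, 98, 100], left=[101, 99, 100, 100])
print(pred[0][0])  # every prediction sample equals the rounded mean: 100
```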
[74] The inter predictor 221 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of motion information between a neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, a neighboring block may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be called by names such as a collocated reference block or a collocated CU (colCU), and the reference picture including the temporal neighboring block may be called a collocated picture (colPic). For example, the inter predictor 221 may construct a motion information candidate list based on neighboring blocks, and may generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the case of a skip mode and a merge mode, the inter predictor 221 may use the motion information of a neighboring block as the motion information of the current block. In the case of the skip mode, unlike the merge mode, the residual signal may not be transmitted. In the case of a motion vector prediction (MVP) mode, the motion vector of a neighboring block may be used as a motion vector predictor, and the motion vector of the current block may be indicated by signaling a motion vector difference.
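The MVP mode described above can be sketched as follows: the decoder selects a predictor from a candidate list via a signaled index and adds the signaled motion vector difference. The candidate values below are hypothetical placeholders for vectors taken from spatial/temporal neighbors.

```python
# Sketch of MVP-mode reconstruction in [74]: mv = mvp + mvd, where the
# predictor (mvp) is chosen from a candidate list by a signaled index and
# only the difference (mvd) is transmitted. Vectors are (x, y) pairs.

def reconstruct_mv(candidates, mvp_idx, mvd):
    mvp = candidates[mvp_idx]                 # predictor picked by index
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

candidates = [(4, -2), (3, 0)]   # hypothetical neighbor motion vectors
print(reconstruct_mv(candidates, mvp_idx=0, mvd=(1, 1)))  # -> (5, -1)
```

Signaling only the index and the (usually small) difference costs fewer bits than signaling the full motion vector, which is the point of the mode.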
[75] The predictor 220 may generate a prediction signal based on various prediction methods described later. For example, for prediction of one block, the predictor may apply intra prediction or inter prediction, and may also apply intra prediction and inter prediction simultaneously. This may be called combined inter and intra prediction (CIIP). Also, the predictor may be based on an intra block copy (IBC) prediction mode or a palette mode for prediction of a block. The IBC prediction mode or the palette mode may be used for content image/video coding of a game or the like, such as screen content coding (SCC). IBC basically performs prediction within the current picture, but may be performed similarly to inter prediction in that it derives a reference block within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this document. The palette mode may be viewed as an example of intra coding or intra prediction. When the palette mode is applied, a sample value within the picture may be signaled based on information about a palette table and a palette index.
[76] The prediction signal generated through the predictor (including the inter predictor 221 and/or the intra predictor 222) may be used to generate a reconstructed signal or to generate a residual signal. The transformer 232 may generate transform coefficients by applying a transform technique to the residual signal. For example, the transform technique may include at least one of a discrete cosine transform (DCT), a discrete sine transform (DST), a Karhunen-Loeve transform (KLT), a graph-based transform (GBT), or a conditionally non-linear transform (CNT). Here, GBT means a transform obtained from a graph when relationship information between pixels is represented by the graph. CNT means a transform obtained by generating a prediction signal using all previously reconstructed pixels and based thereon. In addition, the transform process may be applied to square pixel blocks of the same size, or may be applied to blocks of variable size that are not square.
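As an illustration of the transform step, the sketch below applies a separable floating-point 2-D DCT-II (a member of the DCT family mentioned above) to a small residual block. Real codecs use fixed-point integer approximations of such transforms; this version only demonstrates the energy-compaction idea.

```python
import math

# Floating-point 2-D DCT-II sketch for the transform step in [76].
# A constant residual block ends up with all its energy in the DC
# coefficient, which is what makes the transform useful for coding.

def dct2_1d(v):
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct2_2d(block):
    rows = [dct2_1d(r) for r in block]           # transform each row
    cols = [dct2_1d(list(c)) for c in zip(*rows)]  # then each column
    return [list(r) for r in zip(*cols)]

coeffs = dct2_2d([[8, 8], [8, 8]])               # flat residual block
print(round(coeffs[0][0]), round(coeffs[0][1]))  # all energy in DC: 16 0
```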
[77] The quantizer 233 may quantize the transform coefficients and transmit them to the entropy encoder 240, and the entropy encoder 240 may encode the quantized signal (information about the quantized transform coefficients) and output the encoded signal as a bitstream. The information about the quantized transform coefficients may be referred to as residual information. The quantizer 233 may rearrange the block-form quantized transform coefficients into a one-dimensional vector form based on a coefficient scan order, and may generate the information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form. The entropy encoder 240 may perform various encoding methods such as, for example, exponential Golomb, context-adaptive variable length coding (CAVLC), and context-adaptive binary arithmetic coding (CABAC). The entropy encoder 240 may encode, together or separately, information necessary for video/image reconstruction (e.g., values of syntax elements) in addition to the quantized transform coefficients. The encoded information (e.g., encoded video/image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The video/image information may further include information about various parameter sets such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). Also, the video/image information may further include general constraint information. In this document, information and/or syntax elements transmitted/signaled from the encoding apparatus to the decoding apparatus may be included in the video/image information. The video/image information may be encoded through the above-described encoding procedure and included in the bitstream. The bitstream may be transmitted through a network or stored in a digital storage medium. Here, the network may include a broadcast network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. A transmitter (not shown) that transmits the signal output from the entropy encoder 240 and/or a storage (not shown) that stores the signal may be configured as internal/external elements of the encoding apparatus 200, or the transmitter may be included in the entropy encoder 240.
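The exponential Golomb coding mentioned above encodes a non-negative value x as a run of leading zeros (one fewer than the bit length of x+1) followed by the binary form of x+1, so small values get short codewords. A minimal sketch:

```python
# Unsigned exponential-Golomb codeword sketch for the entropy-coding
# methods listed in [77]: x -> (bitlen(x+1) - 1) zeros, then bin(x+1).

def exp_golomb(x):
    assert x >= 0
    code = bin(x + 1)[2:]            # binary of x+1, without '0b' prefix
    return "0" * (len(code) - 1) + code

for x in range(5):
    print(x, exp_golomb(x))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```

The codeword set is prefix-free, so a decoder can count leading zeros to know how many more bits to read.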
[78] The quantized transform coefficients output from the quantizer 233 may be used to generate a prediction signal. For example, a residual signal (residual block or residual samples) may be reconstructed by applying dequantization and inverse transform to the quantized transform coefficients through the dequantizer 234 and the inverse transformer 235. The adder 250 may add the reconstructed residual signal to the prediction signal output from the inter predictor 221 or the intra predictor 222, so that a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) may be generated. When there is no residual for the block to be processed, such as when the skip mode is applied, the predicted block may be used as the reconstructed block. The adder 250 may be called a reconstructor or a reconstructed block generator. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, and may also be used for inter prediction of the next picture after filtering, as described later.
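The reconstruction step can be sketched as prediction plus restored residual, clipped to the valid sample range (an 8-bit bit depth is assumed in this example):

```python
# Sketch of the reconstruction in [78]: reconstructed sample =
# prediction + residual, clipped to [0, 2^bit_depth - 1].

def reconstruct(pred, resid, bit_depth=8):
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]

pred  = [[100, 120], [130, 250]]
resid = [[-5, 3], [0, 10]]
print(reconstruct(pred, resid))   # -> [[95, 123], [130, 255]]  (last clips)
```

With an all-zero residual (the skip-mode case above), the predicted block passes through unchanged as the reconstructed block.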
[79] Meanwhile, luma mapping with chroma scaling (LMCS) may be applied in the picture encoding and/or reconstruction process.
[80] The filter 260 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 260 may apply various filtering methods to the reconstructed picture to generate a modified reconstructed picture, and may store the modified reconstructed picture in the memory 270, specifically in the DPB of the memory 270. The various filtering methods may include, for example, deblocking filtering, a sample adaptive offset, an adaptive loop filter, and a bilateral filter. The filter 260 may generate various kinds of information about filtering and deliver the information to the entropy encoder 240, as described later in the description of each filtering method. The information about filtering may be encoded by the entropy encoder 240 and output in the form of a bitstream.
[81] The modified reconstructed picture transmitted to the memory 270 may be used as a reference picture in the inter predictor 221. Through this, when inter prediction is applied, prediction mismatch between the encoding apparatus and the decoding apparatus can be avoided, and coding efficiency can also be improved.
[82] The DPB of the memory 270 may store the modified reconstructed picture to be used as a reference picture in the inter predictor 221. The memory 270 may store motion information of a block in the current picture from which motion information has been derived (or encoded) and/or motion information of blocks in an already reconstructed picture. The stored motion information may be delivered to the inter predictor 221 to be utilized as motion information of a spatial neighboring block or motion information of a temporal neighboring block. The memory 270 may store reconstructed samples of reconstructed blocks in the current picture and may deliver them to the intra predictor 222.
[83] FIG. 3 is a diagram schematically illustrating the configuration of a video/image decoding apparatus to which the present disclosure may be applied.
[84] Referring to FIG. 3, the decoding apparatus 300 may be configured to include an entropy decoder 310, a residual processor 320, a predictor 330, an adder 340, a filter 350, and a memory 360. The predictor 330 may include an intra predictor 331 and an inter predictor 332. The residual processor 320 may include a dequantizer 321 and an inverse transformer 322. The above-described entropy decoder 310, residual processor 320, predictor 330, adder 340, and filter 350 may be configured by one hardware component (e.g., a decoder chipset or a processor) according to an embodiment. Also, the memory 360 may include a decoded picture buffer (DPB) and may be configured by a digital storage medium. The hardware component may further include the memory 360 as an internal/external component.
[85] When a bitstream including video/image information is input, the decoding apparatus 300 may reconstruct an image corresponding to the process in which the video/image information was processed in the encoding apparatus of FIG. 2. For example, the decoding apparatus 300 may derive units/blocks based on block-partitioning-related information obtained from the bitstream. The decoding apparatus 300 may perform decoding using a processing unit applied in the encoding apparatus. Thus, the processing unit of decoding may be, for example, a coding unit, and the coding unit may be split from a coding tree unit or a largest coding unit along a quad-tree structure, a binary-tree structure, and/or a ternary-tree structure. One or more transform units may be derived from the coding unit. Then, the reconstructed image signal decoded and output through the decoding apparatus 300 may be reproduced through a reproducing apparatus.
[86] The decoding apparatus 300 may receive, in the form of a bitstream, the signal output from the encoding apparatus of FIG. 3, and the received signal may be decoded through the entropy decoding unit 310. For example, the entropy decoding unit 310 may parse the bitstream to derive information (e.g., video/image information) necessary for image reconstruction (or picture reconstruction). The video/image information may further include information on various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). The video/image information may further include general constraint information. The decoding apparatus may decode a picture further based on the information on the parameter sets and/or the general constraint information. Signaled/received information and/or syntax elements described later in this document may be decoded through the decoding procedure and obtained from the bitstream. For example, the entropy decoding unit 310 may decode information in the bitstream based on a coding method such as exponential Golomb coding, CAVLC, or CABAC, and output values of syntax elements required for image reconstruction and quantized values of transform coefficients for the residual. More specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using information of the decoding-target syntax element, decoding information of neighboring blocks and the decoding-target block, or information of symbols/bins decoded in a previous step, predict the occurrence probability of a bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. At this time, after determining the context model, the CABAC entropy decoding method may update the context model using the information of the decoded symbol/bin for the context model of the next symbol/bin. Among the information decoded by the entropy decoding unit 310, information on prediction may be provided to the prediction unit (the inter prediction unit 332 and the intra prediction unit 331), and the residual values on which entropy decoding has been performed by the entropy decoding unit 310, that is, the quantized transform coefficients and related parameter information, may be input to the residual processing unit 320. The residual processing unit 320 may derive a residual signal (a residual block, residual samples, a residual sample array). In addition, among the information decoded by the entropy decoding unit 310, information on filtering may be provided to the filtering unit 350. Meanwhile, a receiver (not shown) that receives the signal output from the encoding apparatus may be further configured as an internal/external element of the decoding apparatus 300, or the receiver may be a component of the entropy decoding unit 310. Meanwhile, the decoding apparatus according to this document may be called a video/image/picture decoding apparatus, and the decoding apparatus may be divided into an information decoder (video/image/picture information decoder) and a sample decoder (video/image/picture sample decoder). The information decoder may include the entropy decoding unit 310, and the sample decoder may include at least one of the inverse quantization unit 321, the inverse transform unit 322, the adder 340, the filtering unit 350, the memory 360, the inter prediction unit 332, and the intra prediction unit 331.
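As an illustrative aid only (not part of the disclosed embodiments), the 0th-order exponential Golomb parsing mentioned in paragraph [86] may be sketched as follows; the function name and the bit-string input form are hypothetical simplifications of bitstream access:

```python
def decode_ue(bits, pos=0):
    """Decode one 0th-order exp-Golomb codeword ue(v) from a bit string.

    A codeword is n leading zeros, a '1', then n info bits:
    value = 2**n - 1 + info. Returns (value, next_position).
    """
    n = 0
    while bits[pos] == "0":  # count the leading-zero prefix
        n += 1
        pos += 1
    pos += 1  # skip the terminating '1'
    info = int(bits[pos:pos + n], 2) if n else 0
    return (1 << n) - 1 + info, pos + n

# "00111": two leading zeros, '1', info bits "11" -> 2**2 - 1 + 3 = 6
value, next_pos = decode_ue("00111")
```

For example, the single bit "1" decodes to 0, and "010" decodes to 1, which is why small syntax-element values consume few bits.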
[87] The inverse quantization unit 321 may inverse-quantize the quantized transform coefficients and output transform coefficients. The inverse quantization unit 321 may rearrange the quantized transform coefficients into a two-dimensional block form. In this case, the rearrangement may be performed based on the coefficient scan order performed in the encoding apparatus. The inverse quantization unit 321 may perform inverse quantization on the quantized transform coefficients using a quantization parameter (e.g., quantization step size information) and obtain the transform coefficients.
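A minimal sketch of the rearrangement and scaling described in paragraph [87] is given below for illustration only; the uniform step-size scaling and the explicit scan list are simplifying assumptions, not the normative dequantization process:

```python
def inverse_quantize(levels, scan, rows, cols, step):
    """Re-order 1-D quantized levels into a 2-D block following the
    given scan order, then scale each level by the quantization step."""
    block = [[0] * cols for _ in range(rows)]
    for level, (r, c) in zip(levels, scan):
        block[r][c] = level * step
    return block

# 2x2 block, simple raster scan, hypothetical step size of 4
coeffs = inverse_quantize([1, -2, 0, 3],
                          [(0, 0), (0, 1), (1, 0), (1, 1)],
                          2, 2, 4)
```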
[88] The inverse transform unit 322 inverse-transforms the transform coefficients to obtain a residual signal (a residual block, a residual sample array).
[89] The prediction unit may perform prediction on the current block and generate a predicted block including prediction samples for the current block. The prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on the information on prediction output from the entropy decoding unit 310, and may determine a specific intra/inter prediction mode.
[90] The prediction unit 330 may generate a prediction signal based on various prediction methods described later. For example, the prediction unit may apply intra prediction or inter prediction for the prediction of one block, and may also apply intra prediction and inter prediction simultaneously. This may be called combined inter and intra prediction (CIIP). In addition, the prediction unit may be based on an intra block copy (IBC) prediction mode or a palette mode for the prediction of a block. The IBC prediction mode or the palette mode may be used, for example, for content image/video coding of games and the like, such as screen content coding (SCC). IBC basically performs prediction within the current picture, but may be performed similarly to inter prediction in that a reference block is derived within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this document. The palette mode may be regarded as an example of intra coding or intra prediction. When the palette mode is applied, information on a palette table and a palette index may be included in the video/image information and signaled.
[91] The intra prediction unit 331 may predict the current block with reference to samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra prediction unit 331 may determine the prediction mode applied to the current block using the prediction mode applied to a neighboring block.
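As an illustration of a non-directional mode mentioned in paragraph [91], a DC-style prediction can be sketched as follows. This is a didactic simplification (no reference-sample filtering or unavailability handling), not the normative intra prediction process:

```python
def intra_dc_predict(top, left, size):
    """DC mode sketch: fill the block with the rounded integer mean of
    the reference samples above and to the left of the current block."""
    refs = list(top) + list(left)
    dc = (sum(refs) + len(refs) // 2) // len(refs)
    return [[dc] * size for _ in range(size)]

# hypothetical 4x4 block with reference samples averaging to 100
pred = intra_dc_predict([100, 102, 98, 100], [96, 100, 104, 100], 4)
```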
[92] The inter prediction unit 332 may derive a predicted block for the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. At this time, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, subblocks, or samples based on the correlation of the motion information between a neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include a spatial neighboring block existing in the current picture and a temporal neighboring block existing in the reference picture. For example, the inter prediction unit 332 may construct a motion information candidate list based on the neighboring blocks and derive a motion vector and/or a reference picture index of the current block based on received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information on prediction may include information indicating the mode of inter prediction for the current block.
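The candidate-list construction of paragraph [92] can be sketched as follows; this is a schematic illustration only (duplicate pruning over a fixed neighbor order), with hypothetical names and without the pairwise/zero candidates of an actual merge list:

```python
def build_merge_list(neighbor_mvs, max_cands=5):
    """Collect motion vectors of available neighboring blocks into a
    candidate list, pruning duplicates, up to max_cands entries.
    None marks an unavailable neighbor."""
    cands = []
    for mv in neighbor_mvs:
        if mv is not None and mv not in cands:
            cands.append(mv)
        if len(cands) == max_cands:
            break
    return cands

# spatial neighbors first (None = unavailable), then a temporal candidate;
# signaled candidate selection information then picks one entry by index
cands = build_merge_list([(4, 0), None, (4, 0), (0, -2), (1, 1)])
```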
[93] The adder 340 may generate a reconstructed signal (a reconstructed picture, a reconstructed block, a reconstructed sample array) by adding the obtained residual signal to the prediction signal (predicted block, prediction sample array) output from the prediction unit (including the inter prediction unit 332 and/or the intra prediction unit 331). When there is no residual for the processing-target block, as in the case where the skip mode is applied, the predicted block may be used as the reconstructed block.
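The addition and skip-mode fallback of paragraph [93] can be sketched as follows for illustration; the clipping range assumes hypothetical 8-bit samples:

```python
def reconstruct(pred, resid=None, max_val=255):
    """Add the residual block to the predicted block and clip to the
    sample range; with no residual (e.g. skip mode), reuse the
    prediction as the reconstructed block."""
    if resid is None:
        return [row[:] for row in pred]
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]

# 100 + (-5) = 95, and 200 + 60 = 260 is clipped to 255
rec = reconstruct([[100, 200]], [[-5, 60]])
```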
[94] The adder 340 may be called a reconstructor or a reconstructed block generator. The generated reconstructed signal may be used for intra prediction of the next processing-target block in the current picture, may be output after filtering as described later, or may be used for inter prediction of the next picture.
[95] Meanwhile, luma mapping with chroma scaling (LMCS) may be applied in the picture decoding process.
[96] The filtering unit 350 may improve subjective/objective picture quality by applying filtering to the reconstructed signal. For example, the filtering unit 350 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture, and may transmit the modified reconstructed picture to the memory 360, specifically to the DPB of the memory 360. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like.
[97] The (modified) reconstructed picture stored in the DPB of the memory 360 may be used as a reference picture in the inter prediction unit 332. The memory 360 may store the motion information of a block in the current picture from which motion information has been derived (or decoded) and/or the motion information of blocks in an already reconstructed picture. The stored motion information may be transferred to the inter prediction unit 332 to be used as the motion information of a spatial neighboring block or the motion information of a temporal neighboring block. The memory 360 may store reconstructed samples of reconstructed blocks in the current picture and transfer them to the intra prediction unit 331.
[98] In this specification, the embodiments described for the filtering unit 260, the inter prediction unit 221, and the intra prediction unit 222 of the encoding apparatus 100 may be applied identically or correspondingly to the filtering unit 350, the inter prediction unit 332, and the intra prediction unit 331 of the decoding apparatus 300, respectively.
[99] As described above, in performing video coding, prediction is performed to increase compression efficiency. Through this, a predicted block including prediction samples for the current block, which is the coding-target block, may be generated. Here, the predicted block includes prediction samples in the spatial domain (or pixel domain). The predicted block is derived identically in the encoding apparatus and the decoding apparatus, and the encoding apparatus may increase image coding efficiency by signaling to the decoding apparatus not the original sample values of the original block themselves but information on the residual between the original block and the predicted block (residual information). The decoding apparatus may derive a residual block including residual samples based on the residual information, generate a reconstructed block including reconstructed samples by adding the residual block and the predicted block, and generate a reconstructed picture including the reconstructed blocks.
[100] The residual information may be generated through transform and quantization procedures. For example, the encoding apparatus may derive the residual block between the original block and the predicted block, perform a transform procedure on the residual samples (residual sample array) included in the residual block to derive transform coefficients, perform a quantization procedure on the transform coefficients to derive quantized transform coefficients, and signal the related residual information to the decoding apparatus (through a bitstream). Here, the residual information may include information such as value information of the quantized transform coefficients, position information, the transform technique, the transform kernel, and the quantization parameter. The decoding apparatus may perform an inverse quantization/inverse transform procedure based on the residual information and derive the residual samples (or residual block). The decoding apparatus may generate a reconstructed picture based on the predicted block and the residual block. The encoding apparatus may also inverse-quantize/inverse-transform the quantized transform coefficients, for reference for inter prediction of a subsequent picture, to derive a residual block, and generate a reconstructed picture based thereon.
[101] FIG. 4 exemplarily shows a hierarchical structure for coded data.
[102] Referring to FIG. 4, the coded data may be divided into a video coding layer (VCL) that handles the coding processing of the video/image and the video/image itself, and a network abstraction layer (NAL) that lies between the VCL and a subsystem that stores and transmits the data of the coded video/image.
[103] The VCL may generate parameter sets corresponding to headers of sequences, pictures, and the like (a picture parameter set (PPS), a sequence parameter set (SPS), a video parameter set (VPS), etc.) and supplemental enhancement information (SEI) messages additionally required for the coding process of the video/image. The SEI message is separated from the information on the video/image (slice data). The VCL containing the information on the video/image consists of slice data and a slice header. Meanwhile, the slice header may be referred to as a tile group header, and the slice data may be referred to as tile group data.
[104] In the NAL, a NAL unit may be generated by adding header information (a NAL unit header) to a raw byte sequence payload (RBSP) generated in the VCL. At this time, the RBSP refers to the slice data, parameter sets, SEI messages, and the like generated in the VCL. The NAL unit header may include NAL unit type information specified according to the RBSP data included in the corresponding NAL unit.
[105] The NAL unit, which is the basic unit of the NAL, serves to map the coded image to the bit sequence of a subsystem such as a file format according to a predetermined standard, real-time transport protocol (RTP), transport stream (TS), and the like.
[106] As shown in the figure, the NAL unit may be divided into a VCL NAL unit and a non-VCL NAL unit according to the RBSP generated in the VCL. The VCL NAL unit may mean a NAL unit including information on the image (slice data), and the non-VCL NAL unit may mean a NAL unit including the information (a parameter set or an SEI message) necessary for decoding the image.
[107] The above-described VCL NAL unit and non-VCL NAL unit may be transmitted through a network with header information attached according to the data standard of the subsystem. For example, the NAL unit may be transformed into a data form of a predetermined standard, such as the H.266/VVC file format, real-time transport protocol (RTP), or transport stream (TS), and transmitted through various networks.
[108] As described above, the NAL unit type may be specified according to the RBSP data structure included in the corresponding NAL unit, and information on this NAL unit type may be stored in the NAL unit header and signaled.
[109] For example, NAL units may be largely classified into a VCL NAL unit type and a non-VCL NAL unit type depending on whether the NAL unit includes information on the image (slice data). The VCL NAL unit type may be classified according to the property and kind of the picture included in the VCL NAL unit, and the non-VCL NAL unit type may be classified according to the kind of the parameter set.
[110] The following is an example of NAL unit types specified according to the kind of parameter set included in the non-VCL NAL unit type. The NAL unit type may be specified according to the kind of the parameter set and the like. For example, the NAL unit type may be specified as any one of an adaptation parameter set (APS) NAL unit, which is a type for a NAL unit including an APS; a decoding parameter set (DPS) NAL unit, which is a type for a NAL unit including a DPS; a video parameter set (VPS) NAL unit, which is a type for a NAL unit including a VPS; a sequence parameter set (SPS) NAL unit, which is a type for a NAL unit including an SPS; and a picture parameter set (PPS) NAL unit, which is a type for a NAL unit including a PPS.
[111] The above-described NAL unit types have syntax information for the NAL unit type, and the syntax information may be stored in the NAL unit header and signaled. For example, the syntax information may be nal_unit_type, and the NAL unit types may be specified by nal_unit_type values.
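Parsing nal_unit_type from a NAL unit header can be sketched as below. This is an illustrative sketch assuming the two-byte field layout of the VVC draft (forbidden_zero_bit, nuh_reserved_zero_bit, nuh_layer_id, then nal_unit_type and nuh_temporal_id_plus1); the example byte values are hypothetical, and the mapping of type codes to APS/DPS/VPS/SPS/PPS is given by the nal_unit_type table of the specification, not by this sketch:

```python
def parse_nal_header(byte0, byte1):
    """Split a two-byte VVC-style NAL unit header into its fields
    (bit widths 1 + 1 + 6 in the first byte, 5 + 3 in the second)."""
    return {
        "forbidden_zero_bit": byte0 >> 7,
        "nuh_reserved_zero_bit": (byte0 >> 6) & 0x1,
        "nuh_layer_id": byte0 & 0x3F,
        "nal_unit_type": byte1 >> 3,
        "nuh_temporal_id_plus1": byte1 & 0x7,
    }

# hypothetical header bytes: layer 0, nal_unit_type 16, temporal id plus 1 = 1
hdr = parse_nal_header(0x00, 0x81)
```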
[112] Meanwhile, as described above, one picture may include a plurality of slices, and one slice may include a slice header and slice data. In this case, one picture header may be further added for the plurality of slices (the set of slice headers and slice data) in one picture. The picture header (picture header syntax) may include information/parameters commonly applicable to the picture. The slice header (slice header syntax) may include information/parameters commonly applicable to the slice. The APS (APS syntax) or the PPS (PPS syntax) may include information/parameters commonly applicable to one or more slices or pictures. The SPS (SPS syntax) may include information/parameters commonly applicable to one or more sequences. The VPS (VPS syntax) may include information/parameters commonly applicable to multiple layers. The DPS (DPS syntax) may include information/parameters commonly applicable to the overall video. The DPS may include information/parameters related to the concatenation of coded video sequences (CVSs). In this document, high level syntax (HLS) may include at least one of the APS syntax, the PPS syntax, the SPS syntax, the VPS syntax, the DPS syntax, the picture header syntax, and the slice header syntax.
[113] In this document, the image/video information encoded by the encoding apparatus and signaled to the decoding apparatus in the form of a bitstream may include not only intra-picture partitioning-related information, intra/inter prediction information, residual information, and in-loop filtering information, but also the information included in the slice header, the information included in the picture header, the information included in the APS, the information included in the PPS, the information included in the SPS, the information included in the VPS, and/or the information included in the DPS. In addition, the image/video information may further include the information of the NAL unit header.
[114] FIG. 5 is a diagram showing an example of partitioning a picture.
[115] Pictures may be divided into coding tree units (CTUs), and a CTU may correspond to a coding tree block (CTB). A CTU may include a coding tree block of luma samples and two coding tree blocks of corresponding chroma samples. Meanwhile, the maximum allowable size of a CTU for coding and prediction may be different from the maximum allowable size of a CTU for transform.
[116] A tile may correspond to a series of CTUs covering a rectangular region of a picture, and a picture may be divided into one or more tile rows and one or more tile columns.
[117] Meanwhile, a slice may consist of an integer number of complete tiles or an integer number of consecutive complete CTU rows. At this time, two slice modes may be supported, including a raster-scan slice mode and a rectangular slice mode.
[118] In the raster-scan slice mode, a slice may include a sequence of complete tiles in the tile raster scan of a picture. In the rectangular slice mode, a slice may include a number of complete tiles that collectively form a rectangular region of the picture, or a number of consecutive CTU rows within one tile that collectively form a rectangular region of the picture. Tiles within a rectangular slice may be scanned in tile raster-scan order within the rectangular region corresponding to that slice.
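The raster-scan slice mode of paragraph [118] can be sketched as the grouping below, for illustration only; the function name and the particular slice sizes are hypothetical:

```python
def raster_scan_slices(num_tile_cols, num_tile_rows, slice_sizes):
    """Group tiles, taken in raster-scan order over the tile grid,
    into consecutive runs of complete tiles forming raster-scan slices."""
    order = [r * num_tile_cols + c
             for r in range(num_tile_rows) for c in range(num_tile_cols)]
    slices, i = [], 0
    for n in slice_sizes:
        slices.append(order[i:i + n])
        i += n
    return slices

# a 4x3 tile grid (12 tiles) split into 3 raster-scan slices; the 2/4/6
# split is only one possible layout for the example of FIG. 5
slices = raster_scan_slices(4, 3, [2, 4, 6])
```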
[119] FIG. 5 (a) is a diagram showing an example in which a picture is divided into tiles and raster-scan slices; for example, the picture may be divided into 12 tiles and 3 raster-scan slices.
[120] In addition, FIG. 5 (b) is a diagram showing an example in which a picture is divided into tiles and rectangular slices; for example, the picture may be divided into 24 tiles (6 tile columns and 4 tile rows) and 9 rectangular slices.
[121] In addition, FIG. 5 (c) is a diagram showing an example in which a picture is divided into tiles and rectangular slices; for example, the picture may be divided into 4 tiles (2 tile columns and 2 tile rows) and 4 rectangular slices.
[122] FIG. 6 is a flowchart showing a tile- and/or tile-group-based picture encoding procedure according to an embodiment.
[123] In an embodiment, picture partitioning (S600) and generation of information on tiles/tile groups (S610) may be performed by the image partitioner 210 of the encoding apparatus, and encoding of the video/image information including the information on the tiles/tile groups (S620) may be performed by the entropy encoding unit 240 of the encoding apparatus.
[124] The encoding apparatus according to an embodiment may perform picture partitioning for encoding of the input picture (S600). The picture may include one or more tiles/tile groups. The encoding apparatus may partition the picture into various forms in consideration of the image characteristics and coding efficiency of the picture, and may generate information indicating the partitioning form having the optimum coding efficiency and signal it to the decoding apparatus.
[125] The encoding apparatus according to an embodiment may determine the tiles/tile groups applied to the picture and generate the information on the tiles/tile groups (S610). The information on the tiles/tile groups may include information indicating the structure of the tiles/tile groups for the picture. The information on the tiles/tile groups may be signaled through various parameter sets and/or a tile group header, as described later. Specific examples are described later.
[126] The encoding apparatus according to an embodiment may encode the video/image information including the information on the tiles/tile groups and output it in the form of a bitstream (S620). The bitstream may be delivered to the decoding apparatus through a digital storage medium or a network. The video/image information may include the HLS and/or tile group header syntax described in this document. In addition, the video/image information may further include the above-described prediction information, residual information, (in-loop) filtering information, and the like. For example, the encoding apparatus may apply in-loop filtering after reconstructing the current picture, encode parameters on the in-loop filtering, and output them in the form of a bitstream.
[127] FIG. 7 is a flowchart showing a tile- and/or tile-group-based picture decoding procedure according to an embodiment.
[128] In an embodiment, the step of obtaining information on tiles/tile groups from a bitstream (S700) and the step of deriving the tiles/tile groups in a picture (S710) may be performed by the entropy decoding unit 310 of the decoding apparatus, and the step of performing picture decoding based on the tiles/tile groups (S720) may be performed by the sample decoder of the decoding apparatus.
[129] The decoding apparatus according to an embodiment may obtain the information on the tiles/tile groups from the received bitstream (S700). The information on the tiles/tile groups may be obtained through various parameter sets and/or a tile group header, as described later. Specific examples are described later.
[130] The decoding apparatus according to an embodiment may derive the tiles/tile groups within the current picture based on the information on the tiles/tile groups (S710).
[131] The decoding apparatus according to an embodiment may decode the current picture based on the tiles/tile groups (S720). For example, the decoding apparatus may derive the CTUs/CUs located in a tile and, based on them, perform inter/intra prediction, residual processing, reconstructed block (picture) generation, and/or an in-loop filtering procedure. In this case, the decoding apparatus may, for example, initialize the context model/information in units of tiles/tile groups. In addition, when a neighboring block or neighboring sample referenced during inter/intra prediction is located in a tile different from the current tile in which the current block is located, the decoding apparatus may treat the neighboring block or neighboring sample as unavailable.
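The cross-tile availability rule above can be sketched as follows. This is an illustrative Python sketch, not the codec's normative process; the tile ID map (one tile ID per CTB in raster-scan order), the CTB addressing, and the function name are hypothetical.

```python
def neighbor_available(tile_id_map, cur_ctb_addr, nbr_ctb_addr):
    """Treat a neighboring CTB as unavailable for prediction when it lies
    outside the picture or in a different tile than the current CTB."""
    if nbr_ctb_addr is None:  # neighbor outside the picture
        return False
    return tile_id_map[nbr_ctb_addr] == tile_id_map[cur_ctb_addr]

# 4x2-CTB picture split into two 2x2 tiles (tile 0 left, tile 1 right),
# tile IDs listed in raster-scan CTB order:
tile_id_map = [0, 0, 1, 1,
               0, 0, 1, 1]
```

For example, CTB 2 is the first CTB of the right tile, so its left neighbor (CTB 1, in the left tile) is treated as unavailable, while CTB 1's left neighbor (CTB 0, same tile) remains available.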
[132] FIG. 8 is a diagram illustrating an example of partitioning a picture into a plurality of tiles.
[133] In an embodiment, tiles may refer to regions within a picture defined by a set of vertical and/or horizontal boundaries that divide the picture into a plurality of rectangles. FIG. 8 illustrates an example in which one picture 700 is divided into a plurality of tiles based on a plurality of column boundaries 810 and row boundaries 820. In FIG. 8, the first 32 largest coding units (or CTUs (Coding Tree Units)) are numbered and shown.
[134] In an embodiment, each tile may include an integer number of CTUs that are processed in raster scan order within the tile. The plurality of tiles in a picture, including each such tile, may likewise be processed in raster scan order within the picture. The tiles may be grouped to form tile groups, and the tiles within a single tile group may be raster scanned. Partitioning a picture into tiles may be defined based on the syntax and semantics of the PPS (Picture Parameter Set).
[135] In an embodiment, the information on tiles derived from the PPS may be used to check (or read) the following items. First, it may be checked whether one tile or more than one tile exists in the picture; when more than one tile exists, it may be checked whether the tiles are uniformly distributed; the dimensions of the tiles may be checked; and it may be checked whether the loop filter is enabled.
[136] In an embodiment, the PPS may first signal the syntax element single_tile_in_pic_flag. The single_tile_in_pic_flag may indicate whether only one tile or a plurality of tiles exists in the picture. When a plurality of tiles exists in the picture, the decoding apparatus may parse information on the number of tile rows and tile columns using the syntax elements num_tile_columns_minus1 and num_tile_rows_minus1. The syntax elements num_tile_columns_minus1 and num_tile_rows_minus1 may specify the process of partitioning the picture into tile rows and columns. The heights of the tile rows and the widths of the tile columns may be expressed in terms of CTBs (that is, in units of CTBs).
[137] In an embodiment, an additional flag may be parsed to check whether the tiles in the picture are uniformly spaced. When the tiles in the picture are not uniformly spaced, the number of CTBs per tile may be explicitly signaled for the boundaries of each tile row and column (that is, the number of CTBs in each tile row and the number of CTBs in each tile column may be signaled). When the tiles are uniformly spaced, the tiles may have the same width and height as one another.
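The uniform-spacing case can be illustrated with the integer split used for the ColWidth/RowHeight derivations (the `(i+1)*W/(n+1) - i*W/(n+1)` pattern). This is a minimal Python sketch rather than specification text, and the function name is illustrative.

```python
def uniform_tile_sizes(pic_size_in_ctbs, num_minus1):
    """Split a picture dimension (in CTBs) into num_minus1 + 1 tile
    columns or rows whose sizes differ by at most one CTB."""
    n = num_minus1 + 1
    return [(i + 1) * pic_size_in_ctbs // n - i * pic_size_in_ctbs // n
            for i in range(n)]

# A 10-CTB-wide picture split into 3 uniform tile columns:
# uniform_tile_sizes(10, 2) -> [3, 3, 4]
```

The resulting sizes always sum to the picture dimension, which is why the derivation needs no explicit remainder handling.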
[138] In an embodiment, another flag (e.g., the syntax element loop_filter_across_tiles_enabled_flag) may be parsed to determine whether the loop filter is enabled across tile boundaries.
[139] Table 1 below summarizes an example of the main information on tiles that may be derived by parsing the PPS. Table 1 may represent the PPS RBSP syntax.
[140] [Table 1]

(table content not reproduced here)
[141] Table 2 below shows an example of the semantics for the syntax elements described in Table 1.
[142] [Table 2]

[143] (table content not reproduced here)
[144] FIG. 9 is a block diagram illustrating a configuration of an encoding apparatus according to an embodiment, and FIG. 10 is a block diagram illustrating a configuration of a decoding apparatus according to an embodiment.
[145] FIG. 9 shows an example of a block diagram of an encoding apparatus. The encoding apparatus 900 shown in FIG. 9 includes a partitioning module 910 and an encoding module 920. The partitioning module 910 may perform operations identical and/or similar to those of the image partitioner of the encoding apparatus shown in FIG. 2, and the encoding module 920 may perform operations identical and/or similar to those of the entropy encoding unit 240 of the encoding apparatus shown in FIG. 2. The input video may be partitioned by the partitioning module 910 and then encoded by the encoding module 920. After being encoded, the encoded video may be output from the encoding apparatus 900.
[146] FIG. 10 shows an example of a block diagram of a decoding apparatus. The decoding apparatus 1000 shown in FIG. 10 includes a decoding module 1010 and a deblocking filter 1020. The decoding module 1010 may perform operations identical and/or similar to those of the entropy decoding unit 310 of the decoding apparatus shown in FIG. 3, and the deblocking filter 1020 may perform operations identical and/or similar to those of the filtering unit 350 of the decoding apparatus shown in FIG. 3. The decoding module 1010 may decode the input received from the encoding apparatus 900 to derive information on the tiles. A processing unit may be determined based on the decoded information, and the deblocking filter 1020 may apply an in-loop deblocking filter to process the processing unit. The in-loop filtering may be applied to remove coding artifacts generated in the partitioning process. The in-loop filtering operation may include an ALF (Adaptive Loop Filter), a deblocking filter (DF), an SAO (Sample Adaptive Offset), and the like. The decoded picture may then be output.
[147] An example of the descriptors specifying the parsing process of each syntax element is shown in Table 3 below.
[148] [Table 3]

(table content not reproduced here)
[150] FIG. 11 is a diagram illustrating an example of tile and tile group units constituting a current picture.
[151] As described above, tiles may be grouped to form tile groups. FIG. 11 shows an example in which one picture is divided into tiles and tile groups. In FIG. 11, the picture includes 9 tiles and 3 tile groups. Each tile group may be coded independently.
[152] FIG. 12 is a diagram schematically illustrating an example of a signaling structure for tile group information.
[153] Within a CVS (Coded Video Sequence), each tile group may include a tile group header. Tile groups may carry a meaning similar to that of slice groups. Each tile group may be coded independently. A tile group may include one or more tiles. The tile group header may refer to a PPS, and the PPS may subsequently refer to an SPS (Sequence Parameter Set).
[154] In FIG. 12, the tile group header may carry the PPS index of the PPS that the tile group header references. The PPS may in turn refer to the SPS.
[155] In addition to the PPS index, the tile group header according to an embodiment may determine the following information. First, when more than one tile exists per picture, it may determine the tile group address and the number of tiles in the tile group. Next, it may determine the tile group type, such as intra/predictive/bi-directional. Next, it may determine the POC (Picture Order Count) LSB (Least Significant Bits). Next, when more than one tile exists in one picture, it may determine the offset length and the entry points into the tiles.
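As a small worked example of the fixed-length coding of the tile group address (its length is Ceil( Log2( NumTilesInPic ) ) bits per the tile group header semantics), the bit count can be computed as follows; the helper name is illustrative, not part of the specification.

```python
import math

def tile_group_address_len(num_tiles_in_pic):
    """Bit length of tile_group_address: Ceil(Log2(NumTilesInPic)).
    The element is only signaled when the picture has more than one tile."""
    assert num_tiles_in_pic > 1
    return math.ceil(math.log2(num_tiles_in_pic))

# For the 9-tile picture of FIG. 11, tile_group_address uses
# Ceil(Log2(9)) = 4 bits.
```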
[156] Table 4 below shows an example of the syntax of the tile group header. In Table 4, the tile group header (tile_group_header) may be replaced by a slice header.
[157] [Table 4]

(table content not reproduced here)
[158] Table 5 below shows an example of the English semantics for the syntax of the tile group header.
[159] [Table 5]

tile_group_pic_parameter_set_id specifies the value of pps_pic_parameter_set_id for the PPS in use. The value of tile_group_pic_parameter_set_id shall be in the range of 0 to 63, inclusive. When present, the value of the tile group header syntax elements tile_group_pic_parameter_set_id and tile_group_pic_order_cnt_lsb shall be the same in all tile group headers of a coded picture.

It is a requirement of bitstream conformance that the value of TemporalId of the current picture shall be greater than or equal to the value of TemporalId of the PPS that has pps_pic_parameter_set_id equal to tile_group_pic_parameter_set_id.

tile_group_address specifies the tile address of the first tile in the tile group, where the tile address is the tile ID as specified by Equation c-7. The length of tile_group_address is Ceil( Log2( NumTilesInPic ) ) bits. The value of tile_group_address shall be in the range of 0 to NumTilesInPic - 1, inclusive, and the value of tile_group_address shall not be equal to the value of tile_group_address of any other coded tile group NAL unit of the same coded picture. When tile_group_address is not present, it is inferred to be equal to 0.

[160] num_tiles_in_tile_group_minus1 plus 1 specifies the number of tiles in the tile group. The value of num_tiles_in_tile_group_minus1 shall be in the range of 0 to NumTilesInPic - 1, inclusive. When not present, the value of num_tiles_in_tile_group_minus1 is inferred to be equal to 0.

tile_group_type specifies the coding type of the tile group according to Table 6. When nal_unit_type is equal to IRAP_NUT, i.e., the picture is an IRAP picture, tile_group_type shall be equal to 2.

tile_group_pic_order_cnt_lsb specifies the picture order count modulo MaxPicOrderCntLsb for the current picture. The length of the tile_group_pic_order_cnt_lsb syntax element is log2_max_pic_order_cnt_lsb_minus4 + 4 bits. The value of tile_group_pic_order_cnt_lsb shall be in the range of 0 to MaxPicOrderCntLsb - 1, inclusive.

offset_len_minus1 plus 1 specifies the length, in bits, of the entry_point_offset_minus1[ i ] syntax elements. The value of offset_len_minus1 shall be in the range of 0 to 31, inclusive.

[161] entry_point_offset_minus1[ i ] plus 1 specifies the i-th entry point offset in bytes, and is represented by offset_len_minus1 plus 1 bits. The tile group data that follow the tile group header consist of num_tiles_in_tile_group_minus1 + 1 subsets, with subset index values ranging from 0 to num_tiles_in_tile_group_minus1, inclusive.
[162] [Table 6]

(table content not reproduced here)
[163] In an embodiment, a tile group may include a tile group header and tile group data. Once the tile group address is known, the individual positions of each CTU within the tile group may be mapped and decoded. Table 7 below shows an example of the syntax of the tile group data. In Table 7, the tile group data may be replaced by slice data.
[164] [Table 7]

(table content not reproduced here)
[165] Table 8 below shows an example of the English semantics for the syntax of the tile group data.
[166] [Table 8]

The list ColWidth[ i ] for i ranging from 0 to num_tile_columns_minus1, inclusive, specifying the width of the i-th tile column in units of CTBs, is derived as follows:

if( uniform_tile_spacing_flag )
    for( i = 0; i <= num_tile_columns_minus1; i++ )
        ColWidth[ i ] = ( ( i + 1 ) * PicWidthInCtbsY ) / ( num_tile_columns_minus1 + 1 ) -
                        ( i * PicWidthInCtbsY ) / ( num_tile_columns_minus1 + 1 )
else {
    ColWidth[ num_tile_columns_minus1 ] = PicWidthInCtbsY
    for( i = 0; i < num_tile_columns_minus1; i++ ) {
        ColWidth[ i ] = tile_column_width_minus1[ i ] + 1
        ColWidth[ num_tile_columns_minus1 ] -= ColWidth[ i ]
    }
}

[167] The list RowHeight[ j ] for j ranging from 0 to num_tile_rows_minus1, inclusive, specifying the height of the j-th tile row in units of CTBs, is derived as follows:

if( uniform_tile_spacing_flag )
    for( j = 0; j <= num_tile_rows_minus1; j++ )
        RowHeight[ j ] = ( ( j + 1 ) * PicHeightInCtbsY ) / ( num_tile_rows_minus1 + 1 ) -
                         ( j * PicHeightInCtbsY ) / ( num_tile_rows_minus1 + 1 )
else {
    RowHeight[ num_tile_rows_minus1 ] = PicHeightInCtbsY
    for( j = 0; j < num_tile_rows_minus1; j++ ) {
        RowHeight[ j ] = tile_row_height_minus1[ j ] + 1
        RowHeight[ num_tile_rows_minus1 ] -= RowHeight[ j ]
    }
}

The list ColBd[ i ] for i ranging from 0 to num_tile_columns_minus1 + 1, inclusive, specifying the location of the i-th tile column boundary in units of CTBs, is derived as follows:

for( ColBd[ 0 ] = 0, i = 0; i <= num_tile_columns_minus1; i++ )
    ColBd[ i + 1 ] = ColBd[ i ] + ColWidth[ i ]

The list RowBd[ j ] for j ranging from 0 to num_tile_rows_minus1 + 1, inclusive, specifying the location of the j-th tile row boundary in units of CTBs, is derived as follows:

for( RowBd[ 0 ] = 0, j = 0; j <= num_tile_rows_minus1; j++ )
    RowBd[ j + 1 ] = RowBd[ j ] + RowHeight[ j ]

[168] The list CtbAddrRsToTs[ ctbAddrRs ] for ctbAddrRs ranging from 0 to PicSizeInCtbsY - 1, inclusive, specifying the conversion from a CTB address in CTB raster scan of a picture to a CTB address in tile scan, is derived as follows:

for( ctbAddrRs = 0; ctbAddrRs < PicSizeInCtbsY; ctbAddrRs++ ) {
    tbX = ctbAddrRs % PicWidthInCtbsY
    tbY = ctbAddrRs / PicWidthInCtbsY
    for( i = 0; i <= num_tile_columns_minus1; i++ )
        if( tbX >= ColBd[ i ] )
            tileX = i
    for( j = 0; j <= num_tile_rows_minus1; j++ )
        if( tbY >= RowBd[ j ] )
            tileY = j
    CtbAddrRsToTs[ ctbAddrRs ] = 0
    for( i = 0; i < tileX; i++ )
        CtbAddrRsToTs[ ctbAddrRs ] += RowHeight[ tileY ] * ColWidth[ i ]
    for( j = 0; j < tileY; j++ )
        CtbAddrRsToTs[ ctbAddrRs ] += PicWidthInCtbsY * RowHeight[ j ]
    CtbAddrRsToTs[ ctbAddrRs ] += ( tbY - RowBd[ tileY ] ) * ColWidth[ tileX ] + tbX - ColBd[ tileX ]
}

[169] The list CtbAddrTsToRs[ ctbAddrTs ] for ctbAddrTs ranging from 0 to PicSizeInCtbsY - 1, inclusive, specifying the conversion from a CTB address in tile scan to a CTB address in CTB raster scan of a picture, is derived as follows:

for( ctbAddrRs = 0; ctbAddrRs < PicSizeInCtbsY; ctbAddrRs++ )
    CtbAddrTsToRs[ CtbAddrRsToTs[ ctbAddrRs ] ] = ctbAddrRs

The list TileId[ ctbAddrTs ] for ctbAddrTs ranging from 0 to PicSizeInCtbsY - 1, inclusive, specifying the conversion from a CTB address in tile scan to a tile ID, is derived as follows:

for( j = 0, tileIdx = 0; j <= num_tile_rows_minus1; j++ )
    for( i = 0; i <= num_tile_columns_minus1; i++, tileIdx++ )
        for( y = RowBd[ j ]; y < RowBd[ j + 1 ]; y++ )
            for( x = ColBd[ i ]; x < ColBd[ i + 1 ]; x++ )
                TileId[ CtbAddrRsToTs[ y * PicWidthInCtbsY + x ] ] = tileIdx

[170] The list NumCtusInTile[ tileIdx ] for tileIdx ranging from 0 to PicSizeInCtbsY - 1, inclusive, specifying the conversion from a tile index to the number of CTUs in the tile, is derived as follows:

for( j = 0, tileIdx = 0; j <= num_tile_rows_minus1; j++ )
    for( i = 0; i <= num_tile_columns_minus1; i++, tileIdx++ )
        NumCtusInTile[ tileIdx ] = ColWidth[ i ] * RowHeight[ j ]

[171] The list FirstCtbAddrTs[ tileIdx ] for tileIdx ranging from 0 to NumTilesInPic - 1, inclusive, specifying the conversion from a tile ID to the CTB address in tile scan of the first CTB in the tile, is derived as follows:

for( ctbAddrTs = 0, tileIdx = 0, tileStartFlag = 1; ctbAddrTs < PicSizeInCtbsY; ctbAddrTs++ ) {
    if( tileStartFlag ) {
        FirstCtbAddrTs[ tileIdx ] = ctbAddrTs
        tileStartFlag = 0
    }
    tileEndFlag = ctbAddrTs == PicSizeInCtbsY - 1  ||  TileId[ ctbAddrTs + 1 ] != TileId[ ctbAddrTs ]
    if( tileEndFlag ) {
        tileIdx++
        tileStartFlag = 1
    }
}
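The raster-scan-to-tile-scan derivation can be exercised with a small runnable sketch. This is an illustrative re-implementation assuming the ColWidth/RowHeight/ColBd/RowBd lists of the derivation, not the normative text itself:

```python
def ctb_rs_to_ts(pic_width_in_ctbs, col_width, row_height):
    """Build the CtbAddrRsToTs map (raster-scan CTB address -> tile-scan
    CTB address) following the derivation sketched above."""
    col_bd, row_bd = [0], [0]
    for w in col_width:
        col_bd.append(col_bd[-1] + w)
    for h in row_height:
        row_bd.append(row_bd[-1] + h)
    pic_size = pic_width_in_ctbs * row_bd[-1]
    rs_to_ts = [0] * pic_size
    for rs in range(pic_size):
        tb_x, tb_y = rs % pic_width_in_ctbs, rs // pic_width_in_ctbs
        tile_x = max(i for i in range(len(col_width)) if tb_x >= col_bd[i])
        tile_y = max(j for j in range(len(row_height)) if tb_y >= row_bd[j])
        ts = sum(row_height[tile_y] * col_width[i] for i in range(tile_x))   # tiles to the left
        ts += sum(pic_width_in_ctbs * row_height[j] for j in range(tile_y))  # tile rows above
        ts += (tb_y - row_bd[tile_y]) * col_width[tile_x] + tb_x - col_bd[tile_x]
        rs_to_ts[rs] = ts
    return rs_to_ts

# 4x2-CTB picture, two 2x2 tiles side by side: in tile scan, the CTBs of
# the left tile come first, then the CTBs of the right tile.
# ctb_rs_to_ts(4, [2, 2], [2]) -> [0, 1, 4, 5, 2, 3, 6, 7]
```

Because the map is a permutation of CTB addresses, inverting it gives CtbAddrTsToRs directly, as in the derivation.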
[172] There may be various applications that require partitioning a picture based on tiles, and the present embodiments may be related to such applications.
[173] In one example, parallel processing is considered. Some implementations running on multi-core CPUs need to partition the source picture into tiles and tile groups. Each tile group may then be processed in parallel on a separate core. Such parallel processing may be useful for high-resolution real-time encoding of videos. In addition, the parallel processing may reduce information sharing between tile groups, thereby reducing memory constraints. Since tiles can be distributed to different threads during parallel processing, parallel architectures may benefit from this partitioning mechanism.
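The thread-per-tile-group dispatch described above can be sketched as follows. The decoder stub and payloads are hypothetical; a real implementation would replace `decode_tile_group` with actual per-tile-group decoding.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_tile_group(tile_group_ctus):
    # Stand-in for independent per-tile-group decoding; each tile group
    # is coded independently, so no state is shared between workers.
    return [ctu * 2 for ctu in tile_group_ctus]

def decode_picture(tile_groups, workers=4):
    # Dispatch each tile group to a separate worker thread/core; results
    # come back in tile-group order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile_group, tile_groups))
```

Because tile groups share no coding state, the workers need no synchronization beyond joining at the end, which is the memory-constraint benefit noted above.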
[174] In another example, MTU (Maximum Transmission Unit) size matching is considered. Coded pictures transmitted over a network may be subject to fragmentation when the coded pictures are larger than the MTU size. Conversely, when the coded segments are small, the IP (Internet Protocol) header overhead may become significant. Packet fragmentation may result in a loss of error resiliency. When the picture is partitioned into tiles and each tile/tile group is packed into a separate packet in order to mitigate the effects of packet fragmentation, each packet may be smaller than the MTU size.
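The MTU-matching idea can be sketched as a greedy packer over coded tile-group sizes. The byte sizes below are hypothetical, and a single tile group larger than the MTU would still require fragmentation.

```python
def pack_into_packets(coded_sizes, mtu):
    """Greedily group coded tile groups into packets no larger than mtu bytes
    (an oversized tile group still gets its own, fragmentation-bound packet)."""
    packets, current, used = [], [], 0
    for size in coded_sizes:
        if current and used + size > mtu:
            packets.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        packets.append(current)
    return packets

# Four coded tile groups of 700/600/400/900 bytes with a 1500-byte MTU:
# pack_into_packets([700, 600, 400, 900], 1500) -> [[700, 600], [400, 900]]
```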
[175] In another example, error resilience is considered. Error resilience may be motivated by the requirements of some applications that apply UEP (Unequal Error Protection) to coded tile groups.
[176] As described above, a method for efficiently signaling the structure of the tiles partitioning a picture is needed, and this is described in detail with reference to FIGS. 13 to 21.
[177] FIG. 13 is a diagram illustrating an example of a picture in a video program for video conferencing.
[178] According to this specification, in tiling that partitions a picture into a plurality of tiles, flexible tiling may be achieved by using predefined rectangular regions.
[179] Conventional tiling has been performed according to raster scan order, but a tiling structure of this kind has aspects that are not well suited to recent practical applications such as video programs for video conferencing.
[180] FIG. 13 may show an example of a picture in a video program for video conferencing when a video conference with multiple participants is in progress. Here, the participants may be denoted Speaker 1, Speaker 2, Speaker 3, and Speaker 4. The region corresponding to each participant in the picture may correspond to one of the preset regions, and each of the preset regions may be coded as a single tile or a tile group. When a participant changes during the video conference, the single tile or tile group corresponding to the participant may also change.
[181] FIG. 14 is a diagram illustrating an example of partitioning a picture into tiles or tile groups in a video program for video conferencing.
[182] Referring to FIG. 14, the region allocated to Speaker 1 participating in the video conference may be coded as a single tile. Likewise, the regions allocated to each of Speaker 2, Speaker 3, and Speaker 4 may each be coded as a single tile.
[183] When the region allocated to each participant is coded using an individual tile as in FIG. 14, efficient coding may be enabled as spatial dependency is improved. This partitioning scheme may also be applied to 360 video data, as described below with reference to FIG. 15.
[184] FIG. 15 is a diagram illustrating an example of partitioning a picture into tiles or tile groups based on an MCTS (Motion Constrained Tile Set).
[185] In FIG. 15, the picture may be obtained from 360-degree video data. 360 video may refer to video or image content that is captured or played back in all directions (360 degrees) at the same time, as needed to provide VR (Virtual Reality). 360 video may refer to video or images represented in various types of 3D space according to a 3D model; for example, a 360 video may be represented on a spherical surface.
[186] A 2D (two-dimensional space) picture obtained from 360-degree video data may be encoded at one or more spatial resolutions. For example, the picture may be encoded at a first resolution and a second resolution, and the first resolution may be higher than the second resolution. Referring to FIG. 15, the picture may be encoded at two spatial resolutions with sizes of 1536x1536 and 768x768, respectively, but the spatial resolutions are not limited thereto and may correspond to various sizes.
[187] At this time, a 6x4 tile grid may be used for the bitstreams encoded at each of the two spatial resolutions. In addition, a motion constrained tile set (MCTS) for each of the tile positions may be coded and used. As described above with reference to Figs. 13 and 14, each of the MCTSs may include tiles located in respective regions set within a picture.
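For illustration, the tile dimensions implied by a uniform 6x4 grid over the two example resolutions can be computed as follows. This is a non-normative sketch: the function name and the assumption of an exactly even split are ours, not the patent's.

```python
# Non-normative sketch: tile width/height for a uniform 6x4 tile grid
# over the two example resolutions of Fig. 15 (even split assumed).

def tile_grid(pic_w: int, pic_h: int, cols: int = 6, rows: int = 4):
    """Return (tile_width, tile_height) for a uniform cols x rows grid."""
    return pic_w // cols, pic_h // rows

print(tile_grid(1536, 1536))  # high-resolution bitstream
print(tile_grid(768, 768))    # low-resolution bitstream
```

Under this assumption, the 1536x1536 bitstream yields 256x384 tiles and the 768x768 bitstream yields 128x192 tiles.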
[188] An MCTS may include at least one tile forming a rectangular tile set, and a tile may represent a rectangular region composed of coding tree blocks (CTBs) of a 2D picture. Tiles may be distinguished based on specific tile rows and tile columns within a picture. When inter prediction is performed on blocks within a specific MCTS during encoding/decoding, the blocks within that MCTS may be restricted to refer only to the corresponding MCTS of a reference picture for motion estimation/motion compensation.
[189] For example, referring to Fig. 15, twelve first MCTSs (1510) may be derived from the bitstream encoded at the spatial resolution of size 1536x1536, and twelve second MCTSs (1520) may be derived from the bitstream encoded at the spatial resolution of size 768x768. That is, the first MCTSs (1510) may correspond to a region having the first resolution in the same picture, and the second MCTSs (1520) may correspond to a region having the second resolution in the same picture.
[190] The first MCTSs may correspond to a viewport region within the picture. The viewport region may refer to the region a user is viewing in a 360-degree video. Alternatively, the first MCTSs may correspond to a region of interest (ROI) within the picture. The ROI region may refer to a region of interest to users, proposed by a 360 content provider.
[191] At this time, the MCTSs received at a single time instance may be combined to form one merged picture. For example, the first MCTSs (1510) and the second MCTSs (1520) may be combined and merged into a merged picture (1530) of size 1920x4708, and the merged picture (1530) may have four tile groups.
[192] Table 9 below shows an example of the PPS syntax.
WO 2020/175904 PCT/KR2020/002729
[193] [Table 9]
Figure imgf000044_0001
[194] Table 10 below shows an example of English semantics for the above syntax.
[195] [Table 10]
Figure imgf000045_0001
[196] tile_addr_val[ i ][ j ] specifies the tile_group_address value of the tile of the i-th tile row and the j-th tile column. The length of tile_addr_val[ i ][ j ] is tile_addr_len_minus1 + 1 bits.
For any integer m in the range of 0 to num_tile_columns_minus1, inclusive, and any integer n in the range of 0 to num_tile_rows_minus1, inclusive, tile_addr_val[ i ][ j ] shall not be equal to tile_addr_val[ m ][ n ] when i is not equal to m or j is not equal to n.
num_mcts_in_pic_minus1 plus 1 specifies the number of MCTSs in the picture.
[197] Figure imgf000046_0006
[198] In an embodiment, when a plurality of tiles exist in a picture, a syntax element uniform_tile_spacing_flag, indicating whether the picture is uniformly partitioned into tiles of equal width and height, may be signaled/parsed. The syntax element uniform_tile_spacing_flag may be used to indicate whether the tiles in the picture are uniformly partitioned. When the syntax element uniform_tile_spacing_flag is enabled, the widths of the tile columns and the heights of the tile rows may be signaled/parsed. That is, syntax elements indicating the width of each tile column and the height of each tile row may be signaled and/or parsed.
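As a non-normative illustration of uniform tile spacing, the sketch below splits a picture width (in CTBs) into near-equal tile columns using the HEVC-style rounding rule. That rule is an assumption here: the patent's exact derivation is given in its semantics tables, which are provided as images.

```python
# Non-normative sketch of uniform tile spacing (HEVC-style rounding rule
# assumed, not taken from the patent's image-only semantics tables).

def uniform_tile_widths(pic_width_in_ctbs: int, num_tile_columns: int):
    """Split pic_width_in_ctbs into num_tile_columns near-equal widths."""
    return [
        (i + 1) * pic_width_in_ctbs // num_tile_columns
        - i * pic_width_in_ctbs // num_tile_columns
        for i in range(num_tile_columns)
    ]

print(uniform_tile_widths(10, 3))  # 10 CTBs across 3 tile columns
```

The widths always sum to the picture width, with any remainder absorbed by the later columns.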
[199] In an embodiment, a syntax element indicating whether the tiles in a picture form an MCTS may be signaled/parsed. Depending on its value, the tiles or tile groups in the picture may or may not form a rectangular tile set, and the use of sample values or variables outside the rectangular tile set may be indicated as restricted or not restricted. When the syntax element is enabled, it may indicate that the picture is partitioned into MCTSs.
[200] In addition, the syntax element num_mcts_in_pic_minus1 plus 1 may indicate the number of MCTSs in the picture. In an embodiment, when the above-described flag is equal to 1, that is, when the picture is partitioned into MCTSs, the syntax element num_mcts_in_pic_minus1 may be signaled/parsed.
[201] In addition, the syntax element top_left_tile_addr[ i ] may indicate the tile_group_address value, that is, the position of the tile located at the top-left of the i-th MCTS. Likewise, the syntax element bottom_right_tile_addr[ i ] may indicate the tile_group_address value of the tile located at the bottom-right of the i-th MCTS.
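To illustrate how a pair of corner tile addresses can describe an MCTS rectangle, the following hedged sketch enumerates the tiles between the two corners. The mapping of addresses to raster-scan tile indices over a fixed column count is an assumption; the patent codes positions via tile_group_address values.

```python
# Hedged sketch: enumerate the tiles of an MCTS rectangle given its
# top-left and bottom-right corner addresses. Assumption: addresses are
# raster-scan tile indices over a grid with `num_cols` tile columns.

def mcts_tiles(top_left_addr: int, bottom_right_addr: int, num_cols: int):
    """Return raster-scan tile indices inside the MCTS rectangle."""
    r0, c0 = divmod(top_left_addr, num_cols)
    r1, c1 = divmod(bottom_right_addr, num_cols)
    return [r * num_cols + c
            for r in range(r0, r1 + 1)
            for c in range(c0, c1 + 1)]

print(mcts_tiles(7, 14, 6))  # 6-column grid, corners at tiles 7 and 14
```

Here tiles 7, 8, 13, and 14 form the 2x2 rectangle between the two corners.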
[202] Table 11 below shows an example of the tile group data syntax. In Table 11, tile group data may be replaced with slice data.
[203] [Table 11]
Figure imgf000047_0001
[204] Table 12 below shows an example of English semantics for the tile group data syntax.
[205] [Table 12]
Figure imgf000048_0003
[206] Meanwhile, a scanning process, which is the order in which the tiles in a picture are decoded, may be invoked; an example is shown in Table 13 below.
[207] [Table 13]
Figure imgf000049_0001
[208] Fig. 16 is a diagram illustrating an example of partitioning a picture based on ROI regions.
[209] According to the present specification, in tiling that partitions a picture into a plurality of tiles, flexible tiling based on a region of interest (ROI) can be achieved. Referring to Fig. 16, a picture may be partitioned into a plurality of tile groups based on ROI regions.
[210] Table 14 below shows an example of the PPS syntax.
[211] [Table 14]
Figure imgf000050_0001
[212] Table 15 below shows an example of English semantics for the above syntax.
[213] [Table 15]
[214] Figure imgf000052_0002
[215] In an embodiment, a syntax element tile_group_info_in_pps_flag, indicating whether tile group information related to the tiles included in a tile group is present in the PPS or in the tile group header referring to the PPS, may be signaled/parsed. When tile_group_info_in_pps_flag is equal to 1, it may indicate that the tile group information is present in the PPS and not present in the tile group headers referring to the PPS. When tile_group_info_in_pps_flag is equal to 0, it may indicate that the tile group information is not present in the PPS and is present in the tile group headers referring to the PPS.
[216] In addition, the syntax element num_tile_groups_in_pic_minus1 may indicate the number of tile groups in a picture referring to the PPS.
[217] In addition, the syntax element pps_first_tile_id[ i ] may indicate the tile ID of the first tile of the i-th tile group, and the syntax element pps_last_tile_id[ i ] may indicate the tile ID of the last tile of the i-th tile group.
[218] Fig. 17 is a diagram illustrating an example of partitioning a picture into a plurality of tiles.
[219] According to the present specification, in tiling that partitions a picture into a plurality of tiles, flexible tiling can be achieved by considering tiles whose size is smaller than the size of the coding tree unit (CTU). A tiling structure according to this method can be usefully applied to recent video applications such as video conferencing programs.
[220] Referring to Fig. 17, a picture may be partitioned into a plurality of tiles, and the size of at least one of the tiles may be smaller than the size of the CTU. For example, the picture may be partitioned into Tile 1, Tile 2, Tile 3, and Tile 4, among which the sizes of Tile 1, Tile 2, and Tile 4 are smaller than the CTU size.
[221] Table 16 below shows an example of the PPS syntax.
[222] [Table 16]
Figure imgf000053_0002
[223] Table 17 below shows an example of English semantics for the above PPS syntax.
[224] [Table 17]
Figure imgf000053_0003
[225] In an embodiment, the syntax element tile_size_unit_idc may indicate the unit size of a tile. For example, when tile_size_unit_idc is 0, 1, 2, ..., the height and width of a tile may be defined in units of 4, 8, 16, ..., allowing tiles smaller than a coding tree block (CTB).
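One plausible reading of the 0 -> 4, 1 -> 8, 2 -> 16 progression above is a power-of-two scaling of a base unit of 4. The sketch below encodes that reading; it is an assumption, since the normative formula is in Table 17, which is provided as an image.

```python
# Hedged sketch: one plausible mapping of tile_size_unit_idc to a tile
# size unit (4, 8, 16, ... samples). The exact normative rule is in the
# patent's image-only semantics table.

def tile_size_unit(tile_size_unit_idc: int) -> int:
    return 4 << tile_size_unit_idc  # 0 -> 4, 1 -> 8, 2 -> 16, ...

print([tile_size_unit(i) for i in range(4)])
```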
[226] Fig. 18 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
[227] According to the present specification, a plurality of tiles in a picture may be grouped into a plurality of tile groups, and flexible tiling can be achieved by applying a tile group index to the plurality of tile groups.
[228] Meanwhile, in the case of conventional tiling, tiles arranged in raster scan order are grouped into a plurality of tile groups. However, according to the present specification, in order to achieve flexible tiling, at least one tile group among the plurality of tile groups may include tiles arranged in a non-raster scan order.
[229] For example, referring to Fig. 18, a picture may be partitioned into a plurality of tiles, and the plurality of tiles may be grouped into Tile Group 1, Tile Group 2, and Tile Group 3. In this case, each of Tile Group 1, Tile Group 2, and Tile Group 3 may include tiles arranged in a non-raster scan order.
[230] Table 18 below shows an example of the syntax of the tile group header (tile_group_header). In Table 18, the tile group header may be replaced with a slice header.
[231] [Table 18]
Figure imgf000054_0001
[232] Table 19 below shows an example of English semantics for the syntax of the tile group header.
[233] [Table 19]
Figure imgf000055_0002
[234] In an embodiment, a syntax element specifying the index of each of a plurality of tile groups in a picture may be signaled/parsed. In this case, its value is not equal to the value of the corresponding syntax element of any other tile group NAL unit in the same picture.
[235] Table 20 below shows an example of the syntax of the tile group header (tile_group_header). In Table 20, the tile group header may be replaced with a slice header.
[236] [Table 20]
Figure imgf000055_0001
[237] Table 21 below shows an example of English semantics for the syntax of the tile group header.
[238] [Table 21]
[239] When single_tile_per_tile_group_flag is equal to 1, the value of single_tile_in_tile_group_flag is inferred to be equal to 1.
first_tile_id specifies the tile ID of the first tile of the tile group. The length of first_tile_id is Ceil( Log2( NumTilesInPic ) ) bits. The value of first_tile_id of a tile group shall not be equal to the value of first_tile_id of any other tile group of the same picture. When not present, the value of first_tile_id is inferred to be equal to the tile ID of the first tile of the current picture.
last_tile_id specifies the tile ID of the last tile of the tile group. The length of last_tile_id is Ceil( Log2( NumTilesInPic ) ) bits. When NumTilesInPic is equal to 1 or single_tile_in_tile_group_flag is equal to 1, the value of last_tile_id is inferred to be equal to first_tile_id. When tile_group_info_in_pps_flag is equal to 1, the value of last_tile_id is inferred to be equal to the value of
[240] pps_last_tile_id[ i ], where i is the value such that first_tile_id is equal to pps_first_tile_id[ i ].
NOTE - The first_tile_id is the tile ID of the tile located at the top-left corner of the tile group, and the last_tile_id is the tile ID of the tile located at the bottom-right corner of the tile group.
The variable NumTilesInTileGroup, which specifies the number of tiles in the tile group, and TgTileIdx[ i ], which specifies the tile index of the i-th tile in the tile group, are derived as follows:
deltaTileIdx = last_tile_idx - first_tile_idx
numTileRows = ( deltaTileIdx / ( num_tile_columns_minus1 + 1 ) ) + 1
numTileColumns = ( deltaTileIdx % ( num_tile_columns_minus1 + 1 ) ) + 1
NumTilesInTileGroup = numTileRows * numTileColumns
tileIdx = first_tile_id
[241] Figure imgf000058_0001
[242] In an embodiment, for each of the plurality of tile groups in a picture, a syntax element first_tile_id specifying the tile ID of the first tile may be signaled/parsed. first_tile_id may correspond to the tile ID of the tile located at the top-left of the tile group. In this case, the tile ID of the first tile of a tile group is not equal to the tile ID of the first tile of any other tile group in the same picture.
[243] In an embodiment, for each of the plurality of tile groups in a picture, a syntax element last_tile_id specifying the tile ID of the last tile may be signaled/parsed. last_tile_id may correspond to the tile ID of the tile located at the bottom-right of the tile group. When NumTilesInPic is equal to 1 or single_tile_in_tile_group_flag is equal to 1, the value of last_tile_id may be equal to first_tile_id. In addition, when tile_group_info_in_pps_flag is equal to 1, the value of last_tile_id may be equal to the value of pps_last_tile_id[ i ].
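The Table 21 semantics derive the tile-group size from the first and last tile indices; below is a non-normative Python sketch of that derivation. Raster-scan tile indexing over the picture's tile columns is assumed.

```python
# Non-normative sketch of the NumTilesInTileGroup derivation from the
# Table 21 semantics (raster-scan tile indexing assumed).

def num_tiles_in_tile_group(first_tile_idx: int, last_tile_idx: int,
                            num_tile_columns_minus1: int) -> int:
    delta = last_tile_idx - first_tile_idx
    num_rows = delta // (num_tile_columns_minus1 + 1) + 1
    num_cols = delta % (num_tile_columns_minus1 + 1) + 1
    return num_rows * num_cols

# A 2x2 tile group in a 4-column grid: first tile index 1, last index 6.
print(num_tiles_in_tile_group(1, 6, 3))
```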
[244] Fig. 19 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
[245] According to the present specification, tiles can be secondarily grouped within a tile group of a picture. Accordingly, the size of the tiles can be controlled more effectively, and thus flexible tiling can be achieved.
[246] For example, referring to Fig. 19, a picture may first be partitioned into three tile groups, and Tile group #2, corresponding to the second tile group, may be additionally partitioned into secondary tile groups.
[247] Table 22 below shows an example of the PPS syntax.
[248] [Table 22]
Figure imgf000059_0002
[249] Table 23 below shows an example of English semantics for the above syntax.
[250] [Table 23]
Figure imgf000060_0001
[251] In an embodiment, a syntax element related to the number of tile groups in a picture may be signaled/parsed. For example, the value of this syntax element plus 1 may indicate the number of tile groups in the picture.
[252] In an embodiment, a syntax element tile_group_start_addr[ i ] specifying the position of the first CTB located at the top-left of the i-th tile group in a picture may be signaled/parsed. In addition, a syntax element tile_group_end_addr[ i ] specifying the position of the last CTB located at the bottom-right of the i-th tile group in the picture may be signaled/parsed. The values of tile_group_start_addr[ i ] and tile_group_end_addr[ i ] are not equal to the values of tile_group_start_addr[ j ] and tile_group_end_addr[ j ] of any other tile group NAL unit in the same picture.
[253] In addition, according to the present specification, the IDs of the tiles in a picture can be explicitly signaled, and the ID of a tile may differ from the index of the tile. Accordingly, an MCTS can be derived without needing to change the video coding layer (VCL) network abstraction layer (NAL). There is also the advantage that the tile group header does not need to be changed.
[254] Table 24 below shows an example of the PPS syntax.
[255] [Table 24]
Figure imgf000061_0001
[256] Table 25 below shows an example of English semantics for the above syntax.
[257] [Table 25]
Figure imgf000062_0002
[258] In an embodiment, a syntax element explicit_tile_id_flag, indicating whether the tile ID of each of the plurality of tiles is explicitly signaled, may be signaled/parsed. For example, when explicit_tile_id_flag is equal to 0, it may indicate that the tile IDs are not explicitly signaled.
[259] In an embodiment, a syntax element tile_id_val[ i ] specifying the tile ID of the i-th tile in a picture referring to the PPS may be signaled/parsed.
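As a hedged illustration of decoupling tile IDs from tile indices, the sketch below builds an index-to-ID map from signaled tile_id_val entries. The fallback of using the index itself when explicit signaling is off is our assumption for illustration, not a normative rule stated here.

```python
# Hedged sketch: mapping tile indices to tile IDs. The identity fallback
# when explicit_tile_id_flag is 0 is an assumption, not normative.

def build_tile_id_map(explicit_tile_id_flag: int, tile_id_val, num_tiles: int):
    if explicit_tile_id_flag:
        return {i: tile_id_val[i] for i in range(num_tiles)}
    return {i: i for i in range(num_tiles)}  # assumed default: ID == index

print(build_tile_id_map(1, [10, 20, 30], 3))
print(build_tile_id_map(0, None, 3))
```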
[260] Meanwhile, the variables in Table 26 below may be derived by invoking the CTB raster and tile scanning conversion process.
[261] [Table 26]
[262] Figure imgf000065_0001
[263] Table 27 below shows an example of the syntax of the tile group header. In Table 27, the tile group header may be replaced with a slice header.
[264] [Table 27]
Figure imgf000066_0001
[265] Table 28 below shows an example of English semantics for the syntax of the tile group header.
[266] [Table 28]
Figure imgf000066_0002
[267] In an embodiment, a syntax element tile_group_address specifying the tile ID of the first tile of a tile group in a picture may be signaled/parsed. The value of tile_group_address is not equal to the value of tile_group_address of any other tile group NAL unit in the same picture.
[268] Meanwhile, in certain systems it may be necessary to identify tile groups. This may be essential at the system level in order to interpret and distinguish which VCL NAL units belong to a specific tile group.
[269] For example, a media-aware network element (MANE) or a video editor can identify the tile group carried by NAL units, and can remove the corresponding NAL units or derive a sub-bitstream including the NAL units belonging to a target tile group.
[270] To this end, a syntax element nuh_tile_group_id, having the same value as the value of tile_group_id, may be proposed in the NAL unit header.
[271] A network element or a video editor can easily identify the tile group carried by NAL units by parsing and interpreting only the NAL units. In addition, the network element or video editor can remove the corresponding NAL units and thereby extract a sub-bitstream including the NAL units belonging to the target tile group.
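A hedged sketch of this system-level extraction step: each NAL unit is modeled as a (nuh_tile_group_id, payload) pair, abstracting away the actual header bit parsing, which a real MANE would perform instead.

```python
# Hedged sketch of sub-bitstream extraction by tile group ID. NAL units
# are modeled as (nuh_tile_group_id, payload) pairs; real extraction
# would parse the NAL unit header bits.

def extract_sub_bitstream(nal_units, target_tile_group_id):
    """Keep only NAL units whose nuh_tile_group_id matches the target."""
    return [nal for nal in nal_units if nal[0] == target_tile_group_id]

nals = [(0, b"tg0-a"), (1, b"tg1-a"), (0, b"tg0-b"), (2, b"tg2-a")]
print(extract_sub_bitstream(nals, 0))
```

Because the group ID is carried in the NAL unit header, the filter never needs to inspect the payload, which is the point of signaling nuh_tile_group_id at that level.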
[272] Table 29 below shows an example of the syntax of the NAL unit header.
[273] [Table 29]
Figure imgf000067_0001
[274] Table 30 below shows an example of English semantics for the syntax of the NAL unit header.
[275] [Table 30]
Figure imgf000067_0002
[276] In an embodiment, a syntax element nuh_tile_group_id specifying the tile group ID of a NAL unit may be signaled/parsed. The value of nuh_tile_group_id is equal to the value of tile_group_id of the tile group header.
[277] Table 31 below shows an example of the syntax of the tile group header (tile_group_header). In Table 31, the tile group header may be replaced with a slice header.
[278] [Table 31]
Figure imgf000068_0001
[279] Table 32 below shows an example of English semantics for the syntax of the tile group header.
[280] [Table 32]
Figure imgf000069_0002
[281] In an embodiment, a syntax element tile_group_id specifying the tile group ID of a tile group in a picture may be signaled/parsed. In this case, the value of tile_group_id is not equal to the value of tile_group_id of any other tile group NAL unit in the same picture.
[282] Fig. 20 is a diagram illustrating an example of partitioning a picture into a plurality of tiles and tile groups.
[283] According to the present specification, a plurality of tiles in a picture may be grouped into a plurality of wrap-around tile groups. Wrap-around tile groups can be useful in providing 360-degree video.
[284] In 360-degree video, the left and right boundary regions or the top and bottom boundary regions of the video content may be connected to each other due to projection. This means that, depending on the projection format, the same object may be located in both boundary regions of a picture.
[285] A wrap-around tile group may group tiles located in both boundary regions of a picture. For example, a wrap-around tile group may include tiles that are not adjacent in the current picture but are adjacent to each other in 3D space.
[286] Accordingly, a wrap-around tile group can improve coding efficiency in accessing the same object by grouping regions having similar characteristics within a picture.
[287] Referring to Fig. 20, a picture may be partitioned into a plurality of tiles, and the plurality of tiles may be grouped into tile group #0 (TG #0), tile group #1 (TG #1), tile group #2 (TG #2), and tile group #3 (TG #3).
[288] The grouping of tiles may depend on the positional relationship between the first tile and the last tile. Within a tile group, the tiles may be ordered sequentially from the first tile to the last tile in raster scan order.
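The raster-scan ordering described above can be sketched as follows for a regular rectangle tile group (an illustrative Python sketch; the helper name and the flat raster tile indexing are assumptions for illustration, not part of the disclosure):

```python
def raster_order_tiles(first_idx, last_idx, num_cols):
    """List the tile indices of a regular rectangle tile group in raster
    scan order, from the top-left (first) tile to the bottom-right (last)
    tile. num_cols is the number of tile columns in the picture."""
    first_row, first_col = divmod(first_idx, num_cols)
    last_row, last_col = divmod(last_idx, num_cols)
    return [r * num_cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# In a 6-column tile grid, the group whose first tile is 8 (row 1,
# column 2) and last tile is 15 (row 2, column 3):
print(raster_order_tiles(8, 15, 6))  # → [8, 9, 14, 15]
```

A wrap-around tile group cannot be enumerated this way, because its last tile is not at the bottom-right of a rectangle inside the picture; its derivation is discussed with the tile group header semantics below.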
[289] In this case, when the first tile is located at the top-left of a rectangular region and the last tile is located at the bottom-right, a regular rectangle tile group may be applied. Otherwise, a wrap-around tile group that groups tiles located on both border regions of the picture may be applied.
[290] In FIG. 20, each of tile group #0, tile group #1, and tile group #2 may correspond to a wrap-around tile group, and tile group #3 may correspond to a regular rectangle tile group.
[291] Table 33 below shows the tile order of each tile group in FIG. 20.
[292] [Table 33]
(Table 33 is provided as an image in the original document.)
[293] Table 34 below shows an example of the corresponding syntax.
WO 2020/175904 PCT/KR2020/002729
[294] [Table 34]
(Table 34 is provided as an image in the original document.)
[295] Table 35 below shows an example of English semantics for the above syntax.
[296] [Table 35]
value of first_tile_id.
last_pred_tile_id[ i ] specifies the tile ID of the last tile of the i-th prediction tile group. The length of last_pred_tile_id is tile_id_len_minus1 + 1 bits. When not present, the value of last_pred_tile_id is inferred to be equal to first_pred_tile_id.
tile_offset_len_minus1 plus 1 specifies the length, in bits, of the entry_point_offset_minus1[ i ] syntax elements in the tile group headers referring to the PPS. The value of tile_offset_len_minus1 shall be in the range of 0 to 31, inclusive.
tile_id_len_minus1 plus 1 specifies the number of bits used to represent the syntax element tile_id_val[ i ][ j ], when present in the PPS, and the syntax elements first_tile_id and last_tile_id in tile group headers referring to the PPS. The value of tile_id_len_minus1 shall be in the range of Ceil( Log2( NumTilesInPic ) ) to 15, inclusive.
[298] explicit_tile_id_flag equal to 1 specifies that the tile ID for each tile is explicitly signalled. explicit_tile_id_flag equal to 0 specifies that tile IDs are not explicitly signalled.
tile_id_val[ i ][ j ] specifies the tile ID of the tile of the i-th tile row and the j-th tile column. The length of tile_id_val[ i ][ j ] is tile_id_len_minus1 + 1 bits.
For any integer m in the range of 0 to num_tile_columns_minus1, inclusive, and any integer n in the range of 0 to num_tile_rows_minus1, inclusive, tile_id_val[ i ][ j ] shall not be equal to tile_id_val[ m ][ n ] when i is not equal to m or j is not equal to n, and tile_id_val[ i ][ j ] shall be less than tile_id_val[ m ][ n ] when j * ( num_tile_columns_minus1 + 1 ) + i is less than n * ( num_tile_columns_minus1 + 1 ) + m.
[299] (Content provided as an image in the original document; not extracted.)
[300] In one embodiment, a syntax element prediction_tile_group_flag indicating whether each of a plurality of tiles in a picture corresponds to a tile set may be signaled/parsed. When the value of the syntax element prediction_tile_group_flag is 0, it may indicate that each of the plurality of tiles in the picture corresponds to a tile set. When the value of the syntax element prediction_tile_group_flag is 1, it may indicate that a tile group including a plurality of tiles exists. In the case of a tile group including a plurality of tiles, the position of each tile in the tile group must be predictable; accordingly, a tile group including a plurality of tiles may also be referred to as a prediction tile group. Therefore, when the value of the syntax element prediction_tile_group_flag is 1, it may indicate that the prediction tile groups are explicitly specified by the syntax elements num_pred_tile_groups_in_pic_minus1, first_pred_tile_id[i], and last_pred_tile_id[i], described below.
[301] In one embodiment, a syntax element num_pred_tile_groups_in_pic_minus1 related to the number of prediction tile groups in a picture may be signaled/parsed.
[302] In one embodiment, for each of the prediction tile groups in a picture, a syntax element first_pred_tile_id[i] specifying the tile ID of the first tile may be signaled/parsed. The first tile may correspond to the tile that is first in raster scan order within the tile group.
[303] In one embodiment, for each of the prediction tile groups in a picture, a syntax element last_pred_tile_id[i] specifying the tile ID of the last tile may be signaled/parsed. The last tile may correspond to the tile that is last in raster scan order within the tile group.
[304] In one embodiment, a syntax element explicit_tile_id_flag indicating whether the tile ID of each of the plurality of tiles in a picture is explicitly signaled may be signaled/parsed. For example, when explicit_tile_id_flag is 0, it may indicate that the tile IDs are not explicitly signaled.
[305] In one embodiment, a syntax element tile_id_val[i][j] specifying the tile ID of each of the plurality of tiles in a picture may be signaled/parsed. The syntax element tile_id_val[i][j] may specify the tile ID of the tile located in the i-th row and j-th column of the picture.
[306] Meanwhile, the variables in Table 36 below may be derived by invoking the CTB raster and tile scanning conversion process.
[307] [Table 36]
[308]
... scan,
– the list CtbAddrTsToRs[ ctbAddrTs ] for ctbAddrTs ranging from 0 to PicSizeInCtbsY − 1, inclusive, specifying the conversion from a CTB address in the tile scan to a CTB address in the CTB raster scan of a picture,
– the list TileId[ ctbAddrTs ] for ctbAddrTs ranging from 0 to PicSizeInCtbsY − 1, inclusive, specifying the conversion from a CTB address in tile scan to a tile ID,
– the list NumCtusInTile[ tileIdx ] for tileIdx ranging from 0 to PicSizeInCtbsY − 1, inclusive, specifying the conversion from a tile index to the number of CTUs in the tile,
– the set TileIdToIdx[ tileId ] for a set of NumTilesInPic tileId values specifying the conversion from a tile ID to a tile index, and the list FirstCtbAddrTs[ tileIdx ] for tileIdx ranging from 0 to
[309] (Content provided as an image in the original document; not extracted.)
[310] (Content provided as an image in the original document; not extracted.)
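The TileIdToIdx conversion listed in Table 36 can be illustrated for the explicit tile_id_val[i][j] case as follows (a minimal sketch; the dictionary representation and helper name are implementation choices, not spec text):

```python
def build_tile_id_to_idx(tile_id_val):
    """Build a TileIdToIdx-style mapping from a 2-D tile_id_val grid.

    tile_id_val[i][j] is the tile ID of the tile in the i-th tile row
    and j-th tile column; the tile index is the tile's raster-scan
    position in the picture."""
    num_cols = len(tile_id_val[0])
    tile_id_to_idx = {}
    for i, row in enumerate(tile_id_val):
        for j, tile_id in enumerate(row):
            tile_id_to_idx[tile_id] = i * num_cols + j
    return tile_id_to_idx

# A 2x3 tile grid with explicitly signaled (non-consecutive) tile IDs:
ids = [[10, 11, 12],
       [20, 21, 22]]
print(build_tile_id_to_idx(ids)[21])  # → 4
```

The inverse direction (tile index to tile ID) is simply a lookup into the tile_id_val grid itself.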
[311] Table 37 below shows an example of the syntax of the tile group header. In Table 37, the tile group header may be replaced by a slice header.
[312] [Table 37]
(Table 37 is provided as an image in the original document.)
[313] Table 38 below shows an example of English semantics for the syntax of the above tile group header.
[314] [Table 38]
[315]
firstTileIdx = TileIdToIdx[ first_tile_id ]
firstTileColumnIdx = firstTileIdx % ( num_tile_columns_minus1 + 1 )
lastTileIdx = TileIdToIdx[ last_tile_id ]
lastTileColumnIdx = lastTileIdx % ( num_tile_columns_minus1 + 1 )
deltaTileIdx = lastTileIdx − firstTileIdx
if( lastTileIdx < firstTileIdx ) {
    if( firstTileColumnIdx > lastTileColumnIdx )
        deltaTileIdx += NumTilesInPic + num_tile_columns_minus1 + 1
    else
        deltaTileIdx += NumTilesInPic
} else if( firstTileColumnIdx > lastTileColumnIdx )
    deltaTileIdx += num_tile_columns_minus1 + 1
numTileRowsInTileGroup = ( deltaTileIdx / ( num_tile_columns_minus1 + 1 ) ) + 1
numTileColumnsInTileGroup = ( deltaTileIdx % ( num_tile_columns_minus1 + 1 ) ) + 1
[316] numTilesInTileGroup = numTileRowsInTileGroup * numTileColumnsInTileGroup
When arbitrary_tile_group_flag is equal to 0, the variable TgTileIdx[ i ] specifies the tile index of the i-th tile in the tile group and is derived as follows:
tileIdx = TileIdToIdx[ first_tile_id ]
for( j = 0, cIdx = 0; j < numTileRowsInTileGroup; j++, tileIdx += num_tile_columns_minus1 + 1 ) {
    tileIdx = tileIdx % NumTilesInPic
    for( i = 0, currTileIdx = tileIdx; i < numTileColumnsInTileGroup; i++, currTileIdx++, cIdx++ ) {
        if( currTileIdx / ( num_tile_columns_minus1 + 1 ) > tileIdx / ( num_tile_columns_minus1 + 1 ) )
            TgTileIdx[ cIdx ] = currTileIdx − ( num_tile_columns_minus1 + 1 )
        else
            TgTileIdx[ cIdx ] = currTileIdx
    }
}
[317] In one embodiment, a syntax element first_tile_id specifying the tile ID of the first tile in a tile group in a picture may be signaled/parsed.
[318] In one embodiment, a syntax element last_tile_id specifying the tile ID of the last tile in a tile group in a picture may be signaled/parsed.
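The derivation of a tile group's shape and tile indices from first_tile_id and last_tile_id, including wrap-around across the picture borders, can be sketched in Python as follows (an illustrative sketch of the Table 38 derivation using flat raster tile indices instead of tile IDs; variable names are paraphrased, not spec text):

```python
def tile_group_shape(first_idx, last_idx, num_cols, num_tiles_in_pic):
    """Number of tile rows/columns in a tile group, allowing the group
    to wrap around the right and bottom picture borders."""
    first_col = first_idx % num_cols
    last_col = last_idx % num_cols
    delta = last_idx - first_idx
    if last_idx < first_idx:        # group wraps past the bottom border
        delta += num_tiles_in_pic
        if first_col > last_col:    # ... and past the right border too
            delta += num_cols
    elif first_col > last_col:      # group wraps past the right border
        delta += num_cols
    return delta // num_cols + 1, delta % num_cols + 1

def tile_group_indices(first_idx, rows, cols, num_cols, num_tiles_in_pic):
    """Enumerate the tile indices of the group in raster order,
    wrapping to the left edge of a tile row when passing the border."""
    out = []
    row_start = first_idx
    for _ in range(rows):
        row_start %= num_tiles_in_pic
        for i in range(cols):
            cur = row_start + i
            if cur // num_cols > row_start // num_cols:
                cur -= num_cols     # wrap within the same tile row
            out.append(cur)
        row_start += num_cols
    return out

# 6x4 tile grid (24 tiles): a group starting at tile 4 (row 0, col 4)
# and ending at tile 7 (row 1, col 1) wraps across the right border.
rows, cols = tile_group_shape(4, 7, 6, 24)
print(rows, cols)                          # → 2 4
print(tile_group_indices(4, rows, cols, 6, 24))
# → [4, 5, 0, 1, 10, 11, 6, 7]
```

This reproduces the wrap-around behavior of tile groups #0 to #2 of FIG. 20: tiles on the left and right border regions end up in the same group.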
[319] Also, according to this specification, wrap-around tile groups may be used to generate a Region of Interest (ROI) based mapping. To this end, the ID of each tile may be explicitly signaled for each tile group.
[320] Table 39 below shows an example of the PPS syntax.
[321] [Table 39]
(Table 39 is provided as an image in the original document.)
[322] Table 40 below shows an example of English semantics for the above syntax.
[323] [Table 40]
(Table 40 is provided as an image in the original document.)
[324] In one embodiment, a syntax element explicit_tile_id_flag indicating whether the tile ID of each of the plurality of tiles in a picture is explicitly signaled may be signaled/parsed.
[325] In one embodiment, for each of a plurality of tile groups in a picture, a syntax element num_tiles_in_tile_groups_minus1[i] related to the number of tiles included in the tile group may be signaled/parsed.
[326] In one embodiment, a syntax element tile_id_val[i][j] specifying the tile ID of each of the plurality of tiles in a picture may be signaled/parsed. The syntax element tile_id_val[i][j] may specify the tile ID of the tile located in the i-th row and j-th column of the picture.
[327] Also, according to this specification, the efficiency of signaling for picture partitioning can be increased by signaling the differences between tile IDs within a tile group. For example, the tile IDs within each tile group can be identified by signaling the difference from the previous tile ID. To this end, the absolute difference between the tile IDs included in each tile group and the corresponding sign information may be signaled.
[328] This signaling method can be useful when a tile group consists of tile IDs that are not in monotonically increasing order.
[329] Accordingly, this signaling method can be usefully applied to 360-degree video. It can also be usefully applied when the tile IDs are in monotonically increasing order or in non-increasing order.
[330] Table 41 below shows an example of the PPS syntax.
[331] [Table 41]
(Table 41 is provided as an image in the original document.)
[332] Table 42 below shows an example of English semantics for the above syntax.
[333] [Table 42]
(Table 42 is provided as an image in the original document.)
[334] In one embodiment, for each of the prediction tile groups in a picture, a syntax element tile_id_val_delta_abs[i][j] specifying the absolute value of the delta corresponding to each tile included in the prediction tile group may be signaled/parsed. The syntax element tile_id_val_delta_abs[i][j] may specify the absolute value of the delta corresponding to the j-th tile ID in the i-th prediction tile group.
[335] In one embodiment, for each of the prediction tile groups in a picture, a syntax element tile_id_val_delta_sign[i][j] specifying the sign of the delta corresponding to each tile included in the prediction tile group may be signaled/parsed. The syntax element tile_id_val_delta_sign[i][j] may specify the sign of the delta corresponding to the j-th tile ID in the i-th prediction tile group. For example, when the value of the syntax element tile_id_val_delta_sign[i][j] is 0, the difference between the corresponding tile IDs corresponds to a positive value; otherwise, the difference between the corresponding tile IDs may correspond to a negative value.
[336] Based on the syntax elements tile_id_val_delta_abs[i][j] and tile_id_val_delta_sign[i][j] signaled/parsed as described above, tile_id_val[i][j] corresponding to the j-th tile ID in the i-th prediction tile group may be determined as shown in Table 43 below.
[337] [Table 43]
(Table 43 is provided as an image in the original document.)
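Since Table 43 is reproduced only as an image here, the following sketch illustrates one plausible reading of the delta-based derivation; the anchoring of the deltas to the previous tile ID in the group and the helper name are assumptions for illustration:

```python
def derive_tile_ids(first_tile_id, delta_abs, delta_sign):
    """Reconstruct the tile IDs of one prediction tile group from the
    signaled per-tile deltas. delta_sign[j] == 0 means the j-th delta
    is positive; otherwise it is negative."""
    ids = [first_tile_id]
    for a, s in zip(delta_abs, delta_sign):
        ids.append(ids[-1] + (a if s == 0 else -a))
    return ids

# Non-monotonic tile IDs 7, 9, 4, 10 signaled as deltas +2, -5, +6:
print(derive_tile_ids(7, [2, 5, 6], [0, 1, 0]))  # → [7, 9, 4, 10]
```

This is why the scheme remains usable when the tile IDs of a group are not monotonically increasing: each delta carries its own sign.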
[338] Also, according to this specification, the efficiency of signaling for picture partitioning can be increased by applying a hierarchical signaling method in which an offset into the tiles within each tile group in a picture is signaled.
[339] Referring to Table 33 above, for each of the tile groups shown in FIG. 20, the first tile and the last tile can be identified. In this case, in order to signal the tile order within each tile group, an offset may be signaled for each tile group.
[340] For example, in tile group #0, the number of tiles included in tile group #0 may be signaled first. Then, 0, 4, 18, and 22, corresponding to the IDs of the leading tiles in tile group #0, may be signaled. The IDs of the subsequent tiles following the leading tiles may be derived using additional information.
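The leading-tile scheme in the preceding paragraph can be sketched as follows. The rule for deriving subsequent tiles from the "additional information" is not spelled out in the text, so the fixed per-leading-tile run length and the wrap rule used below are assumptions for illustration only:

```python
def expand_tile_group(leading_ids, run_len, num_cols):
    """Expand signaled leading tile IDs into the full tile list of a
    group, assuming each leading tile starts a run of run_len
    consecutive tiles that wraps within its own tile row."""
    ids = []
    for lead in leading_ids:
        row_base = (lead // num_cols) * num_cols   # first tile of the row
        for k in range(run_len):
            ids.append(row_base + (lead - row_base + k) % num_cols)
    return ids

# Tile group #0 of the example: leading tiles 0, 4, 18 and 22 in a
# 6-column grid, with two tiles per leading tile.
print(expand_tile_group([0, 4, 18, 22], 2, 6))
# → [0, 1, 4, 5, 18, 19, 22, 23]
```

Under these assumptions, only four leading IDs plus the tile count need to be signaled instead of all eight tile IDs of the group.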
[341] FIG. 21 is a flowchart illustrating an operation of a decoding apparatus according to an embodiment, and FIG. 22 is a block diagram illustrating a configuration of a decoding apparatus according to an embodiment.
[342] Each step disclosed in FIG. 21 may be performed by the decoding apparatus 300 disclosed in FIG. 3. More specifically, S2100 and S2110 may be performed by the entropy decoding unit 310 disclosed in FIG. 3, S2120 may be performed by the prediction unit 330 disclosed in FIG. 3, and S2130 may be performed by the adder 340 disclosed in FIG. 3. In addition, the operations according to S2100 to S2130 are based on some of the contents described above with reference to FIGS. 1 to 20. Therefore, detailed descriptions overlapping with the contents described above with reference to FIGS. 1 to 20 will be omitted or simplified.
[343] As shown in FIG. 22, the decoding apparatus according to an embodiment may include an entropy decoding unit 310, a prediction unit 330, and an adder 340. However, in some cases, not all of the components shown in FIG. 22 may be essential components of the decoding apparatus, and the decoding apparatus may be implemented with more or fewer components than those shown in FIG. 22.
[344] In the decoding apparatus according to an embodiment, the entropy decoding unit 310, the prediction unit 330, and the adder 340 may each be implemented as a separate chip, or at least two or more components may be implemented through a single chip.
[345] The decoding apparatus according to an embodiment may obtain, from a bitstream, image information including partition information on a current picture and prediction information on a current block included in the current picture (S2100). More specifically, the entropy decoding unit 310 of the decoding apparatus may obtain, from the bitstream, the image information including the partition information on the current picture and the prediction information on the current block included in the current picture.
[346] The decoding apparatus according to an embodiment may derive a partitioning structure of the current picture based on a plurality of tiles, based on the partition information on the current picture (S2110). More specifically, the entropy decoding unit 310 of the decoding apparatus may derive the partitioning structure of the current picture based on the plurality of tiles, based on the partition information on the current picture. In one example, the plurality of tiles may be grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups may include tiles that are not adjacent in the current picture but are adjacent to each other in 3D space.
[347] The decoding apparatus according to an embodiment may derive prediction samples for the current block based on the prediction information on the current block included in one of the plurality of tiles (S2120). More specifically, the prediction unit 330 of the decoding apparatus may derive the prediction samples for the current block based on the prediction information on the current block included in one of the plurality of tiles.
[348] The decoding apparatus according to an embodiment may reconstruct the current picture based on the prediction samples (S2130). More specifically, the adder 340 of the decoding apparatus may reconstruct the current picture based on the prediction samples.
[349] In one embodiment, the partition information on the current picture may include at least one of: information on the number of the plurality of tile groups, ID information of the first tile in raster order for each of the plurality of tile groups, and ID information of the last tile in raster order for each of the plurality of tile groups.
[350] In addition, at least one of the information on the number of the plurality of tile groups, the ID information of the first tile in raster scan order for each of the plurality of tile groups, and the ID information of the last tile in raster scan order for each of the plurality of tile groups may be included in a Picture Parameter Set (PPS) of the image information.
[351] In one embodiment, the partition information on the current picture may further include at least one of flag information on whether the ID information of each of the plurality of tiles is explicitly signaled, and the ID information of each of the plurality of tiles.
[352] In addition, at least one of the flag information on whether the ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles may be included in the Picture Parameter Set (PPS) of the image information.
[353] In one embodiment, the partition information on the current picture may include, for each of the plurality of tile groups, the absolute difference between tile IDs included in the tile group and the corresponding sign information.
[354] According to the present disclosure described above, a picture can be flexibly partitioned into a plurality of tiles and a plurality of tile groups grouping the plurality of tiles. In addition, according to the present disclosure, the efficiency of picture partitioning can be increased based on the partition information on the current picture.
[355] FIG. 23 is a flowchart illustrating an operation of an encoding apparatus according to an embodiment, and FIG. 24 is a block diagram illustrating a configuration of an encoding apparatus according to an embodiment.
[356] The encoding apparatus according to FIGS. 23 and 24 may perform operations corresponding to those of the decoding apparatus according to FIGS. 21 and 22. Accordingly, the operations of the encoding apparatus described below with reference to FIGS. 23 and 24 may likewise be applied to the decoding apparatus according to FIGS. 21 and 22.
[357] Each step disclosed in FIG. 23 may be performed by the encoding apparatus 200 disclosed in FIG. 2. More specifically, S2300 and S2310 may be performed by the image partitioner 210 disclosed in FIG. 2, S2320 and S2330 may be performed by the prediction unit 220 disclosed in FIG. 2, and S2340 may be performed by the entropy encoding unit 240 disclosed in FIG. 2. In addition, the operations according to S2300 to S2340 are based on some of the contents described above with reference to FIGS. 1 to 20. Therefore, detailed descriptions overlapping with the contents described above with reference to FIGS. 1 to 20 will be omitted or simplified.
[358] As shown in FIG. 24, the encoding apparatus according to an embodiment may include an image partitioner 210, a prediction unit 220, and an entropy encoding unit 240. However, in some cases, not all of the components shown in FIG. 24 may be essential components of the encoding apparatus, and the encoding apparatus may be implemented with more or fewer components than those shown in FIG. 24.
[359] In the encoding apparatus according to an embodiment, the image partitioner 210, the predictor 220, and the entropy encoder 240 may each be implemented as a separate chip, or at least two of the components may be implemented through a single chip.
[360] The encoding apparatus according to an embodiment may partition the current picture into a plurality of tiles (S2300). More specifically, the image partitioner 210 of the encoding apparatus may partition the current picture into a plurality of tiles.
[361] The encoding apparatus according to an embodiment may generate partition information for the current picture based on the plurality of tiles (S2310). More specifically, the image partitioner 210 of the encoding apparatus may generate partition information for the current picture based on the plurality of tiles. In one example, the plurality of tiles are grouped into a plurality of tile groups, and at least one tile group among the plurality of tile groups may include tiles that are not adjacent on the current picture but are adjacent to each other in 3D space.
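The notion above of a tile group whose tiles are not neighbours in the picture (e.g. because they touch only in the 3D space of a 360-degree projection) can be sketched with a hypothetical check. The function names, the grid width, and the use of raster-scan tile IDs are illustrative assumptions, not part of the disclosure:

```python
# A tile group may collect tiles that are not adjacent in the picture's
# 2D tile grid. Tile IDs are assumed to be raster-scan indices.

def tiles_adjacent_in_picture(a, b, tiles_per_row):
    """True if tiles a and b share an edge in the picture's tile grid."""
    ra, ca = divmod(a, tiles_per_row)
    rb, cb = divmod(b, tiles_per_row)
    return abs(ra - rb) + abs(ca - cb) == 1

def group_has_non_adjacent_tiles(tile_ids, tiles_per_row):
    """True if some tile in the group has no in-picture neighbour in it."""
    s = set(tile_ids)
    return any(
        not any(tiles_adjacent_in_picture(t, u, tiles_per_row)
                for u in s if u != t)
        for t in s)

# 4x3 tile grid: tiles 0 and 11 sit at opposite corners of the picture,
# yet could be adjacent on a wrapped 360-degree projection.
print(group_has_non_adjacent_tiles([0, 11], 4))  # True
print(group_has_non_adjacent_tiles([0, 1], 4))   # False
```

Such a predicate only classifies a group; the disclosure itself places no adjacency constraint on group membership.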
[362] The encoding apparatus according to an embodiment may derive prediction samples for a current block included in one tile among the plurality of tiles (S2320). More specifically, the predictor 220 of the encoding apparatus may derive prediction samples for the current block included in one tile among the plurality of tiles.
[363] The encoding apparatus according to an embodiment may generate prediction information for the current block based on the prediction samples (S2330). More specifically, the predictor 220 of the encoding apparatus may generate prediction information for the current block based on the prediction samples.
[364] The encoding apparatus according to an embodiment may encode image information including the partition information for the current picture and the prediction information for the current block (S2340). More specifically, image information including at least one of the partition information for the current picture and the prediction information for the current block may be encoded.
[365] In one embodiment, the partition information for the current picture may include at least one of information on the number of the plurality of tile groups, ID information of the first tile in raster order for each of the plurality of tile groups, and ID information of the last tile in raster order for each of the plurality of tile groups.
[366] In addition, at least one of the information on the number of the plurality of tile groups, the ID information of the first tile in raster scan order for each of the plurality of tile groups, and the ID information of the last tile in raster scan order for each of the plurality of tile groups may be included in the PPS (Picture Parameter Set) of the image information.
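The tile-group signalling just described (a group count plus first/last tile IDs in raster-scan order) can be sketched as follows. Assume, purely for illustration, that each group covers the contiguous raster-scan interval between its first and last tile; the function name and that interpretation are assumptions, not given by the disclosure:

```python
# Hypothetical decoder-side reconstruction of tile-group membership
# from the signalled (count, first tile ID, last tile ID) triples.

def decode_tile_groups(num_groups, first_ids, last_ids):
    """Return, per tile group, the list of raster-order tile IDs."""
    groups = []
    for g in range(num_groups):
        first, last = first_ids[g], last_ids[g]
        if last < first:
            raise ValueError("last tile precedes first tile in raster order")
        groups.append(list(range(first, last + 1)))
    return groups

# Three groups over a picture partitioned into 12 tiles.
print(decode_tile_groups(3, [0, 4, 9], [3, 8, 11]))
# [[0, 1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11]]
```

Signalling only the endpoints rather than every member keeps the PPS overhead proportional to the number of groups, not the number of tiles.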
[367] In one embodiment, the partition information for the current picture may further include at least one of flag information on whether the ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles.
[368] In addition, at least one of the flag information on whether the ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles may be included in the PPS (Picture Parameter Set) of the image information.
[369] In one embodiment, the partition information for the current picture may include, for each of the plurality of tile groups, the absolute difference between tile IDs included in the tile group and sign information.
[370] In the above-described embodiments, the methods are described based on flowcharts as a series of steps or blocks, but the present disclosure is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. In addition, those skilled in the art will understand that the steps shown in a flowchart are not exclusive, that other steps may be included, or that one or more steps of the flowchart may be deleted without affecting the scope of the present disclosure.
[371] The above-described method according to the present disclosure may be implemented in software form, and the encoding apparatus and/or decoding apparatus according to the present disclosure may be included in a device that performs image processing, such as a TV, a computer, a smartphone, a set-top box, or a display device.
[372] When the embodiments of the present disclosure are implemented in software, the above-described method may be implemented as a module (process, function, and so on) that performs the above-described functions. The module may be stored in a memory and executed by a processor. The memory may be internal or external to the processor, and may be connected to the processor by various well-known means. The processor may include an application-specific integrated circuit (ASIC), another chipset, a logic circuit, and/or a data processing device. The memory may include read-only memory (ROM), random access memory (RAM), flash memory, a memory card, a storage medium, and/or another storage device. That is, the embodiments described in the present disclosure may be implemented and performed on a processor, a microprocessor, a controller, or a chip. For example, the functional units shown in each drawing may be implemented and performed on a computer, a processor, a microprocessor, a controller, or a chip. In this case, information for implementation (e.g., information on instructions) or an algorithm may be stored in a digital storage medium.
[373] In addition, the decoding apparatus and the encoding apparatus to which the present disclosure is applied may be included in a multimedia broadcast transceiver, a mobile communication terminal, a home cinema video device, a digital cinema video device, a surveillance camera, a video chat device, a real-time communication device such as video communication, a mobile streaming device, a storage medium, a camcorder, a video-on-demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a virtual reality (VR) device, an augmented reality (AR) device, a video telephony device, a transportation terminal (e.g., a vehicle terminal (including an autonomous vehicle), an airplane terminal, a ship terminal, and so on), a medical video device, and the like, and may be used to process video signals or data signals. For example, OTT video devices may include a game console, a Blu-ray player, an Internet-connected TV, a home theater system, a smartphone, a tablet PC, a digital video recorder (DVR), and the like.
[374] In addition, the processing method to which the present disclosure is applied may be produced in the form of a computer-executed program and stored in a computer-readable recording medium. Multimedia data having a data structure according to the present disclosure may also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices and distributed storage devices in which computer-readable data is stored. The computer-readable recording medium may include, for example, a Blu-ray disc (BD), a universal serial bus (USB), a ROM, a PROM, an EPROM, an EEPROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device. The computer-readable recording medium also includes media implemented in the form of a carrier wave (e.g., transmission over the Internet). In addition, a bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted over a wired or wireless communication network.
[375] In addition, an embodiment of the present disclosure may be implemented as a computer program product using program code, and the program code may be executed on a computer according to an embodiment of the present disclosure. The program code may be stored on a computer-readable carrier.
[376] FIG. 25 shows an example of a content streaming system to which the disclosure of this document can be applied.
[377] Referring to FIG. 25, the content streaming system to which the present disclosure is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.
[378] The encoding server compresses content input from multimedia input devices such as a smartphone, a camera, or a camcorder into digital data, generates a bitstream, and transmits the bitstream to the streaming server. As another example, when multimedia input devices such as a smartphone, a camera, or a camcorder directly generate a bitstream, the encoding server may be omitted.
[379] The bitstream may be generated by an encoding method or a bitstream generation method to which the present disclosure is applied, and the streaming server may temporarily store the bitstream in the process of transmitting or receiving the bitstream.
[380] The streaming server transmits multimedia data to a user device based on a user request made through the web server, and the web server serves as an intermediary that informs the user of available services. When the user requests a desired service from the web server, the web server forwards the request to the streaming server, and the streaming server transmits multimedia data to the user. In this case, the content streaming system may include a separate control server, and in this case the control server controls commands/responses between the devices in the content streaming system.
[381] The streaming server may receive content from a media storage and/or an encoding server. For example, when receiving content from the encoding server, the content may be received in real time. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.
[382] Examples of the user device include a mobile phone, a smartphone, a laptop computer, a digital broadcasting terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), a navigation device, a slate PC, a tablet PC, an ultrabook, a wearable device (e.g., a smartwatch, smart glasses, or a head-mounted display (HMD)), a digital TV, a desktop computer, a digital signage, and the like.
[383] Each server in the content streaming system may be operated as a distributed server, in which case data received by each server may be processed in a distributed manner.
[384] The claims described in the present specification may be combined in various ways. For example, the technical features of the method claims of the present specification may be combined and implemented as an apparatus, and the technical features of the apparatus claims of the present specification may be combined and implemented as a method. In addition, the technical features of the method claims and the technical features of the apparatus claims of the present specification may be combined and implemented as an apparatus, and the technical features of the method claims and the technical features of the apparatus claims of the present specification may be combined and implemented as a method.

Claims

[Claim 1] An image decoding method performed by a decoding apparatus, the method comprising:
obtaining, from a bitstream, image information including partition information on a current picture and prediction information on a current block included in the current picture;
deriving, based on the partition information on the current picture, a partitioning structure of the current picture based on a plurality of tiles;
deriving prediction samples for the current block based on the prediction information on the current block included in one tile among the plurality of tiles; and
reconstructing the current picture based on the prediction samples,
wherein the plurality of tiles are grouped into a plurality of tile groups, and
wherein at least one tile group among the plurality of tile groups includes tiles that are not adjacent on the current picture.
[Claim 2] The method of claim 1,
wherein the partition information on the current picture includes at least one of information on the number of the plurality of tile groups, ID information of a first tile in raster order for each of the plurality of tile groups, and ID information of a last tile in raster order for each of the plurality of tile groups.
[Claim 3] The method of claim 2,
wherein at least one of the information on the number of the plurality of tile groups, the ID information of the first tile in raster scan order for each of the plurality of tile groups, and the ID information of the last tile in raster scan order for each of the plurality of tile groups is included in a picture parameter set (PPS) of the image information.
[Claim 4] The method of claim 1,
wherein the partition information on the current picture further includes at least one of flag information on whether ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles.
[Claim 5] The method of claim 4,
wherein at least one of the flag information on whether the ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles is included in a picture parameter set (PPS) of the image information.
[Claim 6] The method of claim 1,
wherein the partition information on the current picture includes, for each of the plurality of tile groups, an absolute difference between tile IDs included in the tile group and sign information.
[Claim 7] An image encoding method performed by an encoding apparatus, the method comprising:
partitioning a current picture into a plurality of tiles;
generating partition information on the current picture based on the plurality of tiles;
deriving prediction samples for a current block included in one tile among the plurality of tiles;
generating prediction information on the current block based on the prediction samples; and
encoding image information including the partition information on the current picture and the prediction information on the current block,
wherein the plurality of tiles are grouped into a plurality of tile groups, and
wherein at least one tile group among the plurality of tile groups includes tiles that are not adjacent on the current picture.
[Claim 8] The method of claim 7,
wherein at least one of information on the number of the plurality of tile groups, ID information of a first tile in raster scan order for each of the plurality of tile groups, and ID information of a last tile in raster scan order for each of the plurality of tile groups is included in a picture parameter set (PPS) of the image information.
[Claim 9] The method of claim 7,
wherein the partition information on the current picture further includes at least one of flag information on whether ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles.
[Claim 10] The method of claim 9,
wherein at least one of the flag information on whether the ID information of each of the plurality of tiles is explicitly signaled and the ID information of each of the plurality of tiles is included in a picture parameter set (PPS) of the image information.
[Claim 11] The method of claim 10,
wherein at least one of information on the number of the plurality of tile groups, position information of a CTB located at the top-left of each of the plurality of tile groups, and position information of a CTB located at the bottom-right of each of the plurality of tile groups is included in a picture parameter set (PPS) of the image information.
[Claim 12] The method of claim 7,
wherein the partition information on the current picture includes, for each of the plurality of tile groups, an absolute difference between IDs of tiles included in the tile group and corresponding sign information.
[Claim 13] A computer-readable digital storage medium storing encoded image information that causes a decoding apparatus to perform an image decoding method, the image decoding method comprising:
obtaining, from a bitstream, image information including partition information on a current picture and prediction information on a current block included in the current picture;
deriving, based on the partition information on the current picture, a partitioning structure of the current picture based on a plurality of tiles;
deriving prediction samples for the current block based on the prediction information on the current block included in one tile among the plurality of tiles; and
reconstructing the current picture based on the prediction samples,
wherein the plurality of tiles are grouped into a plurality of tile groups, and
wherein at least one tile group among the plurality of tile groups includes tiles that are not adjacent on the current picture.
PCT/KR2020/002729 2019-02-26 2020-02-26 Method and apparatus for picture partitioning on basis of signaled information WO2020175904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962810945P 2019-02-26 2019-02-26
US62/810,945 2019-02-26

Publications (1)

Publication Number Publication Date
WO2020175904A1 true WO2020175904A1 (en) 2020-09-03

Family

ID=72239761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/002729 WO2020175904A1 (en) 2019-02-26 2020-02-26 Method and apparatus for picture partitioning on basis of signaled information

Country Status (1)

Country Link
WO (1) WO2020175904A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150140360A (en) * 2013-04-08 2015-12-15 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 Motion-constrained tile set for region of interest coding

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150140360A (en) * 2013-04-08 2015-12-15 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 Motion-constrained tile set for region of interest coding

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HENDRY: "AHG12: On explicit signalling of tile IDs", JVET-M0134-V2, JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 13TH MEETING, 2 January 2019 (2019-01-02), Marrakech, XP030197808, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=4939> [retrieved on 20200525] *
MAXIM SYCHEV: "AHG12: On tile configuration signalling", JVET-M0137-v1, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, 2 January 2019 (2019-01-02), Marrakech, XP030197812, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=4942> [retrieved on 20200525] *
MUHAMMED COBAN: "AHG12: On signalling of tiles", JVET-M0530, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, 4 January 2019 (2019-01-04), Marrakech, XP030198370, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=5341> [retrieved on 20200525] *
SACHIN DESHPANDE: "AHG12: On Tile Information Signalling", JVET-M0416, JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 13TH MEETING, 4 January 2019 (2019-01-04), Marrakech, XP030198359, Retrieved from the Internet <URL:http://phenix.it-sudparis.eu/jvet/doc_end_user/current_document.php?id=5225> [retrieved on 20200525] *

Similar Documents

Publication Publication Date Title
US11601640B2 (en) Image coding method using history-based motion information and apparatus for the same
US20220182681A1 (en) Image or video coding based on sub-picture handling structure
US11575942B2 (en) Syntax design method and apparatus for performing coding by using syntax
US11758172B2 (en) Image encoding/decoding method and device for signaling information related to sub picture and picture header, and method for transmitting bitstream
US20220400280A1 (en) Image coding method on basis of entry point-related information in video or image coding system
US11825080B2 (en) Image decoding method and apparatus therefor
JP2023508095A (en) Image or video coding based on NAL unit related information
US20230038928A1 (en) Picture partitioning-based coding method and device
US11882280B2 (en) Method for decoding image by using block partitioning in image coding system, and device therefor
WO2020175908A1 (en) Method and device for partitioning picture on basis of signaled information
US20230113358A1 (en) Image coding method based on poc information and non-reference picture flag in video or image coding system
WO2020175904A1 (en) Method and apparatus for picture partitioning on basis of signaled information
CA3134688A1 (en) Image coding method using candidates from intra-prediction types to perform intra-prediction
WO2020175905A1 (en) Signaled information-based picture partitioning method and apparatus
US20230028326A1 (en) Image coding method based on partial entry point-associated information in video or image coding system
US20230032673A1 (en) Image coding method based on entry point-related information in video or image coding system
US20220417498A1 (en) Method for coding image on basis of tmvp and apparatus therefor
US20240056591A1 (en) Method for image coding based on signaling of information related to decoder initialization
US20230136821A1 (en) Image coding method based on information included in picture header in video or image coding system
KR20220082082A (en) Method and apparatus for signaling image information
KR20220137954A (en) An image coding method based on tile-related information and slice-related information in a video or image coding system
JP2023526535A (en) Video coding method and apparatus
KR20220083818A (en) Method and apparatus for signaling slice-related information
KR20220082081A (en) Method and apparatus for signaling picture segmentation information
CA3162960A1 (en) Method and device for signaling information related to slice in image/video encoding/decoding system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20763921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20763921

Country of ref document: EP

Kind code of ref document: A1