WO2021194100A1 - Method and apparatus for processing a video signal using a picture header


Info

Publication number
WO2021194100A1
Authority
WO
WIPO (PCT)
Prior art keywords
picture
slice
level
flag
header
Prior art date
Application number
PCT/KR2021/002174
Other languages
English (en)
Korean (ko)
Inventor
정재홍
김동철
손주형
곽진삼
Original Assignee
주식회사 윌러스표준기술연구소
(주)휴맥스
Priority date
Filing date
Publication date
Application filed by 주식회사 윌러스표준기술연구소, (주)휴맥스
Priority to AU2021243774A priority Critical patent/AU2021243774B2/en
Priority to CA3177367A priority patent/CA3177367A1/fr
Publication of WO2021194100A1 publication Critical patent/WO2021194100A1/fr
Priority to AU2024219800A priority patent/AU2024219800A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/119Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/174Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a slice, e.g. a line of blocks or a group of blocks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96Tree coding, e.g. quad-tree coding

Definitions

  • The present disclosure relates to a method and apparatus for processing a video signal, and more particularly, to a method and apparatus for encoding or decoding a video signal.
  • Compression coding refers to a series of signal processing techniques for transmitting digitized information through a communication line or storing it in a form suitable for a storage medium.
  • Targets of compression encoding include audio, video, and text.
  • A technique for performing compression encoding on video is called video compression.
  • Compression encoding of a video signal is performed by removing redundant information in consideration of spatial correlation, temporal correlation, stochastic correlation, and the like.
  • Accordingly, a method and apparatus for processing a video signal with higher efficiency are required.
  • An object of the present disclosure is to increase coding efficiency of a video signal.
  • A method of decoding a video signal according to an embodiment of the present disclosure comprises: obtaining, from a sequence parameter set (SPS) applied to a current coded layer video sequence (CLVS), a sequence-level coding tool activation flag indicating whether a coding tool is activated for the current CLVS; when the sequence-level coding tool activation flag indicates that the coding tool is activated for the current CLVS, obtaining a picture-level coding tool activation flag indicating whether the coding tool is activated for the current picture included in the current CLVS; when a picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level coding tool activation flag indicates activation of the coding tool, obtaining, from the slice header, a slice-level coding tool activation flag indicating whether the coding tool is used for the current slice; and decoding the current slice based on the slice-level coding tool activation flag.
  • When the slice-level coding tool activation flag is not obtained from the slice header, the method further comprises determining the slice-level coding tool activation flag based on the picture-level coding tool activation flag.
  • In the method of decoding a video signal according to an embodiment of the present disclosure, determining the slice-level coding tool activation flag based on the picture-level coding tool activation flag comprises: when the picture header-in-slice header flag indicates that the picture header syntax structure is present in the slice header, determining that the value of the slice-level coding tool activation flag is equal to the value of the picture-level coding tool activation flag; and when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header, determining that the slice-level coding tool activation flag indicates non-use of the coding tool.
  • In another embodiment, determining the slice-level coding tool activation flag comprises: when the picture-level coding tool activation flag indicates activation of the coding tool, determining the slice-level coding tool activation flag as use of the coding tool; and when the picture-level coding tool activation flag indicates deactivation of the coding tool, determining the slice-level coding tool activation flag as non-use of the coding tool.
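  • For illustration only, the following Python sketch shows how a decoder could evaluate the three-level flag cascade described above; all function and parameter names are hypothetical and are not taken from the VVC specification.

```python
def get_slice_level_tool_flag(sps_tool_enabled: bool,
                              ph_tool_enabled: bool,
                              ph_in_slice_header: bool,
                              read_flag_from_slice_header) -> bool:
    """Return whether the coding tool is used for the current slice."""
    if not sps_tool_enabled or not ph_tool_enabled:
        return False  # disabled at a higher level: nothing is signaled below
    if ph_in_slice_header:
        # The picture header syntax structure is carried in the slice header:
        # the slice-level flag is not signaled and is inferred from the
        # picture-level flag (one of the embodiments described above).
        return ph_tool_enabled
    # Otherwise the slice-level flag is parsed from the slice header itself.
    return read_flag_from_slice_header()
```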
  • In the method of decoding a video signal according to an embodiment of the present disclosure, the value of the picture header-in-slice header flag is the same for all coded slices of the current coded layer video sequence (CLVS).
  • In the method of decoding a video signal according to an embodiment of the present disclosure, when the picture header-in-slice header flag indicates that the picture header syntax structure is present in the slice header, no picture header network abstraction layer (NAL) unit including the picture header syntax structure exists in the current CLVS, and obtaining the picture-level coding tool activation flag comprises obtaining the picture-level coding tool activation flag from the picture header syntax structure included in the slice header.
  • When the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header, the current picture unit (PU) includes a picture header NAL unit, and obtaining the picture-level coding tool activation flag comprises obtaining the picture-level coding tool activation flag from the picture header syntax structure included in the picture header NAL unit.
  • The picture header NAL unit of the method of decoding a video signal according to an embodiment of the present disclosure precedes the first video coding layer (VCL) NAL unit of the current picture unit (PU).
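  • The two placements of the picture header syntax structure described above can be sketched as follows; the picture-unit object and its fields are a hypothetical illustration, not an actual decoder API.

```python
def find_picture_header(picture_unit, ph_in_slice_header: bool):
    if ph_in_slice_header:
        # No PH NAL unit exists in the CLVS; the picture header syntax
        # structure is carried inside the slice header itself.
        return picture_unit.slices[0].slice_header.picture_header
    # Otherwise a dedicated PH NAL unit exists and precedes the first
    # VCL NAL unit of the picture unit.
    for nal in picture_unit.nal_units:
        if nal.type == "PH":
            return nal.picture_header
    raise ValueError("no picture header found in the picture unit")
```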
  • In the method of decoding a video signal according to an embodiment of the present disclosure, obtaining the sequence-level coding tool activation flag comprises obtaining a sequence-level LMCS activation flag indicating whether luma mapping with chroma scaling (LMCS) is activated for the current CLVS.
  • Obtaining the picture-level coding tool activation flag comprises: when the sequence-level LMCS activation flag indicates activation of LMCS, obtaining from the picture header syntax structure a picture-level LMCS activation flag indicating whether LMCS is activated for the current picture; and when the picture-level LMCS activation flag indicates activation of LMCS, obtaining from the picture header syntax structure an identifier for an LMCS adaptation parameter set (APS) including LMCS parameters.
  • Obtaining the slice-level coding tool activation flag from the slice header comprises: when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level LMCS activation flag indicates activation of LMCS, obtaining from the slice header a slice-level LMCS activation flag indicating whether LMCS is used for the current slice. Decoding the current slice comprises decoding the current slice based on the slice-level LMCS activation flag and the identifier for the LMCS APS.
  • In the method of decoding a video signal according to an embodiment of the present disclosure, obtaining the picture-level coding tool activation flag comprises: when the picture-level LMCS activation flag indicates activation of LMCS and the color format of the current picture includes a chroma component, obtaining from the picture header syntax structure a picture-level chroma residual scale flag indicating whether chroma residual signal scaling is activated for the current picture.
  • Decoding the current slice comprises performing a luma mapping process when the slice-level LMCS activation flag indicates use of LMCS and the currently decoded block included in the current slice is a luma block.
  • Decoding the current slice comprises performing a chroma residual scaling process when the slice-level LMCS activation flag indicates use of LMCS, the currently decoded block included in the current slice is not a luma block, and the picture-level chroma residual scale flag indicates activation of chroma residual signal scaling for the current picture.
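  • A minimal sketch of the block-level LMCS decision just described; the helper names and the block object are hypothetical placeholders, not the actual VVC decoding process.

```python
def luma_mapping(block):
    pass  # placeholder for the forward/inverse luma mapping process

def chroma_residual_scaling(block):
    pass  # placeholder for scaling the chroma residual signal

def apply_lmcs_to_block(block, slice_lmcs_used: bool,
                        ph_chroma_residual_scale: bool) -> None:
    if not slice_lmcs_used:
        return                            # LMCS not used for this slice
    if block.is_luma:
        luma_mapping(block)               # luma blocks: luma mapping
    elif ph_chroma_residual_scale:
        chroma_residual_scaling(block)    # chroma blocks: residual scaling
```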
  • In the method of decoding a video signal according to an embodiment of the present disclosure, obtaining the sequence-level coding tool activation flag comprises obtaining a sequence-level explicit scaling list activation flag indicating whether the explicit scaling list is activated for the current CLVS.
  • Obtaining the picture-level coding tool activation flag comprises: when the sequence-level explicit scaling list activation flag indicates activation of the explicit scaling list, obtaining from the picture header syntax structure a picture-level explicit scaling list activation flag indicating whether the explicit scaling list is activated for the current picture; and when the picture-level explicit scaling list activation flag indicates activation of the explicit scaling list, obtaining from the picture header syntax structure an identifier for the APS including scaling list elements.
  • Obtaining the slice-level coding tool activation flag from the slice header comprises: when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level explicit scaling list activation flag indicates activation of the explicit scaling list, obtaining from the slice header a slice-level explicit scaling list use flag indicating whether the explicit scaling list is used for the current slice. Decoding the current slice comprises decoding the current slice based on the slice-level explicit scaling list use flag and the identifier for the APS including the scaling list elements.
  • In the method of decoding a video signal according to an embodiment of the present disclosure, decoding the current slice comprises: when the slice-level explicit scaling list use flag indicates that the explicit scaling list is used for the current slice, scaling the transform coefficients of the current slice by using the scaling list included in the APS identified by the identifier for the APS including the scaling list elements; and when the slice-level explicit scaling list use flag indicates that the explicit scaling list is not used for the current slice, scaling the transform coefficients of the current slice by using a preset scaling list, wherein all elements of the preset scaling list are 16.
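  • The scaling behavior above, including the flat default list whose every element is 16, can be sketched as follows; the function signature and the APS representation are illustrative assumptions.

```python
from typing import Optional
import numpy as np

def scale_coefficients(coeffs: np.ndarray, use_explicit_list: bool,
                       aps_scaling_list: Optional[np.ndarray]) -> np.ndarray:
    if use_explicit_list and aps_scaling_list is not None:
        scaling = aps_scaling_list           # list carried in the referenced APS
    else:
        scaling = np.full(coeffs.shape, 16)  # preset list: every element is 16
    return coeffs * scaling
```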
  • An apparatus for decoding a video signal according to an embodiment of the present disclosure includes a processor and a memory. Based on instructions stored in the memory, the processor obtains, from a sequence parameter set (SPS) applied to a current coded layer video sequence (CLVS), a sequence-level coding tool activation flag indicating whether a coding tool is activated for the current CLVS; when the sequence-level coding tool activation flag indicates that the coding tool is activated for the current CLVS, obtains a picture-level coding tool activation flag indicating whether the coding tool is activated for the current picture included in the current CLVS; when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level coding tool activation flag indicates activation of the coding tool, obtains from the slice header a slice-level coding tool activation flag indicating whether the coding tool is used for the current slice; and decodes the current slice based on the slice-level coding tool activation flag.
  • Based on the instructions stored in the memory, the processor of the apparatus for decoding a video signal determines the slice-level coding tool activation flag based on the picture-level coding tool activation flag when the slice-level coding tool activation flag is not obtained from the slice header.
  • When the picture header-in-slice header flag indicates that the picture header syntax structure is present in the slice header, the processor determines that the value of the slice-level coding tool activation flag is equal to the value of the picture-level coding tool activation flag; when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header, the processor determines the slice-level coding tool activation flag as non-use of the coding tool.
  • Based on the instructions stored in the memory, the processor of the apparatus for decoding a video signal determines the slice-level coding tool activation flag as use of the coding tool when the picture-level coding tool activation flag indicates activation of the coding tool, and determines the slice-level coding tool activation flag as non-use of the coding tool when the picture-level coding tool activation flag indicates deactivation of the coding tool.
  • In the apparatus for decoding a video signal according to an embodiment of the present disclosure, the value of the picture header-in-slice header flag is the same for all coded slices of the current coded layer video sequence (CLVS).
  • In the apparatus for decoding a video signal according to an embodiment of the present disclosure, when the picture header-in-slice header flag indicates that the picture header syntax structure is present in the slice header, no picture header network abstraction layer (NAL) unit including the picture header syntax structure exists in the current coded layer video sequence (CLVS), and the processor, based on the instructions stored in the memory, obtains the picture-level coding tool activation flag from the picture header syntax structure included in the slice header.
  • When the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header, the processor obtains the picture-level coding tool activation flag from the picture header syntax structure included in the picture header NAL unit, based on the instructions stored in the memory.
  • The picture header NAL unit of the apparatus for decoding a video signal precedes the first video coding layer (VCL) NAL unit of the current picture unit (PU).
  • Based on the instructions stored in the memory, the processor of the apparatus for decoding a video signal obtains a sequence-level LMCS activation flag indicating whether luma mapping with chroma scaling (LMCS) is activated for the current CLVS. When the sequence-level LMCS activation flag indicates activation of LMCS, a picture-level LMCS activation flag indicating whether LMCS is activated for the current picture is obtained from the picture header syntax structure. When the picture-level LMCS activation flag indicates activation of LMCS, an identifier for an LMCS adaptation parameter set (APS) including LMCS parameters is obtained from the picture header syntax structure. When the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level LMCS activation flag indicates activation of LMCS, a slice-level LMCS activation flag indicating whether LMCS is used for the current slice is obtained from the slice header, and the current slice is decoded based on the slice-level LMCS activation flag and the identifier for the LMCS APS.
  • When the picture-level LMCS activation flag indicates activation of LMCS and the color format of the current picture includes a chroma component, a picture-level chroma residual scale flag indicating whether chroma residual signal scaling is activated for the current picture is obtained from the picture header syntax structure.
  • When the slice-level LMCS activation flag indicates use of LMCS and the currently decoded block included in the current slice is a luma block, a luma mapping process is performed.
  • When the slice-level LMCS activation flag indicates use of LMCS, the currently decoded block included in the current slice is not a luma block, and the picture-level chroma residual scale flag indicates activation of chroma residual signal scaling for the current picture, a chroma residual scaling process is performed.
  • Based on the instructions stored in the memory, the processor of the apparatus for decoding a video signal obtains a sequence-level explicit scaling list activation flag indicating whether the explicit scaling list is activated for the current CLVS. When the sequence-level explicit scaling list activation flag indicates activation of the explicit scaling list, a picture-level explicit scaling list activation flag indicating whether the explicit scaling list is activated for the current picture is obtained from the picture header syntax structure. When the picture-level explicit scaling list activation flag indicates activation of the explicit scaling list, an identifier for the APS including scaling list elements is obtained from the picture header syntax structure. When the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level explicit scaling list activation flag indicates activation of the explicit scaling list, a slice-level explicit scaling list use flag indicating whether the explicit scaling list is used for the current slice is obtained from the slice header, and the current slice is decoded based on the slice-level explicit scaling list use flag and the identifier for the APS.
  • Based on the instructions stored in the memory, the processor of the apparatus for decoding a video signal according to an embodiment of the present disclosure scales the transform coefficients of the current slice by using the scaling list included in the APS identified by the identifier for the APS including the scaling list elements when the slice-level explicit scaling list use flag indicates that the explicit scaling list is used for the current slice, and scales the transform coefficients of the current slice by using a preset scaling list when the slice-level explicit scaling list use flag indicates that the explicit scaling list is not used for the current slice, wherein all elements of the preset scaling list are 16.
  • A method of encoding a video signal according to an embodiment of the present disclosure comprises generating, in a sequence parameter set (SPS) applied to a current coded layer video sequence (CLVS), a sequence-level coding tool activation flag indicating whether a coding tool is activated for the current CLVS.
  • An apparatus for encoding a video signal according to an embodiment of the present disclosure includes a processor and a memory. Based on instructions stored in the memory, the processor generates a sequence-level coding tool activation flag indicating whether the coding tool is activated for the current CLVS; when the sequence-level coding tool activation flag indicates that the coding tool is activated for the current CLVS, generates a picture-level coding tool activation flag indicating whether the coding tool is activated for the current picture included in the current CLVS; when the picture header-in-slice header flag indicates that the picture header syntax structure is not present in the slice header and the picture-level coding tool activation flag indicates activation of the coding tool, includes in the slice header a slice-level coding tool activation flag indicating whether the coding tool is used for the current slice; and encodes the current slice based on the slice-level coding tool activation flag. The sequence-level coding tool activation flag may be a sequence-level LMCS activation flag or a sequence-level explicit scaling list activation flag.
  • According to an embodiment of the present disclosure, coding efficiency of a video signal may be increased.
  • FIG. 1 is a schematic block diagram of a video signal encoding apparatus according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic block diagram of a video signal decoding apparatus according to an embodiment of the present disclosure.
  • FIG. 3 shows an embodiment in which a coding tree unit is divided into coding units within a picture.
  • FIG. 4 shows an embodiment of a method for signaling the split of a quad tree and a multi-type tree.
  • FIGS. 5 and 6 illustrate an intra prediction method according to an embodiment of the present disclosure in more detail.
  • FIG. 7 illustrates an inter prediction method according to an embodiment of the present disclosure.
  • FIG. 8 is a diagram specifically illustrating a method in which an encoder converts a residual signal.
  • FIG. 9 is a diagram specifically illustrating a method in which an encoder and a decoder inverse transform transform coefficients to obtain a residual signal.
  • FIG. 10 shows a block diagram of an encoder supporting a scalable video coding scheme.
  • FIG. 11 shows a block diagram of a decoder supporting a scalable video coding scheme.
  • FIG. 13 is a diagram illustrating a reference relationship between pictures in a bitstream composed of multiple layers.
  • FIG. 15 shows a video parameter set (VPS) syntax structure.
  • FIG. 19 shows pseudocode for deriving variables related to an output layer set according to another embodiment.
  • FIG. 21 is a diagram illustrating the use of a flag related to the number of sub-layers in a layer among the contents of the VPS syntax structure.
  • FIG. 22 is a diagram illustrating syntax elements and semantics that are part of the SPS syntax structure and indicate sub-layers.
  • FIG. 23 is a diagram illustrating an example of a picture unit (PU) configuration.
  • FIG. 24 is a diagram illustrating a parameter set.
  • FIG. 25 is a diagram illustrating a parameter set.
  • FIG. 26 is a diagram illustrating a method of signaling whether to use luma mapping with chroma scaling (LMCS) and an explicit scaling list at the sequence, picture, and slice levels.
  • 'Coding' may be interpreted as encoding or decoding depending on the context, or may be construed to include both encoding and decoding.
  • In the present specification, an apparatus that generates a video signal bitstream by encoding a video signal is referred to as an encoding apparatus or encoder, and an apparatus that reconstructs a video signal by decoding a video signal bitstream is referred to as a decoding apparatus or decoder.
  • a video signal processing apparatus is used as a term that includes both an encoder and a decoder.
  • the 'unit' is used to refer to a basic unit of image processing or a specific position of a picture, and refers to an image region including at least one of a luma component and a chroma component.
  • 'Block' refers to an image region including a specific component among the luma component and the chroma components (i.e., Cb and Cr).
  • terms such as 'unit', 'block', 'partition' and 'region' may be used interchangeably according to embodiments.
  • a unit may be used as a concept including all of a coding unit, a prediction unit, and a transform unit.
  • a picture indicates a field or a frame, and according to embodiments, the terms may be used interchangeably.
  • The encoding apparatus 100 of the present disclosure includes a transform unit 110, a quantization unit 115, an inverse quantization unit 120, an inverse transform unit 125, a filtering unit 130, a prediction unit 150, and an entropy coding unit 160.
  • the transform unit 110 converts a residual signal that is a difference between the input video signal and the prediction signal generated by the prediction unit 150 to obtain a transform coefficient value.
  • For example, a discrete cosine transform (DCT) or a discrete sine transform (DST) may be used.
  • the transform is performed by dividing the input picture signal into blocks.
  • the coding efficiency may vary according to the distribution and characteristics of values in the transform region.
  • the quantization unit 115 quantizes the transform coefficient values output from the transform unit 110 .
  • In this case, the picture signal is not coded as it is; instead, a method is used in which a picture is predicted using a region already coded through the prediction unit 150, and a reconstructed picture is obtained by adding the residual between the original picture and the predicted picture to the predicted picture.
  • the encoder performs a process of reconstructing the encoded current block.
  • the inverse quantization unit 120 inversely quantizes the transform coefficient value, and the inverse transform unit 125 restores the residual value using the inverse quantized transform coefficient value.
  • the filtering unit 130 performs a filtering operation for improving the quality of the reconstructed picture and improving the encoding efficiency.
  • For example, a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) may be included.
  • The filtered picture is output or stored in a decoded picture buffer (DPB) 156 to be used as a reference picture.
  • the intra prediction unit 152 performs intra prediction within the current picture, and the inter prediction unit 154 predicts the current picture using the reference picture stored in the decoded picture buffer 156 .
  • the intra prediction unit 152 performs intra prediction from reconstructed regions in the current picture, and transmits intra prediction information to the entropy coding unit 160 .
  • the inter prediction unit 154 may again include a motion estimation unit 154a and a motion compensator 154b.
  • the motion estimation unit 154a obtains a motion vector value of the current region with reference to the reconstructed specific region.
  • the motion estimation unit 154a transmits position information (reference frame, motion vector, etc.) of the reference region to the entropy coding unit 160 to be included in the bitstream.
  • the motion compensation unit 154b performs inter-screen motion compensation using the motion vector value transmitted from the motion estimation unit 154a.
  • the prediction unit 150 includes an intra prediction unit 152 and an inter prediction unit 154 .
  • The intra prediction unit 152 performs intra prediction within the current picture, and the inter prediction unit 154 performs inter prediction to predict the current picture using the reference picture stored in the decoded picture buffer 156.
  • the intra prediction unit 152 performs intra prediction on reconstructed samples in the current picture, and transmits intra encoding information to the entropy coding unit 160 .
  • the intra encoding information may include at least one of an intra prediction mode, a most probable mode (MPM) flag, and an MPM index.
  • the intra-encoding information may include information about a reference sample.
  • the inter prediction unit 154 may include a motion estimation unit 154a and a motion compensation unit 154b.
  • the motion estimator 154a obtains a motion vector value of the current region by referring to a specific region of the reconstructed reference picture.
  • the motion estimation unit 154a transmits a set of motion information (reference picture index, motion vector information, etc.) for the reference region to the entropy coding unit 160 .
  • the motion compensation unit 154b performs motion compensation using the motion vector value transmitted from the motion estimation unit 154a.
  • the inter prediction unit 154 transmits inter encoding information including motion information on the reference region to the entropy coding unit 160 .
  • the prediction unit 150 may include an intra block copy (BC) prediction unit (not shown).
  • the intra BC prediction unit performs intra BC prediction from reconstructed samples in the current picture, and transmits intra BC encoding information to the entropy coding unit 160 .
  • the intra BC prediction unit obtains a block vector value indicating a reference region used for prediction of the current region by referring to a specific region in the current picture.
  • the intra BC prediction unit may perform intra BC prediction by using the obtained block vector value.
  • the intra BC prediction unit transmits the intra BC encoding information to the entropy coding unit 160 .
  • the intra BC encoding information may include block vector information.
  • the transform unit 110 obtains a transform coefficient value by transforming a residual value between the original picture and the predicted picture.
  • the transformation may be performed in units of a specific block within the picture, and the size of the specific block may vary within a preset range.
  • the quantization unit 115 quantizes the transform coefficient values generated by the transform unit 110 and transmits the quantized values to the entropy coding unit 160 .
  • the entropy coding unit 160 entropy-codes information indicating quantized transform coefficients, intra-encoding information, and inter-encoding information to generate a video signal bitstream.
  • a variable length coding (VLC) scheme and an arithmetic coding scheme may be used.
  • the variable length coding scheme converts input symbols into continuous codewords, and the length of the codewords may be variable. For example, symbols that occur frequently are expressed as short codewords, and symbols that do not occur frequently are expressed as long codewords.
  • a context-based adaptive variable length coding (CAVLC) scheme may be used as the variable length coding scheme.
  • Arithmetic coding converts consecutive data symbols into a single fractional number, whereby the optimal number of fractional bits required to represent each symbol can be obtained.
  • As the arithmetic coding, context-based adaptive binary arithmetic coding (CABAC) may be used.
  • the entropy coding unit 160 may binarize information indicating a quantized transform coefficient. Also, the entropy coding unit 160 may generate a bitstream by performing arithmetic coding on the binarized information.
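  • The following toy sketch illustrates the interval-narrowing principle of arithmetic coding described above (it is not CABAC itself: the symbol probability here is fixed, whereas CABAC adapts it per context).

```python
def arithmetic_encode(bits, p_zero=0.8):
    """Encode binary symbols by narrowing the interval [low, high)."""
    low, high = 0.0, 1.0
    for b in bits:
        split = low + (high - low) * p_zero
        low, high = (low, split) if b == 0 else (split, high)
    return (low + high) / 2  # any value inside the final interval suffices

print(arithmetic_encode([0, 0, 1, 0]))  # frequent symbols cost few bits
```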
  • The generated bitstream is encapsulated in network abstraction layer (NAL) units as a basic unit.
  • the NAL unit includes an integer number of coded coding tree units.
  • To decode a bitstream, the bitstream is first divided into NAL units, and each divided NAL unit is then decoded.
  • Information necessary for decoding a video signal bitstream may be transmitted through the raw byte sequence payload (RBSP) of a higher-level set such as a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS).
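  • As a sketch of the NAL-unit framing described above, the two-byte NAL unit header layout used by VVC (H.266) can be parsed as follows; this is for illustration and omits emulation-prevention handling.

```python
def parse_nal_unit_header(b0: int, b1: int) -> dict:
    """Parse a two-byte VVC NAL unit header from its bytes b0, b1."""
    return {
        "forbidden_zero_bit":    (b0 >> 7) & 0x1,  # always 0
        "nuh_reserved_zero_bit": (b0 >> 6) & 0x1,
        "nuh_layer_id":          b0 & 0x3F,        # 6 bits
        "nal_unit_type":         (b1 >> 3) & 0x1F, # 5 bits (SPS, PPS, PH, VCL, ...)
        "nuh_temporal_id_plus1": b1 & 0x07,        # 3 bits
    }
```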
  • The block diagram of FIG. 1 shows the encoding apparatus 100 according to an embodiment of the present disclosure, in which the separately displayed blocks are logically separated elements of the encoding apparatus 100. Accordingly, the elements of the above-described encoding apparatus 100 may be mounted as one chip or as a plurality of chips according to the design of the device. According to an embodiment, the operation of each element of the above-described encoding apparatus 100 may be performed by a processor (not shown).
  • the decoding apparatus 200 of the present disclosure includes an entropy decoding unit 210 , an inverse quantization unit 220 , an inverse transform unit 225 , a filtering unit 230 , and a prediction unit 250 .
  • the entropy decoding unit 210 entropy-decodes the video signal bitstream to extract transform coefficient information, intra-encoding information, inter-encoding information, and the like for each region. For example, the entropy decoding unit 210 may obtain a binarization code for transform coefficient information of a specific region from a video signal bitstream. Also, the entropy decoding unit 210 inversely binarizes the binarized code to obtain quantized transform coefficients. The inverse quantization unit 220 inverse quantizes the quantized transform coefficient, and the inverse transform unit 225 restores a residual value using the inverse quantized transform coefficient. The video signal processing apparatus 200 restores the original pixel value by adding the residual value obtained by the inverse transform unit 225 with the prediction value obtained by the prediction unit 250 .
  • the filtering unit 230 improves picture quality by filtering the picture.
  • This may include a deblocking filter for reducing block distortion and/or an adaptive loop filter for removing distortion from the entire picture.
  • The filtered picture is output or stored in the decoded picture buffer (DPB) 256 to be used as a reference picture for the next picture.
  • the prediction unit 250 includes an intra prediction unit 252 and an inter prediction unit 254 .
  • the prediction unit 250 generates a prediction picture by using the encoding type decoded through the entropy decoding unit 210, transform coefficients for each region, intra/inter encoding information, and the like.
  • To reconstruct the current block, the current picture including the current block or decoded regions of other pictures may be used. A picture (or tile/slice) that uses only the current picture for reconstruction, that is, performs only intra prediction, is called an intra picture or I picture (or tile/slice), and a picture (or tile/slice) on which inter prediction can also be performed is called an inter picture (or tile/slice).
  • Among inter pictures (or tiles/slices), a picture (or tile/slice) that uses at most one motion vector and one reference picture index to predict the sample values of each block is called a predictive picture or P picture (or tile/slice), and a picture (or tile/slice) that uses up to two motion vectors and reference picture indexes is called a bi-predictive picture or B picture (or tile/slice).
  • a P picture uses at most one set of motion information to predict each block
  • a B picture uses up to two sets of motion information to predict each block.
  • the motion information set includes one or more motion vectors and one reference picture index.
  • the intra prediction unit 252 generates a prediction block by using the intra encoding information and reconstructed samples in the current picture.
  • the intra encoding information may include at least one of an intra prediction mode, a most probable mode (MPM) flag, and an MPM index.
  • the intra prediction unit 252 predicts sample values of the current block by using reconstructed samples located on the left and/or above the current block as reference samples.
  • reconstructed samples, reference samples, and samples of the current block may represent pixels. Also, sample values may represent pixel values.
  • the reference samples may be samples included in a neighboring block of the current block.
  • the reference samples may be samples adjacent to a left boundary and/or samples adjacent to an upper boundary of the current block.
  • the reference samples are located on a line within a preset distance from the left boundary of the current block among samples of neighboring blocks of the current block and/or on a line within a preset distance from the upper boundary of the current block may be samples.
  • The neighboring blocks of the current block may be a left (L) block, an above (A) block, a below left (BL) block, an above right (AR) block, or an above left (AL) block adjacent to the current block.
  • the inter prediction unit 254 generates a prediction block by using the reference picture stored in the decoded picture buffer 256 and the inter encoding information.
  • the inter encoding information may include a motion information set (reference picture index, motion vector information, etc.) of the current block with respect to the reference block.
  • Inter prediction may include L0 prediction, L1 prediction, and bi-prediction.
  • L0 prediction means prediction using one reference picture included in the L0 picture list, and L1 prediction means prediction using one reference picture included in the L1 picture list. For this, one set of motion information (e.g., a motion vector and a reference picture index) may be required.
  • In the bi-prediction method, a maximum of two reference regions may be used, and the two reference regions may exist in the same reference picture or in different pictures. That is, a maximum of two sets of motion information (e.g., a motion vector and a reference picture index) may be used, and the two motion vectors may correspond to the same reference picture index or to different reference picture indexes.
  • the reference pictures may be temporally displayed (or output) before or after the current picture.
  • the two reference regions used in the bi-prediction method may be regions selected from each of the L0 picture list and the L1 picture list.
  • the inter prediction unit 254 may obtain the reference block of the current block by using the motion vector and the reference picture index.
  • the reference block exists in the reference picture corresponding to the reference picture index.
  • a sample value of a block specified by the motion vector or an interpolated value thereof may be used as a predictor of the current block.
  • In this case, an 8-tap interpolation filter may be used for the luma signal and a 4-tap interpolation filter for the chroma signal, although the interpolation filter for sub-pel motion prediction is not limited thereto.
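  • A hedged sketch of half-pel luma interpolation with an 8-tap filter. The coefficients below are the HEVC-style half-pel taps (sum = 64), used here only to illustrate the principle; as noted above, the filter is not limited to these taps, and VVC defines its own tables.

```python
import numpy as np

HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1])

def interpolate_half_pel(row: np.ndarray, pos: int) -> int:
    window = row[pos - 3: pos + 5].astype(np.int64)       # 8 integer samples
    return int(np.dot(window, HALF_PEL_TAPS) + 32) >> 6   # round, normalize

row = np.arange(64, dtype=np.int64)
print(interpolate_half_pel(row, 10))  # value halfway between samples 10 and 11
```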
  • the inter prediction unit 254 performs motion compensation for predicting the texture of the current unit from the previously reconstructed picture. In this case, the inter prediction unit may use the motion information set.
  • the prediction unit 250 may include an intra BC prediction unit (not shown).
  • the intra BC predictor may reconstruct the current region with reference to a specific region including reconstructed samples in the current picture.
  • the intra BC prediction unit obtains intra BC encoding information for the current region from the entropy decoding unit 210 .
  • the intra BC prediction unit obtains a block vector value of the current region indicating a specific region in the current picture.
  • the intra BC prediction unit may perform intra BC prediction by using the obtained block vector value.
  • the intra BC encoding information may include block vector information.
  • a reconstructed video picture is generated by adding the prediction value output from the intra prediction unit 252 or the inter prediction unit 254 and the residual value output from the inverse transform unit 225 . That is, the video signal decoding apparatus 200 reconstructs the current block by using the prediction block generated by the prediction unit 250 and the residual obtained from the inverse transform unit 225 .
  • the block diagram of FIG. 2 shows the decoding apparatus 200 according to an embodiment of the present disclosure, in which the separately displayed blocks are logically divided into elements of the decoding apparatus 200 .
  • the elements of the decoding apparatus 200 described above may be mounted as one chip or a plurality of chips according to the design of the device.
  • the operation of each element of the above-described decoding apparatus 200 may be performed by a processor (not shown).
  • FIG. 3 shows an embodiment in which a coding tree unit (CTU) is divided into coding units (CUs) within a picture.
  • a coding tree unit consists of an NXN block of luma samples and two blocks of corresponding chroma samples.
  • a coding tree unit may be divided into a plurality of coding units.
  • a coding tree unit may be a leaf node without being split. In this case, the coding tree unit itself may be a coding unit.
  • the coding unit refers to a basic unit for processing a picture in the process of processing the video signal described above, that is, intra/inter prediction, transformation, quantization, and/or entropy coding.
  • the size and shape of the coding unit in one picture may not be constant.
  • the coding unit may have a square or rectangular shape.
  • the rectangular coding unit (or rectangular block) includes a vertical coding unit (or a vertical block) and a horizontal coding unit (or a horizontal block).
  • a vertical block is a block having a height greater than a width
  • a horizontal block is a block having a width greater than a height.
  • a non-square block may refer to a rectangular block, but the present disclosure is not limited thereto.
  • the coding tree unit is first divided into a quad tree (QT) structure. That is, in the quad tree structure, one node having a size of 2NX2N may be divided into four nodes having a size of NXN.
  • a quad tree may also be referred to as a quaternary tree. Quad tree partitioning can be performed recursively, and not all nodes need to be partitioned to the same depth.
  • a leaf node of the aforementioned quad tree may be further divided into a multi-type tree (MTT) structure.
  • In the multi-type tree structure, one node may be divided into a binary or ternary tree structure with horizontal or vertical splitting. That is, in the multi-type tree structure, there are four partitioning structures: vertical binary partitioning, horizontal binary partitioning, vertical ternary partitioning, and horizontal ternary partitioning.
  • both a width and a height of a node in each tree structure may have a value of a power of two.
  • a node having a size of 2NX2N may be divided into two NX2N nodes by vertical binary division and divided into two 2NXN nodes by horizontal binary division.
  • According to an embodiment of the present disclosure, a node of size 2NX2N may be divided into nodes of (N/2)X2N, NX2N, and (N/2)X2N by vertical ternary division, and into nodes of 2NX(N/2), 2NXN, and 2NX(N/2) by horizontal ternary division.
  • This multi-type tree splitting can be performed recursively.
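  • The child-block sizes produced by the four multi-type tree splits above can be sketched as follows for a parent of size (w, h); the function name is illustrative only.

```python
def mtt_children(w: int, h: int, split: str):
    return {
        "vertical_binary":    [(w // 2, h)] * 2,
        "horizontal_binary":  [(w, h // 2)] * 2,
        "vertical_ternary":   [(w // 4, h), (w // 2, h), (w // 4, h)],
        "horizontal_ternary": [(w, h // 4), (w, h // 2), (w, h // 4)],
    }[split]

print(mtt_children(16, 16, "vertical_ternary"))  # [(4, 16), (8, 16), (4, 16)]
```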
  • a leaf node of a multi-type tree may be a coding unit.
  • the coding unit may be used as a unit of prediction and/or transform without further splitting.
  • When the width or height of the current coding unit is greater than the maximum transform length, the current coding unit may be split into a plurality of transform units without explicit signaling regarding the splitting.
  • at least one of the following parameters may be predefined or transmitted through an RBSP of a higher level set such as PPS, SPS, or VPS.
  • Preset flags may be used to signal the division of the aforementioned quad tree and multi-type tree. For example, at least one of a flag 'qt_split_flag' indicating whether to split a quad tree node, a flag 'mtt_split_flag' indicating whether to split a multi-type tree node, a flag 'mtt_split_vertical_flag' indicating the split direction of a multi-type tree node, and a flag 'mtt_split_binary_flag' indicating the split shape of a multi-type tree node may be used.
  • a coding tree unit is a root node of a quad tree, and may be first divided into a quad tree structure.
  • 'qt_split_flag' is signaled for each node 'QT_node'.
  • When the value of 'qt_split_flag' is 1, the corresponding node is divided into four square nodes; when the value of 'qt_split_flag' is 0, the corresponding node becomes a leaf node 'QT_leaf_node' of the quad tree.
  • Each quad tree leaf node 'QT_leaf_node' may be further divided into a multi-type tree structure.
  • 'mtt_split_flag' is signaled for each node 'MTT_node'.
  • When the value of 'mtt_split_flag' is 1, the corresponding node is divided into a plurality of rectangular nodes; when the value of 'mtt_split_flag' is 0, the corresponding node becomes a leaf node 'MTT_leaf_node' of the multi-type tree.
  • When the value of 'mtt_split_binary_flag' is 1, the node 'MTT_node' is divided into two rectangular nodes; when the value of 'mtt_split_binary_flag' is 0, the node 'MTT_node' is divided into three rectangular nodes.
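  • A sketch of how the flags above drive recursive partitioning into leaf coding units; the read_flag callable stands in for entropy decoding and is hypothetical.

```python
def parse_tree(read_flag, in_mtt=False):
    if not in_mtt and read_flag("qt_split_flag"):
        return [parse_tree(read_flag) for _ in range(4)]    # four square nodes
    if read_flag("mtt_split_flag"):
        direction = "V" if read_flag("mtt_split_vertical_flag") else "H"
        n = 2 if read_flag("mtt_split_binary_flag") else 3  # binary or ternary
        return (direction, [parse_tree(read_flag, in_mtt=True) for _ in range(n)])
    return "leaf CU"  # no further split: this node is a coding unit
```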
  • Picture prediction (motion compensation) for coding is performed on a coding unit that is no longer divided (ie, a leaf node of a coding unit tree).
  • a basic unit for performing such prediction is hereinafter referred to as a prediction unit or a prediction block.
  • the term unit used herein may be used as a term replacing the prediction unit, which is a basic unit for performing prediction.
  • the present disclosure is not limited thereto, and more broadly, it may be understood as a concept including the coding unit.
  • the intra prediction unit predicts sample values of the current block by using reconstructed samples located on the left and/or above the current block as reference samples.
  • FIG. 5 shows an embodiment of reference samples used for prediction of a current block in an intra prediction mode.
  • the reference samples may be samples adjacent to the left boundary and/or samples adjacent to the upper boundary of the current block.
  • For the prediction of a current block of size WXH, a maximum of 2W+2H+1 neighboring samples located on the left and/or upper side of the current block may be used as reference samples.
  • the intra prediction unit may perform a reference sample padding process to obtain a reference sample. Also, the intra prediction unit may perform a reference sample filtering process to reduce an intra prediction error. That is, filtered reference samples may be obtained by performing filtering on neighboring samples and/or reference samples obtained by the reference sample padding process. The intra prediction unit predicts samples of the current block using the reference samples obtained in this way. The intra prediction unit predicts samples of the current block using unfiltered reference samples or filtered reference samples.
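  • A sketch of gathering the 2W+2H+1 reference samples with simple padding of unavailable positions; the reconstruction array, coordinate layout, and fallback value 128 are illustrative assumptions, not the normative padding process.

```python
import numpy as np

def gather_reference_samples(recon: np.ndarray, x: int, y: int,
                             w: int, h: int) -> np.ndarray:
    coords = [(x - 1, y - 1)]                               # top-left corner
    coords += [(x + i, y - 1) for i in range(2 * w)]        # extended top row
    coords += [(x - 1, y + j) for j in range(2 * h)]        # extended left column
    samples = []
    for cx, cy in coords:                                   # 2W + 2H + 1 positions
        if 0 <= cx < recon.shape[1] and 0 <= cy < recon.shape[0]:
            samples.append(recon[cy, cx])
        else:
            samples.append(samples[-1] if samples else 128)  # pad from neighbor
    return np.array(samples)
```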
  • the surrounding samples may include samples on at least one reference line.
  • the neighboring samples may include neighboring samples on a line adjacent to the boundary of the current block.
  • FIG. 6 shows an embodiment of prediction modes used for intra prediction.
  • intra prediction mode information indicating an intra prediction direction may be signaled.
  • the intra prediction mode information indicates any one of a plurality of intra prediction modes constituting the intra prediction mode set.
  • the decoder receives intra prediction mode information of the current block from the bitstream.
  • the intra prediction unit of the decoder performs intra prediction on the current block based on the extracted intra prediction mode information.
  • the intra prediction mode set may include all intra prediction modes (eg, a total of 67 intra prediction modes) used for intra prediction. More specifically, the intra prediction mode set may include a planar mode, a DC mode, and a plurality of (eg, 65) angular modes (ie, directional modes). Each intra prediction mode may be indicated through a preset index (ie, intra prediction mode index). For example, as shown in FIG. 6 , the intra prediction mode index 0 indicates the planar mode, and the intra prediction mode index 1 indicates the DC mode. In addition, the intra prediction mode indexes 2 to 66 may indicate different angular modes, respectively. The angle modes respectively indicate different angles within a preset angle range.
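  • As a quick reference for the index assignments just described, the sketch below maps a mode index to a label; only the indices explicitly named in this description are given names.

```python
def intra_mode_name(idx: int) -> str:
    if idx == 0:
        return "PLANAR"
    if idx == 1:
        return "DC"
    named = {2: "HDIA (horizontal diagonal)", 18: "HOR (horizontal)",
             34: "DIA (diagonal)", 50: "VER (vertical)",
             66: "VDIA (vertical diagonal)"}
    return named.get(idx, f"angular mode {idx}")  # other angular modes, 2..66

print(intra_mode_name(50))  # VER (vertical)
```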
  • the angle mode may indicate an angle within an angle range between 45 degrees and -135 degrees in a clockwise direction (ie, a first angle range).
  • the angle mode may be defined based on the 12 o'clock direction.
  • intra prediction mode index 2 indicates a horizontal diagonal (HDIA) mode
  • intra prediction mode index 18 indicates a horizontal (HOR) mode
  • intra prediction mode index 34 indicates a diagonal (DIA) mode.
  • an intra prediction mode index 50 indicates a vertical (VER) mode
  • an intra prediction mode index 66 indicates a vertical diagonal (VDIA) mode.
  • the preset angle range may be set differently according to the shape of the current block.
  • a wide-angle mode indicating an angle greater than 45 degrees or less than -135 degrees in a clockwise direction may be additionally used.
  • the angle mode may indicate an angle within an angle range (ie, a second angle range) between (45+offset1) and (-135+offset1) degrees in a clockwise direction.
  • angle modes 67 to 76 outside the first angle range may be additionally used.
  • the angle mode may indicate an angle within an angle range (ie, a third angle range) between (45-offset2) and (-135-offset2) degrees in a clockwise direction.
  • angle modes -10 to -1 outside the first angle range may be additionally used.
  • the values of offset1 and offset2 may be determined differently according to a ratio between the width and the height of the rectangular block. Also, offset1 and offset2 may be positive numbers.
  • the plurality of angular modes constituting the intra prediction mode set may include a basic angular mode and an extended angular mode.
  • the extended angle mode may be determined based on the basic angle mode.
  • The basic angular mode corresponds to an angle used in intra prediction of the existing High Efficiency Video Coding (HEVC) standard, and the extended angular mode may be a mode corresponding to an angle newly added in intra prediction of a next-generation video codec standard. More specifically, the basic angular mode may be an angular mode corresponding to any one of the intra prediction modes {2, 4, 6, ..., 66}, and the extended angular mode may be an angular mode corresponding to any one of the intra prediction modes {3, 5, 7, ..., 65}. That is, the extended angular mode may be an angular mode between the basic angular modes within the first angle range. Accordingly, the angle indicated by the extended angular mode may be determined based on the angle indicated by the basic angular mode.
  • The basic angular mode may be a mode corresponding to an angle within the preset first angle range, and the extended angular mode may be a wide-angle mode outside the first angle range. That is, the basic angular mode may be an angular mode corresponding to any one of the intra prediction modes {2, 3, 4, ..., 66}, and the extended angular mode may be an angular mode corresponding to any one of the intra prediction modes {-10, -9, ..., -1} and {67, 68, ..., 76}.
  • the angle indicated by the extended angle mode may be determined as an angle opposite to the angle indicated by the corresponding basic angle mode. Accordingly, the angle indicated by the extended angle mode may be determined based on the angle indicated by the basic angle mode.
  • the number of extension angle modes is not limited thereto, and additional extension angles may be defined according to the size and/or shape of the current block.
  • For example, the extended angular mode may be defined as an angular mode corresponding to any one of the intra prediction modes {-14, -13, ..., -1} and {67, 68, ..., 80}.
  • the total number of intra prediction modes included in the intra prediction mode set may vary according to the configuration of the aforementioned basic angular mode and extended angular mode.
  • the interval between the extended angular modes may be set based on the interval between the corresponding basic angular modes.
  • For example, the interval between the extended angular modes {3, 5, 7, ..., 65} may be determined based on the interval between the corresponding basic angular modes {2, 4, 6, ..., 66}. Also, the interval between the extended angular modes {-10, -9, ..., -1} may be determined based on the interval between the corresponding opposite basic angular modes {56, 57, ..., 65}, and the interval between the extended angular modes {67, 68, ..., 76} may be determined based on the interval between the corresponding opposite basic angular modes {3, 4, ..., 12}.
  • the angular spacing between the extended angular modes may be set to be the same as the angular spacing between the corresponding basic angular modes. Also, the number of extended angular modes in the intra prediction mode set may be set to be less than or equal to the number of basic angular modes.
  • the extended angle mode may be signaled based on the basic angle mode.
  • the wide angle mode ie, the extended angle mode
  • the wide angle mode may replace at least one angle mode (ie, the basic angle mode) within the first angle range.
  • The replaced basic angular mode may be an angular mode corresponding to the opposite of the wide-angle mode. That is, the replaced basic angular mode is an angular mode corresponding to an angle in the opposite direction of the angle indicated by the wide-angle mode, or to an angle that differs from the angle in the opposite direction by a preset offset index. For example, the preset offset index may be 1.
  • the intra prediction mode index corresponding to the replaced basic angle mode may be remapped to the wide-angle mode to signal the corresponding wide-angle mode.
• the wide-angle modes {-10, -9, ..., -1} may be signaled by the intra prediction mode indexes {57, 58, ..., 66}, respectively, and the wide-angle modes {67, 68, ..., 76} may be signaled by the intra prediction mode indexes {2, 3, ..., 11}, respectively.
• the intra prediction mode index for the basic angular mode may be used to signal the extended angular mode
• the same set of intra prediction mode indexes can be used for signaling of the intra prediction mode even if the configurations of the angular modes used for intra prediction of each block differ from each other. Accordingly, signaling overhead caused by a change in the intra prediction mode configuration can be minimized.
  • whether to use the extended angle mode may be determined based on at least one of the shape and size of the current block.
• when the size of the current block is larger than the preset size, the extended angle mode may be used for intra prediction of the current block; otherwise, only the basic angle mode can be used for intra prediction of the current block.
• when the current block is a non-square block, the extended angle mode may be used for intra prediction of the current block, and when the current block is a square block, only the basic angle mode can be used for intra prediction of the current block.
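• To make the remapping above concrete, the following C sketch (illustrative only; the function name and the fixed remapped index ranges are assumptions, and a real codec varies the remapped count with the block aspect ratio) maps a signaled basic-mode index to a wide-angle mode for non-square blocks:

```c
#include <stdio.h>

/* Simplified sketch of wide-angle remapping based on the mapping above:
 * for wide blocks, signaled indexes {2..11} map to wide-angle modes
 * {67..76}; for tall blocks, signaled indexes {57..66} map to {-10..-1}.
 * A fixed count of 10 remapped modes is assumed for illustration. */
int remap_wide_angle(int signaled_mode, int width, int height)
{
    if (width > height && signaled_mode >= 2 && signaled_mode <= 11)
        return signaled_mode + 65;   /* 2..11  -> 67..76 */
    if (height > width && signaled_mode >= 57 && signaled_mode <= 66)
        return signaled_mode - 67;   /* 57..66 -> -10..-1 */
    return signaled_mode;            /* basic angular mode kept as-is */
}

int main(void)
{
    printf("%d\n", remap_wide_angle(2, 16, 4));  /* prints 67 */
    printf("%d\n", remap_wide_angle(60, 4, 16)); /* prints -7 */
    return 0;
}
```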
  • the inter prediction method may include a general inter prediction method optimized for translation motion and an affine model-based inter prediction method.
  • the motion vector may include at least one of a general motion vector for motion compensation according to the general inter prediction method and a control point motion vector for affine motion compensation.
  • the decoder may predict the current block with reference to reconstructed samples of another decoded picture.
  • the decoder obtains the reference block 702 in the reference picture 720 based on the motion information set of the current block 701 .
  • the motion information set may include a reference picture index and a motion vector.
  • the reference picture index indicates the reference picture 720 including the reference block for inter prediction of the current block in the reference picture list.
  • the reference picture list may include at least one of the aforementioned L0 picture list and L1 picture list.
  • the motion vector represents an offset between the coordinate values of the current block 701 in the current picture 710 and the coordinate values of the reference block 702 in the reference picture 720 .
  • the decoder obtains a predictor of the current block 701 based on sample values of the reference block 702 , and reconstructs the current block 701 using the predictor.
• the encoder may acquire the aforementioned reference block by searching for a block similar to the current block in pictures having an earlier reconstruction order. For example, the encoder may search, within a preset search area, for the reference block that minimizes the sum of differences between its sample values and those of the current block.
  • a sum of absolute difference SAD
• SATD sum of Hadamard transformed difference
  • the SAD may be a value obtained by adding all absolute values of differences between sample values included in two blocks.
  • SATD may be a value obtained by adding all absolute values of Hadamard transform coefficients obtained by performing Hadamard transform on a difference between sample values included in two blocks.
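• The two block-matching costs just described can be sketched in C as follows (illustrative only; the 4x4 SATD uses an unnormalized Hadamard butterfly, and the function names are assumptions):

```c
#include <stdlib.h>
#include <stdint.h>

/* SAD: sum of absolute differences between samples of two blocks. */
int sad(const uint8_t *a, const uint8_t *b, int w, int h, int stride)
{
    int sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sum += abs((int)a[y * stride + x] - (int)b[y * stride + x]);
    return sum;
}

/* SATD on a 4x4 block: Hadamard-transform the sample difference,
 * then sum the absolute values of the transform coefficients. */
int satd4x4(const uint8_t *a, const uint8_t *b, int stride)
{
    int d[16], m[16];
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y * 4 + x] = (int)a[y * stride + x] - (int)b[y * stride + x];

    /* horizontal 4-point Hadamard butterfly */
    for (int y = 0; y < 4; y++) {
        int s0 = d[y*4+0] + d[y*4+2], s1 = d[y*4+1] + d[y*4+3];
        int t0 = d[y*4+0] - d[y*4+2], t1 = d[y*4+1] - d[y*4+3];
        m[y*4+0] = s0 + s1; m[y*4+1] = s0 - s1;
        m[y*4+2] = t0 + t1; m[y*4+3] = t0 - t1;
    }
    /* vertical 4-point Hadamard butterfly, then absolute sum */
    int sum = 0;
    for (int x = 0; x < 4; x++) {
        int s0 = m[0*4+x] + m[2*4+x], s1 = m[1*4+x] + m[3*4+x];
        int t0 = m[0*4+x] - m[2*4+x], t1 = m[1*4+x] - m[3*4+x];
        sum += abs(s0 + s1) + abs(s0 - s1) + abs(t0 + t1) + abs(t0 - t1);
    }
    return sum;
}
```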
  • the current block may be predicted using one or more reference regions.
  • the current block may be inter-predicted through a bi-prediction method using two or more reference regions.
  • the decoder may obtain two reference blocks based on two sets of motion information of the current block.
  • the decoder may obtain the first predictor and the second predictor of the current block based on the obtained sample values of each of the two reference blocks.
  • the decoder may reconstruct the current block using the first predictor and the second predictor. For example, the decoder may reconstruct the current block based on the sample-by-sample average of the first predictor and the second predictor.
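• A minimal C sketch of the sample-by-sample averaging just described (illustrative; the +1 rounding before the shift is an assumption):

```c
#include <stdint.h>

/* Bi-prediction sketch: the predictor used for reconstruction is the
 * sample-by-sample average of the two predictors obtained from the two
 * reference blocks. */
void bipred_average(const uint8_t *pred0, const uint8_t *pred1,
                    uint8_t *out, int num_samples)
{
    for (int i = 0; i < num_samples; i++)
        out[i] = (uint8_t)(((int)pred0[i] + (int)pred1[i] + 1) >> 1);
}
```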
  • one or more sets of motion information may be signaled.
  • the similarity between the motion information sets for motion compensation of each of the plurality of blocks may be used.
  • the motion information set used for prediction of the current block may be derived from the motion information set used for prediction of any one of other previously reconstructed samples. In this way, the encoder and decoder can reduce signaling overhead.
  • the decoder may generate a merge candidate list based on the plurality of candidate blocks.
  • the merge candidate list may include candidates corresponding to samples that are likely to have been predicted based on the motion information set related to the motion information set of the current block, among samples reconstructed before the current block.
  • the encoder and the decoder may construct a merge candidate list of the current block according to a predefined rule. In this case, the merge candidate lists each configured by the encoder and the decoder may be identical to each other.
  • the encoder and the decoder may construct a merge candidate list of the current block based on the position of the current block in the current picture.
  • the position of the specific block indicates the relative position of the top-left sample of the specific block in a picture including the specific block.
  • the merge candidate list may include spatial candidates and temporal candidates.
  • a prediction method for performing motion compensation using a motion information set obtained based on the merge candidate list may be referred to as a merge mode.
• the encoder may signal a flag (merge flag) indicating whether the merge mode is applied to the current block and, when the merge mode is applied to the current block, an index (merge candidate index) indicating the motion information set to use in the merge candidate list.
  • the decoder may determine whether a merge mode is applied to the current block based on the merge flag, and may obtain a motion information set based on a merge candidate index.
  • a motion vector (MV) used for motion compensation is divided into motion vector predictor (MVP) information and motion vector difference (MVD) information.
  • MVP motion vector predictor
  • MVD motion vector difference
• the MVP may be derived based on the MVP information, and the MVD corresponds to the difference between the actual motion vector and the MVP.
  • the MVP may be obtained based on a motion vector predictor candidate list composed of a plurality of motion vector predictor candidates and index information indicating a candidate used in the candidate list.
  • the encoder and the decoder may generate the same motion vector predictor candidate list based on a plurality of motion vectors corresponding to spatially adjacent positions or temporally adjacent positions.
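• Illustratively, reconstructing the motion vector from the candidate list, the signaled candidate index, and the MVD can be sketched as follows (the type and function names are hypothetical, not a standard API):

```c
typedef struct { int x, y; } MotionVector;

/* Sketch: the decoder picks the MVP from the candidate list using the
 * signaled index, then adds the signaled MVD to recover MV = MVP + MVD. */
MotionVector reconstruct_mv(const MotionVector *mvp_cand_list, int mvp_idx,
                            MotionVector mvd)
{
    MotionVector mv;
    mv.x = mvp_cand_list[mvp_idx].x + mvd.x;
    mv.y = mvp_cand_list[mvp_idx].y + mvd.y;
    return mv;
}
```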
• a method of quantizing a transform coefficient value obtained by transforming the residual signal and coding the quantized transform coefficient may be used instead of coding the above-described residual signal as it is.
  • the transform unit may obtain transform coefficient values by transforming the residual signal.
  • the residual signal of a specific block may be distributed over the entire area of the current block. Accordingly, it is possible to improve coding efficiency by concentrating energy in a low-frequency region through frequency-domain transformation of the residual signal.
  • a method in which the residual signal is transformed or inversely transformed will be described in detail.
  • the encoder may obtain transform coefficients by transforming the obtained residual signal.
  • the encoder may obtain at least one residual block including a residual signal for the current block.
  • the residual block may be any one of a current block or blocks divided from the current block.
  • the residual block may be referred to as a residual array or residual matrix including residual samples of the current block.
  • a residual block may indicate a block having the same size as that of a transform unit or a transform block.
  • the encoder may transform the residual block using a transform kernel.
  • the transform kernel used for transforming the residual block may be a transform kernel having separable characteristics of vertical transform and horizontal transform.
  • transformation of the residual block may be performed separately into vertical transformation and horizontal transformation.
  • the encoder may perform vertical transformation by applying a transform kernel in the vertical direction of the residual block.
  • the encoder may perform horizontal transform by applying a transform kernel in the horizontal direction of the residual block.
  • a transform kernel may be used as a term referring to a parameter set used for transforming a residual signal, such as a transform matrix, a transform array, a transform function, and a transform.
  • the transform kernel may be any one of a plurality of available kernels.
  • a transform kernel based on a different transform type may be used for each of the vertical transform and the horizontal transform.
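• The separable vertical-then-horizontal transform described above can be sketched as two matrix products (illustrative 4x4 sketch; the kernel contents, e.g. DCT/DST bases, are supplied by the caller and are not specified here):

```c
#define N 4  /* illustrative 4x4 block */

/* Separable 2D transform: apply the vertical kernel to the columns, then
 * the horizontal kernel to the rows, i.e. C = V * R * H^T, where R is the
 * residual block. */
void separable_transform(const int v_kernel[N][N], const int h_kernel[N][N],
                         const int residual[N][N], int coeff[N][N])
{
    int tmp[N][N];

    /* vertical transform: tmp = V * R */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int s = 0;
            for (int k = 0; k < N; k++)
                s += v_kernel[i][k] * residual[k][j];
            tmp[i][j] = s;
        }

    /* horizontal transform: coeff = tmp * H^T */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int s = 0;
            for (int k = 0; k < N; k++)
                s += tmp[i][k] * h_kernel[j][k];
            coeff[i][j] = s;
        }
}
```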
  • the encoder may quantize the transform block transformed from the residual block by passing it to the quantizer.
  • the transform block may include a plurality of transform coefficients.
  • the transform block may include a plurality of two-dimensionally arranged transform coefficients.
  • the size of the transform block may be the same as any one of the current block or a block divided from the current block, like the residual block.
  • the transform coefficients transmitted to the quantizer may be expressed as quantized values.
  • the encoder may perform an additional transform before the transform coefficients are quantized.
  • the above-described transform method may be referred to as a primary transform, and the additional transform may be referred to as a secondary transform.
• the secondary transform may be applied selectively for each residual block.
• the encoder may improve coding efficiency by performing a secondary transform on a region in which it is difficult to concentrate energy in the low-frequency region with the primary transform alone.
• a secondary transform may be added for a block in which residual values appear largely in a direction other than the horizontal or vertical direction of the residual block.
• the residual values of an intra-predicted block may have a higher probability of changing in a direction other than the horizontal or vertical direction compared to the residual values of an inter-predicted block. Accordingly, the encoder may additionally perform the secondary transform on the residual signal of the intra-predicted block. Also, the encoder may omit the secondary transform for the residual signal of the inter-predicted block.
  • whether to perform secondary transformation may be determined according to the size of the current block or the residual block.
  • transform kernels having different sizes may be used according to the size of the current block or the residual block.
• the 8x8 secondary transform may be applied to a block in which the shorter side of the width and the height is greater than or equal to a first preset length.
• the 4x4 secondary transform may be applied to a block in which the shorter side of the width and the height is greater than or equal to a second preset length and smaller than the first preset length.
  • the first preset length may be a value greater than the second preset length, but the present disclosure is not limited thereto.
• unlike the primary transform, the secondary transform may not be performed separately as a vertical transform and a horizontal transform.
• this secondary transform may be referred to as a low frequency non-separable transform (LFNST).
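• A minimal sketch of the size rule above, assuming the common configuration in which the first and second preset lengths are 8 and 4 (these concrete values are an assumption, not stated in the text):

```c
typedef enum { LFNST_NONE, LFNST_4x4, LFNST_8x8 } LfnstSize;

/* Choose the secondary (LFNST) transform size from the block dimensions:
 * min(w,h) >= first preset length selects 8x8; second preset length <=
 * min(w,h) < first preset length selects 4x4; otherwise none. */
LfnstSize choose_lfnst_size(int width, int height)
{
    const int first_preset_length  = 8;  /* assumed value */
    const int second_preset_length = 4;  /* assumed value */
    int short_side = width < height ? width : height;

    if (short_side >= first_preset_length)  return LFNST_8x8;
    if (short_side >= second_preset_length) return LFNST_4x4;
    return LFNST_NONE;
}
```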
  • transformation of the residual signal of a specific region may be omitted.
  • the syntax element may include transform skip information.
  • the transform skip information may be a transform skip flag.
  • the encoder may directly quantize the residual signal on which the transformation of the corresponding region is not performed. Operations of the encoder described with reference to FIG. 8 may be performed through the transform unit of FIG. 1 .
• the above-described transform-related syntax elements may be information parsed from a video signal bitstream.
  • the decoder may entropy decode the video signal bitstream to obtain transform related syntax elements.
  • the encoder may entropy code transform related syntax elements to generate a video signal bitstream.
  • FIG. 9 is a diagram specifically illustrating a method in which an encoder and a decoder inverse transform transform coefficients to obtain a residual signal.
  • an inverse transform operation is performed through an inverse transform unit of each of the encoder and the decoder.
  • the inverse transform unit may obtain a residual signal by inversely transforming the inverse quantized transform coefficient.
  • the inverse transform unit may detect whether an inverse transform is performed on a corresponding region from a transformation-related syntax element of a specific region. According to an embodiment, when a transform-related syntax element for a specific transform block indicates transform skip, the transform for the corresponding transform block may be omitted.
  • both the first-order inverse transform and the second-order inverse transform described above for the transform block may be omitted.
  • the inverse quantized transform coefficient may be used as a residual signal.
  • the decoder may reconstruct the current block by using the inverse quantized transform coefficient as the residual signal.
  • the above-described inverse first-order transform represents an inverse transform to the first-order transform, and may be referred to as an inverse primary transform.
• the inverse transform of the secondary transform (LFNST) may be referred to as an inverse secondary transform or an inverse LFNST.
  • a first-order (inverse) transform may be referred to as a first (inverse) transform
  • a second-order (inverse) transform may be referred to as a second (inverse) transform.
  • a transform related syntax element for a specific transform block may not indicate transform skip.
• the inverse transform unit may determine whether to perform the inverse secondary transform. For example, when the transform block is a transform block of an intra-predicted block, the inverse secondary transform may be performed on the transform block. Also, a secondary transform kernel used for the transform block may be determined based on the intra prediction mode corresponding to the transform block. As another example, whether to perform the inverse secondary transform may be determined based on the size of the transform block. After the inverse quantization process, the inverse secondary transform may be performed before the inverse primary transform.
• the inverse transform unit may perform the inverse primary transform on the inverse quantized transform coefficients or on the coefficients produced by the inverse secondary transform.
• in the inverse primary transform, as in the primary transform, the vertical inverse transform and the horizontal inverse transform may be performed separately.
  • the inverse transform unit may obtain a residual block by performing vertical inverse transform and horizontal inverse transform on the transform block.
  • the inverse transform unit may inversely transform the transform block based on a transform kernel used for transforming the transform block.
  • the encoder may explicitly or implicitly signal information indicating a transform kernel applied to a current transform block among a plurality of available transform kernels.
  • the decoder may select a transform kernel to be used for inverse transform of the transform block from among a plurality of available transform kernels by using the signaled information indicating the transform kernel.
  • the inverse transform unit may reconstruct the current block using a residual signal obtained through inverse transform on the transform coefficients.
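• The decoder-side flow described above (transform skip, then optional inverse secondary transform, then inverse primary transform) can be sketched as follows; all helper functions here are placeholders, not a defined API:

```c
#include <stdbool.h>

/* Sketch of the inverse transform flow: if transform skip is signaled, the
 * dequantized coefficients are used directly as the residual; otherwise an
 * optional inverse secondary transform (inverse LFNST) runs before the
 * inverse primary transform. */
void inverse_transform_block(int *coeff, int n, bool transform_skip,
                             bool apply_inverse_lfnst,
                             void (*inv_secondary)(int *, int),
                             void (*inv_primary)(int *, int))
{
    if (transform_skip)
        return;                   /* residual = dequantized coefficients */
    if (apply_inverse_lfnst)
        inv_secondary(coeff, n);  /* inverse LFNST, before the primary */
    inv_primary(coeff, n);        /* inverse primary transform */
}
```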
  • the generalization of mobile terminal devices such as smartphones and tablet PCs has drastically changed the consumption pattern of video content. Rather than sitting in front of a TV and watching a program broadcast at a set time, there is a trend that more and more people consume desired video content anytime and anywhere through a video on demand service with a smartphone. Accordingly, there is a need for a video coding method that can smoothly provide video content in various terminals and network environments.
  • the scalable video coding method is a technology to cope with diversified terminal devices and network environments.
• the encoder encodes one image hierarchically so as to support various resolutions, frame rates (frames per second), image qualities, and the like.
  • a bitstream is created by encoding the image in a traditional way.
  • the decoder may extract all or part of a bitstream and decode the bitstream to obtain an image suitable for a corresponding terminal and environment.
• FIG. 10 shows a block diagram of an encoder supporting a scalable video coding scheme.
• FIG. 10 shows an example of generating a bitstream supporting all of ultra high definition (UHD) 8K resolution (7680 x 4320), UHD 4K resolution (3840 x 2160), and full high definition (FHD) resolution (1920 x 1080) images.
  • UHD ultra high definition
  • FHD full high definition
• By preprocessing an 8K-resolution input image, input images of 4K resolution and FHD resolution may be obtained, and the preprocessing process may include down-sampling.
• Input image sequences having different resolutions may be encoded into different layers.
  • the lowest resolution FHD input image may be coded as layer 0 (lowest layer), and the 4K resolution and 8K input images may be coded as layer 1 and layer 2, respectively.
• inter-layer prediction may be performed. Since there is a correlation between a lower layer and a higher layer, when coding the current picture of the higher layer, samples or syntax elements of a picture already coded (reconstructed) in the lower layer can be used. For example, an already reconstructed picture of a lower layer corresponding to the same output order as the current picture may be used as a reference picture of the current picture. That is, it can be used as a reference block for predicting the texture (pixel sample values) of the current block, or can be utilized for coding a motion information set.
• ILP inter-layer prediction
• a picture used as a reference picture by inter-layer prediction may be referred to as an inter-layer reference picture (ILRP) and, after inter-layer processing, may be stored in the decoded picture buffer (DPB) of the current layer.
• Inter-layer processing is a processing procedure for using an already reconstructed picture of a lower layer for coding of a current picture. For example, when spatial scalability is supported, up-sampling may be included. By up-sampling, the resolution of the lower layer may be adjusted to the resolution of the current layer, and a preset interpolation filter may be used. For example, a DCT-based interpolation filter (DCT-IF) may be used.
  • a process of mapping the motion vector of the lower layer to match the resolution of the higher layer may be included.
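• A simple sketch of this motion vector mapping, assuming plain ratio-based scaling (a real codec uses fixed-point arithmetic with defined rounding; the names here are illustrative):

```c
typedef struct { int x, y; } Mv;

/* Map a lower-layer motion vector to the current (higher) layer resolution
 * for inter-layer prediction by scaling with the resolution ratio. */
Mv scale_mv_to_current_layer(Mv ref_mv,
                             int ref_width, int ref_height,
                             int cur_width, int cur_height)
{
    Mv mv;
    mv.x = ref_mv.x * cur_width  / ref_width;
    mv.y = ref_mv.y * cur_height / ref_height;
    return mv;
}
```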
  • the higher layer may encode with reference to the lower layer, and the encoder block of FIG. 10 may include the encoder component described above with reference to FIG. 1 .
• the coded data of the video may be encapsulated in units of network abstraction layer (NAL) units, and the header of the NAL unit may include a layer identifier (nuh_layer_id) to indicate to which layer the data included in the corresponding NAL unit belongs. That is, the NAL units included in each layer can be classified per layer by the layer identifier included in the NAL unit header.
  • a bitstream generated for each layer may be configured as one bitstream by a multiplexer, and the corresponding bitstream may be stored in a storage device or transmitted through a network.
  • An input bitstream as an input to the decoder may be composed of a plurality of layers, and the decoder may obtain an image suitable for the terminal by decoding some or all of the bitstream divided for each layer by a demultiplexer.
  • the input bitstream of the decoder may include a plurality of output layer set (OLS) information and reference information between layers.
  • OLS output layer set
  • the output layer set is a set of layers, and may include an output layer and a reference layer, and the decoder may select an output layer set to be decoded from among a plurality of output layer sets.
  • the decoder may selectively extract and decode layers included in the corresponding output layer set from the input bitstream, and layers not included in the output layer set may be removed from the input bitstream. For example, in order for the terminal decoder to obtain an image of 4K resolution, layer 1 corresponding to an output layer and a bitstream corresponding to layer 0 that is a reference layer of layer 1 may be decoded. Since layer 2 is not included in the output layer set, the bitstream corresponding to layer 2 may not be decoded. As another example, in order for the terminal decoder to obtain an image of 8K resolution, the bitstream corresponding to layer 2 corresponding to the output layer, layer 1 being the reference layer of layer 2, and layer 0 being the reference layer of layer 1 may be decoded. .
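• The extraction just described can be sketched as filtering NAL units by nuh_layer_id against the layers of the selected output layer set (the NalUnit type and function name are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Output-layer-set based sub-bitstream extraction sketch: keep only NAL
 * units whose nuh_layer_id belongs to the selected OLS (the output layer
 * plus its reference layers); all other NAL units are removed. */
typedef struct { int nuh_layer_id; /* payload omitted for brevity */ } NalUnit;

size_t extract_ols(const NalUnit *in, size_t n_in,
                   const int *ols_layer_ids, size_t n_layers,
                   NalUnit *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i++) {
        bool keep = false;
        for (size_t j = 0; j < n_layers; j++)
            if (in[i].nuh_layer_id == ols_layer_ids[j]) { keep = true; break; }
        if (keep)
            out[n_out++] = in[i];
    }
    return n_out;
}
```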
  • the decoder block of FIG. 11 may include the decoder components described above in FIG. 2 , and, like the encoder, inter-layer prediction may be performed to reduce redundancy between layers. That is, a picture of a higher layer may be decoded based on a sample or syntax element of an already decoded picture of a lower layer. For example, an already reconstructed picture of the lower layer corresponding to the same output order as the current picture of the higher layer may be used as a reference picture of the current picture. That is, it indicates that it can be used as a reference block for texture prediction of the current block or can be utilized for motion information set coding.
  • the reconstructed picture of the lower layer used as the inter-layer reference picture of the higher layer may be processed into a form that can be referenced by the higher layer by the inter-layer processing process and then stored in the DPB. For example, when spatial scalability is supported, up-sampling of a picture of a lower layer may be performed, and a process of mapping a motion vector to fit a higher layer may be performed.
• FIG. 12 shows, in units of NAL units, the structure of a bitstream composed of multiple layers.
  • Data including a parameter set and a coded picture may be included in a bitstream in the form of a NAL unit, and the bitstream may consist of one or more layers.
• the NAL unit may be divided into a video coding layer (VCL) NAL unit and a non-VCL NAL unit.
  • the header of the NAL unit may include NAL unit type information (nal_unit_type) to indicate the type of data included in the corresponding NAL unit.
  • the header of the NAL unit may include a layer identifier (nuh_layer_id) to indicate to which layer data included in the corresponding NAL unit belongs, and nuh_layer_id of all NAL units included in a specific layer have the same value.
• the layer may be composed of temporal sub-layers, and the header of the NAL unit may include a temporal identifier (TemporalId) to indicate to which sub-layer the data included in the corresponding NAL unit belongs.
  • TemporalId of all NAL units included in a specific sub-layer of the layer may have the same value.
• FIG. 12 shows an example in which a bitstream is composed of three layers (layer 0, layer 1, layer 2), and each layer is composed of three sub-layers (sub-layer 0, sub-layer 1, sub-layer 2).
  • a rectangle indicated by a solid line indicates a coded picture and a picture unit (PU), and a column of picture units indicated by a solid line indicates an access unit (AU).
  • An access unit is a set of picture units belonging to different layers, and output timing of coded pictures included in each picture unit included in the access unit may be the same.
• a picture unit is a set of NAL units including one coded picture, and a coded picture may be composed of the VCL NAL units that include all coding tree units of the picture.
  • a sequence of access units in FIG. 12 may constitute a coded video sequence (CVS).
  • CVS is a sequence of access units, and may be composed of a coded video sequence start (CVSS) AU and AUs following it.
  • the PU included in the CVSS AU is a coded layer video sequence start (CLVSS) PU
  • the coded picture included in the CLVSS PU is a coded layer video sequence start (CLVSS) picture.
• the CLVSS picture may be an intra random access point (IRAP) picture, in which the type of the coded picture is an I picture for supporting random access, or a gradual decoding refresh (GDR) picture.
  • IRAP intra random access point
  • GDR gradual decoding refresh
  • a coded layer video sequence is a sequence of picture units having the same nuh_layer_id (ie, a sequence of picture units in the same layer), and may be composed of a CLVSS PU and PUs following it.
  • the first access unit of FIG. 12 is a CVSS AU
  • a sequence of AUs composed of 5 AUs may be a CVS.
  • the CVS may refer to a video parameter set (VPS), and parameters specified in the VPS may be referred to in the CVS.
  • the sequence of PUs included in each layer is CLVS, and there may be three CLVSs.
  • the CLVS may refer to a sequence parameter set (SPS), and parameters specified in the SPS may be referred to in the CLVS.
  • SPS sequence parameter set
  • PPS picture parameter set
  • each layer includes four sub-layers.
  • Each rectangle is a picture, and the number in the rectangle indicates the coding order (encoding and decoding order) of the picture.
  • the pictures of FIG. 13 are listed according to the output order, and the picture on the left indicates that the picture is output before the picture on the right.
  • a previously coded picture in the same layer may be used as a reference picture.
  • the 1st coded picture of layer 1 may use the 0th coded picture as a reference picture
  • the 3rd coded picture uses the 0th coded picture and the 2nd coded picture as a reference picture.
• a picture with a smaller TemporalId cannot refer to a picture with a larger TemporalId
  • a picture included in a sublayer with a TemporalId less than or equal to a specific value can be decoded.
  • the decoder can adjust the frame rate.
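• A sketch of this frame-rate adaptation by temporal sub-layer extraction (types and names are illustrative); dropping is safe because a picture never references one with a larger TemporalId:

```c
#include <stddef.h>

/* Keep only NAL units whose TemporalId is less than or equal to the target
 * highest TemporalId; higher sub-layers are dropped to lower the frame rate. */
typedef struct { int temporal_id; /* payload omitted */ } NalUnit;

size_t drop_high_temporal_layers(const NalUnit *in, size_t n_in,
                                 int target_highest_tid, NalUnit *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i++)
        if (in[i].temporal_id <= target_highest_tid)
            out[n_out++] = in[i];
    return n_out;
}
```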
  • inter-layer prediction may be used to reduce redundancy existing between layers.
  • the reconstructed picture of the lower layer may be used as a reference picture, and the reconstructed picture of the lower layer included in the same access unit as the current picture may be used as the inter-layer reference picture.
  • the inter-layer reference picture may be stored in the DPB, and may be utilized for texture prediction of a block in the current picture or coding of a motion information set.
  • pictures in which the TemporalId of the reference layer is greater than a specific value may not be used as inter-layer reference pictures.
  • a picture having a TemporalId (Tid) of 0 of layer 0 may be used as an inter-layer reference picture of a picture included in the same AU of layer 1.
  • a picture having a TemporalId (Tid) of 1 of layer 0 may be used as an inter-layer reference picture of a picture included in the same AU of layer 1.
  • a picture having a TemporalId (Tid) of 2 or 3 of layer 0 is not used as an inter-layer reference picture of layer 1.
• when the output layer set consists of layers 0 and 1 and only layer 1 is used as an output, the sub-layers with TemporalId 2 and 3 of layer 0 are not used for layer 1, which is the output layer, so the corresponding NAL units may not be decoded. In this way, an image suitable for a terminal may be efficiently restored by extracting from the bitstream and decoding only the layers used as output, or only part of the sub-layers of the layers used as output or reference, according to the output layer set configuration.
  • HLS high level syntax
  • the decoder may decode the corresponding bitstream according to a set of predetermined rules and obtain a reconstructed image.
• Each piece of information constituting a bitstream generated by a set of predetermined rules is called a syntax element, and the configuration of these syntax elements may be referred to as a syntax structure.
• syntax elements are aligned in units of bytes and encapsulated in units of NAL units; this byte-aligned payload may be referred to as a raw byte sequence payload (RBSP).
• the NAL units corresponding to the non-VCL may include DCI (decoding capability information) RBSP syntax, VPS (video parameter set) RBSP syntax, SPS (sequence parameter set) RBSP syntax, PPS (picture parameter set) RBSP syntax, PH (picture header) RBSP syntax, and the like.
  • the NAL unit corresponding to the VCL may include a slice layer RBSP syntax including slice data.
  • the encoded bitstream may be configured in the order of a VPS NAL unit, an SPS NAL unit, a PPS NAL unit, a PH NAL unit, a slice layer NAL unit, and the like, and a specific NAL unit may not exist.
  • the order and number may vary depending on the environment of image compression.
• An access unit delimiter (AUD) NAL unit and a DCI NAL unit may precede the VPS RBSP syntax.
  • AUD access unit delimiter
• FIG. 14 schematically shows the reference relationship between upper and lower syntax structures. Syntax elements to be commonly used in a video sequence, a picture (sub-picture) unit, or a slice unit may have a relationship as shown in FIG. 14.
  • VPS RBSP syntax referenced based on sps_vps_id defined in SPS RBSP syntax may be commonly applied to a corresponding video sequence. Some parameters may be redefined/updated in lower syntax.
• when a term for a syntax structure is used, even if only a part of the full name is used, it may refer to the syntax structure whose name includes that partial name.
• signaling/parsing of the syntax structures and syntax elements of the present disclosure can be applied equally at encoding and at decoding; at encoding it is closer to signaling, while at decoding, since the bitstream is read and interpreted, it is closer to parsing, and the two can be regarded as equivalent.
• the video parameter set may include layer information, output layer set information, profile/tier/level information, decoded picture buffer (DPB) information, and hypothetical reference decoder (HRD) information, and information specified in the VPS may be applied to a CVS referring to the VPS.
• DPB decoded picture buffer
  • HRD hypothetical reference decoder
  • the second column of FIG. 15 indicates a descriptor for a syntax element and may indicate how the corresponding syntax element is parsed.
• when the descriptor is u(n), it may indicate that the syntax element is an unsigned integer expressed with n bits of fixed length. In this case, n bits are read from the bitstream, and a pointer indicating a specific bit position in the bitstream may be moved by n bit positions.
  • a binary bitstream of length n obtained from the bitstream can be interpreted as a binary representation of an unsigned integer in which the most significant bit (MSB) is written first (or leftmost). For example, when the descriptor for the syntax element is u(2) and the bit string read from the bitstream is '10', it may indicate that the value of the corresponding syntax element is 2.
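• A minimal C sketch of parsing a u(n) descriptor as described above (MSB-first, advancing the bit position); the BitReader type is illustrative. For the u(2) example, reading the bits '10' yields the value 2:

```c
#include <stdint.h>

/* Minimal bit reader: read n bits from the bitstream, most significant
 * bit first, and advance the bit position by n. */
typedef struct {
    const uint8_t *data;
    uint32_t bit_pos;   /* position of the next unread bit */
} BitReader;

uint32_t read_u(BitReader *br, int n)
{
    uint32_t value = 0;
    for (int i = 0; i < n; i++) {
        uint32_t byte = br->data[br->bit_pos >> 3];
        uint32_t bit  = (byte >> (7 - (br->bit_pos & 7))) & 1;
        value = (value << 1) | bit;  /* MSB is accumulated first */
        br->bit_pos++;
    }
    return value;
}
```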
  • the syntax element vps_video_parameter_set_id may be included in the video parameter set RBSP.
  • vps_video_parameter_set_id may indicate an identifier for reference by another syntax element or another syntax structure.
  • a sequence parameter set (SPS) RBSP may include an identifier of a referenced VPS, and parameters of the VPS indicated by the identifier may be referenced.
  • the value of vps_video_parameter_set_id may be greater than a preset value.
  • the preset value may be 0. In the case of single-layer video coding, information such as a multi-layer and output layer set configuration obtained from a VPS may be unnecessary.
• when the identifier of the VPS referenced by the SPS RBSP is 0, the VPS may not be referenced. Therefore, a NAL unit including VPS data may not exist in a bitstream coded as a single layer. On the other hand, when the identifier of the referenced VPS is greater than 0, the parameters of the VPS indicated by the identifier may be referenced.
  • vps_max_layers_minus1 may be included in the video parameter set RBSP.
  • vps_max_layers_minus1 + 1 specifies the maximum number of layers allowed in the CVS referencing the corresponding VPS.
  • vps_max_sublayers_minus1 may be included in the video parameter set RBSP.
  • vps_max_sublayers_minus1 + 1 specifies the maximum number of temporal sublayers that may exist in individual layers in the CVS referring to the corresponding VPS.
  • the value of vps_max_sublayers_minus1 may be a value from 0 to 6. This may indicate that the maximum number of temporal sub-layers that may exist in an individual layer in the CVS referring to the VPS is seven.
  • the CVS referencing the VPS may consist of a plurality of layers, and each individual layer may include a plurality of temporal sublayers (when vps_max_layers_minus1>0 and vps_max_sublayers_minus1>0), the syntax element vps_all_layers_same_num_sublayers_flag may be included in the video parameter set RBSP. vps_all_layers_same_num_sublayers_flag may be information indicating whether all layers constituting the CVS include the same number of temporal sublayers.
  • vps_all_layers_same_num_sublayers_flag 1
  • vps_all_layers_same_num_sublayers_flag 0
  • the number of temporal sublayers of layers in the CVS referring to the VPS may be the same or different.
• vps_all_layers_same_num_sublayers_flag may not be included in the bitstream, and the syntax element may be inferred to be 1.
  • a syntax element vps_all_independent_layers_flag indicating dependency between layers may be included in the video parameter set RBSP.
• a picture in a specific layer may, by inter-layer prediction (ILP), predict texture (pixel values) from a coded picture in another, lower layer, or may code motion information using a motion information set corresponding to a coded picture in another layer. This may mean that the coded picture of the lower layer can be used as a reference picture of the current picture.
  • inter-layer prediction is mainly described using a coded picture in another lower layer as a reference picture, but the present disclosure is not limited thereto.
  • Inter-layer prediction may indicate that a coded picture or syntax elements of a lower layer are utilized for picture coding in the current layer.
• when the current layer uses inter-layer prediction, it has a dependency on one or more layers different from the current layer, which can mean that, in order for a picture of the current layer to be coded, the pictures of the layer or layers on which it depends must first be coded.
• when the current layer does not use inter-layer prediction, the layer may be coded independently, regardless of other layers.
• when vps_all_independent_layers_flag is 1, it specifies that all layers in the CVS referring to the VPS are independently coded without using inter-layer prediction. On the other hand, when vps_all_independent_layers_flag is 0, it specifies that one or more layers in the CVS referring to the VPS can use inter-layer prediction, which indicates that there may be dependencies between certain layers. When the CVS referring to the VPS consists of a single layer (vps_max_layers_minus1 is 0), vps_all_independent_layers_flag may not be included in the bitstream, and the syntax element may be inferred to be 1.
  • the NAL unit header RBSP may include a nuh_layer_id syntax element, which may indicate an identifier for specifying to which layer the VCL or non-VCL data included in the corresponding NAL unit belongs.
  • vps_layer_id[i] indicates nuh_layer_id included in the NAL unit header of the NAL unit corresponding to the i-th layer. For two non-negative integers m and n, when m is less than n, the value of vps_layer_id[m] may be smaller than the value of vps_layer_id[n].
• when the layer is not the 0th (lowest) layer (layer index i is greater than 0) and one or more layers can use inter-layer prediction (vps_all_independent_layers_flag is 0), information indicating the dependency and reference relationship between layers may be included in the video parameter set RBSP.
  • vps_independent_layer_flag[i] may indicate the dependency of a layer having a layer index of i on another layer.
  • vps_independent_layer_flag[i] 1
  • the layer having the index i specifies that inter-layer prediction is not used, which may mean that the corresponding layer can be coded independently of other layers.
• when vps_independent_layer_flag[i] is 0, the layer having the index i can use inter-layer prediction, which may mean that the corresponding layer may have a dependency on other layers.
• when vps_independent_layer_flag[i] is 0, vps_direct_ref_layer_flag[i][j], a syntax element for indicating dependency on another layer, may exist in the VPS RBSP, and the index j may range from 0 to i-1. If it is the 0th layer (index i is 0), or if no layer uses inter-layer prediction (vps_all_independent_layers_flag is 1), vps_independent_layer_flag[i] may not exist in the VPS and may be inferred to be 1. Accordingly, the picture of the lowest layer, having index i of 0, may be coded independently of pictures of other layers.
  • information on the reference layer may be included in the video parameter set RBSP.
  • the higher layer may perform inter-layer prediction with reference to the lower layer.
  • a layer having a layer index of 0 to i-1 may be used as a reference layer.
  • pictures included in the same AU as the current picture among pictures of layers having a layer index of 0 to i-1 may be used as reference pictures.
• the syntax element vps_direct_ref_layer_flag[i][j] can be parsed from the bitstream while increasing the index j by 1 from 0 to i-1.
• when vps_direct_ref_layer_flag[i][j] is 0, it may indicate that the layer having the layer index j is not a direct reference layer of the layer having the layer index i. On the other hand, when vps_direct_ref_layer_flag[i][j] is 1, it may indicate that the layer having the layer index j is a direct reference layer of the layer having the layer index i. This indicates that inter-layer prediction can be performed using the j-th layer as a reference layer for the i-th layer.
  • vps_direct_ref_layer_flag[i][j] may be inferred to be 0 and the range of index j may range from 0 to i-1.
• when vps_independent_layer_flag[i] is 0, it indicates that the corresponding layer can use another layer as a reference layer; therefore, at least one vps_direct_ref_layer_flag[i][j] may be 1.
  • the index j may range from 0 to i-1.
  • the j-th layer may be required for encoding and decoding of the i-th layer.
• when the layer having layer index j is a direct reference layer of the layer having layer index k (vps_direct_ref_layer_flag[k][j] is 1), and the layer having layer index k is a direct reference layer of the layer having layer index i (vps_direct_ref_layer_flag[i][k] is 1), the j-th layer is not used as a direct reference layer of the i-th layer but may still be required for decoding of the i-th layer. This will be described later with reference to the related drawings.
• when the i-th layer can use inter-layer prediction (vps_independent_layer_flag[i] is 0), information on the temporal identifier of pictures that can be used as inter-layer reference pictures (ILRP) may be included in the video parameter set RBSP.
  • ILRP inter-layer reference picture
  • the inter-layer reference picture is a picture having a nuh_layer_id smaller than the nuh_layer_id of the current picture among pictures included in the same AU as the current picture, and may be used as a reference picture for the current picture by inter-layer prediction. That is, this indicates that, among pictures included in the same AU as the current picture, pictures corresponding to a lower layer than the current layer may be used as reference pictures.
  • the NAL unit header RBSP may include a syntax element nuh_temporal_id_plus1 that is a temporal identifier for the NAL unit.
  • a variable TemporalId (or Tid) indicating a temporal identifier based on the syntax element nuh_temporal_id_plus1 may be derived as nuh_temporal_id_plus1 - 1.
  • a value of nuh_temporal_id_plus1 may be greater than 0, and TemporalId of all VCL NAL units included in an AU may have the same value.
  • a picture of a layer having a layer index of j may be used as an inter-layer reference picture for a picture of a layer having a layer index of i in the same AU.
  • pictures in which the TemporalId of the reference layer is greater than a specific value may not be used as inter-layer reference pictures.
  • a picture having a TemporalId (Tid) of 0 of Layer 0 may be used as an inter-layer reference picture of a picture included in the same AU of Layer 1.
  • a picture having a TemporalId (Tid) of 1 of Layer 0 may be used as an inter-layer reference picture of a picture included in the same AU of Layer 1.
  • a picture having a TemporalId (Tid) of 2 or 3 of Layer 0 is not used as an inter-layer reference picture of Layer 1.
• when the output layer set (OLS) consists of Layer 0 and Layer 1 and only Layer 1 is output, the sub-layers with TemporalId of 2 and 3 of Layer 0 are not used for Layer 1, so the corresponding NAL units may not be decoded.
  • an image suitable for a terminal may be efficiently reconstructed by extracting and decoding only a layer used as an output or a partial sublayer of a layer from a bitstream according to the configuration of the output layer set.
  • the syntax element max_tid_ref_present_flag[i] may be included in the video parameter set RBSP.
  • max_tid_ref_present_flag[i] 1
• max_tid_ref_present_flag[i] 0
  • it may be specified that the syntax element max_tid_il_ref_pics_plus1[i] does not exist.
  • max_tid_ref_present_flag[i] 1
  • information limiting the TemporalId of a picture that can be used as an inter-layer reference picture may be included in the video parameter set RBSP.
• when max_tid_il_ref_pics_plus1[i] is 0, it can be specified that pictures other than IRAP pictures (non-IRAP pictures) are not used as inter-layer reference pictures for coding of pictures of the i-th layer. That is, only an IRAP picture of the reference layer can be used as an inter-layer reference picture for a picture of the i-th layer.
• when max_tid_il_ref_pics_plus1[i] is greater than 0, it may be specified that a picture having a TemporalId greater than max_tid_il_ref_pics_plus1[i] - 1 is not used as an inter-layer reference picture in the coding of a picture of the i-th layer.
• when max_tid_il_ref_pics_plus1[i] does not exist, its value may be inferred to be 7.
• the max_tid_il_ref_pics_plus1[i] syntax element limits the TemporalId of inter-layer reference pictures of the reference layer that can be used for coding pictures of the i-th layer, so that sub-layers of the reference layer whose TemporalId is greater than or equal to max_tid_il_ref_pics_plus1[i] may not be decoded.
• the embodiment described above with reference to FIG. 13 is a case where the value of max_tid_il_ref_pics_plus1[1] for Layer 1 is 2, and it indicates that pictures of Layer 0 having TemporalId of 2 or 3 are not used as inter-layer reference pictures for pictures of Layer 1.
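• The restriction can be sketched as a predicate (illustrative only; the IRAP-only behavior for the value 0 follows the semantics stated above):

```c
#include <stdbool.h>

/* May a candidate reference-layer picture serve as an inter-layer reference
 * picture? When max_tid_il_ref_pics_plus1 is 0, only IRAP pictures qualify;
 * when greater than 0, only pictures with
 * TemporalId <= max_tid_il_ref_pics_plus1 - 1 qualify. */
bool may_use_as_ilrp(int max_tid_il_ref_pics_plus1,
                     int ref_pic_temporal_id, bool ref_pic_is_irap)
{
    if (max_tid_il_ref_pics_plus1 == 0)
        return ref_pic_is_irap;
    return ref_pic_temporal_id <= max_tid_il_ref_pics_plus1 - 1;
}
```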
  • An output layer set may indicate a layer set including one or more layers used as outputs with respect to a single or a plurality of layers.
  • one or more output layer sets may be defined according to an output layer set configuration method, and output layer information for each output layer set may be included in the video parameter set RBSP.
• each_layer_is_an_ols_flag may be included in the video parameter set RBSP.
  • each_layer_is_an_ols_flag is 1
• each of the output layer sets includes only one layer, and it may be specified that each layer in the CVS referring to the VPS is itself an output layer set whose only output layer is that layer. In this case, the total number of output layer sets specified in the VPS may be equal to vps_max_layers_minus1+1.
• when each_layer_is_an_ols_flag is 1 and a total of three layers, layer 0, layer 1, and layer 2, exist in a CVS referring to the VPS, three output layer sets may be specified by the VPS.
  • the 0th output layer set may include layer 0, the first output layer set may include layer 1, and the second output layer set may include layer 2.
• when each_layer_is_an_ols_flag is 0, it may be specified that a plurality of layers may be included in an output layer set.
  • the CVS consists of a single layer (vps_max_layers_minus1 is 0)
  • the value of each_layer_is_an_ols_flag may be inferred to be 1.
• when the CVS referring to the VPS consists of a plurality of layers (vps_max_layers_minus1 is greater than 0) and one or more layers in the corresponding CVS can use inter-layer prediction (vps_all_independent_layers_flag is 0), the value of each_layer_is_an_ols_flag can be inferred to be 0.
  • each_layer_is_an_ols_flag is 0
  • information on a method of configuring the output layer set may be additionally included in the video parameter set RBSP.
  • a syntax element ols_mode_idc indicating the output layer set mode may be included in the video parameter set RBSP.
• when ols_mode_idc is 0, the i-th output layer set includes the layers having a layer index value from 0 to i (ie, i+1 layers), and it may be specified that the highest layer among them is the output layer. For example, if the CVS referring to the VPS consists of a total of three layers (layer 0, layer 1, and layer 2), the 0th output layer set includes layer 0, and layer 0 may be used as an output. The first output layer set includes layer 0 and layer 1, and layer 1 may be used as an output. The second output layer set includes layer 0, layer 1, and layer 2, and layer 2 may be used as an output.
• when ols_mode_idc is 1, the i-th output layer set includes the layers having a layer index value from 0 to i (ie, i+1 layers), and it may be specified that all layers included in the output layer set are output layers.
  • the CVS referring to the VPS consists of a total of three layers (layer 0, layer 1, and layer 2)
  • the 0th output layer set includes layer 0, and layer 0 may be used as an output.
  • the first output layer set includes layers 0 and 1, and both layers 0 and 1 may be used as outputs.
  • the second output layer set includes layer 0, layer 1, and layer 2, and all of layer 0, layer 1, and layer 2 may be used as outputs.
• when ols_mode_idc is 2, the number of output layer sets specified by the VPS and the output layer(s) for each output layer set are explicitly signaled, and in each output layer set, the remaining layers other than the output layers are direct or indirect reference layers of an output layer included in the corresponding output layer set.
  • ols_mode_idc may not exist, and the value of ols_mode_idc may be inferred to be 2.
• when ols_mode_idc is 2, information indicating the number of output layer sets and the output layers used in each output layer set may be included in the video parameter set RBSP.
  • the syntax element num_output_layer_sets_minus1 is information indicating the number of output layer sets, and num_output_layer_sets_minus1 + 1 may specify the number of output layer sets specified by the VPS when ols_mode_idc is 2 .
  • TotalNumOlss is the number of output layer sets specified by the VPS
  • TotalNumOlss can be derived according to the following pseudo code.
  • the number of output layer sets may be 1.
  • each_layer_is_an_ols_flag is 1 or ols_mode_idc is 0 or 1
• the number of output layer sets may be equal to the maximum number of layers allowed in the corresponding CVS (vps_max_layers_minus1+1). Otherwise, when ols_mode_idc is 2, the number of output layer sets may be equal to num_output_layer_sets_minus1+1.
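• The pseudo code referenced above is not reproduced in this text; the following C sketch reconstructs it from the prose (variable names follow the syntax elements; this is a reconstruction, not the normative text):

```c
/* Derive TotalNumOlss, the number of output layer sets specified by the VPS:
 * - a single-layer CVS has exactly one OLS;
 * - each_layer_is_an_ols_flag == 1 or ols_mode_idc in {0, 1} gives one OLS
 *   per layer;
 * - ols_mode_idc == 2 signals the number explicitly. */
int derive_total_num_olss(int vps_max_layers_minus1,
                          int each_layer_is_an_ols_flag,
                          int ols_mode_idc,
                          int num_output_layer_sets_minus1)
{
    if (vps_max_layers_minus1 == 0)
        return 1;
    if (each_layer_is_an_ols_flag || ols_mode_idc == 0 || ols_mode_idc == 1)
        return vps_max_layers_minus1 + 1;
    return num_output_layer_sets_minus1 + 1;  /* ols_mode_idc == 2 */
}
```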
  • information indicating an output layer used for each output layer set may be additionally included in the video parameter set RBSP.
• the loop on index j may be performed while increasing the index i by 1 from 1 to num_output_layer_sets_minus1.
  • Index j may be increased by 1 from 0 to vps_max_layers_minus1, and an ols_output_layer_flag[i][j] syntax element for indices i and j may be included in the video parameter set RBSP.
• when ols_output_layer_flag[i][j] is 1 and ols_mode_idc is 2, it may be specified that the layer whose NAL unit header layer identifier (nuh_layer_id) is equal to vps_layer_id[j] is an output layer of the i-th output layer set.
• when ols_output_layer_flag[i][j] is 0, it may be specified that the layer whose nuh_layer_id is equal to vps_layer_id[j] is not an output layer of the i-th output layer set.
• the 0th output layer set (the output layer set with index i equal to 0) contains only the lowest layer (i.e., the layer with nuh_layer_id equal to vps_layer_id[0]), and since the lowest layer is the output layer, ols_output_layer_flag[i][j] may not exist when i is 0.
  • the variable NumDirectRefLayers[i] represents the number of direct reference layers of the i-th layer
  • the variable DirectRefLayerIdx[i][d] represents the layer index of the d-th direct reference layer with respect to the i-th layer.
  • the variable NumRefLayers[i] indicates the number of reference layers (direct reference layer and indirect reference layer) of the i-th layer
  • RefLayerIdx[i][r] indicates the layer index of the r-th reference layer of the i-th layer.
  • the variable LayerUsedAsRefLayerFlag[j] indicates whether the j-th layer is used as a direct reference layer.
  • the higher layer may perform inter-layer prediction with reference to the lower layer.
  • the current layer is the i-th layer
  • the 0-th to (i-1)-th layers may be used as reference layers. This indicates that when coding a current picture of a layer having a layer index of i, a picture included in the same AU as the current picture among pictures of a layer having a layer index of 0 to i-1 may be used as an inter-layer reference picture.
• when vps_direct_ref_layer_flag[i][j] is 1, the layer having the layer index j may be used as a direct reference layer of the layer having the layer index i, and, by inter-layer prediction, a picture of the layer having layer index j may be used as an inter-layer reference picture for a picture of the layer having layer index i.
• when vps_direct_ref_layer_flag[i][j] is 0, it may indicate that the layer having the layer index j is not a direct reference layer of the layer having the layer index i. In this case, the j-th layer is not directly used for coding the i-th layer, but may still be required for coding the i-th layer.
  • the j-th layer is not a direct reference layer of the i-th layer, but may be required for coding of the i-th layer. This layer may be referred to as an indirect reference layer.
• dependencyFlag[i][j] can be set while increasing the index i by 1 from 0 to vps_max_layers_minus1 and increasing the index j by 1 from 0 to vps_max_layers_minus1.
• when dependencyFlag[i][j] is 1, it indicates that the i-th layer is dependent on the j-th layer, meaning that the j-th layer can be used as a reference layer of the i-th layer.
• when dependencyFlag[i][j] is 0, it indicates that the i-th layer is not dependent on the j-th layer, meaning that the j-th layer is not used as a reference layer of the i-th layer.
• when the j-th layer is a direct reference layer of the i-th layer (vps_direct_ref_layer_flag[i][j] is 1), dependencyFlag[i][j] may be set to 1, and when the j-th layer is not a direct reference layer of the i-th layer, dependencyFlag[i][j] may be set to 0.
• when the k-th layer is a direct reference layer of the i-th layer (vps_direct_ref_layer_flag[i][k] is 1) and the j-th layer is a reference layer of the k-th layer (dependencyFlag[k][j] is 1), dependencyFlag[i][j] may be set to 1.
• at the end of the for statement over the index i, the variable LayerUsedAsRefLayerFlag[i], indicating whether the i-th layer is used as a direct reference layer, may be initialized to 0.
• when the j-th layer is a reference layer of the i-th layer, dependencyFlag[i][j] may be equal to 1, and when the j-th layer is not a reference layer of the i-th layer, dependencyFlag[i][j] may be equal to 0.
  • reference layer index related information can be set while increasing index i by 1 from 0 to vps_max_layers_minus1 and increasing index j by 1 from 0 to vps_max_layers_minus1.
• the index d, which counts direct reference layers, and the index r, which counts reference layers, may be initialized to 0.
• when vps_direct_ref_layer_flag[i][j] is 1, it indicates that the j-th layer is a direct reference layer of the i-th layer, so the variable DirectRefLayerIdx[i][d] is set to the index j, and the index d may be increased by 1.
• a variable LayerUsedAsRefLayerFlag[j] indicating whether the j-th layer is used as a direct reference layer may be set to 1. Additionally, when the j-th layer is a direct or indirect reference layer of the i-th layer (when dependencyFlag[i][j] is 1), RefLayerIdx[i][r] may be set to the index j and the index r may be increased by 1. After the for statement for the index j ends, the values of the indexes d and r may indicate the number of direct reference layers and the number of reference layers of the i-th layer, respectively. Accordingly, NumDirectRefLayers[i] and NumRefLayers[i] may be set to the values of d and r, respectively.
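• The derivation described in the preceding bullets can be sketched in C as follows (the array bound and function name are illustrative; dependencyFlag is the transitive closure of vps_direct_ref_layer_flag):

```c
#define MAX_LAYERS 8  /* illustrative bound */

/* Derive, per layer: the direct reference layers, all (direct and indirect)
 * reference layers, and whether each layer is used as a direct reference
 * layer. Variable names follow the prose above. */
void derive_reference_layers(
    int num_layers,
    const int vps_direct_ref_layer_flag[MAX_LAYERS][MAX_LAYERS],
    int NumDirectRefLayers[MAX_LAYERS],
    int DirectRefLayerIdx[MAX_LAYERS][MAX_LAYERS],
    int NumRefLayers[MAX_LAYERS],
    int RefLayerIdx[MAX_LAYERS][MAX_LAYERS],
    int LayerUsedAsRefLayerFlag[MAX_LAYERS])
{
    int dependencyFlag[MAX_LAYERS][MAX_LAYERS];

    for (int i = 0; i < num_layers; i++) {
        for (int j = 0; j < num_layers; j++) {
            dependencyFlag[i][j] = vps_direct_ref_layer_flag[i][j];
            for (int k = 0; k < i; k++)  /* transitive: via layer k */
                if (vps_direct_ref_layer_flag[i][k] && dependencyFlag[k][j])
                    dependencyFlag[i][j] = 1;
        }
        LayerUsedAsRefLayerFlag[i] = 0;
    }
    for (int i = 0; i < num_layers; i++) {
        int d = 0, r = 0;
        for (int j = 0; j < num_layers; j++) {
            if (vps_direct_ref_layer_flag[i][j]) {
                DirectRefLayerIdx[i][d++] = j;
                LayerUsedAsRefLayerFlag[j] = 1;
            }
            if (dependencyFlag[i][j])
                RefLayerIdx[i][r++] = j;
        }
        NumDirectRefLayers[i] = d;  /* direct reference layers of layer i */
        NumRefLayers[i] = r;        /* direct + indirect reference layers */
    }
}
```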
  • the variable NumOutputLayersInOls[i] represents the number of output layers of the i-th output layer set
  • the variable NumSubLayersInLayerInOLS[i][j] represents the number of (temporal) sub-layers for the j-th layer of the i-th output layer set.
  • a NAL unit corresponding to a specific sub-layer may be removed from the bitstream based on the variable NumSubLayersInLayerInOLS[i][j].
  • variable OutputLayerIdInOls[i][j] indicates the layer identifier (nuh_layer_id) of the NAL unit header for the j-th layer of the i-th output layer set, and the variable LayerUsedAsOutputLayerFlag[k] is that the k-th layer is output from at least one output layer set. Indicates whether a layer is used.
• the 0th output layer set (the output layer set with index i equal to 0) contains only the lowest layer (i.e., the layer with nuh_layer_id equal to vps_layer_id[0]), and since the lowest layer is the output layer, the variables NumOutputLayersInOls[0], OutputLayerIdInOls[0][0], NumSubLayersInLayerInOLS[0][0], and LayerUsedAsOutputLayerFlag[0] may be set to 1, vps_layer_id[0], vps_max_sublayers_minus1+1, and 1, respectively.
  • the variable LayerUsedAsOutputLayerFlag[i] indicating whether the i-th layer is used as an output layer can be derived while i is increased by 1 from 1 to vps_max_layers_minus1.
  • each_layer_is_an_ols_flag is 1 or ols_mode_idc is 0 or 1 (when ols_mode_idc is less than 2)
  • all layers of CVS may be used as output layers, so LayerUsedAsOutputLayerFlag[i] may be set to 1.
• when each_layer_is_an_ols_flag is 1 or ols_mode_idc is 0, the number of output layers of the i-th output layer set is one, and the output layer may be the i-th layer. Accordingly, NumOutputLayersInOls[i] may be set to 1, and OutputLayerIdInOls[i][0] may be set to vps_layer_id[i], which is the layer identifier of the i-th layer (the output layer).
  • the i-th output layer set may consist of i+1 layers, and the i-th layer (a layer in which nuh_layer_id is vps_layer_id[i]) may be used as an output layer.
  • the i-th output layer set includes only the i-th layer, and the i-th layer may be used as an output layer.
  • the number of temporal sub-layers for the reference layer excluding the output layer can be set while increasing j by 1 from 0 to i-1.
  • a temporal identifier (TemporalId) of an inter-layer reference picture that can be used for a picture of a specific layer may be limited according to max_tid_il_ref_pics_plus1[i].
  • When the value of max_tid_il_ref_pics_plus1[i] is greater than 0, a picture having a TemporalId greater than max_tid_il_ref_pics_plus1[i] - 1 may not be used as an inter-layer reference picture in coding a picture in the i-th layer.
  • the i-th layer of the i-th output layer set is an output layer, so NumSubLayersInLayerInOLS[i][i] may be set to vps_max_sub_layers_minus1 + 1.
  • each_layer_is_an_ols_flag is 0 and ols_mode_idc is 1
  • the number of output layers of the i-th output layer set may be equal to i+1.
  • For example, when the CVS referring to the VPS consists of layer 0, layer 1, and layer 2, the 0th output layer set may consist of layer 0, and layer 0 may be used as an output layer.
  • the first output layer set is composed of layer 0 and layer 1, and both layer 0 and layer 1 may be used as output layers.
  • the second output layer set includes layer 0, layer 1, and layer 2, and all of layer 0, layer 1, and layer 2 may be used as output layers.
  • variable NumOutputLayersInOls[i] indicating the number of output layers of the i-th output layer set may be set to i+1.
  • the layer identifier of the output layer constituting the output layer set and the number of sub-layers included in the layer can be derived by increasing j by 1 from 0 to NumOutputLayersInOls[i]-1.
  • OutputLayerIdInOls[i][j], which is the j-th output layer identifier of the i-th output layer set, can be set to vps_layer_id[j], which is the layer identifier of the j-th layer, and since all layers constituting the output layer set are output layers, NumSubLayersInLayerInOLS[i][j] may be set to vps_max_sub_layers_minus1+1.
  • each_layer_is_an_ols_flag is 0 and ols_mode_idc is 2
  • While increasing index j by 1 from 0 to vps_max_layers_minus1, layerIncludedInOlsFlag[i][j] and NumSubLayersInLayerInOLS[i][j] may be initialized to 0.
  • index k can be increased from 0 to vps_max_layers_minus1 by 1
  • index j can be initialized to 0.
  • When the k-th layer is an output layer of the i-th output layer set, layerIncludedInOlsFlag[i][k] and LayerUsedAsOutputLayerFlag[k] may be set to 1. This may indicate that the k-th layer is included in the i-th output layer set and that the k-th layer is used as an output layer.
  • OutputLayerIdx[i][j], which is a variable indicating the layer index of the j-th output layer of the i-th output layer set, may be set to the layer index k, and OutputLayerIdInOls[i][j], which is the layer identifier of the j-th output layer of the i-th output layer set, may be set to vps_layer_id[k]. Thereafter, the index j may be increased by 1. Since the k-th layer of the i-th output layer set is an output layer, NumSubLayersInLayerInOLS[i][k] may be set to vps_max_sub_layers_minus1+1.
  • NumOutputLayersInOls[i] may be set to j.
  • NumSubLayersInLayerInOLS[i][j] is equal to vps_max_sub_layers_minus1+1 when the j-th layer is an output layer, and equal to 0 when not an output layer.
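To make the ols_mode_idc equal to 2 case above concrete, the following is a minimal C sketch of the output-layer marking just described. It is an illustration only: the array bounds, the ols_output_layer_flag[][] input (the per-OLS output layer flags signaled in the VPS), and the function name are assumptions, not the patent's implementation.

    #define MAX_LAYERS 64

    /* Mark the output layers of the i-th output layer set (ols_mode_idc == 2). */
    void derive_output_layers_mode2(int i, int vps_max_layers_minus1,
            int vps_max_sub_layers_minus1,
            const int vps_layer_id[MAX_LAYERS],
            const int ols_output_layer_flag[MAX_LAYERS][MAX_LAYERS],
            int NumOutputLayersInOls[MAX_LAYERS],
            int OutputLayerIdInOls[MAX_LAYERS][MAX_LAYERS],
            int OutputLayerIdx[MAX_LAYERS][MAX_LAYERS],
            int NumSubLayersInLayerInOLS[MAX_LAYERS][MAX_LAYERS],
            int layerIncludedInOlsFlag[MAX_LAYERS][MAX_LAYERS],
            int LayerUsedAsOutputLayerFlag[MAX_LAYERS])
    {
        int j = 0, k;
        for (k = 0; k <= vps_max_layers_minus1; k++) {
            /* initialization described above */
            layerIncludedInOlsFlag[i][k] = 0;
            NumSubLayersInLayerInOLS[i][k] = 0;
        }
        for (k = 0; k <= vps_max_layers_minus1; k++) {
            if (ols_output_layer_flag[i][k]) {   /* k-th layer is an output layer */
                layerIncludedInOlsFlag[i][k] = 1;
                LayerUsedAsOutputLayerFlag[k] = 1;
                OutputLayerIdx[i][j] = k;
                OutputLayerIdInOls[i][j++] = vps_layer_id[k];
                /* an output layer keeps all of its sub-layers */
                NumSubLayersInLayerInOLS[i][k] = vps_max_sub_layers_minus1 + 1;
            }
        }
        NumOutputLayersInOls[i] = j;   /* j now counts the output layers */
    }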
  • variable values for the reference layer of the output layer can be derived while increasing j by 1 from 0 to NumOutputLayersInOls[i]-1.
  • the variable idx represents the layer index of the j-th output layer constituting the i-th output layer set
  • the variable NumRefLayers[idx] represents the number of reference layers of the j-th output layer constituting the i-th output layer set.
  • a variable, RefLayerIdx[idx][k] indicates the layer index of the k-th reference layer of the j-th output layer constituting the i-th output layer set.
  • the syntax element max_tid_il_ref_pics_plus1[] is defined according to the layer index rather than the layer identifier. Therefore, instead of OutputLayerIdInOls[i][j], which indicates the output layer identifier, max_tid_il_ref_pics_plus1 may be indexed by the output layer index idx or by GeneralLayerIdx[OutputLayerIdInOls[i][j]].
  • GeneralLayerIdx[] may indicate a layer index for a layer identifier.
  • GeneralLayerIdx[] can be derived by the following pseudo code.
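As an illustration of the derivation referred to above, a minimal C sketch follows; the table sizing and the assumption that all layer identifiers fit within it are hypothetical.

    /* GeneralLayerIdx[] maps a layer identifier (nuh_layer_id) back to the
       index of that layer among the layers specified by the VPS. */
    void derive_general_layer_idx(int vps_max_layers_minus1,
                                  const int vps_layer_id[],
                                  int GeneralLayerIdx[])
    {
        for (int i = 0; i <= vps_max_layers_minus1; i++)
            GeneralLayerIdx[vps_layer_id[i]] = i;
    }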
  • GeneralLayerIdx[OutputLayerIdInOls[i][j]] may be the same as the output layer index (OutputLayerIdx[i][j]) corresponding to the output layer identifier (OutputLayerIdInOls[i][j]).
  • the variable NumLayersInOls[i] represents the number of layers included in the i-th output layer set
  • the variable LayerIdInOls[i][j] represents the layer identifier (nuh_layer_id) of the j-th layer included in the i-th output layer set.
  • NumLayersInOls[0] and LayerIdInOls[0][0] can be set to 1 and vps_layer_id[0], respectively.
  • NumLayersInOls[i] and LayerIdInOls[i][j] can be derived according to the output layer set configuration method while increasing i by 1 from 1 to TotalNumOlss-1 in the for statement for index i.
  • When each_layer_is_an_ols_flag is 1, the i-th output layer set contains only the i-th layer whose layer identifier is equal to vps_layer_id[i], so NumLayersInOls[i] and LayerIdInOls[i][0] may be set to 1 and vps_layer_id[i], respectively.
  • each_layer_is_an_ols_flag is 0 and ols_mode_idc is 0 or 1
  • NumLayersInOls[i] may be set to i+1.
  • LayerIdInOls[i][j] may be set to vps_layer_id[j].
  • each_layer_is_an_ols_flag is 0 and ols_mode_idc is 2
  • index k is increased from 0 to vps_max_layers_minus1 by 1, and index j may be initialized to 0.
  • LayerIdInOls[i][j] can be set to vps_layer_id[k], and the index j may be increased by 1.
  • j represents the number of layers included in the i-th output layer set, so NumLayersInOls[i] may be equal to j.
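The NumLayersInOls/LayerIdInOls derivation above can be summarized in the following C sketch; TotalNumOlss and layerIncludedInOlsFlag are taken as already-derived inputs, and the array bounds and function name are assumptions.

    #define MAX_LAYERS 64
    #define MAX_OLSS   64

    void derive_layers_in_ols(int TotalNumOlss, int vps_max_layers_minus1,
            int each_layer_is_an_ols_flag, int ols_mode_idc,
            const int vps_layer_id[MAX_LAYERS],
            const int layerIncludedInOlsFlag[MAX_OLSS][MAX_LAYERS],
            int NumLayersInOls[MAX_OLSS],
            int LayerIdInOls[MAX_OLSS][MAX_LAYERS])
    {
        /* the 0th OLS always contains only the lowest layer */
        NumLayersInOls[0] = 1;
        LayerIdInOls[0][0] = vps_layer_id[0];
        for (int i = 1; i < TotalNumOlss; i++) {
            if (each_layer_is_an_ols_flag) {
                /* each OLS contains exactly one layer, the i-th layer */
                NumLayersInOls[i] = 1;
                LayerIdInOls[i][0] = vps_layer_id[i];
            } else if (ols_mode_idc == 0 || ols_mode_idc == 1) {
                /* the i-th OLS contains layers 0..i */
                NumLayersInOls[i] = i + 1;
                for (int j = 0; j < NumLayersInOls[i]; j++)
                    LayerIdInOls[i][j] = vps_layer_id[j];
            } else { /* ols_mode_idc == 2: take the explicitly included layers */
                int j = 0;
                for (int k = 0; k <= vps_max_layers_minus1; k++)
                    if (layerIncludedInOlsFlag[i][k])
                        LayerIdInOls[i][j++] = vps_layer_id[k];
                NumLayersInOls[i] = j;
            }
        }
    }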
  • the bitstream that is input to the decoder may include NAL units for a plurality of layers, and the decoder of the terminal may extract from the bitstream only the layers included in the output layer set to be decoded, from among the output layer sets specified by the VPS, and decode them.
  • the sub-bitstream extraction process may receive an input bitstream inBitstream, an output layer set index to be decoded, targetOlsIdx, and a variable tIdTarget indicating the highest TemporalId of the bitstream to be extracted.
  • OutBitstream which is a sub-bitstream of the input bitstream, may be output by the sub-bitstream extraction process.
  • the output bitstream, outBitstream may be derived according to the following processes i), ii), iii), and iv).
  • In process i), outBitstream may be set to be the same as inBitstream, which is the input bitstream. That is, starting from an output bitstream identical to the input bitstream, unnecessary NAL units may be removed from the output bitstream according to processes ii), iii), and iv).
  • In process ii), all NAL units whose type (nal_unit_type) is not any one of VPS_NUT, DCI_NUT, and EOB_NUT and whose layer identifier (nuh_layer_id) in the NAL unit header is not included in the list LayerIdInOls[targetOlsIdx] may be removed.
  • the header of the NAL unit may include information on the type of the NAL unit, and when the type of the NAL unit is VPS_NUT, the NAL unit may include the video parameter set RBSP. When the type of the NAL unit is DCI_NUT, the corresponding NAL unit may include decoder capability information RBSP.
  • When the type of the NAL unit is EOB_NUT, the corresponding NAL unit may include information related to the end of the bitstream. That is, when the type of the NAL unit is any one of VPS_NUT, DCI_NUT, and EOB_NUT, even if the layer identifier is not included in the list LayerIdInOls[targetOlsIdx], the NAL unit may be included in the output bitstream since it may carry information necessary for decoding.
  • the slice layer may include a slice header and slice data, and when the type of the NAL unit is any one of IDR_W_RADL, IDR_N_LP, and CRA_NUT, the corresponding NAL unit may include a slice layer for an IRAP picture.
  • the TemporalID of the IRAP picture may be 0.
  • condition b): nuh_layer_id is equal to LayerIdInOls[targetOlsIdx][j] for a j whose value ranges from 0 to NumLayersInOls[targetOlsIdx]-1
  • condition c): TemporalId is greater than or equal to NumSubLayersInLayerInOLS[targetOlsIdx][GeneralLayerIdx[LayerIdInOls[targetOlsIdx][j]]] for that j
  • According to the derivation method described above with reference to FIG. 17, the list NumSubLayersInLayerInOLS[targetOlsIdx][] includes the number of sub-layers both for the layers included in the output layer set indicated by targetOlsIdx and for the layers not included in that output layer set.
  • index j may not be a layer index for a layer included in the output layer set indicated by targetOlsIdx, but a layer index for all layers constituting CVS.
  • GeneralLayerIdx[LayerIdInOls[targetOlsIdx][j]] instead of index j may be used.
  • GeneralLayerIdx[LayerIdInOls[targetOlsIdx][j]] may indicate the layer index of the j-th layer included in the targetOlsIdx-th output layer set; as a result, NumSubLayersInLayerInOLS[targetOlsIdx][GeneralLayerIdx[LayerIdInOls[targetOlsIdx][j]]] may indicate the number of sub-layers for the j-th layer included in the targetOlsIdx-th output layer set.
  • Condition a) is for when max_tid_il_ref_pics_plus1[i] is 0, and when max_tid_il_ref_pics_plus1[i] is 0, it may indicate that only the IRAP picture is used as the inter-layer reference picture of the i-th layer.
  • the TemporalId of the IRAP picture is 0, and even if conditions b) and c) are true, the NAL unit for the IRAP picture may be included in the output bitstream and may be used as an inter-layer reference picture.
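Putting processes ii) and iv) together, the per-NAL-unit keep/drop decision can be sketched in C as below. The enum values, struct layout, array bounds, and function name are illustrative assumptions; the logic mirrors conditions a) to c) described above, including the exception that keeps IRAP slices.

    #include <stdbool.h>

    enum { VPS_NUT = 14, DCI_NUT = 13, EOB_NUT = 21,   /* illustrative values */
           IDR_W_RADL = 7, IDR_N_LP = 8, CRA_NUT = 9 };

    typedef struct { int nal_unit_type, nuh_layer_id, temporal_id; } NalUnit;

    bool keep_nal_unit(const NalUnit *nu, int targetOlsIdx, int tIdTarget,
                       const int NumLayersInOls[], const int LayerIdInOls[][64],
                       const int NumSubLayersInLayerInOLS[][64],
                       const int GeneralLayerIdx[])
    {
        bool paramOrEob = nu->nal_unit_type == VPS_NUT ||
                          nu->nal_unit_type == DCI_NUT ||
                          nu->nal_unit_type == EOB_NUT;
        bool irapSlice  = nu->nal_unit_type == IDR_W_RADL ||
                          nu->nal_unit_type == IDR_N_LP  ||
                          nu->nal_unit_type == CRA_NUT;  /* TemporalId of IRAP is 0 */

        if (nu->temporal_id > tIdTarget && !paramOrEob)
            return false;                   /* above the requested sub-layers */
        for (int j = 0; j < NumLayersInOls[targetOlsIdx]; j++) {
            int lid = LayerIdInOls[targetOlsIdx][j];
            if (nu->nuh_layer_id == lid) {  /* condition b) holds */
                /* condition c): sub-layer unnecessary for output and reference;
                   condition a): IRAP slices are nevertheless kept */
                if (!irapSlice && nu->temporal_id >=
                        NumSubLayersInLayerInOLS[targetOlsIdx][GeneralLayerIdx[lid]])
                    return false;
                return true;
            }
        }
        /* process ii): layer is outside the target OLS; keep only VPS/DCI/EOB */
        return paramOrEob;
    }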
  • Fig. 19 shows pseudocode for deriving variables related to an output layer set according to another embodiment.
  • index j is a layer index over all layers constituting the CVS, and NumSubLayersInLayerInOLS[i][] may therefore also include the number of sub-layers for layers not included in the i-th output layer set. Accordingly, when referring to NumSubLayersInLayerInOLS[i][] in the sub-bitstream extraction process, it is necessary to restrict it to the j-th layer included in the i-th output layer set by using GeneralLayerIdx[LayerIdInOls[i][j]].
  • NumSubLayersInLayerInOLS[i][j] may be derived only for the layers included in the output layer set.
  • NumSubLayersInLayerInOLS[i][j] may specify the number of sub-layers for the j-th layer included in the i-th output layer set.
  • contents different from those of FIG. 17 will be mainly described, and for contents not described in FIG. 19 , the above contents in FIG. 17 may be referred to.
  • the i-th output layer set includes only the i-th layer, which may be the only output layer.
  • NumSubLayersInLayerInOLS[i][0] may be equal to vps_max_sub_layers_minus1+1.
  • each_layer_is_an_ols_flag is 0 and ols_mode_idc is 0 or 1
  • the same method as the NumSubLayersInLayerInOLS[i][j] derivation method described above with reference to FIG. 17 may be applied.
  • temp[i][j] may be equal to vps_max_sub_layers_minus1+1 when the j-th layer is an output layer of the i-th output layer set, and may be equal to 0 when the j-th layer is not an output layer of the i-th output layer set.
  • temp[i][RefLayerIdx[idx][k]] may be set to max_tid_il_ref_pics_plus1[idx].
  • Index k may be initialized to 0, and while index j is increased by 1 from 0 to vps_max_layers_minus1, if the j-th layer is included in the i-th output layer set (when layerIncludedInOlsFlag[i][j] is 1), NumSubLayersInLayerInOLS[i][k] may be set to temp[i][j] and the index k may be increased by 1.
  • In this way, the number of sub-layers in NumSubLayersInLayerInOLS[i][j] is derived only for the layers included in the i-th output layer set, and the index j means the j-th layer included in the i-th output layer set.
  • NumSubLayersInLayerInOLS[targetOlsIdx][j] may be used instead of NumSubLayersInLayerInOLS[targetOlsIdx][GeneralLayerIdx[LayerIdInOls[targetOlsIdx][j]]] in condition c) of the sub-bitstream extraction process.
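The FIG. 19-style derivation just described can be sketched in C as follows; the function signature, array bounds, and the defensive keep-the-larger-value guard when applying max_tid_il_ref_pics_plus1 to a layer that is also an output layer are assumptions layered on the description above.

    /* temp[] holds per-CVS-layer sub-layer counts; they are then compacted so
       that NumSubLayersInLayerInOLS_i[] is indexed only over layers in the OLS. */
    void derive_num_sublayers_fig19(int vps_max_layers_minus1,
            int vps_max_sub_layers_minus1,
            const int isOutputLayer[],          /* 1 if the j-th layer is an output layer of this OLS */
            const int layerIncludedInOlsFlag[], /* 1 if the j-th layer is in this OLS */
            const int outputLayerIdx[], int numOutputLayers,
            const int NumRefLayers[], const int RefLayerIdx[][64],
            const int max_tid_il_ref_pics_plus1[],
            int temp[], int NumSubLayersInLayerInOLS_i[])
    {
        for (int j = 0; j <= vps_max_layers_minus1; j++)
            temp[j] = isOutputLayer[j] ? vps_max_sub_layers_minus1 + 1 : 0;
        for (int j = 0; j < numOutputLayers; j++) {
            int idx = outputLayerIdx[j];  /* layer index of the j-th output layer */
            for (int k = 0; k < NumRefLayers[idx]; k++) {
                int ref = RefLayerIdx[idx][k];
                /* reference layers keep only the sub-layers they are needed for */
                if (temp[ref] < max_tid_il_ref_pics_plus1[idx])
                    temp[ref] = max_tid_il_ref_pics_plus1[idx];
            }
        }
        /* compact: keep counts only for layers included in this OLS */
        for (int j = 0, k = 0; j <= vps_max_layers_minus1; j++)
            if (layerIncludedInOlsFlag[j])
                NumSubLayersInLayerInOLS_i[k++] = temp[j];
    }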
  • VPS video parameter set
  • max_tid_il_ref_pics_plus1[i] which is a syntax element limiting the TemporalID of an inter layer reference picture (ILRP) that can be used for a layer having a layer index of i
  • ILRP inter layer reference picture
  • Based on the method of configuring the output layer set and the reference relationship between the layers, unnecessary sub-layers (those not used for output or reference) among the layers included in the output layer set may be removed from the bitstream.
  • max_tid_ref_present_flag[i] and max_tid_il_ref_pics_plus1[i] may be present in the video parameter set RBSP. That is, in FIG. 15 , max_tid_ref_present_flag[i] and max_tid_il_ref_pics_plus1[i] may be signaled regardless of an output layer set configuration method.
  • syntax elements max_tid_ref_present_flag[i] and max_tid_il_ref_pics_plus1[i] may be signaled differently from the method of FIG. 15 .
  • NumSubLayersInLayerInOLS[i][j] used in the sub-bitstream extraction process may be set based on an output layer set configuration method and a reference relationship between layers.
  • NumSubLayersInLayerInOLS[i][j] may be set to vps_max_sublayers_minus1+1 (that is, a value greater than vps_max_sublayers_minus1). This may indicate that, since the corresponding layer is an output layer, all sub-layers in the layer are required.
  • Otherwise, NumSubLayersInLayerInOLS[i][j] may be set based on max_tid_il_ref_pics_plus1[], which may indicate that a specific sub-layer of the corresponding layer may be removed from the bitstream.
  • When ols_mode_idc is 1, the i-th output layer set includes i+1 layers, and all layers constituting the output layer set may be used as output layers.
  • NumSubLayersInLayerInOLS[i][j] is not derived based on max_tid_il_ref_pics_plus1[i], but may always be derived as vps_max_sublayers_minus1+1.
  • Since the max_tid_il_ref_pics_plus1[i] syntax element is not used in a specific output layer set configuration method, it may be signaled according to the output layer set configuration method. Unlike the VPS RBSP syntax structure of FIG. 15, max_tid_ref_present_flag[i] and max_tid_il_ref_pics_plus1[i] may not be included immediately after the vps_direct_ref_layer_flag[i][j] syntax element.
  • max_tid_il_ref_pics_plus1[i] is not used when ols_mode_idc is 1, so max_tid_il_ref_pics_plus1[i] may be included in the video parameter set RBSP after the ols_mode_idc syntax element.
  • max_tid_il_ref_pics_plus1[i] may be included in the video parameter set RBSP after the output layer set related information (ols_output_layer_flag[i][j]).
  • While i is increased by 1 from 1 to vps_max_layers_minus1, when the i-th layer can use inter-layer prediction (when vps_independent_layer_flag[i] is 0) and ols_mode_idc is 0 or 2 (when ols_mode_idc is not 1), max_tid_ref_present_flag[i] may be included in the video parameter set RBSP.
  • When max_tid_ref_present_flag[i] indicates that max_tid_il_ref_pics_plus1[i] is present (when max_tid_ref_present_flag[i] is 1), max_tid_il_ref_pics_plus1[i] may be included in the video parameter set RBSP, as illustrated in the sketch below.
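A minimal C sketch of this conditional signalling follows; read_u1() and read_bits() are hypothetical bitstream-reading primitives, not functions named by the patent, and the 3-bit width of max_tid_il_ref_pics_plus1 is an assumption.

    typedef struct Bitstream Bitstream;
    extern int read_u1(Bitstream *bs);           /* hypothetical: read one bit */
    extern int read_bits(Bitstream *bs, int n);  /* hypothetical: read n bits */

    void parse_max_tid_il_ref(Bitstream *bs, int vps_max_layers_minus1,
                              const int vps_independent_layer_flag[],
                              int ols_mode_idc,
                              int max_tid_ref_present_flag[],
                              int max_tid_il_ref_pics_plus1[])
    {
        for (int i = 1; i <= vps_max_layers_minus1; i++) {
            /* signalled only for dependent layers and only when ols_mode_idc != 1 */
            if (!vps_independent_layer_flag[i] && ols_mode_idc != 1) {
                max_tid_ref_present_flag[i] = read_u1(bs);
                if (max_tid_ref_present_flag[i])
                    max_tid_il_ref_pics_plus1[i] = read_bits(bs, 3);
            }
        }
    }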
  • FIG. 21 is a diagram illustrating the use of a flag related to the number of sub-layers in a layer among the contents of the VPS syntax structure.
  • a bitstream may be composed of a plurality of layers, and an individual layer may be composed of a plurality of sub-layers.
  • the number of sub-layers may be the same or different for each layer. If the number of sub-layers is different for each layer, ptl_max_temporal_id[i], dpb_max_temporal_id[i], and hrd_max_tid[i] among the syntax elements may be affected.
  • ptl_max_temporal_id[i] may be a syntax element related to a profile, a tier, and a level.
  • dpb_max_temporal_id[i] may be a syntax element related to the decoded picture buffer.
  • hrd_max_tid[i] may be a syntax element related to the hypothetical reference decoder.
  • vps_all_layers_same_num_sublayers_flag is used as a syntax element indicating the same number of sublayers for each layer.
  • When the flag value is equal to 1, the number of sub-layers for each layer is the same; when the flag value is equal to 0, the number of sub-layers for each layer may be the same or different.
  • When the flag value is equal to 1, the number of sub-layers per layer may be equal to vps_max_sublayers_minus1+1.
  • FIG. 22 is a diagram illustrating syntax elements and semantics that are part of the SPS syntax structure and indicate sub-layers.
  • a syntax element indicating the number of sublayers of a corresponding layer may be sps_max_sublayers_minus1.
  • a value obtained by adding 1 to sps_max_sublayers_minus1 may specify the maximum allowable number of temporal sublayers.
  • a value of sps_max_sublayers_minus1 may range from 0 to vps_max_sublayers_minus1.
  • the value of the syntax element sps_max_sublayers_minus1 in the SPS syntax structure referring to the VPS must be equal to the value of vps_max_sublayers_minus1.
  • This may be a constraint that syntax elements in the SPS syntax structure referencing the VPS must also follow when the VPS syntax structure indicates the same number of sub-layers for all layers. Conversely, if the number of sub-layers is set/indicated differently for each layer, mismatches may occur with the profile/tier/level, DPB parameters, and HRD parameters, making it difficult to decode the bitstream.
  • the sps_max_sublayers_minus1 syntax element may be signaled in the following structure.
  • sps_max_sublayers_minus1 may be signaled based on the syntax element sps_max_sublayers_minus1_present_flag.
  • sps_max_sublayers_minus1_present_flag indicates whether the syntax element sps_max_sublayers_minus1 exists.
  • When sps_max_sublayers_minus1_present_flag is 1, it specifies that sps_max_sublayers_minus1 exists; when sps_max_sublayers_minus1_present_flag is 0, it specifies that sps_max_sublayers_minus1 does not exist.
  • When the VPS referenced by the SPS exists (that is, when sps_video_parameter_set_id is greater than 0) and vps_all_layers_same_num_sublayers_flag is 1, there may be a constraint that the value of sps_max_sublayers_minus1_present_flag must be 0.
  • When the VPS referenced by the SPS does not exist (that is, when sps_video_parameter_set_id is 0) or when vps_all_layers_same_num_sublayers_flag is 0, there may be a constraint that the value of sps_max_sublayers_minus1_present_flag must be 1.
  • When sps_max_sublayers_minus1_present_flag is 0, sps_max_sublayers_minus1 may not exist, and its value may be inferred to be equal to vps_max_sublayers_minus1 (see the sketch below).
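The presence constraint and the inference rule above can be captured in a small C sketch; the function names are illustrative and the parsed inputs are assumed available.

    #include <stdbool.h>

    /* constraint check: when a referenced VPS exists and all layers share the
       same number of sub-layers, the present flag must be 0; otherwise 1 */
    bool present_flag_is_valid(int sps_video_parameter_set_id,
                               int vps_all_layers_same_num_sublayers_flag,
                               int sps_max_sublayers_minus1_present_flag)
    {
        if (sps_video_parameter_set_id > 0 && vps_all_layers_same_num_sublayers_flag)
            return sps_max_sublayers_minus1_present_flag == 0;
        return sps_max_sublayers_minus1_present_flag == 1;
    }

    /* inference: an absent sps_max_sublayers_minus1 takes the VPS value */
    int resolve_sps_max_sublayers_minus1(int present_flag, int parsed_value,
                                         int vps_max_sublayers_minus1)
    {
        return present_flag ? parsed_value : vps_max_sublayers_minus1;
    }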
  • a picture unit may include one coded picture as a set of NAL units. That is, the PU may include a picture header (PH) NAL unit, a non-VCL NAL unit other than the picture header NAL unit, and one coded picture.
  • the picture header is a syntax structure including parameters applicable to all slices included in one coded picture, and in some cases, the PH NAL unit may not exist in the PU.
  • Although non-VCL NAL units such as the DCI NAL unit, VPS NAL unit, SPS NAL unit, and PPS NAL unit are not shown in FIG. 23, when these non-VCL NAL units are present in the PU, they may exist before the PH NAL unit and the first VCL NAL unit of the PU.
  • a picture may be divided into a plurality of slices, and each coded slice may be included in a PU in the form of a VCL NAL unit.
  • One coded slice is encapsulated in one VCL NAL unit, and the RBSP included in the corresponding VCL NAL unit is slice_layer_rbsp(), which is data for a slice layer.
  • When nal_unit_type is CRA_NUT, it may indicate that the VCL NAL unit includes a coded slice of a clean random access (CRA) picture, which is one of the IRAP pictures.
  • CRA clean random access
  • a PH NAL unit may exist before the first VCL NAL unit of the PU. Parameters applied to one picture need not be repeatedly signaled in each slice header; instead, they may be included in the picture header and signaled once. This can reduce parameter signaling overhead.
  • the PH NAL unit is a NAL unit including a picture header, and nal_unit_type, which is the information indicating the NAL unit type in the NAL unit header, is equal to PH_NUT.
  • the RBSP included in the PH NAL unit is picture_header_rbsp(), which is the RBSP for the picture header.
  • (b) of FIG. 23 shows the configuration of a picture unit for a case in which a coded picture includes one coded slice.
  • Since the picture header includes parameters applicable to all slices in one picture, a unique picture header is required for each coded picture.
  • A NAL unit requires additional signaling overhead depending on the media file format and transport protocol, so as the number of NAL units constituting the bitstream increases, the signaling overhead for NAL units increases. In particular, the signaling overhead associated with NAL units can be burdensome in low bit rate applications.
  • When the coded picture includes one coded slice, the PH NAL unit including the picture header may not exist in the PU.
  • the picture header may be included in the slice header. That is, the picture header may be included in the slice layer RBSP (slice_layer_rbsp()) of the VCL NAL unit.
  • When a coded picture consists of two or more coded slices (i.e., includes two or more VCL NAL units), the picture header (PH) NAL unit may exist before the first VCL NAL unit of the picture unit (PU).
  • the picture unit (PU) may include the current picture.
  • the PH NAL unit includes a picture header RBSP, and a plurality of slices constituting a picture may refer to the picture header.
  • When the coded picture consists of only one coded slice (i.e., includes one VCL NAL unit), the picture header may be included in the slice header instead of the PH NAL unit, and the single slice constituting the picture may refer to the picture header included in the slice header.
  • a video bitstream may include parameter sets used for encoding and decoding.
  • a NAL unit including such a parameter set is a non-VCL NAL unit, and a parameter set included in the corresponding NAL unit may be indicated based on the nal_unit_type of the NAL unit header.
  • When nal_unit_type is DCI_NUT, it indicates a decoding capability information (DCI) NAL unit, and the NAL unit includes a decoding_capability_information_rbsp() syntax structure.
  • DCI decoding capability information
  • When nal_unit_type is VPS_NUT, it indicates a video parameter set (VPS) NAL unit, and the NAL unit includes a video_parameter_set_rbsp() syntax structure.
  • When nal_unit_type is SPS_NUT, it indicates a sequence parameter set (SPS) NAL unit, and the NAL unit includes a seq_parameter_set_rbsp() syntax structure.
  • SPS sequence parameter set
  • PPS picture parameter set
  • When nal_unit_type is PPS_NUT, it indicates a picture parameter set (PPS) NAL unit, and the corresponding NAL unit includes a pic_parameter_set_rbsp() syntax structure.
  • When nal_unit_type is PREFIX_APS_NUT or SUFFIX_APS_NUT, it indicates an adaptation parameter set (APS) NAL unit, and the corresponding NAL unit includes an adaptation_parameter_set_rbsp() syntax structure.
  • When nal_unit_type is PH_NUT, it indicates a PH NAL unit, and the NAL unit includes a picture_header_rbsp() syntax structure.
  • the SPS NAL unit includes seq_parameter_set_rbsp(), and the SPS RBSP may include parameters applicable to a CLVS. That is, a CLVS may refer to a specific SPS, and the SPS RBSP may include an identifier for reference. sps_seq_parameter_set_id indicates the SPS identifier used for reference by other syntax elements. In addition, the SPS may refer to the VPS; when sps_video_parameter_set_id is greater than 0, it specifies vps_video_parameter_set_id, which is the identifier of the VPS referenced by the SPS.
  • When sps_video_parameter_set_id is equal to 0, it specifies that the SPS does not refer to a VPS, which indicates that no VPS is referenced when decoding the CLVS referring to the SPS.
  • the sps_video_parameter_set_id of the SPSs referenced by each CLVS included in the CVS must all be the same.
  • the VPS NAL unit includes video_parameter_set_rbsp()
  • the VPS RBSP includes vps_video_parameter_set_id, which is an identifier for reference.
  • the VPS indicated by sps_video_parameter_set_id may be referenced by the SPS.
  • the SPS RBSP may include control information for a specific coding method (or coding tool).
  • control information may be a flag specifying whether a specific coding method can be used in CLVS (on or enabled) or not used (off or disabled).
  • a coding tool activation flag (sps_coding_tool_enabled_flag) of the sequence level may be included in the SPS RBSP.
  • a coding tool activation flag (sps_coding_tool_enabled_flag) of the sequence level may be obtained from the SPS.
  • the sequence-level coding tool activation flag sps_coding_tool_enabled_flag may indicate whether a coding tool is activated for the current CLVS from the SPS applied to the current CLVS.
  • When the sequence-level coding tool enable flag (sps_coding_tool_enabled_flag) is 1, it specifies that a specific coding method can be used for the CLVS referring to the SPS.
  • When the sequence-level coding tool enable flag (sps_coding_tool_enabled_flag) is 0, it specifies that the specific coding method is not used in the CLVS referring to the SPS.
  • Since the RBSP is a byte-aligned structure, it may include a bit indicating the end of the data (information) included in the RBSP and additional bits for byte alignment.
  • rbsp_trailing_bits() may include rbsp_stop_one_bit (bit '1') and rbsp_alignment_zero_bit (bit '0').
  • rbsp_stop_one_bit is always 1, and may be added after the last data included in the RBSP. Additionally, bit 0 (rbsp_alignment_zero_bit) may be added until the bitstream becomes a byte unit.
  • The decoder may scan the bitstream backwards from the end to find the position of the first bit whose value is 1, which indicates the position immediately after the end of the data actually included in the RBSP. For example, when the last byte of the bitstream is 10011010, the actual RBSP data corresponds to 100110 (see the sketch below).
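The reverse scan can be illustrated with the following self-contained C sketch, which reproduces the 10011010 example above; the function name and byte-level framing are assumptions.

    #include <assert.h>

    /* Count payload bits in the last RBSP byte: scan from the LSB for the
       first 1 (rbsp_stop_one_bit); everything above it is payload. */
    int payload_bits_in_last_byte(unsigned char b)
    {
        int pos = 0;
        while (pos < 8 && ((b >> pos) & 1) == 0)
            pos++;              /* skip rbsp_alignment_zero_bit(s) */
        return 7 - pos;         /* bits before the stop bit */
    }

    int main(void)
    {
        /* 10011010 -> stop bit at position 1, leaving the 6 data bits 100110 */
        assert(payload_bits_in_last_byte(0x9A) == 6);
        return 0;
    }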
  • the PPS NAL unit includes pic_parameter_set_rbsp(), and parameters applicable to the coded picture may be included in the PPS RBSP. That is, the coded picture may refer to a specific PPS, and the PPS RBSP may include an identifier for reference. pps_pic_parameter_set_id indicates a PPS identifier used for reference by other syntax elements. In addition, the PPS may refer to the SPS, and pps_seq_parameter_set_id specifies sps_seq_parameter_set_id, which is an identifier of the SPS referenced by the PPS.
  • pps_seq_parameter_set_ids of the PPSs referenced by each coded picture included in the CLVS must all be the same. After the last data included in the PPS RBSP, rbsp_trailing_bits() may be added.
  • the PH NAL unit includes picture_header_rbsp(), and the PH RBSP may include parameters applicable to all slices of one coded picture.
  • picture_header_rbsp() may include picture_header_structure(), which contains the actual picture header data (refer to line 2430 of FIG. 24), and rbsp_trailing_bits().
  • When the PH NAL unit does not exist in the PU, picture_header_structure(), which includes the actual picture header parameters, may be included in the slice header as shown in line 2420 of FIG. 24.
  • picture_header_structure( ) may include common information (or parameters) applicable to all slices of a coded picture. Since the picture header may refer to the PPS, ph_pic_parameter_set_id specifying the pps_pic_parameter_set_id of the PPS referenced by the picture header may be included in the picture header.
  • control information may be a flag specifying whether a specific coding method can be used (on or enabled) in a coded slice of a coded picture or not (off or disabled).
  • a picture-level coding tool enable flag (ph_coding_tool_enabled_flag) may be included in the picture header.
  • the video signal processing apparatus may perform an operation of obtaining a picture-level coding tool activation flag (ph_coding_tool_enabled_flag).
  • a coding tool activation flag (ph_coding_tool_enabled_flag) of the picture level may indicate whether a coding tool is activated for a picture.
  • the picture level coding tool enable flag may include a picture level LMCS enable flag (ph_lmcs_enabled_flag) or a picture level explicit scaling list enable flag (ph_explicit_scaling_list_enabled_flag).
  • When the picture-level coding tool enable flag (ph_coding_tool_enabled_flag) is 1, it specifies that a specific coding method can be used for the coded slices of the coded picture.
  • When the picture-level coding tool enable flag (ph_coding_tool_enabled_flag) is 0, it specifies that the specific coding method is not used for the coded slices of the coded picture.
  • When the picture-level coding tool enable flag (ph_coding_tool_enabled_flag) does not exist, it may be inferred as 0.
  • When a specific coding method can be used in the CLVS (when sps_coding_tool_enabled_flag is 1), whether the coding method can be used at the picture level can be controlled according to the picture-level coding tool activation flag (ph_coding_tool_enabled_flag).
  • the specific coding method (or coding tool) described above may include at least one of luma mapping with chroma scaling (LMCS) or use of an explicit scaling list.
  • LMCS luma mapping with chroma scaling
  • the use of LMCS and explicit scaling lists is described in more detail in conjunction with FIG. 26 .
  • The VCL NAL unit includes slice_layer_rbsp(), which is the data for the coded slice.
  • slice_layer_rbsp() may include slice_header(), slice_data(), and rbsp_slice_trailing_bits().
  • slice_header() includes parameters applied to a slice
  • slice_data() includes all coding tree units included in a slice.
  • rbsp_slice_trailing_bits() is an additional bit stream for the slice layer RBSP, and may include rbsp_trailing_bits() and additional bytes.
  • slice_header( ) may include a picture header-in-slice header flag (picture_header_in_slice_header_flag) which is information indicating whether a picture header exists in the slice header.
  • the video signal processing apparatus may perform an operation of obtaining a picture header-in-slice header flag indicating whether a picture header syntax structure exists in a slice header from a bitstream of a slice level. More specifically, the video signal processing apparatus may obtain the picture header-in-slice header flag from the slice header (slice_header( )) of the current slice included in the current picture.
  • When the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 1, it specifies that the PH syntax structure picture_header_structure() is present in the slice header.
  • When the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 0, it specifies that the PH syntax structure is not present in the slice header.
  • The value of the picture header-in-slice header flag (picture_header_in_slice_header_flag) of all coded slices of the current CLVS may be the same.
  • When picture_header_structure() is included in the slice header, a picture header network abstraction layer (NAL) unit including the picture header syntax structure may not exist in the current coded layer video sequence (CLVS).
  • In this case, when obtaining the picture-level coding tool activation flag from the bitstream, the video signal processing apparatus may obtain the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) from the picture header syntax structure (picture_header_structure()) included in the slice header (slice_header()).
  • When the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 0, the current PU must include a PH NAL unit, which indicates that the coded picture includes a plurality of coded slices.
  • When the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure does not exist in the slice header, the current picture unit (PU) may include a picture header (PH) NAL unit. That is, it may be as shown in (a) of FIG. 23.
  • In this case, when obtaining the picture-level coding tool activation flag from the bitstream, the video signal processing apparatus may obtain the picture-level coding tool activation flag from the picture header syntax structure included in the PH NAL unit (see the sketch below).
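A minimal C sketch of this selection follows; PictureHeader, Bitstream, and parse_picture_header_structure() are hypothetical stand-ins for the actual parsing machinery, not names from the patent.

    typedef struct Bitstream Bitstream;
    typedef struct PictureHeader PictureHeader;

    /* hypothetical parser for picture_header_structure() */
    extern PictureHeader *parse_picture_header_structure(Bitstream *bs);

    PictureHeader *locate_picture_header(Bitstream *slice_header_bs,
                                         Bitstream *ph_nal_bs, /* NULL when no PH NAL unit exists */
                                         int picture_header_in_slice_header_flag)
    {
        if (picture_header_in_slice_header_flag) {
            /* the PH syntax structure is embedded in slice_header();
               no PH NAL unit exists in the PU */
            return parse_picture_header_structure(slice_header_bs);
        }
        /* the PU must contain a PH NAL unit before its first VCL NAL unit */
        return parse_picture_header_structure(ph_nal_bs);
    }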
  • When a specific coding method can be used in the coded slice of the coded picture (when ph_coding_tool_enabled_flag is 1), information for controlling the specific coding method at the slice level may be included in the slice header.
  • Such control information may be a slice-level coding tool activation flag (slice_coding_tool_enabled_flag).
  • the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) may include a slice-level LMCS activation flag (slice_lmcs_enabled_flag) or a slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag).
  • the slice-level coding tool enable flag may be a flag that specifies whether a specific coding method can be used in the current slice (on or enabled) or not used (off or disabled). For example, when the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) is 1, it specifies that a specific coding method can be used for the current slice. On the other hand, when the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) is 0, it specifies that a specific coding method is not used for the current slice.
  • When the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) does not exist, it may be inferred as 0. Accordingly, when a specific coding method can be used for a coded slice of a coded picture (when ph_coding_tool_enabled_flag is 1), whether the coding method is used at the slice level can be controlled according to the slice-level coding tool activation flag (slice_coding_tool_enabled_flag). Even if the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) is 1, the specific coding method may not be applied to all substructures included in the corresponding slice. For example, although not shown in FIG. 24, information for controlling the specific coding method at a level below the slice may be used, or the specific coding method may not be used below the slice level based on another syntax element.
  • redundancy may exist in a method of signaling control information of a specific coding method at each level described above in FIG. 24 .
  • When the coded picture includes one coded slice and the picture header is included in the slice header (when picture_header_in_slice_header_flag is 1), and the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) is 1, the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) may be present in the slice header of that slice. However, since the coded picture includes only one slice, the specific coding method can be used for only the one coded slice of the coded picture.
  • In this case, signaling of the syntax element (slice_coding_tool_enabled_flag) that specifies whether the specific coding method is used at the slice level is unnecessary, and whether to use the specific coding method at the slice level may be determined based on the picture-level coding tool activation flag (ph_coding_tool_enabled_flag).
  • Line 2510 in FIG. 25 is different from line 2410 in FIG. 24 .
  • the remaining portions described in the same manner in FIGS. 24 and 25 have already been described with reference to FIG. 24, and thus overlapping descriptions will be omitted.
  • the method of signaling the control information (slice_coding_tool_enabled_flag) of the specific coding method at the slice level is independent of the number of coded slices included in the coded picture.
  • a slice-level coding tool enable flag (slice_coding_tool_enabled_flag) that specifies whether a specific coding method can be used for the current slice may be included in the slice header RBSP.
  • Whether a specific coding method is used at the slice level (on/off or enabled/disabled) may be determined according to the slice-level coding tool activation flag (slice_coding_tool_enabled_flag).
  • When the coded picture includes only one coded slice, whether the specific coding method is used for the current slice may be determined according to the picture-level coding tool enable flag (ph_coding_tool_enabled_flag).
  • When the picture header is present in the slice header (when picture_header_in_slice_header_flag is 1), this indicates that there is no PH NAL unit in the PU and the coded picture contains one coded slice, and the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) is not included in the slice header RBSP.
  • the value of the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) may be inferred as the value of the picture-level coding tool enable flag (ph_coding_tool_enabled_flag).
  • When the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure does not exist in the slice header and the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) indicates activation of the coding tool, the video signal processing apparatus may obtain the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) indicating whether the coding tool is used for the current slice from the slice header (slice_header()).
  • Referring to line 2510 of FIG. 25, the slice-level coding tool activation flag (slice_coding_tool_enabled_flag), which specifies whether a specific coding method is used at the slice level (on/off or enabled/disabled), may be included in the slice header RBSP based on the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) and the picture header-in-slice header flag (picture_header_in_slice_header_flag).
  • When the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) does not exist in the slice header RBSP, the value of the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) may be inferred as the value of the picture-level coding tool enable flag (ph_coding_tool_enabled_flag).
  • When the picture-level coding tool enable flag (ph_coding_tool_enabled_flag) is 1 and the coded picture includes two or more coded slices, the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) may be included in the slice header RBSP, which means that whether to use the specific coding method can be determined for each coded slice of the picture.
  • When the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) is not obtained from the slice header (slice_header()), the video signal processing apparatus may determine the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) based on the picture-level coding tool activation flag (ph_coding_tool_enabled_flag). Referring to line 2510 of FIG. 25, there may be two cases in which the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) is not obtained from the bitstream.
  • When the picture header-in-slice header flag indicates that the picture header syntax structure is present in the slice header (the first case), or when the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) indicates deactivation of the coding tool (the second case), the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) may not be obtained from the bitstream.
  • the video signal processing apparatus may perform the following process to determine the slice-level coding tool activation flag based on the picture-level coding tool activation flag.
  • In the first case, when the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure is present in the slice header, the video signal processing apparatus may perform a step of determining the value of the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) to be the same as the value of the picture-level coding tool activation flag (ph_coding_tool_enabled_flag).
  • the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) may indicate activation of a coding tool or deactivation of a coding tool.
  • the video signal processing apparatus may determine the slice-level coding tool activation flag slice_coding_tool_enabled_flag to be the same as the picture-level coding tool activation flag ph_coding_tool_enabled_flag. That is, when the picture-level coding tool activation flag indicates activation of the coding tool, the video signal processing apparatus may determine the slice-level coding tool activation flag as the use of the coding tool. Also, when the picture-level coding tool activation flag indicates deactivation of the coding tool, the video signal processing apparatus may determine the slice-level coding tool activation flag as non-use of the coding tool.
  • In the second case, the video signal processing apparatus may perform a step of determining the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) as non-use of the coding tool.
  • Referring to line 2510 of FIG. 25, when the slice-level coding tool enable flag (slice_coding_tool_enabled_flag) is not obtained from the bitstream even though the picture header syntax structure does not exist in the slice header, as in the second case, the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) may indicate deactivation of the coding tool.
  • the video signal processing apparatus may determine the value of the slice-level coding tool activation flag slice_coding_tool_enabled_flag to be the same as the value of the picture-level coding tool activation flag ph_coding_tool_enabled_flag.
  • the video signal processing apparatus may determine the slice-level coding tool activation flag slice_coding_tool_enabled_flag as non-use of the coding tool.
  • In this case, determining the value of the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) to be the same as the value of the picture-level coding tool activation flag (ph_coding_tool_enabled_flag) may have the same meaning as determining the slice-level coding tool activation flag (slice_coding_tool_enabled_flag) as non-use of the coding tool (see the sketch below).
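Both the conditional parsing of line 2510 of FIG. 25 and the two-case inference above collapse to a short C sketch; the function name and the read_u1() callback are hypothetical, and the logic is an interpretation of the description rather than the patent's own code.

    /* returns the resolved slice_coding_tool_enabled_flag */
    int resolve_slice_coding_tool_enabled_flag(
            int picture_header_in_slice_header_flag,
            int ph_coding_tool_enabled_flag,
            int (*read_u1)(void *ctx), void *ctx)   /* hypothetical bit reader */
    {
        if (!picture_header_in_slice_header_flag && ph_coding_tool_enabled_flag)
            return read_u1(ctx);   /* flag is signalled in the slice header */
        /* first case: PH in slice header -> inherit the picture-level value;
           second case: picture-level flag is 0 -> inferred as 0 (non-use);
           both reduce to the picture-level value */
        return ph_coding_tool_enabled_flag;
    }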
  • FIG. 26 shows a method of signaling whether to use luma mapping with chroma scaling (LMCS) and an explicit scaling list at the sequence, picture, and slice levels.
  • LMCS luma mapping with chroma scaling
  • the specific coding tool of FIGS. 24 to 25 may include at least one of luma mapping with chroma scaling (LMCS) or use of an explicit scaling list.
  • LMCS luma mapping with chroma scaling
  • LMCS can be applied before the filtering unit is performed, and can be composed of a process of mapping the luma component based on an adaptive piecewise linear model (in-loop mapping) and a process of scaling the residual signal of the chroma component (chroma scaling).
  • in-loop mapping of the luma component may improve coding efficiency by adjusting a dynamic range of an input signal.
  • In-loop mapping for the luma component consists of forward mapping and inverse mapping, and a predictor obtained after motion compensation may have a dynamic range adjusted by a forward mapping function.
  • the reconstructed block is obtained as the sum of the mapped predictor and the residual block, and before the filtering unit is performed, the reconstructed picture may be mapped to the original dynamic range by the backward mapping function.
  • In some cases, forward mapping may not be performed.
  • In chroma scaling, a scaling parameter is derived based on reconstructed luma samples around the current block or around a preset unit block including a part of the current block (for example, a unit block having a size of 64x64), and based on this, the chroma residual signal can be scaled.
  • the forward mapping function and chroma scaling related parameters of the LMCS may be included in an adaptive parameter set (APS), and the backward mapping function may be derived based on the forward mapping function.
  • the adaptation parameter set may include filter coefficients used in a picture or slice, elements of a scaling (scaling for transform coefficients) list, mapping functions, and the like. That is, the adaptation parameter set includes parameters used in a picture or slice and an identifier for reference, and the picture header or slice header may include an identifier of the referenced APS.
  • When nal_unit_type is PREFIX_APS_NUT or SUFFIX_APS_NUT, it indicates that the non-VCL NAL unit is an APS NAL unit, which may include adaptation_parameter_set_rbsp().
  • adaptation_parameter_set_rbsp(), which is the RBSP of the APS, may include adaptation_parameter_set_id, which is the APS identifier used for reference by other syntax elements.
  • a syntax element aps_params_type for specifying the type of parameter included in the APS may be included in the APS RBSP.
  • When aps_params_type is ALF_APS, it indicates that the APS includes coefficients of an adaptive loop filter (ALF), and adaptation_parameter_set_rbsp() may include alf_data(), which carries the ALF coefficients.
  • When aps_params_type is LMCS_APS, it indicates that the APS includes parameters related to the forward mapping function of the LMCS and parameters related to chroma scaling, and adaptation_parameter_set_rbsp() may include lmcs_data(), which carries the LMCS parameters.
  • When aps_params_type is SCALING_APS, it indicates that the corresponding APS includes a scaling list used in the scaling process (inverse quantization process), and adaptation_parameter_set_rbsp() includes scaling_list_data(), which carries the scaling list.
  • Whether to use the LMCS may be signaled at the sequence, picture, and slice level.
  • the video signal processing apparatus may perform a step of acquiring a sequence level LMCS activation flag indicating whether luma mapping with chroma scaling (LMCS) for the current CLVS is activated.
  • LMCS luma mapping with chroma scaling
  • seq_parameter_set_rbsp( ) may include an LMCS activation flag (sps_lmcs_enabled_flag) of the sequence level.
  • When the sequence-level LMCS enable flag (sps_lmcs_enabled_flag) is 1, it specifies that the LMCS can be used in the CLVS referring to the SPS; when it is 0, it specifies that the LMCS is not used.
  • The video signal processing apparatus may perform a step of obtaining the picture-level LMCS enable flag (ph_lmcs_enabled_flag), which indicates whether the LMCS for the current picture is activated, from the picture header syntax structure (picture_header_structure()).
  • The step of line 2631 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of line 2440 of FIG. 24 from the bitstream. Alternatively, the step of line 2631 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag from the bitstream of FIG. 25.
  • The picture header syntax structure may include the picture-level LMCS activation flag (ph_lmcs_enabled_flag) when the sequence-level LMCS enable flag (sps_lmcs_enabled_flag) is 1.
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is 1, it specifies that the LMCS can be used for the coded slices of the coded picture. That is, it specifies that the LMCS can be used for the coded picture.
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is 0, it specifies that the LMCS is not used for the coded slices of the coded picture. That is, it specifies that the LMCS is not used for the coded picture.
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) does not exist, the value of the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is inferred to be 0.
  • The video signal processing apparatus may perform a step of obtaining the identifier (ph_lmcs_aps_id) for the LMCS APS from the picture header syntax structure (picture_header_structure()).
  • the picture header may include an identifier for the referenced LMCS APS.
  • the identifier (ph_lmcs_aps_id) for the LMCS APS specifies adaptation_parameter_set_id, which is the APS identifier of the LMCS APS referenced in the coded slice of the coded picture.
  • the step of line 2632 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of line 2440 of FIG. 24 from the bitstream. Also, the step of line 2632 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag from the bitstream of FIG. 25 .
  • The video signal processing apparatus may perform a step of obtaining the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag), which indicates whether chroma residual signal scaling is activated for the current picture, from the picture header syntax structure.
  • That is, the picture header may contain information for determining whether the chroma residual signal scaling of the LMCS is used.
  • When the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) is 1, it specifies that the chroma residual signal scaling of the LMCS can be used for the coded slices of the coded picture. That is, it specifies that the chroma residual signal scaling of the LMCS can be used for the coded picture.
  • When the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) is 0, it specifies that the chroma residual signal scaling of the LMCS is not used for the coded slices of the coded picture. That is, it specifies that the chroma residual signal scaling of the LMCS is not used for the coded picture.
  • When the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) does not exist, the value of the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) may be inferred to be 0.
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is 1 and the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 0, slice_header() may include the slice-level LMCS activation flag (slice_lmcs_enabled_flag).
  • That is, referring to line 2611 of FIG. 26, when the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure does not exist in the slice header and the picture-level LMCS activation flag (ph_lmcs_enabled_flag) indicates activation of the LMCS, the video signal processing apparatus may perform a step of obtaining the slice-level LMCS activation flag (slice_lmcs_enabled_flag), which indicates whether the LMCS is used for the current slice, from the slice header (slice_header()).
  • the step of line 2611 of FIG. 26 may be included in the step of obtaining the slice-level coding tool activation flag of the line 2511 of FIG. 25 from the bitstream.
  • the video signal processing apparatus may perform decoding of the current slice based on the slice-level LMCS activation flag and the identifier for the LMCS APS.
  • When the slice-level LMCS enable flag (slice_lmcs_enabled_flag) is 1, it specifies that the LMCS can be used for the current slice. That is, the LMCS may be used for the current slice by referring to the parameters of the LMCS APS indicated by ph_lmcs_aps_id.
  • the slice level LMCS enable flag (slice_lmcs_enabled_flag) is 0, it specifies that the LMCS is not used for the current slice.
  • When the slice-level LMCS enable flag (slice_lmcs_enabled_flag) does not exist, its value may be inferred as the value of the picture-level LMCS activation flag (ph_lmcs_enabled_flag). That is, when the coded picture includes one coded slice, whether the LMCS can be used for the current slice may be determined according to the picture-level LMCS enable flag (ph_lmcs_enabled_flag).
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is 1, the current slice included in the coded picture (the only coded slice of the coded picture) may use the LMCS.
  • When the picture-level LMCS enable flag (ph_lmcs_enabled_flag) is 0, the current slice included in the coded picture cannot use the LMCS.
  • The slice-level LMCS enable flag (slice_lmcs_enabled_flag) may not be obtained from the bitstream. That is, when the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure is present in the slice header (the first case), or when the picture-level LMCS enable flag (ph_lmcs_enabled_flag) indicates deactivation of the LMCS of the current picture (the second case), the slice-level LMCS activation flag (slice_lmcs_enabled_flag) may not be obtained from the bitstream.
  • the video signal processing apparatus may perform the following process to determine the slice-level LMCS-enabled flag (slice_lmcs_enabled_flag) based on the picture-level LMCS-enabled flag (ph_lmcs_enabled_flag).
  • In the first case, the video signal processing apparatus may perform a step of determining the slice-level LMCS activation flag (slice_lmcs_enabled_flag) based on the picture-level LMCS activation flag (ph_lmcs_enabled_flag).
  • the picture level LMCS activation flag may indicate activation of the LMCS of the current picture or may indicate deactivation of the LMCS.
  • the video signal processing apparatus may determine the slice-level LMCS activation flag (slice_lmcs_enabled_flag) to be the same as the picture-level LMCS activation flag (ph_lmcs_enabled_flag).
  • When the picture-level LMCS activation flag indicates activation of the LMCS, the video signal processing apparatus may determine the slice-level LMCS activation flag (slice_lmcs_enabled_flag) as use of the LMCS.
  • When the picture-level LMCS activation flag indicates deactivation of the LMCS, the video signal processing apparatus may determine the slice-level LMCS activation flag (slice_lmcs_enabled_flag) as non-use of the LMCS.
  • According to the slice-level LMCS enable flag (slice_lmcs_enabled_flag), the LMCS may be a coding tool applied to a slice included in the current picture.
  • In the second case, the video signal processing apparatus may perform a step of determining the slice-level LMCS activation flag (slice_lmcs_enabled_flag) as non-use of the LMCS.
  • Referring to line 2611 of FIG. 26, when the slice-level LMCS enable flag (slice_lmcs_enabled_flag) is not obtained from the bitstream even though the picture header syntax structure does not exist in the slice header, as in the second case, the picture-level LMCS activation flag (ph_lmcs_enabled_flag) may indicate deactivation of the LMCS.
  • the video signal processing apparatus may determine the value of the slice-level LMCS activation flag (slice_lmcs_enabled_flag) to be the same as the value of the picture-level LMCS activation flag (ph_lmcs_enabled_flag). That is, the video signal processing apparatus may determine the slice-level LMCS activation flag slice_lmcs_enabled_flag as non-use of the LMCS for the current slice included in the current picture.
  • the video signal processing apparatus may decode the current slice based on the slice-level coding tool activation flag.
  • the slice-level coding tool activation flag may include a slice-level LMCS activation flag (slice_lmcs_enabled_flag).
  • the video signal processing apparatus may decode the current picture by decoding the current slice included in the current picture based on the slice level LMCS enable flag (slice_lmcs_enabled_flag).
  • LMCS may be applied when decoding the current slice according to the slice-level LMCS activation flag (slice_lmcs_enabled_flag).
  • When the slice-level LMCS enable flag (slice_lmcs_enabled_flag) is 1 and the currently decoded block included in the current slice is a luma block (when the color index is 0), a luma mapping process may be performed.
  • When the slice-level LMCS enable flag (slice_lmcs_enabled_flag) is 1 and the currently decoded block included in the current slice is a chroma block (when the color index is greater than 0), the chroma residual signal scaling process may be performed.
  • slice_header( ) may include slice_chroma_residual_scale_flag in order to signal whether to use the chroma residual signal scaling of the LMCS at the slice level. That is, slice_header() may include the following syntax structure.
  • slice_chroma_residual_scale_flag may be included in slice_header() when ph_chroma_residual_scale_flag is 1 and slice-level LMCS enable flag (slice_lmcs_enabled_flag) is 1.
  • When slice_chroma_residual_scale_flag is 1, it specifies that the chroma residual signal scaling of the LMCS can be used for the current slice.
  • When slice_chroma_residual_scale_flag is 0, or when the slice-level LMCS enable flag (slice_lmcs_enabled_flag) is 0, it specifies that the chroma residual signal scaling of the LMCS is not used for the current slice.
  • When slice_chroma_residual_scale_flag does not exist, it is inferred as the value of the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) when the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 1, and is inferred as 0 when the picture header-in-slice header flag is 0 (see the sketch below).
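The LMCS-related parsing and inference across the picture and slice levels, as described above, can be summarized in the following C sketch; the struct, the function name, and the read_u1() callback are hypothetical, and the flow is an interpretation of line 2611 and the following lines of FIG. 26 rather than the patent's own code.

    typedef struct {
        int slice_lmcs_enabled_flag;
        int slice_chroma_residual_scale_flag;
    } SliceLmcsFlags;

    SliceLmcsFlags resolve_slice_lmcs_flags(
            int picture_header_in_slice_header_flag,
            int ph_lmcs_enabled_flag,
            int ph_chroma_residual_scale_flag,
            int (*read_u1)(void *ctx), void *ctx)   /* hypothetical bit reader */
    {
        SliceLmcsFlags f;
        if (!picture_header_in_slice_header_flag && ph_lmcs_enabled_flag)
            f.slice_lmcs_enabled_flag = read_u1(ctx);         /* signalled */
        else
            f.slice_lmcs_enabled_flag = ph_lmcs_enabled_flag; /* inferred */

        if (ph_chroma_residual_scale_flag && f.slice_lmcs_enabled_flag)
            f.slice_chroma_residual_scale_flag = read_u1(ctx); /* signalled */
        else if (picture_header_in_slice_header_flag)
            f.slice_chroma_residual_scale_flag = ph_chroma_residual_scale_flag;
        else
            f.slice_chroma_residual_scale_flag = 0;            /* inferred as 0 */
        return f;
    }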
  • whether to use the chroma residual signal scaling of the LMCS in the current slice is determined according to whether to use the chroma residual signal scaling of the LMCS in the current picture.
  • when the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) is 1, the current slice included in the coded picture (the only coded slice of the coded picture) may use the chroma residual signal scaling of the LMCS.
  • when the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) is 0, the current slice included in the coded picture cannot use the chroma residual signal scaling of the LMCS.
  • LMCS may be applied when decoding the current slice according to at least one of the slice-level LMCS enabled flag (slice_lmcs_enabled_flag), an identifier for the LMCS APS (ph_lmcs_aps_id), a picture-level chroma residual scale flag (ph_chroma_residual_scale_flag), and a slice-level chroma residual scale flag (slice_chroma_residual_scale_flag).
  • when the slice-level LMCS enabled flag indicates the use of LMCS, the currently decoded block included in the current slice is not a luma block (the color index is not 0), and the picture-level chroma residual scale flag (ph_chroma_residual_scale_flag) indicates activation of chroma residual signal scaling for the current picture, a chroma residual scaling process may be performed, as outlined in the sketch below.
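  • The block-level decision in the items above can be outlined as follows. This is an illustrative sketch only; the two process functions are hypothetical placeholders for the luma mapping and chroma residual scaling processes, and the chroma flag parameter stands for whichever of the picture-level or slice-level chroma residual scale flags governs the slice.

    void luma_mapping_process(void);            /* hypothetical placeholder */
    void chroma_residual_scaling_process(void); /* hypothetical placeholder */

    /* Sketch: apply LMCS while decoding one block of the current slice. */
    void apply_lmcs_for_block(int slice_lmcs_enabled_flag,
                              int color_index,   /* 0 = luma, > 0 = chroma */
                              int chroma_residual_scale_flag)
    {
        if (!slice_lmcs_enabled_flag)
            return;                              /* LMCS not used for this slice */
        if (color_index == 0)
            luma_mapping_process();              /* luma mapping for luma blocks */
        else if (chroma_residual_scale_flag)
            chroma_residual_scaling_process();   /* scaling for chroma residuals */
    }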
  • the human visual system is relatively sensitive to a low frequency band and relatively insensitive to a high frequency band.
  • based on this characteristic, transform coefficients can be efficiently compressed by weighting the quantization differently according to frequency.
  • the variable weight list m[x][y] according to the frequency index is referred to as a scaling factor or a scaling list.
  • the decoder may obtain the transform coefficient block by inverse quantizing (scaling) the quantized transform coefficient block obtained from the bitstream.
  • the decoder may derive the final scaling list based on the scaling list m[x][y] and the levelScale[][] list that is the default inverse quantization step size in the scaling process (inverse quantization process) for the transform coefficients.
  • the levelScale[][] list indicates inverse quantization step sizes approximated by integers, and any one value included in levelScale[][] may be selected according to the quantization parameter (QP) value.
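  • As a concrete illustration of this scaling step, the sketch below dequantizes one coefficient. The six-entry levelScale table {40, 45, 51, 57, 64, 72} and the exact rounding are assumptions in the style of HEVC/VVC designs, not values quoted from this document.

    #include <stdint.h>

    /* Assumed integer-approximated inverse quantization step sizes, one
     * entry selected by qP % 6 (an assumption, HEVC/VVC-style). */
    static const int levelScale[6] = { 40, 45, 51, 57, 64, 72 };

    /* Sketch: scale (inverse quantize) one quantized coefficient using the
     * scaling list weight m[x][y]; bdShift is the normalization shift. */
    static int64_t scale_coeff(int level, int m_xy, int qP, int bdShift)
    {
        int64_t d = (int64_t)level * m_xy * levelScale[qP % 6];
        d <<= qP / 6;                      /* step size doubles every 6 QP */
        return (d + (1LL << (bdShift - 1))) >> bdShift;  /* round, normalize */
    }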
  • the aforementioned scaling list m[x][y] may be explicitly signaled, or may be derived from a preset value when not explicitly signaled.
  • the explicitly signaled scaling list m[x][y] is referred to as an explicit scaling list.
  • An explicit scaling list may be included in the APS NAL unit.
  • An APS NAL unit in which aps_params_type is SCALING_APS includes explicit scaling list data, and an explicit scaling list m[x][y] may be derived based thereon.
  • Whether to use the explicit scaling list may be signaled at the sequence, picture, and slice level.
  • the video signal processing apparatus may perform an operation of acquiring an explicit scaling list activation flag of a sequence level indicating whether an explicit scaling list for the current CLVS is activated.
  • seq_parameter_set_rbsp( ) may include an explicit scaling list activation flag (sps_explicit_scaling_list_enabled_flag) of the sequence level.
  • when the sequence-level explicit scaling list enabled flag (sps_explicit_scaling_list_enabled_flag) is 1, it specifies that the explicit scaling list can be used in the CLVS referring to the SPS.
  • when the sequence-level explicit scaling list enabled flag (sps_explicit_scaling_list_enabled_flag) is 0, it specifies that the explicit scaling list is not used in the CLVS referring to the SPS. That is, when decoding a slice included in the CLVS, a preset scaling list, not an explicit scaling list signaled in an APS NAL unit, is used in the scaling process for the transform coefficients.
  • the preset scaling list m[x][y] may always be derived to 16 regardless of the horizontal frequency index x and the vertical frequency index y.
  • picture_header_structure( ) may include a picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) when the sequence-level explicit scaling list enabled flag (sps_explicit_scaling_list_enabled_flag) is 1.
  • a step of obtaining, from the picture header syntax structure (picture_header_structure()), the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicating whether the explicit scaling list is activated for the current picture may be performed.
  • the step of line 2641 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of the line 2440 of FIG. 24 from the bitstream.
  • the step of line 2641 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of the line 2540 of FIG. 25 from the bitstream.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) is 1, it specifies that the explicit scaling list can be used for the coded slice of the coded picture. That is, when decoding a coded slice of the coded picture, an explicit scaling list signaled in an APS NAL unit can be used in the scaling process for the transform coefficients.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) is 0, it specifies that the explicit scaling list is not used for the coded slice of the coded picture. That is, a preset scaling list, not an explicit scaling list signaled in an APS NAL unit, is used in the scaling process for the transform coefficients.
  • the preset scaling list m[x][y] may always be derived to 16 regardless of the horizontal frequency index x and the vertical frequency index y.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) does not exist, its value may be inferred to be 0.
  • the picture header may include an identifier for the APS (the APS whose aps_params_type is SCALING_APS) that includes the scaling list element.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicates activation of the explicit scaling list, the video signal processing apparatus may perform a step of obtaining the identifier (ph_scaling_list_aps_id) for the APS including the scaling list element from the picture header syntax structure (picture_header_structure()).
  • the step of line 2642 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of the line 2440 of FIG. 24 from the bitstream. Also, the step of line 2642 of FIG. 26 may be included in the step of obtaining the picture-level coding tool activation flag of the line 2540 of FIG. 25 from the bitstream.
  • the identifier (ph_scaling_list_aps_id) for the APS including the scaling list element specifies adaptation_parameter_set_id, which is the APS identifier of the scaling list APS referenced in the coded slice of the coded picture.
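  • The picture-header side of this signaling can be sketched as follows. The reader helpers, the struct, and the omission of the exact descriptor for the APS identifier are all assumptions of this illustration, not the normative syntax.

    #include <stdbool.h>

    typedef struct BitReader BitReader;
    bool read_u1(BitReader *br);         /* hypothetical 1-bit reader */
    unsigned read_aps_id(BitReader *br); /* hypothetical; descriptor unspecified */

    struct PictureHeader {
        bool ph_explicit_scaling_list_enabled_flag;
        unsigned ph_scaling_list_aps_id; /* valid only when the flag is 1 */
    };

    /* Sketch of the picture_header_structure() fragment described above. */
    void parse_ph_scaling_list(BitReader *br,
                               bool sps_explicit_scaling_list_enabled_flag,
                               struct PictureHeader *ph)
    {
        ph->ph_explicit_scaling_list_enabled_flag = false; /* inferred 0 if absent */
        if (sps_explicit_scaling_list_enabled_flag)
            ph->ph_explicit_scaling_list_enabled_flag = read_u1(br);
        if (ph->ph_explicit_scaling_list_enabled_flag)
            ph->ph_scaling_list_aps_id = read_aps_id(br); /* SCALING_APS id */
    }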
  • slice_header() may include a slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) is 1 and the picture header-in-slice header flag (picture_header_in_slice_header_flag) is 0. Referring to line 2612 of FIG. 26, when the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure does not exist in the slice header and the picture-level explicit scaling list enabled flag indicates activation of the explicit scaling list, the video signal processing apparatus may perform a step of obtaining, from the slice header (slice_header()), the slice-level explicit scaling list use flag indicating whether the explicit scaling list is used for the current slice.
  • the step of line 2612 of FIG. 26 may be included in the step of obtaining the slice-level coding tool activation flag of the line 2511 of FIG. 25 from the bitstream.
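  • Mirroring the picture header, the slice-header acquisition at line 2612 can be sketched as follows; the helpers are the same hypothetical stand-ins as before. Note that the flag is read only when the picture header is not carried inside the slice header.

    #include <stdbool.h>

    typedef struct BitReader BitReader;
    bool read_u1(BitReader *br);   /* hypothetical 1-bit reader */

    /* Sketch: obtain slice_explicit_scaling_list_used_flag under the
     * condition above; *flag_present reports whether it was in the bitstream. */
    bool parse_slice_explicit_scaling_list_used_flag(BitReader *br,
            bool ph_explicit_scaling_list_enabled_flag,
            bool picture_header_in_slice_header_flag,
            bool *flag_present)
    {
        *flag_present = ph_explicit_scaling_list_enabled_flag &&
                        !picture_header_in_slice_header_flag;
        if (*flag_present)
            return read_u1(br);
        return false;   /* absent; the value is inferred as described below */
    }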
  • the video signal processing apparatus may perform decoding of the current picture based on the slice-level explicit scaling list use flag and the identifier (ph_scaling_list_aps_id) for the APS including the scaling list element.
  • when the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) is 1, it specifies that the explicit scaling list can be used for the current slice.
  • when the slice-level explicit scaling list use flag indicates that the explicit scaling list is used for the current slice, the video signal processing apparatus may perform scaling of the transform coefficients of the current slice by using the scaling list included in the APS determined based on the identifier (ph_scaling_list_aps_id) for the APS including the scaling list element.
  • that is, an explicit scaling list signaled in the scaling list APS whose adaptation_parameter_set_id is equal to the identifier (ph_scaling_list_aps_id) for the APS including the scaling list element can be used in the scaling process for the transform coefficients.
  • when the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) is 0, it specifies that the explicit scaling list is not used for the current slice.
  • when the slice-level explicit scaling list use flag indicates that the explicit scaling list is not used for the current slice, the video signal processing apparatus may perform scaling of the transform coefficients of the current slice by using a preset scaling list. All elements of the preset scaling list may be 16.
  • that is, when the slice-level explicit scaling list use flag indicates that the explicit scaling list is not used for the current slice, a preset scaling list, not the explicit scaling list signaled in the APS NAL unit, is used in the scaling process for the transform coefficients when decoding the current slice.
  • the preset scaling list m[x][y] may always be derived to 16 regardless of the horizontal frequency index x and the vertical frequency index y.
  • when the slice-level explicit scaling list use flag is not obtained from the bitstream, its value may be inferred to be the value of the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag).
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) is 1, the current slice included in the coded picture (the only coded slice of the coded picture) may use the explicit scaling list.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) is 0, the current slice included in the coded picture cannot use the explicit scaling list.
  • in the cases described above, the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) may not be obtained from the bitstream. That is, when the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure is present in the slice header (the first case), or when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicates non-use of the explicit scaling list (the second case), the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) may not be obtained from the bitstream.
  • the video signal processing apparatus may perform the following process to determine the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) based on the picture-level explicit scaling list enable flag (ph_explicit_scaling_list_enabled_flag).
  • the video signal processing apparatus may perform a step of determining the value of the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) to be the same as the value of the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag).
  • the picture level explicit scaling list activation flag (ph_explicit_scaling_list_enabled_flag) may indicate use of the explicit scaling list or non-use of the explicit scaling list.
  • the video signal processing apparatus may determine the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) to be the same as the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag). That is, when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicates activation of the explicit scaling list, the video signal processing apparatus determines the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) as use of the explicit scaling list.
  • when the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicates deactivation of the explicit scaling list, the video signal processing apparatus determines the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) as non-use of the explicit scaling list.
  • the video signal processing apparatus may perform a step of determining the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) of the slice included in the current picture as non-use of the explicit scaling list. Referring to line 2510 of FIG. 25, even though the picture header-in-slice header flag (picture_header_in_slice_header_flag) indicates that the picture header syntax structure does not exist in the slice header, the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) may not be obtained from the bitstream; such a case may be a case in which the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) indicates non-use of the explicit scaling list, as in the second case.
  • the video signal processing apparatus may determine the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) to be the same as the picture-level explicit scaling list enable flag (ph_explicit_scaling_list_enabled_flag). That is, the video signal processing apparatus may determine the slice-level explicit scaling list use flag slice_explicit_scaling_list_used_flag as non-use of the explicit scaling list.
  • in this case, determining the slice-level explicit scaling list use flag to be the same as the picture-level explicit scaling list enabled flag (ph_explicit_scaling_list_enabled_flag) may have the same meaning as determining the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) as non-use of the explicit scaling list.
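  • The inference just described reduces to copying the picture-level value, which covers both the first and the second case. A minimal sketch with hypothetical names:

    #include <stdbool.h>

    /* Sketch: when slice_explicit_scaling_list_used_flag is not in the
     * bitstream, it follows ph_explicit_scaling_list_enabled_flag. */
    bool infer_slice_explicit_scaling_list_used_flag(bool flag_present,
            bool parsed_value,
            bool ph_explicit_scaling_list_enabled_flag)
    {
        return flag_present ? parsed_value
                            : ph_explicit_scaling_list_enabled_flag;
    }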
  • the explicit scaling list may not be used for all transform blocks included in the slice. For example, even if the slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag) is 1, when transform skip is applied to the current transform block, a preset scaling list other than the explicit scaling list may be applied. As another example, when LFNST is applied to the current transform block and it is indicated not to apply the explicit scaling list to transform blocks to which LFNST is applied, a preset scaling list other than the explicit scaling list may be applied (see the sketch below).
  • the preset scaling list m[x][y] may always be derived to 16 regardless of the horizontal frequency index x and the vertical frequency index y.
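  • These per-transform-block exceptions can be illustrated by the following sketch; the parameter names and the LFNST condition flag are assumptions used only for illustration.

    #include <stdbool.h>

    /* Sketch: decide whether the explicit scaling list applies to one
     * transform block. Even when slice_explicit_scaling_list_used_flag is 1,
     * transform skip (and, under the stated condition, LFNST) falls back to
     * the preset list whose entries are all 16. */
    bool use_explicit_scaling_list_for_block(bool slice_explicit_scaling_list_used_flag,
                                             bool transform_skip,
                                             bool lfnst_applied,
                                             bool no_explicit_list_for_lfnst)
    {
        if (!slice_explicit_scaling_list_used_flag)
            return false;   /* preset list, m[x][y] == 16 */
        if (transform_skip)
            return false;   /* transform skip uses the preset list */
        if (lfnst_applied && no_explicit_list_for_lfnst)
            return false;   /* LFNST blocks use the preset list when so indicated */
        return true;
    }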
  • the video signal processing apparatus may decode the current slice based on the slice-level coding tool activation flag.
  • the slice-level coding tool activation flag may include a slice-level explicit scaling list use flag (slice_explicit_scaling_list_used_flag).
  • the video signal processing apparatus may decode the current picture by decoding the current slice included in the current picture based on the slice level explicit scaling list use flag (slice_explicit_scaling_list_used_flag).
  • embodiments of the present disclosure may be implemented through various means.
  • embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof.
  • in the case of implementation by hardware, the method according to embodiments of the present disclosure may be implemented by one or more of Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.
  • in the case of implementation by firmware or software, the method according to the embodiments of the present disclosure may be implemented in the form of a module, procedure, or function that performs the functions or operations described above.
  • the software code may be stored in the memory and driven by the processor.
  • the memory may be located inside or outside the processor, and data may be exchanged with the processor by various known means.
  • Computer-readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer-readable media may include both computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Communication media typically includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal or other transport mechanism, and includes any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a video signal processing method comprising the steps of: obtaining, from a sequence parameter set (SPS) applied to a current coded layer video sequence (CLVS), a sequence-level coding tool enabled flag indicating whether a coding tool is enabled for the current CLVS; when the sequence-level coding tool enabled flag indicates that the coding tool is enabled for the current CLVS, obtaining a picture-level coding tool enabled flag indicating whether the coding tool is enabled for a current picture included in the current CLVS; obtaining, from a slice header of a current slice included in the current picture, a picture header-in-slice header flag indicating whether a picture header syntax structure is present in the slice header; and when the picture header-in-slice header flag indicates that no picture header syntax structure is present in the slice header and the picture-level coding tool enabled flag indicates that the coding tool is enabled, obtaining, from the slice header, a slice-level coding tool enabled flag indicating whether the coding tool is used for the current slice.
PCT/KR2021/002174 2020-03-27 2021-02-22 Procédé et dispositif de traitement d'un signal vidéo en utilisant un en-tête d'image WO2021194100A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2021243774A AU2021243774B2 (en) 2020-03-27 2021-02-22 Method and device for processing video signal by using picture header
CA3177367A CA3177367A1 (fr) 2020-03-27 2021-02-22 Procede et dispositif de traitement d'un signal video en utilisant un en-tete d'image
AU2024219800A AU2024219800A1 (en) 2020-03-27 2024-09-17 Method and device for processing video signal by using picture header

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0037836 2020-03-27
KR20200037836 2020-03-27

Publications (1)

Publication Number Publication Date
WO2021194100A1 true WO2021194100A1 (fr) 2021-09-30

Family

ID=77892375

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/002174 WO2021194100A1 (fr) 2020-03-27 2021-02-22 Procédé et dispositif de traitement d'un signal vidéo en utilisant un en-tête d'image

Country Status (3)

Country Link
AU (2) AU2021243774B2 (fr)
CA (1) CA3177367A1 (fr)
WO (1) WO2021194100A1 (fr)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170129975A (ko) * 2013-07-15 2017-11-27 소니 주식회사 비트스트림을 처리하기 위한 장치 및 방법

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
B. BROSS, J. CHEN, S. LIU: "Versatile Video Coding (Draft 6)", 15. JVET MEETING; 20190703 - 20190712; GOTHENBURG; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-O2001-vE, 31 July 2019 (2019-07-31), pages 1 - 455, XP030293944 *
BENJAMIN BROSS , JIANLE CHEN , SHAN LIU , YE-KUI WANG: "Versatile Video Coding (Draft 8)", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-Q2001-vE, 12 March 2020 (2020-03-12), pages 1 - 510, XP030285390 *
HENDRY (LGE),: "AHG9: A summary of HLS contributions on picture header, slice header, and access unit delimiter", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-Q0684 ; m52647, 9 January 2020 (2020-01-09), XP030224047 *
J. SAMUELSSON (SHARPLABS), S. DESHPANDE, A. SEGALL (SHARP): "AHG9: On slice header", 17. JVET MEETING; 20200107 - 20200117; BRUSSELS; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ), no. JVET-Q0346 ; m51941, 31 December 2019 (2019-12-31), XP030223257 *

Also Published As

Publication number Publication date
AU2021243774A1 (en) 2022-11-10
CA3177367A1 (fr) 2021-09-30
AU2021243774B2 (en) 2024-06-20
AU2024219800A1 (en) 2024-10-10


Legal Events

Code Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21776875; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 3177367; Country of ref document: CA)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2021243774; Country of ref document: AU; Date of ref document: 20210222; Kind code of ref document: A)
122 Ep: pct application non-entry in european phase (Ref document number: 21776875; Country of ref document: EP; Kind code of ref document: A1)