KR20210105984A

KR20210105984A - Intra random access point pictures and leading pictures in video coding

Info

Publication number: KR20210105984A
Application number: KR1020217023657A
Authority: KR
Inventors: Fnu Hendry; Ye-Kui Wang
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2018-12-27
Filing date: 2019-12-23
Publication date: 2021-08-27
Also published as: KR102603980B1; US20210392361A1; BR112021012679A2; CN113228588A; US11563967B2; WO2020139829A1

Abstract

A method of encoding a video bitstream, implemented by a video encoder, is disclosed. The method includes: storing, in a memory of the video encoder, a set of fewer than five network abstraction layer (NAL) unit types available for video data; selecting, by a processor of the video encoder, a NAL unit type from the set of fewer than five NAL unit types for a picture from the video data; generating, by the processor of the video encoder, a video bitstream that includes a NAL unit corresponding to the selected NAL unit type and an identifier that identifies the selected NAL unit type; and transmitting, by a transmitter of the video encoder, the video bitstream towards a video decoder. A corresponding method of decoding a video bitstream is also disclosed.

Description

Intra random access point pictures and leading pictures in video coding

Cross-Reference to Related Applications

This patent application claims the benefit of U.S. Provisional Patent Application No. 62/785,515, filed December 27, 2018 by Fnu Hendry et al. and titled "On Intra Random Access Point Pictures and Leading Pictures in Video Coding," which is incorporated herein by reference.

In general, this disclosure describes techniques for handling network abstraction layer (NAL) unit types for leading pictures and intra random access point (IRAP) pictures. More specifically, this disclosure describes techniques for limiting the number of available NAL unit types and for using flags to indicate whether a picture is decodable when the picture is not identified by a NAL unit type.

The amount of video data needed to depict even a relatively short video can be substantial, which may cause difficulties when the data is streamed or otherwise communicated across a communications network with limited bandwidth capacity. Thus, video data is generally compressed before being communicated across modern communications networks. The size of a video can also be an issue when the video is stored on a storage device, because memory resources may be limited. Video compression devices often use software and/or hardware at the source to code the video data prior to transmission or storage, thereby decreasing the quantity of data needed to represent digital video images. The compressed data is then received at the destination by a video decompression device that decodes the video data. With limited network resources and ever-increasing demands for higher video quality, improved compression and decompression techniques that improve the compression ratio with little or no sacrifice in image quality are desirable.

A first aspect relates to a method of encoding a video bitstream implemented by a video encoder. The method includes storing, in a memory of the video encoder, a set of fewer than five network abstraction layer (NAL) unit types available for video data; selecting, by a processor of the video encoder, a NAL unit type from the set of fewer than five NAL unit types for a picture from the video data; generating, by the processor of the video encoder, a video bitstream that includes a NAL unit corresponding to the selected NAL unit type and an identifier that identifies the selected NAL unit type; and transmitting, by a transmitter of the video encoder, the video bitstream towards a video decoder.
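
The four steps of the first aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the integer identifier values, the names `NalUnitType`, `LEADING_AND_TRAILING`, `encode`, `select_type`, and `transmit`, and the modeling of a NAL unit as an (identifier, payload) pair are all hypothetical, because the text does not define a bitstream syntax.

```python
from enum import IntEnum
from typing import Callable, List, Tuple

class NalUnitType(IntEnum):
    # Hypothetical identifier values; the text does not assign numbers.
    LEADING_AND_TRAILING = 0
    IRAP_W_RASL = 1
    IRAP_W_RADL = 2
    IRAP_N_LP = 3

# Step 1: the set of fewer than five NAL unit types kept in encoder memory.
AVAILABLE_TYPES = frozenset(NalUnitType)
assert len(AVAILABLE_TYPES) < 5

# Simplified (hypothetical) model of a NAL unit: (identifier, picture payload).
NalUnit = Tuple[int, bytes]

def encode(pictures: List[bytes],
           select_type: Callable[[int, bytes], NalUnitType],
           transmit: Callable[[List[NalUnit]], None]) -> List[NalUnit]:
    bitstream: List[NalUnit] = []
    for index, payload in enumerate(pictures):
        # Step 2: choose a type for this picture; it must come from the stored set.
        chosen = select_type(index, payload)
        if chosen not in AVAILABLE_TYPES:
            raise ValueError(f"type {chosen!r} is not in the stored set")
        # Step 3: emit a NAL unit carrying an identifier of the chosen type.
        bitstream.append((int(chosen), payload))
    # Step 4: send the bitstream towards the decoder (here, via a callback).
    transmit(bitstream)
    return bitstream
```

For example, a `select_type` that returns `NalUnitType.IRAP_N_LP` for the first picture and `NalUnitType.LEADING_AND_TRAILING` afterwards tags the first picture as an IRAP with no leading pictures and the rest as leading or trailing pictures. Passing the selection and transmission steps as callbacks keeps the sketch independent of any particular selection policy or transport.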

The method provides techniques that limit the set of NAL unit types available for video data to no more than five specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (also known as non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with random access decodable leading (RADL) pictures and/or random access skipped leading (RASL) pictures. In addition, specific NAL unit types may be mapped to different stream access point (SAP) types in Dynamic Adaptive Streaming over HTTP (DASH). By constraining the set of NAL unit types, the coder/decoder (also known as a "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, demands less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers the user a better experience when videos are sent, received, and/or viewed.
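
The text does not say how the identifier is coded, so the following is only one possible reading of the "fewer bits" remark. Assuming, hypothetically, a fixed-length identifier field, the minimum width that gives each of n types a distinct code is ceil(log2(n)) bits: four types fit in 2 bits, while adding a fifth type would require a third bit. The helper name `min_fixed_length_bits` is illustrative.

```python
def min_fixed_length_bits(num_types: int) -> int:
    """Smallest fixed-length code width (in bits) that gives each type a distinct code."""
    if num_types < 1:
        raise ValueError("need at least one type")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact integer math.
    return (num_types - 1).bit_length()

for n in (2, 4, 5, 8):
    print(n, "types ->", min_fixed_length_bits(n), "bits")
```

Integer `bit_length()` is used instead of `math.log2` to avoid floating-point rounding at exact powers of two.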

In a first implementation form of the method according to the first aspect, the set of fewer than five network abstraction layer (NAL) unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a second implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the set of fewer than five NAL unit types consists of a leading and trailing pictures NAL unit type, an IRAP with RASL NAL unit type, an IRAP with RADL NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a third implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, both leading pictures and trailing pictures are assigned the leading and trailing pictures NAL unit type.

In a fourth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RASL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RASL pictures and zero or more RADL pictures.

In a fifth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP picture is referred to as a clean random access (CRA) picture.

In a sixth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

In a seventh implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

In an eighth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a ninth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RADL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RADL pictures.

In a tenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

In an eleventh implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with RADL NAL unit type.

In a twelfth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

In a thirteenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a fourteenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with no leading pictures NAL unit type is selected for an IRAP picture that is not followed by any leading picture in decoding order.

In a fifteenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.
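
The selection rules of the fourth, ninth, and fourteenth implementation forms, together with the third form's rule that leading and trailing pictures share one type, can be sketched as a single function. The function name, parameter names, and the member name `LEADING_AND_TRAILING` are hypothetical. Because the RASL and RADL rules as written could both apply to an IRAP picture followed by both kinds of leading picture, the sketch assumes that the RASL check takes precedence.

```python
from enum import Enum

class NalUnitType(Enum):
    LEADING_AND_TRAILING = "leading and trailing pictures"  # hypothetical member name
    IRAP_W_RASL = "IRAP with RASL"
    IRAP_W_RADL = "IRAP with RADL"
    IRAP_N_LP = "IRAP with no leading pictures"

def select_nal_unit_type(is_irap: bool, num_rasl: int = 0, num_radl: int = 0) -> NalUnitType:
    """Pick a NAL unit type for one picture.

    num_rasl and num_radl count the RASL and RADL pictures that follow
    an IRAP picture in decoding order; they are ignored for non-IRAP pictures.
    """
    if num_rasl < 0 or num_radl < 0:
        raise ValueError("counts must be non-negative")
    if not is_irap:
        # Leading and trailing (non-IRAP) pictures share one type.
        return NalUnitType.LEADING_AND_TRAILING
    if num_rasl >= 1:
        # One or more RASL pictures and zero or more RADL pictures.
        return NalUnitType.IRAP_W_RASL
    if num_radl >= 1:
        # One or more RADL pictures (and, by the precedence assumed here, no RASL pictures).
        return NalUnitType.IRAP_W_RADL
    # No leading picture follows the IRAP picture.
    return NalUnitType.IRAP_N_LP
```

Checking RASL first mirrors the fact that the IRAP with RADL rule says nothing about RASL pictures, while the IRAP with RASL rule explicitly allows RADL pictures alongside RASL pictures.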

In a sixteenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with no leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) with no leading pictures NAL unit type.

In a seventeenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP with no leading pictures NAL unit type is designated IRAP_N_LP.

In an eighteenth implementation form of the method according to the first aspect or any preceding implementation form of the first aspect, the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).
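
The designations, alternative names, and DASH SAP correspondences given in the fifth through eighteenth implementation forms can be collected into one lookup table. This is a sketch only: the table layout and the helper name `dash_sap_type` are illustrative, and the leading and trailing pictures type is omitted because the text assigns it no SAP type.

```python
from typing import Dict, NamedTuple

class IrapTypeInfo(NamedTuple):
    alias: str          # alternative name used in the text
    dash_sap_type: int  # corresponding DASH stream access point (SAP) type

# Designations and mappings as stated in the implementation forms.
IRAP_TYPES: Dict[str, IrapTypeInfo] = {
    "IRAP_W_RASL": IrapTypeInfo("CRA", 3),
    "IRAP_W_RADL": IrapTypeInfo("IDR with RADL", 2),
    "IRAP_N_LP": IrapTypeInfo("IDR with no leading pictures", 1),
}

def dash_sap_type(designation: str) -> int:
    """Return the DASH SAP type for an IRAP designation (KeyError if unknown)."""
    return IRAP_TYPES[designation].dash_sap_type

for name, info in IRAP_TYPES.items():
    print(f"{name:12} -> SAP type {info.dash_sap_type} ({info.alias})")
```

A lookup table like this is one way a packager might label DASH segment boundaries directly from the NAL unit type, without inspecting the leading pictures themselves.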

A second aspect relates to a method of decoding a coded video bitstream implemented by a video decoder. The method includes storing, in a memory of the video decoder, a set of fewer than five network abstraction layer (NAL) unit types available for video data; receiving, by a receiver of the video decoder, a coded video bitstream comprising a NAL unit and an identifier; determining, by a processor of the video decoder and based on the identifier, the NAL unit type from the set of fewer than five NAL unit types that was used to encode the NAL unit; and assigning, by the processor of the video decoder, a presentation order to pictures contained in the NAL unit based on the determined NAL unit type.
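
The determining step of the second aspect can be sketched as a lookup from each received identifier into the stored set, rejecting identifiers that fall outside it. As in any sketch here, the integer identifier values, the (identifier, payload) model of a NAL unit, and the names `NalUnitType`, `LEADING_AND_TRAILING`, and `determine_types` are hypothetical. The presentation-order step is not sketched, because this section does not state how the determined type is used to order pictures.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple

class NalUnitType(IntEnum):
    # Hypothetical identifier values; the text does not assign numbers.
    LEADING_AND_TRAILING = 0
    IRAP_W_RASL = 1
    IRAP_W_RADL = 2
    IRAP_N_LP = 3

def determine_types(bitstream: Iterable[Tuple[int, bytes]]) -> List[Tuple[NalUnitType, bytes]]:
    """Map each NAL unit's identifier to a type from the stored set."""
    decoded: List[Tuple[NalUnitType, bytes]] = []
    for identifier, payload in bitstream:
        try:
            # Enum lookup by value fails for identifiers outside the stored set.
            nal_type = NalUnitType(identifier)
        except ValueError:
            raise ValueError(f"identifier {identifier} is not in the stored set") from None
        decoded.append((nal_type, payload))
    return decoded
```

Raising on an unknown identifier is a design choice of this sketch; a real decoder might instead skip or ignore such units.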

In a first implementation form of the method according to the second aspect, the set of fewer than five network abstraction layer (NAL) unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a second implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the set of fewer than five NAL unit types consists of a leading and trailing pictures NAL unit type, an IRAP with RASL NAL unit type, an IRAP with RADL NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a third implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, both leading pictures and trailing pictures are assigned the leading and trailing pictures NAL unit type.

In a fourth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RASL NAL unit type is determined for an IRAP picture that is followed, in decoding order, by one or more RASL pictures and zero or more RADL pictures.

In a fifth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP picture is referred to as a clean random access (CRA) picture.

In a sixth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

In a seventh implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

In an eighth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a ninth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RADL NAL unit type is determined for an IRAP picture that is followed, in decoding order, by one or more RADL pictures.

In a tenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

In an eleventh implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with RADL NAL unit type.

In a twelfth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

In a thirteenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a fourteenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with no leading pictures NAL unit type is determined for an IRAP picture that is not followed by any leading picture in decoding order.

In a fifteenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.

In a sixteenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with no leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) with no leading pictures NAL unit type.

In a seventeenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP with no leading pictures NAL unit type is designated IRAP_N_LP.

In an eighteenth implementation form of the method according to the second aspect or any preceding implementation form of the second aspect, the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

A third aspect relates to an encoding device. The encoding device includes a memory containing instructions and a set of fewer than five network abstraction layer (NAL) unit types available for video data; a processor coupled to the memory, the processor configured to implement the instructions to cause the encoding device to select a NAL unit type from the set of fewer than five NAL unit types for a picture from the video data, and to generate a video bitstream that includes a NAL unit corresponding to the selected NAL unit type and an identifier that identifies the selected NAL unit type; and a transmitter coupled to the processor and configured to transmit the video bitstream towards a video decoder.

The encoding device provides techniques that limit the set of NAL unit types available for video data to no more than five specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (also known as non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with random access decodable leading (RADL) pictures and/or random access skipped leading (RASL) pictures. In addition, specific NAL unit types may be mapped to different stream access point (SAP) types in Dynamic Adaptive Streaming over HTTP (DASH). By constraining the set of NAL unit types, the coder/decoder (also known as a "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, demands less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers the user a better experience when videos are sent, received, and/or viewed.

In a first implementation form of the encoding device according to the third aspect, the set of fewer than five network abstraction layer (NAL) unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a second implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the set of fewer than five NAL unit types consists of a leading and trailing pictures NAL unit type, an IRAP with RASL NAL unit type, an IRAP with RADL NAL unit type, and an IRAP with no leading pictures NAL unit type.

In a third implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, both leading pictures and trailing pictures are assigned the leading and trailing pictures NAL unit type.

In a fourth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RASL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RASL pictures and zero or more RADL pictures.

In a fifth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP picture is referred to as a clean random access (CRA) picture.

In a sixth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

In a seventh implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

In an eighth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a ninth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RADL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RADL pictures.

In a tenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

In an eleventh implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with RADL NAL unit type.

In a twelfth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

In a thirteenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a fourteenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with no leading pictures NAL unit type is selected for an IRAP picture that is not followed by any leading picture in decoding order.

In a fifteenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.

In a sixteenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with no leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) with no leading pictures NAL unit type.

In a seventeenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP with no leading pictures NAL unit type is designated IRAP_N_LP.

In an eighteenth implementation form of the encoding device according to the third aspect or any preceding implementation form of the third aspect, the IRAP_N_LP designation corresponds to Stream Access Point (SAP) type 1 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).
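
The SAP correspondences stated across these implementation forms (IRAP_N_LP to SAP type 1, IRAP_W_RADL to SAP type 2, IRAP_W_RASL to SAP type 3) can be collected into a small lookup table. The following Python sketch is illustrative only: the enum member names mirror the designations in the text, while the member values and the helper name `sap_type` are assumptions introduced here, not part of any specification.

```python
from enum import Enum
from typing import Optional


class NalUnitType(Enum):
    """The four NAL unit types described in the text (member values are illustrative)."""
    LEADING_AND_TRAILING = "leading_and_trailing"
    IRAP_W_RASL = "IRAP_W_RASL"
    IRAP_W_RADL = "IRAP_W_RADL"
    IRAP_N_LP = "IRAP_N_LP"


# DASH SAP type for each IRAP designation, as stated in the implementation forms.
SAP_TYPE_BY_NAL_TYPE = {
    NalUnitType.IRAP_N_LP: 1,    # IDR without leading pictures
    NalUnitType.IRAP_W_RADL: 2,  # IDR with RADL pictures
    NalUnitType.IRAP_W_RASL: 3,  # CRA; RASL pictures may follow
}


def sap_type(nal_type: NalUnitType) -> Optional[int]:
    """Return the SAP type for an IRAP NAL unit type, or None for the shared non-IRAP type."""
    return SAP_TYPE_BY_NAL_TYPE.get(nal_type)
```

The text assigns SAP types only to the three IRAP designations, so the shared leading and trailing pictures type maps to `None` in this sketch.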

A fourth aspect relates to a decoding device. The decoding device includes: a receiver configured to receive a coded video bitstream that contains a NAL unit and an identifier; a memory coupled to the receiver, the memory storing instructions and a set of fewer than five network abstraction layer (NAL) unit types available for video data; and a processor coupled to the memory, the processor configured to execute the instructions to cause the decoding device to: determine, based on the identifier, the NAL unit type used to encode the NAL unit from the set of fewer than five NAL unit types; and assign a presentation order to the pictures contained in the NAL unit based on the determined NAL unit type.
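
A decoder-side sketch of the first step, resolving the signalled identifier against the restricted set, might look like the following. The numeric code points are hypothetical because the text does not assign values, and `determine_nal_unit_type` and `is_irap` are illustrative names rather than functions from any reference decoder.

```python
from enum import IntEnum


class NalUnitType(IntEnum):
    # Hypothetical code points: the text does not assign numeric values.
    LEADING_AND_TRAILING = 0
    IRAP_W_RASL = 1
    IRAP_W_RADL = 2
    IRAP_N_LP = 3


def determine_nal_unit_type(identifier: int) -> NalUnitType:
    """Resolve the signalled identifier to one of the fewer than five allowed types."""
    try:
        return NalUnitType(identifier)
    except ValueError:
        raise ValueError(
            f"identifier {identifier} is not one of the "
            f"{len(NalUnitType)} supported NAL unit types"
        ) from None


def is_irap(nal_type: NalUnitType) -> bool:
    """Every type except the shared leading/trailing type marks an IRAP picture."""
    return nal_type is not NalUnitType.LEADING_AND_TRAILING
```

Rejecting identifiers outside the set keeps the decoder aligned with the restriction to fewer than five types, since any other value cannot have been produced by a conforming encoder under this scheme.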

The decoding device provides techniques that limit the set of NAL unit types available for video data to five or fewer specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (i.e., non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with RADL pictures and/or RASL pictures. Additionally, specific NAL unit types may be mapped to different SAP types in DASH. By constraining the set of NAL unit types, the coder/decoder (a.k.a. "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, requires less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers users a better experience when video is sent, received, and/or viewed.

In a first implementation form of the decoding device according to the fourth aspect, the set of fewer than five network abstraction layer (NAL) unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

In a second implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the set of fewer than five network abstraction layer (NAL) unit types consists of a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

In a third implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, leading pictures and trailing pictures are both assigned the leading and trailing pictures NAL unit type.

In a fourth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RASL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RASL pictures and zero or more RADL pictures.

In a fifth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP picture is referred to as a clean random access (CRA) picture.

In a sixth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RASL NAL unit type is referred to as the clean random access (CRA) NAL unit type.

In a seventh implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

In an eighth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP_W_RASL designation corresponds to Stream Access Point (SAP) type 3 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a ninth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RADL NAL unit type is selected for an IRAP picture that is followed, in decoding order, by one or more RADL pictures and zero RASL pictures.

In a tenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

In an eleventh implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RADL NAL unit type is referred to as the instantaneous decoder refresh (IDR) with RADL NAL unit type.

In a twelfth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

In a thirteenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, IRAP_W_RADL corresponds to Stream Access Point (SAP) type 2 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

In a fourteenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP without leading pictures NAL unit type is selected for an IRAP picture that is not followed by any leading picture in decoding order.
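
The three selection rules for IRAP pictures (one or more RASL pictures; one or more RADL pictures and no RASL pictures; no leading pictures) can be expressed as a single decision function. This is a minimal sketch that assumes the encoder already knows how many RASL and RADL pictures follow the IRAP picture in decoding order; the function name and the string return values are illustrative.

```python
def select_irap_nal_unit_type(num_rasl: int, num_radl: int) -> str:
    """Pick the IRAP designation from the leading pictures that follow it in decoding order.

    num_rasl, num_radl: hypothetical counts of RASL and RADL pictures associated
    with the IRAP picture.
    """
    if num_rasl < 0 or num_radl < 0:
        raise ValueError("picture counts must be non-negative")
    if num_rasl >= 1:
        return "IRAP_W_RASL"  # one or more RASL, zero or more RADL (CRA)
    if num_radl >= 1:
        return "IRAP_W_RADL"  # one or more RADL, zero RASL (IDR with RADL)
    return "IRAP_N_LP"        # no leading pictures (IDR without leading pictures)
```

Checking for RASL pictures first matters: an IRAP picture followed by both RASL and RADL pictures falls under the RASL rule, which permits zero or more RADL pictures.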

In a fifteenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.

In a sixteenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP without leading pictures NAL unit type is referred to as the instantaneous decoder refresh (IDR) without leading pictures NAL unit type.

In a seventeenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP without leading pictures NAL unit type is designated IRAP_N_LP.

In an eighteenth implementation form of the decoding device according to the fourth aspect or any preceding implementation form of the fourth aspect, the IRAP_N_LP designation corresponds to Stream Access Point (SAP) type 1 in Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH).

A fifth aspect relates to a method of encoding a video bitstream implemented by a video encoder. The method includes: generating, by a processor of the video encoder, a bitstream containing a NAL unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture; setting, by the processor of the video encoder, a first flag in the bitstream to a first value when the NAL unit for the non-IRAP picture contains a random access decodable leading (RADL) picture; setting, by the processor of the video encoder, a second flag in the bitstream to the first value when the NAL unit for the non-IRAP picture contains a random access skipped leading (RASL) picture; and transmitting, by a transmitter of the video encoder, the video bitstream toward a video decoder.

This encoding method provides techniques for the case in which non-IRAP pictures are not identified by NAL unit type. In that case, flags in the bitstream are set to specific values that indicate whether the IRAP picture is associated with a RADL picture or a RASL picture.

In a first implementation form of the method according to the fifth aspect, the first flag is designated RadlPictureFlag and the second flag is designated RaslPictureFlag.

In a second implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the first value is one (1).

In a third implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the non-IRAP picture comprises a leading picture.

In a fourth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the non-IRAP picture comprises a trailing picture.

In a fifth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the first flag is set equal to the first value when the picture order count (POC) value of the non-IRAP picture is less than the POC value of the IRAP picture.

In a sixth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no pictures other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with that IRAP picture.

In a seventh implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the second flag is set equal to the first value when the POC value of the non-IRAP picture is less than the picture order count (POC) value of the IRAP picture.

In an eighth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the second flag is set equal to the first value when a reference picture list for the non-IRAP picture contains at least one reference picture that precedes, in decoding order, the IRAP picture associated with the non-IRAP picture, or another RASL picture associated with that IRAP picture.

In a ninth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the method further includes setting the first flag and the second flag to a second value indicating that the NAL unit for the non-IRAP picture does not contain a RADL picture or a RASL picture.

In a tenth implementation form of the method according to the fifth aspect or any preceding implementation form of the fifth aspect, the first flag and the second flag are not both set to the first value for the non-IRAP picture.
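
The flag conditions of the fifth aspect can be combined into one derivation, sketched below in Python. The sketch assumes pictures are processed in decoding order, so the flags of already-coded reference pictures are known; it models all reference picture lists of a picture as one flat list and uses 1 as the first value and 0 as the second value. It follows one reading of the RASL condition (some reference precedes the IRAP picture in decoding order, or is itself a RASL picture of that IRAP). The `Picture` class and `derive_leading_flags` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)
class Picture:
    poc: int               # picture order count (output order)
    decode_order: int      # position in decoding order
    radl_flag: int = 0     # RadlPictureFlag, once derived
    rasl_flag: int = 0     # RaslPictureFlag, once derived
    refs: List["Picture"] = field(default_factory=list)  # union of all reference picture lists


def derive_leading_flags(pic: Picture, irap: Picture) -> Tuple[int, int]:
    """Return (RadlPictureFlag, RaslPictureFlag) for non-IRAP `pic` associated with `irap`."""
    if pic.poc >= irap.poc:
        return 0, 0  # not a leading picture: both flags keep the second value
    # RASL: at least one reference precedes the IRAP in decoding order or is a RASL picture.
    if any(r.decode_order < irap.decode_order or r.rasl_flag for r in pic.refs):
        return 0, 1
    # RADL: every reference is the IRAP itself or a RADL picture of the same IRAP.
    if all(r is irap or (r.radl_flag and r.decode_order > irap.decode_order)
           for r in pic.refs):
        return 1, 0
    return 0, 0  # neither condition is met
```

Because the RASL test runs first and returns immediately, the function can never return `(1, 1)`, which matches the tenth implementation form.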

A sixth aspect relates to a method of decoding a video bitstream implemented by a video decoder. The method includes: receiving, by a receiver of the video decoder, a coded video bitstream containing a first flag, a second flag, and a NAL unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture; determining, by a processor of the video decoder, that the NAL unit for the non-IRAP picture contains a random access decodable leading (RADL) picture when the first flag in the bitstream is set to a first value; determining, by the processor of the video decoder, that the NAL unit for the non-IRAP picture contains a random access skipped leading (RASL) picture when the second flag in the bitstream is set to the first value; and assigning, by the processor of the video decoder, a presentation order to the pictures contained in the NAL unit based on whichever of the first flag or the second flag has the first value, and decoding the NAL unit based on the assigned presentation order.

This decoding method provides techniques for the case in which non-IRAP pictures are not identified by NAL unit type. In that case, flags in the bitstream are set to specific values that indicate whether the IRAP picture is associated with a RADL picture or a RASL picture.

In a first implementation form of the method according to the sixth aspect, the first flag is designated RadlPictureFlag and the second flag is designated RaslPictureFlag.

In a second implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the first value is one (1).

In a third implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the non-IRAP picture comprises a leading picture.

In a fourth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the non-IRAP picture comprises a trailing picture.

In a fifth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the first flag is set equal to the first value when the picture order count (POC) value of the non-IRAP picture is less than the POC value of the IRAP picture.

In a sixth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no pictures other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with that IRAP picture.

In a seventh implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the second flag is set equal to the first value when the picture order count (POC) value of the non-IRAP picture is less than the POC value of the IRAP picture.

In an eighth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the second flag is set equal to the first value when a reference picture list for the non-IRAP picture contains at least one reference picture that precedes, in decoding order, the IRAP picture associated with the non-IRAP picture, or another RASL picture associated with that IRAP picture.

In a ninth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the method further includes setting the first flag and the second flag to a second value indicating that the NAL unit for the non-IRAP picture does not contain a RADL picture or a RASL picture.

In a tenth implementation form of the method according to the sixth aspect or any preceding implementation form of the sixth aspect, the first flag and the second flag are not both set to the first value for the non-IRAP picture.
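
On the decoder side, the two parsed flags identify the kind of each non-IRAP picture, and the presentation (output) order follows the POC values. The sketch below also rejects a picture whose flags are both 1, per the tenth implementation form. It additionally drops RASL pictures when decoding starts at the associated IRAP picture; that behaviour is borrowed from how HEVC treats RASL pictures of a CRA picture used for random access and is an assumption here, not something this aspect states. `DecodedPic`, `classify`, and `presentation_order` are hypothetical names.

```python
from typing import List, NamedTuple


class DecodedPic(NamedTuple):
    poc: int        # picture order count
    radl_flag: int  # RadlPictureFlag as parsed from the bitstream
    rasl_flag: int  # RaslPictureFlag as parsed from the bitstream


def classify(pic: DecodedPic) -> str:
    """Classify a non-IRAP picture from its two flags."""
    if pic.radl_flag and pic.rasl_flag:
        raise ValueError("RadlPictureFlag and RaslPictureFlag must not both be 1")
    if pic.radl_flag:
        return "RADL"
    if pic.rasl_flag:
        return "RASL"
    return "TRAIL"


def presentation_order(pics: List[DecodedPic], random_access_at_irap: bool) -> List[int]:
    """Return POCs in output order, skipping RASL pictures on random access (assumed behaviour)."""
    kept = []
    for p in pics:
        kind = classify(p)  # validates the flag constraint for every picture
        if random_access_at_irap and kind == "RASL":
            continue  # its references before the IRAP are unavailable after random access
        kept.append(p)
    return [p.poc for p in sorted(kept, key=lambda p: p.poc)]
```

Classifying every picture before filtering ensures that an invalid flag combination is reported even when no random access takes place.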

A seventh aspect relates to a coding apparatus. The coding apparatus includes: a receiver configured to receive a bitstream to decode; a transmitter coupled to the receiver and configured to transmit a decoded image to a display; a memory coupled to at least one of the receiver or the transmitter and configured to store instructions; and a processor coupled to the memory and configured to execute the instructions stored in the memory to perform the method of any of the embodiments disclosed herein.

The coding apparatus provides techniques that limit the set of NAL unit types available for video data to five or fewer specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (i.e., non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with RADL pictures and/or RASL pictures. Additionally, specific NAL unit types may be mapped to different SAP types in DASH. By constraining the set of NAL unit types, the coder/decoder (a.k.a. "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, requires less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers users a better experience when video is sent, received, and/or viewed.

The coding apparatus also provides techniques for the case in which non-IRAP pictures are not identified by NAL unit type. In that case, flags in the bitstream are set to specific values that indicate whether the IRAP picture is associated with a RADL picture or a RASL picture.

An eighth aspect relates to a system. The system includes an encoder and a decoder in communication with the encoder, wherein the encoder or the decoder includes the decoding device, the encoding device, or the coding apparatus disclosed herein.

The system provides techniques that limit the set of NAL unit types available for video data to five or fewer specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (i.e., non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with RADL pictures and/or RASL pictures. Additionally, specific NAL unit types may be mapped to different SAP types in DASH. By constraining the set of NAL unit types, the coder/decoder (a.k.a. "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, requires less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers users a better experience when video is sent, received, and/or viewed.

The system also provides techniques for the case in which non-IRAP pictures are not identified by NAL unit type. In that case, flags in the bitstream are set to specific values that indicate whether the IRAP picture is associated with a RADL picture or a RASL picture.

A ninth aspect relates to a means for coding. The means for coding includes: receiving means configured to receive a bitstream to decode; transmitting means coupled to the receiving means and configured to transmit a decoded image to a display means; storage means coupled to at least one of the receiving means or the transmitting means and configured to store instructions; and processing means coupled to the storage means and configured to execute the instructions stored in the storage means to perform the methods disclosed herein.

The means for coding provides techniques that limit the set of NAL unit types available for video data to five or fewer specific NAL unit types (e.g., limiting the number of NAL unit types to four). This allows leading and trailing pictures (i.e., non-IRAP pictures) to share the same NAL unit type. It also allows the NAL unit types to indicate whether an IRAP picture is associated with RADL pictures and/or RASL pictures. Additionally, specific NAL unit types may be mapped to different SAP types in DASH. By constraining the set of NAL unit types, the coder/decoder (a.k.a. "codec") in video coding is improved relative to current codecs (e.g., it uses fewer bits, requires less bandwidth, and is more efficient). As a practical matter, the improved video coding process offers users a better experience when video is sent, received, and/or viewed.

The means for coding also provides techniques for the case in which non-IRAP pictures are not identified by NAL unit type. In that case, flags in the bitstream are set to specific values that indicate whether the IRAP picture is associated with a RADL picture or a RASL picture.

For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a block diagram illustrating an example coding system that may utilize bi-directional prediction techniques.
FIG. 2 is a block diagram illustrating an example video encoder that may implement bi-directional prediction techniques.
FIG. 3 is a block diagram illustrating an example of a video decoder that may implement bi-directional prediction techniques.
FIG. 4 is a schematic diagram of an embodiment of a video bitstream.
FIG. 5 is a representation of the relationship between an IRAP picture and its leading and trailing pictures in decoding order and presentation order.
FIG. 6 is an embodiment of a method of encoding a video bitstream.
FIG. 7 is an embodiment of a method of decoding a coded video bitstream.
FIG. 8 is an embodiment of a method of encoding a video bitstream.
FIG. 9 is an embodiment of a method of decoding a coded video bitstream.
FIG. 10 is a schematic diagram of a video coding device.
FIG. 11 is a schematic diagram of an embodiment of a means for coding.

The following abbreviations are used herein: Coding Tree Block (CTB), Coding Tree Unit (CTU), Coding Unit (CU), Coded Video Sequence (CVS), Joint Video Experts Team (JVET), Motion-Constrained Tile Set (MCTS), Maximum Transfer Unit (MTU), Network Abstraction Layer (NAL), Picture Order Count (POC), Picture Parameter Set (PPS), Raw Byte Sequence Payload (RBSP), Sequence Parameter Set (SPS), Versatile Video Coding (VVC), and Working Draft (WD).

FIG. 1 is a block diagram illustrating an example coding system 10 that may utilize the video coding techniques described herein. As shown in FIG. 1, coding system 10 includes a source device 12 that provides encoded video data to be decoded at a later time by a destination device 14. In particular, source device 12 may provide the video data to destination device 14 via a computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a wide range of devices. These include desktop computers, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful for facilitating communication from source device 12 to destination device 14.

In some examples, encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device by an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media. Examples include a hard drive, Blu-ray discs, digital video discs (DVDs), compact disc read-only memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. Example file servers include a web server (e.g., for a website), a file transfer protocol (FTP) server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications. These include over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, coding system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. Destination device 14 includes input interface 28, video decoder 30, and display device 32. In accordance with this disclosure, video encoder 20 of source device 12 and/or video decoder 30 of destination device 14 may be configured to apply the techniques for video coding. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source, such as an external camera. Likewise, destination device 14 may interface with an external display device rather than including an integrated display device.

The illustrated coding system 10 of FIG. 1 is merely one example. Techniques for video coding may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video coding device, they may also be performed by a video encoder/decoder, typically referred to as a "CODEC." Moreover, the techniques of this disclosure may also be performed by a video preprocessor. The video encoder and/or decoder may be a graphics processing unit (GPU) or a similar device.

Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 may operate in a substantially symmetrical manner, such that each of source and destination devices 12, 14 includes video encoding and decoding components. Hence, coding system 10 may support one-way or two-way video transmission between video devices 12, 14, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video.

In some cases, when video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output by output interface 22 onto computer-readable medium 16.

Computer-readable medium 16 may include transient media, such as a wireless broadcast or wired network transmission. It may also include storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, or Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable medium 16 may be understood to include one or more computer-readable media of various forms, in various examples.

Input interface 28 of destination device 14 receives information from computer-readable medium 16. This information may include syntax information that is defined by video encoder 20 and also used by video decoder 30. That syntax information includes syntax elements describing characteristics and/or processing of blocks and other coded units, e.g., a group of pictures (GOP). Display device 32 displays the decoded video data to a user. It may comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video coding standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards. Examples are the International Telecommunications Union Telecommunication Standardization Sector (ITU-T) H.264 standard (alternatively referred to as Moving Picture Expert Group (MPEG)-4 Part 10, Advanced Video Coding (AVC)), H.265/HEVC, or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in FIG. 1, in some aspects video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder. They may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the user datagram protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry. This includes one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium. It may then execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

FIG. 2 is a block diagram illustrating an example of video encoder 20 that may implement video coding techniques. Video encoder 20 may perform intra-coding and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial-based coding modes. Inter-modes may refer to any of several temporal-based coding modes. Examples are uni-directional prediction (a.k.a. uni-prediction, P mode) and bi-directional prediction (a.k.a. bi-prediction, B mode).
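To make the spatial/temporal distinction concrete, here is a minimal sketch of one predictor from each family. It assumes blocks are lists of rows of integer samples, and the function names are hypothetical. The DC predictor is simplified (real codecs also filter block edges), and the inter predictor only handles integer motion vectors.

```python
# Minimal sketch: blocks are lists of rows of integer samples (assumed layout).

def intra_dc_predict(top, left, size):
    """Spatial (intra) prediction: fill the block with the rounded mean of the
    already-reconstructed samples directly above and to the left."""
    total = sum(top[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)  # rounded mean of 2 * size samples
    return [[dc] * size for _ in range(size)]


def inter_copy_predict(ref_frame, x, y, mv_x, mv_y, size):
    """Temporal (inter) prediction with an integer motion vector: copy the
    size x size block displaced by (mv_x, mv_y) from a previously coded frame."""
    return [row[x + mv_x : x + mv_x + size]
            for row in ref_frame[y + mv_y : y + mv_y + size]]


ref = [[r * 10 + c for c in range(6)] for r in range(6)]
print(intra_dc_predict([10] * 4, [20] * 4, 4)[0])  # [15, 15, 15, 15]
print(inter_copy_predict(ref, 1, 1, 1, 2, 2))      # [[32, 33], [42, 43]]
```

The intra predictor needs only samples of the current picture. The inter predictor needs a previously decoded picture, which is why the encoder keeps reference frame memory 64.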

As shown in FIG. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of FIG. 2, video encoder 20 includes mode selection unit 40, reference frame memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy coding unit 56. Mode selection unit 40, in turn, includes motion compensation unit 44, motion estimation unit 42, intra prediction unit 46, and partition unit 48. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries and remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional filters (in-loop or post-loop) may also be used in addition to the deblocking filter. Such filters are not shown for brevity but, if desired, may filter the output of summer 50 (as an in-loop filter).
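The encoder carries its own inverse quantization and inverse transform path so that its reference pictures match what the decoder will reconstruct. A minimal sketch of that loop for a short run of samples is shown below. It assumes a plain uniform scalar quantizer and omits the transform (treated as identity); the helper names are hypothetical, and the comments map each step to the units of FIG. 2.

```python
def quantize(values, step):
    """Uniform scalar quantizer, rounding half away from zero (simplified)."""
    levels = []
    for v in values:
        mag = (2 * abs(v) + step) // (2 * step)  # floor(|v| / step + 1/2)
        levels.append(mag if v >= 0 else -mag)
    return levels


def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [lvl * step for lvl in levels]


def encode_and_reconstruct(current, prediction, step):
    """One pass of the encoder's reconstruction loop (transform omitted)."""
    residual = [c - p for c, p in zip(current, prediction)]              # summer 50
    levels = quantize(residual, step)                                     # quantization unit 54
    recon_residual = dequantize(levels, step)                             # inverse quantization unit 58
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]   # summer 62
    return levels, reconstructed


levels, recon = encode_and_reconstruct([100, 104, 97, 90], [98] * 4, step=4)
print(levels)  # [1, 2, 0, -2]
print(recon)   # [102, 106, 98, 90]
```

The reconstructed samples differ from the originals by the quantization error. Building predictions from these values, rather than from the originals, keeps encoder and decoder in step.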

During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into multiple video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Intra prediction unit 46 may alternatively perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded, to provide spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.

Moreover, partition unit 48 may partition blocks of video data into sub-blocks based on an evaluation of previous partitioning schemes in previous coding passes. For example, partition unit 48 may initially partition a frame or slice into largest coding units (LCUs). It may then partition each LCU into sub-coding units (sub-CUs) based on rate-distortion analysis (e.g., rate-distortion optimization). Mode selection unit 40 may further produce a quad-tree data structure indicating how an LCU is partitioned into sub-CUs. Leaf-node CUs of the quad-tree may include one or more prediction units (PUs) and one or more transform units (TUs).

This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the context of HEVC, or to similar data structures in the context of other standards (e.g., macroblocks and sub-blocks thereof in H.264/AVC). A CU includes a coding node and the PUs and TUs associated with the coding node. The size of the CU corresponds to the size of the coding node and is square in shape. The size of the CU may range from 8x8 pixels up to the size of the treeblock, with a maximum of 64x64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ depending on whether the CU is encoded in skip or direct mode, in intra-prediction mode, or in inter-prediction mode. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quad-tree. A TU can be square or non-square (e.g., rectangular) in shape.
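The quad-tree partitioning described in the two paragraphs above can be sketched as a recursive decision: code a square block whole, or split it into four quadrants, whichever is cheaper. The sketch below assumes a 64x64 LCU and an 8x8 minimum CU, matching the size range in the text. The `cost` callback is hypothetical and stands in for a real rate-distortion measurement. An actual encoder would also count the bits for signaling each split flag, which this sketch ignores.

```python
def partition(x, y, size, cost, min_size=8):
    """Recursively choose between coding a square block whole or splitting it
    into four quadrants, keeping whichever has the lower total cost.
    `cost(x, y, size)` is a hypothetical callback giving the cost of coding the
    block without further splitting. Returns (total_cost, leaf_blocks)."""
    whole = cost(x, y, size)
    if size <= min_size:
        return whole, [(x, y, size)]
    half = size // 2
    split_cost, leaves = 0, []
    for dy in (0, half):
        for dx in (0, half):
            c, sub = partition(x + dx, y + dy, half, cost, min_size)
            split_cost += c
            leaves.extend(sub)
    if split_cost < whole:
        return split_cost, leaves
    return whole, [(x, y, size)]


def toy_cost(x, y, size):
    # Blocks larger than 8x8 anchored at the top-left corner are expensive,
    # so only that corner gets split all the way down to 8x8.
    return 1000 if (x, y) == (0, 0) and size > 8 else 1


total, leaves = partition(0, 0, 64, toy_cost)
print(total, len(leaves))  # 10 10
```

The returned leaf list is effectively the set of leaf-node CUs of the quad-tree: four 8x8, three 16x16, and three 32x32 blocks in this toy case.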

Mode selection unit 40 may select one of the coding modes, intra or inter, e.g., based on error results. It provides the resulting intra- or inter-coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded block for use as a reference frame. Mode selection unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy coding unit 56.

Motion estimation unit 42 and motion compensation unit 44 may be highly integrated but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector may indicate, for example, the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference frame (or other coded unit). That displacement is measured relative to the current block being coded within the current frame (or other coded unit). A predictive block is a block found to closely match the block to be coded in terms of pixel difference. The difference may be measured by the sum of absolute differences (SAD), the sum of squared differences (SSD), or another difference metric. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference frame memory 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
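A minimal sketch of integer-pel motion estimation with SAD as the difference metric follows. It uses an exhaustive search over a square window; the function names and the window size are assumptions for illustration. Practical encoders use faster search patterns and then refine to the fractional positions mentioned above, and that interpolation step is not shown here.

```python
def sad(cur_block, ref, x, y):
    """Sum of absolute differences between cur_block and the equally sized
    block whose top-left corner is (x, y) in the reference frame."""
    return sum(abs(c - ref[y + j][x + i])
               for j, row in enumerate(cur_block)
               for i, c in enumerate(row))


def full_search(cur_block, ref, bx, by, search_range):
    """Exhaustive integer-pel search around (bx, by) for the motion vector
    that minimizes SAD. Candidates falling outside the frame are skipped.
    Returns (best_sad, (dx, dy))."""
    h, w = len(cur_block), len(cur_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + h > len(ref) or x + w > len(ref[0]):
                continue
            cost = sad(cur_block, ref, x, y)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [[19, 20], [27, 28]]              # content located at (3, 2) in ref
print(full_search(cur, ref, 2, 2, 2))   # (0, (1, 0))
```

Replacing `sad` with a squared-difference sum gives the SSD variant named in the text. The search structure stays the same.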

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference frame memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy coding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation unit 42. Again, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Summer 50 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values, as discussed below. In general, motion estimation unit 42 performs motion estimation relative to the luma components. Motion compensation unit 44 uses the motion vectors calculated from the luma components for both the chroma components and the luma components. Mode selection unit 40 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
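The sketch below covers three steps from the paragraph above: fetching the predictive block, forming the residual, and reusing a luma motion vector for chroma. It assumes integer-pel fetches and 4:2:0 sampling, and the function names are hypothetical. For chroma, it follows the HEVC-style convention in which a quarter-pel luma motion vector component is read directly in 1/8-pel chroma units. The split into integer and fractional parts therefore uses a shift by 3 and a mask of 7.

```python
def motion_compensate(ref, bx, by, mv, h, w):
    """Fetch the h x w predictive block that an integer-pel MV points to."""
    dx, dy = mv
    return [row[bx + dx : bx + dx + w] for row in ref[by + dy : by + dy + h]]


def residual(cur_block, pred_block):
    """Pixel-difference values formed by summer 50 (current minus prediction)."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_block, pred_block)]


def chroma_mv_420(luma_mv_qpel):
    """Reuse one quarter-pel luma MV component for a 4:2:0 chroma plane.
    The same integer is read in 1/8-pel chroma units, giving
    (integer chroma offset, fractional eighths). Python's >> floors, so
    negative vectors also split correctly."""
    return luma_mv_qpel >> 3, luma_mv_qpel & 7


ref = [[r * 8 + c for c in range(8)] for r in range(8)]
pred = motion_compensate(ref, 2, 2, (1, 0), 2, 2)
print(pred)                                 # [[19, 20], [27, 28]]
print(residual([[20, 20], [27, 30]], pred)) # [[1, 0], [0, 2]]
print(chroma_mv_420(10))                    # (1, 2): 2.5 luma px -> 1.25 chroma px
```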

Intra prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra prediction unit 46 may determine an intra-prediction mode to use to encode the current block. In some examples, intra prediction unit 46 may encode the current block using various intra-prediction modes, e.g., during separate encoding passes. Intra prediction unit 46 (or, in some examples, mode selection unit 40) may then select an appropriate intra-prediction mode to use from the tested modes.

For example, intra prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes. It may then select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce it. It also determines the bitrate (that is, the number of bits) used to produce the encoded block. Intra prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
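One common way to combine distortion and rate into a single figure of merit is the Lagrangian cost J = D + λ·R, where the mode with the lowest J wins. The text above speaks of ratios, so this is a widely used alternative and not necessarily the exact measure intended here. The sketch assumes distortion and bit counts have already been measured for each tested mode. The mode labels and numbers are hypothetical.

```python
def select_mode(candidates, lam):
    """Return the mode with the lowest Lagrangian cost J = D + lam * R.
    `candidates` maps a mode label to (distortion, rate_in_bits)."""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical measurements for three tested intra-prediction modes.
tested = {"planar": (120, 30), "dc": (150, 10), "angular_26": (90, 60)}
print(select_mode(tested, 0.5))  # angular_26  (J = 135, 155, 120)
print(select_mode(tested, 2.0))  # dc          (J = 180, 170, 210)
```

The multiplier λ sets the trade-off. A small λ favors low distortion even at a high bit cost, while a large λ favors cheap modes, which is how the same measurements can yield different decisions.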

In addition, intra prediction unit 46 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM). Mode selection unit 40 may determine whether an available DMM mode produces better coding results than an intra-prediction mode and the other DMM modes, e.g., using rate-distortion optimization (RDO). Data for a texture image corresponding to a depth map may be stored in reference frame memory 64. Motion estimation unit 42 and motion compensation unit 44 may also be configured to inter-predict depth blocks of a depth map.

After selecting an intra prediction mode for a block (e.g., a conventional intra prediction mode or one of the DMM modes), intra prediction unit 46 may provide information indicative of the selected intra prediction mode for the block to entropy coding unit 56. Entropy coding unit 56 may encode the information indicating the selected intra prediction mode. Video encoder 20 may include, in the transmitted bitstream configuration data, definitions of encoding contexts for various blocks and, for each of the contexts, indications of a most probable intra prediction mode, an intra prediction mode index table, and a modified intra prediction mode index table to use. The configuration data may include a plurality of intra prediction mode index tables and a plurality of modified intra prediction mode index tables (also referred to as codeword mapping tables).

Video encoder 20 forms a residual video block by subtracting the prediction data from mode selection unit 40 from the original video block being coded. Summer 50 represents the component or components that perform this subtraction operation.

Transform processing unit 52 applies a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to the DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used.

Transform processing unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy coding unit 56 may perform the scan.
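The residual, transform, and quantization steps above can be sketched as follows. This is a minimal illustration using a floating-point orthonormal 2-D DCT-II and a uniform scalar quantizer. The step size assumes the HEVC-like rule that the step roughly doubles every 6 QP. Real codecs use integer transform approximations and rate-distortion-optimized quantization, which are not shown. All function names are hypothetical.

```python
import math
from typing import List

Block = List[List[float]]

def residual(original: Block, prediction: Block) -> Block:
    """Residual = original - prediction (the role of summer 50)."""
    return [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(original, prediction)]

def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of a square block (naive O(N^4), for clarity only)."""
    n = len(block)

    def c(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs: Block, qp: int) -> List[List[int]]:
    """Uniform scalar quantization; assumed step = 2^((QP - 4) / 6)."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return [[int(round(c / step)) for c in row] for row in coeffs]

# A flat 4x4 block predicted 10 levels too low gives a pure DC residual.
orig = [[110] * 4 for _ in range(4)]
pred = [[100] * 4 for _ in range(4)]
levels = quantize(dct2(residual(orig, pred)), qp=4)
print(levels[0][0])  # 40; every AC level is 0
```

Raising QP by 6 doubles the step, so `qp=10` halves the DC level to 20. This is the bit-rate reduction that quantization unit 54 trades against distortion.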

Following quantization, entropy coding unit 56 entropy codes the quantized transform coefficients. For example, entropy coding unit 56 may perform context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. In the case of context-based entropy coding, contexts may be based on neighboring blocks. Following the entropy coding by entropy coding unit 56, the encoded bitstream may be transmitted to another device (e.g., video decoder 30) or archived for later transmission or retrieval.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames in reference frame memory 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block for storage in reference frame memory 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
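The reconstruction path above (inverse quantization, inverse transform, addition of the prediction, then clipping) can be sketched as follows. This assumes the same hypothetical floating-point DCT and QP-to-step rule as a plain scalar quantizer, and an 8-bit sample range by default. It illustrates the data flow rather than the integer arithmetic of any particular standard.

```python
import math
from typing import List

def dequantize(levels: List[List[int]], qp: int) -> List[List[float]]:
    """Inverse of a uniform quantizer with assumed step = 2^((QP - 4) / 6)."""
    step = 2.0 ** ((qp - 4) / 6.0)
    return [[lv * step for lv in row] for row in levels]

def idct2(coeffs: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D inverse DCT (DCT-III) of a square block, naive for clarity."""
    n = len(coeffs)

    def c(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (c(u) * c(v) * coeffs[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[x][y] = s
    return out

def reconstruct(levels: List[List[int]], prediction: List[List[int]],
                qp: int, bit_depth: int = 8) -> List[List[int]]:
    """Reconstructed block = clip(prediction + inverse-transformed residual)."""
    res = idct2(dequantize(levels, qp))
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(p + r)), 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, res)]

levels = [[40, 0, 0, 0]] + [[0] * 4 for _ in range(3)]
print(reconstruct(levels, [[100] * 4 for _ in range(4)], qp=4)[0])  # [110, 110, 110, 110]
```

Because the encoder runs this same loop, its reference frame memory 64 holds exactly what the decoder will reconstruct. This keeps the two in sync despite the quantization loss.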

FIG. 3 is a block diagram illustrating an example of video decoder 30 that may implement video coding techniques. In the example of FIG. 3, video decoder 30 includes entropy decoding unit 70, motion compensation unit 72, intra prediction unit 74, inverse quantization unit 76, inverse transform unit 78, reference frame memory 82, and summer 80. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 (FIG. 2). Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70, while intra prediction unit 74 may generate prediction data based on intra prediction mode indicators received from entropy decoding unit 70.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements. Entropy decoding unit 70 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors or intra prediction mode indicators, and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra prediction unit 74 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (e.g., B, P, or GPB) slice, motion compensation unit 72 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 70. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in reference frame memory 82.
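A default construction of List 0 and List 1 can be sketched in a simplified form. The sketch assumes the AVC/HEVC-style ordering for B slices: List 0 starts with the reference pictures that precede the current picture in output order, closest first, followed by those that follow it. List 1 uses the opposite order. Long-term pictures, list size limits, and explicit list modification are ignored, and the function name is hypothetical.

```python
from typing import List, Tuple

def default_ref_lists(current_poc: int, dpb_pocs: List[int]) -> Tuple[List[int], List[int]]:
    """Simplified default initialization of List 0 / List 1 from the POCs in the DPB."""
    before = sorted((p for p in dpb_pocs if p < current_poc), reverse=True)  # closest past first
    after = sorted(p for p in dpb_pocs if p > current_poc)                   # closest future first
    return before + after, after + before

list0, list1 = default_ref_lists(current_poc=4, dpb_pocs=[0, 8, 2, 16])
print(list0, list1)  # [2, 0, 8, 16] [8, 16, 2, 0]
```

Putting the temporally closest pictures first means the smallest reference indices, which are the cheapest to signal, point at the references that are most likely to be useful.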

Motion compensation unit 72 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 72 uses some of the received syntax elements to determine the prediction mode (e.g., intra or inter prediction) used to code the video blocks of the video slice, the inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, the inter prediction status for each inter-coded video block of the slice, and other information needed to decode the video blocks in the current video slice.

Motion compensation unit 72 may also perform interpolation based on interpolation filters. Motion compensation unit 72 may use the interpolation filters that video encoder 20 used during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 72 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use those interpolation filters to produce predictive blocks.
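Sub-integer interpolation can be illustrated with one horizontal pass of the HEVC 8-tap half-sample luma filter (-1, 4, -11, 40, 40, -11, 4, -1), whose taps sum to 64. This sketch is simplified: it replicates border samples, rounds and clips immediately, and skips the higher-precision intermediate values that a real decoder keeps for two-dimensional and bi-predicted interpolation. The function name is hypothetical.

```python
from typing import List

HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # HEVC luma half-sample filter, sum = 64

def half_pel_row(row: List[int], bit_depth: int = 8) -> List[int]:
    """Half-sample values between row[i] and row[i + 1] for each i."""
    n = len(row)
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            idx = min(max(i - 3 + k, 0), n - 1)  # taps cover row[i-3] .. row[i+4], borders replicated
            acc += tap * row[idx]
        out.append(min(max((acc + 32) >> 6, 0), max_val))  # divide by 64 with rounding, then clip
    return out

ramp = [0, 10, 20, 30, 40, 50, 60, 70]
print(half_pel_row(ramp)[3])  # 35, halfway between 30 and 40
```

Using the same fixed filter in the encoder and the decoder is what lets motion compensation unit 72 reproduce the encoder's sub-pel predictions exactly.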

Data for a texture image corresponding to a depth map may be stored in reference frame memory 82. Motion compensation unit 72 may also be configured to inter-predict depth blocks of the depth map.

Image and video compression has developed rapidly, leading to various coding standards. These video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Part 2, ITU-T H.262 or ISO/IEC MPEG-2 Part 2, ITU-T H.263, ISO/IEC MPEG-4 Part 2, Advanced Video Coding (AVC), also known as ITU-T H.264 or ISO/IEC MPEG-4 Part 10, and High Efficiency Video Coding (HEVC), also known as ITU-T H.265 or MPEG-H Part 2. AVC includes extensions such as Scalable Video Coding (SVC), Multiview Video Coding (MVC), Multiview Video Coding plus Depth (MVC+D), and 3D AVC (3D-AVC). HEVC includes extensions such as Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC).

There is also a new video coding standard, called Versatile Video Coding (VVC), being developed by the Joint Video Experts Team (JVET) of ITU-T and ISO/IEC. The latest Working Draft (WD) of VVC is included in JVET-L1001-v5, which is publicly available at http://phenix.it-sudparis.eu/jvet/doc_end_user/documents/12_Macao/wg11/JVET-L1001-v11.zip.

Intra random access point (IRAP) pictures and leading pictures are discussed.

In HEVC, the following pictures are considered intra random access point (IRAP) pictures: IDR, BLA, and CRA pictures. For VVC, it was agreed at the 12th JVET meeting in October 2018 to have both IDR and CRA pictures as IRAP pictures.

IRAP pictures provide the following two important functionalities/benefits. First, the presence of an IRAP picture indicates that the decoding process can start from that picture. This functionality enables a random access feature: the decoding process can start at any position in a bitstream where an IRAP picture is present, not necessarily at the beginning of the bitstream. Second, the presence of an IRAP picture refreshes the decoding process, so that coded pictures starting at the IRAP picture, excluding RASL pictures, are coded without any reference to previous pictures. Consequently, an IRAP picture in the bitstream stops any error that occurred while decoding coded pictures before it from propagating to the IRAP picture and to the pictures that follow it in decoding order.
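The random access behavior described above amounts to scanning forward from a seek position to the first IRAP picture in decoding order. This is a minimal sketch that assumes the pictures are already labeled with HEVC-style NAL unit type names. The function name and the picture sequence are hypothetical.

```python
from typing import List, Optional

# HEVC NAL unit type names that identify IRAP pictures.
IRAP_TYPES = {"BLA_W_LP", "BLA_W_RADL", "BLA_N_LP", "IDR_W_RADL", "IDR_N_LP", "CRA"}

def random_access_start(nal_types: List[str], seek_index: int) -> Optional[int]:
    """Index of the first IRAP picture at or after seek_index (decoding order), or None."""
    for i in range(seek_index, len(nal_types)):
        if nal_types[i] in IRAP_TYPES:
            return i
    return None

stream = ["IDR_W_RADL", "TRAIL", "TRAIL", "CRA", "RASL", "TRAIL"]
print(random_access_start(stream, 1))  # 3: decoding can begin at the CRA picture
```

A seek that lands on a non-IRAP picture cannot start decoding there, because that picture may reference pictures the decoder never received.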

While IRAP pictures provide important functionalities, they come with a penalty to compression efficiency: the presence of an IRAP picture causes a surge in bitrate. This penalty has two causes. First, because an IRAP picture is intra-predicted, it requires relatively more bits to represent than inter-predicted pictures. Second, an IRAP picture breaks temporal prediction, because the decoder refreshes the decoding process, and one of the actions of that refresh is to remove previous reference pictures from the DPB. As a result, pictures that follow the IRAP picture in decoding order have fewer reference pictures available for inter prediction, which makes their coding less efficient (i.e., they need more bits to represent).

Among the picture types considered to be IRAP pictures, the IDR picture in HEVC has different signaling and derivation compared to the other picture types. Some of the differences are as follows:

- For signaling and derivation of the POC value of an IDR picture, the most significant bit (MSB) part of the POC is not derived from the previous key picture; it is simply set equal to 0.

- For the signaling information needed for reference picture management, the slice header of an IDR picture does not contain the information that must be signaled to assist reference picture management. For other picture types (i.e., CRA, trailing, temporal sub-layer access (TSA), etc.), information such as the reference picture set (RPS) described in the section below, or similar information in other forms (e.g., reference picture lists), is needed for the reference picture marking process (i.e., the process of determining whether each reference picture in the DPB is used for reference or unused for reference). For an IDR picture, however, such information does not need to be signaled, because the presence of an IDR picture indicates that the decoding process shall simply mark all reference pictures in the DPB as unused for reference.
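Both IDR differences can be sketched in a few lines. The POC derivation follows the HEVC-style rule that infers a wrap-around of the signaled LSBs relative to the previous reference (key) picture, and it forces the MSB to 0 for an IDR picture. The DPB is modeled as a hypothetical dictionary from POC to a "used for reference" flag, and the RPS is reduced to a plain set of POCs. Both are simplifying assumptions.

```python
from typing import Dict, Set

def derive_poc(poc_lsb: int, prev_poc_lsb: int, prev_poc_msb: int,
               max_poc_lsb: int, is_idr: bool) -> int:
    """POC = MSB + LSB, with the MSB inferred from the previous key picture."""
    if is_idr:
        msb = 0                              # IDR: MSB is not derived, simply 0
    elif poc_lsb < prev_poc_lsb and prev_poc_lsb - poc_lsb >= max_poc_lsb // 2:
        msb = prev_poc_msb + max_poc_lsb     # LSB wrapped around going forward
    elif poc_lsb > prev_poc_lsb and poc_lsb - prev_poc_lsb > max_poc_lsb // 2:
        msb = prev_poc_msb - max_poc_lsb     # LSB wrapped around going backward
    else:
        msb = prev_poc_msb
    return msb + poc_lsb

def update_reference_marking(dpb: Dict[int, bool], is_idr: bool, rps_pocs: Set[int]) -> None:
    """IDR: mark everything unused; otherwise keep only pictures listed in the RPS."""
    for poc in dpb:
        dpb[poc] = False if is_idr else (poc in rps_pocs)

print(derive_poc(2, 14, 0, 16, is_idr=False))  # 18: LSB went 14 -> 2, so the MSB advanced
```

The IDR branches need no signaled history at all. This is exactly why an IDR slice header can omit the reference picture management information.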

In addition to the IRAP picture concept, there are also leading pictures which, if present, are associated with an IRAP picture. Leading pictures are pictures that follow their associated IRAP picture in decoding order but precede it in output order. Depending on the coding configuration and the picture referencing structure, leading pictures are further divided into two types. The first type is leading pictures that may not be decoded correctly if the decoding process starts at their associated IRAP picture. This can happen because these leading pictures are coded with reference to pictures that precede the IRAP picture in decoding order. Such leading pictures are called random access skipped leading (RASL) pictures. The second type is leading pictures that shall be decoded correctly even if the decoding process starts at their associated IRAP picture. This is possible because these leading pictures are coded without referencing, directly or indirectly, any picture that precedes the IRAP picture in decoding order. Such leading pictures are called random access decodable leading (RADL) pictures. In HEVC, when RASL and RADL pictures are present, there is a constraint that, for RASL and RADL pictures associated with the same IRAP picture, the RASL pictures shall precede the RADL pictures in output order.
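The two leading-picture types can be illustrated with a small sketch. It makes two simplifying assumptions. First, decoding starts at `pics[0]`, an IRAP picture, and the RASL pictures of that starting IRAP are dropped, while RASL pictures of any later IRAP remain decodable. Second, output order is represented by POC. The `Pic` type and the simplified type labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Pic:
    poc: int       # output order position
    nal_type: str  # simplified labels: "IDR_W_RADL", "IDR_N_LP", "CRA", "RASL", "RADL", "TRAIL"

IRAP = {"IDR_W_RADL", "IDR_N_LP", "CRA"}

def decodable_after_random_access(pics: List[Pic]) -> List[Pic]:
    """Pictures (decoding order) that can be decoded when decoding starts at pics[0]."""
    out: List[Pic] = []
    skipping = True
    for i, p in enumerate(pics):
        if i > 0 and p.nal_type in IRAP:
            skipping = False  # RASL pictures of later IRAPs have their references available
        if skipping and p.nal_type == "RASL":
            continue          # references precede the starting IRAP, so skip them
        out.append(p)
    return out

def rasl_precede_radl(leading: List[Pic]) -> bool:
    """HEVC constraint: for one IRAP, every RASL precedes every RADL in output order."""
    rasl = [p.poc for p in leading if p.nal_type == "RASL"]
    radl = [p.poc for p in leading if p.nal_type == "RADL"]
    return not rasl or not radl or max(rasl) < min(radl)

pics = [Pic(8, "CRA"), Pic(5, "RASL"), Pic(6, "RASL"), Pic(7, "RADL"), Pic(12, "TRAIL")]
print([p.poc for p in decodable_after_random_access(pics)])  # [8, 7, 12]
```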

In HEVC and VVC, IRAP pictures and leading pictures are given different NAL unit types so that system-level applications can easily identify them. For example, a video splicer needs to understand coded picture types without having to understand too much detail of the syntax elements in the coded bitstream. In particular, it needs to distinguish IRAP pictures from non-IRAP pictures, and leading pictures from trailing pictures, including telling RASL and RADL pictures apart. Trailing pictures are pictures that are associated with an IRAP picture and follow it in output order. A picture associated with a particular IRAP picture is a picture that follows that IRAP picture in decoding order and precedes any other IRAP picture in decoding order. Giving IRAP and leading pictures their own NAL unit types therefore helps such applications.
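The definitions above (association in decoding order, leading versus trailing in output order) can be sketched as a single pass over the pictures in decoding order. This assumes output order is given by POC and that each picture is reduced to a hypothetical `(poc, is_irap)` pair.

```python
from typing import List, Optional, Tuple

def classify(pics: List[Tuple[int, bool]]) -> List[str]:
    """Label each (poc, is_irap) picture, given in decoding order, relative to its IRAP."""
    labels: List[str] = []
    irap_poc: Optional[int] = None
    for poc, is_irap in pics:
        if is_irap:
            irap_poc = poc                 # later pictures associate with this IRAP
            labels.append("IRAP")
        elif irap_poc is None:
            labels.append("unassociated")  # no preceding IRAP in decoding order
        elif poc < irap_poc:
            labels.append("leading")       # after the IRAP in decoding, before it in output
        else:
            labels.append("trailing")      # after the IRAP in both orders
    return labels

order = [(16, True), (12, False), (14, False), (20, False), (18, False), (32, True), (30, False)]
print(classify(order))
```

A splicer that can read the NAL unit type directly does not need to run this POC comparison at all. That is the motivation for distinct NAL unit types.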

In HEVC, the NAL unit types for IRAP pictures and leading pictures include:

- BLA with leading pictures (BLA_W_LP): NAL unit of a broken link access (BLA) picture that may be followed by one or more leading pictures in decoding order.

- BLA with RADL (BLA_W_RADL): NAL unit of a BLA picture that may be followed by one or more RADL pictures, but no RASL pictures, in decoding order.

- BLA with no leading pictures (BLA_N_LP): NAL unit of a BLA picture that is not followed by any leading picture in decoding order.

- IDR with RADL (IDR_W_RADL): NAL unit of an IDR picture that may be followed by one or more RADL pictures, but no RASL pictures, in decoding order.

- IDR with no leading pictures (IDR_N_LP): NAL unit of an IDR picture that is not followed by any leading picture in decoding order.

- CRA: NAL unit of a clean random access (CRA) picture that may be followed by leading pictures (i.e., RASL pictures, RADL pictures, or both).

- RADL: NAL unit of a RADL picture.

- RASL: NAL unit of a RASL picture.
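A system-level application can recover these types from the two-byte HEVC NAL unit header alone: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits), and nuh_temporal_id_plus1 (3 bits). The sketch below parses that header and maps the type values of the list above. In the HEVC specification, RADL and RASL each come as _N/_R variants, and values 16 to 23 form the IRAP range (22 and 23 are reserved). The function names are hypothetical.

```python
from typing import Tuple

HEVC_NAL_TYPES = {
    6: "RADL_N", 7: "RADL_R", 8: "RASL_N", 9: "RASL_R",
    16: "BLA_W_LP", 17: "BLA_W_RADL", 18: "BLA_N_LP",
    19: "IDR_W_RADL", 20: "IDR_N_LP", 21: "CRA_NUT",
}

def parse_hevc_nal_header(data: bytes) -> Tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from a 2-byte HEVC NAL header."""
    if len(data) < 2:
        raise ValueError("HEVC NAL unit header is 2 bytes")
    b0, b1 = data[0], data[1]
    if b0 >> 7:
        raise ValueError("forbidden_zero_bit is set")
    nal_unit_type = (b0 >> 1) & 0x3F
    nuh_layer_id = ((b0 & 0x01) << 5) | (b1 >> 3)
    temporal_id = (b1 & 0x07) - 1  # nuh_temporal_id_plus1 - 1
    return nal_unit_type, nuh_layer_id, temporal_id

def is_irap(nal_unit_type: int) -> bool:
    return 16 <= nal_unit_type <= 23

t, layer, tid = parse_hevc_nal_header(b"\x2a\x01")
print(HEVC_NAL_TYPES[t], layer, tid, is_irap(t))  # CRA_NUT 0 0 True
```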

For VVC, as of the writing of this document, the NAL unit types for IRAP pictures and leading pictures are not yet clear or determined.

File format standards are discussed.

File format standards include the ISO base media file format (ISOBMFF, ISO/IEC 14496-12) and other file format standards derived from ISOBMFF, including the MPEG-4 file format (ISO/IEC 14496-14), the 3GPP file format (3GPP TS 26.244), and the AVC file format (ISO/IEC 14496-15). ISO/IEC 14496-12 specifies the ISO base media file format, and other documents extend it for specific applications. For example, ISO/IEC 14496-15 describes the carriage of NAL unit structured video in the ISO base media file format. H.264/AVC and HEVC, as well as their extensions, are examples of NAL unit structured video. ISO/IEC 14496-15 includes sections describing the carriage of H.264/AVC NAL units. Additionally, section 8 of ISO/IEC 14496-15 describes the carriage of HEVC NAL units. Thus, section 8 of ISO/IEC 14496-15 may be said to describe the HEVC file format.

ISOBMFF is used as the basis for many codec encapsulation formats, such as the AVC file format, as well as for many multimedia container formats, such as the MPEG-4 file format, the 3GPP file format (3GP), and the DVB file format. In addition to continuous media such as audio and video, static media such as images, as well as metadata, can be stored in a file conforming to ISOBMFF. Files structured according to ISOBMFF may be used for many purposes, including local media file playback, progressive downloading of a remote file, segments for Dynamic Adaptive Streaming over HTTP (DASH), containers for content to be streamed together with its packetization instructions, and recording of received real-time media streams. Thus, although originally designed for storage, ISOBMFF has proven valuable for streaming, e.g., for progressive download or DASH. For streaming purposes, the movie fragments defined in ISOBMFF can be used.

A file conforming to the HEVC file format may comprise a series of objects called boxes. A box may be an object-oriented building block defined by a unique type identifier and length. A box is the elementary syntax structure in ISOBMFF and may include a four-character coded box type, a byte count of the box, and a payload. In other words, a box may be a syntax structure comprising a coded box type, a byte count of the box, and a payload. In some instances, all data in a file conforming to the HEVC file format may be contained within boxes, with no data in the file outside a box. Thus, an ISOBMFF file may consist of a sequence of boxes, and boxes may contain other boxes. For instance, the payload of a box may include one or more additional boxes.
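The box layout described above can be read with a short loop. Each box starts with a 32-bit big-endian size and a four-character type. A size of 1 means a 64-bit "largesize" follows the type, and a size of 0 means the box extends to the end of the enclosing data. This is a minimal sketch: it does not handle `uuid` extended types or full-box version/flags fields, and the helper names are hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple

def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Serialize a box with a 32-bit size: size(4) + type(4) + payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit largesize follows the type field
            if pos + 16 > end:
                raise ValueError("truncated largesize box header")
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size

ftyp = make_box(b"ftyp", b"isom" + struct.pack(">I", 0x200) + b"isomiso2")
mdat = make_box(b"mdat", b"abcd")
print(list(iter_boxes(ftyp + mdat)))  # [('ftyp', 8, 24), ('mdat', 32, 36)]
```

Because every box carries its own length, a reader can skip box types it does not understand. This is what makes the format easy to extend.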

A file conforming to ISOBMFF may include various types of boxes. For example, a file conforming to ISOBMFF may include a file type box, a media data box, a movie box, movie fragment boxes, and so on. In this example, the file type box includes file type and compatibility information. A media data box may contain samples (e.g., coded pictures). A movie box ("moov") contains metadata for the continuous media streams present in the file. Each of the continuous media streams may be represented in the file as a track. For instance, a movie box may contain metadata about a movie (e.g., logical and timing relationships between samples, and also pointers to the locations of samples). Movie boxes may include several types of sub-boxes. The sub-boxes in a movie box may include one or more track boxes. A track box may include information about an individual track of a movie. A track box may include a track header box that specifies overall information of a single track. In addition, a track box may include a media box that contains a media information box. The media information box may include a sample table box that contains data indexing the media samples in the track. Information in the sample table box may be used to locate samples in time and, for each sample of the track, to determine its type, size, container, and offset within that container. Thus, the metadata for a track is enclosed in a track box ("trak"), while the media content of a track is either enclosed in a media data box ("mdat") or stored directly in a separate file. The media content for tracks comprises (e.g., consists of) a sequence of samples, such as audio or video access units.
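The nesting just described (moov, then trak, mdia, minf, stbl) can be made visible with a recursive walk that descends only into box types whose payload is simply a sequence of child boxes. The container set below is a hand-picked subset, which is an assumption. For brevity, the sketch supports only 32-bit box sizes, and the helper names are hypothetical.

```python
import struct
from typing import List, Optional

# Subset of ISOBMFF boxes whose payload is just a sequence of child boxes.
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "moof", "traf"}

def make_box(box_type: str, payload: bytes = b"") -> bytes:
    return struct.pack(">I", 8 + len(payload)) + box_type.encode("latin-1") + payload

def box_tree(data: bytes, start: int = 0, end: Optional[int] = None, depth: int = 0) -> List[str]:
    """Indented outline of box types (32-bit sizes only)."""
    end = len(data) if end is None else end
    lines: List[str] = []
    pos = start
    while pos + 8 <= end:
        size, raw = struct.unpack(">I4s", data[pos:pos + 8])
        if size < 8 or pos + size > end:
            raise ValueError("unsupported or malformed box size")
        box_type = raw.decode("latin-1")
        lines.append("  " * depth + box_type)
        if box_type in CONTAINERS:
            lines.extend(box_tree(data, pos + 8, pos + size, depth + 1))
        pos += size
    return lines

movie = make_box("moov", make_box("trak", make_box("tkhd", b"\x00" * 4)
                                  + make_box("mdia", make_box("minf", make_box("stbl")))))
print("\n".join(box_tree(movie + make_box("mdat", b"abcd"))))
```

The outline puts the track metadata under `moov` and the sample bytes in the separate top-level `mdat` box, matching the split described above.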

ISOBMFF specifies the following types of tracks: a media track, which contains an elementary media stream; a hint track, which either includes media transmission instructions or represents a received packet stream; and a timed metadata track, which comprises time-synchronized metadata. The metadata for each track includes a list of sample description entries, each providing the coding or encapsulation format used in the track and the initialization data needed to process that format. Each sample is associated with one of the sample description entries of the track.

ISOBMFF enables specifying sample-specific metadata with various mechanisms. Specific boxes within the sample table box ("stbl") have been standardized to respond to common needs. The sample table box contains a sample table that holds all the time and data indexing of the media samples in a track. Using the tables in the sample table box, it is possible to locate samples in time, determine their type (e.g., whether or not a sample is an I-frame), and determine their size, container, and offset into that container.
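As an illustration of how sample table contents are used to locate samples in time and to find I-frames, the following Python sketch works on already-parsed entries of two standard sample table boxes: the time-to-sample box ('stts', runs of sample count and sample delta) and the sync sample box ('stss', 1-based numbers of sync samples). The function names are hypothetical and box parsing is omitted.

```python
import bisect
from typing import List, Tuple


def sample_for_decode_time(stts: List[Tuple[int, int]], t: int) -> int:
    """Return the 0-based index of the sample whose decode interval contains t.

    stts: runs of (sample_count, sample_delta) as stored in the time-to-sample
    box, in media timescale units.
    """
    if t < 0:
        raise ValueError("negative decode time")
    first_sample, run_start = 0, 0
    for count, delta in stts:
        run_end = run_start + count * delta
        if t < run_end:
            return first_sample + (t - run_start) // delta
        first_sample += count
        run_start = run_end
    raise ValueError("time is past the last sample")


def preceding_sync_sample(stss: List[int], index: int) -> int:
    """Return the 0-based index of the nearest sync sample at or before index.

    stss: 1-based sample numbers listed in the sync sample box, ascending.
    (When a track has no sync sample box, every sample is a sync sample.)
    """
    pos = bisect.bisect_right(stss, index + 1)
    if pos == 0:
        raise ValueError("no sync sample at or before this index")
    return stss[pos - 1] - 1
```

Combining the two functions gives a simple seek: find the sample covering the requested time, then step back to the nearest sync sample from which decoding can start.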

The movie fragment box ("moof") is a top-level box. Each movie fragment box provides information that would previously have been in the movie box. A movie fragment box may contain one or more track fragment ("traf") boxes. Within a movie fragment there is a set of track fragments, zero or more per track. The track fragments in turn contain zero or more track runs, each of which documents a contiguous run of samples for that track. For example, each track run may contain samples of pictures that are contiguous in a certain order, such as decoding order. The track fragment box is defined in the ISO/IEC 14496-12 specification and includes metadata for one or more track fragments. For example, a track fragment box may include a track fragment header box indicating a track identifier (ID), a base data offset, a sample description index, a default sample duration, a default sample size, and default sample flags. A track fragment box may include one or more track fragment run boxes, each documenting a contiguous set of samples for a track. For example, a track fragment run box may include syntax elements indicating a sample count, a data offset, sample flags, sample durations, sample sizes, sample composition time offsets, and so on. Within these structures, many fields are optional and can be defaulted.
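The interplay between per-sample fields in a track fragment run and the defaults in the track fragment header can be sketched as follows. This is a simplified model, not a parser: the dataclasses hold only a hypothetical subset of the 'tfhd' and 'trun' fields, and it assumes an explicit base data offset and a caller-supplied base decode time (in real files the base offset may be implied, e.g. relative to the 'moof' box, and the decode time is typically carried in a 'tfdt' box).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackFragmentHeader:
    """Subset of 'tfhd' fields used here (hypothetical simplification)."""
    base_data_offset: int
    default_sample_duration: int
    default_sample_size: int


@dataclass
class TrackRun:
    """Subset of 'trun' fields; None means the per-sample field is absent."""
    sample_count: int
    data_offset: int = 0
    sample_sizes: Optional[List[int]] = None
    sample_durations: Optional[List[int]] = None


def resolve_run(tfhd: TrackFragmentHeader, trun: TrackRun,
                base_decode_time: int) -> List[Tuple[int, int, int]]:
    """Return (byte_offset, size, decode_time) for each sample in the run,
    falling back to the tfhd defaults for any absent per-sample field."""
    n = trun.sample_count
    sizes = (trun.sample_sizes if trun.sample_sizes is not None
             else [tfhd.default_sample_size] * n)
    durations = (trun.sample_durations if trun.sample_durations is not None
                 else [tfhd.default_sample_duration] * n)
    offset = tfhd.base_data_offset + trun.data_offset
    t = base_decode_time
    out = []
    for size, duration in zip(sizes, durations):
        out.append((offset, size, t))
        offset += size      # samples of a run are stored contiguously
        t += duration
    return out
```

Because absent fields fall back to the header defaults, a run of equally sized, equally spaced samples can be described by little more than its sample count.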

Stream access points (SAPs) are discussed.

The ISO base media file format (ISOBMFF) defines the concept of a so-called stream access point (SAP). A SAP enables random access into a container of media stream(s). A container may contain more than one media stream, each being an encoded version of continuous media of a certain media type. A SAP is a position in a container enabling playback of an identified media stream to be started using only (a) the information contained in the container starting from that position onwards, and (b) possible initialization data from other part(s) of the container or available externally. Derived specifications should specify whether initialization data is needed to access the container at a SAP and how the initialization data can be accessed.

Six types of SAP are defined, as follows:

Type 1 corresponds to what is known in some coding schemes as a "closed GoP random access point" (in which all access units, in decoding order, starting from the SAP position ISAP can be correctly decoded, resulting in a continuous time sequence of correctly decoded access units with no gaps), and in addition the access unit that is first in decoding order is also the first access unit in composition order.

Type 2 corresponds to what is known in some coding schemes as a "closed GoP random access point", for which the first access unit in decoding order in the media stream is not the first access unit in composition order.

Type 3 corresponds to what is known in some coding schemes as an "open GoP random access point", in which there are some access units in decoding order following the random access point that cannot be correctly decoded, and the random access point access unit may not be the first access unit in composition order.

Type 4 corresponds to what is known in some coding schemes as a "gradual decoding refresh (GDR) starting point".

Type 5 corresponds to the case in which there is at least one access unit, in decoding order starting from the first access unit for decoding, that cannot be correctly decoded and has a presentation time greater than TDEC, where TDEC is the earliest presentation time of any access unit starting from the first access unit for decoding.

Type 6 corresponds to the case in which there is at least one access unit, in decoding order starting from the first access unit for decoding, that cannot be correctly decoded and has a presentation time greater than TDEC, where TDEC is not the earliest presentation time of any access unit starting from the first access unit for decoding.
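The distinction between SAP types 1 to 3 can be expressed as a small decision procedure. The following Python sketch is a simplification that covers only those three types (it does not model GDR or types 5 and 6): it assumes the caller already knows, for each access unit from the SAP onwards in decoding order, its presentation time and whether it can be correctly decoded when decoding starts at the SAP. The function name is hypothetical.

```python
from typing import List, Tuple


def sap_type_1_to_3(aus: List[Tuple[int, bool]]) -> int:
    """Classify a SAP as type 1, 2 or 3.

    aus: (presentation_time, correctly_decodable) for each access unit,
    in decoding order, starting with the SAP access unit itself.
    """
    if not aus:
        raise ValueError("need at least the SAP access unit")
    if not all(ok for _, ok in aus):
        return 3                      # open GoP: some following AUs are undecodable
    earliest = min(pt for pt, _ in aus)
    # Closed GoP: type 1 if the SAP AU is also first in composition order.
    return 1 if aus[0][0] == earliest else 2
```

For instance, an access unit with presentation time 2 followed in decoding order by decodable units with times 0, 1 and 3 yields type 2, and marking the unit with time 0 as undecodable turns it into type 3.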

In HEVC, the NAL unit types for IRAP pictures were designed with, among other goals, easy mapping between IRAP types and SAP types, in particular SAP types 1 to 3.

Dynamic adaptive streaming over HTTP (DASH) is discussed.

Dynamic adaptive streaming over HTTP (DASH), specified in ISO/IEC 23009-1, is a standard for HTTP (adaptive) streaming applications. DASH mainly specifies the format of the media presentation description (MPD), also known as the manifest, and the media segment format. The MPD describes the media available on the server and lets a DASH client autonomously download the appropriate media version at the media time it is interested in.

DASH is based on a hierarchical data model. A presentation is described by an MPD document, which describes the sequence of periods in time that make up the media presentation. A period typically represents a media content period during which a consistent set of encoded versions of the media content is available; for example, the set of available bitrates, languages, captions, subtitles, and so on does not change during a period.

Within a period, material is arranged into adaptation sets. An adaptation set represents a set of interchangeable encoded versions of one or several media content components. For example, there may be one adaptation set for the main video component and a separate one for the main audio component. Other available material, such as captions or audio descriptions, may each have a separate adaptation set. Material may also be provided in multiplexed form, in which case interchangeable versions of the multiplex may be described as a single adaptation set, for example an adaptation set containing both the main audio and the main video for a period. Each of the multiplexed components may be described individually by a media content component description.

An adaptation set contains a set of representations. A representation describes a deliverable encoded version of one or several media content components. A representation includes one or more media streams (one for each media content component in the multiplex). Any single representation within an adaptation set is sufficient to render the contained media content components. By collecting different representations in one adaptation set, the media presentation author expresses that the representations represent perceptually equivalent content. Typically, this means that clients may switch dynamically from representation to representation within an adaptation set in order to adapt to network conditions or other factors. Switching refers to the presentation of decoded data up to a certain time t, and the presentation of decoded data of another representation from time t onwards. If representations are included in one adaptation set and the client switches properly, the media presentation is expected to be perceived seamlessly across the switch. Clients may ignore representations that rely on codecs or other rendering technologies they do not support or that are otherwise unsuitable. Within a representation, the content may be divided in time into segments for proper accessibility and delivery. To access a segment, a URL is provided for each segment. Consequently, a segment is the largest unit of data that can be retrieved with a single HTTP request.
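The period, adaptation set, representation and segment hierarchy described above can be modeled with a few Python dataclasses. This is an illustrative sketch, not an MPD parser: the class and field names are hypothetical (only the bandwidth field mirrors the MPD's Representation@bandwidth attribute), the segment URLs are placeholders, and switching assumes that segments of all representations in the set are time-aligned (which DASH can signal with the @segmentAlignment attribute).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    rep_id: str
    bandwidth: int            # bits per second, like Representation@bandwidth
    segment_urls: List[str]   # one URL per segment, fetched with one HTTP request each


@dataclass
class AdaptationSet:
    content_type: str         # e.g. "video" or "audio"
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    start: float              # seconds from the start of the presentation
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)


def switch_url(aset: AdaptationSet, to_rep_id: str, segment_index: int) -> str:
    """URL of segment `segment_index` in another representation of the same
    adaptation set, assuming the representations' segments are time-aligned."""
    for rep in aset.representations:
        if rep.rep_id == to_rep_id:
            return rep.segment_urls[segment_index]
    raise KeyError(to_rep_id)
```

Because the representations of one adaptation set are interchangeable, a switch at a segment boundary only changes which URL is requested next.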

A typical procedure for DASH-based HTTP streaming includes the following steps:

1) The client obtains the MPD of the streaming content, e.g., a movie. The MPD includes information on the different alternative representations, e.g., the bit rate, video resolution, frame rate, and audio language of the streaming content, as well as the URLs of the HTTP resources (the initialization segment and the media segments).

2) Based on the information in the MPD and the client's local information, e.g., network bandwidth, decoding/display capabilities, and user preferences, the client requests the desired representation(s), one segment (or a part thereof) at a time.

3) When the client detects a change in network bandwidth, it requests segments of a different representation with a better-matching bitrate, ideally starting from a segment that starts with a random access point.
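The bitrate matching in steps 2 and 3 can be sketched as a simple selection rule. The heuristic below (pick the highest-bandwidth representation that fits within a fixed fraction of the measured throughput, otherwise fall back to the lowest) is an assumption for illustration only; DASH does not mandate any particular adaptation algorithm, and the function name and the 0.8 safety factor are hypothetical.

```python
from typing import List, Tuple


def pick_representation(reps: List[Tuple[str, int]], measured_bps: float,
                        safety: float = 0.8) -> str:
    """Choose a representation ID from (rep_id, bandwidth_bps) pairs.

    Returns the highest-bandwidth representation whose bandwidth fits within
    `safety * measured_bps`; if none fits, returns the lowest-bandwidth one.
    """
    if not reps:
        raise ValueError("no representations available")
    budget = measured_bps * safety
    fitting = [r for r in reps if r[1] <= budget]
    if fitting:
        return max(fitting, key=lambda r: r[1])[0]
    return min(reps, key=lambda r: r[1])[0]
```

A client would re-evaluate this choice whenever its throughput estimate changes and, per step 3, apply the new choice from the next segment that starts with a random access point.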

During an HTTP streaming "session", in response to a user request to seek backward to a past position or forward to a future position, the client requests past or future segments, starting from a segment that is close to the desired position and that ideally starts with a random access point. The user may also request fast-forwarding of the content, which may be realized by requesting only enough data to decode the intra-coded video pictures, or only a temporal subset of the video stream.
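The seek behavior described above can be sketched as a lookup over segment start times. The sketch assumes the client already knows each segment's start time and whether the segment begins with a random access point (how that is signaled is outside its scope); the function name is hypothetical.

```python
import bisect
from typing import List


def seek_segment(starts: List[float], starts_with_rap: List[bool],
                 target: float) -> int:
    """Index of the segment to request when seeking to `target` seconds.

    starts: segment start times (ascending); starts_with_rap[i] tells whether
    segment i begins with a random access point.
    """
    # Latest segment whose start time is not after the target.
    i = max(bisect.bisect_right(starts, target) - 1, 0)
    # Step back until the segment starts with a random access point.
    while i > 0 and not starts_with_rap[i]:
        i -= 1
    return i
```

Stepping back to a segment that starts with a random access point costs some extra download, but guarantees that decoding can begin at the first picture of the requested segment.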

Problems with the existing IRAP and leading picture designs are discussed.

The current design of NAL unit types for leading pictures and IRAP pictures has the following problems:

- Two NAL unit types are provided to identify leading pictures (i.e., RASL and RADL pictures) at the NAL unit header level. The reason is to help system-level applications identify RASL pictures, and remove them from the bitstream by simply parsing the NAL unit header, when decoding starts from their associated IRAP picture. In practice, however, such removal by system applications is rarely performed. Leading pictures and their associated IRAP picture are encapsulated in the same DASH media segment, and in HTTP-based adaptive streaming such a DASH media segment is requested by the client as a whole; the leading pictures are therefore not requested separately from their associated IRAP picture in a way that would let requests for RASL pictures be avoided. Moreover, allowing system applications either to remove or to keep RASL pictures requires the video coding specification to handle both possibilities, with and without RASL pictures, including specifying bitstream conformance for both situations and specifying the hypothetical reference decoder (HRD) with two alternative sets of HRD parameters.

- For IRAP pictures, several NAL unit types are provided to distinguish them by the presence of leading pictures, for the purpose of easy mapping to SAP types 1 to 3. However, because of the flexibility in the definitions of the IRAP NAL unit types, especially the CRA type, in many cases the mapping between IRAP picture types and SAP types cannot be made simply by knowing the NAL unit type; the access units following the IRAP picture still need to be checked. For example, a CRA picture may be followed by zero or more RASL pictures, zero or more RADL pictures, or no leading pictures at all, and may therefore map to SAP type 1, type 2, or type 3. The only way to know how to map a CRA picture to a SAP type is to parse the following access units to see whether there are leading pictures and, if so, which types of leading pictures are present.
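Both problems can be made concrete with a short Python sketch built on the HEVC NAL unit header, whose first byte carries a 6-bit nal_unit_type. The first function extracts that field; the second shows the header-only RASL removal a system application could perform when decoding starts at an IRAP picture (mirroring HEVC, where only the RASL pictures associated with the starting IRAP picture are skipped); the third shows that a CRA picture's SAP type is only known after scanning the NAL unit types that follow it. The helper names are hypothetical, and the sketch assumes start codes have already been removed.

```python
from typing import Iterable, List

# HEVC nal_unit_type values (ITU-T H.265).
RADL_TYPES = {6, 7}          # RADL_N, RADL_R
RASL_TYPES = {8, 9}          # RASL_N, RASL_R
IRAP_RANGE = range(16, 24)   # BLA_*, IDR_*, CRA_NUT and reserved IRAP types
CRA_NUT = 21


def nal_unit_type(nal: bytes) -> int:
    """nal_unit_type from the first byte of a 2-byte HEVC NAL unit header:
    forbidden_zero_bit (1 bit) | nal_unit_type (6 bits) | nuh_layer_id MSB."""
    return (nal[0] >> 1) & 0x3F


def drop_rasl_at_start(nals: List[bytes]) -> List[bytes]:
    """Drop the RASL NAL units of the IRAP picture at which decoding starts.

    nals: NAL units in decoding order (start codes removed). RASL units of
    later IRAP pictures are kept, since their references were decoded.
    """
    out, dropping, seen_irap = [], False, False
    for nal in nals:
        t = nal_unit_type(nal)
        if t in IRAP_RANGE:
            dropping = not seen_irap   # only the first IRAP's RASL pictures are skipped
            seen_irap = True
        if dropping and t in RASL_TYPES:
            continue
        out.append(nal)
    return out


def cra_sap_type(following_types: Iterable[int]) -> int:
    """SAP type (1, 2 or 3) of a CRA picture, found by scanning the
    nal_unit_type values that follow it in decoding order."""
    has_rasl = has_radl = False
    for t in following_types:
        if t >= 32:            # non-VCL NAL unit (parameter set, SEI, ...)
            continue
        if t in RASL_TYPES:
            has_rasl = True
        elif t in RADL_TYPES:
            has_radl = True
        else:                  # a trailing or IRAP picture ends the leading pictures
            break
    if has_rasl:
        return 3
    return 2 if has_radl else 1
```

The removal step needs only the two-byte header, which is what the two leading-picture NAL unit types were meant to enable, whereas `cra_sap_type` has to look past the CRA picture itself before it can answer.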

Disclosed herein are video coding techniques that constrain the set of NAL unit types available for video data to fewer than five specific NAL unit types (e.g., constraining the number of NAL unit types to four). Each NAL unit has a header and is identified by an identifier (ID). Fewer NAL unit types means the ID can be smaller. Thus, the size of each NAL unit can be reduced, which reduces the size of the bitstream (saving memory). This also reduces the size of each packet used to transmit the bitstream, which reduces network resource usage. Moreover, the fewer than five specific NAL unit types allow leading and trailing pictures (a.k.a. non-IRAP pictures) to share the same NAL unit type. This also allows the NAL unit types to indicate whether an IRAP picture is associated with RADL pictures and/or RASL pictures. In addition, the specific NAL unit types can be mapped to different SAP types in DASH. Video coding techniques for the case where non-IRAP pictures are not identified by NAL unit type are also disclosed herein. In that case, flags in the bitstream are set to specific values indicating whether an IRAP picture is associated with RADL pictures or RASL pictures.
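The relationship between the number of NAL unit types and the size of the identifier can be illustrated with a fixed-length code. In the sketch below, the type names and their code values are hypothetical (the text does not specify them), and fixed-length coding is an assumption; for comparison, HEVC's nal_unit_type field is 6 bits wide (64 values). Whether an actual header shrinks by the difference depends on the rest of the header layout.

```python
# Hypothetical names and code values for the four NAL unit types; the actual
# identifiers and their binary values are not specified in this text.
NAL_TYPES = ["LEADING_AND_TRAILING", "IRAP_W_RASL", "IRAP_W_RADL", "IRAP_N_LP"]


def id_bits(num_types: int) -> int:
    """Minimum width of a fixed-length field that can distinguish num_types values."""
    return max(1, (num_types - 1).bit_length())


def encode_type(name: str) -> int:
    """Map a type name to its (hypothetical) fixed-length code value."""
    return NAL_TYPES.index(name)


def decode_type(code: int) -> str:
    """Map a code value back to its type name."""
    return NAL_TYPES[code]
```

With four types, `id_bits(len(NAL_TYPES))` is 2, compared with the 6 bits needed to address HEVC's 64 nal_unit_type values.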

FIG. 4 is a schematic diagram of an embodiment of a video bitstream 400. As used herein, the video bitstream 400 may also be referred to as a coded video bitstream, a bitstream, or variations thereof. As shown in FIG. 4, the bitstream 400 includes a sequence parameter set (SPS) 410, a picture parameter set (PPS) 412, a slice header 414, and image data 420. In practical applications, the slice header 414 may be referred to as a tile group header.

The SPS 410 contains data that is common to all the pictures in a sequence of pictures (SOP). In contrast, the PPS 412 contains data that is common to an entire picture. The slice header 414 contains information about the current slice, such as the slice type and which of the reference pictures will be used. The SPS 410 and the PPS 412 may generically be referred to as parameter sets. The SPS 410, the PPS 412, and the slice header 414 are carried in network abstraction layer (NAL) units. The image data 420 comprises data associated with the images or video being encoded or decoded, and may simply be referred to as the payload or data carried in the bitstream 400.
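The layering of SPS, PPS and slice header can be illustrated by how a decoder finds the parameters that apply to a slice: in HEVC the slice header names a PPS (slice_pic_parameter_set_id) and the PPS names an SPS (pps_seq_parameter_set_id). The Python sketch below models only that referencing chain; the class and field names are hypothetical simplifications, not the actual syntax structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SPS:
    sps_id: int
    pic_width: int      # example of data shared by all pictures in the sequence
    pic_height: int


@dataclass
class PPS:
    pps_id: int
    sps_id: int         # the SPS this PPS depends on


@dataclass
class SliceHeader:
    pps_id: int         # the PPS used by this slice
    slice_type: str     # e.g. "I", "P" or "B"


def active_parameter_sets(sh: SliceHeader, pps_table: Dict[int, PPS],
                          sps_table: Dict[int, SPS]) -> Tuple[PPS, SPS]:
    """Follow slice header -> PPS -> SPS to find the parameters in effect."""
    pps = pps_table[sh.pps_id]
    return pps, sps_table[pps.sps_id]
```

Because many slices share one PPS and many pictures share one SPS, this indirection lets common data be sent once rather than repeated in every slice header.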

In an embodiment, the SPS 410, the PPS 412, the slice header 414, or another portion of the bitstream 400 carries a plurality of reference picture list structures, each of which contains a plurality of reference picture entries. Those skilled in the art will appreciate that the bitstream 400 may contain other parameters and information in practical applications.

FIG. 5 is a representation 500 of the relationship between an I-RAP picture 502 and leading pictures 504 and trailing pictures 506 in a decoding order 508 and a presentation order 510. In an embodiment, the I-RAP picture 502 is referred to as a clean random access (CRA) picture or as an instantaneous decoder refresh (IDR) picture with RADL pictures.

As shown in FIG. 5, the leading pictures 504 (e.g., pictures 2 and 3) follow the I-RAP picture 502 in the decoding order 508 but precede the I-RAP picture 502 in the presentation order 510. The trailing picture 506 follows the I-RAP picture 502 in both the decoding order 508 and the presentation order 510. While two leading pictures 504 and one trailing picture 506 are depicted in FIG. 5, those skilled in the art will appreciate that more or fewer leading pictures 504 and/or trailing pictures 506 may be present in the decoding order 508 and the presentation order 510 in practical applications.

The leading pictures 504 in FIG. 5 are divided into two types, namely RASL and RADL. When decoding starts with the I-RAP picture 502 (e.g., picture 1), the RADL picture (e.g., picture 3) can be properly decoded; however, the RASL picture (e.g., picture 2) cannot be properly decoded, because it references pictures that precede the I-RAP picture 502 in decoding order. Thus, the RASL picture is discarded. Given the distinction between RADL and RASL pictures, the type of leading picture associated with an I-RAP picture should be identified as RADL or RASL for efficient and proper coding.
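The FIG. 5 relationships can be reproduced with a small Python sketch. It assumes each picture carries a picture order count (POC) giving its position in presentation order and a kind label ("IRAP", "RASL", "RADL" or "TRAIL", illustrative only). A picture following the IRAP picture in decoding order is leading exactly when its POC is lower than the IRAP's. When decoding starts at the IRAP picture, the RASL pictures are discarded and the rest are output in POC order. The POC values in the usage example are hypothetical and merely chosen to match the ordering shown in FIG. 5.

```python
from typing import List, NamedTuple


class Pic(NamedTuple):
    name: str
    poc: int    # picture order count: position in presentation order
    kind: str   # "IRAP", "RASL", "RADL" or "TRAIL" (illustrative labels)


def output_from_irap(decode_order: List[Pic]) -> List[str]:
    """Start decoding at the IRAP (first entry), discard its RASL pictures,
    and return the names of the remaining pictures in presentation order."""
    irap = decode_order[0]
    if irap.kind != "IRAP":
        raise ValueError("decoding must start at an IRAP picture")
    kept = []
    for pic in decode_order[1:]:
        is_leading = pic.poc < irap.poc   # follows in decoding, precedes in output
        if is_leading != (pic.kind in ("RASL", "RADL")):
            raise ValueError(f"{pic.name}: label does not match its position")
        if pic.kind == "RASL":
            continue   # references pictures before the IRAP, so it cannot be decoded
        kept.append(pic)
    kept.append(irap)
    return [p.name for p in sorted(kept, key=lambda p: p.poc)]
```

With decoding order picture 1 (IRAP), picture 2 (RASL), picture 3 (RADL), picture 4 (trailing), and presentation order 2, 3, 1, 4, the function outputs pictures 3, 1 and 4.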

FIG. 6 is an embodiment of a method 600 of encoding a video bitstream (e.g., bitstream 400) implemented by a video encoder (e.g., video encoder 20). The method 600 may be performed when a picture (e.g., from a video) is to be encoded into a video bitstream and then transmitted toward a video decoder (e.g., video decoder 30). The method 600 improves the encoding process (e.g., makes it more efficient and faster than conventional encoding processes) because a limited set of NAL unit types is used that identifies, for example, the type of leading picture associated with an I-RAP picture. Thus, as a practical matter, the performance of the codec is improved, which leads to a better user experience.

At block 602, a set of fewer than five NAL unit types available for video data is stored in a memory of the video encoder. In an embodiment, the set of fewer than five NAL unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP with no leading pictures NAL unit type. In an embodiment, the set of fewer than five NAL unit types is limited to only these four NAL unit types. In an embodiment, the leading and trailing pictures NAL unit type is assigned to both leading and trailing pictures (e.g., leading pictures 504 and trailing pictures 506).

At block 604, a NAL unit type is selected from the set of fewer than five NAL unit types for a picture from the video data (e.g., picture 2 or picture 3 in FIG. 5). For example, the NAL unit type for picture 2 in FIG. 5 may be the IRAP with RASL NAL unit type. As another example, the NAL unit type for picture 3 in FIG. 5 may be the IRAP with RADL NAL unit type.

In an embodiment, the IRAP with RASL NAL unit type is selected for an IRAP picture that is followed by one or more RASL pictures and zero or more RADL pictures in decoding order. In an embodiment, the IRAP picture is referred to as a CRA picture. In an embodiment, the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type. In an embodiment, the IRAP with RASL NAL unit type is designated IRAP_W_RASL. In an embodiment, the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

In an embodiment, the IRAP with RADL NAL unit type is selected for an IRAP picture that is followed by one or more RADL pictures and zero RASL pictures in decoding order. In an embodiment, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures. In an embodiment, the IRAP with RADL NAL unit type is referred to as an IDR with RADL NAL unit type. In an embodiment, the IRAP with RADL NAL unit type is designated IRAP_W_RADL. In an embodiment, the IRAP_W_RADL designation corresponds to SAP type 2 in DASH.

In an embodiment, the IRAP with no leading pictures NAL unit type is selected for an IRAP picture that is not followed by any leading picture in decoding order. In an embodiment, the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures. In an embodiment, the IRAP with no leading pictures NAL unit type is referred to as an IDR with no leading pictures NAL unit type. In an embodiment, the IRAP with no leading pictures NAL unit type is designated IRAP_N_LP. In an embodiment, the IRAP_N_LP designation corresponds to SAP type 1 in DASH.
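The selection rules of the preceding embodiments, and the SAP type each IRAP designation corresponds to, can be summarized in a short Python sketch. The designations IRAP_W_RASL, IRAP_W_RADL and IRAP_N_LP come from the text; the name LEADING_AND_TRAILING for the shared non-IRAP type, the string values, and the function names are hypothetical. Because the SAP type follows from the NAL unit type alone, no scan of the following access units is needed.

```python
from enum import Enum
from typing import Iterable, Optional


class NalType(Enum):
    LEADING_AND_TRAILING = "leading and trailing pictures"  # hypothetical name
    IRAP_W_RASL = "IRAP with RASL"
    IRAP_W_RADL = "IRAP with RADL"
    IRAP_N_LP = "IRAP with no leading pictures"


# SAP type corresponding to each IRAP designation.
SAP_TYPE = {NalType.IRAP_N_LP: 1, NalType.IRAP_W_RADL: 2, NalType.IRAP_W_RASL: 3}


def select_nal_type(is_irap: bool, leading_kinds: Iterable[str] = ()) -> NalType:
    """Pick the NAL unit type for a picture.

    leading_kinds: "RASL"/"RADL" labels of the leading pictures that follow
    an IRAP picture in decoding order (ignored for non-IRAP pictures).
    """
    if not is_irap:
        return NalType.LEADING_AND_TRAILING
    kinds = set(leading_kinds)
    if "RASL" in kinds:
        return NalType.IRAP_W_RASL    # one or more RASL, zero or more RADL
    if "RADL" in kinds:
        return NalType.IRAP_W_RADL    # one or more RADL, zero RASL
    return NalType.IRAP_N_LP          # no leading pictures


def sap_type(t: NalType) -> Optional[int]:
    """SAP type for an IRAP NAL unit type; None for the non-IRAP type."""
    return SAP_TYPE.get(t)
```

For example, an IRAP picture followed by RADL and RASL pictures is labeled IRAP_W_RASL, which maps directly to SAP type 3.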

At block 606, a video bitstream (e.g., bitstream 400 of FIG. 4) is generated. The video bitstream contains a NAL unit corresponding to the selected NAL unit type as well as an identifier that identifies the selected NAL unit type. The identifier may be, for example, a flag or a number of bits.

At block 608, the video encoder transmits the video bitstream (e.g., bitstream 400) toward a video decoder. The video bitstream may also be referred to as a coded video bitstream or an encoded video bitstream. Once received by the video decoder, the encoded video bitstream may be decoded (e.g., as described below) to generate or produce an image for display to a user on the display or screen of an electronic device (e.g., a smart phone, tablet, laptop, personal computer, etc.).

FIG. 7 is an embodiment of a method 700 of decoding a coded video bitstream (e.g., bitstream 400) implemented by a video decoder (e.g., video decoder 30). The method 700 may be performed after the coded bitstream has been received directly or indirectly from a video encoder (e.g., video encoder 20). The method 700 improves the decoding process (e.g., makes it more efficient and faster than conventional decoding processes) because a limited set of NAL unit types is used that identifies, for example, the type of leading picture associated with an I-RAP picture. Thus, as a practical matter, the performance of the codec is improved, which leads to a better user experience.

At block 702, a set of fewer than five network abstraction layer (NAL) unit types available for video data is stored. In an embodiment, the set of fewer than five NAL unit types includes a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP with no leading pictures NAL unit type. In an embodiment, the set of fewer than five NAL unit types is limited to only these four NAL unit types. In an embodiment, the leading and trailing pictures NAL unit type is assigned to both leading and trailing pictures (e.g., leading pictures 504 and trailing pictures 506).

At block 704, a coded video bitstream (e.g., bitstream 400) including a NAL unit and an identifier is received. At block 706, the NAL unit type used to encode the NAL unit is determined, from the set of fewer than five NAL unit types, based on the identifier.
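A decoder could realize blocks 704 and 706 with a simple table lookup, as in the sketch below. The numeric identifier values and the function names are purely hypothetical, since no code points are specified here; the point is that any identifier outside the stored set of four types can be rejected as non-conforming.

```python
# Hypothetical identifier values: the text does not specify code points.
NAL_UNIT_TYPES = {
    0: "LEADING_AND_TRAILING",
    1: "IRAP_W_RASL",
    2: "IRAP_W_RADL",
    3: "IRAP_N_LP",
}


def determine_nal_unit_type(identifier: int) -> str:
    """Block 706: look up the NAL unit type signalled by the identifier."""
    if identifier not in NAL_UNIT_TYPES:
        raise ValueError(f"identifier {identifier} is not in the stored set")
    return NAL_UNIT_TYPES[identifier]


def is_irap(nal_unit_type: str) -> bool:
    """Every type except the shared leading/trailing type denotes an IRAP picture."""
    return nal_unit_type != "LEADING_AND_TRAILING"
```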

For example, the NAL unit type for picture 2 of FIG. 5 may be the IRAP with RASL NAL unit type. As another example, the NAL unit type for picture 3 of FIG. 5 may be the IRAP with RADL NAL unit type.

At block 708, a presentation order (e.g., presentation order 510 in FIG. 5) is assigned to the pictures included in the NAL unit based on the determined NAL unit type. The presentation order may be used to generate an image for display to a user on the display or screen of an electronic device (e.g., a smart phone, tablet, laptop, personal computer, etc.).
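As a simplified sketch of block 708, assume a single IRAP period given as `(poc, nal_unit_type)` pairs in decoding order. Because leading pictures follow the IRAP picture in decoding order but have smaller POC values, ordering by POC places them ahead of the IRAP picture in presentation order. The function name, the type strings, and the check on `IRAP_N_LP` are illustrative assumptions; a real decoder outputs pictures through its decoded picture buffer rather than by sorting a list.

```python
from typing import List, Tuple


def presentation_order(decoding_order: List[Tuple[int, str]]) -> List[int]:
    """Return the POC values of one IRAP period in presentation order.

    decoding_order: (poc, nal_unit_type) pairs in bitstream order; the first
    entry must be the IRAP picture. Type strings are illustrative.
    """
    if not decoding_order or not decoding_order[0][1].startswith("IRAP_"):
        raise ValueError("an IRAP period must start with an IRAP picture")
    irap_poc, irap_type = decoding_order[0]
    # Leading pictures come later in decoding order but earlier in output order.
    num_leading = sum(1 for poc, _ in decoding_order[1:] if poc < irap_poc)
    if irap_type == "IRAP_N_LP" and num_leading > 0:
        raise ValueError("an IRAP_N_LP picture cannot have leading pictures")
    return sorted(poc for poc, _ in decoding_order)
```

For instance, an IRAP picture with POC 8 decoded first, followed by pictures with POC 6, 7, and 9, is presented in the order 6, 7, 8, 9.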

FIG. 8 is an embodiment of a method 800 of encoding a video bitstream (e.g., bitstream 400) implemented by a video encoder (e.g., video encoder 20). The method 800 may be performed when a picture (e.g., from a video) is to be encoded into a video bitstream and then transmitted towards a video decoder (e.g., video decoder 30). The method 800 improves the encoding process (e.g., makes it more efficient and faster than conventional encoding processes) because, for example, a flag is set to indicate that the NAL unit for a non-IRAP picture contains either a RADL or a RASL picture. Thus, as a practical matter, the performance of a codec is improved, which leads to a better user experience.

At block 802, a bitstream is generated that includes a NAL unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture. The non-IRAP picture may be a leading picture (e.g., leading picture 504) or a trailing picture (e.g., trailing picture 506).

At block 804, a first flag in the bitstream is set to a first value to indicate that the NAL unit for the non-IRAP picture contains a random access decodable leading (RADL) picture. In one embodiment, the first flag is designated RadlPictureFlag and the second flag is designated RaslPictureFlag. In one embodiment, the first value is one (1). In one embodiment, the first flag is set equal to the first value when the picture order count (POC) value of the non-IRAP picture is less than the POC value of the IRAP picture. In one embodiment, the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no picture other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with the IRAP picture.

At block 806, a second flag in the bitstream is set to the first value to indicate that the NAL unit for the non-IRAP picture contains a random access skipped leading (RASL) picture. In one embodiment, the second flag is set equal to the first value when the POC value of the non-IRAP picture is less than the POC value of the IRAP picture. In one embodiment, the second flag is set equal to the first value when a reference picture list for the non-IRAP picture contains at least one reference picture that either precedes, in decoding order, the IRAP picture associated with the non-IRAP picture or is another RASL picture associated with that IRAP picture.

In one embodiment, the first flag and the second flag may be set to a second value to indicate that the NAL unit for the non-IRAP picture contains neither a RADL picture nor a RASL picture. In one embodiment, the second value is zero (0). In one embodiment, the first flag and the second flag are not both set to the first value for the same non-IRAP picture.
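The conditions for blocks 804 and 806 can be sketched as follows, assuming a single IRAP period in which each picture records its POC, its position in decoding order, and a label for the kind of picture it is. The `Picture` class, the `kind` labels, and `leading_picture_flags` are hypothetical; only the flag names RadlPictureFlag and RaslPictureFlag and the conditions themselves come from the text. The sketch reads the RASL condition as "some reference picture precedes the IRAP picture in decoding order or is another RASL picture of the same IRAP picture", which is how RASL pictures are usually understood.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # identity comparison, so "is irap" checks the same object
class Picture:
    """Hypothetical picture record; kind is "IRAP", "RADL", "RASL" or "TRAIL"."""
    poc: int          # picture order count
    decode_idx: int   # position in decoding order
    kind: str = "TRAIL"
    ref_pic_lists: List[List["Picture"]] = field(default_factory=lambda: [[], []])


def leading_picture_flags(pic: Picture, irap: Picture) -> Tuple[int, int]:
    """Return (RadlPictureFlag, RaslPictureFlag) for a non-IRAP picture."""
    if pic.poc >= irap.poc:
        return 0, 0  # not a leading picture: both flags take the second value
    refs = [ref for rpl in pic.ref_pic_lists for ref in rpl]
    # RASL: a reference precedes the IRAP in decoding order or is another RASL picture.
    if any(ref.decode_idx < irap.decode_idx or ref.kind == "RASL" for ref in refs):
        return 0, 1
    # RADL: every reference is the associated IRAP picture or another RADL picture.
    if all(ref is irap or ref.kind == "RADL" for ref in refs):
        return 1, 0
    raise ValueError("leading picture has a reference not allowed for RADL or RASL")
```

By construction the function never returns (1, 1), which matches the embodiment in which the two flags are not both set to the first value.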

At block 808, the video encoder transmits the video bitstream (e.g., bitstream 400) towards a video decoder. The video bitstream may also be referred to as a coded video bitstream or an encoded video bitstream. Once received by the video decoder, the encoded video bitstream may be decoded (e.g., as described below) to generate an image for display to a user on the display or screen of an electronic device (e.g., a smart phone, tablet, laptop, personal computer, etc.).

FIG. 9 is an embodiment of a method 900 of decoding a coded video bitstream (e.g., bitstream 400) implemented by a video decoder (e.g., video decoder 30). The method 900 may be performed after the coded bitstream has been received directly or indirectly from a video encoder (e.g., video encoder 20). The method 900 improves the decoding process (e.g., makes it more efficient and faster than conventional decoding processes) because, for example, a flag is set to indicate that the NAL unit for a non-IRAP picture contains either a RADL or a RASL picture. Thus, as a practical matter, the performance of a codec is improved, which leads to a better user experience.

At block 902, a coded video bitstream is received that includes a first flag, a second flag, and a NAL unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture. The non-IRAP picture may be a leading picture (e.g., leading picture 504) or a trailing picture (e.g., trailing picture 506).

At block 904, the NAL unit for the non-IRAP picture is determined to contain a random access decodable leading (RADL) picture when the first flag in the bitstream has been set to a first value. In one embodiment, the first flag is designated RadlPictureFlag and the second flag is designated RaslPictureFlag. In one embodiment, the first value is one (1). In one embodiment, the first flag is set equal to the first value when the picture order count (POC) value of the non-IRAP picture is less than the POC value of the IRAP picture. In one embodiment, the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no picture other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with the IRAP picture.

At block 906, the NAL unit for the non-IRAP picture is determined to contain a random access skipped leading (RASL) picture when the second flag in the bitstream has been set to the first value. In one embodiment, the second flag is set equal to the first value when the POC value of the non-IRAP picture is less than the POC value of the IRAP picture. In one embodiment, the second flag is set equal to the first value when a reference picture list for the non-IRAP picture contains at least one reference picture that either precedes, in decoding order, the IRAP picture associated with the non-IRAP picture or is another RASL picture associated with that IRAP picture.

At block 908, a presentation order (e.g., presentation order 510 of FIG. 5) is assigned to the pictures included in the NAL unit based on the first flag or the second flag having the first value. The presentation order may be used to generate an image for display to a user on the display or screen of an electronic device (e.g., a smart phone, tablet, laptop, personal computer, etc.).
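On the decoder side (blocks 904 and 906), the two flags can be interpreted as below. The function names and the returned labels are hypothetical. The sketch also rejects the case where both flags equal the first value, which one embodiment rules out. A decoder that starts decoding at the associated IRAP picture would typically discard pictures classified as RASL, since they reference pictures it never received; that behavior is an assumption drawn from how RASL pictures work in HEVC rather than something this section states.

```python
def classify_non_irap(radl_picture_flag: int, rasl_picture_flag: int) -> str:
    """Blocks 904/906: derive the picture kind from the two received flags."""
    for name, value in (("RadlPictureFlag", radl_picture_flag),
                        ("RaslPictureFlag", rasl_picture_flag)):
        if value not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {value}")
    if radl_picture_flag == 1 and rasl_picture_flag == 1:
        raise ValueError("both flags equal 1, which the embodiment forbids")
    if radl_picture_flag == 1:
        return "RADL"
    if rasl_picture_flag == 1:
        return "RASL"
    return "TRAIL"  # neither flag set: neither a RADL nor a RASL picture


def decodable_after_random_access(kind: str) -> bool:
    """Assumption: when decoding starts at the associated IRAP picture,
    RASL pictures may be skipped because their references are unavailable."""
    return kind != "RASL"
```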

In an alternative to the above, the NAL unit types for leading pictures and IRAP pictures are assigned as follows: two NAL unit types for leading pictures, namely RASL_NUT and RADL_NUT, and one NAL unit type for IRAP pictures, namely IRAP_NUT.

In one embodiment, the mapping from IRAP NAL unit types to stream access point (SAP) types is as follows. When a picture with the IRAP NAL unit type is encountered, the application should count the number of pictures with the RASL NAL unit type and the number of pictures with the RADL NAL unit type that lie between the IRAP picture and the first trailing picture (e.g., a picture with a trailing NAL unit type) following the IRAP picture in decoding order. Depending on the numbers of RASL and RADL pictures, the following mapping is specified.

If the number of RASL pictures is greater than 0, the IRAP picture is SAP type 3. Otherwise, if the number of RASL pictures is 0 and the number of RADL pictures is greater than 0, the IRAP picture is SAP type 2. Otherwise (i.e., the numbers of RASL pictures and RADL pictures are both 0), the IRAP picture is SAP type 1.
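The counting rule above might be implemented as in this sketch, which scans the NAL unit types in decoding order starting at an IRAP picture and stops at the first picture that is neither RASL_NUT nor RADL_NUT. IRAP_NUT, RASL_NUT, and RADL_NUT are the names used above; `TRAIL_NUT` and the function name are assumptions for illustration.

```python
from typing import Sequence


def sap_type_for_irap(nal_types: Sequence[str], irap_index: int) -> int:
    """Map the IRAP picture at irap_index to a DASH SAP type (1, 2 or 3).

    nal_types: NAL unit type names in decoding order, e.g. "IRAP_NUT",
    "RASL_NUT", "RADL_NUT", "TRAIL_NUT" (TRAIL_NUT is an illustrative name).
    """
    if nal_types[irap_index] != "IRAP_NUT":
        raise ValueError("picture at irap_index is not an IRAP picture")
    num_rasl = num_radl = 0
    for nal_type in nal_types[irap_index + 1:]:
        if nal_type == "RASL_NUT":
            num_rasl += 1
        elif nal_type == "RADL_NUT":
            num_radl += 1
        else:
            break  # the first trailing (or next IRAP) picture ends the count
    if num_rasl > 0:
        return 3
    if num_radl > 0:
        return 2
    return 1
```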

In another alternative, the NAL unit types for leading pictures and IRAP pictures are assigned as follows. There are two NAL unit types for leading pictures, namely RASL_NUT and RADL_NUT. The NAL unit types for IRAP pictures are defined as follows. IDR (IDR_NUT): a NAL unit of an IRAP picture that is followed, in decoding order, by zero or more RADL pictures and zero RASL pictures. CRA (CRA_NUT): a NAL unit of an IRAP picture that is followed, in decoding order, by one or more RASL pictures and zero or more RADL pictures.

The mapping from IRAP NAL unit types to SAP types is as follows: CRA_NUT is SAP type 3.

When a picture of type IDR_NUT is encountered, the application should check the picture that follows the IDR picture in decoding order. If the following picture is a picture with RADL_NUT, the IDR picture is SAP type 2. Otherwise, the IDR picture is SAP type 1.

When a picture is of type IDR_NUT, there is a constraint that the picture immediately following the IDR picture in decoding order must be a picture with either RADL_NUT or Trailing_NUT.
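In this alternative, only the single picture after an IDR picture needs to be inspected. That presumably relies on leading pictures preceding the trailing pictures of the same IRAP picture in decoding order, so that any RADL picture would appear immediately after the IDR picture. A hedged sketch follows, where `sap_type` is a hypothetical name and `None` stands for an IDR picture at the end of the stream (treated here as SAP type 1 by assumption):

```python
from typing import Optional

# Types allowed immediately after an IDR_NUT picture (constraint above).
ALLOWED_AFTER_IDR = {"RADL_NUT", "Trailing_NUT"}


def sap_type(nal_type: str, next_nal_type: Optional[str]) -> int:
    """SAP type of an IRAP picture in the IDR_NUT/CRA_NUT alternative.

    next_nal_type: type of the picture that immediately follows in decoding
    order, or None if the IRAP picture is the last one (assumed SAP type 1).
    """
    if nal_type == "CRA_NUT":
        return 3
    if nal_type != "IDR_NUT":
        raise ValueError(f"{nal_type} is not an IRAP NAL unit type")
    if next_nal_type is not None and next_nal_type not in ALLOWED_AFTER_IDR:
        raise ValueError(f"{next_nal_type} may not directly follow an IDR picture")
    return 2 if next_nal_type == "RADL_NUT" else 1
```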

In yet another alternative, the NAL unit types for leading pictures and IRAP pictures are assigned as follows. There is one NAL unit type for leading pictures: LP_NUT. The NAL unit types for IRAP pictures are defined as follows. IDR (IDR_NUT): a NAL unit of an IRAP picture that is followed, in decoding order, by zero or more leading pictures that are RADL pictures and zero RASL pictures.

CRA (CRA_NUT): a NAL unit of an IRAP picture that is followed, in decoding order, by one or more leading pictures that are RASL pictures and zero or more RADL pictures.

The mapping from IRAP NAL unit types to SAP types is as follows: CRA_NUT is SAP type 3.

When a picture of type IDR_NUT is encountered, the application should check the picture that follows the IDR picture in decoding order. If the following picture is a picture with LP_NUT, the IDR picture is SAP type 2. Otherwise, the IDR picture is SAP type 1. When a picture is of type IDR_NUT, there is a constraint that the picture immediately following the IDR picture in decoding order must be a picture with either LP_NUT or Trailing_NUT.
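The sketch below applies this third alternative to a whole sequence of NAL unit types in decoding order: it enforces the constraint on the picture that immediately follows each IDR_NUT picture and returns the SAP type of each IRAP picture. An LP_NUT picture directly after an IDR_NUT picture can be treated as a RADL picture, because an IDR picture by definition has zero RASL pictures. The function name is hypothetical, and the handling of an IDR picture at the end of the list (SAP type 1) is an assumption.

```python
from typing import List, Optional

ALLOWED_AFTER_IDR = ("LP_NUT", "Trailing_NUT")


def sap_types_for_sequence(nal_types: List[str]) -> List[int]:
    """Return the SAP type of each IRAP picture, in decoding order, and
    enforce the constraint on the picture that follows each IDR_NUT picture."""
    result = []
    for i, nal_type in enumerate(nal_types):
        if nal_type == "CRA_NUT":
            result.append(3)
        elif nal_type == "IDR_NUT":
            nxt: Optional[str] = nal_types[i + 1] if i + 1 < len(nal_types) else None
            if nxt is not None and nxt not in ALLOWED_AFTER_IDR:
                raise ValueError(f"{nxt} at index {i + 1} may not follow IDR_NUT")
            # An LP_NUT picture here must be RADL, since IDR pictures have no RASL pictures.
            result.append(2 if nxt == "LP_NUT" else 1)
    return result
```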

FIG. 10 is a schematic diagram of a video coding device 1000 (e.g., video encoder 20 or video decoder 30) according to an embodiment of the disclosure. The video coding device 1000 is suitable for implementing the disclosed embodiments as described herein. The video coding device 1000 comprises ingress ports 1010 and receiver units (Rx) 1020 for receiving data; a processor, logic unit, or central processing unit (CPU) 1030 to process the data; transmitter units (Tx) 1040 and egress ports 1050 for transmitting the data; and a memory 1060 for storing the data. The video coding device 1000 may also comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress ports 1010, the receiver units 1020, the transmitter units 1040, and the egress ports 1050 for egress or ingress of optical or electrical signals.

The processor 1030 is implemented by hardware and software. The processor 1030 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 1030 is in communication with the ingress ports 1010, the receiver units 1020, the transmitter units 1040, the egress ports 1050, and the memory 1060. The processor 1030 comprises a coding module 1070. The coding module 1070 implements the disclosed embodiments described above. For instance, the coding module 1070 implements, processes, prepares, or provides the various networking functions. The inclusion of the coding module 1070 therefore provides a substantial improvement to the functionality of the video coding device 1000 and effects a transformation of the video coding device 1000 to a different state. Alternatively, the coding module 1070 is implemented as instructions stored in the memory 1060 and executed by the processor 1030.

The video coding device 1000 may also include input and/or output (I/O) devices 1080 for communicating data to and from a user. The I/O devices 1080 may include output devices such as a display for displaying video data, speakers for outputting audio data, etc. The I/O devices 1080 may also include input devices, such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.

The memory 1060 comprises one or more disks, tape drives, and solid-state drives and may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 1060 may be volatile and/or non-volatile and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).

FIG. 11 is a schematic diagram of an embodiment of a means for coding 1100. In an embodiment, the means for coding 1100 is implemented in a video coding device 1102 (e.g., video encoder 20 or video decoder 30). The video coding device 1102 includes receiving means 1101. The receiving means 1101 is configured to receive a picture to encode or to receive a bitstream to decode. The video coding device 1102 includes transmission means 1107 coupled to the receiving means 1101. The transmission means 1107 is configured to transmit the bitstream to a decoder or to transmit a decoded image to a display means (e.g., one of the I/O devices 1080).

The video coding device 1102 includes storage means 1103. The storage means 1103 is coupled to at least one of the receiving means 1101 or the transmission means 1107. The storage means 1103 is configured to store instructions. The video coding device 1102 also includes processing means 1105. The processing means 1105 is coupled to the storage means 1103. The processing means 1105 is configured to execute the instructions stored in the storage means 1103 to perform the methods disclosed herein.

It should also be understood that the steps of the exemplary methods set forth herein are not necessarily required to be performed in the order described, and the order of the steps of such methods should be understood to be merely exemplary. Likewise, additional steps may be included in such methods, and certain steps may be omitted or combined, in methods consistent with various embodiments of the present disclosure.

While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled, directly coupled, or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims

A method of encoding a video bitstream implemented by a video encoder, comprising:
storing in a memory of the video encoder a set of fewer than five network abstraction layer (NAL) unit types available for video data;
selecting, by a processor of the video encoder, a NAL unit type from the set of fewer than five NAL unit types for a picture from the video data;
generating, by the processor of the video encoder, a video bitstream including a NAL unit corresponding to the selected NAL unit type and including an identifier identifying the selected NAL unit type; and
sending, by a transmitter of the video encoder, the video bitstream towards a video decoder.

The method of claim 1, wherein the set of fewer than five network abstraction layer (NAL) unit types comprises a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

The method of claim 1, wherein the set of fewer than five network abstraction layer (NAL) unit types comprises a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

The method of claim 2 or 3, wherein both leading and trailing pictures are assigned the leading and trailing pictures NAL unit type.

The method of claim 2 or 3, wherein the IRAP with RASL NAL unit type is selected for an IRAP picture followed by one or more RASL pictures and zero or more RADL pictures in decoding order.

The method of claim 5, wherein the IRAP picture is referred to as a clean random access (CRA) picture.

The method of claim 5, wherein the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

The method of claim 5, wherein the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

The method of claim 8, wherein the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

The method of claim 2 or 3, wherein the IRAP with RADL NAL unit type is selected for an IRAP picture followed by one or more RADL pictures and zero RASL pictures in decoding order.

The method of claim 10, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

The method of claim 10, wherein the IRAP with RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with RADL NAL unit type.

The method of claim 10, wherein the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

The method of claim 13, wherein the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

The method of claim 2 or 3, wherein the IRAP without leading pictures NAL unit type is selected for an IRAP picture not followed by a leading picture in decoding order.

The method of claim 15, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.

The method of claim 15, wherein the IRAP without leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) without leading pictures NAL unit type.

The method of claim 15, wherein the IRAP without leading pictures NAL unit type is designated IRAP_N_LP.

The method of claim 18, wherein the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

A method of decoding a coded video bitstream implemented by a video decoder, comprising:
storing in a memory of the video decoder a set of fewer than five network abstraction layer (NAL) unit types available for video data;
receiving, by a receiver of the video decoder, a coded video bitstream comprising a NAL unit and an identifier;
determining, by a processor of the video decoder, a NAL unit type used to encode the NAL unit from the set of fewer than five NAL unit types based on the identifier; and
assigning, by the processor of the video decoder, a presentation order to pictures included in the NAL unit based on the determined NAL unit type.

The method of claim 20, wherein the set of fewer than five network abstraction layer (NAL) unit types comprises a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with random access skipped leading (RASL) NAL unit type, an IRAP with random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

The method of claim 21 or 22, wherein both leading and trailing pictures are assigned the leading and trailing pictures NAL unit type.

The method of claim 21 or 22, wherein the IRAP with RASL NAL unit type is determined for an IRAP picture followed by one or more RASL pictures and zero or more RADL pictures in decoding order.

The method of claim 24, wherein the IRAP picture is referred to as a clean random access (CRA) picture.

The method of claim 24, wherein the IRAP with RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

The method of claim 24, wherein the IRAP with RASL NAL unit type is designated IRAP_W_RASL.

The method of claim 27, wherein the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

The method of claim 21 or 22, wherein the IRAP with RADL NAL unit type is determined for an IRAP picture followed by one or more RADL pictures and zero RASL pictures in decoding order.

The method of claim 29, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with RADL pictures.

The method of claim 29, wherein the IRAP with RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with RADL NAL unit type.

The method of claim 29, wherein the IRAP with RADL NAL unit type is designated IRAP_W_RADL.

The method of claim 32, wherein the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

The method of claim 21 or 22, wherein the IRAP without leading pictures NAL unit type is determined for an IRAP picture not followed by a leading picture in decoding order.

The method of claim 34, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without leading pictures.

The method of claim 34, wherein the IRAP without leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) without leading pictures NAL unit type.

The method of claim 34, wherein the IRAP without leading pictures NAL unit type is designated IRAP_N_LP.

The method of claim 37, wherein the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

39. An encoding device, comprising:
a memory comprising a set of instructions and a set of fewer than five network abstraction layer (NAL) unit types available for video data;
a processor coupled to the memory and configured to execute the instructions, the instructions causing the encoding device to:
select, for a picture from the video data, a NAL unit type from the set of fewer than five NAL unit types; and
generate a video bitstream that includes a NAL unit corresponding to the selected NAL unit type and includes an identifier identifying the selected NAL unit type; and
a transmitter coupled to the processor and configured to transmit the video bitstream towards a video decoder.
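
Because the claimed set has fewer than five members, an identifier of two bits is enough to distinguish every type. The sketch below illustrates this with a hypothetical one-byte header whose low bits carry the identifier; the numeric code values, the header layout, and the helper names are assumptions made for illustration, not a syntax defined by the claims.

```python
from enum import IntEnum
from typing import Tuple


class NalType(IntEnum):
    # Hypothetical code values; the claims do not assign numbers to the types.
    LEADING_TRAILING = 0
    IRAP_W_RASL = 1
    IRAP_W_RADL = 2
    IRAP_N_LP = 3


# Smallest identifier width that can distinguish every type in the set.
ID_BITS = max(1, (len(NalType) - 1).bit_length())
ID_MASK = (1 << ID_BITS) - 1


def make_nal_unit(nal_type: NalType, payload: bytes) -> bytes:
    """Encoder side: prepend a one-byte header whose low ID_BITS bits carry the identifier."""
    return bytes([int(nal_type) & ID_MASK]) + payload


def parse_nal_unit(unit: bytes) -> Tuple[NalType, bytes]:
    """Decoder side: recover the NAL unit type from the identifier, plus the payload."""
    return NalType(unit[0] & ID_MASK), unit[1:]
```

With four types, ID_BITS evaluates to 2, and a unit built by make_nal_unit round-trips through parse_nal_unit unchanged.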

40. The encoding device of claim 39, wherein the set of fewer than five NAL unit types comprises: a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with a random access skipped leading (RASL) NAL unit type, an IRAP with a random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

42. The encoding device according to claim 40 or 41, wherein both leading and trailing pictures are assigned the leading and trailing pictures NAL unit type.

43. The encoding device according to claim 40 or 41, wherein the IRAP with a RASL NAL unit type is selected for an IRAP picture followed by one or more RASL pictures and zero or more RADL pictures in decoding order.

44. The encoding device of claim 43, wherein the IRAP picture is referred to as a clean random access (CRA) picture.

45. The encoding device of claim 43, wherein the IRAP with a RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

46. The encoding device of claim 43, wherein the IRAP with a RASL NAL unit type is designated as IRAP_W_RASL.

47. The encoding device of claim 46, wherein the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

48. The encoding device according to claim 40 or 41, wherein the IRAP with a RADL NAL unit type is selected for an IRAP picture followed by one or more RADL pictures and zero RASL pictures in decoding order.

49. The encoding device of claim 48, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with a RADL picture.

50. The encoding device of claim 48, wherein the IRAP with a RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with a RADL NAL unit type.

51. The encoding device of claim 48, wherein the IRAP with a RADL NAL unit type is designated as IRAP_W_RADL.

52. The encoding device of claim 51, wherein the IRAP_W_RADL designation corresponds to stream access point (SAP) type 2 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

53. The encoding device according to claim 40 or 41, wherein the IRAP without leading pictures NAL unit type is selected for an IRAP picture not followed by a leading picture in decoding order.

54. The encoding device of claim 53, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without a leading picture.

55. The encoding device of claim 53, wherein the IRAP without leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) without leading pictures NAL unit type.

56. The encoding device of claim 53, wherein the IRAP without leading pictures NAL unit type is designated as IRAP_N_LP.

57. The encoding device of claim 56, wherein the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

58. A decoding device, comprising:
a receiver configured to receive a coded video bitstream comprising a network abstraction layer (NAL) unit and an identifier;
a memory coupled to the receiver and storing instructions and a set of fewer than five NAL unit types available for video data; and
a processor coupled to the memory and configured to execute the instructions,
wherein the instructions cause the decoding device to:
determine, based on the identifier, the NAL unit type from the set of fewer than five NAL unit types that was used to encode the NAL unit; and
assign a presentation order to pictures included in the NAL unit based on the determined NAL unit type.
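
One way to read the presentation-order step: leading pictures follow their IRAP picture in decoding order but precede it in output order, and the IRAP without leading pictures type tells the decoder that no such pictures follow. The minimal sketch below assumes that leading pictures are recognized by a POC value lower than the IRAP picture's (the POC condition recited for leading pictures in the flag-based claims); the enum and function names are hypothetical.

```python
from enum import Enum
from typing import List


class NalType(Enum):
    LEADING_TRAILING = "leading_and_trailing"  # hypothetical label
    IRAP_W_RASL = "irap_with_rasl"
    IRAP_W_RADL = "irap_with_radl"
    IRAP_N_LP = "irap_no_leading_pictures"


def presentation_order(irap_type: NalType, irap_poc: int,
                       following_pocs: List[int]) -> List[int]:
    """Order an IRAP picture and the pictures that follow it in decoding order.

    following_pocs holds POC values in decoding order. Pictures whose POC is
    below the IRAP's are treated as leading pictures (an assumption here).
    """
    leading = [poc for poc in following_pocs if poc < irap_poc]
    if irap_type is NalType.IRAP_N_LP and leading:
        raise ValueError("an IRAP_N_LP picture is not followed by leading pictures")
    # Leading pictures are decoded after the IRAP but presented before it.
    return sorted([irap_poc] + following_pocs)
```

Called as presentation_order(NalType.IRAP_W_RADL, 8, [6, 7, 12, 10]), the sketch places both leading pictures ahead of the IRAP picture and the trailing pictures after it.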

59. The decoding device of claim 58, wherein the set of fewer than five NAL unit types comprises: a leading and trailing pictures NAL unit type, an intra random access point (IRAP) with a random access skipped leading (RASL) NAL unit type, an IRAP with a random access decodable leading (RADL) NAL unit type, and an IRAP without leading pictures NAL unit type.

61. The decoding device according to claim 59 or 60, wherein both leading and trailing pictures are assigned the leading and trailing pictures NAL unit type.

62. The decoding device according to claim 59 or 60, wherein the IRAP with a RASL NAL unit type is selected for an IRAP picture followed by one or more RASL pictures and zero or more RADL pictures in decoding order.

63. The decoding device of claim 62, wherein the IRAP picture is referred to as a clean random access (CRA) picture.

64. The decoding device of claim 62, wherein the IRAP with a RASL NAL unit type is referred to as a clean random access (CRA) NAL unit type.

65. The decoding device of claim 62, wherein the IRAP with a RASL NAL unit type is designated as IRAP_W_RASL.

66. The decoding device of claim 65, wherein the IRAP_W_RASL designation corresponds to stream access point (SAP) type 3 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

67. The decoding device according to claim 59 or 60, wherein the IRAP with a RADL NAL unit type is selected for an IRAP picture followed by one or more RADL pictures and zero RASL pictures in decoding order.

68. The decoding device of claim 67, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture with a RADL picture.

69. The decoding device of claim 67, wherein the IRAP with a RADL NAL unit type is referred to as an instantaneous decoder refresh (IDR) with a RADL NAL unit type.

70. The decoding device of claim 67, wherein the IRAP with a RADL NAL unit type is designated as IRAP_W_RADL.

71. The decoding device of claim 70, wherein the IRAP_W_RADL corresponds to a stream access point (SAP) type 2 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

72. The decoding device according to claim 59 or 60, wherein the IRAP without leading pictures NAL unit type is selected for an IRAP picture not followed by a leading picture in decoding order.

73. The decoding device of claim 72, wherein the IRAP picture is referred to as an instantaneous decoder refresh (IDR) picture without a leading picture.

74. The decoding device of claim 72, wherein the IRAP without leading pictures NAL unit type is referred to as an instantaneous decoder refresh (IDR) without leading pictures NAL unit type.

75. The decoding device of claim 72, wherein the IRAP without leading pictures NAL unit type is designated as IRAP_N_LP.

76. The decoding device of claim 75, wherein the IRAP_N_LP designation corresponds to stream access point (SAP) type 1 in dynamic adaptive streaming over hypertext transfer protocol (DASH).

77. A method of encoding a video bitstream implemented by a video encoder, the method comprising:
generating, by a processor of the video encoder, a bitstream comprising a network abstraction layer (NAL) unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture;
setting, by the processor of the video encoder, a first flag in the bitstream to a first value when the NAL unit for the non-IRAP picture includes a random access decodable leading (RADL) picture;
setting, by the processor of the video encoder, a second flag in the bitstream to the first value when the NAL unit for the non-IRAP picture includes a random access skipped leading (RASL) picture; and
sending, by a transmitter of the video encoder, the video bitstream towards a video decoder.

78. The method of claim 77, wherein the first flag is designated as RadlPictureFlag and the second flag is designated as RaslPictureFlag.

79. The method of claim 77, wherein the first value is one (1).

80. The method of claim 78 or 79, wherein the non-IRAP picture comprises a leading picture.

81. The method of claim 78 or 79, wherein the non-IRAP picture comprises a trailing picture.

82. The method of claim 77, wherein the first flag is set equal to the first value when a picture order count (POC) value of the non-IRAP picture is less than a POC value of the IRAP picture.

83. The method of claim 77, wherein the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no pictures other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with the IRAP picture.

84. The method of claim 77, wherein the second flag is set equal to the first value when a picture order count (POC) value of the non-IRAP picture is less than a POC value of the IRAP picture.

85. The method of claim 77, wherein the second flag is set equal to the first value when a reference picture list for the non-IRAP picture includes at least one reference picture that precedes, in decoding order, the IRAP picture associated with the non-IRAP picture, or another RASL picture associated with the IRAP picture.

86. The method of claim 77, further comprising setting the first flag and the second flag to a second value indicating that the NAL unit for the non-IRAP picture does not include the RADL picture or the RASL picture.

87. The method of claim 77, wherein neither the first flag nor the second flag is set to the first value for the non-IRAP picture.
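
The conditions recited for the two flags can be expressed as a short encoder-side routine. The sketch assumes the first value is 1 and the second value is 0, represents RadlPictureFlag and RaslPictureFlag as hypothetical snake_case fields, and flattens all reference picture lists of a picture into a single list; it is an illustration of the claimed conditions, not the encoder described in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Picture:
    poc: int                    # picture order count
    decode_idx: int             # position in decoding order
    refs: List["Picture"] = field(default_factory=list)  # all reference picture list entries
    radl_picture_flag: int = 0  # first flag (RadlPictureFlag)
    rasl_picture_flag: int = 0  # second flag (RaslPictureFlag)


def set_leading_flags(pic: Picture, irap: Picture) -> None:
    """Set both flags of a non-IRAP picture associated with `irap`.

    Assumes every reference that follows `irap` in decoding order is associated
    with `irap` and has already had its own flags set.
    """
    pic.radl_picture_flag = pic.rasl_picture_flag = 0  # second value by default
    if pic.poc >= irap.poc:
        return  # trailing picture: neither flag takes the first value
    # RASL: references a picture preceding the IRAP, or another RASL picture.
    if any(r.decode_idx < irap.decode_idx or r.rasl_picture_flag == 1 for r in pic.refs):
        pic.rasl_picture_flag = 1
    # RADL: references only the IRAP picture or other RADL pictures.
    elif all(r is irap or r.radl_picture_flag == 1 for r in pic.refs):
        pic.radl_picture_flag = 1
```

Processing pictures in decoding order matters here, because a RASL picture may inherit its classification from another RASL picture it references.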

88. A method of decoding a coded video bitstream implemented by a video decoder, the method comprising:
receiving, by a receiver of the video decoder, a coded video bitstream comprising a first flag, a second flag, and a network abstraction layer (NAL) unit for a non-intra random access point (non-IRAP) picture associated with an intra random access point (IRAP) picture;
determining, by a processor of the video decoder, that the NAL unit for the non-IRAP picture includes a random access decodable leading (RADL) picture when the first flag in the bitstream is set to a first value;
determining, by the processor of the video decoder, that the NAL unit for the non-IRAP picture includes a random access skipped leading (RASL) picture when the second flag in the bitstream is set to the first value; and
assigning, by the processor of the video decoder, a presentation order to the pictures included in the NAL unit based on the first flag or the second flag having the first value, and decoding the NAL unit based on the assigned presentation order.

89. The method of claim 88, wherein the first flag is designated as RadlPictureFlag and the second flag is designated as RaslPictureFlag.

90. The method of claim 88, wherein the first value is one (1).

91. The method of claim 89 or 90, wherein the non-IRAP picture comprises a leading picture.

92. The method of claim 89 or 90, wherein the non-IRAP picture comprises a trailing picture.

93. The method of claim 88, wherein the first flag is set equal to the first value when a picture order count (POC) value of the non-IRAP picture is less than a POC value of the IRAP picture.

94. The method of claim 88, wherein the first flag is set equal to the first value when each reference picture list for the non-IRAP picture contains no pictures other than the IRAP picture associated with the non-IRAP picture or another RADL picture associated with the IRAP picture.

95. The method of claim 88, wherein the second flag is set equal to the first value when a picture order count (POC) value of the non-IRAP picture is less than a POC value of the IRAP picture.

96. The method of claim 88, wherein the second flag is set equal to the first value when a reference picture list for the non-IRAP picture includes at least one reference picture that precedes, in decoding order, the IRAP picture associated with the non-IRAP picture, or another RASL picture associated with the IRAP picture.

97. The method of claim 88, further comprising setting the first flag and the second flag to a second value indicating that the NAL unit for the non-IRAP picture does not include the RADL picture or the RASL picture.

98. The method of claim 88, wherein neither the first flag nor the second flag is set to the first value for the non-IRAP picture.
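
On the decoder side, the two flags are enough to tell RADL from RASL pictures even though leading and trailing pictures share one NAL unit type. The sketch below classifies pictures from the flags (first value 1) and orders them by POC. Dropping RASL pictures when decoding starts at the associated IRAP picture is borrowed from HEVC's random-access behavior and is an assumption here, since these claims do not recite it. All names are hypothetical.

```python
from typing import List, NamedTuple


class ReceivedPicture(NamedTuple):
    poc: int
    radl_picture_flag: int  # first flag (RadlPictureFlag)
    rasl_picture_flag: int  # second flag (RaslPictureFlag)


def kind(pic: ReceivedPicture) -> str:
    """Classify a picture from its two flags."""
    if pic.radl_picture_flag == 1 and pic.rasl_picture_flag == 1:
        raise ValueError("a picture cannot be both RADL and RASL")
    if pic.radl_picture_flag == 1:
        return "RADL"
    if pic.rasl_picture_flag == 1:
        return "RASL"
    return "OTHER"  # e.g. the IRAP picture itself or a trailing picture


def output_pocs(pics: List[ReceivedPicture], started_at_irap: bool) -> List[int]:
    """POCs in presentation order; RASL pictures are skipped on random access."""
    kept = [p for p in pics if not (started_at_irap and kind(p) == "RASL")]
    return [p.poc for p in sorted(kept, key=lambda p: p.poc)]
```

When decoding continues from earlier pictures rather than starting at the IRAP picture, the RASL pictures have valid references and are kept in the output order.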

99. A coding device, comprising:
a receiver configured to receive a bitstream to decode;
a transmitter coupled to the receiver and configured to transmit a decoded image to a display;
a memory coupled to at least one of the receiver or the transmitter and configured to store instructions; and
a processor coupled to the memory and configured to execute the instructions stored in the memory to perform the method of any one of claims 1-38 and 77-98.

100. A system, comprising:
an encoder; and
a decoder in communication with the encoder,
wherein the encoder or the decoder comprises the decoding device, encoding device, or coding device of any one of claims 39 to 76 and 99.

101. A means for coding, comprising:
receiving means configured to receive a bitstream to be decoded;
transmitting means coupled to the receiving means and configured to transmit the decoded image to display means;
storage means coupled to at least one of the receiving means or the transmitting means and configured to store instructions; and
processing means coupled to the storage means and configured to execute the instructions stored in the storage means to perform the method of any one of claims 1-38 and 77-98.