KR102210946B1

KR102210946B1 - Dictionary encoding and decoding of screen content

Info

Publication number: KR102210946B1
Application number: KR1020167027340A
Authority: KR
Inventors: Bin Li; Jizheng Xu; Feng Wu
Original assignee: Microsoft Technology Licensing, LLC
Priority date: 2014-03-04
Filing date: 2014-03-04
Publication date: 2021-02-01
Also published as: EP3114840A1; CN105230021B; EP3114840A4; CN105230021A; WO2015131304A1; KR20160129076A

Abstract

Methods and devices are provided for encoding and/or decoding video and/or image content using dictionary modes. For example, the methods and devices predict current pixel values from previous pixel values stored in a 1D dictionary. The methods and devices also predict current pixel values from previous pixel values using a pseudo 2D dictionary mode. In addition, the methods and devices predict current pixel values from previous pixel values in a reference picture using an inter pseudo 2D dictionary mode. Pixel values can be predicted from previous pixel values (e.g., stored in a dictionary) that are identified by an offset and a length. Further, the methods and devices encode pixel values using hash matching of pixel values.

Description

Dictionary encoding and decoding of screen content {DICTIONARY ENCODING AND DECODING OF SCREEN CONTENT}

Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A "codec" is an encoder/decoder system.

Over the last two decades, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M standard. More recently, the HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. Extensions to the HEVC standard (e.g., for scalable video coding/decoding, for coding/decoding of video with higher fidelity in terms of sample bit depth or chroma sampling rate, or for multi-view coding/decoding) are currently under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.

Encoding and decoding of specific types of content, such as screen content, can present different challenges from coding normal video content. For example, screen content can include areas of similar content (e.g., large graphical areas with the same color or a smooth gradient) and areas of repeated content. Encoding and decoding such content using normal video coding techniques can produce results that are inefficient and that reduce quality (e.g., by producing compression artifacts).

This Summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Techniques are described for improving the efficiency of encoding and/or decoding of video and/or image data. In some innovations, a one-dimensional (1D) dictionary mode is used to encode and/or decode pixel values using previous pixel values (e.g., previously reconstructed or previously decoded pixel values) stored in a 1D dictionary. In the 1D dictionary mode, current pixel values can be predicted (e.g., predicted exactly, without requiring any residual) using an offset that identifies a location within the 1D dictionary and a length that indicates the number of pixel values being predicted.
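The offset-and-length prediction of the 1D dictionary mode can be illustrated with a minimal sketch. The function name `decode_1d`, the token layout, and the convention that the offset counts backwards from the end of the dictionary are assumptions made for illustration; they are not taken from this document, which only states that the offset identifies a location within the 1D dictionary.

```python
def decode_1d(tokens, dictionary=None):
    """Rebuild pixel values from literal and (offset, length) tokens.

    Hypothetical token layout (not from the source document):
      ("literal", pixel_value)   -> one directly coded pixel value
      ("match", offset, length)  -> `length` values predicted from the dictionary
    Assumption: offset counts backwards from the end of the dictionary,
    so offset=1 refers to the most recently stored pixel value.
    """
    dictionary = list(dictionary or [])
    decoded = []
    for token in tokens:
        if token[0] == "literal":
            dictionary.append(token[1])
            decoded.append(token[1])
            continue
        _, offset, length = token
        if not 1 <= offset <= len(dictionary):
            raise ValueError("offset points outside the dictionary")
        start = len(dictionary) - offset
        for i in range(length):
            # Copy one value at a time and store it immediately, so a match
            # may run into values it has just produced (LZ77-style assumption).
            value = dictionary[start + i]
            dictionary.append(value)
            decoded.append(value)
    return decoded


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
row = decode_1d([("literal", WHITE), ("literal", BLACK), ("match", 2, 4)])
```

In this usage example two literal pixel values are followed by one match token; the match reproduces the alternating pattern exactly, so no residual is needed for those four values.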

In some innovations, a pseudo 2D dictionary mode is used to encode and/or decode pixel values using previous pixel values (e.g., previously reconstructed or previously decoded pixel values). In the pseudo 2D dictionary mode, current pixel values can be predicted (e.g., predicted exactly, without requiring any residual) using X and Y offsets and a length. An inter pseudo 2D dictionary mode can also be used to encode and/or decode pixel values using pixel values in a reference picture (e.g., identified within the reference picture by a length and by X and Y offsets from the pixel position in the reference picture that corresponds to the current pixel position in the current picture being encoded or decoded).
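As a rough illustration of prediction from X and Y offsets and a length, the sketch below copies previously reconstructed pixel values inside a single picture. The function name `copy_pseudo_2d`, the list-of-rows picture layout, and the choice of horizontal raster order (wrapping to the next row at the picture edge) are assumptions for illustration only; this document does not specify the scan order here.

```python
def copy_pseudo_2d(picture, x, y, offset_x, offset_y, length):
    """Fill `length` pixel values starting at (x, y), in place.

    Assumptions (hypothetical, not from the source document):
      - `picture` is a list of rows of pixel values;
      - the source position is (x - offset_x, y - offset_y);
      - source and destination both advance in horizontal raster order.
    """
    height, width = len(picture), len(picture[0])
    src_x, src_y = x - offset_x, y - offset_y
    if not (0 <= src_x < width and 0 <= src_y < height):
        raise ValueError("offsets point outside the picture")
    src = src_y * width + src_x          # linear raster index of the source
    dst = y * width + x                  # linear raster index of the target
    if src >= dst:
        raise ValueError("source must precede the current position")
    if dst + length > width * height:
        raise ValueError("run extends past the end of the picture")
    for i in range(length):
        sy, sx = divmod(src + i, width)
        dy, dx = divmod(dst + i, width)
        picture[dy][dx] = picture[sy][sx]
    return picture


# The first six pixel values are already reconstructed; zeros are still unknown.
picture = [
    [1, 2, 3, 4],
    [5, 6, 0, 0],
    [0, 0, 0, 0],
]
copy_pseudo_2d(picture, x=2, y=1, offset_x=2, offset_y=1, length=4)
```

An inter variant would, under the same assumptions, read from a separate reference picture at the position corresponding to (x, y) instead of reading from the picture being reconstructed.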

In other innovations, an encoder calculates hash values for previously encoded pixel values (e.g., for every 1, 2, 4, and 8 pixel values). The current pixel values being encoded are then matched against the previously encoded pixel values by creating a hash of the current pixel values and matching it against the stored hash values.
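The hash-based search can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder described in this document: the names `build_hash_table` and `find_match`, the use of Python's built-in `hash` on tuples, the preference for the longest and most recent match, and the backward-counted offset are all hypothetical choices.

```python
from collections import defaultdict

RUN_LENGTHS = (8, 4, 2, 1)   # group sizes named in the text, longest first


def build_hash_table(pixels, end):
    """Index every run of 1, 2, 4 and 8 pixel values lying within pixels[:end]."""
    table = defaultdict(list)
    for n in RUN_LENGTHS:
        for start in range(0, end - n + 1):
            key = (n, hash(tuple(pixels[start:start + n])))
            table[key].append(start)
    return table


def find_match(pixels, pos, table):
    """Return (offset, length) for the run at `pos`, or None if nothing matches.

    Assumption: offset is the distance back from `pos` to the matching run.
    """
    for n in RUN_LENGTHS:
        current = tuple(pixels[pos:pos + n])
        if len(current) < n:
            continue                      # not enough pixel values left
        # Try the most recent candidates first.
        for start in reversed(table.get((n, hash(current)), [])):
            # Equal hashes do not guarantee equal values, so compare the runs.
            if tuple(pixels[start:start + n]) == current:
                return pos - start, n
    return None


pixels = [7, 7, 9, 3, 7, 7, 9, 3, 5]
table = build_hash_table(pixels, end=4)   # first four values already encoded
match = find_match(pixels, 4, table)
```

A match found this way could then be signalled as an offset and a length in one of the dictionary modes; a position with no match would fall back to another coding method.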

The techniques described herein can be applied to coding of screen content. Screen content refers to video and/or image content that is computer-generated (e.g., text, graphics, and/or other artificial content that is computer-generated). An example of screen content is an image of a computer desktop graphical user interface comprising text, icons, menus, windows, and/or other computer text and graphics. The techniques described herein can also be applied to content other than screen content.

The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

FIG. 1 is a diagram of an example computing system in which some described embodiments can be implemented.
FIGS. 2A and 2B are diagrams of example network environments in which some described embodiments can be implemented.
FIG. 3 is a diagram of an example encoder system in conjunction with which some described embodiments can be implemented.
FIG. 4 is a diagram of an example decoder system in conjunction with which some described embodiments can be implemented.
FIGS. 5A and 5B are diagrams illustrating an example video encoder in conjunction with which some described embodiments can be implemented.
FIG. 6 is a diagram illustrating an example video decoder in conjunction with which some described embodiments can be implemented.
FIG. 7 is a diagram illustrating an example of encoding a block of pixel values using a 1D dictionary mode.
FIG. 8 is a diagram illustrating an example of decoding a block of pixel values using a 1D dictionary mode.
FIG. 9 is a flowchart of an example method for decoding pixel values using a dictionary mode.
FIG. 10 is a flowchart of an example method for decoding pixel values using a 1D dictionary mode.
FIG. 11 is a flowchart of an example method for encoding pixel values using a dictionary mode.

The detailed description presents innovations in the use of dictionary modes during encoding and/or decoding. In particular, the detailed description presents innovations for encoding and/or decoding digital video and/or image content (e.g., video content such as screen content) using a 1D dictionary mode, a pseudo 2D dictionary mode, and/or an inter pseudo 2D dictionary mode. For example, the various 1D, pseudo 2D, and inter pseudo 2D dictionary modes can be applied to encode and/or decode pixel values in video content (e.g., in a video picture) based on previously encoded or decoded (e.g., reconstructed) pixel values (e.g., in the video picture) that are stored in a dictionary (e.g., a 1D dictionary) or in another location (e.g., in a reconstructed picture).

Techniques are described for improving the efficiency of encoding and/or decoding of video and/or image data. In some innovations, a dictionary mode is used to encode and/or decode pixel values using previous pixel values (e.g., previously reconstructed or previously decoded pixel values) stored in a dictionary or in another location. In a dictionary mode, current pixel values can be predicted (e.g., predicted exactly, without requiring any residual) using an offset that identifies a location within the previous pixel values (e.g., in the dictionary) and a length that indicates the number of pixel values being predicted. Lossless prediction can be performed by predicting pixel values exactly from previous pixel values.

Some of these innovations improve the efficiency of encoding and/or decoding digital picture content (e.g., image content and/or video content). For example, a dictionary coding mode can be applied to reduce the bits needed to code digital picture content. In situations where screen content is being encoded and/or decoded, the various 1D, pseudo 2D, and inter pseudo 2D dictionary coding modes can be applied to reduce the coding complexity and/or the number of bits needed to code the content. In other innovations, encoding of digital picture content can be improved by calculating hash values for various groupings of pixels (e.g., 1 pixel, 2 pixels, 4 pixels, 8 pixels, and so on) and matching hash values to identify matching pixel values to use for predicting the current pixel values being encoded (e.g., for encoding using the various dictionary modes described herein).

The techniques described herein can be applied to coding of screen content. Screen content refers to video and/or image content that is computer-generated (e.g., text, graphics, and/or other artificial content that is computer-generated). An example of screen content is an image of a computer desktop graphical user interface comprising text, icons, menus, windows, and/or other computer text and graphics. The techniques described herein can also be applied to content other than screen content (e.g., other types of digital video and/or image content).

Although operations described herein are in places described as being performed by a video encoder or video decoder, in many cases the operations can be performed by another type of media processing tool (e.g., a digital image or digital picture encoder, or a digital image or digital picture decoder).

Some of the innovations described herein are illustrated with reference to syntax elements and operations specific to the HEVC standard. For example, reference is made to the draft version JCTVC-N1005 of the HEVC standard: JCTVC-P1005, "High Efficiency Video Coding (HEVC) Range Extensions Text Specification: Draft 4," July 2013. The innovations described herein can also be implemented for other standards or formats.

More generally, various alternatives to the examples described herein are possible. For example, some of the methods described herein can be altered by changing the ordering of the method acts described, or by splitting, repeating, or omitting certain method acts. The various aspects of the disclosed technology can be used in combination or separately. Different embodiments use one or more of the described innovations. Some of the innovations described herein address one or more of the problems noted in the background. Typically, a given technique or tool does not solve all such problems.

I. Example Computing System

FIG. 1 illustrates a generalized example of a suitable computing system 100 in which several of the described innovations may be implemented. The computing system 100 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.

With reference to FIG. 1, the computing system 100 includes one or more processing units 110, 115 and memory 120, 125. The processing units 110, 115 execute computer-executable instructions. A processing unit can be a general-purpose central processing unit ("CPU"), a processor in an application-specific integrated circuit ("ASIC"), or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 1 shows a central processing unit 110 as well as a graphics processing unit or co-processing unit 115. The tangible memory 120, 125 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two, accessible by the processing unit(s). The memory 120, 125 stores software 180 implementing one or more innovations for 1D dictionary mode coding, pseudo 2D dictionary mode coding, and/or inter pseudo 2D dictionary mode coding, in the form of computer-executable instructions suitable for execution by the processing unit(s).

A computing system may have additional features. For example, the computing system 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing system 100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system 100 and coordinates the activities of the components of the computing system 100.

The tangible storage 140 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium that can be used to store information and that can be accessed within the computing system 100. The storage 140 stores instructions for the software 180 implementing one or more innovations for 1D dictionary mode coding, pseudo 2D dictionary mode coding, and/or inter pseudo 2D dictionary mode coding.

The input device(s) 150 may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing system 100. For video, the input device(s) 150 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system 100. The output device(s) 160 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system 100.

The communication connection(s) 170 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

Any of the disclosed innovations can be implemented as computer-executable instructions or a computer program product stored on one or more computer-readable storage media and executed on a computing device (e.g., any available computing device, including smart phones or other mobile devices that include computing hardware). Computer-readable storage media are any available tangible media that can be accessed within a computing environment (e.g., one or more optical media discs such as DVD or CD, volatile memory components (such as DRAM or SRAM), or non-volatile memory components (such as flash memory or hard drives)). By way of example, and with reference to FIG. 1, computer-readable storage media include memory 120 and 125, and storage 140. The term computer-readable storage media does not include signals and carrier waves. In addition, the term computer-readable storage media does not include communication connections (e.g., 170).

The innovations can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, and so on that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.

The terms "system" and "device" are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed, and can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.

The disclosed methods can also be implemented using specialized computing hardware configured to perform any of the disclosed methods. For example, the disclosed methods can be implemented by an integrated circuit (e.g., an ASIC such as an ASIC digital signal processor ("DSP"), a graphics processing unit ("GPU"), or a programmable logic device ("PLD") such as a field programmable gate array ("FPGA")) specially designed or configured to implement any of the disclosed methods.

For the sake of presentation, the detailed description uses terms like "determine" and "use" to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

II. Example Network Environments

FIGS. 2A and 2B show example network environments 201, 202 that include video encoders 220 and video decoders 270. The encoders 220 and decoders 270 are connected over a network 250 using an appropriate communication protocol. The network 250 can include the Internet or another computer network.

In the network environment 201 shown in FIG. 2A, each real-time communication ("RTC") tool 210 includes both an encoder 220 and a decoder 270 for bidirectional communication. A given encoder 220 can produce output compliant with a variation or extension of the HEVC standard, the SMPTE 421M standard, the ISO/IEC 14496-10 standard (also known as H.264 or AVC), another standard, or a proprietary format, with a corresponding decoder 270 accepting encoded data from the encoder 220. The bidirectional communication can be part of a video conference, video telephone call, or other two-party communication scenario. Although the network environment 201 in FIG. 2A includes two real-time communication tools 210, the network environment 201 can instead include three or more real-time communication tools 210 that participate in multi-party communication.

A real-time communication tool 210 manages encoding by an encoder 220. FIG. 3 shows an example encoder system 300 that can be included in the real-time communication tool 210. Alternatively, the real-time communication tool 210 uses another encoder system. A real-time communication tool 210 also manages decoding by a decoder 270. FIG. 4 shows an example decoder system 400 that can be included in the real-time communication tool 210. Alternatively, the real-time communication tool 210 uses another decoder system.

In the network environment 202 shown in FIG. 2B, an encoding tool 212 includes an encoder 220 that encodes video for delivery to multiple playback tools 214, which include decoders 270. The unidirectional communication can be provided for a video surveillance system, web camera monitoring system, remote desktop conferencing presentation, or other scenario in which video is encoded and sent from one location to one or more other locations. Although the network environment 202 in FIG. 2B includes two playback tools 214, the network environment 202 can include more or fewer playback tools 214. In general, a playback tool 214 communicates with the encoding tool 212 to determine a stream of video for the playback tool 214 to receive. The playback tool 214 receives the stream, buffers the received encoded data for an appropriate period, and begins decoding and playback.

FIG. 3 shows an exemplary encoder system 300 that may be included in the encoding tool 212. Alternatively, the encoding tool 212 uses another encoder system. The encoding tool 212 may also include server-side controller logic for managing connections with one or more playback tools 214. FIG. 4 shows an exemplary decoder system 400 that may be included in the playback tool 214. Alternatively, the playback tool 214 uses another decoder system. The playback tool 214 may also include client-side controller logic for managing connections with the encoding tool 212.

III. Exemplary Encoder System

FIG. 3 is a block diagram of an exemplary encoder system 300 in conjunction with which some described embodiments may be implemented. The encoder system 300 may be a general-purpose encoding tool capable of operating in any of multiple encoding modes, such as a low-latency encoding mode for real-time communication, a transcoding mode, and a regular encoding mode for media playback from a file or stream, or it may be a special-purpose encoding tool adapted for one such encoding mode. The encoder system 300 may be implemented as an operating system module, as part of an application library, or as a standalone application. Overall, the encoder system 300 receives a sequence of source video frames 311 from a video source 310 and produces encoded data as output to a channel 390. The encoded data output to the channel may include content encoded using a 1D dictionary mode, a pseudo 2D dictionary mode, and/or an inter pseudo 2D dictionary mode.

The video source 310 may be a camera, tuner card, storage medium, or other digital video source. The video source 310 produces a sequence of video frames at a frame rate of, for example, 30 frames per second. As used herein, the term "frame" generally refers to source, coded, or reconstructed image data. For progressive video, a frame is a progressive video frame. For interlaced video, in exemplary embodiments, an interlaced video frame is de-interlaced prior to encoding. Alternatively, two complementary interlaced video fields are encoded as an interlaced video frame or as separate fields. Aside from indicating a progressive video frame, the term "frame" or "picture" can indicate a single non-paired video field, a complementary pair of video fields, a video object plane that represents a video object at a given time, or a region of interest in a larger image. The video object plane or region may be part of a larger image that includes multiple objects or regions of a scene.

An arriving source frame 311 is stored in a source frame temporary memory storage area 320 that includes multiple frame buffer storage areas 321, 322, ..., 32n. Each frame buffer (321, 322, etc.) holds one source frame in the source frame storage area 320. After one or more of the source frames 311 have been stored in the frame buffers 321, 322, etc., a frame selector 330 selects an individual source frame from the source frame storage area 320. The order in which frames are selected by the frame selector 330 for input to the encoder 340 may differ from the order in which the frames are produced by the video source 310. For example, some frames may be moved ahead in order to facilitate temporally backward prediction. Before the encoder 340, the encoder system 300 may include a pre-processor (not shown) that performs pre-processing (e.g., filtering) of the selected frame 331 before encoding. The pre-processing may also include color space conversion into primary and secondary components for encoding. Typically, before encoding, video has been converted to a color space such as YUV, in which sample values of the luma (Y) component represent brightness or intensity values, and sample values of the chroma (U, V) components represent color-difference values. The chroma sample values may be sub-sampled to a lower chroma sampling rate (e.g., for YUV 4:2:0 format), or the chroma sample values may have the same resolution as the luma sample values (e.g., for YUV 4:4:4 format). Alternatively, the video may be encoded in another format (e.g., RGB 4:4:4 format).
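The relationship between luma and chroma resolution in the two YUV formats named above can be illustrated with a short sketch. The function name `plane_sizes` and the round-up rule for odd dimensions are assumptions made for illustration; they are not part of the described encoder system.

```python
def plane_sizes(width, height, chroma_format):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame.

    Hypothetical helper: only the two YUV formats named in the text
    are handled.
    """
    if chroma_format == "4:4:4":
        # Chroma planes have the same resolution as the luma plane.
        return (width, height), (width, height)
    if chroma_format == "4:2:0":
        # Chroma is sub-sampled by two horizontally and vertically.
        # Rounding up (an assumption) keeps odd luma sizes covered.
        return (width, height), ((width + 1) // 2, (height + 1) // 2)
    raise ValueError(f"unsupported chroma format: {chroma_format}")


if __name__ == "__main__":
    print(plane_sizes(1920, 1080, "4:2:0"))
    print(plane_sizes(1920, 1080, "4:4:4"))
```

In 4:2:0 each chroma plane therefore carries one quarter as many samples as the luma plane, while in 4:4:4 all three planes are the same size.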

The encoder 340 encodes the selected frame 331 to produce a coded frame 341 and also produces memory management control operation ("MMCO") signals 342 or reference picture set ("RPS") information. If the current frame is not the first frame that has been encoded, then, when performing its encoding process, the encoder 340 may use one or more previously encoded/decoded frames 369 that have been stored in a decoded frame temporary memory storage area 360. Such stored decoded frames 369 are used as reference frames for inter-frame prediction of the content of the current source frame 331. Generally, the encoder 340 includes multiple encoding modules that perform encoding tasks such as partitioning into tiles, intra prediction estimation and prediction, motion estimation and compensation, frequency transforms, quantization, and entropy coding. The exact operations performed by the encoder 340 can vary depending on the compression format. The format of the output encoded data can be a variation or extension of HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.

The encoder 340 can partition a frame into multiple tiles of the same size or of different sizes. For example, the encoder 340 splits the frame along tile rows and tile columns that, together with the frame boundaries, define the horizontal and vertical boundaries of tiles within the frame, where each tile is a rectangular region. Tiles are often used to improve options for parallel processing. A frame can also be organized as one or more slices, where a slice can be an entire frame or a region of the frame. A slice can be decoded independently of other slices in the frame, which improves error resilience. The content of a slice or tile is further partitioned into blocks or other sets of sample values for purposes of encoding and decoding.
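As a rough illustration of how tile rows and tile columns define rectangular tiles, the following sketch splits a frame into a grid. Uniform spacing is an assumption made for brevity (the text also allows tiles of different sizes), and the names `tile_boundaries` and `tile_rectangles` are hypothetical.

```python
def tile_boundaries(size, count):
    """Return count + 1 boundary positions that split `size` samples
    into `count` spans of nearly equal length (assumed uniform spacing)."""
    return [(i * size) // count for i in range(count + 1)]


def tile_rectangles(frame_w, frame_h, tile_cols, tile_rows):
    """Return tiles as (x, y, width, height), row by row."""
    xs = tile_boundaries(frame_w, tile_cols)
    ys = tile_boundaries(frame_h, tile_rows)
    return [
        (xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
        for r in range(tile_rows)
        for c in range(tile_cols)
    ]


if __name__ == "__main__":
    for tile in tile_rectangles(1920, 1080, 2, 2):
        print(tile)
```

Because the tiles are disjoint rectangles that cover the frame, each one could be handed to a separate worker, which is the parallel-processing benefit the text mentions.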

For syntax according to the HEVC standard, the encoder splits the content of a frame (or slice or tile) into coding tree units. A coding tree unit ("CTU") includes luma sample values organized as a luma coding tree block ("CTB") and corresponding chroma sample values organized as two chroma CTBs. The size of a CTU (and its CTBs) is selected by the encoder and can be, for example, 64×64, 32×32, or 16×16 sample values. A CTU includes one or more coding units. A coding unit ("CU") has a luma coding block ("CB") and two corresponding chroma CBs. For example, a CTU with a 64×64 luma CTB and two 64×64 chroma CTBs (YUV 4:4:4 format) can be split into four CUs, with each CU including a 32×32 luma CB and two 32×32 chroma CBs, and with each CU possibly being split further into smaller CUs. Or, as another example, a CTU with a 64×64 luma CTB and two 32×32 chroma CTBs (YUV 4:2:0 format) can be split into four CUs, with each CU including a 32×32 luma CB and two 16×16 chroma CBs, and with each CU possibly being split further into smaller CUs. The smallest allowable size of a CU (e.g., 8×8, 16×16) can be signaled in the bitstream.
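The recursive splitting of a CTU into CUs can be sketched as a quadtree. This is a minimal illustration, not the encoder's actual decision logic: the split decision is passed in as a caller-supplied predicate `should_split`, and the names `split_ctu` and `chroma_cb_size` are hypothetical.

```python
def split_ctu(x, y, size, min_cu, should_split):
    """Recursively split a square luma block into CUs.

    Returns a list of (x, y, size) tuples. `should_split(x, y, size)`
    is a hypothetical stand-in for the encoder's partitioning decision.
    """
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctu(x + dx, y + dy, half, min_cu, should_split))
        return cus
    return [(x, y, size)]


def chroma_cb_size(luma_size, chroma_format):
    """Chroma CB width/height for the two YUV formats named in the text."""
    if chroma_format == "4:4:4":
        return luma_size
    if chroma_format == "4:2:0":
        return luma_size // 2
    raise ValueError(f"unsupported chroma format: {chroma_format}")


if __name__ == "__main__":
    # Split a 64x64 CTU once, giving four 32x32 CUs.
    cus = split_ctu(0, 0, 64, 8, lambda x, y, s: s == 64)
    print(cus)
    print(chroma_cb_size(32, "4:4:4"), chroma_cb_size(32, "4:2:0"))
```

With a single split of a 64×64 CTU, each resulting CU has a 32×32 luma CB, with 32×32 chroma CBs in 4:4:4 format and 16×16 chroma CBs in 4:2:0 format, matching the two examples in the paragraph above.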

Generally, a CU has a prediction mode such as inter or intra. A CU includes one or more prediction units for purposes of signaling prediction information (such as prediction mode details) and/or prediction processing. A prediction unit ("PU") has a luma prediction block ("PB") and two chroma PBs. For an intra-predicted CU, the PU has the same size as the CU, unless the CU has the smallest size (e.g., 8×8). In that case, the CU can be split into four smaller PUs (e.g., each 4×4 if the smallest CU size is 8×8), or the PU can have the smallest CU size, as indicated by a syntax element for the CU. A CU also has one or more transform units for purposes of residual coding/decoding, where a transform unit ("TU") has a transform block ("TB") and two chroma TBs. A PU in an intra-predicted CU may contain a single TU (equal in size to the PU) or multiple TUs. As used herein, the term "block" can indicate a CU, CB, PB, TB, or other set of sample values, depending on context. The encoder decides how to partition video into CTUs, CUs, PUs, TUs, and so on.

Returning to FIG. 3, the encoder represents an intra-coded block of a source frame 331 in terms of prediction from other, previously reconstructed sample values in the frame 331. For intra spatial prediction of a block, the intra-picture estimator estimates the extrapolation of neighboring reconstructed sample values into the block. The intra-picture estimator outputs prediction information (such as the prediction mode (direction) for intra spatial prediction), which is entropy coded. An intra-prediction predictor applies the prediction information to determine intra prediction values.

For the various dictionary coding modes described herein, the encoder can calculate hash values of previously reconstructed sample values (e.g., groupings of 1 pixel, 2 pixels, 4 pixels, 8 pixels, and so on) and compare those hash values against the hash value of the current pixel values being encoded. Based on the hash comparison, matches of length one or more can be identified in the previously reconstructed sample values, and the current pixel value (or values) can be encoded using the various 1D and pseudo 2D dictionary modes described herein (or the inter pseudo 2D dictionary mode with reference to a reference picture).
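The hash comparison described above can be sketched for a one-dimensional sequence of pixel values. This is a simplified illustration under several assumptions: Python's built-in `hash` on a tuple stands in for whatever hash function an encoder would use, the offset is counted backward from the end of the history, matches are not extended past the end of the history, and the function names are hypothetical.

```python
def build_hash_index(pixels, group):
    """Map the hash of every `group`-pixel run to its start positions."""
    index = {}
    for start in range(len(pixels) - group + 1):
        key = hash(tuple(pixels[start:start + group]))
        index.setdefault(key, []).append(start)
    return index


def find_longest_match(history, current, group=1):
    """Return (offset, length) of the longest match for the start of
    `current` inside `history`, or None if there is no match.

    `offset` is the distance back from the end of `history`
    (an assumed convention for this sketch).
    """
    if len(current) < group:
        return None
    index = build_hash_index(history, group)
    key = hash(tuple(current[:group]))
    best = None
    for start in index.get(key, []):
        # Equal hashes only suggest a match, so verify sample by sample
        # and extend the match as far as the values agree.
        length = 0
        while (length < len(current)
               and start + length < len(history)
               and history[start + length] == current[length]):
            length += 1
        if length >= group and (best is None or length > best[1]):
            best = (len(history) - start, length)
    return best


if __name__ == "__main__":
    history = [1, 2, 3, 4, 1, 2, 5]
    print(find_longest_match(history, [1, 2, 3, 9], group=2))
```

Hashing groups of pixels narrows the search to a few candidate positions; the sample-by-sample check afterwards is needed because different pixel groups can share a hash value.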

The encoder 340 represents an inter-frame coded, predicted block of a source frame 331 in terms of prediction from reference frames. A motion estimator estimates the motion of the block with respect to one or more reference frames 369. When multiple reference frames are used, the multiple reference frames can be from different temporal directions or from the same temporal direction. A motion-compensated prediction reference region is a region of samples in the reference frame(s) that is used to generate motion-compensated prediction values for a block of samples of the current frame. The motion estimator outputs motion information such as motion vector information, which is entropy coded. A motion compensator applies motion vectors to the reference frames 369 to determine motion-compensated prediction values.

The entropy coder of the encoder 340 compresses quantized transform coefficient values as well as certain side information (e.g., motion vector information, QP values, mode decisions, parameter choices). In particular, the entropy coder can compress data for elements of an index map using a coefficient coding syntax structure. Typical entropy coding techniques include Exp-Golomb coding, arithmetic coding, Huffman coding, run length coding, variable-length-to-variable-length ("V2V") coding, variable-length-to-fixed-length ("V2F") coding, LZ coding, dictionary coding, probability interval partitioning entropy coding ("PIPE"), and combinations of the above. The entropy coder can use different coding techniques for different kinds of information, and can choose from among multiple code tables within a particular coding technique.

The coded frames 341 and MMCO/RPS information 342 are processed by a decoding process emulator 350. The decoding process emulator 350 implements some of the functionality of a decoder, for example, decoding tasks to reconstruct reference frames. The decoding process emulator 350 uses the MMCO/RPS information 342 to determine whether a given coded frame 341 needs to be reconstructed and stored for use as a reference frame in inter-frame prediction of subsequent frames to be encoded. If the MMCO/RPS information 342 indicates that a coded frame 341 needs to be stored, the decoding process emulator 350 models the decoding process that would be conducted by a decoder that receives the coded frame 341 and produces a corresponding decoded frame 351. In doing so, when the encoder 340 has used decoded frame(s) 369 that have been stored in the decoded frame storage area 360, the decoding process emulator 350 also uses the decoded frame(s) 369 from the storage area 360 as part of the decoding process.

The decoded frame temporary memory storage area 360 includes multiple frame buffer storage areas 361, 362, ..., 36n. The decoding process emulator 350 uses the MMCO/RPS information 342 to manage the contents of the storage area 360 in order to identify any frame buffers (361, 362, etc.) holding frames that are no longer needed by the encoder 340 for use as reference frames. After modeling the decoding process, the decoding process emulator 350 stores a newly decoded frame 351 in a frame buffer (361, 362, etc.) that has been identified in this manner.

The coded frames 341 and MMCO/RPS information 342 are buffered in a temporary coded data area 370. The coded data aggregated in the coded data area 370 contains, as part of the syntax of an elementary coded video bitstream, encoded data for one or more pictures. The coded data aggregated in the coded data area 370 can also include media metadata relating to the coded video data (e.g., as one or more parameters in one or more supplemental enhancement information ("SEI") messages or video usability information ("VUI") messages).

The aggregated data 371 from the temporary coded data area 370 is processed by a channel encoder 380. The channel encoder 380 can packetize the aggregated data for transmission as a media stream (e.g., according to a media stream multiplexing format such as ISO/IEC 13818-1), in which case the channel encoder 380 can add syntax elements as part of the syntax of the media transmission stream. Or, the channel encoder 380 can organize the aggregated data for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel encoder 380 can add syntax elements as part of the syntax of the media storage file. Or, more generally, the channel encoder 380 can implement one or more media system multiplexing protocols or transport protocols, in which case the channel encoder 380 can add syntax elements as part of the syntax of the protocol(s). The channel encoder 380 provides output to a channel 390, which represents storage, a communications connection, or another channel for the output.

IV. Exemplary Decoder System

FIG. 4 is a block diagram of an exemplary decoder system 400 in conjunction with which some described embodiments may be implemented. The decoder system 400 may be a general-purpose decoding tool capable of operating in any of multiple decoding modes, such as a low-latency decoding mode for real-time communication and a regular decoding mode for media playback from a file or stream, or it may be a special-purpose decoding tool adapted for one such decoding mode. The decoder system 400 may be implemented as an operating system module, as part of an application library, or as a standalone application. Overall, the decoder system 400 receives coded data from a channel 410 and produces reconstructed frames as output for an output destination 490. The coded data may include content encoded using a 1D dictionary mode, a pseudo 2D dictionary mode, and/or an inter pseudo 2D dictionary mode.

The decoder system 400 includes a channel 410, which can represent storage, a communications connection, or another channel for coded data as input. The channel 410 produces coded data that has been channel coded. A channel decoder 420 can process the coded data. For example, the channel decoder 420 de-packetizes data that has been aggregated for transmission as a media stream (e.g., according to a media stream multiplexing format such as ISO/IEC 13818-1), in which case the channel decoder 420 can parse syntax elements added as part of the syntax of the media transmission stream. Or, the channel decoder 420 separates coded video data that has been aggregated for storage as a file (e.g., according to a media container format such as ISO/IEC 14496-12), in which case the channel decoder 420 can parse syntax elements added as part of the syntax of the media storage file. Or, more generally, the channel decoder 420 can implement one or more media system demultiplexing protocols or transport protocols, in which case the channel decoder 420 can parse syntax elements added as part of the syntax of the protocol(s).

The coded data 421 that is output from the channel decoder 420 is stored in a temporary coded data area 430 until a sufficient quantity of such data has been received. The coded data 421 includes coded frames 431 and MMCO/RPS information 432. The coded data 421 in the coded data area 430 contains, as part of the syntax of an elementary coded video bitstream, coded data for one or more pictures. The coded data 421 in the coded data area 430 can also include media metadata relating to the encoded video data (e.g., as one or more parameters in one or more SEI messages or VUI messages).

In general, the coded data area 430 temporarily stores coded data 421 until such coded data 421 is used by the decoder 450. At that point, coded data for a coded frame 431 and the MMCO/RPS information 432 are transferred from the coded data area 430 to the decoder 450. As decoding continues, new coded data is added to the coded data area 430, and the oldest coded data remaining in the coded data area 430 is transferred to the decoder 450.

The decoder 450 periodically decodes a coded frame 431 to produce a corresponding decoded frame 451. As appropriate, when performing its decoding process, the decoder 450 may use one or more previously decoded frames 469 as reference frames for inter-frame prediction. The decoder 450 reads such previously decoded frames 469 from a decoded frame temporary memory storage area 460. Generally, the decoder 450 includes multiple decoding modules that perform decoding tasks such as entropy decoding, inverse quantization, inverse frequency transforms, intra prediction, motion compensation, and merging of tiles. The exact operations performed by the decoder 450 can vary depending on the compression format.

For example, the decoder 450 receives encoded data for a compressed frame or sequence of frames and produces output including a decoded frame 451. In the decoder 450, a buffer receives encoded data for a compressed frame and, at an appropriate time, makes the received encoded data available to an entropy decoder. The entropy decoder entropy decodes entropy-coded quantized data as well as entropy-coded side information, typically applying the inverse of the entropy encoding performed in the encoder. A motion compensator applies motion information to one or more reference frames to form motion-compensated prediction values for any inter-coded blocks of the frame being reconstructed. An intra prediction module can spatially predict sample values of a current block from neighboring, previously reconstructed sample values.

For the various dictionary coding modes described herein, the decoder can decode current pixel values in a matching mode and/or a direct mode. In the matching mode, the decoder decodes current pixel values that are predicted from previously decoded pixel values (e.g., previously reconstructed pixel values), which may be stored in a 1D dictionary or in another location (e.g., a reconstructed picture). For example, the decoder can receive one or more codes indicating an offset (e.g., within a dictionary) and a length (indicating the number of pixel values to be predicted from the offset). In the direct mode, the decoder can decode pixel values directly, without prediction.
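The two decoding modes can be sketched for a 1D dictionary as follows. The code representation is an assumption made for illustration: each code is a tuple, either `("direct", value)` or `("match", offset, length)`, with the offset counted backward from the end of the dictionary. The function name `decode_1d_dictionary` is hypothetical, and the actual bitstream syntax is not modeled.

```python
def decode_1d_dictionary(codes):
    """Decode a list of hypothetical codes into pixel values.

    ("direct", value)         -> value is decoded directly, no prediction
    ("match", offset, length) -> `length` values are copied starting
                                 `offset` entries back from the end
    """
    dictionary = []
    for code in codes:
        if code[0] == "direct":
            dictionary.append(code[1])
        elif code[0] == "match":
            _, offset, length = code
            if not 1 <= offset <= len(dictionary):
                raise ValueError(f"offset {offset} is outside the dictionary")
            start = len(dictionary) - offset
            # Copy one value at a time so a match may overlap the values
            # it is producing (length > offset), as in LZ77-style coding.
            # Allowing this overlap is an assumption of the sketch.
            for i in range(length):
                dictionary.append(dictionary[start + i])
        else:
            raise ValueError(f"unknown code type: {code[0]}")
    return dictionary


if __name__ == "__main__":
    codes = [("direct", 10), ("direct", 20), ("match", 2, 4), ("direct", 30)]
    print(decode_1d_dictionary(codes))
```

In this sketch every decoded value is appended to the dictionary, so later match codes can refer to values produced by either mode.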

In non-dictionary modes, the decoder 450 also reconstructs prediction residuals. An inverse quantizer inverse quantizes entropy-decoded data. For example, the decoder 450 sets values for QP for a picture, tile, slice, and/or other portion of video based on syntax elements in the bitstream, and inverse quantizes transform coefficients accordingly. An inverse frequency transformer converts the quantized, frequency-domain data into spatial-domain information. For an inter-predicted block, the decoder 450 combines reconstructed prediction residuals with motion-compensated predictions. The decoder 450 can similarly combine prediction residuals with predictions from intra prediction. A motion compensation loop in the video decoder 450 includes an adaptive de-blocking filter to smooth discontinuities across block boundary rows and/or columns in the decoded frame 451.

The decoded frame temporary memory storage area 460 includes multiple frame buffer storage areas 461, 462, ..., 46n. The decoded frame storage area 460 is an example of a decoded picture buffer. The decoder 450 uses the MMCO/RPS information 432 to identify a frame buffer (461, 462, etc.) in which it can store the decoded frame 451. The decoder 450 stores the decoded frame 451 in that frame buffer.

An output sequencer 480 uses the MMCO/RPS information 432 to identify when the next frame to be produced in output order is available in the decoded frame storage area 460. When the next frame 481 to be produced in output order is available in the decoded frame storage area 460, it is read by the output sequencer 480 and output to the output destination 490 (e.g., a display). In general, the order in which frames are output from the decoded frame storage area 460 by the output sequencer 480 may differ from the order in which the frames are decoded by the decoder 450.
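The difference between decoding order and output order can be illustrated with a short reordering sketch. It is a simplification under stated assumptions: each frame carries a hypothetical 0-based `output_index` in place of the MMCO/RPS information, and a heap stands in for the decoded frame storage area. The function name `output_in_order` is invented for the example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def output_in_order(decoded: Iterable[Tuple[int, str]]) -> Iterator[str]:
    """Yield frames in output order as soon as the next one is available.

    `decoded` yields (output_index, frame) pairs in decoding order, where
    output_index is a hypothetical 0-based display position.
    """
    pending: List[Tuple[int, str]] = []  # stands in for the frame storage area
    next_index = 0
    for item in decoded:
        heapq.heappush(pending, item)
        # Release every frame whose turn has come.
        while pending and pending[0][0] == next_index:
            yield heapq.heappop(pending)[1]
            next_index += 1
```

A frame decoded early but displayed late (such as a forward reference) simply waits in the storage area until the frames that precede it in output order have been released.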

V. Exemplary Video Encoder

FIGS. 5A and 5B are a block diagram of a generalized video encoder 500 in conjunction with which some described embodiments may be implemented. The encoder 500 receives a sequence of video pictures, including a current picture, as an input video signal 505 and produces encoded data in a coded video bitstream 595 as output.

The encoder 500 is block-based and uses a block format that depends on the implementation. Blocks may be further subdivided at different stages, for example, at the prediction, frequency transform and/or entropy encoding stages. For example, a picture can be divided into 64×64 blocks, 32×32 blocks or 16×16 blocks, which can in turn be divided into smaller blocks of sample values for coding and decoding. In implementations of encoding for the HEVC standard, the encoder partitions a picture into CTUs (CTBs), CUs (CBs), PUs (PBs) and TUs (TBs).
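One way to picture the subdivision of large blocks into smaller ones is a recursive four-way split. The sketch below assumes a quadtree split driven by a caller-supplied decision function; the names `split_block` and `should_split` are illustrative, and a real encoder would base the decision on criteria such as rate-distortion cost, which are not modeled here.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block


def split_block(x: int, y: int, size: int, min_size: int,
                should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four quadrants."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    blocks: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += split_block(x + dx, y + dy, half, min_size, should_split)
    return blocks
```

Calling `split_block(0, 0, 64, 8, decide)` on a 64×64 block yields a list of non-overlapping squares that together cover the original block, whatever `decide` returns.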

The encoder 500 compresses pictures using intra-picture coding and/or inter-picture coding. Many of the components of the encoder 500 are used for both intra-picture coding and inter-picture coding. The exact operations performed by those components can vary depending on the type of information being compressed.

A tiling module 510 optionally partitions a picture into multiple tiles of the same size or of different sizes. For example, the tiling module 510 splits the picture along tile rows and tile columns that, together with the picture boundaries, define the horizontal and vertical boundaries of tiles within the picture, where each tile is a rectangular region. The tiling module 510 can then group the tiles into one or more tile sets, where a tile set is a group of one or more of the tiles.
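The way tile row and column boundaries, together with the picture boundaries, define rectangular tiles can be sketched as follows. The function name `tile_rectangles` and the representation of boundaries as lists of interior pixel positions are assumptions for illustration only.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def tile_rectangles(width: int, height: int,
                    column_bounds: List[int],
                    row_bounds: List[int]) -> List[Rect]:
    """Return tile rectangles in raster order.

    column_bounds / row_bounds hold the interior x / y positions of the
    tile boundaries; the picture edges are added implicitly.
    """
    xs = [0] + sorted(column_bounds) + [width]
    ys = [0] + sorted(row_bounds) + [height]
    return [(x0, y0, x1 - x0, y1 - y0)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]
```

With two column boundaries and one row boundary, a 1920×1080 picture is split into six tiles; unequal spacing of the boundaries gives tiles of different sizes.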

The general encoding control 520 receives pictures for the input video signal 505 as well as feedback (not shown) from various modules of the encoder 500. Overall, the general encoding control 520 provides control signals (not shown) to other modules (such as the tiling module 510, transformer/scaler/quantizer 530, scaler/inverse transformer 535, intra-picture estimator 540, motion estimator 550 and intra/inter switch) to set and change coding parameters during encoding. In particular, the general encoding control 520 can decide whether and how to use dictionary modes during encoding. The general encoding control 520 can also evaluate intermediate results during encoding, for example, by performing rate-distortion analysis. The general encoding control 520 produces general control data 522 that indicates the decisions made during encoding, so that a corresponding decoder can make consistent decisions. The general control data 522 is provided to the header formatter/entropy coder 590.

If the current picture is predicted using inter-picture prediction, a motion estimator 550 estimates the motion of blocks of sample values of the current picture of the input video signal 505 with respect to one or more reference pictures. The decoded picture buffer 570 buffers one or more reconstructed, previously coded pictures for use as reference pictures. When multiple reference pictures are used, they can come from different temporal directions or from the same temporal direction. The motion estimator 550 produces, as side information, motion data 552 such as motion vector data and reference picture selection data. The motion data 552 is provided to the header formatter/entropy coder 590 as well as to the motion compensator 555.

The motion compensator 555 applies motion vectors to the reconstructed reference picture(s) from the decoded picture buffer 570. The motion compensator 555 produces motion-compensated predictions for the current picture.

In a separate path within the encoder 500, an intra-picture estimator 540 determines how to perform intra-picture prediction for blocks of sample values of the current picture of the input video signal 505. The current picture can be entirely or partially coded using intra-picture coding. For intra spatial prediction using values of a reconstruction 538 of the current picture, the intra-picture estimator 540 determines how to spatially predict sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture.

For the various dictionary coding modes described herein, the encoder 500 can calculate hash values of previously reconstructed sample values (e.g., groupings of 1 pixel, 2 pixels, 4 pixels, 8 pixels, and so on) and compare those hash values against the hash value of the current pixel values being encoded. Based on the hash comparison, matches of one or more pixel values in length can be identified in the previously reconstructed sample values, and the current pixel value (or values) can be encoded using the various 1D dictionary modes and pseudo 2D dictionary modes (or the inter pseudo 2D dictionary mode with reference to a reference picture) described herein.
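The hash comparison can be sketched with an index over groups of previously reconstructed values. This is an illustrative sketch only: it uses Python's built-in `hash` on tuples rather than any hash function specified by the source, and the function names `build_hash_index` and `find_match`, the index layout, and the longest-group-first search order are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def build_hash_index(pixels: Sequence[int],
                     group_sizes: Sequence[int] = (1, 2, 4, 8)
                     ) -> Dict[Tuple[int, int], List[int]]:
    """Map (group_size, hash) to start positions in reconstructed pixels."""
    index: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for n in group_sizes:
        for start in range(len(pixels) - n + 1):
            key = (n, hash(tuple(pixels[start:start + n])))
            index[key].append(start)
    return index


def find_match(index: Dict[Tuple[int, int], List[int]],
               previous: Sequence[int], current: Sequence[int],
               group_sizes: Sequence[int] = (8, 4, 2, 1)
               ) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the longest hashed group that matches."""
    for n in group_sizes:  # try longer groups first
        if len(current) < n:
            continue
        target = tuple(current[:n])
        for start in index.get((n, hash(target)), []):
            # Hashes can collide, so confirm the actual values.
            if tuple(previous[start:start + n]) == target:
                return start, n
    return None
```

Looking up a hash replaces a scan over every earlier position; the explicit value comparison afterward guards against two different groups sharing a hash value.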

The intra-picture estimator 540 produces, as side information, intra prediction data 542, such as information indicating whether intra prediction uses spatial prediction or one of the various dictionary modes (e.g., a flag value per intra block, or per intra block of certain prediction mode directions), and the prediction mode direction (for intra spatial prediction). The intra prediction data 542 is provided to the header formatter/entropy coder 590 as well as to the intra-picture predictor 545. According to the intra prediction data 542, the intra-picture predictor 545 spatially predicts sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture.

In non-dictionary modes, the intra/inter switch selects values of a motion-compensated prediction or an intra-picture prediction for use as the prediction 558 for a given block. In non-dictionary modes, the difference (if any) between a block of the prediction 558 and the corresponding part of the original current picture of the input video signal 505 provides values of the residual 518. During reconstruction of the current picture, reconstructed residual values are combined with the prediction 558 to produce a reconstruction 538 of the original content from the video signal 505. In lossy compression, however, some information is still lost from the video signal 505.
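The relationship between prediction, residual, and reconstruction amounts to a sample-wise subtraction and addition. The following sketch assumes blocks stored as lists of rows and 8-bit samples; the clipping to the valid sample range is common practice but is an assumption here, as are the function names.

```python
from typing import List

Block = List[List[int]]


def residual(original: Block, prediction: Block) -> Block:
    """Sample-wise difference between the original block and its prediction."""
    return [[o - p for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, prediction)]


def reconstruct(prediction: Block, residual_values: Block,
                bit_depth: int = 8) -> Block:
    """Add residuals back to the prediction and clip to the sample range."""
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual_values)]
```

If the residual is passed through unchanged, the reconstruction equals the original block. In lossy coding the residual is transformed and quantized first, so the reconstruction only approximates the original.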

In the transformer/scaler/quantizer 530, for non-dictionary modes, a frequency transformer converts spatial domain video information into frequency domain (i.e., spectral, transform) data. For block-based video coding, the frequency transformer applies a discrete cosine transform ("DCT"), an integer approximation thereof, or another type of forward block transform to blocks of prediction residual data (or to sample value data if the prediction 558 is null), producing blocks of frequency transform coefficients. The encoder 500 may also be able to indicate that such a transform step is skipped. The scaler/quantizer scales and quantizes the transform coefficients. For example, the quantizer applies non-uniform scalar quantization to the frequency domain data with a quantization step size that varies on a frame-by-frame basis, tile-by-tile basis, slice-by-slice basis, block-by-block basis, or other basis. The quantized transform coefficient data 532 is provided to the header formatter/entropy coder 590.
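The transform and quantization steps can be illustrated in one dimension. This sketch makes simplifying assumptions that differ from the text: it uses a floating-point orthonormal DCT-II instead of an integer approximation, works on a single row instead of a 2D block, and uses plain uniform rounding in place of the non-uniform scalar quantizer mentioned above. All function names are illustrative.

```python
import math
from typing import List


def dct_1d(samples: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(samples))
        out.append(scale * total)
    return out


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Uniform scalar quantization: round each coefficient to a level."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Inverse quantization: scale levels back to coefficient values."""
    return [level * step for level in levels]
```

A flat row of samples concentrates all of its energy in the first (DC) coefficient, so after quantization only one nonzero level remains to be entropy coded. A larger step size discards more detail and lowers the bit rate.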

In the scaler/inverse transformer 535, for non-dictionary modes, a scaler/inverse quantizer performs inverse scaling and inverse quantization on the quantized transform coefficients. An inverse frequency transformer performs an inverse frequency transform, producing blocks of reconstructed prediction residual values or sample values. The encoder 500 combines the reconstructed residual values with values of the prediction 558 (e.g., motion-compensated prediction values, intra-picture prediction values) to form the reconstruction 538.

For intra-picture prediction, the values of the reconstruction 538 can be fed back to the intra-picture estimator 540 and the intra-picture predictor 545. The values of the reconstruction 538 can also be used for motion-compensated prediction of subsequent pictures. The values of the reconstruction 538 can be further filtered. A filtering control 560 determines how to perform deblock filtering and sample adaptive offset ("SAO") filtering on the values of the reconstruction 538 for a given picture of the video signal 505. The filtering control 560 produces filter control data 562, which is provided to the header formatter/entropy coder 590 and the merger/filter(s) 565.

In the merger/filter(s) 565, the encoder 500 merges content from different tiles into a reconstructed version of the picture. The encoder 500 selectively performs deblock filtering and SAO filtering according to the filter control data 562, so as to adaptively smooth discontinuities across boundaries in the frames. Tile boundaries can be selectively filtered or not filtered at all, depending on the settings of the encoder 500, and the encoder 500 may provide syntax within the coded bitstream to indicate whether or not such filtering was applied. The decoded picture buffer 570 buffers the reconstructed current picture for use in subsequent motion-compensated prediction.

The header formatter/entropy coder 590 formats and/or entropy codes the general control data 522, quantized transform coefficient data 532, intra prediction data 542 and packetized index values, motion data 552, and filter control data 562. For example, the header formatter/entropy coder 590 uses context-adaptive binary arithmetic coding ("CABAC") for entropy coding of various syntax elements of a coefficient coding syntax structure.

The header formatter/entropy coder 590 provides the encoded data in the coded video bitstream 595. The format of the coded video bitstream 595 can be a variation or extension of the HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.

Depending on the implementation and the type of compression desired, modules of the encoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, encoders with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of encoders typically use a variation or supplemented version of the encoder 500. The relationships shown between modules within the encoder 500 indicate general flows of information in the encoder; other relationships are not shown for the sake of simplicity.

VI. Exemplary Video Decoder

FIG. 6 is a block diagram of a generalized decoder 600 in conjunction with which some described embodiments may be implemented. The decoder 600 receives encoded data in a coded video bitstream 605 and produces output including pictures for reconstructed video 695. The format of the coded video bitstream 605 can be a variation or extension of the HEVC format, Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), or another format.

The decoder 600 is block-based and uses a block format that depends on the implementation. Blocks may be further subdivided at different stages. For example, a picture can be divided into 64×64 blocks, 32×32 blocks or 16×16 blocks, which can in turn be divided into smaller blocks of sample values. In implementations of decoding for the HEVC standard, a picture is partitioned into CTUs (CTBs), CUs (CBs), PUs (PBs) and TUs (TBs).

The decoder 600 decompresses pictures using intra-picture decoding and/or inter-picture decoding. Many of the components of the decoder 600 are used for both intra-picture decoding and inter-picture decoding. The exact operations performed by those components can vary depending on the type of information being decompressed.

A buffer receives encoded data in the coded video bitstream 605 and makes the received encoded data available to the parser/entropy decoder 610. The parser/entropy decoder 610 entropy decodes the entropy-coded data, typically applying the inverse of the entropy coding performed in the encoder 500 (e.g., context-adaptive binary arithmetic decoding). For example, the parser/entropy decoder 610 uses context-adaptive binary arithmetic decoding for entropy decoding of various syntax elements of a coefficient coding syntax structure. As a result of parsing and entropy decoding, the parser/entropy decoder 610 produces general control data 622, quantized transform coefficient data 632, intra prediction data 642 and packetized index values, motion data 652, and filter control data 662.

The general decoding control 620 receives the general control data 622 and provides control signals (not shown) to other modules (such as the scaler/inverse transformer 635, intra-picture predictor 645, motion compensator 655 and intra/inter switch) to set and change decoding parameters during decoding.

If the current picture is predicted using inter-picture prediction, a motion compensator 655 receives the motion data 652, such as motion vector data and reference picture selection data. The motion compensator 655 applies motion vectors to the reconstructed reference picture(s) from the decoded picture buffer 670. The motion compensator 655 produces motion-compensated predictions for inter-coded blocks of the current picture. The decoded picture buffer 670 stores one or more previously reconstructed pictures for use as reference pictures.

In a separate path within the decoder 600, the intra-picture predictor 645 receives the intra prediction data 642, such as information indicating whether intra prediction uses spatial prediction or one of the dictionary modes (e.g., a flag value per intra block, or per intra block of certain prediction mode directions), and the prediction mode direction (for intra spatial prediction). For intra spatial prediction, using values of a reconstruction 638 of the current picture and according to the prediction mode data, the intra-picture predictor 645 spatially predicts sample values of a current block of the current picture from neighboring, previously reconstructed sample values of the current picture.

In non-dictionary modes, the intra/inter switch selects values of a motion-compensated prediction or an intra-picture prediction for use as the prediction 658 for a given block. For example, when HEVC syntax is followed, the intra/inter switch can be controlled based on a syntax element encoded for a CU of a picture, which can contain intra-predicted CUs and inter-predicted CUs. The decoder 600 combines the prediction 658 with reconstructed residual values to produce the reconstruction 638 of the content from the video signal.

To reconstruct the residual, for non-dictionary modes, the scaler/inverse transformer 635 receives and processes the quantized transform coefficient data 632. In the scaler/inverse transformer 635, a scaler/inverse quantizer performs inverse scaling and inverse quantization on the quantized transform coefficients. An inverse frequency transformer performs an inverse frequency transform, producing blocks of reconstructed prediction residual values or sample values. For example, the inverse frequency transformer applies an inverse block transform to frequency transform coefficients, producing sample value data or prediction residual data. The inverse frequency transform can be an inverse DCT, an integer approximation thereof, or another type of inverse frequency transform.

For intra-picture prediction, the values of the reconstruction 638 can be fed back to the intra-picture predictor 645. For inter-picture prediction, the values of the reconstruction 638 can be further filtered. In the merger/filter(s) 665, the decoder 600 merges content from different tiles into a reconstructed version of the picture. The decoder 600 selectively performs deblock filtering and SAO filtering according to the filter control data 662 and rules for filter adaptation, so as to adaptively smooth discontinuities across boundaries in the frames. Tile boundaries can be selectively filtered or not filtered at all, depending on the settings of the decoder 600 or a syntax indication within the encoded bitstream data. The decoded picture buffer 670 buffers the reconstructed current picture for use in subsequent motion-compensated prediction.

The decoder 600 can also include a post-processing deblock filter. The post-processing deblock filter optionally smooths discontinuities in reconstructed pictures. Other filtering (such as de-ring filtering) can also be applied as part of the post-processing filtering.

Depending on the implementation and the type of decompression desired, modules of the decoder can be added, omitted, split into multiple modules, combined with other modules, and/or replaced with like modules. In alternative embodiments, decoders with different modules and/or other configurations of modules perform one or more of the described techniques. Specific embodiments of decoders typically use a variation or supplemented version of the decoder 600. The relationships shown between modules within the decoder 600 indicate general flows of information in the decoder; other relationships are not shown for the sake of simplicity.

VII. Innovations for 1D Dictionary Mode

This section presents various innovations for the one-dimensional (1D) dictionary mode. Some innovations relate to signaling pixel values using an offset and a length, while others relate to signaling pixel values directly. Still other innovations relate to vertical scanning and horizontal scanning.

In particular, using the 1D dictionary mode when encoding pixel values can improve performance and reduce the number of bits needed when encoding video content, especially screen content (e.g., when performing screen capture). Screen content typically includes repeated structures (e.g., graphics, text characters), which provide regions with the same pixel values that can be encoded with prediction to improve performance.

A. 1D Dictionary Mode - Introduction

In the 1D dictionary mode, sample values (e.g., pixel values) are predicted by reference (using an offset and a length) to previous sample values (e.g., previously reconstructed sample values) stored in a 1D dictionary. For example, a video encoder or image encoder can encode current sample values with reference to a 1D dictionary that stores the previous sample values (e.g., reconstructed or original sample values) used to predict and encode the current sample values. A video decoder or image decoder can decode current sample values with reference to a 1D dictionary that stores the previously decoded (e.g., reconstructed) sample values used to predict and decode the current sample values.

In the 1D dictionary mode, one or more current pixel values can be predicted from one or more previous pixel values (e.g., in scan order). Prediction can be performed by matching the current pixel values to the previous pixel values, so that the current pixel values are predicted exactly (e.g., without needing any residual). The term "matching mode" describes encoding and/or decoding using matching pixel values from a dictionary (or from another source, such as a reconstructed picture). In situations in which no matching pixel values exist (e.g., at the start of a frame, or when no match is found in the dictionary of previous pixel values), one or more current pixel values can be coded directly. The term "direct mode" describes encoding and/or decoding pixel values directly.
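The choice between matching mode and direct mode on the encoder side can be sketched with a greedy search. This is an illustration under several assumptions, not the encoder of the described embodiments: it searches the earlier values exhaustively instead of using hashes, matches against original rather than reconstructed values, counts the offset backward from the current position, allows a match to run into the values being coded, and falls back to direct mode when the best match is shorter than a hypothetical `min_length`.

```python
from typing import List, Tuple


def encode_1d(pixels: List[int], min_length: int = 2) -> List[Tuple]:
    """Greedy encoder: longest earlier match, else direct mode."""
    tokens: List[Tuple] = []
    pos = 0
    while pos < len(pixels):
        best_offset, best_length = 0, 0
        for start in range(pos):  # candidate positions in the dictionary
            length = 0
            # The source run may extend into the values being coded.
            while (pos + length < len(pixels)
                   and pixels[start + length] == pixels[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - start, length
        if best_length >= min_length:
            tokens.append(("match", best_offset, best_length))
            pos += best_length
        else:
            tokens.append(("direct", pixels[pos]))
            pos += 1
    return tokens
```

At the start of the data the dictionary is empty, so the first values are always coded in direct mode. Once a pattern has appeared, later repetitions collapse into a single offset and length pair.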

In some implementations, pixel values are encoded and decoded as combined pixels (the combination of Y, U, and V for a pixel, or the combination of R, G, and B for a pixel, is encoded/decoded together). In other implementations, pixel values are encoded and decoded as separate components (e.g., a separate 1D dictionary can be maintained for each of the Y, U, and V components or each of the R, G, and B components). Pixel values can be encoded and decoded in various YUV data formats (e.g., YUV 4:4:4, YUV 4:2:2, YUV 4:2:0, etc.) or in various RGB data formats (e.g., RGB, GBR, BGR, etc.).

Encoding and/or decoding of pixel values using 1D dictionary mode can be applied to video or image content that is divided into distinct regions, such as blocks. In general, blocks of any size can be used. In some implementations, video content (e.g., a video picture or frame) is divided into coding units with sizes of 64×64, 32×32, 16×16, or 8×8 sample values.

In some implementations, dictionary coding can be combined with other types of coding. For example, pixel values can be coded using one of the dictionary modes described herein (e.g., 1D dictionary mode). The coded pixel values can then be coded using another coding technique (e.g., context-based arithmetic coding or another coding technique).

B. Signaling of offset and length

In 1D dictionary mode, when matching pixel values exist, an offset value and a length value are signaled to indicate where in the 1D dictionary the matching pixel values used to predict the current pixel values are located. For example, one or more current pixel values can be predicted from one or more previous pixel values stored in the 1D dictionary, identified within the 1D dictionary by an offset (the position going backward in the 1D dictionary from the current pixel value) and a length (the number of pixel values predicted starting from the offset). An offset of 5 means five pixels back in the 1D dictionary from the current pixel value (in some implementations a negative sign is added, which would make the offset -5 in this example).

In 1D dictionary mode, in some implementations, pixel values in the current block can be predicted from pixel values in a previous block (e.g., depending on the maximum size of the dictionary). For example, in a picture coded using 64×64 blocks, pixel values from the fourth block of the picture can be predicted (e.g., using an offset and a length) from pixel values of the first block of the picture that are stored in the 1D dictionary.

The offset can be encoded and signaled (e.g., in a bit stream) in a format that divides the possible offset values into a number of ranges and encodes the offset value according to its range. In this way, the offset can be encoded as a two-part code, with a first part that identifies the offset range and a second part that indicates the offset value within that range.

In a specific implementation, offset values are coded using the following ranges. In addition, in this implementation, zero-based numbering is applied, so that the offset value is decreased by 1 before it is encoded and increased by 1 after it is decoded. The ranges (with their offset range codes), the corresponding offset values, and the numbers of bits are shown in the following table (Table 1).

Using the implementation depicted in Table 1 above, an offset can be encoded, signaled, and decoded. As an example, an offset value of 415 (representing an original offset value of 416, reduced by 1 for encoding) would be encoded in range 4. Because range 4 starts at an offset value of 276, the value to be coded would be 415 - 276 = 139. The encoded offset would be generated by combining the offset range code "0001" (indicating range 4) followed by the 16-bit value "0000000010001011" (the 16-bit binary value for decimal 139). Putting the two parts of the code together (the offset range code and the offset value code) results in the following combined code for the encoded offset: "00010000000010001011". As another example, an offset value of 45 (representing an original offset value of 46, reduced by 1 for encoding) would be encoded in range 3. Because range 3 starts at an offset value of 20, the value to be coded would be 45 - 20 = 25. The encoded offset would be generated by combining the offset range code "001" (indicating range 3) followed by the 8-bit value "00011001" (the 8-bit binary value for decimal 25). Putting the two parts of the code together (the offset range code and the offset value code) results in the following combined code for the encoded offset: "00100011001".
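The two worked offset examples can be reproduced with a short sketch. The range boundaries below are inferred from the examples in the text (range 1 holds 2-bit values, range 3 starts at 20, range 4 starts at 276, and range 5 covers values above 65,811). The range 2 row, including its code "01", and the range 5 code "0000" are assumptions that follow the visible prefix pattern, and `encode_offset` is a hypothetical helper name.

```python
# Range table inferred from the worked examples in the text.
# ASSUMPTIONS: the range 2 row (code "01", 4 bits) and the range 5
# code "0000" are guesses that follow the visible prefix pattern.
OFFSET_RANGES = [
    # (range code, first zero-based value, number of value bits)
    ("1", 0, 2),        # range 1: values 0..3
    ("01", 4, 4),       # range 2: values 4..19 (assumed)
    ("001", 20, 8),     # range 3: values 20..275
    ("0001", 276, 16),  # range 4: values 276..65,811
]
RANGE5_CODE, RANGE5_START = "0000", 65812


def encode_offset(offset: int, n_bits: int = 18) -> str:
    """Return the two-part bit string for an original (one-based) offset."""
    if offset < 1:
        raise ValueError("offset must be at least 1")
    value = offset - 1  # zero-based numbering: decrease by 1 before coding
    for code, start, bits in OFFSET_RANGES:
        if value < start + (1 << bits):
            return code + format(value - start, f"0{bits}b")
    # Range 5: n_bits depends on the maximum offset (see the text).
    return RANGE5_CODE + format(value - RANGE5_START, f"0{n_bits}b")


print(encode_offset(416))  # 00010000000010001011
print(encode_offset(46))   # 00100011001
```

Because no range code is a prefix of another, a decoder can read the range code first and then knows exactly how many value bits follow.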

As depicted in Table 1 above, range 5 uses N bits to represent offset values greater than 65,811, where N is the number of bits needed to represent the maximum offset. In some implementations, the maximum offset value is determined from the current dictionary size. For example, if the current dictionary size is 300,000, N can be set to 18, so that offset values above 65,811 and up to 300,000 are each encoded with 18 bits. Offset values for range 5 start at 65,812, so only the amount above 65,811 has to be represented: 300,000 - 65,812 = 234,188, which fits in 18 bits. In other implementations, the maximum offset value is predetermined and does not depend on the current dictionary size. For example, if the predetermined maximum offset value is 800,000, N can be set to 20.
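The arithmetic behind N can be checked with a few lines. This is only a sketch: `range5_bits` is a hypothetical helper, and it assumes that N is the bit width of the largest value actually coded in range 5, that is, the maximum offset minus the range 5 starting value of 65,812.

```python
RANGE5_START = 65812  # first offset value that falls in range 5


def range5_bits(max_offset: int) -> int:
    """Bits needed to code (max_offset - RANGE5_START) as an unsigned integer."""
    largest_coded = max_offset - RANGE5_START
    if largest_coded < 0:
        raise ValueError("maximum offset does not reach range 5")
    return max(1, largest_coded.bit_length())


print(range5_bits(300_000))  # 18, since 234,188 < 2**18
print(range5_bits(800_000))  # 20, since 734,188 < 2**20
```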

In other implementations, offset values can be coded using a different number of ranges and/or ranges that cover different groupings of offset values.

In a specific implementation, length values, like offset values, are coded by range. In addition, in this implementation, zero-based numbering is applied, so that the length value is decreased by 1 before it is encoded and increased by 1 after it is decoded. The ranges (with their length range codes), the corresponding length values, and the numbers of bits are shown in the following table (Table 2).

Using the implementation depicted in Table 2 above, a length can be encoded, signaled, and decoded. As an example, a length value of 2 (representing an original length value of 3, reduced by 1 for encoding) would be encoded in range 1. The encoded length would be generated by combining the length range code "1" (indicating range 1) followed by the 2-bit value "10" (the 2-bit binary value for decimal 2). Putting the two parts of the code together (the length range code and the length value code) results in the following combined code for the encoded length: "110". As another example, a length value of 56 (representing an original length value of 57, reduced by 1 for encoding) would be encoded in range 3. Because range 3 starts at a length value of 20, the value to be coded would be 56 - 20 = 36. The encoded length would be generated by combining the length range code "001" (indicating range 3) followed by the 8-bit value "00100100" (the 8-bit binary value for decimal 36). Putting the two parts of the code together (the length range code and the length value code) results in the following combined code for the encoded length: "00100100100".
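Reading a length code back is the mirror image of building it, which the following sketch illustrates. The range boundaries are inferred from the examples in the text. The range 2 row (code "01") and the range 4 code "000" are assumptions, and `decode_length` is a hypothetical helper. Note that a range code of "1" followed by the 2-bit value "10" concatenates to "110".

```python
# Length ranges inferred from the worked examples in the text.
# ASSUMPTIONS: the range 2 row (code "01") and the range 4 code "000".
LENGTH_RANGES = [("1", 0, 2), ("01", 4, 4), ("001", 20, 8)]
RANGE4_CODE, RANGE4_START = "000", 276


def decode_length(bits: str, n_bits: int = 10) -> tuple[int, int]:
    """Decode one length code; return (original length, bits consumed)."""
    for code, start, width in LENGTH_RANGES:
        if bits.startswith(code):
            end = len(code) + width
            value = start + int(bits[len(code):end], 2)
            return value + 1, end  # increase by 1 to undo zero-based numbering
    if not bits.startswith(RANGE4_CODE):
        raise ValueError("not a valid length code")
    end = len(RANGE4_CODE) + n_bits
    value = RANGE4_START + int(bits[len(RANGE4_CODE):end], 2)
    return value + 1, end


print(decode_length("110"))          # (3, 3)
print(decode_length("00100100100"))  # (57, 11)
```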

As depicted in Table 2 above, range 4 uses N bits to represent length values greater than 275, where N is the number of bits needed to represent the maximum length. In some implementations, the maximum length value is the number of pixels remaining in the current block being encoded or decoded. For example, if the current pixel value being encoded or decoded is the 3,000th pixel value in the current 64×64 block (a block with 4,096 pixel values), the maximum length value is 1,096 (4,096 - 3,000), which can be coded with 10 bits (N=10). Length values for range 4 start at 276, so only the amount above 275 has to be represented: 1,096 - 276 = 820, which fits in 10 bits. In other implementations, the maximum length value is predetermined and does not depend on the current dictionary size. For example, if the predetermined maximum length value is 4,096, N can be set to 12.

In other implementations, length values can be coded using a different number of ranges and/or ranges that cover different groupings of length values.

In some implementations, the maximum offset and/or maximum length is known. When the maximum offset and/or maximum length is known, coding efficiency can be improved. For example, when coding the value of a matching offset, the maximum offset can be set to the current dictionary size (e.g., if the current dictionary size is 10 pixels, the offset cannot be greater than 10). When coding the value of a matching length, the maximum length can be set to the number of pixels remaining in the current block (e.g., the current coding unit (CU)). For example, if the current pixel value being encoded or decoded is the 15th pixel in an 8×8 block, the maximum length can be set to 49. When the maximum value (for the offset and/or length) is known, the value can be signaled more efficiently. For example, the number of bits needed to encode the maximum value can be determined by ceiling(log2(maximum)), which can be used to define the "N" bits in Table 1 and Table 2 above.
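A small sketch of the bit-count rule follows. Both function names are hypothetical. The integer expression `(maximum - 1).bit_length()` equals ceiling(log2(maximum)) for any maximum of at least 1 and avoids floating-point rounding. Whether the rule is applied to the maximum itself or to the maximum minus the start of the last range is not stated at this point in the text, so the sketch applies it to whatever value is passed in.

```python
def bits_for_maximum(maximum: int) -> int:
    """ceiling(log2(maximum)), computed with integer arithmetic."""
    if maximum < 1:
        raise ValueError("maximum must be at least 1")
    return (maximum - 1).bit_length()


def remaining_pixels(block_width: int, block_height: int, position: int) -> int:
    """Pixels remaining in the block, following the text's 64 - 15 = 49 example."""
    return block_width * block_height - position


max_length = remaining_pixels(8, 8, 15)
print(max_length, bits_for_maximum(max_length))  # 49 6
```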

In some implementations, the minimum offset and the minimum length are 1, which can be coded as 0 after conversion to zero-based numbering.

1D dictionary mode can be applied to encode and/or decode pixel values within a block. For example, 1D dictionary mode (as well as the other dictionary modes described herein) can be applied to encode and/or decode pixel values within blocks of a video frame (e.g., blocks of various sizes, such as 4×4, 8×8, 16×16, 32×32, and 64×64 blocks).

In some implementations, the offset and length can overlap the current pixel values being encoded/decoded. As an example, consider the pixel values [P-2, P-1, P0, P1, P2, P3], where P-2 and P-1 are the last two pixel values in the 1D dictionary, P0 is the current pixel value being encoded/decoded, and P1 through P3 are the next pixel values to be encoded/decoded. In this situation, an offset of 1 and a length of 3 (un-encoded offset and length values) is a valid condition in which P0 is predicted from P-1, P1 is predicted from P0, and P2 is predicted from P1. An offset of 1 (the un-encoded value, which would be 0 when encoded) means one position back in the 1D dictionary from the current pixel value (in some implementations a negative sign is added to the offset, which would make the offset -1 in this example).
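The overlapping case works naturally when values are copied one at a time, as in the sketch below. Here `copy_match` is a hypothetical helper, the dictionary is modeled as a plain Python list with the newest value at the end, and the letters stand in for pixel values.

```python
def copy_match(dictionary: list, offset: int, length: int) -> list:
    """Append `length` predicted values, each copied from `offset` entries back."""
    if not 1 <= offset <= len(dictionary):
        raise ValueError("offset points before the start of the dictionary")
    predicted = []
    for _ in range(length):
        value = dictionary[-offset]  # may be a value appended earlier in this loop
        dictionary.append(value)
        predicted.append(value)
    return predicted


history = ["A", "B"]              # stands in for P-2, P-1
print(copy_match(history, 1, 3))  # ['B', 'B', 'B']  (P0, P1, P2)
```

Copying value by value, rather than slicing the source span up front, is what lets the length exceed the offset.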

C. Horizontal and vertical scanning

1D dictionary mode supports horizontal and vertical scanning, which can be used to convert between the 1D dictionary and a two-dimensional representation of video or image content (e.g., a two-dimensional block of video or image content). For example, pixel values within a block of video content can be scanned horizontally when they are encoded and decoded. In horizontal scanning, pixel values are added to the 1D dictionary in horizontal scanning order (e.g., left to right within a row of pixels). Pixel values within a block of video content can also be scanned vertically when they are encoded and decoded. In vertical scanning, pixel values are added to the 1D dictionary in vertical scanning order (e.g., top to bottom within a column of pixels).

In some implementations, both horizontal and vertical scanning are supported. To support both, two 1D dictionaries can be maintained: one 1D dictionary that stores pixel values in horizontal scanning order (the horizontal scanning 1D dictionary) and another 1D dictionary that stores pixel values in vertical scanning order (the vertical scanning 1D dictionary). When pixel values need to be added, they can be added to both the horizontal scanning 1D dictionary and the vertical scanning 1D dictionary. The ordering of the pixel values differs between the two dictionaries, because the order depends on which scanning order is used.
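The difference between the two orderings is easy to see on a small block. In this sketch the function names are hypothetical, the block is modeled as a list of rows, and the two dictionaries are plain lists.

```python
def horizontal_scan(block):
    """Row by row, left to right within each row."""
    return [value for row in block for value in row]


def vertical_scan(block):
    """Column by column, top to bottom within each column."""
    return [row[j] for j in range(len(block[0])) for row in block]


block = [[1, 2, 3],
         [4, 5, 6]]
horizontal_dictionary, vertical_dictionary = [], []
horizontal_dictionary.extend(horizontal_scan(block))
vertical_dictionary.extend(vertical_scan(block))
print(horizontal_dictionary)  # [1, 2, 3, 4, 5, 6]
print(vertical_dictionary)    # [1, 4, 2, 5, 3, 6]
```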

In some implementations, additions to the two 1D dictionaries are performed at different times. For example, when a block is encoded or decoded in horizontal scanning mode, pixel values can be added to the horizontal scanning 1D dictionary as they are encoded or decoded. Once the current block has been encoded or decoded, its pixel values can be added to the vertical scanning 1D dictionary.

In implementations that support both horizontal and vertical scanning, the scanning order can change (e.g., on a block-by-block basis or on some other basis). For example, if one block of a picture uses horizontal scanning, the pixel values for that block are added to the horizontal scanning 1D dictionary (in horizontal scanning order) and are also added to the vertical scanning 1D dictionary (in vertical scanning order). If another block of the picture uses vertical scanning, the pixel values for that block are added to the vertical scanning 1D dictionary (in vertical scanning order) and are also added to the horizontal scanning 1D dictionary (in horizontal scanning order).

D. Dictionary size reduction

The size of the 1D dictionary can be limited (e.g., to balance the cost of maintaining the dictionary against the benefit of predicting pixel values). Reducing the size of the dictionary (e.g., pruning the dictionary) can be performed at various times. For example, the size of the dictionary can be checked when pixel values are added to it. If the dictionary is larger than a maximum size (e.g., a predetermined maximum size, such as 500K), the dictionary can be reduced in size (e.g., by removing the oldest entries in the dictionary).

In some implementations, a maximum dictionary size is predefined. If the dictionary is larger than the predefined maximum dictionary size, part of the dictionary (e.g., the oldest part) is removed. In a specific implementation, if the dictionary is larger than a threshold size, one third of the dictionary is removed. For example, in some implementations the base dictionary size is defined as 1<<18. If the current dictionary size is equal to or greater than 1.5 times the base dictionary size, the oldest elements, amounting to 0.5 times the base dictionary size, are removed from the dictionary.
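The pruning rule can be sketched as follows. The function name `prune` is hypothetical, and the dictionary is modeled as a list whose oldest entries sit at the front. Removing half of the base size from a dictionary that holds 1.5 times the base size is what yields the one-third reduction.

```python
BASE_SIZE = 1 << 18  # base dictionary size from the text


def prune(dictionary: list, base_size: int = BASE_SIZE) -> int:
    """Drop the oldest base_size // 2 entries once 1.5 * base_size is reached."""
    threshold = base_size + base_size // 2  # 1.5 times the base size
    if len(dictionary) < threshold:
        return 0
    removed = base_size // 2
    del dictionary[:removed]  # oldest entries are at the front
    return removed


small = list(range(12))            # toy example with base_size=8, threshold 12
print(prune(small, base_size=8))   # 4
print(small)                       # [4, 5, 6, 7, 8, 9, 10, 11]
```

Offsets count back from the newest entry, so dropping entries from the front leaves the offsets of the surviving entries unchanged.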

In some implementations, the dictionary is only checked periodically (and pruned if necessary). For example, the dictionary can be checked after encoding and/or decoding a block, CU, or CTU. In a specific implementation, the size of the dictionary is checked after encoding or decoding a CTU and is reduced by one third if it exceeds the maximum size. In such implementations, it must be guaranteed that the maximum number of elements that can be added to the dictionary between two checks is no larger than the dictionary buffer size minus the removal threshold. For example, the base dictionary size is defined as 1<<18 and the removal threshold is defined as 1.5 times the base dictionary size, that is, (1<<18) + (1<<17). If the dictionary size is checked after encoding or decoding each CTU (assuming a CTU size of 4096), the minimum buffer used for the dictionary should be (1<<18) + (1<<17) + 4096.
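The buffer-size arithmetic is short enough to check directly. The constant names below are hypothetical. One practical note: in Python, as in C, `+` binds more tightly than `<<`, so the parentheses are required when the expression is written in code.

```python
BASE_SIZE = 1 << 18                        # 262,144 entries
REMOVAL_THRESHOLD = (1 << 18) + (1 << 17)  # 1.5 times the base size
CTU_SIZE = 4096                            # CTU size assumed in the text

# Worst case: the dictionary sits just at the threshold when a CTU starts
# and a full CTU of values is added before the next check.
MIN_BUFFER = REMOVAL_THRESHOLD + CTU_SIZE

print(REMOVAL_THRESHOLD)  # 393216
print(MIN_BUFFER)         # 397312
```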

E. Reconstruction in scanning order

After pixel values are decoded, they are reconstructed to recreate the video content in two dimensions. Reconstructing pixel values in scanning order can be performed at various points during the decoding process. For example, after the pixel values for a particular region of the video content (e.g., a block, CU, or CTU) have been decoded, those pixel values can be reconstructed in scanning order.

In some implementations, reconstruction is performed after the pixel values for a CU have been decoded, as follows. If horizontal scanning is used for the CU, the following equation (Equation 1) is used to reconstruct the pixel values of the CU, which has width "w" and height "h", in scanning order (rec[i][j] is the reconstructed pixel at row "i" and column "j"; pixel[] holds the decoded pixels):

rec[i][j] = pixel[i × w + j]    (Equation 1)

If vertical scanning is used for the CU, the following equation (Equation 2) is used to reconstruct the pixel values of the CU, which has width "w" and height "h", in scanning order:

rec[i][j] = pixel[j × h + i]    (Equation 2)
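Both reconstructions can be sketched in a few lines, assuming row-major indexing for horizontal scanning (index i × w + j) and column-major indexing for vertical scanning (index j × h + i), which is what the definitions of rec[i][j], "w", and "h" imply. The function name `reconstruct` and the list-of-rows representation are illustrative choices.

```python
def reconstruct(pixels, w, h, vertical=False):
    """Return rec as h rows of w values, where rec[i][j] is row i, column j."""
    if len(pixels) != w * h:
        raise ValueError("expected w * h decoded pixels")
    if vertical:
        # Decoded pixels arrive column by column, top to bottom.
        return [[pixels[j * h + i] for j in range(w)] for i in range(h)]
    # Decoded pixels arrive row by row, left to right.
    return [[pixels[i * w + j] for j in range(w)] for i in range(h)]


print(reconstruct([1, 2, 3, 4, 5, 6], w=3, h=2))
# [[1, 2, 3], [4, 5, 6]]
print(reconstruct([1, 4, 2, 5, 3, 6], w=3, h=2, vertical=True))
# [[1, 2, 3], [4, 5, 6]]
```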

F. Direct mode

When 1D dictionary mode is used, there may be situations in which no matching pixel value is found. For example, during encoding, the encoder can look backward through the dictionary to determine whether there is a pixel value (or a sequence of pixel values) that matches the current pixel value being encoded (or the multiple pixel values currently being encoded). If a match is found, the current pixel value(s) can be encoded in matching mode using the offset and length coding described above in this section. If no matching pixel value is found in the dictionary, however, the current pixel value can be encoded using direct mode. In direct mode, the current pixel value is coded directly (e.g., the Y, U, and V components, or the R, G, and B components, of the pixel value are encoded directly, without reference to any other pixel value in the dictionary).
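The backward search can be sketched with a brute-force loop. This is a deliberately simple stand-in, not the search used in the source, which mentions hash matching elsewhere. The function name `longest_match` is hypothetical, and a result of `None` is the case in which the encoder would fall back to direct mode.

```python
def longest_match(dictionary, upcoming):
    """Return (offset, length) of the longest match, or None if there is none."""
    best = None
    for offset in range(1, len(dictionary) + 1):
        window = list(dictionary)  # scratch copy so overlapping matches work
        length = 0
        while length < len(upcoming) and window[-offset] == upcoming[length]:
            window.append(upcoming[length])
            length += 1
        if length and (best is None or length > best[1]):
            best = (offset, length)  # ties keep the smallest offset
    return best


print(longest_match(["A", "B", "C"], ["A", "B", "D"]))  # (3, 2)
print(longest_match(["A"], ["B"]))                      # None -> direct mode
```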

In some implementations, an escape code or flag is used to indicate when direct mode is used for a pixel value. For example, the encoder can place an escape code or flag in the bit stream along with the directly encoded pixel value, so that the decoder knows the pixel value was encoded using direct mode. In this way, the decoder can distinguish pixel values encoded in direct mode from pixel values encoded in matching mode. In addition, coding in 1D dictionary mode can support switching between matching mode and direct mode as needed (e.g., on a pixel-by-pixel basis).

G. Example encoding/decoding

FIG. 7 is a diagram illustrating a simplified example (700) of how pixel values can be encoded using 1D dictionary mode. As depicted in the example (700), three rows (the first, second, and last rows) of an example 8×8 block (710) of pixel values are shown. The example block (710) of pixel values is depicted using 3-byte YUV or RGB values. For reference purposes, the pixel values of the block are labeled in horizontal scanning order, starting with pixel zero (P₀).

As illustrated in the example (700), the pixel values are encoded using 1D dictionary mode (720). The first pixel value (P₀) is added to the 1D dictionary as the first entry (e.g., the first pixel value may be the first pixel in the first block of a video frame). Because there are no previous pixel values in the 1D dictionary, the first pixel value (P₀) is encoded in direct mode and added to the encoded bit stream. The second pixel value (P₁) is also encoded in direct mode and added to the 1D dictionary, because it does not match any previous pixel value in the dictionary. The third pixel value (P₂) is likewise encoded in direct mode and added to the 1D dictionary. The state of the encoded bit stream and the 1D dictionary is depicted at 730. The encoded bit stream is shown in a simplified format indicating that the first three pixels are encoded using direct mode (e.g., direct mode may be indicated by an escape code in the encoded bit stream).

When the fourth pixel value (P₃) is encoded, a match is found in the 1D dictionary. Specifically, P₀ matches P₃, so P₃ can be encoded in matching mode using an offset value and a length value that refer to P₀ in the 1D dictionary. After the matching pixel (P₀) is identified in the 1D dictionary, the length of the run of matching pixel values can be determined. In this example, two pixel values match (i.e., P₃ and P₄ match P₀ and P₁). To encode the offset and the length, example 700 uses the ranges described above in this section (Table 1 and Table 2). First, the offset value and the length value are each decremented by 1 (to convert to numbering that starts at zero) and then encoded using the ranges. Specifically, the offset value of 2 (3 - 1) is encoded as "110" according to the first row of Table 1 (the first "1" indicates range 1, and "10" indicates an offset value of 2). The length value of 1 (2 - 1) is encoded as "101" according to the first row of Table 2 (the first "1" indicates range 1, and "01" indicates a length value of 1). Concatenating the offset code and the length code yields the code "110101". The state of the encoded bit stream and the 1D dictionary is depicted at 740. The encoded bit stream is depicted in a simplified format indicating that the first three pixels are encoded using direct mode and that the fourth and fifth pixel values are encoded in matching mode and predicted from the first and second pixel values.
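The bit code in this example can be reproduced with a short Python sketch. It models only the first range, assuming (from the worked example, not from the full tables) that range 1 is signaled by the prefix bit "1" followed by a 2-bit value; the function names are hypothetical.

```python
def encode_range1(value: int) -> str:
    """Encode a 1-based offset or length that falls in the first range.

    Assumption: range 1 is a prefix bit "1" followed by a 2-bit value,
    as in the worked example. Other ranges are not modeled here.
    """
    zero_based = value - 1  # convert to numbering that starts at zero
    if not 0 <= zero_based <= 3:
        raise ValueError("value is outside the first range (1..4)")
    return "1" + format(zero_based, "02b")


def encode_match(offset: int, length: int) -> str:
    """Concatenate the offset code and the length code for one match."""
    return encode_range1(offset) + encode_range1(length)


# P3 and P4 match P0 and P1: three pixels back, two pixels long.
code = encode_match(offset=3, length=2)
print(code)  # 110101
```

The offset code comes first and the length code second, which is the order that produces "110101" in the example.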

FIG. 8 is a diagram illustrating a simplified example 800 of how pixel values can be decoded using a 1D dictionary mode. As depicted in example 800, the encoded bit stream produced by encoding the block of FIG. 7 is decoded using the 1D dictionary mode (810). The first three pixel values are decoded in direct mode and added to the dictionary, as depicted at 820.

The fourth and fifth pixel values are decoded using matching mode. In this example, the encoded bit stream representation of the fourth and fifth pixel values is "110101", which is decoded using the offset and length ranges defined by Table 1 and Table 2 above in this section. Specifically, the offset is decoded as 2 and the length is decoded as 1. The offset and the length identify the pixel values used for prediction. In this example, the offset of 2 (three pixels back, after adding 1 to compensate for numbering that starts at zero) identifies the first pixel value in the dictionary. The length indicates that two pixel values are predicted (after adding 1 to the length to compensate for numbering that starts at zero). Thus, the fourth and fifth pixel values are predicted from the first and second pixel values and added to the dictionary, as depicted at 830.
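The copy step of this decoding example can be sketched in Python as follows. The dictionary is modeled as a plain list and the function name is hypothetical; the sketch takes the decoded (zero-based) offset and length as inputs and does not parse the bit stream.

```python
def copy_match(dictionary: list, offset: int, length: int) -> list:
    """Append predicted values copied from earlier dictionary entries.

    `offset` and `length` are the decoded zero-based values; 1 is added
    to each to undo the zero-based numbering.
    """
    distance = offset + 1  # how many entries back the match starts
    count = length + 1     # how many values are predicted
    if distance > len(dictionary):
        raise ValueError("offset points before the start of the dictionary")
    for _ in range(count):
        # Each predicted value is also added to the dictionary.
        dictionary.append(dictionary[-distance])
    return dictionary


decoded = copy_match(["P0", "P1", "P2"], offset=2, length=1)
print(decoded)  # ['P0', 'P1', 'P2', 'P0', 'P1']
```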

Once the 8×8 block is decoded, it is reconstructed in horizontal scanning order. The reconstructed 8×8 block is depicted at 840.

VIII. Innovations for Pseudo 2D Dictionary Mode

This section presents various innovations for the pseudo 2D dictionary mode. The pseudo 2D dictionary mode is similar to the 1D dictionary mode described in Section VII, so its operation is the same as that of the 1D dictionary mode except for the differences described in this section.

Whereas the 1D dictionary mode maintains a 1D dictionary of previous pixel values, the pseudo 2D dictionary mode does not maintain a separate dictionary. Instead, in the pseudo 2D dictionary mode, all of the previous pixel values (e.g., all previously reconstructed pixel values from the start of the picture or frame) can be used for prediction. For example, a video or image encoder or decoder typically maintains all reconstructed pixel values (e.g., for the current picture or frame) during encoding and decoding (e.g., for use during prediction).

Because the pseudo 2D dictionary mode predicts current pixel values from the pixel values of a two-dimensional picture (e.g., previously reconstructed pixel values), it uses two offset values: an X offset value (offsetX) and a Y offset value (offsetY). The offsetX and offsetY values can be signaled independently using the techniques described above in the 1D dictionary section (e.g., using the ranges described in Table 1). For example, if the pixel value at (100, 100) (X/Y from the top-left of the current picture) is predicted from the pixel value at (10, 20), offsetX can be set to 90 (indicating 90 pixels to the left in the reconstructed pixel values of the picture, which could also be expressed as -90) and offsetY can be set to 80 (indicating 80 pixels up in the reconstructed pixel values of the picture, which could also be expressed as -80).
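As a small illustration of the arithmetic in this example, the following Python sketch computes the two offsets as positive distances to the left and up. The function name is hypothetical, and the positive-distance convention is just one of the two representations mentioned above.

```python
def two_d_offset(current_xy, source_xy):
    """Return (offsetX, offsetY) as positive distances left and up."""
    (cx, cy), (sx, sy) = current_xy, source_xy
    return cx - sx, cy - sy


print(two_d_offset((100, 100), (10, 20)))  # (90, 80)
```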

In the pseudo 2D dictionary mode, the structure of the block is taken into account when performing prediction. For example, consider a current 8×8 block that is coded using horizontal scanning. If pixel values of the current block are predicted from a previous 8×8 block and the length of the prediction is 9 (i.e., longer than one row of the 8×8 block), the pixel values used for prediction in the previous 8×8 block wrap around across two rows of that block (or from the last row of one block to the first row of the next block).

In some implementations, the following equation (Equation 3) is used to reconstruct a current pixel of the picture in the pseudo 2D dictionary mode. In this equation, the dimensions of the current block are width (w) × height (h), the current pixel is the pixel at position "c" of the current block (counting from zero), (x0, y0) is the top-left starting position of the current block, the offset is (oX, oY), the scanning order is horizontal, the match length is 1, and pictureRec[] is the reconstruction of the current picture.
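The equation itself does not appear in this text. The Python sketch below is one reading that is consistent with the definitions just given, under two assumptions: pictureRec is indexed as [row][column], and the offsets are positive distances to the left and up. The function names and the run helper are hypothetical.

```python
def reconstruct_pixel(picture_rec, x0, y0, w, c, o_x, o_y):
    """Copy the pixel at block position c from (o_x, o_y) pixels left/up.

    Assumed form:
    pictureRec[y0 + c // w][x0 + c % w] =
        pictureRec[y0 + c // w - oY][x0 + c % w - oX]
    """
    x = x0 + c % w   # column of position c in horizontal scanning order
    y = y0 + c // w  # row of position c
    if x - o_x < 0 or y - o_y < 0:
        raise ValueError("offset points outside the picture")
    picture_rec[y][x] = picture_rec[y - o_y][x - o_x]
    return picture_rec[y][x]


def reconstruct_run(picture_rec, x0, y0, w, c, o_x, o_y, length):
    """Apply the same copy to `length` consecutive block positions."""
    for k in range(length):
        reconstruct_pixel(picture_rec, x0, y0, w, c + k, o_x, o_y)


# Two rows of eight pixels: the left 4x2 block is already reconstructed,
# the right 4x2 block (x0=4, y0=0, w=4) is the current block.
picture = [[1, 2, 3, 4, 0, 0, 0, 0],
           [5, 6, 7, 8, 0, 0, 0, 0]]
reconstruct_run(picture, x0=4, y0=0, w=4, c=0, o_x=4, o_y=0, length=5)
print(picture)  # [[1, 2, 3, 4, 1, 2, 3, 4], [5, 6, 7, 8, 5, 0, 0, 0]]
```

With a run of length 5 in a block that is 4 pixels wide, the fifth copied pixel comes from the second row of the source block, which is the wrap-around behavior described for block-structured prediction.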

The remaining aspects of the pseudo 2D dictionary mode are as described above for the 1D dictionary mode (e.g., signaling of the length, the maximum number of bits for coding the length and the offset, support for both horizontal and vertical scanning modes, processing pixel value components together (e.g., Y, U, and V, or R, G, and B), and so on).

IX. Innovations for Inter Pseudo 2D Dictionary Mode

This section presents various innovations for the inter pseudo 2D dictionary mode. The inter pseudo 2D dictionary mode is similar to the pseudo 2D dictionary mode described in Section VIII, so its operation is the same as that of the pseudo 2D dictionary mode except for the differences described in this section.

Whereas the pseudo 2D dictionary mode uses reconstructed pixel values of the current picture for prediction, the inter pseudo 2D dictionary mode uses pixel values of a reference picture (or multiple reference pictures) for prediction. In some implementations, the reference picture used for prediction in the inter pseudo 2D dictionary mode is signaled (e.g., by signaling a reference picture list and a reference picture index into the list).

Alternatively, a default reference picture can be used for prediction (e.g., to avoid the signaling overhead of identifying a particular reference picture among multiple available reference pictures). In some implementations, the default reference picture is the first picture in reference picture list 0.

X. Example Methods for Decoding Pixel Values Using a Dictionary Mode

Methods can be provided for decoding pixel values using the 1D dictionary mode, the pseudo 2D dictionary mode, and/or the inter pseudo 2D dictionary mode.

FIG. 9 is a flowchart of an example method 900 for decoding pixel values using a dictionary mode. At 910, encoded data is received in a bit stream. For example, the encoded data may be encoded video data and/or encoded image data.

At 920, one or more current pixel values are decoded using the dictionary mode. For example, the dictionary mode can be the 1D dictionary mode, the pseudo 2D dictionary mode, or the inter pseudo 2D dictionary mode. The one or more current pixel values can be decoded for a block of video content. Decoding the one or more current pixel values includes performing operations 930 through 950.

At 930, an offset indicating an offset position within previously decoded pixel values is decoded. For example, decoding the offset can include decoding an offset range code and an offset value code to obtain an offset value that identifies an offset position within a 1D dictionary of previously decoded (e.g., previously reconstructed) pixel values of the current picture. Decoding the offset can also include decoding a two-dimensional offset, with X and Y offset values, for identifying previous pixel values when using the pseudo 2D dictionary mode or the inter pseudo 2D dictionary mode. In addition, when the inter pseudo 2D dictionary mode is used, reference picture information can be decoded (e.g., separately from the offset).

At 940, a length is decoded that indicates the number of pixels being predicted from the offset decoded at 930. For example, decoding the length can include decoding a length range code and a length value code.

At 950, the one or more current pixel values are predicted from one or more previous pixel values at the offset. The one or more current pixel values can be predicted exactly, using the same pixel values (e.g., YUV or RGB component values) as the one or more previous pixel values, without any residual or other modification. The number of pixel values being predicted is indicated by the length.

After being predicted, the one or more current pixel values can be used to reconstruct a two-dimensional video picture or image (e.g., using a horizontal or vertical scanning order for the current picture).

FIG. 10 is a flowchart of an example method 1000 for decoding pixel values using a 1D dictionary mode. At 1010, encoded data is received in a bit stream. For example, the encoded data may be encoded video data and/or encoded image data.

At 1020, multiple current pixel values are decoded using the 1D dictionary mode. The 1D dictionary mode stores previously decoded pixel values (e.g., previously reconstructed pixel values of the current picture) in a 1D dictionary. Decoding the multiple current pixel values includes performing operations 1030 through 1070.

At 1030, an offset range code is decoded. The offset range code indicates the number of bits for the offset value code. For example, the possible offset values can be divided into multiple ranges (e.g., as depicted in Table 1 above), with the offset range code indicating the range and the number of bits used for the offset value code.

At 1040, the offset value code is decoded (using the number of bits indicated at 1030) to produce an offset value. The offset value identifies a position within the 1D dictionary of previously decoded pixel values. If both a horizontal scanning 1D dictionary and a vertical scanning 1D dictionary are used, the offset value may identify a position within the dictionary that corresponds to the scanning order of the current pixels (e.g., the scanning order of the current block).

At 1050, a length range code is decoded. The length range code indicates the number of bits for the length value code. For example, the possible length values can be divided into multiple ranges (e.g., as depicted in Table 2 above), with the length range code indicating the range and the number of bits used for the length value code.

At 1060, the length value code is decoded (using the number of bits indicated at 1050) to produce a length value. The length value specifies the number of pixels being predicted.

At 1070, the current pixel values are predicted from pixel values in at least one dictionary using the offset value and the length value. The current pixel values can be predicted from corresponding pixel values in a 1D dictionary that stores previous pixel values in the order that corresponds to the current pixel values (e.g., horizontal or vertical scanning order). The position in the 1D dictionary is identified by the offset value, and the number of current pixels being predicted is indicated by the length value. The current pixel values can be predicted exactly, using the same pixel values (e.g., YUV or RGB component values) as the previous pixel values, without any residual or other modification.

After being predicted, the current pixel values can be used to reconstruct a two-dimensional video picture or image (e.g., using a horizontal or vertical scanning order for the current picture).
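Operations 1030 through 1060 can be sketched in Python as a prefix-driven bit parser. The range table below is hypothetical: only its first row ("1" followed by a 2-bit value) is taken from the worked example of FIG. 7, the other rows are illustrative assumptions, and for brevity the same table is used for offsets and lengths even though the text defines separate tables.

```python
# Hypothetical range table: (range code, value bits, first zero-based value).
# Only the first row follows the worked example; the rest are assumptions.
RANGES = [("1", 2, 0), ("01", 4, 4), ("001", 8, 20)]


def read_ranged_value(bits: str, pos: int):
    """Decode one range code and its value code starting at bits[pos].

    Returns (zero_based_value, next_pos).
    """
    for prefix, n_bits, base in RANGES:
        if bits.startswith(prefix, pos):
            start = pos + len(prefix)
            end = start + n_bits
            if end > len(bits):
                raise ValueError("bit stream ends inside a value code")
            return base + int(bits[start:end], 2), end
    raise ValueError("unknown range code")


def decode_match(bits: str, pos: int = 0):
    """Decode an (offset, length) pair and return the next bit position."""
    offset, pos = read_ranged_value(bits, pos)  # operations 1030 and 1040
    length, pos = read_ranged_value(bits, pos)  # operations 1050 and 1060
    return offset, length, pos


print(decode_match("110101"))  # (2, 1, 6)
```

The range code tells the parser how many value bits follow, so short offsets and lengths cost few bits while larger values remain representable.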

XI. Innovations for Encoding in 1D and Pseudo 2D Dictionary Modes

This section presents various innovations for encoding that can be applied to the 1D dictionary mode, the pseudo 2D dictionary mode, and/or the inter pseudo 2D dictionary mode. Some innovations relate to finding matching pixel values within a dictionary and/or within previously reconstructed pixel values, while others relate to early termination and to the cost of signaling in matching mode.

A. Hash-Based Matching in 1D Dictionary Mode

In some implementations, the video or image encoder uses a hash-based search technique to identify matching pixel values. In certain implementations of the hash-based search technique, hash values are computed and stored for every 1 pixel (e.g., each combined pixel, treating the components of the pixel, such as the Y, U, and V components or the R, G, and B components, together), every 2 pixels, every 4 pixels, and every 8 pixels. For example, when a pixel value is added to the dictionary (e.g., to the 1D dictionary), hash values can be generated for each 1-, 2-, 4-, and 8-pixel combination of which the current pixel is a part. As an example, a first pixel value can be encoded and added to the 1D dictionary. A hash value for the first pixel value can be determined and added (e.g., to a hash table). A second pixel value can be encoded and added to the 1D dictionary. A hash value for the second pixel value can be determined and added. In addition, a hash value for the 2-pixel combination (the first pixel value and the second pixel value) can be computed and added, and so on as additional pixel values are added to the 1D dictionary.

Matching is then performed to determine whether a pixel value (or pixel values) in the hash matches the current pixel value (or current pixel values) being encoded. First, a check for a match is made for every 1 pixel value using the hashed pixel values (e.g., by computing a hash of 1 current pixel value and comparing it with the hashes of previous 1-pixel values in the dictionary). If a 1-pixel match is found, the encoder can check how many pixels match starting from the current pixel to determine the length (the number of matching pixels from the current pixel). If a match length of 2 is found (e.g., the current pixel values match pixel values at a particular offset in the dictionary with a length of 2), matching can proceed with 2 pixels and more, with no further need to check 1-pixel hashes for the current pixel (e.g., pixel values at another offset in the dictionary may match the current pixels with a length of 2 or more). Likewise, if a match length of 4 is found, hash checking starts with 4 pixels and more, and similarly for 8 pixels. In some implementations, the hash search is implemented with 1, 2, 4, and 8 pixels. In other implementations, the hash search can use more or fewer pixels.

As an example, consider a dictionary that ends with the following eight pixel values (shown with their values and positions; e.g., p-3 is the pixel three pixels back in the dictionary, with a pixel value of 3):

The current pixels are to be encoded by the encoder:

Encoding starts in hash encoding mode by checking the hash value for the 1 pixel p0. The hash value for p0 matches the 1-pixel hash value of p-3 (and both p0 and p-3 have a pixel value of 3). The hash match determines only the starting position for checking. From that starting position, the encoder still needs to check the actual number of matching pixel values. Thus, the encoder checks the length of the matching pixel values. In this example, the encoder checks whether p0 == p-3 (yes, since both have a pixel value of 3), then checks whether p1 == p-2 (yes, since both have a pixel value of 4), and then checks whether p2 == p-1 (no, the pixel values do not match, since 7 != 5, so the encoder stops and determines that the match length is 2). Next, the encoder starts checking from the hash values for 2 pixel values (because a match with a length of 2 has already been found, the encoder no longer checks 1-pixel hash matches). The hash value for p0p1 matches the 2-pixel hash value of p-7p-6. The encoder then checks the length of the matching pixel values. In this example, the encoder checks whether p0p1 == p-7p-6 (yes, since both have pixel values of 3, 4), then checks whether p2 == p-5 (yes, since both have a pixel value of 7), then checks whether p3 == p-4 (yes, since both have a pixel value of 1), and then checks whether p4 == p-3 (no, the pixel values do not match, since 6 != 3, so the encoder stops and determines that the match length is 4). The encoder can then proceed to check 4-pixel hash matches (and eventually 8-pixel hash matches) to see whether a longer match length can be found. When the encoder has finished checking, the current pixel values are encoded using the largest match length found.
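The walk-through can be replayed with a Python sketch. The values used are the ones stated in the walk-through (p-7 through p-1 are 3, 4, 7, 1, 3, 4, 5 and p0 through p4 are 3, 4, 7, 1, 6); p-8 is left out because its value is not stated there. The function names are hypothetical, Python's built-in `hash` stands in for whatever hash function an encoder would use, and the sketch verifies every candidate that shares a hash, so it reaches the length-4 match in its first pass rather than in the two passes narrated above.

```python
from collections import defaultdict

GROUP_SIZES = (1, 2, 4, 8)


def build_hash_table(dictionary):
    """Map (group size, hash of the group) to start indices in the dictionary."""
    table = defaultdict(list)
    for size in GROUP_SIZES:
        for start in range(len(dictionary) - size + 1):
            key = hash(tuple(dictionary[start:start + size]))
            table[(size, key)].append(start)
    return table


def match_length(dictionary, start, current):
    """Count how many values really match, comparing value by value."""
    n = 0
    while (n < len(current) and start + n < len(dictionary)
           and dictionary[start + n] == current[n]):
        n += 1
    return n


def find_longest_match(dictionary, current):
    """Return (offset, length) of the best match, or None.

    The offset counts back from the end of the dictionary (1 = last entry).
    """
    table = build_hash_table(dictionary)
    best_start, best_len = None, 0
    for size in GROUP_SIZES:
        if size < best_len or size > len(current):
            continue  # shorter groups cannot beat the match already found
        key = hash(tuple(current[:size]))
        for start in table.get((size, key), []):
            # A hash hit only nominates a start position; verify the values.
            n = match_length(dictionary, start, current)
            if n > best_len:
                best_start, best_len = start, n
    if best_start is None:
        return None
    return len(dictionary) - best_start, best_len


dictionary_tail = [3, 4, 7, 1, 3, 4, 5]  # p-7 .. p-1
current_pixels = [3, 4, 7, 1, 6]         # p0 .. p4
print(find_longest_match(dictionary_tail, current_pixels))  # (7, 4)
```

The result, a match of length 4 starting at p-7, agrees with the walk-through. As a simplification, the sketch does not let a match run past the end of the dictionary into the pixels currently being encoded.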

If a pixel value (or multiple pixel values) in a dictionary (e.g., the 1D dictionary) has the same hash value as the current pixel value, matching is still performed to confirm that the pixel value in the dictionary can be used for prediction. For example, the hash value for a pixel in the 1D dictionary may be the same as the hash value for the current pixel. The pixel value in the 1D dictionary still needs to be compared with the pixel value of the current pixel to determine whether they are the same (i.e., different pixel values can have the same hash value).

In some implementations, even if a match is found for one or more current pixels, the cost (e.g., in number of bits) of encoding the one or more current pixels in matching mode using an offset and a length can be greater than the cost (e.g., in number of bits) of encoding them directly. In this situation, the one or more current pixels can be coded directly (e.g., the encoder can switch from matching mode to direct mode for the one or more current pixels, which can be identified in the bit stream by an escape code or flag). The encoder can switch between matching mode and direct mode as needed (e.g., on a pixel-by-pixel basis, on a block-by-block basis, or on some other basis).
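The cost comparison can be sketched in Python as follows. The function name, the bit counts in the usage lines, and the tie-breaking rule (prefer matching mode when the costs are equal) are all assumptions made for illustration.

```python
def choose_mode(match_bits: int, run_length: int, direct_bits_per_pixel: int) -> str:
    """Pick the cheaper way to code `run_length` matched pixels.

    match_bits: bits for the offset code plus the length code.
    direct_bits_per_pixel: bits to code one pixel directly, including any
    escape code (hypothetical figure supplied by the caller).
    """
    direct_bits = run_length * direct_bits_per_pixel
    return "match" if match_bits <= direct_bits else "direct"


print(choose_mode(match_bits=6, run_length=2, direct_bits_per_pixel=25))   # match
print(choose_mode(match_bits=40, run_length=1, direct_bits_per_pixel=25))  # direct
```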

In some implementations, early termination is performed by the encoder. For example, if enough pixel values have been processed (e.g., N pixel values) and the average match length (for direct mode, the match length can be counted as 1) is less than a threshold (e.g., a threshold value of T), dictionary mode estimation can be terminated early (e.g., on a block-by-block basis). For example, the dictionary mode can be terminated and the picture re-encoded using another encoding mode, or the dictionary mode can be terminated for the remainder of the picture or for part of the picture (e.g., the current block). Early termination can be performed when the average match length is small enough that the dictionary mode would be less efficient than other encoding modes (e.g., less efficient than normal intra mode, normal inter mode, and so on). For example, in some implementations, the average match length threshold (T) can be 2 or 3.
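A minimal Python sketch of this early-termination test is shown below. The function and parameter names are hypothetical; `min_pixels` plays the role of N and `threshold` the role of T.

```python
def should_terminate_early(match_lengths, min_pixels: int, threshold: float) -> bool:
    """Return True when dictionary mode looks unprofitable so far.

    match_lengths holds one entry per coded unit: the match length for a
    match, or 1 for a pixel coded in direct mode.
    """
    pixels = sum(match_lengths)
    if not match_lengths or pixels < min_pixels:
        return False  # not enough pixel values processed yet
    average = pixels / len(match_lengths)
    return average < threshold


print(should_terminate_early([1, 1, 2, 1, 1, 1, 1], min_pixels=8, threshold=2))  # True
print(should_terminate_early([4, 8, 4], min_pixels=8, threshold=2))              # False
```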

B. Hash-Based Matching in Pseudo 2D Dictionary Mode

Hash-based matching during encoding can be performed in the pseudo 2D dictionary mode (and in the inter pseudo 2D dictionary mode) in a manner similar to the hash-based matching described above for the 1D dictionary mode.

As in the 1D dictionary mode, hash values are computed for previous pixel values in groupings of 1, 2, 4, and 8 pixel values. When matching, however, the pseudo 2D dictionary mode (and the inter pseudo 2D dictionary mode) starts checking with the 8-pixel hash values (instead of starting with 1-pixel hash matches). If a match of length 8 is found, the maximum length cannot be less than 8, and there is no need to check the hash values for 4 pixels or fewer. If a match of length 8 is not found, however, checking begins for 4-pixel matches, and so on down to 1 pixel. If an 8-pixel match is not found by hash matching but the current match length is 7 (e.g., a hash match for 4 pixels is found and, from that starting position, the encoder finds that 7 matching pixels actually exist), the encoder can terminate there, because no 8-pixel match exists.
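The descending search order can be sketched in Python as follows. As simplifying assumptions, the previous pixel values are treated as a flat sequence in scanning order (two-dimensional addressing is omitted), hashes are recomputed on the fly rather than stored in a table, and the function name is hypothetical.

```python
def find_match_descending(previous, current, sizes=(8, 4, 2, 1)):
    """Return (start, length) of the longest match, trying large groups first."""
    for size in sizes:
        if size > len(current):
            continue
        target = hash(tuple(current[:size]))
        best_start, best_len = None, 0
        for start in range(len(previous) - size + 1):
            if hash(tuple(previous[start:start + size])) != target:
                continue
            # Equal hashes only nominate a candidate; compare actual values.
            n = 0
            while (n < len(current) and start + n < len(previous)
                   and previous[start + n] == current[n]):
                n += 1
            if n >= size and n > best_len:
                best_start, best_len = start, n
        if best_start is not None:
            # Any longer match would already have been found at a larger size.
            return best_start, best_len
    return None


previous = [9, 1, 2, 3, 4, 5, 6, 7, 0]
current = [1, 2, 3, 4, 5, 6, 7, 8]
print(find_match_descending(previous, current))  # (1, 7)
```

In this usage example no 8-pixel match exists, the 4-pixel hash nominates a start position, and value-by-value checking from there finds 7 matching pixels, mirroring the length-7 case described above.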

C. Example Methods for Encoding Pixel Values Using a Dictionary Mode

Methods can be provided for encoding pixel values using the 1D dictionary mode, the pseudo 2D dictionary mode, and/or the inter pseudo 2D dictionary mode. Encoding can include computing hash values of previous pixel values (e.g., reconstructed pixel values) and comparing those hash values with hash values of the current pixel values to be encoded. A match can be identified and encoded by an offset and a length (e.g., within a 1D dictionary or within previously encoded values of a picture). Encoding can be performed in direct mode if no match is found.

FIG. 11 is a flowchart of an example method 1100 for encoding pixel values using a dictionary mode. At 1110, one or more current pixel values are encoded in a dictionary mode (e.g., the 1D dictionary mode, the pseudo 2D dictionary mode, or the inter pseudo 2D dictionary mode). Encoding the current pixel values includes performing the operations at 1120 through 1150.

At 1120, hash values are computed for previously encoded pixel values (e.g., reconstructed pixel values). For example, hash values can be computed for combinations of 1 pixel, 2 pixels, 4 pixels, and 8 pixels.

At 1130, hash values are computed for the one or more current pixel values to be encoded.

At 1140, the hash values for the one or more current pixel values are compared with the hash values of the previously encoded pixel values to determine whether a match is found. If a match is found (e.g., for 1 pixel value), the length of the run of matching pixels can be determined.

At 1150, if a match is found, the one or more current pixel values are encoded using an offset and a length. For example, the offset and the length can indicate a position in a 1D dictionary, or a position within a previously reconstructed picture (e.g., using X and Y offset values for the pseudo 2D dictionary mode or the inter pseudo 2D dictionary mode), from which the current pixel values are predicted.

Claims

A method in a computing device having a video decoder or an image decoder, the method comprising:
receiving encoded data for a picture in a bit stream;
decoding one or more current pixel values from the encoded data, wherein decoding the one or more current pixel values includes:
decoding, from the encoded data, an offset representing an offset position in at least one one-dimensional dictionary of previously decoded pixel values, the at least one one-dimensional dictionary being a horizontal scanning one-dimensional dictionary, wherein decoding the offset includes:
decoding an offset range code indicating a range of offset values and a number of bits to decode for the offset value; and
decoding the offset value from the number of bits for the offset value based on the offset range code, wherein the offset position in the one-dimensional dictionary is identified by the offset value;
decoding a length from the encoded data; and
predicting the one or more current pixel values from one or more corresponding pixel values in the previously decoded pixel values at the offset position, the number of predicted pixels being represented by a length value;
adding the decoded one or more current pixel values to the horizontal scanning one-dimensional dictionary in a horizontal scanning order; and
adding the decoded one or more current pixel values to a vertical scanning one-dimensional dictionary in a vertical scanning order.

The method of claim 1,
Wherein the one or more current pixel values and the one or more corresponding pixel values are combined YUV pixel values.

The method of claim 1,
Wherein the one or more current pixel values are decoded according to a 1D dictionary mode.

The method of claim 3, further comprising:
determining the size of the one-dimensional dictionary; and
if the size of the one-dimensional dictionary is larger than a predetermined maximum value, reducing the size of the one-dimensional dictionary.

The method of claim 1,
wherein decoding the length includes:
decoding a length range code indicating a range of length values and a number of bits to decode for the length value; and
decoding the length value from the number of bits for the length value based on the length range code,
wherein the number of predicted pixels is identified by the length value.

The method of claim 1,
wherein the one or more current pixel values are decoded in a matching mode that predicts the one or more current pixel values from the one or more corresponding pixel values in the previously decoded pixel values,
the method further comprising:
decoding one or more other current pixel values from the encoded data,
wherein decoding the one or more other current pixel values from the encoded data includes:
decoding the one or more other current pixel values using a direct mode in which the one or more other current pixel values are directly encoded without prediction.

The method of claim 1, further comprising:
reconstructing at least a portion of the picture in one of a horizontal scanning order and a vertical scanning order using, at least in part, the decoded one or more current pixel values.

The method of claim 1,
Wherein the offset represents an X/Y offset position within a current picture of previously decoded pixel values, and the one or more current pixel values are decoded according to a pseudo 2D dictionary mode.

The method of claim 8,
wherein decoding the offset includes:
decoding an X offset value from a first offset range code indicating a range of offset values and a number of bits to decode for the X offset value; and
decoding a Y offset value from a second offset range code indicating a range of offset values and a number of bits to decode for the Y offset value,
wherein the X/Y offset position within the current picture of the previously decoded pixel values is identified by the X offset value and the Y offset value.

A method in a computing device having a video decoder or an image decoder, the method comprising:
receiving encoded data for a picture in a bit stream;
decoding a plurality of current pixel values from the encoded data using a 1D dictionary mode, wherein decoding the plurality of current pixel values includes:
decoding an offset range code, the offset range code indicating a number of bits for an offset value code;
decoding the offset value code from the indicated number of bits to produce an offset value, the offset value identifying a position in at least one dictionary of previously decoded pixel values, the at least one dictionary including a horizontal scanning one-dimensional dictionary and a vertical scanning one-dimensional dictionary;
decoding a length range code, the length range code indicating a number of bits for a length value code;
decoding the length value code from the indicated number of bits to produce a length value; and
predicting the current pixel values, the number of current pixel values being represented by the length value, from corresponding pixel values at the position in the at least one dictionary identified by the offset value;
adding the plurality of decoded current pixel values to the horizontal scanning one-dimensional dictionary in a horizontal scanning order; and
adding the plurality of decoded current pixel values to the vertical scanning one-dimensional dictionary in a vertical scanning order.

The method of claim 10, further comprising:
reconstructing at least a portion of the picture in one of a horizontal scanning order and a vertical scanning order using, at least in part, the plurality of decoded current pixel values.

A method in a computing device having a video encoder or an image encoder, the method comprising:
encoding data for a picture, including using a dictionary mode to encode one or more current pixel values,
wherein the encoding includes:
calculating hash values for previously encoded pixel values;
calculating a hash value for the one or more current pixel values to be encoded;
determining whether the hash value for the one or more current pixel values matches a hash value for the previously encoded pixel values; and
if a match is found, encoding the one or more current pixel values using a length and an offset that predict the one or more current pixel values from the matching previously encoded pixel values,
wherein encoding the one or more current pixel values includes:
encoding an offset range code, the offset range code indicating a number of bits for an offset value code;
encoding the offset value code, using the indicated number of bits, for an offset value, the offset value identifying a location within at least one dictionary of previously encoded pixel values, the at least one dictionary including a horizontal scanning one-dimensional dictionary and a vertical scanning one-dimensional dictionary;
encoding a length range code, the length range code indicating a number of bits for a length value code;
encoding the length value code, using the indicated number of bits, for a length value; and
encoding the one or more current pixel values, the number of current pixel values to be encoded being represented by the length value, using the corresponding pixel values at the location in the at least one dictionary identified by the offset value,
wherein the encoded current pixel values are added to the horizontal scanning one-dimensional dictionary in a horizontal scanning order, and
wherein the encoded current pixel values are added to the vertical scanning one-dimensional dictionary in a vertical scanning order.

The method of claim 12,
Wherein the one or more current pixel values and the previously encoded pixel values are one of combined YUV pixel values, combined RGB pixel values, and combined GBR pixel values.

The method of claim 12,
wherein calculating the hash values for the previously encoded pixel values includes:
calculating a hash value for each single pixel value among the previously encoded pixel values;
calculating a hash value for each group of two pixel values among the previously encoded pixel values;
calculating a hash value for each group of four pixel values among the previously encoded pixel values; and
calculating a hash value for each group of eight pixel values among the previously encoded pixel values.

The method of claim 12, further comprising:
calculating an average matching length; and
when the average matching length is less than a threshold value, switching to an encoding mode other than the dictionary mode for the current block.

A computing device comprising a processing unit and a memory and adapted to perform the method of claim 1.

One or more computer-readable storage media storing computer-executable instructions,
wherein the computer-executable instructions cause a computing device programmed by the computer-executable instructions to perform the method of any one of claims 1-15.

delete