KR102537087B1

KR102537087B1 - Motion estimation using 3D auxiliary data

Info

Publication number: KR102537087B1
Application number: KR1020217012942A
Authority: KR
Inventors: Vladyslav Zakharchenko; Jianle Chen
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2018-10-02
Filing date: 2019-10-02
Publication date: 2023-05-26
Also published as: US11688104B2; WO2020072579A1; US20210217202A1; MX2021003841A; KR20210069683A; EP3850832A1; BR112021006256A2; EP3850832A4; JP7187688B2; SG11202103302VA; CN112771850B; JP2022502961A; CN112771850A

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing motion estimation. In some implementations, a method includes generating a segmentation of point cloud data based on continuity data of the point cloud data. A representation of the segmented point cloud data is projected onto the sides of a three-dimensional bounding box. Patches are generated based on the projected representation of the segmented point cloud data. A first frame of the patches is generated. First and second auxiliary information are generated using the first frame and a reference frame. A first patch from the first frame that matches a second patch from the reference frame is identified based on the first and second auxiliary information. A motion vector candidate between the first patch and the second patch is generated based on a difference between the first and second auxiliary information. Motion compensation is performed using the motion vector candidate.

Description

Motion estimation using 3D auxiliary data

This application claims priority to U.S. Provisional Patent Application No. 62/740,237, filed October 2, 2018, and U.S. Provisional Patent Application No. 62/863,362, filed June 19, 2019, both of which are incorporated herein by reference in their entirety.

Point cloud processing has become an integral part of a variety of applications, such as the entertainment industry, intelligent automobile navigation, geospatial inspection, three-dimensional (3-D) modeling of real-world objects, and environment visualization.

In some implementations, the specification describes techniques for performing motion estimation using three-dimensional and two-dimensional auxiliary data. Motion estimation is performed to encode and transmit three-dimensional point cloud data. Three-dimensional point cloud data includes data points that outline or visually represent the outer surface of a three-dimensional object, such as a person or a real-world item. The three-dimensional point cloud data may also include attribute information representing the color, texture, and depth of the three-dimensional point cloud data. An encoder or a decoder may use motion refinement data to encode or decode the three-dimensional point cloud data, respectively.

In some implementations, the encoder or decoder uses a three-dimensional bounding box to enclose the three-dimensional point cloud data and then generates patches used for encoding and transmission. The encoder may project images of the three-dimensional point cloud data onto each side of the three-dimensional bounding box. The encoder may group the images, or patches, into frames for encoding. To reduce the bandwidth needed to transmit three-dimensional point cloud data, which is typically large, the encoder may instead generate motion refinement data by comparing patches of a currently generated frame with patches of a previously generated frame. The encoder may match patches between the two frames and generate the motion refinement data based on data identifying the matching patches. For example, the encoder may use auxiliary information that defines the position coordinates and size of a patch as the motion refinement data. Instead of encoding and transmitting the frames of patches, the motion refinement data may be encoded and transmitted, which reduces the overall transmission bandwidth while still allowing the message to be properly received and decoded. Once the motion refinement data has been identified, it may be added to existing video compression techniques to improve transmission of three-dimensional point cloud data.

In one general aspect, a method includes: generating a segmentation of three-dimensional point cloud data of recorded media based on continuity data of the three-dimensional point cloud data; projecting a representation of the segmented three-dimensional point cloud data onto one or more sides of a three-dimensional bounding box, wherein the representation of the segmented three-dimensional point cloud data differs depending on the side of the three-dimensional bounding box onto which it is projected; generating one or more patches based on the projected representation of the segmented three-dimensional point cloud data; generating a first frame of the one or more patches; generating first auxiliary information for the first frame; generating second auxiliary information for a reference frame; identifying, based on the first auxiliary information and the second auxiliary information, a first patch from the first frame that matches a second patch from the reference frame; generating a motion vector candidate between the first patch and the second patch based on a difference between the first auxiliary information and the second auxiliary information; and performing motion compensation using the motion vector candidate.

Other embodiments of this and other aspects of the disclosure include corresponding systems, apparatus, and computer programs, encoded on computer storage devices, configured to perform the actions of the methods. A system of one or more computers can be so configured by virtue of software, firmware, hardware, or a combination of them installed on the system that in operation cause the system to perform the actions. One or more computer programs can be so configured by virtue of having instructions that, when executed by a data processing apparatus, cause the apparatus to perform the actions.

The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. For example, one embodiment includes all of the following features in combination.

In some embodiments, the reference frame corresponds to a previously encoded frame that was transmitted, and the reference frame is decoded to generate the second auxiliary information.

In some embodiments, generating the segmentation of the three-dimensional point cloud data of the recorded media further includes: generating, by the one or more processors, a plurality of segmentations of the three-dimensional point cloud media so that each segmentation of the plurality of segmentations can subsequently be projected and encoded.

In some embodiments, the first auxiliary information includes index data for each of the one or more patches, two-dimensional data for each of the one or more patches, and three-dimensional data for each of the one or more patches.

In some embodiments, the index data for each of the one or more patches corresponds to a corresponding side of the three-dimensional bounding box.

In some embodiments, the two-dimensional data for each of the one or more patches and the three-dimensional data for each of the one or more patches correspond to a portion of the three-dimensional point cloud data.

In some embodiments, generating the one or more patches for the three-dimensional point cloud data based on the continuity data of the three-dimensional point cloud data further includes: determining, by the one or more processors, a smoothness criterion of the three-dimensional point cloud data from each direction; comparing, by the one or more processors, the smoothness criteria from each direction of the three-dimensional point cloud data; and, in response to the comparison, selecting, by the one or more processors, the direction of the smoothness criterion of the three-dimensional point cloud data that has the larger projection area on a side of the bounding box.

In some embodiments, generating the motion vector candidate between the first patch and the second patch further includes: determining, by the one or more processors, a distance between the two-dimensional data of the first auxiliary information and the two-dimensional data of the second auxiliary information; generating the motion vector candidate based on the distance between the two-dimensional data of the first auxiliary information and the two-dimensional data of the second auxiliary information; and adding the motion vector candidate to a motion vector candidate list.
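The candidate-generation steps of this embodiment can be sketched as follows. This is a minimal illustration only: the `PatchAux` record, its field names, the sign convention of the offset, and the duplicate check in `add_candidate` are assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatchAux:
    """Hypothetical per-patch auxiliary information (field names are assumptions)."""
    index: int  # index data: which side of the bounding box the patch was projected onto
    u0: int     # two-dimensional data: horizontal position of the patch in its frame
    v0: int     # two-dimensional data: vertical position of the patch in its frame
    x3d: int    # three-dimensional data: position of the patch in the point cloud
    y3d: int
    z3d: int


def mv_candidate(cur: PatchAux, ref: PatchAux) -> Tuple[int, int]:
    """Offset from the current patch's 2D position to the matched reference patch.

    The direction (reference minus current) is an assumed convention.
    """
    return (ref.u0 - cur.u0, ref.v0 - cur.v0)


def add_candidate(candidates: List[Tuple[int, int]], mv: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Append the candidate to the list, skipping exact duplicates (an assumption)."""
    if mv not in candidates:
        candidates.append(mv)
    return candidates


# Usage: a patch located at (16, 32) in the first frame matches one at (24, 28) in the reference frame.
first = PatchAux(index=0, u0=16, v0=32, x3d=0, y3d=0, z3d=0)
second = PatchAux(index=0, u0=24, v0=28, x3d=0, y3d=0, z3d=0)
candidate_list = add_candidate([], mv_candidate(first, second))
```

Because the offset is derived from auxiliary data that already exists for both frames, it can seed the motion search without a pixel-level comparison.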

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

FIG. 1 is a flowchart of an example method of coding a video signal.
FIG. 2 is a schematic diagram of an example coding and decoding (codec) system for video coding.
FIG. 3 is a block diagram illustrating an example video encoder.
FIG. 4 is a block diagram illustrating an example video decoder.
FIG. 5 is a schematic diagram illustrating an example of unidirectional inter-prediction.
FIG. 6 is a schematic diagram illustrating an example of bidirectional inter-prediction.
FIG. 7 is a schematic diagram illustrating example intra-prediction modes used in video coding.
FIG. 8 is a schematic diagram illustrating an example of directional relationships of blocks in video coding.
FIG. 9 is a block diagram illustrating an example in-loop filter.
FIG. 10 illustrates example split modes used in block partitioning.
FIG. 11 is a schematic diagram of an example video encoding mechanism.
FIG. 12 is a schematic diagram of a computing device for video coding.
FIG. 13 is an example of a system illustrating point cloud media.
FIG. 14 is an example of a system illustrating a sequence of point cloud frames.
FIG. 15 is an example of a process of converting a three-dimensional patch bounding box into a two-dimensional patch projection.
FIG. 16 is an example of a system illustrating the results of 3D-to-2D patch projection.
FIG. 17 is an example of attribute segmentation for point cloud media.
FIG. 18 is an example of a system illustrating the packing of patches for point cloud media with attribute information.
FIG. 19 is an example of a system that performs motion estimation.
FIG. 20 is an example of a system illustrating a motion vector candidate between a patch of a current frame and a patch of a reference frame.
FIG. 21 illustrates a derivation process for merge candidate list construction.
FIG. 22 illustrates a system showing the positions of spatial merge candidates and the candidate pairs considered for the redundancy check of spatial merge candidates.
FIG. 23 illustrates a system showing positions for the second PU of Nx2N and 2NxN partitions.
FIG. 24 illustrates obtaining a scaled motion vector for a temporal merge candidate.
FIG. 25 illustrates a system showing candidate positions for temporal merge candidates.
FIG. 26 illustrates an example table of combined bi-predictive merge candidates.
FIG. 27 is an example of a system with a motion estimation pipeline modified to use auxiliary data.
FIG. 28 is an example of a packet stream representation of a V-PCC unit payload.
FIG. 29 is another example of a visual representation of a V-PCC unit payload.
FIG. 30 is a flowchart illustrating an example of a process for performing motion estimation using 3D auxiliary data.
Like reference numbers and designations in the various drawings indicate like elements.

It should be understood at the outset that, although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.

FIG. 1 is a flowchart of an example method 100 of coding a video signal. Specifically, a video signal is encoded at an encoder. The encoding process compresses the video signal by employing various mechanisms to reduce the video file size. A smaller file size allows the compressed video file to be transmitted toward a user while reducing the associated bandwidth overhead. The decoder then decodes the compressed video file to reconstruct the original video signal for display to an end user. The decoding process generally mirrors the encoding process so that the decoder can consistently reconstruct the video signal.

At step 101, the video signal is input into the encoder. For example, the video signal may be an uncompressed video file stored in memory. As another example, the video file may be captured by a video capture device, such as a video camera, and encoded to support live streaming of the video. The video file may include both an audio component and a video component. The video component contains a series of image frames that, when viewed in sequence, give the visual impression of motion. The frames contain pixels that are expressed in terms of light, referred to herein as luma components, and color, referred to as chroma components. In some examples, the frames may also contain depth values to support three-dimensional viewing.

At step 103, the video is partitioned into blocks. Partitioning includes subdividing the pixels in each frame into square and/or rectangular blocks for compression. For example, a coding tree may be employed to divide and then recursively subdivide blocks until configurations are achieved that support further encoding. As such, the blocks may be referred to as coding tree units in High Efficiency Video Coding (HEVC) (also known as H.265 and MPEG-H Part 2). For example, the luma components of a frame may be subdivided until the individual blocks contain relatively homogeneous lighting values. Further, the chroma components of a frame may be subdivided until the individual blocks contain relatively homogeneous color values. Accordingly, the partitioning mechanism varies depending on the content of the video frames.
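The recursive subdivision described for step 103 can be sketched as follows. This is a simplified illustration, not the HEVC partitioning procedure: the uniformity test (maximum minus minimum luma within a threshold), the `max_range` and `min_size` parameters, and the restriction to square four-way splits are all assumptions made for the example.

```python
def quadtree_partition(luma, x, y, size, min_size=2, max_range=8):
    """Return leaf blocks as (x, y, size), splitting until luma is roughly uniform.

    luma is a list of rows of integer luma samples. A block is treated as
    uniform when its sample range is at most max_range (an assumed criterion).
    """
    values = [luma[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or max(values) - min(values) <= max_range:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):          # top half, then bottom half
        for dx in (0, half):      # left quadrant, then right quadrant
            leaves += quadtree_partition(luma, x + dx, y + dy, half, min_size, max_range)
    return leaves


# Usage: an 8x8 frame whose top-left quadrant is bright and the rest is dark.
frame = [[200 if (r < 4 and c < 4) else 10 for c in range(8)] for r in range(8)]
blocks = quadtree_partition(frame, 0, 0, 8)
flat = quadtree_partition([[50] * 8 for _ in range(8)], 0, 0, 8)
```

A frame with uniform content stays as a single block, while the mixed frame is split once into four uniform quadrants, which is the content dependence noted above.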

At step 105, various compression mechanisms are employed to compress the image blocks partitioned at step 103. For example, inter-prediction and/or intra-prediction may be employed. Inter-prediction is designed to take advantage of the fact that objects in a common scene tend to appear in successive frames. Accordingly, a block depicting an object in a reference frame need not be repeatedly described in subsequent frames. Specifically, an object, such as a table, may remain in a constant position over multiple frames. Hence the table needs to be described only once, and subsequent frames can refer back to the reference frame. Pattern matching mechanisms may be employed to match objects over multiple frames. Further, moving objects may be represented across multiple frames, for example due to object movement or camera movement. As a particular example, a video may show an automobile that moves across the screen over multiple frames. Motion vectors can be employed to describe such movement. A motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in a frame to the coordinates of the object in a reference frame. As such, inter-prediction can encode an image block in a current frame as a set of motion vectors indicating an offset from a corresponding block in a reference frame.
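The relationship between a motion vector, the reference frame, and the remaining difference can be sketched as follows. The function names and the tiny 4x4 frames are hypothetical, and the sketch assumes integer-pixel motion with the displaced block lying fully inside the reference frame.

```python
def predict_block(ref, x, y, size, mv):
    """Copy the size x size block that the motion vector points to in the reference frame.

    (x, y) is the top-left corner of the block in the current frame and
    mv = (dx, dy) is the offset from that position to the matching block in ref.
    """
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(size)] for r in range(size)]


def residual(cur_block, pred_block):
    """Sample-wise difference between the actual block and its prediction."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(cur_block, pred_block)]


# Usage: a 2x2 object at the top-left of the reference frame has moved to (2, 1) in the current frame.
ref_frame = [[9, 8, 0, 0],
             [7, 6, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
cur_frame = [[0, 0, 0, 0],
             [0, 0, 9, 8],
             [0, 0, 7, 6],
             [0, 0, 0, 0]]
cur_block = [row[2:4] for row in cur_frame[1:3]]
pred = predict_block(ref_frame, 2, 1, 2, (-2, -1))
diff = residual(cur_block, pred)
```

When the match is exact, as here, the block is fully described by the motion vector and the residual is all zeros.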

Intra-prediction encodes blocks in a common frame. Intra-prediction takes advantage of the fact that luma and chroma components tend to cluster within a frame. For example, a patch of green in a portion of a tree tends to be positioned adjacent to similar patches of green. Intra-prediction employs multiple directional prediction modes (e.g., thirty-three in HEVC), a planar mode, and a direct current (DC) mode. The directional modes indicate that a current block is similar to or the same as the samples of a neighboring block in the corresponding direction. Planar mode indicates that a series of blocks along a row (e.g., a plane) can be interpolated based on the neighboring blocks at the edges of the row. Planar mode, in effect, indicates a smooth transition of light/color across a row by employing a relatively constant slope in changing values. DC mode is employed for boundary smoothing and indicates that a block is similar to or the same as an average value associated with all the neighboring blocks associated with the angular directions of the directional prediction modes. Accordingly, intra-prediction blocks can represent image blocks as various relational prediction mode values instead of the actual values. Further, inter-prediction blocks can represent image blocks as motion vector values instead of the actual values. In either case, the prediction blocks may not exactly represent the image blocks in some cases. Any differences are stored in residual blocks. Transforms may be applied to the residual blocks to further compress the file.
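As a rough illustration of DC mode, the sketch below fills a block with the rounded average of its neighboring samples. It is a simplification under stated assumptions: only the row above and the column to the left are used as neighbors, and no boundary filtering is applied, so it should not be read as the exact HEVC procedure.

```python
def dc_predict(top, left, size):
    """Fill a size x size block with the rounded mean of the neighboring samples.

    top is the row of reconstructed samples above the block and left is the
    column to its left (which neighbors take part is an assumption here).
    """
    neighbors = list(top) + list(left)
    n = len(neighbors)
    dc = (sum(neighbors) + n // 2) // n  # integer mean, rounded to nearest
    return [[dc] * size for _ in range(size)]


# Usage: neighbors averaging 100 predict a flat block of 100.
prediction = dc_predict(top=[100, 102], left=[98, 100], size=2)
```

The encoder would then signal the mode together with the residual between the actual block and this flat prediction, rather than the sample values themselves.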

At step 107, various filtering techniques may be applied. In HEVC, the filters are applied according to an in-loop filtering scheme. The block-based prediction discussed above may result in the creation of blocky images at the decoder. Further, the block-based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block. The in-loop filtering scheme iteratively applies noise suppression filters, de-blocking filters, adaptive loop filters, and SAO filters to the blocks/frames. These filters mitigate such blocking artifacts so that the encoded file can be accurately reconstructed. Further, these filters mitigate artifacts in the reconstructed reference blocks, so that artifacts are less likely to create additional artifacts in subsequent blocks that are encoded based on the reconstructed reference blocks. The in-loop filtering process is discussed in greater detail below.

Once the video signal has been partitioned, compressed, and filtered, the resulting data is encoded in a bitstream at step 109. The bitstream includes the data discussed above as well as any signaling data desired to support proper video signal reconstruction at the decoder. For example, such data may include partition data, prediction data, residual blocks, and various flags providing coding instructions to the decoder. The bitstream may be stored in memory for transmission toward a decoder upon request. The bitstream may also be broadcast and/or multicast toward a plurality of decoders. The creation of the bitstream is an iterative process. Accordingly, steps 101, 103, 105, 107, and 109 may occur continuously and/or simultaneously over many frames and blocks. The order shown in FIG. 1 is presented for clarity and ease of discussion, and is not intended to limit the video coding process to a particular order.

The decoder receives the bitstream and begins the decoding process at step 111. Specifically, the decoder employs an entropy decoding scheme to convert the bitstream into corresponding syntax and video data. At step 111, the decoder employs the syntax data from the bitstream to determine the partitions for the frames. The partitioning should match the results of the block partitioning at step 103. Entropy encoding/decoding as employed in step 111 is now described. The encoder makes many choices during the compression process, such as selecting a block-partitioning scheme from several possible choices based on the spatial positioning of values in the input image(s). Signaling the exact choices may employ a large number of bins. As used herein, a bin is a binary value that is treated as a variable (e.g., a bit value that may vary depending on context). Entropy coding allows the encoder to discard any options that are clearly not viable for a particular case, leaving a set of allowable options. Each allowable option is then assigned a code word. The length of the code words is based on the number of allowable options (e.g., one bin for two options, two bins for three to four options, etc.). The encoder then encodes the code word for the selected option. This scheme reduces the size of the code words, because the code words are only as big as needed to uniquely indicate a selection from a small subset of allowable options, as opposed to uniquely indicating a selection from a potentially large set of all possible options. The decoder then decodes the selection by determining the set of allowable options in a manner similar to the encoder. By determining the set of allowable options, the decoder can read the code word and determine the selection made by the encoder.
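The relationship between the number of allowable options and the code word length can be checked with a short sketch. It models only the fixed-length case given in the text (one bin for two options, two bins for three to four options); it is not an implementation of CABAC, and the option names in the usage example are hypothetical.

```python
def bins_needed(num_options: int) -> int:
    """Smallest fixed code word length, in bins, that distinguishes num_options choices."""
    if num_options < 1:
        raise ValueError("at least one option is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (num_options - 1).bit_length()


def codewords(options):
    """Assign each allowable option a fixed-length binary code word."""
    width = bins_needed(len(options))
    return {opt: (format(i, "b").zfill(width) if width else "") for i, opt in enumerate(options)}


# Usage: pruning 35 possible modes down to 3 allowable ones shortens the code word.
all_modes_bins = bins_needed(35)
allowed = codewords(["planar", "dc", "angular_10"])
```

With all 35 options a code word needs six bins, while the pruned set of three needs only two, which is the saving described above. The decoder can read the shorter code word only because it derives the same allowable set.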

At step 113, the decoder performs block decoding. Specifically, the decoder employs inverse transforms to generate residual blocks. The decoder then employs the residual blocks and corresponding prediction blocks to reconstruct the image blocks according to the partitioning. The prediction blocks may include both intra-prediction blocks and inter-prediction blocks, as generated at the encoder at step 105. The reconstructed image blocks are then positioned into frames of a reconstructed video signal according to the partitioning data determined at step 111. Syntax for step 113 may also be signaled in the bitstream via entropy coding, as discussed above.

At step 115, filtering is performed on the frames of the reconstructed video signal in a manner similar to step 107 at the encoder. For example, noise suppression filters, de-blocking filters, adaptive loop filters, and SAO filters may be applied to the frames to remove blocking artifacts. Once the frames are filtered, the video signal can be output to a display at step 117 for viewing by an end user.

FIG. 2 is a schematic diagram of an example coding and decoding (codec) system 200 for video coding. Specifically, codec system 200 provides functionality to support the implementation of method 100. Codec system 200 is generalized to depict components employed in both an encoder and a decoder. Codec system 200 receives and partitions a video signal as discussed with respect to steps 101 and 103 of method 100, which results in a partitioned video signal 201. When acting as an encoder, codec system 200 compresses the partitioned video signal 201 into a coded bitstream, as discussed with respect to steps 105, 107, and 109 of method 100. When acting as a decoder, codec system 200 generates an output video signal from the bitstream, as discussed with respect to steps 111, 113, 115, and 117 of method 100. The codec system 200 includes a general coder control component 211, a transform scaling and quantization component 213, an intra-picture estimation component 215, an intra-picture prediction component 217, a motion compensation component 219, a motion estimation component 221, a scaling and inverse transform component 229, a filter control analysis component 227, an in-loop filter component 225, a decoded picture buffer component 223, and a header formatting and context adaptive binary arithmetic coding (CABAC) component 231. Such components are coupled as shown. In FIG. 2, black lines indicate movement of data to be encoded/decoded, while dashed lines indicate movement of control data that controls the operation of other components. The components of codec system 200 may all be present in the encoder. The decoder may include a subset of the components of codec system 200. For example, the decoder may include the intra-picture prediction component 217, the motion compensation component 219, the scaling and inverse transform component 229, the in-loop filter component 225, and the decoded picture buffer component 223. These components are now described.

The partitioned video signal 201 is a captured video stream that has been partitioned into blocks of pixels by a coding tree. The coding tree employs various split modes to subdivide a block of pixels into smaller blocks of pixels. These blocks can then be further subdivided into smaller blocks. The blocks may be referred to as nodes of the coding tree. Larger parent nodes are split into smaller child nodes. The number of times a node is subdivided is referred to as the depth of the node/coding tree. The divided blocks are referred to as coding units (CUs) in some cases. The split modes may include a binary tree (BT), a triple tree (TT), and a quad tree (QT), which partition a node into two, three, or four child nodes, respectively, of varying shapes depending on the split mode employed. The partitioned video signal 201 is forwarded to the general coder control component 211, the transform scaling and quantization component 213, the intra-picture estimation component 215, the filter control analysis component 227, and the motion estimation component 221 for compression.
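As an illustrative sketch only, the split modes described above can be expressed as a function that maps a block to its child blocks. The function name `split`, the mode labels, and the 1:2:1 ratio used for the triple tree are assumptions made for this example and are not taken from this description.

```python
from typing import List, Tuple

# A block is (x, y, width, height) in pixels. This layout is an assumption.
Block = Tuple[int, int, int, int]


def split(block: Block, mode: str) -> List[Block]:
    """Return the child blocks produced by one split of `block`.

    Hypothetical mode labels: "QT" (four children), "BT_V"/"BT_H" (two
    children), "TT_V"/"TT_H" (three children, assumed 1:2:1 ratio).
    """
    x, y, w, h = block
    if mode == "QT":
        hw, hh = w // 2, h // 2
        return [
            (x, y, hw, hh),
            (x + hw, y, w - hw, hh),
            (x, y + hh, hw, h - hh),
            (x + hw, y + hh, w - hw, h - hh),
        ]
    if mode == "BT_V":  # vertical cut: left and right halves
        hw = w // 2
        return [(x, y, hw, h), (x + hw, y, w - hw, h)]
    if mode == "BT_H":  # horizontal cut: top and bottom halves
        hh = h // 2
        return [(x, y, w, hh), (x, y + hh, w, h - hh)]
    if mode == "TT_V":  # quarter, half, quarter along the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, w - 2 * q, h), (x + w - q, y, q, h)]
    if mode == "TT_H":  # quarter, half, quarter along the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, h - 2 * q), (x, y + h - q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")
```

Applying `split` repeatedly to the resulting children yields deeper nodes of the coding tree; the number of applications along a path corresponds to the depth described above.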

The general coder control component 211 is configured to make decisions related to coding the images of the video sequence into the bitstream according to application constraints. For example, the general coder control component 211 manages optimization of bitrate/bitstream size versus reconstruction quality. Such decisions may be made based on storage space/bandwidth availability and image resolution requests. The general coder control component 211 also manages buffer utilization in light of transmission speed to mitigate buffer underrun and overrun issues. To manage these issues, the general coder control component 211 manages partitioning, prediction, and filtering by the other components. For example, the general coder control component 211 may dynamically increase compression complexity to increase resolution and bandwidth usage, or decrease compression complexity to decrease resolution and bandwidth usage. Hence, the general coder control component 211 controls the other components of codec system 200 to balance video signal reconstruction quality against bitrate concerns. The general coder control component 211 creates control data, which controls the operation of the other components. The control data is also forwarded to the header formatting and CABAC component 231 to be encoded in the bitstream to signal parameters for decoding at the decoder.

The partitioned video signal 201 is also sent to the motion estimation component 221 and the motion compensation component 219 for inter-prediction. A frame or slice of the partitioned video signal 201 may be divided into multiple video blocks. Motion estimation component 221 and motion compensation component 219 perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. Codec system 200 may perform multiple coding passes, for example, to select an appropriate coding mode for each block of video data.

Motion estimation component 221 and motion compensation component 219 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation component 221, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a prediction unit (PU) of a video block, that is, of the current block being coded within the current frame (or other coded unit), relative to a predictive block within a reference frame (or other coded unit). A predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, codec system 200 may calculate values for sub-integer pixel positions of reference pictures stored in the decoded picture buffer 223. For example, codec system 200 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation component 221 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision. Motion estimation component 221 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. Motion estimation component 221 outputs the calculated motion vector as motion data to the header formatting and CABAC component 231 for encoding, and to the motion compensation component 219.
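The block-matching search described above can be sketched as follows. This is a minimal integer-pixel full search using SAD as the difference metric. The function names, the `search_range` parameter, and the frame layout (a list of rows of sample values) are assumptions for illustration, and fractional-pixel interpolation is omitted.

```python
from typing import List, Tuple

Frame = List[List[int]]  # assumed layout: frame[y][x] holds one sample value


def crop(frame: Frame, x: int, y: int, w: int, h: int) -> Frame:
    """Extract the w-by-h block whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, bw: int, bh: int,
                search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return ((dx, dy), cost) of the best integer-pixel match in `ref`
    for the block of `cur` at (bx, by), searching +/- search_range pixels."""
    target = crop(cur, bx, by, bw, bh)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + bw > len(ref[0]) or ry + bh > len(ref):
                continue
            cost = sad(target, crop(ref, rx, ry, bw, bh))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("no candidate block lies inside the reference frame")
    return (best[1], best[2]), best[0]
```

The returned pair `(dx, dy)` plays the role of the motion vector: the displacement from the position of the current block to the position of the predictive block in the reference frame.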

Motion compensation, performed by motion compensation component 219, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation component 221. Again, motion estimation component 221 and motion compensation component 219 may be functionally integrated in some examples. Upon receiving the motion vector for the PU of the current video block, motion compensation component 219 may locate the predictive block to which the motion vector points in a reference picture list. A residual video block is then formed by subtracting the pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. In general, motion estimation component 221 performs motion estimation relative to luma components, and motion compensation component 219 uses the motion vectors calculated based on the luma components for both chroma components and luma components. The predictive block and the residual block are forwarded to the transform scaling and quantization component 213.
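The fetch-and-subtract step described above may be sketched as below, under the assumption of an integer-pixel motion vector and frames stored as lists of rows. The helper names `fetch_prediction` and `residual_block` are hypothetical and introduced only for this example.

```python
from typing import List, Tuple

Frame = List[List[int]]  # assumed layout: frame[y][x] holds one sample value


def fetch_prediction(ref: Frame, x: int, y: int, mv: Tuple[int, int],
                     w: int, h: int) -> Frame:
    """Fetch the w-by-h predictive block that the integer-pixel motion
    vector `mv` = (dx, dy) points to, for a current block located at (x, y)."""
    dx, dy = mv
    return [row[x + dx:x + dx + w] for row in ref[y + dy:y + dy + h]]


def residual_block(cur_block: Frame, pred_block: Frame) -> Frame:
    """Pixel difference values: current block minus predictive block."""
    return [[c - p for c, p in zip(cr, pr)]
            for cr, pr in zip(cur_block, pred_block)]
```

For a well-matched predictive block, most residual values are small, which is what makes the subsequent transform and quantization effective.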

The partitioned video signal 201 is also sent to the intra-picture estimation component 215 and the intra-picture prediction component 217. As with motion estimation component 221 and motion compensation component 219, intra-picture estimation component 215 and intra-picture prediction component 217 may be highly integrated, but are illustrated separately for conceptual purposes. Intra-picture estimation component 215 and intra-picture prediction component 217 intra-predict a current block relative to blocks in the current frame, as an alternative to the inter-prediction performed between frames by motion estimation component 221 and motion compensation component 219, as described above. In particular, intra-picture estimation component 215 determines an intra-prediction mode to use to encode the current block. In some examples, intra-picture estimation component 215 selects an appropriate intra-prediction mode to encode the current block from multiple tested intra-prediction modes. The selected intra-prediction mode is then forwarded to the header formatting and CABAC component 231 for encoding.

For example, intra-picture estimation component 215 calculates rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and selects the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bitrate (e.g., a number of bits) used to produce the encoded block. Intra-picture estimation component 215 calculates ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block. In addition, intra-picture estimation component 215 may be configured to code depth blocks of a depth map using a depth modeling mode (DMM) based on rate-distortion optimization (RDO).
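One common way to combine distortion and rate into a single value is a Lagrangian cost, J = D + λ·R, with the mode of lowest cost being selected. The sketch below uses that formulation as an assumption; this description speaks only of ratios computed from distortions and rates and does not specify the formula. The function name, the mode labels, and the numbers in the usage example are hypothetical.

```python
from typing import Dict, Tuple


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Pick the mode with the lowest assumed cost J = D + lam * R.

    `candidates` maps a mode label to (distortion, rate_in_bits).
    """
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Usage with made-up measurements for three tested modes.
tested = {"dc": (120.0, 10.0), "planar": (100.0, 20.0), "angular": (60.0, 50.0)}
low_lambda_choice = select_mode(tested, 1.0)   # distortion dominates
high_lambda_choice = select_mode(tested, 4.0)  # rate dominates
```

A larger λ weights the bit cost more heavily, so the selection shifts toward cheaper modes even when they reconstruct the block less accurately.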

Intra-picture prediction component 217 may generate a residual block from the predictive block based on the selected intra-prediction mode determined by intra-picture estimation component 215. The residual block includes the difference in values between the predictive block and the original block, represented as a matrix. The residual block is then forwarded to the transform scaling and quantization component 213. Intra-picture estimation component 215 and intra-picture prediction component 217 may operate on both luma and chroma components.

The transform scaling and quantization component 213 is configured to further compress the residual block. The transform scaling and quantization component 213 applies a transform, such as a discrete cosine transform (DCT), a discrete sine transform (DST), or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. The transform scaling and quantization component 213 is also configured to scale the transformed residual information, for example based on frequency. Such scaling involves applying a scale factor to the residual information so that different frequency information is quantized at different granularities, which may affect the final visual quality of the reconstructed video. The transform scaling and quantization component 213 is also configured to quantize the transform coefficients to further reduce bitrate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, the transform scaling and quantization component 213 may then perform a scan of the matrix including the quantized transform coefficients. The quantized transform coefficients are forwarded to the header formatting and CABAC component 231 to be encoded in the bitstream.
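A minimal sketch of the transform and quantization steps is given below, assuming a square block, an orthonormal two-dimensional DCT-II computed directly from its definition, and a single uniform quantization step in place of frequency-dependent scaling. The function names and the `step` parameter are assumptions for illustration; practical codecs use fast integer transforms instead.

```python
import math
from typing import List


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of a square n-by-n block (direct O(n^4) form)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs: List[List[float]], step: float) -> List[List[int]]:
    """Uniform quantization: divide by the step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]
```

A flat residual block concentrates all of its energy in the single DC coefficient, so after quantization every other coefficient is zero, which is what makes the later entropy coding efficient.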

The scaling and inverse transform component 229 applies the inverse operations of the transform scaling and quantization component 213 to support motion estimation. The scaling and inverse transform component 229 applies inverse scaling, inverse transformation, and/or inverse quantization to reconstruct the residual block in the pixel domain, for example for later use as a reference block, which may become a predictive block for another current block. Motion estimation component 221 and/or motion compensation component 219 may calculate a reference block by adding the residual block back to the corresponding predictive block for use in motion estimation of a later block/frame. Filters are applied to the reconstructed reference blocks to mitigate artifacts created during scaling, quantization, and transformation. Such artifacts could otherwise cause inaccurate prediction, and create additional artifacts, when subsequent blocks are predicted.
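Adding the residual block back to the corresponding predictive block can be sketched as follows. Clipping the result to the valid sample range and the default 8-bit depth are assumptions for this example; the function name `reconstruct` is hypothetical.

```python
from typing import List


def reconstruct(pred: List[List[int]], residual: List[List[int]],
                bit_depth: int = 8) -> List[List[int]]:
    """Reference block = predictive block + residual, clipped to the
    sample range [0, 2**bit_depth - 1] (clipping is an assumption here)."""
    hi = (1 << bit_depth) - 1
    return [[min(hi, max(0, p + r)) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]
```

Because the residual has passed through quantization, the reconstructed block generally differs slightly from the original block, which is why in-loop filtering is applied before the block is reused as a reference.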

The filter control analysis component 227 and the in-loop filter component 225 apply filters to the residual blocks and/or to reconstructed image blocks. For example, the transformed residual block from the scaling and inverse transform component 229 may be combined with a corresponding predictive block from intra-picture prediction component 217 and/or motion compensation component 219 to reconstruct the original image block. The filters may then be applied to the reconstructed image block. In some examples, the filters may instead be applied to the residual blocks. As with other components in FIG. 2, the filter control analysis component 227 and the in-loop filter component 225 may be highly integrated and implemented together, but are depicted separately for conceptual purposes. Filters applied to the reconstructed reference blocks are applied to particular spatial regions and include multiple parameters to adjust how such filters are applied. The filter control analysis component 227 analyzes the reconstructed reference blocks to determine where such filters should be applied and sets the corresponding parameters. Such data is forwarded to the header formatting and CABAC component 231 as filter control data for encoding. The in-loop filter component 225 applies such filters based on the filter control data. The filters may include a deblocking filter, a noise suppression filter, an SAO filter, and an adaptive loop filter. Such filters may be applied in the spatial/pixel domain (e.g., on a reconstructed pixel block) or in the frequency domain, depending on the example.

When operating as an encoder, the filtered reconstructed image block, residual block, and/or prediction block are stored in the decoded picture buffer 223 for later use in motion estimation as discussed above. When operating as a decoder, the decoded picture buffer 223 stores and forwards the reconstructed and filtered blocks toward a display as part of an output video signal. The decoded picture buffer 223 may be any memory device capable of storing prediction blocks, residual blocks, and/or reconstructed image blocks.

The header formatting and CABAC component 231 receives the data from the various components of codec system 200 and encodes such data into a coded bitstream for transmission toward a decoder. Specifically, the header formatting and CABAC component 231 generates various headers to encode control data, such as general control data and filter control data. Further, prediction data, including intra-prediction and motion data, as well as residual data in the form of quantized transform coefficient data, are all encoded in the bitstream. The final bitstream includes all information required by the decoder to reconstruct the original partitioned video signal 201. Such information may also include intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, indications of the most probable intra-prediction modes, an indication of partition information, and the like. Such data may be encoded by employing entropy coding. For example, the information may be encoded by employing context adaptive variable length coding (CAVLC), CABAC, syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding technique. Following the entropy coding, the coded bitstream may be transmitted to another device (e.g., a video decoder) or archived for later transmission or retrieval.

FIG. 3 is a block diagram illustrating an exemplary video encoder 300. Video encoder 300 may be employed to implement the encoding functions of codec system 200 and/or to implement steps 101, 103, 105, 107, and/or 109 of method 100. Encoder 300 partitions an input video signal, resulting in a partitioned video signal 301, which is substantially similar to the partitioned video signal 201. The partitioned video signal 301 is then compressed and encoded into a bitstream by the components of encoder 300.

Specifically, the partitioned video signal 301 is forwarded to an intra-picture prediction component 317 for intra-prediction. The intra-picture prediction component 317 may be substantially similar to intra-picture estimation component 215 and intra-picture prediction component 217. The partitioned video signal 301 is also forwarded to a motion compensation component 321 for inter-prediction based on reference blocks in a decoded picture buffer 323. The motion compensation component 321 may be substantially similar to motion estimation component 221 and motion compensation component 219. The prediction blocks and residual blocks from the intra-picture prediction component 317 and the motion compensation component 321 are forwarded to a transform and quantization component 313 for transformation and quantization of the residual blocks. The transform and quantization component 313 may be substantially similar to the transform scaling and quantization component 213. The transformed and quantized residual blocks and the corresponding prediction blocks (along with associated control data) are forwarded to an entropy coding component 331 for coding into a bitstream. The entropy coding component 331 may be substantially similar to the header formatting and CABAC component 231.

The transformed and quantized residual blocks and/or the corresponding prediction blocks are also forwarded from the transform and quantization component 313 to an inverse transform and quantization component 329 for reconstruction into reference blocks for use by the motion compensation component 321. The inverse transform and quantization component 329 may be substantially similar to the scaling and inverse transform component 229. In-loop filters in an in-loop filter component 325 are also applied to the residual blocks and/or reconstructed reference blocks, depending on the example. The in-loop filter component 325 may be substantially similar to the filter control analysis component 227 and the in-loop filter component 225. The in-loop filter component 325 may include multiple filters, including a noise suppression filter as discussed below. The filtered blocks are then stored in the decoded picture buffer 323 for use as reference blocks by the motion compensation component 321. The decoded picture buffer 323 may be substantially similar to the decoded picture buffer 223.

FIG. 4 is a block diagram illustrating an exemplary video decoder 400. Video decoder 400 may be employed to implement the decoding functions of codec system 200 and/or to implement steps 111, 113, 115, and/or 117 of method 100. Decoder 400 receives a bitstream, for example from encoder 300, and generates a reconstructed output video signal based on the bitstream for display to an end user.

The bitstream is received by an entropy decoding component 433. The entropy decoding component 433 performs the inverse function of the entropy coding component 331. The entropy decoding component 433 is configured to implement an entropy decoding scheme, such as CAVLC, CABAC, SBAC, and PIPE coding, or other entropy coding techniques. For example, the entropy decoding component 433 may employ header information to provide a context for interpreting additional data encoded as codewords in the bitstream. The decoded information includes any desired information for decoding the video signal, such as general control data, filter control data, partition information, motion data, prediction data, and quantized transform coefficients from residual blocks. The quantized transform coefficients are forwarded to an inverse transform and quantization component 429 for reconstruction into residual blocks. The inverse transform and quantization component 429 may be substantially similar to the inverse transform and quantization component 329.

The reconstructed residual blocks and/or prediction blocks are forwarded to an intra-picture prediction component 417 for reconstruction into image blocks based on intra-prediction operations. The intra-picture prediction component 417 may be substantially similar to the intra-picture prediction component 317, but operates in reverse. Specifically, the intra-picture prediction component 417 employs prediction modes to locate a reference block in the frame and applies a residual block to the result to reconstruct intra-predicted image blocks. The reconstructed intra-predicted image blocks and/or the residual blocks and corresponding inter-prediction data are forwarded to a decoded picture buffer component 423 via an in-loop filter component 425, which may be substantially similar to the decoded picture buffer component 323 and the in-loop filter component 325, respectively. The in-loop filter component 425 filters the reconstructed image blocks, residual blocks, and/or prediction blocks, and such information is stored in the decoded picture buffer component 423. Reconstructed image blocks from the decoded picture buffer component 423 are forwarded to a motion compensation component 421 for inter-prediction. The motion compensation component 421 may be substantially similar to the motion compensation component 321, but may operate in reverse. Specifically, the motion compensation component 421 employs motion vectors from a reference block to generate a prediction block and applies a residual block to the result to reconstruct an image block. The resulting reconstructed blocks may also be forwarded via the in-loop filter component 425 to the decoded picture buffer component 423. The decoded picture buffer component 423 continues to store additional reconstructed image blocks, which can be reconstructed into frames via the partition information. Such frames may also be placed in a sequence. The sequence is output toward a display as a reconstructed output video signal.

Inter-prediction.

Many schemes are used in tandem to compress video data during the video coding process. For example, a video sequence is partitioned into image frames. The image frames are then partitioned into image blocks. The image blocks may then be compressed by inter-prediction (correlation between blocks in different frames) or intra-prediction (correlation between blocks in the same frame).

Inter-prediction is employed when a coding object, such as a coding tree unit (CTU), a coding tree block (CTB), a coding unit (CU), or a sub-CU, appears in multiple frames of a video sequence. Rather than coding the same object in each frame, the object is coded in a reference frame and a motion vector (MV) is employed to indicate the motion trajectory of the object. The motion trajectory of an object is the movement of the object over time. An MV is a vector that indicates the direction and magnitude of the change in position of an object between frames. The object and the MV can be coded in a bitstream and decoded by a decoder. To further increase coding efficiency and reduce the size of the encoding, the MV may be omitted from the bitstream and derived at the decoder. For example, a pair of reference frames may be employed. A reference frame is a frame in a bitstream that contains data that can be coded by reference when coding related frames. Matching algorithms, such as bilateral matching and/or template matching, may be employed to determine the position of the coding object in both reference frames. A bilateral matching algorithm matches a block in a previous frame to a block in a current frame. A template matching algorithm matches blocks adjacent to the current block with blocks adjacent to the corresponding block in one or more reference frames. Once the position of the object is determined in both reference frames, an MV can be determined that represents the motion of the object between the reference frames. The MV can then be employed to position the object in the frames between the reference frames. As a specific example, an initial MV can be determined for an entire CU. A local search can then be employed to refine the initial MV. Further, MVs for sub-CU components of the object can be determined and refined based on the refined initial MV. Such an approach indicates the correct position of the object as long as the motion trajectory of the object is continuous between the reference frames.
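For illustration only, the local search that refines an initial MV may be sketched as follows. The sketch assumes a sum of absolute differences (SAD) matching cost, integer-sample motion vectors, and a square search window; the names `sad`, `refine_mv`, and `radius` are hypothetical and do not appear in this disclosure.

```python
def sad(cur, cx, cy, ref, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
        for r in range(size)
        for c in range(size)
    )


def refine_mv(cur, ref, x, y, size, init_mv, radius=1):
    """Search a small window around init_mv for the lowest-SAD displacement.

    Assumption: the MV is the (dx, dy) offset from the block position (x, y)
    in the current frame to the matching block in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            mvx, mvy = init_mv[0] + dx, init_mv[1] + dy
            rx, ry = x + mvx, y + mvy
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= rx <= width - size and 0 <= ry <= height - size):
                continue
            cost = sad(cur, x, y, ref, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

The same routine could be reused for sub-CU refinement by calling it on smaller blocks with the refined CU-level MV as `init_mv`.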

FIG. 5 is a schematic diagram illustrating an example of unidirectional inter-prediction 500, for example as performed to determine motion vectors (MVs) at block compression step 105, block decoding step 113, motion estimation component 221, motion compensation component 219, motion compensation component 321, and/or motion compensation component 421.

Unidirectional inter-prediction 500 employs a reference frame 530 with a reference block 531 to predict a current block 511 in a current frame 510. Reference frame 530 may be temporally positioned after current frame 510 as shown, but may also be temporally positioned before current frame 510 in some examples. Current frame 510 is an example frame/picture being encoded/decoded at a particular time. Current frame 510 contains an object in current block 511 that matches an object in reference block 531 of reference frame 530. Reference frame 530 is a frame that is employed as a reference for encoding current frame 510, and reference block 531 is a block in reference frame 530 that contains an object also contained in current block 511 of current frame 510. Current block 511 is any coding unit that is being encoded/decoded at a specified point in the coding process. Current block 511 may be an entire partitioned block, or may be a sub-block in the affine inter-prediction case. Current frame 510 is separated from reference frame 530 by a temporal distance (TD) 533. TD 533 indicates the amount of time between current frame 510 and reference frame 530 in the video sequence. Over the time period represented by TD 533, the object in current block 511 moves from a position in current frame 510 to another position in reference frame 530 (e.g., the position of reference block 531). For example, the object may move along a motion trajectory 513, which is the direction of movement of the object over time. A motion vector 535 describes the direction and magnitude of the movement of the object along motion trajectory 513 over TD 533. Accordingly, an encoded MV 535 and a reference block 531 provide information sufficient to reconstruct current block 511 and position current block 511 in current frame 510.
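For illustration only, reconstruction of a current block from a reference block and an MV may be sketched as follows. The sketch assumes integer-sample motion vectors expressed as an (dx, dy) offset from the current block position, and it adds a residual block as described for motion compensation component 421; the name `motion_compensate` is hypothetical.

```python
def motion_compensate(ref_frame, x, y, mv, residual):
    """Rebuild the block at (x, y) from the reference block displaced by mv.

    ref_frame is a list of rows of sample values, mv is (dx, dy), and
    residual is a square block of differences with the same size as the
    block being rebuilt.
    """
    size = len(residual)
    rx, ry = x + mv[0], y + mv[1]  # top-left corner of the reference block
    return [
        [ref_frame[ry + r][rx + c] + residual[r][c] for c in range(size)]
        for r in range(size)
    ]
```

When the object is unchanged between frames, the residual is all zeros and the reference block is simply copied to the current block position.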

FIG. 6 is a schematic diagram illustrating an example of bidirectional inter-prediction 600, for example as performed to determine MVs at block compression step 105, block decoding step 113, motion estimation component 221, motion compensation component 219, motion compensation component 321, and/or motion compensation component 421. For example, bidirectional inter-prediction 600 can be employed to determine motion vectors for a block in inter-prediction mode and/or to determine motion vectors for sub-blocks in affine inter-prediction mode.

Bidirectional inter-prediction 600 is similar to unidirectional inter-prediction 500, but employs a pair of reference frames to predict a current block 611 in a current frame 610. Hence, current frame 610 and current block 611 are substantially similar to current frame 510 and current block 511, respectively. Current frame 610 is temporally positioned between a preceding reference frame 620, which occurs before current frame 610 in the video sequence, and a subsequent reference frame 630, which occurs after current frame 610 in the video sequence. Preceding reference frame 620 and subsequent reference frame 630 are otherwise substantially similar to reference frame 530.

Current block 611 is matched to a preceding reference block 621 in preceding reference frame 620 and to a subsequent reference block 631 in subsequent reference frame 630. Such a match indicates that, over the course of the video sequence, an object moves from a position at preceding reference block 621 to a position at subsequent reference block 631 along a motion trajectory 613 and via current block 611. Current frame 610 is separated from preceding reference frame 620 by a preceding temporal distance (TD0) 623 and from subsequent reference frame 630 by a subsequent temporal distance (TD1) 633. TD0 623 indicates the amount of time between preceding reference frame 620 and current frame 610 in the video sequence. TD1 633 indicates the amount of time between current frame 610 and subsequent reference frame 630 in the video sequence. Hence, the object moves from preceding reference block 621 to current block 611 along motion trajectory 613 over the time period indicated by TD0 623. The object also moves from current block 611 to subsequent reference block 631 along motion trajectory 613 over the time period indicated by TD1 633.

A preceding motion vector (MV0) 625 describes the direction and magnitude of the movement of the object along motion trajectory 613 over TD0 623 (e.g., between preceding reference frame 620 and current frame 610). A subsequent motion vector (MV1) 635 describes the direction and magnitude of the movement of the object along motion trajectory 613 over TD1 633 (e.g., between current frame 610 and subsequent reference frame 630). As such, in bidirectional inter-prediction 600, current block 611 can be coded and reconstructed by employing preceding reference block 621 and/or subsequent reference block 631, MV0 625, and MV1 635.
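For illustration only, combining the two reference blocks into a single prediction may be sketched as follows. This disclosure does not state how the two blocks are weighted, so the equal-weight rounded average used here is an assumption, as is the convention that each MV is an (dx, dy) offset from the current block position to the matched block in its reference frame. The names `fetch_block` and `bi_predict` are hypothetical.

```python
def fetch_block(frame, x, y, mv, size):
    """Return the size x size block at position (x, y) displaced by mv."""
    rx, ry = x + mv[0], y + mv[1]
    return [row[rx:rx + size] for row in frame[ry:ry + size]]


def bi_predict(prev_ref, next_ref, x, y, mv0, mv1, size):
    """Predict the current block as the rounded average of two reference blocks.

    mv0 points into the preceding reference frame and mv1 into the
    subsequent reference frame (assumed sign convention).
    """
    p0 = fetch_block(prev_ref, x, y, mv0, size)
    p1 = fetch_block(next_ref, x, y, mv1, size)
    return [
        [(a + b + 1) // 2 for a, b in zip(row0, row1)]
        for row0, row1 in zip(p0, p1)
    ]
```

A decoder following this sketch would add the residual block to the averaged prediction in the same way as in the unidirectional case.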

Intra-prediction.

Many schemes are used in tandem to compress video data during the video coding process. For example, a video sequence is partitioned into image frames. The image frames are then partitioned into image blocks. The image blocks may then be compressed by inter-prediction (correlation between blocks in different frames) or intra-prediction (correlation between blocks in the same frame). In intra-prediction, a current image block is predicted from a reference line of samples. The reference line includes samples from neighboring image blocks, also referred to as adjacent blocks. Samples of the current block are matched with the samples of the reference line that have the nearest luma (light) or chroma (color) values. The current block is coded as prediction modes that indicate the matching samples. The prediction modes include angular prediction modes, a direct current (DC) mode, and a planar mode. Differences between the values predicted by the prediction modes and the actual values are coded as residual values in a residual block.

FIG. 7 is a schematic diagram illustrating example intra-prediction modes 700 employed in video coding. For example, intra-prediction modes 700 may be employed by steps 105 and 113 of method 100, intra-picture estimation component 215 and intra-picture prediction component 217 of codec system 200, intra-picture prediction component 317 of encoder 300, and/or intra-picture prediction component 417 of decoder 400. Specifically, intra-prediction modes 700 may be employed to compress an image block into a prediction block containing a selected prediction mode and a remaining residual block.

As noted above, intra-prediction involves matching a current image block to a corresponding sample or samples of one or more adjacent blocks. The current image block can then be represented as a selected prediction mode index and a residual block, which is much smaller than representing all of the luma/chroma values contained in the current image block. Intra-prediction can be used when no reference frame is available or when inter-prediction coding is not used for the current block or frame. The reference samples for intra-prediction may be derived from previously coded (or reconstructed) adjacent blocks in the same frame. Advanced Video Coding (AVC), also known as H.264, and H.265/HEVC both employ a reference line of boundary samples of adjacent blocks as reference samples for intra-prediction. Many different intra-prediction modes are employed in order to cover different textures or structural characteristics. H.265/HEVC supports a total of 35 intra-prediction modes 700 that spatially correlate a current block to one or more reference samples. Specifically, intra-prediction modes 700 include 33 directional prediction modes indexed as modes 2 to 34, a DC mode indexed as mode 1, and a planar mode indexed as mode 0.

During encoding, the encoder matches the luma/chroma values of the current block with the luma/chroma values of corresponding reference samples in a reference line across the edges of adjacent blocks. When the best match is found with one of the reference lines, the encoder selects the directional intra-prediction mode 700 that points to the best matching reference line. For clarity of discussion, acronyms are employed below to refer to particular directional intra-prediction modes 700. DirS denotes the starting directional intra-prediction mode when counting clockwise from the bottom left (e.g., mode 2 in HEVC). DirE denotes the ending directional intra-prediction mode when counting clockwise from the bottom left (e.g., mode 34 in HEVC). DirD denotes the middle directional intra-prediction mode when counting clockwise from the bottom left (e.g., mode 18 in HEVC). DirH denotes the horizontal intra-prediction mode (e.g., mode 10 in HEVC). DirV denotes the vertical intra-prediction mode (e.g., mode 26 in HEVC).

As described above, the DC mode acts as a smoothing function and derives the prediction value of the current block as the average value of all the reference samples in the reference line traversing the adjacent blocks. Also as discussed above, the planar mode returns a prediction value that indicates a smooth transition (e.g., a constant slope of values) between the samples at the bottom left and top left, or at the top left and top right, of the reference line of reference samples.
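For illustration only, DC mode prediction may be sketched as follows. The sketch assumes a square block whose reference line consists of one row of samples above the block and one column of samples to its left, and it rounds the average to the nearest integer; the name `dc_predict` is hypothetical.

```python
def dc_predict(top, left, size):
    """Fill a size x size block with the average of the reference samples.

    top holds the reference samples above the block and left holds the
    reference samples to its left; only the first `size` of each are used.
    """
    refs = list(top[:size]) + list(left[:size])
    dc = (sum(refs) + len(refs) // 2) // len(refs)  # rounded integer mean
    return [[dc] * size for _ in range(size)]
```

Because every predicted sample takes the same value, any texture in the actual block is carried entirely by the residual block.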

For the planar mode, the DC mode, and the prediction modes from DirH to DirV, samples in both the top row of the reference line and the left column of the reference line are used as reference samples. For prediction modes with prediction directions from DirS to DirH (including DirS and DirH), the reference samples of the previously coded and reconstructed adjacent blocks in the left column of the reference line are used as reference samples. For prediction modes with prediction directions from DirV to DirE (including DirV and DirE), the reference samples of the previously coded and reconstructed adjacent blocks in the top row of the reference line are used as reference samples.
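For illustration only, the relationship between a mode index and the sides of the reference line it draws on may be sketched as follows, using the HEVC mode indices given above. Because DirH and DirV are stated to be included in the single-sided ranges, the sketch assumes that the range from DirH to DirV that uses both sides is exclusive of its endpoints; that reading and the name `reference_sides` are assumptions.

```python
PLANAR, DC = 0, 1
DIR_S, DIR_H, DIR_D, DIR_V, DIR_E = 2, 10, 18, 26, 34


def reference_sides(mode):
    """Return which sides of the reference line a prediction mode uses."""
    if mode in (PLANAR, DC) or DIR_H < mode < DIR_V:
        return {"left", "top"}
    if DIR_S <= mode <= DIR_H:
        return {"left"}
    if DIR_V <= mode <= DIR_E:
        return {"top"}
    raise ValueError(f"unknown intra-prediction mode: {mode}")
```

For example, `reference_sides(DIR_D)` reports both sides, since the middle directional mode lies between the horizontal and vertical modes.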

FIG. 8 is a schematic diagram illustrating an example of directional relationships of blocks 800 in video coding. For example, blocks 800 may be employed when selecting intra-prediction modes 700. Hence, blocks 800 may be employed by steps 105 and 113 of method 100, intra-picture estimation component 215 and intra-picture prediction component 217 of codec system 200, intra-picture prediction component 317 of encoder 300, and/or intra-picture prediction component 417 of decoder 400. In video coding, blocks 800 are partitioned based on video content and hence may include many rectangles and squares of varying shapes and sizes. Blocks 800 are depicted as squares for purposes of explanation and are hence simplified from actual video coding blocks to support clarity of discussion.

Blocks 800 contain a current block 801 and adjacent blocks 810. Current block 801 is any block being coded at a specified time. Adjacent blocks 810 are any blocks immediately adjacent to the left edge or top edge of current block 801. Video coding generally proceeds from top left to bottom right. As such, adjacent blocks 810 may be encoded and reconstructed prior to the coding of current block 801. When coding current block 801, the encoder matches the luma/chroma values of current block 801 with a reference sample (or samples) from a reference line traversing the edges of adjacent blocks 810. The match is then employed to select an intra-prediction mode, for example from intra-prediction modes 700, that points to the matched sample (or samples when DC or planar mode is selected). The selected intra-prediction mode then indicates that the luma/chroma values of current block 801 are substantially similar to the reference samples corresponding to the selected intra-prediction mode. Any differences can be retained in a residual block. The selected intra-prediction mode is then encoded in a bitstream. At the decoder, current block 801 can be reconstructed by employing the luma/chroma values of the reference samples in the selected reference line in adjacent blocks 810 that correspond to the selected intra-prediction mode (along with any residual information from the residual block).
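For illustration only, the encoder-side mode selection and the decoder-side reconstruction described above may be sketched as follows. The sketch is limited to the horizontal and vertical directional modes, assumes square blocks, and chooses the mode with the smallest sum of absolute residuals; that cost measure and the names `predict`, `encode_block`, and `decode_block` are assumptions.

```python
def predict(mode, top, left, size):
    """Build a prediction block from the reference line for one mode."""
    if mode == "vertical":    # copy the row above down every column
        return [list(top[:size]) for _ in range(size)]
    if mode == "horizontal":  # copy the left column across every row
        return [[left[r]] * size for r in range(size)]
    raise ValueError(f"unsupported mode: {mode}")


def encode_block(block, top, left):
    """Pick the mode with the smallest residual and return (mode, residual)."""
    size = len(block)
    best = None
    for mode in ("vertical", "horizontal"):
        pred = predict(mode, top, left, size)
        residual = [
            [block[r][c] - pred[r][c] for c in range(size)] for r in range(size)
        ]
        cost = sum(abs(v) for row in residual for v in row)
        if best is None or cost < best[0]:
            best = (cost, mode, residual)
    return best[1], best[2]


def decode_block(mode, residual, top, left):
    """Rebuild the block from the signaled mode, residual, and reference line."""
    size = len(residual)
    pred = predict(mode, top, left, size)
    return [[pred[r][c] + residual[r][c] for c in range(size)] for r in range(size)]
```

Only the mode and the residual need to be signaled, because the decoder already holds the reconstructed adjacent blocks that supply `top` and `left`.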

In-loop filter.

Video coding schemes subdivide video signals into image frames and then subdivide the image frames into various types of blocks. The image blocks are then compressed. This approach may create visual artifacts when the compressed video signal is reconstructed and displayed. For example, the image compression process can artificially add blocky shapes. This is known as blocking and generally occurs at block partition boundaries. Also, non-linear signal dependent rounding error, known as quantization noise, may be artificially added to a compressed image. Various filters may be employed to correct for such artifacts. The filters may be applied to reconstructed frames in post processing. Post processing occurs after significant portions of the compressed video signal have been reconstructed and immediately prior to display to a user. The filters may also be applied as part of the compression/decompression process by employing a mechanism called in-loop filtering. In-loop filtering is a filtering scheme that applies filters to reconstructed video images during the encoding and/or decoding process to support more accurate compression between related images. For example, inter-prediction encodes an image frame based on a previous and/or subsequent image frame. At the encoder, a compressed image is reconstructed and filtered via in-loop filtering so that the reconstructed image provides a more accurate image for use in encoding previous/subsequent image frame(s) via inter-prediction. At the decoder, a compressed image is reconstructed and filtered via in-loop filtering both to create a more accurate image for viewing by an end user and to support more accurate inter-prediction. In-loop filtering employs several filters, such as a deblocking filter, a sample adaptive offset (SAO) filter, and an adaptive loop filter. In-loop filtering can also include a noise suppression filter.

FIG. 9 is a block diagram illustrating an example in-loop filter 900. In-loop filter 900 may be employed to implement in-loop filters 225, 325, and/or 425. In-loop filter 900 includes a noise suppression filter 941, a deblocking filter 943, a sample adaptive offset (SAO) filter 945, and an adaptive loop filter 947. The filters of in-loop filter 900 are applied in sequence to reconstructed image blocks and/or residual blocks.

Noise suppression filter 941 is configured to remove quantization noise caused by image compression. Specifically, noise suppression filter 941 is employed to remove artifacts that occur at edges in the image. For example, image compression may create distinct and incorrect color/light values adjacent to sharp transitions (edges) between different color/light patches in an image. This is referred to as ringing, and is caused by the application of transforms to the high frequency portions of the image data that are associated with sharp edges. Noise suppression filter 941 is employed to mitigate such ringing artifacts. Noise suppression filter 941 operates in both the spatial domain (e.g., the spatial orientation of pixels) and the frequency domain (e.g., the relationship of transformed coefficient values relating to pixel data). At the encoder, noise suppression filter 941 partitions the reconstructed frame into reference macroblocks. Such blocks can also be sub-divided into smaller reference blocks. Noise suppression filter 941 first generates an application map indicating the portions of the frame that should be filtered based on an estimated amount of quantization noise at the block. Noise suppression filter 941 then employs a matching component to determine, for each reference block indicated by the application map, a set of patches that are similar to the corresponding reference block, where similar indicates that chroma/luma values are within a predetermined range. Noise suppression filter 941 may then group the patches into clusters and employ a two-dimensional (2D) transform to transform the clusters into the frequency domain, resulting in frequency domain patches. Noise suppression filter 941 may also employ an inverse 2D transform to convert the frequency domain patches back into the spatial domain.

Deblocking filter 943 is configured to remove block-shaped edges created by block-based inter-prediction and intra-prediction. Deblocking filter 943 scans an image portion (e.g., an image slice) for discontinuities in chroma and/or luma values occurring at partition boundaries. Deblocking filter 943 then applies a smoothing function to the block boundaries to remove such discontinuities. The strength of deblocking filter 943 may be varied depending on the spatial activity (e.g., variance of luma/chroma components) occurring in an area adjacent to the block boundaries.
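For illustration only, smoothing a discontinuity at a block boundary may be sketched for a single row of samples as follows. The sketch assumes that a step across the boundary smaller than a threshold is a coding artifact to be smoothed, while a larger step is treated as a genuine image edge and left untouched; the threshold test, the quarter-step correction, and the name `deblock_row` are assumptions and do not reproduce any standardized filter.

```python
def deblock_row(row, boundary, threshold):
    """Smooth the two samples on either side of a block boundary in one row.

    boundary is the index of the first sample of the right-hand block.
    """
    p, q = row[boundary - 1], row[boundary]
    step = q - p
    if step == 0 or abs(step) >= threshold:
        return list(row)  # nothing to smooth, or likely a real image edge
    delta = abs(step) // 4
    if step < 0:
        delta = -delta
    out = list(row)
    out[boundary - 1] = p + delta  # pull the left sample toward the right one
    out[boundary] = q - delta      # pull the right sample toward the left one
    return out
```

Varying `threshold` per boundary is one simple way to model a filter strength that depends on local spatial activity.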

SAO filter 945 is configured to remove artifacts related to sample distortion caused by the encoding process. SAO filter 945 at the encoder classifies the deblocked samples of a reconstructed image into several categories based on relative deblocking edge shape and/or direction. An offset is then determined and added to the samples based on the categories. The offsets are encoded in the bitstream and employed by SAO filter 945 at the decoder. SAO filter 945 removes banding artifacts (bands of values instead of smooth transitions) and ringing artifacts (spurious signals near sharp edges).
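For illustration only, classifying samples into categories and applying a per-category offset may be sketched in one dimension as follows. The sketch uses only two categories, a local valley and a local peak relative to the two neighboring samples, and derives each offset as the rounded mean error of its category; this simplified classification and the names `classify`, `derive_offsets`, and `apply_offsets` are assumptions.

```python
def classify(left, cur, right):
    """Assign a sample to a category based on its two neighbors."""
    if cur < left and cur < right:
        return "valley"
    if cur > left and cur > right:
        return "peak"
    return "none"


def derive_offsets(original, recon):
    """Encoder side: mean error between original and reconstruction per category."""
    sums, counts = {}, {}
    for i in range(1, len(recon) - 1):
        cat = classify(recon[i - 1], recon[i], recon[i + 1])
        if cat == "none":
            continue
        sums[cat] = sums.get(cat, 0) + original[i] - recon[i]
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: round(sums[cat] / counts[cat]) for cat in sums}


def apply_offsets(recon, offsets):
    """Decoder side: add the signaled offset for each sample's category."""
    out = list(recon)
    for i in range(1, len(recon) - 1):
        cat = classify(recon[i - 1], recon[i], recon[i + 1])
        out[i] = recon[i] + offsets.get(cat, 0)
    return out
```

The decoder repeats the classification on its own reconstructed samples, so only the offsets themselves need to be carried in the bitstream.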

Adaptive loop filter 947, at the encoder, is configured to compare a reconstructed image to the original image. Adaptive loop filter 947 determines coefficients that describe the differences between the reconstructed image and the original image, for example via a Wiener-based adaptive filter. Such coefficients are encoded in the bitstream and employed by adaptive loop filter 947 at the decoder to remove the differences between the reconstructed image and the original image. While adaptive loop filter 947 is effective in correcting artifacts, greater differences between the reconstructed image and the original image result in a greater number of coefficients to be signaled. This in turn creates a larger bitstream and hence reduces the effectiveness of compression. As such, minimizing the differences with the other filters prior to applying adaptive loop filter 947 results in improved compression.

Partitioning.

Video coding employs an encoder to compress media files and a decoder to reconstruct the original media files from the compressed media files. Video coding employs various standardized processes to ensure that any decoder employing a standardized process can consistently reproduce a media file compressed by any encoder that also employs that standardized process. For example, an encoder and a decoder may both employ a coding standard such as High Efficiency Video Coding (HEVC), also known as H.265. At the encoder, a video signal is separated into frames. The frames are then partitioned into image blocks containing groups of pixels. The image blocks are then compressed, filtered, and encoded into a bitstream. The bitstream may then be transmitted to a decoder, which reconstructs the video signal for display to an end user.

The partitioning system is configured to split image blocks into sub-blocks. For example, a tree structure that employs various split modes may be used to split a node (e.g., a block) into child nodes (e.g., sub-blocks). Different split modes yield different partitions. Split modes may also be applied recursively to subdivide nodes further. Applying these split modes produces a variety of partition patterns.

FIG. 10 shows example split modes 1000 used for block partitioning. A split mode 1000 is a mechanism for splitting a parent node (e.g., an image block) into a plurality of child nodes (e.g., image sub-blocks) during partitioning. The split modes 1000 include a quad-tree (QT) split mode 1001, a vertical binary-tree (BT) split mode 1003, a horizontal BT split mode 1005, a vertical triple-tree (TT) split mode 1007, and a horizontal TT split mode 1009. The QT split mode 1001 is a tree structure for block partitioning in which a node of size 4Mx4N is split into four child nodes of size 2Mx2N, where M indicates block width and N indicates block height. The vertical BT split mode 1003 and the horizontal BT split mode 1005 are tree structures for block partitioning in which a node of size 4Mx4N is split vertically into two child nodes of size 2Mx4N or horizontally into two child nodes of size 4Mx2N, respectively. The vertical TT split mode 1007 and the horizontal TT split mode 1009 are tree structures for block partitioning in which a node of size 4Mx4N is split vertically into three child nodes of sizes Mx4N, 2Mx4N, and Mx4N, or horizontally into three child nodes of sizes 4MxN, 4Mx2N, and 4MxN, respectively. Among the three child nodes, the largest node is positioned in the center.
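The child sizes produced by each split mode can be checked with a short sketch. The function name `split_children` and the mode labels are hypothetical, chosen only for illustration; the sketch assumes the parent width and height are divisible by four, and it takes the quad-tree children to be half the parent's width and height so that the four children tile the parent.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in samples


def split_children(width: int, height: int, mode: str) -> List[Size]:
    """Return the child sizes for one application of a split mode.

    Mode labels are illustrative only: QT, BT_VER, BT_HOR, TT_VER, TT_HOR.
    """
    if width % 4 or height % 4:
        raise ValueError("this sketch assumes dimensions divisible by 4")
    if mode == "QT":      # four equal quadrants
        return [(width // 2, height // 2)] * 4
    if mode == "BT_VER":  # two halves, side by side
        return [(width // 2, height)] * 2
    if mode == "BT_HOR":  # two halves, stacked
        return [(width, height // 2)] * 2
    if mode == "TT_VER":  # quarter, half, quarter; largest child in the center
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "TT_HOR":
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")


def total_area(children: List[Size]) -> int:
    """Sum of child areas; equals the parent area for a valid split."""
    return sum(w * h for w, h in children)
```

For a 64x64 parent (M = N = 16), `split_children(64, 64, "TT_VER")` gives children of 16x64, 32x64, and 16x64, and every mode preserves the parent area.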

The split modes 1000 may also be applied recursively to split blocks further. For example, a quad-tree binary-tree (QT-BT) can be created by splitting a node with the QT split mode 1001 and then partitioning each child node (sometimes referred to as a quad-tree leaf node) with the vertical BT split mode 1003 and/or the horizontal BT split mode 1005. Likewise, a quad-tree triple-tree (QT-TT) can be created by splitting a node with a quad-tree split and then splitting the resulting child nodes with the vertical TT split mode 1007 and/or the horizontal TT split mode 1009.
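As a rough illustration of how a QT-BT partition is built up, the sketch below applies quad-tree splits first and then vertical binary splits to every resulting leaf. It is a simplification under stated assumptions: all helper names are hypothetical, the same split is applied uniformly to every node at a level, and only the vertical BT mode is used, whereas a real encoder would choose a split (or no split) per node.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def quad_split(r: Rect) -> List[Rect]:
    """Split a rectangle into four equal quadrants."""
    x, y, w, h = r
    hw, hh = w // 2, h // 2
    return [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]


def vertical_binary_split(r: Rect) -> List[Rect]:
    """Split a rectangle into left and right halves."""
    x, y, w, h = r
    hw = w // 2
    return [(x, y, hw, h), (x + hw, y, hw, h)]


def qt_bt_partition(block: Rect, qt_depth: int, bt_depth: int) -> List[Rect]:
    """Apply qt_depth quad-tree levels, then bt_depth vertical BT levels."""
    leaves = [block]
    for _ in range(qt_depth):
        leaves = [child for r in leaves for child in quad_split(r)]
    for _ in range(bt_depth):
        leaves = [child for r in leaves for child in vertical_binary_split(r)]
    return leaves


# One QT level followed by one BT level on a 64x64 block.
example_leaves = qt_bt_partition((0, 0, 64, 64), qt_depth=1, bt_depth=1)
```

Each quad-tree leaf of size 32x32 is split into two 16x32 blocks, giving eight leaves that together cover the original block.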

HEVC operates in a Joint Exploration Model (JEM) application. In JEM, QT-BT block partitioning is used to partition a coding tree unit (CTU) into a plurality of blocks. TT block partitioning has also been proposed for inclusion in JEM to further enrich the block partition types. In video coding based on the QT, QT-BT, and QT-TT block partitioning split modes, a coding or prediction block at depth K may be split into N smaller coding or prediction blocks at depth K+1 by a BT, TT, or QT split mode, where N is set to 2, 3, or 4, respectively. The partition patterns of the split modes are shown in FIG. 10; a partition pattern indicates the size and position of the two or more child nodes split from a parent node.

Transform.

Video coding uses an encoder to compress a media file and a decoder to reconstruct the original media file from the compressed media file. Video coding employs various standardized processes so that any decoder using a standardized process can consistently reproduce a media file compressed by any encoder using that standardized process.

For example, the encoder and the decoder may both use a coding standard such as High Efficiency Video Coding (HEVC), also known as H.265. H.265 is based on a prediction and transform framework. At the encoder, a video file is separated into frames. The frames are then subdivided into image blocks containing groups of pixels. An image block is further decomposed into a prediction block, which contains prediction information such as prediction modes and motion vector information, and a residual block, which contains residual information such as transform modes, transform coefficients, and quantization parameters. The prediction block and the residual block use less storage space than the image block, but can be used by a decoder to reconstruct the image block. The prediction blocks and residual blocks are coded into a bitstream and transmitted to a decoder and/or stored for later transmission upon request. At the decoder, the prediction information and the residual information are parsed. The parsed prediction information is used to generate prediction samples by intra-prediction or inter-prediction. Intra-prediction uses reconstructed image blocks to predict other image blocks in the same frame. Inter-prediction uses reconstructed image blocks to predict other image blocks in adjacent frames. The residual information is used to generate residual samples, for example by sequentially applying inverse quantization and inverse transforms. The prediction samples and the residual samples are combined to obtain reconstructed samples corresponding to the image block coded by the encoder (e.g., for display to an end user on a monitor).

Spatial varying transform (SVT) is a mechanism used to further improve video coding efficiency. SVT uses a transform block to further compress the residual block. Specifically, a rectangular residual block has a width w and a height h (e.g., w×h). A transform block smaller than the residual block is selected. The transform block is then used to transform the corresponding portion of the residual block, leaving the remainder of the residual block without additional coding or compression. The rationale behind SVT is that residual information may not be evenly distributed within a residual block. Using a smaller transform block with an adaptive position can capture most of the residual information in the residual block without requiring the entire residual block to be transformed. In some cases this approach achieves better coding efficiency than transforming all of the residual information in the residual block. Because the transform block is smaller than the residual block, SVT uses a mechanism to signal the position of the transform relative to the residual block. Such position signaling increases the overall signaling overhead of the coding process and therefore reduces compression efficiency. In addition, using the same type of transform block in all cases may not yield beneficial results in some cases.
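The idea of placing a smaller transform block where the residual information is concentrated can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the exhaustive search by squared residual energy is a stand-in criterion, not the selection rule of this disclosure or of any coding standard.

```python
from typing import List, Tuple


def best_transform_position(residual: List[List[int]],
                            tw: int, th: int) -> Tuple[int, int, int]:
    """Find the top-left (x, y) of the tw x th window with the most energy.

    Returns (x, y, energy), where energy is the sum of squared residual
    samples inside the window. Ties keep the first position in raster order.
    """
    h, w = len(residual), len(residual[0])
    if tw > w or th > h:
        raise ValueError("transform block must fit inside the residual block")
    best_x, best_y, best_energy = 0, 0, -1
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            energy = sum(residual[y + j][x + i] ** 2
                         for j in range(th) for i in range(tw))
            if energy > best_energy:
                best_x, best_y, best_energy = x, y, energy
    return best_x, best_y, best_energy


# A 4x4 residual block whose information sits in the bottom-right corner.
example_residual = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 5, -4],
    [0, 1, 3, 2],
]
example_position = best_transform_position(example_residual, tw=2, th=2)
```

Here a 2x2 transform block placed at (2, 2) captures 54 of the 56 units of residual energy, so only a quarter of the block needs to be transformed; the cost is that the position (2, 2) must be signaled to the decoder.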

FIG. 11 is a schematic diagram of an example video encoding mechanism 1100. An encoder may obtain an image block 1101 from one or more frames. For example, an image may be split into a plurality of rectangular image regions. Each region of the image corresponds to a coding tree unit (CTU). A CTU is partitioned into a plurality of blocks, such as the coding units in HEVC. The block partition information is then encoded in the bitstream 1111. Accordingly, the image block 1101 is a partitioned portion of an image and contains pixels that represent the luma components and/or chroma components of the corresponding portion of the image. During encoding, the image block 1101 is encoded as a prediction block 1103 containing prediction information, such as prediction modes for intra-prediction and/or motion vectors for inter-prediction. Encoding the image block 1101 as the prediction block 1103 may leave a residual block 1105 containing residual information that indicates the difference between the prediction block 1103 and the image block 1101.

It should be noted that the image block 1101 may be partitioned as a coding unit that contains one prediction block 1103 and one residual block 1105. The prediction block 1103 may contain all of the prediction samples of the coding unit, and the residual block 1105 may contain all of the residual samples of the coding unit. In such a case, the prediction block 1103 is the same size as the residual block 1105. In another example, the image block 1101 may be partitioned as a coding unit that contains two prediction blocks 1103 and one residual block 1105. In such a case, each prediction block 1103 contains a portion of the prediction samples of the coding unit, and the residual block 1105 contains all of the residual samples of the coding unit. In yet another example, the image block 1101 is partitioned as a coding unit that contains two prediction blocks 1103 and four residual blocks 1105. The partition pattern of the residual blocks 1105 in a coding unit may be signaled in the bitstream 1111. Such a pattern may include the Residual Quad-Tree (RQT) of HEVC. Further, the image block 1101 may contain only the luma component (e.g., light), denoted as the Y component, of the image samples (or pixels). In other cases, the image block 1101 may contain the Y, U, and V components of the image samples, where U and V indicate chrominance components (e.g., color) in a blue-luminance and red-luminance (UV) color space.

Transforms may be used to further compress the information. In particular, a transform block 1107 may be used to further compress the residual block 1105. The transform block 1107 contains a transform, such as an inverse discrete cosine transform (DCT) and/or an inverse discrete sine transform (DST). The difference between the prediction block 1103 and the image block 1101 is fitted to the transform by means of transform coefficients. By indicating the transform mode of the transform block 1107 (e.g., inverse DCT and/or inverse DST) and the corresponding transform coefficients, the decoder can reconstruct the residual block 1105. When exact reproduction is not required, the transform coefficients can be compressed further by rounding certain values so that they fit the transform better. This process is known as quantization and is performed according to quantization parameters that describe the allowable quantization. Accordingly, the transform modes, transform coefficients, and quantization parameters of the transform block 1107 are stored as transformed residual information in a transformed residual block 1109, which in some cases may also be referred to simply as a residual block.
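The transform-then-quantize step can be sketched in one dimension. The code below is a minimal illustration, assuming an orthonormal DCT-II with its matching inverse and a single uniform step size standing in for the quantization parameter; real codecs use two-dimensional integer transforms and standardized quantizer designs, and all function names here are hypothetical.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    """Orthonormal scaling factor for DCT basis function k of length n."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(samples: List[float]) -> List[float]:
    """Forward orthonormal DCT-II of a 1-D residual row."""
    n = len(samples)
    return [_scale(k, n) * sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n))
            for k in range(n)]


def idct(coeffs: List[float]) -> List[float]:
    """Inverse of dct(): reconstruct samples from transform coefficients."""
    n = len(coeffs)
    return [sum(_scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def quantize(coeffs: List[float], step: float) -> List[int]:
    """Round each coefficient to the nearest multiple of step (lossy)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels: List[int], step: float) -> List[float]:
    """Map quantized levels back to approximate coefficient values."""
    return [level * step for level in levels]


# Encoder side: transform and quantize. Decoder side: dequantize and invert.
example_row = [4.0, 3.0, -2.0, 0.0, 1.0, 0.0, 0.0, -1.0]
example_step = 2.0
example_levels = quantize(dct(example_row), example_step)
example_recon = idct(dequantize(example_levels, example_step))
```

Without quantization, `idct(dct(row))` reproduces the row up to floating-point error. With quantization, each coefficient is off by at most half a step, so a larger step produces smaller integers to signal at the cost of a larger reconstruction error.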

The prediction information of the prediction block 1103 and the transformed residual information of the transformed residual block 1109 may be encoded in the bitstream 1111. The bitstream 1111 may be stored and/or transmitted to a decoder. The decoder can then perform the process in reverse to recover the image block 1101. Specifically, the decoder may use the transformed residual information to determine the transform block 1107. The transform block 1107 can then be used together with the transformed residual block 1109 to determine the residual block 1105. The residual block 1105 and the prediction block 1103 can be used to reconstruct the image block 1101. The image block 1101 can then be positioned relative to other decoded image blocks 1101 to reconstruct frames, and the frames can be ordered to recover the encoded video.
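The final reconstruction step at the decoder, in which prediction and residual are combined, can be sketched as a sample-wise addition. The function name and the clipping to an 8-bit sample range are assumptions made for illustration; the text above does not specify a bit depth.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(prediction: Block, residual: Block,
                      bit_depth: int = 8) -> Block:
    """Add residual samples to prediction samples and clip to the valid range."""
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


example_prediction = [[100, 250], [0, 5]]
example_residual = [[5, 10], [-3, -5]]
example_image_block = reconstruct_block(example_prediction, example_residual)
```

In this example the sums 260 and -3 fall outside the 8-bit range and are clipped to 255 and 0.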

It should be noted that some prediction blocks 1103 may be encoded without producing a residual block 1105. Such cases do not involve the use of a transform block 1107, however, and are therefore not discussed further. The transform block 1107 may be used for inter-predicted blocks or intra-predicted blocks. Further, the transform block 1107 may be used on residual blocks 1105 generated by specified inter-prediction mechanisms (e.g., translation-model-based motion compensation), but may not be used on residual blocks 1105 generated by other specified inter-prediction mechanisms (e.g., affine-model-based motion compensation).

FIG. 12 is a schematic diagram of an example computing device 1200 for video coding according to an embodiment of the present disclosure. The computing device 1200 is suitable for implementing the disclosed embodiments as described herein. The computing device 1200 includes an ingress port 1220 and a receiver unit (Rx) 1210 for receiving data; a processor, logic unit, or central processing unit (CPU) 1230 for processing the data; a transmitter unit (Tx) 1240 and an egress port 1250 for transmitting the data; and a memory 1260 for storing the data. The computing device 1200 may also include optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 1220, the receiver unit 1210, the transmitter unit 1240, and the egress port 1250 for the egress or ingress of optical or electrical signals. In some examples, the computing device 1200 may also include wireless transmitters and/or receivers.

The processor 1230 is implemented by hardware and software. The processor 1230 may be implemented as one or more CPU chips, cores (e.g., as a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 1230 is in communication with the ingress port 1220, the receiver unit 1210, the transmitter unit 1240, the egress port 1250, and the memory 1260. The processor 1230 includes a coding module 1214. The coding module 1214 implements the disclosed embodiments described above. For example, the coding module 1214 implements, processes, prepares, or provides the various coding operations. The inclusion of the coding module 1214 therefore provides a substantial improvement to the functionality of the computing device 1200 and effects a transformation of the computing device 1200 to a different state. Alternatively, the coding module 1214 is implemented as instructions stored in the memory 1260 and executed by the processor 1230 (e.g., as a computer program product stored on a non-transitory medium).

The memory 1260 includes one or more disks, tape drives, and solid-state drives. It may be used as an overflow data storage device, to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 1260 may be volatile and/or non-volatile and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM). The computing device 1200 may also include input/output (I/O) devices for interacting with an end user. For example, the computing device 1200 may include a display, such as a monitor, for visual output; speakers for audio output; and a keyboard, mouse, or trackball for user input.

FIG. 13 is an example of a system 1300 that illustrates point cloud media. In particular, the system 1300 illustrates various examples of frames of point cloud media. The examples include a man 1302, a woman facing forward 1304, and a woman facing backward 1306. In some implementations, the system 1300 may include more or fewer examples of point cloud media. Point cloud media may include a virtual representation of an object or a person, and its corresponding movement, over a period of time in three-dimensional (3-D) space.

In general, point cloud media includes a set of data points in 3-D space that outline the external surface of a 3-D object. The objects may include, for example, people, manufactured items, and real-world objects. An object may, for example, be recorded by a camera in real time and represented virtually in three-dimensional space on a display or a projection screen. A frame of point cloud media may include a 3-D representation of the object at a particular point in time. Consecutive frames of point cloud media represent a dynamic representation of the point cloud media.

The following acronyms are used below. Group of Frames (GoF): a set of point clouds registered at a fixed time, used for further processing. Group of Pictures (GoP): a set of projections derived from point clouds, used for video compression encoding. Point cloud compression (PCC): a technique for compressing point cloud media for transmission.

The video-based PCC codec solution is based on segmenting 3-D point cloud data into 2-D projection patches. Video-based PCC is widely used for content such as immersive six degrees of freedom, dynamic AR/VR objects, cultural heritage, GIS, CAD, and autonomous navigation.

Point cloud media has become an integral part of a wide variety of applications, including the entertainment industry, the intelligent automobile navigation industry, geospatial inspection, and 3-D modeling and visualization of real-world objects. In each of these applications, various client devices can display and describe the point cloud media. Given the non-uniform sampling geometry, it is desirable to have a compact representation for storing and transmitting such data.

For example, a client device may use point cloud media to create 3-D computer animation drawing (CAD) models for manufactured parts, metrology and quality inspection, and a variety of visualization, animation, rendering, and mass-customization applications. In some examples, point cloud media may also be used to represent volumetric data, as is often done in medical imaging, for instance when visualizing the amount of space occupied by a bone, an arm, or a leg.

Compared with existing 3-D presentations, an irregular point cloud surface, such as one representing a person, is more general and is applicable to a wide range of sensors and data acquisition strategies. For example, for 3-D presentation in a virtual reality world or remote rendering in a telepresence environment, the rendering of virtual figures and real-time instructions is processed as a dense point cloud data set. In the example in which the man 1302, the woman facing forward 1304, and the woman facing backward 1306 are presented in a virtual reality world through a telepresence environment, each of these single point cloud frames is processed as a dense point cloud data set.

The system 1300 illustrates different objects, each represented by a frame of point cloud media at a particular point in time. A device, such as a client device, can display consecutive frames of point cloud media over a period of time. In some examples, the device can display point cloud media in immersive six degrees of freedom (6 DoF) environments, dynamic augmented reality (AR) and virtual reality (VR) environments, cultural heritage environments, geographic information system (GIS) environments, computer-aided design (CAD), and autonomous navigation systems.

Devices that can receive, display, and transmit point cloud media may include client devices such as personal computers, handheld devices, televisions, navigation systems, or other devices capable of displaying point cloud media. In some implementations, these devices may store the point cloud media in internal memory or on an external storage device. Because storing point cloud media requires a substantial amount of storage space, point cloud media is typically stored on and accessed from an external storage device.

A client device that captures and relays point cloud media may rely on multi-sampling techniques and data compression techniques to reduce the bandwidth required to store and/or transmit the point cloud media. In some cases, storing and transmitting a dynamic representation of point cloud media is complicated by the size of a single frame of point cloud media. For example, a single frame of point cloud media can contain a large amount of data, such as hundreds of gigabytes. Storing and transmitting multiple frames of point cloud media therefore requires data compression and encoding techniques so that a large number of point cloud media frames can be transmitted properly over various media that may cause data loss.

FIG. 14 is an example of a system 1400 that illustrates a sequence of point cloud frames. In particular, the system 1400 illustrates a dynamic point cloud media sequence that represents a sequence of frames of point cloud media at different time stamps. For example, the system 1400 includes a first point cloud frame 1402 at a first time stamp, a second point cloud frame 1404 at a second time stamp, and a third point cloud frame 1406 at a third time stamp. When viewed in real time, the first, second, and third point cloud frames make up a dynamic point cloud sequence that shows a woman moving her right foot forward over incremental time stamps in a three-dimensional environment. The incremental time stamps may be measured in any unit of time, such as seconds or microseconds.

The system 1400 illustrates a bounding box 1408 for the frame of point cloud media 1402, a bounding box 1410 for the frame of point cloud media 1404, and a bounding box 1412 for the frame of point cloud media 1406. Each frame of point cloud media includes a fixed grid in three dimensions, namely an X dimension, a Y dimension, and a Z dimension, which are denoted by the position coordinates u1, v1, and d1.
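As a minimal sketch of how a bounding box and box-relative coordinates might be derived for one frame, the code below computes the axis-aligned box that encloses a set of integer points and re-expresses each point relative to the box's minimum corner. The function names are hypothetical, and treating the box-relative offsets as the (u1, v1, d1) coordinates is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]  # (x, y, z) on the fixed integer grid


def bounding_box(points: Iterable[Point]) -> Tuple[Point, Point]:
    """Return (min_corner, max_corner) of the axis-aligned bounding box."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot bound an empty point set")
    xs, ys, zs = zip(*pts)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def to_box_coordinates(points: Iterable[Point], min_corner: Point) -> List[Point]:
    """Express each point as an offset from the box's minimum corner."""
    mx, my, mz = min_corner
    return [(x - mx, y - my, z - mz) for x, y, z in points]


example_frame = [(10, 20, 30), (12, 25, 31), (11, 22, 35)]
example_min, example_max = bounding_box(example_frame)
example_local = to_box_coordinates(example_frame, example_min)
```

Working in box-relative coordinates keeps the values small, since they are bounded by the box dimensions rather than by the extent of the whole scene.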

A bounding box contains points that may be empty or filled with data. In some examples, an empty point may be a void point. In some examples, an occupied point may include one or more attributes. A void point is a position in a frame of point cloud media that has no attributes. An occupied point, in contrast, is a position in a frame of point cloud media that has at least one attribute. Attributes may include, for example, chroma, luma, reflectance, and color. Points within the bounding box can be identified by a particular coordinate system. For example, points within the bounding box 1408 of the frame of point cloud media 1402 may be denoted by the u1, v1, and d1 coordinate system.
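The distinction between void and occupied points can be captured by a sparse grid that stores attributes only for occupied positions. This is a sketch under stated assumptions: the class name, method names, and the use of a dictionary keyed by (u, v, d) are illustrative choices, not a structure defined by this disclosure.

```python
from typing import Dict, Tuple

Position = Tuple[int, int, int]  # (u, v, d) inside the bounding box


class BoundingBoxGrid:
    """Sparse grid: a position is occupied only if it has attributes."""

    def __init__(self, size: Position) -> None:
        self.size = size
        self._occupied: Dict[Position, Dict[str, float]] = {}

    def _check(self, pos: Position) -> None:
        if not all(0 <= c < s for c, s in zip(pos, self.size)):
            raise IndexError(f"{pos} lies outside the bounding box {self.size}")

    def set_point(self, pos: Position, attributes: Dict[str, float]) -> None:
        """Mark a position as occupied by giving it at least one attribute."""
        self._check(pos)
        if not attributes:
            raise ValueError("an occupied point needs at least one attribute")
        self._occupied[pos] = dict(attributes)

    def is_occupied(self, pos: Position) -> bool:
        self._check(pos)
        return pos in self._occupied

    def is_void(self, pos: Position) -> bool:
        """A void point is a position in the box that has no attributes."""
        return not self.is_occupied(pos)

    def attributes(self, pos: Position) -> Dict[str, float]:
        """Return a copy of the attributes, or an empty dict for a void point."""
        self._check(pos)
        return dict(self._occupied.get(pos, {}))


example_grid = BoundingBoxGrid(size=(4, 4, 4))
example_grid.set_point((1, 2, 3), {"luma": 0.8, "reflectance": 0.1})
```

Storing only occupied positions keeps memory proportional to the number of surface points rather than to the volume of the box, which matters because most positions in a bounding box around a surface are void.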

Typically, when a client device transmits one or more point cloud media frames to another client device over a network, each frame is encoded. Encoding helps ensure proper transmission and helps ensure that no data is lost when the frames arrive at the receiving client device. However, encoding a point cloud media frame can be complex, because a single frame can exceed hundreds of gigabytes. For example, a system such as a client device that wants to encode the first point cloud frame 1402 may have to encode 500 gigabytes of data, which wastes time and resources. Thus, a different encoding technique is needed, one that relies on a bounding box, e.g., bounding box 1408, to encode a particular point cloud media frame. The bounding box representation described below allows the system to encode a single point cloud media frame in a compressed format.

In some implementations, point cloud media frames are encoded using a video-based point cloud compression (V-PCC) coder. The V-PCC coder works by segmenting a point cloud media frame into a set of three-dimensional patches. Each 3-D patch is represented by a 3-D bounding box and manipulated for transmission. This is described in detail below.

FIG. 15 is an example of a process 1500 for converting a 3-D patch bounding box into a 2-D patch projection. In some implementations, process 1500 is performed by a V-PCC encoder using a V-PCC encoding solution.

During the V-PCC encoding solution, the V-PCC encoder encodes a point cloud media frame using 3-D bounding boxes. In general, the encoder iterates over the set of points, projects the points onto the bounding box, and then partitions the patches according to the definition of a smooth continuous surface criterion. Each patch corresponds to a specific, unique index and to corresponding 3-D coordinates. In addition, a list of matched patches is maintained so that the order of patches in the list is similar and matched patches in the list have the same index.

First, a 3-D point cloud media frame is segmented into a set of three-dimensional (3-D) patch bounding boxes, or segmentations, such as the 3-D patch bounding box 1504. The 3-D patch bounding box 1504 has the following parameters: u1, v1, d1, the size of u1, the size of v1, and the size of d1. These parameters indicate the size and placement of the 3-D patch bounding box 1504 within the point cloud 3-D bounding box 1502. In some implementations, the system generates multiple 3-D patch bounding boxes for a point cloud media frame so that together they cover the object in the frame. The 3-D patch bounding boxes can be placed close to one another. In some implementations, the 3-D patch bounding boxes may overlap one another when covering the object within the point cloud 3-D bounding box 1502.

The V-PCC encoder then defines a projection plane 1508, which is one of the sides of the point cloud 3-D bounding box 1502, for each 3-D patch bounding box, such as the 3-D patch bounding box 1504. The V-PCC encoder defines a criterion for selecting the projection plane. For example, the criterion includes the definition of a smooth continuous surface on each side of the 3-D patch bounding box 1504. A side is selected as the projection plane when the smooth continuous surface, projected onto that side of the point cloud 3-D bounding box 1502, covers the largest area among all projection directions. A smooth continuous surface can be determined using one or more smooth continuous surface algorithms, and can be defined as a surface with the fewest occluded or blocked data points. The V-PCC encoder then compares the smooth surfaces from every direction to determine which direction produces the 2-D bounding box with the largest area on a side of the point cloud 3-D bounding box 1502.
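The maximum-area criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder's actual procedure: the `Point3` type and the functions `projectedArea` and `selectProjectionAxis` are hypothetical names, the projected area is approximated by the number of distinct grid cells hit, and the smoothness and occlusion tests described above are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical point type: integer grid position inside the
// point cloud 3-D bounding box (index 0 = X, 1 = Y, 2 = Z).
using Point3 = std::array<int, 3>;

// Number of distinct 2-D cells hit when the points are projected
// along `axis`. This count stands in for the projected area.
std::size_t projectedArea(const std::vector<Point3>& points, int axis) {
    assert(axis >= 0 && axis < 3);
    const int a = (axis + 1) % 3;  // first in-plane axis
    const int b = (axis + 2) % 3;  // second in-plane axis
    std::set<std::pair<int, int>> cells;
    for (const Point3& p : points) {
        cells.emplace(p[a], p[b]);
    }
    return cells.size();
}

// Returns the projection axis whose projection covers the most cells.
// Ties are resolved in favor of the lowest axis index (an assumption).
int selectProjectionAxis(const std::vector<Point3>& points) {
    int bestAxis = 0;
    std::size_t bestArea = projectedArea(points, 0);
    for (int axis = 1; axis < 3; ++axis) {
        const std::size_t area = projectedArea(points, axis);
        if (area > bestArea) {
            bestArea = area;
            bestAxis = axis;
        }
    }
    return bestAxis;
}
```

For a flat 3 x 3 sheet of points lying in the XY plane, projecting along Z covers nine cells while projecting along X or Y covers only three, so the sketch selects the Z axis.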

As illustrated in process 1500, the projection plane 1508 is the side on which the parallel projection of the 3-D patch bounding box 1504 has the maximum area. In other words, if the V-PCC encoder views the 3-D patch bounding box 1504 from the side opposite the projection plane 1508 (e.g., side 1505), it determines that projecting the attribute points contained in the 3-D patch bounding box 1504 onto the projection plane 1508 yields a larger area than projecting them onto any other side of the point cloud 3-D bounding box 1502. When the point cloud media frame contained in the point cloud 3-D bounding box 1502 is stationary, the V-PCC encoder can analyze the projection from each side of the point cloud 3-D bounding box 1502 to determine which projection covers the maximum area. The V-PCC encoder thus determines that the projection plane 1508 yields the maximum area for projecting the attributes from the 3-D patch bounding box 1504, and the projection is shown within the patch 2-D projection bounding box 1506. The patch 2-D projection bounding box 1506 is defined by the u and v coordinates on the near layer and the far layer of the bounding box. For example, FIG. 17 shows a near layer 1704 and a far layer 1706 with depth images 1708 and 1712 and attribute images 1710 and 1714.

Once a projection plane, e.g., projection plane 1508, is selected, a normal axis is drawn for the 3-D patch bounding box 1504 so that the V-PCC encoder knows the direction of the projection. For example, the V-PCC encoder draws an axis, denoted "n", that is normal (orthogonal) to the projection plane 1508 within the point cloud 3-D bounding box 1502. Additionally, the V-PCC encoder generates tangent and bitangent axes for the 3-D patch bounding box 1504, denoted "t" and "bt", within the point cloud 3-D bounding box 1502. Once the V-PCC encoder has drawn the normal, tangent, and bitangent axes within the point cloud 3-D bounding box 1502, these axes form a right-handed 3-D coordinate system for the 3-D patch bounding box 1504.

At that point, the V-PCC encoder projects the 3-D patch bounding box 1504 along the normal axis "n" onto the projection plane 1508 of the point cloud 3-D bounding box 1502. As shown in process 1500, the result of the projection is a two-dimensional (2-D) projection 1506 on a particular side of the point cloud 3-D bounding box 1502.

In some implementations, the 3-D point cloud media frame is projected onto each side of the point cloud 3-D bounding box 1502. When the frame is projected onto a side of the point cloud 3-D bounding box 1502, a set of coordinates for a 2-D bounding box can be obtained. For example, the set of coordinates for the 2-D bounding box includes u0, v0, size_u0, and size_v0. As shown in bounding box 1510, the V-PCC encoder projects the surface of the point cloud within bounding box 1510 onto a projection plane 1512.

FIG. 16 is an example of a system 1600 illustrating the result of 3-D to 2-D patch projection. System 1600 illustrates a bounding box 1602 with 2-D patch projections on each of its sides. The projection on each side of the bounding box is based on the orthogonal direction of the patch, as described for process 1500. Each 2-D patch projection includes a patch index together with u0, v0, size_u0, and size_v0, which describe the 2-D coordinates of the patch's 2-D projection. The patch index identifies a particular patch associated with a particular side of the corresponding bounding box 1602. u0 defines the X coordinate of the patch on the projection plane. v0 defines the Y coordinate of the patch on the projection plane. size_u0 and size_v0 describe the size of the patch along the u0 and v0 coordinates, respectively.

The patches projected onto the bounding box 1602 form a "patch tile group". Each element of the patch tile group corresponds to a particular patch, which has a specific, unique index and corresponds to a unique 3-D bounding box within the 3-D point cloud frame.

In addition to generating projection maps on the bounding box 1602, the system stores the patch 2-D and 3-D data in memory. For example, the code below provides one example solution for generating patches and storing the 2-D and 3-D patch data as supplemental data for video coding:

Algorithm. PCC coder additional data export
1. while frame k in GoF frames do
     generate patches in frame k
     for each patch in frame k
       generate projection map from patches
       store patch2D data: U0, sizeU0, V0, sizeV0
       store patch3D data: U1, V1, D1, axis
     end for
   end while

For the k-th frame:

// Per-patch extents: [start along u / tangent, ...] as filled in below.
double patch3DCoor[MAX_NUM_PATCHES][4];
size_t patch2DCoor[MAX_NUM_PATCHES][4];

// 2-D patch extent in pixels: [u start, u end, v start, v end].
patch2DCoor[patchIndex][0] = patch.getU0() * patch.getOccupancyResolution();
patch2DCoor[patchIndex][1] = (patch.getU0() + patch.getSizeU0()) * patch.getOccupancyResolution();
patch2DCoor[patchIndex][2] = patch.getV0() * patch.getOccupancyResolution();
patch2DCoor[patchIndex][3] = (patch.getV0() + patch.getSizeV0()) * patch.getOccupancyResolution();

PCCVector3D pointStart, pointEnd;
const double lodScale = params.ignoreLod_ ? 1.0 : double(1u << patch.getLod());

// 3-D position of the first (top-left) pixel of the patch.
int x = patch.getU0() * patch.getOccupancyResolution(), y = patch.getV0() * patch.getOccupancyResolution();
pointStart[patch.getNormalAxis()] = double(frame0.getValue(0, x, y) + patch.getD1()) * lodScale;
pointStart[patch.getTangentAxis()] = patch.getU1() * lodScale;
pointStart[patch.getBitangentAxis()] = patch.getV1() * lodScale;

// 3-D position of the last (bottom-right) pixel of the patch.
x = (patch.getU0() + patch.getSizeU0()) * patch.getOccupancyResolution() - 1;
y = (patch.getV0() + patch.getSizeV0()) * patch.getOccupancyResolution() - 1;
pointEnd[patch.getNormalAxis()] = double(frame0.getValue(0, x, y) + patch.getD1()) * lodScale;
pointEnd[patch.getTangentAxis()] = (patch.getSizeU0() * patch.getOccupancyResolution() - 1 + patch.getU1()) * lodScale;
pointEnd[patch.getBitangentAxis()] = (patch.getSizeV0() * patch.getOccupancyResolution() - 1 + patch.getV1()) * lodScale;

// 3-D patch extent: [tangent start, bitangent start, tangent end, bitangent end].
patch3DCoor[patchIndex][0] = pointStart[patch.getTangentAxis()];
patch3DCoor[patchIndex][1] = pointStart[patch.getBitangentAxis()];
patch3DCoor[patchIndex][2] = pointEnd[patch.getTangentAxis()];
patch3DCoor[patchIndex][3] = pointEnd[patch.getBitangentAxis()];

Generating auxiliary information for video coding

The code above illustrates the definition of a "patch", how 3-D patch information is transferred to 2-D patch information, the projection process, and the reconstruction process. In particular, it shows how these frames are interpreted in terms of 2-D projection.

Additionally, for each patch in a frame, the V-PCC encoder generates a projection map. The projection map includes an additional text file that is provided to the video encoder. The 2-D data for a particular patch, e.g., u0 and v0, corresponds to the X and Y coordinates of the top-left corner of the projection. size_u0 and size_v0 correspond to the width and height of that patch. For the 3-D patch projection data, the projection map includes the u1, v1, and d1 coordinates, which correspond to the X, Y, and Z axes. In particular, u1, v1, and d1 are tied to the index of the projection plane, that is, to the projection plane that corresponds to the normal axis. For example, as illustrated in system 1600, each side of the bounding box 1602 has its own index, so each index denotes a particular projection axis. In this regard, u1, v1, and d1 are a reconstruction of the X, Y, and Z coordinates shown in the bounding box 1602.

In some implementations, the u1, v1, and d1 coordinates correspond to a local coordinate system, and the X, Y, and Z coordinates correspond to a global coordinate system. For example, once the normal axis is determined (it corresponds to the projection axis), the V-PCC encoder can rotate the projection axis so that it is aligned with the Z axis in 3-D space. At that point, the V-PCC encoder can convert between the local coordinate system and the global coordinate system.
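A minimal sketch of the local-to-global conversion follows. It assumes, as in the listing above, that the normal, tangent, and bitangent axes are each one of the global X, Y, and Z axes, so the conversion reduces to a permutation of coordinates rather than a general rotation. The types `PatchAxes` and `LocalCoord` and both function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical description of a patch's local frame: which global axis
// (0 = X, 1 = Y, 2 = Z) serves as normal, tangent, and bitangent.
struct PatchAxes {
    int normal;
    int tangent;
    int bitangent;
};

// Local patch coordinates: u1 along the tangent, v1 along the bitangent,
// d1 along the normal (depth).
struct LocalCoord {
    double u1;
    double v1;
    double d1;
};

// Places each local coordinate on the global axis it is aligned with.
std::array<double, 3> localToGlobal(const LocalCoord& local, const PatchAxes& axes) {
    // The three axes must be distinct and in range.
    assert(axes.normal >= 0 && axes.normal < 3);
    assert(axes.tangent >= 0 && axes.tangent < 3);
    assert(axes.bitangent >= 0 && axes.bitangent < 3);
    assert(axes.normal != axes.tangent && axes.normal != axes.bitangent &&
           axes.tangent != axes.bitangent);
    std::array<double, 3> global{};
    global[axes.tangent] = local.u1;
    global[axes.bitangent] = local.v1;
    global[axes.normal] = local.d1;
    return global;
}

// Inverse mapping: reads the local coordinates back from a global position.
LocalCoord globalToLocal(const std::array<double, 3>& global, const PatchAxes& axes) {
    return LocalCoord{global[axes.tangent], global[axes.bitangent], global[axes.normal]};
}
```

With the normal on X, the tangent on Z, and the bitangent on Y, the local point (u1, v1, d1) = (5, 7, 9) maps to the global point (X, Y, Z) = (9, 7, 5), and `globalToLocal` recovers the original values.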

The proposed solution also provides additional motion vector candidates for motion compensation, based on the auxiliary information that is supplied as an additional input to the video compression solution.

FIG. 17 is an example of a system 1700 for attribute segmentation of point cloud media. Once the 2-D patches have been generated from the projection of each 3-D point cloud, a set of images is generated for each 2-D patch in the patch tile group. System 1700 generates a set of images for each 2-D patch in the patch tile group because depth and attribute information is lost during the projection from three dimensions to two. The image sets are generated to preserve the depth and attribute information that the point cloud media had before the projection took place.

In some implementations, the patch tile group includes a patch 1702. The system generates two sets of images from the particular patch 1702. The first set of images belongs to a near layer 1704. The second set of images belongs to a far layer 1706. The near layer 1704 includes an image 1708 of depth data and an image 1710 of attribute data. Likewise, the far layer 1706 includes an image 1712 of depth data and an image 1714 of attribute data. Thus, every point of the 2-D projection has depth information and attribute information, e.g., color, texture, luma, and so on. The layers, e.g., the near layer 1704 and the far layer 1706, show the two sides of the 2-D projection of the 3-D point cloud object.
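One way the two depth images could be populated is sketched below. The rule used here, that the near layer keeps the smallest depth and the far layer the largest depth seen at each pixel, is an assumption made for illustration and is not stated in the text. `LocalPoint`, `DepthLayers`, and `buildDepthLayers` are hypothetical names, and the attribute images are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical patch-local point: pixel position (u, v) and depth along the normal.
struct LocalPoint {
    int u;
    int v;
    int depth;
};

// Two row-major depth images for one patch; -1 marks an unoccupied pixel.
struct DepthLayers {
    int width;
    int height;
    std::vector<int> nearDepth;
    std::vector<int> farDepth;
};

// Assumed rule: near layer = minimum depth per pixel, far layer = maximum depth per pixel.
DepthLayers buildDepthLayers(const std::vector<LocalPoint>& points, int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    DepthLayers layers{width, height, std::vector<int>(count, -1), std::vector<int>(count, -1)};
    for (const LocalPoint& p : points) {
        assert(p.u >= 0 && p.u < width && p.v >= 0 && p.v < height && p.depth >= 0);
        const std::size_t i = static_cast<std::size_t>(p.v) * static_cast<std::size_t>(width) +
                              static_cast<std::size_t>(p.u);
        if (layers.nearDepth[i] < 0) {
            // First point seen at this pixel initializes both layers.
            layers.nearDepth[i] = p.depth;
            layers.farDepth[i] = p.depth;
        } else {
            layers.nearDepth[i] = std::min(layers.nearDepth[i], p.depth);
            layers.farDepth[i] = std::max(layers.farDepth[i], p.depth);
        }
    }
    return layers;
}
```

Under this rule, three points at the same pixel with depths 5, 2, and 9 leave 2 in the near image and 9 in the far image, so the front and back of the projected surface are both retained.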

FIG. 18 is an example of a system 1800 illustrating the packing of patches for point cloud media with attribute information. In some implementations, the attribute information can include texture information, depth information, color information, luma, reflectance, and chroma information. The packed patch group 1802 illustrates one or more projected patches, i.e., a 2-D frame, from a point cloud media frame such as the point cloud media frame 1402. A collection of patches forms a patch tile group, and the patch tile groups are combined into a patch data group for a given point cloud media frame. Each element of the patch data group, referred to as a "patch", has a specific, unique index and corresponds to a unique 3-D bounding box within the 3-D point cloud frame. The packed patches 1802 can be encoded and then transmitted. In particular, if a patch in one point cloud frame has a corresponding reference patch in a reference point cloud frame, e.g., a previous frame, the index of the reference patch in the reference patch tile group is transmitted in the bitstream. This is described in detail below.

FIG. 19 is an example of a system 1900 for performing motion estimation. In previous video compression solutions, the data of 2-D projected images, e.g., patches, is estimated using a motion estimation process. System 1900 illustrates this motion estimation process. The encoding and decoding blocks shown in block 1902 generate a motion estimate with an encoder, transmit the motion estimate over a data transmission channel or through storage, and decode the transmitted motion estimate at a decoder. The result is a newly reconstructed frame.

Block 1904 depicts an example of the motion estimation process. For example, the encoder generates a predictor from a similar patch image in a reference frame. The reference frame can be, for example, a previous frame. In some implementations, the encoder determines similarity by comparing sample values between the reference frame and the current frame. The sample values can differ, for example, in attribute and position. This process, whose goal is to find in the reference frame an image similar to an image in the current frame, is called motion estimation. The encoder searches among neighboring pixels in the reference frame. When a matching image is found, the encoder encodes the similarity, which requires little information, instead of encoding each image of the current and reference frames in full. Overall, this reduces the transmission bandwidth. The encoder therefore analyzes the current and previous frames for similar images and determines refinement data relating to the attribute and position information of each image between the current frame and the reference (previous) frame. The encoder then encodes the refinement data for the current frame and for future predicted frames. This approach is typically used to transmit 2-D motion video or 2-D images.
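The neighborhood search can be sketched as an exhaustive block-matching search. The sketch below is a generic illustration, not the search used by any particular video encoder: it assumes 8-bit single-channel frames, uses the sum of absolute differences (SAD) as the similarity measure, and expresses the motion vector as the current block position minus the predictor position. `Frame`, `MotionVector`, `blockSad`, and `searchBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical single-channel frame with row-major 8-bit samples.
struct Frame {
    int width;
    int height;
    std::vector<std::uint8_t> samples;
    int at(int x, int y) const {
        return samples[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                       static_cast<std::size_t>(x)];
    }
};

// Displacement expressed as current block position minus predictor position.
struct MotionVector {
    int dx;
    int dy;
};

// Sum of absolute differences between a block in `cur` and a block in `ref`.
long blockSad(const Frame& cur, int cx, int cy, const Frame& ref, int rx, int ry, int blockSize) {
    long sad = 0;
    for (int y = 0; y < blockSize; ++y) {
        for (int x = 0; x < blockSize; ++x) {
            sad += std::abs(cur.at(cx + x, cy + y) - ref.at(rx + x, ry + y));
        }
    }
    return sad;
}

// Exhaustive search over a window of +/- `range` pixels around (cx, cy).
MotionVector searchBlock(const Frame& cur, const Frame& ref, int cx, int cy, int blockSize,
                         int range) {
    assert(cur.width == ref.width && cur.height == ref.height);
    MotionVector best{0, 0};
    long bestSad = std::numeric_limits<long>::max();
    for (int ry = cy - range; ry <= cy + range; ++ry) {
        for (int rx = cx - range; rx <= cx + range; ++rx) {
            // Skip candidate blocks that fall outside the reference frame.
            if (rx < 0 || ry < 0 || rx + blockSize > ref.width || ry + blockSize > ref.height) {
                continue;
            }
            const long sad = blockSad(cur, cx, cy, ref, rx, ry, blockSize);
            if (sad < bestSad) {
                bestSad = sad;
                best = MotionVector{cx - rx, cy - ry};
            }
        }
    }
    return best;
}
```

If a 2 x 2 bright block sits at (3, 3) in the reference frame and at (4, 5) in the current frame, the search finds a zero-SAD match and returns the motion vector (1, 2). Only that vector and the (here empty) residual would need to be coded.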

Due to the nature of the patch packing method, the positions of patches within the projected frame can differ considerably from one point cloud media frame to the next. Between frames, a patch can therefore move from one location to another, for example to a different side of the 3-D bounding box. As a result, predictive motion estimation based on patch position, as performed by the inter-prediction coding algorithms typically used for 2-D motion video or 2-D images, is unsatisfactory, and the patches are not encoded and transmitted properly. Blocks 1902 and 1904 are therefore useful but need enhancement to transmit motion estimation data using patch data groups. The technique described below improves prediction based on motion estimation for 2-D patch data by adding auxiliary information to the motion vector candidate list.

For temporal prediction between 2-D patches to be transmitted, valid motion vector candidates are provided for similar patches to ensure maximum compression efficiency during transmission. Existing methods of constructing the motion vector candidate list can be improved by replacing existing candidates or by introducing additional candidates generated from the patch auxiliary information.

In some implementations, a particular patch in a frame has particular 2-D coordinates that are linked by patch metadata to 3-D coordinates within the point cloud media. The patch data (also called "auxiliary information") is based on the patch's 3-D position and its associated projected position on the 2-D plane. In some implementations, the auxiliary information, or patch metadata, can include a patch index, u0, v0, u1, v1, and d1. u0 and v0 correspond to the top-left corner of the patch in the 2-D projection. u1, v1, and d1 represent the X, Y, and Z coordinates in the 3-D domain. The u0 and v0 coordinates are thus linked to the u1, v1, and d1 coordinates, and the link is based on a mathematical relationship between the two sets of coordinates. This link between the 2-D and 3-D coordinates can be used to generate motion vector candidates, update the vector candidate list, and update the motion vector search process.

FIG. 20 is an example of a system illustrating a motion vector candidate between a patch of a current frame and a patch of a reference frame. The encoder of a client device, e.g., a V-PCC encoder, analyzes the current patch frame against a previously encoded reference frame. The encoder must first decode the previously encoded reference frame in order to compare it with the current frame. The encoder then splits the current patch frame into one or more small blocks called prediction units (PUs). Because the amount of information stored in each PU is significant, the encoder uses differential coding to reduce the amount of information transmitted. The encoder generates a residual frame: a predictor PU (belonging to a patch of the reference frame) is subtracted from the current PU (belonging to the current frame), i.e., current PU minus predictor PU, to produce the residual frame. The encoder then encodes only the residual frame for transmission. For example, the following code describes this process:

1. Split current frame into N PU blocks
2. Generate PU_pred candidate list from the current auxiliary frame
3. for i = 0 to N-1
     get the (x,y) coordinates for top left corner of the PU_cur[i]
     perform search for predictor PU_pred in the reference image
       cost[k] = max
       while cost[k] > minCost[k-1] do
         cost[k] = (PU_cur[i] - PU_pred[k]) + lambda * BitSize
     encode in bitstream:
       MV = PU_cur[i](x,y) - PU_pred[k](x,y)
       ResidualUnit = Transform(PU_cur[i]-PU_pred[k])

The code above describes a coding mode selection process based on minimizing a rate-distortion problem. In some implementations, the encoder searches the previously encoded reference frame for a predictor PU, looking for the best match to the values of the current PU relative to the number of bits needed to encode the residual. Selecting the predictor PU is a rate-distortion minimization problem, in which the rate corresponds to the bit size of the coded residual and the distortion corresponds to the L2 norm of the residual relative to the original. The encoder then encodes the residual information together with the corresponding displacement information, which is the motion vector. However, this implementation needs improvement when point cloud media is encoded and transmitted.
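The rate-distortion selection can be sketched as choosing the candidate that minimizes the cost J = D + lambda * R. The sketch below assumes the distortion D is the squared L2 norm of the residual samples and that a bit estimate R is already available for each candidate. `Candidate`, `distortion`, `rdCost`, and `selectBestCandidate` are hypothetical names, and the cost model is a generic one rather than that of a specific encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical predictor candidate for one prediction unit.
struct Candidate {
    std::vector<int> residual;  // PU_cur - PU_pred, sample by sample
    double bits;                // assumed estimate of the bits needed to code the residual
};

// Distortion term: squared L2 norm of the residual.
double distortion(const std::vector<int>& residual) {
    double sum = 0.0;
    for (int r : residual) {
        sum += static_cast<double>(r) * static_cast<double>(r);
    }
    return sum;
}

// Rate-distortion cost J = D + lambda * R.
double rdCost(const Candidate& candidate, double lambda) {
    return distortion(candidate.residual) + lambda * candidate.bits;
}

// Returns the position of the candidate with the lowest cost.
// The candidate list must not be empty; ties keep the earliest candidate.
std::size_t selectBestCandidate(const std::vector<Candidate>& candidates, double lambda) {
    assert(!candidates.empty());
    std::size_t best = 0;
    double bestCost = rdCost(candidates[0], lambda);
    for (std::size_t k = 1; k < candidates.size(); ++k) {
        const double cost = rdCost(candidates[k], lambda);
        if (cost < bestCost) {
            bestCost = cost;
            best = k;
        }
    }
    return best;
}
```

The multiplier lambda sets the trade-off. Given one candidate with residual {1, 1} costing 10 bits and another with residual {0, 0} costing 20 bits, lambda = 0.5 favors the cheaper first candidate (cost 7 versus 10), while lambda = 0.125 favors the exact second candidate (cost 2.5 versus 3.25).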

The encoder is improved by examining the additional auxiliary information carried with the patch data. For the current frame, the auxiliary information is generated from the current 2D patches and the corresponding 3D point cloud. For the reference frame, the auxiliary information is generated from the previously encoded reference 2D patches and the corresponding 3D point cloud. The encoder can predict the block position by comparing the auxiliary information of a patch in the current frame, e.g., u0, v0, u1, v1, and d1, with the auxiliary information of a patch in the reference frame, e.g., u0, v0, u1, v1, and d1. The comparison yields a distance m between the patches, based on the auxiliary information.

In some implementations, the encoder first determines the 3-D difference. Because the 3-D position of the starting pixel of a CU may differ between frames, the 3-D difference corresponds to the difference in 3-D position between the current auxiliary information frame and the reference auxiliary information frame. The expression includes the following:

The encoder then computes the 2-D difference. The 2-D difference corresponds to the two-dimensional difference between the current auxiliary information frame and the reference auxiliary information frame, taking the occupancy block resolution (OBR) into account. The OBR corresponds to the scale of the coordinates in the patch data group relative to the patch projection size. For example, the patch width equals the patch projection size multiplied by the OBR. The encoder then generates a motion vector in the 2-D domain. The expression includes the following:

The final derived motion vector is a combination of these two motion vector components. For example, the final derived motion vector corresponds to:

In the non-normative solution, the motion vector (MV) is estimated as a candidate for the center of the search range. After the derived estimated MV is added as a candidate, the encoder uses RDO to select among the predicted MV, the zero MV, the estimated MV, and the MV of the 2Nx2N partition in order to determine the center of the search range. If the estimated MV has the minimum rate-distortion (R-D) cost, it is used for video compression.
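The selection of the search-range center can be illustrated with a short sketch. The candidate names and the toy cost function are assumptions made for the example; a real encoder would evaluate the actual R-D cost of each candidate.

```python
def choose_search_center(candidates, rd_cost):
    """Pick the candidate MV with the lowest R-D cost.

    candidates: dict mapping a candidate name to an (mvx, mvy) tuple.
    rd_cost:    callable returning the R-D cost of an MV (assumed to be given).
    """
    name, mv = min(candidates.items(), key=lambda item: rd_cost(item[1]))
    return name, mv


# Toy cost (assumption): squared distance to the true motion plus a rate term.
def toy_cost(mv):
    true_motion = (6, -2)
    distortion = (mv[0] - true_motion[0]) ** 2 + (mv[1] - true_motion[1]) ** 2
    return distortion + 0.5 * (abs(mv[0]) + abs(mv[1]))


candidates = {
    "predicted": (4, 0),
    "zero": (0, 0),
    "estimated": (6, -2),        # MV derived from the auxiliary information
    "partition_2Nx2N": (5, 0),
}
center = choose_search_center(candidates, toy_cost)
```

Here the estimated MV wins only because the toy cost was built around it; the point is that the derived MV competes with the conventional candidates on equal terms.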

FIG. 21 illustrates a derivation process 2100 for merge candidate list construction. In some implementations, when a prediction unit is predicted using merge mode, an index pointing to an entry in the merge candidate list is parsed from the bitstream and used to retrieve the motion information. The construction of this list is specified in the HEVC standard and can be summarized by the following sequence of steps:

Step 1: Initial candidate derivation

Step 1.1: Spatial candidate derivation

Step 1.2: Redundancy check for spatial candidates

Step 1.3: Temporal candidate derivation

Step 2: Additional candidate insertion

Step 2.1: Creation of bi-predictive candidates

Step 2.2: Insertion of zero motion candidates

These steps are also shown schematically in FIG. 21. For spatial merge candidate derivation, up to four merge candidates are selected from candidates located at five different positions. For temporal merge candidate derivation, at most one merge candidate is selected from two candidates. Because the decoder assumes a constant number of candidates for each prediction unit, additional candidates are generated when the number of candidates does not reach the maximum number of merge candidates (MaxNumMergeCand) signaled in the slice header. Because the number of candidates is constant, the index of the best merge candidate is encoded using truncated unary (TU) binarization. If the size of the CU is equal to 8, all PUs of the current CU share a single merge candidate list, which is identical to the merge candidate list of the 2Nx2N prediction unit.
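As an illustration of the truncated unary binarization mentioned above, the following sketch encodes a merge index as a bit string. The function name is hypothetical, and the largest codable index `c_max` is assumed to be MaxNumMergeCand minus one.

```python
def truncated_unary(index, c_max):
    """Truncated unary code: `index` ones followed by a terminating zero,
    except that the zero is dropped when index equals c_max."""
    if not 0 <= index <= c_max:
        raise ValueError("index must lie in [0, c_max]")
    bits = "1" * index
    if index < c_max:
        bits += "0"  # terminator is only needed below the maximum
    return bits


# With MaxNumMergeCand = 5, the largest index is 4.
codes = [truncated_unary(i, 4) for i in range(5)]
```

Because the list length is known to both sides, the last index needs no terminating bit, which is what distinguishes the truncated form from a plain unary code.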

In the following, the operations associated with the foregoing steps are described in detail.

FIG. 22 illustrates a system 2200 showing the positions of the spatial merge candidates and the candidate pairs considered for the redundancy check of spatial merge candidates. In the derivation of spatial merge candidates, up to four merge candidates are selected from candidates located at the positions shown in FIG. 22. The order of derivation is A₁, B₁, B₀, A₀, and B₂. Position B₂ is considered only when any PU at position A₁, B₁, B₀, or A₀ is unavailable (e.g., because it belongs to another slice or tile) or is intra-coded. After the candidate at position A₁ is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in this redundancy check. Instead, only the pairs linked by an arrow in FIG. 22 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicate motion information is the "second PU" associated with partitions other than 2Nx2N. For example, FIG. 23 shows the second PU for the Nx2N and 2NxN cases, respectively. When the current PU is partitioned as Nx2N, the candidate at position A₁ is not considered for list construction. Adding this candidate would lead to two prediction units having the same motion information, which is redundant with simply having one PU in the coding unit. Similarly, position B₁ is not considered when the current PU is partitioned as 2NxN.
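The spatial derivation and its limited redundancy check can be sketched as follows. FIG. 22 is not reproduced here, so the comparison pairs in `COMPARE_WITH` are an assumption based on the pairs commonly described for HEVC; the function and variable names are likewise illustrative.

```python
ORDER = ["A1", "B1", "B0", "A0", "B2"]

# Assumed comparison pairs (stand-in for the arrows of FIG. 22).
COMPARE_WITH = {
    "A1": [],
    "B1": ["A1"],
    "B0": ["B1"],
    "A0": ["A1"],
    "B2": ["A1", "B1"],
}


def spatial_merge_candidates(motion, max_spatial=4):
    """motion maps a position name to its motion information, or to None
    when the PU at that position is unavailable or intra-coded."""
    chosen = []
    for pos in ORDER:
        if pos == "B2" and all(motion.get(p) is not None for p in ORDER[:4]):
            continue  # B2 is only considered when another position is missing
        info = motion.get(pos)
        if info is None:
            continue
        if any(motion.get(other) == info for other in COMPARE_WITH[pos]):
            continue  # same motion as a paired position: redundant
        chosen.append((pos, info))
        if len(chosen) == max_spatial:
            break
    return chosen


example = {"A1": (1, 0, 0), "B1": (1, 0, 0), "B0": (2, 0, 0),
           "A0": None, "B2": (3, 1, 0)}
selected = spatial_merge_candidates(example)
```

In the example, B₁ is dropped because it duplicates A₁, A₀ is unavailable, and B₂ is therefore admitted as the third candidate.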

FIG. 23 illustrates a system 2300 showing the positions of the second PU for Nx2N and 2NxN partitions. In particular, system 2300 describes a temporal candidate derivation process. In this step, only one candidate is added to the list. In the derivation of this temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture that has the smallest POC difference from the current picture within the given reference picture list. The reference picture list to be used for deriving the co-located PU is explicitly signaled in the slice header. The scaled motion vector for the temporal merge candidate is obtained as illustrated by the dotted line in FIG. 24 (system 2400): it is scaled from the motion vector of the co-located PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of the temporal merge candidate is set to zero. For a B-slice, two motion vectors, one for reference picture list 0 and the other for reference picture list 1, are obtained and combined to form the bi-predictive merge candidate.
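The scaling by the POC distances tb and td can be illustrated with a simplified sketch. It uses plain proportional scaling with rounding; an actual HEVC codec performs this step in clipped fixed-point arithmetic, which is deliberately not reproduced here. The function name is an assumption.

```python
def scale_temporal_mv(mv_col, tb, td):
    """Scale the co-located PU's motion vector by the ratio tb / td.

    mv_col: (mvx, mvy) of the co-located PU.
    tb:     POC difference between the current picture's reference and the current picture.
    td:     POC difference between the co-located picture's reference and the co-located picture.
    Simplified real-valued scaling (assumption); Python's round() rounds
    exact halves to the nearest even integer.
    """
    if td == 0:
        raise ValueError("td must be non-zero")
    return tuple(round(component * tb / td) for component in mv_col)


# The co-located motion spans two pictures; the current one spans a single picture.
scaled = scale_temporal_mv((8, -4), tb=1, td=2)
```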

FIG. 25 illustrates a system 2500 showing the candidate positions for the temporal merge candidate. In the co-located PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C₀ and C₁, as depicted in system 2500. If the PU at position C₀ is not available, is intra-coded, or lies outside the current CTU, position C₁ is used. Otherwise, position C₀ is used in the derivation of the temporal merge candidate.

Besides spatio-temporal merge candidates, there are two additional types of merge candidates: the combined bi-predictive merge candidate and the zero merge candidate. Combined bi-predictive merge candidates are generated from the spatio-temporal merge candidates and are used for B-slices only. A combined bi-predictive candidate is generated by combining the first reference picture list motion parameters of an initial candidate with the second reference picture list motion parameters of another candidate. If these two tuples provide different motion hypotheses, they form a new bi-predictive candidate. For example, system 2500 depicts the case in which two candidates in the original list (on the left), having mvL0 and refIdxL0 or mvL1 and refIdxL1, are used to create a combined bi-predictive merge candidate that is added to the final list (on the right). There are numerous rules regarding the combinations that are considered for generating these additional merge candidates.
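A minimal sketch of combined bi-predictive candidate generation follows. The candidate representation (a dict with optional "L0" and "L1" entries, each an (mv, reference POC) pair) and the simple nested-loop pairing order are assumptions for illustration; the actual combination order follows the rules referred to above.

```python
def add_combined_bipred(candidates, max_cands):
    """Append combined bi-predictive candidates until max_cands is reached.

    Each candidate is a dict whose "L0" / "L1" entries are either None or a
    (motion_vector, reference_poc) tuple (representation assumed here).
    """
    out = list(candidates)
    for i, first in enumerate(candidates):
        for j, second in enumerate(candidates):
            if len(out) >= max_cands:
                return out
            if i == j:
                continue
            l0, l1 = first.get("L0"), second.get("L1")
            if l0 is None or l1 is None:
                continue  # need list-0 motion from one and list-1 from the other
            if l0 == l1:
                continue  # identical hypotheses add nothing new
            out.append({"L0": l0, "L1": l1})
    return out


original = [
    {"L0": ((1, 0), 8), "L1": None},
    {"L0": None, "L1": ((0, 2), 16)},
]
final = add_combined_bipred(original, max_cands=5)
```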

FIG. 26 shows an example table 2600 of combined bi-predictive merge candidates. Zero motion candidates are inserted to fill the remaining entries in the merge candidate list until the MaxNumMergeCand capacity is reached. These candidates have zero spatial displacement and a reference picture index that starts at zero and increases each time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one for uni-directional prediction and two for bi-directional prediction. Finally, no redundancy check is performed on these candidates.
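The zero-candidate padding can be sketched as below. The candidate representation and the handling once the reference indices are exhausted (falling back to index 0) are assumptions of this sketch rather than something stated in the text.

```python
def fill_with_zero_candidates(merge_list, max_num_merge_cand, num_ref_idx, is_b_slice):
    """Pad merge_list with zero-motion candidates up to max_num_merge_cand."""
    out = list(merge_list)
    ref_idx, used = 0, 0
    while len(out) < max_num_merge_cand:
        cand = {"L0": ((0, 0), ref_idx)}
        if is_b_slice:
            cand["L1"] = ((0, 0), ref_idx)  # bi-prediction uses two reference frames
        out.append(cand)
        if used == num_ref_idx - 1:
            ref_idx = 0  # assumed fallback once all reference indices were used
        else:
            ref_idx += 1
            used += 1
    return out


padded = fill_with_zero_candidates([{"L0": ((3, 1), 0)}], 5, num_ref_idx=2, is_b_slice=False)
```

No duplicate check is applied while padding, in line with the statement that these candidates are not subject to the redundancy check.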

FIG. 27 is an example of a system 2700 that includes a modification of the motion estimation pipeline to use auxiliary data. System 2700 illustrates additional components for the encoder portion of the motion estimation pipeline illustrated in component 1902 of system 1900. For example, system 2700 includes an auxiliary frame 2702 (also referred to as a patch frame data unit or patch frame data group). System 2700 also includes a new frame 2704, a motion estimation component 2706, a motion entropy coder 2708, a motion compensated interpolation 2710, a subtraction module 2712, a residual encoder 2714, a residual decoder 2716, and an adder 2718.

The motion estimation component 2706 receives the auxiliary frame 2702, the new frame 2704, and the previously encoded (now decoded) residual frames from the residual decoder 2716 and the adder 2718.

In some implementations, when the motion estimation component 2706 receives a new frame 2704 that includes patch data for the frame, the motion estimation component 2706 compares the 3D coordinates associated with the patches in the auxiliary frame 2702. This is possible because each auxiliary frame 2702 includes its own patch metadata (or auxiliary data) describing the mapping from the 2-D coordinates of the patch data to the 3-D coordinates of the point cloud media within the bounding box. In particular, the motion estimation component 2706 uses the patch metadata of the auxiliary frame 2702 to generate a reference frame. The current 3D information from the current frame, e.g., the new frame 2704, and the 3D information from the previous frame, e.g., the reference frame, can then be used to determine the location of a patch. For example, the motion estimation component 2706 can determine that a patch in the new frame 2704 comes from the same location in 3D coordinates as a patch in the reference frame, even though the patch in the reference frame lies at a different 2-D position than it does in the new frame 2704.

The motion estimation component 2706 then performs this process for each patch in the new frame 2704 and retrieves the 2-D information in order to obtain the corresponding 3-D information. The motion estimation component 2706 obtains the 2-D information for the patches in the auxiliary frame 2702. Using the 2-D information of a patch in the new frame 2704 and the 2-D information of the patch in the auxiliary frame 2702, the motion estimation component 2706 determines the 2-D distance between these two patches, for example by comparing the u0 and v0 position coordinates of the patch in the new frame 2704 with the u0 and v0 position coordinates of the patch in the auxiliary frame 2702, and thereby determines how the patch has moved in 2-D from its position in the auxiliary frame 2702 to its position in the new frame 2704. This difference between the u0, v0 of the patch in the auxiliary frame 2702 and the u0, v0 of the patch in the new frame corresponds to a motion vector candidate. The motion estimation component 2706 inserts this generated motion vector candidate into the motion vector candidate list. The motion estimation component 2706 performs this process of comparing each patch in the new frame 2704 with each patch in the auxiliary frame 2702. When a match occurs, the difference between the 2D coordinates of the matched patches is added to the motion vector candidate list.
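The candidate generation described above can be sketched as follows. The `PatchInfo` container, the matching rule (squared 3D distance between the (u1, v1, d1) triples within a tolerance), and the optional `obr` scale factor are assumptions introduced for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchInfo:
    u0: int  # 2-D position of the patch in the projected frame
    v0: int
    u1: int  # 3-D position of the patch in the point cloud
    v1: int
    d1: int


def squared_3d_distance(a, b):
    return (a.u1 - b.u1) ** 2 + (a.v1 - b.v1) ** 2 + (a.d1 - b.d1) ** 2


def mv_candidates(cur_patches, ref_patches, max_sq_dist=0, obr=1):
    """Compare every current patch with every reference patch; for each match,
    add the 2-D displacement (current minus reference) as an MV candidate."""
    candidates = []
    for cur in cur_patches:
        for ref in ref_patches:
            if squared_3d_distance(cur, ref) <= max_sq_dist:
                mv = ((cur.u0 - ref.u0) * obr, (cur.v0 - ref.v0) * obr)
                if mv not in candidates:
                    candidates.append(mv)
    return candidates


cur_list = [PatchInfo(u0=10, v0=4, u1=100, v1=50, d1=7)]
ref_list = [PatchInfo(u0=6, v0=4, u1=100, v1=50, d1=7),
            PatchInfo(u0=0, v0=0, u1=5, v1=5, d1=5)]
found = mv_candidates(cur_list, ref_list)
```

Only the first reference patch shares the 3D location of the current patch, so a single candidate results: the patch moved four units horizontally in the 2-D frame.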

In some implementations, once the motion vector candidates have been added to the motion vector candidate list, a 3D motion search is performed for the patches in the 3D coordinate list.

In some implementations, the encoder includes a candidate list of field neighbors. The candidate list of field neighbors may include possible positions for examining motion within the 2D image. The motion estimation component 2706 can refine the positions in the candidate list of field neighbors for each patch frame.

The process of using the auxiliary information to determine matched patches in the reference frame is shown in the code below. The process uses, for example, the existing 2D and 3D patch information in the auxiliary 3D information file generated in the point cloud compression solution buffer for the reference frame. For example, the code includes:

Algorithm. Find matched patches in reference frames

while frame k in GoP frames do
    for each patch in patches list of reference frame [refIdx]
        for each patch in patches list of frame [k]
            estimate cost in RDO based on prediction
            if (patchIdx[k] = patchIdx[refIdx])
                do motion refinement
            end if
            if (dist < bestDist)
                update pcMvFieldNeighbours
            end if
        end for
    end for
end while

For the k-th frame:

for (Int yCoorRef = 0; yCoorRef < picHeight; yCoorRef++) {
    for (Int xCoorRef = 0; xCoorRef < picWidth; xCoorRef++) {
        // Per-pixel 3D coordinate maps of the reference picture
        Double* refDepthX = m_pcPic->getDepthX(Int(eRefPicList) * MAX_NUM_REF + refIdx);
        Double* refDepthY = m_pcPic->getDepthY(Int(eRefPicList) * MAX_NUM_REF + refIdx);
        Double* refDepthZ = m_pcPic->getDepthZ(Int(eRefPicList) * MAX_NUM_REF + refIdx);

        // 3D position stored for the reference pixel (xCoorRef, yCoorRef)
        Double xCoorRef3D = refDepthX[yCoorRef * picWidth + xCoorRef];
        Double yCoorRef3D = refDepthY[yCoorRef * picWidth + xCoorRef];
        Double zCoorRef3D = refDepthZ[yCoorRef * picWidth + xCoorRef];

        // Squared 3D distance between the current and the reference position
        Double dist = (xCoor3D - xCoorRef3D) * (xCoor3D - xCoorRef3D)
                    + (yCoor3D - yCoorRef3D) * (yCoor3D - yCoorRef3D)
                    + (zCoor3D - zCoorRef3D) * (zCoor3D - zCoorRef3D);

        // Keep the reference pixel that is closest in 3D
        if (dist < bestDist) {
            bestDist = dist;
            bestXCoorRef = xCoorRef;
            bestYCoorRef = yCoorRef;
        }
    }
}

// The 2D displacement to the best match, scaled by 4 (<< 2), becomes the new candidate
TComMv depthMVP((bestXCoorRef - xCoor) << 2, (bestYCoorRef - yCoor) << 2);
pcMvFieldNeighbours[(iCount << 1) + 1].setMvField(depthMVP, refIdx);

In the code above, the V-PCC encoder finds matched patches in the reference frame for the video encoder. The V-PCC encoder uses the same 2-D and 3-D information corresponding to the patches of each frame, together with the projection information described with respect to FIG. 16. In this step, the V-PCC encoder generates new candidates for motion refinement. The algorithm proceeds as follows: the V-PCC encoder iterates over each frame in the group of frames; for each frame, it iterates over the patches in the patch list of a particular reference frame; it then iterates over the patches in the current frame.

The V-PCC encoder estimates the cost in rate-distortion optimization based on the motion vector candidates. The cost or distance, for example the metric returned by the estimated cost of the rate-distortion optimization function or a metric distance in a vector space, corresponds to the minimum cost from the motion vector candidate list. In particular, the minimum cost corresponds to the sum of the squared X, Y, and Z differences. For example, the X coordinate of the patch in the reference frame is subtracted from the X coordinate of the patch in the current frame to produce the X difference ("Xdif"); the Y coordinate of the patch in the reference frame is subtracted from the Y coordinate of the patch in the current frame to produce the Y difference ("Ydif"); and the Z coordinate of the patch in the reference frame is subtracted from the Z coordinate of the patch in the current frame to produce the Z difference ("Zdif"). The V-PCC encoder then sums the squares of Xdif, Ydif, and Zdif to produce the distance between the patch in the reference frame and the patch in the current frame.

This process of estimating the cost in rate-distortion optimization is repeated until the minimum cost or distance is found. When the minimum cost or distance is found, the pcMvFieldNeighbours variable is updated with the minimized cost value corresponding to the X coordinate, the Y coordinate, and the correct patch matched to the patch in the reference frame.
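The distance computation and the resulting motion vector predictor can be mirrored in a short, self-contained sketch. It assumes that the per-pixel 3D coordinates of the reference frame are available as three 2-D lists and that the reference frame is not empty; the function name and the toy data are illustrative only.

```python
def depth_mv_predictor(cur_xy, cur_xyz, ref_x, ref_y, ref_z):
    """Find the reference pixel closest in 3D to the current pixel and
    return (mv, best_dist), with the MV scaled by 4 as in the code above."""
    x, y = cur_xy
    cx, cy, cz = cur_xyz
    best_xy, best_dist = None, float("inf")
    for y_ref, rows in enumerate(zip(ref_x, ref_y, ref_z)):
        for x_ref, (px, py, pz) in enumerate(zip(*rows)):
            # Sum of the squared X, Y, and Z differences
            dist = (cx - px) ** 2 + (cy - py) ** 2 + (cz - pz) ** 2
            if dist < best_dist:
                best_dist, best_xy = dist, (x_ref, y_ref)
    mv = ((best_xy[0] - x) << 2, (best_xy[1] - y) << 2)
    return mv, best_dist


# Toy 4x3 reference frame whose 3D coordinates grow with the pixel position.
width, height = 4, 3
ref_x = [[10 * c for c in range(width)] for _ in range(height)]
ref_y = [[10 * r for _ in range(width)] for r in range(height)]
ref_z = [[0 for _ in range(width)] for _ in range(height)]
mv, dist = depth_mv_predictor((3, 2), (10, 0, 1), ref_x, ref_y, ref_z)
```

The current pixel at (3, 2) carries a 3D position that is nearest to reference pixel (1, 0), so the predictor points two samples left and two samples up, scaled by four.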

The V-PCC encoder then performs motion compensation based on the refined motion vector candidates and the merge candidates. In some implementations, a new entry is created in the merge candidate list. This entry is used as a predictor when performing motion estimation, in order to find a better prediction. For example, the motion vector candidate list is updated based on the 3D information of a patch and the corresponding 2D patch position in the projected image. The merge candidate list can likewise be updated based on the 3D information of a patch and the corresponding 2D position in the projected image. This information is thus added to both video compression and decompression: it is generated for the video encoder based on the auxiliary information, and the same information is generated for the video decoder. The motion vector candidate list is not transmitted; instead, it is generated by the V-PCC encoder when it receives the 2D patches along with the patch metadata. The motion vector candidate list is generated according to the "generate additional information for video coding" code set. Both the V-PCC encoder and the V-PCC decoder perform "generate additional information for video coding" to generate the motion vector candidate list. Therefore, no additional motion vector candidate list or other information needs to be transmitted between the V-PCC encoder and the V-PCC decoder.

In some implementations, the auxiliary information file is generated from the point cloud coded bitstream. The point cloud coded bitstream is further described and illustrated below. The auxiliary information is used to generate new motion vector candidates in both the V-PCC encoder and the V-PCC decoder, and both use the same method so that the V-PCC encoder and the V-PCC decoder produce the same result. Using the auxiliary information, the V-PCC encoder and the V-PCC decoder generate the new motion vector candidates and merge candidate list entries. The new motion vector candidates and merge candidate list are then provided to the existing video encoding/decoding techniques.

FIG. 28 is an example of a packet stream 2800 representation of a V-PCC unit payload. The packet stream 2800 includes a V-PCC bitstream 2802. The V-PCC bitstream 2802 includes one or more V-PCC units. For example, V-PCC unit 2804 includes a V-PCC unit header 2806 and a V-PCC unit payload 2808. The V-PCC unit payload 2808 includes a sequence parameter set 2810, a patch data group 2812, occupancy video data 2814, geometry video data 2818, and attribute video data 2816. The occupancy video data 2814 includes 2-D frame data. The attribute video data 2816 includes two sets of 2-D frames, e.g., a near layer and a far layer. The geometry video data 2818 also includes two sets of 2-D frames, e.g., a near layer and a far layer.

The patch data group unit type includes a plurality of data sets. For example, the patch data group unit type 2812 includes a sequence parameter set 2820, a frame geometry parameter set 2822, a geometry patch parameter set 2824, a frame parameter set 2826, a frame attribute parameter set, and an attribute patch parameter set 2830. In addition, the patch data group unit type 2812 includes a plurality of patch tile groups 2832. As described above, a patch tile group includes a plurality of patches. For example, the patch data group includes one set of patch tile groups T(i, 0) to T(i, m). This information is needed to reconstruct the point cloud frame from the occupancy, attribute, and geometry components of the V-PCC unit payload. Here, i is the patch data group index corresponding to 3-D PCC frame i, and m+1 is the number of 3D patches generated for 3D point cloud frame i. In this document, T(i, j) is referred to as a patch.
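The containment relationships described above can be pictured with a small data-structure sketch. The class and field names are illustrative assumptions and do not reflect the actual V-PCC syntax element names; only the nesting follows the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PatchDataGroup:
    frame_index: int                                  # index i of the 3-D PCC frame
    parameter_sets: Dict[str, dict] = field(default_factory=dict)
    tiles: List[dict] = field(default_factory=list)   # T(i, 0) ... T(i, m)

    def patch_count(self) -> int:
        """Number of entries T(i, j), i.e. m + 1."""
        return len(self.tiles)


@dataclass
class VPCCUnitPayload:
    sequence_parameter_set: dict
    patch_data_group: PatchDataGroup
    occupancy_frames: list                            # 2-D frame data
    geometry_layers: Dict[str, list]                  # "near" and "far" layers
    attribute_layers: Dict[str, list]                 # "near" and "far" layers


@dataclass
class VPCCUnit:
    header: dict
    payload: VPCCUnitPayload


pdg = PatchDataGroup(frame_index=0, tiles=[{"j": j} for j in range(3)])
unit = VPCCUnit(
    header={"unit_type": "example"},
    payload=VPCCUnitPayload(
        sequence_parameter_set={},
        patch_data_group=pdg,
        occupancy_frames=[],
        geometry_layers={"near": [], "far": []},
        attribute_layers={"near": [], "far": []},
    ),
)
```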

FIG. 29 is another example of a visual representation 2900 of a V-PCC unit payload. As illustrated in visual representation 2900, the arrows indicate the prediction flow from a reference frame/data unit to the current data unit. Prediction between the near layer and the far layer is allowed only within the same V-PCC data frame. For example, visual representation 2900 illustrates a 3-D point cloud, a patch data group, geometry video data of the near and far layers, attribute video data of the near and far layers, and occupancy video data.

FIG. 30 is a flow diagram illustrating an example of a process 3000 for performing motion estimation using 3D auxiliary data. Process 3000 may be performed by a V-PCC encoder or a V-PCC decoder that includes the components shown in FIG. 27.

The V-PCC encoder generates a segmentation of three-dimensional point cloud data of recorded media based on continuity data of the three-dimensional point cloud data (3002). The V-PCC encoder divides the three-dimensional (3-D) point cloud data into a set of segments, such as 3-D patch bounding boxes, e.g., 3-D patch bounding box 1504. Multiple segmentations may be generated to cover the whole of the 3D point cloud data representing an object within a frame of 3D point cloud data. The segmentations or 3D patch bounding boxes may be placed close to one another or may overlap one another when covering the object with the 3D point cloud data.

The V-PCC encoder projects a representation of the segmented 3-D point cloud data onto one or more sides of a 3-D bounding box, where the representation differs depending on the side onto which it is projected (3004). The V-PCC encoder encloses the 3-D point cloud data in a 3-D bounding box in order to project an image of the 3-D point cloud data onto each side of the bounding box, as shown in FIG. 16. In some implementations, the V-PCC encoder defines criteria for selecting a projection plane on the sides of the 3-D bounding box. For example, the criteria may include a smooth continuous surface criterion. Under this criterion, a particular projection plane is selected when the projected area of the segmentation on that side of the bounding box is the largest among all projection directions. A smooth continuous surface can be determined using one or more smooth-continuity algorithms, and can be defined as a surface with a minimal number of occluded or blocked data points. The V-PCC encoder then compares the smooth surfaces across all directions to determine which direction produces the projection that covers the largest area on a side of the 3-D bounding box. Once a particular projection plane is selected, the V-PCC encoder projects the corresponding surface of the 3-D point cloud data onto that projection plane of the 3-D bounding box. In some implementations, the V-PCC encoder projects different surfaces of the 3-D point cloud data onto each side of the 3-D bounding box.
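As an illustration only, the plane-selection step can be sketched as follows. The sketch assumes that a segmentation is given as integer (x, y, z) coordinates and approximates the projected area on each axis-aligned side of the bounding box by the number of distinct pixels the points occupy once the normal axis is dropped. The names `projected_area` and `select_projection_axis` are hypothetical, and the smoothness and occlusion tests described above are omitted.

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int, int]


def projected_area(points: Iterable[Point], normal_axis: int) -> int:
    """Count the distinct pixels hit when points are projected along normal_axis.

    normal_axis is 0 (X), 1 (Y) or 2 (Z); the two remaining axes span the
    projection plane. This pixel count is an assumed stand-in for "area".
    """
    kept = [axis for axis in range(3) if axis != normal_axis]
    return len({(p[kept[0]], p[kept[1]]) for p in points})


def select_projection_axis(points: Iterable[Point]) -> int:
    """Return the normal axis whose projection covers the largest area.

    Ties are resolved in favour of the lowest axis index.
    """
    pts: List[Point] = list(points)
    areas = [projected_area(pts, axis) for axis in range(3)]
    return max(range(3), key=lambda axis: areas[axis])
```

For a flat slab of points lying in the XY plane, the sketch selects the Z axis as the projection normal, because that projection keeps every point visible.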

The V-PCC encoder generates one or more patches based on the projected representation of the segmented 3-D point cloud data (3006). Once an image has been projected onto a side of the 3-D bounding box, the V-PCC encoder generates a patch from each projection. Each patch corresponds to a 2-D bounding box on a side of the 3-D bounding box, that is, to a projected area on that side, and contains a set of coordinates. The coordinates of a patch include, for example, a patch index, u0, v0, size_u0, and size_v0. The patch index identifies the particular patch associated with a particular side of the bounding box. u0 defines the X coordinate of the patch on the projection plane, and v0 defines the Y coordinate of the patch on the projection plane. size_u0 and size_v0 describe the size of the patch along the u0 and v0 coordinates, respectively.
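The patch coordinates described above can be illustrated with a minimal sketch that derives them from the pixels a projection occupies. It assumes that u0 and v0 are the smallest X and Y values of the projected pixels and that size_u0 and size_v0 are the extents along those two axes. The class `Patch2D` and the function `patch_from_pixels` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Patch2D:
    """2-D coordinates of one patch on a projection plane."""
    patch_index: int
    u0: int        # X coordinate of the patch on the projection plane
    v0: int        # Y coordinate of the patch on the projection plane
    size_u0: int   # extent along X (assumed interpretation)
    size_v0: int   # extent along Y (assumed interpretation)


def patch_from_pixels(patch_index: int,
                      pixels: Iterable[Tuple[int, int]]) -> Patch2D:
    """Build the 2-D bounding box that encloses the projected pixels."""
    pts = list(pixels)
    if not pts:
        raise ValueError("a patch needs at least one projected pixel")
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    u0, v0 = min(us), min(vs)
    return Patch2D(patch_index, u0, v0, max(us) - u0 + 1, max(vs) - v0 + 1)
```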

In some implementations, the V-PCC encoder combines the patches from each side of the 3-D bounding box with their corresponding coordinate information to form a "patch tile group". Each element of the patch tile group corresponds to a particular patch, which has a unique index and corresponds to a unique 3-D bounding box within the 3-D point cloud frame. For example, packaged patch 1802 includes a group of patch tile groups, known as a patch data group.

The coordinate information from each patch in the patch data group is included in a projection map. The projection map is an additional file that the V-PCC encoder later provides to the video encoding solution. The projection map contains 2-D and 3-D position information for each patch. The 2-D data for a particular patch, for example u0 and v0, correspond to the X and Y coordinates of the top-left corner of the projection. size_u0 and size_v0 correspond to the height and width of the patch. For the 3-D patch projection data, the projection map contains the coordinates u1, v1, and d1, which correspond to the X, Y, and Z axes. In particular, u1, v1, and d1 correspond to the index of the projection plane, or to the projection plane corresponding to the normal axis.
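One possible in-memory layout for such a projection map is sketched below. It is an assumption made for illustration, not a format defined by this description: the map is a dictionary keyed by patch index, each entry holds the 2-D fields next to the 3-D fields, and size_u0 and size_v0 are read as the extents along X and Y. The names `ProjectionMapEntry` and `find_patch_at` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProjectionMapEntry:
    # 2-D placement of the patch in the projected image
    u0: int
    v0: int
    size_u0: int
    size_v0: int
    # 3-D position data stored for the same patch
    u1: int
    v1: int
    d1: int


# Hypothetical container: patch index -> entry
ProjectionMap = Dict[int, ProjectionMapEntry]


def find_patch_at(projection_map: ProjectionMap, x: int, y: int) -> Optional[int]:
    """Return the index of the first patch whose 2-D box contains pixel (x, y)."""
    for patch_index, entry in projection_map.items():
        inside_u = entry.u0 <= x < entry.u0 + entry.size_u0
        inside_v = entry.v0 <= y < entry.v0 + entry.size_v0
        if inside_u and inside_v:
            return patch_index
    return None
```

Looking up a pixel in this way gives access to the u1, v1, and d1 values of the patch that covers it, which is the association between 2-D and 3-D data that the later matching steps rely on.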

The V-PCC encoder generates a first frame of the one or more patches (3008). That is, the V-PCC encoder generates a frame of the patch data group generated at 3006. The frame may be encoded and provided to a video compression solution for transmission.

The V-PCC encoder generates first auxiliary information for the first frame (3010). The first auxiliary information corresponds to the patch metadata for a particular patch in the first frame. For example, the first auxiliary information may include the patch index, u0, v0, u1, v1, and d1 for a particular patch.

The V-PCC encoder generates second auxiliary information for a reference frame (3012). The V-PCC encoder retrieves the reference frame, which is a previously encoded frame containing one or more patches. After decoding the reference frame, the V-PCC encoder retrieves a patch from the reference frame together with the auxiliary information associated with that patch. The auxiliary information associated with a patch of the reference frame includes the patch index, u0, v0, u1, v1, and d1 for that patch.

The V-PCC encoder identifies a first patch in the first frame that matches a second patch in the reference frame based on the first auxiliary information and the second auxiliary information (3014). The V-PCC encoder compares the auxiliary information associated with the first frame with the auxiliary information associated with the patches of the reference frame. For example, the V-PCC encoder compares the patch index, u0, v0, u1, v1, and d1 associated with each patch to determine the distance between these pieces of information. The V-PCC encoder compares a patch in the first frame with each patch in the reference frame to find the matching patch with the minimum auxiliary-information distance. When the distance is minimal, the V-PCC encoder marks the two patches as a match.
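A minimal sketch of this matching step follows. The description does not fix a particular distance, so the sketch assumes an L1 (sum of absolute differences) distance over u0, v0, u1, v1, and d1 and leaves the patch index out of the metric. `AuxInfo`, `aux_distance`, and `find_matching_patch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AuxInfo:
    """Auxiliary information (patch metadata) for one patch."""
    patch_index: int
    u0: int
    v0: int
    u1: int
    v1: int
    d1: int


def aux_distance(a: AuxInfo, b: AuxInfo) -> int:
    """Assumed metric: L1 distance over the 2-D and 3-D position fields."""
    return (abs(a.u0 - b.u0) + abs(a.v0 - b.v0)
            + abs(a.u1 - b.u1) + abs(a.v1 - b.v1) + abs(a.d1 - b.d1))


def find_matching_patch(current: AuxInfo,
                        reference_patches: Sequence[AuxInfo]
                        ) -> Optional[Tuple[int, int]]:
    """Return (position in reference_patches, distance) of the closest patch.

    Returns None when the reference frame carries no patches.
    """
    best: Optional[Tuple[int, int]] = None
    for position, candidate in enumerate(reference_patches):
        distance = aux_distance(current, candidate)
        if best is None or distance < best[1]:
            best = (position, distance)
    return best
```

An exhaustive comparison is used here because the number of patches per frame is small relative to the number of pixels; an encoder could equally stop early once a distance of zero is found.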

The V-PCC encoder generates a motion vector candidate between the first patch and the second patch based on the difference between the first auxiliary information and the second auxiliary information (3016). A motion vector candidate is generated between the first patch and the second patch when the difference between their auxiliary information is minimal compared with the other patches. The motion vector candidate is added to the motion vector candidate list used for transmission. The difference may be based on an estimated rate-distortion optimization cost. The cost or distance, for example the metric returned by the estimated cost of a rate-distortion optimization function or a metric distance in a vector space, corresponds to the minimum cost over the motion vector candidate list. Once the minimum cost or distance is found, the pcMvFieldNeighbors variable is updated with the X coordinate, the Y coordinate, and the minimized cost value corresponding to the correct patch associated with the patch in the reference frame.
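The candidate-generation step can be sketched as below, under stated assumptions. The motion vector is taken as the difference between the 2-D positions (u0, v0) of the matched patches, pointing from the current patch to the reference patch. The sign convention, the duplicate check, and the list limit of five entries are all assumptions made for the sketch. The function names are hypothetical and do not correspond to any codec API.

```python
from typing import List, Tuple

MotionVector = Tuple[int, int]


def motion_vector_candidate(current_uv: Tuple[int, int],
                            reference_uv: Tuple[int, int]) -> MotionVector:
    """Displacement from the current patch position to the reference patch."""
    return (reference_uv[0] - current_uv[0], reference_uv[1] - current_uv[1])


def add_candidate(candidates: List[MotionVector],
                  mv: MotionVector,
                  max_candidates: int = 5) -> bool:
    """Append mv unless it is already present or the list is full.

    Returns True when the candidate was added.
    """
    if mv in candidates or len(candidates) >= max_candidates:
        return False
    candidates.append(mv)
    return True
```

For example, a current patch at (64, 32) matched to a reference patch at (48, 40) yields the candidate (-16, 8), which is then offered to the candidate list once.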

The V-PCC encoder performs motion compensation using the motion vector candidate (3018). The V-PCC encoder then performs motion compensation based on the refined motion vector candidate and merge candidate. In some implementations, a new entity is created in the merge candidate list. This entity is used as a predictor when performing motion estimation in order to find a better prediction. For example, the motion vector candidate list is updated based on the 3-D information about a patch and the corresponding 2-D patch position in the projected image. The merge candidate list can likewise be updated based on the 3-D information about the patch and the corresponding 2-D position in the projected image. This information is thus added to video compression and decompression. It is generated for the video encoder based on the auxiliary information, and the same information is generated for the video decoder. The motion vector candidate list is not transmitted or computed, but is generated by the V-PCC encoder upon receiving a 2-D patch together with the patch metadata. The motion vector candidate list is generated according to the "Generate Additional Information for Video Coding" code set. The new motion vector candidate and merge candidate lists are then provided to the existing video encoding/decoding technology.
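For readers unfamiliar with the term, motion compensation with a single integer-precision motion vector can be sketched as follows. This is a generic illustration rather than the encoder's actual procedure. It assumes frames are row-major lists of sample values and that the motion vector points from the block position in the current frame to the predictor in the reference frame. `predict_block` and `residual` are hypothetical names.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[y][x], row-major sample values


def predict_block(reference: Frame, x: int, y: int,
                  width: int, height: int,
                  mv: Tuple[int, int]) -> Frame:
    """Fetch the predictor for the block at (x, y), displaced by mv."""
    rx, ry = x + mv[0], y + mv[1]
    if (rx < 0 or ry < 0
            or ry + height > len(reference)
            or rx + width > len(reference[0])):
        raise ValueError("motion vector points outside the reference frame")
    return [row[rx:rx + width] for row in reference[ry:ry + height]]


def residual(current_block: Frame, predicted_block: Frame) -> Frame:
    """Sample-wise difference that remains to be coded after prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current_block, predicted_block)]
```

The better the motion vector candidate, the closer the residual is to zero, which is why a candidate derived from the patch metadata can reduce the amount of data the video codec has to encode.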

A first component is directly coupled to a second component when there are no intervening components, other than a line, a trace, or another medium, between the first component and the second component. A first component is indirectly coupled to a second component when there are intervening components, other than a line, a trace, or another medium, between the first component and the second component. The term "coupled" and its variants include both directly coupled and indirectly coupled. Use of the term "about" means a range that includes ±10% of the subsequent number unless otherwise stated.

Although several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered illustrative rather than restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.

Embodiments of the invention and all of the functional operations described in this specification may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the invention may be implemented as one or more computer program products, that is, one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer-readable medium may be a non-transitory computer-readable storage medium, a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question, for example code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, for example a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.

A computer program (also known as a program, software, software application, script, or code) may be written in any form of programming language, including compiled or interpreted languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program may be stored in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code). A computer program may be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification may be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows may also be performed by, and apparatus may also be implemented as, special-purpose logic circuitry, for example a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).

Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor receives instructions and data from a read-only memory or a random-access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes, or is operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, for example a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile audio player, or a Global Positioning System (GPS) receiver. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

To provide for interaction with a user, embodiments of the invention may be implemented on a computer having a display device, for example a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user, and a keyboard and a pointing device, for example a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback, such as visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input.

Embodiments of the invention may be implemented in a computing system that includes a back-end component, for example a data server; or that includes a middleware component, for example an application server; or that includes a front-end component, for example a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the invention; or any combination of one or more such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication, for example a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), for example the Internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Although a few implementations have been described in detail above, other modifications are possible. For example, while a client application is described as accessing the delegate(s), in other implementations the delegate(s) may be employed by other applications implemented by one or more processors, such as an application executing on one or more servers. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Furthermore, other actions may be provided, or actions may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations, and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims

A computer-implemented method, comprising:
generating, by an encoder, a segmentation of three-dimensional point cloud data of recorded media based on continuity data of the three-dimensional point cloud data;
projecting, by the encoder, a representation of the segmented three-dimensional point cloud data onto one or more sides of a three-dimensional bounding box, wherein the representation of the segmented three-dimensional point cloud data differs based on the projected side of the three-dimensional bounding box;
in response to projecting the representation of the segmented three-dimensional point cloud data onto the one or more sides of the three-dimensional bounding box, generating, by the encoder, one or more patches based on the projected representation of the segmented three-dimensional point cloud data;
generating, by the encoder, a first frame of the one or more patches;
generating, by the encoder, first auxiliary information for the first frame;
generating, by the encoder, second auxiliary information for a reference frame, the reference frame including one or more second patches and being a previously encoded and transmitted frame;
identifying, by the encoder, a first patch from the first frame that matches a second patch from the one or more second patches of the reference frame based on the first auxiliary information and the second auxiliary information;
generating, by the encoder, a motion vector candidate between the first patch and the second patch based on a difference between the first auxiliary information and the second auxiliary information; and
performing, by the encoder, motion compensation using the motion vector candidate.

The computer-implemented method of claim 1,
wherein the reference frame is decoded to generate the second auxiliary information.

The computer-implemented method of claim 1 or 2,
wherein generating the segmentation of the three-dimensional point cloud data of the recorded media further comprises:
generating, by the encoder, a plurality of segmentations of the three-dimensional point cloud media so that the encoder can subsequently project and encode each segmentation of the plurality of segmentations.

The computer-implemented method of claim 1 or 2,
wherein the first auxiliary information includes index data for each of the one or more patches, two-dimensional data for each of the one or more patches, and three-dimensional data for each of the one or more patches.

The computer-implemented method of claim 4,
wherein the index data for each of the one or more patches corresponds to a corresponding side of the three-dimensional bounding box.

The computer-implemented method of claim 4,
wherein the two-dimensional data for each of the one or more patches and the three-dimensional data for each of the one or more patches correspond to, and are concatenated with, a portion of the three-dimensional point cloud data.

The computer-implemented method of claim 1 or 2,
wherein generating the one or more patches of the three-dimensional point cloud data based on the continuity data of the three-dimensional point cloud data further comprises:
determining, by the encoder, a smoothness criterion of the three-dimensional point cloud data from each direction;
comparing, by the encoder, the smoothness criteria from each direction of the three-dimensional point cloud data; and
in response to the comparison, selecting, by the encoder, the direction whose smoothness criterion of the three-dimensional point cloud data has the larger projection area on a side of the bounding box.

The computer-implemented method of claim 1 or 2,
wherein generating the motion vector candidate between the first patch and the second patch further comprises:
determining, by the encoder, a distance between two-dimensional data of the first auxiliary information and two-dimensional data of the second auxiliary information;
generating, by the encoder, the motion vector candidate based on the distance between the two-dimensional data of the first auxiliary information and the two-dimensional data of the second auxiliary information; and
adding, by the encoder, the motion vector candidate to a motion vector candidate list.

A system comprising:
one or more computers and one or more storage devices storing operable instructions which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
generating a segmentation of 3D point cloud data of recorded media based on continuity data of the 3D point cloud data;
projecting a representation of the segmented 3D point cloud data onto one or more sides of a 3D bounding box, wherein the representation of the segmented 3D point cloud data differs based on the projected side of the 3D bounding box;
in response to projecting the representation of the segmented 3D point cloud data onto the one or more sides of the 3D bounding box, generating one or more patches based on the projected representation of the segmented 3D point cloud data;
generating a first frame of the one or more patches;
generating first auxiliary information for the first frame;
generating second auxiliary information for a reference frame, the reference frame including one or more second patches, the reference frame being a previously encoded and transmitted frame;
identifying, based on the first auxiliary information and the second auxiliary information, a first patch from the first frame that matches a second patch of the one or more second patches of the reference frame;
generating a motion vector candidate between the first patch and the second patch based on a difference between the first auxiliary information and the second auxiliary information; and
performing motion compensation using the motion vector candidate.
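
By way of non-limiting illustration, the patch identification operation recited above might be sketched as follows. The claim states only that matching is based on the first and second auxiliary information; the specific rule used here (same bounding-box side, then smallest squared distance between 3D positions) and the names `PatchAux` and `match_patch` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PatchAux:
    """Hypothetical auxiliary information for one patch."""
    index: int     # side of the 3D bounding box the patch was projected onto
    pos2d: tuple   # 2D position of the patch in the frame
    pos3d: tuple   # 3D position of the patch in the point cloud


def match_patch(current: PatchAux,
                reference_patches: Sequence[PatchAux]) -> Optional[PatchAux]:
    """Return the reference patch on the same side that is nearest in 3D, or None."""
    same_side = [p for p in reference_patches if p.index == current.index]
    if not same_side:
        return None

    def squared_distance(p: PatchAux) -> int:
        return sum((a - b) ** 2 for a, b in zip(p.pos3d, current.pos3d))

    return min(same_side, key=squared_distance)


current = PatchAux(index=2, pos2d=(16, 32), pos3d=(10, 10, 5))
reference = [
    PatchAux(index=2, pos2d=(24, 16), pos3d=(12, 10, 5)),
    PatchAux(index=2, pos2d=(64, 64), pos3d=(40, 40, 5)),
    PatchAux(index=0, pos2d=(16, 32), pos3d=(10, 10, 5)),
]
matched = match_patch(current, reference)
```

The third reference patch has an identical 3D position but lies on a different side of the bounding box, so under this rule it is excluded and the first reference patch is the match.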

The system of claim 9,
wherein the reference frame is decoded to generate the second auxiliary information.

The system of claim 9 or 10,
wherein generating the segmentation of the 3D point cloud data of the recorded media further comprises:
generating a plurality of segmentations of the 3D point cloud media for subsequently projecting and encoding each segmentation of the plurality of segmentations.

The system of claim 9 or 10,
wherein the first auxiliary information includes index data for each of the one or more patches, 2D data for each of the one or more patches, and 3D data for each of the one or more patches.

The system of claim 12,
wherein the index data for each of the one or more patches corresponds to a corresponding side of the three-dimensional bounding box.

The system of claim 12,
wherein the two-dimensional data for each of the one or more patches and the three-dimensional data for each of the one or more patches correspond to and are concatenated with a portion of the three-dimensional point cloud data.

The system of claim 9 or 10,
wherein generating the one or more patches of the 3D point cloud data based on the continuity data of the 3D point cloud data further comprises:
determining smoothness criteria of the 3D point cloud data from each direction;
comparing the smoothness criteria from each direction of the 3D point cloud data; and
in response to the comparison, selecting a direction of the smoothness criteria of the 3D point cloud data that has a larger projection area on a side of the bounding box.

The system of claim 9 or 10,
wherein generating the motion vector candidate between the first patch and the second patch further comprises:
determining a distance between the 2D data of the first auxiliary information and the 2D data of the second auxiliary information;
generating the motion vector candidate based on the distance between the 2D data of the first auxiliary information and the 2D data of the second auxiliary information; and
adding the motion vector candidate to a motion vector candidate list.

One or more non-transitory computer-readable storage media storing instructions executable by one or more processing devices which, upon execution, cause the one or more processing devices to perform operations comprising:
generating a segmentation of 3D point cloud data of recorded media based on continuity data of the 3D point cloud data;
projecting a representation of the segmented 3D point cloud data onto one or more sides of a 3D bounding box, wherein the representation of the segmented 3D point cloud data differs based on the projected side of the 3D bounding box;
in response to projecting the representation of the segmented 3D point cloud data onto the one or more sides of the 3D bounding box, generating one or more patches based on the projected representation of the segmented 3D point cloud data;
generating a first frame of the one or more patches;
generating first auxiliary information for the first frame;
generating second auxiliary information for a reference frame, the reference frame including one or more second patches, the reference frame being a previously encoded and transmitted frame;
identifying, based on the first auxiliary information and the second auxiliary information, a first patch from the first frame that matches a second patch of the one or more second patches of the reference frame;
generating a motion vector candidate between the first patch and the second patch based on a difference between the first auxiliary information and the second auxiliary information; and
performing motion compensation using the motion vector candidate.

The non-transitory computer-readable storage medium of claim 17,
wherein the reference frame is decoded to generate the second auxiliary information.

The non-transitory computer-readable storage medium of claim 17 or 18,
wherein generating the segmentation of the 3D point cloud data of the recorded media further comprises:
generating a plurality of segmentations of the 3D point cloud media for subsequently projecting and encoding each segmentation of the plurality of segmentations.

The method of claim 1, wherein the first auxiliary information for the first frame corresponds to patch metadata for a specific patch of the first frame.