KR20230123951A

KR20230123951A - Bidirectional optical flow in video coding

Info

Publication number: KR20230123951A
Application number: KR1020237020313A
Authority: KR
Inventors: Zhi Zhang; Han Huang; Chun-Chi Chen; Yan Zhang; Vadim Seregin; Marta Karczewicz
Original assignee: Qualcomm Incorporated
Priority date: 2020-12-22
Filing date: 2021-12-21
Publication date: 2023-08-24
Also published as: EP4268452A1; WO2022140377A1; JP2023553839A; TW202243475A

Abstract

A method of decoding video data includes: determining that bidirectional optical flow (BDOF) is enabled for a block of the video data; dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

Description

Bidirectional optical flow in video coding

This application claims priority to U.S. Application No. 17/645,233, filed December 20, 2021, and U.S. Provisional Application No. 63/129,190, filed December 22, 2020, the entire contents of each of which are incorporated herein by reference. U.S. Application No. 17/645,233 claims the benefit of U.S. Provisional Application No. 63/129,190, filed December 22, 2020.

Technical Field

This disclosure relates to video encoding and video decoding.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. By implementing such video coding techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

In general, this disclosure describes techniques for decoder-side motion vector derivation (e.g., template matching, bilateral matching, decoder-side motion vector (MV) refinement, and/or bidirectional optical flow (BDOF)). The techniques of this disclosure may be applied to any existing video codec, such as High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), or Essential Video Coding (EVC), or may serve as an efficient coding tool in any future video coding standard.

In one or more examples, for BDOF, a video encoder and a video decoder (e.g., a video coder) may be configured to selectively determine whether per-pixel BDOF is performed for subblocks of a block or whether BDOF is bypassed. That is, the video coder may select between performing per-pixel BDOF and bypassing per-pixel BDOF (or BDOF in general). In this way, the example techniques may promote selection between coding modes that, when combined, may provide better coding performance (e.g., where the video coder determines that one of per-pixel BDOF is performed for a subblock or BDOF is bypassed for the subblock).

Moreover, in some examples, determining whether to perform per-pixel BDOF or to bypass BDOF for a subblock may be based on determining a distortion value and comparing the distortion value to a threshold value. In some examples, the video coder may be configured to determine the distortion value in such a way that the calculations used to determine it can be reused by the video coder when performing per-pixel BDOF. For example, if the video coder performs per-pixel BDOF, the video coder may reuse, for performing per-pixel BDOF, the results of the calculations performed to determine the distortion value.

In one example, this disclosure describes a method of decoding video data, the method comprising: determining that bidirectional optical flow (BDOF) is enabled for a block of the video data; dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

In one example, this disclosure describes a device for decoding video data, the device comprising: a memory configured to store the video data; and processing circuitry coupled to the memory, the processing circuitry configured to: determine that bidirectional optical flow (BDOF) is enabled for a block of the video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

In one example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors to: determine that bidirectional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

In one example, this disclosure describes a device for decoding video data, the device comprising: means for determining that bidirectional optical flow (BDOF) is enabled for a block of the video data; means for dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; means for determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; means for determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; means for determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and means for reconstructing the block based on the prediction samples.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description, drawings, and claims.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
FIG. 3 is a block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
FIG. 4 is a block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
FIGS. 5A and 5B are conceptual diagrams illustrating examples of spatial neighboring motion vector candidates for merge mode and advanced motion vector predictor (AMVP) mode, respectively.
FIGS. 6A and 6B are conceptual diagrams illustrating examples of a temporal motion vector predictor (TMVP) candidate and motion vector scaling, respectively.
FIG. 7 is a conceptual diagram illustrating template matching performed on a search area around an initial motion vector (MV).
FIG. 8 is a conceptual diagram illustrating examples of motion vector differences that are proportional based on temporal distances.
FIG. 9 is a conceptual diagram illustrating examples of motion vector differences that are mirrored regardless of temporal distances.
FIG. 10 is a conceptual diagram illustrating an example of a 3x3 square search pattern in a search range of [-8, 8].
FIG. 11 is a conceptual diagram illustrating an example of decoder-side motion vector refinement.
FIG. 12 is a conceptual diagram illustrating an extended coding unit (CU) used in bidirectional optical flow (BDOF).
FIG. 13 is a flowchart illustrating an example process of per-pixel BDOF with subblock bypass.
FIG. 14 is a conceptual diagram illustrating an example of per-pixel BDOF of an 8x8 subblock.
FIG. 15 is a flowchart illustrating an example method for decoding a current block in accordance with the techniques of this disclosure.
FIG. 16 is a flowchart illustrating an example method for encoding a current block in accordance with the techniques of this disclosure.

A video encoder may be configured to generate a prediction block from one or more reference blocks in one or more reference pictures, using one or more motion vectors for a block. The video encoder determines a residual between the prediction block and the block, and signals information indicative of the residual and information used to determine the motion vector(s). A video decoder receives the information indicative of the residual and the information used to determine the motion vector(s). The video decoder determines the motion vector(s), determines the reference block(s) from the motion vector(s), and generates the prediction block. The video decoder adds the prediction block to the residual to reconstruct the block.

In some cases, the reference block and the prediction block are the same block. However, it is not required in all examples that the reference block and the prediction block be the same. In some examples, such as in bi-prediction, the video encoder and video decoder may determine a first reference block based on a first motion vector and a second reference block based on a second motion vector. The video encoder and video decoder may blend the first and second reference blocks to generate the prediction block.

Moreover, in some examples, the video encoder and video decoder may generate the prediction block based on adjustments to sample values of the first and second reference blocks. One example method for adjusting sample values to generate the samples of a prediction block is referred to as bidirectional optical flow (BDOF). For example, assume that I⁽⁰⁾(x,y) refers to the first reference block and I⁽¹⁾(x,y) refers to the second reference block. In BDOF, the prediction block may be considered as I⁽⁰⁾(x,y) plus I⁽¹⁾(x,y). As described below, the video encoder and video decoder may determine adjustment factors (i.e., b(x,y)) and add the adjustment factors to the prediction block as part of the process of determining prediction samples (i.e., I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y)). There may be additional scaling and offsetting of the result of I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y) to determine the prediction samples.
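As a rough illustration of the combination I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b(x,y) followed by scaling and offsetting, the sketch below adds the three terms per sample and then applies a rounding offset and a right shift. The function name `bdof_combine`, the shift of 1, and the rounding offset are assumptions chosen for illustration; the passage above only says that some additional scaling and offsetting may occur.

```python
def bdof_combine(i0, i1, b, shift=1):
    """Per-sample I0 + I1 + b, then a rounding offset and right shift.

    The shift and offset values are illustrative assumptions, not values
    taken from the disclosure.
    """
    offset = (1 << (shift - 1)) if shift > 0 else 0
    return [
        [(p0 + p1 + adj + offset) >> shift for p0, p1, adj in zip(r0, r1, rb)]
        for r0, r1, rb in zip(i0, i1, b)
    ]


# Hypothetical 2x2 reference blocks and adjustment factors.
i0 = [[100, 102], [104, 106]]
i1 = [[102, 104], [106, 108]]
b = [[0, 2], [-2, 0]]

pred = bdof_combine(i0, i1, b)
print(pred)
```

With all adjustment factors set to zero, the same function reduces to a rounded average of the two reference blocks.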

In BDOF, the video encoder and video decoder utilize a motion vector to determine adjustment factors (e.g., multiplied or added factors) for adjusting the sample values of the prediction block to generate the prediction samples. As one example, the video encoder and video decoder may generate the prediction samples by adding corresponding samples of the first reference block and the second reference block and the corresponding values generated from the motion refinement.

There may be various types of BDOF techniques. One example of BDOF is subblock BDOF, and another example is per-pixel BDOF. In subblock BDOF, the video encoder and video decoder determine a motion refinement (also referred to as refined motion) for a subblock. For subblock BDOF, the video encoder and video decoder use the same motion refinement to adjust the samples of the prediction block, where the prediction block may be generated from a first reference block and a second reference block (e.g., as a sum of the first reference block and the second reference block, or as a weighted average of the first reference block and the second reference block). In per-pixel BDOF, the video encoder and video decoder may determine motion refinement factors that may differ for two or more samples in the current block. For per-pixel BDOF, the video encoder and video decoder may use the motion refinements (also referred to as refined motions) determined for each sample to adjust the samples of the prediction block, which may be generated from the first reference block and the second reference block.
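The difference between the two variants is one of granularity, which the sketch below tries to make concrete: subblock BDOF applies a single refinement to every sample of the subblock, while per-pixel BDOF looks up a refinement for each sample. The adjustment form `vx * gx + vy * gy`, the gradient arrays, and all names are assumptions made for illustration only; this passage does not define how the adjustment is computed from the refinement.

```python
def adjustments(grad_x, grad_y, refinement_at):
    """Compute one adjustment per sample.

    refinement_at(x, y) returns a (vx, vy) motion refinement for the sample
    at column x, row y. The product form used here is an assumed placeholder.
    """
    out = []
    for y in range(len(grad_x)):
        row = []
        for x in range(len(grad_x[0])):
            vx, vy = refinement_at(x, y)
            row.append(vx * grad_x[y][x] + vy * grad_y[y][x])
        out.append(row)
    return out


# Hypothetical gradient values for a 2x2 subblock.
grad_x = [[1, 2], [3, 4]]
grad_y = [[1, 1], [1, 1]]

# Subblock BDOF: the same refinement is used for every sample.
subblock_adj = adjustments(grad_x, grad_y, lambda x, y: (2, 1))

# Per-pixel BDOF: the refinement may differ from sample to sample.
per_pixel_refinement = [[(2, 1), (0, 1)], [(1, 0), (2, -1)]]
per_pixel_adj = adjustments(
    grad_x, grad_y, lambda x, y: per_pixel_refinement[y][x]
)

print(subblock_adj)
print(per_pixel_adj)
```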

BDOF or other refinement techniques may be selectively enabled at the block level, but whether BDOF is applied at the subblock level may be inferred based on distortion values. For example, the video encoder may enable BDOF for a block and may signal information indicating that BDOF is enabled for the block.

In response, the video decoder may divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block. Although BDOF is enabled for the block, the video decoder may determine, on a subblock-by-subblock basis, whether BDOF is actually performed or bypassed. For example, the video decoder determines, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values.

In accordance with one or more examples described in this disclosure, the video decoder may determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks. For example, the video decoder may determine a first distortion value for a first subblock and determine, based on the first distortion value, that per-pixel BDOF is performed for the first subblock. The video decoder may determine a second distortion value for a second subblock and determine, based on the second distortion value, that BDOF is bypassed for the second subblock, and so forth.
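The per-subblock decision can be sketched as follows: split the block into subblocks, compute one distortion value per subblock, and pick per-pixel BDOF or bypass from that value. This is a minimal sketch under stated assumptions. Using the sum of absolute differences (SAD) between the two reference blocks as the distortion measure, bypassing when the distortion falls below the threshold, and the names `sad`, `crop`, and `choose_modes` are all assumptions; the text above states only that the decision is based on the respective distortion values.

```python
def sad(ref0, ref1):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for r0, r1 in zip(ref0, ref1) for a, b in zip(r0, r1))


def crop(block, x, y, size):
    """Extract the size x size subblock whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in block[y:y + size]]


def choose_modes(ref0, ref1, size, threshold):
    """Map each subblock origin (x, y) to 'per_pixel_bdof' or 'bypass'."""
    modes = {}
    for y in range(0, len(ref0), size):
        for x in range(0, len(ref0[0]), size):
            distortion = sad(crop(ref0, x, y, size), crop(ref1, x, y, size))
            # Assumed rule: low distortion means refinement is skipped.
            modes[(x, y)] = "per_pixel_bdof" if distortion >= threshold else "bypass"
    return modes


# Hypothetical 4x4 reference blocks split into 2x2 subblocks.
ref0 = [[10, 10, 50, 60],
        [10, 10, 70, 80],
        [10, 10, 10, 10],
        [10, 10, 10, 10]]
ref1 = [[10, 10, 40, 40],
        [10, 10, 40, 40],
        [10, 11, 10, 10],
        [10, 10, 10, 10]]

modes = choose_modes(ref0, ref1, size=2, threshold=4)
print(modes)
```

In this toy input only the top-right subblock differs enough between the two references to exceed the threshold, so it is the only one selected for per-pixel BDOF.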

In one or more examples, if the video decoder determines that BDOF is performed, the video decoder may perform per-pixel BDOF, and other BDOF techniques may not be available to the video decoder. That is, the video decoder may determine, on a subblock-by-subblock basis, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock. When BDOF is performed, the BDOF technique available to the video decoder may be per-pixel BDOF, and other BDOF techniques may not be available.

In one or more examples, as described above, the video decoder may determine distortion values for determining, on a subblock-by-subblock basis, whether per-pixel BDOF is performed or BDOF is bypassed. In some examples, as described in more detail below, the video decoder may reuse the calculations used to determine the distortion values when determining the per-pixel motion refinement for per-pixel BDOF. For example, for a first subblock, the video decoder may determine a first distortion value. Assume that, for the first subblock, the video decoder determined that per-pixel BDOF is enabled. In some examples, rather than recalculating all of the values needed to determine the per-pixel motion refinement, the video decoder may be configured to reuse, for determining the per-pixel motion refinement, the results of the calculations that the video decoder performed to determine that per-pixel BDOF is performed.

The video decoder may be configured to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed. For example, assume that per-pixel BDOF is performed for a subblock. In this example, the video decoder may generate the prediction samples for the subblock by refining the samples of a prediction block (e.g., a block generated by combining two reference blocks) based on the per-pixel motion refinement. As another example, assume that BDOF is bypassed for a subblock. In this example, the video decoder may not refine the samples of the prediction block to generate the prediction samples. Rather, the samples of the prediction block may be the same as the prediction samples (or possibly there may be some adjustment that is not based on BDOF). For example, when BDOF is bypassed, the video encoder and video decoder may generate the prediction samples by determining a weighted average of corresponding samples in the first reference block and the second reference block.
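For the bypass case, the weighted average of corresponding samples can be sketched as below. The integer weights, the round-to-nearest step, and the name `weighted_average` are assumptions for illustration; the passage above does not specify the weights or the rounding.

```python
def weighted_average(ref0, ref1, w0=1, w1=1):
    """Weighted average of corresponding samples, rounded to nearest.

    Integer weights and the rounding rule are illustrative assumptions.
    """
    total = w0 + w1
    return [
        [(w0 * a + w1 * b + total // 2) // total for a, b in zip(r0, r1)]
        for r0, r1 in zip(ref0, ref1)
    ]


# Hypothetical 2x2 reference subblocks.
ref0 = [[100, 104], [96, 90]]
ref1 = [[102, 104], [100, 93]]

equal_weights = weighted_average(ref0, ref1)
unequal_weights = weighted_average(ref0, ref1, w0=3, w1=1)
print(equal_weights)
print(unequal_weights)
```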

The video decoder may reconstruct the block based on the prediction samples. For example, the video decoder may receive residual values indicative of the difference between the prediction samples and the samples of the block, and add the residual values to the prediction samples to reconstruct the block. The examples above are described from the perspective of the video decoder. The video encoder may be configured to perform similar techniques. For example, the prediction samples generated by the video decoder should be the same as the prediction samples generated by the video encoder. Therefore, the video encoder may perform techniques similar to those described above to determine the prediction samples in the same way as the video decoder.
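The reconstruction step described above amounts to a per-sample addition of residual values to prediction samples. The sketch below shows that addition; the clipping to an 8-bit sample range and the name `reconstruct` are assumptions added so that the toy values stay within a valid range, and are not taken from this passage.

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add residuals to prediction samples and clip to the sample range.

    The clipping and the default bit depth are illustrative assumptions.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


# Hypothetical prediction samples and decoded residual values.
pred = [[101, 104], [250, 3]]
resid = [[2, -4], [10, -5]]

recon = reconstruct(pred, resid)
print(recon)
```

Because the encoder derives its residuals from its own prediction samples, the addition only recovers the block correctly if the decoder's prediction samples match the encoder's, which is why both sides need to make the same per-subblock BDOF decision.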

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to coding (encoding and/or decoding) video data. In general, video data includes any data for processing a video. Thus, video data may include raw, unencoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata, such as signaling data.

As shown in FIG. 1, system 100 includes, in this example, a source device 102 that provides encoded video data to be decoded and displayed by a destination device 116. In particular, source device 102 provides the video data to destination device 116 via a computer-readable medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, mobile devices, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, broadcast receiver devices, and the like. In some cases, source device 102 and destination device 116 may be equipped for wireless communication and thus may be referred to as wireless communication devices.

In the example of FIG. 1, source device 102 includes video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120, and display device 118. In accordance with this disclosure, video encoder 200 of source device 102 and video decoder 300 of destination device 116 may be configured to apply the techniques for decoder-side motion vector derivation, such as template matching, bilateral matching, decoder-side motion vector (MV) refinement, and bidirectional optical flow. Thus, source device 102 represents an example of a video encoding device, while destination device 116 represents an example of a video decoding device. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external camera. Likewise, destination device 116 may interface with an external display device, rather than including an integrated display device.

System 100 as shown in FIG. 1 is merely one example. In general, any digital video encoding and/or decoding device may perform techniques for decoder-side motion vector derivation techniques, such as template matching, bilateral matching, decoder-side motion vector (MV) refinement, and bi-directional optical flow (BDOF). Source device 102 and destination device 116 are merely examples of such coding devices in which source device 102 generates coded video data for transmission to destination device 116. This disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, video encoder 200 and video decoder 300 represent examples of coding devices, in particular, a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner such that each of source device 102 and destination device 116 includes video encoding and decoding components. Hence, system 100 may support one-way or two-way video transmission between source device 102 and destination device 116, e.g., for video streaming, video playback, video broadcasting, or video telephony.

In general, video source 104 represents a source of video data (i.e., raw, unencoded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which encodes data for the pictures. Video source 104 of source device 102 may include a video capture device, such as a video camera, a video archive containing previously captured raw video, and/or a video feed interface to receive video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In each case, video encoder 200 encodes the captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange the pictures from the received order (sometimes referred to as "display order") into a coding order for coding. Video encoder 200 may generate a bitstream including encoded video data. Source device 102 may then output the encoded video data via output interface 108 onto computer-readable medium 110 for reception and/or retrieval by, e.g., input interface 122 of destination device 116.
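As a non-limiting illustration of the rearrangement from display order to coding order described above, the following Python sketch assumes a simple structure in which each non-reference picture is coded only after the next reference picture in display order. The function name `display_to_coding_order` and the tuple layout are hypothetical and are not taken from this disclosure; an actual encoder may use a different, e.g., hierarchical, ordering.

```python
def display_to_coding_order(pictures):
    """Reorder pictures from display order into one possible coding order.

    pictures: list of (display_index, is_reference) tuples in display order.
    Assumption (illustrative only): a non-reference picture depends on the
    next reference picture in display order, so that reference picture is
    coded first and the held-back pictures follow it.
    """
    coding_order = []
    pending = []  # non-reference pictures waiting for their later reference
    for display_index, is_reference in pictures:
        if is_reference:
            coding_order.append(display_index)
            coding_order.extend(pending)
            pending = []
        else:
            pending.append(display_index)
    # Trailing pictures with no later reference are coded last.
    coding_order.extend(pending)
    return coding_order


# Display order: I0 B1 B2 P3 B4 B5 P6 (True marks a reference picture).
gop = [(0, True), (1, False), (2, False), (3, True),
       (4, False), (5, False), (6, True)]
print(display_to_coding_order(gop))
```

In this sketch, pictures 1 and 2 are held back until picture 3 has been placed in the coding order, which mirrors the idea that a decoder needs both reference pictures before it can perform bi-directional prediction.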

Memory 106 of source device 102 and memory 120 of destination device 116 represent general purpose memories. In some examples, memories 106, 120 may store raw video data, e.g., raw video from video source 104 and raw, decoded video data from video decoder 300. Additionally or alternatively, memories 106, 120 may store software instructions executable by, e.g., video encoder 200 and video decoder 300, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memories for functionally similar or equivalent purposes. Furthermore, memories 106, 120 may store encoded video data, e.g., output from video encoder 200 and input to video decoder 300. In some examples, portions of memories 106, 120 may be allocated as one or more video buffers, e.g., to store raw, decoded, and/or encoded video data.

Computer-readable medium 110 may represent any type of medium or device capable of transporting the encoded video data from source device 102 to destination device 116. In one example, computer-readable medium 110 represents a communication medium that enables source device 102 to transmit encoded video data directly to destination device 116 in real time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded video data, and input interface 122 may demodulate the received transmission signal, according to a communication standard, such as a wireless communication protocol. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 102 to destination device 116.

In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.

In some examples, source device 102 may output encoded video data to file server 114 or another intermediate storage device that may store the encoded video data generated by source device 102. Destination device 116 may access stored video data from file server 114 via streaming or download.

File server 114 may be any type of server device capable of storing encoded video data and transmitting that encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a server configured to provide a file transfer protocol service (such as File Transfer Protocol (FTP) or File Delivery over Unidirectional Transport (FLUTE) protocol), a content delivery network (CDN) device, a hypertext transfer protocol (HTTP) server, a Multimedia Broadcast Multicast Service (MBMS) or Enhanced MBMS (eMBMS) server, and/or a network attached storage (NAS) device. File server 114 may, additionally or alternatively, implement one or more HTTP streaming protocols, such as Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), Real Time Streaming Protocol (RTSP), HTTP Dynamic Streaming, or the like.

Destination device 116 may access encoded video data from file server 114 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both, that is suitable for accessing encoded video data stored on file server 114. Input interface 122 may be configured to operate according to any one or more of the various protocols discussed above for retrieving or receiving media data from file server 114, or other such protocols for retrieving media data.

Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded video data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), a Bluetooth™ standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices. For example, source device 102 may include an SoC device to perform the functionality attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device to perform the functionality attributed to video decoder 300 and/or input interface 122.

The techniques of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.

Input interface 122 of destination device 116 receives an encoded video bitstream from computer-readable medium 110 (e.g., a communication medium, storage device 112, file server 114, or the like). The encoded video bitstream may include signaling information defined by video encoder 200, which is also used by video decoder 300, such as syntax elements having values that describe characteristics and/or processing of video blocks or other coded units (e.g., slices, pictures, groups of pictures, sequences, or the like). Display device 118 displays decoded pictures of the decoded video data to a user. Display device 118 may represent any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Although not shown in FIG. 1, in some examples, video encoder 200 and video decoder 300 may each be integrated with an audio encoder and/or audio decoder, and may include appropriate MUX-DEMUX units, or other hardware and/or software, to handle multiplexed streams including both audio and video in a common data stream. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 200 and video decoder 300 each may be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. That is, there may be a computer-readable storage medium storing instructions that, when executed, cause one or more processors to perform the example techniques described in this disclosure. Each of video encoder 200 and video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. A device including video encoder 200 and/or video decoder 300 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.

The following describes video coding standards. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multi-view Video Coding (MVC) extensions. In addition, High Efficiency Video Coding (HEVC) or ITU-T H.265, including its range extension, multiview extension (MV-HEVC), and scalable extension (SHVC), has been developed by the Joint Collaboration Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). The HEVC specification is available from ITU-T H.265, "Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services-Coding of Moving Video, High efficiency Video Coding," The International Telecommunication Union, December 2016, 664 pages.

ITU-T VCEG (Q6/16) and ISO/IEC MPEG (JTC 1/SC 29/WG 11) are studying the standardization of future video coding technology with a compression capability that significantly exceeds that of the current HEVC standard (including its current extensions and near-term extensions for screen content coding and high-dynamic-range coding). The groups are working together on this exploration activity in a joint collaboration effort known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by their experts in this area. The latest version of the reference software, i.e., VVC Test Model 10 (VTM 10.0), can be downloaded from https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM.

Video encoder 200 and video decoder 300 may operate according to a video coding standard, such as ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), or extensions thereto, such as the multi-view and/or scalable video coding extensions. Alternatively, video encoder 200 and video decoder 300 may operate according to other proprietary or industry standards, such as ITU-T H.266, also referred to as Versatile Video Coding (VVC). A draft of the VVC standard is described in Bross et al., "Versatile Video Coding (Draft 10)," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 18th Meeting: by teleconference, 22 June - 1 July 2020, JVET-S2001-vA (hereinafter "VVC Draft 10"). Editorial refinements of VVC Draft 10 are described in Bross et al., "Versatile Video Coding Editorial Refinements on Draft 10," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 20th Meeting: by teleconference, 7-16 Oct. 2020, JVET-T2001-v2. An algorithm description of Versatile Video Coding and Test Model 10 (VTM 10.0) may be found in J. Chen, Y. Ye, and S. Kim, "Algorithm description for Versatile Video Coding and Test Model 11 (VTM 11)," JVET-T2002, Dec. 2020 (hereinafter "JVET-T2002"). The techniques of this disclosure, however, are not limited to any particular coding standard.

In general, video encoder 200 and video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure including data to be processed (e.g., encoded, decoded, or otherwise used in the encoding and/or decoding process). For example, a block may include a two-dimensional matrix of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, video encoder 200 and video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red hue and blue hue chrominance components. In some examples, video encoder 200 converts received RGB formatted data to a YUV representation prior to encoding, and video decoder 300 converts the YUV representation to the RGB format. Alternatively, pre- and post-processing units (not shown) may perform these conversions.
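As a non-limiting illustration of the conversion between an RGB format and a YUV (Y, Cb, Cr) representation mentioned above, the following Python sketch converts a single 8-bit sample in each direction. The coefficients are the commonly used full-range BT.601 values and are an assumption for illustration; this disclosure does not specify a conversion matrix, and the function names are hypothetical.

```python
def _clip8(value):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to (Y, Cb, Cr).

    Assumption: full-range BT.601 coefficients, with the chrominance
    components offset by 128 so they fit in an unsigned 8-bit value.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b   # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b   # red-difference chroma
    return _clip8(y), _clip8(cb), _clip8(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion, as a decoder or post-processing unit might apply."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clip8(r), _clip8(g), _clip8(b)


print(rgb_to_ycbcr(255, 255, 255))
print(ycbcr_to_rgb(*rgb_to_ycbcr(255, 255, 255)))
```

For neutral (gray-scale) samples, both chrominance components sit at the mid-point value of 128, so all of the information is carried by the luminance component.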

This disclosure may generally refer to coding (e.g., encoding and decoding) of pictures to include the process of encoding or decoding data of the picture. Similarly, this disclosure may refer to coding of blocks of a picture to include the process of encoding or decoding data for the blocks, e.g., prediction and/or residual coding. An encoded video bitstream generally includes a series of values for syntax elements representative of coding decisions (e.g., coding modes) and partitioning of pictures into blocks. Thus, references to coding a picture or a block should generally be understood as coding values for syntax elements forming the picture or block.

HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs). According to HEVC, a video coder (such as video encoder 200) partitions a coding tree unit (CTU) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. Nodes without child nodes may be referred to as "leaf nodes," and CUs of such leaf nodes may include one or more PUs and/or one or more TUs. The video coder may further partition PUs and TUs. For example, in HEVC, a residual quadtree (RQT) represents partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. CUs that are intra-predicted include intra-prediction information, such as an intra-mode indication.
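As a non-limiting illustration of the quadtree structure described above, the following Python sketch recursively partitions a square CTU so that every node has either zero or four child nodes, and returns the leaf nodes. The `should_split` callback stands in for an encoder's split decision (or a decoder's parsed split flag) and, like the function name and the minimum size of 8, is a hypothetical choice for this sketch.

```python
def quadtree_partition(x, y, size, should_split, min_size=8):
    """Return the leaf nodes (x, y, size) of a quadtree rooted at a CTU.

    Each node is either a leaf (zero children) or is divided into four
    equal, non-overlapping squares (four children).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):          # top row of children, then bottom row
            for dx in (0, half):      # left child, then right child
                leaves.extend(
                    quadtree_partition(x + dx, y + dy, half,
                                       should_split, min_size))
        return leaves
    return [(x, y, size)]


# Example decision: split the 64x64 CTU, then split only its top-left 32x32.
def example_decision(x, y, size):
    return size == 64 or (size == 32 and x == 0 and y == 0)


leaf_cus = quadtree_partition(0, 0, 64, example_decision)
for cu in leaf_cus:
    print(cu)
```

Because every split produces four equal squares, the leaf nodes always tile the CTU exactly, without overlap or gaps.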

As another example, video encoder 200 and video decoder 300 may be configured to operate according to VVC. According to VVC, a video coder (such as video encoder 200) partitions a picture into a plurality of coding tree units (CTUs). Video encoder 200 may partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a Multi-Type Tree (MTT) structure. The QTBT structure removes the concepts of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels: a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).

In an MTT partitioning structure, blocks may be partitioned using a quadtree (QT) partition, a binary tree (BT) partition, and one or more types of triple tree (TT) (also called ternary tree (TT)) partitions. A triple or ternary tree partition is a partition where a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., QT, BT, and TT) may be symmetrical or asymmetrical.
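As a non-limiting illustration of the QT, BT, and TT partition types described above, the following Python sketch splits a rectangular block into sub-blocks. The mode names and the 1:2:1 ratio used for the ternary split are assumptions for illustration (the ratio is the one commonly associated with VVC); with that ratio, neither ternary split boundary passes through the center of the block. Block dimensions are assumed to be multiples of 4.

```python
def split_block(x, y, w, h, mode):
    """Split block (x, y, w, h) into sub-blocks for one hypothetical mode.

    Modes: "QT" (four quadrants), "BT_VER"/"BT_HOR" (two halves),
    "TT_VER"/"TT_HOR" (three parts in an assumed 1:2:1 ratio).
    """
    if mode == "QT":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_VER":   # vertical split boundary: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_HOR":   # horizontal split boundary: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_VER":   # quarter, half, quarter of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_HOR":   # quarter, half, quarter of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


for mode in ("QT", "BT_VER", "BT_HOR", "TT_VER", "TT_HOR"):
    print(mode, split_block(0, 0, 32, 32, mode))
```

A multi-type tree can then be formed by applying `split_block` recursively, choosing a possibly different mode at each node.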

In some examples, video encoder 200 and video decoder 300 may use a single QTBT or MTT structure to represent each of the luminance and chrominance components, while in other examples, video encoder 200 and video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT/MTT structure for the luminance component and another QTBT/MTT structure for both chrominance components (or two QTBT/MTT structures for respective chrominance components).

Video encoder 200 and video decoder 300 may be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For purposes of explanation, the description of the techniques of this disclosure is presented with respect to QTBT partitioning. However, it should be understood that the techniques of this disclosure may also be applied to video coders configured to use quadtree partitioning, or other types of partitioning as well.

In some examples, a CTU includes a coding tree block (CTB) of luma samples and two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or of a picture that is coded using three separate color planes and syntax structures used to code the samples. A CTB may be an NxN block of samples for some value of N such that the division of a component into CTBs is a partitioning. A component is an array or a single sample from one of the three arrays (luma and two chroma) that compose a picture in 4:2:0, 4:2:2, or 4:4:4 color format, or the array or a single sample of the array that composes a picture in monochrome format. In some examples, a coding block is an MxN block of samples for some values of M and N such that the division of a CTB into coding blocks is a partitioning.
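As a non-limiting illustration of how the color formats named above relate the luma CTB to the corresponding chroma CTBs, the following Python sketch computes the chroma CTB dimensions for an NxN luma CTB. The subsampling factors are the conventional ones for each format (4:2:0 halves both dimensions, 4:2:2 halves only the width, 4:4:4 halves neither); the function and table names are hypothetical.

```python
# (horizontal, vertical) chroma subsampling factors per color format.
# A monochrome picture has no chroma arrays, which is represented by None.
CHROMA_SUBSAMPLING = {
    "4:2:0": (2, 2),
    "4:2:2": (2, 1),
    "4:4:4": (1, 1),
    "monochrome": None,
}


def chroma_ctb_size(luma_ctb_size, color_format):
    """Return (width, height) of each chroma CTB, or None for monochrome."""
    factors = CHROMA_SUBSAMPLING[color_format]
    if factors is None:
        return None
    sub_w, sub_h = factors
    return luma_ctb_size // sub_w, luma_ctb_size // sub_h


for fmt in CHROMA_SUBSAMPLING:
    print(fmt, chroma_ctb_size(128, fmt))
```

Under these assumptions, a CTU in a 4:2:0 picture with a 128x128 luma CTB carries two 64x64 chroma CTBs, whereas a monochrome CTU carries only the single CTB of samples.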

Blocks (e.g., CTUs or CUs) may be grouped in various ways in a picture. As one example, a brick may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of CTUs having a height equal to the height of the picture and a width specified by syntax elements (e.g., such as in a picture parameter set). A tile row refers to a rectangular region of CTUs having a height specified by syntax elements (e.g., such as in a picture parameter set) and a width equal to the width of the picture.

In some examples, a tile may be partitioned into multiple bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a true subset of a tile may not be referred to as a tile.

The bricks in a picture may also be arranged in a slice. A slice may be an integer number of bricks of a picture that may be exclusively contained in a single network abstraction layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a consecutive sequence of complete bricks of one tile.

This disclosure may use "N×N" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in terms of vertical and horizontal dimensions, e.g., 16×16 samples or 16 by 16 samples. In general, a 16×16 CU will have 16 samples in a vertical direction (y = 16) and 16 samples in a horizontal direction (x = 16). Likewise, an N×N CU generally has N samples in a vertical direction and N samples in a horizontal direction, where N represents a non-negative integer value. The samples in a CU may be arranged in rows and columns. Moreover, CUs need not necessarily have the same number of samples in the horizontal direction as in the vertical direction. For example, CUs may comprise N×M samples, where M is not necessarily equal to N.

Video encoder 200 encodes video data for CUs representing prediction and/or residual information, and other information. The prediction information indicates how the CU is to be predicted in order to form a prediction block for the CU. The residual information generally represents sample-by-sample differences between samples of the CU prior to encoding and the prediction block.

To predict a CU, video encoder 200 may generally form a prediction block for the CU through inter-prediction or intra-prediction. Inter-prediction generally refers to predicting the CU from data of a previously coded picture, whereas intra-prediction generally refers to predicting the CU from previously coded data of the same picture. To perform inter-prediction, video encoder 200 may generate the prediction block using one or more motion vectors. Video encoder 200 may generally perform a motion search to identify a reference block that closely matches the CU, e.g., in terms of differences between the CU and the reference block. Video encoder 200 may calculate a difference metric using a sum of absolute differences (SAD), a sum of squared differences (SSD), a mean absolute difference (MAD), a mean squared difference (MSD), or other such difference calculations to determine whether a reference block closely matches the current CU. In some examples, video encoder 200 may predict the current CU using uni-directional prediction or bi-directional prediction.
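The difference metrics named above can be illustrated with a minimal sketch. The function names `difference_metrics` and `best_match`, and the representation of blocks as lists of rows, are assumptions made for illustration only; they do not come from this disclosure or from any codec implementation.

```python
def difference_metrics(current, reference):
    """Return SAD, SSD, MAD, and MSD between two equally sized sample blocks."""
    if len(current) != len(reference) or any(
        len(a) != len(b) for a, b in zip(current, reference)
    ):
        raise ValueError("blocks must have the same dimensions")
    # Sample-by-sample differences, flattened in row order.
    diffs = [
        c - r
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    ]
    if not diffs:
        raise ValueError("blocks must not be empty")
    sad = sum(abs(d) for d in diffs)   # sum of absolute differences
    ssd = sum(d * d for d in diffs)    # sum of squared differences
    count = len(diffs)
    return {"SAD": sad, "SSD": ssd, "MAD": sad / count, "MSD": ssd / count}


def best_match(current, candidates, metric="SAD"):
    """Return the index of the candidate reference block with the lowest metric."""
    scores = [difference_metrics(current, cand)[metric] for cand in candidates]
    return min(range(len(candidates)), key=scores.__getitem__)
```

A real motion search would evaluate candidates at many displaced positions in a reference picture; the sketch only shows how the metric ranks candidates that are already given.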

Some examples of VVC also provide an affine motion compensation mode, which may be considered an inter-prediction mode. In affine motion compensation mode, video encoder 200 may determine two or more motion vectors that represent non-translational motion, such as zoom in or out, rotation, perspective motion, or other irregular motion types.

To perform intra-prediction, video encoder 200 may select an intra-prediction mode to generate the prediction block. Some examples of VVC provide sixty-seven intra-prediction modes, including various directional modes, as well as planar mode and DC mode. In general, video encoder 200 selects an intra-prediction mode that describes neighboring samples to a current block (e.g., a block of a CU) from which to predict samples of the current block. Such samples may generally be above, above and to the left, or to the left of the current block in the same picture as the current block, assuming video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom).

Video encoder 200 encodes data representing the prediction mode for a current block. For example, for inter-prediction modes, video encoder 200 may encode data representing which of the various available inter-prediction modes is used, as well as motion information for the corresponding mode. For uni-directional or bi-directional inter-prediction, for example, video encoder 200 may encode motion vectors using advanced motion vector prediction (AMVP) or merge mode. Video encoder 200 may use similar modes to encode motion vectors for affine motion compensation mode.

Following prediction, such as intra-prediction or inter-prediction of a block, video encoder 200 may calculate residual data for the block. The residual data, such as a residual block, represents sample-by-sample differences between the block and a prediction block for the block, formed using the corresponding prediction mode. Video encoder 200 may apply one or more transforms to the residual block to produce transformed data in a transform domain instead of the sample domain. For example, video encoder 200 may apply a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video data. Additionally, video encoder 200 may apply a secondary transform following the first transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve transform (KLT), or the like. Video encoder 200 produces transform coefficients following application of the one or more transforms.

As noted above, following any transforms to produce transform coefficients, video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. By performing the quantization process, video encoder 200 may reduce the bit depth associated with some or all of the transform coefficients. For example, video encoder 200 may round an n-bit value down to an m-bit value during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 may perform a bitwise right-shift of the value to be quantized.
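As a rough illustration of quantization by a bitwise right-shift, the sketch below drops the low-order bits of a coefficient magnitude and restores the scale on the decoder side. The helper names and the choice to shift the magnitude while preserving the sign are assumptions for illustration; practical codecs use quantization parameters, scaling, and rounding offsets that are not modeled here.

```python
def quantize_by_shift(coeff, shift):
    """Round an integer coefficient down in magnitude by dropping `shift` low bits."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    sign = -1 if coeff < 0 else 1
    # Shift the magnitude so negative values round toward zero as well.
    return sign * (abs(coeff) >> shift)


def dequantize_by_shift(level, shift):
    """Approximate inverse: restore the original scale (the dropped bits are lost)."""
    sign = -1 if level < 0 else 1
    return sign * (abs(level) << shift)
```

For instance, reducing an n-bit value to an m-bit value corresponds to `shift = n - m`; the information in the dropped bits cannot be recovered, which is why quantization is lossy.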

Following quantization, video encoder 200 may scan the transform coefficients, producing a one-dimensional vector from the two-dimensional matrix that includes the quantized transform coefficients. The scan may be designed to place higher energy (and therefore lower frequency) transform coefficients at the front of the vector and to place lower energy (and therefore higher frequency) transform coefficients at the back of the vector. In some examples, video encoder 200 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, video encoder 200 may perform an adaptive scan. After scanning the quantized transform coefficients to form the one-dimensional vector, video encoder 200 may entropy encode the one-dimensional vector, e.g., according to context-adaptive binary arithmetic coding (CABAC). Video encoder 200 may also entropy encode values for syntax elements describing metadata associated with the encoded video data for use by video decoder 300 in decoding the video data.
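The serialization step can be sketched with a simple diagonal scan that visits coefficients in order of increasing distance from the top-left (lowest-frequency) corner. This ordering is an illustrative assumption, not the scan order specified by HEVC or VVC, and `diagonal_scan` is a hypothetical helper name.

```python
def diagonal_scan(matrix):
    """Serialize a 2-D coefficient matrix along anti-diagonals, top-left first."""
    rows, cols = len(matrix), len(matrix[0])
    # Positions on the same anti-diagonal share r + c; ties are broken by row.
    positions = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: (rc[0] + rc[1], rc[0]),
    )
    return [matrix[r][c] for r, c in positions]
```

With typical quantized coefficients, the large low-frequency values end up at the front of the vector and the trailing zeros cluster at the back, which suits entropy coding.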

To perform CABAC, video encoder 200 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are zero-valued or not. The probability determination may be based on the context assigned to the symbol.

Video encoder 200 may further generate syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, for video decoder 300, e.g., in a picture header, a block header, a slice header, or other syntax data, such as a sequence parameter set (SPS), a picture parameter set (PPS), or a video parameter set (VPS). Video decoder 300 may likewise decode such syntax data to determine how to decode the corresponding video data.

In this manner, video encoder 200 may generate a bitstream including encoded video data, e.g., syntax elements describing the partitioning of a picture into blocks (e.g., CUs) and prediction and/or residual information for the blocks. Ultimately, video decoder 300 may receive the bitstream and decode the encoded video data.

In general, video decoder 300 performs a process reciprocal to that performed by video encoder 200 to decode the encoded video data of the bitstream. For example, video decoder 300 may decode values for syntax elements of the bitstream using CABAC in a manner substantially similar to, albeit reciprocal to, the CABAC encoding process of video encoder 200. The syntax elements may define partitioning information for partitioning a picture into CTUs, and for partitioning each CTU according to a corresponding partition structure, such as a QTBT structure, to define the CUs of the CTU. The syntax elements may further define prediction and residual information for blocks (e.g., CUs) of video data.

The residual information may be represented by, for example, quantized transform coefficients. Video decoder 300 may inverse quantize and inverse transform the quantized transform coefficients of a block to reproduce a residual block for the block. Video decoder 300 uses the signaled prediction mode (intra- or inter-prediction) and related prediction information (e.g., motion information for inter-prediction) to form a prediction block for the block. Video decoder 300 may then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. Video decoder 300 may perform additional processing, such as performing a deblocking process to reduce visual artifacts along the boundaries of the block.
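The sample-by-sample combination of a prediction block and a residual block can be sketched as follows. The function name `reconstruct_block` is hypothetical, and the clipping of each result to the valid range for the sample bit depth is an assumption reflecting common practice rather than something stated in the paragraph above.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to prediction samples and clip to the sample range."""
    max_value = (1 << bit_depth) - 1  # e.g., 255 for 8-bit samples
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

In-loop filtering such as deblocking would then operate on the reconstructed samples; that stage is not modeled here.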

In accordance with the techniques of this disclosure, video encoder 200 and video decoder 300 may be configured to perform bi-directional optical flow (BDOF). For example, video encoder 200 may be configured to perform BDOF as part of encoding a current block, and video decoder 300 may be configured to perform BDOF as part of decoding a current block.

As described in more detail, in some examples, a video coder (e.g., video encoder 200 and/or video decoder 300) may be configured to: divide an input block into a plurality of subblocks, where a size of the input block is less than or equal to a size of a coding unit; determine, based on a condition being satisfied, that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks; divide the subblock into a plurality of sub-subblocks; determine a refined motion vector for one or more of the sub-subblocks, where the refined motion vector for a sub-subblock of the one or more sub-subblocks is the same for a plurality of samples in the sub-subblock; and perform BDOF for the subblock based on the refined motion vector for the one or more sub-subblocks.

As another example, the video coder may be configured to: divide an input block into a plurality of subblocks, where a size of the input block is less than or equal to a size of a coding unit; determine, based on a condition being satisfied, that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks; divide the subblock into a plurality of sub-subblocks; determine a refined motion vector for each of one or more samples in the subblock; and perform BDOF for the subblock based on the refined motion vector for each of the one or more samples in the subblock.

For example, as described above, video encoder 200 or video decoder 300 may determine a refined motion vector for each of one or more samples in a subblock, and perform BDOF based on the refined motion vector for each of the one or more samples in the subblock. In this disclosure, performing BDOF based on a refined motion vector for each of one or more samples in a subblock is referred to as "per-pixel BDOF." For example, in per-pixel BDOF, rather than having one refined motion vector that is the same for all samples in the subblock, the refined motion vector for each sample in the subblock is determined separately.

A refined motion vector does not necessarily mean that the motion vector for the subblock is changed. Rather, the refined motion vector for a sample may be used to determine an amount by which a sample in the prediction block is adjusted to generate a prediction sample. For example, for a first sample of a first subblock, a first refined motion vector may indicate how much to adjust a first sample in the prediction block to generate a first prediction sample; for a second sample of the first subblock, a second refined motion vector may indicate how much to adjust a second sample in the prediction block to generate a second prediction sample; and so forth.

In accordance with one or more examples described in this disclosure, video encoder 200 and video decoder 300 may determine, based on respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of one or more subblocks of a block (e.g., an input block). For example, as described above, video encoder 200 and video decoder 300 may perform per-pixel BDOF based on a condition being satisfied. The condition may be satisfied when the distortion value for the subblock is greater than a threshold.

Accordingly, in some examples, the options for video encoder 200 and video decoder 300 may be limited to either performing per-pixel BDOF for a subblock or bypassing BDOF for the subblock, based on whether the distortion value for the subblock is greater than the threshold or less than or equal to the threshold. For example, in some techniques, it may have been possible for video encoder 200 and video decoder 300 to perform per-pixel BDOF, but not to determine, on a subblock-by-subblock basis, whether BDOF is bypassed. In some techniques where BDOF could be bypassed on a subblock-by-subblock basis, per-pixel BDOF may not have been available. With the example techniques described in this disclosure, video encoder 200 and video decoder 300 may be configured to selectively perform per-pixel BDOF or bypass BDOF, which may result in better video compression that appropriately balances decoding overhead.

In one or more examples, to encode or decode video data, respectively, video encoder 200 and video decoder 300 may be configured to determine that BDOF is enabled for a block of the video data, and to divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block, or more generally, when BDOF is enabled for the block. Video encoder 200 and video decoder 300 may determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values. Example ways of determining the respective distortion values are described in more detail below. Video encoder 200 and video decoder 300 may determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values, and determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed.
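The subblock-level decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this disclosure: the distortion value is assumed here to be the SAD between the two reference prediction blocks over the subblock (the disclosure describes its distortion measures elsewhere), the block dimensions are assumed to be multiples of the subblock size, and the names `subblock_sad`, `decide_bdof_per_subblock`, `"per_pixel_bdof"`, and `"bypass"` are hypothetical.

```python
def subblock_sad(pred0, pred1, y0, x0, size):
    """Assumed distortion measure: SAD between two predictions over one subblock."""
    return sum(
        abs(pred0[y][x] - pred1[y][x])
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )


def decide_bdof_per_subblock(pred0, pred1, sub_size, threshold):
    """Map each subblock origin (y, x) to 'per_pixel_bdof' or 'bypass'."""
    height, width = len(pred0), len(pred0[0])
    decisions = {}
    for y0 in range(0, height, sub_size):
        for x0 in range(0, width, sub_size):
            distortion = subblock_sad(pred0, pred1, y0, x0, sub_size)
            # Condition from the text: per-pixel BDOF when distortion exceeds
            # the threshold; otherwise BDOF is bypassed for this subblock.
            decisions[(y0, x0)] = (
                "per_pixel_bdof" if distortion > threshold else "bypass"
            )
    return decisions
```

The intuition behind such a rule is that when the two predictions already agree closely within a subblock, refinement is unlikely to change the result much, so skipping it saves computation.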

Video encoder 200 may determine residual values indicative of a difference between the prediction samples and samples of the block, and may signal the residual values. Video decoder 300 may receive residual values indicative of a difference between the prediction samples and samples of the block, and may add the residual values to the prediction samples to reconstruct the block. In some examples, to receive the residual values, video decoder 300 may be configured to receive information indicative of the residual values, from which video decoder 300 determines the residual values.

This disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded video data. That is, video encoder 200 may signal values for syntax elements in the bitstream. In general, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when storing syntax elements to storage device 112 for later retrieval by destination device 116.

FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure 130, and a corresponding coding tree unit (CTU) 132. The solid lines represent quadtree splitting, and dotted lines indicate binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting in this example. For the quadtree splitting, there is no need to indicate the splitting type, because quadtree nodes split a block horizontally and vertically into 4 subblocks with equal size. Accordingly, video encoder 200 may encode, and video decoder 300 may decode, syntax elements (such as splitting information) for a region tree level of QTBT structure 130 (i.e., the solid lines) and syntax elements (such as splitting information) for a prediction tree level of QTBT structure 130 (i.e., the dotted lines). Video encoder 200 may encode, and video decoder 300 may decode, video data, such as prediction and transform data, for CUs represented by terminal leaf nodes of QTBT structure 130.

In general, CTU 132 of FIG. 2B may be associated with parameters defining sizes of blocks corresponding to nodes of QTBT structure 130 at the first and second levels. These parameters may include a CTU size (representing a size of CTU 132 in samples), a minimum quadtree size (MinQTSize, representing a minimum allowed quadtree leaf node size), a maximum binary tree size (MaxBTSize, representing a maximum allowed binary tree root node size), a maximum binary tree depth (MaxBTDepth, representing a maximum allowed binary tree depth), and a minimum binary tree size (MinBTSize, representing the minimum allowed binary tree leaf node size).

The root node of a QTBT structure corresponding to a CTU may have four child nodes at the first level of the QTBT structure, each of which may be partitioned according to quadtree partitioning. That is, nodes of the first level are either leaf nodes (having no child nodes) or have four child nodes. The example of QTBT structure 130 represents such nodes as including the parent node and child nodes having solid lines for branches. If nodes of the first level are not larger than the maximum allowed binary tree root node size (MaxBTSize), the nodes can be further partitioned by respective binary trees. The binary tree splitting of one node can be iterated until the nodes resulting from the split reach the minimum allowed binary tree leaf node size (MinBTSize) or the maximum allowed binary tree depth (MaxBTDepth). The example of QTBT structure 130 represents such nodes as having dotted lines for branches. The binary tree leaf node is referred to as a coding unit (CU), which is used for prediction (e.g., intra-picture or inter-picture prediction) and transform, without any further partitioning. As discussed above, CUs may also be referred to as "video blocks" or "blocks."

In one example of the QTBT partitioning structure, the CTU size is set as 128x128 (luma samples and two corresponding 64x64 chroma samples), MinQTSize is set as 16x16, MaxBTSize is set as 64x64, MinBTSize (for both width and height) is set as 4, and MaxBTDepth is set as 4. Quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16x16 (i.e., MinQTSize) to 128x128 (i.e., the CTU size). If a quadtree leaf node is 128x128, the quadtree leaf node will not be further split by the binary tree, because its size exceeds MaxBTSize (i.e., 64x64 in this example). Otherwise, the quadtree leaf node will be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node for the binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (4 in this example), no further splitting is permitted. A binary tree node having a width equal to MinBTSize (4 in this example) implies that no further vertical splitting (that is, dividing of the width) is permitted for that binary tree node. Similarly, a binary tree node having a height equal to MinBTSize implies that no further horizontal splitting (that is, dividing of the height) is permitted for that binary tree node. As noted above, leaf nodes of the binary tree are referred to as CUs and are further processed according to prediction and transform without further partitioning.
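As an illustration of the constraints in the example above, the following is a minimal sketch that reports which binary tree splits remain permitted for a node. The names `QtbtParams` and `allowed_bt_splits` are hypothetical and do not appear in this disclosure; the default values are the ones from the example (MaxBTSize 64, MinBTSize 4, MaxBTDepth 4).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QtbtParams:
    # Values taken from the example in the text; the class itself is hypothetical.
    max_bt_size: int = 64   # MaxBTSize
    min_bt_size: int = 4    # MinBTSize (width and height)
    max_bt_depth: int = 4   # MaxBTDepth


def allowed_bt_splits(width, height, bt_depth, params=QtbtParams()):
    """Return the set of binary splits still permitted for a node."""
    # A node larger than MaxBTSize (e.g. a 128x128 quadtree leaf) gets no binary split.
    if max(width, height) > params.max_bt_size:
        return set()
    # No further splitting once MaxBTDepth is reached.
    if bt_depth >= params.max_bt_depth:
        return set()
    splits = set()
    # Width equal to MinBTSize forbids vertical splitting (dividing the width).
    if width > params.min_bt_size:
        splits.add("vertical")
    # Height equal to MinBTSize forbids horizontal splitting (dividing the height).
    if height > params.min_bt_size:
        splits.add("horizontal")
    return splits
```

For instance, a 4x16 node at binary tree depth 2 could only be split horizontally under these assumptions, while a 128x128 quadtree leaf could not be binary-split at all.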

FIG. 3 is a block diagram illustrating an example video encoder 200 that may perform the techniques of this disclosure. FIG. 3 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 200 according to the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video encoding devices that are configured for other video coding standards.

In the example of FIG. 3, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218, and entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. For instance, the units of video encoder 200 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.

Video data memory 230 may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (FIG. 1). DPB 218 may act as a reference picture memory that stores reference video data for use in prediction of subsequent video data by video encoder 200. Video data memory 230 and DPB 218 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or by separate memory devices. In various examples, video data memory 230 may be on-chip with other components of video encoder 200, as illustrated, or off-chip relative to those components.

In this disclosure, reference to video data memory 230 should not be interpreted as being limited to memory internal to video encoder 200, unless specifically described as such, or to memory external to video encoder 200, unless specifically described as such. Rather, reference to video data memory 230 should be understood as reference to memory that stores video data that video encoder 200 receives for encoding (e.g., video data for a current block that is to be encoded). Memory 106 of FIG. 1 may also provide temporary storage of outputs from the various units of video encoder 200.

The various units of FIG. 3 are illustrated to assist with understanding the operations performed by video encoder 200. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality and are preset as to the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For instance, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and/or programmable cores, formed from programmable circuits. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuits, memory 106 (FIG. 1) may store the instructions (e.g., object code) of the software that video encoder 200 receives and executes, or another memory within video encoder 200 (not shown) may store such instructions.

Video data memory 230 is configured to store received video data. Video encoder 200 may retrieve a picture of the video data from video data memory 230 and provide the video data to residual generation unit 204 and mode selection unit 202. Video data in video data memory 230 may be raw video data that is to be encoded.

Mode selection unit 202 includes motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226. Mode selection unit 202 may include additional functional units to perform video prediction in accordance with other prediction modes. As examples, mode selection unit 202 may include a palette unit, an intra-block copy unit (which may be part of motion estimation unit 222 and/or motion compensation unit 224), an affine unit, a linear model (LM) unit, or the like.

Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters may include partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, quantization parameters for residual data of the CUs, and so on. Mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than those of the other tested combinations.
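The selection among tested combinations can be sketched as follows. This disclosure does not state how rate and distortion are combined into a single value; the sketch assumes the commonly used Lagrangian cost J = D + λ·R, and the names `rd_cost`, `select_best`, `distortion`, and `rate_bits` are hypothetical.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (an assumed cost model)."""
    return distortion + lam * rate_bits


def select_best(candidates, lam):
    """Pick the tested parameter combination with the lowest assumed RD cost.

    Each candidate is a dict with hypothetical keys 'distortion' and 'rate_bits'.
    """
    return min(candidates,
               key=lambda c: rd_cost(c["distortion"], c["rate_bits"], lam))
```

Under this assumption, a larger λ favors combinations that spend fewer bits, while a smaller λ favors combinations with lower distortion.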

Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. Mode selection unit 202 may partition a CTU of the picture in accordance with a tree structure, such as the QTBT structure or the quadtree structure of HEVC described above. As described above, video encoder 200 may form one or more CUs from partitioning a CTU according to the tree structure. Such a CU may also be referred to generally as a "video block" or "block."

In general, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra-prediction unit 226) to generate a prediction block for a current block (e.g., a current CU or, in HEVC, the overlapping portion of a PU and a TU). For inter-prediction of a current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in DPB 218). In particular, motion estimation unit 222 may calculate a value representative of how similar a potential reference block is to the current block, e.g., according to sum of absolute differences (SAD), sum of squared differences (SSD), mean absolute difference (MAD), mean squared differences (MSD), or the like. Motion estimation unit 222 may generally perform these calculations using sample-by-sample differences between the current block and the reference block being considered. Motion estimation unit 222 may identify the reference block having the lowest value resulting from these calculations, indicating the reference block that most closely matches the current block.
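The motion search described above can be illustrated with a minimal sketch. It assumes pictures and blocks are lists of rows of integer samples and uses an exhaustive integer-sample search with SAD as the matching value; the function names and the search strategy are illustrative assumptions, not the method of motion estimation unit 222.

```python
def sad(a, b):
    """Sum of absolute sample-by-sample differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def ssd(a, b):
    """Sum of squared sample-by-sample differences between two equal-size blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def block_at(picture, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def full_search(cur_block, ref_picture, x0, y0, search_range):
    """Return ((dx, dy), cost) of the candidate with the lowest SAD.

    (x0, y0) is the position of the current block; candidates that fall
    outside the reference picture are skipped.
    """
    h, w = len(cur_block), len(cur_block[0])
    pic_h, pic_w = len(ref_picture), len(ref_picture[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > pic_w or y + h > pic_h:
                continue
            cost = sad(cur_block, block_at(ref_picture, x, y, w, h))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned displacement plays the role of an integer-precision motion vector; a practical encoder would typically use faster search patterns and fractional-sample refinement.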

Motion estimation unit 222 may form one or more motion vectors (MVs) that define the positions of the reference blocks in the reference pictures relative to the position of the current block in the current picture. Motion estimation unit 222 may then provide the motion vectors to motion compensation unit 224. For example, for uni-directional inter-prediction, motion estimation unit 222 may provide a single motion vector, whereas for bi-directional inter-prediction, motion estimation unit 222 may provide two motion vectors. Motion compensation unit 224 may then generate a prediction block using the motion vectors. For example, motion compensation unit 224 may retrieve data of the reference block using the motion vector. As another example, if the motion vector has fractional sample precision, motion compensation unit 224 may interpolate values for the prediction block according to one or more interpolation filters. Moreover, for bi-directional inter-prediction, motion compensation unit 224 may retrieve data for the two reference blocks identified by the respective motion vectors and combine the retrieved data, e.g., through sample-by-sample averaging or weighted averaging.
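The combination step for bi-directional inter-prediction can be sketched as below. The rounding offsets and the integer weights with a right shift are assumptions chosen for illustration; the disclosure only states that the two reference blocks may be combined through sample-by-sample averaging or weighted averaging.

```python
def bi_average(pred0, pred1):
    """Sample-by-sample average of two prediction blocks, rounded to nearest."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]


def bi_weighted(pred0, pred1, w0, w1, shift):
    """Weighted average with integer weights, assuming w0 + w1 == 1 << shift."""
    rounding = 1 << (shift - 1)
    return [[(w0 * a + w1 * b + rounding) >> shift for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]
```

With `w0 = w1 = 1` and `shift = 1`, the weighted form reduces to the plain average.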

As another example, for intra-prediction, or intra-prediction coding, intra-prediction unit 226 may generate the prediction block from samples neighboring the current block. For example, for directional modes, intra-prediction unit 226 may generally mathematically combine values of neighboring samples and populate these calculated values in the defined direction across the current block to produce the prediction block. As another example, for DC mode, intra-prediction unit 226 may calculate an average of the neighboring samples to the current block and generate the prediction block to include this resulting average for each sample of the prediction block.
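The DC mode described above can be sketched as follows, assuming a square block whose neighboring samples are the row above and the column to the left. The function name `dc_predict` and the rounding rule are illustrative assumptions.

```python
def dc_predict(top, left, size):
    """Fill a size x size prediction block with the rounded average of the
    `size` neighboring samples above and the `size` samples to the left."""
    total = sum(top[:size]) + sum(left[:size])
    dc = (total + size) // (2 * size)  # integer average, rounded to nearest
    return [[dc] * size for _ in range(size)]
```

For a 4x4 block with all top neighbors equal to 10 and all left neighbors equal to 20, every sample of the prediction block is 15.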

Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives a raw, unencoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define a residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block to generate a residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.

In examples where mode selection unit 202 partitions CUs into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 may support PUs having various sizes. As indicated above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of a luma prediction unit of the PU. Assuming that the size of a particular CU is 2Nx2N, video encoder 200 may support PU sizes of 2Nx2N or NxN for intra-prediction, and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN, or similar for inter-prediction. Video encoder 200 and video decoder 300 may also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter-prediction.

In examples where mode selection unit 202 does not further partition a CU into PUs, each CU may be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU may refer to the size of the luma coding block of the CU. Video encoder 200 and video decoder 300 may support CU sizes of 2Nx2N, 2NxN, or Nx2N.

For other video coding techniques, such as intra-block copy mode coding, affine-mode coding, and linear model (LM) mode coding, as some examples, mode selection unit 202, via the respective units associated with the coding techniques, generates a prediction block for the current block being encoded. In some examples, such as palette mode coding, mode selection unit 202 may not generate a prediction block and may instead generate syntax elements that indicate the manner in which to reconstruct the block based on a selected palette. In such modes, mode selection unit 202 may provide these syntax elements to entropy encoding unit 220 to be encoded.

As described above, residual generation unit 204 receives the video data for the current block and the corresponding prediction block. Residual generation unit 204 then generates a residual block for the current block. To generate the residual block, residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block.

Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to a residual block. In some examples, transform processing unit 206 may perform multiple transforms on a residual block, e.g., a primary transform and a secondary transform, such as a rotational transform. In some examples, transform processing unit 206 does not apply transforms to a residual block.
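As one illustration of a transform that could be applied to a residual block, the following sketch computes a separable, orthonormal floating-point 2D DCT-II. This is a textbook formulation chosen as an assumption for clarity; practical codecs generally use integer approximations, and the function names are hypothetical.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[k] holds column k of the result; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]
```

A flat residual block concentrates all of its energy in the single DC coefficient, which is the property that makes the subsequent quantization and entropy coding effective.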

Quantization unit 208 may quantize the transform coefficients in a transform coefficient block to produce a quantized transform coefficient block. Quantization unit 208 may quantize transform coefficients of a transform coefficient block according to a quantization parameter (QP) value associated with the current block. Video encoder 200 (e.g., via mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient blocks associated with the current block by adjusting the QP value associated with the CU. Quantization may introduce loss of information, and thus, quantized transform coefficients may have lower precision than the original transform coefficients produced by transform processing unit 206.
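The loss of precision introduced by quantization can be seen in a small sketch. It assumes a uniform scalar quantizer whose step size doubles for every increase of 6 in QP, roughly as in HEVC-style designs (step ≈ 2^((QP − 4) / 6)); this mapping and the function names are assumptions for illustration, not the behavior of quantization unit 208.

```python
import math


def qstep(qp):
    """Approximate quantization step size; doubles every 6 QP (assumed mapping)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (round half away from zero)."""
    step = qstep(qp)
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels, qp):
    """Scale integer levels back to approximate coefficient values."""
    step = qstep(qp)
    return [level * step for level in levels]
```

At QP 10 the assumed step size is 2, so a coefficient of -37 is reconstructed as -38 and a coefficient of 5 as 6; a higher QP would enlarge the step and the reconstruction error.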

Inverse quantization unit 210 and inverse transform processing unit 212 may apply inverse quantization and inverse transforms to a quantized transform coefficient block, respectively, to reconstruct a residual block from the transform coefficient block. Reconstruction unit 214 may produce a reconstructed block corresponding to the current block (albeit potentially with some degree of distortion) based on the reconstructed residual block and the prediction block generated by mode selection unit 202. For example, reconstruction unit 214 may add samples of the reconstructed residual block to corresponding samples from the prediction block generated by mode selection unit 202 to produce the reconstructed block.
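Residual generation and reconstruction are inverse sample-wise operations, which the following minimal sketch illustrates. Clipping the sum to the valid sample range for an assumed bit depth is a common practice added here as an assumption; the function names are hypothetical.

```python
def residual_block(cur, pred):
    """Sample-by-sample difference between the current block and its prediction."""
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def reconstruct(pred, residual, bit_depth=8):
    """Add the (reconstructed) residual to the prediction and clip to the
    sample range [0, 2**bit_depth - 1] (clipping is an assumption)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(rp, rr)]
            for rp, rr in zip(pred, residual)]
```

When the residual passes through transform and quantization unchanged, reconstruction returns the current block exactly; with quantization, the reconstructed block differs by the quantization error.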

Filter unit 216 may perform one or more filter operations on reconstructed blocks. For example, filter unit 216 may perform deblocking operations to reduce blockiness artifacts along edges of CUs. In some examples, the operations of filter unit 216 may be skipped.

Video encoder 200 stores reconstructed blocks in DPB 218. For instance, in examples where the operations of filter unit 216 are not performed, reconstruction unit 214 may store the reconstructed blocks to DPB 218. In examples where the operations of filter unit 216 are performed, filter unit 216 may store the filtered reconstructed blocks to DPB 218. Motion estimation unit 222 and motion compensation unit 224 may retrieve a reference picture from DPB 218, formed from the reconstructed (and potentially filtered) blocks, to inter-predict blocks of subsequently encoded pictures. In addition, intra-prediction unit 226 may use reconstructed blocks of the current picture in DPB 218 to intra-predict other blocks in the current picture.

In general, entropy encoding unit 220 may entropy encode syntax elements received from other functional components of video encoder 200. For example, entropy encoding unit 220 may entropy encode quantized transform coefficient blocks from quantization unit 208. As another example, entropy encoding unit 220 may entropy encode prediction syntax elements (e.g., motion information for inter-prediction or intra-mode information for intra-prediction) from mode selection unit 202. Entropy encoding unit 220 may perform one or more entropy encoding operations on the syntax elements, which are another example of video data, to generate entropy-encoded data. For example, entropy encoding unit 220 may perform a context-adaptive variable length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, an Exponential-Golomb encoding operation, or another type of entropy encoding operation on the data. In some examples, entropy encoding unit 220 may operate in bypass mode, in which syntax elements are not entropy encoded.
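Of the entropy encoding operations listed above, Exponential-Golomb coding is simple enough to sketch in a few lines. The example below implements the order-0 code for unsigned integers, with the bitstream represented as a string of '0' and '1' characters for readability; that representation and the function names are illustrative assumptions.

```python
def ue_encode(value):
    """Order-0 Exp-Golomb code of a non-negative integer as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    code = bin(value + 1)[2:]              # binary representation of value + 1
    return "0" * (len(code) - 1) + code    # prefix of leading zeros, then the code


def ue_decode(bits):
    """Decode one codeword from the start of `bits`.

    Returns (value, number_of_bits_consumed).
    """
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = 2 * zeros + 1
    return int(bits[zeros:length], 2) - 1, length
```

Small values receive short codewords (0 maps to `1`, 3 maps to `00100`), which suits syntax elements whose small values are the most probable.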

Video encoder 200 may output a bitstream that includes the entropy encoded syntax elements needed to reconstruct blocks of a slice or picture. In particular, entropy encoding unit 220 may output the bitstream.

The operations described above are described with respect to a block. Such description should be understood as being operations for a luma coding block and/or chroma coding blocks. As described above, in some examples, the luma coding block and chroma coding blocks are luma and chroma components of a CU. In some examples, the luma coding block and the chroma coding blocks are luma and chroma components of a PU.

In some examples, operations performed with respect to a luma coding block need not be repeated for the chroma coding blocks. As one example, operations to identify a motion vector (MV) and reference picture for a luma coding block need not be repeated for identifying an MV and reference picture for the chroma blocks. Rather, the MV for the luma coding block may be scaled to determine the MV for the chroma blocks, and the reference picture may be the same. As another example, the intra-prediction process may be the same for the luma coding block and the chroma coding blocks.

Video encoder 200 represents an example of a device configured to encode video data, the device including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to: divide an input block into a plurality of subblocks, wherein a size of the input block is less than or equal to a size of a coding unit; determine that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks based on a condition being satisfied; divide the subblock into a plurality of sub-subblocks; determine a refined motion vector for one or more of the sub-subblocks, wherein the refined motion vector for a sub-subblock of the one or more sub-subblocks is the same for a plurality of samples in the sub-subblock; and perform BDOF for the subblock based on the refined motion vector for the one or more sub-subblocks.

As another example, the one or more processing units implemented in circuitry may be configured to: divide an input block into a plurality of subblocks, wherein a size of the input block is less than or equal to a size of a coding unit; determine that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks based on a condition being satisfied; divide the subblock into a plurality of sub-subblocks; determine a refined motion vector for each of one or more samples in the subblock; and perform BDOF for the subblock based on the refined motion vector for each of the one or more samples in the subblock.

As yet another example, the processing circuitry of video encoder 200 may be configured to: determine that bi-directional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, for each subblock of the one or more subblocks of the plurality of subblocks, that one of per-pixel BDOF is performed or BDOF is bypassed based on the respective distortion values; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; determine residual values indicative of a difference between the prediction samples and the block; and signal information indicative of the residual values.
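The per-subblock decision in the preceding example can be sketched as follows. This passage does not specify the distortion measure or the decision rule, so the sketch makes two labeled assumptions: the distortion value is the SAD between the two prediction signals of a bi-predicted block, and BDOF is bypassed when that value falls below a threshold. The names `subblock_bdof_decisions`, `sub_size`, and `threshold` are hypothetical.

```python
def subblock_bdof_decisions(pred0, pred1, sub_size, threshold):
    """Return {(x, y): decision} for each subblock with top-left sample (x, y).

    Assumptions (not stated in this passage): distortion is the SAD between
    the two prediction signals, and a distortion below `threshold` means
    BDOF is bypassed for that subblock.
    """
    height, width = len(pred0), len(pred0[0])
    decisions = {}
    for y in range(0, height, sub_size):
        for x in range(0, width, sub_size):
            distortion = sum(
                abs(pred0[j][i] - pred1[j][i])
                for j in range(y, min(y + sub_size, height))
                for i in range(x, min(x + sub_size, width))
            )
            decisions[(x, y)] = ("bypass" if distortion < threshold
                                 else "per_pixel_bdof")
    return decisions
```

Prediction samples for a bypassed subblock would then come from the ordinary bi-prediction combination, while the remaining subblocks would go through per-pixel BDOF refinement, which is not shown here.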

FIG. 4 is a block diagram illustrating an example video decoder 300 that may perform the techniques of this disclosure. FIG. 4 is provided for purposes of explanation and is not limiting on the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video decoder 300 according to the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video coding devices that are configured for other video coding standards.

In the example of FIG. 4, video decoder 300 includes coded picture buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and decoded picture buffer (DPB) 314. Any or all of CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, and DPB 314 may be implemented in one or more processors or in processing circuitry. For example, the units of video decoder 300 may be implemented as one or more circuits or logic elements as part of hardware circuitry, or as part of a processor, ASIC, or FPGA. Moreover, video decoder 300 may include additional or alternative processors or processing circuitry to perform these and other functions.

Prediction processing unit 304 includes motion compensation unit 316 and intra-prediction unit 318. Prediction processing unit 304 may include additional units to perform prediction in accordance with other prediction modes. As examples, prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may form part of motion compensation unit 316), an affine unit, a linear model (LM) unit, or the like. In other examples, video decoder 300 may include more, fewer, or different functional components.

CPB memory 320 may store video data, such as an encoded video bitstream, to be decoded by the components of video decoder 300. The video data stored in CPB memory 320 may be obtained, for example, from computer-readable medium 110 (FIG. 1). CPB memory 320 may include a CPB that stores encoded video data (e.g., syntax elements) from an encoded video bitstream. CPB memory 320 may also store video data other than syntax elements of a coded picture, such as temporary data representing outputs from the various units of video decoder 300. DPB 314 generally stores decoded pictures, which video decoder 300 may output and/or use as reference video data when decoding subsequent data or pictures of the encoded video bitstream. CPB memory 320 and DPB 314 may be formed by any of a variety of memory devices, such as DRAM (including SDRAM), MRAM, RRAM, or other types of memory devices. CPB memory 320 and DPB 314 may be provided by the same memory device or by separate memory devices. In various examples, CPB memory 320 may be on-chip with other components of video decoder 300, or off-chip relative to those components.

Additionally or alternatively, in some examples, video decoder 300 may retrieve coded video data from memory 120 (FIG. 1). That is, memory 120 may store data as discussed above with respect to CPB memory 320. Likewise, memory 120 may store instructions to be executed by video decoder 300 when some or all of the functionality of video decoder 300 is implemented in software to be executed by processing circuitry of video decoder 300.

The various units shown in FIG. 4 are illustrated to assist with understanding the operations performed by video decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. As with FIG. 3, fixed-function circuits refer to circuits that provide particular functionality and are preset with respect to the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and that provide flexible functionality in the operations that can be performed. For example, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.

Video decoder 300 may include ALUs, EFUs, digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of video decoder 300 are performed by software executing on the programmable circuits, on-chip or off-chip memory may store the instructions (e.g., object code) of the software that video decoder 300 receives and executes.

Entropy decoding unit 302 may receive encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. Prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, and filter unit 312 may generate decoded video data based on the syntax elements extracted from the bitstream.

In general, video decoder 300 reconstructs a picture on a block-by-block basis. Video decoder 300 may perform a reconstruction operation on each block individually (where the block currently being reconstructed, i.e., decoded, may be referred to as a "current block").

Entropy decoding unit 302 may entropy decode syntax elements defining the quantized transform coefficients of a quantized transform coefficient block, as well as transform information, such as a quantization parameter (QP) and/or transform mode indication(s). Inverse quantization unit 306 may use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization unit 306 to apply. Inverse quantization unit 306 may, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. Inverse quantization unit 306 may thereby form a transform coefficient block including transform coefficients.
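As a rough illustration of the left-shift form of inverse quantization mentioned above, the following Python sketch scales each quantized coefficient by a power of two. The function name `inverse_quantize` and the mapping from QP to a shift amount (`qp // 6`) are assumptions made only for illustration; the actual scaling rule is defined by the applicable coding standard.

```python
def inverse_quantize(quantized_coeffs, qp):
    """Scale quantized transform coefficients by a power of two.

    Hypothetical simplification: the shift is taken as qp // 6, ignoring the
    per-remainder scaling tables that real codecs apply.
    """
    shift = qp // 6  # assumed mapping from QP to a shift amount
    # Python's << on negative integers multiplies by 2**shift, so the sign
    # of each coefficient is preserved.
    return [[c << shift for c in row] for row in quantized_coeffs]


quantized_block = [[3, -1], [0, 2]]
coeff_block = inverse_quantize(quantized_block, qp=12)
```

The resulting `coeff_block` plays the role of the transform coefficient block that is passed on to the inverse transform stage.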

After inverse quantization unit 306 forms the transform coefficient block, inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block.

Furthermore, prediction processing unit 304 generates a prediction block according to prediction information syntax elements that were entropy decoded by entropy decoding unit 302. For example, if the prediction information syntax elements indicate that the current block is inter-predicted, motion compensation unit 316 may generate the prediction block. In this case, the prediction information syntax elements may indicate a reference picture in DPB 314 from which to retrieve a reference block, as well as a motion vector identifying a location of the reference block in the reference picture relative to the location of the current block in the current picture. Motion compensation unit 316 may generally perform the inter-prediction process in a manner that is substantially similar to that described with respect to motion compensation unit 224 (FIG. 3).

As another example, if the prediction information syntax elements indicate that the current block is intra-predicted, intra-prediction unit 318 may generate the prediction block according to an intra-prediction mode indicated by the prediction information syntax elements. Again, intra-prediction unit 318 may generally perform the intra-prediction process in a manner that is substantially similar to that described with respect to intra-prediction unit 226 (FIG. 3). Intra-prediction unit 318 may retrieve data of neighboring samples of the current block from DPB 314.

Reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, reconstruction unit 310 may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.
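The sample-wise addition described above can be sketched as follows. This is a minimal illustration, not the decoder's actual implementation: the function name `reconstruct_block` is hypothetical, and the clipping of the sum to the valid sample range for an assumed bit depth is an added assumption that the text does not state.

```python
def reconstruct_block(pred_block, resid_block, bit_depth=8):
    """Add residual samples to co-located prediction samples.

    The clip to [0, 2**bit_depth - 1] is an assumption for illustration.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred_block, resid_block)
    ]


pred = [[100, 250], [5, 0]]
resid = [[-10, 20], [-9, 3]]
recon = reconstruct_block(pred, resid)
```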

Filter unit 312 may perform one or more filter operations on reconstructed blocks. For example, filter unit 312 may perform deblocking operations to reduce blockiness artifacts along edges of the reconstructed blocks. The operations of filter unit 312 are not necessarily performed in all examples.

Video decoder 300 may store the reconstructed blocks in DPB 314. For instance, in examples where the operations of filter unit 312 are not performed, reconstruction unit 310 may store the reconstructed blocks to DPB 314. In examples where the operations of filter unit 312 are performed, filter unit 312 may store the filtered reconstructed blocks to DPB 314. As discussed above, DPB 314 may provide reference information, such as samples of the current picture for intra-prediction and previously decoded pictures for subsequent motion compensation, to prediction processing unit 304. Moreover, video decoder 300 may output decoded pictures (e.g., decoded video) from DPB 314 for subsequent presentation on a display device, such as display device 118 of FIG. 1.

In this manner, video decoder 300 represents an example of a video decoding device including a memory configured to store video data, and one or more processing units implemented in circuitry and configured to: divide an input block into a plurality of subblocks, wherein a size of the input block is less than or equal to a size of a coding unit; determine, based on a condition being satisfied, that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks; divide the subblock into a plurality of sub-subblocks; determine a refined motion vector for one or more of the sub-subblocks, wherein the refined motion vector for a sub-subblock of the one or more sub-subblocks is the same for a plurality of samples in the sub-subblock; and perform BDOF for the subblock based on the refined motion vector for the one or more sub-subblocks.

As another example, processing circuitry of video decoder 300 (e.g., motion compensation unit 316) may be configured to: determine that bi-directional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples. For example, the processing circuitry may receive residual values indicative of a difference between the prediction samples and samples of the block, and add the residual values to the prediction samples to reconstruct the block.
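The per-subblock decision described above might be sketched as follows. This is only an illustration under stated assumptions: the distortion value is taken to be the sum of absolute differences (SAD) between the two reference predictions, and BDOF is bypassed when that SAD falls below a threshold. The function names, the SAD choice, the threshold value, and the requirement that the block dimensions be multiples of the subblock dimensions are all hypothetical and are not taken from this passage.

```python
def subblock_sad(pred0, pred1, x0, y0, w, h):
    """SAD between two prediction signals over one subblock."""
    return sum(
        abs(pred0[y][x] - pred1[y][x])
        for y in range(y0, y0 + h)
        for x in range(x0, x0 + w)
    )


def bdof_decisions(pred0, pred1, sub_w, sub_h, threshold):
    """Map each subblock origin (x, y) to 'per_pixel' or 'bypass'.

    Assumes the block width/height are multiples of sub_w/sub_h.
    """
    height, width = len(pred0), len(pred0[0])
    decisions = {}
    for y in range(0, height, sub_h):
        for x in range(0, width, sub_w):
            distortion = subblock_sad(pred0, pred1, x, y, sub_w, sub_h)
            # Assumed rule: small distortion means refinement is skipped.
            decisions[(x, y)] = "bypass" if distortion < threshold else "per_pixel"
    return decisions


pred_l0 = [[0, 0, 0, 0] for _ in range(4)]
pred_l1 = [[0, 0, 5, 5] for _ in range(4)]
modes = bdof_decisions(pred_l0, pred_l1, sub_w=2, sub_h=4, threshold=8)
```

In this toy input the left subblock has identical predictions and is bypassed, while the right subblock differs and is marked for per-pixel BDOF.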

The following describes CU structure and motion vector prediction in HEVC. The following may provide additional context to the above description of CUs and motion vector prediction, and may include some repetition of that description to aid understanding.

In HEVC, the largest coding unit in a slice is called a coding tree block (CTB) or coding tree unit (CTU). A CTB contains a quad-tree, the nodes of which are coding units. The size of a CTB can range from 16x16 to 64x64 in the HEVC main profile (although technically 8x8 CTB sizes can be supported). A coding unit (CU) can be the same size as a CTB and as small as 8x8. Each coding unit is coded with one mode, i.e., inter or intra. When a CU is inter coded, the CU may be further partitioned into 2 or 4 prediction units (PUs), or may be just one PU when no further partitioning applies. When two PUs are present in one CU, the PUs can be half-size rectangles or two rectangles that are ¼ and ¾ the size of the CU. When the CU is inter coded, each PU has one set of motion information, which is derived with a unique inter-prediction mode.

The following describes motion vector prediction. In the HEVC standard, there are two inter-prediction modes for a prediction unit (PU), named merge mode (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively.

In either AMVP mode or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector(s), as well as the reference indices in merge mode, of the current PU are generated by taking one candidate from the MV candidate list.

The MV candidate list contains up to 5 candidates for merge mode and only 2 candidates for AMVP mode. A merge candidate may contain a set of motion information, e.g., motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures used for the prediction of the current block, as well as the associated motion vectors, are determined. In AMVP mode, on the other hand, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MV predictor (MVP) index into the MV candidate list, because an AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks.

The following describes spatial neighboring candidates. For example, FIGS. 5A and 5B are conceptual diagrams illustrating examples of spatial neighboring motion vector candidates for merge mode and advanced motion vector predictor (AMVP) mode, respectively.

Spatial MV candidates are derived from the neighboring blocks shown in FIGS. 5A and 5B for a specific PU (PU0) 500, although the methods of generating the candidates from the blocks differ for merge and AMVP modes. In merge mode, up to four spatial MV candidates can be derived in the numbered order shown in FIG. 5A, which is as follows: left (0, A1), above (1, B1), above-right (2, B0), below-left (3, A0), and above-left (4, B2).
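The checking order above can be sketched in a few lines of Python. This is a simplified illustration: the dictionary-based representation of neighbors and the name `spatial_merge_candidates` are assumptions, and availability checks and pruning between neighbors are reduced to a simple `None` test.

```python
# Checking order from FIG. 5A: left, above, above-right, below-left, above-left.
MERGE_SPATIAL_ORDER = ["A1", "B1", "B0", "A0", "B2"]
MAX_SPATIAL_MERGE_CANDIDATES = 4


def spatial_merge_candidates(neighbors):
    """Collect up to four spatial candidates in the FIG. 5A order.

    `neighbors` maps a position name to its motion information, or to None
    when that neighbor is unavailable (hypothetical representation).
    """
    candidates = []
    for position in MERGE_SPATIAL_ORDER:
        motion = neighbors.get(position)
        if motion is None:
            continue  # unavailable neighbor: skip it
        candidates.append(motion)
        if len(candidates) == MAX_SPATIAL_MERGE_CANDIDATES:
            break
    return candidates


neighbors = {"A1": None, "B1": (1, 0), "B0": (2, 0), "A0": (3, 0), "B2": (4, 0)}
spatial = spatial_merge_candidates(neighbors)
```

Because A1 is unavailable in this example, the above-left block B2 ends up contributing the fourth candidate.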

In AMVP mode, the neighboring blocks are divided into two groups: a left group consisting of blocks 0 and 1, and an above group consisting of blocks 2, 3, and 4, as shown for PU0 502 in FIG. 5B. For each group, a potential candidate in a neighboring block that refers to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form the final candidate of the group. It is possible that none of the neighboring blocks contains a motion vector pointing to the same reference picture. Therefore, if such a candidate cannot be found, the first available candidate may be scaled to form the final candidate, so that the temporal distance differences can be compensated.
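The two-pass selection within one group can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are represented as `(mv, ref_poc)` tuples, the name `select_group_candidate` is hypothetical, and the scaling uses plain proportional arithmetic on POC distances rather than any fixed-point formula from a standard.

```python
def select_group_candidate(group, target_ref_poc, cur_poc):
    """Pick the final AMVP candidate for one group (left or above).

    `group` lists (mv, ref_poc) tuples, or None for unavailable blocks,
    in priority order.
    """
    # First pass: a candidate that already points to the target reference.
    for cand in group:
        if cand is not None and cand[1] == target_ref_poc:
            return cand[0]
    # Second pass: scale the first available candidate by POC distance.
    for cand in group:
        if cand is None:
            continue
        mv, ref_poc = cand
        if ref_poc == cur_poc:
            continue  # no temporal distance to scale from
        scale = (cur_poc - target_ref_poc) / (cur_poc - ref_poc)
        return (round(mv[0] * scale), round(mv[1] * scale))
    return None


left_group = [None, ((4, 8), 0)]
final_left = select_group_candidate(left_group, target_ref_poc=4, cur_poc=8)
```

Here the only available neighbor points 8 pictures back while the target reference is 4 pictures back, so its vector is halved.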

The following describes temporal motion vector prediction in HEVC. A temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. The process of motion vector derivation for the TMVP candidate is the same for both merge and AMVP modes, but the target reference index for the TMVP candidate in merge mode is always set to 0.

The primary block location for TMVP candidate derivation is the bottom-right block outside of the collocated PU, shown in FIG. 6A as block "T" and illustrated as block 602, to compensate for the bias toward the above and left blocks used to generate spatial neighboring candidates. However, if that block is located outside of the current CTB row or its motion information is not available, the block is substituted with a center block of the PU, illustrated as block 604.
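The fallback from the bottom-right position to the center position can be sketched as below. The function name `tmvp_position`, the sample-coordinate convention, and the `has_motion` callback are illustrative assumptions; picture-boundary checks are omitted.

```python
def tmvp_position(pu_x, pu_y, pu_w, pu_h, ctb_size, has_motion):
    """Return the (x, y) position used to fetch the collocated motion.

    `has_motion` is a hypothetical callback reporting whether motion
    information is stored at a given position of the collocated picture.
    """
    bottom_right = (pu_x + pu_w, pu_y + pu_h)  # just outside the PU
    in_current_ctb_row = (bottom_right[1] // ctb_size) == (pu_y // ctb_size)
    if in_current_ctb_row and has_motion(bottom_right):
        return bottom_right
    # Fallback: center block of the PU.
    return (pu_x + pu_w // 2, pu_y + pu_h // 2)


pos_inside = tmvp_position(0, 0, 16, 16, 64, lambda p: True)
pos_row_edge = tmvp_position(0, 48, 16, 16, 64, lambda p: True)
```

The second call places the PU at the bottom of a 64x64 CTB, so the bottom-right block would fall into the next CTB row and the center position is used instead.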

The motion vector for the TMVP candidate is derived from the collocated PU of the collocated picture, which is indicated at the slice level. The motion vector for the collocated PU is called the collocated MV. Similar to temporal direct mode in AVC, to derive the TMVP candidate motion vector, the collocated MV is scaled to compensate for the temporal distance differences, as shown in FIG. 6B.

The following describes additional aspects of motion prediction in HEVC. Several aspects of merge and AMVP modes are worth mentioning, as follows. Motion vector scaling: It is assumed that the value of a motion vector is proportional to the distance between pictures in presentation time. A motion vector associates two pictures: the reference picture and the picture containing the motion vector (namely, the containing picture). When a motion vector is used to predict another motion vector, the distance between the containing picture and the reference picture is calculated based on picture order count (POC) values.

For a motion vector to be predicted, both its associated containing picture and its reference picture may be different. Therefore, a new distance (based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing pictures for the two motion vectors are the same, while the reference pictures are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
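The scaling by two POC distances can be sketched in fixed-point form as below. The constants and clipping ranges follow the integer scaling commonly associated with HEVC, but they should be treated as an assumption to be checked against the specification; the names `scale_mv`, `tb` (POC distance for the motion vector being predicted), and `td` (POC distance of the candidate motion vector) are chosen here for illustration.

```python
def clip3(lo, hi, value):
    return max(lo, min(hi, value))


def scale_mv(mv, tb, td):
    """Scale `mv` by approximately tb / td using integer arithmetic.

    Assumes td != 0. Division truncates toward zero, as in C.
    """
    tb = clip3(-128, 127, tb)
    td = clip3(-128, 127, td)
    tx = int((16384 + (abs(td) >> 1)) / td)  # about 2**14 / td
    dist_scale = clip3(-4096, 4095, (tb * tx + 32) >> 6)  # about 256 * tb / td

    def scale_component(c):
        product = dist_scale * c
        sign = -1 if product < 0 else 1
        return clip3(-32768, 32767, sign * ((abs(product) + 127) >> 8))

    return (scale_component(mv[0]), scale_component(mv[1]))


halved = scale_mv((8, -16), tb=4, td=8)
```

With `tb` half of `td`, the vector is halved; with equal distances it is returned unchanged.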

Artificial motion vector candidate generation: If a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the list until the list has all of its candidates.

In merge mode, there are two types of artificial MV candidates: combined candidates, derived only for B-slices, and zero candidates, used only for AMVP if the first type does not provide enough artificial candidates. For each pair of candidates that are already in the candidate list and have the necessary motion information, bi-directional combined motion vector candidates are derived by combining the motion vector of the first candidate, referring to a picture in list 0, with the motion vector of the second candidate, referring to a picture in list 1.
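A sketch of filling an incomplete list with artificial candidates is shown below. It is an illustration only: candidates are modeled as dictionaries with optional `"l0"` and `"l1"` entries of the form `(mv, ref_idx)`, the pair ordering is a plain nested loop rather than any standardized order, and padding with zero-motion candidates once the combined candidates are exhausted is an assumption of this sketch. The name `fill_with_artificial_candidates` is hypothetical.

```python
def fill_with_artificial_candidates(candidates, max_candidates, is_b_slice):
    """Append combined bi-directional candidates, then zero candidates."""
    out = list(candidates)
    if is_b_slice:
        originals = list(out)  # only pair candidates already in the list
        for first in originals:
            for second in originals:
                if len(out) >= max_candidates or first is second:
                    continue
                if "l0" in first and "l1" in second:
                    # List 0 motion of the first, list 1 motion of the second.
                    combined = {"l0": first["l0"], "l1": second["l1"]}
                    if combined not in out:
                        out.append(combined)
    while len(out) < max_candidates:
        zero = {"l0": ((0, 0), 0)}
        if is_b_slice:
            zero["l1"] = ((0, 0), 0)
        out.append(zero)
    return out


initial = [{"l0": ((1, 0), 0)}, {"l1": ((0, 2), 0)}]
filled = fill_with_artificial_candidates(initial, max_candidates=4, is_b_slice=True)
```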

Pruning process for candidate insertion: Candidates from different blocks may happen to be identical, which decreases the efficiency of a merge/AMVP candidate list. A pruning process is applied to address this problem. The pruning process compares one candidate against the others in the current candidate list to avoid, to a certain extent, inserting identical candidates. To reduce complexity, only a limited number of pruning comparisons are applied, instead of comparing each potential candidate with all the other existing candidates.
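The limited-comparison pruning described above can be illustrated with the following sketch. The function name `insert_with_pruning` and the idea of passing the list positions to compare against as `compare_indices` are assumptions for illustration; which positions a real codec compares is defined by the standard.

```python
def insert_with_pruning(candidate_list, new_candidate, compare_indices):
    """Append `new_candidate` unless it matches one of a few chosen entries.

    Only the entries at `compare_indices` are checked, which keeps the
    number of comparisons bounded at the price of occasional duplicates.
    """
    for index in compare_indices:
        if index < len(candidate_list) and candidate_list[index] == new_candidate:
            return False  # pruned as a duplicate
    candidate_list.append(new_candidate)
    return True


merge_list = [(1, 1), (2, 2)]
pruned = not insert_with_pruning(merge_list, (2, 2), compare_indices=[1])
missed = insert_with_pruning(merge_list, (1, 1), compare_indices=[1])
```

The second insertion shows the trade-off: the duplicate of the entry at index 0 is not detected because only index 1 was compared.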

The following describes template matching prediction. Template matching (TM) prediction is a special merge mode based on frame-rate up conversion (FRUC) techniques. With this mode, the motion information of a block is not signaled but is derived at the decoder side (e.g., by video decoder 300). TM prediction is applied to both AMVP mode and regular merge mode. In AMVP mode, MVP candidate selection is determined based on template matching, picking the candidate that reaches the minimal difference between the current block template and the reference block template. In regular merge mode, a TM mode flag is signaled to indicate the use of TM, and TM is then applied to the merge candidate indicated by the merge index for MV refinement.

As shown in FIG. 7, template matching is used to derive the motion information of the current CU by finding the closest match between a template (the top and/or left neighboring blocks of the current CU) in current frame 700 and a block (of the same size as the template) in reference frame 702. With an AMVP candidate selected based on initial matching error, the MVP of the AMVP candidate is refined by template matching. With a merge candidate indicated by a signaled merge index, the merged MVs of the merge candidate corresponding to L0 and L1 are refined independently by template matching, and then the less accurate one is further refined again using the better one as a prior.

For the cost function, when a motion vector points to a fractional sample position, motion compensated interpolation may be used. To reduce complexity, bilinear interpolation, instead of the regular 8-tap DCT-IF interpolation, is used for template matching in both modes to generate the templates on the reference pictures. The matching cost C of template matching is calculated as follows:

$$C = \mathrm{SAD} + w \cdot \left( \left| MV_x - MV^s_x \right| + \left| MV_y - MV^s_y \right| \right)$$

In the above equation, w is a weighting factor empirically set to 4, and MV and MV^s denote the MV currently being tested and the initial MV (i.e., the MVP candidate in AMVP mode or the merged motion in merge mode), respectively. The sum of absolute differences (SAD) is used as the matching cost of template matching.
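The cost described above can be sketched in a few lines. This is a minimal illustration, assuming the cost takes the form SAD + w·(|MV_x − MV^s_x| + |MV_y − MV^s_y|); the function names `sad` and `tm_cost` and the list-of-rows template layout are hypothetical and not taken from the source.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D sample arrays."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def tm_cost(cur_template, ref_template, mv, mv_init, w=4):
    """Assumed TM cost: template SAD plus w times the L1 distance to the initial MV."""
    mv_penalty = abs(mv[0] - mv_init[0]) + abs(mv[1] - mv_init[1])
    return sad(cur_template, ref_template) + w * mv_penalty
```

The MV penalty term biases the search toward candidates close to the initial MV, so a distant candidate must reduce the template SAD by more than w per unit of displacement to be selected.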

When TM is used, the motion is refined using luma samples only. The derived motion may be used for both luma and chroma in motion compensated (MC) inter prediction. After the MV is determined, the final MC is performed using an 8-tap interpolation filter for luma and a 4-tap interpolation filter for chroma.

For the search method, MV refinement is a pattern-based MV search that uses the template matching cost as its criterion. Two search patterns are supported for MV refinement: a diamond search and a cross search. The MV is searched directly at quarter luma sample MVD accuracy with the diamond pattern, then at quarter luma sample MVD accuracy with the cross pattern, and is finally refined at one-eighth luma sample MVD accuracy with the cross pattern. The search range of MV refinement is set equal to (-8, +8) luma samples around the initial MV.

Bilateral matching prediction is described next. Bilateral matching (BM) prediction, also referred to as bilateral merge, is another merge mode based on frame rate up conversion (FRUC) techniques. When a block is determined to apply the BM mode, two initial motion vectors, MV0 and MV1, are derived by using the signaled merge candidate index to select a merge candidate from the constructed merge list. The bilateral matching search may be performed around MV0 and MV1. The final MV0' and MV1' are derived based on the minimum bilateral matching cost.

The motion vector differences MVD0 800 (denoted by MV0' - MV0) and MVD1 802 (denoted by MV1' - MV1), which point to the two reference blocks, may be proportional to the temporal distances (TD), e.g., TD0 and TD1, between the current picture and the two reference pictures. FIG. 8 illustrates an example of MVD0 and MVD1 in which TD1 is four times TD0.

However, there is an optional design in which MVD0 and MVD1 are mirrored regardless of the temporal distances TD0 and TD1. FIG. 9 illustrates an example of mirrored MVD0 900 and MVD1 902 in which TD1 is four times TD0.

Bilateral matching performs a local search around the initial MV0 and MV1 to derive the final MV0' and MV1'. The local search applies a 3×3 square search pattern to loop through the search range [-8, 8]. In each search iteration, the bilateral matching costs of the eight surrounding MVs in the search pattern are calculated and compared with the bilateral matching cost of the center MV. The MV with the minimum bilateral matching cost becomes the new center MV in the next search iteration. The local search terminates when the current center MV has the minimum cost within the 3×3 square search pattern or when the local search reaches a predefined maximum number of search iterations. FIG. 10 illustrates an example of a 3×3 square search pattern 1000 in the search range [-8, 8].
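The iterative 3×3 square search can be sketched as follows. This is an illustration only: the function name `square_search`, the callback `cost` (which is assumed to map an MV offset relative to the initial MV to a bilateral matching cost), and the default `max_iters` value are hypothetical, since the source does not state the predefined iteration limit.

```python
def square_search(cost, center=(0, 0), search_range=8, max_iters=16):
    """Iterative 3x3 square search over offsets within [-search_range, search_range]."""
    best = center
    best_cost = cost(best)
    for _ in range(max_iters):
        cx, cy = best
        improved = False
        # Evaluate the eight MVs surrounding the current center.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (dx, dy) == (0, 0):
                    continue
                cand = (cx + dx, cy + dy)
                if max(abs(cand[0]), abs(cand[1])) > search_range:
                    continue  # stay inside the search range
                c = cost(cand)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
        if not improved:
            break  # the center already has the minimum cost in the pattern
    return best, best_cost
```

The search stops either when no neighbor improves on the center or when the iteration limit is reached, mirroring the two termination conditions described above.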

Decoder-side motion vector refinement is described next. To increase the accuracy of the MVs of the merge mode, decoder-side motion vector refinement (DMVR) is applied in VVC. In the bi-prediction operation, a refined MV is searched around the initial MVs in reference picture list L0 and reference picture list L1. The DMVR method calculates the distortion between the two candidate blocks in reference picture list L0 and list L1. As illustrated in FIG. 11, the SAD between blocks 1102 and 1100, based on each MV candidate around the initial MV, is calculated. The MV candidate with the lowest SAD becomes the refined MV and is used to generate the bi-predicted signal.

The refined MV derived by the DMVR process is used to generate the inter prediction samples and is also used in temporal motion vector prediction for the coding of future pictures. The original MV is used in the deblocking process and is also used in spatial motion vector prediction for the coding of future CUs.

DMVR is a subblock-based merge mode with a predefined maximum processing unit of 16×16 luma samples. When the width and/or height of a CU is larger than 16 luma samples, the CU may be further split into subblocks with a width and/or height equal to 16 luma samples.

The search scheme is described next. In DMVR, the search points surrounding the initial MV and the MV offset may obey the MV difference mirroring rule. For example, any point checked by DMVR, denoted by a candidate MV pair (MV0, MV1), may obey the following two equations:

$$MV0' = MV0 + MV_{offset}$$

$$MV1' = MV1 - MV_{offset}$$

In the above equations, MV_offset represents the refinement offset between the initial MV and the refined MV in one of the reference pictures. The refinement search range is two integer luma samples from the initial MV. The search includes an integer-sample offset search stage and a fractional-sample refinement stage.

A 25-point full search is applied for the integer-sample offset search. The SAD of the initial MV pair is calculated first. If the SAD of the initial MV pair is smaller than a threshold, the integer-sample stage of DMVR is terminated. Otherwise, the SADs of the remaining 24 points are calculated and checked in raster scanning order. The point with the smallest SAD is selected as the output of the integer-sample offset search stage. To reduce the penalty of the uncertainty of DMVR refinement, the original MV may be favored during the DMVR process: the SAD between the reference blocks referred to by the initial MV candidates is decreased by 1/4 of the SAD value.
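The integer-sample stage can be sketched as below. This is a simplified illustration, not a codec implementation: `dmvr_integer_search`, the callback `sad_at`, and the parameter `early_exit_threshold` are hypothetical names, and `sad_at(dx, dy)` is assumed to return the SAD between the L0 block at MV0 + (dx, dy) and the L1 block at MV1 - (dx, dy), following the mirroring rule.

```python
def dmvr_integer_search(sad_at, early_exit_threshold):
    """Return the best integer offset (dx, dy) in [-2, 2] x [-2, 2] and its SAD."""
    init_sad = sad_at(0, 0)
    if init_sad < early_exit_threshold:
        return (0, 0), init_sad  # initial MV pair is good enough, stop early
    # Favor the original MV: decrease its SAD by one quarter of its value.
    best, best_sad = (0, 0), init_sad - (init_sad >> 2)
    for dy in range(-2, 3):      # raster scanning order, row by row
        for dx in range(-2, 3):
            if (dx, dy) == (0, 0):
                continue
            s = sad_at(dx, dy)
            if s < best_sad:
                best, best_sad = (dx, dy), s
    return best, best_sad
```

Because the initial SAD is reduced before the comparison, a neighboring point replaces the original MV only if its SAD is lower than three quarters of the initial SAD.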

The integer-sample search is followed by fractional-sample refinement. To reduce computational complexity, the fractional-sample refinement is derived using a parametric error surface equation instead of an additional search with SAD comparison. The fractional-sample refinement is conditionally invoked based on the output of the integer-sample search stage. When the integer-sample search stage terminates with the center having the smallest SAD in either the first or the second iteration of the search, the fractional-sample refinement is further applied.

In the parametric error surface based estimation of sub-pixel offsets, the cost at the center position and the costs at the four neighboring positions from the center are used to fit a 2-D parabolic error surface equation of the following form:

$$E(x, y) = A (x - x_{min})^2 + B (y - y_{min})^2 + C$$

In the above equation, (x_min, y_min) corresponds to the fractional position with the least cost, and C corresponds to the minimum cost value. By solving the above equation using the cost values of the five search points, (x_min, y_min) is computed as:

$$x_{min} = \frac{E(-1, 0) - E(1, 0)}{2 \left( E(-1, 0) + E(1, 0) - 2 E(0, 0) \right)}$$

$$y_{min} = \frac{E(0, -1) - E(0, 1)}{2 \left( E(0, -1) + E(0, 1) - 2 E(0, 0) \right)}$$

The values of x_min and y_min are automatically constrained to be between -8 and 8, since all cost values are positive and the smallest value is E(0,0). This corresponds to a half-pel offset with 1/16-pel MV accuracy in VVC. The computed fractional (x_min, y_min) is added to the integer-distance refinement MV to obtain the sub-pixel accurate refinement delta MV.
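The closed-form solution of the error surface can be checked with a short sketch. It assumes the surface E(x, y) = A(x - x_min)^2 + B(y - y_min)^2 + C sampled at the center and its four neighbors; the function name `error_surface_offset` and its argument names are hypothetical. The result is expressed in luma samples, so an offset within [-0.5, 0.5] corresponds to the range of -8 to 8 in 1/16-pel units.

```python
def error_surface_offset(e_center, e_left, e_right, e_up, e_down):
    """Fractional offset (x_min, y_min) in luma samples from five costs.

    e_left = E(-1, 0), e_right = E(1, 0), e_up = E(0, -1), e_down = E(0, 1).
    """
    def axis(e_minus, e_plus):
        denom = 2 * (e_minus + e_plus - 2 * e_center)
        # A flat surface along this axis gives no information; keep the center.
        return 0.0 if denom == 0 else (e_minus - e_plus) / denom

    return axis(e_left, e_right), axis(e_up, e_down)
```

For example, sampling E(x, y) = 3(x - 0.25)^2 + 5(y + 0.25)^2 + 7 at the five points gives costs 7.5, 12, 9, 10 and 15, from which the sketch recovers (0.25, -0.25).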

Bilinear interpolation and sample padding are described next. In VVC, the resolution of the MVs is 1/16 luma samples. The samples at fractional positions are interpolated using an 8-tap interpolation filter. In DMVR, the search points surround the initial fractional-pel MV with integer-sample offsets, and therefore the samples at those fractional positions may be interpolated for the DMVR search process. To reduce computational complexity, a bilinear interpolation filter is used to generate the fractional samples for the search process in DMVR. In some examples, by using the bilinear filter with a 2-sample search range, DMVR does not access more reference samples than the normal motion compensation process. After the refined MV is obtained by the DMVR search process, the normal 8-tap interpolation filter is applied to generate the final prediction. In order not to access more reference samples than the normal MC process, the samples that are not needed for the interpolation process based on the original MV but are needed for the interpolation process based on the refined MV are padded from the available samples.

An example of the enabling conditions for DMVR is described next. DMVR is enabled if all of the following conditions are satisfied:

a. The CU is coded in CU-level merge mode with bi-prediction MVs

b. Relative to the current picture, one reference picture is in the past and the other reference picture is in the future

c. The distances (i.e., POC differences) from both reference pictures to the current picture are the same

d. The CU has more than 64 luma samples

e. Both the CU height and the CU width are greater than or equal to 8 luma samples

f. The BCW (bi-prediction with CU-level weights) weight index indicates equal weight

g. WP (weighted prediction) is not enabled for the current block

h. Combined inter and intra prediction (CIIP) mode is not used for the current block
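Conditions a through h can be expressed as a single predicate. The sketch below is illustrative only: the `CuState` record and its field names are hypothetical stand-ins for whatever state a real codec keeps, and POC values are used to test conditions b and c.

```python
from dataclasses import dataclass


@dataclass
class CuState:
    """Hypothetical summary of the CU properties the conditions refer to."""
    merge_bi_pred: bool      # (a) CU-level merge mode with bi-prediction MVs
    poc_cur: int             # picture order count (POC) of the current picture
    poc_ref0: int            # POC of the L0 reference picture
    poc_ref1: int            # POC of the L1 reference picture
    width: int               # CU width in luma samples
    height: int              # CU height in luma samples
    bcw_equal_weight: bool   # (f) BCW weight index indicates equal weight
    weighted_pred: bool      # (g) WP enabled for the current block
    ciip: bool               # (h) CIIP mode used for the current block


def dmvr_enabled(cu: CuState) -> bool:
    d0 = cu.poc_ref0 - cu.poc_cur
    d1 = cu.poc_ref1 - cu.poc_cur
    return (cu.merge_bi_pred
            and d0 * d1 < 0                        # (b) one past, one future
            and abs(d0) == abs(d1)                 # (c) equal POC distances
            and cu.width * cu.height > 64          # (d) more than 64 luma samples
            and cu.width >= 8 and cu.height >= 8   # (e) minimum dimensions
            and cu.bcw_equal_weight
            and not cu.weighted_pred
            and not cu.ciip)
```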

Bi-directional optical flow is described next. Bi-directional optical flow (BDOF) is used to refine the bi-prediction signal of the luma samples in a CU at the 4×4 subblock level. As its name indicates, the BDOF mode is based on the optical flow concept, which assumes that the motion of an object is smooth. For each 4×4 subblock, a motion refinement (v_x, v_y) is calculated by minimizing the difference between the L0 and L1 prediction samples. The motion refinement is then used to adjust the bi-predicted sample values in the 4×4 subblock.

For example, for BDOF, video encoder 200 and video decoder 300 may determine that BDOF is enabled for a block, and may divide the block into a plurality of subblocks when BDOF is enabled for the block. In some examples, video encoder 200 and video decoder 300 may determine a first reference block from a first motion vector for the block and a second reference block from a second motion vector for the block. Video encoder 200 and video decoder 300 may blend (e.g., by weighted averaging) the samples in the first reference block and the samples in the second reference block to generate a prediction block. Video encoder 200 and video decoder 300 may determine a motion refinement and adjust the samples in the prediction block to generate the prediction samples used to encode or decode the samples of the subblock. In some examples, video encoder 200 and video decoder 300 may determine the same motion refinement for each sample in the subblock (i.e., a subblock-level motion refinement, referred to as subblock BDOF). In some examples, video encoder 200 and video decoder 300 may determine a motion refinement for each individual sample in the subblock (i.e., a sample-level motion refinement, referred to as per-pixel BDOF).

The following steps are applied in the BDOF process and may be applicable to subblock BDOF. The steps for per-pixel BDOF are described in more detail further below.

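First, the horizontal and vertical gradients of the two prediction signals, ∂I^(k)/∂x(i,j) and ∂I^(k)/∂y(i,j), k = 0, 1, are computed by directly calculating the difference between two neighboring samples, i.e.,

$$\frac{\partial I^{(k)}}{\partial x}(i,j) = \left( I^{(k)}(i+1,j) \gg \mathrm{shift1} \right) - \left( I^{(k)}(i-1,j) \gg \mathrm{shift1} \right)$$

$$\frac{\partial I^{(k)}}{\partial y}(i,j) = \left( I^{(k)}(i,j+1) \gg \mathrm{shift1} \right) - \left( I^{(k)}(i,j-1) \gg \mathrm{shift1} \right)$$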

In the above example, I^(k)(i,j) is the sample value at coordinate (i,j) of the prediction signal in list k (k = 0, 1), and shift1 is calculated based on the luma bit depth (bitDepth); here, shift1 is set equal to 6. That is, I^(0) refers to the samples of the first reference block and I^(1) refers to the samples of the second reference block, where the first reference block and the second reference block were used to generate the prediction block whose samples are being adjusted according to the BDOF techniques.

Next, the autocorrelation and cross-correlation terms of the gradients, S_1, S_2, S_3, S_5 and S_6, are calculated by summing over Ω, where Ω is a 6×6 window around the 4×4 subblock, the value of shift2 is set equal to 4, and the value of shift3 is set equal to 1.

Next, the motion refinement (v_x, v_y) is derived from the cross-correlation and autocorrelation terms. In this example, the motion refinement is for a subblock. The per-pixel motion refinement calculation is described in more detail below.

In this derivation, ⌊·⌋ is the floor function.

Based on the motion refinement and the gradients, an adjustment b(x,y) is calculated for each sample in the 4×4 subblock.

Finally, the BDOF samples of the CU are calculated by adjusting the bi-prediction samples as follows:

$$\mathrm{pred}_{BDOF}(x,y) = \left( I^{(0)}(x,y) + I^{(1)}(x,y) + b(x,y) + o_{offset} \right) \gg \mathrm{shift5} \qquad (1\text{-}6\text{-}6)$$

Here, shift5 is set equal to Max(3, 15 - BitDepth), and the variable o_offset is set equal to (1 << (shift5 - 1)).

In the above examples, I^(0) refers to the first reference block, I^(1) refers to the second reference block, and b(x,y) is the adjustment value determined based on the motion refinement (v_x, v_y) for the subblock. In some examples, I^(0)(x,y) + I^(1)(x,y) may be considered the prediction block, and b(x,y) may therefore be considered as adjusting the prediction block. As shown in equation (1-6-6), there may be an addition of o_offset and a right-shift operation by shift5 to generate the prediction samples pred_BDOF(x,y).
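The final combination step can be illustrated with a short sketch. It assumes that the BDOF sample is formed as (I^(0) + I^(1) + b + o_offset) >> shift5, as the description of equation (1-6-6) suggests; the function name `bdof_sample` is hypothetical, and any clipping to the valid sample range is deliberately left out because the text above does not describe it.

```python
def bdof_sample(i0, i1, b, bit_depth):
    """Combine two intermediate-precision prediction samples with adjustment b."""
    shift5 = max(3, 15 - bit_depth)
    o_offset = 1 << (shift5 - 1)   # rounding offset for the right shift
    return (i0 + i1 + b + o_offset) >> shift5
```

For 10-bit video, shift5 is 5 and o_offset is 16, so intermediate samples 8000 and 8200 with an adjustment of -40 give (16160 + 16) >> 5 = 505.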

The above describes an example of subblock BDOF, in which video encoder 200 and video decoder 300 determine the same motion refinement (v_x, v_y) for all samples in a subblock. The adjustment value b(x,y) may differ for each sample in the subblock because of the gradients, but the motion refinement may be the same.

As described in more detail further below, in per-pixel BDOF, video encoder 200 and video decoder 300 may determine a per-pixel motion refinement (v_x', v_y'). That is, rather than there being one motion refinement for a subblock, as in subblock BDOF, in per-pixel BDOF there may be a different motion refinement for each sample (e.g., pixel). Video encoder 200 and video decoder 300 may determine an adjustment value b'(x,y) for each sample based on the corresponding per-pixel motion refinement for that sample, rather than using the same motion refinement for the whole subblock.

In some examples, the values from equation (1-6-6) are selected such that the multipliers in the BDOF process do not exceed 15 bits and the maximum bit width of the intermediate parameters in the BDOF process is kept within 32 bits.

To derive the gradient values, some prediction samples I^(k)(i,j) in list k (k = 0, 1) outside the current CU boundaries are generated by video encoder 200 and video decoder 300. As shown in FIG. 12, BDOF uses one extended row/column around the boundaries of CU 1200. To control the computational complexity of generating the out-of-boundary prediction samples, video encoder 200 and video decoder 300 may generate the prediction samples in the extended area (white positions) by taking the reference samples at the nearby integer positions directly (using a floor() operation on the coordinates) without interpolation, while the normal 8-tap motion compensation interpolation filter is used to generate the prediction samples within the CU (gray positions). These extended sample values may be used in the gradient calculation only. For the remaining steps in the BDOF process, if any sample and gradient values outside the CU boundaries are needed, they are padded (i.e., repeated) from their nearest neighbors.

BDOF is used to refine the bi-prediction signal (e.g., the sum of the first reference block and the second reference block) of a CU at the 4×4 subblock level. BDOF is applied to a CU if all of the following conditions are satisfied:

a. The CU is coded using "true" bi-prediction mode, i.e., one of the two reference pictures precedes the current picture in display order and the other follows the current picture in display order

b. The CU is not coded using affine mode or ATMVP merge mode

c. The CU has more than 64 luma samples

d. Both the CU height and the CU width are greater than or equal to 8 luma samples

e. The BCW weight index indicates equal weight

f. WP is not enabled for the current CU

g. CIIP mode is not used for the current CU
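The BDOF conditions a through g can likewise be written as one predicate. This is a sketch under stated assumptions: the function `bdof_enabled` and its keyword arguments are hypothetical names, and display order is approximated by POC values.

```python
def bdof_enabled(*, poc_cur, poc_ref0, poc_ref1, affine, atmvp_merge,
                 width, height, bcw_equal_weight, weighted_pred, ciip):
    """Return True when all of the listed BDOF conditions hold for a CU."""
    # (a) "true" bi-prediction: one reference before and one after the current picture
    true_bi_pred = (poc_ref0 - poc_cur) * (poc_ref1 - poc_cur) < 0
    return (true_bi_pred
            and not affine and not atmvp_merge     # (b)
            and width * height > 64                # (c)
            and width >= 8 and height >= 8         # (d)
            and bcw_equal_weight                   # (e)
            and not weighted_pred                  # (f)
            and not ciip)                          # (g)
```

Unlike the DMVR conditions, this list does not require the two reference pictures to be at equal POC distances, but it does exclude affine and ATMVP merge coded CUs.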

There may be some problems with BDOF. As described above, in the current version of VVC, the BDOF method is used to refine the bi-prediction signal of the luma samples in a coding block at the 4×4 subblock level. The motion refinement (v_x, v_y) is derived by minimizing the difference between the L0 and L1 prediction samples in 6×6 luma sample regions. The L0 prediction samples refer to the samples of the first reference block, and the L1 prediction samples refer to the samples of the second reference block. The motion refinement (v_x, v_y) is then used to adjust each prediction sample of the 4×4 subblock.

However, a luma sample in a 4×4 subblock may have motion refinement characteristics that differ from those of the other luma samples in the same 4×4 subblock. Calculating the motion refinement (v'_x, v'_y) at the pixel level can improve the accuracy of the motion refinement for each pixel and can therefore improve the subblock or block prediction quality.

However, BDOF is a decoder-side process, and the complexity of BDOF is also an important aspect to consider when designing a video coding method. When the motion refinement (v'_x, v'_y) is calculated at the pixel level, the complexity of BDOF can be 16 times that of the current BDOF at the 4×4 subblock level. In other words, the current 4×4 subblock BDOF does not achieve the best prediction quality, while per-pixel BDOF has better prediction quality but its complexity is a concern for video coding.

In VVC Draft 10, when decoder-side motion vector refinement (DMVR) precedes BDOF, the BDOF process can be bypassed based on the minimum SAD found in the DMVR search process. The DMVR process operates at the 16×16 subblock level. Such a BDOF bypass scheme can reduce complexity.

However, the prediction signal of a sub-region within a 16×16 subblock may still need to be refined by BDOF. The BDOF bypass of VVC Draft 10 cannot apply BDOF to one sub-region within a 16×16 subblock while bypassing BDOF in the other sub-regions. In addition, VVC Draft 10 has no BDOF bypass scheme for the case where BDOF is applied to a bi-predicted coding block that is not predicted using DMVR.

The following describes example techniques that may address the above problems. However, the techniques should not be considered limited to, or required to address, the above problems. The following techniques may be used separately or in any combination. For convenience, the techniques are described as various aspects, but the aspects should not be considered as required to be separate; in practice, the various aspects may be combined. Unless stated otherwise, the example aspects may be performed by video encoder 200 and/or video decoder 300.

A first aspect relates to bypassing subblock BDOF. In this first aspect, when a W×H coding block is determined to apply bi-directional optical flow (BDOF), video encoder 200 and/or video decoder 300 may bypass the BDOF process for a sub-region of the coding block. The BDOF process for the first aspect may be as follows.

a. The BDOF process starts with an input block (named S1), where S1 has dimension W_1×H_1 and the dimension of S1 is less than or equal to the dimension of the coding block. When the preceding process is block-based, the dimension of S1 is equal to that of the coding block. When the preceding process is subblock-based (due to a hardware constraint or a subblock partition from a previous processing stage), the dimension of S1 is less than that of the coding block.

b. The input block S1 is divided into N subblocks (named S2), where S2 has dimension W_2×H_2 and the dimension of S2 is less than or equal to the dimension of S1. For each S2, whether to apply BDOF is determined by a condition T. In some examples, the condition T is to check whether the SAD between the two prediction signals in reference picture 0 and reference picture 1 is less than a threshold. The subblock in this step defines the basic unit for determining whether to apply BDOF to all samples within the unit.

c. When it is determined to apply BDOF to S2, S2 is divided into M subblocks (named S3), where S3 has dimension W_3×H_3 and the dimension of S3 is less than or equal to the dimension of S2. For each S3, the BDOF process is applied to derive a refined motion vector (v'_x, v'_y), and the derived motion vector is used to derive the prediction signal of S3 (through motion compensation or by adding an offset to the initially predicted signal). The subblock in this step defines the unit for the granularity of the refined motion vector, and all samples within the unit share the same refined motion.

In the BDOF process of aspect 1, blocks S1, S2 and S3 are defined. The dimension of S3 may be less than or equal to that of S2, and the dimension of S2 may be less than or equal to that of S1. That is, W_3 is less than or equal to W_2 and H_3 is less than or equal to H_2, and W_2 is less than or equal to W_1 and H_2 is less than or equal to H_1. The sizes may be fixed, adapted to the picture resolution, or signaled in the bitstream.
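The size relationship between S1, S2 and S3 can be made concrete with a small sketch that counts N (S2 subblocks per S1) and M (S3 units per S2). The function name `partition_counts` and the tuple layout are hypothetical, and ceiling division is an assumption for sizes that do not divide evenly.

```python
def partition_counts(s1, s2, s3):
    """s1, s2, s3 are (width, height) tuples; returns (N, M)."""
    (w1, h1), (w2, h2), (w3, h3) = s1, s2, s3
    if not (w3 <= w2 <= w1 and h3 <= h2 <= h1):
        raise ValueError("expected W_3 <= W_2 <= W_1 and H_3 <= H_2 <= H_1")

    def ceil_div(a, b):
        return -(-a // b)

    n = ceil_div(w1, w2) * ceil_div(h1, h2)   # number of S2 subblocks in S1
    m = ceil_div(w2, w3) * ceil_div(h2, h3)   # number of S3 units in each S2
    return n, m
```

With a 16×16 S1 and 8×8 S2, choosing 4×4 S3 gives M = 4 motion refinements per S2, whereas 1×1 S3 gives M = 64, which reflects the 16-fold increase in refinements when moving from 4×4 subblocks to individual pixels.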

In one case, W_3 is equal to 1 and H_3 is equal to 1, so that S3 is pixel-based. This case may be a per-pixel BDOF process.

In some examples, S1 is a coding block, regardless of whether a preceding subblock-based process is applied to the coding block.

A second aspect relates to per-pixel BDOF with a subblock BDOF bypass scheme. As in the first aspect, when it is determined to apply bidirectional optical flow (BDOF) to a W×H coding block S1, the coding block is divided into N subblocks S2. For each subblock, whether to apply BDOF to the subblock is further determined by checking whether the SAD between the two prediction signals in reference picture 0 and reference picture 1 is less than a threshold. If it is determined to apply BDOF to the subblock, a refined motion vector (v'_x, v'_y) is calculated for each pixel S3 in subblock S2. The refined motion vector (v'_x, v'_y) is used to adjust the predicted signal for that pixel S3 in subblock S2. An example of per-pixel BDOF with the subblock bypass process is shown in FIG. 13.

For example, in FIG. 13, video encoder 200 and video decoder 300 may determine that BDOF is enabled for a block of video data, and may divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block. As illustrated in FIG. 13, deriving the number of subblocks (N) and the subblock index i = 0 (1300) refers to video encoder 200 and video decoder 300 dividing the block into N subblocks, where each subblock is identified by a respective index and the first index is 0. Accordingly, the indices range from 0 to N-1.

Video encoder 200 and video decoder 300 may determine whether prediction samples have been determined for all subblocks in the block, as represented by i < N (1302). If prediction samples have been determined for all subblocks ("NO" of 1302), video encoder 200 and video decoder 300 may end the process of determining prediction samples for the subblocks. However, if prediction samples have not been determined for all subblocks ("YES" of 1302), video encoder 200 and video decoder 300 may continue the process by determining the prediction samples of a current subblock of the plurality of subblocks into which the block is divided.

For the current subblock, video encoder 200 and video decoder 300 may determine a distortion value (1304). Because the distortion value may be determined on a subblock-by-subblock basis, video encoder 200 and video decoder 300 may be considered as determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values (e.g., a first distortion value for a first subblock, a second distortion value for a second subblock, and so on).

One example way to determine the distortion value for the current subblock is to determine the sum of absolute differences (SAD) between a first reference block (ref0) and a second reference block (ref1). However, there may be other ways of determining the distortion value. For example, as described in further detail below, in some examples, video encoder 200 and video decoder 300 may determine the distortion value in such a way that the resulting values can be reused later, such as when video encoder 200 and video decoder 300 perform BDOF.

As illustrated in FIG. 13, video encoder 200 and video decoder 300 may compare the distortion value to a threshold value (1306). Based on the comparison, video encoder 200 and video decoder 300 may have two options. The first option may be to perform per-pixel BDOF, and the second option may be to bypass BDOF. There may be no other options, such as subblock BDOF, available to video encoder 200 and video decoder 300. Accordingly, video encoder 200 and video decoder 300 may be considered as determining, based on the respective distortion values (e.g., based on a comparison of the respective distortion values with a fixed threshold value or with respective threshold values), that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks.

For example, if the distortion value for the current subblock is greater than the threshold value ("NO" of 1306), video encoder 200 and video decoder 300 may perform per-pixel BDOF (1308). If the distortion value for the current subblock is less than the threshold value ("YES" of 1306), video encoder 200 and video decoder 300 may derive the prediction signal for the subblock (e.g., by bypassing BDOF for the subblock) (1310).

In one or more examples, video encoder 200 and video decoder 300 may determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed. For example, if video encoder 200 and video decoder 300 are to perform BDOF for the current subblock, video encoder 200 and video decoder 300 may determine the prediction samples using per-pixel BDOF techniques. If video encoder 200 and video decoder 300 are to bypass BDOF for the current subblock, video encoder 200 and video decoder 300 may determine the prediction samples without using BDOF techniques.

The above example of FIG. 13 describes a way of determining whether per-pixel BDOF is performed or BDOF is bypassed for the current subblock. Video encoder 200 and video decoder 300 may perform the above example techniques on a subblock-by-subblock basis.
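The flow of FIG. 13 can be sketched as a loop over subblocks. This is a minimal illustration under stated assumptions, not the implementation of video encoder 200 or video decoder 300: the function name `predict_block` and the three callables passed to it are hypothetical, each subblock is represented simply as a pair (ref0, ref1), and equality with the threshold is treated as "apply BDOF", following the sbSAD rule given later in this description.

```python
def predict_block(subblocks, threshold, distortion, per_pixel_bdof, plain_biprediction):
    """Choose per-pixel BDOF or bypass for each subblock (steps 1300 to 1310).

    subblocks: list of (ref0, ref1) pairs, one per subblock (hypothetical layout).
    distortion, per_pixel_bdof, plain_biprediction: caller-supplied callables.
    """
    predictions = []
    decisions = []
    i = 0                                    # 1300: N subblocks, index i = 0
    n = len(subblocks)
    while i < n:                             # 1302: "yes" while subblocks remain
        ref0, ref1 = subblocks[i]
        d = distortion(ref0, ref1)           # 1304: distortion value
        if d < threshold:                    # 1306 "yes": bypass BDOF
            predictions.append(plain_biprediction(ref0, ref1))   # 1310
            decisions.append("bypass")
        else:                                # 1306 "no": per-pixel BDOF
            predictions.append(per_pixel_bdof(ref0, ref1))       # 1308
            decisions.append("per_pixel_bdof")
        i += 1
    return predictions, decisions
```

Only two outcomes exist per subblock in this sketch, which mirrors the statement above that subblock BDOF is not one of the available options.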

For example, to determine the respective distortion values for each subblock of the one or more subblocks of the plurality of subblocks, video encoder 200 and video decoder 300 may determine, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values and, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values.

To determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, video encoder 200 and video decoder 300 may determine, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value (e.g., based on the first distortion value being greater than the threshold value). In this example, based on the determination that BDOF is enabled for the first subblock, video encoder 200 and video decoder 300 may determine per-pixel motion refinements for refining a first set of prediction samples for the first subblock (e.g., perform per-pixel BDOF). For example, video encoder 200 and video decoder 300 may derive, for a first sample of the first subblock, a first motion refinement for refining a first prediction sample, may derive, for a second sample of the first subblock, a second motion refinement for refining a second prediction sample, and so on.

However, for a second subblock of the plurality of subblocks, video encoder 200 and video decoder 300 may determine that BDOF is bypassed based on the second distortion value (e.g., based on the second distortion value being less than the threshold value). In this example, based on the determination that BDOF is bypassed for the second subblock, video encoder 200 and video decoder 300 may bypass determining per-pixel motion refinements for refining a second set of prediction samples for the second subblock (e.g., bypass BDOF). For example, video encoder 200 and video decoder 300 may bypass, for a first sample of the second subblock, the derivation of a first motion refinement for refining a first prediction sample, may bypass, for a second sample of the second subblock, the derivation of a second motion refinement for refining a second prediction sample, and so on.

To determine the prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, video encoder 200 and video decoder 300 may determine, for the first subblock, a refined first set of prediction samples of the first subblock based on the per-pixel motion refinements for the first subblock. For the second subblock, video encoder 200 and video decoder 300 may determine the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinements.

Within the second aspect, the following describes the subblock BDOF bypass. Given a W×H coding block for which it has been determined to apply bidirectional optical flow (BDOF), the number of subblocks (N) is determined as follows:

In the above, thW denotes the maximum subblock width and thH denotes the maximum subblock height. The values of thW and thH are predetermined integer values (e.g., thW = thH = 8).
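As a rough illustration of how thW and thH could bound the subblock grid, the following sketch caps the subblock size at thW×thH and counts the resulting subblocks. The exact derivation of N is given by the equation of this description; the use of `min` and of ceiling division here is an assumption made only for illustration, and `subblock_grid` is a hypothetical name.

```python
def subblock_grid(w, h, th_w=8, th_h=8):
    """Return (sb_width, sb_height, n) for a w x h coding block.

    ASSUMPTION: the subblock size is the block size capped at th_w x th_h and
    N is the number of such subblocks needed to cover the block. This is an
    illustrative reading, not the equation of the source.
    """
    sb_width = min(w, th_w)
    sb_height = min(h, th_h)
    cols = -(-w // sb_width)      # ceiling division
    rows = -(-h // sb_height)
    return sb_width, sb_height, cols * rows
```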

For each subblock, video encoder 200 and/or video decoder 300 may derive a prediction signal (predSig0) and a prediction signal (predSig1) from reference picture 0 and reference picture 1, respectively. The width (sbWidth) and height (sbHeight) of predSig0 and predSig1 are determined as follows:

Whether to bypass BDOF in a subblock is determined by checking the SAD between predSig0 and predSig1. The SAD is derived as follows:

In the above equation, the region over which the sum is taken is the sbWidth×sbHeight subblock, and I⁽ᵏ⁾(i,j) is the sample value at coordinate (i,j) of the prediction signal in reference picture k (k = 0, 1).
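The SAD between predSig0 and predSig1 can be sketched as follows. The sketch assumes that each prediction signal is held as a list of rows of integer samples; the function name `subblock_sad` is hypothetical.

```python
def subblock_sad(pred_sig0, pred_sig1):
    """Sum of absolute differences between two equally sized 2-D sample arrays."""
    if len(pred_sig0) != len(pred_sig1):
        raise ValueError("prediction signals must have the same height")
    total = 0
    for row0, row1 in zip(pred_sig0, pred_sig1):
        if len(row0) != len(row1):
            raise ValueError("prediction signals must have the same width")
        # Accumulate |I0(i,j) - I1(i,j)| over one row of the subblock.
        total += sum(abs(a - b) for a, b in zip(row0, row1))
    return total
```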

If sbSAD is less than a threshold (sbDistTh), video encoder 200 and/or video decoder 300 may determine to bypass BDOF in the subblock. Otherwise (if sbSAD is greater than or equal to sbDistTh), video encoder 200 and/or video decoder 300 may determine to apply BDOF to the subblock. The threshold (sbDistTh) is derived as follows:

In the above equation, n and s are predetermined values. For example, n may be derived as n = InternalBitDepth - bitDepth + 1, and s denotes a scale factor (e.g., s = 1). In the current version of VVC, InternalBitDepth is equal to 14 when bitDepth is 10, and therefore n is equal to 5. The scale (s) may be 1, 2, 3, or another predefined value, or may be signaled in the bitstream.
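The derivation of n and the bypass test can be sketched as follows. The expression for n and the comparison rule are taken from the text above. The form of the threshold in `sb_dist_threshold` (subblock area times s, left-shifted by n) is only an assumption chosen for illustration, since the threshold equation itself is not reproduced here; all function names are hypothetical.

```python
def bit_depth_shift(internal_bit_depth, bit_depth):
    """n = InternalBitDepth - bitDepth + 1, as stated in the text."""
    return internal_bit_depth - bit_depth + 1


def sb_dist_threshold(sb_width, sb_height, n, s=1):
    """ASSUMED threshold form: (sbWidth * sbHeight * s) << n.

    Illustrative only; the source's own threshold equation may differ.
    """
    return (sb_width * sb_height * s) << n


def bypass_bdof(sb_sad, sb_dist_th):
    """Bypass when sbSAD < sbDistTh; apply BDOF when sbSAD >= sbDistTh."""
    return sb_sad < sb_dist_th
```

Under these assumptions, a 10-bit sequence with a 14-bit internal precision gives n = 5, and an 8×8 subblock with s = 1 gives a threshold of 2048.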

It should be understood that the above describes one example way of determining the threshold value and one example way of determining the distortion value. However, the example techniques are not so limited. As described in more detail below, in some examples, video encoder 200 and video decoder 300 may determine the distortion values in such a way that, if a determination is made that per-pixel BDOF is to be performed, the calculations used to determine the distortion values can be reused to perform per-pixel BDOF.

Within the second aspect, the following describes per-pixel BDOF. If video encoder 200 and/or video decoder 300 have determined to apply BDOF to an sbWidth × sbHeight subblock, the subblock is extended to a (sbWidth + 4) × (sbHeight + 4) region. For each pixel in the subblock, video encoder 200 and/or video decoder 300 may derive a motion refinement (v'_x, v'_y), also referred to as a refined motion vector, based on the gradients of the 5×5 surrounding region. FIG. 14 illustrates an example of per-pixel BDOF for an 8×8 subblock. Thus, in per-pixel BDOF, video encoder 200 and video decoder 300 may determine a motion refinement per pixel. In subblock BDOF, the motion refinement is for the subblock and is not determined on a sample-by-sample (e.g., pixel-by-pixel) basis.
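The relationship between the extended (sbWidth + 4) × (sbHeight + 4) region and the 5×5 region around each pixel can be sketched as follows. The text does not state how the samples of the extended region are obtained, so the edge replication in `extend_by_two` is an assumption made only so that the example is self-contained; both function names are hypothetical.

```python
def extend_by_two(block):
    """Pad a 2-D block by two samples on every side.

    ASSUMPTION: the border is filled by replicating the nearest edge sample.
    """
    rows = [[row[0]] * 2 + list(row) + [row[-1]] * 2 for row in block]
    top = [list(rows[0]) for _ in range(2)]
    bottom = [list(rows[-1]) for _ in range(2)]
    return top + rows + bottom


def window_5x5(extended, x, y):
    """Return the 5x5 neighbourhood centred on subblock pixel (x, y).

    Pixel (x, y) of the subblock sits at (x + 2, y + 2) in the extended
    region, so the window spans columns x..x+4 and rows y..y+4.
    """
    return [row[x:x + 5] for row in extended[y:y + 5]]
```

The extension by two samples on each side is what allows every pixel of the subblock, including corner pixels, to have a complete 5×5 neighbourhood.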

Given the sbWidth × sbHeight subblock above, the following steps are applied in the per-pixel BDOF process.

- The horizontal and vertical gradients of the two prediction signals (∂I⁽ᵏ⁾/∂x(i,j) and ∂I⁽ᵏ⁾/∂y(i,j), k = 0, 1) are computed by directly calculating the difference between two neighboring samples, as in the bidirectional optical flow described above, where (i,j) is the coordinate position in the (sbWidth + 4) × (sbHeight + 4) region of the prediction signals in reference picture 0 and reference picture 1.

- For each pixel in the subblock, the following steps are applied.

o The autocorrelation and cross-correlation of the gradients (S₁, S₂, S₃, S₅, and S₆) are calculated as in the bidirectional optical flow described above, over a 5×5 window around the pixel.

o The motion refinement (v'_x, v'_y) is then derived using the cross-correlation and autocorrelation terms.

o Based on the motion refinement and the gradients, the following adjustment is calculated to derive the prediction signal of the pixel:

In the above examples, I⁽⁰⁾ refers to the first reference block and I⁽¹⁾ refers to the second reference block. The adjustment value b'(x,y) is determined based on the per-pixel motion refinement (v'_x, v'_y) for each sample in the subblock. In some examples, I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) may be considered as the prediction block, and therefore b'(x,y) may be considered as adjusting the prediction block. As shown in equation (3-1-2-1), there may be an addition of o_offset and a right-shift operation by shift5 to generate the prediction samples pred_BDOF(x,y).
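The gradient step and the final sample computation can be sketched as follows. The combination in `bdof_sample` follows the description above (the sum of the two reference samples, the adjustment b'(x,y), the addition of o_offset, and a right shift by shift5). The pre-shift by `shift1` in `neighbor_gradients` is an assumption about fixed-point precision, the adjustment value is taken as an input rather than derived, and both function names are hypothetical.

```python
def neighbor_gradients(sig, x, y, shift1):
    """Horizontal and vertical gradients at (x, y) from neighbouring samples.

    ASSUMPTION: each neighbour is right-shifted by shift1 before differencing.
    (x, y) must be an interior position of the 2-D list `sig`.
    """
    height, width = len(sig), len(sig[0])
    if not (1 <= x < width - 1 and 1 <= y < height - 1):
        raise IndexError("gradient needs a neighbour on every side")
    gx = (sig[y][x + 1] >> shift1) - (sig[y][x - 1] >> shift1)
    gy = (sig[y + 1][x] >> shift1) - (sig[y - 1][x] >> shift1)
    return gx, gy


def bdof_sample(i0, i1, b_adj, o_offset, shift5):
    """pred_BDOF = (I0 + I1 + b' + o_offset) >> shift5 for one sample."""
    return (i0 + i1 + b_adj + o_offset) >> shift5
```

When b_adj is zero, `bdof_sample` reduces to a rounded combination of the two reference samples, which corresponds to the bypass case in which no adjustment is applied.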

A third aspect relates to an alternative subblock SAD derivation. This example technique for deriving the SAD may be such that the values determined for the SAD derivation can be reused to perform per-pixel BDOF. That is, video encoder 200 and video decoder 300 may first determine a distortion value (e.g., a SAD value) for a subblock in order to determine whether to perform per-pixel BDOF. If video encoder 200 and video decoder 300 determine that per-pixel BDOF is to be performed, the calculations that video encoder 200 and video decoder 300 performed to determine whether to perform per-pixel BDOF may be reused to perform per-pixel BDOF.

For example, one way to determine the distortion value for a subblock is to determine a first reference block (e.g., identified by a first motion vector) and a second reference block (e.g., identified by a second motion vector), and to determine the distortion value by determining difference values between the samples of the first reference block and the samples of the second reference block. As one example, as described above, one way to determine the distortion value is to determine the following:

In the above equation, I⁽⁰⁾(i,j) refers to the samples of the first reference block, and I⁽¹⁾(i,j) refers to the samples of the second reference block. As further described above, to determine a motion refinement, including a per-pixel motion refinement (e.g., v'_x, v'_y), video encoder 200 and video decoder 300 may determine S₁, S₂, S₃, S₅, and S₆, which are the autocorrelations and cross-correlations of the gradients. As described in equation (1-6-3), part of determining the autocorrelations and cross-correlations of the gradients is determining an intermediate value θ, where θ = (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2).

Accordingly, if per-pixel BDOF is to be performed for a subblock, video encoder 200 and video decoder 300 may need to determine (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2). In one or more examples, as part of determining the distortion value for a subblock, video encoder 200 and video decoder 300 may determine the distortion value for the subblock based on (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) instead of (or in addition to) determining the distortion value based on I⁽⁰⁾(i,j) - I⁽¹⁾(i,j). That is, to determine the distortion value for the subblock, for example to determine whether per-pixel BDOF is to be performed, video encoder 200 and video decoder 300 may determine the sum of |(I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2)| over the subblock as the value of sbSAD. In this way, if per-pixel BDOF is to be performed, video encoder 200 and video decoder 300 will already have determined the value of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2), which is the value of θ and is used to determine the motion refinement.

Accordingly, in one or more examples, to determine the respective distortion values for each subblock of the one or more subblocks of the plurality of subblocks, video encoder 200 and video decoder 300 may be configured to determine, for each subblock of the one or more subblocks of the plurality of subblocks, a first reference block and a second reference block. For example, I⁽⁰⁾(i,j) may be the first reference block, and I⁽¹⁾(i,j) may be the second reference block.

Video encoder 200 and video decoder 300 may scale the samples of the first reference block and the samples of the second reference block. For example, video encoder 200 and video decoder 300 may perform the operation I⁽⁰⁾(i,j) >> shift2. In this example, the value of shift2 may define by how much the value of I⁽⁰⁾(i,j) is scaled to generate the scaled samples of the first reference block. Similarly, video encoder 200 and video decoder 300 may perform the operation I⁽¹⁾(i,j) >> shift2. In this example, the value of shift2 may define by how much the value of I⁽¹⁾(i,j) is scaled to generate the scaled samples of the second reference block.

Video encoder 200 and video decoder 300 may determine difference values between the scaled samples of the first reference block and the scaled samples of the second reference block to determine the respective distortion values. For example, video encoder 200 and video decoder 300 may determine (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2). Video encoder 200 and video decoder 300 may determine the distortion value (e.g., sbSAD) for the subblock based on the result of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2).

As described above, in some examples, there may be computational gains for video encoder 200 and video decoder 300 because the value of (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) may be reused for per-pixel BDOF. For example, assume that video encoder 200 and video decoder 300 have determined that per-pixel BDOF is performed for a first subblock of the one or more subblocks of the plurality of subblocks into which the block being encoded or decoded is divided.

In this example, video encoder 200 and video decoder 300 may determine, for each sample in the first subblock, respective motion refinements. That is, video encoder 200 and video decoder 300 may determine a motion refinement (v'_x, v'_y) for each sample of the first subblock, rather than or in addition to determining one motion refinement (v_x, v_y) that is the same for all samples in the first subblock.

Video encoder 200 and video decoder 300 may be configured to determine, for each sample in the first subblock, respective refined sample values from the samples in the prediction block for the first subblock based on the respective motion refinements. For example, as described above, the equation for determining the prediction samples for per-pixel BDOF may be pred_BDOF(x,y) = (I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) + b'(x,y) + o_offset) >> shift5.

To determine pred_BDOF, video encoder 200 and video decoder 300 may determine b'(x,y), which is a per-pixel adjustment value determined from the respective per-pixel motion refinements (i.e., (v'_x, v'_y)). In some examples, the prediction block may be considered as the sum of the first reference block and the second reference block (i.e., I⁽⁰⁾(i,j) + I⁽¹⁾(i,j)). As shown in the equation for determining pred_BDOF, video encoder 200 and video decoder 300 may add b'(x,y) to I⁽⁰⁾(i,j) + I⁽¹⁾(i,j). Thus, as part of determining pred_BDOF, video encoder 200 and video decoder 300 may determine refined sample values (e.g., pred_BDOF) from the samples in the prediction block for the first subblock (e.g., where the prediction block is equal to I⁽⁰⁾(i,j) + I⁽¹⁾(i,j)) based on the respective motion refinements (e.g., (v'_x, v'_y), which are used to determine b'(x,y)).

Stated another way, video encoder 200 and video decoder 300 may determine a first set of sample values in a first reference block for a first subblock of the one or more subblocks (e.g., determine I⁽⁰⁾(i,j)). Video encoder 200 and video decoder 300 may scale the first set of sample values with a scale factor to generate a first set of scaled sample values. That is, in performing I⁽⁰⁾(i,j) >> shift2, video encoder 200 and video decoder 300 may be considered as scaling the first set of sample values by the scale factor defined by the ">>" operation and the value of "shift2".

Video encoder 200 and video decoder 300 may determine a second set of sample values in a second reference block for the first subblock of the one or more subblocks (e.g., determine I⁽¹⁾(i,j)). Video encoder 200 and video decoder 300 may scale the second set of sample values with the scale factor to generate a second set of scaled sample values. That is, in performing I⁽¹⁾(i,j) >> shift2, video encoder 200 and video decoder 300 may be considered as scaling the second set of sample values by the scale factor defined by the ">>" operation and the value of "shift2".

비디오 인코더 (200) 및 비디오 디코더 (300) 는, 제 1 서브블록에 대해, 스케일링된 샘플 값들의 제 1 세트 및 스케일링된 샘플 값들의 제 2 세트에 기초하여 (예를 들어, I⁽⁰⁾(i,j) >> shift2 및 I⁽¹⁾(i,j) >> shift2 에 기초하여) 왜곡 값을 결정할 수도 있다. 예를 들어, 비디오 인코더 (200) 및 비디오 디코더 (300) 는 (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2)) 에 기초하여 제 1 서브블록에 대한 왜곡 값을 결정할 수도 있다.Video encoder 200 and video decoder 300 perform, for a first subblock, based on a first set of scaled sample values and a second set of scaled sample values (e.g., I ⁽⁰⁾ ( Based on i,j) >> shift2 and I ⁽¹⁾ (i,j) >> shift2), the distortion value may be determined. For example, video encoder 200 and video decoder 300 calculate the second based on (I ⁽⁰⁾ (i,j) >> shift2) - (I ⁽¹⁾ (i,j) >> shift2)). A distortion value for 1 subblock may be determined.

하나 이상의 예들에서, 상기에서 설명된 바와 같이, 픽셀 당 BDOF 가 제 1 서브블록에 대해 수행된다고 가정한다. 이 예에서, 비디오 인코더 (200) 및 비디오 디코더 (300) 는 픽셀 당 BDOF 에 대한 픽셀 당 모션 정세화를 결정하기 위해 스케일링된 샘플 값들의 제 1 세트 및 스케일링된 샘플 값들의 제 2 세트를 재사용할 수도 있다. 예를 들어, 비디오 인코더 (200) 및 비디오 디코더 (300) 는 픽셀 당 모션 정세화 (예를 들어, (v'_x, v'_y)) 를 결정하기 위한 그래디언트들의 자기상관 및 상호상관을 결정하기 위해 (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) 의 계산을 재사용할 수도 있다. 상기에서 설명된 바와 같이, 비디오 인코더 (200) 및 비디오 디코더 (300) 는, pred_BDOF (즉, 블록의 제 1 서브블록을 인코딩 또는 디코딩하기 위한 예측 샘플들) 를 결정하는데 사용되는 b'(x,y) 의 조정 값을 결정하기 위해 픽셀 당 모션 정세화를 사용할 수도 있다.In one or more examples, assume that per-pixel BDOF is performed for the first sub-block, as described above. In this example, video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine the per-pixel motion refinement for per-pixel BDOF. there is. For example, video encoder 200 and video decoder 300 use a method to determine autocorrelation and cross-correlation of gradients to determine per-pixel motion refinement (eg, ( _v'x , _v'y )). The calculation of (I ⁽⁰⁾ (i,j) >> shift2) - (I ⁽¹⁾ (i,j) >> shift2) may be reused. As described above _, video encoder 200 and video decoder 300 use b'(x ,y) may use per-pixel motion refinement to determine the adjustment value.

상기는, 비디오 인코더 (200) 및 비디오 디코더 (300) 가 픽셀 당 BDOF 에 대한 픽셀 당 모션 정세화를 결정하기 위해 스케일링된 샘플 값들의 제 1 세트 및 스케일링된 샘플 값들의 제 2 세트를 재사용할 수도 있는 예를 설명한다. 하지만, 그 기법들은 그것에 한정되지 않는다. 일부 예들에서, 비디오 인코더 (200) 및 비디오 디코더 (300) 는 BDOF 에 대한 모션 정세화를 결정하기 위해 스케일링된 샘플 값들의 제 1 세트 및 스케일링된 샘플 값들의 제 2 세트를 재사용할 수도 있다. 즉, 예시적인 기법들은 픽셀 당 BDOF 에 대한 픽셀 당 모션 정세화를 위해 스케일링된 샘플 값들의 제 1 세트 및 스케일링된 샘플 값들의 제 2 세트를 재사용하는 것으로 제한되지 않을 수도 있지만, BDOF 에 대한 모션 정세화를 위해 더 일반적으로 사용될 수 있다 (예를 들어, 픽셀 당 BDOF 에 대한 픽셀 당 모션 정세화로 제한되지 않음). BDOF 가 픽셀 단위가 아닌 전체 서브블록에 대한 모션 정세화를 포함하는 예들에서와 같이, 픽셀 당 BDOF 뿐만 아니라 서브블록 기반 BDOF 에 대해서도 복잡도의 감소가 존재할 수도 있다.The above indicates that video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine the per-pixel motion refinement for per-pixel BDOF. Explain an example. However, the techniques are not limited thereto. In some examples, video encoder 200 and video decoder 300 may reuse the first set of scaled sample values and the second set of scaled sample values to determine motion refinement for BDOF. That is, example techniques may not be limited to reusing the first set of scaled sample values and the second set of scaled sample values for per-pixel motion refinement for per-pixel BDOF, but motion refinement for BDOF. (e.g., not limited to per-pixel motion refinement for per-pixel BDOF). A reduction in complexity may exist for subblock-based BDOF as well as per-pixel BDOF, as in examples where BDOF includes motion refinement for the entire subblock rather than per pixel.

Accordingly, as in the second aspect, the following describes an alternative method for deriving the subblock SAD that is used to determine whether to bypass a subblock (i.e., whether BDOF is bypassed). As described above, the example method calculates the difference between the two reference signals (diff(i,j)) in the same way as the corresponding sample-difference term is calculated in the bidirectional optical flow described above in conjunction with Equations 1 to 6.

If it is determined that BDOF is applied to the subblock, diff(i,j) can be reused in the step of calculating the autocorrelation and cross-correlation of the gradients (S3 and S6), as in the bidirectional optical flow described above.

The equation (3-1-1-1) in the second aspect is modified as follows:

sbSAD = Σ_{(i,j)∈Ω} |(I^(0)(i,j) >> shift2) - (I^(1)(i,j) >> shift2)|

In the above equation, I^(k)(i,j) is the sample value at coordinate (i,j) in the (sbWidth + 4)×(sbHeight + 4) region of the prediction signal in reference picture k (k = 0, 1). shift2 is a predetermined value, e.g., shift2 is equal to 4. Ω is the sbWidth×sbHeight subblock region.
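The alternative distortion calculation can be sketched as follows. This is a minimal illustration, not the codec's implementation: the function names `scaled_diff` and `subblock_sad` are hypothetical, the prediction signals are assumed to be plain lists of integer rows covering the (sbWidth + 4)×(sbHeight + 4) region, and the 2-sample border around the subblock is an assumption inferred from that region size.

```python
def scaled_diff(pred0, pred1, shift2=4):
    """Per-sample difference of the two right-shifted prediction signals.

    pred0 and pred1 are assumed to be equally sized lists of integer rows.
    The result can be kept and reused later for the motion refinement.
    """
    return [[(a >> shift2) - (b >> shift2) for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


def subblock_sad(diff, sb_width, sb_height, border=2):
    """Sum of absolute differences over the interior subblock area only.

    border=2 is an assumption: it skips the extra samples that surround
    the sb_width x sb_height area in the extended region.
    """
    return sum(abs(diff[border + j][border + i])
               for j in range(sb_height)
               for i in range(sb_width))
```

Keeping `diff` as a separate array mirrors the reuse described in the text: the same differences that feed the SAD are available when BDOF is subsequently applied to the subblock.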

It should be noted that the alternative technique for determining a distortion value for a subblock (e.g., for determining sbSAD) based on (I⁽⁰⁾(i,j) >> shift2) - (I⁽¹⁾(i,j) >> shift2) should not be considered limited to examples in which per-pixel BDOF is performed. The alternative technique may also be applicable to examples in which subblock BDOF or some other BDOF technique is applied. For example, for subblock BDOF as well, video encoder 200 and video decoder 300 may use the alternative technique to determine the distortion value for determining whether BDOF is performed for a subblock. If BDOF is to be performed, video encoder 200 and video decoder 300 may reuse the calculations from the alternative technique for determining the distortion value to determine the motion refinement as part of subblock BDOF.

As described above, the threshold value to which the distortion value is compared to determine whether per-pixel BDOF is performed or BDOF is bypassed is sbDistTh, which is calculated as (sbWidth * sbHeight * s) << n, as shown in equation (3-1-1-2) above. However, in the alternative technique for determining the distortion value, video encoder 200 and video decoder 300 may scale I⁽⁰⁾(i,j) by >> shift2 and scale I⁽¹⁾(i,j) by >> shift2, as described above. Accordingly, in some examples, the manner in which video encoder 200 and video decoder 300 determine sbDistTh may be modified to account for the >> shift2 scaling.

The equation (3-1-1-2) in the second aspect for calculating sbDistTh is modified as follows:

sbDistTh = (sbWidth * sbHeight * s) << (n - shift2)    (3-2-2)

In the above equation, n and s are predetermined values. For example, n may be derived as n = InternalBitDepth - bitDepth + 1, and s represents a scale factor, e.g., s = 1. In the current version of VVC, InternalBitDepth is equal to 14 for a bitDepth of 10, and therefore n is equal to 5. The scale s may be 1, 2, 3, or another predefined value, or may be signaled in the bitstream.

Accordingly, to determine the threshold value, video encoder 200 and video decoder 300 may be configured to multiply the width of the first subblock of the one or more subblocks (i.e., sbWidth in equation (3-2-2)), the height of the first subblock of the one or more subblocks (i.e., sbHeight in equation (3-2-2)), and a first scale factor (i.e., "s" in equation (3-2-2)) to generate an intermediate value. Video encoder 200 and video decoder 300 may be configured to perform a left-shift operation on the intermediate value based on a second scale factor to generate the threshold value. For example, the second scale factor may be (n - shift2) in equation (3-2-2), and the left-shift operation is represented as "<<" in equation (3-2-2).

In one or more examples, video encoder 200 and video decoder 300 may compare the distortion value for the first subblock (e.g., the distortion value calculated using the alternative technique for determining the distortion value) to the threshold value (e.g., sbDistTh as determined in equation (3-2-2)). Video encoder 200 and video decoder 300 may determine, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock. For example, if the distortion value is less than the threshold value (e.g., "YES" of 1306 in FIG. 13), video encoder 200 and video decoder 300 may bypass BDOF. If the distortion value is greater than the threshold value (e.g., "NO" of 1306 in FIG. 13), video encoder 200 and video decoder 300 may perform per-pixel BDOF.
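The threshold derivation and the comparison can be sketched as below. The function names and the string labels are hypothetical, and the defaults (s = 1, n = 5, shift2 = 4) are the example values given in the text. Treating a distortion equal to the threshold as the per-pixel case is an assumption taken from the "otherwise" branch of the decoder process in the fifth aspect.

```python
def sb_dist_threshold(sb_width, sb_height, s=1, n=5, shift2=4):
    """Threshold for the alternative distortion technique.

    The product sb_width * sb_height * s is the intermediate value; the
    left shift by (n - shift2) applies the second scale factor.
    """
    intermediate = sb_width * sb_height * s
    return intermediate << (n - shift2)


def bdof_mode(sb_dist, threshold):
    """Return 'bypass' when the distortion is below the threshold."""
    return "bypass" if sb_dist < threshold else "per_pixel"
```

For an 8×8 subblock with the default values, the intermediate value is 64 and the threshold is 64 << 1 = 128, compared with 64 << 5 = 2048 when the sample values are not right-shifted by shift2.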

A fourth aspect relates to determining the values of thW and thH. As in the above aspects, the example techniques may be applied to a bi-predicted coding block. The total number of subblocks is derived from the width and height of the current block and the maximum subblock width (thW) and height (thH).

When the current coding block applies a subblock-based method, e.g., DMVR, the values of thW and thH should be less than or equal to the maximum subblock width and height of the preceding method (e.g., DMVR).

The values of thW and thH may be fixed predetermined values, e.g., thW is equal to 8 and thH is equal to 8. Alternatively, the values of thW and thH may be adaptive, in which case the values are determined by information decoded from the bitstream. The following describes ways in which the values of thW and thH may be adapted:

a. Determined by the preceding coding method: If the current coding block applies a subblock-based method, thW and thH may be set to the same subblock dimensions as the preceding method. For example, when DMVR is applied to the current coding block, thW is set equal to the DMVR maximum subblock width, e.g., 16, and thH is set equal to the DMVR maximum subblock height, e.g., 16. Otherwise (if the current coding block does not apply any subblock-based method), thW and thH may be set to predetermined values, e.g., 8.

b. Determined by the current coding block dimensions: In this example, larger values of thW and thH are set for a coding block having a total number of luma samples greater than a threshold T (e.g., T = 128). Given a W×H coding block: if W*H is greater than T, the values of thW and thH are set equal to 16. Otherwise (if W*H is less than or equal to T), the values of thW and thH are set equal to 8.
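The two adaptation options can be sketched as follows. Both function names and all parameter names are hypothetical, and the numeric defaults (16, 8, T = 128) are the example values from options a and b, not required values.

```python
def max_subblock_size_by_method(uses_subblock_method,
                                method_max_w=16, method_max_h=16, default=8):
    """Option a: inherit the preceding method's maximum subblock size.

    uses_subblock_method is True when, for example, DMVR is applied to
    the current coding block.
    """
    if uses_subblock_method:
        return method_max_w, method_max_h
    return default, default


def max_subblock_size_by_dims(width, height, t=128):
    """Option b: pick a larger size for blocks with more than t luma samples."""
    size = 16 if width * height > t else 8
    return size, size
```

Under option b, a 16×16 block (256 luma samples) yields thW = thH = 16, while an 8×16 block (exactly 128 samples) does not exceed T and yields thW = thH = 8.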

A fifth aspect relates to an example decoder process that applies per-pixel BDOF with subblock bypass. The above aspects may be applied at an encoder (e.g., video encoder 200) and/or a decoder (e.g., video decoder 300). A decoder (e.g., video decoder 300) may execute the methods described herein by all or a subset of the following steps to decode an inter-predicted block in a picture from a bitstream:

1. Derive a position component (cbX, cbY) as the top-left luma position of the current block by decoding syntax elements in the bitstream.

2. Derive the size of the current block as a width value (W) and a height value (H) by decoding syntax elements in the bitstream.

3. Determine, from decoded elements in the bitstream, that the current block is an inter-predicted block.

4. Derive the motion vector components (mvL0 and mvL1) and the reference indices (refPicL0 and refPicL1) of the current block from decoded elements in the bitstream.

5. Infer a flag from decoded elements in the bitstream, where the flag indicates whether decoder-side motion vector derivation (e.g., DMVR, bilateral merge, template matching) is applied to the current block. The inference scheme for the flag may be the same as, but is not limited to, the examples described above with respect to the enabling conditions for DMVR. In another example, this flag may be explicitly signaled in the bitstream to avoid complex condition checks at the decoder.

6. If it is determined to apply DMVR to the current block, derive the refined motion vectors.

7. Derive two (W + 6) × (H + 6) luma prediction sample arrays (predSampleL0 and predSampleL1) from the decoded refPicL0, refPicL1, and motion vectors, where the motion vectors are the refined motion vectors if it is determined to apply DMVR, and are mvL0 and mvL1 otherwise.

8. Infer a flag from decoded elements in the bitstream, where the flag indicates whether bidirectional optical flow is applied to the current block. The inference scheme for the flag may be the same as, but is not limited to, that of bidirectional optical flow. In another example, this flag may be explicitly signaled in the bitstream to avoid complex condition checks at the decoder.

9. If, according to the aforementioned flag value, the decision is to apply BDOF to the current block, derive the number of subblocks in the horizontal direction (numSbX), the number of subblocks in the vertical direction (numSbY), the subblock width (sbWidth), and the subblock height (sbHeight) as follows:

where thW and thH are predetermined integer values (e.g., thW = thH = 8).

10. Derive the variable sbDistTh as follows:

sbDistTh = (sbWidth * sbHeight * s) << (n - shift2)

where:

shift2 is a predetermined value, e.g., shift2 is equal to 4.

n is a predetermined value, e.g., n = InternalBitDepth - bitDepth + 1 = 5.

s is a scale factor, e.g., s = 1.

11. Set the position component (sbX, sbY) = (0, 0) as the top-left luma position of the first subblock of the current block.

12. For each subblock at (sbX, sbY), when sbX is less than W and sbY is less than H, the following steps apply.

12.1. For x = sbX - 2 … sbX + sbWidth + 1, y = sbY - 2 … sbY + sbHeight + 1, the variables diff[x][y] are derived as follows:

where shift2 is a predetermined value, e.g., shift2 is equal to 4.

12.2. Derive the variable sbDist as follows:

where i = 0 … sbWidth - 1 and j = 0 … sbHeight - 1.

12.3. (Bypass subblock BDOF) If sbDist is less than sbDistTh, derive the prediction signal of the subblock as follows.

12.3.1. For x = sbX … sbX + sbWidth - 1, y = sbY … sbY + sbHeight - 1,

where:

shift5 is set equal to Max(3, 15 - BitDepth).

offset5 is set equal to (1 << (shift5 - 1)).

12.4. Otherwise (if sbDist is greater than or equal to sbDistTh), the following steps apply.

12.4.1. For x = sbX - 2 … sbX + sbWidth + 1, y = sbY - 2 … sbY + sbHeight + 1, the variables gradientHL0[x][y], gradientVL0[x][y], gradientHL1[x][y], and gradientVL1[x][y] are derived as follows:

where shift1 is a predetermined value, e.g., shift1 is set equal to 6.

12.4.2. For x = sbX - 2 … sbX + sbWidth + 1, y = sbY - 2 … sbY + sbHeight + 1, the variables tempH[x][y] and tempV[x][y] are derived as follows:

where shift3 is a predetermined value, e.g., shift3 is set equal to 1.

12.4.3. For each pixel at (piX, piY), where piX = sbX … sbX + sbWidth - 1 and piY = sbY … sbY + sbHeight - 1, the following steps apply.

12.4.3.1. The variables sGx2, sGy2, sGxGy, sGxdI, and sGydI are derived as follows:

where i = -2 … 2 and j = -2 … 2.

12.4.3.2. The horizontal and vertical motion offsets of the current pixel are derived as follows:

where mvRefineThres is a predetermined value, e.g., mvRefineThres is set equal to (1 << 4).

12.4.3.3. The prediction signal of the current pixel is derived as follows:

where,

12.5. Update the top-left luma position of the subblock as follows.

13. Derive the prediction block using the derived prediction signal of each subblock, and use the derived prediction block for video decoding.
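The subblock loop of steps 9 through 12.5 can be sketched as below. This is a hedged illustration under several assumptions that the text does not state: the partition rule in `split_block`, the raster-order traversal, the 3-sample padding offset into the (W + 6) × (H + 6) arrays, and the clipped averaging used in the bypass branch (the usual bi-prediction average) are all assumed. The per-pixel BDOF branch is only labeled, not implemented, and every function name is hypothetical.

```python
def clip3(lo, hi, v):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def split_block(W, H, thW=8, thH=8):
    """Step 9 (assumed rule): cap the subblock size at thW x thH."""
    sb_w = thW if W > thW else W
    sb_h = thH if H > thH else H
    return W // sb_w, H // sb_h, sb_w, sb_h


def classify_subblocks(pred_l0, pred_l1, W, H, bit_depth=10,
                       shift2=4, n=5, s=1, pad=3):
    """Return (sbX, sbY, mode, bypass_prediction_or_None) per subblock.

    pred_l0 and pred_l1 are (H + 6) rows by (W + 6) columns; pad=3 is an
    assumed offset from the array origin to the block's top-left sample.
    """
    num_sb_x, num_sb_y, sb_w, sb_h = split_block(W, H)
    sb_dist_th = (sb_w * sb_h * s) << (n - shift2)      # step 10
    shift5 = max(3, 15 - bit_depth)                     # step 12.3.1
    offset5 = 1 << (shift5 - 1)
    max_val = (1 << bit_depth) - 1
    out = []
    for sb_row in range(num_sb_y):                      # assumed raster order
        for sb_col in range(num_sb_x):
            sb_x, sb_y = sb_col * sb_w, sb_row * sb_h
            rows = range(sb_y + pad, sb_y + pad + sb_h)
            cols = range(sb_x + pad, sb_x + pad + sb_w)
            # Steps 12.1-12.2: SAD of the right-shifted prediction signals.
            sb_dist = sum(abs((pred_l0[y][x] >> shift2) - (pred_l1[y][x] >> shift2))
                          for y in rows for x in cols)
            if sb_dist < sb_dist_th:                    # step 12.3: bypass BDOF
                pred = [[clip3(0, max_val,
                               (pred_l0[y][x] + pred_l1[y][x] + offset5) >> shift5)
                         for x in cols] for y in rows]
                out.append((sb_x, sb_y, "bypass", pred))
            else:                                       # step 12.4: per-pixel BDOF
                out.append((sb_x, sb_y, "per_pixel", None))
    return out
```

In a 16×8 block whose left half has identical prediction signals and whose right half differs strongly, the first subblock falls below the threshold of 128 and is averaged directly, while the second is flagged for per-pixel BDOF.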

FIG. 15 is a flowchart illustrating an example method for decoding video data in accordance with the techniques of this disclosure. The current block may include a current CU. Although described with respect to video decoder 300 (FIGS. 1 and 4), it should be understood that other devices may be configured to perform a method similar to that of FIG. 15. For example, prediction processing unit 304 and/or motion compensation unit 316 may be configured to perform the example techniques of FIG. 15. Prediction processing unit 304 and/or motion compensation unit 316 may be coupled to a memory, such as DPB 314, or to other memory of video decoder 300. In some examples, video decoder 300 may be coupled to memory 120, which stores information used by video decoder 300 to perform the example techniques of FIG. 15.

Video decoder 300 may determine that bidirectional optical flow (BDOF) is enabled for a block of video data (1500). For example, video decoder 300 may receive signaling indicating that BDOF is enabled for the block. In some examples, video decoder 300 may infer (e.g., determine without receiving signaling) that BDOF is enabled for the block, e.g., based on certain criteria being satisfied.

Video decoder 300 may divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block (1502). For example, video decoder 300 may divide the block into N subblocks. In some cases, two or more of the subblocks may be of different sizes, but it is also possible for the subblocks to have the same size. Video decoder 300 may determine how to divide the block based on signaled information or by inference.

Video decoder 300 may determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values (1504). There may be various ways in which video decoder 300 may determine the respective distortion values. As one example, video decoder 300 may determine a first reference block (e.g., I⁽⁰⁾(i,j)) and determine a second reference block (e.g., I⁽¹⁾(i,j)). Video decoder 300 may calculate a sum of absolute differences (SAD) between I⁽⁰⁾(i,j) and I⁽¹⁾(i,j).

However, the example techniques are not so limited. In some examples, video decoder 300 may perform the alternative technique to determine the distortion value, as described above. For example, video decoder 300 may determine a first set of sample values in a first reference block for a first subblock of the one or more subblocks (e.g., determine I⁽⁰⁾(i,j)). Video decoder 300 may scale the first set of sample values with a scale factor to generate a first set of scaled sample values (e.g., determine I⁽⁰⁾(i,j) >> shift2 to generate the first set of scaled sample values). Video decoder 300 may determine a second set of sample values in a second reference block for the first subblock of the one or more subblocks (e.g., determine I⁽¹⁾(i,j)). Video decoder 300 may scale the second set of sample values with the scale factor to generate a second set of scaled sample values (e.g., determine I⁽¹⁾(i,j) >> shift2 to generate the second set of scaled sample values). In one or more examples, to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values, video decoder 300 may be configured to determine, for the first subblock, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values (e.g., determine the SAD based on the first set of scaled sample values and the second set of scaled sample values).

Video decoder 300 may determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values (1506). For example, as described with respect to FIG. 13, there may be two options for video decoder 300: perform per-pixel BDOF for the subblock or bypass BDOF. In some examples, there may be no other option for video decoder 300 when evaluating a subblock.

In some examples, to determine whether to perform per-pixel BDOF or to bypass BDOF, video encoder 200 and video decoder 300 may determine a threshold value. One example way to determine the threshold value is sbDistTh = (sbWidth * sbHeight * s) << n. However, in examples where the alternative technique for determining the distortion value is utilized, video decoder 300 may determine the threshold value as sbDistTh = (sbWidth * sbHeight * s) << (n - shift2).

That is, video decoder 300 may multiply the width of the first subblock of the one or more subblocks (e.g., sbWidth), the height of the first subblock of the one or more subblocks (e.g., sbHeight), and a first scale factor (e.g., "s") to generate an intermediate value. Video decoder 300 may perform a left-shift operation on the intermediate value based on a second scale factor to generate the threshold value (e.g., perform << (n - shift2), where (n - shift2) is the second scale factor).

Video decoder 300 may compare the distortion value of the respective distortion values for the first subblock to the threshold value. To determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values, video decoder 300 may determine, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock, as illustrated in decision block 1306 of FIG. 13.

Video decoder 300 may be configured to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed (1508). As one example, to determine the prediction samples, video decoder 300 may determine that per-pixel BDOF is performed for the first subblock of the one or more subblocks. In this example, video decoder 300 may determine, for each sample in the first subblock, respective motion refinements, and may determine, for each sample in the first subblock, respective refined sample values from the samples in the prediction block for the first subblock based on the respective motion refinements.

For example, video decoder 300 may perform the operations of the equation for determining pred_BDOF. pred_BDOF may represent the refined sample values. In this example, I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) may be considered as the prediction block. The value of b'(x,y) may be determined by the respective motion refinements (v'_x, v'_y) for each sample in the subblock. Accordingly, the respective refined sample values (e.g., pred_BDOF) are based on the prediction block and the respective motion refinements.
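How a refined sample combines the prediction block with the per-pixel adjustment can be sketched as follows. Only the addition of b'(x,y) to I⁽⁰⁾(x,y) + I⁽¹⁾(x,y) is taken from the text; the rounding offset, the right shift back to the output bit depth, and the final clipping are assumptions, and the function and parameter names are hypothetical.

```python
def refine_sample(i0, i1, b_adj, shift, offset, bit_depth=10):
    """Combine two prediction samples with a per-pixel adjustment.

    i0, i1  : co-located samples of the two reference blocks
    b_adj   : per-pixel adjustment b'(x, y) from the motion refinement
    shift, offset : assumed normalization back to bit_depth precision
    """
    value = (i0 + i1 + b_adj + offset) >> shift
    # Assumed clipping to the valid sample range for bit_depth.
    return max(0, min((1 << bit_depth) - 1, value))
```

With a zero adjustment the function reduces to a rounded average of the two prediction samples, so any difference between the bypass path and the per-pixel path comes only from b'(x,y).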

There may be various ways to determine the motion refinements (v'_x, v'_y). As part of determining the motion refinements, video decoder 300 may determine autocorrelations and cross-correlations, including […]. In one or more examples, such as when the alternative technique for determining distortion values is used, video decoder 300 may have already determined […] in order to determine the distortion value for the first subblock. In such examples, video decoder 300 may reuse the first set of scaled sample values (e.g., […]) and the second set of scaled sample values (e.g., […]) to determine the per-pixel motion refinement for per-pixel BDOF (e.g., the value of […] can be determined without recalculating […] and […]).
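The reuse described above can be sketched as follows: the two sets of scaled sample values are computed once, used for the subblock distortion, and then handed on to the refinement stage instead of being recomputed. This is a minimal Python sketch under stated assumptions: scaling is modeled as a right shift, the distortion is a sum of absolute differences, and the sample difference stands in for the terms that feed the correlations. None of those specifics, nor the function names, come from this passage.

```python
def scale_samples(block, shift):
    """Scale every sample of a 2-D block (assumption: right shift)."""
    return [[s >> shift for s in row] for row in block]


def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def analyze_subblock(ref0, ref1, shift, threshold):
    """Compute scaled samples once; reuse them if per-pixel BDOF is chosen."""
    s0 = scale_samples(ref0, shift)   # first set of scaled sample values
    s1 = scale_samples(ref1, shift)   # second set of scaled sample values
    distortion = sad(s0, s1)
    if distortion < threshold:        # assumption: low distortion -> bypass
        return {"mode": "bypass", "distortion": distortion}
    # Reuse s0 and s1: the per-sample difference is one input to the
    # correlation terms, so no second scaling pass is needed.
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(s0, s1)]
    return {"mode": "per_pixel", "distortion": distortion, "diff": diff}
```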

Video decoder 300 may reconstruct the block based on the prediction samples (1510). For example, reconstructing the block based on the prediction samples may include video decoder 300 receiving residual values indicative of differences between the prediction samples and samples of the block, and adding the residual values to the prediction samples to reconstruct the block.
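The reconstruction step can be illustrated with a short Python sketch that adds residual values to prediction samples. The clipping to the valid sample range and the `bit_depth` parameter are assumptions added for a well-defined result; the passage above only states that the residual values are added to the prediction samples.

```python
def reconstruct(pred, residual, bit_depth=8):
    """Add residual values to prediction samples, sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1].
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]
```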

The above provides examples for individual subblocks of a block. The following is an example in which there are two subblocks, per-pixel BDOF is performed for one subblock, and BDOF is bypassed for the other subblock.

For example, for a first subblock of the one or more subblocks, video decoder 300 may determine a first distortion value of the respective distortion values, and for a second subblock of the one or more subblocks, video decoder 300 may determine a second distortion value of the respective distortion values.

For the first subblock of the plurality of subblocks, video decoder 300 may determine, based on the first distortion value (e.g., based on a comparison of the first distortion value to the threshold value), that BDOF is enabled for the first subblock. Based on the determination that BDOF is enabled for the first subblock, video decoder 300 may determine per-pixel motion refinement for refining a first set of prediction samples for the first subblock. For example, video decoder 300 may derive, for a first sample of the first subblock, a first motion refinement for refining a first prediction sample, derive, for a second sample of the first subblock, a second motion refinement for refining a second prediction sample, and so forth.

For the second subblock of the plurality of subblocks, video decoder 300 may determine, based on the second distortion value (e.g., based on a comparison of the second distortion value to the threshold value), that BDOF is bypassed. Based on the determination that BDOF is bypassed for the second subblock, video decoder 300 may bypass determining per-pixel motion refinement for refining a second set of prediction samples for the second subblock. For example, video decoder 300 may bypass deriving, for a first sample of the second subblock, a first motion refinement for refining a first prediction sample, bypass deriving, for a second sample of the second subblock, a second motion refinement for refining a second prediction sample, and so forth.

For the first subblock, video decoder 300 may determine a refined first set of the prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock (e.g., determine pred_BDOF using the example techniques described in this disclosure). For the second subblock, video decoder 300 may determine the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinement. That is, for the second subblock, BDOF is bypassed. Video decoder 300 may determine the prediction samples for the second subblock based on various techniques, such as determining the prediction block based on a weighted average of the reference blocks.
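The two-subblock example can be sketched in Python as a single dispatch: a subblock whose distortion reaches the threshold gets per-sample offsets added to its prediction, and a subblock below the threshold falls back to a plain combination of the two reference blocks. This is a minimal sketch under stated assumptions: the comparison direction, the equal-weight rounded average used for the bypass path, and the `offsets` argument (per-sample b' values supplied by a caller) are all hypothetical.

```python
def predict_subblock(ref0, ref1, distortion, threshold, offsets=None):
    """Prediction samples for one subblock.

    Assumptions: bypass when distortion < threshold; bypass output is the
    rounded equal-weight average of the two reference blocks; `offsets`
    holds one precomputed refinement offset per sample.
    """
    per_pixel = distortion >= threshold and offsets is not None
    out = []
    for y, (row0, row1) in enumerate(zip(ref0, ref1)):
        out_row = []
        for x, (a, b) in enumerate(zip(row0, row1)):
            extra = offsets[y][x] if per_pixel else 0
            out_row.append((a + b + extra + 1) >> 1)
        out.append(out_row)
    return out
```

Calling the function once per subblock with that subblock's own distortion value reproduces the mixed case described above, where one subblock is refined and the other is not.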

FIG. 16 is a flowchart illustrating an example method of encoding video data in accordance with the techniques of this disclosure. The current block may include a current CU. Although described with respect to video encoder 200 (FIGS. 1 and 3), it should be understood that other devices may be configured to perform a method similar to that of FIG. 16. For example, motion selection unit 202 and/or motion compensation unit 224 may be configured to perform the example techniques of FIG. 16. Motion selection unit 202 and/or motion compensation unit 224 may be coupled to memory, such as DPB 218, or to other memory of video encoder 200. In some examples, video encoder 200 may be coupled to memory 106, which stores information used by video encoder 200 to perform the example techniques of FIG. 16. In general, video encoder 200 may perform the same operations as video decoder 300 to generate the prediction samples.

Video encoder 200 may determine that bi-directional optical flow (BDOF) is enabled for a block of video data (1600). For example, video encoder 200 may determine rate-distortion costs associated with different coding modes and may determine, based on the rate-distortion costs, that BDOF is enabled for the block.

Video encoder 200 may divide the block into a plurality of subblocks when BDOF is enabled for the block (1602). Video encoder 200 may determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values (1604). Video encoder 200 may perform the same techniques as those described for video decoder 300 to determine the respective distortion values.

Video encoder 200 may determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks (1606). For example, because video encoder 200 may not signal information indicating whether per-pixel BDOF is performed or BDOF is bypassed, video encoder 200 may perform the same operations as video decoder 300 to determine, for each subblock, whether per-pixel BDOF is performed or BDOF is bypassed.

Video encoder 200 may determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed (1608). Video encoder 200 may signal residual values between the prediction samples and samples of the block (e.g., of the individual subblocks) (1610).
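The residual values mentioned above can be illustrated with a small Python sketch that subtracts the prediction samples from the samples of the block. The function name is hypothetical, and steps a real encoder would typically apply before signaling (such as transform and quantization) are deliberately left out.

```python
def compute_residual(block, pred):
    """Residual values: block samples minus prediction samples."""
    return [
        [orig - p for orig, p in zip(block_row, pred_row)]
        for block_row, pred_row in zip(block, pred)
    ]
```

Because the per-subblock BDOF decision is derived rather than signaled, the prediction samples passed in here are assumed to be produced by the same procedure the decoder uses, so adding the residual back to them recovers the block.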

The following describes some example techniques that may be applied together or separately.

Clause 1. A method of decoding video data, the method comprising: determining that bi-directional optical flow (BDOF) is enabled for a block of the video data; dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks; determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstructing the block based on the prediction samples.

Clause 2. The method of clause 1, wherein determining, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprises: determining, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values; and determining, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values, wherein determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks comprises: determining, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value; determining, based on the determination that BDOF is enabled for the first subblock, per-pixel motion refinement for refining a first set of prediction samples for the first subblock; determining, for the second subblock of the plurality of subblocks, that BDOF is bypassed based on the second distortion value; and bypassing, based on the determination that BDOF is bypassed for the second block, determining per-pixel motion refinement for refining a second set of prediction samples for the second subblock, and wherein determining the prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises: determining, for the first subblock, a refined first set of the prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock; and determining, for the second subblock, the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinement for refining the second set of prediction samples.

Clause 3. The method of any of clauses 1 and 2, wherein determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks comprises determining that per-pixel BDOF is performed for a first subblock of the one or more subblocks, the method further comprising determining, for each sample in the first subblock, respective motion refinements, wherein determining the prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises determining, for each sample in the first subblock, respective refined sample values from samples in a prediction block for the first subblock based on the respective motion refinements.

Clause 4. The method of any of clauses 1 to 3, further comprising: multiplying a width of a first subblock of the one or more subblocks, a height of the first subblock of the one or more subblocks, and a first scale factor to generate an intermediate value; performing a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and comparing a distortion value, of the respective distortion values, for the first subblock to the threshold value, wherein determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks comprises determining, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock.

Clause 5. The method of any of clauses 1 to 4, further comprising: determining a first set of sample values in a first reference block for a first subblock of the one or more subblocks; scaling the first set of sample values with a scale factor to generate a first set of scaled sample values; determining a second set of sample values in a second reference block for the first subblock of the one or more subblocks; and scaling the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein determining, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprises determining, for the first subblock, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

Clause 6. The method of clause 5, wherein determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks comprises determining that per-pixel BDOF is performed for the first subblock, the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values to determine per-pixel motion refinement for per-pixel BDOF.

Clause 7. The method of clause 5, wherein determining, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks comprises determining that per-pixel BDOF is performed for the first subblock, the method further comprising reusing the first set of scaled sample values and the second set of scaled sample values to determine motion refinement for BDOF.

Clause 8. The method of any of clauses 1 to 7, wherein reconstructing the block comprises: receiving residual values indicative of differences between the prediction samples and samples of the block; and adding the residual values to the prediction samples to reconstruct the block.

Clause 9. A device for decoding video data, the device comprising: memory configured to store the video data; and processing circuitry coupled to the memory, the processing circuitry configured to: determine that bi-directional optical flow (BDOF) is enabled for a block of the video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

Clause 10. The device of clause 9, wherein to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values, the processing circuitry is configured to: determine, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values; and determine, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values, wherein to determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, the processing circuitry is configured to: determine, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value; determine, based on the determination that BDOF is enabled for the first subblock, per-pixel motion refinement for refining a first set of prediction samples for the first subblock; determine, for the second subblock of the plurality of subblocks, that BDOF is bypassed based on the second distortion value; and bypass, based on the determination that BDOF is bypassed for the second block, determining per-pixel motion refinement for refining a second set of prediction samples for the second subblock, and wherein to determine the prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to: determine, for the first subblock, a refined first set of the prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock; and determine, for the second subblock, the second set of prediction samples without refining the second set of prediction samples based on per-pixel motion refinement for refining the second set of prediction samples.

Clause 11. The device of any of clauses 9 and 10, wherein to determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, the processing circuitry is configured to determine that per-pixel BDOF is performed for a first subblock of the one or more subblocks, wherein the processing circuitry is further configured to determine, for each sample in the first subblock, respective motion refinements, and wherein to determine the prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to determine, for each sample in the first subblock, respective refined sample values from samples in a prediction block for the first subblock based on the respective motion refinements.

Clause 12. The device of any of clauses 9 to 11, wherein the processing circuitry is configured to: multiply a width of a first subblock of the one or more subblocks, a height of the first subblock of the one or more subblocks, and a first scale factor to generate an intermediate value; perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and compare a distortion value, of the respective distortion values, for the first subblock to the threshold value, wherein to determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, the processing circuitry is configured to determine, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock.

Clause 13. The device of any of clauses 9 to 12, wherein the processing circuitry is configured to: determine a first set of sample values in a first reference block for a first subblock of the one or more subblocks; scale the first set of sample values with a scale factor to generate a first set of scaled sample values; determine a second set of sample values in a second reference block for the first subblock of the one or more subblocks; and scale the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values, the processing circuitry is configured to determine, for the first subblock, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

Clause 14. The device of clause 13, wherein to determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first subblock, and wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine per-pixel motion refinement for per-pixel BDOF.

Clause 15. The device of clause 13, wherein to determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first subblock, and wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine motion refinement for BDOF.

Clause 16. The device of any of clauses 9 to 15, wherein to reconstruct the block, the processing circuitry is configured to: receive residual values indicative of differences between the prediction samples and samples of the block; and add the residual values to the prediction samples to reconstruct the block.

Clause 17. The device of any of clauses 9 to 16, further comprising a display configured to display decoded video data.

Clause 18. The device of any of clauses 9 to 17, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

Clause 19. A computer-readable storage medium storing instructions that, when executed, cause one or more processors to: determine that bi-directional optical flow (BDOF) is enabled for a block of video data; divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; determine, based on the respective distortion values, that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks; determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and reconstruct the block based on the prediction samples.

Clause 20. The computer-readable storage medium of clause 19, wherein the instructions that cause the one or more processors to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprise instructions that cause the one or more processors to: determine, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values; and determine, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values, wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprise instructions that cause the one or more processors to: determine, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value; determine, based on the determination that BDOF is enabled for the first subblock, a per-pixel motion refinement for refining a first set of prediction samples for the first subblock; determine, for the second subblock of the plurality of subblocks, that BDOF is bypassed based on the second distortion value; and bypass, based on the determination that BDOF is bypassed for the second block, determining a per-pixel motion refinement for refining a second set of prediction samples for the second subblock, and wherein the instructions that cause the one or more processors to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to: determine, for the first subblock, a refined first set of prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock; and determine, for the second subblock, the second set of prediction samples without refining the second set of prediction samples based on the per-pixel motion refinement for refining the second set of prediction samples.

Clause 21. The computer-readable storage medium of any of clauses 19 and 20, wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprise instructions that cause the one or more processors to determine that per-pixel BDOF is performed for a first subblock of the one or more subblocks, wherein the instructions further comprise instructions that cause the one or more processors to determine, for each sample in the first subblock, respective motion refinements, and wherein the instructions that cause the one or more processors to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to determine, for each sample in the first subblock, respective refined sample values from samples in a prediction block for the first subblock based on the respective motion refinements.

Clause 22. The computer-readable storage medium of any of clauses 19 to 21, further comprising instructions that cause the one or more processors to: multiply a width of a first subblock of the one or more subblocks, a height of the first subblock of the one or more subblocks, and a first scale factor to generate an intermediate value; perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and compare a distortion value of the respective distortion values for the first subblock to the threshold value, wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprise instructions that cause the one or more processors to determine, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock.
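The threshold computation recited in clause 22 can be sketched as follows. This is a minimal illustration, not the reference implementation: the function names, the example scale values, and the direction of the comparison (bypass when the distortion is below the threshold) are assumptions that the clause itself does not fix.

```python
def bdof_threshold(width, height, first_scale, second_scale):
    """Threshold = (width * height * first_scale) << second_scale."""
    # Intermediate value: product of subblock width, height, and the first scale factor.
    intermediate = width * height * first_scale
    # Left-shift the intermediate value by the second scale factor.
    return intermediate << second_scale


def bdof_is_bypassed(distortion, threshold):
    """Hypothetical decision rule (assumption): bypass BDOF for low distortion."""
    return distortion < threshold


# Example: an 8x8 subblock with first_scale = 1 and second_scale = 2.
threshold = bdof_threshold(8, 8, 1, 2)
```

Because the threshold scales with the subblock area, the same per-sample tolerance applies regardless of the subblock dimensions.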

Clause 23. The computer-readable storage medium of any of clauses 19 to 22, further comprising instructions that cause the one or more processors to: determine a first set of sample values in a first reference block for a first subblock of the one or more subblocks; scale the first set of sample values with a scale factor to generate a first set of scaled sample values; determine a second set of sample values in a second reference block for the first subblock of the one or more subblocks; and scale the second set of sample values with the scale factor to generate a second set of scaled sample values, wherein the instructions that cause the one or more processors to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprise instructions that cause the one or more processors to determine, for the first subblock, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.
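As a rough sketch of the distortion computation in clause 23, the example below scales the samples of both reference blocks and accumulates a sum of absolute differences (SAD). Two points are assumptions made for illustration: the scaling is modeled as an arithmetic right shift by `shift` bits, and SAD is used as the distortion measure. The function name `scaled_sad` is hypothetical.

```python
def scaled_sad(ref0, ref1, shift):
    """SAD between two equally sized sample arrays after scaling each sample.

    ref0, ref1: lists of rows of integer sample values from the two reference blocks.
    shift: number of bits used for scaling (assumed to be a right shift).
    """
    total = 0
    for row0, row1 in zip(ref0, ref1):
        for s0, s1 in zip(row0, row1):
            # Scale both samples with the same factor before taking the difference.
            total += abs((s0 >> shift) - (s1 >> shift))
    return total


ref0 = [[16, 32], [48, 64]]
ref1 = [[0, 32], [64, 64]]
distortion = scaled_sad(ref0, ref1, 4)
```

The scaled sample values computed here could be cached and reused by a later refinement stage instead of being recomputed.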

Clause 24. A device for decoding video data, the device comprising: means for determining that bi-directional optical flow (BDOF) is enabled for a block of video data; means for dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block; means for determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values; means for determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values; means for determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and means for reconstructing the block based on the prediction samples.

Clause 25. A method of coding video data, the method comprising: dividing an input block into a plurality of subblocks, wherein a size of the input block is less than or equal to a size of a coding unit; determining that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks based on a condition being satisfied; dividing the subblock into a plurality of sub-subblocks; determining a refined motion vector for one or more of the sub-subblocks, wherein the refined motion vector for a sub-subblock of the one or more sub-subblocks is the same for a plurality of samples in the sub-subblock; and performing BDOF on the subblock based on the refined motion vector for the one or more sub-subblocks.

Clause 26. A method of coding video data, the method comprising: dividing an input block into a plurality of subblocks, wherein a size of the input block is less than or equal to a size of a coding unit; determining that bi-directional optical flow (BDOF) is to be applied to a subblock of the plurality of subblocks based on a condition being satisfied; dividing the subblock into a plurality of sub-subblocks; determining a refined motion vector for each of one or more samples in the subblock; and performing BDOF on the subblock based on the refined motion vector for each of the one or more samples in the subblock.

Clause 27. The method of any of clauses 25 and 26, further comprising bypassing BDOF for other subblocks of the plurality of subblocks.

Clause 28. The method of any of clauses 25 to 27, wherein the condition being satisfied comprises a determination of whether a sum of absolute differences (SAD) between two prediction signals in reference picture 0 and reference picture 1 is less than a threshold.

Clause 29. The method of any of clauses 25 to 28, wherein the size of the input block is thWxthH, where thW and thH are based on one or more of: a fixed predetermined value; a value decoded from a bitstream; or a size of blocks used before BDOF in encoding or decoding the coding unit.

Clause 30. A method of coding video data, the method comprising any one or combination of clauses 25 to 29.

Clause 31. The method of any of clauses 25 to 30, wherein performing BDOF comprises performing BDOF as part of decoding the video data.

Clause 32. The method of any of clauses 25 to 31, wherein performing BDOF comprises performing BDOF as part of encoding the video data, including in a reconstruction loop of the encoding.

Clause 33. A device for coding video data, the device comprising: a memory for storing video data; and processing circuitry coupled to the memory, the processing circuitry configured to perform any one or combination of clauses 25 to 32.

Clause 34. A device for coding video data, the device comprising one or more means for performing the method of any of clauses 25 to 32.

Clause 35. The device of any of clauses 33 and 34, further comprising a display configured to display decoded video data.

Clause 36. The device of any of clauses 33 to 35, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

Clause 37. The device of any of clauses 33 to 36, wherein the processing circuitry or the means for performing comprises a video decoder.

Clause 38. The device of any of clauses 33 to 37, wherein the processing circuitry or the means for performing comprises a video encoder.

Clause 39. A computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors to perform the method of any of clauses 25 to 32.

It is to be recognized that, depending on the example, certain acts or events of any of the techniques described herein can be performed in a different sequence, and may be added, merged, or left out altogether (e.g., not all described acts or events are necessary for the practice of the techniques). Moreover, in certain examples, acts or events may be performed concurrently, e.g., through multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more DSPs, general purpose microprocessors, ASICs, FPGAs, or other equivalent integrated or discrete logic circuitry. Accordingly, the terms "processor" and "processing circuitry," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

Claims

1. A method of decoding video data, the method comprising:
determining that bi-directional optical flow (BDOF) is enabled for a block of video data;
dividing the block into a plurality of subblocks based on the determination that BDOF is enabled for the block;
determining, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values;
determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values;
determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and
reconstructing the block based on the prediction samples.
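The per-subblock decision in the method above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the claimed implementation: the distortion is taken to be the SAD between the two prediction signals, BDOF is assumed to be bypassed when the distortion is below the threshold, and the names `subblock_sad` and `decide_bdof_modes` are hypothetical.

```python
def subblock_sad(pred0, pred1, x, y, w, h):
    """Sum of absolute differences over one subblock of two prediction signals."""
    return sum(abs(pred0[j][i] - pred1[j][i])
               for j in range(y, y + h)
               for i in range(x, x + w))


def decide_bdof_modes(pred0, pred1, sub_w, sub_h, threshold):
    """Split the block into subblocks and pick a mode for each one.

    Returns a list of (x, y, mode) tuples, where mode is "per_pixel" or "bypass".
    """
    height, width = len(pred0), len(pred0[0])
    decisions = []
    for y in range(0, height, sub_h):
        for x in range(0, width, sub_w):
            # Clamp the subblock size at the right and bottom block edges.
            w = min(sub_w, width - x)
            h = min(sub_h, height - y)
            distortion = subblock_sad(pred0, pred1, x, y, w, h)
            # Assumption: a small distortion means refinement is unnecessary.
            mode = "bypass" if distortion < threshold else "per_pixel"
            decisions.append((x, y, mode))
    return decisions


pred0 = [[0, 0, 10, 10], [0, 0, 10, 10]]
pred1 = [[0, 0, 0, 0], [0, 0, 0, 0]]
modes = decide_bdof_modes(pred0, pred1, 2, 2, 4)
```

A subblock marked "bypass" would keep its unrefined prediction samples, while a subblock marked "per_pixel" would go on to derive a motion refinement for each sample.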

2. The method of claim 1,
wherein determining, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprises:
determining, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values; and
determining, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values,
wherein determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprises:
determining, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value;
determining, based on the determination that BDOF is enabled for the first subblock, a per-pixel motion refinement for refining a first set of prediction samples for the first subblock;
determining, for the second subblock of the plurality of subblocks, that BDOF is bypassed based on the second distortion value; and
bypassing, based on the determination that BDOF is bypassed for the second block, determining a per-pixel motion refinement for refining a second set of prediction samples for the second subblock, and
wherein determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises:
determining, for the first subblock, a refined first set of prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock; and
determining, for the second subblock, the second set of prediction samples without refining the second set of prediction samples based on the per-pixel motion refinement for refining the second set of prediction samples.

3. The method of claim 1,
wherein determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprises determining that per-pixel BDOF is performed for a first subblock of the one or more subblocks,
wherein the method further comprises determining, for each sample in the first subblock, respective motion refinements, and
wherein determining prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprises determining, for each sample in the first subblock, respective refined sample values from samples in a prediction block for the first subblock based on the respective motion refinements.

4. The method of claim 1, further comprising:
multiplying a width of a first subblock of the one or more subblocks, a height of the first subblock of the one or more subblocks, and a first scale factor to generate an intermediate value;
performing a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and
comparing a distortion value of the respective distortion values for the first subblock to the threshold value,
wherein determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprises determining, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first subblock.

5. The method of claim 1, further comprising:
determining a first set of sample values in a first reference block for a first subblock of the one or more subblocks;
scaling the first set of sample values with a scale factor to generate a first set of scaled sample values;
determining a second set of sample values in a second reference block for the first subblock of the one or more subblocks; and
scaling the second set of sample values with the scale factor to generate a second set of scaled sample values,
wherein determining, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values comprises determining, for the first subblock, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

6. The method of claim 5,
wherein determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprises determining that per-pixel BDOF is performed for the first subblock, and
wherein the method further comprises reusing the first set of scaled sample values and the second set of scaled sample values to determine a per-pixel motion refinement for per-pixel BDOF.

7. The method of claim 5,
wherein determining that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values comprises determining that per-pixel BDOF is performed for the first subblock, and
wherein the method further comprises reusing the first set of scaled sample values and the second set of scaled sample values to determine a motion refinement for BDOF.

8. The method of claim 1,
wherein reconstructing the block comprises:
receiving residual values representing differences between the prediction samples and samples of the block; and
adding the residual values to the prediction samples to reconstruct the block.
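The reconstruction step can be illustrated with a short sketch that adds received residual values to the prediction samples. Clipping the result to the valid sample range for a given bit depth is an assumption added here for illustration; the claim recites only the addition. The function name and the `bit_depth` parameter are hypothetical.

```python
def reconstruct_block(pred, residual, bit_depth=8):
    """Add residuals to prediction samples, clipping to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]


recon = reconstruct_block([[100, 250], [3, 128]], [[-5, 10], [-7, 0]])
```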

9. A device for decoding video data, the device comprising:
a memory configured to store the video data; and
processing circuitry coupled to the memory,
wherein the processing circuitry is configured to:
determine that bi-directional optical flow (BDOF) is enabled for a block of the video data;
divide the block into a plurality of subblocks based on the determination that BDOF is enabled for the block;
determine, for each subblock of one or more subblocks of the plurality of subblocks, respective distortion values;
determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values;
determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and
reconstruct the block based on the prediction samples.

10. The device of claim 9,
wherein, to determine, for each subblock of the one or more subblocks of the plurality of subblocks, the respective distortion values, the processing circuitry is configured to:
determine, for a first subblock of the one or more subblocks, a first distortion value of the respective distortion values; and
determine, for a second subblock of the one or more subblocks, a second distortion value of the respective distortion values,
wherein, to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values, the processing circuitry is configured to:
determine, for the first subblock of the plurality of subblocks, that BDOF is enabled for the first subblock based on the first distortion value;
determine, based on the determination that BDOF is enabled for the first subblock, a per-pixel motion refinement for refining a first set of prediction samples for the first subblock;
determine, for the second subblock of the plurality of subblocks, that BDOF is bypassed based on the second distortion value; and
bypass, based on the determination that BDOF is bypassed for the second block, determining a per-pixel motion refinement for refining a second set of prediction samples for the second subblock, and
wherein, to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to:
determine, for the first subblock, a refined first set of prediction samples of the first subblock based on the per-pixel motion refinement for the first subblock; and
determine, for the second subblock, the second set of prediction samples without refining the second set of prediction samples based on the per-pixel motion refinement for refining the second set of prediction samples.

11. The device of claim 9,
wherein, to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each subblock of the one or more subblocks of the plurality of subblocks based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for a first subblock of the one or more subblocks,
wherein the processing circuitry is further configured to determine, for each sample in the first subblock, respective motion refinements, and
wherein, to determine prediction samples for each subblock of the one or more subblocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed, the processing circuitry is configured to determine, for each sample in the first subblock, respective refined sample values from samples in a prediction block for the first subblock based on the respective motion refinements.

The device of claim 9,
wherein the processing circuitry is configured to:
multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block, and a first scale factor to generate an intermediate value;
perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and
compare a distortion value of the respective distortion values for the first sub-block with the threshold value, and
wherein, to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values, the processing circuitry is configured to determine that one of per-pixel BDOF is performed or BDOF is bypassed for the first sub-block based on the comparison.
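
By way of non-limiting illustration, the threshold derivation and comparison recited above may be sketched as follows. The function and parameter names (`bdof_threshold`, `first_scale`, `second_scale`) are hypothetical. The claim recites only that a comparison is made, so the direction used here (bypass when the distortion is below the threshold) is an assumption.

```python
def bdof_threshold(width: int, height: int, first_scale: int, second_scale: int) -> int:
    """Threshold = (width * height * first_scale) << second_scale."""
    # Intermediate value: product of sub-block width, height, and first scale factor.
    intermediate = width * height * first_scale
    # Left shift of the intermediate value by the second scale factor.
    return intermediate << second_scale


def per_pixel_bdof_selected(distortion: int, width: int, height: int,
                            first_scale: int, second_scale: int) -> bool:
    """Return True for per-pixel BDOF, False for bypass.

    Assumption: a distortion below the threshold means the two predictions
    already agree closely, so refinement is bypassed.
    """
    return distortion >= bdof_threshold(width, height, first_scale, second_scale)
```

For an 8x8 sub-block with a first scale factor of 1 and a second scale factor of 2, the threshold in this sketch is 64 << 2 = 256.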

The device of claim 9,
wherein the processing circuitry is configured to:
determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks;
scale the first set of sample values with a scale factor to generate a first set of scaled sample values;
determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and
scale the second set of sample values with the scale factor to generate a second set of scaled sample values, and
wherein, to determine the respective distortion values for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the processing circuitry is configured to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.
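
As a non-limiting sketch, the scaling and distortion determination recited above might look like the following. Two points are assumptions rather than claim language: the scale factor is applied as a right shift by `shift` bits, and the distortion metric is a sum of absolute differences (SAD). All names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]


def scale_samples(samples: Block, shift: int) -> Block:
    """Scale every sample by a right shift (assumed form of the scale factor)."""
    return [[s >> shift for s in row] for row in samples]


def subblock_distortion(ref0: Block, ref1: Block, shift: int) -> Tuple[int, Block, Block]:
    """Return (SAD, scaled ref0, scaled ref1) for one sub-block.

    The scaled sample sets are returned alongside the distortion so that a
    later stage can reuse them instead of scaling the samples again.
    """
    scaled0 = scale_samples(ref0, shift)
    scaled1 = scale_samples(ref1, shift)
    sad = sum(abs(a - b)
              for row0, row1 in zip(scaled0, scaled1)
              for a, b in zip(row0, row1))
    return sad, scaled0, scaled1
```

Returning the scaled sample sets together with the distortion value lets a caller hand the same scaled values to a subsequent motion-refinement step.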

The device of claim 13,
wherein, to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first sub-block, and
wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine a per-pixel motion refinement for the per-pixel BDOF.

The device of claim 13,
wherein, to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values, the processing circuitry is configured to determine that per-pixel BDOF is performed for the first sub-block, and
wherein the processing circuitry is configured to reuse the first set of scaled sample values and the second set of scaled sample values to determine a motion refinement for BDOF.

The device of claim 9,
wherein, to reconstruct the block, the processing circuitry is configured to:
receive residual values representing differences between the prediction samples and samples of the block; and
reconstruct the block by adding the residual values to the prediction samples.
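
For illustration only, reconstruction by adding residual values to prediction samples may be sketched as below. Clipping the result to the valid sample range for a given `bit_depth` is an assumption added here for completeness and is not recited in the claim. The function name is hypothetical.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(prediction: Block, residual: Block, bit_depth: int = 8) -> Block:
    """Add residual values to prediction samples, sample by sample."""
    max_val = (1 << bit_depth) - 1
    return [
        # Assumed clipping to [0, 2^bit_depth - 1] after the addition.
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```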

The device of claim 9,
further comprising a display configured to display decoded video data.

The device of claim 9,
wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.

A computer-readable storage medium storing instructions that,
when executed, cause one or more processors to:
determine that bi-directional optical flow (BDOF) is enabled for a block of video data;
divide the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
determine, for each sub-block of one or more sub-blocks of the plurality of sub-blocks, respective distortion values;
determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values;
determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and
reconstruct the block based on the prediction samples.
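
The sequence of steps recited above (divide, measure distortion per sub-block, select per-pixel BDOF or bypass, form prediction samples) can be sketched as follows. This is a simplified illustration under several assumptions: the distortion is a SAD between the two reference blocks, the bypass prediction is a rounded average of the two references, and the actual per-pixel refinement is represented by a caller-supplied stand-in `refine_fn` rather than a real BDOF derivation. All names are hypothetical.

```python
from typing import Callable, List, Tuple

Block = List[List[int]]


def split_into_subblocks(width: int, height: int, sub_w: int, sub_h: int):
    """Return (x, y, w, h) tuples covering the block in raster order."""
    return [(x, y, min(sub_w, width - x), min(sub_h, height - y))
            for y in range(0, height, sub_h)
            for x in range(0, width, sub_w)]


def predict_block(ref0: Block, ref1: Block, sub_w: int, sub_h: int,
                  threshold_fn: Callable[[int, int], int],
                  refine_fn: Callable[[int, int], int]) -> Tuple[Block, List[bool]]:
    """Return (prediction samples, per-sub-block decisions)."""
    height, width = len(ref0), len(ref0[0])
    pred = [[0] * width for _ in range(height)]
    decisions = []
    for x, y, w, h in split_into_subblocks(width, height, sub_w, sub_h):
        # Distortion for this sub-block (assumed SAD).
        sad = sum(abs(ref0[y + j][x + i] - ref1[y + j][x + i])
                  for j in range(h) for i in range(w))
        # Assumed rule: bypass BDOF when distortion is below the threshold.
        do_bdof = sad >= threshold_fn(w, h)
        decisions.append(do_bdof)
        for j in range(h):
            for i in range(w):
                avg = (ref0[y + j][x + i] + ref1[y + j][x + i] + 1) >> 1
                # Per-pixel offset only when BDOF is selected for the sub-block.
                offset = refine_fn(x + i, y + j) if do_bdof else 0
                pred[y + j][x + i] = avg + offset
    return pred, decisions
```

The per-sub-block decision list makes it easy to see that a single block can mix refined and unrefined sub-blocks.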

The computer-readable storage medium of claim 19,
wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the respective distortion values comprise instructions that cause the one or more processors to:
for a first sub-block of the one or more sub-blocks, determine a first distortion value of the respective distortion values; and
for a second sub-block of the one or more sub-blocks, determine a second distortion value of the respective distortion values,
wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values comprise instructions that cause the one or more processors to:
for the first sub-block of the plurality of sub-blocks, determine that BDOF is enabled for the first sub-block based on the first distortion value;
based on the determination that BDOF is enabled for the first sub-block, determine a per-pixel motion refinement for refining a first set of prediction samples for the first sub-block;
for the second sub-block of the plurality of sub-blocks, determine that BDOF is bypassed based on the second distortion value; and
based on the determination that BDOF is bypassed for the second sub-block, bypass determining a per-pixel motion refinement for refining a second set of prediction samples for the second sub-block, and
wherein the instructions that cause the one or more processors to determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to:
for the first sub-block, determine a refined first set of prediction samples of the first sub-block based on the per-pixel motion refinement for the first sub-block; and
for the second sub-block, determine the second set of prediction samples without refining the second set of prediction samples based on a per-pixel motion refinement for refining the second set of prediction samples.

The computer-readable storage medium of claim 19,
wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values comprise instructions that cause the one or more processors to determine that per-pixel BDOF is performed for a first sub-block of the one or more sub-blocks,
wherein the instructions further comprise instructions that cause the one or more processors to determine, for each sample in the first sub-block, respective motion refinements, and
wherein the instructions that cause the one or more processors to determine prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed comprise instructions that cause the one or more processors to determine, for each sample in the first sub-block, respective refined sample values from samples in a prediction block for the first sub-block based on the respective motion refinements.

The computer-readable storage medium of claim 19,
further comprising instructions that cause the one or more processors to:
multiply a width of a first sub-block of the one or more sub-blocks, a height of the first sub-block, and a first scale factor to generate an intermediate value;
perform a left-shift operation on the intermediate value based on a second scale factor to generate a threshold value; and
compare a distortion value of the respective distortion values for the first sub-block with the threshold value,
wherein the instructions that cause the one or more processors to determine that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values comprise instructions that cause the one or more processors to determine, based on the comparison, that one of per-pixel BDOF is performed or BDOF is bypassed for the first sub-block.

The computer-readable storage medium of claim 19,
further comprising instructions that cause the one or more processors to:
determine a first set of sample values in a first reference block for a first sub-block of the one or more sub-blocks;
scale the first set of sample values with a scale factor to generate a first set of scaled sample values;
determine a second set of sample values in a second reference block for the first sub-block of the one or more sub-blocks; and
scale the second set of sample values with the scale factor to generate a second set of scaled sample values,
wherein the instructions that cause the one or more processors to determine, for each sub-block of the one or more sub-blocks of the plurality of sub-blocks, the respective distortion values comprise instructions that cause the one or more processors to determine, for the first sub-block, a distortion value of the respective distortion values based on the first set of scaled sample values and the second set of scaled sample values.

A device for decoding video data, the device comprising:
means for determining that bi-directional optical flow (BDOF) is enabled for a block of video data;
means for dividing the block into a plurality of sub-blocks based on the determination that BDOF is enabled for the block;
means for determining respective distortion values for each sub-block of one or more sub-blocks of the plurality of sub-blocks;
means for determining that one of per-pixel BDOF is performed or BDOF is bypassed for each sub-block of the one or more sub-blocks of the plurality of sub-blocks based on the respective distortion values;
means for determining prediction samples for each sub-block of the one or more sub-blocks based on the determination that per-pixel BDOF is performed or BDOF is bypassed; and
means for reconstructing the block based on the prediction samples.