KR20210109049A

KR20210109049A - Optical flow estimation for motion compensated prediction in video coding

Info

Publication number: KR20210109049A
Application number: KR1020217026758A
Authority: KR
Inventors: Yaowu Xu; Bohan Li; Jingning Han
Original assignee: Google LLC
Priority date: 2017-08-22
Filing date: 2018-05-10
Publication date: 2021-09-03
Also published as: CN110741640B; CN110741640A; WO2019040134A1; KR102400078B1; JP2020522200A; EP3673655A1; KR102295520B1; KR20200002036A; JP6905093B2

Abstract

An optical flow reference frame portion (e.g., a block or an entire frame) is generated that can be used for inter prediction of blocks of a current frame in a video sequence. A forward reference frame and a backward reference frame are used in an optical flow estimation that produces a respective motion field for pixels of the current frame. The motion fields are used to warp some or all pixels of the reference frames to the pixels of the current frame. The warped reference frame pixels are blended to form the optical flow reference frame portion. The inter prediction may be performed as part of encoding or decoding portions of the current frame.

Description

OPTICAL FLOW ESTIMATION FOR MOTION COMPENSATED PREDICTION IN VIDEO CODING

[0001] Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.

[0002] One technique for compression uses a reference frame to generate a prediction block corresponding to a current block to be encoded. Differences between the prediction block and the current block can be encoded, instead of the values of the current block themselves, to reduce the amount of data encoded.

[0003] The present invention relates generally to encoding and decoding video data, and more particularly to using block-based optical flow estimation for motion compensated prediction in video compression. Frame-level based optical flow estimation, which can interpolate a co-located reference frame for motion compensated prediction in video compression, is also described.

[0004] The present invention describes encoding and decoding methods and apparatuses. A method according to an implementation of the present invention includes determining a first frame portion of a first frame to be predicted, the first frame being in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame portion using the optical flow reference frame portion. The first frame portion and the optical flow reference frame portion may each be, for example, a block or an entire frame.

[0005] An apparatus according to an implementation of the present invention includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes determining a first frame to be predicted in a video sequence, and determining an availability of a first reference frame for forward inter prediction of the first frame and an availability of a second reference frame for backward inter prediction of the first frame. The method also includes, responsive to determining the availability of both the first reference frame and the second reference frame: generating a respective motion field for pixels of a first frame portion using the first reference frame and the second reference frame as input to an optical flow estimation process; warping a first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion, the first reference frame portion comprising pixels of the first reference frame co-located with the pixels of the first frame portion; warping a second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion, the second reference frame portion comprising pixels of the second reference frame co-located with the pixels of the first frame portion; and blending the first warped reference frame portion and the second warped reference frame portion to form an optical flow reference frame portion for inter prediction of at least one block of the first frame.
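The warping and blending steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `warp` and `blend`, the nearest-pixel sampling, the clamping at frame borders, and the equal blending weight are all assumptions made for illustration, and the motion fields are supplied directly rather than produced by an optical flow estimation process.

```python
def warp(ref, motion_field):
    """Warp a reference portion to the current frame portion.

    ref          -- 2-D list of pixel values (rows of columns).
    motion_field -- 2-D list of (dy, dx) per current-frame pixel; the pixel at
                    (y, x) is fetched from ref at (y + dy, x + dx).
    Assumptions: nearest-pixel sampling, coordinates clamped to the portion.
    """
    h, w = len(ref), len(ref[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = motion_field[y][x]
            sy = min(max(int(round(y + dy)), 0), h - 1)
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y][x] = ref[sy][sx]
    return out


def blend(warped_a, warped_b, weight_a=0.5):
    """Blend two warped portions pixel by pixel (equal weights assumed)."""
    return [
        [int(round(weight_a * a + (1.0 - weight_a) * b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(warped_a, warped_b)
    ]


# Hypothetical 1x4 portions and motion fields, for illustration only.
ref_forward = [[10, 20, 30, 40]]
ref_backward = [[30, 40, 50, 60]]
field_forward = [[(0, -1)] * 4]   # fetch one pixel to the left
field_backward = [[(0, 1)] * 4]   # fetch one pixel to the right

optical_flow_portion = blend(
    warp(ref_forward, field_forward),
    warp(ref_backward, field_backward),
)
```

A practical codec would more likely use sub-pixel interpolation and weights that depend on the temporal distance to each reference frame; those refinements are omitted here to keep the two steps visible.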

[0006] Another apparatus according to an implementation of the present invention also includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes generating an optical flow reference frame portion for inter prediction of a block of a first frame of a video sequence, using a first reference frame portion from the video sequence and a second reference frame portion of the video sequence, by initializing motion fields for pixels of a first frame portion in a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame portion and comprising one level of multiple levels; and, for each level of the multiple levels: warping the first reference frame portion to the first frame portion using the motion fields to form a first warped reference frame portion; warping the second reference frame portion to the first frame portion using the motion fields to form a second warped reference frame portion; estimating motion fields between the first warped reference frame portion and the second warped reference frame portion using the optical flow estimation; and updating the motion fields for the pixels of the first frame portion using the motion fields between the first warped reference frame portion and the second warped reference frame portion. The method also includes, for a final level of the multiple levels: warping the first reference frame portion to the first frame portion using the updated motion fields to form a final first warped reference frame portion; warping the second reference frame portion to the first frame portion using the updated motion fields to form a final second warped reference frame portion; and blending the final first and second warped reference frame portions to form the optical flow reference frame portion.

[0007] Another apparatus according to an implementation of the present invention also includes a non-transitory storage medium or memory and a processor. The medium includes instructions executable by the processor to carry out a method that includes determining a first frame portion of a first frame to be predicted, the first frame being in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame portion for inter prediction of the first frame portion by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame portion using the optical flow reference frame portion.

[0008] These and other aspects of the present invention are disclosed in the following detailed description of the embodiments, the appended claims, and the accompanying drawings.

[0009] The description herein refers to the accompanying drawings described below, in which like reference numerals refer to like parts throughout the several views unless otherwise noted.
[0010] FIG. 1 is a schematic diagram of a video encoding and decoding system.
[0011] FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
[0012] FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded.
[0013] FIG. 4 is a block diagram of an encoder according to implementations of the present invention.
[0014] FIG. 5 is a block diagram of a decoder according to implementations of the present invention.
[0015] FIG. 6 is a block diagram of an example of a reference frame buffer.
[0016] FIG. 7 is a diagram of a group of frames in a display order of a video sequence.
[0017] FIG. 8 is a diagram of an example of a coding order for the group of frames of FIG. 7.
[0018] FIG. 9 is a diagram used to explain the linear projection of a motion field according to the teachings herein.
[0019] FIG. 10 is a flowchart of a process for motion compensated prediction of a video frame using at least a portion of a reference frame generated using optical flow estimation.
[0020] FIG. 11 is a flowchart of a process for generating an optical flow reference frame portion.
[0021] FIG. 12 is a flowchart of another process for generating an optical flow reference frame portion.
[0022] FIG. 13 is a diagram illustrating the processes of FIGS. 11 and 12.
[0023] FIG. 14 is a diagram illustrating object occlusion.
[0024] FIG. 15 is a diagram illustrating a technique for optimizing a decoder.

[0025] A video stream can be compressed by a variety of techniques to reduce the bandwidth required to transmit or store the video stream. A video stream can be encoded into a bitstream, which involves compression, and the bitstream is then transmitted to a decoder that can decode or decompress the video stream to prepare it for viewing or further processing. Compression of the video stream often exploits the spatial and temporal correlation of video signals through spatial and/or motion compensated prediction. Inter-prediction, for example, uses one or more motion vectors to generate a block (also called a prediction block) that resembles a current block to be encoded, using previously encoded and decoded pixels. By encoding the motion vector(s) and the difference between the two blocks, a decoder receiving the encoded signal can re-create the current block. Inter-prediction may also be referred to as motion compensated prediction.
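As a rough illustration of the inter-prediction round trip described in this paragraph, the following Python sketch forms a prediction block from a reference frame with one motion vector, computes the residual that an encoder would code, and re-creates the current block as a decoder would. The helper names (`predict_block`, `residual`, `reconstruct`) and the sample values are hypothetical, and the sketch assumes whole-pixel motion vectors that keep the block inside the reference frame; transform, quantization, and entropy coding of the residual are omitted.

```python
def predict_block(ref, top, left, size, mv):
    """Copy a size x size block from `ref`, displaced by motion vector mv=(dy, dx)."""
    dy, dx = mv
    return [
        [ref[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]


def residual(current, prediction):
    """Per-pixel difference that is encoded instead of the raw block."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]


def reconstruct(prediction, res):
    """Decoder side: prediction plus decoded residual re-creates the block."""
    return [[p + d for p, d in zip(pr, dr)] for pr, dr in zip(prediction, res)]


# Hypothetical 4x4 reference frame and a 2x2 current block located at (1, 1).
reference = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
current_block = [[3, 5], [7, 8]]
motion_vector = (-1, 1)  # one row up, one column right

prediction = predict_block(reference, top=1, left=1, size=2, mv=motion_vector)
diff = residual(current_block, prediction)
decoded = reconstruct(prediction, diff)
```

Because the prediction already matches most of the current block, the residual is mostly zeros, which is what makes it cheaper to encode than the block's own values.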

[0026] Each motion vector used to generate a prediction block in the inter-prediction process may refer to a frame other than the current frame, i.e., a reference frame. Reference frames can be located before or after the current frame in the sequence of the video stream, and may be frames that are reconstructed before being used as a reference frame. In some cases, there may be three reference frames used to encode or decode blocks of the current frame of the video sequence. One is a frame that may be referred to as a golden frame. Another is the most recently encoded or decoded frame. The last is an alternate reference frame that is encoded or decoded before one or more frames in the sequence, but is displayed after those frames in the output display order. In this way, the alternate reference frame is a reference frame usable for backward prediction. One or more forward and/or backward reference frames can be used to encode or decode a block. The efficacy of a reference frame when used to encode or decode a block within the current frame can be measured based on the resulting signal-to-noise ratio or other measures of rate-distortion.

[0027] In this technique, the pixels that form prediction blocks are obtained directly from one or more of the available reference frames. The reference pixel blocks, or linear combinations thereof, are used for prediction of a given coding block in the current frame. This direct, block-based prediction does not capture the true motion activity available from the reference frames. For this reason, motion compensated prediction accuracy can suffer.

[0028] To more fully utilize motion information from available bi-directional reference frames (e.g., one or more forward and one or more backward reference frames), implementations of the teachings herein describe reference frame portions, co-located with the current coding frame portions, that use a per-pixel motion field calculated by optical flow to estimate the true motion activities in the video signal. Reference frame portions are interpolated that allow tracking of complicated non-translational motion activity, which is beyond the capability of conventional block-based motion compensated prediction determined directly from reference frames. Use of such reference frame portions can improve prediction quality. As used herein, a frame portion refers to some or all of a frame, such as a block, a slice, or an entire frame. A frame portion in one frame is co-located with a frame portion in another frame if the two frame portions have the same dimensions and are at the same pixel locations within the dimensions of each frame.
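The definition of co-located frame portions given at the end of this paragraph can be expressed as a small Python sketch. The `FramePortion` type and the `is_collocated` function are hypothetical names introduced only for illustration; positions are assumed to be measured in pixels from the top-left corner of each frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FramePortion:
    """A rectangular portion of a frame (a block, a slice, or the whole frame)."""
    frame_index: int  # which frame of the sequence the portion belongs to
    x: int            # left edge, in pixels
    y: int            # top edge, in pixels
    width: int
    height: int


def is_collocated(a: FramePortion, b: FramePortion) -> bool:
    """True if both portions have the same dimensions and pixel positions."""
    return (a.x, a.y, a.width, a.height) == (b.x, b.y, b.width, b.height)


current = FramePortion(frame_index=5, x=32, y=16, width=16, height=16)
in_forward_ref = FramePortion(frame_index=4, x=32, y=16, width=16, height=16)
shifted = FramePortion(frame_index=4, x=33, y=16, width=16, height=16)
```

Here `in_forward_ref` is co-located with `current` even though it belongs to a different frame, while `shifted` is not, because it starts one pixel to the right.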

[0029] Further details of using optical flow estimation to interpolate reference frame portions for use in video compression and reconstruction are described herein with initial reference to a system in which the teachings herein can be implemented.

[0030] FIG. 1 is a schematic diagram of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.

[0031] A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102 and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), a cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.

[0032] The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.

[0033] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or to any other device having a non-transitory storage medium or memory. In one implementation, the receiving station 106 receives the encoded video stream (e.g., via the network 104, a computer bus, and/or some communication pathway) and stores the video stream for later decoding. In an example implementation, the Real-time Transport Protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, for example, a Hypertext Transfer Protocol (HTTP) based video streaming protocol.

[0034] When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 could be a video conference participant that receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view, and that further encodes its own video bitstream and transmits it to the video conference server for decoding and viewing by other participants.

[0035] FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of a single computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like.

[0036] A CPU 202 in the computing device 200 can be a central processing unit. Alternatively, the CPU 202 can be any other type of device, or multiple devices, now existing or hereafter developed, capable of manipulating or processing information. Although the disclosed implementations can be practiced with a single processor as shown, e.g., the CPU 202, advantages in speed and efficiency can be achieved using more than one processor.

[0037] A memory 204 in the computing device 200 can be a read-only memory (ROM) device or a random access memory (RAM) device in an implementation. Any other suitable type of storage device or non-transitory storage medium can be used as the memory 204. The memory 204 can include code and data 206 that is accessed by the CPU 202 using a bus 212. The memory 204 can further include an operating system (OS) 208 and application programs 210, the application programs 210 including at least one program that permits the CPU 202 to perform the methods described herein. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the methods described herein. The computing device 200 can also include a secondary storage 214, which can be, for example, a memory card used with a mobile computing device. Because video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.

[0038] The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. The display 218 can be coupled to the CPU 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.

[0039] The computing device 200 can also include or be in communication with an image-sensing device 220, for example a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image, such as the image of a user operating the computing device 200. The image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In an example, the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.

[0040] The computing device 200 can also include or be in communication with a sound-sensing device 222, for example a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200, and can be configured to receive sounds, for example speech or other utterances, made by the user while the user operates the computing device 200.

[0041] Although FIG. 2 depicts the CPU 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized. The operations of the CPU 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines, such as a network-based memory or memory in multiple machines performing the operations of the computing device 200. Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network, and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.

[0042] FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. Although three frames are depicted as the adjacent frames 304, the video sequence 302 may include any number of adjacent frames 304. The adjacent frames 304 may then be further subdivided into individual frames, for example a frame 306. At the next level, the frame 306 may be divided into a series of planes or segments 308. The segments 308 may be, for example, subsets of frames that permit parallel processing. The segments 308 may also be subsets of frames that separate the video data into distinct colors. For example, a frame 306 of color video data may include a luminance plane and two chrominance planes. The segments 308 may be sampled at different resolutions.

[0043] Whether or not the frame 306 is divided into segments 308, the frame 306 may be further subdivided into blocks 310, which may contain data corresponding to, for example, 16x16 pixels of the frame 306. The blocks 310 may also be arranged to include data from one or more segments 308 of pixel data. The blocks 310 may also be of any other suitable size, such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger. Unless otherwise noted, the terms block and macroblock are used interchangeably herein.
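By way of illustration only, the subdivision of a plane into blocks may be sketched as follows. The function name `split_into_blocks` and the 48x40 plane are hypothetical and are not part of the disclosure; the sketch assumes raster order and allows smaller blocks at the right and bottom edges.

```python
import numpy as np

def split_into_blocks(plane, block_size=16):
    """Yield (row, col, block) tuples covering a 2-D plane in raster order."""
    height, width = plane.shape
    for row in range(0, height, block_size):
        for col in range(0, width, block_size):
            # Edge blocks are smaller when the plane size is not a
            # multiple of block_size.
            yield row, col, plane[row:row + block_size, col:col + block_size]

# Hypothetical 48x40 luminance plane: 3 block rows by 3 block columns.
luma = np.zeros((48, 40), dtype=np.uint8)
blocks = list(split_into_blocks(luma))
```

An actual codec may instead pad the frame or choose block sizes adaptively; the fixed 16x16 grid is only the simplest case named in the paragraph above.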

[0044] FIG. 4 is a block diagram of an encoder 400 in accordance with implementations of this disclosure. The encoder 400 may be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example the memory 204. The computer software program may include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 may also be implemented as specialized hardware included in, for example, the transmitting station 102. In one particularly desirable implementation, the encoder 400 is a hardware encoder.

[0045] The encoder 400 has the following steps to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction step 402, a transform step 404, a quantization step 406, and an entropy encoding step 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following steps to perform the various functions in the reconstruction path: an inverse quantization step 410, an inverse transform step 412, a reconstruction step 414, and a loop filtering step 416. Other structural variations of the encoder 400 may be used to encode the video stream 300.

[0046] When the video stream 300 is presented for encoding, respective frames 304, such as the frame 306, may be processed in units of blocks. At the intra/inter prediction step 402, respective blocks may be encoded using intra-frame prediction (also called intra prediction) or inter-frame prediction (also called inter prediction). In either case, a prediction block may be formed. In the case of intra prediction, a prediction block may be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter prediction, a prediction block may be formed from samples in one or more previously constructed reference frames. The designation of reference frames for groups of blocks is discussed in further detail below.

[0047] Next, still referring to FIG. 4, the prediction block may be subtracted from the current block at the intra/inter prediction step 402 to produce a residual block (also called a residual). The transform step 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization step 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated. The quantized transform coefficients are then entropy encoded by the entropy encoding step 408. The entropy-encoded coefficients, together with other information used to decode the block (which may include, for example, the type of prediction used, the transform type, motion vectors, and the quantizer value), are then output to the compressed bitstream 420. The compressed bitstream 420 may be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. The compressed bitstream 420 may also be referred to as an encoded video stream or an encoded video bitstream, and the terms are used interchangeably herein.

[0048] The reconstruction path in FIG. 4 (shown by the dotted connection lines) may be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions that are similar to functions that take place during the decoding process discussed in more detail below, including dequantizing the quantized transform coefficients at the inverse quantization step 410 and inverse transforming the dequantized transform coefficients at the inverse transform step 412 to produce a derivative residual block (also called a derivative residual). At the reconstruction step 414, the prediction block that was predicted at the intra/inter prediction step 402 may be added to the derivative residual to create a reconstructed block. The loop filtering step 416 may be applied to the reconstructed block to reduce distortion such as blocking artifacts.
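By way of illustration only, the forward path (residual, transform, quantization) and the reconstruction path (inverse quantization, inverse transform, addition of the prediction block) may be sketched for a single square block as follows. The orthonormal DCT-II is an assumed choice of block-based transform, entropy coding and loop filtering are omitted, and the names `dct_matrix`, `encode_block`, and `reconstruct_block` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis, used here as one possible block transform."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def encode_block(current, prediction, q):
    """Forward path: residual -> transform -> divide by q and truncate."""
    c = dct_matrix(current.shape[0])
    residual = current.astype(np.float64) - prediction.astype(np.float64)
    coeffs = c @ residual @ c.T
    return np.trunc(coeffs / q).astype(np.int32)

def reconstruct_block(levels, prediction, q):
    """Reconstruction path: multiply by q -> inverse transform -> add prediction."""
    c = dct_matrix(levels.shape[0])
    derivative_residual = c.T @ (levels * q).astype(np.float64) @ c
    recon = prediction.astype(np.float64) + derivative_residual
    return np.clip(np.rint(recon), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
current = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
prediction = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
levels = encode_block(current, prediction, q=4)
# The encoder's reconstruction path and the decoder run the same steps on the
# same quantized coefficients, so both obtain the same reconstructed block.
encoder_recon = reconstruct_block(levels, prediction, q=4)
decoder_recon = reconstruct_block(levels, prediction, q=4)
```

Because both sides start from the quantized coefficients rather than from the original residual, the reconstructed block differs from the source block by the quantization error, yet it is the same on both sides, which is why it can safely serve as a reference.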

[0049] Other variations of the encoder 400 may be used to encode the compressed bitstream 420. For example, a non-transform-based encoder may quantize the residual signal directly, without the transform step 404, for certain blocks or frames. In another implementation, an encoder may combine the quantization step 406 and the inverse quantization step 410 into a common step.

[0050] FIG. 5 is a block diagram of a decoder 500 in accordance with implementations of this disclosure. The decoder 500 may be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204. The computer software program may include machine instructions that, when executed by a processor such as the CPU 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5. The decoder 500 may also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.

[0051] The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following steps to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding step 502, an inverse quantization step 504, an inverse transform step 506, an intra/inter prediction step 508, a reconstruction step 510, a loop filtering step 512, and a deblocking filtering step 514. Other structural variations of the decoder 500 may be used to decode the compressed bitstream 420.

[0052] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 may be decoded by the entropy decoding step 502 to produce a set of quantized transform coefficients. The inverse quantization step 504 dequantizes the quantized transform coefficients (for example, by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform step 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that may be identical to that created by the inverse transform step 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 may use the intra/inter prediction step 508 to create the same prediction block as was created in the encoder 400, for example at the intra/inter prediction step 402. At the reconstruction step 510, the prediction block may be added to the derivative residual to create a reconstructed block. The loop filtering step 512 may be applied to the reconstructed block to reduce blocking artifacts.

[0053] Other filtering may be applied to the reconstructed block. In this example, the deblocking filtering step 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 may also be referred to as a decoded video stream, and the terms are used interchangeably herein. Other variations of the decoder 500 may be used to decode the compressed bitstream 420. For example, the decoder 500 may produce the output video stream 516 without the deblocking filtering step 514.

[0054] FIG. 6 is a block diagram of an example of a reference frame buffer 600. The reference frame buffer 600 stores reference frames used to encode or decode blocks of frames of a video sequence. In this example, the reference frame buffer 600 includes reference frames identified as a last frame LAST_FRAME 602, a golden frame GOLDEN_FRAME 604, and an alternate reference frame ALTREF_FRAME 606. A frame header of a reference frame may include a virtual index to the location within the reference frame buffer at which the reference frame is stored. A reference frame mapping may map the virtual index of a reference frame to a physical index of the memory at which the reference frame is stored. Where two reference frames are the same frame, those reference frames have the same physical index even if they have different virtual indices. The number, types, and names of the reference positions within the reference frame buffer 600 used here are examples only.
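By way of illustration only, the mapping from virtual indices to physical indices may be sketched as follows. The class `ReferenceFrameBuffer`, its methods, and the placeholder frame objects are hypothetical; the sketch assumes eight physical slots and uses the reference names as virtual indices.

```python
class ReferenceFrameBuffer:
    """Toy buffer: virtual names map to physical slots that hold frames."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots  # physical storage
        self.ref_map = {}                # virtual index -> physical index

    def store(self, virtual_index, frame):
        # If this exact frame is already stored, reuse its physical slot so
        # that two virtual indices share one physical index.
        for physical_index, stored in enumerate(self.slots):
            if stored is frame:
                self.ref_map[virtual_index] = physical_index
                return physical_index
        physical_index = self.slots.index(None)  # raises ValueError when full
        self.slots[physical_index] = frame
        self.ref_map[virtual_index] = physical_index
        return physical_index

    def get(self, virtual_index):
        return self.slots[self.ref_map[virtual_index]]

buf = ReferenceFrameBuffer()
frame_a, frame_b = object(), object()  # stand-ins for reconstructed frames
last_slot = buf.store("LAST_FRAME", frame_a)
golden_slot = buf.store("GOLDEN_FRAME", frame_a)  # same frame, same slot
altref_slot = buf.store("ALTREF_FRAME", frame_b)
free_slots = buf.slots.count(None)
```

Here LAST_FRAME and GOLDEN_FRAME refer to the same frame and therefore resolve to one physical slot, leaving six of the eight slots free in this toy case.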

[0055] The reference frames stored in the reference frame buffer 600 may be used to identify motion vectors for predicting blocks of frames to be encoded or decoded. Different reference frames may be used depending on the type of prediction used to predict a current block of a current frame. For example, in bi-prediction, blocks of the current frame may be forward predicted using a frame stored as the LAST_FRAME 602 or the GOLDEN_FRAME 604, and backward predicted using a frame stored as the ALTREF_FRAME 606.

[0056] There may be a finite number of reference frames that can be stored within the reference frame buffer 600. As shown in FIG. 6, the reference frame buffer 600 may store up to eight reference frames, and each stored reference frame may be associated with a different virtual index of the reference frame buffer. Although three of the eight spaces in the reference frame buffer 600 are used by frames designated as the LAST_FRAME 602, the GOLDEN_FRAME 604, and the ALTREF_FRAME 606, five spaces remain available to store other reference frames. For example, one or more available spaces in the reference frame buffer 600 may be used to store additional reference frames, in particular some or all of the interpolated reference frame described herein. Although the reference frame buffer 600 is shown as being able to store up to eight reference frames, other implementations of the reference frame buffer 600 may store more or fewer reference frames.

[0057] In some implementations, the alternate reference frame designated as the ALTREF_FRAME 606 may be a frame of the video sequence that is distant from the current frame in display order but is encoded or decoded earlier than it is displayed. For example, the alternate reference frame may be ten, twelve, or more (or fewer) frames after the current frame in display order. Additional alternate reference frames may be frames located nearer to the current frame in display order.

[0058] An alternate reference frame may not correspond directly to a frame in the sequence. Instead, the alternate reference frame may be generated using one or more frames that have had filtering applied, have been combined together, or have been both combined together and filtered. An alternate reference frame may not be displayed. Instead, it may be a frame or a portion of a frame that is generated and transmitted for use only in the prediction process (that is, it is omitted when the decoded sequence is displayed).

[0059] FIG. 7 is a diagram of a group of frames in the display order of a video sequence. In this example, the group of frames is preceded by a frame 700, which may be referred to as a key frame or, in some cases, an overlay frame, and comprises eight frames 702-716. No block within the frame 700 is inter predicted using reference frames of the group of frames. The frame 700 is a key frame (also referred to as an intra-predicted frame) in this example, which refers to its status in that predicted blocks within the frame are predicted only using intra prediction. However, the frame 700 may instead be an overlay frame, which is an inter-predicted frame that may be a reconstructed frame of a previous group of frames. In an inter-predicted frame, at least some of the predicted blocks are predicted using inter prediction. The number of frames forming each group of frames may vary according to the spatial and temporal characteristics of the video and other encoding configurations, such as the key frame interval selected for random access or error resilience, for example.

[0060] The coding order for each group of frames may differ from the display order. This allows a frame located after the current frame in the video sequence to be used as a reference frame for encoding the current frame. A decoder, such as the decoder 500, may share a common group coding structure with an encoder, such as the encoder 400. The group coding structure assigns the different roles that respective frames within the group may play in the reference buffer (for example, last frame, alternate reference frame, and so on) and defines or indicates the coding order for the frames within the group.

[0061] FIG. 8 is a diagram of an example of a coding order for the group of frames of FIG. 7. The coding order of FIG. 8 is associated with a first group coding structure in which a single backward reference frame is available for each frame of the group. Because the encoding and decoding orders are the same, the order shown in FIG. 8 is generally referred to herein as a coding order. The key or overlay frame 700 is designated as the golden frame in the reference frame buffer, such as the GOLDEN_FRAME 604 in the reference frame buffer 600. The frame 700 is intra predicted in this example, so it does not require a reference frame; an overlay frame serving as the frame 700, being a frame reconstructed from a previous group, likewise does not use a reference frame of the current group of frames. The final frame 716 in the group is designated as the alternate reference frame in the reference frame buffer, such as the ALTREF_FRAME 606 in the reference frame buffer 600. In this coding order, the frame 716 is coded out of display order, after the frame 700, so as to provide a backward reference frame for each of the remaining frames 702-714. In coding blocks of the frame 716, the frame 700 serves as an available reference frame for the blocks of the frame 716. FIG. 8 is only one example of a coding order for a group of frames. Other group coding structures may designate one or more different or additional frames for forward and/or backward prediction.
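By way of illustration only, a coding order of the kind described above may be sketched as follows. The text states only that the frame 716 is coded immediately after the frame 700; coding the remaining frames 702-714 in display order is an assumption of this sketch, and the function name `simple_coding_order` is hypothetical.

```python
def simple_coding_order(key_frame, group):
    """Return a coding order: key frame, last frame of the group, then the rest.

    `group` lists the frames of the group in display order. Coding the last
    frame second makes it available as a backward reference for the others.
    The order of the remaining frames is an assumption (display order).
    """
    *remaining, alt_ref = group
    return [key_frame, alt_ref, *remaining]

display_order = [702, 704, 706, 708, 710, 712, 714, 716]
coding_order = simple_coding_order(700, display_order)
```

With this order, every frame from 702 through 714 has a previously coded frame on each side of it in display order when it is coded.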

[0062] As briefly mentioned above, an available reference frame portion may be a reference frame portion that is interpolated using optical flow estimation. A reference frame portion may be, for example, a block, a slice, or an entire frame. When frame-level optical flow estimation is performed as described herein, the resulting reference frame is referred to herein as a co-located reference frame because its dimensions are the same as those of the current frame. This interpolated reference frame may also be referred to herein as an optical flow reference frame.

[0063] FIG. 9 is a diagram used to explain the linear projection of a motion field according to the teachings herein. Within a hierarchical coding framework, the optical flow (also called a motion field) of the current frame may be estimated using the nearest available reconstructed (for example, reference) frames before and after the current frame. In FIG. 9, reference frame 1 is a reference frame that may be used for forward prediction of the current frame 900, while reference frame 2 is a reference frame that may be used for backward prediction of the current frame 900. Using the example of FIGS. 6 to 8 for illustration, if the current frame 900 is the frame 706, the immediately preceding, or last, frame 704 (for example, the reconstructed frame stored in the reference frame buffer 600 as the LAST_FRAME 602) may be used as reference frame 1, while the frame 716 (for example, the reconstructed frame stored in the reference frame buffer 600 as the ALTREF_FRAME 606) may be used as reference frame 2.

[0064] Knowing the display indices of the current and reference frames, and assuming that the motion field is linear in time, motion vectors may be projected between the pixels of reference frame 1 and reference frame 2 to the pixels of the current frame 900. In the simple example described with regard to FIGS. 6 to 8, the index for the current frame 900 is 3, the index for reference frame 1 is 0, and the index for reference frame 2 is 8. FIG. 9 shows a projected motion vector 904 for a pixel 902 of the current frame 900. Using the previous example for explanation, the display indices of the group of frames of FIG. 7 would show that the frame 704 is temporally closer to the frame 706 than the frame 716 is. Accordingly, the single motion vector 904 shown in FIG. 9 represents a different amount of motion between reference frame 1 and the current frame 900 than between reference frame 2 and the current frame 900. Nevertheless, the projected motion field 906 is linear between reference frame 1, the current frame 900, and reference frame 2.
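By way of illustration only, the linear projection may be sketched as follows, using the display indices stated above (0, 3, and 8). The function name `project_motion_vector` and the sample motion vector (16, -8) are hypothetical; the sketch assumes that a pixel moves at constant velocity between the two reference frames.

```python
def project_motion_vector(mv_ref1_to_ref2, idx_ref1, idx_cur, idx_ref2):
    """Split a ref1->ref2 displacement at the current frame (linear motion).

    Returns the displacement from the current-frame pixel back to reference
    frame 1 and forward to reference frame 2, each as an (dx, dy) tuple.
    """
    span = idx_ref2 - idx_ref1
    w = (idx_cur - idx_ref1) / span  # fraction of the motion already elapsed
    dx, dy = mv_ref1_to_ref2
    mv_cur_to_ref1 = (-w * dx, -w * dy)
    mv_cur_to_ref2 = ((1.0 - w) * dx, (1.0 - w) * dy)
    return mv_cur_to_ref1, mv_cur_to_ref2

# Indices from the example: reference frame 1 = 0, current = 3, reference frame 2 = 8.
to_ref1, to_ref2 = project_motion_vector((16.0, -8.0), 0, 3, 8)
# to_ref1 == (-6.0, 3.0) and to_ref2 == (10.0, -5.0)
```

The two projected vectors have different lengths (3/8 and 5/8 of the total displacement) but lie on one straight line, which is the sense in which the projected motion field is linear.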

[0065] Selecting the nearest available reconstructed forward and backward reference frames, and assuming a motion field for respective pixels of the current frame that is linear in time, allows generation of the interpolated reference frame using optical flow estimation to be performed at both the encoder and the decoder (for example, at the intra/inter prediction step 402 and the intra/inter prediction step 508) without transmitting additional information. Instead of the nearest available reconstructed reference frames, it is also possible for different frames to be used, as designated a priori between the encoder and the decoder. In some implementations, an identification of the frames used for the optical flow estimation may be transmitted. Generation of the interpolated frame is discussed in more detail below.

[0066] FIG. 10 is a flowchart diagram of a method or process 1000 for motion-compensated prediction of a frame of a video sequence using at least a portion of a reference frame generated using optical flow estimation. The reference frame portion may be, for example, a block, a slice, or an entire reference frame. An optical flow reference frame portion may also be referred to herein as a co-located reference frame portion. The process 1000 may be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program may include machine-readable instructions that may be stored in a memory, such as the memory 204 or the secondary storage 214, and that, when executed by a processor such as the CPU 202, cause the computing device to perform the process 1000. The process 1000 may also be implemented using specialized hardware or firmware. Some computing devices may have multiple memories or processors, and the operations described in the process 1000 may be distributed using multiple processors, memories, or both.

[0067] At 1002, a current frame to be predicted is determined. Frames may be coded, and hence predicted, in any order, such as the coding order shown in FIG. 8. The frames to be predicted may also be referred to as a first, second, third, etc. frame. The labels first, second, and so on do not necessarily indicate an order of the frames. Instead, unless otherwise stated, the labels are used herein to distinguish one current frame from another. At an encoder, the frame may be processed in units of blocks in a block coding order, such as a raster scan order. At a decoder, the frame may also be processed in units of blocks according to receipt of their encoded residuals within the encoded bitstream.

[0068] At 1004, forward and backward reference frames are determined. In the examples described herein, the forward and backward reference frames are the nearest reconstructed frames before and after (for example, in display order) the current frame, such as the current frame 900. Although not expressly shown in FIG. 10, if either a forward or a backward reference frame does not exist, the process 1000 ends. The current frame is then processed without considering optical flow.
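By way of illustration only, the determination at 1004 may be sketched as follows. The function name `nearest_references` is hypothetical, and the sketch assumes that each reconstructed frame is identified by its display index.

```python
def nearest_references(current_index, reconstructed_indices):
    """Return (forward, backward) display indices nearest to the current frame.

    The forward reference is the nearest reconstructed frame before the
    current frame and the backward reference is the nearest one after it.
    Returns None when either is missing, in which case the current frame is
    processed without optical flow.
    """
    before = [i for i in reconstructed_indices if i < current_index]
    after = [i for i in reconstructed_indices if i > current_index]
    if not before or not after:
        return None
    return max(before), min(after)

refs_sparse = nearest_references(3, [0, 8])
refs_dense = nearest_references(3, [0, 2, 4, 8])
refs_missing = nearest_references(3, [0])
```

Because the rule depends only on which frames have already been reconstructed, an encoder and a decoder that follow the same coding order arrive at the same pair without any signaling.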

[0069] Provided that forward and backward reference frames exist at 1004, an optical flow reference frame portion may be generated using the reference frames at 1006. Generating the optical flow reference frame portion is described in more detail with reference to FIGS. 11 to 14. In some implementations, the optical flow reference frame portion may be stored at a defined position within the reference frame buffer 600. Optical flow estimation according to the teachings herein is described first.

[0070] Optical flow estimation may be performed for respective pixels of a current frame portion by minimizing the following Lagrangian function (1):

[0071] (1)

[0072] In function (1), one term is a data penalty based on the brightness constancy assumption (i.e., the assumption that the intensity value of a small portion of an image remains unchanged over time despite a change in position). The other term is a spatial penalty based on the smoothness of the motion field (i.e., the characteristic that neighboring pixels likely belong to the same object in an image and hence have substantially the same image motion). The Lagrangian parameter λ controls the importance of the smoothness of the motion field. A large value of λ results in a smoother motion field and can better account for motion at a larger scale. In contrast, a smaller value of λ can adapt more effectively to object edges and the movement of small objects.
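The rendered form of function (1) is not present in this text. A form consistent with the description above, a data penalty plus a spatial penalty weighted by the Lagrangian parameter λ, would be the following, where the symbol names J, J_data, and J_spatial are assumptions introduced here for illustration:

```latex
J = J_{\text{data}} + \lambda \, J_{\text{spatial}} \tag{1}
```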

[0073] According to an implementation of the teachings herein, the data penalty may be represented by the data penalty function:

[0074] (2)

[0075] The horizontal component of the motion field for a current pixel is represented by u, while the vertical component of the motion field is represented by v. Broadly stated, E_x, E_y, and E_t are derivatives of pixel values of reference frame portions with respect to the horizontal axis x, the vertical axis y, and time t (e.g., as represented by frame indexes). The horizontal axis and the vertical axis are defined relative to the array of pixels forming the current frame, such as the current frame 900, and the reference frames, such as the reference frames 1 and 2.
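The rendered form of function (2) is likewise not present in this text. Assuming the usual linearized brightness constancy residual in u, v, E_x, E_y, and E_t, a plausible form of the data penalty is the following; the name J_data is an assumption:

```latex
J_{\text{data}} = \left( E_x u + E_y v + E_t \right)^2 \tag{2}
```

Under this form the penalty is zero exactly when the motion (u, v) explains the temporal change E_t through the spatial gradients E_x and E_y.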

[0076] In the data penalty function, the derivatives E_x, E_y, and E_t may be calculated according to the following functions (3), (4), and (5):

[0077]-[0078] (3)

[0079]-[0080] (4)

[0081] (5)

[0082] In these functions, one variable is the pixel value at a projected position in the reference frame 1, based on the motion field of the current pixel location in the current frame being encoded. Similarly, a second variable is the pixel value at a projected position in the reference frame 2, based on the motion field of the current pixel location in the current frame being encoded.

[0083] A further variable is the display index of the reference frame 1, where the display index of a frame is its index in the display order of the video sequence. Similarly, another variable is the display index of the reference frame 2, and a third is the display index of the current frame 900.

[0084] Four derivative variables are each calculated using a linear filter: the horizontal derivative at the reference frame 1, the horizontal derivative at the reference frame 2, the vertical derivative at the reference frame 1, and the vertical derivative at the reference frame 2.
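Functions (3) to (5) are not rendered in this text. One form consistent with the variables just described weights the per-frame spatial derivatives by temporal distance and takes the temporal derivative as a difference of projected pixel values. All symbol names below (E^{(r1)}, E^{(r2)}, E_x^{(r1)}, index_{r1}, index_{r2}, index_{cur}, and so on) are assumptions chosen for illustration:

```latex
\begin{align}
E_x &= \frac{\mathrm{index}_{r2} - \mathrm{index}_{cur}}{\mathrm{index}_{r2} - \mathrm{index}_{r1}} \, E_x^{(r1)}
     + \frac{\mathrm{index}_{cur} - \mathrm{index}_{r1}}{\mathrm{index}_{r2} - \mathrm{index}_{r1}} \, E_x^{(r2)} \tag{3} \\
E_y &= \frac{\mathrm{index}_{r2} - \mathrm{index}_{cur}}{\mathrm{index}_{r2} - \mathrm{index}_{r1}} \, E_y^{(r1)}
     + \frac{\mathrm{index}_{cur} - \mathrm{index}_{r1}}{\mathrm{index}_{r2} - \mathrm{index}_{r1}} \, E_y^{(r2)} \tag{4} \\
E_t &= E^{(r2)} - E^{(r1)} \tag{5}
\end{align}
```

In this sketch the reference frame that is temporally closer to the current frame receives the larger weight.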

[0085] In an implementation of the teachings herein, the linear filter used for calculating the horizontal derivatives is a 7-tap filter with filter coefficients [-1/60, 9/60, -45/60, 0, 45/60, -9/60, 1/60]. The filter may have a different frequency profile, a different number of taps, or both. The linear filter used for calculating the vertical derivatives may be the same as or different from the linear filter used for calculating the horizontal derivatives.
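The 7-tap filter above can be illustrated with a short Python sketch that applies the quoted coefficients along one row of pixels. The function name `horizontal_derivative` is hypothetical, and clamping indexes at the row ends (replicating the nearest pixel) is an assumption about border handling for this example.

```python
from typing import List, Sequence

# 7-tap derivative filter coefficients quoted in the text.
TAPS = [-1 / 60, 9 / 60, -45 / 60, 0.0, 45 / 60, -9 / 60, 1 / 60]


def horizontal_derivative(row: Sequence[float]) -> List[float]:
    """Apply the 7-tap filter along one row, centred on each pixel."""
    n = len(row)
    half = len(TAPS) // 2
    out = []
    for x in range(n):
        acc = 0.0
        for k, coeff in enumerate(TAPS):
            # Clamp the index so that samples past the row ends reuse
            # the nearest edge pixel (assumed border handling).
            idx = min(max(x + k - half, 0), n - 1)
            acc += coeff * row[idx]
        out.append(acc)
    return out
```

Away from the borders, the filter returns the exact slope of a linear ramp, which is a quick sanity check on the coefficients.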

[0086] The spatial penalty may be represented by the spatial penalty function:

[0087] (6)

[0088] In the spatial penalty function (6), Δu is the Laplacian of the horizontal component u of the motion field, and Δv is the Laplacian of the vertical component v of the motion field.
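Function (6) is not rendered in this text. Given that it is built from the two Laplacians Δu and Δv, one plausible form is a sum of their squares; the name J_spatial and the exact form are assumptions:

```latex
J_{\text{spatial}} = (\Delta u)^2 + (\Delta v)^2 \tag{6}
```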

[0089] FIG. 11 is a flowchart of a method or process 1100 for generating an optical flow reference frame portion. In this example, the optical flow reference frame portion is an entire reference frame. The process 1100 can implement step 1006 of the process 1000. The process 1100 may be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program may include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214 and that, when executed by a processor such as the CPU 202, cause the computing device to perform the process 1100. The process 1100 may be implemented using specialized hardware or firmware. As described above, multiple processors, memories, or both, may be used.

[0090] Because the forward and backward reference frames can be relatively distant from each other, there may be dramatic motion between them, which reduces the accuracy of the brightness constancy assumption. To reduce the potential errors in the motion of a pixel resulting from this problem, the estimated motion vectors from the current frame to the reference frames can be used to initialize the optical flow estimation for the current frame. At 1102, all pixels within the current frame may be assigned an initialized motion vector. These define initial motion fields that can be used to warp the reference frames to the current frame for a first processing level, so as to shorten the motion lengths between the reference frames.

[0091] The motion field mv_cur of a current pixel may be initialized using a motion vector that represents the difference between the estimated motion vector pointing from the current pixel to the backward reference frame (the reference frame 2 in this example) and the estimated motion vector pointing from the current pixel to the forward reference frame (the reference frame 1 in this example), according to the following function:

[0092]

[0093] If one of the motion vectors is unavailable, it is possible to extrapolate the initial motion using the available motion vector according to either of the following functions:

[0094]

or

[0095]

[0096] If the current pixel has no available motion vector reference, one or more spatial neighbors having an initialized motion vector may be used. For example, an average of the available neighboring initialized motion vectors may be used.
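The three initialization cases described above (difference of two vectors, extrapolation from one vector, and averaging of neighbors) can be sketched together in Python. The initialization functions themselves are not rendered in this text, so the scaling factors below are assumptions derived from linear motion over display indexes with idx_r1 < idx_cur < idx_r2; the names `init_motion`, `mv_r1`, and `mv_r2` are hypothetical.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


def init_motion(mv_r1: Optional[Vec], mv_r2: Optional[Vec],
                idx_r1: int, idx_cur: int, idx_r2: int,
                neighbors: List[Vec]) -> Optional[Vec]:
    """Initialize the motion field of one pixel.

    mv_r1 / mv_r2 point from the current pixel to reference frame 1 / 2.
    """
    span = idx_r2 - idx_r1
    if mv_r1 is not None and mv_r2 is not None:
        # Difference between the two estimated motion vectors.
        return (mv_r2[0] - mv_r1[0], mv_r2[1] - mv_r1[1])
    if mv_r1 is not None:
        # Extrapolate from the vector toward reference frame 1 (assumed linear motion).
        s = -span / (idx_cur - idx_r1)
        return (mv_r1[0] * s, mv_r1[1] * s)
    if mv_r2 is not None:
        # Extrapolate from the vector toward reference frame 2 (assumed linear motion).
        s = span / (idx_r2 - idx_cur)
        return (mv_r2[0] * s, mv_r2[1] * s)
    if neighbors:
        # Fall back to the average of initialized neighboring vectors.
        n = len(neighbors)
        return (sum(v[0] for v in neighbors) / n, sum(v[1] for v in neighbors) / n)
    return None
```

With the current frame midway between the two references, a single available vector is simply doubled in length (and negated for reference frame 1), which matches the two-vector result for linear motion.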

[0097] In an example of initializing the motion field for the first processing level at 1102, the reference frame 2 may be used to predict a pixel of the reference frame 1, where the reference frame 1 is the last frame before the current frame is coded. That motion vector, projected onto the current frame using linear projection in a manner similar to that shown in FIG. 9, results in a motion field mv_cur at the intersecting pixel location, such as the motion field 906 at the pixel location 902.

[0098] FIG. 11 refers to initializing the motion field for a first processing level because there are desirably multiple processing levels in the process 1100. This can be seen by reference to FIG. 13, which is a diagram that illustrates the process 1100 of FIG. 11 (and the process 1200 of FIG. 12, discussed below). The following description uses the phrase motion field. This phrase is intended to collectively refer to the motion fields for respective pixels unless otherwise clear from the context. Accordingly, the phrases "motion fields" and "motion field" may be used interchangeably when referring to more than one motion field. Further, the phrase optical flow may be used interchangeably with the phrase motion field when referring to the movement of pixels.

[0099] To estimate the motion field/optical flow for pixels of a frame, a pyramid, or multi-layered, structure may be used. In one pyramid structure, for example, the reference frames are scaled down to one or more different scales. Then, the optical flow is first estimated to obtain a motion field at the highest level of the pyramid (the first processing level), i.e., using the reference frames that are scaled down the most. Thereafter, the motion field is up-scaled and used to initialize the optical flow estimation at the next level. This process of up-scaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached (i.e., until the optical flow estimation is completed for the reference frame portions at full scale).

[00100] The reasoning for this process is that it is easier to capture large motion when an image is scaled down. However, using simple rescale filters to scale the reference frames themselves can degrade the reference frame quality. To avoid losing detail to rescaling, a pyramid structure that scales the derivatives, instead of the pixels of the reference frames, is used to estimate the optical flow. This pyramid scheme represents a regressive analysis for the optical flow estimation. The scheme is shown in FIG. 13 and is implemented by the process 1100 of FIG. 11 and the process 1200 of FIG. 12.

[00101] After initialization, the Lagrangian parameter λ is set at 1104 for solving the Lagrangian function (1). Desirably, the process 1100 uses multiple values for the Lagrangian parameter λ. The first value to which λ is set at 1104 may be a relatively large value, such as 100. While it is desirable that the process 1100 use multiple values for λ within the Lagrangian function (1), it is possible that only one value is used, as in the process 1200 described below.

[00102] At 1106, the reference frames are warped to the current frame according to the motion field for the current processing level. Warping the reference frames to the current frame may be performed using subpixel location rounding. Note that the motion field used at the first processing level is downscaled from its full-resolution value to the resolution of that level before the warping is performed. Downscaling a motion field is discussed in more detail below.

[00103] Knowing the optical flow, the motion field for warping the reference frame 1 is inferred by the linear projection assumption (e.g., that the motion projects linearly over time) as follows:

[00104]

[00105] To perform the warping, the horizontal component and the vertical component of the motion field may be rounded to 1/8 pixel precision for the Y component and to 1/16 pixel precision for the U and V components. Other values for the subpixel location rounding may be used. After rounding, each pixel of the warped image is calculated as the referenced pixel given by the motion vector. Subpixel interpolation may be performed using a conventional subpixel interpolation filter.
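The subpixel rounding described above can be sketched in Python as follows. The function names are hypothetical, and rounding ties upward (round half up) is an assumption, since the source does not state a tie-breaking rule.

```python
import math
from typing import Tuple


def round_to_subpel(value: float, denom: int) -> float:
    """Round a motion component to the nearest 1/denom pixel (ties round up)."""
    return math.floor(value * denom + 0.5) / denom


def round_motion(mv: Tuple[float, float], plane: str) -> Tuple[float, float]:
    """Round both components: 1/8 pel for the Y plane, 1/16 pel for U and V."""
    denom = 8 if plane == "Y" else 16
    return (round_to_subpel(mv[0], denom), round_to_subpel(mv[1], denom))
```

For example, a horizontal component of 0.30 becomes 0.25 (2/8) for the Y plane and 0.3125 (5/16) for the U and V planes.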

[00106] The same warping approach is applied to the reference frame 2 to obtain a second warped image, where the motion field is calculated by:

[00107]
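The two projection formulas are not rendered in this text. Under the linear projection assumption, the optical flow of a pixel can be split into one vector toward each reference frame in proportion to temporal distance. The following Python sketch shows one such split; the function name `project_motion`, the argument names, and the sign convention (vectors point from the current frame to each reference, with idx_r1 < idx_cur < idx_r2) are assumptions.

```python
from typing import Tuple

Vec = Tuple[float, float]


def project_motion(mv_cur: Vec, idx_r1: int, idx_cur: int,
                   idx_r2: int) -> Tuple[Vec, Vec]:
    """Split mv_cur into vectors toward reference frames 1 and 2."""
    span = idx_r2 - idx_r1
    w1 = (idx_r1 - idx_cur) / span  # negative: frame 1 lies on the other side
    w2 = (idx_r2 - idx_cur) / span
    mv_r1 = (mv_cur[0] * w1, mv_cur[1] * w1)
    mv_r2 = (mv_cur[0] * w2, mv_cur[1] * w2)
    return mv_r1, mv_r2
```

With this convention, subtracting the vector toward reference frame 1 from the vector toward reference frame 2 recovers mv_cur.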

[00108] At the end of the calculations at 1106, two warped reference frames exist. The two warped reference frames are used to estimate the motion field between them at 1108. Estimating the motion field at 1108 can include multiple steps.

[00109] First, the derivatives E_x, E_y, and E_t are calculated using the functions (3), (4), and (5). When calculating the derivatives, the frame boundaries of a warped reference frame may be extended by copying the nearest available pixel. In this way, pixel values (i.e., the pixel values at the projected positions in the reference frame 1 and/or the reference frame 2) may be obtained when the projected positions are outside of the warped reference frame. Then, if there are multiple layers, the derivatives are downscaled to the current level. As shown in FIG. 13, the reference frames are used to calculate the derivatives at the original scale in order to capture details. The downscaled derivatives at each level l may be calculated by averaging within a 2^l x 2^l block. Note that, because calculating the derivatives and downscaling them by averaging are both linear operations, the two operations may be combined into a single linear filter for calculating the derivatives at each level l. This can lower the complexity of the calculations.
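Downscaling by block averaging can be sketched in Python as follows, using nested lists for a derivative map. The function name `downscale_by_averaging` is hypothetical, and discarding partial blocks at the right and bottom edges is a simplifying assumption of this example.

```python
from typing import List


def downscale_by_averaging(grid: List[List[float]], level: int) -> List[List[float]]:
    """Average each 2**level x 2**level block of a 2-D derivative map."""
    b = 2 ** level
    h, w = len(grid), len(grid[0])
    out = []
    # Partial blocks at the right/bottom edges are dropped (assumption).
    for y in range(0, h - h % b, b):
        row = []
        for x in range(0, w - w % b, b):
            total = sum(grid[y + dy][x + dx] for dy in range(b) for dx in range(b))
            row.append(total / (b * b))
        out.append(row)
    return out
```

Level 0 leaves the map at full scale, level 1 halves each dimension, and level 2 quarters it.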

[00110] Once the derivatives are downscaled to the current processing level, as applicable, optical flow estimation can be performed according to the Lagrangian function (1). More specifically, by setting the derivatives of the Lagrangian function (1) with respect to the horizontal component u of the motion field and with respect to the vertical component v of the motion field to zero, the components u and v may be solved for all N pixels of a frame with 2N linear equations. This results from the fact that the Laplacians are approximated by two-dimensional (2D) filters. Instead of directly solving the linear equations, which is accurate but highly complex, iterative approaches may be used to minimize the Lagrangian function (1), giving faster but less accurate results.

[00111] At 1108, the motion field for pixels of the current frame is updated or refined using the estimated motion field between the warped reference frames. For example, the current motion field for a pixel may be updated by adding the estimated motion field for the pixel on a pixel-by-pixel basis.

[00112] Once the motion field is estimated at 1108, a query is made at 1110 to determine whether there are additional values for the Lagrangian parameter λ available. Smaller values of λ can address smaller scales of motion. If there are additional values, the process 1100 can return to 1104 to set the next value for λ. For example, the process 1100 can repeat while reducing λ by half in each iteration. The motion field updated at 1108 is the current motion field for warping the reference frames at 1106 in this next iteration. Then, the motion field is again estimated at 1108. The processing at 1104, 1106, and 1108 continues until all of the possible Lagrangian parameters at 1110 are processed. In an example, there are three levels in the pyramid as shown in FIG. 13, so the smallest value for λ is 25 in the example. This repeated processing while modifying the Lagrangian parameter may be referred to as annealing the Lagrangian parameter.
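The annealing schedule in this example (start at 100, halve each iteration, stop at 25) can be written as a small Python helper. The function name `lambda_schedule` and its default arguments are illustrative; only the values 100 and 25 and the halving rule come from the text.

```python
from typing import List


def lambda_schedule(first: float = 100.0, smallest: float = 25.0) -> List[float]:
    """List the Lagrangian parameter values, halving until below `smallest`."""
    values = []
    lam = first
    while lam >= smallest:
        values.append(lam)
        lam /= 2
    return values
```

With the defaults this yields three values, 100, 50, and 25, one warp-and-estimate pass for each.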

[00113] Once there are no remaining values for the Lagrangian parameter λ at 1110, the process 1100 advances to 1112 to determine whether there are more processing levels to process. If there are additional processing levels at 1112, the process 1100 advances to 1114, where the motion field is up-scaled before processing the next layer using each of the available values for λ, starting at 1104. Up-scaling the motion field may be performed using any known technique, including but not limited to the reverse of the downscaling calculations described previously.

[00114] In general, the optical flow is first estimated to obtain a motion field at the highest level of the pyramid. Thereafter, the motion field is up-scaled and used to initialize the optical flow estimation at the next level. This process of up-scaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached at 1112 (i.e., until the optical flow estimation is completed for the derivatives calculated at full scale).
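The control flow of steps 1104 to 1114 can be summarized with a Python skeleton. This is a structural sketch only: the warping, estimation, update, and up-scaling operations are passed in as caller-supplied functions, all names are hypothetical, level 0 is taken to be full scale, and the initial field is assumed to be already at the resolution of the coarsest level.

```python
from typing import Any, Callable, Sequence


def coarse_to_fine(levels: int, lambdas: Sequence[float], init_field: Any,
                   warp: Callable, estimate: Callable,
                   add_fields: Callable, upscale: Callable) -> Any:
    """Run the level and lambda loops; the callables do the actual work."""
    field = init_field
    for level in range(levels - 1, -1, -1):       # coarsest level first
        for lam in lambdas:                        # annealing: large to small
            warped1, warped2 = warp(field, level)             # step 1106
            delta = estimate(warped1, warped2, lam, level)    # step 1108
            field = add_fields(field, delta)                  # refine the field
        if level > 0:
            field = upscale(field)                            # step 1114
    return field
```

With three levels and three λ values, the estimation step runs nine times and the field is up-scaled twice before the final full-scale result is returned.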

[00115] Once the level is at the level where the reference frames are not downscaled (i.e., the reference frames are at their original resolution), the process 1100 advances to 1116. For example, the number of levels can be three, such as in the example of FIG. 13. At 1116, the warped reference frames are blended to form the optical flow reference frame. Note that the warped reference frames blended at 1116 may be the full-scale reference frames that are warped again according to the process described at 1106 using the motion field estimated at 1108. In other words, the full-scale reference frames may be warped twice: once using the initial up-scaled motion field from the previous processing layer, and again after the motion field is refined at the full-scale level. The blending may be performed using the time linearity assumption (e.g., that frames are spaced apart by equal time periods) as follows:

[00116]

[00117] In some implementations, it is desirable to prefer the pixel in only one of the warped reference frames over the blended value. For example, if the reference pixel in the reference frame 1 is out of bounds (e.g., outside of the dimensions of the frame) while the reference pixel in the reference frame 2 is not, then only the pixel in the warped image resulting from the reference frame 2 is used, according to:

[00118]
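The blending formulas are not rendered in this text. The following Python sketch shows one way to blend a pair of warped pixels under the time linearity assumption, with a fallback to a single frame when the other reference pixel is out of bounds. The temporal weights, the symmetric fallback for reference frame 2, blending when both pixels are out of bounds, and the name `blend_pixel` are all assumptions.

```python
def blend_pixel(p1: float, p2: float, idx_r1: int, idx_cur: int, idx_r2: int,
                in_bounds1: bool = True, in_bounds2: bool = True) -> float:
    """Blend warped pixels p1 (from frame 1) and p2 (from frame 2)."""
    if in_bounds2 and not in_bounds1:
        return p2  # frame 1 reference is out of bounds: use frame 2 only
    if in_bounds1 and not in_bounds2:
        return p1  # symmetric case (assumed)
    span = idx_r2 - idx_r1
    w1 = (idx_r2 - idx_cur) / span  # the closer frame gets the larger weight
    w2 = (idx_cur - idx_r1) / span
    return w1 * p1 + w2 * p2
```

When the current frame is midway between the references, this reduces to a plain average of the two warped pixels.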

[00119] Optional occlusion detection may be performed as part of the blending. Occlusion of objects and background commonly occurs in a video sequence, where parts of an object appear in one reference frame but are hidden in the other. Generally, the optical flow estimation method described above cannot estimate the motion of the object in this situation because the brightness constancy assumption is violated. If the size of the occlusion is relatively small, the smoothness penalty function may estimate the motion quite accurately. That is, if the undefined motion field at the hidden part is smoothed by the neighboring motion vectors, the motion of the whole object can be accurate.

[00120] Even in this case, however, the simple blending method described above may not give satisfactory interpolated results. This can be demonstrated by reference to FIG. 14, which is a diagram that illustrates object occlusion. In this example, the occluded part of object A shows in the reference frame 1 and is hidden by object B in the reference frame 2. Because the hidden part of object A is not shown in the reference frame 2, the referenced pixel from the reference frame 2 is from object B. In this case, using only the warped pixel from the reference frame 1 is desirable. Accordingly, using a technique that detects occlusions, instead of or in addition to the above blending, may provide a better blending result and hence a better reference frame.

[00121] Regarding detection of an occlusion, observe from FIG. 14 that when occlusion occurs and the motion field is fairly accurate, the motion vector of the occluded part of object A points to object B in the reference frame 2. This may result in the following situations. The first situation is that the two warped pixel values are very different because they are from two different objects. The second situation is that the pixels in object B are referenced by multiple motion vectors, which are for object B in the current frame and for the occluded part of object A in the current frame.

[00122] With these observations, the following conditions may be established to determine occlusion and the use of only the warped pixel from the reference frame 1 for the optical flow reference frame, with similar conditions applying to the use of only the warped pixel from the reference frame 2:

[00123] a first quantity is greater than a first threshold; and

[00124] a second quantity is greater than a second threshold.

[00125] A reference count for the reference frame 1 is the total number of times that the referenced pixel in the reference frame 1 is referenced by any pixel in the current co-located frame. Given the existence of the subpixel interpolation described above, the count is incremented when the reference subpixel location is within one pixel length of the pixel location of interest. Moreover, if a motion vector points to a subpixel location, the weighted average of the counts of the four neighboring pixels is the total number of references for the current subpixel location. The corresponding count for the reference frame 2 may be similarly defined.
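The occlusion test can be sketched in Python. The two conditions are not rendered in this text, so this example assumes they correspond to the two situations described above: a large difference between the warped pixel values, and a reference frame 2 pixel that is referenced much more often than its reference frame 1 counterpart. The function name, the ratio form of the second condition, and the threshold names `t_pixel` and `t_ref` are hypothetical.

```python
def use_only_r1(p1_warped: float, p2_warped: float,
                n_ref1: float, n_ref2: float,
                t_pixel: float, t_ref: float) -> bool:
    """Return True when only the warped pixel from reference frame 1 should be used."""
    # Situation 1: the warped pixels come from two different objects.
    differ = abs(p1_warped - p2_warped) > t_pixel
    # Situation 2: the frame 2 pixel is referenced by many more motion
    # vectors than the frame 1 pixel (ratio test is an assumption).
    crowded = n_ref1 > 0 and (n_ref2 / n_ref1) > t_ref
    return differ and crowded
```

Both conditions must hold; a large pixel difference alone, or a crowded reference alone, leaves the ordinary blending in place.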

[00126] Accordingly, an occlusion can be detected in the first reference frame using the first warped reference frame and the second warped reference frame. Blending the warped reference frames can then include populating pixel positions of the optical flow reference frame corresponding to the occlusion with pixel values from the second warped reference frame. Similarly, an occlusion can be detected in the second reference frame using the first warped reference frame and the second warped reference frame. Blending the warped reference frames can then include populating pixel positions of the optical flow reference frame corresponding to the occlusion with pixel values from the first warped reference frame.

[00127] It is experimentally shown that the process 1100 provides substantial compression performance gains. These performance gains include gains of 2.5% in PSNR and 3.3% in SSIM for a set of low-resolution frames, and gains of 3.1% in PSNR and 4.0% in SSIM for a set of medium-resolution frames. However, as mentioned above, the optical flow estimation performed according to the Lagrangian function (1) uses 2N linear equations to solve for the horizontal component u and the vertical component v of the motion field for all N pixels of a frame. In other words, the computational complexity of the optical flow estimation is a polynomial function of the frame size, which burdens decoder complexity. Accordingly, sub-frame based (e.g., block-based) optical flow estimation is described next, which can reduce decoder complexity relative to the frame-based optical flow estimation described with respect to FIG. 11.

[00128] FIG. 12 is a flowchart diagram of a method or process 1200 for generating an optical flow reference frame portion. In this example, the optical flow reference frame portion is smaller than an entire reference frame. The co-located frame portions in this example are described with reference to a block, but other frame portions may be processed according to FIG. 12. Process 1200 can implement step 1006 of process 1000. Process 1200 can be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214 and that, when executed by a processor such as the CPU 202, cause the computing device to perform process 1200. Process 1200 can also be implemented using specialized hardware or firmware. As explained above, multiple processors, memories, or both may be used.

[00129] At 1202, every pixel in the current frame is assigned an initialized motion vector. These vectors define initial motion fields that can be used to warp the reference frames to the current frame for a first processing level, so as to shorten the motion lengths between the reference frames. The initialization at 1202 may be performed using the same processing as described for the initialization at 1102, so the description is not repeated here.

[00130] At 1204, the reference frames (e.g., reference frame 1 and reference frame 2) are warped to the current frame according to the motion field initialized at 1202. The warping at 1204 may be performed using the same processing as described for the warping at 1106 except that, desirably, the motion field mv_cur initialized at 1202 is not downscaled from its full-resolution value before the reference frames are warped.

[00131] At the end of the calculation at 1204, there are two warped reference frames at full resolution. Like process 1100, process 1200 can estimate the motion field between the two reference frames using a multi-level process similar to that described with respect to FIG. 13. Broadly, process 1200 calculates the derivatives for a level, performs an optical flow estimation using the derivatives, and upscales the resulting motion field for the next level, until all levels have been considered.

[00132] More specifically, the motion field for a block at the current (or first) processing level is initialized at 1206. The block may be a block of the current frame selected in a scan order (e.g., a raster scan order) of the current frame. The motion field for the block comprises a motion field for respective pixels of the block. That is, at 1206, every pixel within the current block is assigned an initialized motion vector. The initialized motion vectors are used to warp the reference blocks to the current block so as to shorten the lengths between the reference blocks of the reference frames.

[00133] At 1206, the motion field is downscaled from its full-resolution value to the resolution of the level. In other words, the initialization at 1206 may include downscaling the motion field for respective pixels of the block from the full-resolution values initialized at 1202. The downscaling may be performed using any technique, such as the downscaling described above.
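Because any downscaling technique may be used, the following is only one possible sketch. It assumes that vectors are averaged within 2^level x 2^level windows and that their magnitudes are divided by 2^level so that they are expressed in the pixel units of the lower-resolution level. It also assumes the block dimensions are multiples of 2^level. The name `downscale_motion_field` is hypothetical.

```python
def downscale_motion_field(field, level):
    """Downscale a full-resolution motion field to a pyramid level.

    field: 2-D list of (u, v) tuples at full resolution.
    Returns a field at 1/2**level resolution whose vectors are expressed
    in that level's pixel units (an assumed convention).
    """
    s = 2 ** level
    out = []
    for by in range(0, len(field), s):
        row = []
        for bx in range(0, len(field[0]), s):
            window = [field[y][x]
                      for y in range(by, by + s)
                      for x in range(bx, bx + s)]
            mean_u = sum(u for u, _ in window) / len(window)
            mean_v = sum(v for _, v in window) / len(window)
            row.append((mean_u / s, mean_v / s))  # rescale to the level's units
        out.append(row)
    return out
```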

[00134] At 1208, the co-located reference blocks corresponding to the motion field in each of the warped reference frames are warped to the current block. The warping of the reference blocks is performed similarly to the warping at 1106 of process 1100. That is, knowing the optical flow of the pixels of the reference block in reference frame 1, the motion field for warping is inferred by the linear projection assumption (e.g., that the motion projects linearly over time) as follows:

[00135]
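The linear projection assumption can be illustrated with a short sketch. The scaling convention used here is an assumption made for illustration, not the equation of this description: the motion field is taken to describe the displacement from reference frame 1 to reference frame 2, and the frames are assumed to carry display-order indices. The name `project_motion` and its parameters are hypothetical.

```python
def project_motion(mv_cur, idx_r1, idx_r2, idx_cur):
    """Split a reference-1-to-reference-2 motion vector at the current frame.

    mv_cur: (u, v) displacement from reference frame 1 to reference frame 2
    (assumed convention). idx_*: display-order frame indices (assumed).
    Returns (mv_r1, mv_r2), the vectors pointing from the current frame
    toward reference frame 1 and reference frame 2 respectively.
    """
    span = idx_r2 - idx_r1
    w1 = (idx_r1 - idx_cur) / span  # signed temporal fraction toward reference 1
    w2 = (idx_r2 - idx_cur) / span  # signed temporal fraction toward reference 2
    mv_r1 = (mv_cur[0] * w1, mv_cur[1] * w1)
    mv_r2 = (mv_cur[0] * w2, mv_cur[1] * w2)
    return mv_r1, mv_r2
```

Under this convention the two projected vectors differ by exactly `mv_cur`, which is what linear motion over time implies.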

[00136] To perform warping, the horizontal component and the vertical component of the motion field may be rounded to 1/8 pixel precision for the Y component and to 1/16 pixel precision for the U and V components. Other values may be used. After rounding, each pixel of the warped block is calculated as the referenced pixel given by the motion vector mv_r1. Subpixel interpolation may be performed using a conventional subpixel interpolation filter.
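The rounding step can be sketched as below. The precisions (1/8 pixel for Y, 1/16 pixel for U and V) come from the paragraph above. The round-half-up rule and the names `round_to_fraction` and `round_motion_vector` are assumptions made for illustration.

```python
import math


def round_to_fraction(value, denominator):
    """Round value to the nearest multiple of 1/denominator (half rounds up)."""
    return math.floor(value * denominator + 0.5) / denominator


def round_motion_vector(mv, plane):
    """Round a (horizontal, vertical) motion vector for the given plane.

    plane: "Y" uses 1/8 pixel precision; "U" and "V" use 1/16 pixel precision.
    """
    denominator = 8 if plane == "Y" else 16
    return tuple(round_to_fraction(c, denominator) for c in mv)
```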

[00137] The same warping approach is applied to the reference block of reference frame 2 to obtain a warped block, where the motion field is calculated by:

[00138]

[00139] At the end of the calculation at 1208, there are two warped reference blocks. The two warped reference blocks are used at 1210 to estimate the motion field between them. The processing at 1210 may be similar to that described for the processing at 1108 of FIG. 11.

[00140] More specifically, the two warped reference blocks may be at full resolution. According to the pyramid structure of FIG. 13, the derivatives E_x, E_y, and E_t are calculated using functions (3), (4), and (5). When calculating derivatives for frame-level estimation, the frame boundaries may be extended by copying the nearest available pixel to obtain out-of-bounds pixel values, as described with respect to process 1100. For other frame portions, however, neighboring pixels are often available in the reference frames warped at 1204. For example, for block-based estimation, pixels of neighboring blocks are available in the warped reference frames unless the block itself lies at a frame boundary. Accordingly, for pixels that are out of bounds relative to a warped reference frame portion, pixels in neighboring portions of the warped reference frame may be used as the corresponding pixel values, as applicable. If the projected pixels are outside the frame boundaries, copying the nearest available (i.e., in-bounds) pixel may still be used. After the derivatives are calculated, they may be downscaled to the current level. The downscaled derivatives at each level l may be calculated by averaging within a 2^l × 2^l block, as discussed previously. The complexity of the calculations may be reduced by combining the two linear operations of calculating and averaging the derivatives into a single linear filter, but this is not required.
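The boundary handling and the averaging step above can be sketched as follows. The central difference used for the horizontal derivative is only a stand-in for functions (3) to (5), which are not restated here, and all function names are hypothetical. The block reads neighbors from the whole warped frame, so real pixels of adjacent blocks are used, and only positions outside the frame fall back to the nearest in-bounds pixel.

```python
def fetch(frame, x, y):
    """Return frame[y][x], copying the nearest in-bounds pixel when (x, y)
    lies outside the frame."""
    h, w = len(frame), len(frame[0])
    return frame[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def horizontal_gradient(frame, bx, by, size):
    """Central-difference gradient (an assumed stand-in) for a size x size
    block whose top-left pixel is at (bx, by) in the warped frame."""
    return [[(fetch(frame, x + 1, y) - fetch(frame, x - 1, y)) / 2
             for x in range(bx, bx + size)]
            for y in range(by, by + size)]


def average_pool(values, level):
    """Downscale by averaging within 2**level x 2**level windows."""
    s = 2 ** level
    return [[sum(values[y][x]
                 for y in range(by, by + s)
                 for x in range(bx, bx + s)) / (s * s)
             for bx in range(0, len(values[0]), s)]
            for by in range(0, len(values), s)]
```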

[00141] Continuing the processing at 1210, the downscaled derivatives may be used as inputs to the Lagrangian function (1) to perform the optical flow estimation that estimates the motion field between the warped reference portions. By setting the derivatives of the Lagrangian function (1) with respect to the horizontal component u and the vertical component v to zero and solving the resulting 2N linear equations, the horizontal component u and the vertical component v of the motion field may be determined for all N pixels of the portion, here a block. For this, there are two optional ways to handle out-of-bounds motion vectors. One way is to assume zero correlation with neighboring blocks and to assume that an out-of-bounds motion vector is the same as the motion vector at the boundary position nearest the out-of-bounds pixel position. The other way is to use the initialized motion vector for the current pixel (i.e., from the motion field initialized at 1206) as the motion vector for the out-of-bounds pixel position corresponding to the current pixel.
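The two ways of handling out-of-bounds motion vectors can be sketched as a single neighbor lookup. The function name, the `mode` strings, and the data layout (2-D lists of (u, v) tuples covering one block) are assumptions made for illustration.

```python
def neighbor_motion_vector(block_mv, init_mv, x, y, dx, dy, mode):
    """Return the motion vector of the neighbor of pixel (x, y) at (dx, dy).

    block_mv: current motion field of the block.
    init_mv: motion field as initialized for the block.
    mode: "nearest" reuses the vector at the closest boundary position;
          "initial" reuses the initialized vector of the current pixel.
    """
    h, w = len(block_mv), len(block_mv[0])
    nx, ny = x + dx, y + dy
    if 0 <= nx < w and 0 <= ny < h:
        return block_mv[ny][nx]  # neighbor lies inside the block
    if mode == "nearest":
        return block_mv[min(max(ny, 0), h - 1)][min(max(nx, 0), w - 1)]
    if mode == "initial":
        return init_mv[y][x]
    raise ValueError("unknown mode: " + mode)
```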

[00142] After the motion field is estimated, the current motion field for the level is updated or refined using the estimated motion field between the warped reference blocks, which completes the processing at 1210. For example, the current motion field may be updated pixel by pixel by adding the estimated motion field for each pixel.

[00143] In process 1100, an additional loop is included that sets decreasing values for the Lagrangian parameter λ so that, at each level, the motion field is estimated and refined using progressively smaller values of λ. In process 1200, this loop is omitted. That is, in process 1200 as shown, only one value of the Lagrangian parameter λ is used to estimate the motion field at the current processing level. This may be a relatively small value, such as 25. Other values for λ are possible, depending on, for example, the smoothness of the motion, the image resolution, or other variables.

[00144] In other implementations, process 1200 may include the additional loop for varying the Lagrangian parameter λ. In an implementation that includes such a loop, the Lagrangian parameter λ may be set before estimating the motion field at 1210, such that warping the reference blocks at 1208 and estimating and updating the motion field at 1210 are repeated until all values of λ have been used, as described with respect to the processing at 1104 and 1110 of process 1100.

[00145] Process 1200 advances to the query at 1212 after estimating and updating the motion field at 1210. When a single value is used for the Lagrangian parameter λ, this occurs after the first and only motion field estimation and update at 1210 for the level. When multiple values of the Lagrangian parameter λ are used at a processing level, process 1200 advances to the query at 1212 after estimating and updating the motion field at 1210 using the final value of λ.

[00146] If there are additional processing levels in response to the query at 1212, process 1200 advances to 1214, where the motion field is upscaled before the next level is processed starting at 1206. The upscaling may be performed according to any known technique.

[00147] In general, the optical flow is first estimated to obtain a motion field at the highest level of the pyramid. The motion field is then upscaled and used to initialize the optical flow estimation at the next level. This process of upscaling the motion field, using it to initialize the optical flow estimation of the next level, and obtaining the motion field continues until the lowest level of the pyramid is reached at 1212 (i.e., until the optical flow estimation is completed for the derivatives calculated at full scale).
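The coarse-to-fine loop described above can be sketched as follows. The per-level estimator is passed in as a callable and is not implemented here. Upscaling by pixel replication with doubled vector magnitudes is only one assumed technique, and the names `upscale` and `coarse_to_fine` are hypothetical.

```python
def upscale(field):
    """Double the resolution by pixel replication and double each vector,
    so that vectors stay in the pixel units of the finer level (assumed)."""
    out = []
    for row in field:
        wide = [(2 * u, 2 * v) for (u, v) in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def coarse_to_fine(coarsest_field, num_levels, estimate):
    """Refine a motion field from the highest pyramid level down to level 0.

    estimate(level, field) is a caller-supplied callable that returns a
    per-pixel refinement with the same shape as field.
    """
    field = coarsest_field
    for level in range(num_levels - 1, -1, -1):
        delta = estimate(level, field)
        field = [[(u + du, v + dv) for (u, v), (du, dv) in zip(row, drow)]
                 for row, drow in zip(field, delta)]  # per-pixel update
        if level > 0:
            field = upscale(field)  # initialize the next, finer level
    return field
```

With three levels, the field is refined at the coarsest level, upscaled twice, and refined once more at each finer level, ending at full scale.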

[00148] Once the level reaches the level at which the reference frames are not downscaled (i.e., they are at their original resolution), process 1200 advances to 1216. For example, the number of levels may be three, as in the example of FIG. 13. At 1216, the warped reference blocks are blended to form the optical flow reference block, as described previously. Note that the warped reference blocks blended at 1216 may be the full-scale reference blocks warped again, according to the process described at 1208, using the motion field estimated at 1210. In other words, the full-scale reference blocks may be warped twice: once using the initial upscaled motion field from the previous processing level, and again after the motion field is refined at the full-scale level. The blending may be performed using the time linearity assumption, similarly to the processing described at 1116. The optional occlusion detection described at 1116 and shown by example in FIG. 14 may be incorporated as part of the blending at 1216.

[00149] After the co-located reference block is generated at 1216, process 1200 advances to 1218 to determine whether there are further frame portions (here, blocks) for prediction. If so, process 1200 repeats starting at 1206 for the next block. The blocks may be processed in the scan order. When there are no more blocks to consider in response to the query at 1218, process 1200 ends.

[00150] Referring again to FIG. 10, process 1200 can implement 1006 in process 1000. At the end of the processing at 1006, whether performed according to process 1100, process 1200, or a variation of either as described herein, one or more warped reference frame portions exist.

[00151] At 1008, a prediction process is performed using the optical flow reference frame portion generated at 1006. Performing the prediction process at an encoder can include generating a prediction block from an optical flow reference frame for a current block of the frame. The optical flow reference frame can be the optical flow reference frame output by process 1100 and stored in a reference frame buffer, such as the reference frame buffer 600. Alternatively, the optical flow reference frame can be generated by combining the optical flow reference portions output by process 1200. Combining the optical flow reference portions may include arranging the optical flow reference portions (e.g., co-located reference blocks) according to the pixel positions of the respective current frame portions used to generate each of them. The resulting optical flow reference frame can be stored for use in a reference frame buffer of the encoder, such as the reference frame buffer 600 of the encoder 400.
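Arranging the optical flow reference portions by pixel position can be sketched as below. The data layout (each portion given as the top-left pixel position of its current frame block plus a 2-D list of pixel values) and the name `assemble_reference_frame` are assumptions made for illustration.

```python
def assemble_reference_frame(width, height, portions):
    """Place optical flow reference portions into a frame-sized buffer.

    portions: iterable of (x, y, block), where (x, y) is the top-left pixel
    position of the current frame portion the block was generated for.
    Positions not covered by any portion remain None.
    """
    frame = [[None] * width for _ in range(height)]
    for x, y, block in portions:
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                frame[y + r][x + c] = value
    return frame
```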

[00152] Generating the prediction block at the encoder can include selecting the co-located block in the optical flow reference frame as the prediction block. Generating the prediction block at the encoder can instead include performing a motion search within the optical flow reference frame to select the best-matching prediction block for the current block. Regardless of how the prediction block is generated at the encoder, the resulting residual can be further processed, for example using the lossy encoding process described with respect to the encoder 400 of FIG. 4.

[00153] At the encoder, process 1000 may form part of a rate distortion loop for the current block that uses various prediction modes, including one or more intra prediction modes and both single and compound inter prediction modes using the prediction frames available for the current frame. A single inter prediction mode uses only a single forward or backward reference frame for inter prediction. A compound inter prediction mode uses both a forward and a backward reference frame for inter prediction. In a rate distortion loop, the rate (e.g., the number of bits) used to encode the current block with each prediction mode is compared with the distortion resulting from the encoding. The distortion may be calculated as the differences between the pixel values of the block before encoding and after decoding. The differences may be expressed as a sum of absolute differences or as some other measure that captures the accumulated error for blocks of frames.
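A rate distortion comparison of this kind can be sketched as follows. The sum of absolute differences is the measure named above. The weighted-sum cost and its `multiplier` parameter are common practice but are assumptions here, as are the function names and the mode labels used in the example.

```python
def sad(original, reconstructed):
    """Sum of absolute differences between two blocks (2-D lists)."""
    return sum(abs(a - b)
               for row_o, row_r in zip(original, reconstructed)
               for a, b in zip(row_o, row_r))


def choose_mode(original, candidates, multiplier):
    """Pick the prediction mode with the lowest assumed cost.

    candidates: dict mapping a mode name to (reconstructed_block, bits).
    Cost is distortion + multiplier * bits (an assumed formulation).
    """
    def cost(mode):
        reconstructed, bits = candidates[mode]
        return sad(original, reconstructed) + multiplier * bits
    return min(candidates, key=cost)
```

In the example below, a mode that spends slightly more bits but reconstructs the block more closely is selected.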

[00154] In some implementations, it may be desirable to limit the use of the optical flow reference frame to the single inter prediction mode. That is, the optical flow reference frame may be excluded as a reference frame in every compound reference mode. This can simplify the rate distortion loop, and little additional impact on the encoding of a block is expected because the optical flow reference frame already considers both a forward and a backward reference frame. According to an implementation described herein, a flag may be encoded into the bitstream to indicate whether an optical flow reference frame is available for use in encoding the current frame. The flag may be encoded, in one example, when any single block within the current frame is encoded using an optical flow reference frame block. Where the optical flow reference frame is available for the current frame, an additional flag or other indicator (e.g., at the block level) may be included that indicates whether the current block was encoded by inter prediction using the optical flow reference frame.

[00155] The prediction process at 1008 may be repeated for all blocks of the current frame until the current frame is encoded.

[00156] In a decoder, performing the prediction process at 1008 using the optical flow reference frame portion may result from a determination that an optical flow reference frame is available for decoding the current frame. In some implementations, the determination is made by inspecting a flag indicating that at least one block of the current frame was encoded using an optical flow reference frame portion. Performing the prediction process at 1008 in the decoder can include generating a prediction block. Generating the prediction block can include using an inter prediction mode decoded from the encoded bitstream, such as from a block header. A flag or indicator can be decoded to determine the inter prediction mode. When the inter prediction mode is an optical flow reference frame mode (i.e., the block was inter predicted using the optical flow reference frame portion), the prediction block for the current block to be decoded is generated using pixels of the optical flow reference frame portion and a motion vector mode and/or a motion vector.

[00157] The same processing for generating an optical flow reference frame for use in the prediction process may be performed at a decoder, such as the decoder 500, as part of decoding, as was performed at the encoder. For example, when the flag indicates that at least one block of the current frame was encoded using an optical flow reference frame portion, an entire optical flow reference frame can be generated and stored for use in the prediction process. However, additional savings in computational power at the decoder may be achieved by modifying process 1200 so that it is performed only where coding blocks are identified as using the co-located/optical flow reference frame as an inter prediction reference frame. This may be described with reference to FIG. 15, which is a diagram illustrating one technique for optimizing a decoder.

[00158] In FIG. 15, pixels are shown along a grid 1500, with w representing a pixel position along a first axis of the grid 1500 and y representing a pixel position along a second axis of the grid 1500. The grid 1500 represents pixel positions of a portion of the current frame. To perform the prediction process at 1008 in the decoder, the processing at 1006 and 1008 may be combined. For example, before performing the process at 1006, the prediction process at 1008 can include finding the reference block used to encode the current block (e.g., from header information, such as a motion vector). In FIG. 15, the motion vector for the current coding block 1502 points to a reference block represented by the inner dashed line 1504. The current coding block 1502 comprises 4x4 pixels. The reference block position is shown by the dashed line 1504 because the reference block is located in the reference frame, not in the current frame.

[00159] Once the reference block is located, all of the reference blocks spanned (i.e., overlapped) by the reference block are identified. This may include extending the reference block size by half of the filter length at each boundary to account for subpixel interpolation filters. In FIG. 15, the subpixel interpolation filter length L is used to extend the reference block to the boundaries represented by the outer dashed line 1506. As is relatively common, the motion vector results in a reference block that does not align perfectly with the full-pel positions. The darkened area in FIG. 15 represents the full-pel positions. All of the reference blocks that overlap the full-pel positions are identified. Assuming the block sizes are the same as that of the current coding block 1502, the following are identified: a first reference block co-located with the current block, a second reference block above the first reference block, two reference blocks extending from the left of the first reference block, and two reference blocks extending from the left of the second reference block.

[00160] Once the reference blocks are identified, process 1200 may be performed at 1006 for only the blocks within the current frame that are co-located with the identified reference blocks, so as to produce the co-located/optical flow estimated reference blocks. In the example of FIG. 15, this results in six optical flow reference frame portions.
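Identifying the overlapped reference blocks can be sketched as below. The sketch assumes a block grid aligned to multiples of the block size, extends the motion-compensated block by half the filter length on every side, and maps the covered full-pel positions to block indices. The function name, its parameters, and the numeric values in the example are hypothetical and are not taken from FIG. 15.

```python
import math


def overlapped_blocks(bx, by, size, mv, filter_len, frame_w, frame_h):
    """Return (column, row) indices of the grid blocks overlapped by the
    reference block of the coding block at pixel (bx, by).

    mv: (horizontal, vertical) motion vector, possibly fractional.
    filter_len: subpixel interpolation filter length; the reference block
    is extended by filter_len / 2 at each boundary.
    """
    half = filter_len / 2
    x0 = max(0, math.floor(bx + mv[0] - half))
    y0 = max(0, math.floor(by + mv[1] - half))
    x1 = min(frame_w - 1, math.ceil(bx + mv[0] + size + half) - 1)
    y1 = min(frame_h - 1, math.ceil(by + mv[1] + size + half) - 1)
    return [(col, row)
            for row in range(y0 // size, y1 // size + 1)
            for col in range(x0 // size, x1 // size + 1)]
```

With a 4x4 block at pixel (8, 8), a motion vector of (-3.5, -2.5), and a filter length of 2, the extended reference block touches six grid blocks, matching the count in the example above.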

[00161] This modified process ensures that the encoder and the decoder have the same predictor while the decoder does not need to calculate the entire co-located reference frame. It is noted that the reference block(s) for a subsequent block, including any extended boundaries, may overlap one or more reference blocks identified during the decoding process of the current block. In this case, the optical flow estimation need be performed only once for any of the identified blocks, which further reduces the computing requirements at the decoder. In other words, a reference block generated at 1216 may be stored for use in decoding other blocks of the current frame.
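Storing generated reference blocks so that the optical flow estimation runs only once per block can be sketched with a small cache. The class name `ReferenceBlockCache` and the `generate` callable are hypothetical; `generate` stands in for the per-block generation and is supplied by the caller.

```python
class ReferenceBlockCache:
    """Generate each optical flow reference block at most once per frame."""

    def __init__(self, generate):
        self._generate = generate  # callable(block_index) -> reference block
        self._blocks = {}
        self.runs = 0              # number of times generation actually ran

    def get(self, block_index):
        """Return the block for block_index, generating it on first use."""
        if block_index not in self._blocks:
            self._blocks[block_index] = self._generate(block_index)
            self.runs += 1
        return self._blocks[block_index]
```

A decoder following this approach would create one cache per current frame and discard it once the frame is decoded.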

[00162] Regardless of how the prediction block is generated at the decoder, the decoded residual for the current block from the encoded bitstream can be combined with the prediction block to form a reconstructed block, as described by example with respect to the decoder 500 of FIG. 5.

[00163] Whether performed after or in conjunction with process 1200, the prediction process at 1008 may be repeated for all blocks of the current frame that were encoded using an optical flow reference frame portion until the current frame is decoded. When processing blocks in decoding order, a block that was not encoded using an optical flow reference frame portion can be decoded conventionally according to the prediction mode decoded for the block from the encoded bitstream.

[00164] For N pixels in a frame or block, the complexity of solving the optical flow formulation can be expressed as O(N * M), where M is the number of iterations used to solve the linear equations. M is not related to the number of levels or to the number of values of the Lagrangian parameter λ. Instead, M is related to the calculation precision in solving the linear equations: a larger value of M gives better precision. Given this complexity, moving from frame-level to sub-frame-level (e.g., block-based) estimation provides several options for reducing decoder complexity. First, because the constraint of motion field smoothness is relaxed at block boundaries, it is easier to converge to a solution when solving the linear equations for a block, so M is smaller for a similar precision. Second, solving for a motion vector involves its neighboring motion vectors due to the smoothness penalty factor. Motion vectors at block boundaries have fewer neighboring motion vectors, so their calculations are faster. Third, as discussed above, the optical flow need not be calculated for the entire frame, but only for the portion of the blocks of the collocated reference frame that are identified by those coding blocks that use the collocated reference frame for inter prediction.
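The O(N * M) estimate can be made concrete with a short calculation. The Python sketch below only counts operations proportional to N * M. The frame size, block size, and iteration counts are illustrative assumptions chosen for the arithmetic, not values from the disclosure.

```python
from typing import Iterable, Tuple


def flow_cost(num_pixels: int, iterations: int) -> int:
    """Operation count proportional to O(N * M) for one linear-system solve."""
    return num_pixels * iterations


def block_based_cost(blocks: Iterable[Tuple[int, int]]) -> int:
    """Sum of N_b * M_b over only the blocks that need an optical flow reference."""
    return sum(flow_cost(n, m) for n, m in blocks)


# Illustrative numbers only: a 64x64 frame solved as a whole with M = 20,
# versus four 16x16 blocks, each solved with a smaller M = 12.
frame_cost = flow_cost(64 * 64, 20)
blocks_cost = block_based_cost([(16 * 16, 12)] * 4)
```

With these assumed numbers the whole-frame solve costs 81920 units and the four blocks cost 12288. The gap comes from both effects named above: fewer pixels are processed and fewer iterations are needed per block.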

[00165] For simplicity of explanation, each of the processes 1000, 1100, and 1200 is depicted and described as a series of steps or operations. However, steps or operations in accordance with the present invention may occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used. Furthermore, not all illustrated steps or operations are required to implement a method in accordance with the disclosed subject matter.

[00166] The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. It should be understood, however, that encoding and decoding, as those terms are used in the claims, may mean compression, decompression, transformation, or any other processing or alteration of data.

[00167] The word "example" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as an "example" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" is intended to present concepts in a concrete fashion. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or." That is, unless specified otherwise or clear from context, "X includes A or B" is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then "X includes A or B" is satisfied under any of the foregoing instances. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term "an implementation" or "one implementation" throughout is not intended to mean the same embodiment or implementation unless described as such.

[00168] Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit. In the claims, the term "processor" should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms "signal" and "data" are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.

[00169] Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein. Additionally or alternatively, for example, a special-purpose computer/processor can be used that contains other hardware for carrying out any of the methods, algorithms, or instructions described herein.

[00170] The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a hand-held communication device. In this case, the transmitting station 102 can encode content into an encoded video signal using the encoder 400 and transmit the encoded video signal to the communication device. In turn, the communication device can then decode the encoded video signal using the decoder 500. Alternatively, the communication device can decode content stored locally on the communication device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communication device, and/or a device including the encoder 400 may also include the decoder 500.

[00171] Further, all or a portion of the implementations of the present invention can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable media are also available.

[00172] Further implementations are summarized in the following examples.

[00173] Example 1: A method comprising: determining a first frame to be predicted in a video sequence; determining a first reference frame from the video sequence for forward inter prediction of the first frame; determining a second reference frame from the video sequence for backward inter prediction of the first frame; generating an optical flow reference frame for inter prediction of the first frame by performing optical flow estimation using the first reference frame and the second reference frame; and performing a prediction process for the first frame using the optical flow reference frame.

[00174] Example 2: The method of Example 1, wherein generating the optical flow reference frame comprises: performing the optical flow estimation by minimizing a Lagrangian function for respective pixels of the first frame.

[00175] Example 3: The method of Example 1 or 2, wherein the optical flow estimation produces respective motion fields for pixels of the first frame, and generating the optical flow reference frame comprises: warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and blending the first warped reference frame and the second warped reference frame to form the optical flow reference frame.

[00176] Example 4: The method of Example 3, wherein blending the first warped reference frame and the second warped reference frame comprises: combining collocated pixel values of the first warped reference frame and the second warped reference frame by scaling the collocated pixel values using distances between the first reference frame and the second reference frame and between the current frame and each of the first reference frame and the second reference frame.
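One plausible reading of the distance-based scaling in Example 4 is a linear interpolation in which the reference frame nearer to the current frame receives the larger weight. The Python sketch below shows that reading only. The function name and the exact weighting are assumptions, since this passage states that distances are used but not the formula. It also assumes the current frame lies between the two references, so the distance between the references is `d1 + d2`.

```python
def blend_pixel(p1: float, p2: float, d1: float, d2: float) -> float:
    """Blend two collocated warped pixel values by temporal distance.

    p1, p2: collocated pixel values from the first and second warped frames.
    d1, d2: distances from the current frame to the first and second
            reference frames (assumed non-negative, not both zero).
    The nearer reference gets the larger weight (an assumed weighting).
    """
    total = d1 + d2  # distance between the two reference frames
    if d1 < 0 or d2 < 0 or total == 0:
        raise ValueError("distances must be non-negative and not both zero")
    return (p1 * d2 + p2 * d1) / total
```

For example, with `d1 = 1` and `d2 = 3` the first reference is three times closer, so it contributes three quarters of the blended value.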

[00177] Example 5: The method of Example 3 or Example 4, wherein blending the first warped reference frame and the second warped reference frame comprises: populating pixel positions of the optical flow reference frame by either combining collocated pixel values of the first warped reference frame and the second warped reference frame or using a single pixel value of one of the first warped reference frame or the second warped reference frame.

[00178] Example 6: The method of any one of Examples 3 to 5, further comprising: detecting an occlusion in the first reference frame using the first warped reference frame and the second warped reference frame, wherein blending the first warped reference frame and the second warped reference frame comprises: populating pixel positions of the optical flow reference frame corresponding to the occlusion with pixel values from the second warped reference frame.
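Examples 5 and 6 together describe a per-pixel choice between combining both warped values and taking a single one. The Python sketch below assumes the occlusion has already been detected and is supplied as a boolean mask (`occluded_in_first`, a hypothetical name). It uses a plain average for non-occluded pixels purely for brevity. How the occlusion is detected and how values are weighted are not specified in this passage.

```python
from typing import List

Frame = List[List[float]]
Mask = List[List[bool]]


def blend_with_occlusion(warped1: Frame, warped2: Frame,
                         occluded_in_first: Mask) -> Frame:
    """Populate each output pixel from one or both warped reference frames.

    Where the first reference is marked occluded, only the second warped
    frame's value is used; elsewhere the two collocated values are averaged
    (an assumed stand-in for the blending actually used).
    """
    out: Frame = []
    for row1, row2, mask_row in zip(warped1, warped2, occluded_in_first):
        out.append([
            p2 if occluded else (p1 + p2) / 2
            for p1, p2, occluded in zip(row1, row2, mask_row)
        ])
    return out
```

The same structure covers the symmetric case: a mask for the second reference frame would select the first warped frame's value instead.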

[00179] Example 7: The method of any one of Examples 1 to 6, wherein performing the prediction process comprises: using the optical flow reference frame only for single reference inter prediction of blocks of the first frame.

[00180] Example 8: The method of any one of Examples 1 to 7, wherein the first reference frame is the reconstructed frame nearest to the first frame in a display order of the video sequence that is available for forward inter prediction of the first frame, and the second reference frame is the reconstructed frame nearest to the first frame in the display order that is available for backward inter prediction of the first frame.

[00181] Example 9: The method of any one of Examples 1 to 8, wherein performing the prediction process comprises: determining a reference block within the optical flow reference frame that is collocated with a first block of the first frame; and encoding a residual of the reference block and the first block.

[00182] Example 10: An apparatus comprising: a processor; and a non-transitory storage medium that includes instructions executable by the processor to carry out a method comprising: determining a first frame to be predicted in a video sequence; determining an availability of a first reference frame for forward inter prediction of the first frame and an availability of a second reference frame for backward inter prediction of the first frame; and, responsive to determining that both the first reference frame and the second reference frame are available: generating, using optical flow estimation, respective motion fields for pixels of the first frame from the first reference frame and the second reference frame; warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and blending the first warped reference frame and the second warped reference frame to form an optical flow reference frame for inter prediction of blocks of the first frame.

[00183] Example 11: The apparatus of Example 10, wherein the method further comprises: performing a prediction process for the first frame using the optical flow reference frame.

[00184] Example 12: The apparatus of Example 10 or Example 11, wherein the method further comprises: using the optical flow reference frame only for single reference inter prediction of blocks of the first frame.

[00185] Example 13: The apparatus of any one of Examples 10 to 12, wherein generating the respective motion fields comprises: calculating an output of a Lagrangian function for respective pixels of the first frame using the first reference frame and the second reference frame.

[00186] Example 14: The apparatus of Example 13, wherein calculating the output of the Lagrangian function comprises: calculating a first set of motion fields for pixels of the current frame using a first value for a Lagrangian parameter; and using the first set of motion fields as input to the Lagrangian function using a second value for the Lagrangian parameter to calculate a refined set of motion fields for the pixels of the current frame, wherein the second value for the Lagrangian parameter is smaller than the first value for the Lagrangian parameter, and the first warped reference frame and the second warped reference frame are warped using the refined set of motion fields.
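The two-stage refinement in Example 14 generalizes to a schedule of decreasing Lagrangian parameter values, each stage seeded with the previous stage's motion fields. A minimal Python sketch of that control flow follows. The `solve` callable is a hypothetical stand-in for minimizing the Lagrangian function at a given parameter value and is not defined by the disclosure.

```python
from typing import Callable, Sequence, TypeVar

Field = TypeVar("Field")


def refine_motion_field(initial_field: Field,
                        lambdas: Sequence[float],
                        solve: Callable[[Field, float], Field]) -> Field:
    """Run the solver once per Lagrangian parameter value, largest first.

    Each stage takes the motion field from the previous stage as its input,
    so a heavily smoothed estimate is refined with successively smaller values.
    """
    values = list(lambdas)
    if any(b >= a for a, b in zip(values, values[1:])):
        raise ValueError("lambda values must be strictly decreasing")
    field = initial_field
    for lam in values:
        field = solve(field, lam)
    return field
```

With two values this reduces to the first and second values of Example 14; the field returned by the last stage is the one used for warping.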

[00187] Example 15: An apparatus comprising: a processor; and a non-transitory storage medium that includes instructions executable by the processor to carry out a method comprising: generating an optical flow reference frame for inter prediction of a first frame of a video sequence using a first reference frame from the video sequence and a second reference frame of the video sequence by: initializing motion fields for pixels of the first frame at a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame and comprising one level of multiple levels; for each level of the multiple levels: warping the first reference frame to the first frame using the motion fields to form a first warped reference frame; warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; estimating motion fields between the first warped reference frame and the second warped reference frame using the optical flow estimation; and updating the motion fields for pixels of the first frame using the motion fields between the first warped reference frame and the second warped reference frame; and for a final level of the multiple levels: warping the first reference frame to the first frame using the updated motion fields to form a final first warped reference frame; warping the second reference frame to the first frame using the updated motion fields to form a final second warped reference frame; and blending the final first warped reference frame and the final second warped reference frame to form the optical flow reference frame.
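The per-level loop of Example 15 can be summarized as a control-flow skeleton. In the Python sketch below, `warp`, `estimate`, `update`, and `blend` are hypothetical callables supplied by the caller. The sketch fixes only the order of operations stated in the example and makes no claim about how each operation is computed.

```python
from typing import Any, Callable


def generate_flow_reference(ref1: Any, ref2: Any, initial_field: Any,
                            num_levels: int,
                            warp: Callable[[Any, Any, str], Any],
                            estimate: Callable[[Any, Any, int], Any],
                            update: Callable[[Any, Any], Any],
                            blend: Callable[[Any, Any], Any]) -> Any:
    """Multi-level loop: warp both references, estimate, update, then blend."""
    field = initial_field  # motion fields initialized at the first level
    for level in range(num_levels):
        warped1 = warp(ref1, field, "first")
        warped2 = warp(ref2, field, "second")
        # Motion remaining between the two warped reference frames.
        residual = estimate(warped1, warped2, level)
        field = update(field, residual)
    # Final level: warp with the updated fields and blend the results.
    final1 = warp(ref1, field, "first")
    final2 = warp(ref2, field, "second")
    return blend(final1, final2)
```

Passing the operations in as callables keeps the skeleton independent of the frame representation, so it can be exercised with scalars in a test and with image arrays in practice.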

[00188] Example 16: The apparatus of Example 15, wherein the optical flow estimation uses a Lagrangian function for respective pixels of a frame.

[00189] Example 17: The apparatus of Example 16, wherein the method further comprises, for each level of the multiple levels: initializing a Lagrangian parameter of the Lagrangian function to a maximum value for a first iteration of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields; and performing additional iterations of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields using increasingly smaller values of a set of possible values for the Lagrangian parameter.

[00190] Example 18: The apparatus of Example 16 or Example 17, wherein estimating the motion fields comprises: calculating derivatives of pixel values of the first warped reference frame and the second warped reference frame with respect to a horizontal axis, a vertical axis, and time; downscaling the derivatives responsive to the level being different from the final level; and solving linear equations representing the Lagrangian function using the derivatives.
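The first two steps of Example 18 can be illustrated with simple finite differences. The Python sketch below is an assumption-laden illustration. The central-difference operator, the use of the average of the two warped frames for the spatial derivatives, the edge clamping, and the 2x2 averaging used for downscaling are all choices made here for brevity and are not stated in this passage.

```python
from typing import List, Tuple

Plane = List[List[float]]


def derivatives(warped1: Plane, warped2: Plane) -> Tuple[Plane, Plane, Plane]:
    """Return (Ex, Ey, Et): horizontal, vertical, and temporal derivatives."""
    h, w = len(warped1), len(warped1[0])
    avg = [[(warped1[r][c] + warped2[r][c]) / 2 for c in range(w)]
           for r in range(h)]

    def at(r: int, c: int) -> float:
        # Clamp coordinates so border pixels reuse the nearest valid sample.
        return avg[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    ex = [[(at(r, c + 1) - at(r, c - 1)) / 2 for c in range(w)] for r in range(h)]
    ey = [[(at(r + 1, c) - at(r - 1, c)) / 2 for c in range(w)] for r in range(h)]
    et = [[warped2[r][c] - warped1[r][c] for c in range(w)] for r in range(h)]
    return ex, ey, et


def downscale_by_two(plane: Plane) -> Plane:
    """Halve a derivative plane by averaging each 2x2 block (assumed filter)."""
    h, w = len(plane) // 2, len(plane[0]) // 2
    return [[(plane[2 * r][2 * c] + plane[2 * r][2 * c + 1]
              + plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) / 4
             for c in range(w)] for r in range(h)]
```

At a level other than the final one, `downscale_by_two` would be applied to each derivative plane (repeatedly for coarser levels) while the warped frames themselves stay at full resolution.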

[00191] Example 19: The apparatus of any one of Examples 15 to 18, wherein the method further comprises inter predicting the first frame using the optical flow reference frame.

[00192] Example 20: The apparatus of any one of Examples 15 to 19, wherein the processor and the non-transitory storage medium form a decoder.

[00193] The above-described embodiments, implementations, and aspects have been described to facilitate understanding of the present invention and do not limit the present invention. On the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures as are permitted under the law.

Claims

A method, comprising:
determining a first frame to be predicted in a video sequence;
determining a first reference frame from the video sequence for forward inter prediction of the first frame;
determining a second reference frame from the video sequence for backward inter prediction of the first frame;
generating an optical flow reference frame for inter prediction of the first frame by performing optical flow estimation using the first frame, the first reference frame, and the second reference frame, wherein performing the optical flow estimation includes performing a regression analysis while maintaining the first reference frame and the second reference frame at their original scales; and
performing an inter prediction process for the first frame using the optical flow reference frame.

The method of claim 1,
wherein generating the optical flow reference frame includes performing the optical flow estimation by minimizing a Lagrangian function for respective pixels of the first frame.

The method of claim 1,
wherein the optical flow estimation produces respective motion fields for pixels of the first frame, and generating the optical flow reference frame comprises:
warping the first reference frame to the first frame using the motion fields to form a first warped reference frame;
warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and
blending the first warped reference frame and the second warped reference frame to form the optical flow reference frame.

The method of claim 3,
wherein blending the first warped reference frame and the second warped reference frame comprises combining collocated pixel values of the first warped reference frame and the second warped reference frame by scaling the collocated pixel values using distances between the first reference frame and the second reference frame and between the first frame and each of the first reference frame and the second reference frame.

The method of claim 3,
wherein blending the first warped reference frame and the second warped reference frame comprises populating pixel positions of the optical flow reference frame by either combining collocated pixel values of the first warped reference frame and the second warped reference frame or using a single pixel value of one of the first warped reference frame or the second warped reference frame.

The method of claim 3, further comprising:
detecting an occlusion in the first reference frame using the first warped reference frame and the second warped reference frame,
wherein blending the first warped reference frame and the second warped reference frame comprises populating pixel positions of the optical flow reference frame corresponding to the occlusion with pixel values from the second warped reference frame.

The method of claim 1,
wherein performing the prediction process comprises using the optical flow reference frame only for single reference inter prediction of blocks of the first frame by:
generating a prediction block for a current block of the first frame by performing a motion search within the optical flow reference frame; or
generating a prediction block using pixels of the optical flow reference frame by using a motion vector decoded from an encoded bitstream.

The method of claim 1,
wherein the first reference frame is the reconstructed frame nearest to the first frame in a display order of the video sequence that is available for forward inter prediction of the first frame, and
the second reference frame is the reconstructed frame nearest to the first frame in the display order that is available for backward inter prediction of the first frame.

The method of claim 1,
wherein performing the prediction process comprises:
determining a reference block within the optical flow reference frame that is collocated with a first block of the first frame; and
encoding a residual of the reference block and the first block.

An apparatus comprising a processor and a non-transitory storage medium,
wherein the non-transitory storage medium includes instructions executable by the processor to carry out a method comprising:
determining a first frame to be predicted in a video sequence;
determining an availability of a first reference frame for forward inter prediction of the first frame and of a second reference frame for backward inter prediction of the first frame; and
responsive to determining that both the first reference frame and the second reference frame are available:
generating respective motion fields for pixels of the first frame using the first reference frame and the second reference frame and using a pyramid structure for optical flow estimation, wherein generating the respective motion fields comprises scaling derivatives of pixel values of the first reference frame and the second reference frame with respect to a horizontal axis, a vertical axis, and time, instead of scaling the pixel values of the first reference frame and the second reference frame;
warping the first reference frame to the first frame using the motion fields to form a first warped reference frame;
warping the second reference frame to the first frame using the motion fields to form a second warped reference frame; and
blending the first warped reference frame and the second warped reference frame to form an optical flow reference frame for inter prediction of blocks of the first frame.

The apparatus of claim 10,
wherein the method further comprises performing a prediction process that uses inter prediction for the first frame using the optical flow reference frame,
the prediction process comprising one of:
generating a prediction block for a current block of the first frame by performing a motion search within the optical flow reference frame; or
generating a prediction block using pixels of the optical flow reference frame by using a motion vector decoded from an encoded bitstream.

The apparatus of claim 10,
wherein the method further comprises excluding the optical flow reference frame from compound inter prediction of blocks of the first frame.

The apparatus of claim 10,
wherein generating the respective motion fields comprises calculating an output of a Lagrangian function for respective pixels of the first frame using the first reference frame and the second reference frame.

The apparatus of claim 13,
wherein calculating the output of the Lagrangian function comprises:
calculating a first set of motion fields for pixels of a current frame using a first value for a Lagrangian parameter; and
using the first set of motion fields as input to the Lagrangian function using a second value for the Lagrangian parameter to calculate a refined set of motion fields for the pixels of the current frame,
wherein the second value for the Lagrangian parameter is smaller than the first value for the Lagrangian parameter, and the first warped reference frame and the second warped reference frame are warped using the refined set of motion fields.

The apparatus of claim 10,
wherein the method further comprises:
performing a compound reference inter prediction process for a first block of the first frame using a forward reference frame and a backward reference frame other than the optical flow reference frame; and
performing a single reference inter prediction process for the first block of the first frame using the optical flow reference frame.

A device comprising a processor,
the processor being configured to perform a method comprising:
generating an optical flow reference frame for inter prediction of a first frame of a video sequence using a first reference frame from the video sequence and a second reference frame of the video sequence; and
performing an inter prediction process for at least one block of the first frame using the optical flow reference frame,
wherein generating the optical flow reference frame comprises:
initializing motion fields for pixels of the first frame at a first processing level for optical flow estimation, the first processing level representing downscaled motion within the first frame and being one level of a plurality of levels;
For each level of the plurality of levels:
warping the first reference frame into the first frame using the motion fields to form a first warped reference frame;
warping the second reference frame into the first frame using the motion fields to form a second warped reference frame;
estimating motion fields between the first warped reference frame and the second warped reference frame using the optical flow estimation; and
updating motion fields for pixels of the first frame using motion fields between the first warped reference frame and the second warped reference frame;
For a final level of the plurality of levels:
warping the first reference frame into the first frame using the updated motion fields to form a final first warped reference frame;
warping the second reference frame to the first frame using the updated motion fields to form a final second warped reference frame; and
forming the optical flow reference frame by blending the final first warped reference frame and the final second warped reference frame,
wherein the optical flow estimation is a regression analysis performed while the first reference frame and the second reference frame are maintained at their original scale, in which, for levels below the final level, derivatives of the pixel values of the first reference frame and the second reference frame with respect to a horizontal axis, a vertical axis, and time are scaled instead of the pixel values of the first reference frame and the second reference frame.
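
The level-by-level procedure recited above can be sketched in one dimension. This is a loose illustration under several assumptions that the claim does not state: the first frame lies midway in time between the two reference frames (hence the weights -0.5 and +0.5), a simple per-block least-squares estimate stands in for the optical flow estimation, and levels are expressed as block sizes `(4, 2, 1)`. All function names are hypothetical. The reference frames themselves are never downscaled; only their derivatives are.

```python
import math


def sample(ref, pos):
    """Linearly interpolate `ref` at fractional position `pos`, clamping at the borders."""
    pos = min(max(pos, 0.0), len(ref) - 1.0)
    i = min(int(math.floor(pos)), len(ref) - 2)
    frac = pos - i
    return (1.0 - frac) * ref[i] + frac * ref[i + 1]


def warp(ref, mv, weight):
    """Warp `ref` to the current frame: out[x] = ref(x + weight * mv[x])."""
    return [sample(ref, x + weight * mv[x]) for x in range(len(ref))]


def gradient(sig):
    """Central differences, one-sided at the two borders (needs len(sig) >= 2)."""
    n = len(sig)
    out = []
    for x in range(n):
        lo, hi = max(x - 1, 0), min(x + 1, n - 1)
        out.append((sig[hi] - sig[lo]) / (hi - lo))
    return out


def downscale(values, factor):
    """Average consecutive groups of `factor` samples."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(g) / len(g) for g in groups]


def estimate(w1, w2, factor, eps=1e-6):
    """Residual motion between two warped frames, from downscaled derivatives."""
    ex = downscale([(a + b) / 2.0 for a, b in zip(gradient(w1), gradient(w2))], factor)
    et = downscale([b - a for a, b in zip(w1, w2)], factor)
    coarse = [-(gx * gt) / (gx * gx + eps) for gx, gt in zip(ex, et)]
    # Repeat each coarse value so the field again has one entry per pixel.
    return [coarse[x // factor] for x in range(len(w1))]


def optical_flow_reference(ref1, ref2, factors=(4, 2, 1)):
    """Return (blended optical flow reference, motion field) for 1-D frames."""
    mv = [0.0] * len(ref1)  # initialize the motion field
    for factor in factors:
        w1 = warp(ref1, mv, -0.5)
        w2 = warp(ref2, mv, +0.5)
        mv = [m + r for m, r in zip(mv, estimate(w1, w2, factor))]
    # Final warp with the updated motion field, then blend.
    w1 = warp(ref1, mv, -0.5)
    w2 = warp(ref2, mv, +0.5)
    return [(a + b) / 2.0 for a, b in zip(w1, w2)], mv
```

When both reference frames are identical, the temporal derivative is zero at every level, the motion field stays at zero, and the blended result reproduces the input frame.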

17. The device of claim 16,
wherein the optical flow estimation uses a Lagrangian function for each pixel of a frame.

18. The device of claim 17,
wherein the method further comprises, for each level of the plurality of levels:
initializing a Lagrangian parameter of the Lagrangian function to a maximum value for a first iteration of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields; and
performing additional iterations of warping the first reference frame, warping the second reference frame, estimating the motion fields, and updating the motion fields using increasingly smaller values of a set of possible values for the Lagrangian parameter.

19. The device of claim 18,
wherein estimating the motion fields comprises:
calculating the derivatives;
downscaling the derivatives in response to the level being different from the final level; and
solving linear equations representing the Lagrangian function using the derivatives.
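
The three estimation steps can be sketched in one dimension. The sketch assumes a Horn-Schunck-style Lagrangian, J = sum((ex*u + et)^2) + lam * sum((u[i+1] - u[i])^2), whose stationarity condition is a tridiagonal linear system; the claim does not give this form, and the names `derivatives`, `downscale`, `solve_tridiagonal`, and `estimate_motion` are hypothetical.

```python
def derivatives(w1, w2):
    """Spatial derivative (mean of both frames) and temporal derivative per pixel."""
    n = len(w1)

    def grad(sig):
        out = []
        for x in range(n):
            lo, hi = max(x - 1, 0), min(x + 1, n - 1)
            out.append((sig[hi] - sig[lo]) / (hi - lo))
        return out

    ex = [(a + b) / 2.0 for a, b in zip(grad(w1), grad(w2))]
    et = [b - a for a, b in zip(w1, w2)]
    return ex, et


def downscale(values, factor):
    """Average consecutive groups of `factor` samples."""
    groups = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(g) / len(g) for g in groups]


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def estimate_motion(w1, w2, lam, factor=1):
    """Motion between two warped frames; `factor` > 1 marks a non-final level.

    Assumes lam > 0, at least two samples after downscaling, and at least
    one non-zero spatial derivative so the system is non-singular.
    """
    ex, et = derivatives(w1, w2)            # step 1: calculate the derivatives
    if factor > 1:                          # step 2: downscale them below the final level
        ex, et = downscale(ex, factor), downscale(et, factor)
    n = len(ex)
    # Step 3: dJ/du_i = 0 gives (ex_i^2 + lam*k_i) u_i - lam * (neighbors) = -ex_i * et_i,
    # where k_i is the number of neighbors of sample i (1 at the ends, 2 inside).
    diag = [ex[i] ** 2 + lam * ((i > 0) + (i < n - 1)) for i in range(n)]
    lower = [0.0] + [-lam] * (n - 1)
    upper = [-lam] * (n - 1) + [0.0]
    rhs = [-gx * gt for gx, gt in zip(ex, et)]
    return solve_tridiagonal(lower, diag, upper, rhs)
```

For two ramps offset by two pixels, for example `w1 = [x + 1.0 for x in range(8)]` and `w2 = [x - 1.0 for x in range(8)]`, the solver returns a motion of approximately 2 at every sample, with or without downscaling.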

20. The device of claim 16,
wherein the processor forms a decoder.