JP2022539657A

JP2022539657A - Adaptive resolution change in video processing

Info

Publication number: JP2022539657A
Application number: JP2021570430A
Authority: JP
Inventors: Sarwer, Mohammed G.; Luo, Jiancong; Ye, Yan
Original assignee: Alibaba Group Holding Limited
Priority date: 2019-06-24
Filing date: 2020-05-29
Publication date: 2022-09-13
Also published as: EP3981155A1; US11843790B2; EP3981155A4; CN114128262A; US20240089473A1; KR20220024659A; US20200404297A1; US20230045775A9; US20220086469A1; US11190781B2; WO2020263499A1

Abstract

The present disclosure provides systems and methods for performing adaptive resolution change during video encoding and decoding. The method includes comparing the resolutions of a target picture and a first reference picture; in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and encoding or decoding the target picture using the second reference picture.

Description

Cross-Reference to Related Applications
[0001] This disclosure claims the benefit of priority to U.S. Provisional Patent Application No. 62/865,927, filed June 24, 2019, and U.S. Provisional Patent Application No. 62/900,439, filed September 13, 2019, both of which are incorporated herein by reference in their entireties.

Technical Field
[0002] This disclosure relates generally to video processing, and more particularly to methods and systems for performing adaptive resolution change in video coding.

Background
[0003] A video is a series of still pictures (or "frames") that capture visual information. To reduce storage memory and transmission bandwidth, a video can be compressed before storage or transmission and decompressed before display. The compression process is usually called encoding, and the decompression process is usually called decoding. Various video coding formats exist that use standardized video coding techniques, most commonly based on prediction, transform, quantization, entropy coding, and in-loop filtering. Video coding standards that specify particular video coding formats, such as the High Efficiency Video Coding (HEVC/H.265) standard, the Versatile Video Coding (VVC/H.266) standard, and the AVS standards, have been developed by standardization organizations. As increasingly advanced video coding techniques are adopted into these standards, the coding efficiency of each new video coding standard becomes higher.

Summary of Disclosure
[0004] Embodiments of the present disclosure provide a method for performing adaptive resolution change during encoding or decoding of video. In one exemplary embodiment, the method includes: comparing the resolutions of a target picture and a first reference picture; in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and encoding or decoding the target picture using the second reference picture.

[0005] Embodiments of the present disclosure also provide a device for performing adaptive resolution change during encoding or decoding of video. In one exemplary embodiment, the device includes one or more memories storing computer instructions, and one or more processors configured to execute the computer instructions to cause the device to: compare the resolutions of a target picture and a first reference picture; in response to the target picture and the first reference picture having different resolutions, resample the first reference picture to generate a second reference picture; and encode or decode the target picture using the second reference picture.

[0006] Embodiments of the present disclosure also provide a non-transitory computer-readable medium storing a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method of adaptive resolution change. In one exemplary embodiment, the method includes: comparing the resolutions of a target picture and a first reference picture; in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and encoding or decoding the target picture using the second reference picture.

Brief Description of the Drawings
[0007] Embodiments and various aspects of the present disclosure are illustrated in the following detailed description and the accompanying drawings. Various features shown in the drawings are not drawn to scale.

[0008] A schematic diagram illustrating an exemplary video encoder, consistent with embodiments of the disclosure.
[0009] A schematic diagram illustrating an exemplary video decoder, consistent with embodiments of the disclosure.
[0010] An example in which the resolution of a reference picture differs from that of the current picture, consistent with embodiments of the disclosure.
[0011] A table showing the sub-pel motion compensation interpolation filters used for the luma component in Versatile Video Coding (VVC), consistent with embodiments of the disclosure.
[0012] A table showing the sub-pel motion compensation interpolation filters used for the chroma components in VVC, consistent with embodiments of the disclosure.
[0013] An exemplary reference picture buffer, consistent with embodiments of the disclosure.
[0014] A table showing an example of a supported resolution set that includes three different resolutions, consistent with embodiments of the disclosure.
[0015] An exemplary decoded picture buffer (DPB) when both resampled and original reference pictures are stored, consistent with embodiments of the disclosure.
[0016] Progressive downsampling, consistent with embodiments of the disclosure.
[0017] An exemplary video coding process with resolution change, consistent with embodiments of the disclosure.
[0018] A table showing exemplary downsampling filters, consistent with embodiments of the disclosure.
[0019] An exemplary video coding process using peak signal-to-noise ratio (PSNR) calculation, consistent with embodiments of the disclosure.
[0020] The frequency response of an exemplary low-pass filter, consistent with embodiments of the disclosure.
[0021] A table showing a 6-tap filter, consistent with embodiments of the disclosure.
[0022] A table showing an 8-tap filter, consistent with embodiments of the disclosure.
[0023] A table showing a 4-tap filter, consistent with embodiments of the disclosure.
[0024] A table showing interpolation filter coefficients for 2:1 downsampling, consistent with embodiments of the disclosure.
[0025] A table showing interpolation filter coefficients for 1.5:1 downsampling, consistent with embodiments of the disclosure.
[0026] An exemplary luma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0027] An exemplary chroma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0028] An exemplary luma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0029] An exemplary chroma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0030] An exemplary luma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0031] An exemplary chroma sample interpolation filtering process for reference downsampling, consistent with embodiments of the disclosure.
[0032] An exemplary chroma fractional sample position calculation for reference downsampling, consistent with embodiments of the disclosure.
[0001] A table showing an 8-tap filter for MC interpolation using reference downsampling at a 2:1 ratio, consistent with embodiments of the disclosure.
[0002] A table showing an 8-tap filter for MC interpolation using reference downsampling at a 1.5:1 ratio, consistent with embodiments of the disclosure.
[0003] A table showing an 8-tap filter for MC interpolation using reference downsampling at a 2:1 ratio, consistent with embodiments of the disclosure.
[0004] A table showing an 8-tap filter for MC interpolation using reference downsampling at a 1.5:1 ratio, consistent with embodiments of the disclosure.
[0005] A table showing 6-tap filter coefficients for luma 4×4 block MC interpolation using reference downsampling at a 2:1 ratio, consistent with embodiments of the disclosure.
[0006] A table showing 6-tap filter coefficients for luma 4×4 block MC interpolation using reference downsampling at a 1.5:1 ratio, consistent with embodiments of the disclosure.

Detailed Description
[0007] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same numbers in different drawings represent the same or similar elements unless otherwise stated. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the invention. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the invention as recited in the appended claims. Particular aspects of the present disclosure are described in greater detail below. The terms and definitions provided herein control if they conflict with terms and/or definitions incorporated by reference.

[0008] A video is a series of still pictures (or "frames") arranged in temporal order to store visual information. A video capture device (e.g., a camera) can be used to capture and store these pictures in temporal order, and a video playback device (e.g., a television, a computer, a smartphone, a tablet computer, a video player, or any end-user terminal with a display function) can be used to display such pictures in temporal order. In some applications, a video capture device can also transmit the captured video to a video playback device (e.g., a computer with a monitor) in real time, for example for surveillance, conferencing, or live broadcasting.

[0009] To reduce the storage space and transmission bandwidth needed by such applications, the video can be compressed. For example, the video can be compressed before storage and transmission and decompressed before display. The compression and decompression can be implemented by software executed by a processor (e.g., a processor of a general-purpose computer) or by specialized hardware. The module for compression is generally referred to as an "encoder," and the module for decompression is generally referred to as a "decoder." The encoder and decoder can be collectively referred to as a "codec." The encoder and decoder can be implemented as any of a variety of suitable hardware, software, or a combination thereof. For example, a hardware implementation of the encoder and decoder can include circuitry such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, or any combination thereof. A software implementation of the encoder and decoder can include program code, computer-executable instructions, firmware, or any suitable computer-implemented algorithm or process fixed in a computer-readable medium. Video compression and decompression can be implemented by various algorithms or standards, such as MPEG-1, MPEG-2, MPEG-4, and the H.26x series. In some applications, a codec can decompress the video from a first coding standard and recompress the decompressed video using a second coding standard, in which case the codec can be referred to as a "transcoder."

[0010] The video encoding process can identify and keep useful information that can be used to reconstruct a picture. If information discarded in the video encoding process cannot be fully reconstructed, the encoding process is referred to as "lossy"; otherwise, it is referred to as "lossless." Most encoding processes are lossy, which is a trade-off made to reduce the required storage space and transmission bandwidth.

[0011] In many cases, the useful information of a picture being encoded (referred to as the "current picture") includes changes with respect to a reference picture (e.g., a previously encoded or reconstructed picture). Such changes can include position changes, luminosity changes, or color changes of the pixels, among which position changes are of the most concern. Position changes of a group of pixels that represents an object can reflect the motion of the object between the reference picture and the current picture.

[0012] To achieve the same subjective quality as HEVC/H.265 at half the bandwidth, JVET has been developing technologies beyond HEVC using the joint exploration model ("JEM") reference software. As coding technologies were incorporated into the JEM, the JEM achieved substantially higher coding performance than HEVC. VCEG and MPEG have also formally started the development of a next-generation video compression standard beyond HEVC, the Versatile Video Coding (VVC/H.266) standard.

[0013] The VVC standard continues to add further coding technologies that provide better compression performance. VVC can be implemented in the same video coding system that has been used in modern video compression standards such as HEVC, H.264/AVC, MPEG2, and H.263. FIG. 1 is a schematic diagram illustrating an exemplary video encoder 100, consistent with the disclosed embodiments. For example, video encoder 100 may perform intra- or inter-coding of blocks within video frames, including video blocks, or partitions or sub-partitions of video blocks. Intra-coding may rely on spatial prediction to reduce or remove spatial redundancy in the video within a given video frame. Inter-coding may rely on temporal prediction to reduce or remove temporal redundancy in the video within adjacent frames of a video sequence. Intra modes may refer to a number of spatial-based compression modes, and inter modes (such as uni-prediction or bi-prediction) may refer to a number of temporal-based compression modes.

[0014] Referring to FIG. 1, input video signal 102 may be processed block by block. For example, the video block unit may be a 16×16 pixel block (e.g., a macroblock (MB)). In HEVC, extended block sizes (e.g., a coding unit (CU)) may be used to compress video signals of, for example, 1080p resolution and beyond. In HEVC, a CU may include up to 64×64 luma samples and the corresponding chroma samples. In VVC, the size of a CU may be further increased to include 128×128 luma samples and the corresponding chroma samples. A CU may be partitioned into prediction units (PUs), to which separate prediction methods may be applied. Each input video block (e.g., MB, CU, or PU) may be processed by using spatial prediction unit 160 or temporal prediction unit 162.

[0015] Spatial prediction unit 160 performs spatial prediction (e.g., intra prediction) on the current CU using information from the same picture/slice that contains the current CU. Spatial prediction may use pixels from already coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction can reduce the spatial redundancy inherent in the video signal. Temporal prediction (e.g., inter prediction or motion-compensated prediction) may use samples from already coded video pictures to predict the current video block. Temporal prediction can reduce the temporal redundancy inherent in the video signal.

[0016] Temporal prediction unit 162 performs temporal prediction (e.g., inter prediction) on the current CU using information from one or more pictures/slices different from the picture/slice that contains the current CU. Temporal prediction for a video block may be signaled by one or more motion vectors. A motion vector may indicate the amount and direction of motion between the current block and one or more of its prediction blocks in the reference frames. If multiple reference pictures are supported, one or more reference picture indices may be sent for a video block. The one or more reference indices may be used to identify from which reference picture or pictures in decoded picture buffer (DPB) 164 (also called reference picture store 164) the temporal prediction signal comes. After spatial or temporal prediction, mode decision and encoder control unit 180 in the encoder may choose the prediction mode, for example based on a rate-distortion optimization method. The prediction block may be subtracted from the current video block at adder 116. The prediction residual may be transformed by transform unit 104 and quantized by quantization unit 106. The quantized residual coefficients may be inverse quantized at inverse quantization unit 110 and inverse transformed at inverse transform unit 112 to form the reconstructed residual. The reconstructed residual may be added to the prediction block at adder 126 to form the reconstructed video block. In-loop filtering, such as a deblocking filter and adaptive loop filters 166, may be applied to the reconstructed video block before it is put in reference picture store 164 and used to code future video blocks. To form output video bitstream 120, the coding mode (e.g., inter or intra), prediction mode information, motion information, and quantized residual coefficients may be sent to entropy coding unit 108, where they are compressed and packed to form bitstream 120.
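The residual path described in paragraph [0016] can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the encoder of the disclosure: the transform and entropy coding stages are omitted, the quantizer is a plain uniform quantizer, and the names `encode_block` and `qstep` are hypothetical.

```python
def encode_block(current, prediction, qstep):
    """Return (quantized levels, reconstructed samples) for one block.

    `current` and `prediction` are equal-length lists of integer samples;
    `qstep` is a hypothetical uniform quantization step size.
    """
    # Role of adder 116: prediction residual.
    residual = [c - p for c, p in zip(current, prediction)]
    # Role of quantization unit 106 (transform unit 104 is skipped in this sketch).
    levels = [round(r / qstep) for r in residual]
    # Role of inverse quantization unit 110: reconstructed residual.
    recon_residual = [level * qstep for level in levels]
    # Role of adder 126: reconstructed block, identical to what a decoder would form.
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstructed


levels, reconstructed = encode_block([101, 105, 97, 90], [98, 98, 98, 98], qstep=4)
```

Because the encoder rebuilds the block from the quantized levels rather than from the original samples, its reference pictures stay in step with those of the decoder, which only ever sees the quantized data.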

[0017] Consistent with the disclosed embodiments, the above-described units of video encoder 100 may be implemented as software modules (e.g., computer programs that realize different functions), hardware components (e.g., different circuitry blocks for performing the respective functions), or a hybrid of software and hardware.

[0018] FIG. 2 is a schematic diagram illustrating an exemplary video decoder 200, consistent with the disclosed embodiments. Referring to FIG. 2, video bitstream 202 may be unpacked or entropy decoded at entropy decoding unit 208. The coding mode or prediction information may be sent to spatial prediction unit 260 (e.g., if intra coded) or temporal prediction unit 262 (e.g., if inter coded) to form the prediction block. If inter coded, the prediction information may include the prediction block sizes, one or more motion vectors (e.g., which may indicate the direction and amount of motion), or one or more reference indices (e.g., which may indicate from which reference picture the prediction signal is to be obtained).

[0019] Motion-compensated prediction may be applied by temporal prediction unit 262 to form the temporal prediction block. The residual transform coefficients may be sent to inverse quantization unit 210 and inverse transform unit 212 to reconstruct the residual block. The prediction block and the residual block may be added together at 226. The reconstructed block may go through in-loop filtering (by loop filter 266) before it is stored in decoded picture buffer (DPB) 264 (also called reference picture store 264). The reconstructed video in DPB 264 may be used to drive a display device or to predict future video blocks. Decoded video 220 may be displayed on a display.

[0020] Consistent with the disclosed embodiments, the above-described units of video decoder 200 may be implemented as software modules (e.g., computer programs that realize different functions), hardware components (e.g., different circuitry blocks for performing the respective functions), or a hybrid of software and hardware.

[0021] One of the objectives of the VVC standard is to offer video conferencing applications the ability to tolerate diversity in networks and devices. In particular, the VVC standard needs to provide the ability to adapt rapidly to changing network environments, including rapidly reducing the encoded bitrate when network conditions deteriorate and rapidly increasing video quality when network conditions improve. In addition, for adaptive streaming services that offer multiple representations of the same content, each of the representations may have different properties (e.g., spatial resolution or sample bit depth), and the video quality may vary from low to high. The VVC standard therefore needs to support fast representation switching for adaptive streaming services. During switching from one representation to another (such as switching from one resolution to another), the VVC standard needs to enable the use of efficient prediction structures without compromising the fast and seamless switching capability.

[0022] In some embodiments consistent with this disclosure, to change the resolution, an encoder (e.g., encoder 100 of FIG. 1) sends an instantaneous-decoder-refresh (IDR) coded picture to clear the contents of the reference picture buffer (e.g., DPB 164 of FIG. 1 and DPB 264 of FIG. 2). Upon receiving an IDR coded picture, a decoder (e.g., decoder 200 of FIG. 2) marks all pictures in the reference buffer as "unused for reference." All subsequently transmitted pictures can be decoded without reference to any frame decoded before the IDR picture. The first picture in a coded video sequence is always an IDR picture.
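The IDR behavior in paragraph [0022] can be illustrated with a small Python sketch of a reference buffer. It is a simplified model under assumptions: the class and field names (`Picture`, `DecodedPictureBuffer`, `poc`, `used_for_reference`) are hypothetical, and picture removal and output ordering are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    poc: int                          # hypothetical picture identifier
    is_idr: bool = False
    used_for_reference: bool = True


@dataclass
class DecodedPictureBuffer:
    pictures: list = field(default_factory=list)

    def on_picture_decoded(self, pic):
        # On an IDR picture, every earlier picture is marked "unused for reference".
        if pic.is_idr:
            for old in self.pictures:
                old.used_for_reference = False
        self.pictures.append(pic)

    def reference_candidates(self):
        # Only pictures still marked as used for reference may be predicted from.
        return [p.poc for p in self.pictures if p.used_for_reference]


dpb = DecodedPictureBuffer()
for pic in (Picture(0, is_idr=True), Picture(1), Picture(2), Picture(3, is_idr=True)):
    dpb.on_picture_decoded(pic)
```

After the second IDR picture, only that picture remains available as a reference, which is why a resolution change implemented this way cannot benefit from earlier pictures.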

[0023] In some embodiments consistent with this disclosure, adaptive resolution change (ARC) technology can be used to allow a video stream to change the spatial resolution between coded pictures within the same video sequence, without requiring a new IDR picture and without requiring multiple layers as in a scalable video codec. According to the ARC technology, at a resolution switch point, the picture currently being coded is predicted either from a reference picture of the same resolution (if available) or from a reference picture of a different resolution by resampling that reference picture. FIG. 3 illustrates adaptive resolution change. In FIG. 3, the resolution of reference picture "Ref 0" is the same as the resolution of the picture currently being coded, whereas the resolutions of reference pictures "Ref 1" and "Ref 2" differ from the resolution of the current picture. To generate the motion-compensated prediction signal of the current picture, both "Ref 1" and "Ref 2" are resampled to the resolution of the current picture.
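The decision described in paragraph [0023] (use a reference as-is when its resolution matches, otherwise resample it) might be sketched in Python as below. This is an illustration only: nearest-neighbor sampling stands in for whatever resampling filter an actual codec would use, pictures are modeled as plain 2-D lists, and the function names `resample_nearest` and `prepare_references` are hypothetical.

```python
def resample_nearest(picture, out_w, out_h):
    """Resample a 2-D list of samples to out_w x out_h by nearest-neighbor lookup."""
    in_h, in_w = len(picture), len(picture[0])
    return [
        [picture[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def prepare_references(current_size, references):
    """Return references that all match the current picture size (width, height)."""
    cur_w, cur_h = current_size
    prepared = {}
    for name, ref in references.items():
        ref_size = (len(ref[0]), len(ref))
        if ref_size == current_size:
            prepared[name] = ref      # same resolution: used directly
        else:
            prepared[name] = resample_nearest(ref, cur_w, cur_h)
    return prepared


refs = {
    "Ref0": [[0] * 4 for _ in range(4)],                       # same size as current
    "Ref1": [[1, 2], [3, 4]],                                  # smaller: upsampled
    "Ref2": [[x + 8 * y for x in range(8)] for y in range(8)], # larger: downsampled
}
prepared = prepare_references((4, 4), refs)
```

In this toy setup "Ref0" passes through untouched while "Ref1" and "Ref2" are brought to the 4×4 size of the current picture, mirroring the situation in FIG. 3.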

[0024] Consistent with this disclosure, when the resolution of a reference frame differs from that of the current frame, one way to generate the motion-compensated prediction signal is picture-based resampling, in which the reference picture is first resampled to the same resolution as the current picture, after which the existing motion compensation process using motion vectors can be applied. The motion vectors may be scaled (if they are signaled in units of the picture before resampling is applied) or left unscaled (if they are signaled in units of the picture after resampling is applied). With picture-based resampling, information may be lost in the reference resampling step before motion-compensated interpolation. This applies particularly to reference picture downsampling (i.e., when the resolution of the reference picture is greater than that of the current picture), because downsampling is usually achieved by low-pass filtering followed by decimation.

[0025] Another method is block-based resampling, in which resampling is performed at the block level. The one or more reference pictures used by the current block are examined, and if one or both of them have a resolution different from that of the current picture, resampling is performed in combination with the sub-pel motion-compensated interpolation process.

[0026] This disclosure provides both picture-based and block-based resampling methods for use with ARC. The following description first addresses the disclosed picture-based resampling methods and then the disclosed block-based resampling methods.

[0027] The disclosed picture-based resampling methods can solve several problems caused by conventional block-based resampling methods. First, in conventional block-level resampling, on-the-fly block-level resampling may be performed when the resolution of a reference picture is determined to differ from that of the current picture. However, the block-level resampling process can complicate encoder design, because the encoder may have to resample a block at every search point during motion search. Motion search is generally a time-consuming process for an encoder, so an on-the-fly requirement during motion search complicates it further. As an alternative to on-the-fly block-level resampling, the encoder may resample the reference picture in advance so that the resampled reference picture has the same resolution as the current picture. However, this can cause the prediction signals used during motion estimation and motion compensation to differ, which can reduce coding efficiency.

[0028] Second, in conventional block-based resampling methods, the block-level resampling design is incompatible with several other useful coding tools, such as subblock-based temporal motion vector prediction (SbTMVP), affine motion-compensated prediction, and decoder-side motion vector refinement (DMVR). These coding tools may be disabled when ARC is enabled, but disabling them significantly degrades coding performance.

[0029] Third, although any sub-pel motion-compensated interpolation filter known in the art may be used for block resampling, such a filter may not be equally applicable to block upsampling and block downsampling. For example, when the reference picture has a lower resolution than the current picture, the interpolation filter used for motion compensation can be used for upsampling. When the reference picture has a higher resolution than the current picture, however, the interpolation filter is not suitable for downsampling, because it does not filter integer positions and can therefore introduce aliasing. For example, Table 1 (FIG. 4) shows exemplary interpolation filter coefficients used for the luma component for various values of the fractional sample position. Table 2 (FIG. 5) shows exemplary interpolation filter coefficients used for the chroma components for various values of the fractional sample position. As shown in Table 1 (FIG. 4) and Table 2 (FIG. 5), no interpolation filter is applied at fractional sample position 0 (i.e., an integer position).

[0030] To avoid the above problems associated with conventional block-based resampling methods, this disclosure provides picture-level resampling methods that can be used for ARC. The picture resampling process may involve upsampling or downsampling. Upsampling increases the spatial resolution while maintaining the two-dimensional (2D) representation of the image. In the upsampling process, the resolution of the reference picture is increased by interpolating the unavailable samples from neighboring available samples. In the downsampling process, the resolution of the reference picture is reduced.

[0031] According to the disclosed embodiments, resampling can be performed at the picture level. In picture-level resampling, if the resolution of a reference picture differs from that of the current picture, the reference picture is resampled to the resolution of the current picture. Motion estimation and/or compensation for the current picture may then be performed based on the resampled reference picture. In this way, ARC can be implemented in a "transparent" manner at the encoder and decoder, because block-level operations do not depend on the resolution change.

[0032] Picture-level resampling may be performed on the fly, i.e., while the current picture is being predicted. In some exemplary embodiments, only the original (non-resampled) reference pictures are stored in the decoded picture buffer (DPB), e.g., DPB 164 of encoder 100 (FIG. 1) and DPB 264 of decoder 200 (FIG. 2). The DPB may be managed in the same way as in the current version of VVC. In the disclosed picture-level resampling method, before a picture is encoded or decoded, the encoder or decoder resamples a reference picture in the DPB if its resolution differs from that of the current picture. In some embodiments, a resampled picture buffer may be used to store all resampled reference pictures for the current picture. The resampled version of a reference picture is stored in the resampled picture buffer, and motion search/compensation for the current picture is performed using pictures from the resampled picture buffer. After encoding or decoding is complete, the resampled picture buffer is discarded. FIG. 6 shows an example of on-the-fly picture-level resampling in which a low-resolution reference picture is resampled and stored in the resampled picture buffer. As shown in FIG. 6, the DPB of the encoder or decoder contains three reference pictures. The resolutions of reference pictures "Ref0" and "Ref2" are the same as that of the current picture, so "Ref0" and "Ref2" do not need to be resampled. The resolution of reference picture "Ref1", however, differs from that of the current picture, so "Ref1" needs to be resampled. Thus, only the resampled "Ref1" is stored in the resampled picture buffer; "Ref0" and "Ref2" are not.
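
The on-the-fly flow described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Picture` class, the helper names, and the nearest-neighbour resampler are all assumptions chosen for brevity (a real codec would use proper resampling filters).

```python
from dataclasses import dataclass


@dataclass
class Picture:
    # Hypothetical container for one picture; not a structure from the disclosure.
    name: str
    width: int
    height: int
    samples: list  # list of rows, each a list of sample values


def resample_nearest(pic, width, height):
    """Placeholder resampler (nearest neighbour), used only to keep the sketch short."""
    rows = []
    for y in range(height):
        src_y = y * pic.height // height
        rows.append([pic.samples[src_y][x * pic.width // width] for x in range(width)])
    return Picture(pic.name + "_resampled", width, height, rows)


def build_resampled_picture_buffer(dpb, cur_width, cur_height):
    """Resample only the DPB pictures whose resolution differs from the current picture.

    The DPB itself is left untouched, so it still holds only original pictures.
    """
    buffer = {}
    for ref in dpb:
        if (ref.width, ref.height) != (cur_width, cur_height):
            buffer[ref.name] = resample_nearest(ref, cur_width, cur_height)
    return buffer


def make_picture(name, width, height):
    rows = [[(x + y) % 256 for x in range(width)] for y in range(height)]
    return Picture(name, width, height, rows)


# Mirrors the FIG. 6 situation: Ref0 and Ref2 match the current picture, Ref1 does not.
dpb = [make_picture("Ref0", 8, 4), make_picture("Ref1", 4, 2), make_picture("Ref2", 8, 4)]
resampled = build_resampled_picture_buffer(dpb, 8, 4)
print(sorted(resampled))
```

Once the current picture has been encoded or decoded, the `resampled` dictionary can simply be dropped, which corresponds to discarding the resampled picture buffer.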

[0033] Consistent with some exemplary embodiments, in on-the-fly picture-level resampling, the number of resampled reference pictures that a picture can use is constrained. For example, the maximum number of resampled reference pictures for a given current picture may be preset; the maximum may be set to 1. In this case, a bitstream constraint may be imposed such that the encoder or decoder allows at most one of the reference pictures to have a resolution different from that of the current picture, and all other reference pictures must have the same resolution as the current picture. Because this maximum number indicates the size of the resampled picture buffer and the maximum number of resampling operations that can be performed for any picture in the current video sequence, it relates directly to worst-case decoder complexity. Therefore, this maximum number may be signaled as part of the sequence parameter set (SPS) or the picture parameter set (PPS), or it may be specified as part of a profile/level definition.
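
A conformance check for this constraint could look like the sketch below. The function names and the example resolutions are hypothetical; only the rule itself (at most a preset number of references may differ in resolution from the current picture) comes from the paragraph above.

```python
def count_resampled_references(ref_resolutions, cur_resolution):
    """Count the references that would need resampling for the current picture."""
    return sum(1 for res in ref_resolutions if res != cur_resolution)


def satisfies_resampling_constraint(ref_resolutions, cur_resolution, max_resampled_refs=1):
    """True if the number of differing-resolution references is within the preset maximum."""
    return count_resampled_references(ref_resolutions, cur_resolution) <= max_resampled_refs


# Hypothetical resolutions, chosen only for illustration.
current = (1920, 1080)
ok_refs = [(1920, 1080), (960, 540), (1920, 1080)]
bad_refs = [(960, 540), (960, 540), (1920, 1080)]
print(satisfies_resampling_constraint(ok_refs, current))
print(satisfies_resampling_constraint(bad_refs, current))
```

Because the maximum also bounds the size of the resampled picture buffer, a decoder could allocate that buffer once per sequence from the signaled value.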

[0034] In some exemplary embodiments, two versions of a reference picture are stored in the DPB. One version has the original resolution, and the other has the maximum resolution. If the resolution of the current picture is different from either the original resolution or the maximum resolution, the encoder or decoder may perform on-the-fly downsampling from the stored maximum-resolution picture. For picture output, the DPB always outputs the original (non-resampled) reference picture.

[0035] In some exemplary embodiments, the resampling ratio can be chosen arbitrarily, and the vertical and horizontal scaling ratios may differ. Because picture-level resampling is applied, the block-level operations of the encoder/decoder are independent of the resolution of the reference picture, which allows arbitrary resampling ratios without further complicating the block-level design logic of the encoder/decoder.

[0036] In some exemplary embodiments, the coded resolution and the maximum resolution may be signaled as follows. The maximum resolution of any picture in the video sequence is signaled in the SPS. The coded resolution of a picture may be signaled either in the PPS or in the slice header. In either case, one flag is signaled to indicate whether the coded resolution is the same as the maximum resolution. If the coded resolution is not the same as the maximum resolution, the coded width and height of the current picture are additionally signaled. If PPS signaling is used, the signaled coded resolution applies to all pictures that refer to this PPS. If slice header signaling is used, the signaled coded resolution applies only to the current picture itself. In some embodiments, the difference between the current coded resolution and the signaled maximum resolution may be signaled in the SPS.
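
The flag-then-dimensions scheme can be sketched as a round trip between a writer and a reader. The syntax element names (`coded_res_equals_max_flag`, `coded_width`, `coded_height`) are hypothetical labels invented for this sketch, and the list of pairs stands in for a real entropy-coded PPS or slice header.

```python
def write_coded_resolution(coded, max_res):
    """Return the (syntax element, value) pairs for one PPS or slice header."""
    same = coded == max_res
    elements = [("coded_res_equals_max_flag", int(same))]
    if not same:
        # Width and height are sent only when the coded resolution differs.
        elements.append(("coded_width", coded[0]))
        elements.append(("coded_height", coded[1]))
    return elements


def read_coded_resolution(elements, max_res):
    """Recover the coded resolution, falling back to the SPS maximum when the flag is set."""
    fields = dict(elements)
    if fields["coded_res_equals_max_flag"]:
        return max_res
    return (fields["coded_width"], fields["coded_height"])


sps_max = (1920, 1080)  # hypothetical maximum resolution signaled in the SPS
full = write_coded_resolution((1920, 1080), sps_max)
reduced = write_coded_resolution((960, 540), sps_max)
print(full)
print(reduced)
```

When most pictures are coded at the maximum resolution, only the single flag is spent per PPS or slice header.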

[0037] In some exemplary embodiments, instead of using arbitrary resampling ratios, the resolution is restricted to a predefined set of N supported resolutions, where N is the number of supported resolutions in the sequence. The value of N and the supported resolutions may be signaled in the SPS. The coded resolution of a picture is signaled using the PPS or the slice header. In either case, instead of signaling the actual coded resolution, the corresponding resolution index is signaled. Table 3 (FIG. 7) shows an example of a supported resolution set in which the coded sequence allows three different resolutions. If the coded resolution of the current picture is 1440×816, the corresponding index value (= 1) is signaled using the PPS or the slice header.

[0038] In some exemplary embodiments, both resampled and non-resampled reference pictures are stored in the DPB. For each reference picture, one original (i.e., non-resampled) picture and N−1 resampled copies are stored. If the resolution of the reference picture differs from that of the current picture, the resampled picture in the DPB is used to code the current picture. For picture output, the DPB outputs the original (non-resampled) reference picture. As an example, FIG. 8 shows the DPB occupancy for the supported resolution set shown in Table 3 (FIG. 7). As shown in FIG. 8, the DPB contains N (e.g., N = 3) copies of each picture that is used as a reference picture.

[0039] The choice of the value of N depends on the application. Larger values of N give more flexibility in resolution selection but increase complexity and memory requirements. Smaller values of N suit low-complexity devices but limit the resolutions that can be selected. Therefore, in some embodiments, the encoder may be given the flexibility to decide the value of N signaled in the SPS based on the application and device capabilities.

[0040] Consistent with this disclosure, downsampling of a reference picture may be performed progressively (i.e., gradually). In conventional (i.e., direct) downsampling, the input image is downsampled in a single step. As the downsampling ratio increases, single-step downsampling requires a downsampling filter with more taps to avoid severe aliasing, and filters with more taps are computationally expensive. In some exemplary embodiments, to maintain the quality of the downsampled image, progressive downsampling, in which downsampling is performed gradually, is used when the downsampling ratio is greater than a threshold (e.g., greater than 2:1). For example, a downsampling filter that is sufficient for 2:1 downsampling may be applied repeatedly to achieve a downsampling ratio greater than 2:1. FIG. 9 shows an example of progressive downsampling in which downsampling by a factor of four, in both the horizontal and vertical dimensions, is carried out over two pictures. The first picture is downsampled by a factor of two (in both directions), and the second picture is again downsampled by a factor of two (in both directions).
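
One way to plan the stages of a progressive downsampling is sketched below. The decomposition rule (apply the threshold ratio repeatedly, then a final smaller stage for the remainder) is an assumption of this sketch; the disclosure only states that a 2:1-capable filter may be applied repeatedly.

```python
def progressive_stages(ratio, max_stage_ratio=2.0):
    """Split one downsampling ratio into stages that never exceed max_stage_ratio.

    The product of the returned stage ratios equals the requested ratio.
    """
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1 (downsampling only)")
    stages = []
    remaining = ratio
    while remaining > max_stage_ratio:
        stages.append(max_stage_ratio)
        remaining /= max_stage_ratio
    if remaining > 1.0:
        stages.append(remaining)
    return stages


print(progressive_stages(4.0))  # the 4:1 case of FIG. 9 becomes two 2:1 stages
print(progressive_stages(3.0))
print(progressive_stages(1.5))
```

Each stage can then reuse the same short filter, so no longer filter is needed for the overall ratio.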

[0041] Consistent with this disclosure, the disclosed picture-level resampling methods can be used together with other coding tools. For example, in some exemplary embodiments, scaled motion vectors based on the resampled reference picture can be used in temporal motion vector prediction (TMVP) or advanced temporal motion vector prediction (ATMVP).

[0042] Next, a test methodology for evaluating coding tools for ARC is described. Because adaptive resolution change is mainly used to adapt to network bandwidth, the following test conditions may be considered for comparing the coding efficiency of different ARC schemes during a change in network bandwidth.

[0043] In some embodiments, the resolution is changed to half (both vertically and horizontally) at a specific time instance and is later restored to the original resolution after a certain period. FIG. 10 shows an example of the resolution change. At time t_1, the resolution is changed to half, and at time t_2, it is restored to full resolution.

[0044] In some embodiments, when the resolution is reduced and the downsampling ratio is too large (e.g., the downsampling ratio in a given dimension is greater than 2:1), progressive downsampling is used over a certain period. When upsampling is used, however, progressive upsampling is not used.

[0045] In some embodiments, the scalable HEVC test model (SHM) downsampling filter may be used for source resampling. Table 4 (FIG. 11) shows the detailed filter coefficients for the different fractional sample positions.

[0046] In some exemplary embodiments, two peak signal-to-noise ratios (PSNRs) are computed to measure video quality. FIG. 12 illustrates the PSNR computation. The first PSNR is computed between the resampled source and the decoded picture. The second PSNR is computed between the original source and the upsampled decoded picture.
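
Both measurements use the standard PSNR definition, 10·log10(peak² / MSE), applied to different picture pairs. The sketch below assumes pictures flattened to lists of samples and an 8-bit default; the disclosure does not specify either, so treat both as illustrative choices.

```python
import math


def psnr(reference, test, bit_depth=8):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)


# First PSNR: resampled source vs. decoded picture (both at the coded resolution).
# Second PSNR: original source vs. upsampled decoded picture (both at full resolution).
# Toy data below stands in for either pair.
source = [100, 100, 100, 100]
decoded = [101, 101, 101, 101]
print(psnr(source, decoded))
```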

[0047] In some embodiments, the number of pictures to be encoded is the same as in the current VVC test conditions.

[0048] Next, the disclosed block-based resampling methods are described. In block-based resampling, combining resampling and motion-compensated interpolation into a single filtering operation can reduce the information loss described above. Take the following case as an example: the motion vector of the current block has half-pel precision in one dimension (e.g., the horizontal dimension), and the width of the reference picture is twice that of the current picture. Picture-level resampling would first halve the width of the reference picture to match the width of the current picture and then perform half-pel motion interpolation. In contrast, the block-based resampling method directly fetches the odd positions of the reference picture as the reference block at half-pel precision. At the 15th JVET meeting, a block-based resampling method for ARC, in which motion-compensated (MC) interpolation and reference resampling are combined and performed in a single-step filter, was adopted into VVC. In VVC Draft 6, the existing filters for MC interpolation without reference resampling are reused for MC interpolation with reference resampling. The same filters are used for both reference upsampling and reference downsampling. Details of the filter selection are described below.
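
The half-pel example can be checked with exact arithmetic. The mapping used here, reference position = (current position + motion vector) × scaling ratio, is a simplified assumption that ignores any offsets or phase alignment a real codec applies; `reference_position` is a hypothetical helper.

```python
from fractions import Fraction


def reference_position(x_cur, mv, ratio):
    """Map a current-picture position plus motion vector into reference-picture coordinates."""
    return (Fraction(x_cur) + Fraction(mv)) * Fraction(ratio)


# Half-pel motion vector, reference picture twice as wide as the current picture.
positions = [reference_position(x, Fraction(1, 2), 2) for x in range(4)]
print(positions)
```

Every mapped position lands on an odd integer sample of the reference picture, which is why those samples can be fetched directly instead of being produced by a downsample followed by a half-pel interpolation.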

[0049] For the luma component, if the half-pel AMVR mode is selected and the interpolation position is half-pel, the 6-tap filter [3, 9, 20, 20, 9, 3] is used. If the motion-compensated block size is 4×4, the 6-tap filters shown in Table 5 (FIG. 13) are used. Otherwise, the 8-tap filters shown in Table 6 (FIG. 14) are used. For the chroma components, the 4-tap filters shown in Table 7 (FIG. 15) are used.

[0050] In VVC, the same filters are used for MC interpolation without reference resampling and for MC interpolation with reference resampling. The VVC MCIF is designed based on DCT upsampling, and it may not be appropriate to use it as a single-step filter that combines reference downsampling and MC interpolation. For example, for phase-0 filtering (i.e., when the scaled mv is an integer), the VVC 8-tap MCIF coefficients are [0, 0, 0, 64, 0, 0, 0, 0], which means that the prediction sample is copied directly from the reference sample. This may not be a problem for MC interpolation without reference downsampling or with reference upsampling, but in the reference downsampling case it can cause aliasing artifacts because no low-pass filter is applied before decimation.
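
The aliasing risk can be seen on a one-dimensional toy signal. The sketch below decimates by two using the phase-0 MCIF kernel quoted above and, for comparison, a simple [1, 2, 1] low-pass kernel. The [1, 2, 1] kernel, the edge clamping, and the function name are assumptions made for this illustration and are not filters from the disclosure.

```python
def decimate_by_two(samples, kernel, centre_tap):
    """Filter with an integer kernel, then keep every second sample.

    centre_tap is the index of the kernel tap aligned with the output position.
    """
    norm = sum(kernel)
    last = len(samples) - 1
    out = []
    for centre in range(0, len(samples), 2):
        acc = 0
        for k, coeff in enumerate(kernel):
            idx = min(max(centre + k - centre_tap, 0), last)  # clamp at the picture edge
            acc += coeff * samples[idx]
        out.append((acc + norm // 2) // norm)  # rounded integer division
    return out


signal = [0, 100] * 4  # alternating pattern: the highest frequency the grid can hold

# Phase-0 MCIF: every output is a plain copy of one input sample.
copy_only = decimate_by_two(signal, [0, 0, 0, 64, 0, 0, 0, 0], centre_tap=3)
# Hypothetical low-pass applied before decimation.
smoothed = decimate_by_two(signal, [1, 2, 1], centre_tap=1)

print(copy_only)  # all zeros: the alternating pattern aliases to a flat signal
print(smoothed)
```

The copy-only result claims the region is uniformly dark, while the low-pass result stays near the local mean away from the clamped edge.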

[0051] This disclosure provides methods of using cosine-windowed sinc filters for MC interpolation with reference downsampling.

[0052] Windowed-sinc filters are band-pass filters that separate one band of frequencies from others. As shown in FIG. 16, the windowed-sinc filter used here is a low-pass filter whose frequency response passes all frequencies below the cutoff frequency with amplitude 1 and stops all frequencies above the cutoff frequency with amplitude 0.

[0053] The filter kernel, also known as the impulse response of the filter, is obtained by taking the inverse Fourier transform of the frequency response of the ideal low-pass filter. The impulse response of the low-pass filter has the general form of a sinc function:

$$h(n) = \frac{f_c}{r}\,\operatorname{sinc}\!\left(\frac{f_c}{r}\,n\right) \tag{1}$$

where $f_c$ is the cutoff frequency, with a value in [0, 1], and $r$ is the downsampling ratio, i.e., 1.5 for 1.5:1 downsampling and 2 for 2:1 downsampling. The sinc function is defined as follows:

$$\operatorname{sinc}(x) = \begin{cases} 1, & x = 0 \\ \dfrac{\sin(\pi x)}{\pi x}, & \text{otherwise} \end{cases} \tag{2}$$

[0054] The sinc function is infinite in extent. To make the filter kernel finite in length, a window function is applied to truncate the filter kernel to $L$ points. To obtain a smooth tapered curve, a cosine window function is used, which is given by:

$$w(n) = \cos\!\left(\frac{\pi n}{L-1}\right) \tag{3}$$

[0055] The kernel of the cosine-windowed sinc filter is the product of the ideal response function $h(n)$ and the cosine window function $w(n)$:

$$f(n) = h(n)\,w(n) = \frac{f_c}{r}\,\operatorname{sinc}\!\left(\frac{f_c}{r}\,n\right)\cos\!\left(\frac{\pi n}{L-1}\right) \tag{4}$$
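
A numerical sketch of the kernel follows. It assumes the kernel has the form f(n) = (fc/r)·sinc((fc/r)·n)·cos(πn/(L−1)) with n running over the L integer points centred on zero; that form and the function names are assumptions of this sketch. It evaluates the integer phase only and does not derive the fractional-phase coefficients listed in the tables.

```python
import math


def sinc(x):
    """Normalised sinc: 1 at x = 0, sin(pi x) / (pi x) elsewhere."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def cosine_windowed_sinc(fc, r, L):
    """Real-valued taps f(n) for n = -(L-1)/2 .. (L-1)/2 (L must be odd)."""
    if L < 3 or L % 2 == 0:
        raise ValueError("L must be an odd integer >= 3")
    half = (L - 1) // 2
    scale = fc / r
    return [
        scale * sinc(scale * n) * math.cos(math.pi * n / (L - 1))
        for n in range(-half, half + 1)
    ]


# Parameters quoted in the text for the SHM downsampling filter, at 2:1.
kernel = cosine_windowed_sinc(fc=0.9, r=2.0, L=13)
print([round(tap, 4) for tap in kernel])
```

The cosine window reaches zero at the two end points, so the outermost taps vanish at this phase.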

[0056] Two parameters are chosen for the windowed-sinc kernel: the cutoff frequency fc and the kernel length L. For the downsampling filter used in the scalable HEVC test model (SHM), fc = 0.9 and L = 13.

[0057] The filter coefficients obtained from equation (4) are real numbers. Applying the filter is equivalent to computing a weighted average of the reference samples, with the filter coefficients as the weights. For efficient computation on a digital computer or in hardware, the coefficients are normalized, multiplied by a scaling factor, and rounded to integers such that the sum of the coefficients equals 2^N, where N is an integer. The filtered sample is then divided by 2^N (equivalent to a right shift by N bits). For example, in VVC Draft 6, the sum of the interpolation filter coefficients is 64.
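
The normalise, scale, and round procedure can be sketched as below. The input taps are hypothetical, and two details are assumptions of this sketch rather than anything stated in the text: any rounding residual is folded into the largest tap so the sum is exactly 2^N, and a rounding offset is added before the right shift.

```python
def quantize_taps(taps, log2_sum=6):
    """Normalise real taps to unit sum, scale by 2**log2_sum, and round to integers."""
    target = 1 << log2_sum
    total = sum(taps)
    ints = [round(t / total * target) for t in taps]
    # Assumption: fold any rounding residual into the largest tap so the sum is exact.
    residual = target - sum(ints)
    ints[ints.index(max(ints))] += residual
    return ints


def apply_filter(samples, int_taps, log2_sum=6):
    """Weighted sum of samples followed by a right shift by log2_sum bits."""
    acc = sum(c * s for c, s in zip(int_taps, samples))
    return (acc + (1 << (log2_sum - 1))) >> log2_sum  # rounding offset is an assumption


real_taps = [0.05, 0.25, 0.4, 0.25, 0.05]  # hypothetical real-valued kernel
int_taps = quantize_taps(real_taps, log2_sum=6)
print(int_taps, sum(int_taps))
print(apply_filter([100] * 5, int_taps))
```

Because the integer taps sum to exactly 2^N, a flat input passes through unchanged, which is the property the normalisation is meant to preserve.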

[0058] In some disclosed embodiments, the downsampling filter of SHM is used for VVC motion-compensated interpolation with reference downsampling, for both the luma and chroma components, and the existing MCIF is used for motion-compensated interpolation with reference upsampling. Although the kernel length is L = 13, the first coefficient is small and is rounded to zero, so the filter length can be reduced to 12 without affecting filter performance.

[0059] As an example, the filter coefficients for 2:1 downsampling and 1.5:1 downsampling are shown in Table 8 (FIG. 17) and Table 9 (FIG. 18), respectively.

[0060] Besides the coefficient values, there are several other differences between the design of the SHM filter and that of the existing MCIF.

[0061] As a first difference, the SHM filter requires filtering at both integer and fractional sample positions, whereas the MCIF requires filtering only at fractional sample positions. An example of a change to the luma sample interpolation filtering process of VVC Draft 6 for the reference downsampling case is given in Table 10 (FIG. 19).

[0062] An example of a change to the chroma sample interpolation filtering process of VVC Draft 6 for the reference downsampling case is given in Table 11 (FIG. 20).

[0063] As a second difference, the sum of the filter coefficients is 128 for the SHM filter but 64 for the existing MCIF. In VVC Draft 6, to reduce the loss caused by rounding errors, the intermediate prediction signal is kept at a higher precision (represented with a larger bit depth) than the output signal. The precision of the intermediate signal is called the internal precision. In one embodiment, to keep the internal precision the same as in VVC Draft 6, the output of the SHM filter needs to be right-shifted by one additional bit compared with the existing MCIF. An example of a change to the luma sample interpolation filtering process of VVC Draft 6 for the reference downsampling case is shown in Table 12 (FIG. 21).

[0064] ＶＶＣドラフト６の参照ダウンサンプリングケースのクロマサンプルインターポレーションフィルタリングプロセスに対する変更の一例を表１３（図２２）に示す。 [0064] An example of a change to the chroma sample interpolation filtering process for the reference downsampling case of VVC Draft 6 is shown in Table 13 (FIG. 22).

[0065] According to some embodiments, the internal precision can instead be increased by one bit, and an additional one-bit right shift can be used when converting the internal precision to the output precision. An example of a modification to the luma sample interpolation filtering process of VVC Draft 6 for the reference downsampling case is shown in Table 14 (FIG. 23).

[0066] An example of a modification to the chroma sample interpolation filtering process of VVC Draft 6 for the reference downsampling case is shown in Table 15 (FIG. 24).

[0067] As a third difference, the SHM filter has 12 taps. Therefore, 11 neighboring samples (5 to the left and 6 to the right, or 5 above and 6 below) are required to generate an interpolated sample, so more neighboring samples are fetched than with the MCIF. In VVC Draft 6, the chroma mv precision is 1/32. However, the SHM filter has only 16 phases. Therefore, the chroma mv can be rounded to 1/16 precision for reference downsampling. This can be done by right-shifting the last 5 bits of the chroma mv by 1 bit. An example of a modification to the chroma fractional sample position calculation of VVC Draft 6 for the reference downsampling case is shown in Table 16 (FIG. 25).
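The rounding step described above can be sketched in Python as follows. This is a simplified illustration, not the position calculation of Table 16: the function name `chroma_position_16th` is hypothetical, and the sketch assumes the chroma mv component is a plain integer in 1/32-sample units.

```python
def chroma_position_16th(mv_c):
    """Split a chroma mv component (1/32-sample units) into an integer
    sample offset and a 1/16-sample phase index (0..15)."""
    int_offset = mv_c >> 5   # whole-sample part
    frac_32 = mv_c & 31      # last 5 bits: phase in 1/32 units
    frac_16 = frac_32 >> 1   # drop one bit: phase in 1/16 units
    return int_offset, frac_16


# 37/32 = 1 whole sample plus 5/32, which maps to phase 2 of 16.
print(chroma_position_16th(37))
```

The right shift truncates rather than rounds to nearest, so the phases 2k/32 and (2k+1)/32 both map to phase k of the 16-phase filter.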

[0068] In some embodiments, to align with the existing MCIF design of the VVC draft, we propose using an 8-tap cosine-windowed sinc filter. The filter coefficients can be derived by setting L=9 in the cosine-windowed sinc function of equation (4). To further align with the existing MCIF filters, the sum of the filter coefficients can be set to 64. Exemplary filter coefficients for the 2:1 and 1.5:1 ratios are shown in Table 17 (FIG. 26) and Table 18 (FIG. 27), respectively.
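As a rough sketch of how such coefficients could be derived, the Python below samples a cosine-windowed sinc kernel and scales the taps to sum to 64. Equation (4) is not reproduced in this part of the text, so the kernel form f(n) = (fc/r)·sinc(fc·n/r)·cos(πn/(L−1)) used here is an assumption, as are the value fc = 0.9, the 16-phase grid, the tap alignment, and the rounding rule. The output is not claimed to match Tables 17 and 18.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def kernel(n, fc, r, L):
    """ASSUMED cosine-windowed sinc kernel; zero outside the window."""
    if abs(n) >= (L - 1) / 2:
        return 0.0
    return (fc / r) * sinc(fc * n / r) * math.cos(math.pi * n / (L - 1))


def integer_taps(phase, num_phases=16, taps=8, fc=0.9, r=2.0, L=9, total=64):
    """Sample the kernel at one fractional phase and scale to `total`."""
    half = taps // 2 - 1           # taps cover offsets -3 .. +4 for 8 taps
    offset = phase / num_phases    # fractional position of the output sample
    weights = [kernel(k - half - offset, fc, r, L) for k in range(taps)]
    scale = total / sum(weights)
    coeffs = [round(w * scale) for w in weights]
    # Hypothetical rounding rule: put any rounding residue on the largest tap.
    coeffs[coeffs.index(max(coeffs))] += total - sum(coeffs)
    return coeffs


for p in (0, 4, 8):
    print(p, integer_taps(p))
```

With L = 9 the assumed window reaches zero at n = ±4, which leaves at most 8 non-zero taps per phase and is consistent with the 8-tap design described above.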

[0069] According to some embodiments, to accommodate the 1/32 sample precision of the chroma components, a 32-phase cosine-windowed sinc filter set may be used in chroma motion compensated interpolation with reference downsampling. Examples of filter coefficients for the 2:1 and 1.5:1 ratios are shown in Table 19 (FIG. 28) and Table 20 (FIG. 29), respectively.

[0070] According to some embodiments, for 4x4 luma blocks, a 6-tap cosine-windowed sinc filter may be used for MC interpolation with reference downsampling. Examples of filter coefficients for the 2:1 and 1.5:1 ratios are shown in Table 21 (FIG. 30) and Table 22 (FIG. 31), respectively.

[0071] In some embodiments, a non-transitory computer-readable storage medium including instructions is also provided, and the instructions may be executed by a device (such as the disclosed encoder and decoder) to perform the above-described methods. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, a hard disk, a solid state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. The device may include one or more processors (CPUs), an input/output interface, a network interface, and/or a memory.

[0072] It should be noted that relational terms herein such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and do not require or imply any actual relationship or order between those entities or operations. Moreover, the words "comprising," "having," "containing," and "including," and other similar forms, are intended to be equivalent in meaning and open-ended, in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, nor meant to be limited to only the listed item or items.

[0073] As used herein, unless specifically stated otherwise, the term "or" encompasses all possible combinations, except where infeasible. For example, if it is stated that a database may include A or B, then, unless specifically stated otherwise or infeasible, the database may include A, or B, or A and B. As a second example, if it is stated that a database may include A, B, or C, then, unless specifically stated otherwise or infeasible, the database may include A, or B, or C, or A and B, or A and C, or B and C, or A and B and C.

[0074] It is appreciated that the above-described embodiments can be implemented by hardware, or software (program code), or a combination of hardware and software. If implemented by software, the software may be stored in the above-described computer-readable media. The software, when executed by a processor, can perform the disclosed methods. The computing units and other functional units described in this disclosure can be implemented by hardware, or software, or a combination of hardware and software. One of ordinary skill in the art will also understand that multiple ones of the above-described modules/units may be combined into one module/unit, and that each of the above-described modules/units may be further divided into a plurality of sub-modules/sub-units.

[0075] In the foregoing specification, embodiments have been described with reference to numerous specific details that can vary from implementation to implementation. Certain adaptations and modifications of the described embodiments can be made. Other embodiments may be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The specification and examples are intended to be considered as exemplary only, with the true scope and spirit of the invention being indicated by the following claims. The sequence of steps shown in the figures is also intended for illustrative purposes only and is not intended to be limited to any particular sequence of steps. As such, those skilled in the art can appreciate that these steps can be performed in a different order while implementing the same method.

[0076] The embodiments can be further described using the following clauses.
1. A computer-implemented video processing method, comprising:
comparing resolutions of a target picture and a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and
encoding or decoding the target picture using the second reference picture.
2. The method of clause 1, further comprising:
storing the second reference picture in a first buffer, the first buffer being different from a second buffer that stores decoded pictures used for prediction of future pictures; and
removing the second reference picture from the first buffer after the encoding or decoding of the target picture is complete.
3. The method of any one of clauses 1 and 2, wherein encoding or decoding the target picture comprises:
encoding or decoding the target picture using no more than a predetermined number of resampled reference pictures.
4. The method of clause 3, wherein encoding or decoding the target picture comprises:
signaling the predetermined number as part of a sequence parameter set or a picture parameter set.
5. The method of clause 1, wherein resampling the first reference picture to generate the second reference picture comprises:
storing a first version and a second version of the first reference picture, the first version having an original resolution and the second version having the maximum resolution at which the reference picture can be resampled; and
resampling the maximum version to generate the second reference picture.
6. The method of clause 5, further comprising:
storing the first version and the second version of the first reference picture in a decoded picture buffer; and
outputting the first version for coding future pictures.
7. The method of clause 1, wherein resampling the first reference picture to generate the second reference picture comprises:
generating the second reference picture at a supported resolution.
8. The method of clause 7, further comprising signaling, as part of a sequence parameter set or a picture parameter set, information indicating:
a number of supported resolutions; and
pixel dimensions corresponding to the supported resolutions.
9. The method of clause 8, wherein the information comprises:
an index corresponding to at least one of the supported resolutions.
10. The method of any one of clauses 8 and 9, further comprising setting the number of supported resolutions based on a configuration of a video application or a video device.
11. The method of any one of clauses 1-10, wherein resampling the first reference picture to generate the second reference picture comprises:
performing progressive downsampling of the first reference picture to generate the second reference picture.
12. A device, comprising:
one or more memories storing computer instructions; and
one or more processors configured to execute the computer instructions to cause the device to:
compare a resolution of a target picture with a resolution of a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resample the first reference picture to generate a second reference picture; and
encode or decode the target picture using the second reference picture.
13. A non-transitory computer-readable medium storing a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing video content, the method comprising:
comparing resolutions of a target picture and a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and
encoding or decoding the target picture using the second reference picture.
14. A computer-implemented video processing method, comprising:
in response to a target picture and a reference picture having different resolutions, applying a bandpass filter to the reference picture to perform motion compensated interpolation and to generate a reference block; and
encoding or decoding a block of the target picture using the reference block.
15. The method of clause 14, wherein the bandpass filter is a cosine-windowed sinc filter.
16. The method of clause 15, wherein the cosine-windowed sinc filter has a kernel function
[equation not reproduced in this text]
where
[definition not reproduced in this text]
fc is the cutoff frequency of the cosine-windowed sinc filter, L is the kernel length, and r is the downsampling ratio.
17. The method of clause 16, wherein fc is equal to 0.9 and L is equal to 13.
18. The method of any one of clauses 15-17, wherein the cosine-windowed sinc filter is an 8-tap filter.
19. The method of any one of clauses 15-17, wherein the cosine-windowed sinc filter is a 4-tap filter.
20. The method of any one of clauses 15-17, wherein the cosine-windowed sinc filter is a 32-phase filter.
21. The method of clause 20, wherein the 32-phase filter is used in chroma motion compensated interpolation.
22. The method of any one of clauses 14-21, wherein applying the bandpass filter to the reference picture comprises:
obtaining a luma sample or a chroma sample at a fractional sample position.
23. A device, comprising:
one or more memories storing computer instructions; and
one or more processors configured to execute the computer instructions to cause the device to:
in response to a target picture and a reference picture having different resolutions, apply a bandpass filter to the reference picture to perform motion compensated interpolation and to generate a reference block; and
encode or decode a block of the target picture using the reference block.
24. A non-transitory computer-readable medium storing a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing video content, the method comprising:
25. A non-transitory computer-readable medium storing a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing video content, the method comprising:
comparing resolutions of a target picture and a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and
encoding or decoding the target picture using the second reference picture.
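The flow of clauses 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `Picture`, `resample_nearest`, and `reference_for` are hypothetical names, nearest-neighbor resampling stands in for the actual resampling filters, and a plain list stands in for the temporary buffer that is kept separate from the decoded picture buffer.

```python
from dataclasses import dataclass


@dataclass
class Picture:
    width: int
    height: int
    samples: list  # list of rows, each a list of integer sample values


def resample_nearest(ref, width, height):
    """Stand-in resampler (nearest neighbor); NOT the filters of the disclosure."""
    rows = []
    for y in range(height):
        src_row = ref.samples[y * ref.height // height]
        rows.append([src_row[x * ref.width // width] for x in range(width)])
    return Picture(width, height, rows)


def reference_for(target, ref, temp_buffer):
    """Return a reference picture whose resolution matches the target.

    A resampled picture goes into temp_buffer, which is kept separate from
    the buffer holding decoded pictures used to predict future pictures.
    """
    if (ref.width, ref.height) == (target.width, target.height):
        return ref  # same resolution: use the first reference picture as-is
    resampled = resample_nearest(ref, target.width, target.height)
    temp_buffer.append(resampled)
    return resampled


ref = Picture(4, 4, [[10 * y + x for x in range(4)] for y in range(4)])
target = Picture(2, 2, [[0, 0], [0, 0]])
temp_buffer = []

second_ref = reference_for(target, ref, temp_buffer)
# ... encode or decode `target` using `second_ref` here ...
temp_buffer.clear()  # remove the resampled picture once the target is coded
```

Keeping the resampled picture only in the temporary list mirrors the idea in clause 2 that it is discarded after the target picture is coded, while the original reference remains available for later pictures.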

[0077] In the drawings and specification, exemplary embodiments have been disclosed. However, many variations and modifications can be made to these embodiments. Accordingly, although specific terms are employed, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A computer-implemented video processing method, comprising:
comparing resolutions of a target picture and a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and
encoding or decoding the target picture using the second reference picture.

2. The method of claim 1, further comprising:
storing the second reference picture in a first buffer, the first buffer being different from a second buffer that stores decoded pictures used for prediction of future pictures; and
removing the second reference picture from the first buffer after the encoding or decoding of the target picture is complete.

3. The method of claim 1, wherein encoding or decoding the target picture comprises:
encoding or decoding the target picture using no more than a predetermined number of resampled reference pictures.

4. The method of claim 3, wherein encoding or decoding the target picture comprises:
signaling the predetermined number as part of a sequence parameter set or a picture parameter set.

5. The method of claim 1, wherein resampling the first reference picture to generate the second reference picture comprises:
storing a first version and a second version of the first reference picture, the first version having an original resolution and the second version having the maximum resolution at which the reference picture can be resampled; and
resampling the maximum version to generate the second reference picture.

6. The method of claim 5, further comprising:
storing the first version and the second version of the first reference picture in a decoded picture buffer; and
outputting the first version for coding future pictures.

7. The method of claim 1, wherein resampling the first reference picture to generate the second reference picture comprises:
generating the second reference picture at a supported resolution.

8. The method of claim 7, further comprising signaling, as part of a sequence parameter set or a picture parameter set, information indicating:
a number of supported resolutions; and
pixel dimensions corresponding to the supported resolutions.

9. The method of claim 8, wherein the information comprises:
an index corresponding to at least one of the supported resolutions.

10. The method of claim 8, further comprising setting the number of supported resolutions based on a configuration of a video application or a video device.

11. The method of claim 1, wherein resampling the first reference picture to generate the second reference picture comprises:
performing progressive downsampling of the first reference picture to generate the second reference picture.

12. A device, comprising:
one or more memories storing computer instructions; and
one or more processors configured to execute the computer instructions to cause the device to:
compare a resolution of a target picture with a resolution of a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resample the first reference picture to generate a second reference picture; and
encode or decode the target picture using the second reference picture.

13. The device of claim 12, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
store the second reference picture in a first buffer, the first buffer being different from a second buffer that stores decoded pictures used for prediction of future pictures; and
remove the second reference picture from the first buffer after the encoding or decoding of the target picture is complete.

14. The device of claim 12, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
encode or decode the target picture using no more than a predetermined number of resampled reference pictures.

15. The device of claim 14, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
signal the predetermined number as part of a sequence parameter set or a picture parameter set.

16. The device of claim 12, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
store, in the one or more memories, a first version and a second version of the first reference picture, the first version having an original resolution and the second version having the maximum resolution at which the reference picture can be resampled; and
resample the maximum version to generate the second reference picture.

17. The device of claim 16, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
store the first version and the second version of the first reference picture in a decoded picture buffer; and
output the first version for coding future pictures.

18. The device of claim 12, wherein the one or more processors are further configured to execute the computer instructions to cause the device to:
generate the second reference picture at a supported resolution.

19. The device of claim 18, wherein the one or more processors are further configured to execute the computer instructions to cause the device to signal, as part of a sequence parameter set or a picture parameter set, information indicating:
a number of supported resolutions; and
pixel dimensions corresponding to the supported resolutions.

20. A non-transitory computer-readable medium storing a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing video content, the method comprising:
comparing resolutions of a target picture and a first reference picture;
in response to the target picture and the first reference picture having different resolutions, resampling the first reference picture to generate a second reference picture; and
encoding or decoding the target picture using the second reference picture.