JP2023154047A

JP2023154047A - Method, apparatus and system for encoding and decoding tree or blocks of video samples

Info

Publication number: JP2023154047A
Application number: JP2023132042A
Authority: JP
Inventors: クリストファージェームズロゼワーン，; James Rosewarne Christopher
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-03-11
Filing date: 2023-08-14
Publication date: 2023-10-18
Anticipated expiration: 2040-01-20
Also published as: AU2021254642B2; EP3939277A1; AU2021254642A1; CN113574874B; RU2766881C1; EP3939277A4; KR20210100727A; WO2020181317A1; JP7337163B2; JP2022522576A; US20220150509A1; CN113574874A; TWI788262B; TWI769432B; TW202101981A; TW202239204A; RU2022102866A; BR112021013495A2; AU2019201653A1

Abstract

To provide a system and method of decoding a transform block for a color channel of an image frame from a video bitstream. SOLUTION: A method comprises: determining a chroma format of an image frame, the chroma format having chroma channels of the image frame being subsampled relative to a luma channel of the image frame; determining a coefficient group size of the transform block, the coefficient group size being the largest area of the transform block of up to 16 samples, determined based only on the transform block size and independent of both a color plane of the transform block and color plane subsampling due to the determined chroma format; and decoding the transform block from the video bitstream using coefficient groups of the determined size. SELECTED DRAWING: None

Description

REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. § 119 of the filing date of Australian Patent Application No. 2019201653, filed 11 March 2019, which is hereby incorporated by reference in its entirety as if fully set forth herein.

The present invention relates generally to digital video signal processing and, in particular, to a method, apparatus and system for encoding and decoding a tree or blocks of video samples. The present invention also relates to a computer program product including a computer readable medium having recorded thereon a computer program for encoding and decoding a tree or blocks of video samples.

Many applications for video coding currently exist, including applications for the transmission and storage of video data. Many video coding standards have also been developed, and others are currently in development. Recent developments in video coding standardization have led to the formation of a group called the "Joint Video Experts Team" (JVET). The JVET includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardization Sector (ITU-T) of the International Telecommunication Union (ITU), also known as the "Video Coding Experts Group" (VCEG), and members of the International Organization for Standardization / International Electrotechnical Commission Joint Technical Committee 1 / Subcommittee 29 / Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the "Moving Picture Experts Group" (MPEG).

The Joint Video Experts Team (JVET) issued a Call for Proposals (CfP), with the responses analyzed at its 10th meeting in San Diego, USA. The submitted responses demonstrated video compression capability significantly exceeding that of the current state-of-the-art video compression standard, namely "High Efficiency Video Coding" (HEVC). On the basis of this outperformance, it was decided to commence a project to develop a new video compression standard, to be named "versatile video coding" (VVC). VVC is anticipated to address the ongoing demand for ever-higher compression performance, especially as video formats increase in capability (for example, with higher resolution and higher frame rate), and to address increasing market demand for service delivery over WANs, where bandwidth costs are relatively high. At the same time, VVC must be implementable in contemporary silicon processes and offer an acceptable trade-off between the achieved performance and the implementation cost (for example, in terms of silicon area, CPU processor load, memory usage and bandwidth).

Video data includes a sequence of frames of image data, each frame including one or more color channels. Generally, one primary color channel and two secondary color channels are needed. The primary color channel is generally referred to as the "luma" channel and the secondary color channels are generally referred to as the "chroma" channels. Although video data is typically displayed in an RGB (red-green-blue) color space, this color space has a high degree of correlation between its three components. The video data representation seen by an encoder or a decoder often uses a color space such as YCbCr. YCbCr concentrates luminance, mapped to "luma" according to a transfer function, in the Y (primary) channel and chroma in the Cb and Cr (secondary) channels. Moreover, the Cb and Cr channels may be sampled spatially at a lower rate (subsampled) compared to the luma channel, for example half horizontally and half vertically, which is known as the "4:2:0 chroma format". The 4:2:0 chroma format is commonly used in "consumer" applications, such as Internet video streaming, broadcast television, and storage on Blu-ray™ discs. Subsampling the Cb and Cr channels at half the rate horizontally and not subsampling vertically is known as the "4:2:2 chroma format". The 4:2:2 chroma format is typically used in professional applications, including the capture of footage for cinematic production and the like. The higher sampling rate of the 4:2:2 chroma format makes the resulting video more resilient to editing operations such as color grading. Prior to distribution to consumers, 4:2:2 chroma format material is often converted to the 4:2:0 chroma format and then encoded for distribution. In addition to chroma format, video is also characterized by resolution and frame rate. Example resolutions are ultra-high definition (UHD), with a resolution of 3840x2160, and "8K", with a resolution of 7680x4320; example frame rates are 60 or 120 Hz. Luma sample rates may range from approximately 500 megasamples per second to several gigasamples per second. For the 4:2:0 chroma format, the sample rate of each chroma channel is one quarter of the luma sample rate, and for the 4:2:2 chroma format, the sample rate of each chroma channel is one half of the luma sample rate.
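The sample-rate figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the subsampling factors stated for the 4:2:0 and 4:2:2 chroma formats; the names `SUBSAMPLING`, `luma_sample_rate` and `chroma_sample_rate` are hypothetical and chosen only for illustration.

```python
# Hypothetical helper names; divisors follow the chroma format descriptions above.
SUBSAMPLING = {
    "4:2:0": (2, 2),  # (horizontal divisor, vertical divisor)
    "4:2:2": (2, 1),  # half rate horizontally, no vertical subsampling
}


def luma_sample_rate(width, height, fps):
    """Luma samples per second for a given resolution and frame rate."""
    return width * height * fps


def chroma_sample_rate(width, height, fps, chroma_format):
    """Samples per second for ONE chroma channel (Cb or Cr)."""
    div_w, div_h = SUBSAMPLING[chroma_format]
    return (width // div_w) * (height // div_h) * fps


if __name__ == "__main__":
    for label, (w, h, fps) in {"UHD, 60 Hz": (3840, 2160, 60),
                               "8K, 120 Hz": (7680, 4320, 120)}.items():
        luma = luma_sample_rate(w, h, fps)
        print(label, "luma:", luma)
        for fmt in SUBSAMPLING:
            chroma = chroma_sample_rate(w, h, fps, fmt)
            print("  ", fmt, "per chroma channel:", chroma, "ratio:", chroma / luma)
```

Under these assumptions, UHD at 60 Hz gives roughly 500 megasamples per second of luma and 8K at 120 Hz gives roughly 4 gigasamples per second, matching the range quoted above.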

The VVC standard is a "block-based" codec, in which frames are first divided into a square array of regions known as "coding tree units" (CTUs). CTUs generally occupy a relatively large area, such as 128x128 luma samples. However, CTUs at the right and bottom edges of each frame may be smaller in area. Associated with each CTU is a "coding tree" for the luma channel and an additional coding tree for the chroma channels. A coding tree defines a decomposition of the area of the CTU into a set of blocks, also referred to as "coding blocks" (CBs). It is also possible for a single coding tree to specify blocks for both the luma channel and the chroma channels, in which case the collections of collocated coding blocks are referred to as "coding units" (CUs); that is, each CU has a coding block for each color channel. The CBs are processed for encoding or decoding in a particular order. As a consequence of the use of the 4:2:0 chroma format, a CTU with a luma coding tree for a 128x128 luma sample area has a corresponding chroma coding tree for a 64x64 chroma sample area, collocated with the 128x128 luma sample area. When a single coding tree is in use for the luma channel and the chroma channels, the collections of collocated blocks for a given area are generally referred to as "units", for example the above-mentioned CUs, as well as "prediction units" (PUs) and "transform units" (TUs). When separate coding trees are used for the color channels of a given area, the above-mentioned CBs, as well as "prediction blocks" (PBs) and "transform blocks" (TBs), are used.
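The division of a frame into CTUs, including the smaller CTUs at the right and bottom edges, can be sketched as follows. This is an illustration only, assuming a 128x128 luma CTU size as in the example above; the function name `ctu_grid` is hypothetical.

```python
def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Return (columns, rows, last_column_width, last_row_height) in luma samples.

    The last column and last row may be narrower or shorter than ctu_size
    when the frame dimensions are not multiples of the CTU size.
    """
    cols = -(-frame_width // ctu_size)   # ceiling division
    rows = -(-frame_height // ctu_size)
    last_col_width = frame_width - (cols - 1) * ctu_size
    last_row_height = frame_height - (rows - 1) * ctu_size
    return cols, rows, last_col_width, last_row_height


if __name__ == "__main__":
    # A 3840x2160 frame: full-width columns, but a shorter bottom row of CTUs.
    print(ctu_grid(3840, 2160))
```

For a 3840x2160 frame this gives 30 columns and 17 rows of CTUs, with the bottom row covering only 112 luma rows. With the 4:2:0 chroma format, each full 128x128 luma CTU area corresponds to a collocated 64x64 chroma area.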

Notwithstanding the above distinction between "units" and "blocks", the term "block" may be used as a general term for areas or regions of a frame for which operations are applied to all color channels.

For each CU, a prediction unit (PU) of the contents (sample values) of the corresponding area of frame data is generated. Further, a representation of the difference (or "residual" in the spatial domain) between the prediction and the contents of the area as seen at the input to the encoder is formed. The difference in each color channel may be transformed and coded as a sequence of residual coefficients, forming one or more TUs for a given CU. The applied transform may be a discrete cosine transform (DCT) or other transform, applied to each block of residual values. The transform is applied separably; that is, the two-dimensional transform is performed in two passes. The block is first transformed by applying a one-dimensional transform to each row of samples in the block. Then, the partial result is transformed by applying a one-dimensional transform to each column of the partial result, producing a final block of transform coefficients that substantially decorrelates the residual samples. Transforms of various sizes are supported by the VVC standard, including transforms of rectangular blocks, with each side dimension being a power of two. The transform coefficients are quantized for entropy encoding into a bitstream.
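The two-pass (rows, then columns) application of a separable transform can be sketched as below. This is a minimal illustration, assuming a floating-point orthonormal DCT-II; practical codecs use integer approximations of the transform, and the function names `dct_1d` and `dct_2d_separable` are hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        out.append(scale * acc)
    return out


def dct_2d_separable(block):
    """Apply dct_1d to every row, then to every column of the partial result."""
    # First pass: transform each row of the residual block.
    partial = [dct_1d(row) for row in block]
    # Second pass: transform each column of the partial result.
    columns = [dct_1d(list(col)) for col in zip(*partial)]
    # Transpose back so the result is indexed as [row][column].
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # A rectangular 4-row by 8-column block; both side dimensions are powers of two.
    residual = [[1.0] * 8 for _ in range(4)]
    coeffs = dct_2d_separable(residual)
    print(round(coeffs[0][0], 6))  # energy of a flat block collects in the DC coefficient
```

For a flat residual block, all of the energy ends up in the single top-left (DC) coefficient, which is the decorrelating behavior the paragraph above describes.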

When spatial prediction ("intra prediction") is used to generate PBs, a set of reference samples is used to generate predicted samples for the current PB. The reference samples include samples adjacent to the PB that have already been "reconstructed" (by the addition of residual samples to intra-predicted samples). These adjacent samples form a row above the PB and a column to the left of the PB. The row and column also extend beyond the PB boundary to include additional nearby samples. Because blocks are scanned in a Z-order scan, some of the reference samples will have been reconstructed in the immediately preceding block. The use of samples from the immediately preceding block results in a feedback dependency that may limit the throughput of blocks through a video encoder or decoder. Additionally, where relatively small blocks are predicted from other frames ("inter prediction"), the memory bandwidth for fetching reference samples may become excessive, especially given the additional samples needed to accommodate sub-pixel interpolation filtering.

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

One aspect of the present disclosure provides a method of decoding a transform block for a color channel of an image frame from a video bitstream, the method comprising: determining a chroma format of the image frame, the chroma format having chroma channels of the image frame being subsampled relative to a luma channel of the image frame; determining a coefficient group size of the transform block, the coefficient group size being the largest area of the transform block of up to 16 samples, the coefficient group size being determined based only on the transform block size and being independent of both (i) the color plane of the transform block and (ii) color plane subsampling due to the determined chroma format; and decoding the transform block from the video bitstream using coefficient groups of the determined size.

According to another aspect, a single table is used for transform blocks belonging to the luma and chroma color planes of the image frame of the bitstream.

According to another aspect, the coefficient group size is selected to have an aspect ratio closest to 1:1 within the constraints of the transform block width and height.
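One way to read the selection rule in the aspects above is sketched below. It is an illustrative assumption, not the table the disclosure refers to: the function name `coefficient_group_size` and the enumeration of power-of-two candidates are hypothetical. The function takes only the transform block width and height, so its result cannot depend on the color plane or on the chroma format.

```python
def coefficient_group_size(tb_width, tb_height, max_samples=16):
    """Pick a (width, height) coefficient group for a transform block.

    Assumed rule: among power-of-two sizes that fit inside the transform
    block and hold at most max_samples samples, prefer the largest area,
    then the aspect ratio closest to 1:1.
    """
    candidates = []
    w = 1
    while w <= tb_width:
        h = 1
        while h <= tb_height:
            if w * h <= max_samples:
                candidates.append((w, h))
            h *= 2
        w *= 2

    def rank(size):
        cg_w, cg_h = size
        # For powers of two, the bit_length difference equals the log2 difference.
        squareness = -abs(cg_w.bit_length() - cg_h.bit_length())
        return (cg_w * cg_h, squareness)

    return max(candidates, key=rank)


if __name__ == "__main__":
    for tb in [(8, 8), (4, 4), (2, 8), (2, 32), (32, 2), (2, 4)]:
        print(tb, "->", coefficient_group_size(*tb))
```

Under these assumptions an 8x8 transform block is covered by 4x4 coefficient groups, while a 2x32 transform block is covered by 2x8 groups, so narrow blocks still use groups of 16 samples.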

Another aspect of the present disclosure provides a non-transitory computer-readable medium having a computer program stored thereon to implement a method of decoding a transform block for a color channel of an image frame from a video bitstream, the program comprising: code for determining a chroma format of the image frame, the chroma format having chroma channels of the image frame being subsampled relative to a luma channel of the image frame; code for determining a coefficient group size of the transform block, the coefficient group size being the largest area of the transform block of up to 16 samples, the coefficient group size being determined based only on the transform block size and being independent of both (i) the color plane of the transform block and (ii) color plane subsampling due to the determined chroma format; and code for decoding the transform block from the video bitstream using coefficient groups of the determined size.

Another aspect of the present disclosure provides a video decoder configured to: receive a transform block for a color channel of an image frame from a video bitstream; determine a chroma format of the image frame, the chroma format having chroma channels of the image frame being subsampled relative to a luma channel of the image frame; determine a coefficient group size of the transform block, the coefficient group size being the largest area of the transform block of up to 16 samples, the coefficient group size being determined based only on the transform block size and being independent of both (i) the color plane of the transform block and (ii) color plane subsampling due to the determined chroma format; and decode the transform block from the video bitstream using coefficient groups of the determined size.

Another aspect of the present disclosure provides a system comprising a memory and a processor, wherein the processor is configured to execute code stored in the memory for implementing a method of decoding a transform block for a color channel of an image frame from a video bitstream, the method comprising: determining a chroma format of the image frame, the chroma format having chroma channels of the image frame being subsampled relative to a luma channel of the image frame; determining a coefficient group size of the transform block, the coefficient group size being the largest area of the transform block of up to 16 samples, the coefficient group size being determined based only on the transform block size and being independent of both (i) the color plane of the transform block and (ii) color plane subsampling due to the determined chroma format; and decoding the transform block from the video bitstream using coefficient groups of the determined size.

Other aspects are also disclosed.

At least one exemplary embodiment of the present invention will now be described with reference to the following drawings and appendices.
FIG. 1 is a schematic block diagram showing a video encoding and decoding system.
FIGS. 2A and 2B form a schematic block diagram of a general purpose computer system upon which one or both of the video encoding and decoding system of FIG. 1 may be practiced.
FIG. 3 is a schematic block diagram showing functional modules of a video encoder.
FIG. 4 is a schematic block diagram showing functional modules of a video decoder.
FIG. 5 is a schematic block diagram showing the available divisions of a block into one or more blocks in the tree structure of versatile video coding.
FIG. 6 is a schematic illustration of a dataflow to achieve permitted divisions of a block into one or more blocks in a tree structure of versatile video coding.
FIGS. 7A and 7B show an example division of a coding tree unit (CTU) into a number of coding units (CUs).
FIGS. 8A, 8B and 8C show an example division of a coding tree unit (CTU) into a number of coding blocks (CBs) in the luma and chroma channels.
FIG. 9 shows a collection of transform block sizes and associated scan patterns.
FIG. 10 shows a set of rules for generating lists of allowed splits in a luma coding tree and a chroma coding tree.
FIG. 11 shows a method for encoding coding trees of an image frame into a video bitstream.
FIG. 12 shows a method for decoding coding trees of an image frame from a video bitstream.
FIG. 13 shows a method for encoding coding trees of an image frame into a video bitstream.
FIG. 14 shows a method for decoding coding trees of an image frame from a video bitstream.
FIG. 15 shows a set of transform block partitions of an intra-predicted coding unit.
FIG. 16 shows a method for encoding coding units of an image frame into a video bitstream.
FIG. 17 shows a method for decoding coding units of an image frame from a video bitstream.

Where reference is made in any one or more of the accompanying drawings to steps and/or features that have the same reference numerals, those steps and/or features have, for the purposes of this description, the same function(s) or operation(s), unless the contrary intention appears.

As described above, the use of samples from the immediately preceding block results in a feedback dependency that may limit the throughput of blocks through a video encoder or decoder. Methods to alleviate the severity of the resulting feedback dependency loop are desirable to ensure that a high rate of block processing can be sustained, as needed for typical real-time encoding and decoding applications. The feedback dependency loop is particularly problematic for the high sample rates of contemporary video formats, for example from 500 to 4000 megasamples per second, whereas ASIC (application-specific integrated circuit) clock frequencies are typically in the hundreds of MHz.
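The gap between sample rate and clock frequency can be made concrete with a rough cycle budget. The sketch below is an assumption-laden illustration: it uses luma sample rates of about 500 megasamples to 4 gigasamples per second, as quoted earlier for UHD and 8K formats, an example 400 MHz clock chosen arbitrarily from "hundreds of MHz", and a hypothetical function name `cycles_per_block`.

```python
def cycles_per_block(sample_rate_hz, clock_hz, block_samples):
    """Average number of clock cycles available to process one block
    if the pipeline is to keep up with the incoming sample rate."""
    return block_samples * clock_hz / sample_rate_hz


if __name__ == "__main__":
    clock_hz = 400_000_000  # assumed example ASIC clock
    for rate in (500_000_000, 4_000_000_000):
        budget = cycles_per_block(rate, clock_hz, block_samples=16)
        print(rate, "samples/s ->", budget, "cycles per 16-sample block")
```

With these example numbers, a 16-sample block must be completed in fewer than two clock cycles at the highest sample rate, which leaves little room for a reconstruction loop that has to wait on the previous block.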

FIG. 1 is a schematic block diagram showing functional modules of a video encoding and decoding system 100. The system 100 may utilize different rules for the allowed subdivisions of regions in the luma and chroma coding trees to reduce the worst-case block processing rate encountered. For example, the system 100 may operate such that blocks are always sized as a multiple of sixteen (16) samples, regardless of the aspect ratio of the block. Moreover, where a coding tree includes a split indicating the presence of small luma coding blocks, the split may be prohibited in the chroma channels, resulting in a single chroma CB being collocated with multiple luma CBs. The chroma CB may use a single prediction mode, such as one intra prediction mode, independently of the prediction modes of each of the collocated luma CBs (including where one or more luma CBs use inter prediction). Residual coefficient coding may also exploit the multiple-of-16 block size, including in the case of blocks having a width or height of two samples.

The system 100 includes a source device 110 and a destination device 130. A communication channel 120 is used to communicate encoded video information from the source device 110 to the destination device 130. In some arrangements, the source device 110 and the destination device 130 may either or both comprise respective mobile telephone handsets or "smartphones", in which case the communication channel 120 is a wireless channel. In other arrangements, the source device 110 and the destination device 130 may comprise video conferencing equipment, in which case the communication channel 120 is typically a wired channel, such as an Internet connection. Moreover, the source device 110 and the destination device 130 may comprise any of a wide range of devices, including devices supporting over-the-air television broadcasts, cable television applications, Internet video applications (including streaming), and applications where encoded video data is captured on some computer-readable storage medium, such as hard disk drives in a file server.

As shown in FIG. 1, the source device 110 includes a video source 112, a video encoder 114 and a transmitter 116. The video source 112 typically comprises a source of captured video frame data (shown as 113), such as an image capture sensor, a previously captured video sequence stored on a non-transitory recording medium, or a video feed from a remote image capture sensor. The video source 112 may also be the output of a computer graphics card, for example displaying the video output of an operating system and various applications executing upon a computing device, such as a tablet computer. Examples of source devices 110 that may include an image capture sensor as the video source 112 include smartphones, video camcorders, professional video cameras, and network video cameras.

The video encoder 114 converts (or "encodes") the captured frame data (indicated by an arrow 113) from the video source 112 into a bitstream (indicated by an arrow 115), as described further with reference to FIG. 3. The bitstream 115 is transmitted by the transmitter 116 over the communication channel 120 as encoded video data (or "encoded video information"). It is also possible for the bitstream 115 to be stored in a non-transitory storage device 122, such as a "Flash" memory or a hard disk drive, until later being transmitted over the communication channel 120, or in lieu of transmission over the communication channel 120.

The destination device 130 includes a receiver 132, a video decoder 134 and a display device 136. The receiver 132 receives encoded video data from the communication channel 120 and passes the received video data to the video decoder 134 as a bitstream (indicated by an arrow 133). The video decoder 134 then outputs decoded frame data (indicated by an arrow 135) to the display device 136. The decoded frame data 135 has the same chroma format as the frame data 113. Examples of the display device 136 include a cathode ray tube and a liquid crystal display, such as in smartphones, tablet computers, computer monitors or stand-alone television sets. It is also possible for the functionality of each of the source device 110 and the destination device 130 to be embodied in a single device, examples of which include mobile telephone handsets and tablet computers.

Notwithstanding the example devices mentioned above, each of the source device 110 and the destination device 130 may be configured within a general purpose computing system, typically through a combination of hardware and software components. FIG. 2A illustrates such a computer system 200, which includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, which may be configured as the video source 112, and a microphone 280; and output devices including a printer 215, a display device 214, which may be configured as the display device 136, and loudspeakers 217. An external modulator-demodulator (modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220, which may represent the communication channel 120, may be a wide area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional "dial-up" modem. Alternatively, where the connection 221 is a high capacity (for example, cable or optical) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220. The transceiver device 216 may provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may be embodied in the connection 221.

The computer module 201 typically includes at least one processor unit 205 and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read-only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces, including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. The signal from the audio-video interface 207 to the computer monitor 214 is generally the output of a computer graphics card. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in FIG. 2A, the local communications network 222 may also couple to the wide area network 220 via a connection 224, which would typically include a so-called "firewall" device or a device of similar functionality. The local network interface 211 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211. The local network interface 211 may also provide the functionality of the transmitter 116 and the receiver 132, and the communication channel 120 may also be embodied in the local communications network 222.

The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives and floppy disks, may be used as appropriate sources of data to the computer system 200. Typically, any of the HDD 210, the optical drive 212 and the networks 220 and 222 may also be configured to operate as the video source 112, or as a destination for decoded video data to be stored for reproduction via the display 214. The source device 110 and the destination device 130 of the system 100 may be embodied in the computer system 200.

The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204, in a manner that results in a conventional mode of operation of the computer system 200 known to those skilled in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and the optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practiced include IBM-PCs and compatibles, Sun SPARCstations, Apple Mac™ or similar computer systems.

Where appropriate or desired, the video encoder 114 and the video decoder 134, as well as the methods described below, may be implemented using the computer system 200. In particular, the video encoder 114, the video decoder 134 and the methods to be described may be implemented as one or more software application programs 233 executable within the computer system 200. The video encoder 114, the video decoder 134 and the steps of the described methods are effected by instructions 231 (see FIG. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the described methods, and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software may be stored in a computer-readable medium, including, for example, the storage devices described below. The software is loaded into the computer system 200 from the computer-readable medium and is then executed by the computer system 200. A computer-readable medium having such software or a computer program recorded on it is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for implementing the video encoder 114, the video decoder 134 and the described methods.

The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer-readable medium and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212.

In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer-readable media. Computer-readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray Disc™, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer-readable card such as a PCMCIA card, whether or not such devices are internal or external to the computer module 201. Examples of transitory or non-tangible computer-readable transmission media that may also participate in the provision of the software, application programs, instructions and/or video data or encoded video data to the computer module 201 include radio or infra-red transmission channels, as well as a network connection to another computer or networked device, and the Internet or intranets, including e-mail transmissions and information recorded on websites and the like.

The second part of the application program 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 214. Typically through manipulation of the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.

FIG. 2B is a detailed schematic block diagram of the processor 205 and a "memory" 234. The memory 234 represents a logical aggregation of all the memory modules (including the storage devices 209 and the semiconductor memory 206) that can be accessed by the computer module 201 in FIG. 2A.

When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of FIG. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206) and a basic input-output system software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of FIG. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system-level application, executable by the processor 205, that fulfils various high-level functions, including processor management, memory management, device management, storage management, software application interface and generic user interface.

The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the computer system 200 of FIG. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such memory is used.

As shown in FIG. 2B, the processor 205 includes a number of functional modules, including a control unit 239, an arithmetic logic unit (ALU) 240 and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244 to 246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using the connection 218. The memory 234 is coupled to the bus 204 using the connection 219.

The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 that is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228 to 230, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 230. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.

In general, the processor 205 is given a set of instructions that are executed therein. The processor 205 then waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209, or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in FIG. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.

The video encoder 114, the video decoder 134 and the described methods may use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The video encoder 114, the video decoder 134 and the described methods produce output variables 261, which are stored in the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.

Referring to the processor 205 of FIG. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240 and the control unit 239 work together to perform the sequences of micro-operations needed to perform "fetch, decode and execute" cycles for every instruction in the instruction set making up the program 233. Each fetch, decode and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
a decode operation, in which the control unit 239 determines which instruction has been fetched; and
an execute operation, in which the control unit 239 and/or the ALU 240 execute the instruction.

Thereafter, a further fetch, decode and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed, by which the control unit 239 stores or writes a value to a memory location 232.
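
The cycle described above can be illustrated with a toy interpreter. The following is only a minimal sketch and does not model the processor 205 itself: the four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`), the tuple encoding of instructions and the `run` function are all hypothetical names introduced for illustration.

```python
def run(program, memory):
    """Toy fetch-decode-execute loop for a hypothetical accumulator machine."""
    acc = 0   # single accumulator register (assumption for this sketch)
    pc = 0    # program counter: index of the next instruction to fetch
    while True:
        instr = program[pc]      # fetch: read the instruction from its location
        pc += 1
        op, arg = instr          # decode: determine which instruction was fetched
        if op == "LOAD":         # execute: carry out the decoded operation
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":      # store cycle: write a value to a memory location
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Add the values at locations 0 and 1 and store the sum at location 2.
memory = run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], [2, 3, 0])
```

Each pass through the loop corresponds to one fetch, decode and execute cycle, and the `STORE` branch plays the role of the store cycle.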

Each step or sub-process in the methods of FIGS. 10 and 11, described below, is associated with one or more segments of the program 233 and is typically performed by the register section 244, 245, 247, the ALU 240 and the control unit 239 in the processor 205 working together to perform the fetch, decode and execute cycles for every instruction in the instruction set for the noted segments of the program 233.

FIG. 3 is a schematic block diagram showing functional modules of the video encoder 114. FIG. 4 is a schematic block diagram showing functional modules of the video decoder 134. Generally, data passes between functional modules within the video encoder 114 and the video decoder 134 in groups of samples or coefficients, such as divisions of blocks into sub-blocks of a fixed size, or as arrays. The video encoder 114 and the video decoder 134 may be implemented using a general-purpose computer system 200, as shown in FIGS. 2A and 2B, where the various functional modules may be implemented by dedicated hardware within the computer system 200, or by software executable within the computer system 200, such as one or more software code modules of the software application program 233 resident on the hard disk drive 210 and controlled in its execution by the processor 205. Alternatively, the video encoder 114 and the video decoder 134 may be implemented by a combination of dedicated hardware and software executable within the computer system 200. The video encoder 114, the video decoder 134 and the described methods may alternatively be implemented in dedicated hardware, such as one or more integrated circuits performing the functions or sub-functions of the described methods. Such dedicated hardware may include graphic processing units (GPUs), digital signal processors (DSPs), application-specific standard products (ASSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or one or more microprocessors and associated memories. In particular, the video encoder 114 comprises modules 310 to 386 and the video decoder 134 comprises modules 420 to 496, each of which may be implemented as one or more software code modules of the software application program 233.

Although the video encoder 114 of FIG. 3 is an example of a versatile video coding (VVC) video encoding pipeline, other video codecs may also be used to perform the processing stages described herein. The video encoder 114 receives captured frame data 113, such as a series of frames, each frame including one or more color channels. The frame data 113 may be in a 4:2:0 chroma format or a 4:2:2 chroma format. A block partitioner 310 first divides the frame data 113 into CTUs, generally square in shape and configured such that a particular size for the CTUs is used. The size of the CTUs may be, for example, 64×64, 128×128 or 256×256 luma samples. The block partitioner 310 further divides each CTU into one or more CBs according to a luma coding tree and a chroma coding tree. The CBs have a variety of sizes and may include both square and non-square aspect ratios. Operation of the block partitioner 310 is further described with reference to FIG. 10. However, in the VVC standard, CBs, CUs, PUs and TUs always have side lengths that are powers of two. Thus, a current CB, represented as 312, is output from the block partitioner 310, progressing in accordance with an iteration over the one or more blocks of the CTU, in accordance with the luma coding tree and the chroma coding tree of the CTU. Options for partitioning CTUs into CBs are further described below with reference to FIGS. 5 and 6.
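
The arithmetic implied by the division into CTUs and by the two chroma formats can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the assumption that a frame whose dimensions are not multiples of the CTU size is covered by partially filled CTUs at its right and bottom edges is not stated in the text above.

```python
import math

# (horizontal, vertical) chroma subsampling factors relative to luma
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1)}


def ctu_grid(frame_width, frame_height, ctu_size=128):
    """Number of CTU columns and rows needed to cover a frame of luma samples.

    Assumes edge CTUs may be only partially filled (assumption of this sketch).
    """
    return math.ceil(frame_width / ctu_size), math.ceil(frame_height / ctu_size)


def chroma_block_size(luma_width, luma_height, chroma_format):
    """Dimensions of the chroma block collocated with a luma block."""
    sub_x, sub_y = CHROMA_SUBSAMPLING[chroma_format]
    return luma_width // sub_x, luma_height // sub_y


def is_power_of_two(n):
    """True when n is a positive power of two, as required of block side lengths."""
    return n > 0 and (n & (n - 1)) == 0


grid = ctu_grid(1920, 1080, ctu_size=128)
chroma_420 = chroma_block_size(128, 128, "4:2:0")
chroma_422 = chroma_block_size(128, 128, "4:2:2")
```

Under these assumptions a 128×128 luma CTU is collocated with a 64×64 chroma area in the 4:2:0 format and with a 64×128 chroma area in the 4:2:2 format, which is why a square luma block can correspond to a non-square chroma block.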

The CTUs resulting from the first division of the frame data 113 may be scanned in raster scan order and may be grouped into one or more "slices". A slice may be an "intra" (or "I") slice. An intra slice (I slice) indicates that every CU in the slice is intra predicted. Alternatively, a slice may be uni- or bi-predicted (a "P" or "B" slice, respectively), indicating the additional availability of uni- and bi-prediction in the slice, respectively.

For each CTU, the video encoder 114 operates in two stages. In the first stage (referred to as a "search" stage), the block partitioner 310 tests various potential configurations of a coding tree. Each potential configuration of a coding tree has associated "candidate" CBs. The first stage involves testing various candidate CBs to select CBs providing high compression efficiency with low distortion. The testing generally involves a Lagrangian optimization whereby a candidate CB is evaluated based on a weighted combination of the rate (coding cost) and the distortion (error with respect to the input frame data 113). The "best" candidate CBs (the CBs with the lowest evaluated rate/distortion) are selected for subsequent encoding into the bitstream 115. Included in the evaluation of candidate CBs is an option to use a CB for a given area or to further split the area according to various splitting options and code each of the smaller resulting areas with further CBs, or split the areas even further. As a consequence, both the CBs and the coding tree themselves are selected in the search stage.
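
The weighted combination of rate and distortion is commonly written as a Lagrangian cost J = D + λR, with the candidate of lowest J being selected. The sketch below assumes that form; the candidate names, the distortion and rate figures and the values of the multiplier `lam` are invented for illustration and do not come from the description.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_best(candidates, lam):
    """Return the (name, distortion, rate_bits) candidate with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical candidates for one area: (name, distortion, rate in bits)
candidates = [
    ("no_split", 1200.0, 40),
    ("quad_split", 700.0, 95),
    ("binary_split", 900.0, 60),
]

best_at_high_lambda = select_best(candidates, lam=10.0)
best_at_low_lambda = select_best(candidates, lam=1.0)
```

With these invented figures a large multiplier favors the cheaper-to-code `binary_split`, whereas a small multiplier favors the lower-distortion `quad_split`, showing how the weighting steers the trade-off.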

The video encoder 114 produces a prediction block (PB), indicated by an arrow 320, for each CB, for example the CB 312. The PB 320 is a prediction of the contents of the associated CB 312. A subtracter module 322 produces a difference, indicated as 324 (or "residual", referring to the difference being in the spatial domain), between the PB 320 and the CB 312. The difference 324 is a block-sized difference between corresponding samples in the PB 320 and the CB 312. The difference 324 is transformed, quantized and represented as a transform block (TB), indicated by an arrow 336. The PB 320 and the associated TB 336 are typically chosen from one of many possible candidate CBs, for example based on evaluated cost or distortion.
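
The sample-wise subtraction can be sketched as below. The sign convention (original CB minus PB) is an assumption of this sketch, as are the function name and the 2×2 sample values; blocks are represented as lists of rows.

```python
def residual(cb, pb):
    """Block-sized, sample-wise difference between a coding block and its prediction.

    Assumes the convention residual = original - prediction.
    """
    if len(cb) != len(pb) or any(len(c) != len(p) for c, p in zip(cb, pb)):
        raise ValueError("CB and PB must have the same dimensions")
    return [[c - p for c, p in zip(cb_row, pb_row)] for cb_row, pb_row in zip(cb, pb)]


cb = [[52, 55], [61, 59]]   # hypothetical original samples
pb = [[50, 50], [60, 60]]   # hypothetical predicted samples
diff = residual(cb, pb)
```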

A candidate coding block (CB) is a CB resulting from one of the prediction modes available to the video encoder 114 for the associated PB and the resulting residual. Each candidate CB results in one or more corresponding TBs, as described hereafter with reference to FIG. 8. The TB 336 is a quantized and transformed representation of the difference 324. When combined with the predicted PB in the video decoder 134, the TB 336 reduces the difference between the decoded CB and the original CB 312 at the expense of additional signaling in the bitstream.
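
The effect of adding the coded residual back to the prediction can be sketched as follows. To keep the example short, the transform is omitted entirely and the residual is quantized directly with a uniform step size; this simplification, the step size of 4, the rounding rule and all function names are assumptions of the sketch rather than details of the described encoder or decoder.

```python
def quantise(residual, step):
    """Uniform scalar quantization, rounding half away from zero."""
    def q(v):
        magnitude = (abs(v) + step // 2) // step
        return magnitude if v >= 0 else -magnitude
    return [[q(v) for v in row] for row in residual]


def reconstruct(pb, levels, step):
    """Decoder side: prediction plus the dequantized residual."""
    return [[p + level * step for p, level in zip(p_row, l_row)]
            for p_row, l_row in zip(pb, levels)]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


cb = [[52, 55], [61, 59]]   # hypothetical original samples
pb = [[50, 50], [60, 60]]   # hypothetical predicted samples
res = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cb, pb)]

levels = quantise(res, step=4)
decoded = reconstruct(pb, levels, step=4)
error_with_residual = sad(cb, decoded)
error_prediction_only = sad(cb, pb)
```

In this example the decoded block is closer to the original than the prediction alone, at the cost of having to signal the two non-zero levels.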

Each candidate coding block (CB), that is a prediction block (PB) in combination with a transform block (TB), thus has an associated coding cost (or "rate") and an associated difference (or "distortion"). The rate is typically measured in bits. The distortion of the CB is typically estimated as a difference in sample values, such as a sum of absolute differences (SAD) or a sum of squared differences (SSD). The estimate resulting from each candidate PB is determined by a mode selector 386 using the difference 324 to determine an intra prediction mode (represented by an arrow 388). Estimation of the coding costs associated with each candidate prediction mode and the corresponding residual coding can be performed at significantly lower cost than entropy coding of the residual. Accordingly, a number of candidate modes can be evaluated to determine an optimum mode in a rate-distortion sense.
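
The two distortion measures named above can be written directly from their definitions. The sketch below represents blocks as lists of rows; the function names and the sample values are illustrative only.

```python
def sad(a, b):
    """Sum of absolute differences between corresponding samples."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def ssd(a, b):
    """Sum of squared differences between corresponding samples."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


original = [[52, 55], [61, 59]]     # hypothetical original samples
candidate = [[50, 50], [60, 60]]    # hypothetical reconstructed samples

sad_value = sad(original, candidate)
ssd_value = ssd(original, candidate)
```

SSD penalizes large individual errors more heavily than SAD, since each difference is squared before summation.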

Determining an optimum mode in terms of rate-distortion is typically achieved using a variation of Lagrangian optimization. Selection of the intra prediction mode 388 typically involves determining a coding cost for the residual data resulting from application of a particular intra prediction mode. The coding cost may be approximated by using a "sum of absolute transformed differences" (SATD), whereby a relatively simple transform, such as a Hadamard transform, is used to obtain an estimated transformed residual cost. In some embodiments using relatively simple transforms, the costs resulting from the simplified estimation method are monotonically related to the actual costs that would otherwise be determined from a full evaluation. In embodiments with monotonically related estimated costs, the simplified estimation method may be used to make the same decision (i.e., intra prediction mode) with a reduction in complexity in the video encoder 114. To allow for possible non-monotonicity in the relationship between estimated and actual costs, the simplified estimation method may be used to generate a list of best candidates. The non-monotonicity may result, for example, from further mode decisions available for the coding of residual data. The list of best candidates may be of an arbitrary number. A more complete search may be performed using the best candidates to establish optimal mode choices for coding the residual data for each of the candidates, allowing a final selection of the intra prediction mode along with other mode decisions.
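
A SATD estimate and the resulting shortlist can be sketched as follows. The sketch assumes a 4×4 block, an unnormalized Sylvester-ordered Hadamard matrix and hypothetical mode names and costs; none of these specifics is given in the description, and a practical encoder would typically apply a normalization factor.

```python
# 4x4 Hadamard matrix (Sylvester construction); it is symmetric, so H == H^T.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def satd_4x4(residual):
    """Sum of absolute values of the 2-D Hadamard transform H * R * H (unnormalized)."""
    transformed = matmul(matmul(H4, residual), H4)
    return sum(abs(v) for row in transformed for v in row)


def shortlist(cost_by_mode, n):
    """Names of the n modes with the lowest estimated cost, cheapest first."""
    return sorted(cost_by_mode, key=cost_by_mode.get)[:n]


flat = [[1] * 4 for _ in range(4)]          # constant residual: energy in one coefficient
flat_cost = satd_4x4(flat)

# Hypothetical estimated costs for three intra prediction modes.
best_two = shortlist({"DC": 30, "planar": 12, "angular_18": 20}, 2)
```

Only the shortlisted modes would then be passed to the more complete search, which is how the cheap estimate reduces encoder complexity while tolerating some non-monotonicity.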

The other mode decisions include an ability to skip a forward transform, known as "transform skip". Skipping the transforms is suited to residual data that lacks adequate correlation for reduced coding cost via expression as transform basis functions. Certain types of content, such as relatively simple computer-generated graphics, may exhibit similar behavior. For a "skipped transform", residual coefficients are still coded even though the transform itself is not performed.

A Lagrangian or similar optimization process can be employed both to select an optimal partitioning of a CTU into CBs (by the block partitioner 310) and to select the best prediction mode from a plurality of possibilities. Through application of a Lagrangian optimization process to the candidate modes in the mode selection module 386, the intra prediction mode with the lowest cost measurement is selected as the "best" mode. The lowest-cost mode is the selected intra prediction mode 388 and is encoded in the bitstream 115 by the entropy encoder 338. The selection of the intra prediction mode 388 by operation of the mode selection module 386 extends to operation of the block partitioner 310. For example, candidates for selection of the intra prediction mode 388 may include modes applicable to a given block and, additionally, modes applicable to multiple smaller blocks that are collectively collocated with the given block. Where the candidates include modes applicable to both the given block and the smaller collocated blocks, the process of selecting among the candidates is implicitly also a process of determining the best hierarchical decomposition of the CTU into CBs.
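The joint selection of partitioning and mode can be illustrated with the usual Lagrangian cost J = D + λR, where D is a distortion measure, R is a rate in bits, and λ is the Lagrange multiplier. The sketch below is an assumption-laden illustration: the candidate tuples, the name `best_candidate`, and the numeric values are hypothetical and are not taken from the description.

```python
def best_candidate(candidates, lam):
    """Pick the candidate with the lowest Lagrangian cost J = D + lam * R.

    Each candidate is a (name, distortion, rate_bits) tuple. A candidate may
    represent a mode for the whole block or a set of modes for smaller
    collocated blocks, so one comparison also decides the partitioning.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]
```

With candidates `[("split", 100, 40), ("no_split", 180, 10)]`, a small multiplier favors the lower-distortion split, while a larger multiplier favors the cheaper unsplit block.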

In the second stage of operation of the video encoder 114 (referred to as the "coding" stage), an iteration over the selected luma coding tree and the selected chroma coding tree, and hence over each selected CB, is performed in the video encoder 114. In the iteration, the CBs are encoded into the bitstream 115, as described further herein.

The entropy encoder 338 supports both variable-length coding of syntax elements and arithmetic coding of syntax elements. Arithmetic coding is supported using a context-adaptive binary arithmetic coding process. An arithmetically coded syntax element consists of a sequence of one or more "bins". A bin, like a bit, has a value of "0" or "1". However, bins are not encoded in the bitstream 115 as discrete bits. A bin has an associated predicted (or "likely" or "most probable") value and an associated probability, known as a "context". When the actual bin to be coded matches the predicted value, a "most probable symbol" (MPS) is coded. Coding a most probable symbol is relatively inexpensive in terms of consumed bits. When the actual bin to be coded does not match the likely value, a "least probable symbol" (LPS) is coded. Coding a least probable symbol has a relatively high cost in terms of consumed bits. The bin coding techniques enable efficient coding of bins for which the probability of a "0" versus a "1" is skewed. For a syntax element with two possible values (i.e., a "flag"), a single bin is adequate. For syntax elements with many possible values, a sequence of bins is needed.

The presence of later bins in the sequence may be determined based on the values of earlier bins in the sequence. Additionally, each bin may be associated with more than one context. The selection of a particular context may depend on earlier bins in the syntax element, on the bin values of neighboring syntax elements (i.e., those from neighboring blocks), and the like. Each time a context-coded bin is encoded, the context that was selected for that bin (if any) is updated in a manner reflective of the new bin value. As such, the binary arithmetic coding scheme is said to be adaptive.

Also supported by the video encoder 114 are bins that lack a context ("bypass bins"). Bypass bins are coded assuming an equiprobable distribution between a "0" and a "1". Thus, each bypass bin occupies one bit in the bitstream 115. The absence of a context saves memory and reduces complexity, and thus bypass bins are used where the distribution of values for the particular bin is not skewed. One example of an entropy coder employing context and adaptation is known in the art as CABAC (context-adaptive binary arithmetic coder), and many variants of this coder have been employed in video coding.
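The difference in cost between context-coded and bypass bins can be illustrated with an idealized model. The sketch below is not the CABAC state machine: it assumes an ideal arithmetic coder whose cost for a bin is -log2 of the estimated probability of that bin value, and a simple exponential update of the estimate. The class `Context`, the function `code_bins`, and the adaptation rate are hypothetical.

```python
import math


class Context:
    """Idealized adaptive context: tracks the estimated probability of a '1'."""

    def __init__(self, p_one=0.5, rate=1 / 16):
        self.p_one = p_one
        self.rate = rate  # assumed adaptation speed, not a standardized value

    def cost_bits(self, bin_value):
        """Ideal cost in bits of coding `bin_value` with the current estimate."""
        p = self.p_one if bin_value == 1 else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_value):
        """Move the estimate toward the bin value just coded (adaptation)."""
        self.p_one += self.rate * (bin_value - self.p_one)


def code_bins(bins, ctx=None):
    """Return the idealized total cost in bits of coding a sequence of bins.

    With ctx=None every bin is a bypass bin and costs exactly one bit.
    """
    total = 0.0
    for b in bins:
        if ctx is None:
            total += 1.0
        else:
            total += ctx.cost_bits(b)
            ctx.update(b)
    return total
```

Under this model a skewed sequence, such as a long run of identical bins, costs fewer bits with a context than with bypass coding, while an unskewed sequence gains nothing from the context, which is consistent with reserving bypass bins for unskewed distributions.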

The entropy encoder 338 encodes the intra prediction mode 388 using a combination of context-coded and bypass-coded bins. Typically, a list of "most probable modes" is generated in the video encoder 114. The list of most probable modes is typically of a fixed length, such as three or six modes, and may include modes encountered in earlier blocks. A context-coded bin encodes a flag indicating whether the intra prediction mode is one of the most probable modes. If the intra prediction mode 388 is one of the most probable modes, further signalling, using bypass-coded bins, is encoded. The further signalling indicates which of the most probable modes corresponds to the intra prediction mode 388, for example using a truncated unary bin string. Otherwise, the intra prediction mode 388 is encoded as a "remaining mode". Encoding as a remaining mode uses an alternative syntax, such as a fixed-length code, also coded using bypass-coded bins, to express intra prediction modes other than those present in the most probable mode list.
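The signalling just described can be sketched as a binarization routine. This is an illustration under stated assumptions: the bin polarity (a flag of 1 for a most probable mode, a run of 1s terminated by a 0 for the truncated unary string), the fixed-length width, and the names `truncated_unary` and `binarise_intra_mode` are all hypothetical choices, and the total number of modes is left as a parameter.

```python
def truncated_unary(value, max_value):
    """`value` ones followed by a terminating zero, omitted when value == max_value."""
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins


def binarise_intra_mode(mode, mpm_list, num_modes):
    """Return the bin string for `mode` given a most-probable-mode list.

    The first bin is the (context-coded) flag; the remaining bins would be
    bypass coded in the scheme described above.
    """
    if mode in mpm_list:
        idx = mpm_list.index(mode)
        return [1] + truncated_unary(idx, len(mpm_list) - 1)
    # Remaining mode: index among the modes not in the list, fixed-length coded.
    remaining = [m for m in range(num_modes) if m not in mpm_list]
    idx = remaining.index(mode)
    width = max(1, (len(remaining) - 1).bit_length())
    return [0] + [(idx >> s) & 1 for s in range(width - 1, -1, -1)]
```

For example, with the list `[0, 1, 10]` and eleven modes in total, mode 1 becomes `[1, 1, 0]`, whereas mode 5 (not in the list) becomes the flag 0 followed by a three-bit index.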

The multiplexer module 384 outputs the PB 320 according to the determined best intra prediction mode 388, selecting from the tested prediction modes of each candidate CB. The candidate prediction modes need not include every conceivable prediction mode supported by the video encoder 114.

Prediction modes fall broadly into two categories. The first category is "intra-frame prediction" (also referred to as "intra prediction"). In intra-frame prediction, a prediction for a block is generated, and the generation method may use other samples obtained from the current frame. For intra-predicted PBs, different intra prediction modes can be used for luma and chroma, and thus intra prediction is described primarily in terms of operation upon PBs.

The second category of prediction modes is "inter-frame prediction" (also referred to as "inter prediction"). In inter-frame prediction, a prediction for a block is produced using samples from one or two frames preceding the current frame in the order of coding frames in the bitstream. Moreover, for inter-frame prediction, a single coding tree is typically used for both the luma channel and the chroma channels. The order of coding frames in the bitstream may differ from the order of the frames when captured or displayed. When one frame is used for prediction, the block is said to be "uni-predicted" and has one associated motion vector. When two frames are used for prediction, the block is said to be "bi-predicted" and has two associated motion vectors. For a P slice, each CU may be intra-predicted or uni-predicted. For a B slice, each CU may be intra-predicted, uni-predicted, or bi-predicted. Frames are typically coded using a "group of pictures" structure, enabling a temporal hierarchy of frames. A temporal hierarchy of frames allows a frame to reference a preceding and a subsequent picture in the order of displaying the frames. The pictures are coded in the order necessary to ensure that the dependencies for decoding each frame are met.

A subcategory of inter prediction is referred to as "skip mode". Inter prediction and skip mode are described as two distinct modes. However, both inter prediction mode and skip mode involve motion vectors referencing blocks of samples from preceding frames. Inter prediction involves a coded motion vector delta, specifying a motion vector relative to a motion vector predictor. The motion vector predictor is obtained from a list of one or more candidate motion vectors, selected with a "merge index". The coded motion vector delta provides a spatial offset to the selected motion vector prediction. Inter prediction also uses a coded residual in the bitstream 133. Skip mode uses only an index (also named a "merge index") to select one out of several motion vector candidates. The selected candidate is used without any further signalling. Also, skip mode does not support coding of any residual coefficients. The absence of coded residual coefficients when skip mode is used means that there is no need to perform transforms for skip mode. Therefore, skip mode does not typically result in pipeline processing issues. Pipeline processing issues may arise for intra-predicted CUs and inter-predicted CUs. Owing to the limited signalling of skip mode, skip mode is useful for achieving very high compression performance when relatively high-quality reference frames are available. Bi-predicted CUs in the higher temporal layers of a random-access group-of-pictures structure typically have high-quality reference pictures and motion vector candidates that accurately reflect the underlying motion.

The samples are selected according to a motion vector and a reference picture index. The motion vector and reference picture index apply to all color channels, and thus inter prediction is described primarily in terms of operation upon PUs rather than PBs. Within each category (i.e., intra-frame and inter-frame prediction), different techniques may be applied to generate the PU. For example, intra prediction may use values from adjacent rows and columns of previously reconstructed samples, in combination with a direction, to generate a PU according to a prescribed filtering and generation process. Alternatively, the PU may be described using a small number of parameters. Inter prediction methods may vary in the number of motion parameters and their precision. Motion parameters typically comprise a reference frame index, indicating which reference frame from a list of reference frames is to be used, plus a spatial translation for each of the reference frames, but may include more frames, special frames, or complex affine parameters such as scaling and rotation. In addition, a predetermined motion refinement process may be applied to generate dense motion estimates based on the referenced sample blocks.

Having determined and selected the PB 320, and having subtracted the PB 320 from the original sample block at the subtractor 322, a residual with the lowest coding cost, represented as 324, is obtained and subjected to lossy compression. The lossy compression process comprises the steps of transformation, quantization, and entropy coding. A forward primary transform module 326 applies a forward transform to the difference 324, converting the difference 324 from the spatial domain to the frequency domain and producing primary transform coefficients represented by an arrow 328. The primary transform coefficients 328 are passed to a forward secondary transform module 330, which produces transform coefficients represented by an arrow 332 by performing a non-separable secondary transform (NSST) operation. The forward primary transform is typically separable, transforming a set of rows and then a set of columns of each block, typically using a DCT-2, although a DST-7 and a DCT-8 may also be available, for example horizontally for block widths not exceeding 16 samples and vertically for block heights not exceeding 16 samples. The transformation of each set of rows and columns is performed by first applying one-dimensional transforms to each row of the block to produce a partial result, and then applying one-dimensional transforms to each column of the partial result to produce a final result. The forward secondary transform is generally a non-separable transform, which is applied only to the residual of intra-predicted CUs and may nonetheless be bypassed. The forward secondary transform operates either on 16 samples (arranged as the upper-left 4×4 sub-block of the primary transform coefficients 328) or on 64 samples (arranged as the upper-left 8×8 coefficients, i.e., four 4×4 sub-blocks, of the primary transform coefficients 328). Moreover, the matrix coefficients of the forward secondary transform are selected from multiple sets according to the intra prediction mode of the CU, such that two sets of coefficients are available for use. The use of one of the sets of matrix coefficients, or the bypassing of the forward secondary transform, is signalled with an "nsst_index" syntax element, coded using a truncated unary binarisation to express the values zero (secondary transform not applied), one (first set of matrix coefficients selected), or two (second set of matrix coefficients selected).
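The row-then-column structure of the separable forward transform can be sketched with a floating-point, orthonormal DCT-2. This is an illustrative assumption: practical codecs use scaled integer approximations of the transform, and the names `dct2_1d` and `separable_forward` are hypothetical.

```python
import math


def dct2_1d(v):
    """Orthonormal one-dimensional DCT-2 of a list of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def separable_forward(block):
    """Transform every row (partial result), then every column of that result."""
    height, width = len(block), len(block[0])
    rows = [dct2_1d(r) for r in block]  # horizontal pass
    cols = [dct2_1d([rows[y][x] for y in range(height)])  # vertical pass
            for x in range(width)]
    # cols[x][y] is the coefficient at column x, row y; transpose back.
    return [[cols[x][y] for x in range(width)] for y in range(height)]
```

A constant 4×4 residual, for example, concentrates all of its energy in the single top-left (DC) coefficient, which is the behavior that makes the frequency-domain representation cheaper to code for correlated residuals.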

The transform coefficients 332 are passed to a quantizer module 334. At the module 334, quantization in accordance with a "quantization parameter" is performed to produce residual coefficients, represented by an arrow 336. The quantization parameter is constant for a given TB and thus results in a uniform scaling for the production of the residual coefficients for the TB. A non-uniform scaling is also possible by application of a "quantization matrix", whereby the scaling factor applied to each residual coefficient is derived from a combination of the quantization parameter and the corresponding entry in a scaling matrix, typically having a size equal to that of the TB. The residual coefficients 336 are supplied to the entropy encoder 338 for encoding in the bitstream 115. Typically, the residual coefficients of each TB of the TU having at least one significant residual coefficient are scanned according to a scan pattern to produce an ordered list of values. The scan pattern generally scans the TB as a sequence of 4×4 "sub-blocks", providing a regular scanning operation at the granularity of 4×4 sets of residual coefficients, with the arrangement of the sub-blocks dependent on the size of the TB. Additionally, the prediction mode 388 and the corresponding block partitioning are also encoded in the bitstream 115.
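The difference between uniform scaling and scaling with a quantization matrix can be sketched as below. Several details are assumptions made for illustration and are not stated in the description: the step size is taken to double every six quantization parameter values, in the manner of the well-known HEVC-style relation 2^((QP - 4) / 6); a matrix entry of 16 is treated as neutral; and rounding is to the nearest level. The name `quantise` is hypothetical.

```python
def quantise(coeffs, qp, matrix=None):
    """Quantize a 2-D list of transform coefficients to integer levels.

    Without `matrix`, every coefficient of the block shares one step size
    (uniform scaling). With `matrix`, the step for each position is further
    weighted by the corresponding entry (non-uniform scaling).
    """
    step = 2.0 ** ((qp - 4) / 6.0)  # assumed QP-to-step relation
    levels = []
    for y, row in enumerate(coeffs):
        out = []
        for x, c in enumerate(row):
            weight = 1.0 if matrix is None else matrix[y][x] / 16.0
            magnitude = int(abs(c) / (step * weight) + 0.5)  # round to nearest
            out.append(magnitude if c >= 0 else -magnitude)
        levels.append(out)
    return levels
```

A matrix with larger entries toward the high-frequency positions quantizes those coefficients more coarsely while leaving the low-frequency coefficients at the step size implied by the quantization parameter alone.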

As described above, the video encoder 114 needs access to a frame representation corresponding to the frame representation seen in the video decoder 134. Thus, the residual coefficients 336 are also inverse quantized by a dequantizer module 340 to produce inverse transform coefficients, represented by an arrow 342. The inverse transform coefficients 342 are passed through an inverse secondary transform module 344 to produce intermediate inverse transform coefficients, represented by an arrow 346. The intermediate inverse transform coefficients 346 are passed to an inverse primary transform module 348 to produce residual samples of the TU, represented by an arrow 350. The type of inverse transform performed by the inverse secondary transform module 344 corresponds to the type of forward transform performed by the forward secondary transform module 330. The type of inverse transform performed by the inverse primary transform module 348 corresponds to the type of primary transform performed by the primary transform module 326. A summation module 352 adds the residual samples 350 and the PU 320 to produce reconstructed samples (indicated by an arrow 354) of the CU.

The reconstructed samples 354 are passed to a reference sample cache 356 and an in-loop filters module 368. The reference sample cache 356, typically implemented using static RAM on an ASIC (thus avoiding costly off-chip memory access), provides the minimal sample storage needed to satisfy the dependencies for generating intra-frame PBs for subsequent CUs in the frame. The minimal dependencies typically include a "line buffer" of samples along the bottom of a row of CTUs, for use by the next row of CTUs, and column buffering, the extent of which is set by the height of the CTU. The reference sample cache 356 supplies reference samples (represented by an arrow 358) to a reference sample filter 360. The sample filter 360 applies a smoothing operation to produce filtered reference samples (indicated by an arrow 362). The filtered reference samples 362 are used by an intra-frame prediction module 364 to produce an intra-predicted block of samples, represented by an arrow 366. For each candidate intra prediction mode, the intra-frame prediction module 364 produces a block of samples, that is, 366.

The in-loop filters module 368 applies several filtering stages to the reconstructed samples 354. The filtering stages include a "deblocking filter" (DBF), which applies smoothing aligned to the CU boundaries to reduce artefacts resulting from discontinuities. Another filtering stage present in the in-loop filters module 368 is an "adaptive loop filter" (ALF), which applies a Wiener-based adaptive filter to further reduce distortion. A further available filtering stage in the in-loop filters module 368 is a "sample adaptive offset" (SAO) filter. The SAO filter operates by first classifying the reconstructed samples into one or multiple categories and then applying an offset at the sample level according to the allocated category.
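The classify-then-offset behavior of the SAO filter can be sketched with a band-style classification. The choice of classification is an assumption for illustration: samples are assigned to one of 32 equal-width intensity bands and the result is clipped to the valid sample range; the description itself does not specify the categories. The name `sao_band_offset` and the dictionary of offsets are hypothetical.

```python
def sao_band_offset(samples, offsets, bit_depth=8):
    """Classify each sample into one of 32 bands and add that band's offset.

    `offsets` maps a band index to a signed offset; bands that are absent
    from the mapping leave their samples unchanged.
    """
    shift = bit_depth - 5  # 32 bands across the sample range
    max_val = (1 << bit_depth) - 1
    return [min(max_val, max(0, s + offsets.get(s >> shift, 0)))
            for s in samples]
```

For 8-bit samples, `sao_band_offset([0, 7, 8, 255], {0: 2, 1: -1, 31: 3})` raises the two samples in band 0, lowers the sample in band 1, and leaves 255 at the clipping limit.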

Filtered samples, represented by an arrow 370, are output from the in-loop filters module 368. The filtered samples 370 are stored in a frame buffer 372. The frame buffer 372 typically has the capacity to store several (e.g., up to 16) pictures and is thus stored in the memory 206. The frame buffer 372 is not typically stored using on-chip memory because of the large memory consumption required. As such, access to the frame buffer 372 is costly in terms of memory bandwidth. The frame buffer 372 provides reference frames (represented by an arrow 374) to a motion estimation module 376 and a motion compensation module 380.

The motion estimation module 376 estimates a number of "motion vectors" (indicated as 378), each being a Cartesian spatial offset from the location of the present CB and referencing a block in one of the reference frames in the frame buffer 372. A filtered block of reference samples (represented as 382) is produced for each motion vector. The filtered reference samples 382 form further candidate modes available for potential selection by the mode selector 386. Moreover, for a given CU, the PU 320 may be formed using one reference block ("uni-predicted") or using two reference blocks ("bi-predicted"). For the selected motion vector, the motion compensation module 380 produces the PB 320 in accordance with a filtering process supportive of sub-pixel accuracy in the motion vectors. As such, the motion estimation module 376 (which operates on many candidate motion vectors) may perform a simplified filtering process compared to that of the motion compensation module 380 (which operates on the selected candidate only) to achieve reduced computational complexity.
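The formation of a prediction from one or two reference blocks can be sketched as follows. The sketch is deliberately simplified and rests on assumptions: motion vectors are whole-sample offsets (the sub-pixel interpolation filtering mentioned above is omitted), the referenced blocks lie inside the frame, and bi-prediction is a rounded average of the two blocks. The names `fetch_block` and `predict` are hypothetical.

```python
def fetch_block(frame, x, y, mv, w, h):
    """Read the w x h block displaced by the integer motion vector `mv` = (dx, dy)."""
    dx, dy = mv
    return [[frame[y + dy + j][x + dx + i] for i in range(w)] for j in range(h)]


def predict(frames, x, y, w, h, motions):
    """Form a prediction for the block at (x, y).

    `motions` holds one (reference_index, motion_vector) pair for
    uni-prediction or two pairs for bi-prediction.
    """
    blocks = [fetch_block(frames[ref], x, y, mv, w, h) for ref, mv in motions]
    if len(blocks) == 1:
        return blocks[0]
    first, second = blocks
    # Rounded average of the two reference blocks.
    return [[(p + q + 1) >> 1 for p, q in zip(rp, rq)]
            for rp, rq in zip(first, second)]
```

Each entry in `motions` corresponds to one motion vector associated with the block, so a uni-predicted block carries one entry and a bi-predicted block carries two.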

Although the video encoder 114 of FIG. 3 is described with reference to versatile video coding (VVC), other video coding standards or implementations may also employ the processing stages of the modules 310 to 386. The frame data 113 (and the bitstream 115) may also be read from (or written to) the memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray Disc™, or another computer-readable storage medium. Additionally, the frame data 113 (and the bitstream 115) may be received from (or transmitted to) an external source, such as a server connected to the communications network 220 or a radio-frequency receiver.

The video decoder 134 is shown in FIG. 4. Although the video decoder 134 of FIG. 4 is an example of a versatile video coding (VVC) video decoding pipeline, other video codecs may also be used to perform the processing stages described herein. As shown in FIG. 4, the bitstream 133 is input to the video decoder 134. The bitstream 133 may be read from the memory 206, the hard disk drive 210, a CD-ROM, a Blu-ray Disc™, or another non-transitory computer-readable storage medium. Alternatively, the bitstream 133 may be received from an external source, such as a server connected to the communications network 220 or a radio-frequency receiver. The bitstream 133 contains encoded syntax elements representing the captured frame data to be decoded.

The bitstream 133 is input to an entropy decoder module 420. The entropy decoder module 420 extracts syntax elements from the bitstream 133 by decoding a sequence of "bins" and passes the values of the syntax elements to other modules in the video decoder 134. The entropy decoder module 420 uses an arithmetic decoding engine to decode each syntax element as a sequence of one or more bins. Each bin may use one or more "contexts", with a context describing the probability levels used for coding a "1" and a "0" value for the bin. Where multiple contexts are available for a given bin, a "context modelling" or "context selection" step is performed to choose one of the available contexts for decoding the bin. The process of decoding bins forms a sequential feedback loop. The number of operations in the feedback loop is preferably minimized to enable the entropy decoder 420 to achieve a high throughput in bins per second. Context modelling depends on other properties of the bitstream known to the video decoder 134 at the time of selecting the context, that is, properties preceding the current bin. For example, a context may be selected based on the quadtree depth of the current CU in the coding tree. Dependencies are preferably based on properties that are known well in advance of decoding a bin, or that are determined without requiring long sequential processes. The quadtree depth of a coding tree is an example of a dependency for context modelling that is easily known. An intra prediction mode is an example of a dependency for context modelling that is relatively difficult or computationally intensive to determine. Intra prediction modes are coded as either an index into a list of "most probable modes" (MPMs) or an index into a list of "remaining modes", with the selection between MPMs and remaining modes made according to a decoded "intra_luma_mpm_flag". When an MPM is in use, an "intra_luma_mpm_idx" syntax element is decoded to select which one of the most probable modes is to be used. Generally, there are six MPMs. When a remaining mode is in use, an "intra_luma_remainder" syntax element is decoded to select which one of the remaining (non-MPM) modes is to be used. Determining both the most probable modes and the remaining modes requires a substantial number of operations and includes dependencies on the intra prediction modes of neighboring blocks. For example, the neighboring block may be the block above and to the left of the current block. Desirably, the contexts of the bins of each CU can be determined, enabling parsing by the arithmetic coding engine, without knowing the intra prediction mode being signalled. The feedback loop present in the arithmetic coding engine for sequential bin decoding thus avoids a dependency on the intra prediction mode. Owing to the dependency of the MPM list construction on the intra prediction modes of neighboring blocks, the intra prediction mode determination can be deferred to a subsequent processing stage with a separate feedback loop. Accordingly, the arithmetic decoding engine of the entropy decoder module 420 is able to parse intra_luma_mpm_flag, intra_luma_mpm_idx, and intra_luma_remainder without needing to know the intra prediction modes of any earlier (e.g., neighboring) block. The entropy decoder module 420 applies an arithmetic coding algorithm, for example "context-adaptive binary arithmetic coding" (CABAC), to decode syntax elements from the bitstream 133. The decoded syntax elements are used to reconstruct parameters within the video decoder 134. The parameters include residual coefficients (represented by an arrow 424) and mode selection information, such as an intra prediction mode (represented by an arrow 458). The mode selection information also includes information such as motion vectors and the partitioning of each CTU into one or more CBs. The parameters are used to generate PBs, typically in combination with sample data from previously decoded CBs.
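The separation between parsing and intra prediction mode determination can be sketched as two stages. The sketch makes several assumptions for illustration: `read_bin` stands in for the arithmetic decoding engine and simply returns the next decoded bin; intra_luma_mpm_idx is taken to be a truncated unary string and intra_luma_remainder a fixed-length code of `remainder_bits` bins; and the remaining modes are taken in ascending order. The function names and the numeric defaults are hypothetical.

```python
def parse_intra_syntax(read_bin, num_mpm=6, remainder_bits=6):
    """Stage 1: parse the syntax elements only.

    No neighboring intra prediction mode is consulted here, so the bin
    decoding loop has no dependency on MPM list construction.
    """
    if read_bin():  # intra_luma_mpm_flag
        idx = 0
        while idx < num_mpm - 1 and read_bin():  # intra_luma_mpm_idx
            idx += 1
        return ("mpm", idx)
    value = 0
    for _ in range(remainder_bits):  # intra_luma_remainder
        value = (value << 1) | read_bin()
    return ("remainder", value)


def resolve_intra_mode(parsed, mpm_list, num_modes):
    """Stage 2 (deferred): map the parsed index to an intra prediction mode.

    `mpm_list` is built from neighboring blocks' modes in a later stage.
    """
    kind, idx = parsed
    if kind == "mpm":
        return mpm_list[idx]
    remaining = [m for m in range(num_modes) if m not in mpm_list]
    return remaining[idx]
```

Because the first stage returns only an index, the list of most probable modes can be constructed later, in a separate loop, once the modes of the neighboring blocks are available.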

The residual coefficients 424 are input to an inverse quantization module 428. The inverse quantization module 428 performs inverse quantization (or "scaling") on the residual coefficients 424 to produce reconstructed intermediate transform coefficients, represented by an arrow 432, according to a quantization parameter. The reconstructed intermediate transform coefficients 432 are passed to an inverse secondary transform module 436, where a secondary transform is either applied or not applied (bypassed) according to a decoded "nsst_index" syntax element. The "nsst_index" is decoded from the bitstream 133 by the entropy decoder 420, under execution of the processor 205. As described with reference to FIG. 3, the "nsst_index" is decoded from the bitstream 133 as a truncated unary syntax element with values of zero to two. The inverse secondary transform module 436 produces reconstructed transform coefficients 440. Where use of a non-uniform inverse quantization matrix is indicated in the bitstream 133, the video decoder 134 reads the quantization matrix from the bitstream 133 as a sequence of scaling factors and arranges the scaling factors into a matrix. The inverse scaling uses the quantization matrix in combination with the quantization parameter to produce the reconstructed intermediate transform coefficients 432.

The reconstructed transform coefficients 440 are passed to an inverse primary transform module 444. The module 444 transforms the coefficients from the frequency domain back to the spatial domain. The TB is effectively based on significant and non-significant residual coefficient values. The result of operation of the module 444 is a block of residual samples, represented by an arrow 448. The residual samples 448 are equal in size to the corresponding CU. The residual samples 448 are supplied to a summation module 450. At the summation module 450, the residual samples 448 are added to a decoded PB (represented as 452) to produce a block of reconstructed samples, represented by an arrow 456. The reconstructed samples 456 are supplied to a reconstructed sample cache 460 and an in-loop filtering module 488. The in-loop filtering module 488 produces reconstructed blocks of frame samples, represented as 492. The frame samples 492 are written to a frame buffer 496.

The reconstructed sample cache 460 operates similarly to the reconstructed sample cache 356 of the video encoder 114. The reconstructed sample cache 460 provides storage for the reconstructed samples needed to intra predict subsequent CBs without accessing the memory 206 (e.g., by instead using the data 232, which is typically on-chip memory). Reference samples, represented by an arrow 464, are obtained from the reconstructed sample cache 460 and supplied to a reference sample filter 468 to produce filtered reference samples, indicated by an arrow 472. The filtered reference samples 472 are supplied to an intra-frame prediction module 476. The module 476 produces a block of intra-predicted samples, represented by an arrow 480, in accordance with the intra prediction mode parameter 458 signaled in the bitstream 133 and decoded by the entropy decoder 420.

When the prediction mode of a CB is indicated in the bitstream 133 to be intra prediction, the intra-predicted samples 480 form the decoded PB 452 via a multiplexer module 484. Intra prediction produces a prediction block (PB) of samples, i.e., a block in one color component derived using "neighboring samples" in the same color component. The neighboring samples are samples adjacent to the current block which, by virtue of preceding the current block in the block decoding order, have already been reconstructed. Where luma and chroma blocks are collocated, the luma and chroma blocks may use different intra prediction modes. However, the two chroma channels each share the same intra prediction mode. Intra prediction falls into three types. "DC intra prediction" involves populating a PB with a single value representing the average of the neighboring samples. "Planar intra prediction" involves populating a PB with samples according to a plane, with a DC offset and vertical and horizontal gradients derived from the neighboring samples.
"Angular intra prediction" involves populating a PB with neighboring samples that are filtered and propagated across the PB in a particular direction (or "angle"). In VVC, 65 angles are supported, with rectangular blocks able to use additional angles not available to square blocks, giving a total of 87 angles. A fourth type of intra prediction is available to chroma PBs, whereby the PB is generated from collocated luma reconstructed samples according to a "cross-component linear model" (CCLM) mode. Three different CCLM modes are available, each of which uses a different model derived from the neighboring luma and chroma samples. The derived model is then used to generate a block of samples for the chroma PB from the collocated luma samples.
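As an illustration of the simplest of these types, DC intra prediction, the following minimal sketch fills a PB with the average of the neighboring samples. The function name dc_intra_predict, the choice of the row above and the column to the left as the neighboring samples, and the round-to-nearest integer averaging are assumptions made for the example rather than details taken from the described arrangements.

```python
from typing import List, Sequence


def dc_intra_predict(above: Sequence[int], left: Sequence[int],
                     width: int, height: int) -> List[List[int]]:
    """Populate a width x height PB with one value: the mean of the neighbors.

    above: reconstructed samples in the row above the block (assumed input).
    left:  reconstructed samples in the column left of the block (assumed input).
    """
    neighbors = list(above[:width]) + list(left[:height])
    # Integer mean with rounding to nearest (an assumption of this sketch).
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]
```

Planar and angular prediction differ only in how each sample position is derived from the same neighboring samples, so they would replace the single value dc with a per-position computation.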

When the prediction mode of a CB is indicated in the bitstream 133 to be inter prediction, a motion compensation module 434 produces a block of inter-predicted samples, represented as 438, using a motion vector and a reference frame index to select and filter a block of samples 498 from the frame buffer 496. The block of samples 498 is obtained from a previously decoded frame stored in the frame buffer 496. For bi-prediction, two blocks of samples are produced and blended together to produce samples for the decoded PB 452. The frame buffer 496 is populated with the filtered block data 492 from the in-loop filtering module 488. As with the in-loop filtering module 368 of the video encoder 114, the in-loop filtering module 488 applies any, some, or all of the DBF, ALF and SAO filtering operations. Generally, the motion vector is applied to both the luma and chroma channels, although the filtering processes for sub-sample interpolation differ between the luma channel and the chroma channels.
When a split in the coding tree results in a collection of relatively small luma blocks and the corresponding chroma region is not divided into correspondingly small chroma blocks, the blocks are encoded and decoded as described with reference to FIGS. 13 and 14, respectively. In particular, if any of the small luma blocks is predicted using inter prediction, the inter prediction operation is performed only for the luma CB and not for any portion of the corresponding chroma CB. The in-loop filtering module 488 produces the filtered block data 492 from the reconstructed samples 456.

FIG. 5 is a schematic block diagram showing a set 500 of available divisions or splits of a region into one or more sub-regions in the tree structure of versatile video coding. The divisions shown in the set 500 are available to the block partitioner 310 of the encoder 114 to divide each CTU into one or more CUs or CBs according to a coding tree, as determined by Lagrangian optimization, as described with reference to FIG. 3.

Although the set 500 shows only square regions being divided into other, possibly non-square, sub-regions, it should be understood that the set 500 shows the potential divisions and does not require the containing region to be square. If the containing region is non-square, the dimensions of the blocks resulting from the division are scaled according to the aspect ratio of the containing block. Once a region is not further split, that is, at a leaf node of the coding tree, a CU occupies that region. The particular subdivision of a CTU into one or more CUs by the block partitioner 310 is referred to as the "coding tree" of the CTU.

The process of subdividing regions into sub-regions must terminate when the resulting sub-regions reach a minimum CU size. In addition to constraining CUs to prohibit block areas smaller than a predetermined minimum size, e.g., 16 samples, CUs are constrained to have a minimum width or height of four. Other minimums, both in terms of width and height or in terms of width or height, are also possible. The process of subdivision may also terminate prior to the deepest level of decomposition, resulting in CUs larger than the minimum CU size. It is possible for no splitting to occur, resulting in a single CU occupying the entirety of the CTU. A single CU occupying the entirety of the CTU is the largest available coding unit size. Moreover, a CU resulting from no split is larger than the processing region size. As a result of binary or ternary splitting at the highest level of the coding tree, CU sizes such as 64×128, 128×64, 32×128 and 128×32 are possible, each of which is also larger than the processing region size. Examples of CUs larger than the processing region size are described further with reference to FIGS. 10A to 10F. Owing to the use of subsampled chroma formats, such as 4:2:0, arrangements of the video encoder 114 and the video decoder 134 may terminate splitting of regions in the chroma channels earlier than in the luma channel.

At the leaf nodes of the coding tree exist CUs, with no further subdivision. For example, a leaf node 510 contains one CU. At the non-leaf nodes of the coding tree exists a split into two or more further nodes, each of which may be a leaf node, and thus contain one CU, or may contain further splits into smaller regions. At each leaf node of the coding tree, one coding block exists for each color channel. Splitting that terminates at the same depth for both luma and chroma results in three collocated CBs. Splitting that terminates at a deeper depth for luma than for chroma results in a plurality of luma CBs being collocated with the CBs of the chroma channels.

A quadtree split 512 divides the containing region into four equally sized regions, as shown in FIG. 5. Compared to HEVC, versatile video coding (VVC) achieves additional flexibility with the addition of a horizontal binary split 514 and a vertical binary split 516. Each of the splits 514 and 516 divides the containing region into two equally sized regions. The division is along either a horizontal boundary (514) or a vertical boundary (516) within the containing block.

Further flexibility is achieved in versatile video coding with the addition of a horizontal ternary split 518 and a vertical ternary split 520. The ternary splits 518 and 520 divide the block into three regions, bounded either horizontally (518) or vertically (520) along 1/4 and 3/4 of the width or height of the containing region. The combination of the quadtree, binary tree and ternary tree is referred to as "QTBTTT". The root of the tree includes zero or more quadtree splits (the "QT" section of the tree). Once the QT section terminates, zero or more binary or ternary splits may occur (the "multi-tree" or "MT" section of the tree), finally ending in CBs or CUs at the leaf nodes of the tree. Where the tree describes all color channels, the tree leaf nodes are CUs. Where the tree describes the luma channel or the chroma channels, the tree leaf nodes are CBs.
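The geometry of the five splits can be sketched as follows. This is a minimal illustration only: the split labels "QT", "HBT", "VBT", "HTT" and "VTT", the (x, y, width, height) region tuple and the function name split_region are assumptions introduced for the example, and a horizontal split is taken to mean a division along horizontal boundaries, so that the sub-regions are stacked top to bottom.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height), in samples


def split_region(x: int, y: int, w: int, h: int, split: str) -> List[Region]:
    """Return the sub-regions produced by one split of the containing region."""
    if split == "QT":   # quadtree split 512: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split == "HBT":  # horizontal binary split 514: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split == "VBT":  # vertical binary split 516: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split == "HTT":  # horizontal ternary split 518: boundaries at 1/4 and 3/4 of height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if split == "VTT":  # vertical ternary split 520: boundaries at 1/4 and 3/4 of width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split: {split}")
```

For instance, under these assumptions a horizontal ternary split of an 8×16 region gives sub-regions of 8×4, 8×8 and 8×4.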

Compared to HEVC, which supports only the quadtree and thus only square blocks, the QTBTTT results in many more possible CU sizes, particularly considering the possible recursive application of binary tree and/or ternary tree splits. The potential for unusual (non-square) block sizes can be reduced by constraining the split options to eliminate splits that would result in a block width or height either being less than four samples or not being a multiple of four samples. Generally, the constraint applies when considering luma samples. However, in the arrangements described, the constraint can be applied separately to the blocks for the chroma channels. Application of the constraint to the split options for the chroma channels can result in differing minimum block sizes for luma versus chroma, for example when the frame data is in the 4:2:0 chroma format or the 4:2:2 chroma format. Each split produces sub-regions with a side dimension that is unchanged, halved or quartered with respect to the containing region. Then, since the CTU size is a power of two, the side dimensions of all CUs are also powers of two.
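The observation that all CU side dimensions are powers of two can be checked with a short sketch. The sketch repeatedly halves or quarters a side length, discarding results below four samples or not a multiple of four samples; the function name reachable_side_lengths and its parameters are hypothetical, and the sketch considers a single side dimension in isolation rather than the full set of split rules.

```python
from typing import List


def reachable_side_lengths(ctu_side: int, min_side: int = 4,
                           multiple_of: int = 4) -> List[int]:
    """List the side lengths reachable from a CTU side by halving or quartering."""
    reachable = {ctu_side}
    frontier = [ctu_side]
    while frontier:
        side = frontier.pop()
        # A split leaves a side unchanged, halves it, or quarters it.
        for divisor in (2, 4):
            child = side // divisor
            allowed = (side % divisor == 0 and child >= min_side
                       and child % multiple_of == 0)
            if allowed and child not in reachable:
                reachable.add(child)
                frontier.append(child)
    return sorted(reachable, reverse=True)
```

With a CTU side of 128 the sketch yields only 128, 64, 32, 16, 8 and 4, each a power of two.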

FIG. 6 is a schematic flow diagram illustrating a data flow 600 of a QTBTTT (or "coding tree") structure used in versatile video coding. The QTBTTT structure is used for each CTU to define a division of the CTU into one or more CUs. The QTBTTT structure of each CTU is determined by the block partitioner 310 in the video encoder 114 and encoded into the bitstream 115, or is decoded from the bitstream 133 by the entropy decoder 420 in the video decoder 134. The data flow 600 further characterizes the permissible combinations available to the block partitioner 310 for dividing a CTU into one or more CUs, according to the divisions shown in FIG. 5.

Starting from the top level of the hierarchy, that is, at the CTU, zero or more quadtree divisions are first performed. Specifically, a quadtree (QT) split decision 610 is made by the block partitioner 310. The decision at 610 returning a "1" symbol indicates a decision to split the current node into four sub-nodes according to the quadtree split 512. The result is the generation of four new nodes, such as at 620, and for each new node, recursion back to the QT split decision 610. Each new node is considered in raster (or Z-scan) order. Alternatively, if the QT split decision 610 indicates that no further split is to be performed (returns a "0" symbol), quadtree partitioning ceases and multi-tree (MT) splits are subsequently considered.

First, an MT split decision 612 is made by the block partitioner 310. At 612, a decision to perform an MT split is indicated. Returning a "0" symbol at the decision 612 indicates that no further splitting of the node into sub-nodes is to be performed. If no further splitting of a node is to be performed, the node is a leaf node of the coding tree and corresponds to a CU. The leaf node is output at 622. Alternatively, if the MT split decision 612 indicates a decision to perform an MT split (returns a "1" symbol), the block partitioner 310 proceeds to a direction decision 614.

The direction decision 614 indicates the direction of the MT split as either horizontal ("H" or "0") or vertical ("V" or "1"). The block partitioner 310 proceeds to a decision 616 if the decision 614 returns a "0", indicating the horizontal direction. The block partitioner 310 proceeds to a decision 618 if the decision 614 returns a "1", indicating the vertical direction.

At each of the decisions 616 and 618, the number of partitions for the MT split is indicated as either two (binary split or "BT" node) or three (ternary split or "TT" node) at the BT/TT split. That is, the BT/TT split decision 616 is made by the block partitioner 310 when the direction indicated at 614 is horizontal, and the BT/TT split decision 618 is made by the block partitioner 310 when the direction indicated at 614 is vertical.

The BT/TT split decision 616 indicates whether the horizontal split is the binary split 514, indicated by returning a "0", or the ternary split 518, indicated by returning a "1". When the BT/TT split decision 616 indicates a binary split, two nodes are generated by the block partitioner 310, according to the horizontal binary split 514, at an HBT CTU node generation step 625. When the BT/TT split decision 616 indicates a ternary split, three nodes are generated by the block partitioner 310, according to the horizontal ternary split 518, at an HTT CTU node generation step 626.

The BT/TT split decision 618 indicates whether the vertical split is the binary split 516, indicated by returning a "0", or the ternary split 520, indicated by returning a "1". When the BT/TT split decision 618 indicates a binary split, two nodes are generated by the block partitioner 310, according to the vertical binary split 516, at a VBT CTU node generation step 627. When the BT/TT split decision 618 indicates a ternary split, three nodes are generated by the block partitioner 310, according to the vertical ternary split 520, at a VTT CTU node generation step 628. For each node resulting from the steps 625 to 628, recursion of the data flow 600 back to the MT split decision 612 is applied, in a left-to-right or top-to-bottom order depending on the direction 614. As a consequence, binary tree and ternary tree splits may be applied to generate CUs having various sizes.
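The decisions 610 to 618 and the recursion of the data flow 600 can be sketched as a recursive routine that consumes a sequence of already decoded "0" and "1" symbols and emits the leaf regions in traversal order. The sketch is illustrative only: the function name decode_coding_tree and the (x, y, width, height) tuples are assumptions, and the sketch reads a symbol for every decision, ignoring the restrictions on allowed splits at each node.

```python
from typing import Iterator, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height), in samples


def decode_coding_tree(symbols: Iterator[int], x: int, y: int, w: int, h: int,
                       qt_allowed: bool = True) -> List[Region]:
    """Follow the QT and MT decisions and return the leaf regions in scan order."""
    leaves: List[Region] = []
    # QT split decision 610: only considered while still in the QT section.
    if qt_allowed and next(symbols) == 1:
        hw, hh = w // 2, h // 2
        for cx, cy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
            leaves += decode_coding_tree(symbols, cx, cy, hw, hh, True)
        return leaves
    # MT split decision 612: "0" means this node is a leaf (output at 622).
    if next(symbols) == 0:
        return [(x, y, w, h)]
    vertical = next(symbols) == 1  # direction decision 614: 0 = H, 1 = V
    ternary = next(symbols) == 1   # BT/TT split decision 616 or 618: 0 = BT, 1 = TT
    parts = (1, 2, 1) if ternary else (2, 2)  # sub-region sizes in quarters
    children: List[Region] = []
    offset = 0
    for quarters in parts:
        if vertical:  # sub-regions laid out left to right
            size = w * quarters // 4
            children.append((x + offset, y, size, h))
        else:         # sub-regions laid out top to bottom
            size = h * quarters // 4
            children.append((x, y + offset, w, size))
        offset += size
    # Recursion returns to the MT split decision 612, never to the QT decision.
    for cx, cy, cw, ch in children:
        leaves += decode_coding_tree(symbols, cx, cy, cw, ch, False)
    return leaves
```

Passing qt_allowed=False on the MT recursion reflects that, once the QT section of the tree has terminated, only binary and ternary splits are considered.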

The sets of allowed and disallowed splits at each node of the coding tree are described further with reference to FIG. 9.

FIGS. 7A and 7B provide an example division 700 of a CTU 710 into a number of CUs or CBs. An example CU 712 is shown in FIG. 7A. FIG. 7A shows the spatial arrangement of CUs in the CTU 710. The example division 700 is also shown as a coding tree 720 in FIG. 7B.

At each non-leaf node in the CTU 710 of FIG. 7A, for example the nodes 714, 716 and 718, the contained nodes (which may be further divided or may be CUs) are scanned or traversed in a "Z-order" to create lists of nodes, represented as columns in the coding tree 720. For a quadtree split, the Z-order scanning results in an order from top left to right followed by bottom left to right. For horizontal and vertical splits, the Z-order scanning (traversal) simplifies to a top-to-bottom scan and a left-to-right scan, respectively. The coding tree 720 of FIG. 7B lists all the nodes and CUs according to the applied scan order. Each split generates a list of two, three or four new nodes at the next level of the tree until a leaf node (CU) is reached.

Having decomposed the image into CTUs and further into CUs by the block partitioner 310, and using the CUs to generate each residual block (324) as described with reference to FIG. 3, the residual blocks are subject to forward transformation and quantization by the video encoder 114. The resulting TBs 336 are subsequently scanned to form a sequential list of residual coefficients, as part of the operation of the entropy coding module 338. An equivalent process is performed in the video decoder 134 to obtain TBs from the bitstream 133.

The example of FIGS. 7A and 7B describes a coding tree applicable to both the luma channel and the chroma channels. However, the example of FIGS. 7A and 7B also illustrates behavior in terms of traversal of a coding tree applicable to only the luma channel or a coding tree applicable to only the chroma channels. For coding trees with many nested splits, the split options available at deeper levels are constrained by restrictions on the available block sizes for the corresponding small regions. The restrictions on available block sizes for small regions are imposed to prevent a worst case of block processing rate so high as to impose an unreasonable burden on implementations. In particular, a constraint that block sizes be a multiple of sixteen (16) samples in chroma enables implementations to process samples at a granularity of sixteen (16) samples. Constraining block sizes to multiples of sixteen samples is particularly relevant to the "intra reconstruction" feedback loop, that is, the path in the video decoder 134 of FIG. 4 involving the modules 450, 460, 468, 476 and 484, and an equivalent path in the video encoder 114. In particular, constraining the block size to a multiple of sixteen (16) samples assists in maintaining throughput in intra prediction mode.
For example, "single instruction multiple data" (SIMD) microprocessor architectures commonly operate on wide words that may contain sixteen samples. Also, hardware architectures may use wide buses, such as a bus with a width of sixteen samples, to transfer samples along the intra reconstruction feedback loop. Were a smaller block size used, for example four samples, the bus would be underutilized, for example with only one quarter of the bus width containing sample data. Although an underutilized bus could handle smaller blocks (i.e., fewer than sixteen samples), in worst-case scenarios, such as many or all blocks being of relatively small size, the underutilization could prevent real-time operation of the encoder (114) or the decoder (134). For inter prediction, each block depends on reference samples obtained from a frame buffer (such as the buffer 372 or 496). As the frame buffer is populated with reference samples when processing a preceding frame, there is no feedback dependency loop affecting the block-by-block operation for producing inter-predicted blocks.
In addition to the feedback dependency loop that relates to intra frame reconstruction, an additional, concurrent feedback loop exists that relates to determination of the intra prediction mode 458. The intra prediction mode 458 is determined by selecting a mode from a most probable mode list or by selecting a mode from a remaining mode list. Determination of the most probable mode list and the remaining mode list requires the intra prediction modes of neighboring blocks. When relatively small block sizes are used, the most probable mode list and the remaining mode list need to be determined more frequently, that is, at a frequency governed by the block size in samples and the sampling rate of the channel.

FIGS. 8A, 8B and 8C provide an example division of a CTU 800 (FIG. 8A) according to a coding tree 820 (FIG. 8B), with chroma splits terminated prior to luma splits and using a 4:2:0 chroma format. Where chroma splitting terminates, a pair of CBs is used, one for each chroma channel. For ease of illustration, the CTU 800 is of size 64×64 luma samples. The CTU 800 is equivalent to a CTU size of 128×128 with a coding tree that includes one additional quadtree split. A quadtree split is applied to an 8×8 luma region 814. The 8×8 luma region 814 is split into four 4×4 luma CBs; however, no splitting occurs in the chroma channels. Instead, a pair of chroma CBs of a predetermined minimum size (16 samples in the example described) is used, one corresponding to each chroma channel. The pair of chroma CBs is typically of a minimum size that corresponds to the minimum granularity for the number of samples that can desirably be processed concurrently. For example, many implementations of the video encoder 114 and the video decoder 134 operate on sets of 16 samples, for example owing to the use of a correspondingly wide internal bus in a hardware implementation. Further, each luma CB resulting from the split at least partially overlaps the pair of chroma CBs, and the collective luma CBs fully overlap the pair of chroma CBs. In the example of the region 814, a pair of 4×4 chroma CBs is generated.
FIG. 8C shows an example of how the resulting luma CBs and chroma CBs are related.

Referring again to FIG. 8A, a vertical binary split is applied to a 16×4 luma region 810. The 16×4 luma region 810 is split into two 8×4 luma CBs; however, no splitting occurs in the chroma channels, resulting in a pair of 8×2 chroma CBs. A vertical ternary split is applied to a 16×4 luma region 812. The 16×4 luma region 812 is split into 4×4, 8×4 and 4×4 luma CBs; however, no splitting occurs in the chroma channels, resulting in a pair of 8×2 chroma CBs. A horizontal ternary split is applied to an 8×16 luma region 816. The 8×16 luma region 816 is split into 8×4, 8×8 and 8×4 luma CBs; however, no splitting occurs in the chroma channels, resulting in a pair of 4×8 chroma CBs. Accordingly, the chroma CBs are at least 16 samples in area.

FIG. 8C shows a portion of the CTU 800 with the three colour planes shown in an "exploded" (or separated) manner to illustrate the different block structures in the different planes. A luma sample plane 850, a first chroma sample plane 852, and a second chroma sample plane 854 are shown. When the "YCbCr" colour space is in use, the luma sample plane 850 contains the Y samples of the image frame, the first chroma sample plane 852 contains the Cb samples of the image frame, and the second chroma sample plane 854 contains the Cr samples of the image frame. With the 4:2:0 chroma format, the first chroma sample plane 852 and the second chroma sample plane 854 have half the sample density of the luma sample plane 850 both horizontally and vertically. As a result, the dimensions of a chroma CB, in samples, are typically half the dimensions of the corresponding luma CB. That is, for the 4:2:0 chroma format, the width and height of a chroma CB are each half the width and height of the collocated luma CB. For the 4:2:2 chroma format, the width of a chroma CB is half the width of the collocated luma CB, and the height is the same as the height of the collocated luma CB. For clarity, only the parent split in the coding tree of the 8×16 luma region 816 is shown, and the split is shown only in the luma sample plane 850. Where chroma splitting terminates, a plurality of luma CBs is collocated with a pair of chroma CBs. For example, the coding tree of the CTU 800 includes a horizontal ternary split applied to the 8×16 luma region 816. The horizontal ternary split results in an 8×4 luma CB 860, an 8×8 luma CB 862, and an 8×4 luma CB 864, all present in the luma sample plane 850. Because the 8×16 luma region 816 corresponds to an area of 4×8 chroma samples in the chroma sample planes (852 and 854), the ternary split of the coding tree is not applied in the chroma sample planes (852 and 854). Accordingly, the area of 4×8 chroma samples forms a leaf node for chroma, resulting in a pair of chroma CBs, namely a chroma CB 866 for the first chroma sample plane 852 and a chroma CB 868 for the second chroma sample plane 854. In the example of the horizontal ternary split applied in the luma plane only, a minimum chroma CB size of 32 samples is achieved. The other example luma regions (810, 812, and 814) result in a minimum chroma CB size of 16 samples, corresponding to the minimum luma block size and the desired granularity of sample processing.

FIG. 9 shows a collection 900 of transform block sizes and associated scan patterns for the chroma channels that result from the use of a 4:2:0 chroma format. The collection 900 may also be used for the 4:2:2 chroma format. The described arrangements are suitable for use with image frames having a chroma format in which the chroma channels of the image frame are subsampled relative to the luma channel of the image frame, in particular the 4:2:0 and 4:2:2 formats. The collection 900 does not include all possible chroma transform block sizes. Only chroma transform blocks with a width of 16 or less or a height of 8 or less are shown in FIG. 9. Chroma blocks with greater width and height may occur but are not shown in FIG. 9 for ease of reference.

A set of prohibited transform sizes 910 includes the transform block sizes 2×2, 2×4, and 4×2, all of which have an area of less than 16 samples. In other words, in the example of FIG. 9, a minimum transform size of sixteen (16) chroma samples results from operation of the described arrangements, in particular for intra-predicted CBs. Instances of the prohibited transform sizes 910 are avoided by determining the split options as described with reference to FIG. 10. The residual coefficients of a transform are scanned in a two-layer approach in which the transform is divided into "sub-blocks" (or "coefficient groups"). Scanning takes place along a scan path from the last significant (non-zero) coefficient back towards the DC (top-left) coefficient. The scan path is defined as the progression within each sub-block (the "lower layer") and the progression from one sub-block to the next (the "upper layer"). In the collection 900, an 8×2 TB 920 uses an 8×2 sub-block, that is, a sub-block containing 16 residual coefficients. A 2×8 TB 922 uses a 2×8 sub-block, which also contains 16 residual coefficients.

TBs having a width or height of two and the other dimension a multiple of eight use multiple 2×8 or 8×2 sub-blocks. Accordingly, in some instances a chroma block having a width of two samples is coded by dividing the block into sub-blocks each of size 2×8 samples, and in some instances a chroma block having a height of two samples is coded by dividing the block into sub-blocks each of size 8×2 samples. For example, a 16×2 TB 916 has two 8×2 sub-blocks, each of which is scanned as shown for the TB 920. The scan progresses from one sub-block to the next as shown in a sub-block progression 917.

A 2×32 TB (not shown in FIG. 9) uses four 2×8 sub-blocks arranged as a one-by-four array. The residual coefficients in each sub-block are scanned as shown for the 2×8 TB 922, with the sub-blocks progressing from the lowest sub-block of the one-by-four array up to the uppermost sub-block.

Larger TBs follow a similar scan progression. For all TBs with a width and height each of four or more, a 4×4 sub-block scan is used. For example, a 4×8 TB 923 uses a 4×4 sub-block scan 924, with a progression from the lower sub-block to the upper sub-block. A 4×4 TB 925 can be scanned in a similar manner. An 8×8 TB 929 uses a progression 930 for its four 4×4 sub-blocks. In all cases, the scan within a sub-block and the progression from sub-block to sub-block follow a backward diagonal scan, that is, the scan progresses from the "last" significant residual coefficient back towards the top-left residual coefficient of the TB. FIG. 9 also shows the scan order across, for example, an 8×4 TB 932, a 16×4 TB 934, and a 16×8 TB 936. Moreover, depending on the position of the last significant coefficient along the scan path, only the portion of the sub-block containing the last significant residual coefficient that runs from the last significant coefficient position back to the top-left residual coefficient of that sub-block needs to be scanned. Sub-blocks further along the scan path in the forward direction (that is, closer to the bottom right of the block) do not need to be scanned. The collection 900, and in particular the prohibited transform sizes 910, imposes restrictions on the ability to split regions (or nodes) of the coding tree in chroma into sub-regions (or sub-nodes), as described with reference to FIG. 10.
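As a non-limiting illustration, the sub-block selection described above for FIG. 9 may be sketched as follows. The function names `subblock_size` and `subblock_grid` are hypothetical and do not appear in the described arrangements; the sketch assumes the transform block dimensions are given in chroma samples and that prohibited sizes are rejected before this point.

```python
from typing import Tuple


def subblock_size(tb_width: int, tb_height: int) -> Tuple[int, int]:
    """Return the (width, height) of the coefficient group for a transform block.

    Assumption: blocks with an area below 16 samples (2x2, 2x4, 4x2) are
    prohibited and never reach this function in normal operation.
    """
    if tb_width * tb_height < 16:
        raise ValueError("transform blocks below 16 samples are prohibited")
    if tb_width == 2:
        return (2, 8)   # two-sample-wide blocks use 2x8 sub-blocks
    if tb_height == 2:
        return (8, 2)   # two-sample-high blocks use 8x2 sub-blocks
    return (4, 4)       # width and height both >= 4 use 4x4 sub-blocks


def subblock_grid(tb_width: int, tb_height: int) -> Tuple[int, int]:
    """Return (columns, rows) of sub-blocks covering the transform block."""
    sb_w, sb_h = subblock_size(tb_width, tb_height)
    return (tb_width // sb_w, tb_height // sb_h)
```

Under these assumptions every sub-block holds 16 residual coefficients, so a 16×2 TB is covered by two 8×2 sub-blocks and a 2×32 TB by a one-by-four array of 2×8 sub-blocks.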

In a VVC system that uses 2×2, 2×4, and 4×2 TBs (the set of TBs 910), a 2×2 sub-block may be used for TBs having a width and/or height of two samples. As described above, use of the TBs 910 increases the throughput constraint in the intra reconstruction feedback dependency loop. Moreover, the use of sub-blocks with only four coefficients increases the difficulty of parsing residual coefficients at higher throughput. In particular, for each sub-block a "significance map" indicates the significance of each residual coefficient contained therein. Coding of a one-valued significance flag establishes the magnitude of the residual coefficient as being at least one, and coding of a zero-valued flag establishes the magnitude of the residual coefficient as zero. The magnitude (from one onwards) and the sign of a residual coefficient are coded only for "significant" residual coefficients. For the DC coefficient, no significance bit is coded and a magnitude (from zero) is always coded. High-throughput encoders and decoders may need to encode or decode multiple significance map bins per clock cycle to sustain real-time operation. The difficulty of multi-bin encoding and decoding per cycle increases when inter-bin dependencies are more numerous, for example when a smaller sub-block size is used. In the system 100, the sub-block size is 16 regardless of block size (notwithstanding the exception of the sub-block containing the last significant coefficient).

FIG. 10 shows a set of rules 1000 for generating a list of allowed splits in a chroma coding tree. Other frames may permit a mixture of inter-predicted blocks and intra-predicted blocks. Although the full set of available splits of a coding tree has been described with reference to FIG. 6, restrictions on the available transform sizes impose constraints on the particular split options for a given region size. As described below, the split options for each chroma channel are determined according to the dimensions of a region of the corresponding coding tree unit.

The rules 1020 for a chroma region show the allowed splits of different regions. Because different chroma formats may be in use, the allowed splits of the rules 1020 are expressed in units of luma samples, even though the chroma channels are under consideration.

When traversing the nodes of a coding tree, a list of allowed splits for chroma is obtained by checking the availability of a set of split options against the region size of the coding tree. Split options that result in regions that may be coded using CBs are added to the list of allowed splits. For a region to be coded using a CB, the region size must allow coding with an integer number of transforms of a particular size from the collection 900. The particular size is selected to be the largest size that does not exceed the region size (considering both width and height). Accordingly, for smaller regions a single transform is used. Where the region size exceeds the size of the largest available transform, the largest available transform is tiled to occupy the entirety of the region.
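By way of a hedged sketch only, the selection and tiling of transforms over a region described above might be expressed as follows. The function name `tile_region_with_transforms` and the default `max_tb_size` of 32 samples are assumptions made for illustration; the largest available transform size is not stated in this passage.

```python
from typing import Tuple


def tile_region_with_transforms(
    region_w: int, region_h: int, max_tb_size: int = 32
) -> Tuple[Tuple[int, int], int]:
    """Pick the largest transform not exceeding the region and count the tiles.

    region_w / region_h are in samples of the channel being coded.
    max_tb_size is a hypothetical limit on transform width and height.
    """
    tb_w = min(region_w, max_tb_size)
    tb_h = min(region_h, max_tb_size)
    # The region must be coded with an integer number of transforms.
    if region_w % tb_w != 0 or region_h % tb_h != 0:
        raise ValueError("region cannot be tiled by an integer number of transforms")
    count = (region_w // tb_w) * (region_h // tb_h)
    return (tb_w, tb_h), count
```

With these assumptions, an 8×4 region is covered by a single 8×4 transform, whereas a 64×64 region is covered by four tiled 32×32 transforms.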

When considering a node in a coding tree having a given region (expressed in luma samples), the ability to perform a given type of split is determined according to the split type and the chroma region area. As shown in FIG. 10, a split option is tested against the region size to determine whether the split option would result in sub-regions of a prohibited size. Split options that result in sub-regions of an allowed size are deemed allowed chroma splits 1070.

For example, in QT mode (corresponding to decision 610 of FIG. 6), as shown by rule 1021a for chroma regions, if the region is of size 8×8 in the 4:2:0 format or 8×8 in the 4:2:2 format, a quadtree split is not allowed, because the split would result in transform sizes of 2×2 or 2×4, respectively, for the chroma channels. The allowable region sizes are indicated by an arrow 1021. Similarly, the other allowable splits for the chroma rule set 1020 are indicated by arrows 1022, 1023, 1024, 1025, and 1026 and are described below in relation to FIGS. 13 and 14. The arrows 1021, 1022, 1023, 1024, 1025, and 1026 each reference the list of allowed chroma splits 1070.

The region sizes for the chroma channels are described in terms of the luma sample grid. For example, an 8×4 region corresponds to a 4×2 transform for the chroma channels when the 4:2:0 chroma format is in use. When the 4:2:2 chroma format is in use, an 8×4 region corresponds to a 4×4 transform in chroma. When the 4:4:4 chroma format is in use, chroma is not subsampled with respect to luma, and so the transform size in chroma corresponds to the region size.
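As a non-limiting sketch, the mapping from a region on the luma sample grid to a chroma block size, and the resulting test of whether a split is allowed in chroma, may be illustrated as follows. The identifiers (`SUBSAMPLING`, `chroma_size`, `split_subregions`, `chroma_split_allowed`) and the split labels are hypothetical. The sketch assumes that a split is prohibited whenever any resulting chroma block would have an area below 16 samples, and it does not model any luma-side restrictions.

```python
from typing import List, Tuple

# (horizontal, vertical) subsampling factors of chroma relative to luma.
SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}
MIN_CHROMA_AREA = 16  # minimum chroma block area used in the described example


def chroma_size(luma_w: int, luma_h: int, chroma_format: str) -> Tuple[int, int]:
    """Convert a region on the luma sample grid to chroma sample dimensions."""
    sx, sy = SUBSAMPLING[chroma_format]
    return (luma_w // sx, luma_h // sy)


def split_subregions(w: int, h: int, split: str) -> List[Tuple[int, int]]:
    """Sub-region sizes (luma samples) for quadtree, binary and ternary splits."""
    if split == "QT":
        return [(w // 2, h // 2)] * 4
    if split == "BT_V":   # vertical binary split halves the width
        return [(w // 2, h)] * 2
    if split == "BT_H":   # horizontal binary split halves the height
        return [(w, h // 2)] * 2
    if split == "TT_V":   # vertical ternary split in a 1:2:1 ratio
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    if split == "TT_H":   # horizontal ternary split in a 1:2:1 ratio
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    raise ValueError("unknown split type: " + split)


def chroma_split_allowed(w: int, h: int, split: str, chroma_format: str) -> bool:
    """True if no resulting chroma block falls below the minimum area."""
    for sub_w, sub_h in split_subregions(w, h, split):
        c_w, c_h = chroma_size(sub_w, sub_h, chroma_format)
        if c_w * c_h < MIN_CHROMA_AREA:
            return False
    return True
```

Under these assumptions a quadtree split of an 8×8 region is rejected for both the 4:2:0 and 4:2:2 formats, because the resulting chroma blocks would be 2×2 and 2×4 respectively, while a quadtree split of a 16×16 region in 4:2:0 yields 4×4 chroma blocks and is accepted.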

The allowable split options are described further below in relation to FIGS. 13 and 14.

FIG. 11 shows a method 1100 for encoding coding trees of an image frame into a video bitstream. The method 1100 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1100 may be performed by the video encoder 114 under execution of the processor 205. As such, the method 1100 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1100 commences at a determine chroma format step 1105.

At the determine chroma format step 1105, the processor 205 determines the chroma format of the frame data 113 as one of the 4:2:0 chroma format or the 4:2:2 chroma format. The chroma format is a property of the frame data and does not change during operation of the method 1100. The method 1100 continues, under control of the processor 205, from step 1105 to a divide frame into CTUs step 1110.

At the divide frame into CTUs step 1110, the block partitioner 310, under execution of the processor 205, divides a current frame of the frame data 113 into an array of CTUs. A progression of encoding over the CTUs resulting from the division commences. Control in the processor progresses from step 1110 to a determine coding tree step 1120.
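As a minimal sketch of the division of a frame into an array of CTUs, and not a definitive implementation of step 1110, the CTU origins may be enumerated as follows. The function name `ctu_origins`, the default CTU size of 128 luma samples, and the raster (row-by-row) ordering are assumptions for illustration.

```python
from typing import List, Tuple


def ctu_origins(frame_w: int, frame_h: int, ctu_size: int = 128) -> List[Tuple[int, int]]:
    """Return the top-left (x, y) luma position of each CTU in raster order.

    CTUs on the right and bottom edges may extend past the frame boundary
    when the frame dimensions are not multiples of ctu_size.
    """
    cols = -(-frame_w // ctu_size)  # ceiling division
    rows = -(-frame_h // ctu_size)
    return [(c * ctu_size, r * ctu_size) for r in range(rows) for c in range(cols)]
```

For a 1920×1080 frame with 128×128 CTUs, this sketch yields 15 columns and 9 rows of CTUs, the bottom row being only partially covered by the frame.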

At the determine coding tree step 1120, the video encoder 114, under execution of the processor 205, tests various prediction modes and split options in combination to arrive at a coding tree for a CTU. The prediction mode and residual coefficients for each CU of the coding tree for the CTU are also derived. Generally, a Lagrangian optimisation is performed to select the optimal coding tree and CUs for the CTU. When evaluating the use of inter prediction, a motion vector is selected from a set of candidate motion vectors. The candidate motion vectors are generated according to a search pattern. When the distortion of fetched reference blocks is tested for the candidate motion vectors, the application of prohibited chroma splits in the coding tree is taken into account. Where a split is prohibited in chroma and allowed in luma, the resulting luma CBs may use inter prediction. In that case motion compensation is applied to the luma channel only, and so the distortion computation considers luma distortion and not chroma distortion. Because no motion compensation is performed in the chroma channels when the chroma split was prohibited, chroma distortion is not considered. For chroma, the distortion resulting from the considered intra prediction mode and the coded chroma TB (if any) is considered. When both luma and chroma are considered, an inter prediction search may first select a motion vector based on luma distortion and then "refine" the motion vector by also considering chroma distortion. Refinement generally considers small variations in the motion vector value, such as sub-pixel displacements. Where a chroma split is prohibited and inter prediction is evaluated on small luma blocks, chroma refinement is not needed. Control in the processor 205 progresses from step 1120 to an encode coding tree step 1130.
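The Lagrangian optimisation mentioned above is commonly formulated as minimising a cost J = D + λR, where D is distortion, R is the rate in bits, and λ is a multiplier. The following is a minimal sketch under that common formulation; the candidate dictionary layout, the function names, and the example values are hypothetical and are not taken from the described arrangements.

```python
from typing import Dict, List


def lagrangian_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def select_best(candidates: List[Dict], lam: float) -> Dict:
    """Return the candidate (e.g. a split option or prediction mode) of lowest cost.

    Each candidate is assumed to carry 'distortion' and 'rate_bits' entries.
    For a luma-only inter-predicted block, 'distortion' would hold luma
    distortion only, as chroma distortion is not considered in that case.
    """
    return min(
        candidates,
        key=lambda c: lagrangian_cost(c["distortion"], c["rate_bits"], lam),
    )
```

A larger λ penalises rate more heavily, so the same pair of candidates may yield a different selection as λ changes.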

At the encode coding tree step 1130, the video encoder 114, under execution of the processor 205, performs a method 1300, described in relation to FIG. 13, to encode the coding tree of the current CTU into the bitstream 115. Step 1130 is executed to encode the current CTU into the bitstream. Control in the processor 205 progresses from step 1130 to a last CTU test step 1140.

At the last CTU test step 1140, the processor 205 tests whether the current CTU is the last CTU in the slice or frame. If not ("NO" at step 1140), the video encoder 114 advances to the next CTU in the frame and control in the processor 205 returns from step 1140 to step 1120 to continue processing the remaining CTUs in the frame. If the CTU is the last CTU in the frame or slice, step 1140 returns "YES" and the method 1100 terminates. As a result of the method 1100, an entire image frame is encoded into the bitstream as a sequence of CTUs.

FIG. 12 shows a method 1200 for decoding coding trees of an image frame from a video bitstream. The method 1200 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1200 may be performed by the video decoder 134 under execution of the processor 205. As such, the method 1200 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1200 commences at a determine chroma format step 1205.

At the determine chroma format step 1205, the processor 205 determines the chroma format of the frame data 113 as one of the 4:2:0 chroma format or the 4:2:2 chroma format. The chroma format is a property of the frame data and does not change during operation of the method 1200. The video decoder 134 may determine the chroma format by virtue of the profile of the bitstream 133. A profile defines the set of coding tools that may be used by a particular bitstream 133 and may constrain the chroma format to particular values, such as 4:2:0. The profile is determined, for example, by decoding a "profile_idc" syntax element from the bitstream 133, or by decoding one or more constraint flags from the bitstream 133, each constraint flag constraining the use of a particular tool in the bitstream 133. Where the chroma format is not fully specified by the profile, further syntax such as "chroma_format_idc" may be decoded to determine the chroma format. The method 1200 continues, under execution of the processor 205, from step 1205 to a divide frame into CTUs step 1210.
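As a hedged sketch of the determination described above, the chroma format may be resolved first from any constraint imposed by the profile and otherwise from a decoded "chroma_format_idc" value. The function name and the `profile_chroma_format` parameter are hypothetical; the index-to-format table follows the mapping used for chroma_format_idc in HEVC and VVC, and bitstream parsing itself is not modelled.

```python
from typing import Optional

# chroma_format_idc values as used in HEVC and VVC.
CHROMA_FORMAT_BY_IDC = {0: "4:0:0", 1: "4:2:0", 2: "4:2:2", 3: "4:4:4"}


def determine_chroma_format(
    profile_chroma_format: Optional[str],
    chroma_format_idc: Optional[int] = None,
) -> str:
    """Resolve the chroma format of a bitstream.

    profile_chroma_format: format fixed by the profile, or None when the
        profile does not fully specify it (hypothetical parameter).
    chroma_format_idc: further decoded syntax, used only when needed.
    """
    if profile_chroma_format is not None:
        return profile_chroma_format
    if chroma_format_idc is None:
        raise ValueError("chroma_format_idc is required when the profile is not specific")
    return CHROMA_FORMAT_BY_IDC[chroma_format_idc]
```

The returned format is determined once and, consistent with the description, would be held constant for the duration of decoding.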

At the divide frame into CTUs step 1210, the video decoder 134, under execution of the processor 205, determines a division of a current frame of the frame data 133 that is to be decoded into an array of CTUs. A progression of decoding over the CTUs resulting from the determined division commences. Control in the processor progresses from step 1210 to a decode coding tree step 1220.

At the decode coding tree step 1220, the video decoder 134, under execution of the processor 205, performs a method 1400 for the current CTU to decode the coding tree of the current CTU from the bitstream 133. The current CTU is a selected one of the CTUs resulting from execution of step 1210. Control in the processor 205 progresses from step 1220 to a last CTU test step 1240.

At the last CTU test step 1240, the processor 205 tests whether the current CTU is the last CTU in the slice or frame. If not ("NO" at step 1240), the video decoder 134 advances to the next CTU in the frame and control in the processor 205 returns from step 1240 to step 1220 to continue decoding CTUs from the bitstream. If the CTU is the last CTU in the frame or slice, step 1240 returns "YES" and the method 1200 terminates.

FIG. 13 shows a method 1300 of encoding a coding tree of an image frame into a video bitstream. The method 1300 may be embodied by apparatus such as a configured FPGA, an ASIC, or an ASSP. Additionally, the method 1300 may be performed by the video encoder 114 under execution of the processor 205. As such, the method 1300 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1300 encodes blocks into the bitstream 115 such that each block has at least a minimum area. The described arrangements use a predetermined minimum size in samples. The minimum size used in the examples described is 16 samples, which is preferable for some hardware and software implementations. Nevertheless, a different minimum size may be used. For example, a processing granularity of 32 or 64, with a corresponding minimum block area of 32 or 64 samples respectively, is possible. Coding blocks having a minimum area is advantageous for implementation feasibility, in both hardware and software implementations. For software implementations, a minimum area of 16 samples aligns with typical single instruction multiple data (SIMD) instruction sets such as AVX-2 and SSE4. The method 1300, invoked initially at the root node of the coding tree of the current CTU, commences at an encode split mode step 1310.

At the encode split mode step 1310, the entropy encoder 338, under execution of the processor 205, encodes the split mode at the current node of the coding tree into the bitstream 115. The split mode is one of the splits described with reference to FIG. 5, and the step of encoding the split mode permits only the coding of splits that are possible. For example, a quadtree split 512 is possible only at the root node of the coding tree or underneath other quadtree splits in the coding tree. Splits that would result in a luma CB having a width or height of fewer than four samples are prohibited, as shown in relation to the set 910. Other constraints on the maximum depth of binary and/or ternary splits may also be in effect, for example based on the rule set 1010. Control in the processor 205 progresses from step 1310 to a no split test step 1320.

At the no split test step 1320, the processor 205 tests whether the current split is "no split" (that is, 510). If the current split is the no split 510 ("YES" at step 1320), control in the processor 205 progresses from step 1320 to an encode CU step 1330. Otherwise, if the current split is not 510 ("NO" at step 1320), control in the processor 205 progresses to a chroma split prohibited test step 1340.

At the encode CU step 1330, the entropy encoder 338, under execution of the processor 205, encodes the prediction mode of the CU and the residual of the CU into the bitstream 115. Step 1330 is reached at each leaf node of the coding tree; upon completion of step 1330, the method 1300 terminates and returns to the parent invocation in the coding tree traversal. Once all nodes of the coding tree have been traversed, the entire CTU has been encoded into the bitstream 115 and control returns to the method 1100 to progress to the next CTU in the image frame.

At the chroma split prohibited test step 1340, the processor 205 determines, in accordance with the chroma region split rule set 1020 of FIG. 10, whether the split for the current node in the coding tree, as encoded at step 1310, is allowed to be applied to the chroma channels. If the current node in the coding tree covers a luma area of 128 luma samples (32×4, 4×32, 16×8, or 8×16), a ternary split in the corresponding chroma region (16×2, 2×16, 8×4, or 4×8 chroma samples, respectively) is prohibited, as shown in the rule set 1020. Were a ternary split allowed, the resulting block sizes would include prohibited block sizes (for example 2×4 or 4×2). If the current node in the coding tree covers a luma area of 64 luma samples, binary, ternary, and quadtree splits are prohibited, as shown in the rule set 1020. Performing a binary, ternary, or quadtree split on a luma area of 64 luma samples would result in prohibited chroma block sizes (2×2, 2×4, or 4×2). If the split is not prohibited (that is, the split is an allowed chroma split of the list 1070), step 1340 returns "NO" and control in the processor 205 progresses from step 1340 to a perform luma and chroma split step 1350. Otherwise, if the split is prohibited ("YES" at step 1340), control in the processor 205 progresses to a perform luma split step 13100.

In the perform luma and chroma split step 1350, the processor 205 applies the split to divide the current region, associated with the current node of the coding tree, into sub-regions associated with sub-nodes of the coding tree. The split is applied as described with reference to FIGS. 5 and 6. Control in the processor 205 progresses from step 1350 to the select region step 1360.

In the select region step 1360, the processor 205 selects one of the sub-regions resulting from step 1350. The sub-regions are selected in accordance with a Z-order scan of the regions, and the selection progresses through the sub-regions on subsequent iterations of step 1360. Control in the processor 205 progresses from step 1360 to the encode coding tree step 1370.

In the encode coding tree step 1370, the processor 205 recursively invokes the method 1300 for the selected region resulting from step 1360. Step 1370 further operates to encode, for each region, the luma and chroma blocks and the associated prediction modes and residual coefficients into the bitstream. Control in the processor 205 progresses from step 1370 to the last region test step 1380.

In the last region test step 1380, the processor 205 tests whether the region selected in step 1360 is the last of the regions resulting from the split performed in step 1350. If the region is not the last region ("NO" at step 1380), control in the processor 205 progresses from step 1380 to step 1360 to continue through the regions of the split. Otherwise, step 1380 returns "YES", the method 1300 terminates, and control in the processor 205 progresses to the parent invocation of the method 1300.
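The recursion formed by steps 1350 to 1380 can be sketched as follows. This is a simplified illustration only: it omits the chroma split prohibited branch (steps 1340 and 13100 onwards) and all entropy coding, and the tree representation, split names and function names are hypothetical.

```python
def split_region(x, y, w, h, split):
    """Return the sub-regions (x, y, w, h) of a split, listed in Z order."""
    if split == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split == "bin_h":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split == "bin_v":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    raise ValueError(f"unknown split: {split}")


def traverse(region, node, leaves):
    """Visit a coding tree depth first, collecting leaf regions in coding order.

    A node is ("none",) for a leaf, or (split_name, [child nodes]).
    """
    if node[0] == "none":
        leaves.append(region)                        # leaf: the CU is coded here
        return
    subregions = split_region(*region, node[0])      # apply the split
    for subregion, child in zip(subregions, node[1]):  # select regions in Z order
        traverse(subregion, child, leaves)           # recursive invocation


LEAF = ("none",)
tree = ("quad", [LEAF, ("bin_v", [LEAF, LEAF]), LEAF, LEAF])
leaves = []
traverse((0, 0, 32, 32), tree, leaves)
```

The loop over `subregions` plays the role of the select region and last region test steps: the recursion returns to the parent invocation once the last sub-region has been processed.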

In the perform luma split step 13100, the split mode, as encoded in step 1310, is performed by the processor 205 in the luma channel only. As a result, the current node of the coding tree is divided into multiple luma CBs in accordance with the split mode. Only a single pair of chroma CBs, that is, one chroma CB per chroma channel, is generated. Each resulting luma CB partially overlaps (is partially co-located with) the pair of chroma CBs, and collectively the resulting luma CBs cover exactly the area of the pair of chroma CBs. Further, each luma CB and each chroma CB has an area no smaller than the minimum size, for example 16 samples.

Steps 13100 and 1350 each operate to determine the size of the chroma coding blocks for the chroma channels Cb and Cr. In step 1350, the chroma coding block size for a chroma channel is determined based on the split mode determined in step 1310. In step 13100, the chroma coding block size for a chroma channel is determined based on the predetermined minimum chroma block size. As described above, step 1350 is performed on the basis of the chroma splits that are prohibited for the coding tree unit. As shown in the rule set 1020 of FIG. 10, the allowable splits, and therefore the sizes of the chroma coding blocks, are determined based on the chroma format determined in step 1105.

Control in the processor 205 progresses from step 13100 to the select luma CB step 13110.

In the select luma CB step 13110, the processor 205 selects the next luma CB of the CBs resulting from step 13100. Step 13110 initially selects the first CB, that is, the top-left luma CB of the CBs resulting from the luma split. On each subsequent invocation of step 13110, the "next" luma CB is selected in accordance with a Z-order scan over the luma CBs resulting from step 13100. Control in the processor 205 progresses from step 13110 to the encode luma CB step 13120.

In the encode luma CB step 13120, the entropy encoder 338, under execution of the processor 205, encodes the selected luma CB into the bitstream 115. Generally, a prediction mode and residual coefficients are encoded for the selected luma CB. The prediction mode encoded for the luma CB may use inter prediction or intra prediction. For example, "cu_skip_flag" is encoded to indicate use of inter prediction without any residual; otherwise "pred_mode_flag" and, optionally, "pred_mode_ibc_flag" are encoded to indicate use of intra prediction, inter prediction or intra block copy, each with optional residual coefficients. Where a residual may be present, a "cu_cbf" flag signals the presence of at least one significant (non-zero) residual coefficient in any TB of the CB. When the CB is indicated to use inter prediction, the associated motion vector is applicable to the luma CB only; that is, the motion vector is not also applied to generate any PB associated with the partially co-located chroma CBs. When the CB is indicated to use intra block copy, the associated block vector is associated with the luma CB only and not with the partially co-located chroma CBs. Control in the processor 205 progresses from step 13120 to the last luma CB test step 13130.

In the last luma CB test step 13130, the processor 205 tests whether the luma CB selected in step 13110 is the last luma CB, in accordance with the Z-order iteration over the luma CBs of the split performed in step 13100. If the selected luma CB is not the last one ("NO" at step 13130), control in the processor 205 progresses from step 13130 to step 13110 so that the next luma CB is selected. Otherwise, step 13130 returns "YES" and control in the processor 205 progresses to the determine chroma intra prediction mode step 13140.

In the determine chroma intra prediction mode step 13140, the video encoder 114, under execution of the processor 205, determines an intra prediction mode for the pair of chroma CBs co-located with the luma CBs of step 13100. Step 13140 effectively determines that a chroma block is encoded using intra prediction when the region occupied by the chroma CB is further split into multiple luma CBs in the luma channel. The size of the chroma block for a channel is the predetermined minimum (for example, 16 samples), as determined by operation of step 1350. The intra prediction mode for the pair of chroma CBs is determined even if the corresponding luma CBs were encoded using inter prediction in step 13120. In one arrangement, a single prediction mode, such as DC intra prediction, is applied to each chroma CB. Using a single prediction mode allows the mode to be determined by virtue of the prohibition of the chroma split (a "YES" outcome at step 1340) and involves no additional searching to determine which one of multiple possible modes is to be used. Moreover, the bitstream 115 requires no additional signalling in this case, that is, no additional "intra_chroma_pred_mode" syntax element needs to be encoded. However, arrangements may achieve higher compression performance by signalling one intra prediction mode out of several possible intra prediction modes, by including an "intra_chroma_pred_mode" syntax element in the bitstream 115 when the chroma split is prohibited ("YES" at step 1340). The video encoder 114 determines which intra prediction mode is to be used. The intra prediction mode is generally determined by weighing coding cost against distortion, and such signalling generally yields higher compression performance than using a single intra prediction mode for such chroma CBs. Control in the processor 205 progresses from step 13140 to the encode chroma CB step 13150.

In the encode chroma CB step 13150, the entropy encoder 338, under execution of the processor 205, encodes the intra prediction mode of the chroma CBs into the bitstream 115 using the "intra_chroma_pred_mode" syntax element when multiple intra prediction modes are available. When only one intra prediction mode is possible, for example DC intra prediction, "intra_chroma_pred_mode" is not encoded into the bitstream 115. The intra prediction modes available for chroma intra prediction may include DC, planar and the following angular prediction modes: horizontal, vertical and up-right diagonal. The available intra prediction modes may also include a "direct mode" (DM_CHROMA), whereby the chroma intra prediction mode is obtained from a co-located luma CB, generally the bottom-most and right-most of the luma CBs resulting from step 13100. Where "cross-component linear model" intra prediction is available, the chroma CB may be predicted from samples of the luma CB. The residual coefficients of the chroma TBs associated with the chroma CBs may also be encoded into the bitstream 115, as described with reference to step 14150 of FIG. 14. Once step 13150 has been executed by the processor 205, the method 1300 terminates and control in the processor 205 returns to the parent invocation of the method 1300.
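The conditional signalling of the chroma intra prediction mode can be summarised in a small sketch. The function name and the representation of syntax elements as (name, value) pairs are hypothetical, and coding the mode as a list index is an assumption made purely for illustration; it does not reflect the actual binarisation of "intra_chroma_pred_mode".

```python
def chroma_mode_syntax(available_modes, chosen_mode):
    """Return the syntax elements written for the chroma intra prediction mode.

    With a single available mode nothing is written and the decoder infers
    the mode. With several, one "intra_chroma_pred_mode" element is written.
    The index value used here is an illustrative stand-in only.
    """
    if chosen_mode not in available_modes:
        raise ValueError(f"mode not available: {chosen_mode}")
    if len(available_modes) == 1:
        return []
    return [("intra_chroma_pred_mode", available_modes.index(chosen_mode))]


single = chroma_mode_syntax(["DC"], "DC")
several = chroma_mode_syntax(["DC", "planar", "horizontal", "vertical"], "planar")
```

The trade-off described in the text is visible here: the single-mode arrangement spends no bits on the mode, while the multi-mode arrangement spends some signalling in exchange for a better-matched predictor.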

FIG. 14 shows a method 1400 of decoding a coding tree of an image frame from a video bitstream, as performed at step 1220 of the method 1200. The method 1400 may be embodied by apparatus such as a configured FPGA, an ASIC or an ASSP. Additionally, the method 1400 may be performed by the video decoder 134 under execution of the processor 205. As such, the method 1400 may be stored on a computer-readable storage medium and/or in the memory 206. The method 1400 results in blocks being decoded from the bitstream 133 such that no block is smaller than a minimum area, such as 16 samples, which is advantageous for implementation feasibility in both hardware and software. In software, a minimum area of 16 samples aligns with typical single instruction multiple data (SIMD) instruction sets such as AVX2 and SSE4. The method 1400, initially invoked at the root node of the coding tree of the current CTU, commences at the decode split mode step 1410.

In the decode split mode step 1410, the entropy decoder 420, under execution of the processor 205, decodes the split mode at the current node of the coding tree from the bitstream 133. The split mode is one of the splits described with reference to FIG. 5, and the method of coding the split mode only allows coding of splits that are allowed, that is, splits allowed in the luma channel, even if the split is prohibited in the chroma channels. For example, the quadtree split 512 is possible only at the root node of the coding tree or beneath other quadtree splits in the coding tree. Splits that would result in a luma CB with a width or height of less than four samples are prohibited, so the minimum luma CB size is 16 samples. Other constraints on the maximum depth of binary and/or ternary splits may also apply. Control in the processor 205 progresses from step 1410 to the no split test step 1420.
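The two luma constraints named in this step (quadtree splits only beneath quadtree splits, and no luma CB narrower or shorter than four samples) can be sketched as follows. The split names, function names and the `quadtree_allowed` flag are hypothetical, and the depth constraints mentioned in the text are not modelled.

```python
MIN_LUMA_DIM = 4  # minimum luma CB width and height, in samples


def luma_children(width, height, split):
    """Return the (width, height) of each luma CB produced by a split."""
    if split == "quad":
        return [(width // 2, height // 2)] * 4
    if split == "bin_h":
        return [(width, height // 2)] * 2
    if split == "bin_v":
        return [(width // 2, height)] * 2
    if split == "tern_h":
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if split == "tern_v":
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split: {split}")


def luma_split_allowed(width, height, split, quadtree_allowed):
    """True if the split may be coded for a luma region of the given size.

    quadtree_allowed is True at the root node or when every ancestor split
    was a quadtree split.
    """
    if split == "quad" and not quadtree_allowed:
        return False
    return all(
        w >= MIN_LUMA_DIM and h >= MIN_LUMA_DIM
        for w, h in luma_children(width, height, split)
    )
```

Because both dimensions are bounded below by four samples, the smallest luma CB this check admits is 4×4, which gives the 16-sample minimum stated in the text.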

In the no split test step 1420, the processor 205 tests whether the current split is "no split" (that is, 510). If the current split is the no split 510 ("YES" at step 1420), control in the processor 205 progresses from step 1420 to the decode CU step 1430. Otherwise, step 1420 returns "NO" and control in the processor 205 progresses to the chroma split prohibited test step 1440.

In the decode CU step 1430, the entropy decoder 420, under execution of the processor 205, decodes the prediction mode of the CU and the residual coefficients of the CU from the bitstream 133. Step 1430 operates to decode the coding unit using the residual coefficients and the prediction mode determined from the bitstream by the entropy decoder 420. Step 1430 is reached at each leaf node of the coding tree; upon completion of step 1430, the method 1400 terminates and returns to the parent invocation in the coding tree traversal. Once all nodes of the coding tree have been traversed, the entire CTU has been decoded from the bitstream 133 and control returns to the method 1200, which progresses to the next CTU in the image frame.

In the chroma split prohibited test step 1440, the processor 205 determines, in accordance with the chroma region split rule set 1020 of FIG. 10, whether the split for the current node in the coding tree, as decoded in step 1410, is allowed to be applied to the chroma channels. Step 1440 tests whether the split is prohibited in the same manner as step 1340 of the method 1300. Operation of step 1440 prevents prohibited block sizes from occurring. If the chroma region is already at the minimum size, for example 16 chroma samples, no further split of any type is allowed, because the resulting regions would be smaller than the allowed minimum. If the chroma region size is 32 samples and the corresponding split is a ternary split (whether horizontal or vertical), the split is likewise not allowed, in order to avoid chroma blocks with an area of 8 chroma samples. If the split is not prohibited (that is, the split is allowed), step 1440 returns "NO" and control in the processor 205 progresses from step 1440 to the perform luma and chroma split step 1450. Otherwise, if the split is prohibited ("YES" at step 1440), control in the processor 205 progresses to the perform luma split step 14100.

In the perform luma and chroma split step 1450, the processor 205 applies the split to divide the current region, associated with the current node of the coding tree, into sub-regions associated with sub-nodes of the coding tree. The split is applied as described with reference to FIGS. 5 and 6.

Steps 14100 and 1450 each operate to determine the size of the chroma coding blocks for the chroma channels Cb and Cr. In step 1450, the chroma coding block size for a chroma channel is determined based on the split mode decoded in step 1410. In step 14100, the chroma coding block size for a chroma channel is determined based on the predetermined minimum chroma block size. As described above, step 1450 is performed on the basis of the chroma splits that are prohibited for the coding tree unit, corresponding to a minimum chroma CB size of 16 samples (and 32 samples in the case of a ternary split of a luma area of 128 samples). As shown in the rule set 1020 of FIG. 10, the allowable splits, and therefore the sizes of the chroma coding blocks, are determined based on the chroma format determined in step 1205.

Control in the processor 205 progresses from step 1450 to the select region step 1460.

In the select region step 1460, the processor 205 selects one of the sub-regions resulting from step 1450, in accordance with a Z-order scan of the regions. Step 1460 operates to progress the selection through the sub-regions on subsequent iterations. Control in the processor 205 progresses from step 1460 to the decode coding tree step 1470.

In the decode coding tree step 1470, the processor 205 recursively invokes the method 1400 for the selected region resulting from operation of step 1460. Step 1470 further operates to decode each region of the coding tree using the residual coefficients and the prediction modes determined from the bitstream. Control in the processor 205 progresses from step 1470 to the last region test step 1480.

In the last region test step 1480, the processor 205 tests whether the selected region, as selected in the most recent iteration of step 1460, is the last of the regions resulting from the split performed in step 1450. If the region is not the last region ("NO" at step 1480), control in the processor 205 progresses from step 1480 to step 1460 to continue through the regions of the split. Otherwise, step 1480 returns "YES", the method 1400 terminates, and control in the processor 205 progresses to the parent invocation of the method 1400.

In the perform luma split step 14100, the split mode, as decoded in step 1410, is performed by the processor 205 in the luma channel only. As a result, the current node of the coding tree is divided into multiple luma CBs in accordance with the split mode. Step 14100 operates to generate only a single pair of chroma CBs, that is, one chroma CB per chroma channel. Each resulting luma CB partially overlaps (is at least partially co-located with) the pair of chroma CBs, and collectively the luma CBs entirely overlap the pair of chroma CBs. Further, the minimum area of each luma CB and each chroma CB is 16 samples. Control in the processor 205 progresses from step 14100 to the select luma CB step 14110.

In the select luma CB step 14110, the processor 205 selects the next luma CB of the CBs resulting from step 14100. Selection begins with the first CB, that is, the top-left luma CB of the CBs resulting from the luma split. On each subsequent invocation of step 14110, the "next" luma CB is selected in accordance with a Z-order scan over the luma CBs resulting from step 14100. Control in the processor 205 progresses from step 14110 to the decode luma CB step 14120.

In the decode luma CB step 14120, the entropy decoder 420, under execution of the processor 205, decodes the selected luma CB from the bitstream 133. Generally, a prediction mode and a residual are decoded for the selected luma CB. For example, "cu_skip_flag" is decoded to indicate use of inter prediction without any residual; otherwise "pred_mode_flag" and, optionally, "pred_mode_ibc_flag" are decoded to indicate use of intra prediction, inter prediction or intra block copy, each with optional residual coefficients. Where a residual may be present, a "cu_cbf" flag signals the presence of at least one significant (non-zero) residual coefficient in any TB of the CB. When the CB is indicated to use inter prediction, the associated motion vector is applicable to the luma CB only; that is, the motion vector is not also applied to generate any PB associated with the partially co-located chroma CBs. When the CB is indicated to use intra block copy, the associated block vector is associated with the luma CB only and not with the partially co-located chroma CBs. Control in the processor 205 progresses from step 14120 to the last luma CB test step 14130.

In the last luma CB test step 14130, the processor 205 tests whether the luma CB selected in step 14110 is the last luma CB, in accordance with the Z-order iteration over the luma CBs of the split performed in step 14100. If the selected luma CB is not the last one, control in the processor 205 progresses from step 14130 to step 14110. Otherwise, control in the processor 205 progresses to the determine chroma intra prediction mode step 14140.

In the determine chroma intra prediction mode step 14140, the video decoder 134, under execution of the processor 205, determines an intra prediction mode for the pair of chroma CBs co-located with the luma CBs of step 14100. Step 14140 effectively determines that a chroma block was encoded using intra prediction, and is therefore to be decoded using intra prediction, if the chroma block results from splitting of the coding tree ceasing for chroma while splitting of the coding tree continues for luma, as determined by operation of step 1440. The intra prediction mode for the pair of chroma CBs is determined even if the corresponding luma CBs were decoded using inter prediction in step 14120. In one arrangement, a single prediction mode, such as DC intra prediction, is applied to each chroma CB. Using a single prediction mode allows the mode to be determined by virtue of the prohibition of the chroma split (a "YES" outcome at step 1440) and involves no additional searching to determine which one of multiple possible modes is to be used. Moreover, the bitstream 133 requires no additional signalling in this case, that is, no additional "intra_chroma_pred_mode" syntax element needs to be coded. However, arrangements may achieve higher compression performance by signalling one intra prediction mode out of several possible intra prediction modes, by including an "intra_chroma_pred_mode" syntax element in the bitstream 133 when the chroma split is prohibited ("YES" at step 1440). In such arrangements, the video decoder 134 determines which intra prediction mode is to be used by decoding the "intra_chroma_pred_mode" syntax element from the bitstream 133 using the entropy decoder 420. Control in the processor 205 progresses from step 14140 to the decode chroma CB step 14150.

In the decode chroma CB step 14150, the entropy decoder 420, under execution of the processor 205, determines the intra prediction mode for the chroma CBs from the bitstream 133, generally in accordance with a decoded "intra_chroma_pred_mode" syntax element. Decoding of "intra_chroma_pred_mode" is performed when multiple intra prediction modes are available. When only one intra prediction mode is available, for example DC intra prediction, the mode is inferred without decoding any additional syntax element from the bitstream 133. The intra prediction modes available for chroma intra prediction may include DC, planar and the following angular prediction modes: horizontal, vertical and up-right diagonal. The available intra prediction modes may also include a "direct mode" (DM_CHROMA), whereby the chroma intra prediction mode is obtained from a co-located luma CB, generally the bottom-most and right-most of the luma CBs resulting from step 14100. Where "cross-component linear model" intra prediction is available, the chroma CB may be predicted from samples of the luma CB. For the pair of chroma CBs, a "cu_cbf" flag signals the presence of at least one significant residual coefficient in either one of the pair of chroma CBs. If at least one significant residual coefficient is present in either one of the pair of chroma CBs, "tu_cbf_cb" and "tu_cbf_cr" signal the presence of at least one significant coefficient in the chroma CB of the Cb and Cr channels, respectively. For each chroma CB having at least one significant residual coefficient, a "residual_coding" sequence of syntax elements is decoded to determine the residual coefficients of that chroma CB. The residual coding syntax codes the residual coefficients as a sequence of values that populate the transform block from the last significant coefficient position back to the top-left ("DC") coefficient position, following a backward diagonal scan. The backward diagonal scan traverses the transform block as a sequence of "sub-blocks" (or "coefficient groups"), generally of size 4×4, although sizes of 2×2, 2×4, 2×8, 8×2 and 4×2 are also possible. The scan within each coefficient group is in a backward diagonal direction, and the scan from one sub-block to the next is also in a backward diagonal direction. Once step 14150 has been executed by the processor 205, the method 1400 terminates and control in the processor 205 returns to the parent invocation of the method 1400.
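The two-level backward diagonal scan described above can be sketched as follows. The text states only that the scan is backward diagonal at both levels; the direction of travel along each diagonal (bottom-left to top-right in the forward scan) is an assumption modelled on the up-right diagonal scan of HEVC-style codecs, and the function names are hypothetical. The sketch visits every position and does not model starting from the last significant coefficient.

```python
def diagonal_scan(width, height):
    """Forward diagonal scan of a width x height block as (x, y) positions.

    Each anti-diagonal is walked from its bottom-left end to its top-right
    end (an assumed convention, see the accompanying text).
    """
    order = []
    for d in range(width + height - 1):
        for x in range(d + 1):
            y = d - x
            if x < width and y < height:
                order.append((x, y))
    return order


def backward_diagonal_scan(width, height):
    """The forward diagonal scan reversed, ending at the top-left position."""
    return diagonal_scan(width, height)[::-1]


def coefficient_scan(tb_width, tb_height, cg_width, cg_height):
    """Backward diagonal scan over coefficient groups, then within each group."""
    positions = []
    groups = backward_diagonal_scan(tb_width // cg_width, tb_height // cg_height)
    for gx, gy in groups:
        for x, y in backward_diagonal_scan(cg_width, cg_height):
            positions.append((gx * cg_width + x, gy * cg_height + y))
    return positions


scan_8x8 = coefficient_scan(8, 8, 4, 4)   # four 4x4 coefficient groups
scan_2x8 = coefficient_scan(2, 8, 2, 8)   # one 2x8 coefficient group
```

For an 8×8 transform block with 4×4 coefficient groups, the scan starts at the bottom-right coefficient of the bottom-right group and finishes at the DC position of the top-left group.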

The coding tree approach of methods 1300 and 1400 maintains a minimum block area of 16 samples for 4:2:0 chroma format video data, facilitating high-throughput implementation in both software and hardware. Additionally, restricting inter prediction to the luma CB for small CB sizes reduces the worst-case memory bandwidth for motion compensation, by avoiding the need to also fetch samples to generate motion-compensated chroma CBs. In particular, if the minimum chroma CB size is 2x2 and additional samples are required to provide filter support for sub-sample interpolation of the chroma CB, a substantial increase in memory bandwidth is seen compared with performing inter prediction only in the luma channel for small block sizes. The coding gain of motion compensation appears substantially within the luma channel; therefore, omitting motion compensation of the small chroma blocks achieves a reduction in memory bandwidth for a relatively small coding performance impact. Furthermore, the reduction in memory bandwidth contributes to the feasibility of performing motion compensation on 4x4 luma CBs and achieving the resulting coding gain.
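The bandwidth argument above can be illustrated with a rough count of the reference samples fetched for one uni-predicted block. The sketch assumes separable interpolation filters of 8 taps for luma and 4 taps for chroma. These filter lengths are not stated in this passage and are assumptions chosen for illustration, as is the function name.

```python
def fetched_samples(width, height, taps):
    """Reference samples read for one uni-predicted block when a separable
    interpolation filter of `taps` taps is applied (taps - 1 extra rows
    and columns are needed for filter support)."""
    return (width + taps - 1) * (height + taps - 1)


LUMA_TAPS = 8    # assumption, not stated in the text
CHROMA_TAPS = 4  # assumption, not stated in the text

luma = fetched_samples(4, 4, LUMA_TAPS)          # one 4x4 luma CB
chroma = 2 * fetched_samples(2, 2, CHROMA_TAPS)  # 2x2 Cb and Cr CBs
print(luma, chroma, round(chroma / luma, 2))     # 121 50 0.41
```

Under these assumptions, motion compensating the two collocated 2x2 chroma CBs would add roughly 40% to the samples fetched for the 4x4 luma CB alone, while producing only 8 further predicted samples.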

In one arrangement of video encoder 114 and video decoder 134, two or more luma splits may occur in the coding tree from the point at which chroma splitting of the coding tree ends. For example, an 8x16 luma region is not split in the chroma channels, resulting in a pair of 4x8 chroma CBs. In the luma channel, the 8x16 luma region is first split with a horizontal ternary split, and then one of the resulting luma CBs is split further. For example, a resulting 8x4 luma CB is split with a vertical binary split into two 4x4 luma CBs. Arrangements with two or more luma splits in the coding tree from the point at which chroma splitting ends re-invoke methods 1300 and 1400, in video encoder 114 and video decoder 134 respectively, within the region where chroma splitting is prohibited, with the modification that no further chroma CBs are required on the subsequent invocations. In the invocation of methods 1300 and 1400 in which the pair of chroma CBs is created, the entire chroma region is covered by the created chroma CBs, so the recursive invocations of methods 1300 and 1400 do not need to create additional chroma CBs.
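The 8x16 example above may be worked through as follows. This is a sketch under stated assumptions: the helper names are hypothetical, blocks are written as (x, y, width, height), and the ternary split is assumed to divide the region in a 1:2:1 ratio, which is consistent with the 8x4 luma CB mentioned in the example.

```python
def ternary_split_horizontal(x, y, w, h):
    """Three stacked parts with heights h/4, h/2 and h/4 (assumed ratio)."""
    q = h // 4
    return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]


def binary_split_vertical(x, y, w, h):
    """Left and right halves of equal width."""
    return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]


def chroma_block_420(x, y, w, h):
    """Chroma block collocated with a luma region in the 4:2:0 format."""
    return (x // 2, y // 2, w // 2, h // 2)


region = (0, 0, 8, 16)                     # 8x16 luma region
chroma_cb = chroma_block_420(*region)      # one block per chroma channel
top, middle, bottom = ternary_split_horizontal(*region)
luma_cbs = binary_split_vertical(*top) + [middle, bottom]

print(chroma_cb)
print(luma_cbs)
```

The chroma channels each keep a single 4x8 CB covering the whole region, while the luma channel ends up with two 4x4 CBs, one 8x8 CB, and one 8x4 CB.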

FIG. 15 shows a set 1500 of transform block partitions of an intra-predicted coding unit. A luma CB may be "divided" into one luma TB of the same size ("ISP_NO_SPLIT"). A luma CB of size 4x4 has an area of 16 samples and is not divided further, resulting in one luma TB of size 4x4. A luma CB with an area of 32 samples may be divided into two partitions. For example, an 8x4 luma CB 1510 may be divided horizontally ("ISP_HOR_SPLIT") into two 8x2 luma TBs 1520 or vertically ("ISP_VER_SPLIT") into two 4x4 luma TBs 1530. If the luma CB 1510 is a 4x8 luma CB, the block may be divided horizontally into two 4x4 luma TBs at 1520 or vertically into two 2x8 luma TBs at 1530.

A luma CB with an area of 64 samples or more is divided into four partitions. A luma CB 1550 of width W and height H, having an area of 64 samples or more, may be divided horizontally into four luma TBs 1560 of size W x (H/4), or vertically into four luma TBs of size (W/4) x H. As shown in set 1500, dividing a luma CB into multiple partitions yields progressively smaller luma TBs. Intra prediction is performed to generate a PB for each luma TB, and the intra reconstruction process proceeds within the luma CB from one partition to the next.
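The partitioning rules of set 1500 may be summarized in a short sketch. The function name is hypothetical, and sizes are written as (width, height) tuples. The mode names follow the enumeration used in the description.

```python
def isp_transform_blocks(cb_w, cb_h, split):
    """Return the luma TB sizes (width, height) produced for a luma CB.

    A CB of 16 samples is never divided, a CB of 32 samples is divided
    into two partitions, and a CB of 64 samples or more into four.
    """
    area = cb_w * cb_h
    if split == "ISP_NO_SPLIT" or area <= 16:
        return [(cb_w, cb_h)]
    parts = 2 if area == 32 else 4
    if split == "ISP_HOR_SPLIT":
        return [(cb_w, cb_h // parts)] * parts
    if split == "ISP_VER_SPLIT":
        return [(cb_w // parts, cb_h)] * parts
    raise ValueError(f"unknown split mode: {split}")


print(isp_transform_blocks(8, 4, "ISP_HOR_SPLIT"))    # two 8x2 TBs
print(isp_transform_blocks(4, 8, "ISP_VER_SPLIT"))    # two 2x8 TBs
print(isp_transform_blocks(16, 16, "ISP_HOR_SPLIT"))  # four 16x4 TBs
```

Because every partition of a given CB has the same size, any per-size decision, such as the coefficient group size, need only be made once per CU.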

FIG. 16 shows a method 1600 for encoding a coding unit of an image frame into video bitstream 115. Method 1600 may be implemented by a device such as a configured FPGA, ASIC, or ASSP. Additionally, method 1600 may be performed by video encoder 114 under execution of processor 205. As such, method 1600 may be stored on a computer-readable storage medium and/or in memory 206. Method 1600 encodes blocks into bitstream 115 such that the coefficient group size is determined based solely on the transform block size, with no further distinction between luma and chroma channels. Since entropy encoding is a critical feedback loop in video encoder 114, it is advantageous to reduce the memory accesses or operations required for coefficient group size determination. Method 1600, invoked for each coding unit in the coding tree, i.e., at step 1330 of FIG. 13, begins with step 1610 of encoding pred_mode. As mentioned above, step 1330 is performed if step 1320 determines that the current split is no split 510.

In step 1610 of encoding pred_mode, entropy encoder 338 encodes the prediction mode of the CU into bitstream 115 under execution of processor 205. Control in processor 205 passes from step 1610 to intra-prediction test step 1620.

イントラ予測テストステップ１６２０において、プロセッサ２０５は、ＣＵの予測モードをテストする。予測モードがイントラ予測である場合（ステップ１６２０で「ＹＥＳ」）、プロセッサ２０５の制御は、ステップ１６２０からイントラサブ分割モードを符号化するステップ１６５０に進む。さもなければ、予測モードがイントラ予測でない場合（ステップ１６２０で「Ｎｏ」）、プロセッサ２０５の制御は、ステップ１６２０からマージフラグおよびインデックスを符号化するステップ１６３０に進む。 In an intra-prediction test step 1620, the processor 205 tests the prediction mode of the CU. If the prediction mode is intra prediction (“YES” in step 1620), control of processor 205 proceeds from step 1620 to step 1650, where the intra subdivision mode is encoded. Otherwise, if the prediction mode is not intra-prediction (“No” in step 1620), control of processor 205 passes from step 1620 to step 1630, where the merge flag and index are encoded.

マージフラグおよびインデックスを符号化するステップ１６３０において、エントロピーエンコーダ３３８は、プロセッサ２０５の実行下で、インター予測のための「マージモード」の使用（または使用しない）をシグナリングするマージフラグをビットストリーム１１５に符号化する。マージモードはＣＵの動きベクトルを、空間的に（または時間的に）隣接するブロックの候補のセットのうち、空間的に（または時間的に）隣接するブロックから取得させる。マージモードが使用されている場合、対応する「マージインデックス」で１つの候補が選択される。マージインデックスは、マージフラグと共にビットストリーム１１５内に符号化される。「動きベクトル予測」が使用される場合、同様の符号化が実行され、それによって、いくつかの可能な候補動きベクトルのうちの１つが、フラグを使用して予測子としてシグナリングされる。プロセッサ２０５における制御は、ステップ１６３０から動きベクトルデルタを符号化するステップ１６４０に進む。 In step 1630 of encoding merge flags and indices, entropy encoder 338, under execution of processor 205, inserts a merge flag into bitstream 115 that signals the use (or non-use) of "merge mode" for inter prediction. encode. The merge mode causes the motion vector of a CU to be obtained from a spatially (or temporally) adjacent block among a candidate set of spatially (or temporally) adjacent blocks. If merge mode is used, one candidate is selected with the corresponding "merge index". The merge index is encoded within the bitstream 115 along with the merge flag. When "motion vector prediction" is used, a similar encoding is performed whereby one of several possible candidate motion vectors is signaled as a predictor using a flag. Control in processor 205 passes from step 1630 to step 1640, where the motion vector delta is encoded.

In step 1640 of encoding motion vector deltas, entropy encoder 338 encodes the motion vector delta into bitstream 115 under execution of processor 205. Step 1640 is performed if motion vector prediction is used for the CU. The motion vector delta specifies the difference between the motion vector predictor encoded in step 1630 and the motion vector used for motion compensation. Control in processor 205 passes from step 1640 to encoded residual test step 1660. If motion vector prediction is not used for the CU, step 1640 is not performed and method 1600 proceeds directly to step 1660.

In step 1650 of encoding the intra subdivision mode, entropy encoder 338, under execution of processor 205, encodes into bitstream 115 the decision of whether intra subdivision is used, using the context-coded "intra_subpartitions_mode_flag" syntax element. Intra subdivision is available for the luma channel when the luma CB size is larger than the minimum luma transform block size, i.e., larger than 16 luma samples. Intra subdivision divides the coding unit into multiple luma transform blocks, as shown in set 1500. If the luma CB is divided into multiple TBs, "intra_subpartitions_split_flag" signals whether the division of the luma CB into multiple luma TBs occurs horizontally or vertically. Collectively, "intra_subpartitions_mode_flag" and "intra_subpartitions_split_flag" encode the three possible divisions, enumerated as "ISP_NO_SPLIT", "ISP_HOR_SPLIT", and "ISP_VER_SPLIT". Control in processor 205 passes from step 1650 to encoded residual test step 1660.
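The way the two flags jointly select one of the three enumerated divisions may be sketched as follows. The function names are hypothetical, and the assignment of 0 to the horizontal split and 1 to the vertical split for "intra_subpartitions_split_flag" is an assumption made for illustration, as the description states only that the flag distinguishes the two directions.

```python
from enum import Enum


class IspMode(Enum):
    ISP_NO_SPLIT = 0
    ISP_HOR_SPLIT = 1
    ISP_VER_SPLIT = 2


def isp_available(cb_w, cb_h):
    """Intra subdivision is offered only above 16 luma samples."""
    return cb_w * cb_h > 16


def isp_flags(mode):
    """Return (mode_flag, split_flag); split_flag is None when the
    division is ISP_NO_SPLIT and no direction needs to be signaled."""
    if mode is IspMode.ISP_NO_SPLIT:
        return (0, None)
    # Assumed binarization: 0 = horizontal, 1 = vertical.
    return (1, 0 if mode is IspMode.ISP_HOR_SPLIT else 1)


def isp_mode(mode_flag, split_flag=None):
    """Inverse mapping, as a decoder would apply it."""
    if mode_flag == 0:
        return IspMode.ISP_NO_SPLIT
    return IspMode.ISP_HOR_SPLIT if split_flag == 0 else IspMode.ISP_VER_SPLIT


for mode in IspMode:
    print(mode.name, isp_flags(mode))
```

Signaling the direction only when the mode flag is set keeps the common undivided case to a single context-coded flag.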

In encoded residual test step 1660, processor 205 determines whether at least one residual coefficient in any transform block of the coding block is significant. This determination covers all luma TBs resulting from the application of intra subdivision and the pair of chroma TBs associated with the two chroma channels. If at least one residual coefficient in any of the luma TBs and chroma TBs is significant, entropy encoder 338, under execution of processor 205, arithmetically encodes a "1" for the "cu_cbf" syntax element, step 1660 returns "YES", and control in processor 205 proceeds to step 1670 of determining the luma coefficient group size. If no significant residual coefficient is present in any TB of the CU, step 1660 returns "NO", a "0" is arithmetically encoded for "cu_cbf", method 1600 ends, and processor 205 proceeds to the next CU in the CTU.

In step 1670 of determining the luma coefficient group size, processor 205 determines the coefficient group size for the one or more luma TBs (transform blocks) associated with the CU. If intra subdivision is not used, one luma TB is present. If intra subdivision is used, two or four luma TBs are present. The size of each luma TB depends on whether the intra subdivision is performed horizontally or vertically and on the number of luma TBs, and thus on the luma CU size, as shown in set 1500.

The coefficient group size is determined from the luma TB width and height, as shown in Table 1 below. Table 1 maps transform block (TB) size to coefficient group size for both luma and chroma channels, giving the same coefficient group size for a TB of a given size regardless of whether the TB belongs to the luma channel or a chroma channel. TB width and height are powers of two, so the log2 of the TB width and height, i.e. "log2TBwidth" and "log2TBheight", form the first two indices into the three-dimensional Table 1. The final dimension of the table distinguishes between the width and height of the coefficient group. The dimensions of the coefficient group are stored as log2 width and log2 height. For example, a TB of size 16x16 yields index (4,4) into Table 1, which returns (2,2), indicating a coefficient group size of 4x4. A TB of size 2x32 yields index (1,5) into Table 1, which returns (1,3), indicating a coefficient group size of 2x8. Since the minimum area of a luma TB is 16 samples, entries of Table 1 for which log2 width + log2 height is less than 4 are not accessed for luma TBs. If intra subdivision is used for the CU, each luma TB has the same size, so the coefficient group size determination for the luma TBs is performed once for the CU.
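The lookup described above may be sketched as a three-dimensional table indexed by the log2 TB width and height. The contents of Table 1 are not reproduced in this passage, so the entries below are generated from an assumed rule: the largest group of at most 16 samples that fits within the TB, kept as close to square as possible. The rule agrees with the two worked examples. The names and the maximum TB side of 32 samples are likewise assumptions.

```python
MAX_LOG2 = 5  # assumption: TB sides of up to 32 samples


def coefficient_group_log2(log2_w, log2_h):
    """Return (log2 width, log2 height) of the coefficient group."""
    cg_w = min(log2_w, 2)
    cg_h = min(log2_h, 2)
    # A TB narrower than 4 samples lets the group extend along the other
    # side, while the group area stays at or below 16 samples.
    if cg_w < 2:
        cg_h = min(log2_h, 4 - cg_w)
    elif cg_h < 2:
        cg_w = min(log2_w, 4 - cg_h)
    return (cg_w, cg_h)


# TABLE_1[log2TBwidth][log2TBheight] -> (log2 CG width, log2 CG height).
# The tuple plays the role of the third table dimension.
TABLE_1 = [[coefficient_group_log2(w, h) for h in range(MAX_LOG2 + 1)]
           for w in range(MAX_LOG2 + 1)]

print(TABLE_1[4][4])  # 16x16 TB -> (2, 2), i.e. 4x4
print(TABLE_1[1][5])  # 2x32 TB  -> (1, 3), i.e. 2x8
```

Nothing in the index identifies the channel, so the same entry serves a luma TB and a chroma TB of the same size.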

Table 2 below shows a mapping from transform block (TB) size to coefficient group size for luma and chroma channels in which luma has a different coefficient group size from chroma for a TB of the same size. If Table 2 is used, an additional dimension is required, namely one distinguishing luma from chroma, and the table size is doubled compared with Table 1. The coefficient group sizes defined in Table 1 are those that fit within the TB width and height and are the largest possible size with an area not exceeding 16 samples. Table 1 provides the set of coefficient group sizes from which the coefficient group size is selected. The aspect ratio (width to height) of the selected coefficient group is kept as close to 1:1 as possible within the constraints of the TB width and height. Control in processor 205 passes from step 1670 to step 1680, which encodes the luma TB.

In step 1680 of encoding luma TBs, entropy encoder 338, under execution of processor 205, encodes the residual coefficients of the one or more luma TBs of the CU into bitstream 115. The coefficient group size determined in step 1670 is used for each luma TB. For each luma TB, a coded block flag indicating the presence of at least one significant coefficient in the luma TB is encoded into bitstream 115. If at least one significant coefficient is present in the luma TB, the last significant position is encoded into the bitstream. The last significant position is defined as the position of the last significant coefficient along the scan path progressing from the DC (top-left) coefficient of the TB to the bottom-right coefficient. The scan path is defined as a diagonal scan over the TB divided into an array of non-overlapping subblocks, each sized as the coefficient group size and together occupying the entirety of the TB. The progression from one subblock to the next in the scan order also follows a diagonal scan. Entropy encoder 338 encodes a "coded subblock flag" for each coefficient group other than the top-left coefficient group and the coefficient group containing the last significant coefficient. The coded subblock flag indicates the presence of at least one significant residual coefficient within the subblock. If there are no significant residual coefficients in a subblock, the diagonal scan of residual coefficients in the TB skips that subblock. If there is at least one significant residual coefficient in a subblock, all positions in that subblock are scanned, the magnitude of each residual coefficient is encoded, and the sign of each significant residual coefficient is encoded. Control in processor 205 passes from step 1680 to step 1690, where the chroma coefficient group size is determined.
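The signaling of coded subblock flags described above may be sketched as follows. This is an illustrative sketch rather than a definitive implementation: the function names are hypothetical, the walking direction of each anti-diagonal is an assumption, and coefficient groups lying beyond the last significant position are assumed not to be visited at all.

```python
def diagonal_order(width, height):
    """Grid positions (x, y) in forward diagonal order (assumed direction:
    each anti-diagonal is walked from bottom-left to top-right)."""
    order = []
    for d in range(width + height - 1):
        for y in range(min(d, height - 1), -1, -1):
            x = d - y
            if x < width:
                order.append((x, y))
    return order


def coded_subblock_flags(coeffs, cg_w, cg_h):
    """Return [((gx, gy), flag)] in backward scan order.

    flag is 0 or 1 when the flag is signaled, or None when it is not
    signaled (top-left group and the group holding the last coefficient).
    coeffs is a list of rows, so a coefficient is read as coeffs[y][x].
    """
    tb_h, tb_w = len(coeffs), len(coeffs[0])
    groups = diagonal_order(tb_w // cg_w, tb_h // cg_h)

    def significant(group):
        gx, gy = group
        return any(coeffs[gy * cg_h + y][gx * cg_w + x] != 0
                   for y in range(cg_h) for x in range(cg_w))

    hits = [i for i, g in enumerate(groups) if significant(g)]
    if not hits:
        return []  # the coded block flag would be 0; nothing is scanned
    last = hits[-1]
    result = []
    for i in range(last, -1, -1):
        signaled = i != last and i != 0
        flag = int(significant(groups[i])) if signaled else None
        result.append((groups[i], flag))
    return result


tb = [[0] * 8 for _ in range(8)]
tb[0][0] = 7    # DC coefficient
tb[1][5] = -2   # x = 5, y = 1, inside coefficient group (1, 0)
print(coded_subblock_flags(tb, 4, 4))
```

In this example only coefficient group (0, 1) has its flag signaled, with value 0, so the scan skips its 16 positions.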

In step 1690 of determining the chroma coefficient group size, processor 205 determines the coefficient group size for the pair of chroma transform blocks associated with the CU. One chroma CB for each chroma channel is associated with the CU, regardless of whether the luma CB is divided into multiple luma TBs. The coefficient group size is determined from the chroma TB width and height, as shown in Table 1. TB width and height are powers of two, so the log2 of the TB width and height, i.e. "log2TBwidth" and "log2TBheight", form the first two indices into the three-dimensional Table 1. The final dimension of the table distinguishes between the width and height of the coefficient group. The dimensions of the coefficient group are stored as log2 width and log2 height. For example, a TB of size 16x16 yields index (4,4) into Table 1, which returns (2,2), indicating a coefficient group size of 4x4. A TB of size 2x32 yields index (1,5) into Table 1, which returns (1,3), indicating a coefficient group size of 2x8. Since both chroma TBs have the same size, the coefficient group size determination for the pair of chroma TBs is performed once for the CU. If Table 2 is used, an additional dimension is required, namely one distinguishing luma from chroma, and the table size is doubled compared with that of Table 1.

As described in connection with steps 1670 and 1690, the coefficient group size is determined based solely on the transform block size, with no further distinction between luma and chroma channels. Accordingly, the coefficient group size is determined irrespective of whether the chroma format is 4:2:2 or 4:2:0. As described in connection with Table 1, the coefficient group size is based on a largest area of the coefficient group of up to 16 samples. Step 1690 operates to determine the coefficient group size for a TB independently of the color plane of the transform block (Y, Cb, or Cr) and of the color plane subsampling due to the chroma format (applicable to the Cb and Cr channels). Table 1 is used in both step 1670 and step 1690. Thus, a single table is used for transform blocks belonging to the luma plane and to each of the chroma color planes. Control in processor 205 passes from step 1690 to step 16100, which encodes the chroma TB.

クロマＴＢを符号化するステップ１６１００において、エントロピーエンコーダ３３８は、プロセッサ２０５の実行下で、ＣＵのクロマＴＢのペアの残差係数をビットストリーム１１５に符号化する。ステップ１６９０の決定された係数グループサイズは、クロマＴＢのペアに対して使用される。各クロマＴＢについて、クロマＴＢにおける少なくとも１つの有意係数の存在を示す符号化ブロックフラグはビットストリーム１１５に符号化される。各クロマＴＢに対する符号化ステップの残りは、ステップ１６８０を参照して説明したように、ルマＴＢに対する符号化プロセスに一致する。方法１６００はステップ１６１００の実行時に終了し、プロセッサ２０５における制御は、ＣＴＵの次のＣＵに進む。 In step 16100 of encoding chroma TBs, entropy encoder 338 encodes the residual coefficients of the CU's chroma TB pairs into bitstream 115 under execution of processor 205 . The determined coefficient group size of step 1690 is used for the pair of chroma TBs. For each chroma TB, a coded block flag indicating the presence of at least one significant coefficient in the chroma TB is encoded into the bitstream 115. The remainder of the encoding steps for each chroma TB correspond to the encoding process for the luma TB, as described with reference to step 1680. Method 1600 ends upon execution of step 16100, and control in processor 205 proceeds to the next CU of the CTU.
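The control flow of method 1600 described in the preceding paragraphs may be summarized by a short sketch that lists the steps visited for a given CU. The function and parameter names are hypothetical. The sketch models only the ordering of the steps and performs no entropy coding.

```python
def method_1600_steps(is_intra, uses_mvp, has_residual):
    """Return the step numbers of method 1600 visited for one CU.

    is_intra:     outcome of intra-prediction test step 1620
    uses_mvp:     whether motion vector prediction is used (inter CUs only)
    has_residual: outcome of encoded residual test step 1660 (cu_cbf)
    """
    steps = [1610, 1620]
    if is_intra:
        steps.append(1650)      # intra subdivision mode
    else:
        steps.append(1630)      # merge flag and index
        if uses_mvp:
            steps.append(1640)  # motion vector delta
    steps.append(1660)
    if has_residual:
        # Group size determination precedes coding for luma, then chroma.
        steps += [1670, 1680, 1690, 16100]
    return steps


print(method_1600_steps(is_intra=True, uses_mvp=False, has_residual=True))
print(method_1600_steps(is_intra=False, uses_mvp=True, has_residual=False))
```

Steps 1670 and 1690 each run at most once per CU, which reflects the single table lookup per channel type described above.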

FIG. 17 shows a method 1700 for decoding a coding unit of an image frame from video bitstream 133. Method 1700 may be implemented by a device such as a configured FPGA, ASIC, or ASSP. Further, method 1700 may be performed by video decoder 134 under execution of processor 205. As such, method 1700 may be stored on a computer-readable storage medium and/or in memory 206. Method 1700 decodes blocks from bitstream 133 such that the coefficient group size is determined based solely on the transform block size, with no further distinction between luma and chroma channels. Since entropy decoding is a critical feedback loop in video decoder 134, it is advantageous to reduce the memory accesses or operations required for coefficient group size determination. Method 1700 is invoked for each coding unit in the coding tree, i.e., at step 1430 of FIG. 14. As mentioned above, step 1430 is performed if the current split is no split 510. Method 1700 begins with step 1710 of decoding pred_mode.

In step 1710 of decoding pred_mode, entropy decoder 420, under execution of processor 205, decodes the prediction mode of the CU from bitstream 133. Control in processor 205 passes from step 1710 to intra-prediction test step 1720.

イントラ予測テストステップ１７２０では、プロセッサ２０５がステップ１７１０で復号されたＣＵの予測モードをテストする。予測モードがイントラ予測である場合、ステップ１７２０は「ＹＥＳ」を返し、プロセッサ２０５における制御はステップ１７２０からイントラサブ分割モードを復号するステップ１７５０に進む。そわない場合、イントラ予測ではない場合、ステップ１７２０は「ＮＯ」を返し、プロセッサ２０５における制御は、ステップ１７２０からマージフラグおよびインデックスを復号するステップ１７３０に進む。 In an intra-prediction test step 1720, the processor 205 tests the prediction mode of the CU decoded in step 1710. If the prediction mode is intra prediction, step 1720 returns "YES" and control in processor 205 passes from step 1720 to step 1750, which decodes the intra subdivision mode. Otherwise, if it is not an intra prediction, step 1720 returns "NO" and control in processor 205 passes from step 1720 to step 1730 where the merge flag and index are decoded.

マージフラグおよびインデックスを復号するステップ１７３０において、エントロピーデコーダ４２０は、プロセッサ２０５の実行下で、ビットストリーム１３３から、「マージモード」がビットストリーム内でインター予測のために使用されているか否かをシグナリングするマージフラグを復号する。マージモードはＣＵの動きベクトルを、空間的または時間的に隣接する候補ブロックのセットのうち、空間的に（または時間的に）隣接するブロックから取得させる。マージモードが使用される場合、１つの候補が「マージインデックス」によって選択され、ビットストリーム１３３からも復号される。「動きベクトル予測」が使用される場合、同様の復号が実行され、それによって、いくつかの可能な候補動きベクトルのうちの１つが、ビットストリーム内のフラグによって予測子としてシグナリングされる。プロセッサ２０５における制御は、ステップ１７３０から動きベクトルデルタを復号するステップ１７４０に進む。 In step 1730 of decoding the merge flag and index, the entropy decoder 420, under execution of the processor 205, signals from the bitstream 133 whether "merge mode" is used for inter prediction in the bitstream. Decode the merge flag to be used. The merge mode causes the motion vector of a CU to be obtained from spatially (or temporally) adjacent blocks from a set of spatially or temporally adjacent candidate blocks. If merge mode is used, one candidate is selected by the “merge index” and is also decoded from the bitstream 133. If "motion vector prediction" is used, a similar decoding is performed whereby one of several possible candidate motion vectors is signaled as a predictor by a flag in the bitstream. Control in processor 205 passes from step 1730 to step 1740, where the motion vector delta is decoded.

In step 1740 of decoding motion vector deltas, entropy decoder 420, under execution of processor 205, decodes the motion vector delta from bitstream 133. Step 1740 is performed if motion vector prediction is used for the CU. The motion vector delta specifies the difference between the motion vector predictor decoded in step 1730 and the motion vector used for motion compensation. Control in processor 205 passes from step 1740 to encoded residual test step 1760. If motion vector prediction is not used for the CU, step 1740 is not performed and control in processor 205 proceeds directly to step 1760.

In step 1750 of decoding the intra subdivision mode, entropy decoder 420, under execution of processor 205, decodes from bitstream 133 the decision of whether intra subdivision is used, using the context-coded "intra_subpartitions_mode_flag" syntax element. Intra subdivision is available for the luma channel when the luma CB size is larger than the minimum luma transform block size, i.e., larger than 16 luma samples. As shown in set 1500, intra subdivision divides the coding unit into multiple luma transform blocks. If the luma CB is divided into multiple TBs, "intra_subpartitions_split_flag" signals whether the division of the luma CB into multiple luma TBs occurs horizontally or vertically. Collectively, "intra_subpartitions_mode_flag" and "intra_subpartitions_split_flag" encode the three possible divisions, enumerated as "ISP_NO_SPLIT", "ISP_HOR_SPLIT", and "ISP_VER_SPLIT". Control in processor 205 passes from step 1750 to encoded residual test step 1760.

In encoded residual test step 1760, processor 205 determines whether at least one residual coefficient in any transform block of the coding block is significant. This determination covers all luma TBs resulting from the application of intra subdivision and the pair of chroma TBs associated with the two chroma channels. Entropy decoder 420, under execution of processor 205, arithmetically decodes the "cu_cbf" syntax element, and processor 205 determines whether at least one residual coefficient in any TB of the CU is significant. If at least one residual coefficient in any of the luma TBs and chroma TBs is significant, step 1760 returns "YES" and control in processor 205 proceeds to step 1770 of determining the luma coefficient group size. If no significant residual coefficient is present in any TB of the CU, as indicated by a "0" arithmetically decoded for "cu_cbf", step 1760 returns "NO", method 1700 ends, and processor 205 proceeds to the next CU in the CTU.

ルマ係数グループサイズを決定するステップ１７７０において、プロセッサ２０５は、ＣＵに関連する１つまたは複数のルマ変換ブロックの係数グループサイズを判定する。ステップ１７７０の判定は、ステップ１６７０の判定と同様に動作する。プロセッサ２０５内の制御は、ステップ１７７０からルマＴＢを復号するステップ１７８０に進む。 In determining luma coefficient group size step 1770, processor 205 determines the coefficient group size of one or more luma transform blocks associated with the CU. The determination at step 1770 operates similarly to the determination at step 1670. Control within processor 205 passes from step 1770 to step 1780, which decodes the luma TB.

At the decode luma TBs step 1780, the entropy decoder 420, under execution of the processor 205, decodes the residual coefficients of the one or more luma TBs of the CU from the bitstream 133. The coefficient group size determined at step 1770 is used for each luma TB. For each luma TB, a coded block flag indicating the presence of at least one significant coefficient in the luma TB is decoded from the bitstream 133. If the luma TB contains at least one significant coefficient, a last significant position is decoded from the bitstream 133. The last significant position is defined as the position of the last significant coefficient along a scan path that progresses from the DC (top-left) coefficient of the TB to the bottom-right coefficient. The scan path is defined as a diagonal scan over an array of non-overlapping sub-blocks, each sized according to the coefficient group size, that together occupy the entirety of the TB. The progression from one sub-block to the next in the scan order also follows a diagonal scan. The entropy encoder 338 encodes a "coded sub-block flag" for each coefficient group other than the top-left coefficient group and the coefficient group containing the last significant coefficient. The coded sub-block flag indicates the presence of at least one significant residual coefficient in the sub-block. If a sub-block contains no significant residual coefficients, the diagonal scan of residual coefficients in the TB skips that sub-block. If a sub-block contains at least one significant residual coefficient, all positions in the sub-block are scanned, the magnitude of each residual coefficient is coded, and the sign of each significant residual coefficient is coded. Control in the processor 205 progresses from step 1780 to a determine chroma coefficient group size step 1790.
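The two-level diagonal scan described for step 1780 can be illustrated with a minimal sketch. The names `Pos`, `diagonalOrder` and `tbScanPath` are hypothetical and do not come from the specification or from the VTM software. The sketch lists positions in forward order (DC coefficient first) and assumes each diagonal is traversed from bottom-left to top-right, since the description only states that the scan is diagonal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pos { uint32_t x; uint32_t y; };

// Forward diagonal order over a w x h grid. Diagonals are visited by
// increasing x + y; within a diagonal the walk goes from bottom-left to
// top-right (an assumed direction, not stated in the description).
std::vector<Pos> diagonalOrder(uint32_t w, uint32_t h) {
    assert(w > 0 && h > 0);
    std::vector<Pos> order;
    order.reserve(static_cast<std::size_t>(w) * h);
    for (uint32_t d = 0; d + 1 < w + h; ++d) {
        const uint32_t yStart = d < h ? d : h - 1;
        for (uint32_t y = yStart + 1; y-- > 0;) {
            const uint32_t x = d - y;
            if (x >= w) break;  // left the grid on the right-hand side
            order.push_back({x, y});
        }
    }
    return order;
}

// Scan path of a transform block: sub-blocks (coefficient groups) are
// visited in diagonal order, and the positions inside each sub-block are
// visited in diagonal order as well.
std::vector<Pos> tbScanPath(uint32_t tbW, uint32_t tbH,
                            uint32_t log2CgW, uint32_t log2CgH) {
    const uint32_t cgW = 1u << log2CgW;
    const uint32_t cgH = 1u << log2CgH;
    assert(tbW % cgW == 0 && tbH % cgH == 0);  // sub-blocks tile the TB
    const std::vector<Pos> inner = diagonalOrder(cgW, cgH);
    std::vector<Pos> path;
    path.reserve(static_cast<std::size_t>(tbW) * tbH);
    for (const Pos& sb : diagonalOrder(tbW / cgW, tbH / cgH))
        for (const Pos& p : inner)
            path.push_back({sb.x * cgW + p.x, sb.y * cgH + p.y});
    return path;
}
```

For an 8x8 TB with 4x4 coefficient groups, `tbScanPath(8, 8, 2, 2)` yields 64 positions, starting at the DC position (0, 0) and ending at (7, 7). A sub-block whose coded sub-block flag is zero would simply be omitted from the traversal.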

At the determine chroma coefficient group size step 1790, the processor 205 determines the coefficient group size for the pair of chroma transform blocks associated with the CU. The determination at step 1790 operates in the same manner as the determination at step 1690.

As with step 1690, the coefficient group size is determined at step 1790 based on the transform block size, with no further distinction made between luma and chroma channels. The coefficient group size is therefore determined independently of whether the chroma format is 4:2:2 or 4:2:0, and of the corresponding subsampling in each color plane. As described in relation to Table 1, the coefficient group size is based on the largest area of the TB of up to 16 samples. Step 1690 operates to determine the coefficient group size of a TB irrespective of the color plane (Cb or Cr) of the transform block. Control in the processor 205 progresses from step 1790 to a decode chroma TBs step 17100.
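The rule that the coefficient group covers the largest area of the TB of up to 16 samples can be written as a closed-form function of the TB dimensions alone. The following is a minimal sketch, not the implementation used in the described arrangements, which use a lookup table. The names `CgSize` and `coeffGroupSize` are hypothetical, and the sketch assumes the first index of Table 1 is the log2 TB width and the second the log2 TB height.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct CgSize { uint32_t log2W; uint32_t log2H; };

// Derives the coefficient group size from the TB size only. No channel
// type and no chroma format is passed in, mirroring steps 1690 and 1790.
CgSize coeffGroupSize(uint32_t log2TbW, uint32_t log2TbH) {
    constexpr uint32_t kLog2MaxArea = 4;  // 16 samples per coefficient group
    assert(log2TbW <= 7 && log2TbH <= 7); // range covered by Table 1 (assumed)
    // The shorter side is capped at 4 samples (log2 = 2) ...
    const uint32_t shortSide = std::min({log2TbW, log2TbH, uint32_t{2}});
    // ... and the longer side takes whatever remains of the 16-sample budget.
    const uint32_t cap = kLog2MaxArea - shortSide;
    const uint32_t longSide = std::min(std::max(log2TbW, log2TbH), cap);
    return log2TbW <= log2TbH ? CgSize{shortSide, longSide}
                              : CgSize{longSide, shortSide};
}
```

For example, a 2x16 TB (`coeffGroupSize(1, 4)`) gives a 2x8 coefficient group and a 16x2 TB gives 8x2, matching the `{1,3}` and `{3,1}` entries of Table 1, while any TB with both sides of at least 4 samples gives 4x4.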

At the decode chroma TBs step 17100, the entropy decoder 420, under execution of the processor 205, decodes the residual coefficients of the pair of chroma TBs of the CU from the bitstream 133. The coefficient group size determined at step 1790 is used for the pair of chroma TBs. For each chroma TB, a coded block flag indicating the presence of at least one significant coefficient in the chroma TB is decoded from the bitstream 133. The remainder of the decoding process for each chroma TB operates in the same manner as for luma TBs, as described with reference to step 1780. The method 1700 terminates upon execution of step 17100, and control in the processor 205 progresses to the next CU of the CTU.

Table 3 shows the coding performance results obtained under the JVET "common test conditions" (CTC) "All Intra Main 10" configuration when Table 1 is used. The results in Table 3 were obtained using "VVC Test Model" (VTM) software implementing methods 1600 and 1700, compared against a baseline VTM-4.0 that does not implement methods 1600 and 1700. Overall, there is no coding impact from the change, and a slight gain is even seen in the chroma channels, demonstrating that simplifying the mapping table from transform block size to coefficient group size is not detrimental to coding performance.

The video encoder 115 and the video decoder 134, using methods 1600 and 1700 respectively, achieve a memory reduction in the residual encoding and decoding processes by harmonizing the coefficient group sizes of luma TBs and chroma TBs. As a result, chroma TBs have access to coefficient group sizes such as 2x8, 8x2, 2x4 and 4x2, rather than only 2x2 and 4x4. For luma TBs, the sizes 16x1 and 1x16 are possible when intra sub-partitioning is used. Although the sizes 16x1 and 1x16 are available to chroma by virtue of their presence in Table 1, the minimum width and height of a chroma block is two samples, so the sizes 16x1 and 1x16 are not used for chroma TBs. Because residual encoding and decoding form part of a feedback loop in the design, the memory reduction corresponds to, for example, improved cache performance in a software implementation or a reduced critical path in a hardware implementation.
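The set of coefficient group sizes a chroma TB can actually reach may be checked by enumerating Table 1 over the TB sizes whose width and height are at least two samples. The sketch below is illustrative only: `kMaxCuDepth`, `kLog2SbbSize` and `reachableCgSizes` are hypothetical names, the value 7 is assumed because each table dimension holds eight entries, and the table is assumed to be indexed by log2 TB width then log2 TB height.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

constexpr uint32_t kMaxCuDepth = 7;  // assumed: 8 entries per dimension

// Copy of Table 1: {log2 CG width, log2 CG height} per log2 TB size.
constexpr uint32_t kLog2SbbSize[kMaxCuDepth + 1][kMaxCuDepth + 1][2] = {
    { {0,0},{0,1},{0,2},{0,3},{0,4},{0,4},{0,4},{0,4} },
    { {1,0},{1,1},{1,2},{1,3},{1,3},{1,3},{1,3},{1,3} },
    { {2,0},{2,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} },
    { {3,0},{3,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} },
    { {4,0},{3,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} },
    { {4,0},{3,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} },
    { {4,0},{3,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} },
    { {4,0},{3,1},{2,2},{2,2},{2,2},{2,2},{2,2},{2,2} }
};

// Collects the (width, height) coefficient group sizes used by all TBs
// whose log2 width and log2 height are at least minLog2Dim.
std::set<std::pair<uint32_t, uint32_t>> reachableCgSizes(uint32_t minLog2Dim) {
    assert(minLog2Dim <= kMaxCuDepth);
    std::set<std::pair<uint32_t, uint32_t>> sizes;
    for (uint32_t w = minLog2Dim; w <= kMaxCuDepth; ++w)
        for (uint32_t h = minLog2Dim; h <= kMaxCuDepth; ++h)
            sizes.emplace(1u << kLog2SbbSize[w][h][0],
                          1u << kLog2SbbSize[w][h][1]);
    return sizes;
}
```

With `minLog2Dim = 1` (chroma, minimum side of two samples) the set contains exactly 2x2, 2x4, 2x8, 4x2, 8x2 and 4x4. The sizes 16x1 and 1x16 appear only when `minLog2Dim = 0`, which corresponds to the luma case with one-sample-wide or one-sample-high TBs.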

uint32_t g_log2SbbSize[MAX_CU_DEPTH + 1][MAX_CU_DEPTH + 1][2] =
//===== Luma/Chroma =====
{
{ { 0,0 },{ 0,1 },{ 0,2 },{ 0,3 },{ 0,4 },{ 0,4 },{ 0,4 },{ 0,4 } },
{ { 1,0 },{ 1,1 },{ 1,2 },{ 1,3 },{ 1,3 },{ 1,3 },{ 1,3 },{ 1,3 } },
{ { 2,0 },{ 2,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } },
{ { 3,0 },{ 3,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } },
{ { 4,0 },{ 3,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } },
{ { 4,0 },{ 3,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } },
{ { 4,0 },{ 3,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } },
{ { 4,0 },{ 3,1 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 },{ 2,2 } }
};
Table 1: Mapping table from transform block size to coefficient group size for the luma and chroma channels (with the same coefficient group size for a TB regardless of whether the TB is for a luma channel or a chroma channel)
uint32_t g_log2SbbSize[2][MAX_CU_DEPTH+1][MAX_CU_DEPTH+1][2] =
{
//===== Luma =====
{
{ {0,0}, {0,1}, {0,2}, {0,3}, {0,4}, {0,4}, {0,4}, {0,4} },
{ {1,0}, {1,1}, {1,2}, {1,3}, {1,3}, {1,3}, {1,3}, {1,3} },
{ {2,0}, {2,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {3,0}, {3,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {4,0}, {3,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {4,0}, {3,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {4,0}, {3,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {4,0}, {3,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} }
},
//===== Chroma =====
{
{ {0,0}, {0,0}, {0,0}, {0,0}, {0,0}, {0,0}, {0,0}, {0,0} },
{ {0,0}, {1,1}, {1,1}, {1,1}, {1,1}, {1,1}, {1,1}, {1,1} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} },
{ {0,0}, {1,1}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2}, {2,2} }
},
};
Table 2: Conventional mapping of transform block size to coefficient group size for luma and chroma channels (with different coefficient group sizes for the same size TB for luma vs. chroma)
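The practical difference between the two mappings can be seen on a single chroma TB. The sketch below expresses the chroma half of Table 2 as a rule (every entry is square, with log2 side equal to the smaller of the log2 TB width, the log2 TB height and 2) and counts the coefficient groups needed to tile a TB. The names `CgSize`, `conventionalChromaCgSize` and `numCoeffGroups` are hypothetical and serve only this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct CgSize { uint32_t log2W; uint32_t log2H; };

// Chroma half of Table 2 written as a rule: square coefficient groups
// whose side is limited by the shorter TB side and by 4 samples.
CgSize conventionalChromaCgSize(uint32_t log2TbW, uint32_t log2TbH) {
    const uint32_t m = std::min({log2TbW, log2TbH, uint32_t{2}});
    return {m, m};
}

// Number of coefficient groups required to tile a TB of the given size.
uint32_t numCoeffGroups(uint32_t log2TbW, uint32_t log2TbH, CgSize cg) {
    assert(cg.log2W <= log2TbW && cg.log2H <= log2TbH);
    return 1u << ((log2TbW - cg.log2W) + (log2TbH - cg.log2H));
}
```

For a 2x8 chroma TB, `conventionalChromaCgSize(1, 3)` gives 2x2, so `numCoeffGroups(1, 3, {1, 1})` is 4, whereas the Table 1 entry `{1,3}` covers the same TB with a single 2x8 coefficient group (`numCoeffGroups(1, 3, CgSize{1, 3})` is 1).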

Table 3: Coding performance resulting from using the same coefficient group size for a TB regardless of whether the TB is for a luma channel or a chroma channel.

Industrial Applicability
The arrangements described are applicable to the computer and data processing industries, and particularly to digital signal processing for the encoding and decoding of signals such as video and image signals, achieving high compression efficiency.

In contrast to HEVC, VVC systems allow the use of separate coding trees for the luma and chroma channels for increased flexibility. As discussed above, however, a resulting issue is that the use of smaller chroma blocks can affect throughput. The arrangements described herein determine appropriate rules as each coding tree unit is processed, to assist in avoiding throughput issues. In addition, as described above, given the rules for avoiding throughput issues, the arrangements described can assist in providing improved efficiency and accuracy of the arithmetic coding of the context-coded bins used to describe each coding tree.

The foregoing describes only some embodiments of the present invention. Modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

Claims

A method for decoding a transform block of an image frame from a bitstream, the method comprising:
determining a chroma format of the image frame from a plurality of chroma formats including a 4:2:0 chroma format and a 4:2:2 chroma format;
dividing a coding tree unit into one or more coding units each having a luma coding block and a chroma coding block;
determining sub-blocks of the transform block, the transform block being a luma transform block for a luma coding block or a chroma transform block for a chroma coding block; and
decoding the transform block from the bitstream using the sub-blocks, wherein
if, in a coding tree unit of the 4:2:0 chroma format, the chroma coding block of a given coding unit having a luma coding block and a chroma coding block has a block size of 8x2, vertical ternary splitting of the chroma coding block of the given coding unit is not permitted even if vertical ternary splitting is applied to the luma coding block of the given coding unit;
if the luma coding block of the given coding unit is vertically ternary split and the chroma coding block of the given coding unit is not, the chroma coding block of the given coding unit is located at a position corresponding to the three luma coding blocks obtained by the vertical ternary split of the luma coding block of the given coding unit;
the sizes of the three luma coding blocks are in the ratio 1:2:1;
the transform block is decoded based on (i) the size of the transform block and (ii) whether the transform block is a luma transform block or a chroma transform block; and
the size of the sub-blocks of the transform block is determined from the size of the transform block, without using either (i) whether the transform block is a luma transform block or a chroma transform block or (ii) the chroma format of the image frame.

A video decoder for decoding a transform block of an image frame from a bitstream, the video decoder comprising:
means for determining a chroma format of the image frame from a plurality of chroma formats including a 4:2:0 chroma format and a 4:2:2 chroma format;
means for dividing a coding tree unit into one or more coding units each having a luma coding block and a chroma coding block;
means for determining sub-blocks of the transform block, the transform block being a luma transform block for a luma coding block or a chroma transform block for a chroma coding block; and
means for decoding the transform block from the bitstream using the sub-blocks, wherein
if, in a coding tree unit of the 4:2:0 chroma format, the chroma coding block of a given coding unit having a luma coding block and a chroma coding block has a block size of 8x2, vertical ternary splitting of the chroma coding block of the given coding unit is not permitted even if vertical ternary splitting is applied to the luma coding block of the given coding unit;
if the luma coding block of the given coding unit is vertically ternary split and the chroma coding block of the given coding unit is not, the chroma coding block of the given coding unit is located at a position corresponding to the three luma coding blocks obtained by the vertical ternary split of the luma coding block of the given coding unit;
the sizes of the three luma coding blocks are in the ratio 1:2:1;
the transform block is decoded based on (i) the size of the transform block and (ii) whether the transform block is a luma transform block or a chroma transform block; and
the size of the sub-blocks of the transform block is determined from the size of the transform block, without using either (i) whether the transform block is a luma transform block or a chroma transform block or (ii) the chroma format of the image frame.

A method for encoding a transform block of an image frame into a bitstream, the method comprising:
determining a chroma format of the image frame from a plurality of chroma formats including a 4:2:0 chroma format and a 4:2:2 chroma format;
dividing a coding tree unit into one or more coding units each having a luma coding block and a chroma coding block;
determining sub-blocks of the transform block, the transform block being a luma transform block for a luma coding block or a chroma transform block for a chroma coding block; and
encoding the transform block into the bitstream using the sub-blocks, wherein
if, in a coding tree unit of the 4:2:0 chroma format, the chroma coding block of a given coding unit having a luma coding block and a chroma coding block has a block size of 8x2, vertical ternary splitting of the chroma coding block of the given coding unit is not permitted even if vertical ternary splitting is applied to the luma coding block of the given coding unit;
if the luma coding block of the given coding unit is vertically ternary split and the chroma coding block of the given coding unit is not, the chroma coding block of the given coding unit is located at a position corresponding to the three luma coding blocks obtained by the vertical ternary split of the luma coding block of the given coding unit;
the sizes of the three luma coding blocks are in the ratio 1:2:1;
the transform block is encoded based on (i) the size of the transform block and (ii) whether the transform block is a luma transform block or a chroma transform block; and
the size of the sub-blocks of the transform block is determined from the size of the transform block, without using either (i) whether the transform block is a luma transform block or a chroma transform block or (ii) the chroma format of the image frame.

A video encoder for encoding a transform block of an image frame into a bitstream, the video encoder comprising:
means for determining a chroma format of the image frame from a plurality of chroma formats including a 4:2:0 chroma format and a 4:2:2 chroma format;
means for dividing a coding tree unit into one or more coding units each having a luma coding block and a chroma coding block;
means for determining sub-blocks of the transform block, the transform block being a luma transform block for a luma coding block or a chroma transform block for a chroma coding block; and
means for encoding the transform block into the bitstream using the sub-blocks, wherein
if, in a coding tree unit of the 4:2:0 chroma format, the chroma coding block of a given coding unit having a luma coding block and a chroma coding block has a block size of 8x2, vertical ternary splitting of the chroma coding block of the given coding unit is not permitted even if vertical ternary splitting is applied to the luma coding block of the given coding unit;
if the luma coding block of the given coding unit is vertically ternary split and the chroma coding block of the given coding unit is not, the chroma coding block of the given coding unit is located at a position corresponding to the three luma coding blocks obtained by the vertical ternary split of the luma coding block of the given coding unit;
the sizes of the three luma coding blocks are in the ratio 1:2:1;
the transform block is encoded based on (i) the size of the transform block and (ii) whether the transform block is a luma transform block or a chroma transform block; and
the size of the sub-blocks of the transform block is determined from the size of the transform block, without using either (i) whether the transform block is a luma transform block or a chroma transform block or (ii) the chroma format of the image frame.