JP5259828B2

JP5259828B2 - Video coding using transforms larger than 4x4 and 8x8

Info

Publication number: JP5259828B2
Application number: JP2011530171A
Authority: JP
Inventors: イエ、ヤン; チェン、ペイソン; カークゼウィックズ、マルタ
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2008-10-03
Filing date: 2009-09-30
Publication date: 2013-08-07
Anticipated expiration: 2029-09-30
Also published as: CA2742390A1; AU2009298559B2; WO2010039822A3; AU2009298559A1; RU2011117669A; RU2497303C2; WO2010039822A2; KR101247923B1; JP2012504915A; KR20110063856A; ZA201103208B; CA2742390C

Description

PRIORITY CLAIM
This application claims the benefit of U.S. Provisional Application No. 61/102,783, filed October 3, 2008, and U.S. Provisional Application No. 61/179,228, filed May 18, 2009, the entire contents of each of which are incorporated herein by reference.

The present invention relates to encoding and decoding video data using transform sizes larger than 8×8.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices such as wireless telephone handsets, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video game devices, video game consoles, and the like. Digital video devices implement video compression techniques, such as MPEG-2, MPEG-4, or H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), to transmit and receive digital video more efficiently. Video compression techniques perform spatial and temporal prediction to reduce or remove the redundancy inherent in video sequences.

Video compression generally includes spatial prediction and/or temporal prediction. In particular, intra-coding relies on spatial prediction to reduce or remove spatial redundancy between video blocks within a given coded unit, which may comprise a video frame, a slice of a video frame, or the like. In contrast, inter-coding relies on temporal prediction to reduce or remove temporal redundancy between video blocks of successive coded units of a video sequence. For intra-coding, a video encoder performs spatial prediction to compress data based on other data within the same coded unit. For inter-coding, the video encoder performs motion estimation and motion compensation to track the movement of matching video blocks between two or more adjacent coded units.

After spatial or temporal prediction, a residual block is generated by subtracting the predicted video block produced during the prediction process from the original video block being encoded. The residual block thus indicates the difference between the predicted block and the current block being encoded. The video encoder may apply transform, quantization, and entropy coding processes to further reduce the bit rate associated with communicating the residual block. The transform techniques may convert a set of pixel values into transform coefficients, which represent the energy of the pixel values in the frequency domain. Quantization is applied to the transform coefficients and generally involves a process that limits the number of bits associated with any given coefficient. Prior to entropy coding, the video encoder scans the quantized coefficient block into a one-dimensional vector of coefficients. The video encoder then entropy-encodes the vector of quantized transform coefficients to further compress the residual data.

A video decoder may perform entropy decoding operations to retrieve the coefficients. An inverse scan may also be performed at the decoder to form a two-dimensional block from the received one-dimensional vector of coefficients. The video decoder then inverse-quantizes and inverse-transforms the coefficients to obtain a reconstructed residual block. The video decoder next decodes a predicted video block based on prediction information, which includes motion information. The video decoder then adds the predicted video block to the corresponding reconstructed residual block to generate a reconstructed video block and, in turn, a decoded sequence of video information.
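The decoder-side steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the decoder of any particular standard: the function names, the scan order passed in by the caller, and the single flat quantization step are all hypothetical, and the inverse transform is supplied as a callable that defaults to the identity so the example stays short.

```python
def inverse_scan(vector, scan_order, size):
    """Rebuild a size x size block from a 1-D coefficient vector.

    scan_order lists (row, col) positions in the order they were scanned.
    """
    block = [[0] * size for _ in range(size)]
    for value, (row, col) in zip(vector, scan_order):
        block[row][col] = value
    return block


def reconstruct_block(vector, scan_order, size, step, predicted,
                      inverse_transform=lambda block: block):
    """Inverse scan, inverse quantize, inverse transform, then add prediction.

    `step` is a hypothetical flat quantization step; the identity default
    for `inverse_transform` is a placeholder, not a real codec transform.
    """
    levels = inverse_scan(vector, scan_order, size)
    coefficients = [[level * step for level in row] for row in levels]
    residual = inverse_transform(coefficients)
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]
```

For instance, with a raster scan order over a 2×2 block, `reconstruct_block([1, 0, -1, 0], order, 2, 2, [[10, 10], [10, 10]])` adds a dequantized residual of `[[2, 0], [-2, 0]]` to the prediction.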

The systems, methods, and apparatus of this application each have several aspects, no single one of which is solely responsible for its desirable attributes. Without limiting the scope of this application as expressed by the claims that follow, its more prominent features are briefly discussed here. After considering this discussion, and particularly after reading the section entitled "Detailed Description," one will understand how the exemplary features of this application can provide several improvements, including, for example, improved video coding efficiency.

One embodiment is a method of encoding video data, the method comprising: applying spatial prediction or motion compensation to an original video block in a video frame to generate a predicted video block based on a prediction mode; subtracting the predicted video block from the original video block in the video frame to form a residual block; selecting a transform having a first transform size to apply to the residual block; generating header data indicating the selected transform, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the predicted video block, wherein the first syntax element and the second syntax element together indicate the first transform size; applying the selected transform to the residual block to generate residual transform coefficients; and generating a video signal based on the header data and the residual transform coefficients.

Another embodiment is a method of decoding video data, the method comprising: receiving a video signal indicating at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the at least one block, wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block; applying spatial prediction or motion compensation to the at least one block to generate a predicted video block of the prediction block size of the at least one block; determining, based on the first syntax element and the second syntax element, the first transform size used to encode the at least one block; applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and adding the decoded residual block to the predicted video block to obtain a decoded video block.

Another embodiment is an apparatus for encoding video data, the apparatus comprising: means for applying spatial prediction or motion compensation to an original video block in a video frame to generate a predicted video block based on a prediction mode; means for subtracting the predicted video block from the original video block in the video frame to form a residual block; means for selecting a transform having a first transform size to apply to the residual block; means for generating header data indicating the selected transform, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the predicted video block, wherein the first syntax element and the second syntax element together indicate the first transform size; means for applying the selected transform to the residual block to generate residual transform coefficients; and means for generating a video signal based on the header data and the residual transform coefficients.

Another embodiment is an apparatus for decoding video data, the apparatus comprising: means for receiving a video signal indicating at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the at least one block, wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block; means for applying spatial prediction or motion compensation to the at least one block to generate a predicted video block of the prediction block size of the at least one block; means for determining, based on the first syntax element and the second syntax element, the first transform size used to encode the at least one block; means for applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and means for adding the decoded residual block to the predicted video block to obtain a decoded video block.

Another embodiment is a system for encoding video data, the system comprising: a prediction unit configured to apply spatial prediction or motion compensation to an original video block in a video frame to generate a predicted video block based on a prediction mode; an adder configured to subtract the predicted video block from the original video block in the video frame to form a residual block; a processor configured to select a transform having a first transform size to apply to the residual block and to generate header data indicating the selected transform, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the predicted video block, wherein the first syntax element and the second syntax element together indicate the first transform size; a block transform unit configured to apply the selected transform to the residual block to generate residual transform coefficients; and an entropy encoding unit configured to generate a video signal based on the header data and the residual transform coefficients.

Another embodiment is a system for decoding video data, the system comprising: a receiver configured to receive a video signal indicating at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, the header data comprising a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a prediction block size of the at least one block, wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block; a prediction unit configured to apply spatial prediction or motion compensation to the at least one block to generate a predicted video block of the prediction block size of the at least one block; a processor configured to determine, based on the first syntax element and the second syntax element, the first transform size used to encode the at least one block; an inverse transform unit configured to apply an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and an adder configured to add the decoded residual block to the predicted video block to obtain a decoded video block.

FIG. 1 is a block diagram showing a source device and a destination device for encoding and decoding a video signal.
FIG. 2 is a block diagram of one embodiment of the video encoder of FIG. 1.
FIG. 3 is a flowchart of one embodiment of a process for setting flag values that signal to the decoder of FIG. 1 the type of transform used by the encoder of FIG. 1.
FIG. 4 is a flowchart of another embodiment of a process for setting flag values that signal to the decoder of FIG. 1 the type of transform used by the encoder of FIG. 1.
FIG. 5 is a flowchart of one embodiment of a process for selecting the correct inverse transform for decoding video data encoded by the process of FIG. 3.
FIG. 6 is a flowchart of another embodiment of a process for selecting the correct inverse transform for decoding video data encoded by the process of FIG. 4.
FIG. 7 is a block diagram of one embodiment of the video decoder of FIG. 1.

The following detailed description is directed to certain specific embodiments. However, the teachings herein can be applied in a multitude of different ways. In this description, reference is made to the drawings, in which like parts are designated with like reference numerals throughout.

One embodiment is directed to a transform size syntax element for video encoding and decoding. By implementing a simplified set of transform selection rules and guidelines in the encoding and decoding processes for image and video signals, it has been possible to create a low-bit-rate syntax. As described above, the transform size syntax is a way of both indicating a particular transform size at the encoder and interpreting the transform size at the decoder. The transform size syntax element may be used to indicate the size of the transform to be used and may include a flag value comprising a number of bits. Note that in the following detailed description the terms "video," "image," and "picture" may generally be used interchangeably. Accordingly, the scope of the various aspects of the present invention should not be limited by notions of the difference between these terms.
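To make the idea of a transform size indicated jointly by two syntax elements more concrete, the sketch below resolves a transform size from a flag value and a prediction block size. The mapping is purely hypothetical: the flag semantics, the function name `resolve_transform_size`, and the thresholds are assumptions chosen for illustration and are not the rules defined by this document or by any standard.

```python
def resolve_transform_size(transform_size_flag, pred_block_size):
    """Return an N for an N x N transform, under a hypothetical mapping.

    transform_size_flag: 0 means "4x4"; 1 means "larger than 4x4"
                         (assumed semantics, for illustration only).
    pred_block_size:     (width, height) of the prediction block.
    """
    if transform_size_flag == 0:
        return 4
    width, height = pred_block_size
    # Assumption: a "large" flag resolves to 16x16 only when the whole
    # prediction block can hold a 16x16 transform, otherwise to 8x8.
    return 16 if min(width, height) >= 16 else 8
```

The point of such a scheme is that the flag alone can stay short, because the prediction block size, which is signaled anyway, narrows down which transform sizes are possible.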

FIG. 1 is a block diagram illustrating a video encoding and decoding system 10 that implements the coding techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that transmits encoded video data to a destination device 14 via a communication channel 16. Source device 12 may include a video transmission device 18, a video encoder 20, and a transmitter 22. Video transmission device 18 of source device 12 may include a video capture device such as a video camera, a video archive containing previously captured video, or a video feed from a video content provider. As a further alternative, video transmission device 18 may generate computer-graphics-based data as the source video, or a combination of live video and computer-generated video. In some cases, source device 12 may be a mobile phone or video phone, in which case video transmission device 18 may be a video camera on the phone. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20 for transmission from source device 12 to destination device 14 via transmitter 22 and communication channel 16.

Video encoder 20 receives video data from video transmission device 18. The video data received from video transmission device 18 may be a series of video frames. Video encoder 20 divides the series of frames into coding units and processes the coding units to encode the series of video frames. A coding unit may be, for example, an entire frame or a portion of a frame (for example, a slice). Thus, in some cases, a frame may be divided into slices. To encode the video data, video encoder 20 divides each coding unit into blocks of pixels (referred to herein as video blocks or blocks) and operates on the video blocks within individual coding units. As such, a coding unit (for example, a frame or slice) may contain multiple video blocks. In other words, a video sequence may include multiple frames, a frame may include multiple slices, and a slice may include multiple video blocks.
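As a rough illustration of dividing a coding unit into video blocks, the sketch below tiles a frame, stored as a list of pixel rows, into square blocks. The function name, the default block size of 16, and the assumption that the frame dimensions are multiples of the block size are all choices made for this example.

```python
def split_into_blocks(frame, block_size=16):
    """Tile a frame (list of equal-length pixel rows) into square blocks.

    Returns a dict mapping the (top, left) corner of each block to its
    pixel rows. Assumes the frame height and width are multiples of
    block_size; real encoders handle edges by padding or cropping.
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top, left)] = [row[left:left + block_size]
                                   for row in frame[top:top + block_size]]
    return blocks
```

A 32×48 frame, for example, yields six 16×16 blocks keyed by their top-left corners.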

Each video block may have a fixed or varying size, and may differ in size according to a specified coding standard. As an example, the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264/MPEG-4 Part 10, Advanced Video Coding (AVC) standard (hereinafter the "H.264/MPEG-4 Part 10 AVC" standard) supports intra prediction in various block sizes, such as 16×16, 8×8, or 4×4 pixels for luma components and 8×8 pixels for chroma components. Inter prediction can be performed in various block sizes, such as 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 pixels for luma components, and in correspondingly scaled sizes for chroma components. In H.264, for example, each 16×16-pixel video block is often called a macroblock (MB), and it may be subdivided into smaller sub-blocks that are intra- or inter-predicted on a sub-block basis. In general, MBs and the various sub-blocks may all be regarded as video blocks. Thus, an MB may be regarded as a video block, and if it is partitioned or sub-partitioned, the MB itself may be regarded as forming a set of video blocks.

For each video block, video encoder 20 selects a block type for the block. The block type may indicate whether the block is predicted using inter prediction or intra prediction, as well as the prediction block size of the block. For example, the H.264/MPEG-4 Part 10 AVC standard supports several inter- and intra-prediction block types, including Inter 16×16, Inter 16×8, Inter 8×16, Inter 8×8, Inter 8×4, Inter 4×8, Inter 4×4, Intra 16×16, Intra 8×8, and Intra 4×4. As described in detail below, video encoder 20 may select one of the block types for each video block to be encoded.

Video encoder 20 also selects a prediction mode for each video block. For an intra-coded video block, the prediction mode may determine how the current video block is predicted from one or more previously encoded video blocks. In the H.264/MPEG-4 Part 10 AVC standard, for example, video encoder 20 may select, for each Intra 4×4 block, one of nine possible unidirectional prediction modes: vertical prediction, horizontal prediction, DC prediction, diagonal down-left prediction, diagonal down-right prediction, vertical-right prediction, horizontal-down prediction, vertical-left prediction, and horizontal-up prediction. Similar prediction modes are used to predict each Intra 8×8 block. For an Intra 16×16 block, video encoder 20 may select one of four possible unidirectional modes: vertical prediction, horizontal prediction, DC prediction, and plane prediction. In some cases, video encoder 20 may select the prediction mode from a set of prediction modes that includes not only unidirectional prediction modes but also one or more multi-directional prediction modes that define combinations of the unidirectional modes. For example, the one or more multi-directional prediction modes may be bidirectional prediction modes that combine two unidirectional prediction modes.
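Three of the Intra 4×4 modes listed above are simple enough to sketch directly. The example below assumes that the four reconstructed pixels above the block (`top`) and the four to its left (`left`) are both available; the function name and the string mode labels are illustrative, and the six directional modes are omitted.

```python
def predict_intra_4x4(mode, top, left):
    """Build a 4x4 predicted block from neighboring reconstructed pixels.

    top:  the 4 pixels directly above the block.
    left: the 4 pixels directly to the left of the block.
    Only vertical, horizontal, and DC prediction are sketched here.
    """
    if mode == "vertical":
        # Each column repeats the pixel above it.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Each row repeats the pixel to its left.
        return [[left[row]] * 4 for row in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighboring pixels.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

An encoder would typically build the prediction for every candidate mode and keep the one that gives the cheapest residual.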

After selecting the prediction mode for a video block, video encoder 20 generates a predicted video block using the selected prediction mode. The predicted video block is subtracted from the original video block to form a residual block. The residual block comprises a set of pixel difference values that quantify the differences between the pixel values of the original video block and the pixel values of the generated predicted block. The residual block may be represented in a two-dimensional block format (for example, a two-dimensional matrix or array of pixel difference values).
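Forming the residual is an element-wise subtraction, and the decoder undoes it with an element-wise addition. A minimal sketch, with hypothetical function names and blocks stored as lists of rows:

```python
def residual_block(original, predicted):
    """Pixel-wise difference: original minus predicted."""
    return [[o - p for o, p in zip(orig_row, pred_row)]
            for orig_row, pred_row in zip(original, predicted)]


def add_residual(predicted, residual):
    """Inverse operation used at reconstruction: predicted plus residual."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]
```

Without quantization the round trip is exact, so `add_residual(predicted, residual_block(original, predicted))` returns `original`.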

After generating the residual block, video encoder 20 may perform a number of other operations on the residual block before encoding it. Video encoder 20 may apply a transform, such as an integer transform, a DCT transform, a directional transform, or a wavelet transform, to the residual block of pixel values to produce a block of transform coefficients. The transform coefficients may be a frequency-domain representation of the residual block. Video encoder 20 thus converts the residual pixel values into transform coefficients (also called residual transform coefficients). The residual transform coefficients may be referred to as a transform block or coefficient block. The residual transform coefficients may be a one-dimensional representation of the coefficients when a non-separable transform is applied, or a two-dimensional representation of the coefficients when a separable transform is applied. Non-separable transforms may include non-separable directional transforms. Separable transforms may include separable directional transforms, DCT transforms, integer transforms, and wavelet transforms.
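A separable transform applies the same one-dimensional transform first along one axis of the block and then along the other, which is why its output stays two-dimensional. The sketch below uses the widely documented 4×4 forward core matrix of H.264/AVC purely as one example of a separable integer transform; the helper names are hypothetical, and the scaling that H.264 folds into quantization is left out.

```python
# 4x4 forward core transform matrix of H.264/AVC (scaling omitted).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform_4x4(residual):
    """Separable 2-D transform: Y = CF * X * CF^T."""
    return matmul(matmul(CF, residual), transpose(CF))
```

A flat residual block concentrates all of its energy in the top-left (DC) coefficient, which is the behavior that makes the later scan and entropy coding stages effective.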

After the transform, video encoder 20 performs quantization to produce quantized transform coefficients (also called quantized coefficients or quantized residual coefficients). Again, the quantized coefficients may be represented in a one-dimensional vector format or a two-dimensional block format. Quantization generally refers to a process in which coefficients are quantized to possibly reduce the amount of data used to represent them. The quantization process may reduce the bit depth associated with some or all of the coefficients. As used herein, the term "coefficients" may refer to transform coefficients, quantized coefficients, or other types of coefficients. The techniques of this disclosure may, in some cases, be applied to residual pixel values and quantized residual pixel values, as well as to transform coefficients and quantized transform coefficients.
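The effect of quantization on bit depth can be seen with a plain uniform scalar quantizer. This is a generic sketch, not the quantizer of H.264 or of this document: the single step size, the rounding rule, and the function names are assumptions made for illustration.

```python
def quantize(coefficients, step):
    """Uniform scalar quantization with rounding to the nearest level."""
    def level(value):
        magnitude = (abs(value) + step // 2) // step
        return magnitude if value >= 0 else -magnitude
    return [[level(v) for v in row] for row in coefficients]


def dequantize(levels, step):
    """Inverse quantization: scale each level back by the step size."""
    return [[v * step for v in row] for row in levels]
```

With a step of 8, the coefficient 48 becomes level 6 and is recovered exactly, while -7 becomes level -1 and comes back as -8, and 3 is quantized to zero. The small levels need fewer bits than the original coefficients, at the cost of that reconstruction error.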

When a separable transform is used and the coefficient block is represented in a two-dimensional block format, video encoder 20 scans the coefficients to convert them from the two-dimensional format to a one-dimensional format. In other words, video encoder 20 can scan the coefficients of the two-dimensional block and serialize them into a one-dimensional vector of coefficients. According to one aspect of this disclosure, video encoder 20 can adjust the scan order used to convert the coefficient block to one dimension based on collected statistics. The statistics may comprise an indication of the likelihood that a given coefficient value at each position of the two-dimensional block is zero or non-zero, and may comprise, for example, a count, a probability, or another statistical metric associated with each coefficient position of the two-dimensional block. In some cases, statistics may be collected for only a subset of the coefficient positions of the block. When the scan order is evaluated, for example after a particular number of blocks, the scan order can be changed so that coefficient positions determined to have a higher probability of holding non-zero coefficients are scanned before coefficient positions determined to have a lower probability of holding non-zero coefficients. In this way, an initial scan order can be adapted to group non-zero coefficients more efficiently at the beginning of the one-dimensional coefficient vector and zero-valued coefficients at its end. This shortens the runs of zeros between non-zero coefficients at the beginning of the vector and leaves one longer run of zeros at its end, which can reduce the number of bits spent on entropy coding.
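The adaptive scan described above can be sketched as follows. This is a minimal illustration, not the encoder's actual procedure: it assumes per-position non-zero counts as the statistic and a stable sort as the reordering rule, and the names `scan`, `update_counts`, and `adapt_order` are hypothetical.

```python
def scan(block, order):
    """Serialize a 2-D block into a 1-D vector following the scan order."""
    return [block[r][c] for r, c in order]


def update_counts(counts, block):
    """Count, per position, how often a non-zero coefficient was seen."""
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            if value != 0:
                counts[r][c] += 1


def adapt_order(order, counts):
    """Move positions with higher non-zero counts earlier (ties keep order)."""
    return sorted(order, key=lambda rc: -counts[rc[0]][rc[1]])


order = [(0, 0), (0, 1), (1, 0), (1, 1)]   # initial scan order
counts = [[0, 0], [0, 0]]
for past_block in ([[5, 0], [3, 0]], [[2, 0], [1, 0]]):
    update_counts(counts, past_block)
order = adapt_order(order, counts)
vector = scan([[7, 0], [4, 0]], order)
```

After two blocks whose non-zero values sit in the left column, the adapted order visits that column first, so the zeros of the next block end up in a single run at the end of the vector.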

After scanning the coefficients, video encoder 20 encodes each video block of the coded unit using any of a variety of entropy coding methods, such as context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), or run-length coding. Source device 12 transmits the encoded video data to destination device 14 via transmitter 22 and channel 16. Communication channel 16 may comprise any wireless or wired communication medium, such as the radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting encoded video data from source device 12 to destination device 14.

Destination device 14 may include a receiver 24, a video decoder 26, and a display device 28. Receiver 24, which is one means for receiving a video signal, receives the encoded video bitstream from source device 12 via channel 16. Video decoder 26 applies entropy decoding to the encoded video bitstream to obtain header information, motion vectors, and quantized residual coefficients of the coded video blocks of the coded unit. As described above, the quantized residual coefficients encoded by source device 12 are encoded as a one-dimensional vector. Video decoder 26 therefore scans the quantized residual coefficients of the coded video blocks to convert the one-dimensional vector of coefficients into a two-dimensional block of quantized residual coefficients. Like video encoder 20, video decoder 26 can collect statistics indicating the likelihood that a given coefficient position in the video block is zero or non-zero, and thereby adjust the scan order in the same manner used in the encoding process. Accordingly, video decoder 26 can apply a reciprocal adaptive scan order to convert the one-dimensional vector representation of the serialized quantized transform coefficients back into a two-dimensional block of quantized transform coefficients.
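The decoder-side conversion from a one-dimensional vector back to a two-dimensional block can be sketched as below. The function name `inverse_scan` and the representation of the scan order as a list of (row, column) pairs are assumptions for illustration; the only requirement taken from the text is that the decoder uses the same scan order as the encoder.

```python
def inverse_scan(vector, order, rows, cols):
    """Place each coefficient of the 1-D vector at its 2-D block position."""
    block = [[0] * cols for _ in range(rows)]
    for value, (r, c) in zip(vector, order):
        block[r][c] = value
    return block


# The decoder must hold the same scan order the encoder used.
shared_order = [(0, 0), (1, 0), (0, 1), (1, 1)]
block = inverse_scan([7, 4, 0, 0], shared_order, rows=2, cols=2)
```

Because both sides derive the order from the same previously decoded statistics, the order itself does not have to be transmitted in this sketch.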

Video decoder 26 reconstructs each block of the coded unit using the decoded header information and the decoded residual information. In particular, video decoder 26 can generate a prediction video block for the current video block using the prediction information and motion information included as part of the header information, and can combine the prediction block with the corresponding residual video block to reconstruct each video block. Destination device 14 can display the reconstructed video blocks to a user via display device 28. Display device 28 may comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, a light-emitting diode (LED) display, an organic LED display, or another type of display unit.

場合によっては、送信元装置１２と送信先装置１４は実質的に対称的に動作することができる。たとえば、送信元装置１２および送信先装置１４はそれぞれ、ビデオ符号化構成要素およびビデオ復号構成要素を含んでよい。したがって、システム１０は、たとえばビデオストリーミング、ビデオ放送、またはテレビ電話用の装置１２、１４間の一方向または二方向ビデオ伝送をサポートすることができる。ビデオ符号化構成要素およびビデオ復号構成要素を含む装置は、デジタルビデオレコーダ（ＤＶＲ）などの一般的な符号化記録再生装置の一部を形成してもよい。 In some cases, source device 12 and destination device 14 can operate substantially symmetrically. For example, source device 12 and destination device 14 may each include a video encoding component and a video decoding component. Thus, the system 10 can support one-way or two-way video transmission between devices 12, 14 for, for example, video streaming, video broadcasting, or videophone. An apparatus that includes a video encoding component and a video decoding component may form part of a typical encoded recording and playback device, such as a digital video recorder (DVR).

Video encoder 20 and video decoder 26 can operate according to any of a variety of video compression standards, such as the standards defined by the Moving Picture Experts Group (MPEG) in MPEG-1, MPEG-2, and MPEG-4, the ITU-T H.263 standard, the H.264/MPEG-4 Part 10 AVC standard, the Society of Motion Picture and Television Engineers (SMPTE) 421M video CODEC standard (commonly referred to as “VC-1”), and the standard defined by the Audio Video Coding Standard Workgroup of China (commonly referred to as “AVS”), as well as any other video coding standard defined by a standards body or developed by an organization as a proprietary standard. Although not shown in FIG. 1, in some aspects video encoder 20 and video decoder 26 may each be integrated with an audio encoder and decoder, respectively, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or in separate data streams. In this way, source device 12 and destination device 14 can process multimedia data. Where applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or to other protocols such as the User Datagram Protocol (UDP).

In some aspects, for video broadcasting, the techniques described in this disclosure can be applied to enhanced H.264 video coding to deliver real-time video services in terrestrial mobile multimedia multicast (TM3) systems using the Forward Link Only (FLO) air interface specification, “Forward Link Only Air Interface Specification for Terrestrial Mobile Multimedia Multicast,” published in July 2007 as Technical Standard TIA-1099 (the “FLO Specification”). That is, communication channel 16 may comprise a wireless information channel used to broadcast wireless video information according to the FLO Specification or the like. The FLO Specification includes examples defining bitstream syntax and semantics and decoding processes suitable for the FLO air interface.

あるいは、ＤＶＢ−Ｈ（デジタルビデオ放送−ハンドヘルド）、ＩＳＤＢ−Ｔ（統合サービスデジタル放送−地上）、またはＤＭＢ（デジタル媒体放送）のような他の標準に従ってビデオを放送することができる。したがって、送信元装置１２は移動無線端末、ビデオストリーミングサーバ、またはビデオ放送サーバであってよい。しかし、本開示で説明する技術は、任意の特定の種類の放送、マルチキャスト、またはポイントツーポイントシステムに限定されない。放送の場合、送信元装置１２は、各々が図１の送信先装置１４と同様の装置であってよい複数の送信先装置にビデオデータのいくつかのチャネルを放送することができる。したがって、図１には単一の送信先装置１４が示されているが、ビデオ放送アプリケーションの場合、送信元装置１２は通常、ビデオコンテンツを多数の送信先装置に同時に放送する。 Alternatively, the video can be broadcast according to other standards such as DVB-H (Digital Video Broadcast-Handheld), ISDB-T (Integrated Services Digital Broadcast-Terrestrial), or DMB (Digital Media Broadcast). Accordingly, the transmission source device 12 may be a mobile wireless terminal, a video streaming server, or a video broadcast server. However, the techniques described in this disclosure are not limited to any particular type of broadcast, multicast, or point-to-point system. In the case of broadcasting, the source device 12 can broadcast several channels of video data to a plurality of destination devices, each of which may be a device similar to the destination device 14 of FIG. Thus, although a single destination device 14 is shown in FIG. 1, for a video broadcast application, the source device 12 typically broadcasts video content to multiple destination devices simultaneously.

In other examples, transmitter 22, communication channel 16, and receiver 24 can be configured for communication according to any wired or wireless communication system, including one or more of Ethernet (registered trademark), telephone (e.g., POTS), cable, power-line, and fiber-optic systems, and/or a wireless system comprising one or more of a code division multiple access (CDMA or CDMA2000) communication system, a frequency division multiple access (FDMA) system, an orthogonal frequency division multiple (OFDM) access system, a time division multiple access (TDMA) system such as GSM (registered trademark) (Global System for Mobile Communication), GPRS (General Packet Radio Service), or EDGE (Enhanced Data GSM Environment), a TETRA (Terrestrial Trunked Radio) mobile telephone system, a wideband code division multiple access (WCDMA) system, a high data rate 1xEV-DO (First generation Evolution Data Only) or 1xEV-DO Gold Multicast system, an IEEE 802.18 system, a MediaFLO(TM) system, a DMB system, a DVB-H system, or another scheme for data communication between two or more devices.

Video encoder 20 and video decoder 26 may each be implemented as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. Each of video encoder 20 and video decoder 26 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective mobile device, subscriber device, broadcast device, server, or the like. In addition, source device 12 and destination device 14 may each include appropriate modulation, demodulation, frequency conversion, filtering, and amplifier components for transmitting and receiving encoded video, as applicable, including radio frequency (RF) wireless components and antennas sufficient to support wireless communication. For ease of illustration, however, such components are summarized as transmitter 22 of source device 12 and receiver 24 of destination device 14 in FIG. 1.

図２は、ビデオ符号器２０の一例を示すブロック図である。ビデオ復号器２６は、ビデオ符号器２０と同様の構成要素を含んでよい。ビデオ符号器２０は、ビデオフレーム内のブロックの画像内符号化および画像間符号化を実行することができる。画像内符号化は、空間予測によって、所与のビデオフレーム内のビデオの空間冗長性を低減させるかあるいは排除する。画像間符号化は、時間予測によって、互いに隣接するフレーム内のビデオの時間冗長性を低減させるかあるいは排除する。画像間符号化の場合、ビデオ符号器２０は、動き推定を実行して２つ以上の隣接するフレーム間の互いに一致するビデオブロックの移動を追跡する。 FIG. 2 is a block diagram illustrating an example of the video encoder 20. Video decoder 26 may include similar components as video encoder 20. Video encoder 20 may perform intra-picture and inter-picture coding of blocks within a video frame. Intra-picture coding reduces or eliminates spatial redundancy of video within a given video frame by spatial prediction. Inter-picture coding reduces or eliminates temporal redundancy of video in adjacent frames by temporal prediction. For inter-picture coding, video encoder 20 performs motion estimation to track the movement of matching video blocks between two or more adjacent frames.

図２に示されているように、ビデオ符号器２０は、符号化すべきビデオフレーム内の現在のビデオブロック２１を受信する。図２の例では、ビデオ符号器２０は、動き予測ユニット２３と、基準フレームストア２５と、ブロック変換ユニット２９と、量子化ユニット３１と、逆量子化ユニット３３と、逆変換ユニット３５と、エントロピー符号化ユニット３７と、モード決定ユニット４３と、空間予測ユニット４７と、非ブロック化フィルタ４９とを含んでいる。ビデオ符号器２０は、加算器３９、加算器４１、およびスイッチ５１も含んでいる。ビデオ符号器２０は、量子化係数をスキャンするための不図示のスキャンユニットも含んでよい。図２は、ビデオブロックを画像間符号化するビデオ符号器２０の時間予測構成要素およびビデオブロックを画像内符号化する空間予測構成要素を示している。スイッチ５１は、モード決定ユニット４３によって制御することができ、空間予測ビデオブロックまたは時間予測ビデオブロックを入力ビデオブロック用の予測ビデオブロックとして選択するのに使用することができる。 As shown in FIG. 2, video encoder 20 receives a current video block 21 in a video frame to be encoded. In the example of FIG. 2, the video encoder 20 includes a motion prediction unit 23, a reference frame store 25, a block transform unit 29, a quantization unit 31, an inverse quantization unit 33, an inverse transform unit 35, an entropy. An encoding unit 37, a mode determination unit 43, a spatial prediction unit 47, and a deblocking filter 49 are included. Video encoder 20 also includes adder 39, adder 41, and switch 51. The video encoder 20 may also include a scan unit (not shown) for scanning the quantized coefficients. FIG. 2 shows the temporal prediction component of video encoder 20 that inter-codes video blocks and the spatial prediction component that intra-codes video blocks. The switch 51 can be controlled by the mode determination unit 43 and can be used to select a spatial prediction video block or a temporal prediction video block as a prediction video block for an input video block.

動き予測ユニット２３は、画像間符号化を評価する際、ビデオブロック２１を１つまたは複数の互いに隣接するビデオフレーム内のブロックと比較して１つまたは複数の動きベクトルを生成する。隣接する１つまたは複数のフレームを基準フレームストア２５から取り込むことができる。可変サイズ、たとえば１６×１６、１６×８、８×１６、８×８、またはそれより小さいサイズのブロックについて動き推定を実行することができる。動き予測ユニット２３は、たとえば速度(rate)歪みモデルに基づいて現在のビデオブロック２１に最もよく一致する隣接するフレーム内のブロックを識別し、各ブロック間の変位を求める。これに基づいて、動き予測ユニット２３は、変位の大きさおよび軌跡を示す動きベクトルを作成する。 When estimating inter-picture coding, motion prediction unit 23 compares video block 21 with one or more blocks in adjacent video frames to generate one or more motion vectors. One or more adjacent frames can be captured from the reference frame store 25. Motion estimation can be performed on blocks of variable size, eg 16 × 16, 16 × 8, 8 × 16, 8 × 8, or smaller. Motion prediction unit 23 identifies the block in the adjacent frame that best matches the current video block 21 based on, for example, a rate distortion model, and determines the displacement between each block. Based on this, the motion prediction unit 23 creates a motion vector indicating the magnitude and locus of the displacement.

Motion vectors may have half-pixel or quarter-pixel precision, or in some cases finer precision, which allows video encoder 20 to track motion more precisely than integer pixel positions permit and to obtain a better prediction block. When motion vectors with fractional pixel values are used, interpolation operations are performed in motion prediction unit 23. For example, in the H.264/AVC standard, a 6-tap Wiener filter with coefficients (1, -5, 20, 20, -5, 1)/32 can be used to obtain the luma signal at half-pixel positions. To obtain the luma signal at quarter-pixel positions, bilinear filtering of the values at integer pixel positions and the interpolated values at half-pixel positions can be used. A bilinear filter can also be used for fractional-pixel interpolation of the chroma components, which may have up to 1/8-pixel precision. After identifying the best motion vector for the video block using a rate-distortion model, motion prediction unit 23 outputs a prediction video block by motion compensation.
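The half-pixel and quarter-pixel interpolation described above can be sketched in one dimension as follows. The tap values come from the text; the rounding offsets and the 8-bit clipping follow common H.264 practice, while clamping at the row border and the function names `half_pel` and `quarter_pel` are assumptions made for this example.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap filter, normalized by 32


def half_pel(samples, i):
    """Half-pixel value between samples[i] and samples[i + 1] (1-D sketch)."""
    last = len(samples) - 1
    # Border handling by index clamping is an assumption of this sketch.
    window = [samples[min(max(i + k, 0), last)] for k in range(-2, 4)]
    acc = sum(t * s for t, s in zip(TAPS, window))
    # Add 16 for rounding, divide by 32, and clip to the 8-bit range.
    return min(max((acc + 16) >> 5, 0), 255)


def quarter_pel(a, b):
    """Bilinear average of an integer-pixel and a half-pixel value."""
    return (a + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50]
h = half_pel(row, 2)          # between row[2] = 20 and row[3] = 30
q = quarter_pel(row[2], h)    # between row[2] and the half-pixel value
```

On this linear ramp the half-pixel value lands midway between its two neighbours, which is the behaviour expected of an interpolation filter whose taps sum to 32.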

Alternatively, when intra-picture coding is evaluated, spatial prediction unit 47 is used to form a prediction video block from already-coded blocks in the same coded unit (e.g., the same frame). For example, video block 21 can be compared with other already-coded blocks in the same frame as video block 21. In some embodiments, the already-coded blocks can be retrieved from reference frame store 25. In some embodiments, various spatial prediction methods can be used. For example, in H.264/MPEG-4 AVC, directional spatial prediction can be performed on video blocks of size 4×4, 8×8, and/or 16×16. A total of nine prediction directions can be used for 4×4 and 8×8 luma blocks, and a total of four prediction directions can be used for 16×16 luma blocks and 16×16 chroma blocks. Other types of spatial prediction can be performed within the same coded unit. For example, a process similar to motion estimation can be used to identify a video block that matches the current video block within the already-coded portion of the current coded unit. The displacement between the matching video block and the current video block can then be determined and signaled as part of the coded video header data for the current video block. Mode determination unit 43 can select the optimal spatial prediction mode (e.g., prediction block size, prediction direction, or displacement of the prediction video block) based on a predefined criterion such as a Lagrangian rate-distortion model.

Video encoder 20 forms a residual video block by subtracting the prediction video block produced by motion prediction unit 23 or spatial prediction unit 47 from the original, current video block 21 at adder 39, which is one means for subtracting the prediction block from the original block. Block transform unit 29, which is one means for applying a transform, applies a transform to the residual block. Mode determination unit 43 can indicate to block transform unit 29 the size and type of transform to be used. Quantization unit 31 quantizes the transform coefficients to further reduce the bit rate. Entropy encoding unit 37, which is one means for generating a video signal, entropy-encodes the quantized coefficients to reduce the bit rate still further. Video decoder 26 performs the inverse operations to reconstruct the encoded video.

逆量子化ユニット３３および逆変換ユニット３５はそれぞれ、逆量子化および逆変換を適用して残余ブロックを再構成する。加算器４１は、再構成された残余ブロックを予測ブロックに加算し、基準フレームストア２５に格納される再構成されたビデオブロックを作成する。再構成されたビデオブロックは、動き予測ユニット２３または空間予測ユニット４７によって、現在のビデオフレームまたは以後のビデオフレーム内の以後のビデオブロックを符号化するのに使用される。 Inverse quantization unit 33 and inverse transform unit 35 reconstruct the residual block by applying inverse quantization and inverse transformation, respectively. The adder 41 adds the reconstructed residual block to the prediction block, and creates a reconstructed video block stored in the reference frame store 25. The reconstructed video block is used by the motion prediction unit 23 or the spatial prediction unit 47 to encode a subsequent video block in the current video frame or a subsequent video frame.
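The residual formation and the reconstruction loop described above can be sketched as below. To keep the example short, the block transform and its inverse are omitted (the residual is quantized directly), which is a simplification rather than the encoder described here; the function names, the uniform step size, and the 8-bit clipping are likewise assumptions.

```python
def quantize_level(value, step):
    """Assumed uniform quantizer with rounding away from zero at ties."""
    magnitude = int(abs(value) / step + 0.5)
    return magnitude if value >= 0 else -magnitude


def encode_block(current, prediction, step):
    """Residual = current - prediction, then quantize (transform omitted)."""
    return [[quantize_level(c - p, step) for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, prediction)]


def reconstruct_block(prediction, levels, step):
    """Dequantize the residual and add it back to the prediction."""
    return [[min(max(p + level * step, 0), 255)
             for p, level in zip(p_row, l_row)]
            for p_row, l_row in zip(prediction, levels)]


current = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
levels = encode_block(current, prediction, step=4)
recon = reconstruct_block(prediction, levels, step=4)
```

The encoder stores `recon`, not `current`, as reference data, so that its later predictions are formed from the same pixels the decoder will have.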

When performing motion compensation for a given block in the current video frame 21, motion prediction unit 23 can interpolate the reference block obtained from the reference frame using a fixed set of filters. One reference block is required if the current block is predicted unidirectionally, and two reference blocks are required if the current block is predicted bidirectionally. In H.264, multiple reference frames in the forward and backward directions can be used in some cases. The actual filter used by motion prediction unit 23 depends on the fractional part of the motion vector. For example, if the motion vector points to a half-pixel position in the reference frame in a given dimension, a 6-tap filter such as (1, -5, 20, 20, -5, 1)/32 is used in that dimension, together with the half-pixel motion vector, to obtain the value at the half-pixel position. If both motion vector components point to integer positions, the pixel values from the reference frame in reference frame store 25 can be used directly without any interpolation filtering.

FIG. 7 is a block diagram illustrating an example of video decoder 26. The encoded bitstream is fed into system 700. Each portion of the bitstream corresponds to a different video block, and several of these video blocks may together make up a single video frame. The portion of the bitstream corresponding to a given video block is entropy-decoded by entropy decoding unit 702 to form a residual block comprising quantized residual transform coefficients. The residual block can then be inverse-scanned by an inverse scan unit (not shown). The residual block can be inverse-quantized by inverse quantization unit 706 and inverse-transformed by inverse transform unit 708 to form a decoded residual block. As described below, entropy decoding unit 702 can determine the type and/or size of the inverse transform to be performed based on received header data. A prediction video block is generated and added to the decoded residual block at addition unit 710.

２種類の予測方法、すなわち画像内予測方法および画像間予測方法のうちの一方を使用して予測ビデオブロックを形成することができる。空間予測ユニット７１６は、同じビデオフレーム（または符号化単位としてビデオスライスが使用される場合には同じビデオスライス）内のすでに符号化されたブロックを使用して画像内予測ブロックを生成する。動き補償ユニット７１８は、基準フレームストア７２０に格納されている前のフレームおよび／または後のフレームを使用して画像間予測ブロックを生成する。ビデオブロックを符号化するのに使用される符号化モードを示す受信されたヘッダデータに応じて、空間予測ユニット７１６または動き補償ユニット７１８を呼び出して画像内予測ブロックまたは画像間予測ブロックを生成するようにスイッチ７２２を切り替えることができる。次に、予測ブロックが、加算ユニット７１０で、復号された残余ブロックに加算され、復号されたビデオブロックが生成される。 One of two types of prediction methods can be used to form a predictive video block: an intra-picture prediction method and an inter-picture prediction method. Spatial prediction unit 716 generates an intra-picture prediction block using already coded blocks in the same video frame (or the same video slice if a video slice is used as the coding unit). Motion compensation unit 718 generates an inter-picture prediction block using the previous frame and / or the subsequent frame stored in reference frame store 720. Depending on the received header data indicating the encoding mode used to encode the video block, the spatial prediction unit 716 or the motion compensation unit 718 is invoked to generate an intra-picture prediction block or an inter-picture prediction block. The switch 722 can be switched. Next, the prediction block is added to the decoded residual block at an addition unit 710 to generate a decoded video block.

The resulting reconstructed video block is then sent to deblocking filtering unit 712, which can filter the video block at block edges to prevent visually objectionable blocking artifacts. The output produced is the final decoded video block. The final decoded video block can be stored in reference frame store 720 so that other video blocks in the same or other video frames can be reconstructed.

復号器は、符号化されたビデオストリームを適切に復号するために、ビデオデータを符号化するのにどのような種類の変換が使用されたかを知る必要がある。復号器は次に、符号器で使用される順変換に対応する適切な逆変換を適用することがある。したがって、ビデオブロックを符号化するのに使用された変換の種類を示すデータをビデオビットストリームの一部として復号器に送信してビデオブロックを適切に復号する必要がある。 The decoder needs to know what kind of transform was used to encode the video data in order to properly decode the encoded video stream. The decoder may then apply an appropriate inverse transform corresponding to the forward transform used in the encoder. Therefore, it is necessary to properly decode the video block by sending data indicating the type of transform used to encode the video block to the decoder as part of the video bitstream.

As described with respect to FIG. 2, block transform unit 29 applies a transform to the residual video block. Applying a transform to the residual block achieves the desired energy compaction, which enables high compression efficiency when combined with quantization and entropy coding. Examples of transforms used in common block-based video coding systems such as MPEG-2 and H.264/AVC include the 8×8 DCT and the 4×4 and 8×8 integer transforms.

The H.264/AVC standard is the latest video coding standard and provides high coding efficiency. H.264/AVC uses several types of block transform. For intra-picture predicted (spatially predicted) blocks and inter-picture predicted (temporally predicted) blocks, H.264/AVC uses a 4×4 integer transform based on the 4×4 DCT or an 8×8 integer transform based on the 8×8 DCT.

For the chroma signal of the video signal, an additional level of 2×2 Hadamard transform is applied to the 2×2 DC components in each block.
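A 2×2 Hadamard transform of the kind mentioned above can be sketched as follows. The sketch shows only the unnormalized butterfly; any scaling or quantization that a codec applies around it is left out, and the function name `hadamard_2x2` is chosen for this example.

```python
def hadamard_2x2(dc):
    """Unnormalized 2x2 Hadamard transform of a 2x2 array of DC values."""
    (a, b), (c, d) = dc
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]


dc_block = [[4, 2], [1, 3]]
transformed = hadamard_2x2(dc_block)
# Applying the transform twice returns the input scaled by 4.
restored = [[v // 4 for v in row] for row in hadamard_2x2(transformed)]
```

Because the transform is its own inverse up to a factor of 4, the same routine can serve on both the encoding and decoding sides in this simplified setting.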

For the luma signal of the video signal, the transform is selected as follows. First, it is determined whether the block is intra-picture predicted or inter-picture predicted. If the block is inter-picture predicted, it is next determined whether the block size is smaller than 8×8. If the block is smaller than 8×8, the 4×4 integer transform is used. If the block is 8×8 or larger, either the 4×4 integer transform or the 8×8 integer transform is used.

If the block is intra-picture predicted, it is determined whether the block is predicted using the INTRA_16×16 mode. If the block is predicted using the INTRA_16×16 mode, the 4×4 integer transform is applied to the block, and an additional level of 4×4 Hadamard transform is applied to the 4×4 DC components in each block. If the block is not predicted using the INTRA_16×16 mode, the 4×4 integer transform is used when the block is predicted using the INTRA_4×4 mode, and the 8×8 integer transform is used when the block is predicted using the INTRA_8×8 mode.
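The luma transform selection described in the two preceding paragraphs can be summarized as a decision function. This is a sketch of the rules as stated in the text; the function name, the mode strings, the return format, and the reading of “smaller than 8×8” as “either dimension below 8” are assumptions.

```python
def luma_transform_choices(intra, intra_mode=None, width=16, height=16):
    """Return (allowed transform sizes, whether a DC Hadamard stage is added)."""
    if intra:
        if intra_mode == "INTRA_16x16":
            return [4], True      # 4x4 transform plus 4x4 Hadamard on DC
        if intra_mode == "INTRA_4x4":
            return [4], False
        if intra_mode == "INTRA_8x8":
            return [8], False
        raise ValueError("unknown intra mode: %r" % (intra_mode,))
    # Inter-predicted block: the partition size decides what is allowed.
    if width < 8 or height < 8:
        return [4], False
    return [4, 8], False


intra16 = luma_transform_choices(True, intra_mode="INTRA_16x16")
small_inter = luma_transform_choices(False, width=8, height=4)
large_inter = luma_transform_choices(False, width=16, height=8)
```

Only the last case leaves a real choice between two sizes, which is why a further signal is needed for those blocks.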

When either a 4×4 or an 8×8 transform can be used for the block, the choice of transform depends on the H.264/AVC profile in use. Under any H.264 profile other than the High profile (e.g., the Baseline profile, the Extended Baseline profile, or the Main profile), only the 4×4 integer transform is used. Under the H.264/AVC High profile (i.e., the Fidelity Range Extensions), an 8×8 integer transform based on the 8×8 DCT can also be used for the luma signal. The choice between the 4×4 and 8×8 integer transforms is indicated by an additional syntax element, transform_size_8x8_flag. When either a 4×4 or an 8×8 transform can be used (e.g., for an inter-coded block of size 8×8 or larger), transform_size_8x8_flag is sent to the decoder together with the encoded video data. If transform_size_8x8_flag is set to 1, the 8×8 integer transform is applied to the residual block; otherwise (transform_size_8x8_flag set to 0), the 4×4 integer transform is applied to the residual block.
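For a block where both transform sizes are permitted, the profile and flag rule above reduces to a few lines. The function name and the boolean `high_profile` parameter are assumptions for this sketch; only the flag semantics (1 selects 8×8, 0 selects 4×4) are taken from the text.

```python
def luma_transform_size(high_profile, transform_size_8x8_flag=0):
    """Transform size for a block where both 4x4 and 8x8 are permitted."""
    if not high_profile:
        # Outside the High profile only the 4x4 integer transform exists.
        return 4
    return 8 if transform_size_8x8_flag == 1 else 4


size_main = luma_transform_size(high_profile=False)
size_high_8 = luma_transform_size(high_profile=True, transform_size_8x8_flag=1)
size_high_4 = luma_transform_size(high_profile=True, transform_size_8x8_flag=0)
```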

In H.264/AVC, motion prediction can be performed on various block sizes (i.e., motion partitions) such as 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4. Smaller motion partitions are typically used around object edges and in areas with a lot of detail, while larger motion partitions are typically chosen around smoother areas. As a result, the residual blocks after motion prediction are also usually smoother; that is, they tend to contain more low-frequency components. For such signals, applying a larger transform can provide better energy compaction. Methods and encoders for choosing a motion partition and a transform size are described in U.S. Pat. No. 5,107,345, U.S. Pat. No. 6,996,283, and U.S. Pat. No. 6,600,836, all of which are incorporated herein by reference. As noted above, H.264/AVC uses only 4×4 and 8×8 integer transforms for inter-coded video blocks. The 4×4 and 8×8 integer transforms are indicated by the value of transform_size_8x8_flag, which is currently limited to a size of 1 bit. Because a 1-bit transform_size_8x8_flag can indicate only two transform types, the current syntax used in H.264 cannot indicate additional transform sizes. Syntax and syntax elements that allow additional transform sizes used by the encoder and decoder to be indicated are described below. In some embodiments, the syntax element comprises a 2-bit flag value that indicates the transform size. The flag value can be included as part of the header information sent to the decoder.

In the following embodiments, inter-predicted or intra-predicted video blocks can be used with the methods described above. That is, the prediction block of a video block can be formed by motion compensation or by spatial prediction. In embodiments that use motion compensation, the prediction block size equals the motion partition size, so the terms "prediction block" and "motion partition" can be used interchangeably. In embodiments that use spatial prediction, the prediction block size equals the size of the spatial prediction block used, so the term "prediction block" can be used interchangeably with "intra-prediction block" or "spatial prediction block". For example, multiple transform choices can be used for video blocks coded using INTRA_16×16 and INTRA_8×8 prediction. In addition to the 4×4 transform, a 16×16, 16×8, 8×16, or 8×8 transform can be applied to an INTRA_16×16 predicted video block, and an 8×8 transform can be applied to an INTRA_8×8 predicted video block. For intra-predicted blocks, the transform size can be indicated in the same way as for inter-predicted video blocks. The transform size flag syntax element can be combined with the prediction block size syntax element, and variable-length coding of the transform size flag syntax element can be used.

The syntax described below uses both the flag value and the prediction block size of a given block to indicate the transform size. Combining the block's prediction block size with the flag value allows more transform sizes to be indicated than a one-to-one correspondence between flag values and transform sizes. For example, with a one-to-one correspondence, a 2-bit flag can indicate only four different transform sizes, with each flag value indicating a single transform size. By also using the block's prediction block size, however, additional transform sizes can be indicated with the same number of flag bits. For example, if the flag value 00 indicates that the transform size should equal the block's prediction block size, and the prediction block size can be any of N different block sizes, then the single flag value 00 can indicate N different transform sizes. Thus, in one embodiment, one or more of the flag values can indicate that the transform size in use equals the block's prediction block size. In other embodiments, the flag values can be coded using variable-length coding.
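The counting argument above can be made concrete with a short Python sketch. The particular assignment of flag values here (00 for "same as the prediction block", 01 for 8×8, 10 for 4×4) is a hypothetical choice made only for illustration, as are the names `FIXED` and `transform_size`.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]

# Hypothetical flag values that each name one fixed transform size.
FIXED: Dict[str, Size] = {"01": (8, 8), "10": (4, 4)}


def transform_size(flag: str, pred_size: Size) -> Size:
    """Map a flag value plus the block's prediction block size to a transform size."""
    if flag == "00":
        # "00" means: use a transform as large as the prediction block.
        return pred_size
    return FIXED[flag]


# Three prediction block sizes (N = 3) combined with three flag values.
pred_sizes = [(16, 16), (16, 8), (8, 16)]
reachable = {transform_size(f, p) for f in ("00", "01", "10") for p in pred_sizes}
print(len(reachable))  # number of distinct transform sizes that can be signaled
```

With three flag values, a one-to-one mapping could name three transform sizes; in this sketch the single value 00 alone covers three, so five distinct sizes are reachable.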

FIG. 3 is an exemplary embodiment of a process 300 for setting, at the encoder, a flag value that indicates the transform size used by encoder 20 for a given video block. The prediction block size of each video block can be determined at mode decision unit 43, and the transform can be performed at block transform unit 29 (see FIG. 2). The choice of prediction block size and the choice of transform size used for a block can be made by mode decision unit 43. At a first step 302 of process 300, it is determined whether the prediction block size of the given block is greater than 8×8. If the prediction block size is not greater than 8×8, the process proceeds to step 306. Otherwise, the process proceeds to step 338.

At step 306, it is determined whether the prediction block size is smaller than 8×8. If it is, process 300 proceeds to step 310, where a 4×4 transform is applied to the block, and then to step 314, where no flag value is set to be sent to the decoder. If instead it is determined at step 306 that the prediction block size is not smaller than 8×8, the process proceeds to step 318, where it is determined whether the transform size to be used for the block is 8×8. If the 8×8 transform size is not to be used, process 300 proceeds to step 322, where a 4×4 transform is applied to the block, and then to step 326, where a 1-bit flag with a value of 0 is set to be sent to the decoder. If instead it is determined at step 318 that the 8×8 transform is to be used, the process proceeds to step 330, where an 8×8 transform is applied to the block, and then to step 334, where a 1-bit flag with a value of 1 is set to be sent to the decoder.

If it is determined at step 302 that the prediction block size is greater than 8×8, the process proceeds to step 338. At step 338, the encoder determines, automatically or manually, whether a transform size greater than 8×8 is to be used for the given block. If not, process 300 proceeds to step 342, where it is determined whether the transform size to be used for the given block is 8×8. If the transform size to be used is not 8×8, process 300 proceeds to step 346, where a 4×4 transform is applied to the block, and then to step 350, where a 1-bit flag value of 0 is set to be sent to the decoder. If instead the transform size to be used is 8×8, process 300 proceeds to step 354, where an 8×8 transform is applied to the block, and then to step 358, where a 2-bit flag value of 10 is set to be sent to the decoder.

If it is determined at step 338 that the transform size to be used is greater than 8×8, process 300 proceeds to step 362, where it is determined whether the prediction block size of the given block is 16×16. If it is, process 300 proceeds to step 366, where a 16×16 transform is applied to the block, and then to step 382. If instead it is determined at step 362 that the prediction block size is not 16×16, process 300 proceeds to step 370, where it is determined whether the prediction block size is 8×16. If it is, process 300 proceeds to step 374, where an 8×16 transform is applied to the block, and then to step 382. Otherwise, process 300 proceeds to step 378, where a 16×8 transform is applied to the block, and then to step 382. At step 382, a 2-bit flag value of 11 is set to be sent to the decoder.

According to process 300, the flag values correspond to the following transform types:
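The mapping implied by the steps of process 300 can be sketched in Python as follows. This is an illustration derived from the step descriptions, not a reproduction of the patent's Table 1. The function name `process_300_flag`, the tuple representation of sizes, and the use of block area to decide "smaller than 8×8" are assumptions made for this sketch.

```python
from typing import Optional, Tuple

Size = Tuple[int, int]


def area(size: Size) -> int:
    # Assumption: "smaller than 8x8" is tested by comparing block areas.
    return size[0] * size[1]


def process_300_flag(pred_size: Size, transform_size: Size) -> Optional[str]:
    """Return the flag bits set for the decoder, or None when no flag is sent."""
    if area(pred_size) < 64:
        # Steps 306, 310, 314: only the 4x4 transform is used, no flag.
        if transform_size != (4, 4):
            raise ValueError("blocks smaller than 8x8 use the 4x4 transform")
        return None
    if pred_size == (8, 8):
        # Steps 318 to 334: a 1-bit flag selects 4x4 or 8x8.
        if transform_size == (4, 4):
            return "0"
        if transform_size == (8, 8):
            return "1"
        raise ValueError("an 8x8 block uses a 4x4 or 8x8 transform")
    # Prediction block larger than 8x8: steps 338 to 382.
    if transform_size == (4, 4):
        return "0"   # steps 346, 350
    if transform_size == (8, 8):
        return "10"  # steps 354, 358
    if transform_size == pred_size:
        return "11"  # steps 366, 374 or 378, then 382
    raise ValueError("transform size is not signaled by process 300")


print(process_300_flag((16, 8), (16, 8)))
```

For prediction blocks larger than 8×8, the values 0, 10, and 11 form a prefix-free variable-length code, so a decoder can tell after the first bit whether a second bit follows.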

Those skilled in the art will recognize that some of the steps of process 300 can be omitted, or new steps added, to achieve the same result. In addition, some of the steps can be performed in a different order. Note also that the flag values can be reassigned (for example, 0 for the 8×8 transform and 10 for the 4×4 transform).

FIG. 4 is an exemplary embodiment of another process 400 for setting, at the encoder, a flag value that indicates the transform size used by encoder 20 for a given video block. The prediction block size can be determined at mode decision unit 43, and the transform can be performed at block transform unit 29. The choice of prediction block size and the choice of transform size used for a block are made by mode decision unit 43. At a first step 402 of process 400, it is determined whether the prediction block size of the given block is greater than 8×8. If the prediction block size is not greater than 8×8, the process proceeds to step 406. Otherwise, the process proceeds to step 438.

At step 406, it is determined whether the prediction block size is smaller than 8×8. If it is, process 400 proceeds to step 410, where a 4×4 transform is applied to the block, and then to step 414, where no flag value is set to be sent to the decoder. If instead it is determined at step 406 that the prediction block size is not smaller than 8×8, the process proceeds to step 418, where it is determined whether the transform size to be used for the block is 8×8. If the 8×8 transform size is not to be used, process 400 proceeds to step 422, where a 4×4 transform is applied to the block, and then to step 426, where a 1-bit flag with a value of 0 is set to be sent to the decoder. If instead it is determined at step 418 that the 8×8 transform is to be used, the process proceeds to step 430, where an 8×8 transform is applied to the block, and then to step 434, where a 1-bit flag with a value of 1 is set to be sent to the decoder.

If it is determined at step 402 that the prediction block size is greater than 8×8, the process proceeds to step 438. At step 438, it is determined whether the prediction block size is 16×16. If it is, process 400 proceeds to step 442, where it is determined whether the transform size to be applied to the block is 8×8. If the transform size to be applied is 8×8, process 400 proceeds to step 446, where an 8×8 transform is applied to the given block, and then to step 450, where a 2-bit flag with a value of 00 is set to be sent to the decoder. If instead it is determined at step 442 that the transform size to be applied is not 8×8, process 400 proceeds to step 454, where it is determined whether a 16×16 transform is to be applied to the block. If so, process 400 proceeds to step 458, where a 16×16 transform is applied to the given block, and then to step 462, where a 2-bit flag with a value of 01 is set to be sent to the decoder. If instead it is determined at step 454 that the transform size to be applied is not 16×16, process 400 proceeds to step 466, where it is determined whether the transform size to be applied to the given block is 16×8. If so, process 400 proceeds to step 470, where a 16×8 transform is applied to the given block, and then to step 474, where a 2-bit flag with a value of 10 is set to be sent to the decoder. If instead it is determined at step 466 that the transform size to be applied is not 16×8, process 400 proceeds to step 478, where an 8×16 transform is applied to the given block, and then to step 482, where a 2-bit flag with a value of 11 is set to be sent to the decoder.

If it is determined at step 438 that the prediction block size is not 16×16, process 400 proceeds to step 484, where it is determined whether the transform size to be applied to the given block is 8×8. If the transform to be applied is 8×8, process 400 proceeds to step 492, where an 8×8 transform is applied to the block, and then to step 426, where a 1-bit flag value of 0 is set to be sent to the decoder. If instead it is determined at step 484 that the transform size to be applied is not 8×8, the process proceeds to step 486, where it is determined whether the prediction block size is 16×8. If it is, process 400 proceeds to step 488, where a 16×8 transform is performed on the block, and then to step 434. If instead it is determined at step 486 that the prediction block size is not 16×8, process 400 proceeds to step 490, where an 8×16 transform is performed on the block, and then to step 434. At step 434, a 1-bit flag with a value of 1 is set to be sent to the decoder.

According to process 400, the flag values correspond to the following transform types:
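The mapping implied by the steps of process 400 can be sketched in Python as follows. This is an illustration derived from the step descriptions, not a reproduction of the patent's Table 2. The names `flag_table_400` and `process_400_flag`, the tuple representation of sizes, and the area test for "smaller than 8×8" are assumptions made for this sketch.

```python
from typing import Dict, Optional, Tuple

Size = Tuple[int, int]


def flag_table_400(pred_size: Size) -> Dict[Size, Optional[str]]:
    """Return {transform size: flag bits} for one prediction block size."""
    if pred_size[0] * pred_size[1] < 64:
        # Steps 406, 410, 414: 4x4 transform only, no flag is sent.
        return {(4, 4): None}
    if pred_size == (8, 8):
        # Steps 418 to 434: 1-bit flag.
        return {(4, 4): "0", (8, 8): "1"}
    if pred_size == (16, 16):
        # Steps 442 to 482: 2-bit flag with four fixed meanings.
        return {(8, 8): "00", (16, 16): "01", (16, 8): "10", (8, 16): "11"}
    # 16x8 or 8x16 prediction block, steps 484 to 492: 1-bit flag.
    return {(8, 8): "0", pred_size: "1"}


def process_400_flag(pred_size: Size, transform_size: Size) -> Optional[str]:
    table = flag_table_400(pred_size)
    if transform_size not in table:
        raise ValueError("transform size is not signaled by process 400")
    return table[transform_size]


print(process_400_flag((16, 16), (16, 8)))
```

Unlike process 300, this variant never sends a 4×4 indication for blocks larger than 8×8, and it lets a 16×16 block use a non-square transform.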

Those skilled in the art will recognize that some of the steps of process 400 can be omitted, or new steps added, to achieve the same result. In addition, some of the steps can be performed in a different order. Note also that the flag values can be reassigned (for example, 00 for the 16×16 transform and 01 for the 8×8 transform).

FIG. 5 is an exemplary embodiment of a process 500 for performing an inverse transform at decoder 26 on a block encoded by encoder 20 using process 300. Decoder 26, which may include, among other components, an entropy decoding unit, a spatial prediction unit, a motion compensation unit, an inverse quantization unit, an inverse transform unit, and a summer, is one means of performing the steps of process 500. Further, various components of decoder 26 can be used to perform different steps of process 500. At step 502, it is determined whether the prediction block size is greater than 8×8. If it is, the process proceeds to step 518, where the decoder looks for a 1-bit or 2-bit flag value and performs an inverse transform based on the flag value and the prediction block size. The type of inverse transform to be used is shown in Table 1. If instead it is determined at step 502 that the prediction block size is not greater than 8×8, process 500 proceeds to step 506, where it is determined whether the prediction block size is smaller than 8×8. If it is, process 500 proceeds to step 510, where a 4×4 inverse transform is performed. If instead it is determined at step 506 that the prediction block size is not smaller than 8×8, the process proceeds to step 514, where the decoder looks for a 1-bit flag value and performs an inverse transform based on the flag value. The type of inverse transform to be used is shown in Table 1.
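The decoder-side lookup of process 500 can be sketched in Python as follows, assuming the flag values set by process 300 (no flag below 8×8; 0 or 1 for an 8×8 block; 0, 10, or 11 for larger blocks). The function name, the bit iterator interface, and the area test are assumptions made for this sketch, not part of the described decoder.

```python
from typing import Iterator, Tuple

Size = Tuple[int, int]


def process_500_inverse_size(pred_size: Size, bits: Iterator[int]) -> Size:
    """Read the flag (if any) from `bits` and return the inverse transform size."""
    if pred_size[0] * pred_size[1] < 64:
        # Step 510: no flag was sent, the inverse transform is 4x4.
        return (4, 4)
    first = next(bits)
    if pred_size == (8, 8):
        # Step 514: a 1-bit flag selects 4x4 or 8x8.
        return (8, 8) if first == 1 else (4, 4)
    # Step 518: a 1-bit or 2-bit flag, prediction block larger than 8x8.
    if first == 0:
        return (4, 4)
    second = next(bits)
    # "10" selects 8x8; "11" selects a transform the size of the prediction block.
    return pred_size if second == 1 else (8, 8)


stream = iter([1, 1, 0])
print(process_500_inverse_size((16, 8), stream))  # consumes the two bits "11"
```

The second bit is read only when the first bit is 1, which is why the decoder needs the prediction block size before it can parse the flag.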

FIG. 6 is an exemplary embodiment of a process 600 for performing an inverse transform at decoder 26 on a block encoded by encoder 20 using process 400. Decoder 26, which may be a processor among other components, is one means of performing the steps of process 600. At step 602, it is determined whether the prediction block size is greater than 8×8. If it is, the process proceeds to step 618, where it is determined whether the prediction block size is 16×16. If the prediction block size is 16×16, process 600 proceeds to step 622, where the decoder looks for a 2-bit flag value and an inverse transform is performed on the block according to the flag value. The type of inverse transform to be used is shown in Table 2. If instead it is determined at step 618 that the prediction block size is not 16×16, process 600 proceeds to step 626, where the decoder looks for a 1-bit flag value and an inverse transform is performed based on the 1-bit value and the motion partition size. The type of inverse transform to be used is shown in Table 2.

If it is determined at step 602 that the prediction block size is not greater than 8×8, process 600 proceeds to step 606, where it is determined whether the prediction block size is smaller than 8×8. If it is, process 600 proceeds to step 610, where a 4×4 inverse transform is performed. If instead it is determined at step 606 that the prediction block size is not smaller than 8×8, the process proceeds to step 614, where the decoder looks for a 1-bit flag value and performs an inverse transform based on the flag value. The type of inverse transform to be used is shown in Table 2.
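The decoder-side lookup of process 600 can be sketched in Python as follows, assuming the flag values set by process 400. The function name, the bit iterator interface, and the area test are assumptions made for this sketch.

```python
from typing import Dict, Iterator, Tuple

Size = Tuple[int, int]

# 2-bit flag meanings for a 16x16 prediction block, as set by process 400.
FLAGS_16X16: Dict[Tuple[int, int], Size] = {
    (0, 0): (8, 8),
    (0, 1): (16, 16),
    (1, 0): (16, 8),
    (1, 1): (8, 16),
}


def process_600_inverse_size(pred_size: Size, bits: Iterator[int]) -> Size:
    """Read the flag (if any) from `bits` and return the inverse transform size."""
    if pred_size[0] * pred_size[1] < 64:
        # Step 610: no flag was sent, the inverse transform is 4x4.
        return (4, 4)
    if pred_size == (16, 16):
        # Step 622: fixed-length 2-bit flag.
        first = next(bits)
        second = next(bits)
        return FLAGS_16X16[(first, second)]
    flag = next(bits)
    if pred_size == (8, 8):
        # Step 614: 1-bit flag selects 4x4 or 8x8.
        return (8, 8) if flag == 1 else (4, 4)
    # Step 626: 1-bit flag combined with the 16x8 or 8x16 partition size.
    return pred_size if flag == 1 else (8, 8)


print(process_600_inverse_size((16, 16), iter([1, 0])))
```

Here the flag length is fixed once the prediction block size is known, so the decoder never has to inspect the first bit to decide whether a second follows.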

Processes 300, 400, 500, and 600 illustrate particular syntaxes for determining the size of the transform to use on a block of video. Those skilled in the art will recognize that these processes are merely exemplary processes for encoding and decoding blocks and setting flag values. Note that other processes with additional steps, fewer steps, or rearranged steps can be used to achieve the same syntax as shown in Table 1 or Table 2. Further, those skilled in the art will recognize that the particular flag value assigned to each transform indication can be changed. Syntaxes similar to those shown in Table 1 and Table 2 can also be formed.

Note also that additional transform sizes (e.g., 32×32) and prediction block sizes (e.g., 32×32) can be used to encode and decode blocks and to set flag values. For example, a flag can indicate a 32×32 transform size while still using only 2 bits for the flag value as described above. In process 300, for instance, step 362 can determine whether the prediction block size equals 32×32, and step 370 can determine whether the prediction block size equals 16×32. Steps 366, 374, and 378 can then be modified so that a 32×32, 16×32, or 32×16 transform, respectively, is performed on the block. The flag value set at step 382 then indicates a 32×32, 16×32, or 32×16 transform rather than a 16×16, 8×16, or 16×8 transform. Further modifications can be made to indicate additional transform sizes using combinations of flag values and prediction block sizes.

The bits of the flag value are sent along communication channel 16 as part of the encoded video data. Depending on the encoding scheme, the placement of the flag bits within the transmitted bitstream can differ. The flag value can be part of a header sent to the decoder. The header can include additional header syntax elements that identify particular characteristics of the current video block, such as the block type, the prediction mode, the coded block pattern (CBP) for luma and chroma, the prediction block size, and one or more motion vectors. These header syntax elements can be generated, for example, at entropy coding unit 37 within video encoder 20.

In one embodiment, the header includes a bit indicating whether there are non-zero coefficients in the encoded block. If non-zero coefficients are present, the bits indicating the transform size are also included in the header. If no non-zero coefficients are present, the transform size bits are not sent. In another embodiment, the transform size element is sent in each header regardless of whether non-zero coefficients are present.
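The first embodiment above, in which the transform size bits are conditional on the presence of non-zero coefficients, can be sketched in Python as follows. The function name and the two-field header layout are hypothetical; a real header would carry the other syntax elements listed earlier (block type, prediction mode, and so on), which are omitted here.

```python
from typing import List


def write_header_bits(has_nonzero_coeffs: bool, transform_flag: str) -> List[int]:
    """Emit a simplified header: one presence bit, then the transform size bits.

    `transform_flag` is the flag value as a bit string such as "0", "10" or "11".
    It is written only when the block has non-zero coefficients.
    """
    bits = [1 if has_nonzero_coeffs else 0]
    if has_nonzero_coeffs:
        # The transform size matters only when there are coefficients to invert.
        bits.extend(int(b) for b in transform_flag)
    return bits


print(write_header_bits(True, "10"))
print(write_header_bits(False, "10"))
```

Skipping the flag for all-zero blocks saves up to two bits per such block, at the cost of making the flag's presence depend on an earlier header field.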

The techniques described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. Any features described as units or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. Additionally or alternatively, the techniques may be realized at least in part by a computer-readable communication medium that carries or communicates code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.

The code may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or to any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software units or hardware units configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC). The depiction of different features as units is intended to highlight different functional aspects of the devices illustrated and does not necessarily imply that such units must be realized by separate hardware or software components. Rather, functionality associated with one or more units may be integrated within common or separate hardware or software components.

Various embodiments of this disclosure have been described. These and other embodiments are within the scope of the following claims.
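The inventions recited in the claims as originally filed in this application are appended below.
(1)
A method of encoding video data, the method comprising:
applying spatial prediction or motion compensation to an original video block within a video frame to generate a predicted video block based on a prediction mode;
subtracting the predicted video block from the original video block within the video frame to form a residual block;
selecting a transform having a first transform size to apply to the residual block;
generating header data indicative of the selected transform, the header data comprising a first syntax element having a first value indicative of at least one transform size and a second syntax element indicative of a prediction block size of the predicted video block, wherein the first syntax element and the second syntax element together indicate the first transform size;
applying the selected transform to the residual block to generate residual transform coefficients; and
generating a video signal based on the header data and the residual transform coefficients.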
（２）
前記ヘッダデータは、符号化されたブロックパターンを示す第３の構文要素をさらに備え、前記第３の構文要素は第２の値を備え、前記第１の構文要素は前記第２の値が非零である場合順次前記第３の構文要素の後に続く、（１）に記載の方法。
（３）
前記第１の構文要素の前記第１の値は複数の変換サイズに対応する、（１）に記載の方法。
（４）
前記第１の値は、前記予測ビデオブロックの前記予測ブロックサイズに基づいて前記第１の変換サイズにマップされる、（３）に記載の方法。
（５）
前記第１の変換サイズはサイズＸ×Ｙであり、ＸはＹに等しくない、（３）に記載の方法。
（６）
ＸとＹの少なくとも一方は８に等しく、ＸとＹの少なくとも一方は１６に等しい、（５）に記載の方法。
（７）
前記第１の変換サイズは前記予測ビデオブロックの前記予測ブロックサイズに等しい、（１）に記載の方法。
（８）
前記第１の変換サイズはＮ×Ｍであり、ＭとＮの少なくとも一方は１６以上である、（１）に記載の方法。
（９）
前記選択された変換を示すヘッダデータを生成することは、
前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定することと、
前記予測ブロックサイズが第２のしきい値より小さいかどうかを判定することを備える、（１）に記載の方法。
（１０）
前記第１のしきい値は８×８であり、前記第２のしきい値は８×８である、（９）に記載の方法。
（１１）
前記選択された変換を示すヘッダデータを生成することは、
前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定することと、
前記予測ブロックサイズが第２の値に等しいかどうかを判定することを備える、（１）に記載の方法。
（１２）
前記第１のしきい値は８×８であり、前記第２の値は１６×１６である、（１１）に記載の方法。
（１３）
前記第１のしきい値は８×８であり、前記第２の値は１６×８である、（１１）に記載の方法。
（１４）
前記選択された変換は整数変換である、（１）に記載の方法。
（１５）
前記選択された変換は離散余弦変換である、（１）に記載の方法。
（１６）
前記選択された変換は方向性変換である、（１）に記載の方法。
（１７）
ビデオデータを復号する方法において、
少なくとも１つのブロックについてのヘッダデータ及び前記少なくとも１つのブロックについての残余変換係数を備えており、ビデオのフレーム内の前記少なくとも１つのブロックを示すビデオ信号を受信することであって、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記少なくとも１つのブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記少なくとも１つのブロックを符号化するのに使用される第１の変換サイズを有する変換を示している、前記ビデオ信号を受信することと、
前記少なくとも１つのブロックの前記予測ブロックサイズの予測ビデオブロックを生成するために、前記少なくとも１つのブロックに空間予測または動き補償を適用することと、
前記第１の構文要素および前記第２の構文要素に基づいて前記少なくとも１つのブロックを符号化するのに使用される前記第１の変換サイズを判定すること、
復号された残余ブロックを得るために、前記判定された第１の変換サイズの逆変換を前記残余変換係数に適用することと、
復号されたビデオブロックを得るために、前記復号された残余ブロックを前記予測ビデオブロックに加算することを備える方法。
（１８）
前記ヘッダデータは、符号化されたブロックパターンを示す第３の構文要素をさらに備え、該第３の構文要素は第２の値を備え、前記第１の構文要素は、前記第２の値が非零である場合、順次前記第３の構文要素の後に続いている、（１７）に記載の方法。
（１９）
前記第１の構文要素の前記第１の値は複数の変換サイズに対応する、（１７）に記載の方法。
（２０）
前記第１の値は、前記少なくとも１つのブロックの前記予測ブロックサイズに基づいて前記第１の変換サイズにマップされる、（１９）に記載の方法。
（２１）
前記第１の変換サイズはサイズＸ×Ｙであり、ＸはＹに等しくない、（１７）に記載の方法。
（２２）
ＸとＹの少なくとも一方は８に等しく、ＸとＹの少なくとも一方は１６に等しい、（２１）に記載の方法。
（２３）
前記第１の変換サイズは前記少なくとも１つのブロックの前記予測ブロックサイズに等しい、（１７）に記載の方法。
（２４）
前記第１の変換サイズはＮ×Ｍであり、ＭとＮの少なくとも一方は１６以上である、（１７）に記載の方法。
（２５）
前記第１の変換サイズを判定することは、
前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定することと、
前記予測ブロックサイズが第２のしきい値より小さいかどうかを判定することとを備える、（１７）に記載の方法。
（２６）
前記第１のしきい値は８×８であり、前記第２のしきい値は８×８である、（２５）に記載の方法。
（２７）
前記第１の変換サイズを判定することは、
前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定することと、
前記予測ブロックサイズが第２の値に等しいかどうかを判定することを備える、（１７）に記載の方法。
（２８）
前記第１のしきい値は８×８であり、前記第２の値は１６×１６である、（２７）に記載の方法。
（２９）
前記第１のしきい値は８×８であり、前記第２の値は１６×８である、（２７）に記載の方法。
（３０）
前記逆変換は整数変換である、（１７）に記載の方法。
（３１）
前記逆変換は離散余弦変換である、（１７）に記載の方法。
（３２）
前記逆変換は方向性変換である、（１７）に記載の方法。
（３３）
ビデオデータを符号化する装置において、
予測モードに基づく予測ビデオブロックを生成するために、ビデオフレーム内の元のビデオブロックに空間予測または動き補償を適用する手段と、
残余ブロックを形成するために、前記ビデオフレーム内の前記元のビデオブロックから前記予測ビデオブロックを減算する手段と、
前記残余ブロックに適用するために、第１の変換サイズを有する変換を選択する手段と、
前記選択された変換を示すヘッダデータを生成するための手段であって、
前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記予測ビデオブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記第１の変換サイズを示している、前記ヘッダデータを生成するための手段と、
残余変換係数を生成するために前記選択された変換を前記残余ブロックに適用する手段と、
前記ヘッダデータおよび前記残余変換係数に基づいてビデオ信号を生成するための手段を備える装置。
（３４）
空間予測または動き補償を適用するための前記手段は予測ユニットを備え、減算するための前記手段は加算器を備え、前記変換サイズを選択するための前記手段はモード決定ユニットを備え、ヘッダデータを生成するための前記手段はエントロピー符号化ユニットを備え、前記選択された変換を適用するための前記手段はブロック変換ユニットを備え、ビデオ信号を生成するための前記手段は前記エントロピー符号化ユニットを備える、（３３）に記載の装置。
（３５）
ビデオデータを復号する装置において、
少なくとも１つのブロックについてのヘッダデータおよび前記少なくとも１つのブロックについての残余変換係数を備えており、ビデオのフレーム内の前記少なくとも１つのブロックを示すビデオ信号を受信するための手段であって、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記少なくとも１つのブロックのモーションパーティションサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記少なくとも１つのブロックを符号化するのに使用される第１の変換サイズを有する変換を示している、前記ビデオ信号を受信するための手段と、
前記少なくとも１つのブロックの前記予測ブロックサイズの予測ビデオブロックを生成するために、前記少なくとも１つのブロックに空間予測または動き補償を適用するための手段と、
前記第１の構文要素および前記第２の構文要素に基づいて、前記少なくとも１つのブロックを符号化するのに使用される前記第１の変換サイズを判定するための手段と、
復号された残余ブロックを得るために、前記判定された第１の変換サイズの逆変換を前記残余変換係数に適用するための手段と、
復号されたビデオブロックを得るために、前記復号された残余ブロックを前記予測ビデオブロックに加算するための手段を備える装置。
（３６）
受信のための前記手段は受信器を備え、空間予測または動き補償を適用するための前記手段は予測ユニットを備え、前記第１の変換サイズを判定するための前記手段はエントロピー復号ユニットを備え、逆変換を適用するための前記手段は逆変換ユニットを備え、加算するための前記手段は加算器を備える、（３５）に記載の装置。
（３７）
ビデオデータを符号化するシステムにおいて、
予測ビデオブロックを生成するために、ビデオフレーム内の元のビデオブロックに空間予測または動き補償を適用するように構成された予測ユニットと、
残余ブロックを形成するために、前記ビデオフレーム内の前記元のビデオブロックから前記予測ビデオブロックを減算するように構成された加算器と、
前記残余ブロックに適用する第１の変換サイズを有する変換を選択するように構成されたモード決定ユニットと、
残余変換係数を生成するために、前記選択された変換を前記残余ブロックに適用するように構成されたブロック変換ユニットと、
前記選択された変換を示すヘッダデータを生成するものであり、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記予測ビデオブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記第１の変換サイズを示しており、そして、
前記ヘッダデータおよび前記残余変換係数に基づいてビデオ信号を生成するように構成されたエントロピー符号化ユニットを備えるシステム。
（３８）
前記ヘッダデータは、符号化されたブロックパターンを示す第３の構文要素をさらに備え、前記第３の構文要素は第２の値を備え、前記第１の構文要素は、前記第２の値が非零である場合、順次前記第３の構文要素の後に続く、（３７）に記載のシステム。
（３９）
前記第１の構文要素の前記第１の値は複数の変換サイズに対応する、（３７）に記載のシステム。
（４０）
前記第１の値は、前記予測ビデオブロックの前記予測ブロックサイズに基づいて前記第１の変換サイズにマップされる、（３９）に記載のシステム。
（４１）
前記第１の変換サイズはサイズＸ×Ｙであり、ＸはＹに等しくない、（３７）に記載のシステム。
（４２）
ＸとＹの少なくとも一方は８に等しく、ＸとＹの少なくとも一方は１６に等しい、（４１）に記載のシステム。
（４３）
前記第１の変換サイズは前記予測ビデオブロックの前記予測ブロックサイズに等しい、（３７）に記載のシステム。
（４４）
前記第１の変換サイズはＮ×Ｍであり、ＭとＮの少なくとも一方は１６以上である、（３７）に記載のシステム。
（４５）
前記エントロピー符号化ユニットはさらに、前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定し、かつ前記予測ブロックサイズが第２のしきい値より小さいかどうかを判定するように構成される、（３７）に記載のシステム。
（４６）
前記第１のしきい値は８×８であり、前記第２のしきい値は８×８である、（４５）に記載のシステム。
（４７）
前記エントロピー符号化ユニットはさらに、前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定し、かつ前記予測ブロックサイズが第２の値に等しいかどうかを判定するように構成される、（３７）に記載のシステム。
（４８）
前記第１のしきい値は８×８であり、前記第２の値は１６×１６である、（４７）に記載のシステム。
（４９）
前記第１のしきい値は８×８であり、前記第２の値は１６×８である、（４７）に記載のシステム。
（５０）
前記選択された変換は整数変換である、（３７）に記載のシステム。
（５１）
前記選択された変換は離散余弦変換である、（３７）に記載のシステム。
（５２）
前記選択された変換は方向性変換である、（３７）に記載のシステム。
（５３）
ビデオデータを復号するシステムにおいて、
少なくとも１つのブロックについてのヘッダデータおよび前記少なくとも１つのブロックについての残余変換係数を備えており、ビデオのフレーム内の前記少なくとも１つのブロックを示すビデオ信号を受信するように構成された受信器であって、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記少なくとも１つのブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記少なくとも１つのブロックを符号化するのに使用される第１の変換サイズを有する変換を示している、前記受信器と、
前記少なくとも１つのブロックの前記予測ブロックサイズの予測ビデオブロックを生成するために、前記少なくとも１つのブロックに空間予測または動き補償を適用するように構成された予測ユニットと、
前記第１の構文要素および前記第２の構文要素に基づいて前記少なくとも１つのブロックを符号化するのに使用される前記第１の変換サイズを判定するように構成されたエントロピー復号ユニットと、
復号された残余ブロックを得るために、前記判定された第１の変換サイズの逆変換を前記残余変換係数に適用するように構成された逆変換ユニットと、
復号されたビデオブロックを得るために、前記復号された残余ブロックを前記予測ビデオブロックに加算するように構成された加算器を備えるシステム。
（５４）
前記ヘッダデータは、符号化されたブロックパターンを示し、第２の値を備える第３の構文要素をさらに備え、前記第１の構文要素は、前記第２の値が非零である場合、順次前記第３の構文要素の後に続く、（５３）に記載のシステム。
（５５）
前記第１の構文要素の前記第１の値は複数の変換サイズに相当する、（５３）に記載のシステム。
（５６）
前記第１の値は、前記少なくとも１つのブロックの前記予測ブロックサイズに基づいて前記第１の変換サイズにマップされる、（５５）に記載のシステム。
（５７）
前記第１の変換サイズはサイズＸ×Ｙであり、ＸはＹに等しくない、（５３）に記載のシステム。
（５８）
ＸとＹの少なくとも一方は８に等しく、ＸとＹの少なくとも一方は１６に等しい、（５７）に記載のシステム。
（５９）
前記第１の変換サイズは前記少なくとも１つのブロックの前記予測ブロックサイズに等しい、（５３）に記載のシステム。
（６０）
前記第１の変換サイズはＮ×Ｍであり、ＭとＮの少なくとも一方は１６以上である、（５３）に記載のシステム。
（６１）
前記エントロピー復号ユニットはさらに、前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定し、かつ前記予測ブロックサイズが第２のしきい値より小さいかどうかを判定するように構成される、（５３）に記載のシステム。
（６２）
前記第１のしきい値は８×８であり、前記第２のしきい値は８×８である、（６１）に記載のシステム。
（６３）
前記エントロピー復号ユニットはさらに、前記予測ブロックサイズが第１のしきい値よりも大きいかどうかを判定し、かつ前記予測ブロックサイズが第２の値に等しいかどうかを判定するように構成される、（５３）に記載のシステム。
（６４）
前記第１のしきい値は８×８であり、前記第２の値は１６×１６である、（６３）に記載のシステム。
（６５）
前記第１のしきい値は８×８であり、前記第２の値は１６×８である、（６３）に記載のシステム。
（６６）
前記逆変換は整数変換である、（５３）に記載のシステム。
（６７）
前記逆変換は離散余弦変換である、（５３）に記載のシステム。
（６８）
前記逆変換は方向性変換である、（５３）に記載のシステム。
（６９）
実行時に、
予測モードに基づく予測ビデオブロックを生成するために、ビデオフレーム内の元のビデオブロックに空間予測または動き補償を適用することと、
残余ブロックを形成するために、前記ビデオフレーム内の前記元のビデオブロックから前記予測ビデオブロックを減算することと、
前記残余ブロックに適用するために、第１の変換サイズを有する変換を選択することと、
前記選択された変換を示すヘッダデータを生成することであって、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記予測ビデオブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になって前記第１の構文要素と前記第２の構文要素が前記第１の変換サイズを示すように、前記ヘッダデータを生成することと、
残余変換係数を生成するために、前記選択された変換を前記残余ブロックに適用することと、
前記ヘッダデータおよび前記残余変換係数に基づいてビデオ信号を生成することを備える方法を実行する命令を備えるコンピュータ読み取り可能な媒体。
（７０）
実行時に、
少なくとも１つのブロックについてのヘッダデータおよび前記少なくとも１つのブロックについての残余変換係数を備えており、ビデオのフレーム内の前記少なくとも１つのブロックを示すビデオ信号を受信することであって、前記ヘッダデータが、少なくとも１つの変換サイズを示す第１の値を有する第１の構文要素および前記少なくとも１つのブロックの予測ブロックサイズを示す第２の構文要素を備えており、一緒になった前記第１の構文要素と前記第２の構文要素が前記少なくとも１つのブロックを符号化するのに使用される第１の変換サイズを有する変換を示している、前記ビデオ信号を受信することと、
前記少なくとも１つのブロックの前記予測ブロックサイズの予測ビデオブロックを生成するために、前記少なくとも１つのブロックに空間予測または動き補償を適用することと、
前記第１の構文要素および前記第２の構文要素に基づいて前記少なくとも１つのブロックを符号化するのに使用される前記第１の変換サイズを判定することと、
復号された残余ブロックを得るために、前記判定された第１の変換サイズの逆変換を前記残余変換係数に適用することと、
復号されたビデオブロックを得るために、前記復号された残余ブロックを前記予測ビデオブロックに加算することを備えた、方法を実行する命令を備えるコンピュータ読み取り可能な媒体。
Various embodiments of the disclosure have been described. These and other embodiments are within the scope of the following claims.
The invention described in the scope of the claims at the beginning of the present application is added below.
(1)
In a method of encoding video data,
Applying spatial prediction or motion compensation to the original video block in the video frame to generate a predictive video block based on the prediction mode;
Subtracting the predicted video block from the original video block in the video frame to form a residual block;
Selecting a transform having a first transform size to apply to the residual block;
Generating header data indicating the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the predicted video block, and wherein the first syntax element and the second syntax element together indicate the first transform size;
Applying the selected transform to the residual block to generate residual transform coefficients;
Generating a video signal based on the header data and the residual transform coefficients.
(2)
The method according to (1), wherein the header data further comprises a third syntax element indicating a coded block pattern, the third syntax element comprises a second value, and the first syntax element sequentially follows the third syntax element when the second value is non-zero.
(3)
The method of (1), wherein the first value of the first syntax element corresponds to a plurality of transform sizes.
(4)
The method of (3), wherein the first value is mapped to the first transform size based on the predicted block size of the predicted video block.
(5)
The method of (3), wherein the first transform size is size X × Y, where X is not equal to Y.
(6)
The method of (5), wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.
(7)
The method of (1), wherein the first transform size is equal to the predicted block size of the predicted video block.
(8)
The method according to (1), wherein the first transformation size is N × M, and at least one of M and N is 16 or more.
(9)
The method of (1), wherein generating header data indicating the selected transform comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is less than a second threshold.
(10)
The method according to (9), wherein the first threshold value is 8 × 8 and the second threshold value is 8 × 8.
(11)
The method of (1), wherein generating header data indicating the selected transform comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is equal to a second value.
(12)
The method according to (11), wherein the first threshold value is 8 × 8 and the second value is 16 × 16.
(13)
The method according to (11), wherein the first threshold value is 8 × 8 and the second value is 16 × 8.
(14)
The method of (1), wherein the selected transformation is an integer transformation.
(15)
The method of (1), wherein the selected transform is a discrete cosine transform.
(16)
The method of (1), wherein the selected transformation is a directional transformation.
(17)
In a method for decoding video data,
Receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the at least one block, and wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block;
Applying spatial prediction or motion compensation to the at least one block to generate a predicted video block of the predicted block size of the at least one block;
Determining the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
Applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block;
A method comprising adding the decoded residual block to the predicted video block to obtain a decoded video block.
(18)
The method according to (17), wherein the header data further comprises a third syntax element indicating a coded block pattern, the third syntax element comprises a second value, and the first syntax element sequentially follows the third syntax element when the second value is non-zero.
(19)
The method of (17), wherein the first value of the first syntax element corresponds to a plurality of transform sizes.
(20)
The method of (19), wherein the first value is mapped to the first transform size based on the predicted block size of the at least one block.
(21)
The method of (17), wherein the first transform size is size X × Y, where X is not equal to Y.
(22)
The method of (21), wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.
(23)
The method of (17), wherein the first transform size is equal to the predicted block size of the at least one block.
(24)
The method according to (17), wherein the first transformation size is N × M, and at least one of M and N is 16 or more.
(25)
The method of (17), wherein determining the first transform size comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is less than a second threshold.
(26)
The method according to (25), wherein the first threshold value is 8 × 8 and the second threshold value is 8 × 8.
(27)
The method of (17), wherein determining the first transform size comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is equal to a second value.
(28)
The method according to (27), wherein the first threshold value is 8 × 8 and the second value is 16 × 16.
(29)
The method according to (27), wherein the first threshold value is 8 × 8 and the second value is 16 × 8.
(30)
The method according to (17), wherein the inverse transformation is an integer transformation.
(31)
The method according to (17), wherein the inverse transform is a discrete cosine transform.
(32)
The method according to (17), wherein the inverse transformation is a directional transformation.
(33)
In an apparatus for encoding video data,
Means for applying spatial prediction or motion compensation to the original video block in the video frame to generate a predictive video block based on the prediction mode;
Means for subtracting the predicted video block from the original video block in the video frame to form a residual block;
Means for selecting a transform having a first transform size to apply to the residual block;
Means for generating header data indicating the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the predicted video block, and wherein the first syntax element and the second syntax element together indicate the first transform size;
Means for applying the selected transform to the residual block to generate residual transform coefficients;
An apparatus comprising means for generating a video signal based on the header data and the residual transform coefficients.
(34)
The apparatus of (33), wherein the means for applying spatial prediction or motion compensation comprises a prediction unit, the means for subtracting comprises an adder, the means for selecting the transform size comprises a mode determination unit, the means for generating header data comprises an entropy coding unit, the means for applying the selected transform comprises a block transform unit, and the means for generating a video signal comprises the entropy coding unit.
(35)
In an apparatus for decoding video data,
Means for receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a motion partition size of the at least one block, and wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block;
Means for applying spatial prediction or motion compensation to the at least one block to generate a predictive video block of the predictive block size of the at least one block;
Means for determining the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
Means for applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block;
An apparatus comprising means for adding the decoded residual block to the predicted video block to obtain a decoded video block.
(36)
The means for receiving comprises a receiver, the means for applying spatial prediction or motion compensation comprises a prediction unit, and the means for determining the first transform size comprises an entropy decoding unit; The apparatus of (35), wherein said means for applying an inverse transform comprises an inverse transform unit, and said means for adding comprises an adder.
(37)
In a system for encoding video data,
A prediction unit configured to apply spatial prediction or motion compensation to the original video block in the video frame to generate a predictive video block;
An adder configured to subtract the predicted video block from the original video block in the video frame to form a residual block;
A mode determination unit configured to select a transform having a first transform size to apply to the residual block;
A block transform unit configured to apply the selected transform to the residual block to generate a residual transform coefficient;
A system comprising an entropy coding unit configured to generate header data indicating the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the predicted video block, and wherein the first syntax element and the second syntax element together indicate the first transform size, the entropy coding unit being further configured to generate a video signal based on the header data and the residual transform coefficients.
(38)
The system according to (37), wherein the header data further comprises a third syntax element indicating a coded block pattern, the third syntax element comprises a second value, and the first syntax element sequentially follows the third syntax element when the second value is non-zero.
(39)
The system of (37), wherein the first value of the first syntax element corresponds to a plurality of transform sizes.
(40)
The system of (39), wherein the first value is mapped to the first transform size based on the predicted block size of the predicted video block.
(41)
The system of (37), wherein the first transform size is size X × Y, where X is not equal to Y.
(42)
The system according to (41), wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.
(43)
The system of (37), wherein the first transform size is equal to the predicted block size of the predicted video block.
(44)
The system according to (37), wherein the first transformation size is N × M, and at least one of M and N is 16 or more.
(45)
The system according to (37), wherein the entropy coding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is less than a second threshold.
(46)
The system according to (45), wherein the first threshold value is 8x8 and the second threshold value is 8x8.
(47)
The system according to (37), wherein the entropy coding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is equal to a second value.
(48)
The system according to (47), wherein the first threshold value is 8x8 and the second value is 16x16.
(49)
The system according to (47), wherein the first threshold value is 8 × 8 and the second value is 16 × 8.
(50)
The system of (37), wherein the selected transform is an integer transform.
(51)
The system of (37), wherein the selected transform is a discrete cosine transform.
(52)
The system of (37), wherein the selected transformation is a directional transformation.
(53)
In a system for decoding video data,
A receiver configured to receive a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the at least one block, and wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block;
A prediction unit configured to apply spatial prediction or motion compensation to the at least one block to generate a prediction video block of the prediction block size of the at least one block;
An entropy decoding unit configured to determine the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
An inverse transform unit configured to apply an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block;
A system comprising an adder configured to add the decoded residual block to the predicted video block to obtain a decoded video block.
(54)
The system according to (53), wherein the header data further comprises a third syntax element that indicates a coded block pattern and comprises a second value, and the first syntax element sequentially follows the third syntax element when the second value is non-zero.
(55)
The system according to (53), wherein the first value of the first syntax element corresponds to a plurality of transform sizes.
(56)
The system of (55), wherein the first value is mapped to the first transform size based on the predicted block size of the at least one block.
(57)
The system of (53), wherein the first transform size is size X × Y, where X is not equal to Y.
(58)
The system of (57), wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.
(59)
The system of (53), wherein the first transform size is equal to the predicted block size of the at least one block.
(60)
The system according to (53), wherein the first transformation size is N × M, and at least one of M and N is 16 or more.
(61)
The system according to (53), wherein the entropy decoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is less than a second threshold.
(62)
The system of (61), wherein the first threshold is 8x8 and the second threshold is 8x8.
(63)
The system according to (53), wherein the entropy decoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is equal to a second value.
(64)
The system of (63), wherein the first threshold value is 8x8 and the second value is 16x16.
(65)
The system of (63), wherein the first threshold value is 8x8 and the second value is 16x8.
(66)
The system according to (53), wherein the inverse transform is an integer transform.
(67)
The system according to (53), wherein the inverse transform is a discrete cosine transform.
(68)
The system according to (53), wherein the inverse transformation is a directional transformation.
(69)
A computer-readable medium comprising instructions that, when executed, perform a method comprising:
Applying spatial prediction or motion compensation to the original video block in the video frame to generate a predictive video block based on the prediction mode;
Subtracting the predicted video block from the original video block in the video frame to form a residual block;
Selecting a transform having a first transform size to apply to the residual block;
Generating header data indicating the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the predicted video block, and wherein the first syntax element and the second syntax element together indicate the first transform size;
Applying the selected transform to the residual block to generate residual transform coefficients; and
Generating a video signal based on the header data and the residual transform coefficients.
(70)
A computer-readable medium comprising instructions that, when executed, perform a method comprising:
Receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least one transform size and a second syntax element indicating a predicted block size of the at least one block, and wherein the first syntax element and the second syntax element together indicate a transform having a first transform size used to encode the at least one block;
Applying spatial prediction or motion compensation to the at least one block to generate a predicted video block of the predicted block size of the at least one block;
Determining the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
Applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and
Adding the decoded residual block to the predicted video block to obtain a decoded video block.

Claims

A method for encoding video data by an encoder, the method comprising:
Applying spatial prediction or motion compensation to the original video block in the video frame to generate a predictive video block based on the prediction mode;
Subtracting the predicted video block from the original video block in the video frame to form a residual block;
Selecting a transform having a first transform size to apply to the residual block;
Applying the selected transform to the residual block to generate residual transform coefficients;
Generating header data indicating the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of the predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size in which at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates the first transform size only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients; and
Generating a video signal based on the header data and the residual transform coefficients.

The method of claim 1, wherein the third syntax element comprises a second value, and wherein the first syntax element sequentially follows the third syntax element when the second value is non-zero.
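
The ordering described in this dependent claim can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual bitstream syntax: the field names `pred_block_size`, `cbp` and `transform_size_flag`, the list-of-tuples container, and the position of the prediction-block-size element are hypothetical. The sketch also assumes that the transform-size element is simply omitted when the third syntax element's value is zero.

```python
from typing import List, Optional, Tuple

HeaderField = Tuple[str, object]


def build_header(pred_block_size: Tuple[int, int],
                 cbp: int,
                 transform_size_flag: Optional[int]) -> List[HeaderField]:
    """Assemble header fields in the order they would be written.

    Hypothetical layout: the names and the container are illustrative only.
    """
    header: List[HeaderField] = [
        ("pred_block_size", pred_block_size),  # second syntax element
        ("cbp", cbp),                          # third syntax element
    ]
    if cbp != 0:
        if transform_size_flag is None:
            raise ValueError("a non-zero cbp needs a transform size flag")
        # First syntax element: appended only after a non-zero third element.
        header.append(("transform_size_flag", transform_size_flag))
    return header
```

With a zero `cbp` there are no residual coefficients to transform, so in this sketch no transform-size element is emitted for that block.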

The method of claim 1, wherein the first value of the first syntax element corresponds to a plurality of transform sizes.

The method of claim 3, wherein the first value is mapped to the first transform size based on the predicted block size of the predicted video block.

The method of claim 1, wherein the first transform size is size X × Y, where X is not equal to Y.

The method of claim 5, wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.

The method of claim 1, wherein the first transform size is equal to the predicted block size of the predicted video block.

The method of claim 1, wherein generating header data indicating the selected transform comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is less than a second threshold.

The method of claim 8, wherein the first threshold is 8 × 8 and the second threshold is 8 × 8.

The method of claim 1, wherein generating header data indicating the selected transform comprises:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is equal to a second value.

The method of claim 10, wherein the first threshold is 8 × 8 and the second value is 16 × 16.

The method of claim 10, wherein the first threshold is 8 × 8 and the second value is 16 × 8.

The method of claim 1, wherein the selected transform is an integer transform.

The method of claim 1, wherein the selected transform is a discrete cosine transform.

A method for decoding video data by a decoder, the method comprising:
Receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of the at least one block, wherein the at least three transform sizes comprise at least one N × M transform size in which at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates the first transform size only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients;
Applying spatial prediction or motion compensation to the at least one block to generate a predicted video block of the predicted block size of the at least one block;
Determining the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
Applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and
Adding the decoded residual block to the predicted video block to obtain a decoded video block.
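
The last two steps of this decoding method (inverse transform of the coefficients, then adding the result to the prediction) can be sketched as below. The sketch rests on several assumptions: a floating-point orthonormal DCT-II stands in for whichever integer or cosine transform a codec would use, quantization and entropy coding are left out, and all function names are hypothetical. The forward transform is included only so that the example has coefficients to decode; because the transform is separable, it works for non-square blocks such as 16 × 8.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_transform(block):
    # Y = C_rows * X * C_cols^T
    rows, cols = len(block), len(block[0])
    return matmul(matmul(dct_matrix(rows), block), transpose(dct_matrix(cols)))


def inverse_transform(coeffs):
    # X = C_rows^T * Y * C_cols (valid because the basis is orthonormal)
    rows, cols = len(coeffs), len(coeffs[0])
    return matmul(matmul(transpose(dct_matrix(rows)), coeffs), dct_matrix(cols))


def encode_residual(original, prediction):
    residual = [[o - p for o, p in zip(orow, prow)]
                for orow, prow in zip(original, prediction)]
    return forward_transform(residual)


def reconstruct(coeffs, prediction):
    residual = inverse_transform(coeffs)
    return [[p + round(r) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# A block 16 samples wide and 8 samples high with a flat prediction.
original = [[(7 * r + 3 * c) % 256 for c in range(16)] for r in range(8)]
prediction = [[128] * 16 for _ in range(8)]
coeffs = encode_residual(original, prediction)
decoded = reconstruct(coeffs, prediction)
```

Without quantization the round trip is lossless up to floating-point rounding, so `decoded` matches `original` here; a real codec would quantize the coefficients and recover only an approximation.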

The method of claim 15, wherein the third syntax element comprises a second value, and wherein the first syntax element sequentially follows the third syntax element in the header data when the second value is non-zero.
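
The ordering described in this claim can be pictured as a conditional read: the third syntax element is parsed first, and the first syntax element is read from the header only when the third syntax element's value is non-zero. The sketch below assumes, purely for illustration, that both elements are single bits delivered by an iterator; the name `parse_header` and this bit-level layout are hypothetical and are not taken from the claims.

```python
from typing import Iterator, Optional, Tuple


def parse_header(bits: Iterator[int]) -> Tuple[int, Optional[int]]:
    """Read the third syntax element, then the first one only if it is signalled.

    Returns (third_element_value, first_element_value or None).
    """
    # Third syntax element: indicates whether non-zero coefficients are present.
    third_value = next(bits)
    # First syntax element: present only when the third element is non-zero.
    first_value = next(bits) if third_value != 0 else None
    return third_value, first_value
```

When the third syntax element is zero, nothing further is consumed from the stream, so no bits are spent on transform-size signalling for a block without non-zero coefficients.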

The method of claim 15, wherein the first value of the first syntax element corresponds to a plurality of transform sizes.

The method of claim 17, wherein the first value is mapped to the first transform size based on the predicted block size of the at least one block.
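
One way to picture a single value corresponding to several transform sizes is a lookup keyed on the predicted block size. The table below is entirely hypothetical: the specific pairings of flag values and sizes are chosen only to show the mechanism and are not taken from the claims.

```python
from typing import Dict, Tuple

Size = Tuple[int, int]  # (width, height); a hypothetical representation

# Hypothetical mapping: predicted block size -> {first-syntax-element value -> transform size}
HYPOTHETICAL_TRANSFORM_MAP: Dict[Size, Dict[int, Size]] = {
    (16, 16): {0: (8, 8), 1: (16, 16)},
    (16, 8): {0: (8, 8), 1: (16, 8)},
    (8, 16): {0: (8, 8), 1: (8, 16)},
    (8, 8): {0: (4, 4), 1: (8, 8)},
}


def transform_size(first_value: int, predicted_block_size: Size) -> Size:
    """Resolve the transform size from the flag value and the predicted block size."""
    return HYPOTHETICAL_TRANSFORM_MAP[predicted_block_size][first_value]
```

In this table the value 1 resolves to four different transform sizes, so it identifies one particular size only once the predicted block size is known.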

The method of claim 15, wherein the first transform size is size X × Y, where X is not equal to Y.

20. The method of claim 19, wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.

The method of claim 15, wherein the first transform size is equal to the predicted block size of the at least one block.

The method of claim 15, wherein determining the first transform size includes:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is less than a second threshold.

23. The method of claim 22, wherein the first threshold is 8x8 and the second threshold is 8x8.

The method of claim 15, wherein determining the first transform size includes:
Determining whether the predicted block size is greater than a first threshold; and
Determining whether the predicted block size is equal to a second value.

25. The method of claim 24, wherein the first threshold is 8x8 and the second value is 16x16.

25. The method of claim 24, wherein the first threshold is 8x8 and the second value is 16x8.

The method of claim 15, wherein the inverse transform is an integer transform.

The method of claim 15, wherein the inverse transform is a discrete cosine transform.
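
For the discrete cosine transform case, a minimal floating-point sketch of a separable two-dimensional inverse DCT on a rectangular block (for example 16 × 8) is shown below. It uses the orthonormal DCT-II definition, which is an assumption; the claims do not fix a scaling convention, and a practical codec would more likely use an integer approximation as in the integer-transform claim. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_basis(n: int) -> Matrix:
    """Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector."""
    return [
        [
            math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def forward_dct(block: Matrix) -> Matrix:
    """Y = C_rows * X * C_cols^T for a block with any number of rows and columns."""
    rows, cols = len(block), len(block[0])
    return matmul(matmul(dct_basis(rows), block), transpose(dct_basis(cols)))


def inverse_dct(coeffs: Matrix) -> Matrix:
    """X = C_rows^T * Y * C_cols, the inverse of forward_dct for orthonormal bases."""
    rows, cols = len(coeffs), len(coeffs[0])
    return matmul(matmul(transpose(dct_basis(rows)), coeffs), dct_basis(cols))
```

Because the basis matrices are orthonormal, applying `inverse_dct` to the output of `forward_dct` recovers the input block up to floating-point rounding, regardless of whether the block is square.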

An apparatus for encoding video data, comprising:
Means for applying spatial prediction or motion compensation to the original video block in the video frame to generate a predicted video block based on the prediction mode;
Means for subtracting the predicted video block from the original video block in the video frame to form a residual block;
Means for selecting a transform having a first transform size to apply to the residual block;
Means for applying the selected transform to the residual block to generate residual transform coefficients;
Means for generating header data indicative of the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of the predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates the first transform size only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients; and
Means for generating a video signal based on the header data and the residual transform coefficients.

The apparatus of claim 29, wherein the means for applying spatial prediction or motion compensation comprises a prediction unit, the means for subtracting comprises an adder, the means for selecting the transform size comprises a mode determination unit, the means for generating header data comprises an entropy coding unit, the means for applying the selected transform comprises a block transform unit, and the means for generating a video signal comprises the entropy coding unit.

An apparatus for decoding video data, comprising:
Means for receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicative of at least three transform sizes and a second syntax element indicative of a predicted block size of a predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates a first transform size used to encode the at least one block only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients;
Means for applying spatial prediction or motion compensation to the at least one block to generate the predicted video block of the predicted block size of the at least one block;
Means for determining, based on the first syntax element and the second syntax element, the first transform size used to encode the at least one block;
Means for applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and
Means for adding the decoded residual block to the predicted video block to obtain a decoded video block.

The apparatus of claim 31, wherein the means for receiving comprises a receiver, the means for applying spatial prediction or motion compensation comprises a prediction unit, the means for determining the first transform size comprises an entropy decoding unit, the means for applying an inverse transform comprises an inverse transform unit, and the means for adding comprises an adder.

A system for encoding video data, comprising:
A prediction unit configured to apply spatial prediction or motion compensation to the original video block in the video frame to generate a predicted video block;
An adder configured to subtract the predicted video block from the original video block in the video frame to form a residual block;
A mode determination unit configured to select a transform having a first transform size to apply to the residual block;
A block transform unit configured to apply the selected transform to the residual block to generate residual transform coefficients; and
An entropy encoding unit configured to generate header data indicative of the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of the predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates the first transform size only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients, the entropy encoding unit being further configured to generate a video signal based on the header data and the residual transform coefficients.

The system of claim 33, wherein the third syntax element comprises a second value, and wherein the first syntax element sequentially follows the third syntax element in the header data when the second value is non-zero.

34. The system of claim 33, wherein the first value of the first syntax element corresponds to a plurality of transform sizes.

36. The system of claim 35, wherein the first value is mapped to the first transform size based on the predicted block size of the predicted video block.

34. The system of claim 33, wherein the first transform size is size XxY, where X is not equal to Y.

38. The system of claim 37, wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.

34. The system of claim 33, wherein the first transform size is equal to the predicted block size of the predicted video block.

The system of claim 33, wherein the entropy encoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is less than a second threshold.

41. The system of claim 40, wherein the first threshold is 8x8 and the second threshold is 8x8.

The system of claim 33, wherein the entropy encoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is equal to a second value.

43. The system of claim 42, wherein the first threshold is 8x8 and the second value is 16x16.

43. The system of claim 42, wherein the first threshold is 8x8 and the second value is 16x8.

34. The system of claim 33, wherein the selected transform is an integer transform.

34. The system of claim 33, wherein the selected transform is a discrete cosine transform.

A system for decoding video data, comprising:
A receiver configured to receive a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of a predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates a first transform size used to encode the at least one block only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients;
A prediction unit configured to apply spatial prediction or motion compensation to the at least one block to generate the predicted video block of the predicted block size of the at least one block;
An entropy decoding unit configured to determine the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
An inverse transform unit configured to apply an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and
An adder configured to add the decoded residual block to the predicted video block to obtain a decoded video block.

The system of claim 47, wherein the third syntax element comprises a second value, and wherein the first syntax element sequentially follows the third syntax element in the header data when the second value is non-zero.

48. The system of claim 47, wherein the first value of the first syntax element corresponds to a plurality of transform sizes.

50. The system of claim 49, wherein the first value is mapped to the first transform size based on the predicted block size of the at least one block.

48. The system of claim 47, wherein the first transform size is size XxY, where X is not equal to Y.

52. The system of claim 51, wherein at least one of X and Y is equal to 8, and at least one of X and Y is equal to 16.

48. The system of claim 47, wherein the first transform size is equal to the predicted block size of the at least one block.

The system of claim 53, wherein the entropy decoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is less than a second threshold.

55. The system of claim 54, wherein the first threshold is 8x8 and the second threshold is 8x8.

The system of claim 47, wherein the entropy decoding unit is further configured to determine whether the predicted block size is greater than a first threshold and whether the predicted block size is equal to a second value.

57. The system of claim 56, wherein the first threshold is 8x8 and the second value is 16x16.

57. The system of claim 56, wherein the first threshold is 8x8 and the second value is 16x8.

48. The system of claim 47, wherein the inverse transform is an integer transform.

48. The system of claim 47, wherein the inverse transform is a discrete cosine transform.

A computer-readable storage medium storing instructions that cause one or more computers to execute a method, the method comprising:
Applying spatial prediction or motion compensation to the original video block in the video frame to generate a predicted video block based on the prediction mode;
Subtracting the predicted video block from the original video block in the video frame to form a residual block;
Selecting a transform having a first transform size to apply to the residual block;
Applying the selected transform to the residual block to generate residual transform coefficients;
Generating header data indicative of the selected transform, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of the predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, and wherein the first value of the first syntax element indicates the first transform size only when combined with the second syntax element; and
Generating a video signal based on the header data and the residual transform coefficients.

A computer-readable storage medium storing instructions that cause one or more computers to execute a method, the method comprising:
Receiving a video signal indicative of at least one block in a frame of video, the video signal comprising header data for the at least one block and residual transform coefficients for the at least one block, wherein the header data comprises a first syntax element having a first value indicating at least three transform sizes and a second syntax element indicating a predicted block size of a predicted video block, wherein the at least three transform sizes comprise at least one N × M transform size, and at least one of M and N is greater than or equal to 16, wherein the first value of the first syntax element indicates a first transform size used to encode the at least one block only when combined with the second syntax element, and wherein the header data further comprises a third syntax element indicating whether the residual transform coefficients include one or more non-zero coefficients;
Applying spatial prediction or motion compensation to the at least one block to generate the predicted video block of the predicted block size of the at least one block;
Determining the first transform size used to encode the at least one block based on the first syntax element and the second syntax element;
Applying an inverse transform of the determined first transform size to the residual transform coefficients to obtain a decoded residual block; and
Adding the decoded residual block to the predicted video block to obtain a decoded video block.