JP6744505B2

JP6744505B2 - Coding video syntax elements using context trees

Info

Publication number: JP6744505B2
Application number: JP2019568606A
Authority: JP
Inventors: チャン、チン−ハン; ハン、ジンニン; シュー、ヤオウー
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2017-07-13
Filing date: 2018-03-16
Publication date: 2020-08-19
Anticipated expiration: 2038-03-16
Also published as: US20190020900A1; CN110731084A; US10506258B2; KR102165070B1; EP3652946A1; KR20190139313A; CN110731084B; WO2019013842A1; JP2020523862A

Description

Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high-definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including compression and other encoding techniques.

A method of encoding a current block of a video frame according to one implementation of the present disclosure includes identifying a first set of syntax elements associated with a previously encoded block of the video frame, and first context information used to encode the first set of syntax elements. The method further includes generating a context tree by separating the first set of syntax elements into data groups based on the first context information. The context tree includes a plurality of nodes representing the data groups. The plurality of nodes are associated with cost reductions for the first set of syntax elements. Separating the first set of syntax elements includes applying separation criteria to values of the first context information to generate at least some of the plurality of nodes. The method further includes identifying a second set of syntax elements associated with the current block and second context information associated with the second set of syntax elements. The method further includes identifying one of the plurality of nodes based on a value of the second context information associated with one syntax element of the second set of syntax elements, where the identified node represents a data group that includes the one syntax element. The method further includes encoding the one syntax element according to a probability model associated with the identified node.

A method of decoding an encoded block of an encoded video frame according to one implementation of the present disclosure includes identifying a first set of syntax elements associated with a previously decoded block of the encoded video frame, and first context information used to decode the first set of syntax elements. The method further includes generating a context tree by separating the first set of syntax elements into data groups based on the first context information. The context tree includes a plurality of nodes representing the data groups. The plurality of nodes are associated with cost reductions for the first set of syntax elements. Separating the first set of syntax elements includes applying separation criteria to values of the first context information to generate at least some of the plurality of nodes. The method further includes identifying a second set of syntax elements associated with the encoded block and second context information associated with the second set of syntax elements. The method further includes identifying one of the plurality of nodes based on a value of the second context information associated with one syntax element of the second set of syntax elements, where the identified node represents a data group that includes the one syntax element. The method further includes decoding the one syntax element according to a probability model associated with the identified node.

An apparatus for decoding an encoded block of an encoded video frame according to one implementation of the present disclosure comprises a processor configured to execute instructions stored in a non-transitory storage medium. The instructions include instructions to identify a first set of syntax elements associated with a previously decoded block of the encoded video frame, and first context information used to decode the first set of syntax elements. The instructions further include instructions to generate a context tree by separating the first set of syntax elements into data groups based on the first context information. The context tree includes a plurality of nodes representing the data groups. The plurality of nodes are associated with cost reductions for the first set of syntax elements. The instructions to separate the first set of syntax elements include instructions to apply separation criteria to values of the first context information to generate at least some of the plurality of nodes. The instructions further include instructions to identify a second set of syntax elements associated with the encoded block and second context information associated with the second set of syntax elements. The instructions further include instructions to identify one of the nodes based on a value of the second context information associated with one syntax element of the second set of syntax elements. The identified node represents a data group that includes the one syntax element. The instructions further include instructions to decode the one syntax element according to a probability model associated with the identified node.

An apparatus for decoding a current block of a video frame according to one implementation of the present disclosure comprises a processor configured to execute instructions stored in a non-transitory storage medium. The instructions include instructions to identify a first set of syntax elements associated with a previously encoded block of the video frame, and first context information used to encode the first set of syntax elements. The instructions further include instructions to generate a context tree by separating the first set of syntax elements into data groups based on the first context information. The context tree includes a plurality of nodes representing the data groups. The plurality of nodes are associated with cost reductions for the first set of syntax elements. The instructions to separate the first set of syntax elements include instructions to apply separation criteria to values of the first context information to generate at least some of the plurality of nodes. The instructions further include instructions to identify a second set of syntax elements associated with the current block and second context information associated with the second set of syntax elements. The instructions further include instructions to identify one of the nodes based on a value of the second context information associated with one syntax element of the second set of syntax elements, where the identified node represents a data group that includes the one syntax element. The instructions further include instructions to encode the one syntax element according to a probability model associated with the identified node.

The cost reduction may be a reduction in the cost of coding the first set of syntax elements. The cost may be a computational cost. The computational cost may indicate, for example, the amount of computational resources used for coding. The cost of coding a syntax element may depend on, or indicate, one or more parameters selected from the group consisting of: the time taken to code the syntax element, the space used to code the syntax element (e.g., an amount of computer memory), the number of operations (e.g., arithmetic operations) needed to code the syntax element, and the number of syntax elements to be coded.

These and other aspects of the present disclosure are disclosed in the following detailed description, the appended claims, and the accompanying drawings.

The description herein refers to the accompanying drawings described below, in which like reference numerals refer to like parts throughout the several views.
FIG. 1 is a schematic diagram of a video encoding and decoding system.
FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
FIG. 3 is a diagram of a typical video stream to be encoded and subsequently decoded.
FIG. 4 is a block diagram of an encoder according to implementations of the present disclosure.
FIG. 5 is a block diagram of a decoder according to implementations of the present disclosure.
FIG. 6 is a flowchart diagram of a technique for coding syntax elements associated with a block of a video frame.
FIG. 7 is a flowchart diagram of a technique for generating a context tree for coding syntax elements associated with a block of a video frame.
FIG. 8 is a block diagram of a system for encoding syntax elements associated with a current block of a video frame.
FIG. 9 is a block diagram of a system for decoding encoded syntax elements associated with an encoded block of an encoded video frame.
FIG. 10A shows an example of a first stage for generating a context tree, FIG. 10B shows an example of a second stage for generating a context tree, FIG. 10C shows an example of a third stage for generating a context tree, and FIG. 10D shows an example of a fourth stage for generating a context tree.
FIG. 11 is a diagram showing an example of a context tree.

Video compression schemes may include breaking each image, or frame, into smaller portions, such as blocks, and generating an encoded bitstream using techniques that limit the encoding of syntax elements associated with the respective blocks. The encoded bitstream can be decoded to re-create the source images from the encoded syntax elements. For example, a video compression scheme can include transforming the prediction residual of a current block of a video stream into transform coefficients of transform blocks. The transform coefficients are quantized and entropy coded into the encoded bitstream. A decoder uses the encoded transform coefficients to decode or decompress the encoded bitstream, preparing the video stream for viewing or further processing. A syntax element is an element of data that represents all or part of a video sequence to be encoded or decoded. For example, a syntax element can be a transform coefficient of a transform block, a motion vector used to generate a prediction residual, the value of a flag in a frame header, or other data associated with the video sequence.

Context information can be used in the entropy encoding and entropy decoding of syntax elements. Examples of context information include a luminance plane, a chrominance plane, neighboring coefficients, a coefficient position, a transform size, or the like, or a combination thereof. A value of the context information can refer to how the context information is used to encode or decode a syntax element. An encoder or decoder can predict the probability distribution of a syntax element based on the context information. That is, the context information can be associated with a probability model indicating how syntax elements were previously coded, such as for blocks similar to the current block to be encoded or decoded (e.g., based on proximity within the video frame, block size, or the like). By coding a syntax element using an appropriate probability model, the encoder or decoder can respectively encode or decode the syntax element using fewer bits.
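
The effect of the probability model on the bit count can be illustrated with a minimal sketch. It assumes an ideal entropy coder whose cost for a symbol is the negative base-2 logarithm of that symbol's modeled probability; the function name `cost_in_bits`, the two models, and the sample symbols are hypothetical and do not come from the disclosure.

```python
import math

def cost_in_bits(symbols, model):
    """Ideal entropy-coding cost: the sum of -log2(p) over the coded symbols."""
    return sum(-math.log2(model[s]) for s in symbols)

# Hypothetical binary syntax elements (e.g., "is this coefficient nonzero?").
symbols = [0, 0, 0, 1, 0, 0, 0, 0]

# A model that ignores context, and one that matches the observed statistics.
uniform = {0: 0.5, 1: 0.5}
skewed = {0: 0.875, 1: 0.125}

uniform_bits = cost_in_bits(symbols, uniform)
skewed_bits = cost_in_bits(symbols, skewed)
print(f"uniform model: {uniform_bits:.2f} bits, matched model: {skewed_bits:.2f} bits")
```

The model that better matches the data yields the lower total, which is the sense in which an appropriate probability model lets the same syntax elements be coded with fewer bits.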

However, it may be difficult to determine the appropriate probability model to use when there are multiple syntax elements associated with the block to be encoded or decoded. One solution may include separating the syntax elements associated with the block into different groups based on every combination of the context information. Each group includes the syntax elements associated with one combination of the context information. The syntax elements can then be coded using the probability model associated with the group that contains the largest number of syntax elements.

However, this solution may have drawbacks. For example, as the number of available pieces of context information increases, the number of groups can grow exponentially. This can make the cost of the entropy coding process undesirably high. In another example, there may be many groups, each of which contains only a small number of syntax elements. This over-separation results in inaccurate probability model estimates for the syntax elements, which may cause the encoder or decoder to respectively encode or decode the syntax elements using a suboptimal or inappropriate probability model.
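
The growth in the number of groups can be made concrete with a short sketch. The context names and value ranges below are hypothetical, chosen only to show that the group count is the product of the number of values each piece of context information can take.

```python
from math import prod

def count_groups(context_values):
    """Number of groups when every combination of context values gets its own group."""
    return prod(len(values) for values in context_values.values())

# Hypothetical pieces of context information and their possible values.
contexts = {
    "plane": ["luminance", "chrominance"],
    "transform_size": [4, 8, 16, 32],
    "coefficient_position": list(range(16)),
    "neighbor_magnitude": [0, 1, 2],
}

total_groups = count_groups(contexts)      # 2 * 4 * 16 * 3
elements_per_group = 1000 / total_groups   # average if 1000 elements are spread evenly
print(total_groups, round(elements_per_group, 2))
```

Under these assumed ranges, 1000 syntax elements leave fewer than three elements per group on average, too few to estimate a reliable probability model for each group, and adding one more binary piece of context information doubles the group count again.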

Implementations of the present disclosure include coding syntax elements associated with blocks of a video frame using a context tree. A syntax element associated with a block of a video frame may be a syntax element included in the block, a syntax element included in a header of the video frame that contains the block, or a syntax element otherwise related to or used for the content or coding of the block. Context information used to code previously coded syntax elements is identified, and a context tree is generated by separating the previously coded syntax elements into data groups based on that context information. The context tree includes nodes that represent the data groups and that are associated with cost reductions for the previously coded syntax elements. After the context tree has been generated using the previously coded syntax elements and their associated context information, another set of syntax elements and associated context information can be identified. For example, the previously coded syntax elements may be syntax elements associated with a first block of a video frame, and the other set of syntax elements may be syntax elements associated with a second block of that video frame. One syntax element of the other set can be coded by identifying one of the nodes of the context tree, such as based on a value of the context information associated with that syntax element. That syntax element can then be coded according to a probability model associated with the identified node. The context tree can therefore be used to process further sets of syntax elements, lowering the cost of entropy encoding or entropy decoding.
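
The flow described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the separation criterion is assumed to be a threshold test on a single context value, the cost is assumed to be the empirical entropy in bits, splits are chosen greedily, and all names (`Node`, `build`, `find_node`, `min_gain`) and the sample data are hypothetical.

```python
import math
from collections import Counter

def cost_bits(symbols):
    """Empirical entropy cost, in bits, of coding symbols with their own frequencies."""
    counts, total = Counter(symbols), len(symbols)
    return -sum(c * math.log2(c / total) for c in counts.values())

class Node:
    """A data group; internal nodes also hold the split that produced their children."""
    def __init__(self, symbols):
        self.symbols = symbols
        self.split = None      # (context_name, threshold), assumed form of the criterion
        self.children = None   # (group below threshold, group at or above threshold)

    def model(self, alphabet):
        """Probability model of this group: smoothed relative frequencies."""
        counts = Counter(self.symbols)
        total = len(self.symbols) + len(alphabet)
        return {s: (counts[s] + 1) / total for s in alphabet}

def build(elements, min_gain=1.0):
    """Greedily split (context, symbol) pairs while a split saves at least min_gain bits."""
    node = Node([sym for _, sym in elements])
    best = None
    for name in sorted({n for ctx, _ in elements for n in ctx}):
        for threshold in sorted({ctx[name] for ctx, _ in elements}):
            below = [e for e in elements if e[0][name] < threshold]
            above = [e for e in elements if e[0][name] >= threshold]
            if not below or not above:
                continue
            gain = (cost_bits(node.symbols)
                    - cost_bits([s for _, s in below])
                    - cost_bits([s for _, s in above]))
            if best is None or gain > best[0]:
                best = (gain, name, threshold, below, above)
    if best and best[0] >= min_gain:
        _, name, threshold, below, above = best
        node.split = (name, threshold)
        node.children = (build(below, min_gain), build(above, min_gain))
    return node

def find_node(node, ctx):
    """Walk the tree using the context values of the syntax element to be coded."""
    while node.split is not None:
        name, threshold = node.split
        node = node.children[0] if ctx[name] < threshold else node.children[1]
    return node

# First set: elements of a previously coded block (nonzero flag by coefficient position).
previous = [({"position": p, "neighbor": p % 2}, 1 if p < 4 else 0) for p in range(16)]
tree = build(previous)

# Second set: one element of the current block, located by its own context values.
leaf = find_node(tree, {"position": 2, "neighbor": 1})
print(tree.split, leaf.model([0, 1]))
```

In this toy data the only split that saves bits is on the coefficient position, so the context information that does not help ("neighbor") never creates a group. The selected leaf's model would then drive the entropy coder for the current element.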

Further details of techniques for coding syntax elements using context trees are described herein with initial reference to a system in which they can be implemented. FIG. 1 is a schematic diagram of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.

A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded at the transmitting station 102, and the encoded video stream can be decoded at the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), a cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.

The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.

Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives the encoded video stream (e.g., via the network 104, a computer bus, and/or some communication pathway) and stores the video stream for later decoding. In one implementation, the Real-time Transport Protocol (RTP) is used to transmit the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used, for example, a Hypertext Transfer Protocol (HTTP)-based video streaming protocol.

When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream, as described below. For example, the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view, and who further encodes and transmits his or her own video bitstream to the video conference server for decoding and viewing by other participants.

FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG. 1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of a single computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, or the like.

A processor 202 in the computing device 200 can be a conventional central processing unit. Alternatively, the processor 202 can be any other type of device, or multiple devices, now existing or hereafter developed, capable of manipulating or processing information. For example, although the disclosed implementations can be practiced with a single processor as shown (e.g., the processor 202), advantages in speed and efficiency can be achieved by using more than one processor.

A memory 204 in the computing device 200 can be a read-only memory (ROM) device or a random-access memory (RAM) device in one implementation. However, other suitable types of storage device can be used as the memory 204. The memory 204 can include code and data 206 that are accessed by the processor 202 using a bus 212. The memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the processor 202 to perform the techniques described herein. For example, the application programs 210 can include applications 1 through N, which further include a video coding application that performs the techniques described herein. The computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device. Because video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.

The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch-sensitive display that combines a display with a touch-sensitive element operable to sense touch inputs. The display 218 can be coupled to the processor 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including as a liquid crystal display (LCD), a cathode-ray tube (CRT) display, or a light-emitting diode (LED) display, such as an organic LED (OLED) display.

The computing device 200 can also include or be in communication with an imaging device 220, for example, a camera, or any other imaging device 220 now existing or hereafter developed that can sense an image, such as the image of a user operating the computing device 200. The imaging device 220 can be positioned such that it is directed toward the user operating the computing device 200. In one example, the position and optical axis of the imaging device 220 can be configured such that the field of view includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.

The computing device 200 can also include or be in communication with a sound-sensing device 222, for example, a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.

Although FIG. 2 depicts the processor 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized. The operations of the processor 202 can be distributed across multiple machines (individual machines can have one or more processors) that can be coupled directly or across a local area network or other network. The memory 204 can be distributed across multiple machines, such as a network-based memory, or can be memory in multiple machines performing the operations of the computing device 200. Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network, and can comprise an integrated unit, such as a memory card, or multiple units, such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations.

FIG. 3 is a diagram of an example of a video stream 300 to be encoded and subsequently decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent frames 304. Although three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual frames, for example a frame 306. At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be, for example, subsets of frames that permit parallel processing. The segments 308 can also be subsets of frames that separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. The segments 308 can be sampled at different resolutions.

Whether or not the frame 306 is divided into segments 308, the frame 306 can be further subdivided into blocks 310, which can contain data corresponding to, for example, 16×16 pixels in the frame 306. The blocks 310 can be arranged to include data from one or more segments 308 of pixel data. The blocks 310 can be 4×4 pixels, 8×8 pixels, 16×8 pixels, 8×16 pixels, 16×16 pixels, or larger, or any other suitable size. Unless otherwise noted, the terms block and macroblock are used interchangeably herein.

FIG. 4 is a block diagram of an encoder 400 according to implementations of this disclosure. The encoder 400 can be implemented in the transmitting station 102, as described above, for example by providing a computer software program stored in memory such as the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the CPU 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. In one particularly desirable implementation, the encoder 400 is a hardware encoder.

The encoder 400 has the following stages to perform various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra-prediction/inter-prediction stage 402, a transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 also includes a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future blocks. In FIG. 4, the encoder 400 has the following stages to perform various functions in the reconstruction path: an inverse quantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.

When the video stream 300 is presented for encoding, each of the adjacent frames 304, such as the frame 306, can be processed in units of blocks. At the intra-prediction/inter-prediction stage 402, each block can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction). In either case, a prediction block can be formed. In the case of intra-prediction, the prediction block can be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, the prediction block can be formed from samples in one or more previously constructed reference frames.

Next, still referring to FIG. 4, the prediction block is subtracted from the current block at the intra-prediction/inter-prediction stage 402 to produce a residual block (also called a residual). The transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization stage 406 converts the transform coefficients into discrete quantum values, referred to as quantized transform coefficients, using a quantizer value or quantization level. For example, the transform coefficients can be divided by the quantizer value and truncated.
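The divide-and-truncate step can be illustrated with a minimal Python sketch. The function names `quantize` and `dequantize`, the sample coefficients, and the quantizer value 16 are assumptions chosen for illustration; the disclosure does not specify them. The inverse step (multiplying by the quantizer value) is included only to show that the round trip is lossy.

```python
def quantize(coefficients, quantizer):
    """Divide each transform coefficient by the quantizer value and truncate toward zero."""
    return [int(c / quantizer) for c in coefficients]


def dequantize(levels, quantizer):
    """Approximate inverse: multiply each quantized coefficient by the quantizer value."""
    return [q * quantizer for q in levels]


# Hypothetical transform coefficients and quantizer value.
coeffs = [103.7, -41.2, 8.9, -3.1]
levels = quantize(coeffs, 16)          # [6, -2, 0, 0]
restored = dequantize(levels, 16)      # [96, -32, 0, 0]
```

Small coefficients collapse to zero, which is where most of the data reduction (and the loss) comes from.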

The quantized transform coefficients are then entropy encoded by the entropy encoding stage 408. For example, the entropy encoding stage 408 can include identifying context information for encoding syntax elements associated with a current block and generating a context tree by separating the syntax elements into data groups based on the context information. Implementations for identifying the context information and generating the context tree are described below with respect to FIGS. 6-11. The entropy-encoded coefficients, together with other information used to decode the block (which can include, for example, syntax elements used to indicate the type of prediction used, the transform type, motion vectors, and the quantizer value), are then output to the compressed bitstream 420. The compressed bitstream 420 can be formatted using various techniques, such as variable length coding (VLC) or arithmetic coding. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms are used interchangeably herein.

The reconstruction path in FIG. 4 (shown by the dotted connection lines) can be used to ensure that the encoder 400 and a decoder 500 (described below) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions similar to those that take place during the decoding process (described below), including dequantizing the quantized transform coefficients at the inverse quantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a differential residual block (also called a differential residual). At the reconstruction stage 414, the prediction block that was predicted at the intra-prediction/inter-prediction stage 402 is added to the differential residual to create a reconstructed block. The loop filtering stage 416 can be applied to the reconstructed block to reduce distortion such as blocking artifacts.

Other variations of the encoder 400 can be used to encode the compressed bitstream 420. In some implementations, a non-transform-based encoder can quantize the residual signal directly, without the transform stage 404, for certain blocks or frames. In some implementations, an encoder can have the quantization stage 406 and the inverse quantization stage 410 combined in a common stage.

FIG. 5 is a block diagram of a decoder 500 according to implementations of this disclosure. The decoder 500 can be implemented in the receiving station 106, for example by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the processor 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106.

The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, an inverse quantization stage 504, an inverse transform stage 506, an intra-prediction/inter-prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a deblocking filtering stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.

When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients. For example, the entropy decoding stage 502 can include identifying context information for decoding encoded syntax elements associated with an encoded block and generating a context tree by separating the encoded syntax elements into data groups based on the context information. Implementations for identifying the context information and generating the context tree are described below with respect to FIGS. 6-11.

The inverse quantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a differential residual that is identical to the one created by the inverse transform stage 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 can use the intra-prediction/inter-prediction stage 508 to create the same prediction block as was created in the encoder 400, for example at the intra-prediction/inter-prediction stage 402.

At the reconstruction stage 510, the prediction block is added to the differential residual to create a reconstructed block. The loop filtering stage 512 can be applied to the reconstructed block to reduce blocking artifacts. Other filtering can also be applied to the reconstructed block. In this example, the deblocking filtering stage 514 is applied to the reconstructed block to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 can also be referred to as a decoded video stream, and the terms are used interchangeably herein. Other variations of the decoder 500 can be used to decode the compressed bitstream 420. In some implementations, the decoder 500 can produce the output video stream 516 without the deblocking filtering stage 514.

Techniques for generating a context tree and for coding syntax elements using the context tree are now described with reference to FIGS. 6-7. FIG. 6 is a flowchart of a technique 600 for coding syntax elements associated with a block of a video frame using a context tree. FIG. 7 is a flowchart of a technique 700 for generating a context tree for coding syntax elements associated with a block of a video frame. One or both of technique 600 and technique 700 can be implemented, for example, as a software program executed by a computing device such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that are stored in a memory, such as the memory 204 or the secondary storage 214, and that, when executed by a processor such as the processor 202, cause the computing device to perform technique 600 and/or technique 700. One or both of technique 600 and technique 700 can also be implemented using specialized hardware or firmware. As explained above, some computing devices can have multiple memories or processors, and the operations described in one or both of technique 600 and technique 700 can be distributed across multiple processors, memories, or both.

One or both of technique 600 and technique 700 can be performed by an encoder, for example the encoder 400 shown in FIG. 4, or by a decoder, for example the decoder 500 shown in FIG. 5. Accordingly, the following description of technique 600 and technique 700 may include discussion of encoding a current block or decoding an encoded block, or of generating a context tree used to encode a current block or decode an encoded block. All or a portion of technique 600 or technique 700 can be used to encode a current block or to decode an encoded block, and references to "encoding the current block" and the like, or to "decoding the encoded block" and the like, refer to whichever operation is applicable. For example, when technique 600 or technique 700 is used as part of a process for encoding a current block, references to "decoding the encoded block" and the like can be disregarded. Similarly, when technique 600 or technique 700 is used as part of a process for decoding an encoded block, references to "encoding the current block" and the like can be disregarded.

Referring first to FIG. 6, a flowchart of the technique 600 for coding syntax elements associated with a block of a video frame using a context tree is shown. At 602, context information for coding a first set of syntax elements associated with a previously coded block of the video frame is identified. Identifying the context information can include determining the possible values of the context information that are usable for encoding or decoding the first set of syntax elements. During a coding process, an encoder (e.g., the encoder 400 shown in FIG. 4) or a decoder (e.g., the decoder 500 shown in FIG. 5) can be configured to use values of context information within defined sets of context information. The values of each set of context information can reflect different probabilities for coding the syntax elements.

For example, there can be N sets of context information available for use by the encoder or decoder. The N sets of context information can be stored in a context buffer available to the encoder or decoder. The values of the sets can reflect default probabilities for coding syntax elements (e.g., a second set of syntax elements associated with a block of the video frame to be coded), probabilities determined based on previous codings of syntax elements (e.g., the first set of syntax elements), or the like. Each set of context information can include different values for the context information of that set. The value of a first context information in a first set may not be the same as the value of the same first context information in a second set.

Identifying the context information can include generating, receiving, or otherwise identifying a context vector. The context vector includes some or all of the possible values of the context information for coding the syntax elements. The context vector can be represented as a variable ctx. An index of the context vector, ctx[N], can refer to the possible values of one of the sets of context information. For example, the values stored at ctx[0] can refer to the different values of a first context information. The context information included in the context vector can be stored in a cache.

For example, generating the context vector can include defining a data structure (e.g., an array, an object series, or the like) and storing values of the context information from the different sets of context information within indices of the data structure. In another example, identifying the context vector can include retrieving a defined data structure from a database or similar data store accessible by the encoder or decoder (e.g., the cache in which the context information is stored). In yet another example, receiving the context vector can include receiving data indicative of the context vector from a software or hardware component. For example, a server device can include software for transmitting the data to the encoder or decoder. The data can include the context vector, or data usable to identify the context vector, for example within a database or similar data store.
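As a minimal sketch of the first two options, a context vector can be held in a plain list and looked up from a cache. Everything named here (`generate_context_vector`, `identify_context_vector`, `CONTEXT_CACHE`, the key `"block_0"`, and the sample values) is hypothetical and stands in for whatever data structure and data store an implementation actually uses.

```python
from typing import Dict, List

# Hypothetical stand-in for the "database or similar data store".
CONTEXT_CACHE: Dict[str, List[int]] = {}


def generate_context_vector(context_sets: List[List[int]], choice: List[int]) -> List[int]:
    """Define a list whose index n stores one value taken from the n-th set of context information."""
    return [values[i] for values, i in zip(context_sets, choice)]


def identify_context_vector(key: str) -> List[int]:
    """Retrieve a previously defined context vector from the cache."""
    return CONTEXT_CACHE[key]


# Three hypothetical sets of context information with their possible values.
sets = [[0, 1, 2], [0, 1], [3, 5, 7]]
ctx = generate_context_vector(sets, [2, 0, 1])   # picks 2, 0 and 5
CONTEXT_CACHE["block_0"] = ctx
```

The third option (receiving the vector from another component) would only change where the list comes from, not its shape.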

At 604, a context tree is generated. The context tree can be a binary or non-binary tree that includes nodes representing data used to code the first set of syntax elements. A node of the context tree can be a leaf node or a non-leaf node. A leaf node is a node that represents a set or subset of the first set of syntax elements, which set or subset is referred to herein as a data group. A non-leaf node is a node that represents an expression used to separate a data group, which expression is referred to herein as a separation criterion. For example, where the context tree is a binary tree, a non-leaf node is a parent node that can have two child nodes, which are both leaf nodes, both non-leaf nodes, or one leaf node and one non-leaf node.
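The leaf and non-leaf roles can be sketched with a small Python class. The class name `Node`, its fields, and the string form of the separation criteria are assumptions made for illustration only; the disclosure does not prescribe a representation.

```python
class Node:
    """Context tree node: a leaf holds a data group, a non-leaf holds a separation criterion."""

    def __init__(self, data_group=None, criterion=None, children=()):
        self.data_group = data_group      # syntax elements (leaf nodes only)
        self.criterion = criterion        # separation criterion (non-leaf nodes only)
        self.children = list(children)

    def is_leaf(self):
        return not self.children


def leaf_groups(node):
    """Collect the data groups of all leaf nodes, left to right."""
    if node.is_leaf():
        return [node.data_group]
    groups = []
    for child in node.children:
        groups.extend(leaf_groups(child))
    return groups


# Binary tree whose root has one leaf child and one non-leaf child.
root = Node(
    criterion="ctx[0] < 1",
    children=[
        Node(data_group=[0, 0, 1]),
        Node(criterion="ctx[1] < 2",
             children=[Node(data_group=[2]), Node(data_group=[1, 2])]),
    ],
)
groups = leaf_groups(root)   # [[0, 0, 1], [2], [1, 2]]
```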

Generating the context tree can include separating the first set of syntax elements into data groups based on the identified context information (e.g., the values of the sets of context information in the context vector). The nodes of the context tree are generated to represent the data groups of the first set of syntax elements. Separating the first set of syntax elements can include applying separation criteria to values of the context information to generate some of the nodes. The separation criteria usable to generate the context tree can be defined in a list, database, or other data store accessible by the encoder or decoder.

As described below, a separation criterion can be applied to a value of the context information to generate one or more nodes in response to determining that using that separation criterion results in the greatest cost reduction for the first set of syntax elements. For example, candidate cost reductions can be determined based on applications of different separation criteria to different values of the context information. The separation criterion and the corresponding value of the context information that yield the greatest cost reduction are selected to generate one or more new nodes of the context tree.

A node can begin as a leaf node and later be changed into a non-leaf node. For example, a first level of the context tree can include one node, which is a leaf node representing all of the first set of syntax elements to be coded. A separation criterion can be applied to a value of the context information to separate the first set of syntax elements. After the separation criterion is applied, that node becomes a non-leaf node representing the applied separation criterion, and leaf nodes are generated as child nodes of that non-leaf node. Each of the child nodes represents a data group that includes a subset of all of the first set of syntax elements. For example, where the context tree is a binary tree, applying the separation criterion to the value of the context information for the node at the first level of the context tree includes generating two child nodes of that node. The data groups represented by each of those child nodes can, for example, include one half of the first set of syntax elements. Alternatively, those data groups can include different amounts of the first set of syntax elements.
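A minimal sketch of one such separation follows. It assumes a threshold-style criterion of the form `ctx[index] < threshold` and represents each syntax element as a `(value, context vector)` pair; both choices, the function name `split_leaf`, and the sample data are hypothetical.

```python
def split_leaf(samples, ctx_index, threshold):
    """Separate a leaf's data group with the criterion ctx[ctx_index] < threshold."""
    left = [s for s in samples if s[1][ctx_index] < threshold]
    right = [s for s in samples if s[1][ctx_index] >= threshold]
    return left, right


# Each sample is (syntax element value, context vector); the values are made up.
samples = [(0, [0, 2]), (1, [1, 0]), (0, [0, 1]), (2, [3, 1]), (2, [2, 2])]

# The single first-level leaf becomes a non-leaf node with two child data groups.
left, right = split_leaf(samples, 0, 1)
```

Here the two child data groups hold two and three elements respectively, the "different amounts" case mentioned above.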

The nodes of the context tree are associated with cost reductions for the first set of syntax elements (and, for example, for subsequent sets of syntax elements coded using the context tree). The context tree is generated to determine the lowest cost for coding the syntax elements associated with a block of the video frame. The cost to code the syntax elements can be a computational cost. The computational cost can indicate, for example, the amount of computing resources used for the coding. For example, the cost to code the syntax elements can indicate, or be a function of some of, one or more parameters selected from the group consisting of the time to code the syntax elements, the space used to code the syntax elements, and the number of operations (e.g., arithmetic operations) performed to code the syntax elements. The cost to code the syntax elements can depend on the number of syntax elements to be coded. The number of syntax elements to be coded can be represented using a data group. The leaf nodes of the context tree represent different data groups and thus different possible numbers of syntax elements. As such, the cost to entropy code the syntax elements associated with a block of the video frame can be determined for each leaf node. For example, the cost to entropy code the syntax elements based on the data groups represented by the leaf nodes of the context tree can be calculated using the following formula:
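The formula itself is not reproduced in this text. One form consistent with the definitions that follow, given here as an assumption rather than as the formula of the disclosure, sums the entropy cost and the weighted size penalty over the data groups:

```latex
\mathrm{Cost} = \sum_{i} \Bigl( e(g_i) + \lambda \, r\bigl(\mathrm{size}(g_i)\bigr) \Bigr)
```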

Here, g_i is the i-th data group, e(g_i) is the entropy cost function of the data group g_i, r(size(g_i)) is a size penalty function having a positive output that decreases with the size of the data group g_i, and λ is the weight of the size penalty function r(size(g_i)). The size penalty function r(size(g_i)) can, for example, have a domain and range of R⁺ → R⁺. The entropy cost function can be calculated using the following formula:
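The entropy cost formula is not reproduced in this text. A standard form consistent with the definitions that follow, offered as an assumption (including the choice of a base-2 logarithm), is the data length multiplied by the entropy of the syntax values in the group:

```latex
e(g_i) = -\, n_i \sum_{k} p_i[k] \, \log_2 p_i[k]
```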

Here, n_i is the length of the data in the data group g_i, and p_i[k] is the probability that a syntax element in the data group g_i has the syntax value k. Each data group represented by a node of the context tree can be associated with a probability model.
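A minimal Python sketch of such a cost computation is shown below. It assumes that the total cost sums, over the data groups, an entropy cost plus λ times a size penalty, that the entropy cost is n_i times the empirical entropy in bits, and that the size penalty is r(size) = 1/size. These forms and the function names are illustrative assumptions, not the formulas of the disclosure.

```python
import math
from collections import Counter


def entropy_cost(group):
    """Assumed e(g_i): n_i times the empirical entropy (in bits) of the syntax values."""
    n = len(group)
    counts = Counter(group)
    return -n * sum((c / n) * math.log2(c / n) for c in counts.values())


def size_penalty(size):
    """Assumed r(size): positive and decreasing as the data group grows."""
    return 1.0 / size


def total_cost(groups, lam=1.0):
    """Assumed total: sum over non-empty data groups of e(g_i) + lam * r(size(g_i))."""
    return sum(entropy_cost(g) + lam * size_penalty(len(g)) for g in groups if g)


# Four syntax elements split evenly between two values cost 4 bits plus the penalty.
one_group = total_cost([[0, 0, 1, 1]], lam=2.0)
# Separating them by value removes the entropy cost but doubles the penalty terms.
two_groups = total_cost([[0, 0], [1, 1]], lam=2.0)
```

Under these assumptions the penalty term keeps the tree from fragmenting into very small data groups, since each new group adds its own r(size) term.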

Generating the context tree can include determining the cost reductions that can result from generating nodes that represent separated data groups of the first set of syntax elements. That is, when a separation criterion is applied to a leaf node, that leaf node becomes a non-leaf node (e.g., a parent node) having two or more child nodes. The cost reduction used in determining which node to separate using a separation criterion is calculated by subtracting the sum of the costs of the resulting child nodes from the cost of the parent node. The context tree being generated can have one or more leaf nodes that can become non-leaf nodes (e.g., parent nodes) by applying a separation criterion. Generating the context tree can therefore include determining which leaf node, at a given level of the context tree or in a given set of leaf nodes, yields the highest cost reduction for entropy coding the first set of syntax elements when separated.
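The selection can be sketched as follows. The per-group cost used here (length times empirical entropy in bits, plus λ divided by the group size) is an assumed form, and the names `cost`, `saving`, `best_leaf_to_split`, and the candidate data are hypothetical.

```python
import math
from collections import Counter


def cost(group, lam=1.0):
    """Assumed per-group cost: n * entropy (bits) + lam / n; an empty group costs nothing."""
    n = len(group)
    if n == 0:
        return 0.0
    h = -sum((c / n) * math.log2(c / n) for c in Counter(group).values())
    return n * h + lam / n


def saving(parent, children, lam=1.0):
    """Cost reduction: parent cost minus the sum of the resulting child costs."""
    return cost(parent, lam) - sum(cost(c, lam) for c in children)


def best_leaf_to_split(candidates, lam=1.0):
    """candidates maps a leaf name to (parent data group, resulting child data groups)."""
    return max(candidates, key=lambda name: saving(*candidates[name], lam))


candidates = {
    "A": ([0, 0, 1, 1], [[0, 0], [1, 1]]),   # separation isolates the two values
    "B": ([0, 1, 0, 1], [[0, 1], [0, 1]]),   # separation leaves both children mixed
}
chosen = best_leaf_to_split(candidates)
```

Leaf A is chosen because its separation removes the entropy cost entirely, whereas separating leaf B only adds penalty terms and yields a negative reduction.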

Determining to separate the leaf node that yields the highest cost reduction among a given set of leaf nodes of the context tree can include determining the cost reductions for entropy coding the data groups represented by those leaf nodes before and after a separation criterion is applied to them. The leaf node associated with the highest cost reduction can be separated to generate child nodes; as a further consequence, in some implementations, the data groups represented by the other leaf nodes in the set are not separated. Further implementations and examples for generating the context tree are described below with respect to FIG. 7.

At 606, a second set of syntax elements and the context information associated with it are identified. The second set of syntax elements may be associated with a block located, in raster scan order or another scan order, after the block that includes the first set of syntax elements used to generate the context tree. Alternatively, the second set of syntax elements may be associated with a different video frame than the first set of syntax elements used to generate the context tree. The context information used to code the second set of syntax elements may be represented as values of a second context vector. The second context vector may be the same as or different from the context vector that includes the values of the context information used to code the first set of syntax elements.

At 608, one of the nodes of the context tree is identified based on values of the context information associated with one of the syntax elements of the second set. The identified node represents a data group that includes that syntax element. Identifying the node based on the values of the context information can include, for example, applying to the values of the second context information associated with the syntax element some of the separation criteria that were used to separate the syntax elements associated with previously coded blocks (e.g., the criteria used to generate the context tree).

For example, where the context tree is a binary tree, the values of the context information can be resolved as true or false against some of the separation criteria. A first separation criterion, used to separate a first node at a first tree level into second and third nodes at a second tree level, can ask whether the value of the context information at a first index of the context vector is greater than three. If it is not, the separation criterion used to separate the second node into fourth and fifth nodes at a third tree level can be applied to the values of the context information. If it is, the separation criterion used to separate the third node into sixth and seventh nodes at the third tree level can be applied instead. This process can repeat until a single node is identified based on the values of the context information and the separation criteria.
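The binary-tree lookup in this example can be sketched as follows. Only the first criterion (first index greater than three) comes from the text. The `Node` class, the criteria at the second and third nodes, and the assignment of the fourth through seventh nodes to the true and false branches are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    index: int = 0        # context-vector index tested at this node
    threshold: int = 0    # criterion: context[index] > threshold
    if_false: Optional["Node"] = None
    if_true: Optional["Node"] = None

def find_leaf(root, context):
    """Walk from the root, resolving each criterion as true or false,
    until a node with no children (a leaf) is reached."""
    node = root
    while node.if_true is not None and node.if_false is not None:
        node = node.if_true if context[node.index] > node.threshold else node.if_false
    return node

root = Node(
    "node1", index=0, threshold=3,  # first criterion from the text
    # Hypothetical criteria for the second and third nodes:
    if_false=Node("node2", index=1, threshold=0,
                  if_false=Node("node4"), if_true=Node("node5")),
    if_true=Node("node3", index=1, threshold=2,
                 if_false=Node("node6"), if_true=Node("node7")),
)

print(find_leaf(root, [2, 5]).name)  # first index is not > 3, so node2 is tested next
print(find_leaf(root, [7, 1]).name)  # first index is > 3, so node3 is tested next
```

The leaf returned by `find_leaf` stands for the data group, and therefore the probability model, used to code the syntax element.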

At 610, that syntax element of the second set of syntax elements is coded according to a probability model associated with the identified node. The nodes of the context tree, or the data groups represented by those nodes, are each associated with a probability model. The probability model associated with a node or data group can reflect probabilities for the syntax elements of that node or data group. In some implementations, a probability model specifies (e.g., includes) probabilities only for the syntax elements of its node or data group, and does not specify (e.g., does not include) probabilities for syntax elements that are not in that data group.

A probability model can indicate, for example, the probability that a syntax element associated with a block of a video frame has a particular value, or the probability that it is present in that block or frame. For example, a probability model can include integer values that reflect the different probabilities that may be associated with one or more of the syntax elements. A maximum value can be defined for the probability model so that a given probability for a syntax element can be expressed as a percentage derived by dividing its integer value by the maximum value. For example, the maximum value for the probability model can be on a scale of 256. The probability for a syntax element may be reflected by the value 119. The probability model would then indicate a probability of 119/256 for that syntax element.
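The arithmetic in this example can be worked through in a few lines. The function name `probability` is hypothetical, and the ideal code length of -log2(p) bits is a standard information-theoretic estimate added here for illustration; the text itself only states the 119/256 ratio.

```python
import math

def probability(value, max_value=256):
    """Convert an integer model value to a probability on the given scale."""
    return value / max_value

p = probability(119)
print(p)              # 0.46484375
print(-math.log2(p))  # ideal bits to code a symbol with this probability
```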

Coding that syntax element of the second set according to the probability model associated with the identified node can include encoding the syntax element according to the probability model during an encoding process, or decoding the syntax element according to the probability model during a decoding process. The probabilities associated with the identified probability model are processed using entropy coding (e.g., at the entropy encoding stage 408 shown in FIG. 4 or the entropy decoding stage 502 shown in FIG. 5). For example, arithmetic coding can be used to encode or decode a syntax element whose probability meets a threshold, so that the syntax element is not encoded or decoded when its probability is too low.

Arithmetic coding can thus be performed to limit the total number of syntax elements encoded to the bitstream, such as to minimize the total size of the bitstream or the cost of transmitting it. Accordingly, during an encoding operation, after arithmetic coding is performed on the second set of syntax elements according to the identified probability model, the second set of syntax elements is encoded by being compressed into an encoded bitstream. Alternatively, during a decoding operation, after arithmetic coding is performed on the second set of syntax elements according to the identified probability model, the second set of syntax elements is decoded by being decompressed from the encoded bitstream.

In some implementations, technique 600 can include updating the context tree using the second set of syntax elements and then coding the second set of syntax elements using the updated context tree. For example, the probability models associated with nodes of the context tree can be recalculated based on the second set of syntax elements, the context information associated with it, or the like. The second set of syntax elements can then be coded based on the recalculated probability models. For example, recalculated cost reductions can be used to separate a leaf node, changing it into a non-leaf node associated with a cost reduction, which yields an estimated lowest cost for coding the second set of syntax elements.

In some implementations, the context tree can be generated based on data received within an encoded bitstream. For example, where technique 600 is performed to decode encoded syntax elements, data indicating the context tree can be received at a decoder within an encoded bitstream transmitted from an encoder, a relay device, or another computing device. The encoded bitstream includes an encoded video frame, and the encoded video frame includes the encoded block with which the encoded syntax elements are associated. The context tree can thus be generated based on information received, for example, from the encoder. For example, the data used to generate the bitstream can include data indicating the separation criteria to apply to respective items of context information for decoding the encoded syntax elements.

In some implementations, the syntax elements associated with a block can be encoded or decoded using data stored in a cache. For example, the data stored in the cache can indicate how syntax elements have been separated into data groups. The indicated separation can be used, for example, to generate a context tree, to update a context tree, or to verify that a current context tree is consistent with the indicated separation.

In some implementations, the probability model according to which a syntax element is coded can be updated. For example, the probability model can be updated in response to encoding or decoding the final block of a video frame. Updating the probability model can include counting the number of times a syntax element is associated with blocks of the video frame. For example, the count can be updated in response to each applicable block being encoded or decoded. The probability model can then be updated based on the total count obtained after the final block of the video frame is encoded or decoded. For example, if the count is greater than a threshold, the probability model can be updated to reflect an increased probability that the syntax element has a particular value or is present. If the count is less than the threshold, the probability model can be updated to reflect a decreased probability. The threshold can be, for example, the total count of that syntax element in the previous video frame.
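A count-based update of this kind might look like the sketch below. The text does not say how much the probability changes, so the fixed `step`, the clamping to the range 1 to 255, and the function name `update_probability` are all assumptions.

```python
def update_probability(prob, count, threshold, step=8, max_value=256):
    """Nudge an integer probability (on a 0..max_value scale) after a frame.

    `count` is how often the syntax element occurred in the frame just coded;
    `threshold` could be, e.g., its count in the previous frame.
    The step size and the clamping range are illustrative assumptions.
    """
    if count > threshold:
        prob += step      # element was more frequent: raise its probability
    elif count < threshold:
        prob -= step      # element was less frequent: lower its probability
    return max(1, min(max_value - 1, prob))

print(update_probability(119, count=40, threshold=30))  # 127
print(update_probability(119, count=10, threshold=30))  # 111
```

Because the update depends only on counts that both sides can observe, an encoder and a decoder running the same rule would stay in step without extra signaling.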

Updates to a probability model can be made independently by each of the encoder and the decoder. For example, the encoder and the decoder can separately store probability models usable for encoding and decoding syntax elements according to the techniques of this disclosure. Alternatively, updates to a probability model can be determined at the encoder and communicated to the decoder. For example, the encoder can update the probabilities associated with a probability model after a video frame is encoded and synchronize those updated probabilities with the decoder so that they can be used to decode the encoded video frame.

In some implementations, identifying the context information for coding the syntax elements associated with a block can include selecting one of the sets of context information used to code syntax elements. The encoder can be configured to select one of N sets available to it and to generate, receive, or otherwise identify a context vector based only on the possible values of the selected set. For example, the encoder can select the set of context information most often used to code syntax elements associated with other blocks of the same video frame. In another example, the encoder can select the set of context information that was used to code the syntax elements of the most recently encoded block of the video frame. The context vector does not include values of the other sets of context information. Each index of the context vector therefore reflects a single value of the selected set of context information.

In some implementations, the encoder or decoder can determine whether a type of context information within a set of context information is suitable for coding the syntax elements associated with a block before determining its value. For example, one or more of the types of context information in the set may not relate to or provide meaningful information usable for encoding or decoding the syntax elements associated with the block. The encoder or decoder may not determine values for irrelevant types of context information, or may otherwise omit or not use them in the context vector. In some implementations, the decoder may not be configured to generate a context vector. For example, the values of the context information used to encode a syntax element can be communicated from the encoder to the decoder, such as by using a context vector. The decoder can then decode the syntax element using the values included in the context vector received from the encoder.

Referring next to FIG. 7, a flowchart of a technique 700 for generating a context tree for coding syntax elements associated with a block of a video frame is shown. Technique 700 can include one or more of the operations of technique 600 shown in FIG. 6, such as those performed at 604. As such, technique 700 can be performed as part of a technique for coding syntax elements associated with a block of a video frame, such as technique 600. Alternatively, technique 700 can be performed separately from a technique for coding syntax elements. For example, the operations for generating a context tree can be independent of the operations for coding syntax elements. This may be the case, for example, where the context tree is generated using a sample set of syntax elements and is trained and updated before the context tree itself is used to code syntax elements.

At 702, a separation criterion to use for a first data group is determined. The first data group can be a data group that includes all of the syntax elements to be encoded or decoded using the context tree being generated. The separation criterion can be one of a plurality of separation criteria usable to evaluate context information. For example, the encoder or decoder can have access to a list of separation criteria, which may be stored in a database or other data store. Each separation criterion in the list can include an expression that returns a number, a range of numbers, or a binary value, depending on the type of context tree. For example, where the context tree is a binary tree, each separation criterion can include an expression that returns true or false when applied to a value of context information.

Determining the separation criterion to apply to the first data group can include applying different separation criteria to different values of the context information for the syntax elements to be encoded or decoded in order to determine candidate cost reductions. Each candidate cost reduction can represent one separation criterion and the corresponding value of context information to which it is applied. The candidate cost reductions can be compared to determine the highest among them. The separation criterion that yields the highest cost reduction can be determined as the separation criterion to use for the first data group. For example, determining the candidate cost reductions can include performing a breadth-first search over the already generated nodes of the context tree to identify one or more candidate data groups to separate. Different ones of the candidate cost reductions are thus associated with different ones of the candidate data groups.

At 704, the first data group is separated into a second data group and a third data group using the determined separation criterion. For example, where the context tree is a binary tree, separating the first data group using the determined separation criterion can include determining which syntax elements of the first data group resolve as true when the separation criterion is used and which resolve as false. For example, the second data group can include the syntax elements of the first data group that resolve as true, and the third data group can include the syntax elements of the first data group that resolve as false. The first data group can be represented in the context tree by a first node. Separating the first data group can thus include generating a second node representing the second data group in the context tree and generating a third node representing the third data group in the context tree.
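The operations at 702 and 704 can be sketched together: evaluate each candidate separation criterion, keep the one with the highest candidate cost reduction, and split the group into its true and false parts. This sketch assumes binary criteria of the form "context value at an index is greater than a threshold" and an empirical-entropy cost; the function names and sample data are hypothetical.

```python
import math
from collections import Counter

def cost(symbols):
    """Empirical entropy cost in bits (assumed cost measure)."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

def partition(group, criterion):
    """Split (context, symbol) pairs into the groups that resolve true and false."""
    index, threshold = criterion
    true_group = [item for item in group if item[0][index] > threshold]
    false_group = [item for item in group if not item[0][index] > threshold]
    return true_group, false_group

def best_criterion(group, criteria):
    """Return (criterion, savings) for the highest candidate cost reduction."""
    parent_cost = cost([s for _, s in group])
    best = None
    for criterion in criteria:
        t, f = partition(group, criterion)
        savings = parent_cost - cost([s for _, s in t]) - cost([s for _, s in f])
        if best is None or savings > best[1]:
            best = (criterion, savings)
    return best

# Each item pairs a context vector with the syntax element value it accompanied.
group = [((0, 5), 0), ((1, 1), 0), ((4, 5), 1), ((6, 1), 1), ((5, 5), 1), ((2, 1), 0)]
criteria = [(0, 3), (1, 3)]  # (index, threshold) pairs, chosen for illustration

criterion, savings = best_criterion(group, criteria)
second_group, third_group = partition(group, criterion)
print(criterion, len(second_group), len(third_group))
```

In this sample data the first context index predicts the syntax element value exactly, so the criterion on that index wins over the criterion on the second index.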

At 706, a separation criterion to use for each of the second and third data groups is determined. Determining the separation criterion to use for the second data group includes determining a set of candidate cost reductions that result from applying different separation criteria to different values of the context information for a first portion of the syntax elements, namely a portion of the syntax elements whose candidate cost reductions led to the separation of the first data group into the second and third data groups. The separation criterion that yields the highest candidate cost reduction is selected for the second data group. Determining the separation criterion to use for the third data group likewise includes determining a set of candidate cost reductions that result from applying different separation criteria to different values of the context information for a second portion of those syntax elements. The separation criterion that yields the highest candidate cost reduction is selected for the third data group. The first portion and the second portion can be different portions of the syntax elements whose candidate cost reductions led to the separation of the first data group. Alternatively, the first portion and the second portion can share some or all of those syntax elements.

At 708, the highest cost reduction resulting from using the separation criterion selected for the second data group is compared with the highest cost reduction resulting from using the separation criterion selected for the third data group. For example, based on that comparison, a determination can be made that the highest cost reduction resulting from using the separation criterion selected for the second data group is greater than the highest cost reduction resulting from using the separation criterion selected for the third data group.

At 710, the second data group can be separated in response to determining that the highest cost reduction resulting from using the separation criterion selected for the second data group is greater than the highest cost reduction resulting from using the separation criterion selected for the third data group. For example, the second data group can be separated into a fourth data group and a fifth data group using the separation criterion selected for the second data group. Separating the second data group can further include generating a node representing the fourth data group in the context tree and generating a node representing the fifth data group in the context tree.
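The comparison at 708 and the choice at 710 amount to picking, among the current leaf data groups, the one whose best separation criterion saves the most. A minimal sketch follows, assuming the best criterion and its cost reduction have already been computed for each leaf; the names and numbers are illustrative and not taken from the disclosure.

```python
def choose_leaf_to_split(candidates):
    """candidates maps a leaf name to (criterion, savings).

    Returns the leaf whose selected criterion gives the highest cost reduction,
    along with that criterion and the reduction itself.
    """
    leaf = max(candidates, key=lambda name: candidates[name][1])
    criterion, savings = candidates[leaf]
    return leaf, criterion, savings

# Hypothetical best criteria, as (index, threshold), and savings in bits.
candidates = {
    "group2": ((1, 0), 3.2),
    "group3": ((2, 5), 1.7),
}
leaf, criterion, savings = choose_leaf_to_split(candidates)
print(leaf)  # group2
```

Only the chosen leaf is separated in this step; the other leaf keeps its data group intact until a later comparison selects it.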

At 712, a context tree is generated that includes nodes representing the first, second, third, fourth, and fifth data groups. Generating the context tree can include generating those nodes. For example, the node representing the first data group may not be generated when, or before, the first data group is separated into the second and third data groups. Similarly, the node representing the fourth data group may not be generated when, or before, the second data group is separated into the fourth and fifth data groups. Instead, all of these nodes can be generated at or about the same time after the separation of the data groups is complete. Alternatively, generating the context tree can include associating already generated nodes (e.g., nodes generated when, or before, the respective data groups were separated) with the context tree.

In some implementations, technique 700 can include determining, before separating the first data group into the second and third data groups, whether the total cost for entropy coding the second and third data groups is less than the cost for entropy coding the first data group. For example, determining the separation criterion to apply to the values of the context information can include calculating the cost of entropy coding each of the first, second, and third data groups. The cost for entropy coding the first data group can then be compared with the total cost for entropy coding the second and third data groups. For example, if it is determined that the total cost for entropy coding the data groups that the nodes would represent is not less than the cost for entropy coding the data group that would be separated to produce them, those nodes may not be generated.

In some implementations, the comparison between the cost for entropy coding the first data group and the total cost for entropy coding the second and third data groups can be performed after the nodes are generated as part of separating the data group. For example, if it is determined that the total cost for entropy coding the data groups represented by newly generated nodes is not less than the cost for entropy coding the data group that was separated to produce them, those newly generated nodes can be removed from the context tree.

In some implementations, a data group is separated into two or more data groups based on the amount of cost reduction that results from applying a separation criterion to the values of the context information. For example, the separation of a data group can be responsive to a determination that the highest candidate cost reduction meets a reduction threshold. The reduction threshold can be an integer, a floating-point number, or another value representing a minimum decrease in the cost of coding the syntax elements associated with a block. If applying the separation criterion to the values of the context information does not achieve that minimum decrease, the data group may not be separated. For example, the reduction threshold can be used to prevent computing resources from being consumed without improving the coding efficiency for the syntax elements associated with a block.
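A guard that combines the two checks described above (the child groups must cost less in total than the parent, and the reduction must meet a reduction threshold) might be written as follows. The function name, the default threshold of zero, and the sample costs are assumptions for illustration.

```python
def should_split(parent_cost, child_costs, reduction_threshold=0.0):
    """Decide whether separating a data group is worthwhile.

    The split is accepted only if the children cost strictly less than the
    parent in total and the cost reduction meets the reduction threshold.
    """
    savings = parent_cost - sum(child_costs)
    return savings > 0 and savings >= reduction_threshold

print(should_split(100.0, [40.0, 45.0], reduction_threshold=10.0))  # True: saves 15 bits
print(should_split(100.0, [48.0, 49.0], reduction_threshold=10.0))  # False: saves only 3 bits
print(should_split(100.0, [60.0, 40.0]))                            # False: no reduction at all
```

The same predicate could be applied before nodes are created, or afterwards to decide whether newly generated nodes should be removed from the tree.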

Systems for coding syntax elements using a context tree are described next with reference to FIGS. 8 and 9. The systems shown in FIGS. 8 and 9 can include using all or a portion of one or both of technique 600 shown in FIG. 6 and technique 700 shown in FIG. 7. For example, the systems can represent hardware and/or software components used to perform all or a portion of one or both of technique 600 and technique 700.

FIG. 8 is a block diagram of a system for encoding syntax elements associated with a current block of a video frame. The system for encoding syntax elements can be implemented by or using an encoder, such as the encoder 400 shown in FIG. 4. The system includes context information 800 corresponding to previously coded syntax elements 802. The context information 800 can be one or more sets of context information defined for use by the encoder. The context information 800 can include a context vector that is generated, received, or otherwise identified based on values of a set of context information. The syntax elements 802 are syntax elements associated with a block to be encoded.

The context information 800 and the previously coded syntax elements 802 are cached in a context cache 804 and a syntax element cache 806, respectively. The data stored in the context cache 804 and the syntax element cache 806 are then used as input for generating a context tree 808. For example, a context vector included in the context information 800 can be cached in the context cache 804 and later received by a software component configured to generate the context tree 808. For example, the values of context information included in the context vector can be used, together with the separation criteria available to the encoder, to generate the context tree 808 based on the lowest cost for encoding the syntax elements in the syntax element cache 806. After the context tree 808 is generated, a probability model 810 can be estimated for each leaf node of the context tree based on the data group that the leaf node represents.

The probability model is used to encode incoming syntax elements (e.g., syntax elements associated with a second block of a video frame, such as where the syntax elements 802 used to generate the context tree 808 are associated with a first block of that video frame). For example, the previously coded syntax elements 802 can refer to a first set of syntax elements associated with a first block of a video frame, and the incoming syntax elements can refer to a second set of syntax elements associated with a second block of the video frame, such as the current block to be encoded. To encode an incoming syntax element, the context information (e.g., of the context information 800) corresponding to that incoming syntax element can be identified. That context information can then be applied to identify the node of the context tree 808 that results in the lowest cost for encoding the incoming syntax element. The probability model 810 associated with the identified node of the context tree 808 can then be identified. The probability model 810 can be one of a plurality of probability models stored in a probability table of probabilities for the different context information available to the encoder. Identifying the probability model 810 can include, for example, querying the probability table using information associated with the identified node of the context tree 808.

Accordingly, the context information corresponding to the incoming syntax element to be encoded is passed to the context tree 808 to request or otherwise identify the probability model 810. The probability model 810 is then passed to an arithmetic coder 812. The arithmetic coder 812 may be a software module or similar component configured to perform arithmetic coding using the probabilities of the probability model 810. The incoming syntax element is also passed to the arithmetic coder 812. The arithmetic coder 812 then entropy encodes the incoming syntax element according to the probability model 810. An encoded block 814 is generated as a result of the entropy encoding by the arithmetic coder 812. The encoded block 814 includes the incoming syntax element in encoded form. The encoded block 814 may be included in an encoded bitstream, such as the compressed bitstream 420 shown in FIG. 4.
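The relationship between an identified leaf node, the probability table, and the arithmetic coder can be illustrated with a minimal Python sketch. The table contents, the binary symbol alphabet, and the names `PROBABILITY_TABLE` and `ideal_code_length` are assumptions made for illustration; the sketch does not implement an arithmetic coder and only computes the ideal code length, -log2(p), that such a coder approaches.

```python
import math

# Hypothetical probability table keyed by the id of a context-tree leaf node.
# Each entry maps a binary syntax element value to its estimated probability.
PROBABILITY_TABLE = {
    1006: {0: 0.75, 1: 0.25},
    1018: {0: 0.5, 1: 0.5},
}

def ideal_code_length(leaf_id, symbol):
    """Bits an ideal arithmetic coder would spend on `symbol` under the
    probability model looked up for the given leaf node."""
    model = PROBABILITY_TABLE[leaf_id]  # query the table with the node id
    return -math.log2(model[symbol])

# Total ideal cost of four syntax elements that all map to leaf node 1006.
bits = sum(ideal_code_length(1006, s) for s in [0, 0, 1, 0])
```

A symbol that the model considers likely costs fewer bits, which is why grouping syntax elements into leaves with skewed distributions can reduce the coding cost.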

The context cache 804 and the syntax element cache 806 can store the syntax elements 802 and the corresponding context information 800 that have been encoded. The context cache 804 and the syntax element cache 806 may be used to generate the context tree 808. They may also be used to update the existing context tree 808 when a subsequent set of syntax elements is received and cached in the syntax element cache 806 or is otherwise identified for encoding. The syntax element cache 806 may be a cache for storing the syntax elements 802, incoming syntax elements, and/or other sets of encoded syntax elements. For example, the subsequent set of syntax elements may be compared with the syntax elements stored in the syntax element cache 806. If a match with the information stored in the syntax element cache 806 is determined, the context tree 808 can be used without modification.
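The reuse of an existing context tree on a cache match might be sketched as follows. The class `TreeCache`, the callback `build_tree`, and the interpretation of a "match" as exact equality of the cached and incoming syntax elements are all assumptions for illustration; the text does not define how a match is determined, and a real system might update the tree rather than rebuild it.

```python
class TreeCache:
    """Keeps the last set of syntax elements and the tree built from them."""

    def __init__(self, build_tree):
        self._build_tree = build_tree  # hypothetical tree-construction callback
        self._cached_elements = None
        self._tree = None
        self.rebuilds = 0

    def tree_for(self, syntax_elements, contexts):
        key = tuple(syntax_elements)
        # Rebuild only when nothing is cached or the incoming set differs
        # from the cached set; otherwise reuse the tree unmodified.
        if self._tree is None or key != self._cached_elements:
            self._tree = self._build_tree(syntax_elements, contexts)
            self._cached_elements = key
            self.rebuilds += 1
        return self._tree

# Stand-in builder that just records what it was built from.
cache = TreeCache(lambda elems, ctxs: {"built_from": tuple(elems)})
first = cache.tree_for([1, 0, 1], [[5], [2], [4]])
second = cache.tree_for([1, 0, 1], [[5], [2], [4]])  # match: tree reused
third = cache.tree_for([0, 0, 1], [[5], [2], [4]])   # no match: rebuilt
```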

FIG. 9 is a block diagram of a system for decoding encoded syntax elements associated with an encoded block of an encoded video frame. The system may be implemented by or using a decoder, such as the decoder 500 shown in FIG. 5. An encoded bitstream 900 represents a video stream and includes encoded blocks of the video stream. For example, the encoded bitstream 900 may be the compressed bitstream 420 shown in FIG. 4. An encoded block of the encoded bitstream 900 may be, for example, the encoded block 814 shown in FIG. 8.

The system uses context information 902 corresponding to previously coded syntax elements 904. The context information 902 may be one or more sets of context information defined for use by the decoder. The context information 902 and the previously coded syntax elements 904 are cached in a context cache 906 and a syntax element cache 908, respectively. The data stored in the context cache 906 and the syntax element cache 908 may be used as input to generate a context tree 910. For example, the context information 902 may include a context vector that is generated, received, or otherwise identified based on the values of a set of context information. For example, the context vector may be received from an encoder. The context vector is cached in the context cache 906. The syntax elements 904 are previously coded syntax elements used together with the context information 902 to generate the context tree 910. The context tree 910 may be the context tree 808 shown in FIG. 8. For example, the context tree 910 may be generated in the same manner in which the context tree 808 was generated. In another example, the context tree 808 may be communicated to the decoder for use as the context tree 910 in decoding syntax elements. The generated context tree 910 is then used to identify a probability model for decoding incoming syntax elements (e.g., syntax elements associated with a second encoded block of an encoded video frame, such as where the syntax elements 904 used to generate the context tree 910 are associated with a first encoded block of that encoded video frame). For example, the previously coded syntax elements 904 may refer to a first set of syntax elements associated with a first block of the video frame, and the incoming syntax elements may refer to a second set of syntax elements associated with a second block of the video frame, such as the encoded block to be decoded.

A probability model 912 is identified based on the context tree 910, such as in the same manner described above with respect to the probability model 810 shown in FIG. 8 (e.g., by passing the context information corresponding to the incoming syntax element to be decoded to the context tree 910 to identify the probability model 912). The probability model 912 is then passed to an arithmetic coder 914 to decode the incoming syntax element. For example, the arithmetic coder 914 may be a software module or similar component configured to entropy decode the incoming syntax element using the probabilities of the probability model 912. A decoded block 916 is generated in response to the arithmetic coder 914 entropy decoding the incoming syntax element using the probabilities of the probability model 912. The decoded block 916 may then be output as part of a video stream, such as the output video stream 516 shown in FIG. 5.

The context cache 906 and the syntax element cache 908 can store the syntax elements 904 and the corresponding context information 902 that have been decoded. The context cache 906 and the syntax element cache 908 may be used to generate the context tree 910. They may also be used to update the existing context tree 910 when a subsequent set of encoded syntax elements is received and cached in the syntax element cache 908 or is otherwise identified for decoding. The syntax element cache 908 may be a cache for storing the syntax elements 904, incoming syntax elements, and/or other sets of encoded syntax elements before they are decoded. For example, the subsequent set of encoded syntax elements may be compared with the encoded syntax elements stored in the syntax element cache 908. If a match with the information stored in the syntax element cache 908 is determined, the context tree 910 can be used without modification.

Referring next to FIGS. 10A-11, the generation of an example context tree is described. The example context tree generated as shown in FIGS. 10A-11 may be generated by performing all or a portion of one or both of the technique 600 shown in FIG. 6 or the technique 700 shown in FIG. 7. Further, the example context tree generated as shown in FIGS. 10A-11 may be generated using one or both of the systems shown in FIGS. 8-9. For example, the example context tree may be generated by an encoder using the system shown in FIG. 8 or by a decoder using the system shown in FIG. 9. The encoder or decoder may generate the tree as part of a technique for coding syntax elements, for example, based on the technique 600. Alternatively, the encoder or decoder may generate the tree independently of the coding of syntax elements, for example, based on the technique 700.

FIG. 10A is a diagram showing an example of a first stage for generating a context tree. For example, FIG. 10A shows the generation of a first level of the example context tree. A node 1000 is a leaf node representing a data group that includes all of the syntax elements associated with a block to be encoded or decoded. The data group represented by the node 1000 may be separated into data groups represented by a node 1002 and a node 1004 by applying a separation criterion to a value of the context information. As described above, the particular separation criterion and value of the context information that result in the separation of the node 1000 may be determined based on candidate cost reductions, which are determined by applying different separation criteria to different values of the context information. The node 1000 is separated by applying the separation criterion of being greater than 3 to the value of the context information located at the [0] index of the context vector.
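The separation of the root data group can be sketched in Python as a simple partition. The sample data and the helper name `split_group` are hypothetical, and the text does not state which of the two resulting groups corresponds to the node 1002 and which to the node 1004, so the sketch leaves the groups unnumbered.

```python
def split_group(group, index, predicate):
    """Partition (context_vector, syntax_element) pairs by applying a
    separation criterion to the context value at the given index."""
    satisfied = [item for item in group if predicate(item[0][index])]
    not_satisfied = [item for item in group if not predicate(item[0][index])]
    return satisfied, not_satisfied

# Hypothetical data group for the root node: (context vector, syntax element).
group_1000 = [([5, 1, 0], 1), ([2, 0, 1], 0), ([4, 2, 2], 1), ([0, 1, 3], 0)]

# Criterion from the example: value at index [0] of the context vector > 3.
group_a, group_b = split_group(group_1000, 0, lambda value: value > 3)
```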

FIG. 10B is a diagram showing an example of a second stage for generating the context tree. For example, FIG. 10B shows the generation of a second level of the example context tree. The nodes 1002 and 1004 are generated in response to the separation of the data group represented by the node 1000 in the first stage shown in FIG. 10A. The data group represented by the node 1000 is separated into the data groups represented by the nodes 1002 and 1004 in response to a determination that the total cost for coding the syntax elements included in the data groups represented by the nodes 1002 and 1004 is less than the cost for coding the syntax elements included in the data group represented by the node 1000.
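The cost comparison that triggers a separation can be illustrated with a short Python sketch. This passage does not define the cost function, so the sketch assumes the cost of a data group is the number of bits needed to code its syntax elements with the group's own empirical distribution (count times entropy); the names `coding_cost` and `should_split` are hypothetical.

```python
import math
from collections import Counter

def coding_cost(symbols):
    """Assumed cost: bits to code `symbols` under their empirical distribution."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

def should_split(parent, child_a, child_b):
    """Split only if the children together are cheaper to code than the parent."""
    return coding_cost(child_a) + coding_cost(child_b) < coding_cost(parent)

# A mixed parent group costs 4 bits; two pure child groups cost 0 bits.
useful = should_split([0, 0, 1, 1], [0, 0], [1, 1])
# Children that are as mixed as the parent bring no saving.
useless = should_split([0, 1, 0, 1], [0, 1], [0, 1])
```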

The nodes 1002 and 1004 are leaf nodes. The node 1000 becomes a non-leaf node in response to the generation of the nodes 1002 and 1004. The node 1002 represents a data group that includes some of the syntax elements associated with the block to be encoded or decoded. The node 1004 represents a data group that includes the remainder of those syntax elements. The data group represented by the node 1002 can be separated into a data group represented by a node 1006 and a data group represented by a node 1008. The data group represented by the node 1004 can be separated into a data group represented by a node 1010 and a data group represented by a node 1012.

A separation criterion is selected for separating the data group represented by the node 1002. That separation criterion includes the expression that the remainder modulo 2 is less than 1 and is applied to the value of the context information located at the [2] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1002 is 20. A different separation criterion is selected for separating the data group represented by the node 1004. That separation criterion includes the expression of being less than 2 and is applied to the value of the context information located at the [1] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1004 is 15. Accordingly, the cost reduction resulting from separating the data group represented by the node 1002 into the data groups represented by the nodes 1006 and 1008 is greater than the cost reduction resulting from separating the data group represented by the node 1004 into the data groups represented by the nodes 1010 and 1012.
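Selecting a separation criterion for a node can be sketched as a search over candidate (index, expression) pairs that keeps the one with the largest cost reduction. The two candidates below mirror the expressions named in the example; the sample data, the entropy-based `coding_cost`, and the helper `best_split` are assumptions for illustration and do not reproduce the reductions of 20 and 15 given in the text.

```python
import math
from collections import Counter

def coding_cost(symbols):
    """Assumed cost: bits to code `symbols` under their empirical distribution."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

# Candidate criteria: (label, context vector index, expression on the value).
CANDIDATES = [
    ("ctx[2] % 2 < 1", 2, lambda v: v % 2 < 1),
    ("ctx[1] < 2", 1, lambda v: v < 2),
]

def best_split(group):
    """Return (cost reduction, label) of the best candidate for `group`,
    a list of (context_vector, syntax_element) pairs."""
    parent_cost = coding_cost([s for _, s in group])
    best = (0.0, None)
    for label, index, expr in CANDIDATES:
        yes = [s for ctx, s in group if expr(ctx[index])]
        no = [s for ctx, s in group if not expr(ctx[index])]
        reduction = parent_cost - coding_cost(yes) - coding_cost(no)
        if reduction > best[0]:
            best = (reduction, label)
    return best

# Hypothetical data in which the parity of ctx[2] predicts the syntax element.
group = [([0, 0, 0], 0), ([0, 3, 2], 0), ([0, 1, 1], 1), ([0, 2, 3], 1)]
reduction, label = best_split(group)
```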

FIG. 10C is a diagram showing an example of a third stage for generating the context tree. For example, FIG. 10C shows the generation of a third level of the example context tree. The nodes 1006 and 1008 are generated in response to the separation of the data group represented by the node 1002 in the second stage shown in FIG. 10B. The data group represented by the node 1002 is separated into the data groups represented by the nodes 1006 and 1008 in response to a determination that the total cost for coding the syntax elements included in the data groups represented by the nodes 1006 and 1008 is less than the cost for coding the syntax elements included in the data group represented by the node 1002.

The nodes 1006 and 1008 are leaf nodes. The node 1002 becomes a non-leaf node in response to the generation of the nodes 1006 and 1008. The node 1006 represents a data group that includes some of the syntax elements included in the data group represented by the node 1002. The node 1008 represents a data group that includes the remainder of those syntax elements. The data group represented by the node 1006 can be separated into a data group represented by a node 1014 and a data group represented by a node 1016. The data group represented by the node 1008 can be separated into a data group represented by a node 1018 and a data group represented by a node 1020.

A separation criterion is selected for separating the data group represented by the node 1006. That separation criterion includes the expression that the remainder is equal to minus one and is applied to the value of the context information located at the [1] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1006 is 7. A different separation criterion is selected for separating the data group represented by the node 1008. That separation criterion includes the expression of being greater than 1 and is applied to the value of the context information located at the [0] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1008 is 18. Accordingly, the cost reduction resulting from separating the data group represented by the node 1008 into the data groups represented by the nodes 1018 and 1020 is greater than the cost reduction resulting from separating the data group represented by the leaf node 1006 into the data groups represented by the nodes 1014 and 1016, and is also greater than the cost reduction resulting from separating the data group represented by the leaf node 1004 into the data groups represented by the nodes 1010 and 1012.

FIG. 10D is a diagram showing an example of a fourth stage for generating the context tree. For example, FIG. 10D shows the generation of a fourth level of the example context tree. The nodes 1018 and 1020 are generated in response to the separation of the data group represented by the node 1008 in the third stage shown in FIG. 10C. The data group represented by the node 1008 is separated into the data groups represented by the nodes 1018 and 1020 in response to a determination that the total cost for coding the syntax elements included in the data groups represented by the nodes 1018 and 1020 is less than the cost for coding the syntax elements included in the data group represented by the node 1008.

The nodes 1018 and 1020 are leaf nodes. The node 1008 becomes a non-leaf node in response to the generation of the nodes 1018 and 1020. The node 1018 represents a data group that includes some of the syntax elements included in the data group represented by the node 1008. The node 1020 represents a data group that includes the remainder of those syntax elements. The data group represented by the node 1018 can be separated into a data group represented by a node 1022 and a data group represented by a node 1024. The data group represented by the node 1020 can be separated into a data group represented by a node 1026 and a data group represented by a node 1028.

A separation criterion is selected for separating the data group represented by the node 1018. That separation criterion includes the expression that the remainder is less than 1 and is applied to the value of the context information located at the [1] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1018 is 6. A different separation criterion is selected for separating the data group represented by the node 1020. That separation criterion includes the expression that the remainder modulo 2 is equal to 1 and is applied to the value of the context information located at the [2] index of the context vector. The cost reduction resulting from using that separation criterion for the node 1020 is 3.

Accordingly, the cost reduction resulting from separating the data group represented by the node 1018 into the data groups represented by the nodes 1022 and 1024 is greater than the cost reduction resulting from separating the data group represented by the node 1020 into the data groups represented by the nodes 1026 and 1028. However, a determination may be made that the total cost for coding the syntax elements included in the data groups represented by the nodes 1022 and 1024 is not significantly less than the cost for coding the syntax elements included in the data group represented by the node 1018. As a result of that determination, the data group represented by the node 1018 is not separated, and no nodes are generated at a fifth level of the example context tree.
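The staged growth described for FIGS. 10A-10D can be sketched as a greedy loop that always separates the leaf with the largest cost reduction and stops when no reduction is significant. The cost reductions below are the ones given in the example; the threshold of 10 and the order in which the node 1004 is separated are assumptions, since the text only says that the saving for the node 1018 is not significant. The cost reductions for the nodes 1010 and 1012 are not stated and are treated as 0.

```python
import heapq

# Children produced by separating each node, and the stated cost reduction.
CHILDREN = {1002: (1006, 1008), 1004: (1010, 1012), 1006: (1014, 1016),
            1008: (1018, 1020), 1018: (1022, 1024), 1020: (1026, 1028)}
REDUCTION = {1002: 20, 1004: 15, 1006: 7, 1008: 18, 1018: 6, 1020: 3}

def grow(start_leaves, min_reduction):
    """Separate the leaf with the largest reduction until the best remaining
    reduction falls below `min_reduction` (an assumed stopping threshold)."""
    # heapq is a min-heap, so reductions are stored negated.
    heap = [(-REDUCTION.get(n, 0), n) for n in start_leaves]
    heapq.heapify(heap)
    split_order, leaves = [], set(start_leaves)
    while heap:
        negated, node = heapq.heappop(heap)
        if -negated < min_reduction:
            break
        split_order.append(node)
        leaves.remove(node)
        for child in CHILDREN[node]:
            leaves.add(child)
            heapq.heappush(heap, (-REDUCTION.get(child, 0), child))
    return split_order, sorted(leaves)

# Start after the root node 1000 has been separated into 1002 and 1004.
order, leaves = grow([1002, 1004], min_reduction=10)
```

Under these assumptions the resulting leaves are the nodes 1006, 1010, 1012, 1018, and 1020, which agrees with the set of nodes listed for the context tree of FIG. 11.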

FIG. 11 is a diagram showing an example of a context tree 1100. The context tree 1100 is a binary tree that includes the nodes 1000, 1002, 1004, 1006, 1008, 1010, 1012, 1018, and 1020 generated in the first, second, third, and fourth stages shown in FIGS. 10A-10D. For example, although not described above, the nodes 1010 and 1012 may be generated in response to a determination that the total cost for coding the syntax elements included in the data groups represented by the nodes 1010 and 1012 is less than the cost for coding the syntax elements included in the data group represented by the node 1004.
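Identifying the leaf node for a given context vector in a tree like the context tree 1100 can be sketched as a walk from the root. The criteria are the ones given in the example for the nodes 1000, 1002, 1004, and 1008. Which child is reached when a criterion is satisfied is not stated in the text; the mapping below is an assumption chosen so that every leaf is reachable, and the dictionary layout and the name `find_leaf` are hypothetical.

```python
# node id -> (context index, criterion, child if satisfied, child otherwise)
TREE_1100 = {
    1000: (0, lambda v: v > 3, 1004, 1002),
    1002: (2, lambda v: v % 2 < 1, 1006, 1008),
    1004: (1, lambda v: v < 2, 1010, 1012),
    1008: (0, lambda v: v > 1, 1018, 1020),
}

def find_leaf(ctx, node=1000):
    """Follow separation criteria from the root until a leaf node is reached."""
    while node in TREE_1100:  # non-leaf nodes are the keys of the mapping
        index, criterion, if_true, if_false = TREE_1100[node]
        node = if_true if criterion(ctx[index]) else if_false
    return node

leaf = find_leaf([2, 0, 3])  # 2 > 3 fails, 3 % 2 < 1 fails, 2 > 1 holds
```

The returned node id could then be used to look up the probability model estimated for that leaf.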

The context tree 1100 may be used for subsequent coding of syntax elements. Alternatively, the context tree 1100 may be trained before being used to code syntax elements without further modification. For example, the context tree 1100 may have been generated based on N syntax elements. Another N syntax elements can be used to train the context tree 1100 and to verify that the coding efficiency for those N syntax elements is improved over that of the first N syntax elements. The context tree 1100 may then be used for further coding of subsequent syntax elements without further modification.
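One possible form of such a verification is sketched below in Python. It is an assumption rather than the procedure of the disclosure: probability models are estimated per leaf from a first batch, and the cost of a second batch is compared between the tree and a single-model baseline. The one-level tree, the add-one smoothing, the binary alphabet, and all names are hypothetical, and the sketch assumes every leaf reached by the second batch was seen in the first batch.

```python
import math
from collections import Counter, defaultdict

def estimate_models(samples, leaf_of):
    """Per-leaf probabilities of binary symbols with add-one smoothing."""
    counts = defaultdict(Counter)
    for ctx, sym in samples:
        counts[leaf_of(ctx)][sym] += 1
    return {leaf: {s: (c[s] + 1) / (sum(c.values()) + 2) for s in (0, 1)}
            for leaf, c in counts.items()}

def cost(samples, leaf_of, models):
    """Ideal bits to code `samples` with the model of each sample's leaf."""
    return -sum(math.log2(models[leaf_of(ctx)][sym]) for ctx, sym in samples)

def tree_leaf(ctx):
    return 1004 if ctx[0] > 3 else 1002  # hypothetical one-level tree

def flat_leaf(ctx):
    return 1000  # baseline: a single model for every syntax element

first_n = [([5], 1), ([6], 1), ([4], 1), ([1], 0), ([0], 0), ([2], 0)]
second_n = [([7], 1), ([3], 0), ([5], 1), ([1], 0)]

tree_cost = cost(second_n, tree_leaf, estimate_models(first_n, tree_leaf))
flat_cost = cost(second_n, flat_leaf, estimate_models(first_n, flat_leaf))
tree_helps = tree_cost < flat_cost
```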

Implementations of the context tree 1100 shown in FIG. 11 can include additional, less, or different functionality than shown and described. In some implementations, the context tree may be a non-binary tree. For example, a separation criterion applied to context information may return multiple branches in the context tree, such as where each branch corresponds to a range of possible values for meeting the separation criterion. For example, a binary version of a separation criterion may ask whether the value of the context information at the first index of the context vector is less than 3. The tree may include two branches based on that separation criterion: one branch leads to a node representing the data group for which the value of that context information is less than 3, and the other branch leads to a node representing the data group for which the value of that context information is 3 or greater.

However, a non-binary version of that separation criterion may ask what the value of the context information at the first index of the context vector is. For example, a first branch based on that separation criterion may lead to a node representing the data group for which the value of that context information is 0 to 1, another such branch may lead to a node representing the data group for which the value is 2 to 3, and another such branch may lead to a node representing the data group for which the value is 4 to 5. In some implementations, there may be a configurable or non-configurable limit on the maximum number of branches that result from applying a non-binary separation criterion to a value of context information.
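A non-binary separation of this kind can be sketched as bucketing by value range. The range width of 2 matches the 0 to 1, 2 to 3, and 4 to 5 ranges in the example; the helper `multiway_split`, the `max_branches` parameter, the policy of folding larger values into the last branch, and the assumption of non-negative context values are all illustrative.

```python
from collections import defaultdict

def multiway_split(group, index, width=2, max_branches=3):
    """Group (context_vector, syntax_element) pairs by ranges of ctx[index].

    Branch 0 holds values 0..width-1, branch 1 the next range, and so on.
    Values beyond the last range fall into the last branch (assumed policy),
    which enforces the limit on the number of branches.
    """
    branches = defaultdict(list)
    for ctx, sym in group:
        branch = min(ctx[index] // width, max_branches - 1)
        branches[branch].append((ctx, sym))
    return dict(branches)

group = [([0], 1), ([1], 0), ([2], 1), ([3], 1), ([5], 0), ([9], 0)]
branches = multiway_split(group, index=0)
```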

The encoding and decoding aspects described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.

The word "example" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as an "example" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word "example" is intended to present concepts in a concrete fashion. As used in this application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clearly indicated otherwise by the context, the statement "X includes A or B" is intended to mean any of the natural inclusive permutations. In other words, "X includes A or B" is satisfied if X includes A, if X includes B, or if X includes both A and B. In addition, the articles "a" and "an" as used in this application and the appended claims should generally be construed to mean "one or more," unless specified otherwise or clearly indicated by the context to be directed to a singular form. Moreover, use of the term "an implementation" or "one implementation" throughout this disclosure is not intended to mean the same embodiment or implementation unless described as such.

Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit. In the claims, the term "processor" should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms "signal" and "data" are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.

Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general-purpose computer or general-purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein. In addition, or alternatively, a special-purpose computer/processor can be utilized, which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.

The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a handheld communications device. In this instance, the transmitting station 102 can use the encoder 400 to encode content into an encoded video signal and transmit the encoded video signal to the communications device. The communications device can then decode the encoded video signal using the decoder 500. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communications device, and/or a device including the encoder 400 can also include the decoder 500.

Further, all or a portion of the implementations of the present disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer-readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport a program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable media are also available.

The embodiments, implementations, and aspects described above are set forth to facilitate an understanding of the present disclosure and do not limit it. On the contrary, the present disclosure is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation permitted under the law so as to encompass all such modifications and equivalent arrangements.

Claims

1. A method of decoding an encoded block of an encoded video frame, the method comprising:
generating a context tree by:
determining first candidate cost reductions resulting from applying separation criteria to context information for first syntax elements of a previously decoded block of the encoded video frame;
separating the first syntax elements into a first group of the first syntax elements and a second group of the first syntax elements according to a first separation criterion of the separation criteria that results in the highest candidate cost reduction of the first candidate cost reductions;
determining second candidate cost reductions resulting from applying the separation criteria to context information for the first group;
determining third candidate cost reductions resulting from applying the separation criteria to context information for the second group; and
in response to determining that the highest candidate cost reduction of the second candidate cost reductions is greater than the highest candidate cost reduction of the third candidate cost reductions, separating the first group into a first subgroup of the first group and a second subgroup of the first group according to a second separation criterion of the separation criteria that results in the highest candidate cost reduction of the second candidate cost reductions; and
decoding a second syntax element of the encoded block according to a probability model identified using the context tree.
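
The tree-generation steps recited above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: binary syntax elements, separation criteria of the hypothetical form (context index, threshold), and an empirical entropy cost are all choices made here for the example, as are the names `entropy_cost`, `best_split`, `grow`, and `build_tree`.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Sample = Tuple[Tuple[int, ...], int]  # (context values, binary symbol); assumed layout
Criterion = Tuple[int, int]           # (context index, threshold); assumed form


def entropy_cost(samples: List[Sample]) -> float:
    """Bits needed to code the symbols with their empirical probabilities."""
    n = len(samples)
    ones = sum(symbol for _, symbol in samples)
    cost = 0.0
    for count in (ones, n - ones):
        if count:
            cost -= count * math.log2(count / n)
    return cost


def split(samples: List[Sample], crit: Criterion):
    """Separate samples by comparing one context value against a threshold."""
    idx, thr = crit
    low = [s for s in samples if s[0][idx] < thr]
    high = [s for s in samples if s[0][idx] >= thr]
    return low, high


def best_split(samples: List[Sample], criteria: List[Criterion]):
    """Return (highest candidate cost reduction, criterion producing it)."""
    base = entropy_cost(samples)
    best: Tuple[float, Optional[Criterion]] = (0.0, None)
    for crit in criteria:
        low, high = split(samples, crit)
        if not low or not high:
            continue  # criterion does not actually separate this group
        reduction = base - entropy_cost(low) - entropy_cost(high)
        if reduction > best[0]:
            best = (reduction, crit)
    return best


@dataclass
class Node:
    samples: List[Sample]
    criterion: Optional[Criterion] = None
    children: List["Node"] = field(default_factory=list)


def grow(node: Node, criteria: List[Criterion]) -> bool:
    """Split a node by its best criterion; return False if nothing helps."""
    _, crit = best_split(node.samples, criteria)
    if crit is None:
        return False
    node.criterion = crit
    node.children = [Node(group) for group in split(node.samples, crit)]
    return True


def build_tree(samples: List[Sample], criteria: List[Criterion]) -> Node:
    root = Node(samples)
    if not grow(root, criteria):
        return root
    first, second = root.children
    # Compare the best candidate reduction of each group; split the winner.
    red_first, _ = best_split(first.samples, criteria)
    red_second, _ = best_split(second.samples, criteria)
    if red_first > red_second:
        grow(first, criteria)
    elif red_second > red_first:
        grow(second, criteria)
    return root
```

In this sketch only one of the two groups is split per comparison, mirroring the "in response to determining" step; a fuller version would presumably repeat the comparison across all current leaves until no candidate reduction is worthwhile.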

2. The method of claim 1, wherein the context tree comprises:
a first node representing the first syntax elements;
a second node representing the first group;
a third node representing the second group;
a fourth node representing the first subgroup; and
a fifth node representing the second subgroup,
wherein the first node and the second node are non-leaf nodes, and
wherein the third node, the fourth node, and the fifth node are leaf nodes.

3. The method of claim 2, wherein a plurality of nodes of the context tree includes the first node, the second node, the third node, the fourth node, and the fifth node, the method further comprising:
identifying, based on context information associated with one syntax element of the second syntax elements, one node of the plurality of nodes representing a group or subgroup of the first syntax elements that includes the one syntax element of the second syntax elements; and
identifying the probability model using the identified node.

4. The method of claim 3, wherein identifying the one node of the plurality of nodes representing the group or subgroup of the first syntax elements that includes the one syntax element of the second syntax elements comprises:
applying the first separation criterion or the second separation criterion to the context information associated with the one syntax element of the second syntax elements.
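
The node identification described above amounts to walking the context tree while applying each node's separation criterion to the context information of the syntax element being decoded. A minimal Python sketch follows; the `Node` layout, the (context index, threshold) form of a criterion, and the `prob_one` field standing in for a probability model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    prob_one: float                            # stand-in for the node's probability model
    criterion: Optional[Tuple[int, int]] = None  # (context index, threshold); None for a leaf
    low: Optional["Node"] = None               # group with context value below the threshold
    high: Optional["Node"] = None              # group with context value at or above it


def find_node(root: Node, context: Sequence[int]) -> Node:
    """Descend from the root, applying each separation criterion in turn."""
    node = root
    while node.criterion is not None:
        idx, thr = node.criterion
        node = node.low if context[idx] < thr else node.high
    return node


# Five-node tree: the root and its first group are non-leaf nodes.
tree = Node(0.75, (0, 1),
            low=Node(0.5, (1, 1), low=Node(0.1), high=Node(0.9)),
            high=Node(0.95))
model = find_node(tree, (0, 1)).prob_one  # probability used to decode the element
```

Because the decoder can rebuild the same tree from previously decoded data, the same walk on both sides would select the same probability model without extra signaling.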

5. The method according to any one of claims 2 to 4, further comprising:
updating the context tree by recomputing cost reductions for one or more nodes of a plurality of nodes of the context tree based on the second syntax element; and
decoding a third syntax element of a second encoded block using the updated context tree.

6. The method according to any one of claims 1 to 5, wherein separating the first syntax elements into the first group and the second group according to the first separation criterion of the separation criteria that results in the highest candidate cost reduction of the first candidate cost reductions comprises:
separating the first syntax elements into the first group and the second group in response to determining that the highest candidate cost reduction of the first candidate cost reductions meets a reduction threshold.

7. The method according to any one of claims 1 to 6, wherein determining the first candidate cost reductions resulting from applying the separation criteria to the context information for the first syntax elements comprises:
performing a breadth-first search on a plurality of nodes of the context tree to identify a node representing the first syntax elements.
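
A breadth-first search over the nodes of a context tree can be sketched as below. This is an illustrative Python example only: the `Node` type and the idea of collecting leaf nodes as the groups still eligible for separation are assumptions, not details taken from the claims.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def leaves_breadth_first(root: Node) -> List[str]:
    """Visit nodes level by level and collect the leaves in visiting order."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.children:
            queue.extend(node.children)  # non-leaf: enqueue its groups
        else:
            order.append(node.name)      # leaf: a group that could still be split
    return order


tree = Node("root", [
    Node("first group", [Node("first subgroup"), Node("second subgroup")]),
    Node("second group"),
])
candidates = leaves_breadth_first(tree)
```

Visiting shallower nodes first means larger groups are examined before their subgroups, which is one plausible reason to prefer this order over a depth-first walk.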

8. An apparatus for decoding an encoded block of an encoded video frame, the apparatus comprising:
a memory; and
a processor configured to execute a plurality of instructions stored in the memory, the plurality of instructions comprising:
instructions to determine first candidate cost reductions resulting from applying separation criteria to context information for first syntax elements of a previously decoded block of the encoded video frame;
instructions to separate the first syntax elements into a first group of the first syntax elements and a second group of the first syntax elements according to a first separation criterion of the separation criteria that results in the highest candidate cost reduction of the first candidate cost reductions;
instructions to determine second candidate cost reductions resulting from applying the separation criteria to context information for the first group;
instructions to determine third candidate cost reductions resulting from applying the separation criteria to context information for the second group;
instructions to compare the highest candidate cost reduction of the second candidate cost reductions with the highest candidate cost reduction of the third candidate cost reductions to determine which is greater;
instructions to, in response to determining that the highest candidate cost reduction of the second candidate cost reductions is greater than the highest candidate cost reduction of the third candidate cost reductions:
separate the first group into a first subgroup of the first group and a second subgroup of the first group according to a second separation criterion of the separation criteria that results in the highest candidate cost reduction of the second candidate cost reductions; and
generate a context tree including a plurality of nodes representing the first syntax elements, the first group, the second group, the first subgroup of the first group, and the second subgroup of the first group;
instructions to, in response to determining that the highest candidate cost reduction of the third candidate cost reductions is greater than the highest candidate cost reduction of the second candidate cost reductions:
separate the second group into a first subgroup of the second group and a second subgroup of the second group according to a third separation criterion of the separation criteria that results in the highest candidate cost reduction of the second candidate cost reductions; and
generate a context tree including a plurality of nodes representing the first syntax elements, the first group, the second group, the first subgroup of the second group, and the second subgroup of the second group; and
instructions to decode a second syntax element of the encoded block according to a probability model identified using the context tree.

9. The apparatus of claim 8, wherein the plurality of instructions further comprises:
instructions to identify, based on context information associated with one syntax element of the second syntax elements, a node representing a group or subgroup of the first syntax elements that includes the one syntax element of the second syntax elements; and
instructions to identify the probability model using the identified node.

10. The apparatus of claim 9, wherein the instructions to identify the node representing the group or subgroup of the first syntax elements that includes the one syntax element of the second syntax elements comprise:
instructions to apply one separation criterion of the separation criteria to the context information associated with the one syntax element of the second syntax elements.

11. The apparatus according to any one of claims 8 to 10, wherein a cost for entropy decoding one syntax element of the second syntax elements is calculated based on:
an entropy cost function of a group or subgroup of the first syntax elements that includes the one syntax element of the second syntax elements;
a size penalty function having a positive output that decreases with the size of the group or subgroup of the first syntax elements; and
a weighting of the size penalty function.

12. The apparatus of claim 11, wherein the entropy cost function is calculated based on:
a length of data in the group or subgroup of the first syntax elements; and
a probability that a syntax element included in the group or subgroup of the first syntax elements has a specified value.
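
One way to read the cost described above is as an entropy term plus a weighted size penalty. The Python sketch below is a hypothetical instance only: the claims do not give closed forms, so the binary entropy expression, the penalty `1 / (1 + length)`, and the names `entropy_cost`, `size_penalty`, and `group_cost` are assumptions chosen to satisfy the stated properties (the entropy term depends on data length and on the probability of a specified value; the penalty is positive and decreases with group size).

```python
import math


def entropy_cost(length: int, p: float) -> float:
    """Bits to code `length` binary elements, each equal to the specified value with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully predictable group costs nothing in this model
    return -length * (p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def size_penalty(length: int) -> float:
    """Positive penalty that shrinks as the group grows (assumed form)."""
    return 1.0 / (1.0 + length)


def group_cost(length: int, p: float, weight: float = 1.0) -> float:
    """Entropy cost plus the weighted size penalty."""
    return entropy_cost(length, p) + weight * size_penalty(length)
```

For example, `group_cost(8, 0.5, weight=4.0)` combines 8 bits of entropy cost with a penalty of 4/9. Penalizing small groups in this way discourages splits whose statistics rest on too few samples.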

13. A method of decoding an encoded block of an encoded video frame, the method comprising:
decoding syntax elements of the encoded block according to a probability model identified using a context tree, the context tree comprising:
a first node representing syntax elements of a previously decoded block of the encoded video frame;
a second node representing a first group of the syntax elements;
a third node representing a second group of the syntax elements;
a fourth node representing a first subgroup of the first group of the syntax elements; and
a fifth node representing a second subgroup of the first group of the syntax elements,
wherein the syntax elements are separated into the first group and the second group using one separation criterion that results in the highest candidate cost reduction of a first set of candidate cost reductions, the first set of candidate cost reductions being determined by applying a plurality of separation criteria, including the one separation criterion, to context information of the syntax elements, and
wherein the first group is separated into the first subgroup and the second subgroup in response to a determination that the highest candidate cost reduction of a second set of candidate cost reductions is greater than the highest candidate cost reduction of a third set of candidate cost reductions, the second set of candidate cost reductions being determined by applying the plurality of separation criteria to context information of the first group, and the third set of candidate cost reductions being determined by applying the plurality of separation criteria to context information of the second group.

14. The method of claim 13, further comprising generating the context tree based on data encoded in an encoded bitstream that includes the encoded video frame.

15. The method of claim 13 or 14, wherein a plurality of nodes of the context tree includes the first node, the second node, the third node, the fourth node, and the fifth node, the method further comprising:
identifying, based on context information associated with one syntax element of the syntax elements of the encoded block, one node of the plurality of nodes representing a group or subgroup of the syntax elements of the previously decoded block that includes the one syntax element of the syntax elements of the encoded block; and
identifying the probability model using the identified node.

16. The method of claim 15, wherein the one separation criterion is a first separation criterion, and wherein identifying the one node of the plurality of nodes representing the group or subgroup of syntax elements that includes the one syntax element comprises:
applying the first separation criterion or a second separation criterion of the plurality of separation criteria to the context information associated with the one syntax element of the syntax elements of the encoded block.

17. The method of claim 15, further comprising:
updating the context tree by recalculating cost reductions for one or more nodes of the plurality of nodes based on a second syntax element; and
decoding syntax elements of a second encoded block using the updated context tree.

18. The method according to any one of claims 13 to 17, further comprising separating the syntax elements of the previously decoded block into the first group and the second group in response to determining that the highest candidate cost reduction of the first set of candidate cost reductions meets a reduction threshold.

19. The method according to any one of claims 13 to 18, wherein a cost for entropy decoding one syntax element of the syntax elements of the encoded block is calculated based on:
an entropy cost function for a group or subgroup of the syntax elements of the previously decoded block that includes the one syntax element of the syntax elements of the encoded block;
a size penalty function having a positive output that decreases with the size of the group or subgroup of the syntax elements of the previously decoded block; and
a weighting of the size penalty function.

20. The method of claim 19, wherein the entropy cost function is calculated based on:
a length of data in the group or subgroup of the syntax elements of the previously decoded block; and
a probability that a syntax element included in the group or subgroup of the syntax elements of the previously decoded block has a specified value.