JP6059219B2

JP6059219B2 - Latency reduction in video encoding and decoding

Info

Publication number: JP6059219B2
Application number: JP2014518537A
Authority: JP
Inventors: Gary J. Sullivan
Original assignee: Microsoft Corp
Current assignee: Microsoft Corp
Priority date: 2011-06-30
Filing date: 2011-10-11
Publication date: 2017-01-11
Anticipated expiration: 2031-10-11
Also published as: WO2013002818A1; CA2840427A1; EP2727341A4; US11871040B2; EP3691268A1; AU2011371809B2; HK1226567A1; RS64742B1; US20220394307A1; US11601681B2; US11451830B2; US20230008752A1; US20240364935A1; US20230020316A1; TW201728180A; CN103621085B; US20180249184A1; US20200404337A1; US20160316228A1; US9743114B2

Description

The present invention relates to reducing latency in video encoding and decoding.

Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A "codec" is an encoder/decoder system.

Various video codec standards have been adopted over the past two decades. They include the H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (AVC or ISO/IEC 14496-10) standards. They also include the MPEG-1 (ISO/IEC 11172-2), MPEG-4 Visual (ISO/IEC 14496-2) and SMPTE 421M standards. More recently, the HEVC standard has been under development. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing the parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also specifies the decoding operations a decoder should perform to achieve correct results.

The primary goal of compression is to provide good rate-distortion performance. So, for a particular bit rate, an encoder attempts to provide the highest quality of video. Or, for a particular level of quality (fidelity to the original video), an encoder attempts to provide the lowest bit rate encoded video. In practice, other considerations also affect decisions made during encoding and decoding, depending on the use scenario. These include encoding time, encoding complexity, encoding resources, decoding time, decoding complexity, decoding resources, overall delay, and/or smoothness of playback.

For example, consider use scenarios such as:
- video playback from storage,
- video playback from encoded data streamed over a network connection, and
- video transcoding (from one bit rate to another, or from one standard to another).

On the encoder side, such applications permit offline encoding that is not time-sensitive at all. The encoder can therefore spend more encoding time and resources to find the most efficient way to compress the video, improving rate-distortion performance. If a small amount of delay is also acceptable at the decoder side, the encoder can improve rate-distortion performance further. For example, it can exploit inter-picture dependencies on pictures farther ahead in the sequence.

Now consider use scenarios such as remote desktop conferencing, surveillance video, video telephony and other real-time communication scenarios. Such applications are time-sensitive. Low latency between recording of input pictures and playback of output pictures is a key factor in performance. When encoding/decoding tools adapted for non-real-time communication are applied to real-time communication, overall latency is often unacceptably high. The delays these tools introduce during encoding and decoding may improve performance for regular video playback, but they disrupt real-time communication.

In summary, techniques and tools for reducing latency in video encoding and decoding are presented. The techniques and tools can reduce latency so as to improve responsiveness in real-time communication. For example, they reduce overall latency in two ways:
- by constraining the latency caused by reordering of video frames, and
- by indicating the constraint on frame reordering latency with one or more syntax elements that accompany the coded data for the video frames.

According to one aspect of the techniques and tools described herein, a tool sets one or more syntax elements that indicate a latency constraint. The tool may be a video encoder, a real-time communication tool with a video encoder, or another tool. An example constraint is a frame reordering latency constraint consistent with the inter-frame dependencies between multiple frames of a video sequence. The tool outputs the syntax element(s). This makes it simpler and quicker to determine when reconstructed frames are ready for output, in terms of the output order of the frames.
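As a rough illustration of what an encoder might compute before setting such a syntax element, the following sketch derives a frame-count bound from a chosen coding order. It assumes one particular way of quantifying reordering latency. For each frame, the latency is the number of frames that must still be decoded after it before every frame that precedes it in output order is available. This definition, the function name `reorder_latency`, and the list-based input are illustrative assumptions. They are not the syntax element semantics defined in this specification or in any codec standard.

```python
from typing import Sequence


def reorder_latency(output_index_in_coded_order: Sequence[int]) -> int:
    """Return a frame-count latency bound for a coding order (illustrative).

    output_index_in_coded_order[d] is the output position of the d-th frame
    in coded (decoding) order; output positions are assumed to be unique.
    For each frame, the latency is how many frames decoded after it must
    arrive before all frames that precede it in output order are decoded.
    The maximum over all frames is the smallest constraint value that is
    consistent with this coding order.
    """
    coded_pos = {out: d for d, out in enumerate(output_index_in_coded_order)}
    worst = 0
    for d, out in enumerate(output_index_in_coded_order):
        # Coded positions of all frames that are output before this one.
        earlier = [coded_pos[o] for o in coded_pos if o < out]
        last_needed = max(earlier, default=d)
        worst = max(worst, last_needed - d)
    return worst


# Low-delay order I0 P1 P2 P3: coded order equals output order.
print(reorder_latency([0, 1, 2, 3]))  # 0
# I0 B1 B2 P3 coded as I0 P3 B1 B2: P3 must wait for B1 and B2.
print(reorder_latency([0, 3, 1, 2]))  # 2
```

An encoder that signals a value at least as large as this result is consistent with its own inter-frame dependencies. A smaller value would tell a decoder to output frames before the frames that precede them in output order are available.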

According to another aspect of the techniques and tools described herein, a tool receives and parses one or more syntax elements that indicate a latency constraint, such as a frame reordering latency constraint. The tool may be a video decoder, a real-time communication tool with a video decoder, or another tool.

The tool also receives encoded data for multiple frames of a video sequence. At least some of the encoded data is decoded to reconstruct one of the frames. The tool can determine the constraint from the syntax element(s). It then uses the latency constraint to determine when a reconstructed frame is ready for output, in terms of output order. Finally, the tool outputs the reconstructed frame.
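The decoder-side use of such a constraint can be sketched as follows. The sketch assumes the constraint is a frame count `max_latency`. Its assumed meaning: once `max_latency` further frames have been decoded after a given frame, every frame that precedes it in output order has also been decoded. Under that assumption, a decoder can output a frame as soon as its counter reaches the bound. It need not wait for its buffer to fill or for the end of the stream. The function name and data layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def output_schedule(
    output_index_in_coded_order: Sequence[int], max_latency: int
) -> List[Tuple[int, int]]:
    """Simulate when reconstructed frames can be output (illustrative).

    Returns (output_index, frames_decoded_so_far) pairs in the order the
    frames are output. Assumes the coded stream never violates max_latency.
    """
    waiting = {}  # output index -> number of frames decoded after it
    emitted = []
    for decoded, out in enumerate(output_index_in_coded_order, start=1):
        for key in waiting:
            waiting[key] += 1
        waiting[out] = 0
        # A frame whose counter reached the bound is ready, and so is every
        # waiting frame that precedes it in output order.
        while any(count >= max_latency for count in waiting.values()):
            first = min(waiting)
            del waiting[first]
            emitted.append((first, decoded))
    # End of stream: flush the remaining frames in output order.
    total = len(output_index_in_coded_order)
    for out in sorted(waiting):
        emitted.append((out, total))
    return emitted


# I0 B1 B2 P3 coded as I0 P3 B1 B2, with a signaled bound of 2 frames.
print(output_schedule([0, 3, 1, 2], max_latency=2))
# [(0, 3), (1, 4), (2, 4), (3, 4)]
# Low-delay stream with a bound of 0: each frame is output once decoded.
print(output_schedule([0, 1, 2, 3], max_latency=0))
# [(0, 1), (1, 2), (2, 3), (3, 4)]
```

The decoder knows only the bound, not each frame's individual latency. So a frame that could in principle be output earlier (such as I0 above) may wait up to `max_latency` decoded frames. The rule trades a little extra delay for a simple test that needs no knowledge of the stream's exact dependency structure.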

The foregoing and other objects, features, and advantages of the invention will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.

FIG. 1 is a diagram of an example computing system in which some described embodiments can be implemented.
FIGS. 2A and 2B are diagrams of example network environments in which some described embodiments can be implemented.
FIG. 3 is a diagram of an example encoder system in conjunction with which some described embodiments can be implemented.
FIG. 4 is a diagram of an example decoder system in conjunction with which some described embodiments can be implemented.
FIGS. 5A-5E are diagrams showing the coded order and output order of frames in several example series.
FIG. 6 is a flowchart showing an example technique for setting and outputting one or more syntax elements that indicate a latency constraint.
FIG. 7 is a flowchart showing an example technique for reduced-latency decoding.

The detailed description presents techniques and tools for reducing latency in video encoding and decoding. The techniques and tools can help reduce latency so as to improve responsiveness in real-time communication.

In video encoding/decoding scenarios, some delay between the time an input video frame is received and the time the frame is played back is inevitable. The frame is encoded by an encoder, delivered to a decoder, and decoded by the decoder. Some latency is caused by practical limitations on encoding resources, decoding resources and/or network bandwidth.

Other latency, however, is avoidable. For example, an encoder and decoder might introduce latency to improve rate-distortion performance, e.g., to exploit inter-frame dependencies on pictures farther ahead in a sequence. Such latency can be reduced, although there may be a penalty in rate-distortion performance, processor utilization or playback smoothness.

With the techniques and tools described herein, latency is reduced in two ways. First, latency is constrained, which limits the temporal extent of inter-frame dependencies. Second, the latency constraint is indicated to a decoder.

For example, the latency constraint is a constraint on frame reordering latency. Alternatively, it is a constraint expressed in seconds, milliseconds, or another measure of time. The decoder can then determine the latency constraint and use it when deciding which frames are ready for output. In this way, delay can be reduced for remote desktop conferencing, video telephony, video surveillance, web camera video and other real-time communication applications.
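Because the latency constraint can be expressed either as a count of frames or as a time, a tool may need to convert between the two. The following sketch does so for a constant frame rate given as a rational number (e.g., 30000/1001 for 29.97 frames per second). The constant-frame-rate assumption and the function names are illustrative.

```python
from fractions import Fraction


def frames_to_ms(frames: int, fps_num: int, fps_den: int = 1) -> Fraction:
    """Delay, in milliseconds, of `frames` frame intervals at fps_num/fps_den."""
    return Fraction(frames * 1000 * fps_den, fps_num)


def ms_to_max_frames(budget_ms: int, fps_num: int, fps_den: int = 1) -> int:
    """Largest whole number of frame intervals that fits within budget_ms."""
    return (budget_ms * fps_num) // (1000 * fps_den)


print(frames_to_ms(3, 30))                 # 100
print(ms_to_max_frames(100, 30))           # 3
print(ms_to_max_frames(100, 30000, 1001))  # 2 (about 2.997 frames would fit)
```

For example, a reordering latency of 3 frames at 30 frames per second adds 100 ms of delay, whereas a reordering latency of 0 frames adds none.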

Some of the innovations described herein are illustrated with reference to syntax elements and operations specific to the H.264 and/or HEVC standards. Such innovations can also be implemented for other standards or formats.

More generally, various alternatives to the examples described herein are possible. Certain techniques described with reference to flowcharts can be altered by reordering their stages, or by splitting, repeating or omitting certain stages. The various aspects of latency reduction for video encoding and decoding can be used in combination or separately. Different embodiments use one or more of the described techniques and tools. Some of the techniques and tools described herein address one or more of the problems noted in the background. Typically, a given technique or tool does not solve all such problems.

<I. Exemplary Computing System>
FIG. 1 illustrates a generalized example of a suitable computing system (100) in which several of the described techniques and tools may be implemented. The computing system (100) is not intended to suggest any limitation as to scope of use or functionality. The techniques and tools may be implemented in diverse general-purpose or special-purpose computing systems.

With reference to FIG. 1, the computing system (100) includes one or more processing units (110, 115) and memory (120, 125). In FIG. 1, this most basic configuration (130) is enclosed by a dashed line.

The processing units (110, 115) execute computer-executable instructions. A processing unit can be a general-purpose central processing unit (CPU), a processor in an application-specific integrated circuit (ASIC), or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 1 shows a central processing unit (110) as well as a graphics processing unit or co-processing unit (115).

The tangible memory (120, 125) is accessible by the processing unit(s). It may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two. The memory (120, 125) stores software (180) implementing one or more innovations for latency reduction in video encoding and decoding. The software is stored as computer-executable instructions suitable for execution by the processing unit(s).

A computing system may have additional features. For example, the computing system (100) includes storage (140), one or more input devices (150), one or more output devices (160), and one or more communication connections (170). An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing system (100). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing system (100). It also coordinates the activities of the components of the computing system (100).

The tangible storage (140) may be removable or non-removable. It includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium that can store information in a non-transitory way and can be accessed within the computing system (100). The storage (140) stores instructions for the software (180) implementing one or more innovations for latency reduction in video encoding and decoding.

The input device(s) (150) may be:
- a touch input device such as a keyboard, mouse, pen, or trackball,
- a voice input device,
- a scanning device, or
- another device that provides input to the computing system (100).

For video encoding, the input device(s) (150) may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form. They may also be a CD-ROM or CD-RW that reads video samples into the computing system (100). The output device(s) (160) may be a display, printer, speaker, CD-writer, or another device that provides output from the computing system (100).

The communication connection(s) (170) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed so as to encode information in the signal. By way of example, and not limitation, communication media can use an electrical, optical, RF, or other carrier.

The techniques and tools can be described in the general context of computer-readable media. Computer-readable media are any available tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing system (100), computer-readable media include memory (120, 125), storage (140), and combinations of any of the above.

The techniques and tools can be described in the general context of computer-executable instructions, such as those included in program modules, executed in a computing system on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing system.

The terms "system" and "device" are used interchangeably herein. Unless the context clearly indicates otherwise, neither term implies any limitation on a type of computing system or computing device. In general, a computing system or computing device can be local or distributed. It can include any combination of special-purpose hardware and/or general-purpose hardware with software implementing the functionality described herein.

For the sake of presentation, the detailed description uses terms like "determine" and "use" to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.

<II. Exemplary Network Environments>
FIGS. 2A and 2B show example network environments (201, 202) that include video encoders (220) and video decoders (270). The encoders (220) and decoders (270) are connected over a network (250) using an appropriate communication protocol. The network (250) can include the Internet or another computer network.

In the network environment (201) shown in FIG. 2A, each real-time communication (RTC) tool (210) includes both an encoder (220) and a decoder (270) for bidirectional communication. A given encoder (220) can produce output compliant with the SMPTE 421M standard, the ISO/IEC 14496-10 standard (also known as H.264 or AVC), the HEVC standard, another standard, or a proprietary format. A corresponding decoder (270) accepts encoded data from the encoder (220). The bidirectional communication can be part of a video conference, a video telephone call, or another two-party communication scenario. The network environment (201) in FIG. 2A includes two real-time communication tools (210). Alternatively, it can include three or more real-time communication tools (210) that participate in multi-party communication.

A real-time communication tool (210) manages encoding by an encoder (220). FIG. 3 shows an example encoder system (300) that can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another encoder system. A real-time communication tool (210) also manages decoding by a decoder (270). FIG. 4 shows an example decoder system (400) that can be included in the real-time communication tool (210). Alternatively, the real-time communication tool (210) uses another decoder system.

In the network environment (202) shown in FIG. 2B, an encoding tool (212) includes an encoder (220) that encodes video for delivery to multiple playback tools (214), which include decoders (270). This unidirectional communication can serve a video surveillance system, a web camera monitoring system, or a remote desktop conferencing presentation. It can also serve any other scenario in which video is encoded and sent from one location to one or more other locations.

The network environment (202) in FIG. 2B includes two playback tools (214), but it can include more or fewer. In general, a playback tool (214) communicates with the encoding tool (212) to determine a stream of video for the playback tool (214) to receive. The playback tool (214) receives the stream, buffers the received encoded data for an appropriate period, and begins decoding and playback.

FIG. 3 shows an example encoder system (300) that can be included in the encoding tool (212). Alternatively, the encoding tool (212) uses another encoder system. The encoding tool (212) can also include server-side controller logic for managing connections with one or more playback tools (214). FIG. 4 shows an example decoder system (400) that can be included in the playback tool (214). Alternatively, the playback tool (214) uses another decoder system. A playback tool (214) can also include client-side controller logic for managing the connection with the encoding tool (212).

In some cases, the use of syntax elements to indicate latency (e.g., frame reordering latency) is specific to a particular standard or format. For example, encoded data can contain one or more syntax elements that indicate a latency constraint. These can be part of the syntax of an elementary coded video bitstream defined according to the standard or format, or defined media metadata relating to the encoded data. In such cases, the reduced-latency real-time communication tool (210), encoding tool (212) and/or playback tool (214) may be codec dependent. That is, the decisions they make can depend on the bitstream syntax of a particular standard or format.

In other cases, the use of syntax elements to indicate a latency constraint (e.g., on frame reordering latency) is outside a particular standard or format. For example, the syntax element(s) can be signaled as part of the syntax of a media transmission stream or a media storage file. More generally, they can be part of a media system multiplexing protocol or transport protocol. Or, syntax element(s) that indicate latency can be negotiated between real-time communication tools (210), encoding tools (212) and/or playback tools (214) according to a media property negotiation protocol. In such cases, the reduced-latency real-time communication tool (210), encoding tool (212) and playback tool (214) may be codec independent. That is, they can work with any available video encoder and decoder, given some level of control over the inter-frame dependencies set during encoding.
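Whether such a syntax element travels in the elementary bitstream or in container or transport metadata, it is typically a small non-negative integer. H.264 and HEVC code many sequence-level integers with unsigned Exp-Golomb codes, written ue(v). The sketch below assumes, purely for illustration, that a frame reordering latency value is coded the same way. For readability it operates on strings of '0'/'1' characters rather than on a real bit reader, and the function names are hypothetical.

```python
from typing import Tuple


def ue_encode(value: int) -> str:
    """Unsigned Exp-Golomb code: (n - 1) zeros followed by value + 1 in n bits."""
    if value < 0:
        raise ValueError("ue(v) codes only non-negative integers")
    info = bin(value + 1)[2:]
    return "0" * (len(info) - 1) + info


def ue_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one ue(v) code starting at pos; return (value, next_pos)."""
    zeros = 0
    while pos + zeros < len(bits) and bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    if end > len(bits):
        raise ValueError("truncated ue(v) code")
    value = int(bits[pos + zeros:end], 2) - 1
    return value, end


bits = "".join(ue_encode(v) for v in (0, 2, 5))
print(bits)  # 101100110
pos, values = 0, []
while pos < len(bits):
    v, pos = ue_decode(bits, pos)
    values.append(v)
print(values)  # [0, 2, 5]
```

A variable-length code like this keeps the common low-latency values cheap to signal: a value of 0 costs a single bit.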

<III. Exemplary Encoder System>
FIG. 3 is a block diagram of an example encoder system (300) in conjunction with which some described embodiments may be implemented. The encoder system (300) can be a general-purpose encoding tool capable of operating in any of multiple encoding modes, such as:
- a low-latency encoding mode for real-time communication,
- a transcoding mode, and
- a regular encoding mode for media playback from a file or stream.

Or, it can be a special-purpose encoding tool adapted for one such mode. The encoder system (300) can be implemented as an operating system module, as part of an application library, or as a standalone application. Overall, the encoder system (300) receives a sequence of source video frames (311) from a video source (310) and produces encoded data as output to a channel (390). To facilitate reduced-latency decoding, the encoded data output to the channel can include one or more syntax elements that indicate a constraint on latency (e.g., frame reordering latency).

The video source (310) can be a camera, tuner card, storage medium, or other digital video source. The video source (310) produces a sequence of video frames at a frame rate of, for example, 30 frames per second.

As used herein, the term "frame" generally refers to source, coded or reconstructed image data. For progressive video, a frame is a progressive video frame. For interlaced video, in example embodiments, an interlaced video frame is de-interlaced prior to encoding. Alternatively, two complementary interlaced video fields are encoded as an interlaced video frame or as separate fields. Besides a progressive video frame, the term "frame" can indicate:
- a single non-paired video field,
- a complementary pair of video fields,
- a video object plane that represents a video object at a given time, or
- a region of interest in a larger image.

The video object plane or region can be part of a larger image that includes multiple objects or regions of a scene.

An arriving source frame (311) is stored in a source frame temporary memory storage area (320) that includes multiple frame buffer storage areas (321, 322, ..., 32n). Each frame buffer (321, 322, etc.) holds one source frame in the source frame storage area (320). After one or more source frames (311) have been stored in frame buffers (321, 322, etc.), a frame selector (330) periodically selects an individual source frame from the source frame storage area (320).

The order in which the frame selector (330) selects frames for input to the encoder (340) may differ from the order in which the video source (310) produces them. For example, a frame may be moved earlier in the order to facilitate temporally backward prediction. Before the encoder (340), the encoder system (300) can include a pre-processor (not shown) that pre-processes (e.g., filters) the selected frame (331) before encoding.
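The reordering performed by the frame selector (330) can be sketched with a simple, hypothetical pattern. Every `anchor_interval`-th frame is an anchor (I or P) frame. The frames between two anchors are selected after the later anchor, so they can be predicted from both. The pattern and parameter names are assumptions for illustration, not the selection rule of any particular encoder.

```python
from typing import List


def coded_order(num_frames: int, anchor_interval: int) -> List[int]:
    """Return source-frame indices in the order a frame selector might pick them.

    Hypothetical pattern: every anchor_interval-th frame (and the final frame)
    is an anchor; the frames between two anchors are selected after the later
    anchor so that they can use it for temporally backward prediction.
    """
    if anchor_interval < 1:
        raise ValueError("anchor_interval must be at least 1")
    order = [0] if num_frames > 0 else []
    prev = 0
    while prev < num_frames - 1:
        nxt = min(prev + anchor_interval, num_frames - 1)
        order.append(nxt)                   # the later anchor goes first
        order.extend(range(prev + 1, nxt))  # then the frames it skipped
        prev = nxt
    return order


print(coded_order(7, 3))  # [0, 3, 1, 2, 6, 4, 5]
print(coded_order(4, 1))  # [0, 1, 2, 3]
```

With `anchor_interval = 3`, frames 1 and 2 must wait in the source frame storage area (320) until frame 3 arrives. On the decoder side, frame 3 cannot be output until frames 1 and 2 have been decoded. Setting `anchor_interval = 1` removes this reordering and its latency, possibly at some cost in rate-distortion performance.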

The encoder (340) encodes the selected frame (331) to produce a coded frame (341) and also produces memory management control signals (342). If the current frame is not the first frame that has been encoded, the encoder (340), when performing its encoding process, may use one or more previously encoded/decoded frames (369) that have been stored in a decoded frame temporary memory storage area (360). Such stored decoded frames (369) are used as reference frames for inter-frame prediction of the content of the current source frame (331). Generally, the encoder (340) includes multiple encoding modules that perform encoding tasks such as motion estimation and compensation, frequency transforms, quantization, and entropy coding. The exact operations performed by the encoder (340) can vary depending on compression format. The format of the output encoded data can be Windows Media Video format, VC-1 format, MPEG-x format (e.g., MPEG-1, MPEG-2, or MPEG-4), H.26x format (e.g., H.261, H.262, H.263, H.264), HEVC format, or another format.

The coded frames (341) and memory management control signals (342) are processed by a decoding process emulator (350). The decoding process emulator (350) implements some of the functionality of a decoder, for example, decoding tasks to reconstruct the reference frames that the encoder (340) uses in motion estimation and compensation. The decoding process emulator (350) uses the memory management control signals (342) to determine whether a given coded frame (341) needs to be reconstructed and stored for use as a reference frame in inter-frame prediction of subsequent frames to be encoded. If the control signals (342) indicate that a coded frame (341) needs to be stored, the decoding process emulator (350) models the decoding process that would be conducted by a decoder that receives the coded frame (341) and produces a corresponding decoded frame (351). In doing so, when the encoder (340) has used decoded frames (369) stored in the decoded frame storage area (360), the decoding process emulator (350) also uses those decoded frames (369) from the storage area (360) as part of the decoding process.

The decoded frame temporary memory storage area (360) includes multiple frame buffer storage areas (361, 362, ..., 36n). The decoding process emulator (350) uses the memory management control signals (342) to manage the contents of the storage area (360), identifying any frame buffers (361, 362, etc.) that hold frames the encoder (340) no longer needs as reference frames. After modeling the decoding process, the decoding process emulator (350) stores a newly decoded frame (351) in a frame buffer (361, 362, etc.) that has been identified in this manner.
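The buffer bookkeeping described above can be sketched as a small class. This is an illustrative model only: the memory management control signals (342) are represented, as an assumption, by a plain list of frame identifiers that are no longer needed for reference, and the class and method names are hypothetical. Since the decoder (450) uses equivalent signals to manage its own storage area, running the same logic in the emulator is what keeps the encoder's reference frames in step with the decoder's.

```python
from typing import List, Optional


class DecodedFrameStore:
    """Toy model of a decoded frame storage area with a fixed number of
    frame buffers. Each buffer holds a frame id, or None when free.

    Hypothetical: control signals are modeled as a list of frame ids that
    are no longer needed as reference frames.
    """

    def __init__(self, num_buffers: int) -> None:
        self.buffers: List[Optional[int]] = [None] * num_buffers

    def apply_control_signals(self, unused_for_reference: List[int]) -> None:
        # Release every buffer holding a frame that is no longer needed
        # as a reference frame.
        for i, held in enumerate(self.buffers):
            if held is not None and held in unused_for_reference:
                self.buffers[i] = None

    def store(self, frame_id: int) -> int:
        # Put a newly reconstructed frame into the first free buffer and
        # return that buffer's index.
        for i, held in enumerate(self.buffers):
            if held is None:
                self.buffers[i] = frame_id
                return i
        raise RuntimeError("no free frame buffer; release one with control signals first")


store = DecodedFrameStore(num_buffers=2)
store.store(0)                      # reference frame F0 -> buffer 0
store.store(8)                      # reference frame F8 -> buffer 1
store.apply_control_signals([0])    # F0 no longer needed for reference
print(store.store(1), store.buffers)
```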

The coded frames (341) and memory management control signals (342) are also buffered in a temporary coded data area (370). The coded data aggregated in the coded data area (370) can contain, as part of the syntax of an elementary coded video bitstream, one or more syntax elements that indicate latency constraints. Or, the coded data aggregated in the coded data area (370) can include syntax elements that indicate latency constraints as part of media metadata relating to the coded video data (e.g., as one or more parameters in one or more supplemental enhancement information (“SEI”) messages or video usability information (“VUI”) messages).

The aggregated data (371) from the temporary coded data area (370) are processed by a channel encoder (380). The channel encoder (380) can packetize the aggregated data for transmission as a media stream, in which case it can add syntax elements that indicate latency constraints as part of the syntax of the media transmission stream. Or, the channel encoder (380) can organize the aggregated data for storage as a file, in which case it can add syntax elements that indicate latency constraints as part of the syntax of the media storage file. Or, more generally, the channel encoder (380) can implement one or more media system multiplexing protocols or transport protocols, in which case it can add syntax elements that indicate latency constraints as part of the syntax of the protocol(s). The channel encoder (380) provides output to a channel (390), which represents storage, a communications connection, or another channel for the output.

<IV. Exemplary Decoder System>
FIG. 4 is a diagram of an example decoder system (400) in conjunction with which some described embodiments may be implemented. The decoder system (400) can be a general-purpose decoding tool capable of operating in any of multiple decoding modes, such as a low-latency decoding mode for real-time communication and a regular decoding mode for media playback from a file or stream. Or, the decoder system (400) can be a special-purpose decoding tool adapted for one such decoding mode. The decoder system (400) can be implemented as an operating system module, as part of an application library, or as a standalone application. Overall, the decoder system (400) receives coded data from a channel (410) and produces reconstructed frames as output for an output destination (490). The coded data can include one or more syntax elements that indicate a constraint on latency (e.g., frame reordering latency), so that reduced-latency decoding is possible.

The decoder system (400) includes a channel (410), which can represent storage, a communications connection, or another channel for coded data as input. The channel (410) produces coded data that has been channel coded. A channel decoder (420) can process the coded data. For example, the channel decoder (420) can de-packetize data that has been aggregated for transmission as a media stream, in which case it can parse syntax elements that indicate latency constraints as part of the syntax of the media transmission stream. Or, the channel decoder (420) can separate coded video data that has been aggregated for storage as a file, in which case it can parse syntax elements that indicate latency constraints as part of the syntax of the media storage file. Or, more generally, the channel decoder (420) can implement one or more media system demultiplexing protocols or transport protocols, in which case it can parse syntax elements that indicate latency constraints as part of the syntax of the protocol(s).

The coded data (421) output from the channel decoder (420) is stored in a temporary coded data area (430) until a sufficient quantity of such data has been received. The coded data (421) includes coded frames (431) and memory management control signals (432). The coded data (421) in the coded data area (430) can contain, as part of the syntax of an elementary coded video bitstream, one or more syntax elements that indicate latency constraints. Or, the coded data (421) in the coded data area (430) can include syntax elements that indicate latency constraints as part of media metadata relating to the coded video data (e.g., as one or more parameters in one or more SEI messages or VUI messages). In general, the coded data area (430) temporarily stores coded data (421) until the decoder (450) uses it. At that point, the coded data for a coded frame (431) and its memory management control signals (432) are transferred from the coded data area (430) to the decoder (450). As decoding continues, new coded data is added to the coded data area (430), and the oldest coded data remaining in the coded data area (430) is transferred to the decoder (450).

The decoder (450) periodically decodes a coded frame (431) to produce a corresponding decoded frame (451). As appropriate, when performing its decoding process, the decoder (450) may use one or more previously decoded frames (469) as reference frames for inter-frame prediction. The decoder (450) reads such previously decoded frames (469) from a decoded frame temporary memory storage area (460). Generally, the decoder (450) includes multiple decoding modules that perform decoding tasks such as entropy decoding, inverse quantization, inverse frequency transforms, and motion compensation. The exact operations performed by the decoder (450) can vary depending on compression format.

The decoded frame temporary memory storage area (460) includes multiple frame buffer storage areas (461, 462, ..., 46n). The decoded frame storage area (460) is an example of a decoded picture buffer. The decoder (450) uses the memory management control signals (432) to identify a frame buffer (461, 462, etc.) in which it can store a decoded frame (451), and then stores the decoded frame (451) in that frame buffer.

An output sequencer (480) uses the memory management control signals (432) to identify when the next frame to be produced in output order is available in the decoded frame storage area (460). To reduce the latency of the encoding-decoding system, the output sequencer (480) uses the syntax elements that indicate latency constraints to expedite identification of the frames to be produced in output order. When the next frame (481) to be produced in output order is available in the decoded frame storage area (460), the output sequencer (480) reads it and outputs it to the output destination (490) (e.g., a display). In general, the order in which the output sequencer (480) outputs frames from the decoded frame storage area (460) may differ from the order in which the decoder (450) decodes the frames.

<V. Syntax Elements That Facilitate Reduced-Latency Encoding and Decoding>
In most video codec systems, the coding order (also called the decoding order or bitstream order) is the order in which video frames are represented in the coded data of a bitstream and, hence, the order in which they are processed during decoding. The coding order can differ from the order in which the frames are captured by a camera before encoding, and from the order in which decoded frames are displayed, stored, or otherwise output after decoding (the output order or display order). Reordering frames relative to the output order has benefits (primarily in terms of compression performance), but it increases the end-to-end latency of the encoding and decoding processes.

The techniques and tools described herein reduce latency due to reordering of video frames and, by providing information about constraints on reordering latency to decoder systems, also facilitate latency reduction by those decoder systems. Such latency reduction is useful for many purposes. For example, it can reduce the time lag that occurs in interactive video communication using a video conferencing system, so that the flow of conversation and the interactivity of communication between remote participants become quicker and more natural.

A. Approaches to Output Timing and Output Order
According to the H.264 standard, a decoder can use two approaches to determine when a decoded frame is ready to be output. The decoder can use timing information in the form of decoding timestamps and output timestamps (e.g., as signaled in picture timing SEI messages). Or, the decoder can use buffering capacity limits signaled with various syntax elements to determine when a decoded frame is ready to be output.

Timing information can be associated with each decoded frame, and the decoder can use it to determine when a decoded frame can be output. In practice, however, such timing information may be unavailable to a decoder. Moreover, even when timing information is available, some decoders do not actually use it (because they have been designed to work regardless of whether timing information is available).

Buffering capacity limits are indicated with several syntax elements according to the H.264 standard (and draft versions of the HEVC standard), including the syntax element max_dec_frame_buffering, the syntax element num_reorder_frames, relative ordering information (termed “picture order count” information), and other memory management control information signaled in the bitstream. The syntax element max_dec_frame_buffering (or the derived variable specified as MaxDpbFrames) specifies the required size of a decoded picture buffer (“DPB”) in units of frame buffers. As such, max_dec_frame_buffering expresses the top-level memory capacity used for a coded video sequence, so as to enable a decoder to output pictures in the correct order. The syntax element num_reorder_frames (or max_num_reorder_frames) indicates the maximum number of frames (or complementary field pairs, or non-paired fields) that can precede any frame (or complementary field pair, or non-paired field) in coding order and follow it in output order. In other words, num_reorder_frames specifies a constraint on the memory capacity necessary for picture reordering. The syntax element max_num_ref_frames specifies the maximum number of short-term and long-term reference frames (or complementary reference field pairs, or non-paired reference fields) that may be used by the decoding process for inter prediction of any picture in the sequence. max_num_ref_frames also determines the size of the sliding window for decoded reference picture marking. Like num_reorder_frames, max_num_ref_frames specifies a constraint on required memory capacity.
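The num_reorder_frames definition above can be checked mechanically for a given frame sequence. The sketch below assumes a hypothetical representation in which entry k of a list is the output position of the k-th frame in coding order (frames only, no fields). For each frame it counts the earlier-coded frames that come later in output order and returns the maximum. It illustrates the definition and is not taken from any reference decoder.

```python
from typing import List


def num_reorder_frames(coding_order: List[int]) -> int:
    """Maximum number of frames that precede any frame in coding order
    and follow it in output order.

    Hypothetical representation: coding_order[k] is the output position
    of the k-th coded frame.
    """
    worst = 0
    for k, out_pos in enumerate(coding_order):
        # Frames coded before this one but output after it must wait in
        # the DPB until this frame has been output.
        held = sum(1 for earlier in coding_order[:k] if earlier > out_pos)
        worst = max(worst, held)
    return worst


print(num_reorder_frames([0, 8, 1, 2, 3, 4, 5, 6, 7]))  # F0, F8, F1, ..., F7
```

For the coding order F₀, F₈, F₁, ..., F₇ this gives 1, the value stated for the series of FIG. 5A.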

A decoder uses the max_dec_frame_buffering (or MaxDpbFrames) and num_reorder_frames syntax elements to determine when a buffering capacity limit has been exceeded. This happens, for example, when a new decoded frame needs to be stored in the DPB but no space remains available in the DPB. In this situation, the decoder uses picture order count information to identify which of the decoded pictures is earliest in output order, and that picture is then output. Such processing is often called “bumping,” because a picture is “bumped out” of the DPB by the arrival of a new picture that needs to be stored.
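A minimal sketch of output driven purely by bumping follows. It makes simplifying assumptions that are not from the standard: a decoded frame stays in the DPB only until it is output (retention of frames needed only for reference is ignored), the DPB holds dpb_size frames, and output positions stand in for picture order count values. The function name is hypothetical. It returns, for each frame, how many frames had been decoded when that frame was bumped out.

```python
from typing import List, Tuple


def bumping_schedule(coding_order: List[int], dpb_size: int) -> List[Tuple[int, int]]:
    """Simulate output by bumping only (simplified model).

    Assumption: coding_order[k] is the output position (a stand-in for
    picture order count) of the k-th decoded frame. Returns pairs of
    (output position, frames decoded when that frame was output), in the
    order the frames are output.
    """
    waiting: List[int] = []            # decoded frames not yet output
    schedule: List[Tuple[int, int]] = []
    for decoded, out_pos in enumerate(coding_order, start=1):
        if len(waiting) == dpb_size:
            # No room for the new frame: bump out the frame that is
            # earliest in output order.
            earliest = min(waiting)
            waiting.remove(earliest)
            schedule.append((earliest, decoded))
        waiting.append(out_pos)
    # End of sequence: flush the remaining frames in output order.
    for out_pos in sorted(waiting):
        schedule.append((out_pos, len(coding_order)))
    return schedule


# Short-range reordering (pairwise swaps) with a four-frame DPB.
print(bumping_schedule([0, 2, 1, 4, 3, 6, 5, 8, 7], dpb_size=4))
```

In this model, F₀ is not output until five frames have been decoded, even though no other frame ever needs to be output before it.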

The information indicated by the max_dec_frame_buffering (or MaxDpbFrames) and num_reorder_frames syntax elements is sufficient for determining the memory capacity a decoder needs. When used to control the “bumping” process for outputting pictures, however, such information can introduce undesirable latency. As defined in the H.264 standard, the max_dec_frame_buffering and num_reorder_frames syntax elements do not limit the amount of reordering that can be applied to any particular picture and, hence, do not limit end-to-end latency. Regardless of the values of these syntax elements, a particular picture can be kept in the DPB for an arbitrarily long time before it is output, which corresponds to substantial latency added by the encoder's pre-buffering of source pictures.

B. Syntax Elements That Indicate Constraints on Frame Reordering Latency
The techniques and tools described herein reduce latency in video communication systems. An encoding tool, real-time communication tool, or other tool sets a limit on the extent of reordering that can be applied to any frame in a coded video sequence. For example, the limit is expressed as the number of frames that can precede any given frame of the coded video sequence in output order and follow it in coding order. The limit constrains the reordering latency allowed for any particular frame in the sequence. Stated differently, the limit constrains the temporal extent (in frames) of reordering between coding order and output order that can be applied to any particular frame. Limiting the extent of reordering helps reduce end-to-end delay. Setting such a limit can also be useful in real-time system negotiation protocols or application specifications for use scenarios in which reducing latency is important.
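The limit described above can be computed for a candidate coded sequence by counting, for each frame, the frames that come earlier in output order but later in coding order. The sketch assumes a hypothetical representation in which entry k of a list is the output position of the k-th frame in coding order; the function name max_latency_frames is illustrative, echoing the MaxLatencyFrames variable used with FIGS. 5A-5E.

```python
from typing import List


def max_latency_frames(coding_order: List[int]) -> int:
    """Maximum number of frames that precede any frame in output order
    but follow it in coding order (the frame reordering latency).

    Hypothetical representation: coding_order[k] is the output position
    of the k-th coded frame.
    """
    worst = 0
    for k, out_pos in enumerate(coding_order):
        # Frames decoded after this one that must still be output before
        # it: each one delays this frame's output by one more frame.
        late = sum(1 for later in coding_order[k + 1:] if later < out_pos)
        worst = max(worst, late)
    return worst


print(max_latency_frames([0, 8, 1, 2, 3, 4, 5, 6, 7]))  # F0, F8, F1, ..., F7
print(max_latency_frames([0, 2, 1, 4, 3, 6, 5, 8, 7]))  # short-range swaps
```

A long-range dependency such as F₀, F₈, F₁, ... yields 7, while pairwise swaps yield 1. An encoder that wants to honor a signaled limit could, for instance, run a check like this on a candidate coding order before committing to it.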

One or more syntax elements indicate the constraint on frame reordering latency. Signaling a constraint on frame reordering latency facilitates system-level negotiation for interactive real-time communication or other use scenarios. It provides a way to express constraints on frame reordering latency directly and to characterize properties of a media stream or session.

A video decoder can use an indicated constraint on frame reordering latency to reduce the latency with which decoded video frames are output. In particular, compared to frame “bumping” processes, signaling a constraint on frame reordering latency enables a decoder to identify frames in the DPB that are ready to be output more simply and more quickly. For example, a decoder can determine the latency status of a frame in the DPB by computing the difference between the frame's coding order and output order. By comparing the latency status of the frame with the constraint on frame reordering latency, the decoder can determine when the constraint has been reached, and it can immediately output a frame that has reached the limit. Compared with “bumping” processes that use a variety of syntax elements and tracking structures, this helps the decoder identify frames that are ready for output more rapidly. In this way, the decoder can determine quickly (and earlier) when a decoded frame can be output; the more quickly (and earlier) it can make that determination, the more quickly (and earlier) it can output video to a display or to subsequent processing stages.
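One way a decoder could act on the signaled constraint is sketched below; it is an illustration under stated assumptions, not a normative process. For each frame waiting in the DPB, the sketch keeps a counter of frames decoded after it that precede it in output order (output positions stand in for picture order counts). Once a counter reaches the latency limit, no further frame can precede that frame in output order, so it and every waiting frame ahead of it can be output. A frame that no later frame ever precedes would never reach the limit, so the sketch also applies a num_reorder_frames check (more waiting frames than num_reorder_frames means the earliest one can be output). Pairing the two checks, and the function name, are choices of this sketch.

```python
from typing import Dict, List, Tuple


def low_latency_schedule(coding_order: List[int], num_reorder: int,
                         max_latency: int) -> List[Tuple[int, int]]:
    """Sketch of decoder output driven by num_reorder_frames and a frame
    reordering latency limit (MaxLatencyFrames).

    Assumption: coding_order[k] is the output position of the k-th decoded
    frame, standing in for picture order count (only comparisons are used).
    Returns pairs of (output position, frames decoded when it was output).
    """
    waiting: List[int] = []         # decoded but not yet output
    latency: Dict[int, int] = {}    # later-decoded frames that precede it in output order
    schedule: List[Tuple[int, int]] = []
    for decoded, pos in enumerate(coding_order, start=1):
        for w in waiting:
            if pos < w:             # the new frame precedes w in output order
                latency[w] += 1
        waiting.append(pos)
        latency[pos] = 0
        while waiting:
            too_many = len(waiting) > num_reorder
            limit_hit = any(latency[w] >= max_latency for w in waiting)
            if not (too_many or limit_hit):
                break
            # For a conforming stream, nothing earlier in output order
            # is still missing, so the earliest waiting frame can go.
            earliest = min(waiting)
            waiting.remove(earliest)
            schedule.append((earliest, decoded))
    for pos in sorted(waiting):     # end of sequence: flush
        schedule.append((pos, len(coding_order)))
    return schedule


# Pairwise swaps: num_reorder_frames = 1, MaxLatencyFrames = 1.
print(low_latency_schedule([0, 2, 1, 4, 3, 6, 5, 8, 7], num_reorder=1, max_latency=1))
```

With these values, F₀ is output as soon as the second frame has been decoded, and each later frame leaves the DPB within one decoded frame of becoming eligible. By comparison, in a simple model where every waiting frame occupies one of four frame buffers and output happens only when a new frame finds no free buffer, F₀ would not be output until the fifth frame had been decoded.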

Thus, using the constraint on frame reordering latency, a decoder can begin outputting frames from the decoded frame storage area before that storage area is full, while still providing conformant decoding (i.e., decoding all frames such that they are bit-exact matches of frames decoded using another, conventional scheme). This significantly reduces delay when the delay (in frames) indicated by the latency syntax element is much smaller than the size (in frames) of the decoded frame storage area.

FIGS. 5A-5E show series (501-505) of frames having different inter-frame dependencies. The series are characterized by different values for (1) the constraint on the memory capacity needed for picture reordering (that is, the number of frame buffers used to store reference frames for purposes of reordering, e.g., as indicated with the syntax element num_reorder_frames) and (2) the constraint on frame reordering latency, e.g., as specified by the variable MaxLatencyFrames. In FIGS. 5A-5E, for a given frame Fⱼᵏ, the subscript j indicates the position of the frame in output order, and the superscript k indicates the position of the frame in coding order. The frames are shown in output order, so subscript values increase from left to right. Arrows illustrate inter-frame dependencies for motion compensation, according to which frames that precede in coding order are used for prediction of frames that follow in coding order. For simplicity, FIGS. 5A-5E show inter-frame dependencies at the frame level (rather than at the level of macroblocks, blocks, etc., at which reference frames can change), and they show at most two frames as reference frames for a given frame. In practice, in some implementations, different macroblocks, blocks, etc. within a given frame can use different reference frames, and more than two reference frames can be used for a given frame.

In FIG. 5A, the series (501) includes nine frames. The last frame in output order, F₈¹, uses the first frame F₀⁰ as a reference frame. The other frames in the series (501) use both the last frame F₈¹ and the first frame F₀⁰ as reference frames. This means that frame F₀⁰ is decoded first, followed by frame F₈¹, followed by frame F₁², and so on. In the series (501) shown in FIG. 5A, the value of num_reorder_frames is 1: at any point in the processing by the decoder system, among the frames shown in FIG. 5A, only one frame (F₈¹) is stored in the decoded frame storage area for reordering purposes. (The first frame F₀⁰ is also stored, as a reference frame, but not for reordering purposes. Because F₀⁰ precedes the intermediate frames in output order, it does not count toward num_reorder_frames.) Despite the low value of num_reorder_frames, the series (501) has a relatively high latency: the value of MaxLatencyFrames is 7. After encoding the first frame F₀⁰, the encoder waits until it has buffered eight more source frames (F₁ through F₈) before encoding the next frame in output order, F₁², because F₁² depends on the last frame F₈¹ in the series (501). The value of MaxLatencyFrames is effectively the maximum allowed difference between the subscript value and the superscript value for any particular coded frame.
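The last sentence can be made concrete for the series (501). Its frames in coding order are F₀⁰, F₈¹, F₁², ..., F₇⁸, so the difference j - k (subscript minus superscript) can be listed directly. The short sketch below, whose function name is hypothetical, does so and reports the largest value and the frame that attains it. Note that j - k is a shortcut: the exact count of frames that precede a frame in output order but follow it in coding order is never smaller than j - k and can exceed it for some frames in other orderings, which is why the text says “effectively.”

```python
from typing import List, Tuple


def subscript_minus_superscript(coding_order: List[int]) -> List[Tuple[str, int]]:
    """Return (frame label, j - k) for each frame, where j is the output
    position (subscript) and k is the coding position (superscript).

    Hypothetical representation: coding_order[k] is j for the k-th coded frame.
    """
    return [(f"F{j}^{k}", j - k) for k, j in enumerate(coding_order)]


series_501 = [0, 8, 1, 2, 3, 4, 5, 6, 7]   # F0^0, F8^1, F1^2, ..., F7^8
diffs = subscript_minus_superscript(series_501)
print(max(diffs, key=lambda item: item[1]))  # ('F8^1', 7)
```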

In FIG. 5B, as in the series (501) of FIG. 5A, the series (502) includes nine frames, but the inter-frame dependencies are different. Temporal reordering of frames occurs only over short extents. As a result, the series (502) has much lower latency: the value of MaxLatencyFrames is 1. The value of num_reorder_frames is still 1.

In FIG. 5C, the series (503) includes ten frames. The longest inter-frame dependency is shorter (in temporal extent) than the longest inter-frame dependency in FIG. 5A, but longer than the longest inter-frame dependency in FIG. 5B. The series (503) has the same low value of 1 for num_reorder_frames and a relatively low value of 2 for MaxLatencyFrames. The series (503) therefore allows a lower end-to-end latency than the series (501) of FIG. 5A, although not as low as the allowable latency of the series (502) of FIG. 5B.

In FIG. 5D, the frames of the series (504) are organized in a temporal hierarchy with three temporal layers, according to their inter-frame dependencies. The lowest temporal resolution layer includes the first frame F₀⁰ and the last frame F₈¹. The next temporal resolution layer adds frame F₄², which depends on the first frame F₀⁰ and the last frame F₈¹. The highest temporal resolution layer adds the remaining frames. The series (504) shown in FIG. 5D has a relatively low value of 2 for num_reorder_frames but a relatively high value of 7 for MaxLatencyFrames, at least for the highest temporal resolution layer, due to the difference between coded order and output order for the last frame F₈¹. If only the intermediate temporal resolution layer or the lowest temporal resolution layer is decoded, the constraint on frame reordering delay can be reduced to 1 (for the intermediate layer) or 0 (for the lowest layer). To facilitate reduced-latency decoding at various temporal resolutions, syntax elements can indicate constraints on frame reordering latency for the different layers of the temporal hierarchy.

In FIG. 5E, the frames of the series (505) are organized in a temporal hierarchy with three temporal layers, according to different inter-frame dependencies. The lowest temporal resolution layer includes the first frame F₀⁰, the middle frame F₄¹, and the last frame F₈⁵. The next temporal resolution layer adds frame F₂² (which depends on the first frame F₀⁰ and the middle frame F₄¹) and frame F₆⁶ (which depends on the middle frame F₄¹ and the last frame F₈⁵). The highest temporal resolution layer adds the remaining frames. Compared to the series (504) of FIG. 5D, the series (505) of FIG. 5E still has a relatively low value of 2 for num_reorder_frames, and it also has a relatively low value of 3 for MaxLatencyFrames, at least for the highest temporal resolution layer, due to the difference between coded order and output order for the middle frame F₄¹ and the last frame F₈⁵. If only the intermediate temporal resolution layer or the lowest temporal resolution layer is decoded, the constraint on frame reordering delay can be reduced to 1 (for the intermediate layer) or 0 (for the lowest layer).
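Both values can be measured directly on a series of frames. The sketch below uses the counting definitions from the claims: num_reorder_frames is the largest number of frames that precede some frame in coded order but follow it in output order, and MaxLatencyFrames is the largest number of frames that precede some frame in output order but follow it in coded order. The coded orders for FIGS. 5B, 5C, and 5E are reconstructed (an assumption) from the frame labels and values stated in the text; FIG. 5D is omitted because the text does not fully specify the coded order of its top-layer frames.

```python
def reorder_and_latency(coded_to_output: list[int]) -> tuple[int, int]:
    """Measure (num_reorder_frames, MaxLatencyFrames) for a series of frames.

    coded_to_output[c] is the output-order index of the frame that is
    c-th in coded order, i.e. frame F_o^c in the notation of FIGS. 5A-5E.
    """
    num_reorder = 0
    max_latency = 0
    for c, o in enumerate(coded_to_output):
        # Frames that precede this frame in coded order but follow it in output order.
        precede_coded = sum(1 for o2 in coded_to_output[:c] if o2 > o)
        # Frames that precede this frame in output order but follow it in coded order.
        precede_output = sum(1 for o2 in coded_to_output[c + 1:] if o2 < o)
        num_reorder = max(num_reorder, precede_coded)
        max_latency = max(max_latency, precede_output)
    return num_reorder, max_latency


# Coded orders (as output indices), reconstructed from the frame labels in the text.
SERIES_5A = [0, 8, 1, 2, 3, 4, 5, 6, 7]
SERIES_5B = [0, 2, 1, 4, 3, 6, 5, 8, 7]
SERIES_5C = [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
SERIES_5E = [0, 4, 2, 1, 3, 8, 6, 5, 7]

for name, series in [("5A", SERIES_5A), ("5B", SERIES_5B),
                     ("5C", SERIES_5C), ("5E", SERIES_5E)]:
    print(name, reorder_and_latency(series))
# 5A (1, 7)
# 5B (1, 1)
# 5C (1, 2)
# 5E (2, 3)
```

The measured pairs match the num_reorder_frames and MaxLatencyFrames values stated for each figure. The two quantities count different things, which is why a series like FIG. 5A can need very little reordering memory yet still impose a long delay.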

In the examples shown in FIGS. 5A-5E, if the value of MaxLatencyFrames is known, a decoder can identify certain frames as ready for immediate output as soon as the frame that precedes them in output order is received. For a given frame, the frame's output order value minus its coded order value may equal the value of MaxLatencyFrames. In that case, the given frame is ready for output as soon as the frame preceding it in output order is received. (In contrast, using num_reorder_frames alone, such frames could not be identified as ready for output until additional frames were received or the end of the sequence was reached.) In particular, a decoder can use the value of MaxLatencyFrames to enable earlier output of the following frames:

- Frame F₈¹ in the series (501) of FIG. 5A
- Frames F₂¹, F₄³, F₆⁵, and F₈⁷ in the series (502) of FIG. 5B
- Frames F₃¹, F₆⁴, and F₉⁷ in the series (503) of FIG. 5C
- Frame F₈¹ in the series (504) of FIG. 5D
- Frames F₄¹ and F₈⁵ in the series (505) of FIG. 5E

In addition, declaring or negotiating the value of MaxLatencyFrames at the system level can provide a summary of the latency characteristics of a bitstream or session in a way that is not possible with num_reorder_frames, which only measures and indicates reordering storage capacity.
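The effect on output timing can be simulated. The sketch below is modeled loosely on the latency-count "bumping" output process used by HEVC decoders; the patent text does not prescribe this algorithm, so treat it as one assumed way to apply the two constraints. Each decoded frame waits in a list with a counter of frames decoded after it. The waiting frame that is earliest in output order is released when more than num_reorder_frames frames are waiting, or when any waiting frame's counter reaches MaxLatencyFrames.

```python
from typing import Optional


def output_schedule(coded_to_output: list[int], num_reorder: int,
                    max_latency: Optional[int] = None) -> dict[int, int]:
    """Map each output index to the number of frames decoded when it is output.

    coded_to_output[c] is the output index of the frame that is c-th in coded
    order. max_latency=None models a decoder that only knows num_reorder_frames.
    """
    waiting: list[list[int]] = []  # entries are [output_index, latency_count]
    schedule: dict[int, int] = {}
    for decoded, o in enumerate(coded_to_output, start=1):
        for entry in waiting:
            entry[1] += 1  # one more frame has been decoded after this one
        waiting.append([o, 0])
        while waiting and (
            len(waiting) > num_reorder
            or (max_latency is not None
                and any(count >= max_latency for _, count in waiting))
        ):
            first = min(waiting, key=lambda e: e[0])  # earliest in output order
            waiting.remove(first)
            schedule[first[0]] = decoded
    for o, _ in sorted(waiting):
        schedule[o] = len(coded_to_output)  # flushed at end of sequence
    return schedule


series_5b = [0, 2, 1, 4, 3, 6, 5, 8, 7]  # FIG. 5B coded order (reconstructed)
print(output_schedule(series_5b, num_reorder=1)[2])                 # 4
print(output_schedule(series_5b, num_reorder=1, max_latency=1)[2])  # 3
```

With only num_reorder_frames, frame F₂¹ of FIG. 5B is released after the fourth frame (F₄³) is decoded; with MaxLatencyFrames = 1 it is released as soon as F₁² (the third decoded frame) arrives, as the list above describes.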

C. Exemplary Implementations
Syntax elements that indicate a constraint on frame reordering latency can be signaled in various ways, depending on implementation. The syntax elements can be signaled as part of a sequence parameter set (SPS), a picture parameter set (PPS), or another element of the bitstream; as part of an SEI message, VUI message, or other metadata; or in some other way. In any of the implementations, a syntax element that indicates a constraint value can be encoded using unsigned exponential-Golomb coding, some other form of entropy coding, or fixed-length coding, and then signaled. After receiving the syntax element, a decoder performs the corresponding decoding.
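As a reference point for the entropy-coding option, the sketch below encodes and decodes unsigned exponential-Golomb code words (the ue(v) descriptor of H.264/AVC). Bits are represented as a string of "0"/"1" characters for readability; a real implementation would pack them into bytes.

```python
def ue_encode(value: int) -> str:
    """Unsigned exponential-Golomb code word for a non-negative integer."""
    if value < 0:
        raise ValueError("ue(v) codes non-negative integers only")
    code = bin(value + 1)[2:]            # binary of value + 1, without '0b'
    return "0" * (len(code) - 1) + code  # prefix of (len - 1) zeros


def ue_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one code word starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


print(ue_encode(0), ue_encode(3), ue_encode(7))  # 1 00100 0001000
```

Small values get short code words, which is why signaling a small difference (as in the fourth implementation below) can save bits.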

In a first implementation, a flag max_latency_limitation_flag is signaled. If the flag has a first binary value (e.g., 0), no constraint is imposed on frame reordering latency. In this case, a value for the max_latency_frames syntax element is either not signaled or ignored. Otherwise (the flag has a second binary value, such as 1), a value for the max_latency_frames syntax element is signaled to indicate the constraint on frame reordering latency. For example, in this case the signaled value of max_latency_frames can be any non-negative integer.
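A minimal sketch of the first implementation's semantics, assuming the syntax elements have already been parsed into a dictionary keyed by their names (a hypothetical representation; the bit-level layout is not fixed by the text). None stands for "no constraint".

```python
from typing import Optional


def encode_impl1(constraint: Optional[int]) -> dict[str, int]:
    """Syntax element values for the first implementation."""
    if constraint is None:
        return {"max_latency_limitation_flag": 0}  # max_latency_frames omitted
    if constraint < 0:
        raise ValueError("max_latency_frames must be a non-negative integer")
    return {"max_latency_limitation_flag": 1, "max_latency_frames": constraint}


def decode_impl1(elements: dict[str, int]) -> Optional[int]:
    """Recover the frame reordering latency constraint (None = unconstrained)."""
    if elements["max_latency_limitation_flag"] == 0:
        return None  # a max_latency_frames value, if present, is ignored
    return elements["max_latency_frames"]


print(decode_impl1(encode_impl1(7)))  # 7
```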

In a second implementation, a syntax element max_latency_frames_plus1 is signaled to indicate the constraint on frame reordering latency. If max_latency_frames_plus1 has a first value (e.g., 0), no constraint is imposed on frame reordering latency. For other values (e.g., non-zero values), the value of the constraint on frame reordering latency is set to max_latency_frames_plus1 - 1. For example, the value of max_latency_frames_plus1 is in the range of 0 to 2³² - 2, inclusive.
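The second implementation folds the "no constraint" case into the value itself. A minimal sketch of the mapping in both directions, using the example range from the text (None again means "no constraint"):

```python
from typing import Optional

MAX_PLUS1 = 2**32 - 2  # upper end of the example range given in the text


def encode_plus1(constraint: Optional[int]) -> int:
    """max_latency_frames_plus1 value; 0 signals that no constraint applies."""
    if constraint is None:
        return 0
    if not 0 <= constraint <= MAX_PLUS1 - 1:
        raise ValueError("constraint out of range")
    return constraint + 1


def decode_plus1(max_latency_frames_plus1: int) -> Optional[int]:
    """Constraint = max_latency_frames_plus1 - 1, or None when the value is 0."""
    if not 0 <= max_latency_frames_plus1 <= MAX_PLUS1:
        raise ValueError("max_latency_frames_plus1 out of range")
    return None if max_latency_frames_plus1 == 0 else max_latency_frames_plus1 - 1


print(encode_plus1(7), decode_plus1(8), decode_plus1(0))  # 8 7 None
```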

Similarly, in a third implementation, a syntax element max_latency_frames is signaled to indicate the constraint on frame reordering latency. If max_latency_frames has a first value (e.g., its maximum value), no constraint is imposed on frame reordering latency. For other values (e.g., values less than the maximum value), the value of the constraint on frame reordering latency is set to max_latency_frames.
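The third implementation reserves the largest codable value as the "no constraint" marker. The text does not give a field width, so the 8-bit fixed-length field below (FIELD_BITS) is a hypothetical choice for illustration.

```python
from typing import Optional

FIELD_BITS = 8                    # hypothetical fixed-length field width
SENTINEL = (1 << FIELD_BITS) - 1  # maximum codable value means "no constraint"


def encode_sentinel(constraint: Optional[int]) -> int:
    """max_latency_frames value; SENTINEL signals that no constraint applies."""
    if constraint is None:
        return SENTINEL
    if not 0 <= constraint < SENTINEL:
        raise ValueError("constraint must be smaller than the sentinel value")
    return constraint


def decode_sentinel(max_latency_frames: int) -> Optional[int]:
    return None if max_latency_frames == SENTINEL else max_latency_frames


print(encode_sentinel(None), decode_sentinel(7))  # 255 7
```

Compared with the "plus1" form, the sentinel gives up the largest value instead of the smallest, so small constraints keep their natural code values.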

In a fourth implementation, the constraint on frame reordering latency is indicated relative to the maximum size of frame memory for reordering. For example, the latency constraint is signaled as an increase over the value of the num_reorder_frames syntax element. Ordinarily, the constraint on frame reordering latency (in terms of frames) is greater than or equal to num_reorder_frames. To save bits when signaling the latency constraint, the difference between the latency constraint and num_reorder_frames is encoded (e.g., using unsigned exponential-Golomb coding or some other form of entropy coding) and then signaled. The syntax element max_latency_increase_plus1 is signaled to indicate the constraint on frame reordering latency. If max_latency_increase_plus1 has a first value (e.g., 0), no constraint is imposed on frame reordering latency. For other values (e.g., non-zero values), the value of the constraint on frame reordering latency is set to num_reorder_frames + max_latency_increase_plus1 - 1. For example, the value of max_latency_increase_plus1 is in the range of 0 to 2³² - 2, inclusive.
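A minimal sketch of the fourth implementation's arithmetic, assuming num_reorder_frames has already been parsed from its own syntax element. Because the value is relative, this form cannot express a constraint smaller than num_reorder_frames; the encoder function rejects that case.

```python
from typing import Optional


def encode_increase_plus1(constraint: Optional[int], num_reorder_frames: int) -> int:
    """max_latency_increase_plus1 for a constraint relative to num_reorder_frames."""
    if constraint is None:
        return 0  # no constraint on frame reordering latency
    if constraint < num_reorder_frames:
        raise ValueError("constraint must be >= num_reorder_frames")
    return constraint - num_reorder_frames + 1


def decode_increase_plus1(max_latency_increase_plus1: int,
                          num_reorder_frames: int) -> Optional[int]:
    """Constraint = num_reorder_frames + max_latency_increase_plus1 - 1."""
    if max_latency_increase_plus1 == 0:
        return None
    return num_reorder_frames + max_latency_increase_plus1 - 1


print(encode_increase_plus1(7, 1))  # FIG. 5A: num_reorder_frames 1, MaxLatencyFrames 7 -> 7
print(encode_increase_plus1(3, 2))  # FIG. 5E: num_reorder_frames 2, MaxLatencyFrames 3 -> 2
```

For FIG. 5E, the value 2 costs 3 bits with ue(v) coding (011), whereas signaling the same constraint as max_latency_frames_plus1 = 4 would cost 5 bits (00101).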

Alternatively, the one or more syntax elements that indicate the constraint on frame reordering latency are signaled in some other way.

D. Other Ways of Indicating Latency Constraints
In many of the preceding examples, the latency constraint is a constraint on frame reordering latency expressed as a count of frames. More generally, the latency constraint is a constraint on delay that can be expressed as a count of frames, or in terms of seconds, milliseconds, or another time measure. For example, the latency constraint can be expressed as an absolute time measure such as 1 second or 0.5 seconds. An encoder can convert such a time measure to a count of frames (taking into account the frame rate of the video), and then encode the video so that the inter-frame dependencies between the frames of the video sequence are consistent with that frame count. Or, regardless of frame reordering and inter-frame dependencies, the encoder can use the time measure to limit the extent to which delay is used to smooth out short-term fluctuations in the bit rate of the encoded video, encoding complexity, network bandwidth, and so on. A decoder can use the time measure to determine when a frame can be output from the decoded picture buffer.
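Converting a time-based constraint to a frame count only needs the frame rate. The sketch below uses exact rational arithmetic so that rates such as 30000/1001 frames per second are handled without floating-point error. Rounding down, so that the frame-count constraint never exceeds the time budget, is an assumption; the text does not specify a rounding rule.

```python
import math
from fractions import Fraction


def latency_frames_from_seconds(seconds: str, fps_num: int, fps_den: int = 1) -> int:
    """Convert a latency budget in seconds to a whole number of frames.

    The frame rate is fps_num / fps_den frames per second. The result is
    rounded down (assumed policy) so it stays within the time budget.
    """
    frames = Fraction(seconds) * Fraction(fps_num, fps_den)
    return math.floor(frames)


print(latency_frames_from_seconds("0.5", 30))           # 15
print(latency_frames_from_seconds("0.5", 30000, 1001))  # 14 (about 29.97 frames/s)
print(latency_frames_from_seconds("1", 25))             # 25
```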

A latency constraint can be negotiated between a sender and a receiver, trading off responsiveness (lack of delay) against the ability to smooth out short-term fluctuations in the bit rate of the encoded video, in encoding complexity, or in network bandwidth, and/or another factor that benefits from increased delay. In such negotiations, it can be useful to define and characterize the latency constraint in a way that is independent of frame rate. The constraint can then be applied during encoding and decoding, taking into account the frame rate of the video. Or, the constraint can be applied during encoding and decoding regardless of the frame rate of the video.

E. Generalized Techniques for Setting and Outputting Syntax Elements
FIG. 6 shows an example technique (600) for setting and outputting syntax elements that facilitate reduced-latency decoding. For example, a real-time communication tool or encoding tool described with reference to FIGS. 2A and 2B performs the technique (600). Alternatively, another tool performs the technique (600).

To start, the tool sets (610) one or more syntax elements that indicate a constraint on latency (e.g., frame reordering latency, or latency in terms of a time measure) consistent with the inter-frame dependencies between multiple frames of a video sequence. When the tool includes a video encoder, the same tool can also receive the frames, encode them to produce encoded data (using inter-frame dependencies consistent with the constraint on frame reordering latency), and output the encoded data for storage or transmission.

Typically, the constraint on frame reordering latency is a reordering latency allowed for any frame in the video sequence. The constraint can, however, be expressed in various ways and have various other meanings. For example, the constraint can be expressed in terms of the maximum count of frames that can precede a given frame in output order but follow the given frame in coded order. Or, the constraint can be expressed as the maximum difference between coded order and output order for any frame in the video sequence. Or, focusing on individual frames, the constraint can be expressed as the reordering latency associated with a given, specific frame in the video sequence. Or, focusing on groups of frames, the constraint can be expressed as the reordering latency associated with a group of frames in the video sequence. Or, the constraint can be expressed in some other way.

Next, the tool outputs (620) the one or more syntax elements, which facilitates determination of when reconstructed frames are ready for output in terms of the output order of the multiple frames. The syntax elements can be output as part of a sequence parameter set or picture parameter set in an elementary coded video bitstream; as part of the syntax of a media storage file or media transmission stream that also includes encoded data for the frames; as part of a media properties negotiation protocol (e.g., during exchange of stream or session parameter values in system-level negotiation); as part of media system information multiplexed with the encoded data for the frames; or as part of media metadata associated with the encoded data for the frames (e.g., in an SEI message or VUI message). Different syntax elements can be output to indicate memory capacity requirements. For example, a buffer size syntax element (such as max_dec_frame_buffering) can indicate the maximum size of the DPB, and a frame memory syntax element (such as num_reorder_frames) can indicate the maximum size of frame memory for reordering.

The value of the latency constraint can be represented in various ways, as described in Section V.C. For example, the tool outputs a flag that indicates the presence or absence of the one or more syntax elements. If the flag indicates that the syntax elements are absent, the latency constraint is undefined or has a default value. Otherwise, the syntax elements follow the flag and indicate the latency constraint. Or, one value of the syntax elements indicates that the latency constraint is undefined or has a default value, and other possible values of the syntax elements indicate an integer value for the latency constraint. Or, when the latency constraint is a constraint on frame reordering latency, a given value of the syntax elements indicates an integer value for the constraint relative to the maximum size of frame memory for reordering, which is indicated by a different syntax element such as num_reorder_frames. Alternatively, the latency constraint is represented in some other way.

In some implementations, the frames of the video sequence are organized according to a temporal hierarchy. In this case, different syntax elements can indicate different constraints on frame reordering latency for different temporal layers of the temporal hierarchy.

F. Generalized Techniques for Receiving and Using Syntax Elements
FIG. 7 shows an example technique (700) for receiving and using syntax elements that facilitate reduced-latency decoding. For example, a real-time communication tool or playback tool described with reference to FIGS. 2A and 2B performs the technique (700). Alternatively, another tool performs the technique (700).

To start, the tool receives and parses (710) one or more syntax elements that indicate a constraint on latency (e.g., frame reordering latency, or latency in terms of a time measure). For example, the parsing includes reading the one or more syntax elements that indicate the latency constraint from a bitstream. The tool also receives (720) encoded data for multiple frames of a video sequence. The tool can parse the syntax elements and, based on them, determine the latency constraint. Typically, the constraint on frame reordering latency is a reordering latency allowed for any frame in the video sequence. As described in the previous section, however, the constraint can be expressed in various ways and have various other meanings. The syntax elements can be signaled as part of a sequence parameter set or picture parameter set in an elementary coded video bitstream, as part of the syntax of a media storage file or media transmission stream, as part of a media properties negotiation protocol, as part of media system information multiplexed with the encoded data, or as part of media metadata associated with the encoded data. The tool can also receive and parse different syntax elements that indicate memory capacity requirements, for example, a buffer size syntax element such as max_dec_frame_buffering and a frame memory syntax element such as num_reorder_frames.

The value of the latency constraint can be represented in various ways, as described in Section V.C. For example, the tool receives a flag that indicates the presence or absence of the one or more syntax elements. If the flag indicates that the syntax elements are absent, the latency constraint is undefined or has a default value. Otherwise, the syntax elements follow the flag and indicate the latency constraint. Or, one value of the syntax elements indicates that the latency constraint is undefined or has a default value, and other possible values of the syntax elements indicate an integer value for the latency constraint. Or, when the latency constraint is a constraint on frame reordering latency, a given value of the syntax elements indicates an integer value for the constraint relative to the maximum size of frame memory for reordering, which is indicated by a different syntax element such as num_reorder_frames. Alternatively, the latency constraint is signaled in some other way.

Returning to FIG. 7, the tool decodes (730) at least some of the encoded data to reconstruct one of the frames. The tool outputs (740) the reconstructed frame. In doing so, the tool can use the latency constraint to determine when the reconstructed frame is ready for output, for example, in terms of the output order of the frames of the video sequence.

In some implementations, the frames of the video sequence are organized according to a temporal hierarchy. In this case, different syntax elements can indicate different constraints on frame reordering latency for different temporal layers of the temporal hierarchy. The tool can select one of the different constraints on frame reordering latency depending on the temporal resolution of the output.
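Per-layer constraints can be derived by dropping the layers that are not decoded and re-measuring the remaining frames. The sketch below applies this to FIG. 5E, with layer assignments taken from the text and a coded order reconstructed from the frame labels (an assumption). Representing each frame as an (output_index, temporal_layer) tuple and indexing the result by highest decoded layer are hypothetical choices for illustration.

```python
def latency_constraint_per_layer(coded: list[tuple[int, int]],
                                 num_layers: int) -> list[int]:
    """For each highest decoded temporal layer t, return the maximum count of
    frames that precede some frame in output order but follow it in coded
    order, counting only frames whose temporal layer is <= t.

    coded[c] = (output_index, temporal_layer) for the frame c-th in coded order.
    """
    constraints = []
    for t in range(num_layers):
        sub = [o for o, layer in coded if layer <= t]  # frames kept at layer t
        worst = 0
        for c, o in enumerate(sub):
            worst = max(worst, sum(1 for o2 in sub[c + 1:] if o2 < o))
        constraints.append(worst)
    return constraints


def select_constraint(per_layer: list[int], highest_decoded_layer: int) -> int:
    """Pick the constraint that matches the temporal resolution being output."""
    return per_layer[highest_decoded_layer]


# FIG. 5E: layer 0 = F0, F4, F8; layer 1 = F2, F6; layer 2 = the remaining frames.
SERIES_5E_LAYERED = [(0, 0), (4, 0), (2, 1), (1, 2), (3, 2),
                     (8, 0), (6, 1), (5, 2), (7, 2)]
per_layer = latency_constraint_per_layer(SERIES_5E_LAYERED, num_layers=3)
print(per_layer)                        # [0, 1, 3]
print(select_constraint(per_layer, 1))  # 1
```

The values 0 and 1 for the lower layers match the reduced constraints described for FIG. 5E, and 3 matches its MaxLatencyFrames value for the full series.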

In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only preferred examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the appended claims. Applicants therefore claim as their invention all that comes within the scope and spirit of these claims.

Claims

In a computing system that implements a video decoder, a method comprising:
Receiving and parsing one or more syntax elements that indicate a constraint on frame reordering latency, wherein the constraint on frame reordering latency is expressed in terms of a maximum count of frames that can precede any frame of a video sequence in output order but follow that frame in coded order;
Receiving encoded data for multiple frames of the video sequence;
With the video decoder, decoding at least some of the encoded data to reconstruct one of the multiple frames; and
Outputting the reconstructed frame.

The method of claim 1, further comprising:
Determining the constraint on frame reordering latency based on the one or more syntax elements; and
Using the constraint on frame reordering latency to determine when the reconstructed frame is ready for output in terms of the output order of the multiple frames of the video sequence.

The method of claim 2, wherein the multiple frames of the video sequence are organized according to a temporal hierarchy, and wherein different syntax elements indicate different constraints on frame reordering latency for different temporal layers of the temporal hierarchy, the method further comprising selecting one of the different constraints on frame reordering latency depending on the temporal resolution of the output.

The method of claim 1, wherein the constraint on frame reordering latency is a reordering latency allowed for any frame in the video sequence.

The method of claim 1, wherein the one or more syntax elements and the encoded data are signaled as part of the syntax of an encoded video bitstream, the method further comprising:
Receiving and parsing a buffer size syntax element that indicates a maximum size of a decoded picture buffer and a frame memory syntax element that indicates a maximum size of frame memory for reordering, wherein the buffer size syntax element and the frame memory syntax element are different from the one or more syntax elements that indicate the constraint on frame reordering latency.

The method of claim 1, wherein the one or more syntax elements are signaled as part of a sequence parameter set, a picture parameter set, syntax of a media storage file that also includes the encoded data, syntax of a media transmission stream that also includes the encoded data, a media properties negotiation protocol, media system information multiplexed with the encoded data, or media metadata associated with the encoded data.

The method of claim 1, wherein one possible value of the one or more syntax elements indicates that the constraint on frame reordering latency is undefined or has a default value, and other possible values of the one or more syntax elements indicate an integer value for the constraint on frame reordering latency; or wherein a value of the one or more syntax elements indicates an integer value for the constraint on frame reordering latency relative to a maximum size of frame memory for reordering, the maximum size of frame memory for reordering being indicated by a different syntax element.

In a computing system, a method comprising:
Setting one or more syntax elements that indicate a constraint on frame reordering latency consistent with inter-frame dependencies between multiple frames of a video sequence, wherein the constraint on frame reordering latency is expressed in terms of a maximum count of frames that can precede any frame of the video sequence in output order but follow that frame in coded order; and
Outputting the one or more syntax elements, thereby facilitating determination of when reconstructed frames are ready for output in terms of the output order of the multiple frames.

The method of claim 8, wherein the computing system implements a video encoder, the method further comprising:
Receiving the multiple frames of the video sequence;
With the video encoder, encoding the multiple frames to produce encoded data, wherein the encoding uses the inter-frame dependencies that are consistent with the constraint on frame reordering latency; and
Outputting the encoded data for storage or transmission.

A computing system comprising a processor, memory, and storage, wherein the computing system is adapted to perform a method comprising:
Receiving and parsing one or more syntax elements that indicate a constraint on frame reordering latency;
Determining the constraint on frame reordering latency based on the one or more syntax elements, wherein the constraint on frame reordering latency is expressed in terms of a maximum count of frames that can precede a given frame in output order but follow the given frame in coded order;
Receiving encoded data for multiple frames of a video sequence;
With a video decoder, decoding at least some of the encoded data to reconstruct one of the multiple frames; and
Outputting the reconstructed frame, including using the constraint on frame reordering latency to determine when the reconstructed frame is ready for output in terms of the output order of the multiple frames of the video sequence.

The method of claim 1, further comprising:
Receiving and parsing one or more syntax elements that indicate a maximum size of frame memory for reordering, wherein the maximum size of frame memory for reordering is expressed in terms of a maximum count of frames that can precede any frame of the video sequence in coded order but follow that frame in output order.

The method of claim 1, wherein a value of the one or more syntax elements indicates an integer value for the constraint on frame reordering latency relative to a maximum size of frame memory for reordering, the maximum size of frame memory for reordering being indicated by a different syntax element, and
Wherein the constraint on frame reordering latency is determined as the maximum count for the maximum size of frame memory for reordering, plus the integer value for the constraint on frame reordering latency, minus 1.

The method of claim 8, further comprising:
Setting one or more syntax elements that indicate a maximum size of frame memory for reordering, wherein the maximum size of frame memory for reordering is expressed in terms of a maximum count of frames that can precede any frame of the video sequence in coded order but follow that frame in output order; and
Outputting the one or more syntax elements that indicate the maximum size of frame memory for reordering.

The method of claim 8, wherein a value of the one or more syntax elements indicates an integer value for the constraint on frame reordering latency relative to a maximum size of frame memory for reordering, the maximum size of frame memory for reordering being indicated by a different syntax element, and
Wherein the constraint on frame reordering latency is determined as the maximum count for the maximum size of frame memory for reordering, plus the integer value for the constraint on frame reordering latency, minus 1.