JP2014509114A

JP2014509114A - Bitstream subset instructions

Info

Publication number: JP2014509114A
Application number: JP2013550446A
Authority: JP
Inventors: トマスルセルト，; リカードスイェベルイ，; ツァンフェイウー，
Original assignee: テレフオンアクチーボラゲットエルエムエリクソン（パブル）
Priority date: 2011-01-19
Filing date: 2012-01-19
Publication date: 2014-04-10
Anticipated expiration: 2032-01-19
Also published as: WO2012099529A1; KR20130119479A; MA34944B1; US20160099988A1; SG191748A1; CN103404140B; EP2666296A1; US9485287B2; EP2666296A4; ZA201304569B; JP5553945B2; US20130287123A1; CN103404140A; US9143783B2; KR101560956B1

Abstract

A method is provided for indicating bitstream subsets within a video bitstream (210). The method comprises receiving the bitstream, dividing the bitstream into video packets (211 to 216), each containing either video data or supplemental information, and marking each packet with a single subset identifier (stream_id). Each subset identifier is associated with a corresponding bitstream subset (221 to 223). Further provided is a method for extracting video packets from a video bitstream. The method comprises providing relevant subset identifiers, receiving video packets from the bitstream, and, for each received video packet, examining the subset identifier of the packet. A packet is extracted if its subset identifier matches one of the relevant subset identifiers. In this way, the properties of a bitstream subset can be summarized in a single identifier, which simplifies the processing of video packets within the network and on the client side. Furthermore, apparatuses corresponding to the above-described methods are provided.

Description

The present invention relates to a method and apparatus for indicating bitstream subsets in a compressed video bitstream, and to a method and apparatus for extracting video packets from a compressed video bitstream. The invention further relates to computer programs and computer program products.

H.264, also known as MPEG-4/Advanced Video Coding (AVC), is the state-of-the-art video coding standard. It is a hybrid codec that exploits the removal of redundancy both within each video frame and between frames. The output of the encoding process is video coding layer (VCL) data, which is further encapsulated into network abstraction layer (NAL) units before transmission or storage. Apart from video data, NAL units can carry other data: parameter sets, such as sequence parameter sets (SPS) and picture parameter sets (PPS), which carry data essential for decoding the VCL data, such as the video resolution or the required decoder capabilities; or supplemental enhancement information (SEI), which carries information that is useful to decoders or network elements but is not essential for decoding the VCL data.

The NAL is designed to enable simple, effective, and flexible use of the VCL by a broad variety of systems for the transport and storage of video data, such as transmission over the Real-time Transport Protocol (RTP) or the Hypertext Transfer Protocol (HTTP), or storage in ISO file formats. The NAL unit concept is intended to give networks, i.e., transmission and storage systems, a means to access, group, and manipulate compressed bitstreams by dividing them into logical units. For example, a unit corresponding to one compressed picture is supplemented with high-level information that indicates to the network whether the coded picture can be used as a random access point for starting to decode the compressed video.

The NAL unit is the minimum-size functional unit of H.264/AVC video. A NAL unit can be subdivided into a NAL unit header and a NAL unit payload. The NAL unit header consists of a set of identifiers used by the network to manage the compressed bitstream. For example, to reduce the transmission bit rate of a video when bandwidth is limited, some NAL units may be discarded based on the information carried in the NAL unit headers, in such a way that the quality degradation caused by discarding video data is minimized. This process is referred to as "bitstream thinning".
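The header-based thinning described above can be illustrated with a minimal Python sketch. It relies on the one-byte AVC NAL unit header layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The thinning policy shown, dropping units whose nal_ref_idc is 0, is only one simple assumption chosen for illustration, and the helper names are hypothetical.

```python
from typing import List, NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte AVC NAL unit header into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


def thin(nal_units: List[bytes]) -> List[bytes]:
    """Illustrative policy (an assumption): drop units with nal_ref_idc == 0,
    i.e. units that are not needed as a reference by other pictures."""
    return [u for u in nal_units if parse_nal_header(u[0]).nal_ref_idc != 0]


# Hypothetical units: only the first byte (the header) matters here.
units = [b"\x67", b"\x41\x00", b"\x01\x00"]
thinned = thin(units)
```

Because nal_unit_type is a 5-bit field, only 32 type values exist, which is the limitation on the number of NAL unit types that the description raises later.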

Conventional video services provide video in a single representation, i.e., with a fixed camera position and spatial resolution, but multi-resolution and multi-view video representations have recently gained importance. A multi-resolution representation represents the video at several different spatial resolutions, so as to serve target devices with different display resolutions. A multi-view representation represents the content from different camera viewpoints. A particular case is stereoscopic video, in which the scene is captured by two cameras separated by a distance similar to that between the human eyes. With suitable display technology, a perception of depth can be provided to the viewer.

Multi-resolution and multi-view video representations are often referred to as hierarchical or layered representations. In this case, a base layer represents a basic quality of the video, and successive enhancement layers amend the representation towards higher quality.

Scalable video coding (SVC) and multi-view video coding (MVC) are video coding standards used to compress multi-resolution and multi-view video representations, respectively. High compression efficiency is achieved by removing redundant information between different layers. SVC and MVC are based on the AVC standard and are included as Annexes G and H in later editions of AVC. They therefore share most of the AVC structure.

The hierarchical dependencies inherent in SVC and MVC bitstreams require additional information fields in the NAL unit header, such as decoding dependencies and view identifiers. However, to remain compatible with existing AVC implementations, the basic AVC NAL unit header was not changed. Instead, the additional information, such as dependency and view identifiers, was incorporated by introducing two new types of NAL units, namely the prefix NAL unit (type 14) and the coded slice extension NAL unit (type 20). These types are defined as "unused" in AVC and are therefore ignored by AVC decoders that do not support Annex G or H of the specification.
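As a rough illustration of this backward-compatibility mechanism, the following Python sketch models how a decoder without Annex G or H support would skip NAL units of types 14 and 20. The function name and the sample byte strings are hypothetical; only the two type numbers and the position of nal_unit_type in the low five bits of the first header byte come from the AVC design.

```python
from typing import List

PREFIX_NAL_UNIT = 14        # prefix NAL unit
CODED_SLICE_EXTENSION = 20  # coded slice extension NAL unit


def units_seen_by_avc_decoder(nal_units: List[bytes]) -> List[bytes]:
    """Keep only the units that a plain AVC decoder would process,
    skipping the two types introduced for SVC and MVC."""
    ignored = {PREFIX_NAL_UNIT, CODED_SLICE_EXTENSION}
    # nal_unit_type occupies the low five bits of the first header byte.
    return [u for u in nal_units if (u[0] & 0x1F) not in ignored]


# Hypothetical stream: prefix NAL (14), base layer slice (5), extension slice (20).
stream = [b"\x6e\x00", b"\x65\xaa", b"\x74\xbb"]
base_layer_only = units_seen_by_avc_decoder(stream)
```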

A prefix NAL unit is associated with the VCL AVC NAL unit that is supposed to follow it immediately in the bitstream, and it conveys additional information pertaining to the base layer. An AVC decoder can decode the base layer by ignoring the prefix NAL units.

The coded slice extension NAL unit is used only in SVC or MVC enhancement layers. It represents enhancement information relative to the base layer or to other enhancement layers. Besides conveying dependency and view identifiers in the same way as the prefix NAL unit, a coded slice extension NAL unit consists of both an SVC or MVC NAL unit header and the corresponding VCL data. It is thus a combination of a prefix NAL unit and a VCL AVC NAL unit. SVC and MVC enhancement layer NAL units are ignored by AVC decoders.

The SVC and MVC extensions of AVC are defined in a similar way. Their use is mutually exclusive, i.e., the syntax and semantics defined in the standard partly conflict, so that SVC elements and MVC elements cannot be used at the same time. Combining SVC and MVC features would require changes to the standard, in particular to the definition of the NAL unit header.

HEVC is a next-generation video coding standard that is currently being standardized. HEVC aims to improve coding substantially compared with AVC, especially for high-resolution video sequences.

With regard to the design of the high-level syntax, the simplest approach is to adopt the high-level syntax concepts of AVC, and in particular the AVC NAL unit concept. However, this gives rise to the following problems.

According to the state of the art, SVC and MVC are built on top of AVC in a backward-compatible manner. The new NAL unit type 20 is designed with a header extension that can be used for any enhancement layer. To accommodate legacy AVC decoders, the conventional NAL units (type 1, type 5, and other types) are retained, and a prefix NAL unit is associated with each ordinary AVC VCL NAL unit (type 1 and type 5). This approach could in principle be used for HEVC and its later extensions, but it has the following associated problems.
- Introducing a new feature or functionality requires the definition of a new NAL unit type, e.g., the coded slice extension NAL unit. This is undesirable because the maximum number of NAL unit types is typically limited, e.g., by the defined length of the NAL unit type field.
- To accommodate legacy decoders, the base layer must be created using the conventional NAL unit types together with prefix NAL units, which requires a second new NAL unit type to be designed. The number of NAL unit types therefore increases further.
- The signaling of the base layer and the enhancement layers is not uniform and requires special treatment in the network for each layer, which complicates implementations. The use of prefix NAL units is unnatural and provides only a weak link between the necessary header information and the corresponding VCL data. This link can easily break, e.g., if one of the NAL units is lost during transmission.
- For future extensions, nesting of prefix NAL units is complicated.
- If the high-level interface is extended through additional NAL unit headers, network functionality that processes NAL units based on the information conveyed in the NAL unit headers has to be updated every time the NAL unit headers are extended.

A further problem associated with the state-of-the-art AVC concepts relates to layered representations. Currently, in SVC and MVC, all flags relating to layer properties, such as view_id, dependency_id, and quality_id, are simply placed in the NAL unit header without any intelligent selection or classification. A client that receives a bitstream and wants to thin or otherwise manipulate it must have detailed knowledge of the flag definitions. Essentially, the client has to understand fully the meaning of each flag and how the flags relate to one another. Erroneous behavior can easily occur: for example, when one view is to be extracted from a multi-view bitstream, the views on which it depends may not be included, or a low-quality version may be selected if the client considers only the view_id flag. Even with some assistance from SEI elements, it is very difficult for the network to find and understand all the information needed to extract a particular video representation from a layered bitstream.

Furthermore, as more applications and standards cover 3D, new data elements such as depth maps and occlusion maps are transmitted together with the texture, which allows more flexible rendering of output views at the receiver. Since such elements form a layered representation together with the (multi-view or scalable) "texture" video, it is desirable to transmit them all in the same bitstream. Alternatively, such bundling of different data elements may be achieved through signaling at a higher system level, such as the transport protocol or the file format. However, software and hardware implementations of such higher-level protocols are often separate from the implementations of video decompression, so accurate temporal synchronization of different data elements, such as synchronization of texture with depth, is very complicated unless it is supported at the bitstream level. Note that the synchronization of different video data elements, such as texture and depth, must be much tighter than the synchronization of video and audio, because the different video elements have to be frame-aligned. In addition, video elements such as texture and depth may be compressed jointly, e.g., by reusing motion information ("motion vectors") between them, which requires tight coupling at the bitstream level.

The initial focus of HEVC development is on mono video only. However, it may later be extended to scalable coding and/or multi-view coding. It is also likely that a packetization concept similar to the NAL unit concept of AVC will be used. In the following, therefore, although the presented methods are primarily applicable to future video coding standards such as HEVC, the term "NAL unit" is used in the same sense as defined in AVC. Likewise, other AVC concepts such as SPS, PPS, and SEI are expected to be used in HEVC, so the AVC terminology is used in the following, even though these concepts may be named differently in HEVC or in some other future video coding standard.

It is an object of the present invention to provide an improved alternative to the above techniques and the prior art.

More particularly, it is an object of the present invention to provide an improved generic syntax for future video coding standards that facilitates layered video representations.

These and other objects of the invention are achieved by the various aspects of the invention as defined by the independent claims. Embodiments of the invention are characterized by the dependent claims.

For the purpose of describing the invention, it is assumed that a video signal is encoded into a compressed video bitstream, transmitted over a network, e.g., a local area network, a mobile telephone network, or the Internet, and decoded at a client, e.g., a television, a computer, a video player, or a mobile telephone. The network may include multiple network elements, such as routers and switches.

According to a first aspect of the invention, a method of indicating bitstream subsets in a compressed video bitstream is provided. The compressed video bitstream comprises a plurality of bitstream subsets, i.e., at least two bitstream subsets. The method comprises receiving the compressed video bitstream, dividing the video bitstream into video packets, and marking each video packet with a single subset identifier of a plurality of subset identifiers. Each video packet comprises either video data or supplemental information. Each subset identifier of the plurality of subset identifiers is associated with a corresponding bitstream subset of the plurality of bitstream subsets.
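The marking step of the first aspect can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the packetization is taken as already done, a packet is modeled as a small data class rather than a real NAL unit, and the layer names and stream_id values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class VideoPacket:
    stream_id: int   # the single subset identifier carried by the packet
    payload: bytes   # either video data or supplemental information


def mark_packets(
    units: Iterable[Tuple[str, bytes]], stream_ids: Dict[str, int]
) -> List[VideoPacket]:
    """Mark every packetized unit with the stream_id of the subset it belongs to."""
    return [VideoPacket(stream_ids[layer], payload) for layer, payload in units]


# Hypothetical mapping from layers (bitstream subsets) to identifiers.
STREAM_IDS = {"parameter_sets": 0, "texture_base": 1, "depth_base": 2}

units = [
    ("parameter_sets", b"sps"),
    ("texture_base", b"t0"),
    ("depth_base", b"d0"),
    ("texture_base", b"t1"),
]
packets = mark_packets(units, STREAM_IDS)
```

Each packet ends up with exactly one identifier, so a downstream element needs to look at nothing else to decide which subset the packet belongs to.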

According to a second aspect of the invention, a computer program is provided. The computer program comprises computer program code. The computer program code is adapted to be executed to implement the method according to the first aspect of the invention.

According to a third aspect of the invention, a computer program product is provided. The computer program product comprises a computer-readable medium. The computer program according to the second aspect of the invention is embodied in the computer-readable medium.

According to a fourth aspect of the invention, a method of extracting video packets from a compressed video bitstream is provided. The compressed video bitstream is divided into video packets and comprises a plurality of bitstream subsets. Each video packet comprises either video data or supplemental information. Each video packet further comprises a single subset identifier of a plurality of subset identifiers. Each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets. The method comprises providing at least one relevant subset identifier and receiving video packets from the compressed video bitstream. The method further comprises, for each received video packet, examining the subset identifier of the video packet and extracting the video packet from the compressed video bitstream. The video packet is extracted from the compressed video bitstream on condition that the examined subset identifier matches one of the at least one relevant subset identifier.
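The extraction method of the fourth aspect reduces to a membership test on the subset identifier, which the following Python sketch illustrates. The packet model (a pair of stream_id and payload) and the sample values are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, Set, Tuple

Packet = Tuple[int, bytes]  # (stream_id, payload); a simplified packet model


def extract_packets(bitstream: Iterable[Packet], relevant_ids: Set[int]) -> Iterator[Packet]:
    """Yield the packets whose subset identifier matches a relevant identifier."""
    for stream_id, payload in bitstream:
        # Only the identifier is examined; the payload is never parsed.
        if stream_id in relevant_ids:
            yield (stream_id, payload)


# Hypothetical bitstream with four subsets multiplexed together.
bitstream = [(0, b"params"), (1, b"tex0"), (2, b"depth0"), (1, b"tex1"), (3, b"occ0")]
extracted = list(extract_packets(bitstream, relevant_ids={0, 1}))
```

The extractor does not need to know what the identifiers 0 and 1 mean; it only needs the set of relevant identifiers.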

According to a fifth aspect of the invention, another computer program is provided. The computer program comprises computer program code. The computer program code is adapted to be executed to implement the method according to the fourth aspect of the invention.

According to a sixth aspect of the invention, a computer program product is provided. The computer program product comprises a computer-readable medium. The computer program according to the fifth aspect of the invention is embodied in the computer-readable medium.

According to a seventh aspect of the invention, a bitstream marker for indicating bitstream subsets in a compressed video bitstream is provided. The compressed video bitstream comprises a plurality of bitstream subsets. The bitstream marker comprises a receiving unit, a packetizing unit, and a marking unit. The receiving unit is arranged to receive the compressed video bitstream. The packetizing unit is arranged to divide the compressed video bitstream into video packets. Each video packet comprises either video data or supplemental information. The marking unit is arranged to mark each video packet with a single subset identifier of a plurality of subset identifiers. Each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets.

According to an eighth aspect of the invention, a bitstream extractor for extracting video packets from a compressed video bitstream is provided. The compressed video bitstream is divided into video packets and comprises a plurality of bitstream subsets. Each video packet comprises either video data or supplemental information. Each video packet further comprises a single subset identifier of a plurality of subset identifiers. Each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets. The bitstream extractor comprises a subset selecting unit, a receiving unit, and an extracting unit. The subset selecting unit is arranged to provide at least one relevant subset identifier. The receiving unit is arranged to receive video packets from the compressed video bitstream. The extracting unit is arranged, for each received video packet, to examine the subset identifier of the video packet and to extract the video packet from the compressed video bitstream. The video packet is extracted from the compressed video bitstream on condition that the examined subset identifier matches one of the at least one relevant subset identifier.

The invention makes use of the insight that the layered bitstream concept of state-of-the-art video coding standards can be generalized so as to allow different bitstream subsets to be identified. Each bitstream subset then represents a layer with which particular properties are associated. For example, each bitstream subset represents either a VCL layer carrying video data, such as a texture base view layer, a depth map high-quality layer, or a temporal occlusion map layer, or a non-VCL layer carrying non-video data, i.e., supplemental information such as parameter sets. This is achieved by associating each layer, i.e., each bitstream subset, with a stream identifier (stream_id). The stream identifier is in turn associated with parameters that describe the properties of the layer, such as a particular view identifier (view_id) or dependency identifier (dependency_id). The stream_id is signaled in the NAL unit header.

Consolidating all the different properties of a particular layer into a single identifier, stream_id, simplifies the process of interpreting and identifying video packets within the network and on the client side. The proposed syntax enables a clean and extensible system design for network-friendly, high-level signaling of video bitstreams. It is particularly suited to layered representations and is therefore compatible with future video codecs and applications. Signaling a video bitstream according to an embodiment of the invention is advantageous in that it mitigates the problems inherent in state-of-the-art video coding standards, in particular in the AVC NAL unit concept described above.

More specifically, new functionality requires neither the definition of new NAL unit types nor an update of the NAL unit header syntax. Furthermore, because the layer properties are summarized in a single stream_id, network elements and clients that process the video bitstream need no detailed knowledge of all the information elements used in the NAL unit header, i.e., identifiers, indicators, parameters, or flags. Rather, knowledge of the relevant stream_ids is sufficient. Finally, signaling a layered video bitstream, i.e., multiple bitstream subsets multiplexed into one compressed video bitstream, is advantageous in that accurate temporal synchronization can be achieved more easily than with solutions that rely on higher-level signaling. In addition, redundancy between related layers of a video representation may be exploited when compressing the video signal.

Although it has been stated that each video packet in the compressed video bitstream is marked with a single subset identifier, embodiments of the invention are also conceivable in which only a subset of all the video packets comprised in the compressed video bitstream are marked with a single subset identifier.

According to an embodiment of the invention, the method further comprises providing at least one subset definition. Each subset definition describes properties of a corresponding bitstream subset of the plurality of bitstream subsets. Using subset definitions to define the properties of the associated bitstream subsets is advantageous in that the properties of the corresponding video layers are provided explicitly to network elements and clients.

According to an embodiment of the invention, the at least one subset definition is provided as a video packet in the compressed video bitstream. A video packet may carry several subset definitions, each corresponding to a different bitstream subset. A video packet containing one or more subset definitions may be carried in a stream parameter set (StPS). This is advantageous in that the subset definitions, which describe the properties of the associated bitstream subsets, i.e., video or parameter layers, are provided to network elements and clients together with the video signal. Each subset definition comprises information about at least one of temporal_id, view_id, quality_id, priority_id, or the type of data carried in the subset. It will be appreciated that one or more bitstream subsets may be reserved for signaling subset definitions or other parameters. Such a reserved bitstream subset may be associated with a predefined stream_id known to network elements and clients, e.g., stream_id = 0.
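One way to picture a set of subset definitions is as a table keyed by stream_id. The Python sketch below is an in-memory illustration only: the text does not specify a wire format for the StPS, so no serialization is attempted, and the class name, field layout, data type strings, and example values are all hypothetical. The reserved value 0 follows the example given in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

STPS_STREAM_ID = 0  # example from the text: subset reserved for signaling definitions


@dataclass(frozen=True)
class SubsetDefinition:
    stream_id: int
    data_type: str                     # kind of data carried, e.g. "texture" (assumed labels)
    view_id: Optional[int] = None
    temporal_id: Optional[int] = None
    quality_id: Optional[int] = None
    priority_id: Optional[int] = None


def build_definition_table(definitions: Iterable[SubsetDefinition]) -> Dict[int, SubsetDefinition]:
    """Index subset definitions by stream_id, rejecting duplicate identifiers."""
    table: Dict[int, SubsetDefinition] = {}
    for definition in definitions:
        if definition.stream_id in table:
            raise ValueError(f"duplicate stream_id {definition.stream_id}")
        table[definition.stream_id] = definition
    return table


# Hypothetical definitions, as they might be carried in the reserved subset.
table = build_definition_table([
    SubsetDefinition(stream_id=1, data_type="texture", view_id=0, quality_id=0),
    SubsetDefinition(stream_id=2, data_type="depth", view_id=0, quality_id=0),
    SubsetDefinition(stream_id=3, data_type="texture", view_id=1, quality_id=0),
])
```

A network element that has received this table can resolve any stream_id to its layer properties without parsing per-packet flags.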

According to an embodiment of the invention, each subset identifier of the plurality of subset identifiers may be a numerical value. The value of each subset identifier corresponds to the relative priority of the bitstream subset associated with it. In other words, the subset identifier stream_id of each bitstream subset indicates the importance of the video data carried by that bitstream subset. Using this information, a network element or client that needs to discard packets, e.g., because bandwidth is limited, discards packets with a high stream_id, which indicates low relevance, and keeps packets with a low stream_id, which indicates high relevance.
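The discard rule above can be sketched as follows. This is a minimal illustration, not the patented procedure: the `Packet` class, its `size` field, and the byte budget are hypothetical, and the only property taken from the text is that a lower stream_id means higher priority.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    stream_id: int  # subset identifier; a lower value means higher priority
    size: int       # payload size in bytes (hypothetical field)


def drop_low_priority(packets, byte_budget):
    """Keep whole subsets, most important first, while they fit the budget."""
    # Total size of each bitstream subset.
    total_by_id = {}
    for p in packets:
        total_by_id[p.stream_id] = total_by_id.get(p.stream_id, 0) + p.size

    # Admit subsets in order of increasing stream_id (decreasing priority).
    allowed, used = set(), 0
    for sid in sorted(total_by_id):
        if used + total_by_id[sid] > byte_budget:
            break
        used += total_by_id[sid]
        allowed.add(sid)

    # Preserve the original packet order for the kept packets.
    return [p for p in packets if p.stream_id in allowed]


packets = [Packet(0, 100), Packet(1, 50), Packet(2, 80), Packet(0, 100)]
kept = drop_low_priority(packets, byte_budget=260)
```

Dropping whole subsets rather than individual packets is a design choice made for this sketch; it keeps each remaining layer complete.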

According to an embodiment of the invention, the method further comprises providing at least one video representation definition. Each video representation definition includes at least one relevant subset identifier. The bitstream subsets associated with the at least one relevant subset identifier together form a decodable video representation. In other words, each video representation groups a plurality of stream_ids and their associated bitstream subsets to form a decodable video. This differs from a subset definition, which identifies a single bitstream subset that cannot be decoded independently. A video representation includes, for example, all texture information for a base view, or texture and depth map information. This is advantageous in that network elements or clients are provided with information about the bitstream subsets that need to be processed in order to successfully decode a particular video representation.

According to an embodiment of the invention, the at least one video representation definition is provided as a video packet in the compressed video bitstream. This is advantageous in that the information about the video representation, i.e., the list of stream_ids that need to be processed to form a decodable video, is signaled together with, i.e., multiplexed with, the video data. A video packet may carry multiple video representation definitions. Each video representation definition corresponds to a different decodable video representation having particular characteristics, i.e., properties. Video packets containing one or more video representation definitions may be carried in a bitstream subset reserved for this purpose, called a representation parameter set (RPS). The RPS may be associated with a predefined stream_id, e.g., stream_id = 0. It will further be appreciated that a video representation may be associated with a numerical value indicating its relative priority.

According to an embodiment of the invention, the method further comprises, for each received video packet, either forwarding or decoding the extracted video packet or discarding the received video packet. A received video packet is discarded if its subset identifier does not match any of the at least one relevant subset identifier. In other words, a received video packet is processed, i.e., forwarded or decoded, if its associated stream_id matches the list of relevant stream_ids, and is discarded otherwise. This is advantageous in that a network element or client handles a received video packet depending on whether it is a relevant video packet. For example, a client may be configured to process only particular bitstream subsets, e.g., a group of subsets that together form a decodable video, i.e., a video representation. Further, a network element may be configured to discard bitstream subsets considered less relevant, e.g., high-quality enhancement layers, when bandwidth is limited.
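A minimal sketch of this forward-or-discard decision is shown below. Packets are modeled as hypothetical `(stream_id, payload)` tuples, and the function name is an assumption made for illustration; only the matching rule comes from the text.

```python
def filter_packets(packets, relevant_ids):
    """Split packets into those to process and those to discard.

    A packet is processed (forwarded or decoded) if its stream_id is one
    of the relevant subset identifiers, and discarded otherwise.
    """
    relevant = set(relevant_ids)
    processed, discarded = [], []
    for stream_id, payload in packets:
        if stream_id in relevant:
            processed.append((stream_id, payload))
        else:
            discarded.append((stream_id, payload))
    return processed, discarded


received = [(0, b"base"), (2, b"enh-hi"), (1, b"enh-lo"), (0, b"base2")]
processed, discarded = filter_packets(received, relevant_ids=[0, 1])
```

Because the decision depends on a single integer comparison per packet, the same routine could serve a network element (forwarding) or a client (decoding).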

According to an embodiment of the invention, the method further comprises providing a subset definition. The subset definition describes properties of a corresponding bitstream subset of the plurality of bitstream subsets. The method further comprises using the subset identifier associated with the corresponding bitstream subset as the at least one relevant subset identifier. Using the stream_id contained in the subset definition is advantageous in that it allows network elements and clients to select the video packets specified by the subset definition for processing.

According to an embodiment of the invention, the method further comprises selecting the subset definition from a plurality of subset definitions. The subset definition is selected according to at least one property of the corresponding bitstream subset. This is advantageous in that, when multiple subset definitions are provided, a network element or client can select bitstream subsets with particular properties for processing. For example, a client may select a subset definition that contains a particular temporal_id, view_id, quality_id, or priority_id, or a particular indicator of the type of data carried in the subset.
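Selecting subset definitions by property might look like the sketch below. The dictionary layout and the property values are hypothetical; the property names (view_id, temporal_id, quality_id) are those listed in the text.

```python
def select_stream_ids(definitions, **wanted):
    """Return the stream_ids of all subset definitions whose properties
    match every requested key/value pair."""
    return [
        d["stream_id"]
        for d in definitions
        if all(d.get(key) == value for key, value in wanted.items())
    ]


# Hypothetical subset definitions, for illustration only.
definitions = [
    {"stream_id": 1, "view_id": 0, "temporal_id": 0, "quality_id": 0},
    {"stream_id": 2, "view_id": 0, "temporal_id": 1, "quality_id": 0},
    {"stream_id": 3, "view_id": 1, "temporal_id": 0, "quality_id": 0},
]

base_view_ids = select_stream_ids(definitions, view_id=0)
base_view_lowest_rate = select_stream_ids(definitions, view_id=0, temporal_id=0)
```

The resulting stream_ids can then serve as the relevant subset identifiers when extracting packets.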

According to an embodiment of the invention, the method further comprises receiving the subset definition from a video packet in the compressed video bitstream. This is advantageous in that the subset definition, which describes the properties of the associated bitstream subset, i.e., video or parameter layer, is received by network elements and clients together with the video data.

According to an embodiment of the invention, the method further comprises receiving the video representation definition from a video packet in the compressed video bitstream. This is advantageous in that the information about the video representation, i.e., the plurality of stream_ids that need to be processed to form a decodable video, is received together with the video data.

Although advantages of the invention have in some cases been described with reference to embodiments of the methods according to the first and fourth aspects of the invention, corresponding reasoning applies to embodiments of the computer programs according to the second and fifth aspects, the computer program products according to the third and sixth aspects, and the devices according to the seventh and eighth aspects of the invention.

Further objects, features, and advantages of the invention will become apparent when studying the following detailed disclosure, the drawings, and the appended claims. Those skilled in the art will appreciate that different features of the invention can be combined to create embodiments other than those described below.

The above, as well as additional, objects, features, and advantages of the invention will be better understood through the following illustrative and non-limiting detailed description of embodiments of the invention, with reference to the accompanying drawings.
FIG. 1 shows a system for encoding, transporting, and decoding video signals. FIG. 2 illustrates the concepts of bitstream subsets, subset definitions, and video representation definitions according to an embodiment of the invention. FIG. 3 shows a proposed syntax according to an embodiment of the invention. FIG. 4 shows a method of indicating bitstream subsets in a compressed video bitstream according to an embodiment of the invention. FIG. 5 shows a method of extracting video packets from a compressed video bitstream according to an embodiment of the invention. FIG. 6 shows a bitstream marker according to an embodiment of the invention. FIG. 7 shows a bitstream extractor according to an embodiment of the invention. FIG. 8 shows a video processing device executing computer program code according to an embodiment of the invention.

All figures are schematic, not necessarily to scale, and generally show only the parts necessary to explain the invention; other parts may be omitted or merely suggested.

To illustrate the invention, a system 100 for encoding, transporting, and decoding video signals is shown in FIG. 1.

The system 100 comprises a video encoding device 110, a transport network 120, and a video decoding device 130. Typically, the video encoding device 110 is configured to receive a video signal from one or more sources, compress the video signal, and subdivide the resulting bitstream into video packets, e.g., NAL units. The resulting video packets are then transported through the transport network 120 to the decoding device 130. The transport network 120 typically comprises multiple interconnected nodes, i.e., network elements 121 to 123, configured to transport video packets from the encoding device 110 to the decoding device 130. The network elements 121 to 123 are, for example, switches, routers, or any other type of network node suitable for processing video packets. The transport network 120 is, for example, a local area network, a mobile phone network, or the Internet.

The decoding device 130 is configured to receive video packets from the transport network 120 and to decode the received compressed video bitstream. Further, the decoding device 130 may be configured to display the decoded video to a viewer. The decoding device 130 is, for example, a video player, a television set, a computer, or a mobile phone.

In the following, embodiments of the invention are described with reference to FIG. 2, which illustrates the concepts of stream identifiers, subset definitions, and video representations.

FIG. 2 shows a portion of a compressed video bitstream 210. The portion contains six video packets, i.e., NAL units 211 to 216, each carrying either video data or supplemental information such as parameters. Further, each NAL unit 211 to 216 contains a flag, stream_id, that associates the NAL unit with a corresponding bitstream subset of the compressed video bitstream. In the portion 210 illustrated in FIG. 2, NAL units 211, 213, and 216 are marked with stream_id = 0, i.e., they are associated with a first bitstream subset. NAL units 212 and 214 are marked with stream_id = 1, i.e., they are associated with a second bitstream subset that differs from the first. Finally, NAL unit 215 is marked with stream_id = 2, i.e., it is associated with a third bitstream subset that differs from the first and second bitstream subsets.
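The marking of FIG. 2 can be written down as a small table and grouped into bitstream subsets. The reference numerals and stream_id values are those given above; the data layout and function name are illustrative assumptions.

```python
from collections import defaultdict

# NAL unit reference numeral -> stream_id, as described for FIG. 2.
MARKING = {211: 0, 212: 1, 213: 0, 214: 1, 215: 2, 216: 0}


def group_by_subset(marking):
    """Group NAL units into bitstream subsets keyed by stream_id."""
    subsets = defaultdict(list)
    for nal_unit, stream_id in marking.items():
        subsets[stream_id].append(nal_unit)
    return dict(subsets)


subsets = group_by_subset(MARKING)
```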

The identifier stream_id contained in each NAL unit carried in the compressed video bitstream indicates, for each NAL unit, the bitstream subset to which it belongs. Since each bitstream subset carried in the compressed video bitstream represents a particular layer of the compressed video signal, e.g., a texture base view layer, a depth map high-quality layer, a parameter set layer, a temporal layer, an occlusion map layer, or any other type of video or supplemental layer, each NAL unit is associated with its corresponding layer. In this way, all parameters associated with a particular layer, e.g., parameters describing the view the layer represents, the video quality it contains, or dependencies between layers, are summarized in a single stream identifier, stream_id. Using a single identifier to refer to a set of parameters introduces an indirection that allows simple processing of NAL units by the network elements through which they are transported and by the clients at which they are decoded.

The properties of each layer, i.e., bitstream subset, may be predefined and known to all entities involved in encoding, transporting, and decoding the video signal. For example, with reference to FIG. 1, the decoding device 130 may be configured to decode only video packets belonging to one or some of the different bitstream subsets, i.e., layers, received from the encoding device 110 through the transport network 120. This applies, for example, if stream_id = 0 is a base layer, stream_id = 1 and stream_id = 2 are enhancement layers providing improved video quality, and the decoding device 130 can only display a low-quality video signal. Further, the network elements 121 to 123 may be configured to forward only video packets belonging to one or two of the three bitstream subsets when the available bandwidth is limited.

With further reference to FIG. 2, improved handling of the video layers contained in a multi-layer video bitstream is described.

According to an embodiment of the invention, the properties of each layer, i.e., bitstream subset, are provided to the entities involved in encoding, transporting, and decoding the video signal by means of subset definitions 221 to 223. For each layer, a corresponding subset definition 221 to 223 is provided that contains information, i.e., parameters, describing the properties of the layer. For example, subset definition 221 describes the properties of the first bitstream subset, indicated by stream_id = 0. Correspondingly, subset definition 222 describes the properties of the second bitstream subset, i.e., stream_id = 1, and subset definition 223 describes the properties of the third bitstream subset, i.e., stream_id = 2. The parameters contained in each of the subset definitions 221 to 223, namely texture_flag, depth_flag, occlusion_flag, and view_id, indicate whether the bitstream subset is a texture layer, a depth map layer, or an occlusion texture layer, and the view to which it belongs. In the subset definitions 221 to 223 illustrated in FIG. 2, all layers have view_id = 0 and thus belong to the same view. The first layer, described by subset definition 221, contains the texture of the view; the second layer, described by subset definition 222, contains the depth map of the view; and the third layer, described by subset definition 223, contains the occlusion texture of the view.
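As an illustration, the three subset definitions of FIG. 2 could be modeled as records. The field names are the parameters named above; the concrete flag values are inferred from the description of each layer, and setting only occlusion_flag for the third layer is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetDefinition:
    stream_id: int
    texture_flag: bool
    depth_flag: bool
    occlusion_flag: bool
    view_id: int


# Subset definitions 221 to 223, keyed by stream_id (flag values inferred).
SUBSET_DEFINITIONS = {
    0: SubsetDefinition(0, True, False, False, view_id=0),   # 221: texture
    1: SubsetDefinition(1, False, True, False, view_id=0),   # 222: depth map
    2: SubsetDefinition(2, False, False, True, view_id=0),   # 223: occlusion
}


def layer_kind(definition):
    """Translate the flags of a subset definition into a layer description."""
    if definition.occlusion_flag:
        return "occlusion texture"
    if definition.depth_flag:
        return "depth map"
    if definition.texture_flag:
        return "texture"
    return "unknown"


kinds = {sid: layer_kind(d) for sid, d in SUBSET_DEFINITIONS.items()}
```

A network element only needs this table once; afterwards it can classify every NAL unit by looking up its stream_id.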

With further reference to FIG. 2, the concept of video representations according to an embodiment of the invention is described.

A video representation groups one or more layers, i.e., bitstream subsets, to form a decodable video. This grouping mechanism is used to provide information about the bitstream subsets to network elements and clients involved in processing the compressed video bitstream. This is achieved by providing network elements and clients with the consolidated information contained in video representation definitions. To this end, a video representation may, for example, comprise all texture information in the bitstream, or texture and depth map information for a base view. A network element or client that wants to extract a particular video representation from the compressed video bitstream, e.g., in order to decode the video signal and display the video to a viewer, first identifies the relevant video representation and then extracts all bitstream subsets that are part of it.

Video representations according to an embodiment of the invention, such as the video representations 231 to 233 shown in FIG. 2, are marked with a video representation identifier, representation_id, to facilitate identification of the video representation by network elements and clients. Further, each video representation contains a list of stream_ids indicating the bitstream subsets required to render a meaningful video by decoding the video signal.

For example, the first video representation, defined by representation definition 231, is marked with representation_id = 0 and contains a single bitstream subset identifier, stream_id = 0. Thus, given the subset definition 221 of the bitstream subset with stream_id = 0, the first representation is a monoscopic 2D video sequence, i.e., a single view of texture information. The second video representation, defined by representation definition 232, is marked with representation_id = 1 and contains a list of two bitstream subset identifiers, stream_id = 0 and stream_id = 1. Thus, given the subset definitions 221 and 222, the second video representation additionally contains the depth map carried by the layer with stream_id = 1, so that a 3D video sequence can be rendered. However, because the second video representation does not contain occlusion texture information, the quality of the rendered 3D video is limited. This limitation is addressed by the third video representation, which enables rendering of a 3D video sequence including occlusion texture information. For this purpose, video representation definition 233 contains a list of three bitstream subset identifiers, stream_id = 0, stream_id = 1, and stream_id = 2.
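The three representation definitions and the extraction they enable can be sketched as follows. The stream_id lists come from the text; representation_id = 2 for definition 233 is an assumption, since the text does not state its value, and the packet tuples reuse the FIG. 2 marking purely for illustration.

```python
# representation_id -> list of required stream_ids.
# Key 2 for representation definition 233 is assumed, not stated in the text.
REPRESENTATIONS = {
    0: [0],        # 231: single-view 2D texture
    1: [0, 1],     # 232: texture + depth map
    2: [0, 1, 2],  # 233: texture + depth map + occlusion texture
}


def extract_representation(packets, representation_id):
    """Return the packets whose stream_id belongs to the representation."""
    wanted = set(REPRESENTATIONS[representation_id])
    return [(sid, unit) for sid, unit in packets if sid in wanted]


# (stream_id, NAL unit numeral) pairs following the FIG. 2 marking.
bitstream = [(0, 211), (1, 212), (0, 213), (1, 214), (2, 215), (0, 216)]

video_2d = extract_representation(bitstream, 0)
video_3d = extract_representation(bitstream, 1)
```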

It will be appreciated that the concepts of layered bitstreams, subset definitions, and video representations are not limited to a single view as described above. Subset definitions may, for example, be used to describe the properties of layers representing different views with different camera angles, e.g., view_id = 0 and view_id = 1, or of layers carrying video signals of different video quality.

The concept of video representations allows simple processing of multi-layer video bitstreams by the network elements and clients involved. A network element that wants to forward, or a client that wants to decode, a particular video representation simply identifies the corresponding representation definition, reads the list of required bitstream subset identifiers from it, and extracts the video packets belonging to the required bitstream subsets from the compressed video bitstream.

Compared with known ways of identifying bitstream subsets, such as the SVC extension of the NAL unit header, which uses several identifiers such as temporal_id, priority_id, quality_id, and dependency_id, the proposed approach uses only a single identifier and therefore allows much simpler implementations in both hardware and software. In particular, the single identifier stream_id, unlike, e.g., dependency_id, has no predefined meaning; instead, its meaning is indicated through secondary means, e.g., subset definitions. Because of this indirection, the proposed concept is easily extensible: if new functionality is introduced at a later stage, information about it is signaled in the subset definitions, and no change to the NAL unit header is needed. In particular, the NAL unit header according to an embodiment of the invention has a fixed length regardless of whether extensions are used, which makes parsing the NAL unit header much simpler than in known solutions.

With reference to FIG. 3, a more detailed description of embodiments of the invention is given below, in particular with respect to the proposed syntax.

The subset identifier stream_id is carried in the NAL unit header. The NAL unit header may or may not contain other identifiers, such as the NAL unit type or an output_flag. An example NAL unit header 310 is shown in FIG. 3. In this example, forbidden_zero_bit shall be equal to 0, nal_unit_type specifies the type of data contained in the NAL unit, and output_flag signals whether the decoded content of the current NAL unit is intended for screen output.
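A fixed-length header of this kind is straightforward to parse. The sketch below assumes a hypothetical two-byte layout (1 bit forbidden_zero_bit, 6 bits nal_unit_type, 1 bit output_flag, 8 bits stream_id); the field names come from the text, but the bit widths and ordering are assumptions, since the actual layout is given only in FIG. 3.

```python
def parse_nal_header(data: bytes) -> dict:
    """Parse a hypothetical 2-byte NAL unit header.

    Assumed layout (not taken from the source):
      byte 0: forbidden_zero_bit (1) | nal_unit_type (6) | output_flag (1)
      byte 1: stream_id (8)
    """
    if len(data) < 2:
        raise ValueError("header requires at least 2 bytes")
    first = data[0]
    forbidden_zero_bit = first >> 7
    if forbidden_zero_bit != 0:
        raise ValueError("forbidden_zero_bit must be 0")
    return {
        "nal_unit_type": (first >> 1) & 0x3F,
        "output_flag": first & 0x01,
        "stream_id": data[1],
    }


header = parse_nal_header(bytes([0b00001011, 2]))
```

Because the header length never varies, a network element can read stream_id at a fixed offset without inspecting any other field.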

By marking different NAL units with the same stream_id, they are marked as belonging to the same bitstream subset, i.e., the same layer. Typically, all NAL units belonging to the same bitstream subset share one or more specific properties, such as the following:
- All NAL units in the subset are parameter sets.
- All NAL units in the subset are intra-coded pictures.
- All NAL units in the subset indicate the start of a so-called "closed group of pictures (GOP)" random access point (an instantaneous decoder refresh (IDR) picture in AVC).
- All NAL units in the subset indicate the start of a so-called "open GOP" random access point (an intra picture from which decoding can start).
- All NAL units in the subset carry video frames representing base quality.
- All NAL units in the subset carry temporal refinement information.
- All NAL units in the subset carry spatial refinement information in the case of spatial scalability.
- All NAL units in the subset carry information corresponding to a particular camera view in multi-view coding.
- All NAL units in the subset carry information about a particular video stream or, e.g., a depth map stream.

Note that several such properties may hold simultaneously for a given subset.

The properties of some subsets, and the associated stream_ids, may be predefined. For example, stream_id = 0 may indicate a subset containing only NAL units that carry parameter sets, such as SPS, PPS, StPS, or RPS. Alternatively, SPS, PPS, StPS, and RPS may have different predefined stream_ids.

Information about the properties of a subset may be provided explicitly, e.g., by a subset definition. A subset definition may contain parameters such as temporal_id or quality_id. An example subset definition 320 is shown in FIG. 3.

The syntax of a subset definition according to an embodiment of the invention, such as subset definition 320, may include conditional fields that depend, for example, on the NAL unit type or other properties. Subset definition 320 illustrates syntax elements that are used only if the referenced NAL units contain VCL data and no non-VCL data (note that, throughout this disclosure, the NAL unit types in the examples follow the AVC specification).

In subset definition 320, stream_id identifies a layered stream, i.e., a subset of the bitstream; stream_type describes the type of the stream being specified; and version_id specifies the version of the specification according to which the stream is described. Each of the flags view_id, temporal_id, quality_id, and dependency_id identifies a property of the corresponding layer. More specifically, view_id indicates the camera view; temporal_id indicates a temporal refinement layer, e.g., one refining a 30 Hz frame rate to 60 Hz; quality_id indicates the signal fidelity of the compressed video; and dependency_id indicates a spatial refinement layer in the case of spatial scalability.

The subset definition itself may be carried in the bitstream, for example in the form of a dedicated NAL unit, i.e., a stream parameter set (StPS). Such a dedicated NAL unit is indicated by a dedicated NAL unit type. It may be part of a dedicated subset, such as a parameter-set bitstream subset indicated by a specific stream_id in the NAL unit header of the NAL units that carry parameter sets. The subset carrying such stream parameter set NAL units may have a predefined stream_id, such as stream_id=0.

An StPS may include parameters relating to temporal enhancement (temporal_id), quality enhancement (quality_id), spatial enhancement (dependency_id), priority (priority_id), or any other signaling of the kind carried in the NAL unit extension headers for SVC and MVC. An StPS may further include information about the type of data carried in the associated NAL units, i.e., whether the NAL units contain parameter sets, SEI messages, intra pictures, anchor pictures, or the like. An StPS may also carry higher-level information about the content represented by the compressed data, for example whether the associated NAL units represent texture data, depth information, occlusion information, or the like.

The syntax of a stream parameter set according to an embodiment of the invention, such as subset definition 320 illustrated in FIG. 3, is defined in an extensible manner. To this end, it contains syntax elements defined by a particular version of the specification, for example version N. A future update of the specification, for example version N+1, may add further syntax elements. In that case, a receiving device that conforms to version N can interpret only the version N syntax elements, whereas a receiving device that conforms to version N+1 can also interpret the additional ones. The StPS therefore has a variable length and can be extended in new versions of the specification as needed. When a version N receiving device receives a version N+1 StPS that contains parameters conforming to version N together with further parameters conforming to version N+1, the device reads the version N syntax elements and ignores the version N+1 syntax elements. Alternatively, a version N receiving device that encounters syntax elements it cannot interpret, because they do not conform to version N, decides not to process the NAL units associated with the StPS. Which behavior the device follows, i.e., ignoring the unknown syntax or not processing the associated NAL units, is signaled by, for example, a separate flag.

The syntax of the stream parameter set may further include a version identifier indicating the version of the specification to which it conforms. The version identifier may be expressed, for example, as a numerical value, with larger values indicating later versions. When a version N receiving device receives an StPS with a version number of N or less, it can parse the syntax. When a version N receiving device receives an StPS with a version number greater than N, it cannot parse the syntax, or at least cannot parse the parts specific to versions later than N. In that case, the receiving device does not process the NAL units associated with the StPS. Alternatively, the receiving device decodes the NAL units associated with the StPS without interpreting the StPS itself. This behavior may be controlled by an additional flag.
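The version check described above amounts to a small decision function. The sketch below is a minimal illustration, assuming integer version numbers; the enum, the function name, and the returned action strings are hypothetical and stand in for the "additional flag" whose encoding the text does not specify.

```python
from enum import Enum


class UnknownVersionPolicy(Enum):
    """Hypothetical stand-in for the flag that controls receiver behavior."""
    SKIP_NAL_UNITS = "skip"          # do not process the associated NAL units
    DECODE_WITHOUT_STPS = "decode"   # decode them without interpreting the StPS


def handle_stps(receiver_version: int, stps_version: int,
                policy: UnknownVersionPolicy) -> str:
    """Return the action a version-N receiver takes for a received StPS."""
    if stps_version <= receiver_version:
        return "parse_stps"            # syntax is fully understood
    if policy is UnknownVersionPolicy.SKIP_NAL_UNITS:
        return "skip_nal_units"
    return "decode_without_stps"
```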

As shown in subset definition 320, the association between a stream parameter set and the NAL units belonging to the subset is provided by the parameter stream_id: the NAL unit containing the stream parameter set, i.e., the subset definition, and the NAL units it relates to are marked with the same stream_id. Alternatively, an StPS may indicate an association with two or more stream_ids by including a list of stream_ids, thereby indicating that the StPS applies to NAL units having any of these stream_ids. As a further alternative, an StPS may indicate a bit mask field, such as M = "1111111100000000" (65280 in decimal), and a value field, such as V = "1010101000000000" (43520 in decimal), so that all NAL units whose stream_id satisfies a condition such as "stream_id & M = V", where "&" is the bitwise AND operation, are associated with the StPS. Associating several stream_ids with the same StPS has the advantage that fewer StPS packets are needed to carry the parameters describing subset properties. In this way, information that is valid for several bitstream subsets, corresponding to several different stream_ids, is carried in a single StPS. Additional information that is valid for only one or some of these subsets can be carried in separate StPSs.
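The mask-and-value condition can be checked directly. The short sketch below uses the example values of M and V from the text; the function name and the sample stream_id values are assumptions chosen for illustration.

```python
MASK = 0b1111111100000000   # M, 65280 in decimal
VALUE = 0b1010101000000000  # V, 43520 in decimal


def stps_applies(stream_id: int, mask: int = MASK, value: int = VALUE) -> bool:
    """True if the StPS is associated with this stream_id (stream_id & M == V)."""
    return (stream_id & mask) == value


# Any stream_id whose upper eight bits equal 10101010 matches,
# regardless of the lower eight bits.
matching = stps_applies(0b1010101000000001)
non_matching = stps_applies(0b1010101100000000)
```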

The stream_id may be a numerical value, and may be defined such that NAL units with a low stream_id carry more important data, such as parameter sets or intra frames, while NAL units with a high stream_id carry less important data, such as SEI or temporal enhancement data. Using such priority information, a network node that needs to discard packets, for example because of limited bandwidth, discards packets with high stream_id values and keeps packets with low stream_id values.

Compared with using multiple parameters such as temporal_id and dependency_id, as in SVC and MVC, an advantage of the stream_id concept is that it uses indirection, so no new fields need to be introduced when new codec functionality is added. Moreover, when stream_id expresses the relative priority of the associated bitstream subset, in some cases the indirection does not even need to be resolved, which further simplifies parsing and dispatching of NAL units.

By providing a list of stream_ids in a video representation, a part of the compressed video bitstream that can be decoded independently of the other parts of the bitstream is identified. The associated bitstream subsets form a decodable video. That is, there are no inter-picture or inter-view dependencies on NAL units outside the representation. This differs from a bitstream subset that is identified by a single stream_id but cannot be decoded independently, for example because it contains only temporal enhancement information. A video representation is associated with certain properties, such as the type of content, for example texture, depth, or occlusion information.

The definition of a video representation, i.e., the list of required stream_ids and additional properties of the representation, may be provided in a separate video packet in the bitstream, i.e., a representation parameter set (RPS). There are at least two ways of providing representation parameter sets. One is to provide only one RPS per NAL unit. The other is to provide several RPSs in a single NAL unit, as illustrated by NAL unit 330 according to an embodiment of the invention.

In the video representation definition 330, num_representations is the number of representations specified in this NAL unit, representation_id identifies a video representation, representation_priority_id defines the priority of the representation, and representation_type is the type of the video representation. The type of video representation is, for example, one of the following:
- Planar video with texture only.
- Stereoscopic video with texture only, for stereoscopic applications.
- Planar video with texture and depth information, using view synthesis to generate some autostereoscopic/stereoscopic effect.
- Stereoscopic video with texture and depth information, using view synthesis to generate a full autostereoscopic effect.
- Planar video with texture, depth, and disocclusion information, using view synthesis to generate a full autostereoscopic/stereoscopic effect.
- SPS and PPS sent out-of-band in advance to network elements and clients over a reliable channel.

With further reference to the video representation definition 330, num_streams signals the number of required streams included in the representation, and each stream_id specifies one required stream. Optionally, NAL units carrying RPS information may be marked with a predefined stream_id, for example stream_id=0.
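The nesting of these syntax elements (a count of representations, each with its own count of stream_ids) can be sketched as a simple reader. This is an illustration only: it assumes the fields have already been decoded into a flat sequence of integers in the order they are named in the text, because the real field order and bit widths are defined only by FIG. 3. The names `Representation` and `parse_rps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Representation:
    representation_id: int
    representation_priority_id: int
    representation_type: int
    stream_ids: List[int]


def parse_rps(fields: Sequence[int]) -> List[Representation]:
    """Read num_representations entries, each followed by num_streams stream_ids."""
    it = iter(fields)
    num_representations = next(it)
    representations = []
    for _ in range(num_representations):
        rep_id = next(it)
        priority = next(it)
        rep_type = next(it)
        num_streams = next(it)
        stream_ids = [next(it) for _ in range(num_streams)]
        representations.append(
            Representation(rep_id, priority, rep_type, stream_ids))
    return representations


# Two hypothetical representations: id 0 needs stream 1; id 3 needs streams 1, 2, 3.
reps = parse_rps([2, 0, 0, 0, 1, 1, 3, 1, 0, 3, 1, 2, 3])
```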

As shown in the representation definition 330, one of the properties defined by an RPS may be a priority indicator, representation_priority_id, which indicates the priority of the current representation relative to other video representations.

Based on the RPS, which defines representations, and the StPS, which defines subsets, dynamic streaming can be applied in a straightforward manner. For example, a server provides a video bitstream with three bitstream subsets, i.e., layers, identified by quality_id=0, 1, and 2, respectively. These streams are selectively combined into three video representations of low, medium, and high quality. In this case, the low-quality representation includes the subset with quality_id=0, the medium-quality representation includes the two subsets with quality_id=1 and quality_id=2, and the third representation includes all three subsets. To obtain a short buffering time, a client may start by downloading the first stream, which has the lowest quality but the smallest size. After playing the video for a while, if the client detects that more network bandwidth is available, it may switch to the medium-quality or high-quality representation. If congestion occurs in the network, the client may step back down to the low-quality level.
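The switching behavior described above can be sketched as a selection over the available representations. The bitrate figures and the function name below are purely hypothetical; the text states only that a client starts with the lowest quality and moves up or down as bandwidth allows.

```python
# (name, required bitrate in kbit/s), ordered from lowest to highest quality.
# The bitrates are made-up values for illustration.
REPRESENTATIONS = [("low", 500), ("medium", 1500), ("high", 4000)]


def choose_representation(available_kbps: float) -> str:
    """Pick the highest-quality representation that fits the measured bandwidth.

    Falls back to the lowest-quality representation when nothing fits,
    which is also the starting point for a short initial buffering time.
    """
    chosen = REPRESENTATIONS[0][0]
    for name, required_kbps in REPRESENTATIONS:
        if required_kbps <= available_kbps:
            chosen = name
    return chosen
```

A client would call such a function periodically with its current bandwidth estimate and then request the stream_ids listed in the chosen representation.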

Having quality_id and the other parameters defined in the subset definition makes it easy to select the required bitstream subsets. The client is simply instructed by means of rules on stream_id, rather than having to parse and interpret several identifiers as with the SVC NAL unit header. Note that, since bandwidth is normally positively correlated with quality, a bandwidth_id may be signaled instead of quality_id to indicate the required bitrate.

According to embodiments of the invention, a network element that receives or forwards packets, such as a video packet receiver or video packet forwarder, or a decoding device that receives and decodes video packets, interprets stream_id as follows. It is assumed that the network element or decoding device is provided with a list of stream_ids considered relevant to the receiving, forwarding, or decoding operation. When a video packet is received, its stream_id is inspected. Depending on the value of stream_id, the video packet is either received/forwarded/decoded, i.e., extracted from the bitstream, or discarded. That is, if the stream_id in the video packet matches one of the relevant stream_ids, the packet is processed further. If the stream_id in the video packet matches none of the relevant stream_ids, the packet is discarded and not processed further.
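The inspection rule in this paragraph is a membership test on stream_id. The following minimal sketch assumes a packet is represented by a hypothetical `VideoPacket` tuple holding the stream_id read from the NAL unit header and an opaque payload.

```python
from typing import Iterable, Iterator, NamedTuple


class VideoPacket(NamedTuple):
    stream_id: int   # subset identifier from the packet header
    payload: bytes   # remaining packet contents (not interpreted here)


def extract_packets(packets: Iterable[VideoPacket],
                    relevant_stream_ids: Iterable[int]) -> Iterator[VideoPacket]:
    """Yield packets whose stream_id is relevant; all others are discarded."""
    relevant = set(relevant_stream_ids)
    for packet in packets:
        if packet.stream_id in relevant:
            yield packet


stream = [VideoPacket(0, b"sps"), VideoPacket(2, b"a"), VideoPacket(5, b"b")]
kept = list(extract_packets(stream, [0, 2]))
```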

The list of relevant stream_ids may include predefined stream_ids. For example, it may include stream_id=0, indicating that the associated bitstream subset contains parameter sets. In this way, the receiving/forwarding/decoding device receives all parameter sets. The device then interprets, for example, one or more StPSs or RPSs and updates its list of relevant stream_ids accordingly. To this end, the receiving device may receive all StPSs and inspect their syntax for a particular property. When an StPS with a given property is detected, for example a type of video data such as texture data, the video packets with the associated stream_id are extracted from the bitstream. In this way, StPSs and the associated stream_ids can be selected based on parameters carried in the StPS, such as temporal_id, view_id, quality_id, or the type of video data.
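One way to picture this update step is shown below. It is a sketch under stated assumptions: each received StPS is modelled as a plain dictionary, and the key `content_type` is a hypothetical name for the "type of video data" property; only stream_id, view_id, and temporal_id are names taken from the text.

```python
from typing import Dict, Iterable, Set


def update_relevant_ids(relevant: Set[int],
                        subset_definitions: Iterable[Dict[str, object]],
                        wanted: Dict[str, object]) -> Set[int]:
    """Add the stream_id of every StPS whose properties match `wanted`."""
    updated = set(relevant)
    for definition in subset_definitions:
        if all(definition.get(key) == value for key, value in wanted.items()):
            updated.add(definition["stream_id"])
    return updated


# Start with the predefined parameter-set subset only.
relevant_ids = {0}
received_stps = [
    {"stream_id": 2, "content_type": "texture", "view_id": 0, "temporal_id": 0},
    {"stream_id": 3, "content_type": "texture", "view_id": 0, "temporal_id": 1},
    {"stream_id": 7, "content_type": "depth", "view_id": 0, "temporal_id": 0},
]
relevant_ids = update_relevant_ids(relevant_ids, received_stps,
                                   {"content_type": "texture"})
```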

The relevance of a subset may further be determined based on the version identifier carried in the associated StPS. If the receiving device conforms to the version identifier indicated in the StPS, it extracts the associated NAL units. If the receiving device does not conform to that version identifier, it discards the associated NAL units.

The list of relevant stream identifiers may also be obtained by inspecting the definition of a representation that is considered relevant. In this case, the list of stream identifiers is extracted from the representation definition and used as the relevant stream identifiers.

The receiving device receives all RPSs and inspects their syntax for a particular property, for example priority, type of video content, or video resolution. When an RPS with the given property is detected, all associated stream_ids are considered relevant, and the corresponding packets are extracted from the bitstream.

Alternatively, the receiving/forwarding/decoding device decides whether to extract or discard a packet based on the priority of each NAL unit. In particular, when stream_id is defined according to the relative priority of the bitstream subsets, the device extracts packets with a "low" stream_id, i.e., packets containing more important data, and discards packets with a "high" stream_id, i.e., packets containing less important data. The decision as to whether a stream_id value is "low" or "high" may be based on a threshold. This approach is used for packet dropping or bitstream thinning in network elements when bandwidth is limited.
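Threshold-based thinning reduces to a single comparison per packet. In the sketch below, treating a stream_id equal to the threshold as "low" is an assumption, as is the function name; the text says only that the decision may be based on a threshold.

```python
def keep_packet(stream_id: int, threshold: int) -> bool:
    """Keep packets whose stream_id is at or below the threshold (more important data)."""
    return stream_id <= threshold


# A node under bandwidth pressure could lower the threshold to drop more packets.
incoming_ids = [0, 1, 5, 9]
kept_ids = [sid for sid in incoming_ids if keep_packet(sid, threshold=4)]
```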

To illustrate the invention further, an example is presented below. The example concerns stereoscopic video, i.e., two views with view_id=0 and view_id=1, respectively, each with accompanying depth data. Both the texture and the depth video are assumed to provide some temporal scalability, indicated by different temporal_id values. Here, temporal_id=0 indicates the temporal base quality, for example video at a 30 Hz frame rate, and temporal_id=1 carries temporal enhancement information, for example for increasing the frame rate from 30 Hz to 60 Hz.

The NAL units carrying both texture and depth data for the available views and temporal resolutions are multiplexed into the same bitstream using several bitstream subsets. With the concepts of bitstream subsets and video representations according to embodiments of the invention, they can be uniquely identified.

The following table provides an example of subset definitions.

There are five subsets with stream_id=1 to 5, corresponding to texture data for view_id=0 and view_id=1, and five further subsets with stream_id=6 to 10, corresponding to depth data. The subsets with stream_id=1 and stream_id=6 carry the sequence parameter sets (SPS) and picture parameter sets (PPS), i.e., non-VCL data, for texture and depth, respectively. The remaining stream_ids indicate subsets carrying VCL data.

In addition, an example of the corresponding representation definitions is shown in the following table.

In the table, the representation with representation_id=0 contains the NAL units with stream_id=1, which are only the SPS and PPS for the texture part. As another example, the representation with representation_id=3 contains stream_id=1, 2, and 3 and represents single-view video for view_id=0 with texture information only. As a further example, representation_id=7 corresponds to the complete bitstream.

After receiving the representation definitions, a receiving or forwarding device determines the most suitable representation for a given application, depending on the signaled properties, and thereby obtains a list of relevant stream_ids. The device can then easily extract the NAL units associated with these stream_ids by inspecting the NAL unit headers of the received NAL units.

In the following, embodiments of a method of indicating bitstream subsets in a compressed video bitstream are described with reference to FIG. 4. An embodiment of the method is performed in a sending device, such as encoding device 110 described with reference to FIG. 1. In particular, an embodiment of the method may be performed in a bitstream marking device, i.e., a bitstream marker, that receives a compressed video bitstream from a video encoder. To this end, an embodiment of the method may also be implemented in a video encoder. The bitstream marker subdivides the bitstream into video packets, each containing compressed video data (for example a video frame), supplemental information, or, more generally, a NAL unit. Each video packet is then marked with a single subset identifier, using the syntax element stream_id in the NAL unit header according to an embodiment of the invention. Optionally, the video encoder may provide an already packetized video bitstream to the bitstream marker, in which case the bitstream marker does not need to subdivide the bitstream into packets. Furthermore, the marking procedure according to an embodiment of the invention may be performed by the video encoder rather than by a separate bitstream marker.

An embodiment 410 of the method of indicating bitstream subsets in a compressed video bitstream is shown in FIG. 4. Method 410 comprises receiving (411) a compressed video bitstream, for example from a video encoder; dividing (412) the compressed video bitstream into video packets; and marking (413) each video packet with a single subset identifier of a plurality of subset identifiers. Each subset identifier of the plurality of subset identifiers is associated with a corresponding bitstream subset of a plurality of bitstream subsets.

Optionally, method 410 further comprises providing (414) at least one subset definition. Each subset definition describes the properties of a corresponding bitstream subset of the plurality of bitstream subsets. The subset definitions may be provided as video packets in the compressed video bitstream and sent to network elements and clients.

In addition to the subset definitions provided in step 414, method 410 may further comprise providing (415) at least one video representation definition. Each video representation definition contains at least one relevant subset identifier, and all bitstream subsets associated with the at least one relevant subset identifier together form a decodable video representation. The at least one video representation definition may be provided as a video packet in the compressed video bitstream and sent to network elements and clients.
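The core of method 410 (steps 411 to 413) can be sketched as follows. The sketch assumes the bitstream has already been divided into packet payloads, which the text allows when the encoder delivers a packetized bitstream; the `MarkedPacket` type and the `choose_stream_id` callback, which decides the subset for each payload, are hypothetical.

```python
from typing import Callable, Iterable, List, NamedTuple


class MarkedPacket(NamedTuple):
    stream_id: int   # the single subset identifier assigned in step 413
    payload: bytes


def mark_packets(payloads: Iterable[bytes],
                 choose_stream_id: Callable[[bytes], int]) -> List[MarkedPacket]:
    """Mark every packet with exactly one subset identifier."""
    return [MarkedPacket(choose_stream_id(payload), payload)
            for payload in payloads]


# Toy classifier (an assumption): payloads starting with b"P" are parameter sets
# and go to the predefined subset 0; everything else goes to subset 1.
def toy_classifier(payload: bytes) -> int:
    return 0 if payload.startswith(b"P") else 1


marked = mark_packets([b"Pset", b"frame1", b"frame2"], toy_classifier)
```

Subset definitions (step 414) and representation definitions (step 415) would be emitted as additional marked packets in the same stream.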

In the following, embodiments of a method of extracting video packets from a compressed video bitstream divided into video packets are described with reference to FIG. 5. An embodiment of the method is performed in a receiving device, such as network elements 121 to 123, or in decoding device 130, described with reference to FIG. 1. In particular, an embodiment of the invention may be performed in a bitstream extraction device, i.e., a bitstream extractor, that receives a compressed video bitstream divided into video packets. To this end, an embodiment of the method may be implemented in a video decoder or in a network element configured to route video packets.

An embodiment 510 of the method of extracting video packets from a compressed video bitstream divided into video packets is shown in FIG. 5. Method 510 comprises providing (511) at least one relevant subset identifier; receiving (512) video packets from the compressed video bitstream; and, for each received video packet (513), inspecting (514) the subset identifier of the video packet and, on condition (515) that the extracted subset identifier matches one of the at least one relevant subset identifier, extracting (516) the video packet from the compressed video bitstream.

Optionally, method 510 further comprises forwarding or decoding (517) the extracted video packet and, on condition (515) that the extracted subset identifier matches none of the at least one relevant subset identifier, discarding (518) the received video packet.

Method 510 may further comprise providing (519) a subset definition describing the properties of a corresponding bitstream subset of the plurality of bitstream subsets. The subset identifier associated with the corresponding bitstream subset is then used as the at least one relevant subset identifier in step 511.

Optionally, in step 519, the subset definition may be selected from a plurality of subset definitions according to at least one property of the corresponding bitstream subset. The subset definition may be received in a video packet of the compressed video bitstream.

Method 510 may further comprise providing (520) a video representation definition containing the at least one relevant subset identifier. The bitstream subsets associated with the at least one relevant subset identifier form a decodable video representation. The video representation definition may be received in a video packet of the compressed video bitstream.

In the following, a bitstream marker for indicating bitstream subsets in a compressed video bitstream according to an embodiment of the invention is described with reference to FIG. 6. An embodiment of the bitstream marker is located, for example, in encoding device 110 described with reference to FIG. 1. In particular, an embodiment of the bitstream marker may be implemented in a video encoder.

Bitstream marker 620 receives a compressed video bitstream 602 from video encoder 610, which is configured to encode video source signal 601. Bitstream marker 620 subdivides bitstream 602 into video packets, each containing compressed video data (for example a video frame), supplemental information, or, more generally, a NAL unit. Each video packet is then marked with a single subset identifier using the syntax element stream_id in the NAL unit header, as described above. Bitstream marker 620 sends the packetized and marked bitstream 603 to a transport network, such as network 120, and ultimately to a client or to a peer in a peer-to-peer network, such as decoding device 130 described with reference to FIG. 1.

To this end, the bitstream marker 620 comprises a receiving unit 621 for receiving the compressed video bitstream, a packetization unit 622 configured to divide the compressed video bitstream into video packets, and a marking unit 623 for marking each video packet with a single subset identifier stream_id.
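The receive, packetize, and mark sequence can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the `VideoPacket` class, the fixed-size chunking in `packetize`, and the caller-supplied `assign_stream_id` policy are all hypothetical names and simplifications introduced here for illustration, and a real packetizer would split on NAL unit boundaries.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class VideoPacket:
    """Hypothetical stand-in for a NAL unit: one subset identifier plus a payload."""
    stream_id: int
    payload: bytes


def packetize(bitstream: bytes, packet_size: int) -> List[bytes]:
    """Role of packetization unit 622: split the bitstream into chunks.

    Fixed-size chunking is an assumption made for brevity only.
    """
    if packet_size <= 0:
        raise ValueError("packet_size must be positive")
    return [bitstream[i:i + packet_size]
            for i in range(0, len(bitstream), packet_size)]


def mark(payloads: Iterable[bytes],
         assign_stream_id: Callable[[int, bytes], int]) -> Iterator[VideoPacket]:
    """Role of marking unit 623: attach exactly one stream_id to each packet."""
    for index, payload in enumerate(payloads):
        yield VideoPacket(stream_id=assign_stream_id(index, payload),
                          payload=payload)


# Role of receiving unit 621: the "received" bitstream is just a bytes object here.
received = bytes(range(8))

# Hypothetical marking policy: alternate packets between subsets 0 and 1.
marked = list(mark(packetize(received, 2), lambda index, _payload: index % 2))
```

The marking policy is passed in as a function because the text leaves open how a packet is assigned to a subset; only the constraint that each packet carries a single identifier is fixed.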

Further, the bitstream marker 620 may optionally comprise a subset definition unit 624 that provides at least one subset definition. The subset definition may be provided as a video packet in the compressed video bitstream.

Further, the bitstream marker 620 may optionally comprise a video representation definition unit 625 that provides at least one video representation definition. The video representation definition may be provided as a video packet in the compressed video bitstream.

The receiving unit 621, the packetization unit 622, the marking unit 623, the subset definition unit 624, and the video representation definition unit 625 may be implemented by circuitry, integrated circuits (ICs), application-specific integrated circuits (ASICs), computer program modules executing on one or more processors, or a combination thereof. The units 621 to 625 may be implemented as separate units or in combination.

It will be appreciated that the video encoder 610 may provide an already packetized video bitstream 602 to the bitstream marker 620, in which case the bitstream marker 620 need not subdivide the bitstream 602 into packets. Further, the marking procedure according to an embodiment of the present invention described above may be performed by the video encoder 610 rather than by a separate bitstream marker. Further, an existing video encoder may be configured to perform bitstream marking according to an embodiment of the present invention by updating the software of the existing video encoder with an embodiment of the computer program.

Hereinafter, with reference to FIG. 7, a bitstream extractor for extracting video packets from a compressed video bitstream according to an embodiment of the present invention is described. An embodiment of the bitstream extractor may be located, for example, in the decoding device 130 or in the network elements 121 to 123 described with reference to FIG. 1. In particular, an embodiment of the bitstream extractor may be implemented in a video decoder or in a network element configured to route video packets.

The bitstream extractor 710 receives a compressed video bitstream 701 comprising video packets, i.e., NAL units, associated with a plurality of bitstream subsets. The video bitstream 701 is received, for example, from a transport network such as the network 120 described with reference to FIG. 1. The bitstream extractor 710 identifies the relevant NAL units contained in the bitstream 701 and extracts them for further processing.

To this end, the bitstream extractor 710 comprises a subset selection unit 711 that provides at least one relevant subset identifier, a receiving unit 712 that receives video packets from the video bitstream 701, and an extraction unit 713 that, for each received video packet, examines its subset identifier and extracts the video packet from the video bitstream 701 on the condition that the extracted subset identifier matches one of the at least one relevant subset identifier. Optionally, the extraction unit 713 is further configured, for each received video packet, either to forward or decode the extracted video packet or to discard (704) the video packet. If the video packet is forwarded, it is transmitted (702) to, for example, a video decoder 720. The video decoder 720 decodes the video signal and outputs a decoded video signal 703 for further processing, such as display to a viewer. A received video packet is discarded (704) on the condition that the extracted subset identifier does not match any of the at least one relevant subset identifier.
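The match-or-discard rule applied by the extraction unit 713 can be sketched in a few lines of Python. This is an illustrative sketch only: the `VideoPacket` tuple and the `extract` function are hypothetical names, and returning the discarded packets in a second list is a choice made here so the behaviour is easy to inspect, not something the embodiment requires.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class VideoPacket(NamedTuple):
    """Hypothetical stand-in for a NAL unit marked with a single stream_id."""
    stream_id: int
    payload: bytes


def extract(packets: Iterable[VideoPacket],
            relevant_ids: Set[int]) -> Tuple[List[VideoPacket], List[VideoPacket]]:
    """Split packets into (extracted, discarded) by examining each stream_id."""
    extracted: List[VideoPacket] = []
    discarded: List[VideoPacket] = []
    for packet in packets:
        if packet.stream_id in relevant_ids:
            extracted.append(packet)   # would be forwarded or decoded (702)
        else:
            discarded.append(packet)   # would be dropped (704)
    return extracted, discarded


packets = [
    VideoPacket(0, b"base-0"),
    VideoPacket(1, b"enh1-0"),
    VideoPacket(2, b"enh2-0"),
    VideoPacket(0, b"base-1"),
]
extracted, discarded = extract(packets, relevant_ids={0, 1})
```

Because the decision depends only on one integer per packet, a network element can apply it without parsing the payload, which is the simplification the single identifier is meant to provide.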

Optionally, the bitstream extractor 710 may further comprise a subset definition unit 714 that provides a subset definition. The subset definition unit 714 is configured to select one subset definition from a plurality of subset definitions according to at least one property of the corresponding bitstream subset. The subset definition unit 714 may be further configured to receive the subset definition from a video packet in the compressed video bitstream.
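Selecting a subset definition by property, and then using its identifier as the relevant subset identifier, might look like the following Python sketch. The property names `temporal_layer` and `view` are assumptions chosen for illustration, as are the `SubsetDefinition` class and the `select_definition` function; this passage does not fix which properties a subset definition carries.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SubsetDefinition:
    """Hypothetical subset definition: an identifier plus descriptive properties."""
    stream_id: int
    temporal_layer: int  # assumed property, for illustration only
    view: int            # assumed property, for illustration only


def select_definition(definitions: Iterable[SubsetDefinition], *,
                      temporal_layer: int, view: int) -> Optional[SubsetDefinition]:
    """Return the first definition whose properties match, or None if none does."""
    for definition in definitions:
        if definition.temporal_layer == temporal_layer and definition.view == view:
            return definition
    return None


definitions = [
    SubsetDefinition(stream_id=0, temporal_layer=0, view=0),
    SubsetDefinition(stream_id=1, temporal_layer=1, view=0),
    SubsetDefinition(stream_id=2, temporal_layer=0, view=1),
]

chosen = select_definition(definitions, temporal_layer=0, view=1)
# The selected definition's identifier becomes the relevant subset identifier.
relevant_ids = {chosen.stream_id} if chosen is not None else set()
```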

An embodiment of a bitstream extractor, such as the bitstream extractor 710, may further comprise a video representation definition unit 715 that provides a video representation definition. The video representation definition unit 715 may be further configured to receive the video representation definition from a video packet in the compressed video bitstream.
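A video representation definition can be thought of as a named set of subset identifiers that together form a decodable representation. The Python sketch below shows one way an extractor could resolve a chosen representation to its relevant identifiers. The `VideoRepresentationDefinition` class, the `representation_id` field, and the `relevant_ids_for` function are hypothetical; how definitions are encoded in the bitstream is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class VideoRepresentationDefinition:
    """Hypothetical definition: the stream_ids that form one decodable representation."""
    representation_id: int
    stream_ids: FrozenSet[int]


def relevant_ids_for(definitions: Dict[int, VideoRepresentationDefinition],
                     representation_id: int) -> FrozenSet[int]:
    """Look up the relevant subset identifiers for the requested representation."""
    try:
        return definitions[representation_id].stream_ids
    except KeyError:
        raise LookupError(
            f"no video representation definition with id {representation_id}"
        ) from None


definitions = {
    0: VideoRepresentationDefinition(0, frozenset({0})),        # e.g. base only
    1: VideoRepresentationDefinition(1, frozenset({0, 1})),     # base + one enhancement
    2: VideoRepresentationDefinition(2, frozenset({0, 1, 2})),  # all three subsets
}

relevant = relevant_ids_for(definitions, 1)
```

The resulting set can be handed directly to a membership test on each packet's stream_id.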

The subset selection unit 711, the receiving unit 712, the extraction unit 713, the subset definition unit 714, and the video representation definition unit 715 may be implemented by circuitry, ICs, ASICs, computer program modules executing on one or more processors, or a combination thereof. The units 711 to 715 may be implemented as separate units or in combination.

It will be appreciated that the procedure of extracting a bitstream subset, i.e., video packets, from a video bitstream may be performed by the video decoder 720 rather than by a separate bitstream extractor. Further, an existing video decoder may be configured to perform bitstream extraction according to an embodiment of the present invention by updating the software of the existing video decoder with an embodiment of the computer program.

With reference to FIG. 8, a computer program and a computer program product according to embodiments of the present invention are shown.

FIG. 8 shows a video processing device 800 that processes a video bitstream 801 and outputs a processed video bitstream 802. The video processing device 800 comprises a processor 803 and a storage medium 804. The storage medium 804 may be a computer program product comprising a computer program 805. Alternatively, the computer program 805 may be transferred to the storage medium 804 by means of a suitable computer program product, such as a floppy disk or a memory stick. As a further alternative, the computer program 805 may be downloaded to the storage medium 804 over a network. The processor 803 is configured to load the computer program 805 from the storage medium 804 and to execute the computer program code contained in the computer program 805, thereby implementing an embodiment of the method according to the first or the fourth aspect of the invention. For example, the processor 803 may be configured, when executing the computer program 805, to implement an embodiment of the method of indicating bitstream subsets in a compressed video bitstream. Alternatively, the processor 803 may be configured, when executing the computer program 805, to implement an embodiment of the method of extracting video packets from a compressed video bitstream. The processor 803 may be a general-purpose processor, a video processor, or any other type of circuitry configured to implement an embodiment of the method according to the first or the fourth aspect of the invention when executing the computer program 805. The processing device 800 may, for example, be included in a mobile phone, a tablet, user equipment (UE), a personal computer, a video player/recorder, a multimedia player, a media streaming server, a set-top box, a television, or any other type of device having computing capabilities.

Further, all embodiments of the invention described above may be implemented in a video encoder or decoder in software, hardware, or a combination thereof. The encoder and/or decoder may also be implemented in a network device that is, or belongs to, a network node in a communication network between a sending device and a receiving device. Such a network device may be a device that converts video according to one video coding standard into another video coding standard, for example when it has been established that the receiving device only supports, or prefers, a video coding standard other than the one sent from the sending device. Although the video encoder and/or decoder disclosed above are described as physically separate devices and may be included in special-purpose circuitry such as one or more ASICs, the invention covers embodiments of devices in which parts of the encoder and/or decoder are implemented as computer program modules executing on one or more general-purpose processors.

Those skilled in the art will appreciate that the present invention is not limited to the embodiments described above. Many modifications and variations are possible within the scope of the appended claims. For example, the proposed concept of layered stream signaling applies in principle to all types of media, such as audio, subtitles, and graphics. Further, a client or network element may advantageously obtain the StPS and RPS over a reliable transport channel, while transport protocols such as HTTP and RTP may be used to transmit the remaining bitstream subsets containing video data. Finally, it will be appreciated that the NAL unit header may include further information elements in addition to the single subset identifier stream_id.

Claims

A method (410) of indicating a bitstream subset in a compressed video bitstream (210) that includes a plurality of bitstream subsets, comprising:
Receiving the compressed video bitstream (411);
Dividing (412) the compressed video bitstream into video packets (211 to 216) each containing either video data or supplemental information;
Marking (413) each video packet with a subset identifier of a plurality of subset identifiers, wherein each subset identifier of the plurality of subset identifiers is associated with a corresponding bitstream subset of the plurality of bitstream subsets.

The method according to claim 1, further comprising providing (414) at least one subset definition (221-223, 320), each describing a property of a corresponding bitstream subset of the plurality of bitstream subsets.

The method of claim 2, wherein the at least one subset definition (221-223, 320) is provided as a video packet in the compressed video bitstream (210).

The method according to any one of claims 1 to 3, wherein each subset identifier of the plurality of subset identifiers is a numerical value corresponding to a relative priority of an associated bitstream subset.

The method according to any one of the preceding claims, further comprising providing (415) at least one video representation definition (231-233, 330), each including at least one relevant subset identifier, wherein the bitstream subset associated with the at least one relevant subset identifier forms a decodable video representation.

The method of claim 5, wherein the at least one video representation definition (231-233, 330) is provided as a video packet in the compressed video bitstream (210).

A computer program (805) configured to implement the method of any one of claims 1 to 6 when executed on a processor (803).

A computer readable medium (804) in which a computer program (805) according to claim 7 is stored.

A method (510) of extracting video packets from a compressed video bitstream (210) divided into video packets (211 to 216), wherein the compressed video bitstream includes a plurality of bitstream subsets, each video packet includes either video data or supplemental information and a subset identifier of a plurality of subset identifiers, and each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets, the method comprising:
Providing (511) at least one relevant subset identifier;
Receiving (512) video packets from the compressed video bitstream; and
For each received video packet:
Examining (514) the subset identifier of the video packet; and
If the extracted subset identifier matches one of the at least one relevant subset identifier (515), extracting (516) the video packet from the compressed video bitstream.

The method of claim 9, further comprising, for each received video packet:
Forwarding or decoding (517) the extracted video packet; and
If the extracted subset identifier does not match any of the at least one relevant subset identifier (515), discarding (518) the received video packet.

The method according to claim 9 or 10, further comprising:
Providing (519) a subset definition (221-223, 320) describing properties of a corresponding bitstream subset of the plurality of bitstream subsets; and
Using (511) the subset identifier associated with the corresponding bitstream subset as the at least one relevant subset identifier.

The method of claim 11, further comprising selecting (519) the subset definition from a plurality of subset definitions (221-223, 320) according to at least one property of the corresponding bitstream subset.

The method of claim 11 or 12, further comprising receiving (519) the subset definition (221-223, 320) from a video packet in the compressed video bitstream (210).

The method according to any one of claims 9 to 13, further comprising providing (520) a definition (231) of a video representation that includes the at least one relevant subset identifier, wherein the bitstream subset associated with the at least one relevant subset identifier forms a decodable video representation.

The method of claim 14, further comprising receiving (520) the video representation definition (231-233, 330) from a video packet in the compressed video bitstream (210).

The method according to any one of claims 9 to 15, wherein each subset identifier of the plurality of subset identifiers is a numerical value corresponding to a relative priority of an associated bitstream subset.

A computer program (805) configured to implement the method according to any one of claims 9 to 16 when executed on a processor (803).

A computer readable medium (804) in which a computer program (805) according to claim 17 is stored.

A bitstream marker (110, 620) for indicating a bitstream subset in a compressed video bitstream (210, 602) comprising a plurality of bitstream subsets, the bitstream marker comprising:
A receiving unit (621) configured to receive the compressed video bitstream;
A packetization unit (622) configured to divide the compressed video bitstream into video packets (211 to 216) each containing either video data or supplemental information; and
A marking unit (623) configured to mark each video packet with a subset identifier of a plurality of subset identifiers, wherein each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets.

The bitstream marker according to claim 19, further comprising a subset definition unit (624) configured to provide at least one subset definition (221-223, 320), each describing a property of a corresponding bitstream subset of the plurality of bitstream subsets.

The bitstream marker of claim 20, wherein the at least one subset definition (221-223, 320) is provided as a video packet in the compressed video bitstream (210, 603).

The bitstream marker according to any one of claims 19 to 21, wherein each subset identifier of the plurality of subset identifiers is a numerical value corresponding to a relative priority of an associated bitstream subset.

The bitstream marker according to any one of claims 19 to 21, further comprising a video representation definition unit (625) configured to provide at least one video representation definition (231-233, 330), each including at least one relevant subset identifier, wherein the bitstream subset associated with the at least one relevant subset identifier forms a decodable video representation.

The bitstream marker of claim 23, wherein the at least one video representation definition (231-233, 330) is provided as a video packet in the compressed video bitstream (210, 603).

A bitstream extractor (121 to 123, 130, 710) for extracting video packets from a compressed video bitstream (210, 701) divided into video packets (211 to 216), wherein the compressed video bitstream includes a plurality of bitstream subsets, each video packet includes either video data or supplemental information and a subset identifier of a plurality of subset identifiers, and each subset identifier is associated with a corresponding bitstream subset of the plurality of bitstream subsets, the bitstream extractor comprising:
A subset selection unit (711) configured to provide at least one relevant subset identifier;
A receiving unit (712) configured to receive video packets from the compressed video bitstream; and
An extraction unit (713) configured, for each received video packet, to examine the subset identifier of the video packet and to extract the video packet from the compressed video bitstream if the extracted subset identifier matches one of the at least one relevant subset identifier.

The bitstream extractor of claim 25, wherein the extraction unit (713) is further configured, for each received video packet, to:
Forward or decode (702) the extracted video packet; and
Discard (704) the received video packet if the extracted subset identifier does not match any of the at least one relevant subset identifier.

The described bitstream extractor, further comprising a subset definition unit (714) configured to provide a subset definition (221-223, 320) describing properties of a corresponding bitstream subset of the plurality of bitstream subsets, wherein the subset selection unit (711) is further configured to use the subset identifier associated with the corresponding bitstream subset as the at least one relevant subset identifier.

The bitstream extractor according to claim 27, wherein the subset definition unit (714) is further configured to select the subset definition from a plurality of subset definitions (221-223, 320) according to at least one property of the corresponding bitstream subset.

The bitstream extractor according to claim 27 or 28, wherein the subset definition (221-223, 320) is received from a video packet in the compressed video bitstream (210, 701).

The bitstream extractor according to any one of claims 25 to 29, further comprising a video representation definition unit (715) configured to provide a video representation definition (231-233, 330) that includes the at least one relevant subset identifier, wherein the bitstream subset associated with the at least one relevant subset identifier forms a decodable video representation.

The bitstream extractor according to claim 30, wherein the video representation definition unit (715) is further configured to receive the video representation definition (231-233, 330) from a video packet in the compressed video bitstream (210, 701).

The bitstream extractor according to any one of claims 26 to 31, wherein each subset identifier of the plurality of subset identifiers is a numerical value corresponding to a relative priority of an associated bitstream subset.