JP5346338B2

JP5346338B2 - Method for indexing video and apparatus for indexing video

Info

Publication number: JP5346338B2
Application number: JP2010513897A
Authority: JP
Inventors: ファーブル，シルヴァン; ソシャール，レジ; ラガレイエ，ピエール・ローレン; ムール，オリヴィエル; ギヨテル，フィリップ; ヴェルミューレン，サミュエル
Original assignee: Thomson Licensing SAS
Current assignee: Thomson Licensing SAS
Priority date: 2007-06-29
Filing date: 2008-06-25
Publication date: 2013-11-20
Anticipated expiration: 2028-06-25
Also published as: CN101690228A; WO2009003885A2; JP2010532121A; CN101690228B; KR20100042632A; WO2009003885A3; EP2174500A2; KR101488548B1

Description

The present invention relates to a method for indexing video and an apparatus for indexing video.

Some picture processing applications use region of interest (ROI) detection to improve picture quality. For example, an encoding application often determines the regions of interest and devotes more resources to encoding those regions.

Various methods allow regions of interest to be detected in a picture. In particular, approaches are known that are based on establishing a saliency map of a picture or video, which takes visual parameters into account and makes it possible to define the regions on which the human eye lingers when viewing the picture or video.

Region-of-interest detection is currently used mainly before encoding, so as to privilege the regions of interest during encoding by giving them more bandwidth (for example, by reducing the quantization step for those regions).

The appearance of mobile terminals (mobile phones, PDAs, game consoles, portable DVD players, etc.), progress in display and screen techniques, and the rise of new services have together made it necessary to display video on terminals with low display capacity. For example, the ability to receive television on a mobile phone raises the problem of displaying high-density pictures on a small screen.

The present invention is concerned mainly not with the detection of regions of interest but with the transmission of those regions of interest to a device or application that can take them into account for various purposes, and that can at least solve the problem of displaying pictures on terminals with low display capacity, whether or not they are mobile terminals.

For this purpose, the present invention proposes a method for indexing an encoded video data stream. According to the invention, the video data stream contains information on the position of the regions of interest of each picture, and the method comprises the steps of:
receiving the encoded video stream;
recording the encoded video stream on a recording support;
decoding the position information of the regions of interest;
selecting a region of interest for each picture;
decoding the video data;
selecting a predetermined number of regions of interest of the video data stream from among the regions of interest selected for each picture; and
recording the selected regions of interest.

In a preferred embodiment, during the recording step:
the selected regions of interest are recorded in temporary memory as they are selected and decoded; and
once all the selected regions of interest have been recorded in temporary memory, they are transferred to the permanent memory support (503).
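The two-stage recording described above can be sketched as follows. This is only an illustration under stated assumptions: the class name RoiRecorder, the in-memory lists standing in for the temporary memory and the permanent memory support, and the expected_count parameter are all hypothetical and do not come from the patent.

```python
class RoiRecorder:
    """Buffers selected regions of interest, then moves them to permanent storage.

    Hypothetical sketch: both memories are modelled as plain Python lists.
    """

    def __init__(self, expected_count):
        self.expected_count = expected_count  # predetermined number of regions
        self.temporary = []   # stands in for the temporary memory
        self.permanent = []   # stands in for the permanent memory support

    def record(self, roi):
        """Store one region; flush to permanent storage once all have arrived."""
        self.temporary.append(roi)
        if len(self.temporary) == self.expected_count:
            self.permanent.extend(self.temporary)
            self.temporary.clear()


recorder = RoiRecorder(expected_count=2)
recorder.record("roi-picture-10")
recorder.record("roi-picture-42")
print(recorder.permanent)
```

Nothing reaches the permanent list until the last expected region has been buffered, which mirrors the order of operations in the text.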

Preferably, before being recorded, the regions of interest are formatted so that all the selected regions of interest have a uniform size.

Preferably, the method includes a step of encrypting the position of the regions of interest with an encryption key.

Preferably, the method includes a step of obtaining a decryption key in return for payment by the user.

Preferably, the video data stream is encoded according to the H.264/AVC coding standard, and the position information is contained in supplemental enhancement information (SEI) type messages.

According to a preferred embodiment, the SEI messages are encapsulated in Real-time Transport Protocol (RTP) packets, and the RTP packets are encrypted.

Preferably, the supplemental enhancement information type message carrying the region-of-interest position information is inserted into the encoded data before and after each picture to which it refers.

According to a preferred embodiment, the position information includes information chosen from:
the number of regions of interest in each picture;
the coordinates of each region of interest along each dimension of the picture;
the surface area of each region of interest;
a weight expressing the importance of a region of interest relative to the other regions of interest of the picture;
information on the content of each region of interest; and
any combination of the above.

Preferably, the step of selecting a region of interest for each picture selects the region of interest according to the weight expressing its importance.

Preferably, the video coding standard uses flexible macroblock ordering, the regions of interest are encoded in slice groups independently of the other picture data, and the position information of a region of interest includes the number of the slice group in which that region of interest is encoded.

Preferably, the supplemental enhancement information message includes, for each slice group, an identifier indicating whether that slice group relates to a region of interest.

Preferably, the method includes a further step of reading the SEI messages, and the step of decoding the video data decodes only the slice groups that contain regions of interest.

The invention also relates to an apparatus for indexing an encoded video data stream. According to the invention, the video data stream contains information on the position of the regions of interest of each picture, and the apparatus comprises:
means for receiving the encoded video stream;
means for recording the encoded video stream on a recording support;
means for decoding the position information of the regions of interest;
means for decoding the video data;
means for selecting a region of interest for each picture;
means for selecting a predetermined number of regions of interest of the video data stream from among the regions of interest selected for each picture; and
means for recording the selected regions of interest.

The regions of interest of a picture are generally detected before encoding, and this data is then used to facilitate encoding. The inventors have recognized that the position of the regions of interest is also of interest when the picture is decoded, in particular when it is displayed on a device with limited display capacity. The receiving terminal can choose to display only the regions of interest, which gives better visibility of those regions than displaying the complete picture.

FIG. 1 shows an encoding device according to a preferred embodiment of the invention.
FIG. 2 shows an encoding method according to a preferred embodiment of the invention.
FIG. 3 shows a decoding device according to a preferred embodiment of the invention.
FIG. 4 shows a decoding method according to another embodiment of the invention.
FIG. 5 shows a personal recorder type device according to another embodiment of the invention.
FIG. 6 shows an indexing method in a personal recorder type device implementing an embodiment of the invention.

The invention will be better understood and illustrated by means of embodiments and implementations, which are in no way limiting, with reference to the accompanying drawings.

FIG. 1 shows an encoding device conforming to the H.264/AVC coding standard and implementing a preferred embodiment of the invention. In this preferred embodiment, a video stream is encoded.

The current frame Fn is supplied to the input of the encoder to be encoded. The frame is encoded in the form of slices, that is, it is divided into subunits each containing a certain number of macroblocks, each macroblock corresponding to a group of 16×16 pixels. Each macroblock is encoded in intra mode or in inter mode. In both modes, the macroblock is encoded on the basis of a reconstructed frame. Module 109 decides, according to the content of the picture, whether the current picture is encoded in intra mode. In intra mode, P (shown in FIG. 2) consists of samples of the current frame Fn that have already been encoded, decoded and reconstructed (uF'n in FIG. 2, where u stands for unfiltered). In inter mode, P results from motion estimation based on one or more frames F'n-1.

The motion estimation module 101 estimates the motion between the current frame Fn and at least one previous frame F'n-1. From this motion estimate, the motion compensation module 102 generates the frame P when the current picture Fn has to be encoded in inter mode.

The subtractor 103 generates the signal Dn, that is, the difference between the picture Fn to be encoded and the picture P. This picture is then transformed by a DCT in module 104. The transformed picture is quantized by the quantization module 105 and then reordered by module 111. The CABAC (context-based adaptive binary arithmetic coding) entropy encoding module 112 then encodes each picture.

Modules 106 and 107 (the inverse quantization module and the inverse transform module, respectively) allow the difference D'n to be reconstructed by inverse quantization and inverse transformation after the transformation and quantization.

When module 109 has decided that the picture is encoded in intra mode, the intra prediction module 108 encodes the picture. The picture uF'n, which is the sum of the signals D'n and P, is obtained at the output of the adder 114. Module 108 receives the reconstructed, unfiltered picture uF'n at its input.

The filter module 110 produces the reconstructed, filtered picture F'n from the picture uF'n.

The entropy encoding module 112 outputs the encoded slices encapsulated in NAL (network abstraction layer) type units. A NAL unit contains, for example, header information and the slice. The NAL type units are sent to module 113.

Module 116 determines the regions of interest. Several approaches currently allow regions of interest to be located within a picture, and approaches based on establishing a saliency map are particularly well known. For example, the pamphlet of International Publication No. 2006/07263 of a patent application filed by Thomson Licensing on July 13, 2006 (published July 13, 2006) discloses an effective method of creating a saliency map.

Module 116 then establishes a saliency map for each picture of the video. Parameters entered by the user may also be taken into account when establishing the saliency map. For example, depending on the event to which the video relates, it is possible to specify particular important objects in the filmed scene and, in the case of a sporting event, to specify that it is a soccer match. Advantageously, this yields a saliency map in which the saliency zones are weighted according to the event: in a soccer match, it is preferable to focus on the ball rather than on the stands.

The region of interest module thus allows one or more saliency zones (also referred to as regions of interest) to be extracted. These regions of interest are then spatially located on the picture.

Each region of interest is identified by coordinates expressed along the height and width of the picture. The size of each region of interest can also be extracted, and elements of semantic information can be associated with it. In the case of a soccer match, for instance, information about the regions of interest is needed if the user is to choose which region to display from among several candidate regions of interest.

Module 115 receives the information on the regions of interest in order to encode it into SEI (supplemental enhancement information) type messages.

The SEI message is encoded as shown in the following tables.

uuid_iso_iec_11578: a single 128-bit word that indicates the message type to the decoder.

user_data_payload_byte: 8 bits containing part of the SEI message. Typically, in this case, payloadSize = 17 (bytes): 16 bytes for the UUID and 1 byte for the specific data.

user_data_payload_byte:

number_of_ROI: number of regions of interest present in the picture (or in the subsequent pictures)
roi_x_16: X position of the region of interest in the picture, in multiples of 16 pixels
roi_y_16: Y position of the region of interest in the picture, in multiples of 16 pixels
roi_w_16: width of the region of interest in the picture, in multiples of 16 pixels
roi_h_16: height of the region of interest in the picture, in multiples of 16 pixels
semantic_information: title characterizing the region of interest
relative weight: the weight of each region of interest of the picture, so that the region of greatest interest can be identified

Macroblock_alignment: describes the region of interest in macroblock units, giving the number of the starting macroblock in which the region of interest lies and the size of the region (width and height) as a number of macroblocks.
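As a minimal sketch of how a receiver might hold the fields listed above, the following Python fragment models one region of interest and converts the 16-pixel units of roi_x_16, roi_y_16, roi_w_16 and roi_h_16 into a pixel rectangle and into macroblock indices. The class name RegionOfInterest, its method names, the raster-scan macroblock numbering and the sample values are illustrative assumptions. The actual bit layout of the SEI payload is defined by the tables of the original publication, which are not reproduced in this text.

```python
from dataclasses import dataclass

MB_SIZE = 16  # a macroblock covers a group of 16x16 pixels


@dataclass
class RegionOfInterest:
    """One region of interest, using the field names of the SEI message."""
    roi_x_16: int  # X position, in multiples of 16 pixels
    roi_y_16: int  # Y position, in multiples of 16 pixels
    roi_w_16: int  # width, in multiples of 16 pixels
    roi_h_16: int  # height, in multiples of 16 pixels
    semantic_information: str = ""
    relative_weight: int = 0

    def pixel_rect(self):
        """Return (x, y, width, height) of the region in pixels."""
        return (self.roi_x_16 * MB_SIZE, self.roi_y_16 * MB_SIZE,
                self.roi_w_16 * MB_SIZE, self.roi_h_16 * MB_SIZE)

    def macroblock_indices(self, picture_width_mbs):
        """List the macroblocks covered by the region.

        Assumes plain raster-scan numbering (left to right, top to bottom).
        """
        return [row * picture_width_mbs + col
                for row in range(self.roi_y_16, self.roi_y_16 + self.roi_h_16)
                for col in range(self.roi_x_16, self.roi_x_16 + self.roi_w_16)]


ball = RegionOfInterest(2, 1, 3, 2, semantic_information="ball", relative_weight=5)
print(ball.pixel_rect())
print(ball.macroblock_indices(picture_width_mbs=10))
```

Expressing positions in macroblock units keeps the region aligned with the blocks the decoder actually processes, which is what allows a decoder to restrict its work to the blocks of a region.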

When the regions of interest are detected using a saliency map, a saliency level is obtained for each region, and a region is classified as salient if its saliency exceeds a specific threshold determined in advance by the method used to obtain the saliency map. In the SEI message, the regions of interest are therefore all the regions whose saliency exceeds the fixed threshold, listed in ascending order of saliency.

Module 113 inserts the SEI messages into the data stream and sends the video stream encoded in this way to the transmission network.
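The thresholding and ordering rule just described can be sketched in a few lines. The function name classify_regions, the (label, saliency) tuple layout and the numeric values are assumptions made for illustration only; the patent does not specify how saliency values are represented.

```python
def classify_regions(candidates, threshold):
    """Keep the regions whose saliency exceeds the threshold,
    listed in ascending order of saliency.

    candidates: list of (label, saliency) tuples (hypothetical layout).
    """
    salient = [c for c in candidates if c[1] > threshold]
    return sorted(salient, key=lambda c: c[1])


candidates = [("ball", 0.9), ("stand", 0.2), ("player", 0.6)]
ordered = classify_regions(candidates, threshold=0.5)
print(ordered)
```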

Each SEI message is transmitted before the picture to which it refers.

In other embodiments, an SEI message may be transmitted only when the position of at least one region of interest changes between two or more pictures. During decoding, the decoder then takes into account the most recently received SEI message, whether that message immediately precedes the picture being decoded or, when no SEI message precedes the current picture, relates to a previously received picture.
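The "most recently received SEI message applies" rule can be sketched as follows. The stream is modelled as a list of ("sei", regions) and ("picture", picture_id) tuples, which is a hypothetical simplification and not the structure of a real H.264 bitstream; the function name is likewise an assumption.

```python
def regions_per_picture(stream):
    """Pair each picture with the regions from the latest SEI message seen.

    stream: iterable of ("sei", regions) or ("picture", picture_id) tuples.
    """
    current = []  # regions from the most recently received SEI message
    for kind, payload in stream:
        if kind == "sei":
            current = payload
        elif kind == "picture":
            yield payload, current


stream = [("sei", ["A"]), ("picture", 0), ("picture", 1),
          ("sei", ["B"]), ("picture", 2)]
print(list(regions_per_picture(stream)))
```

Picture 1 has no SEI message of its own in this sample, so it inherits the regions announced before picture 0.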

FIG. 2 shows an encoding method conforming to the H.264/AVC coding standard and implementing a preferred embodiment of the invention.

During step E1, the saliency map associated with the video to be broadcast is determined. To determine this saliency map, which indicates the regions of interest, information about the video content may also be received so that it can be taken into account while the map is being established. In particular, during a sporting event the position of the ball corresponds to the user's region of interest, and the zone of the picture in which the ball is located is then privileged. If the video corresponds to the broadcast of a television report, the regions of interest may be determined by privileging the zone containing the presenter, for example by detecting faces using known picture processing techniques.

At the end of step E1, one or more regions of interest for the video content are thus obtained.

During step E2, the coordinates of the regions of interest within the picture are determined. The size of each region of interest, in pixels, can also be determined, and semantic information about the content can be associated with each region of interest.

In parallel, during step E3, the video stream is encoded according to the H.264 coding standard. During encoding, the zones detected as regions of interest are privileged: a smaller quantization step is applied to them.
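One way to picture "a smaller quantization step for the regions of interest" is a per-macroblock map of quantization parameters (QP), since in H.264 a lower QP corresponds to a smaller quantization step. The sketch below is only an illustration: the function name build_qp_map, the base QP of 30 and the offset of -4 are assumptions, and the patent does not state how the encoder chooses these values.

```python
def build_qp_map(width_mbs, height_mbs, rois, base_qp=30, roi_qp_offset=-4):
    """Return a height x width grid of QP values, lowered inside each region.

    rois: list of (x, y, w, h) rectangles in macroblock units.
    base_qp and roi_qp_offset are illustrative values, not from the patent.
    """
    qp_map = [[base_qp] * width_mbs for _ in range(height_mbs)]
    for x, y, w, h in rois:
        # Clip the rectangle to the picture so out-of-range regions are safe.
        for row in range(y, min(y + h, height_mbs)):
            for col in range(x, min(x + w, width_mbs)):
                qp_map[row][col] = base_qp + roi_qp_offset
    return qp_map


qp_map = build_qp_map(width_mbs=4, height_mbs=3, rois=[(1, 1, 2, 1)])
for row in qp_map:
    print(row)
```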

Following step E2, during step E4, the SEI messages are created from the positions of the regions of interest and the semantic information associated with them. The SEI messages created in this way conform to the SEI messages described above in Tables 1 and 2.

During step E5, the stream is constructed by inserting the SEI messages into it, so as to obtain an encoded stream conforming to the H.264 standard.

During step E6, the video stream encoded in this way is sent, in real time or deferred, to a decoding device, which may be local or remote.

FIG. 3 shows a preferred embodiment of a decoding device according to the invention, conforming to the H.264/AVC coding standard.

Module 209 receives the SEI messages at its input and extracts the individual SEI messages. The NAL units carrying the useful data are sent to the entropy decoding module 201.

The SEI messages are analyzed by module 210, which decodes the content of the SEI messages describing the regions of interest. The regions of interest of each picture are thus identified simply, at the level of the decoding device and before each picture is decoded, using the information contained in the macroblock_alignment field.

The macroblocks are sent to the reordering module 202 to obtain a set of coefficients. These coefficients undergo inverse quantization in module 203 and an inverse DCT in module 204. At the output of module 204, the macroblocks D'n are obtained, D'n being a distorted version of Dn. The adder 205 adds the prediction block P to D'n to reconstruct the macroblock uF'n. Block P is obtained either after motion compensation of the previously decoded frame, performed by module 208, when the macroblock was encoded in inter mode, or after intra prediction of the macroblock uF'n by module 207 when it was encoded in intra mode. The filter 206 is applied to the signal uF'n to reduce the effects of distortion, and the reconstructed frame F'n is produced from the series of macroblocks.

Using the information on the regions of interest contained in the SEI messages, the blocks representing the regions of interest are detected in the stream before display. Once identified, these blocks can be cropped according to the user's choice and sent for display to a device such as a PDA or a mobile phone.

The choice of which macroblocks to display can also be left to the user, for example by entering semantic information. If the user enters "ball", the region of interest containing the ball is displayed. If no region of interest is associated with the semantic information entered, all the regions of interest can be displayed. Several regions of interest can be displayed on the screen in the form of a mosaic. When a single region of interest is displayed, it is zoomed so as to occupy the full screen.
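The keyword selection with its fallback to all regions can be sketched as follows. The dictionary layout, the function name select_by_keyword and the case-insensitive substring match are assumptions for illustration; the patent only states that the semantic information field is compared with what the user enters.

```python
def select_by_keyword(regions, keyword):
    """Return the regions whose semantic information contains the keyword.

    If no region matches, all regions are returned, as described in the text.
    regions: list of dicts with a "semantic_information" key (hypothetical layout).
    """
    wanted = keyword.lower()
    matches = [r for r in regions
               if wanted in r["semantic_information"].lower()]
    return matches if matches else list(regions)


regions = [{"semantic_information": "ball"},
           {"semantic_information": "goalkeeper"}]
print(select_by_keyword(regions, "Ball"))
print(select_by_keyword(regions, "referee"))
```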

The decoding device thus decodes only the macroblocks that are likely to contain information of interest to the user. Decoding is therefore faster and requires fewer resources in the decoding device, and hence on the receiving side. This is particularly advantageous when the receiving device is a mobile terminal with limited processing capacity.

FIG. 4 shows a decoding method conforming to the H.264/AVC coding standard and implementing a preferred embodiment of the invention.

This method can be implemented in a mobile terminal with limited display capacity.

During step S1, the required type of display is selected through a user interface on the mobile terminal. If full-picture mode is chosen, the entire video stream is displayed as sent by the transmitter. Otherwise, only the regions of interest of the pictures are displayed; this particular mode is a feature of the invention. If the user chooses to display the regions of interest, the method proceeds to step S2; otherwise it proceeds to step S8. Various SEI messages may be inserted into the video stream for other applications, in which case an SEI message analysis step may take place before or during step S8.

During step S2, the user selects how the regions of interest are to be used. In particular, the user can select:
the maximum number of regions of interest to display;
the manner in which the various regions of interest are displayed on the screen (for example, as a mosaic);
the degree of zoom desired for the regions of interest.
The user can also use keywords to select the regions of interest whose "semantic information" field contains the keyword. In that case, the user can further specify whether, for each picture, a single region of interest containing the keyword is to be displayed (the one with the greatest saliency) or several regions of interest containing the keyword.

During step S3, the SEI messages present in the stream are analyzed as they are received. The SEI messages carry the positions of the regions of interest of each picture, detected before the picture was encoded. For each picture there may thus be one or more regions of interest, depending on the visual characteristics of the picture, on its content, or on both. The SEI messages are encoded according to Tables 1 and 2 described above. The information from the SEI messages is stored until the corresponding picture is displayed.

During step S4, all the pictures are decoded in accordance with the coding standard.

During step S5, the decoded regions of interest are processed according to the choices made by the user during step S2. If the user has chosen to zoom on the main region of interest of the picture, the zone is enlarged during step S6 to reach the maximum display size. If the user has chosen a mosaic of regions of interest, the picture is recomposed from the regions of interest, each of which is enlarged according to the number of regions selected for display and the screen size. If the user has specified a keyword, the regions of interest containing the keyword are displayed and zoomed.
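The zoom and mosaic cases can be illustrated with two small helpers. This is a sketch under stated assumptions: the function names, the choice of a uniform scale factor that preserves the aspect ratio, and the near-square grid layout are all hypothetical, since the text does not prescribe how the enlargement or the mosaic layout is computed.

```python
import math


def zoom_factor(roi_w, roi_h, screen_w, screen_h):
    """Largest uniform scale at which the region still fits on the screen."""
    return min(screen_w / roi_w, screen_h / roi_h)


def mosaic_cells(n_regions, screen_w, screen_h):
    """Split the screen into a near-square grid, one cell per region.

    Returns a list of (x, y, width, height) rectangles in pixels.
    """
    cols = math.ceil(math.sqrt(n_regions))
    rows = math.ceil(n_regions / cols)
    cell_w, cell_h = screen_w // cols, screen_h // rows
    return [((i % cols) * cell_w, (i // cols) * cell_h, cell_w, cell_h)
            for i in range(n_regions)]


print(zoom_factor(48, 32, 320, 240))
print(mosaic_cells(3, 320, 240))
```

Each region placed in a mosaic cell could then be scaled with zoom_factor, passing the cell dimensions instead of the full screen size, so that the enlargement depends on both the number of regions and the screen size.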

During step S7, the regions of interest are displayed on the screen of the mobile terminal according to the user's wishes.

During step S8, which follows when the user has not chosen to display only the regions of interest, the entire video stream is decoded for display.

FIG. 5 shows a video indexing application of the invention.

FIG. 5 partially shows a personal recorder (PVR) type device 500. The PVR 500 receives a compressed video stream at its input. According to the embodiments described above, this video data stream conforms to the H.264 coding standard. In particular, the compressed video stream contains SEI messages as described above in Tables 1 and 2.

This video data stream is sent, on the one hand, to the recording support 503. The recording support may be a hard disk, a holographic support, a memory card or a Blu-ray disc. In other embodiments, the recording support may be remote.

The video data stream is sent, on the other hand, to the decoder 501 to be decoded in real time (for example, for display on a television set). In known devices, the stream is sent to the decoder 501 when the user wishes to view it in real time. Otherwise, if recording is requested, the stream is not decoded but simply recorded.

According to this aspect, the invention provides for decoding part of the video data stream even when real-time viewing is not requested. The part of the video stream in question is, in particular, the regions of interest or certain regions of interest.

When the decoder 501 receives a video stream for which recording has been requested, the data is sent to the recording support 503, which records it as it is received. At the same time, the decoder 501 receives the video data stream and progressively decodes the SEI messages. The decoded regions of interest are sent to the video indexing module 502, which stores them temporarily before they are sent to the recording support 503.

FIG. 6 shows the method implemented by the decoder 501 and the indexing module 502.

During step T1, the video data stream is received by the decoder 501. During step T2, the decoder 501 decodes the SEI messages present in the video data stream. The decoded SEI messages are those described above in Tables 1 and 2. The decoder may also decode other SEI messages, but that is not the object of the present invention. Each SEI message may describe one or more regions of interest per picture, as described above in Tables 1 and 2. During step T3, the decoder 501 parses each SEI message and decodes each picture. During this step, the weights indicated in the SEI message are used to select which region of interest is recorded for each picture. In the preferred embodiment, the region of interest with the greatest saliency (that is, the highest weight) is retained.
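By way of illustration only, the weight-based selection of step T3 could be sketched as follows in Python. The class `RegionOfInterest`, its field names, and the function `select_main_roi` are hypothetical and do not appear in the description; the sketch assumes that the SEI message has already been parsed into one record per region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegionOfInterest:
    # Hypothetical record for one region described by an SEI message.
    x: int
    y: int
    width: int
    height: int
    weight: int  # relative weight (saliency) signalled for the region


def select_main_roi(rois: List[RegionOfInterest]) -> Optional[RegionOfInterest]:
    """Return the region with the highest weight, or None if the picture has none."""
    if not rois:
        return None
    return max(rois, key=lambda roi: roi.weight)
```

A picture without any signalled region simply yields `None`, so the caller can skip it when building the index.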

Once decoded, the regions of interest are sent to the indexing module 502 during step T4. Recording a region of interest for every picture is of little value, because the volume of information would be large and would not allow efficient video indexing. The indexing module therefore decides which pictures to use to index the video. According to the preferred embodiment described above, only about 10 pictures are selected for a 1.5-hour video. In other embodiments, a larger number of pictures may be used. These 10 pictures are taken at regular intervals. The selected pictures are temporarily stored in a RAM-type memory (not shown) included in the indexing module 502. To be displayed in the best way, the pictures are zoomed during step T5, that is, enlarged so that they all have the same size. In the preferred embodiment, this size may be the size of the picture. To that end, they are read from the temporary memory and stored again after enlargement. In another embodiment, the pictures are enlarged before being stored in the temporary memory.
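The choice of index pictures at regular intervals could, for example, be sketched as below. This is a minimal sketch under stated assumptions: the total number of pictures is taken to be known in advance (for instance from the programme duration), and the function name `pick_index_pictures` and the choice of the middle of each interval are illustrative, not taken from the description.

```python
from typing import List


def pick_index_pictures(total_pictures: int, count: int = 10) -> List[int]:
    """Return `count` picture numbers spread evenly across the video."""
    if total_pictures <= 0 or count <= 0:
        return []
    count = min(count, total_pictures)
    step = total_pictures / count
    # Use the middle of each interval (an assumption of this sketch) so that
    # the very first and very last pictures are not picked.
    return [int(step * i + step / 2) for i in range(count)]
```

Assuming 25 pictures per second, a 1.5-hour video holds 135,000 pictures, so `pick_index_pictures(135000)` returns ten picture numbers spaced 13,500 pictures apart.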

According to another embodiment, the images are presented as a mosaic on the display. Instead of being enlarged, the images are therefore all reduced to one and the same size.
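Whether the index pictures are enlarged (step T5) or reduced (mosaic), bringing them to one common size amounts to computing a scale factor per region. The following sketch assumes that the aspect ratio of each region is preserved, which the description does not state; the function name `fit_scale` is hypothetical.

```python
def fit_scale(roi_width: int, roi_height: int,
              target_width: int, target_height: int) -> float:
    """Scale factor that fits a region inside the target size, keeping its aspect ratio."""
    if roi_width <= 0 or roi_height <= 0:
        raise ValueError("region dimensions must be positive")
    return min(target_width / roi_width, target_height / roi_height)
```

A factor greater than 1 corresponds to the zoom of step T5, and a factor smaller than 1 to the reduction used for the mosaic.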

Once the entire video has been received and recorded on the recording support 503, the indexing pictures are transferred from the temporary memory to the recording support 503 and recorded in a file.

The regions of interest are then used for indexing, according to the desired use. This indexing enables the recorder to display images of the videos when the user wants to browse the contents of the database.

According to another aspect of the invention, the position data of the regions of interest can also be encrypted during encoding of the SEI messages. Thus, only users holding the decryption key can access the regions of interest, whether to view them or to access an index of the video stream based on the position information of the regions of interest. This encryption step (see FIG. 2) would be a step E4' (not shown), inserted after step E4.

Obtaining the decryption key may, for example, be offered as a paid service by the program broadcaster.

To do this, the SEI messages relating to the regions of interest are encapsulated in RTP (Real-time Transport Protocol) packets and sent on a port separate from that of the video. A CTS-type timestamp makes it possible to associate each SEI message relating to the regions of interest with the corresponding picture. Advantageously, this transmission mode makes it possible to encrypt only the RTP packets containing the SEI messages, not the video.

Decryption is performed at the receiving terminal.

With MPEG-2 TS encapsulation, the encryption standard used is DVB-CSA, and the SEI messages relating to the regions of interest are carried on a PID different from that of the video. These SEI messages are associated with the corresponding picture via the PTS (timestamp) in the PES packet header. This transmission mode makes it possible to encrypt only the PID carrying the SEI messages relating to the regions of interest, not the video PID.
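Because the region-of-interest messages travel separately from the video (a separate port for RTP, a separate PID for MPEG-2 TS), the receiver has to pair them with pictures again using the timestamps. The sketch below illustrates only that pairing; demultiplexing and decryption are left out, and the function name `attach_roi_messages` and the tuple layout are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple


def attach_roi_messages(
    pictures: List[Tuple[int, Any]],
    roi_messages: List[Tuple[int, Any]],
) -> List[Tuple[int, Any, List[Any]]]:
    """Pair each (timestamp, picture) with the ROI messages carrying the same timestamp."""
    by_timestamp: Dict[int, List[Any]] = {}
    for timestamp, payload in roi_messages:
        by_timestamp.setdefault(timestamp, []).append(payload)
    # Pictures without a matching message get an empty list.
    return [(ts, picture, by_timestamp.get(ts, [])) for ts, picture in pictures]
```

A receiver without the decryption key would simply have no usable messages to pass in, and every picture would come back with an empty list.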

In another embodiment, the video stream is encoded according to the H.264/AVC coding standard using FMO (flexible macroblock ordering), which allows different parts of a picture to be encoded, and hence decoded, independently. The FMO mode uses "slice groups", which are defined in the standard. In this embodiment, the regions of interest are encoded in slice groups separate from the rest of the picture. The PPS-type NAL units contain the slice group map. An SEI message such as the one described below is inserted to indicate the slice groups in which regions of interest are encoded.

The following table shows the format of the SEI message used in this embodiment.

user_data_payload_byte: 8 bits containing part of the SEI message. Typically, in this case, payloadSize = 17 bytes: 16 bytes for the UUID and 1 byte for the specific data.

slice_group(i)_id: when slice_group_id is "1", the slice_group represents a region of interest; when slice_group_id is "0", the slice_group represents the rest of the picture.

For each slice_group representing a region of interest, semantic information, a relative weight, and the macroblocks concerned can be specified.

Thus, only the macroblocks corresponding to the regions of interest can be decoded on reception, since they are identified and encoded independently.
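As a rough illustration of this embodiment, the sketch below splits a 17-byte payload into its 16-byte UUID and 1-byte specific data, and lists the slice groups flagged as regions of interest. The way the specific byte maps to the individual slice_group(i)_id flags is not reproduced here, so the second function takes the flags as an already-decoded list; both function names are hypothetical.

```python
import uuid
from typing import List, Tuple


def split_payload(payload: bytes) -> Tuple[uuid.UUID, int]:
    """Split the payload into its 16-byte UUID and 1 byte of specific data."""
    if len(payload) != 17:
        raise ValueError("expected payloadSize = 17 bytes")
    return uuid.UUID(bytes=payload[:16]), payload[16]


def slice_groups_to_decode(slice_group_ids: List[int]) -> List[int]:
    """Return the indices i whose slice_group(i)_id flag is 1 (region of interest)."""
    return [i for i, flag in enumerate(slice_group_ids) if flag == 1]
```

A recorder that only builds an index would pass the returned slice group numbers to the decoder and leave the remaining slice groups undecoded.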

Claims

1. A method for indexing an encoded video data stream by a personal recorder device, wherein the video data stream includes information about the position of regions of interest in each picture, the method comprising:
the personal recorder device receiving the encoded video data stream for a decoder;
a recording support recording the encoded video data stream on the recording support;
the decoder decoding the position information of the regions of interest;
a video indexing module selecting a region of interest for each picture;
the decoder decoding the encoded video data stream;
the video indexing module selecting a predetermined number of regions of interest of the video data stream from the regions of interest selected for each picture; and
the recording support recording the selected regions of interest of the video data stream on the recording support.

2. The indexing method of claim 1, wherein, in the step of the recording support recording the selected regions of interest of the video data stream:
as the video data stream is decoded and the regions of interest of the video data stream are selected, the selected regions of interest are recorded in a temporary memory; and
once the selected regions of interest of the video data stream have been recorded in the temporary memory, they are transferred to the recording support.

3. The indexing method according to claim 1, comprising, between the step of the video indexing module selecting a predetermined number of regions of interest of the video data stream and the step of the recording support recording the selected regions of interest of the video data stream, a step of the personal recorder device zooming the selected regions of interest of the video data stream to obtain a uniform size for the selected regions of interest.

4. The indexing method according to claim 1, wherein the personal recorder device encrypts the position information of the region of interest with an encryption key.

5. The indexing method according to claim 4, comprising a step of the personal recorder device obtaining a decryption key upon payment by a user.

6. The indexing method according to any preceding claim, wherein the video data stream is encoded according to the H.264/AVC coding standard and the position information is included in a supplemental enhancement information (SEI) type message.

7. The indexing method according to claim 5 or 6, wherein the supplemental enhancement information (SEI) type message is encapsulated in a Real-time Transport Protocol (RTP) packet and the RTP packet is encrypted.

8. The indexing method according to any one of claims 5 to 6, wherein a supplemental enhancement information (SEI) type message relating to region-of-interest position information is inserted into the encoded video data stream before or after each picture to which the message refers.

9. The indexing method according to claim 1, wherein the position information comprises information selected from any combination of:
the number of regions of interest in each picture;
the coordinates of each region of interest according to the picture width and picture height;
the width and height of each region of interest;
the relative weight of each region of interest with respect to the other regions of the picture; and
information about the content of each region of interest.

10. The indexing method according to any one of claims 1 to 9, wherein, in the step of the video indexing module selecting a region of interest for each picture, the video indexing module selects the region of interest according to the relative weights of the regions of interest.

11. The indexing method according to any one of claims 6 to 10, wherein the video coding standard uses flexible macroblock ordering, the regions of interest are encoded in slice groups independently of the other picture data, and the position information of a region of interest includes the number of the slice group in which the region of interest is encoded.

12. The indexing method of claim 11, wherein the supplemental enhancement information (SEI) type message includes an identifier that indicates for each slice group if it is related to a region of interest.

13. The indexing method of claim 12, further comprising a step of the personal recorder device reading the supplemental enhancement information (SEI) type message, wherein, in the step of the decoder decoding the encoded video data stream, the decoder decodes only the slice groups that the identifier indicates relate to a region of interest.

14. An apparatus for indexing an encoded video data stream, wherein the video data stream includes information regarding the position of regions of interest for each picture, the apparatus comprising:
Means for receiving an encoded video data stream;
Means for recording the encoded video data stream on a recording support;
Means for decoding position information of the region of interest;
Means for decoding the video data stream;
Means for selecting a region of interest for each picture;
Means for selecting a predetermined number of regions of interest in the video data stream from regions of interest selected for each picture;
Means for recording the selected region of interest of the video data stream.