JP2004128872A

JP2004128872A - Image processing apparatus and method thereof, and computer program and computer-readable storage medium

Info

Publication number: JP2004128872A
Application number: JP2002290047A
Authority: JP
Inventors: Hajime Oshima; 大嶋　肇; ▲高▼久　雅彦; Masahiko Takaku; Akira Kunimatsu; 國松　亮
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-10-02
Filing date: 2002-10-02
Publication date: 2004-04-22

Abstract

PROBLEM TO BE SOLVED: To provide an image processing apparatus and method, a computer program, and a computer-readable storage medium that deliver information provided as a moving picture, for example in MPEG-4 format, effectively and quickly to more viewers.

SOLUTION: A moving picture content input section 1 receives moving picture data; a control information input section 6 sets a data format that the output destination device can read; a scene update detection section 4 detects scene updates in the moving picture data; and a scene data generating section 5 generates, in that data format, the scene data of each scene detected by the detection section 4.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理装置及びその方法、並びにコンピュータプログラム及びコンピュータ可読記憶媒体に関し、特にマルチメディア符号化形式の動画データから他形式の静止画データ生成に関する。
【０００２】
【従来の技術】
近年、デジタル・ビデオ・カメラが広く普及してきている。その形態もノートパソコンや携帯端末等のネットワーク通信機能を有する機器に接続できるようになっていたり、あるいは携帯電話と一体化していたりと様々な形態をとるようになってきている。こういった形態が採用される背景には、ビデオ・カメラで撮影した動画データを電子メールで送付したり、ネットワーク上で送信したりして、離れた場所からでも動画を見られるようにしたいと望んでいる消費者が存在することを示すものであると言えるだろう。
【０００３】
こういった背景を踏まえて、動画データの符号化圧縮技術も、ネットワーク通信技術と連携して用いられることを想定してこれまでに様々な改善、拡張がなされてきた。ＩＳＯ（Ｉｎｔｅｒｎａｔｉｏｎａｌ　Ｏｒｇａｎｉｚａｔｉｏｎ　ｆｏｒ　Ｓｔａｎｄａｒｄｉｚａｔｉｏｎ）によって標準化されているマルチメディア符号化形式であるＭＰＥＧ−４は、その代表とも言えるものである。ＭＰＥＧ（Ｍｏｖｉｎｇ　Ｐｉｃｔｕｒｅ　Ｅｘｐｅｒｔｓ　Ｇｒｏｕｐ）−４は、ネットワーク環境で利用されることを前提に、それまでの動画符号化形式であるＭＰＥＧ−１、ＭＰＥＧ−２の圧縮方法を大幅に改善し、高い圧縮効率が実現されている。
【０００４】
このＭＰＥＧ−４の実現によって、かつては広いデータ伝送帯域幅が要求されるため、ＤＶＤなどの蓄積媒体や放送などの領域でしか用いられなかったＭＰＥＧの動画符号化技術が、インターネット環境のような比較的限られた帯域幅の環境においても利用出来るようになってきた。
【０００５】
実際に、最近になってＭＰＥＧ−４や、米国Ｍｉｃｒｏｓｏｆｔ社によって定義されているＡＳＦ（Ａｄｖａｎｃｅｄ　Ｓｔｒｅａｍｉｎｇ　Ｆｏｒｍａｔ）形式のようなＭＰＥＧ−４を基盤とする動画符号化形式を扱うことの出来るデジタル・ビデオ・カメラやネットワーク・ビデオ・プレーヤーなど、多種の製品が市場に投入されてきている。特に、本明細書の記述時点（平成１４年７月）では、ＮＴＴドコモ社による第三世代移動体通信サービスであるＦＯＭＡ（登録商標）において、ＭＰＥＧ−４を用いたデジタル動画像通信サービスが「ｉモーション（登録商標）」という名称ですでに開始されていることもあり、動画通信を行うことの出来る製品は今後さらに増えるであろうし、コンテンツ提供者もこれまでの静止画、テキストデータに加えて、より訴求力の高い情報を提供できる動画によるコンテンツを流通させる機会が増えるだろうと予想される。
【０００６】
しかし、一般的には動画データを再生できるのは、再生対象の動画データの符号化形式を正しく解析する仕組みを備えている再生装置に限られる。そのため、動画符号化データの解析手段を持たない装置では動画の再生が出来ないという問題点がある。
【０００７】
この問題は、パーソナル・コンピュータ等の装置上で動作するＷＷＷブラウザで動画データを再生する場合であれば、大きな問題になる可能性は少ない。米国Ｎｅｔｓｃａｐｅ　Ｃｏｍｍｕｎｉｃａｔｉｏｎｓ社のＮｅｔｓｃａｐｅ　Ｎａｖｉｇａｔｏｒ（登録商標）、もしくは米国Ｍｉｃｒｏｓｏｆｔ社のＩｎｔｅｒｎｅｔ　Ｅｘｐｌｏｒｅｒ（登録商標）といった代表的なＷＷＷブラウザには、未知のデータ形式の検出時に、その形式を解析する手段を備えた「プラグイン」あるいは「ＡｃｔｉｖｅＸコントロール」として知られるプログラム・モジュールをネットワーク経由で取得し、ＷＷＷブラウザの拡張機能としてインストールする機能がサポートされているため、再生する動画データ形式に対応したモジュールを取得することで再生は可能になるからである。
【０００８】
しかし、携帯電話のような端末機器は諸々の制約から機能が限定されていることが多いため、通常はこういった機能はサポートされておらず、あらかじめ機器に組み込まれた形式のデータ以外は表示することが出来ない。そればかりか、現状では動画はおろか静止画データ形式しか表示できない機器も依然として多いため、そのような機種の利用者は、動画コンテンツを視聴する事が出来なくなってしまう。
【０００９】
加えて、このようなソフトウェア的な制約でなく、物理的な制約によって動画データを視聴できないような機器も存在しうる。例えば、プリンタやＦＡＸのように、データを紙に印刷することによって出力する機器のような場合は、もはやデータを動画として視聴することは不可能である。
【００１０】
また、端末機器が仮にＭＰＥＧ−４形式のデータを解析できたとしても、ＭＰＥＧ−４の種類によっては、端末機器で視聴できない可能性が生じる場合がある。ＩＳＯでは、ＭＰＥＧ−４標準の一部として、ビデオやオーディオ、その他テキストや図形などのデータを「オブジェクト」として扱い、このオブジェクト群を表示画面上に空間的、時間的に配置した「シーン」を記述するためのＢＩＦＳ（Ｂｉｎａｒｙ　Ｆｏｒｍａｔ　ｆｏｒ　Ｓｃｅｎｅｓ）と呼ばれるデータ符号化形式を規定している。このＢＩＦＳ情報が加わることによって、普通の動画よりもさらに表現力の高いコンテンツが提供できるようになっている。しかしながら、このようにＭＰＥＧ−４には複数のレベルが存在していることによって、たとえＭＰＥＧ−４形式をサポートする再生機器であったとしても、ＭＰＥＧ−４動画符号化データは解析可能だがＢＩＦＳの処理機能はサポートしていない機器の場合は、このコンテンツは全く視聴不可能となってしまう。
【００１１】
以上に述べられるように、現状では、動画コンテンツを作成してネットワーク上に流通させても、必ずしも視聴者の機器で動画が再生できるとは限らないため、コンテンツ作成者が意図する情報を相手の視聴者に伝達できなかったり、あるいは限られた視聴者にしか伝達することが出来ないといった問題がある。
【００１２】
上記の問題に加えて、動画データとして情報を伝達する場合には、「内容を把握しづらく、所望する情報にたどり着くまでに時間がかかる」という、よく知られている動画特有の問題もついて回る。
【００１３】
動画情報は、静止画情報やテキスト情報と異なり、時間軸を持つという特徴的な性質を持っている。その性質ゆえに、動画情報の内容を把握するためには動画を先頭から末尾まで再生してみなければならないため、視聴者に内容を伝達するために必要な時間は、静止画やテキスト等の情報と比較してどうしても長くなってしまう。
【００１４】
また、動画情報が視聴者にとって有用なものであるか、また、有用な部分が動画中のどの位置に存在するかは、視聴してみなければ知ることは出来ない。そのため、視聴者にとって無用の情報であったとしても、結局、動画を先頭から再生し、内容を確認しながら視聴してみなければならず、視聴者は所望の情報に至るまで、ずっと不要な情報の視聴を強いられることになる。そのため、視聴者が動画コンテンツを視聴し、そこから必要な情報を取得することは、多大な時間とストレスを要する非効率的なものになりかねない。
【００１５】
さらに、動画情報をネットワーク経由で取得する場合は再生時間に加えてデータ取得のための通信時間が必要となるため、視聴者に強いられる時間とストレスはさらに増大する。しかも、ネットワークが公衆網の場合にはデータ通信に伴う回線使用料金の負担も強いられるという問題もある。圧縮されているとはいえ、動画データのデータ・サイズは非常に大きなものであるので、ネットワーク経由でデータを取得する時間や費用は決して無視できるものではない。
【００１６】
従来、動画からシーン静止画を抽出する技術、およびインデックスを出力する技術として、ビデオ・データ中のフレームを表す静止画と、その画像に付随する音声説明や字幕をテキスト化したデータを配置したインデックス・データを提供する方法がある（例えば、非特許文献１参照。）。また、この方法をふまえて、インターネット／ワールド・ワイド・ウェブ等の通信ネットワーク上のビデオ・ブラウジングをサポートする方法もある（例えば、特許文献１参照）。
【００１７】
【特許公報１】
特開平９−２４４８４９号公報
【非特許公報１】
Ａｍｙ　Ｔ．Ｉｎｃｒｅｍｏｎａ著、「Ａｕｔｏｍａｔｉｃａｌｌｙ　ｔｒａｎｓｃｒｉｂｉｎｇ　ａｎｄ　Ｃｏｎｄｅｎｓｉｎｇ　Ｖｉｄｅｏ：Ｎｅｗ　Ｔｅｃｈｎｏｌｏｇｙ　ｉｓ　Ｂｏｒｎ」、ＡＤＶＡＮＣＥＤ　ＩＭＡＧＩＮＧ、１９９５年８月
【００１８】
【発明が解決しようとする課題】
上記のように、動画コンテンツは多くの再生機器で正しく再生できない可能性があり、また、コンテンツの内容を容易に把握できないため、コンテンツ作成者が伝達したい情報を、視聴者に対して確実かつわかりやすく提供することが困難であるという問題があった。
【００１９】
本発明は、上記の課題を解決するためになされたものであり、ＭＰＥＧ−４等の動画で提供される情報を、内容を容易に把握できるようなシーン静止画データとして、インターネット等のネットワークを介して広範な端末機器で視聴できるような形式で提供することで、より多くの視聴者に情報を効果的に早く伝達することができる画像処理装置及びその方法、並びにコンピュータプログラム及びコンピュータ可読記憶媒体を提供することを目的とする。
【００２０】
【課題を解決するための手段】
上記の目的を達成するため、本発明に係る画像処理装置は、動画データを入力する動画データ入力手段と、出力先の機器が理解できるようなデータ形式を設定するデータ形式設定手段と、前記動画データのシーン更新を検出するシーン更新検出手段と、前記シーン更新検出手段によって検出されたシーンのシーン・データを、前記データ形式で作成するシーン・データ作成手段とを有することを特徴とする。
【００２１】
上記の目的を達成するため、本発明に係る画像処理方法は、動画データを入力する動画データ入力工程と、出力先の機器が理解できるようなデータ形式を設定するデータ形式設定工程と、前記動画データ入力工程で入力された動画データのシーン更新を検出するシーン更新検出工程と、前記シーン更新検出工程で検出されたシーンのシーン・データを、前記データ形式設定工程で設定されたデータ形式で作成するシーン・データ作成工程とを有することを特徴とする。
【００２２】
【発明の実施の形態】
以下に、本発明の実施の形態を図面を参照しながら説明する。図１は、本発明の全体構成を示す図であり、本発明を実現する装置を構成する機能ブロックを示したものである。
【００２３】
動画コンテンツ入力部１は、ＭＰＥＧ−４のようなマルチメディア符号化形式のデジタル・データとして記録されている動画コンテンツ・データを取り込むためのものである。入力される動画コンテンツは、コンピュータ上のディスク装置やＣＤなどの記録媒体、あるいはネットワークを介した遠隔地の装置といった場所に保管されるが、これらの場所からコンテンツ・データを取得して、後続の処理を実行出来るよう内部記憶へ展開する処理が行われる。
【００２４】
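The content demultiplexing unit 2 performs demultiplexing when the moving picture content data sent from the moving picture content input unit 1 consists of multiple multiplexed data streams. For MPEG-4, ISO has standardized a multiplexing method called FlexMux, which multiplexes moving picture data together with audio and other kinds of data into a single data stream; the content demultiplexing unit 2 separates, for example, the moving picture data from a stream multiplexed in this FlexMux format. Because multiplexing need not be applied to all data, the demultiplexing step is skipped when the input content is not multiplexed. The separated data streams are passed to the encoded data analysis unit 3.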
【００２５】
符号化データ解析部３は、コンテンツ非多重化部２から送られるデータ・ブロックの符号化データを解析し、デコード等のデータ処理を行うためのものである。マルチメディア・データは一般的にデータ量を縮小するための圧縮符号化がなされているが、その場合はここで圧縮データの展開処理が行われる。また、ＩＳＯで規定されているコンテンツ保護標準であるＩＰＭＰ（Ｉｎｔｅｌｌｅｃｔｕａｌ　Ｐｒｏｐｅｒｔｙ　Ｍａｎａｇｅｍｅｎｔ　ａｎｄ　Ｐｒｏｔｅｃｔｉｏｎ）等の仕組みを用いてデータの暗号化がなされている場合は、ここで復号処理が行われる。なお、マルチメディア・コンテンツは、ビデオやオーディオ、静止画像や、それらの相互関連やレイアウト情報を記述するためのデータ等、複数のタイプのデータから構成される場合があるが、符号化データ解析部３は入力されるデータ・タイプに応じた適切な解析処理を行う手段を有するものとする。
【００２６】
シーン更新検出部４は、符号化データ解析部３で処理されたデータ・ブロックを受け取り、そのデータ・ブロックに対応する動画像中のシーンが変更されたかどうかを判断し、更新の検出を行うためのものである。例えばＭＰＥＧ−４の場合、前述のＢＩＦＳを用いることによって、コンテンツ中のシーンが切り替わることをその更新時刻とともに明示的に示すことが出来る。ＢＩＦＳを用いたシーン更新については追って説明するが、シーン更新検出部４ではＢＩＦＳで示されたシーン更新データのような、何らかのシーンの変化を示す信号を検出する処理が行われる。更新が検出されたら、更新時刻をデータ作成制御部８に受け渡す。
【００２７】
シーン・データ作成部５は、先行する機能ブロックから渡されたシーンを表現するデータを、視聴者が静止画像として視聴することが可能なデータ形式にして出力するためのものである。ここでは、例えばＭＰＥＧ−４形式で符号化された動画データ中の所定の時刻のシーンを、ＪＰＥＧ（Ｊｏｉｎｔ　Ｐｈｏｔｏｇｒａｐｈｉｃ　Ｅｘｐｅｒｔｓ　Ｇｒｏｕｐ）、ＧＩＦ（Ｇｒａｐｈｉｃｓ　Ｉｎｔｅｒｃｈａｎｇｅ　Ｆｏｒｍａｔ）等の静止画像符号化形式のデータに変換して、画像ファイルやメモリー・ブロックに出力する処理が行われる。対象となるシーンの時刻は、シーン更新検出部４によって取得されたシーン更新時刻を、データ作成制御部８から取得する。ここで出力された画像データの位置は、後述のデータ作成制御部８に受け渡される。
【００２８】
制御情報入力部６は、本発明の画像処理装置に対して、どのような処理を行い、どのような形式のデータを生成するかを指示する制御情報を取り込むためのものである。インデックス・データを作成するための制御情報として入力されるデータには、最終的に出力されるデータの符号化形式や、生成するデータの精度などを指定する。例えば、視聴者の端末でＪＰＥＧ、ＨＴＭＬを処理することが出来、表示可能な最大サイズが１７６×１４４ピクセルであるような場合は、”ＪＰＥＧ，　ＨＴＭＬ”、”１７６×１４４”といったデータが制御データとして用いられることになる。制御データは、画像処理装置内にあらかじめ記録されているか、装置の操作者によって入力されるか、あるいはネットワークを介して端末機器から送付される等の手段で入力される。
【００２９】
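The control information analysis unit 7 analyzes the control data acquired by the control information input unit 6 and converts it into a form that the subsequent data creation control unit 8 can use for control. When control data is entered by an operator or sent over a network, it generally arrives as character-string data; the control information analysis unit 7 converts this data into whatever form the image processing apparatus can process easily. Because the converted data is used only inside the apparatus, any form that is convenient for internal processing may be used.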
【００３０】
データ作成制御部８は、制御情報解析部７から渡された制御データに基づいて、シーン・データ作成部５、インデックス・データ作成部９、データ結合部１０で行われるデータ生成処理の挙動を制御するためのものである。また、シーン更新検出部４で検知されたシーンの更新、および、例えばシーン更新の発生した時刻やシーン静止画ファイルの位置などといったシーン更新に付随する情報を受け取り、制御情報として内部的に保持する。
【００３１】
インデックス・データ作成部９は、シーン更新検出部４で検出されたシーン更新情報に基づき、更新されたシーンを表す静止画像データの位置、およびその出現順序を記述するインデックス・データを作成するためのものである。インデックス・データは、シーン画像ファイルのＵＲＬ文字列など、シーン・データの位置を特定するためのデータ・ポインター値や、あるいはシーン画像データそのものが記録され、シーン・データを参照あるいは包含することが可能な形式で作成されるものとする。このインデックス・データは、視聴者の端末で処理可能なデータ形式になっていても良い。例えば、視聴者の端末がＪＰＥＧ、ＨＴＭＬが処理可能な場合は、インデックス・データ作成部９は静止画データが時系列に並べられたＨＴＭＬデータをインデックス・データとして生成するようにしても良い。なお、この制御に必要な処理可能データ形式、シーン更新発生時刻、シーン静止画の位置といった制御情報は、データ作成制御部８から取得することが出来る。
【００３２】
データ結合部１０は、インデックス・データ作成部９で生成されたインデックス・データと、シーン・データ作成部５で生成されたシーン・データとを結合するためのものである。ここでは、例えばインデックス・データがＨＴＭＬの場合は、参照される静止画が正しく参照できるようなＵＲＬを再設定するといったように、両者の関連情報に矛盾がないか等のチェック処理を行う。また、端末で処理可能なデータ形式によっては、インデックス・データとシーン・データとが統合された単一のデータ形式として出力するといった処理も行われる場合もある。統合化されたデータ形式の例としては、ＨＴＭＬデータにＵＲＬで外部参照されるＪＰＥＧファイルのバイナリ・データを埋め込み、単一のアーカイブＨＴＭＬファイルや、シーン画像が出現順にレイアウトされたＪＰＥＧデータのような単一の画像データ、または時間的にレイアウトされたＭｏｔｉｏｎ　ＪＰＥＧのようなデータといったものがありえる。データ結合部１０の処理結果は、視聴者が視聴することが可能な符号化形式で、最終的なコンテンツ・データとして出力される。
【００３３】
受信データ解析部１１は、制御情報入力部６に渡される制御情報がネットワーク経由で送付される場合に、ネットワークから受信されたデータを解析し、制御情報を抽出するためのものである。
【００３４】
送信データ整形部１２は、データ結合部１０で生成された最終的なコンテンツ・データを外部の装置や媒体に何らかの伝送手段を介して送出する場合に、送信形態に応じてデータを適切な形に整形するためのものである。
【００３５】
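The communication control unit 13 sends data to and receives data from external devices and media. It has control means for data transmission to and from the outside, and is used by the received data analysis unit 11 and the transmission data shaping unit 12 to obtain and send out data.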
【００３６】
一般的に、データをネットワーク上で送受信する場合は、通信プロトコルに応じた通信制御情報がデータに付加されたり、通信を効率的に行えるようにデータを最適化したりする処理が行われる。そのため、受信データ解析部１１、送信データ整形部１２、通信制御部１３で扱われるデータは、必ずしもコンテンツ・データの本来のデータ内容とは一致していないことがある。そのため、これらの機能ブロックでは、通信処理に関連する部分のデータに対する解釈や整形処理が行われるが、コンテンツ・データ自体に対するデータ処理が行われるのではないことに注意されたい。ゆえに、本発明の画像処理装置がネットワーク経由でのデータ送受信を必要としない場合には、受信データ解析部１１、送信データ整形部１２、通信制御部１３は必要ではない。
【００３７】
続いて、本発明の画像処理装置の基本原理について、図面を参照しながら以下に説明する。
【００３８】
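First, to address the problem described in the prior-art section, namely that information conveyed as moving picture data is hard to grasp and takes time to navigate to the desired part, the present invention scans the "scenes", each representing one moment at a given point in time in the moving picture data, and detects signals indicating that a scene switch has occurred. It then generates scene data, which is the image of the scene at the moment of each switch output as still image data, and index data, which describes the location of the scene data and the order of appearance so that the sequence of scenes is clear, and presents both in an encoded data format that the viewer can browse. From this output the viewer learns the scene structure of the moving picture and can therefore grasp its general content.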
【００３９】
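FIG. 3 shows an example of scene data and index data. From the original moving picture content file movie.mp4, recorded in MPEG-4 format, the scene data and index data at each scene switch are extracted; the scene data is output as JPEG files movie-0001.jpg to movie-0005.jpg, and the index data as an HTML file, movie.html. Because movie.html contains the URLs of the JPEG files representing the scene data, displaying movie.html on a display device yields output like that shown in the figure, with the scene still images arranged in chronological order.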
【００４０】
そして、本発明では、このようなデータとして出力される動画コンテンツの概要情報を、視聴者が利用する端末機器が適切に扱うことが可能なデータ形式に変換して提供するようにしている。このことによって、端末機器の表示性能の違いを吸収し、動画の内容をより多くの視聴者へ情報を伝達することを可能にしている。
【００４１】
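Accordingly, the data formats generated by the present invention are not limited to JPEG for still image data and HTML for index data, as described for FIG. 3. Depending on the viewer's terminal device, still image data may have to be output in a format such as GIF or PNG (Portable Network Graphics). Index data may likewise be output in a structured language suited to the terminal device, such as Compact HTML, proposed by the W3C (World Wide Web Consortium), an Internet standards body, as a format for terminal devices with limited display capability, or WML (Wireless Markup Language), proposed by the WAP Forum. Furthermore, to arrange scene still images temporally as well as spatially, the index data may be written in a structured presentation language capable of temporal expression, such as SMIL (Synchronized Multimedia Integration Language), which is being standardized by the W3C.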
【００４２】
また、シーン・データとインデックス・データは、図１のデータ結合部１０によって単一の出力データに統合される場合もある。例えば、前記したように外部参照されるシーン静止画ファイルのデータをＨＴＭＬに埋め込んだアーカイブＨＴＭＬとして出力する以外にも、Ａｎｉｍａｔｉｏｎ　ＧＩＦやＭｏｔｉｏｎ　ＪＰＥＧ形式ファイルのように、シーン静止画が出現順に並べられた単一の連続静止画ファイルのような形式で出力する場合もある。
【００４３】
もちろん、端末機器でサポートされていれば、米国Ａｐｐｌｅ　Ｃｏｍｐｕｔｅｒ社で定義されているマルチメディア・データ形式であるＱｕｉｃｋＴｉｍｅ（登録商標）のように、複数の静止画ファイルを単一のファイルに包含させることが出来るような形式で提示されても良い。
【００４４】
また、端末機器でＭＰＥＧ−４のような動画データ形式の再生をサポートしている場合であっても、従来の技術の項で述べたようにＭＰＥＧ−４でも様々なレベルの規格が存在するため、端末機器で再生可能なレベルのＭＰＥＧ−４データ形式に再変換して提示してもよい。
【００４５】
その他、上記のデータ形式以外であっても、端末機器で出力可能な形式であれば、どのようなデータ形式で提示されてもよい。例えば、端末機器がプリンタ等の場合は、上記のようなラスター形式のデータではなく、ＰｏｓｔＳｃｒｉｐｔ、ＰＤＦ（Ｐｏｒｔａｂｌｅ　Ｄｏｃｕｍｅｎｔ　Ｆｏｒｍａｔ）のような形式で提示されても良い。
【００４６】
動画コンテンツの概要情報が端末機器にどのようなデータ形式で出力されるかの具体例を、図４に示す。図４は、動画インデックス・データが提示される端末機器および各端末に提示される際のデータ形式の例を示す図であり、端末機器として（１）携帯情報端末機、（２）携帯電話機、（３）パーソナル・コンピュータ、（４）ゲーム機、（５）プリンタがある場合に、本発明の画像処理装置によって生成されたデータを、（１）にはＨＴＭＬとＪＰＥＧ、（２）にはＣｏｍｐａｃｔ　ＨＴＭＬとＧＩＦ、（３）にはＳＭＩＬ　（Ｓｙｎｃｈｒｏｎｉｚｅｄ　Ｍｕｌｔｉｍｅｄｉａ　Ｉｎｔｅｇｒａｔｉｏｎ　Ｌａｎｇｕａｇｅ）とＪＰＥＧ、（４）にはＡｎｉｍａｔｉｏｎ　ＧＩＦ、（５）にはＰｏｓｔＳｃｒｉｐｔとして提供していることを示している。
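The terminal-to-format assignment described for FIG. 4 can be written down as a small lookup table. The following is only a sketch: the format names are taken from the paragraph above, while the dictionary keys and function names are hypothetical identifiers chosen for illustration.

```python
# Output formats per terminal type, transcribed from the FIG. 4 description.
# The keys ("pda", "mobile_phone", ...) are hypothetical names for this sketch.
FIG4_OUTPUT_FORMATS = {
    "pda": ("HTML", "JPEG"),                 # (1) portable information terminal
    "mobile_phone": ("Compact HTML", "GIF"),  # (2) mobile phone
    "personal_computer": ("SMIL", "JPEG"),    # (3) personal computer
    "game_console": ("Animation GIF",),       # (4) game machine
    "printer": ("PostScript",),               # (5) printer
}


def formats_for(terminal: str) -> tuple:
    """Return the output formats for a terminal type (KeyError if unknown)."""
    return FIG4_OUTPUT_FORMATS[terminal]


def is_single_file(terminal: str) -> bool:
    """True when index data and scene data are delivered as one merged file."""
    return len(formats_for(terminal)) == 1
```

For terminals (4) and (5) only one format is listed, which corresponds to the case where the data combining unit 10 merges index data and scene data into a single output.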
【００４７】
なお、本発明では、動画データのシーン変更を検出するための手段として、動画コンテンツ・データの符号化形式として前記ＢＩＦＳデータを含むＭＰＥＧ−４形式で記録されている場合は、ＢＩＦＳの仕様の一部である「ＢＩＦＳ　Ｃｏｍｍａｎｄ」と呼ばれる機能を利用する。
【００４８】
説明の際の予備知識として、ＭＰＥＧ−４の「ＢＩＦＳ　Ｃｏｍｍａｎｄ」の概要について、以下にごく簡単に説明しておく。
【００４９】
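FIG. 2 illustrates the concept of composing a scene from multiple object data. In MPEG-4, as FIG. 2 shows, a scene in a moving picture can be composed of multiple objects. Each object has its own data stream; in addition, there is scene description information, written in BIFS format, that indicates how the objects are arranged spatially and temporally, and object descriptor information that associates the objects' data streams with the scene description information. As shown in FIG. 5, this information is input to a scene composition means and composed into the final scene. FIG. 5 illustrates that an MPEG-4 scene is composed from the scene description (BIFS), object descriptors, and object data.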
【００５０】
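As part of the BIFS data, MPEG-4 defines a way of describing command data that represents operations such as inserting, deleting, or replacing the objects that make up a scene, or replacing the scene itself. This command data is the "BIFS Command", and including BIFS Command data in content makes it possible to control scene updates. The complete BIFS specification, including BIFS Command, is given in the ISO/IEC 14496-1 standard issued by ISO; refer to it for details.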
【００５１】
本発明では、入力されるコンテンツ・データに含まれるＢＩＦＳ　Ｃｏｍｍａｎｄデータ、あるいはそれに類する何らかのコマンド・データによってシーンに何らかの変更が加えられたことが示されたポイントを、シーン更新が発生したポイントとして扱うことによって、シーン更新の検出手段とする。
【００５２】
次に、本発明の基本原理の実施の形態について、以下に例を挙げて説明する。
【００５３】
＜実施の形態１＞
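As Embodiment 1, a form is described in which the image processing apparatus of the present invention operates as a software program running on an Internet server machine that can use HTTP (Hyper Text Transfer Protocol) as its communication protocol. In a typical instance of this form, the communication control unit 13, the received data analysis unit 11, and the like would be realized by an HTTP server program such as Apache, and the other functional blocks by a CGI (Common Gateway Interface) program or similar. The terminal device used by the viewer is assumed to be the portable information terminal of FIG. 4 (1).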
【００５４】
実施の形態１では、画像処理装置に対してデータの生成方法を指示するためのデータ作成制御データは、図６のように端末機器からＨＴＴＰで伝送される。図６は、動画インデックス・データ生成装置に対して、制御データをＨＴＴＰで伝送する際のデータ形式を示す図である。端末機器からは、図中で示されるように、処理対象の動画コンテンツ・データの位置を示すＵＲＬ（／ｃｇｉ／ｂｉｎ／ｍｖｉｎｄｅｘ．ｅｘｅ？ｍｏｖｉｅ．ｍｐ４　）、端末機器で処理可能なデータ形式（Ａｃｃｅｐｔ：　ｔｅｘｔ／ｐｌａｉｎ，　ｔｅｘｔ／ｈｔｍｌ，　ｉｍａｇｅ／ｊｐｅｇ）、端末機器あるいは表示のためのプログラムの種類を識別する名称（ＭｏｖｉｅＰｌａｙｅｒ／１．０）などがデータ作成制御データとして、ＨＴＴＰで送信されるデータのヘッダー部に記述されて送信されるものとする。無論、データ作成制御データはＨＴＴＰのヘッダー部ではなく、データ本体として送信されても良いことは言うまでもない。
【００５５】
送信されたデータ作成制御データは、通信制御部１３から受信データ解析部１１を経て動画コンテンツ・データのＵＲＬ（／ｃｇｉ／ｂｉｎ／ｍｖｉｎｄｅｘ．ｅｘｅ？ｍｏｖｉｅ．ｍｐ４）が取り出され、画像処理装置を実行するＣＧＩプログラムに渡される。この処理は、本形態では、ＨＴＴＰによるリクエストからヘッダー部分を抽出し、ＣＧＩプログラムに受け渡すといった、ＨＴＴＰサーバー・プログラムが通常備えている機能を用いて実現される。なお、本形態では通信制御部とデータ処理部が分離しているため、ＵＲＬに処理対象の動画コンテンツ・データのファイル名（ｍｏｖｉｅ．ｍｐ４）とデータ処理部が実装されたＣＧＩプログラムのＵＲＬ（／ｃｇｉ／ｂｉｎ／ｍｖｉｎｄｅｘ．ｅｘｅ）が同時に指定されているが、通信制御部とデータ処理部が同一のプログラムとして動作する場合にはコンテンツ・データのファイル名のみを指定出来る。
【００５６】
コンテンツ・データを指定するための方法として、ファイル名以外にも、コンテンツに関連付けられた任意形式の識別子などを使用しても良い。ただし、使用されるデータは、何らかの方法で処理対象のコンテンツ・データを特定することが出来るものでなければならない。
【００５７】
受信されたデータ作成制御データは、制御情報入力部６を介して、制御情報解析部７に渡される。制御情報解析部７では、入力された制御情報を解析し、内部記憶に取り込む処理が行われる。以下、制御情報解析部７で行われる解析処理を、図７のフローチャートを用いて順を追って説明する。
【００５８】
FIG. 7 is a flowchart showing the processing executed in the control information analysis unit 7.
【００５９】
（ステップＳ７１）制御情報入力部６から渡されたデータ作成制御データを取得する。
【００６０】
本形態の例では、図６で示されるようにＨＴＴＰのヘッダー部分に記述されたデータ作成制御データは、ＨＴＴＰサーバー・プログラムによって環境変数の形でＣＧＩプログラムに渡される。図６の例で言えば、”Ｕｓｅｒ−Ａｇｅｎｔ”など、”：”より左の部分に示される項目に対応して定義された環境変数に、”ＭｏｖｉｅＰｌａｙｅｒ／１．０”などの”：”より右の部分のデータが代入される。また、１行目のＵＲＬの”？”より右の部分は、”ＱＵＥＲＹ＿ＳＴＲＩＮＧ”という環境変数に代入される。その結果、ＣＧＩプログラムとして実装される制御情報解析部７には、
”ＨＴＴＰ＿ＡＣＣＥＰＴ＝ｔｅｘｔ／ｐｌａｉｎ，　ｔｅｘｔ／ｈｔｍｌ，　ｉｍａｇｅ／ｊｐｅｇ”
”ＨＴＴＰ＿ＵＳＥＲ＿ＡＧＥＮＴ＝ＭｏｖｉｅＰｌａｙｅｒ／１．０”
”ＱＵＥＲＹ＿ＳＴＲＩＮＧ＝ｍｏｖｉｅ．ｍｐ４”
といった環境変数が利用可能となる。
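The header-to-environment-variable mapping described above follows the usual CGI convention: upper-case the header name, replace hyphens with underscores, and prefix it with `HTTP_`. The sketch below reproduces the three variables listed in the text. The request method (`GET`) and protocol version in `EXAMPLE_REQUEST` are assumptions, since FIG. 6 is not reproduced here, and the function names are hypothetical.

```python
def cgi_variable_name(header_name: str) -> str:
    """Map an HTTP request header name to its CGI environment variable name."""
    return "HTTP_" + header_name.strip().upper().replace("-", "_")


def request_to_environ(request_text: str) -> dict:
    """Build the CGI-style variables an HTTP server would derive from a request."""
    lines = request_text.strip().splitlines()
    method, target, _version = lines[0].split()
    # Everything to the right of "?" in the request URL becomes QUERY_STRING.
    _path, _, query = target.partition("?")
    environ = {"REQUEST_METHOD": method, "QUERY_STRING": query}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        environ[cgi_variable_name(name)] = value.strip()
    return environ


# Assumed request layout; only the URL and the two header values come from the text.
EXAMPLE_REQUEST = """GET /cgi/bin/mvindex.exe?movie.mp4 HTTP/1.1
Accept: text/plain, text/html, image/jpeg
User-Agent: MoviePlayer/1.0
"""
```

In a real deployment the HTTP server performs this conversion itself; the function only makes the rule explicit.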
【００６１】
（ステップＳ７２）制御項目およびその項目が示すデータを内部記憶に保存する。
【００６２】
ここで、本形態の例では、環境変数ＨＴＴＰ＿ＡＣＣＥＰＴ、ＨＴＴＰ＿ＵＳＥＲ＿ＡＧＥＮＴ、ＱＵＥＲＹ＿ＳＴＲＩＮＧが制御項目として用いられると仮定すると、これらの環境変数で示されるデータを、後続の処理で利用可能な形式で内部記憶に保存する。
【００６３】
本形態の例では、制御項目として利用可能な環境変数名とそのデータとを連想配列形式のデータ構造中にそれぞれ文字列形式で記憶すると仮定する。ただし、内部記憶に保存される時の形態は、それが後続の処理で利用可能である限りはどのようなものであっても良い。例えば、ＨＴＴＰ＿ＡＣＣＥＰＴのデータ部は、処理可能なデータ形式をコンマで繋げて列記した単一の文字列データとして記述されているが、保存する際には文字列データ中に列記された個々のデータ形式を抽出してリスト形式で保存するといったように、任意の形式変換等の処理を行っても良い。
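Steps S71 and S72 can be sketched as follows, assuming the control items arrive as CGI environment variables. The associative array is a plain `dict`, and the comma-separated `HTTP_ACCEPT` value is additionally split into a list, as the paragraph above allows. The key `accepted_formats` and the function name are hypothetical.

```python
import os

# Environment variables treated as control items in this embodiment.
CONTROL_ITEMS = ("HTTP_ACCEPT", "HTTP_USER_AGENT", "QUERY_STRING")


def parse_control_data(environ=None) -> dict:
    """Store each control item as a string and split HTTP_ACCEPT into a list."""
    environ = os.environ if environ is None else environ
    control = {name: environ.get(name, "") for name in CONTROL_ITEMS}
    # Drop any ";q=..." parameters and normalise case so later lookups are simple.
    control["accepted_formats"] = [
        item.split(";")[0].strip().lower()
        for item in control["HTTP_ACCEPT"].split(",")
        if item.strip()
    ]
    return control
```

Passing an explicit dictionary instead of relying on `os.environ` keeps the function easy to exercise outside a web server.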
【００６４】
（ステップＳ７３）入力されたすべてのデータ作成制御データに対して解析処理が行われたら、コンテンツ・データの位置情報を動画コンテンツ入力部１に通知する。
【００６５】
本形態の例では、環境変数ＱＵＥＲＹ＿ＳＴＲＩＮＧの値”ｍｏｖｉｅ．ｍｐ４”がコンテンツ・データの位置情報に相当するため、この”ｍｏｖｉｅ．ｍｐ４”が動画コンテンツ入力部１に渡される。なお、場合によっては、コンテンツ・データの位置情報が絶対パスの形式で渡されることが望ましいことがありえるが、その場合は位置情報を上記のような相対ＵＲＬ形式から絶対パス形式に変換する処理も動画コンテンツ入力部１に受け渡す前に行われると仮定する。
【００６６】
ただし、本形態のように、位置情報が環境変数として展開され、動画コンテンツ入力部１からでも参照することが可能になっている場合は、この処理は必ずしも行われなくても良い。
【００６７】
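(Step S74) The control information held in internal storage by the processing of step S72 is passed to the data creation control unit 8. The control information passed is then managed as is by the data creation control unit 8.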
【００６８】
In this embodiment, a pointer to the data structure described for step S72 is assumed to be passed.
【００６９】
以上の手順によって、データ作成制御データが解析され、コンテンツ・データの位置情報が動画コンテンツ入力部１に渡される。
【００７０】
一方、動画コンテンツ・データは、制御情報解析部７から受け渡された位置を元に動画コンテンツ入力部１によって取得され、コンテンツ非多重化部２による非多重化処理、符号化データ解析部３による符号化データの解析処理を経た後、シーン更新検出部４に渡される。シーン更新検出部４では、入力された符号化データ中からシーンの変更を示す情報を捜し、ここで変更が検出されたシーンが、後続の処理で静止画を抽出する対象として取り扱われる。以下、シーン更新検出部４で行われる検出処理を、図８のフローチャートを用いて順を追って説明する。なお、シーン更新検出部４で処理されるデータは、すでにコンテンツ非多重化部２によって動画、静止画、シーン記述などのそれぞれのデータ形式毎に分離されたデータ列として渡されているものとする。また、分離されたデータ列には、ＭＰＥＧ−４の規格において定義されるＤＴＳ（Ｄｅｃｏｄｉｎｇ　Ｔｉｍｅ　Ｓｔａｍｐ）、ＣＴＳ（Ｃｏｍｐｏｓｉｔｉｏｎ　Ｔｉｍｅ　Ｓｔａｍｐ）といった、データ列に含まれる所定のブロックがいつ処理されるべきであるかを示す時刻情報が埋め込まれているものとする。
【００７１】
図８は、シーン更新検出部４において実行される処理を示すフローチャートである。
【００７２】
（ステップＳ８１）まず、入力された符号化データの種類を確認する。
【００７３】
本形態では、入力された符号化データの種類は、前述のオブジェクト・ディスクリプタ情報によって判定する。オブジェクト・ディスクリプタ情報には、コンテンツ中に含まれるオブジェクトの内容が記述されたデータ・ストリーム（Ｅｌｅｍｅｎｔａｒｙ　Ｓｔｒｅａｍ）の構成情報や符号化形式の種類などが記述されている。例えば、オブジェクト・ディスクリプタの一種であるＩｎｉｔｉａｌ　Ｏｂｊｅｃｔ　Ｄｅｓｃｒｉｐｔｏｒには、コンテンツに含まれるＥｌｅｍｅｎｔａｒｙ　Ｓｔｒｅａｍの特性情報を記述するＥＳ＿Ｄｅｓｃｒｉｐｔｏｒ、符号化形式の特性情報を記述するＤｅｃｏｄｅｒＣｏｎｆｉｇＤｅｓｃｒｉｐｔｏｒといった情報がまとめて記述されている。ＩＳＯの定義では、符号化データの種類は、上記のＤｅｃｏｄｅｒＣｏｎｆｉｇＤｅｓｃｒｉｐｔｏｒの項目ｓｔｒｅａｍＴｙｐｅによって指定されるようになっており、この値によって識別可能となっている。符号化データ解析部３では、符号化データをあらかじめ解析し、シーン更新検出部の各処理で利用可能な形態にしておく必要がある。本形態では符号化データ解析部３で符号化データの種類等を含むオブジェクト・ディスクリプタ情報やストリームデータ自身の解析処理が行われるものとする。
【００７４】
なお、オブジェクト・ディスクリプタ情報をはじめとするＭＰＥＧ−４による符号化については、ＩＳＯ／ＩＥＣ　１４４９６−１の規格文書に完全な定義が記載されているため、本明細書では詳細にわたる説明は省略する。
【００７５】
（ステップＳ８２）入力された符号化データの種類が、ＢＩＦＳを示すものであるかチェックする。ＢＩＦＳの場合はステップＳ８３の処理を実行し、さもなければステップＳ８５の処理までスキップする。
【００７６】
本形態の場合は、符号化データの種類は上記ＤｅｃｏｄｅｒＣｏｎｆｉｇＤｅｓｃｒｉｐｔｏｒの項目ｓｔｒｅａｍＴｙｐｅで指定され、ＢＩＦＳの場合はｓｔｒｅａｍＴｙｐｅの値が０ｘ０３となっている。したがって、ここではｓｔｒｅａｍＴｙｐｅ＝０ｘ０３であるかどうかをチェックする。
【００７７】
（ステップＳ８３）入力されたＢＩＦＳデータが、ＢＩＦＳ　Ｃｏｍｍａｎｄであるかチェックする。ＢＩＦＳ　Ｃｏｍｍａｎｄの場合はステップＳ８４の処理を実行し、さもなければステップＳ８５の処理までスキップする。
【００７８】
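In this embodiment, a change to the scene is judged to have occurred when a BIFS Command is detected, so it is necessary to check whether the BIFS data contains a BIFS Command. A BIFS Command is normally described as attribute data of a type of object called Conditional, so this step checks whether the BIFS data contains a Conditional object and BIFS Command data described as its attribute.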
【００７９】
（ステップＳ８４）シーンの更新が検出されたら、更新の内容と更新が発生する時刻とをデータ作成制御部８に受け渡すことによって、シーンの更新を通知する。データ作成制御部８は、これらのシーン更新情報を内部記憶に保持し、後続のデータ生成処理の制御に用いられるよう管理しておく。
【００８０】
本形態の場合は、更新の内容は、コマンドの種類（追加／削除／置換／シーン置換）を示すＢＩＦＳ　Ｃｏｍｍａｎｄの属性ｃｏｄｅの値によって判断することが可能である。ｃｏｄｅには、０（オブジェクトの挿入）、１（オブジェクトの削除）、２（オブジェクトの置換）、３（シーンの置換）の４種類の値が定義されている。また、更新が発生する時刻は、ＢＩＦＳ　Ｃｏｍｍａｎｄに対して割り当てられているＣＴＳ（Ｃｏｍｐｏｓｉｔｉｏｎ　Ｔｉｍｅ　Ｓｔａｍｐ）が用いられる。したがって、ここではＢＩＦＳ　Ｃｏｍｍａｎｄのｃｏｄｅ値、およびＣＴＳの値を取得し、それぞれを更新内容、更新発生時刻としてデータ作成制御部８に渡している。
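The update content and update time passed to the data creation control unit 8 in step S84 can be modelled as a small record. This is a sketch only: the four code values are the ones listed in the paragraph above, while the class and function names are hypothetical, and the CTS is treated as a plain integer in unspecified time units.

```python
from dataclasses import dataclass
from enum import IntEnum


class UpdateKind(IntEnum):
    """BIFS Command code values as listed in the text."""
    INSERT_OBJECT = 0
    DELETE_OBJECT = 1
    REPLACE_OBJECT = 2
    REPLACE_SCENE = 3


@dataclass(frozen=True)
class SceneUpdate:
    kind: UpdateKind  # what changed in the scene
    cts: int          # Composition Time Stamp of the command (units assumed)


def make_scene_update(code: int, cts: int) -> SceneUpdate:
    """Build the record for step S84; raises ValueError for an unknown code."""
    return SceneUpdate(UpdateKind(code), cts)
```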
【００８１】
（ステップＳ８５）入力されたデータがすべて処理されたかどうかチェックする。未処理の入力データが残っている場合は、残りのデータに対してステップＳ８１〜ステップＳ８４の処理を行う。
【００８２】
以上の手順によって、シーン更新検出部４に入力された符号化データからシーンの変更を示す情報が検出される。
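The flow of FIG. 8 (steps S81 to S85) reduces to a filter over the incoming data blocks. The sketch below assumes the encoded data analysis unit 3 has already decoded each block into a simple record; `DataBlock` and its fields are hypothetical stand-ins for that decoded form, and only the `streamType` value 0x03 comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BIFS_STREAM_TYPE = 0x03  # streamType value given in the text for BIFS data


@dataclass
class DataBlock:
    stream_type: int                         # from DecoderConfigDescriptor
    cts: int                                 # Composition Time Stamp of the block
    bifs_command_code: Optional[int] = None  # None when no BIFS Command is present


def detect_scene_updates(blocks) -> List[Tuple[int, int]]:
    """Return (command code, CTS) pairs for every detected scene update."""
    updates = []
    for block in blocks:                           # S85: loop until input is exhausted
        if block.stream_type != BIFS_STREAM_TYPE:  # S81/S82: is this BIFS data?
            continue
        if block.bifs_command_code is None:        # S83: does it carry a BIFS Command?
            continue
        updates.append((block.bifs_command_code, block.cts))  # S84: report the update
    return updates
```

Each returned pair corresponds to one notification to the data creation control unit 8.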
【００８３】
シーン・データ作成部５では、図９で示される手順で、シーン更新検出部４から渡されたシーン情報を元に、可視形式の画像フォーマットで記録されたシーン静止画データの作成処理を行う。以下、図９のフローチャートを用いてシーン・データ作成部５の処理を順を追って説明する。
【００８４】
図９は、シーン・データ作成部５において実行される処理を示すフローチャートである。
【００８５】
（ステップＳ９１）データ作成制御部８から、入力されたシーンの更新時刻に対応するタイムスタンプを取得する。
【００８６】
（ステップＳ９２）データ作成制御部８から、視聴者が利用する端末機器で処理可能なデータ形式の情報を取得する。
【００８７】
その際、作成される静止画データの形式は視聴者の端末機器で表示できる形式で出力される必要があるため、まず、データ作成制御部８に対して、端末機器で受け付けられるデータ形式を問い合わせる。本形態の例では、処理可能なデータ形式は図７の説明で示されるように、”ＨＴＴＰ＿ＡＣＣＥＰＴ＝ｔｅｘｔ／ｐｌａｉｎ，　ｔｅｘｔ／ｈｔｍｌ，　ｉｍａｇｅ／ｊｐｅｇ”という環境変数として取得されたのちデータ作成制御部８に渡され、管理されている。したがって、ＨＴＴＰ＿ＡＣＣＥＰＴに対応するデータはデータ作成制御部８から取得可能である。
【００８８】
（ステップＳ９３）入力されたシーン情報を、ステップＳ９２で取得された処理可能なデータ形式の静止画フォーマットで符号化し、シーンの静止画データを作成する。
【００８９】
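In this embodiment, the data obtained in step S92 includes "image/jpeg" in HTTP_ACCEPT, indicating that still image data in JPEG format is accepted; JPEG encoding is therefore performed here, and the scene still image is output as a JPEG file. Naturally, dynamic control is required, such as outputting in GIF format when this data is "image/gif" instead.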
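The dynamic choice of still image format (steps S92 and S93) can be sketched as a lookup against the formats the terminal accepts. The preference order below is an assumption made for illustration; the text only requires that a format the terminal can process is chosen.

```python
# Still image encoders this sketch pretends to have, in assumed preference order.
ENCODER_PREFERENCE = ("image/jpeg", "image/gif", "image/png")


def choose_still_format(accepted_formats):
    """Return the first supported still image MIME type the terminal accepts."""
    accepted = {fmt.strip().lower() for fmt in accepted_formats}
    for mime in ENCODER_PREFERENCE:
        if mime in accepted:
            return mime
    return None  # the terminal accepts no still image format known here
```

A return value of `None` would correspond to a terminal for which some other output path, such as PostScript for a printer, has to be selected.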
【００９０】
なお、ここで出力される画像ファイルは、図３で示されるように、”ｍｏｖｉｅ−０００１．ｊｐｇ”〜”ｍｏｖｉｅ−０００５．ｊｐｇ”というファイル名で作成されるものとする。
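The file names shown in FIG. 3 follow a simple pattern: the content file's base name, a hyphen, and a four-digit scene number. A sketch of that naming rule follows; the function name and the extension table are hypothetical.

```python
from pathlib import PurePosixPath

# File extensions per still image MIME type (assumed mapping for this sketch).
EXTENSIONS = {"image/jpeg": ".jpg", "image/gif": ".gif", "image/png": ".png"}


def scene_file_name(content_name: str, scene_number: int,
                    mime: str = "image/jpeg") -> str:
    """Build a name such as movie-0001.jpg from movie.mp4 and scene number 1."""
    stem = PurePosixPath(content_name).stem  # "movie.mp4" -> "movie"
    return f"{stem}-{scene_number:04d}{EXTENSIONS[mime]}"
```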
【００９１】
ただし、図５で示されるように、シーンが複数のオブジェクトによって構成されている場合は、静止画データを作成する前に、シーンを構成するオブジェクトの表示位置や、その他表示属性を示すデータを抽出し、その情報に基づいて仮想空間上にオブジェクトを配置してから、その配置イメージを静止画データとして出力しなければならない。
【００９２】
なお、この合成処理はＭＰＥＧ−４で複数のオブジェクトを取り扱う場合には必須の処理であるが、本発明の目的とは直接関係しないので、合成処理の実施方法に関する詳しい説明は省略する。
【００９３】
（ステップＳ９４）ステップＳ９３で出力された静止画データの位置を、シーン更新が発生した時刻と関連付けてデータ作成制御部８に通知する。ここで生成される静止画データは、静止画データ・ファイルに出力されても良いし、メモリー・ブロックに出力されても良い。したがって、データ作成制御部８に通知されるデータの位置は、静止画データ・ファイルの場合はファイル・パス、メモリー・ブロックの場合はアドレス値などが用いられることになる。本形態で示される例では、出力形態はＪＰＥＧ形式等の静止画像ファイルとし、データ作成制御部８には静止画データ・ファイル名が通知されるものとする。
【００９４】
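The data creation control unit 8 holds the still image data location received from the scene data creation unit 5 in association with the time stamp received with it. The data creation control unit 8 then passes the received still image data location to the index data creation unit 9 and instructs it to execute the index data creation processing described below.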
【００９５】
上記ステップＳ９１〜ステップＳ９４までの一連の流れが処理されたら、動画コンテンツ入力部１に入力される次の入力データに対して、図８〜図９で示される処理を繰り返し行っていく。
【００９６】
以上の手順によって、シーン更新検出部４から渡されたシーン情報から、可視形式の画像フォーマットで記録されたシーン静止画データが作成される。ここまでの処理によって生成されたシーン静止画データは、最終的にインデックス・データ作成部９によって生成されるインデックス・データから参照される。以下、インデックス・データ作成部９で行われる処理を、図１０を用いて順を追って説明する。
【００９７】
図１０は、インデックス・データ作成部において実行される処理を示すフローチャートである。
【００９８】
（ステップＳ１０１）データ作成制御部８から静止画データの位置を受け取る。
【００９９】
(Step S102) The data format that the terminal device can process is requested and obtained from the data creation control unit 8.
【０１００】
本形態では、図７の説明で示されるように、処理可能なデータ形式は”ＨＴＴＰ＿ＡＣＣＥＰＴ＝ｔｅｘｔ／ｐｌａｉｎ，　ｔｅｘｔ／ｈｔｍｌ，　ｉｍａｇｅ／ｊｐｅｇ”という環境変数として取得されたのちデータ作成制御部８に渡され、管理されている。したがって、ここではこのＨＴＴＰ＿ＡＣＣＥＰＴに対応するデータをデータ作成制御部８から取得する。
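(Step S103) The location of the still image data is recorded in index data of an appropriate format, based on the format obtained in step S102, and the index data is output.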
【０１０１】
本形態の場合、ステップＳ１０２で取得されるデータ中には、”ｔｅｘｔ／ｈｔｍｌ”として、ＨＴＭＬ形式のデータを受け付けられることを示すデータが含まれているので、インデックス・データ作成部９はインデックス・データを図３に示される”ｍｏｖｉｅ．ｈｔｍｌ”という名称のＨＴＭＬ形式のファイルとして出力するようにしている。また、本形態では、静止画データの位置としてファイル名が用いられているため、インデックス・データには参照される静止画のファイル名が指定されたＩＭＧタグのデータが出力されるものとする。
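Step S103 for the HTML case can be sketched as below: one IMG tag per scene still image, emitted in order of appearance. The surrounding page markup, the `alt` text, and the function name are assumptions; the text only specifies that IMG tags naming the still image files are written.

```python
from html import escape


def build_index_html(title: str, image_names) -> str:
    """Return an HTML index that references each scene image in order."""
    rows = [
        f'<p><img src="{escape(name, quote=True)}" alt="Scene {i}"></p>'
        for i, name in enumerate(image_names, start=1)
    ]
    return (
        "<html>\n<head><title>" + escape(title) + "</title></head>\n<body>\n"
        + "\n".join(rows)
        + "\n</body>\n</html>\n"
    )
```

Writing the returned string to `movie.html` next to the JPEG files would give an index of the kind shown in FIG. 3.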
【０１０２】
なお、ここで作成されるインデックス・データは、必ずしも視聴者の端末機器で再生可能なデータ形式で作成されていなくてもよい。この例では、端末機器はＨＴＭＬを処理することが出来るためインデックス・データをＨＴＭＬ形式で出力しているが、例えばＪＰＥＧ形式しか処理することが出来ない端末機器の場合には、ＨＴＭＬ形式のインデックス・データは処理できない。
【０１０３】
このような場合、インデックス・データには静止画の順序や配置情報などを記録した任意形式のデータとして作成し、後続のデータ結合部１０でインデックス・データの情報とシーン静止画データを結合し、端末機器で処理可能な形式で出力するようにしなければならない。すなわち、ここで挙げられたＪＰＥＧ形式しか処理できない機器に対しては、データ結合部１０では複数のシーン静止画がレイアウトされた一枚のＪＰＥＧ画像を生成するといった処理が必要となるだろう。
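For a JPEG-only terminal, the data combining unit 10 has to lay out several scene stills on one image. The placement arithmetic can be sketched without any imaging library, as below. The grid layout, the column count, and the use of 176×144 as the thumbnail size are assumptions for illustration; actual pasting and JPEG encoding are left out.

```python
import math


def contact_sheet_layout(count: int, thumb_w: int, thumb_h: int, columns: int):
    """Return (sheet size, top-left position of each thumbnail) for a grid layout."""
    rows = math.ceil(count / columns)
    positions = [
        ((i % columns) * thumb_w, (i // columns) * thumb_h)  # row-major, in scene order
        for i in range(count)
    ]
    return (columns * thumb_w, rows * thumb_h), positions
```

With five scenes, two columns, and 176×144 thumbnails, the sheet is 352×432 pixels and the fifth scene sits at the start of the third row.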
【０１０４】
このようなケースでは、インデックス・データ作成部９は任意の形式でインデックス・データを生成しても良い。
【０１０５】
（ステップＳ１０４）ステップＳ１０３で出力されたインデックス・データの位置をデータ作成制御部８に通知する。
【０１０６】
本形態の場合は、ステップＳ１０３で出力されたインデックス・データのファイル名”ｍｏｖｉｅ．ｈｔｍｌ”がデータ作成制御部８に渡される。
【０１０７】
なお、インデックス・データ作成部９では、静止画データの位置が通知された時点でインデックス・データを内部記憶上に作成しておき、すべてのシーンの処理が完了した時点で最終的なインデックス・データ形式に変換、出力する処理を行うようにしても良い。したがって、データ作成制御部８に通知されるインデックス・データの位置は、本形態の場合はＨＴＭＬファイル・パスであるが、内部記憶のアドレス値などを用いても良い。
【０１０８】
以上図７〜図１０の処理によって、動画コンテンツに含まれるシーンへのインデックス・データが作成される。
【０１０９】
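When the processing of FIGS. 7 to 10 has been completed for all scenes, the data creation control unit 8 calls the data combining unit 10 and passes it the index data location and still image data locations registered so far. In this embodiment, the data combining unit 10 checks whether the references to JPEG files written in the output HTML file are correct and, where a URL is written as a file path used in the local environment or the like, converts it into a URL that can be referenced correctly.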
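The conversion from a local file path to a URL that the terminal can resolve might look like the sketch below. It assumes POSIX-style paths on the server; `document_root` and `base_url` are hypothetical parameters, and `example.com` in the usage note is a placeholder host.

```python
import posixpath
from urllib.parse import quote


def to_public_url(local_path: str, document_root: str, base_url: str) -> str:
    """Rewrite a server-local file path as a URL under base_url."""
    rel = posixpath.relpath(local_path, document_root)
    if rel == ".." or rel.startswith("../"):
        # Refuse to expose files that live outside the published directory tree.
        raise ValueError(f"{local_path!r} is outside {document_root!r}")
    return base_url.rstrip("/") + "/" + quote(rel)
```

For example, `to_public_url("/var/www/scenes/movie-0001.jpg", "/var/www", "http://example.com")` yields `http://example.com/scenes/movie-0001.jpg`.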
【０１１０】
さらに、端末機器に対してインデックス・データとシーン・データがパッケージされた単一のデータとして提示される必要がある場合は、ここでデータの結合処理を行う。
【０１１１】
例えば、図１０のフローチャートの説明中に挙げられたような、ＪＰＥＧ形式しか処理できない機器や、あるいは図４のゲーム機（４）、プリンタ（５）のような機器に対しては、ここまでの処理で作成されたインデックス・データとシーン・データを統合して、端末機器で処理可能な形式で出力する。データの結合処理を行う際に、インデックス・データ、およびシーン・データを別の符号化形式に変換する必要がある場合は、データ結合部１０では必要な符号化形式変換や再符号化処理を行わなければならない。
【０１１２】
データ結合部１０で処理されたデータは、最終的に、送信データ整形部１２でＨＴＴＰのレスポンス・データとして整形され、通信制御部１３を介して視聴者の端末機器に送信される。
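The shaping performed by the transmission data shaping unit 12 amounts to wrapping the final content data in HTTP response framing. The following is a minimal sketch under stated assumptions: the status line, the protocol version, and the header set are illustrative choices, and in a CGI deployment the HTTP server would normally add the status line itself.

```python
def build_http_response(body: bytes, content_type: str = "text/html") -> bytes:
    """Wrap content data in a minimal HTTP response (assumed header set)."""
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # a blank line separates the headers from the body
    )
    return head.encode("ascii") + body
```

The `content_type` argument would be set from the format chosen for the terminal, for example `image/jpeg` when a single merged JPEG is returned.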
【０１１３】
このような手段によって生成された動画インデックス・データは、視聴者の利用する端末機器がサポートする形式で生成されることが保証されるため、視聴者は確実に動画情報の内容を知ることが可能になる。
【０１１４】
本実施例では、シーン・データと共にインデックス・データを生成したが、シーン・データのみを生成し、生成されたシーン・データを次々と端末装置に転送し、表示するものでも構わない。シーン・データのみを生成する場合は、すべてのシーン・データを生成する前に、生成されたシーン・データを見ることができ、視聴者に内容をより早く伝達することができる。また、シーン・データとインデックス・データを生成する場合は、視聴者がすべてのシーン・データを一度に見ることができ、例えば端末機器がプリンタ等である場合に適している。
【０１１５】
また、本実施例では、シーン更新検出に、シーン操作のためのコマンド・データ（ＢＩＦＳコマンド）を用いたので、容易にシーン更新検出が可能である。
【０１１６】
また、本実施例では、インデックス・データとシーン・データを結合し、ひとまとめにすることで、ネットワークを通じた送受信の手間が軽減し、また、データの管理も容易になる。
【０１１７】
＜その他の実施形態＞
また、上記実施形態では、ネットワークを構成するハードウェア等が含まれるものの、各処理部は実際はソフトウェアで実現できるものである。即ち、本発明の目的は、上述した実施の形態の機能を実現するソフトウェアのプログラムコードを記録した記憶媒体（または、記録媒体）を、システムあるいは装置に供給し、そのシステムあるいは装置のコンピュータ（または、ＣＰＵやＭＰＵ）が、記憶媒体に格納されたプログラムコードを読み出し、実行することによっても達成されることは言うまでもない。この場合、記憶媒体から読み出されたプログラムコード自体が、上述した実施の形態の機能を実現することになり、そのプログラムコードを記憶した記憶媒体が本発明を構成することになる。
【０１１８】
また、コンピュータが読み出したプログラムコードを実行することにより、上述した実施の形態の機能が実現されるだけでなく、そのプログラムコードの指示に基づき、コンピュータ上で稼働しているオペレーティングシステム（ＯＳ）等が、実際の処理の一部または全部を行い、その処理によって、上述した実施の形態の機能が実現される場合も含まれることは言うまでもない。
【０１１９】
さらに、記憶媒体から読み出されたプログラムコードが、コンピュータに挿入された機能拡張カードやコンピュータに接続された機能拡張ユニットに備わるメモリに書き込まれた後、そのプログラムコードの指示に基づき、その機能拡張カードや機能拡張ユニットに備わるＣＰＵ等が、実際の処理の一部または全部を行い、その処理によって、上述した実施の形態の機能が実現される場合も含まれることは言うまでもない。
【０１２０】
また、上記実施形態では、データ作成制御情報を入力して、データ形式を設定しているが、いくつかのデータ形式から選択して行ってもよいのは言うまでもない。
【０１２１】
【発明の効果】
以上説明したように、本発明によれば、ＭＰＥＧ−４等の動画コンテンツで提供される情報に対し、内容を容易に把握できるようなシーン静止画データを、インターネット等のネットワークを介して広範な端末機器で視聴できるような形式で提供することが可能となり、その結果、より多くの視聴者に情報を効果的に早く伝達することが可能となる。
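[Brief description of the drawings]
[FIG. 1] A diagram showing the overall configuration of the present invention.
[FIG. 2] A diagram showing the concept of composing a scene from multiple object data.
[FIG. 3] A diagram showing an example of scene data and index data.
[FIG. 4] A diagram showing examples of terminal devices to which moving picture index data is presented and the data formats used for each terminal.
[FIG. 5] A diagram showing that an MPEG-4 scene is composed from the scene description (BIFS), object descriptors, and object data.
[FIG. 6] A diagram showing the data format used when control data is transmitted to the image processing apparatus over HTTP.
[FIG. 7] A flowchart showing the processing executed in the control information analysis unit 7.
[FIG. 8] A flowchart showing the processing executed in the scene update detection unit 4.
[FIG. 9] A flowchart showing the processing executed in the scene data creation unit 5.
[FIG. 10] A flowchart showing the processing executed in the index data creation unit 9.
[0001]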
[Technical field to which the invention belongs]
The present invention relates to an image processing apparatus and method, and a computer program and a computer-readable storage medium, and more particularly to generation of still image data of another format from moving image data of a multimedia encoding format.
[0002]
[Prior art]
In recent years, digital video cameras have become widespread. They now take various forms: some can be connected to devices with network communication functions, such as notebook personal computers and portable terminals, and others are integrated into mobile phones. The adoption of these forms suggests that there are consumers who want to send moving image data shot with a video camera by e-mail or over a network so that it can be viewed from a remote location.
[0003]
Against this background, the encoding and compression technology for moving image data has been improved and extended in various ways on the assumption that it will be used together with network communication technology. MPEG-4, a multimedia coding format standardized by ISO (International Organization for Standardization), is a typical example. MPEG (Moving Picture Experts Group)-4 is designed for use in network environments; it significantly improves on the compression methods of the earlier moving picture coding formats MPEG-1 and MPEG-2 and achieves high compression efficiency.
[0004]
With the arrival of MPEG-4, MPEG moving picture coding technology, which was formerly used only in areas such as storage media (for example DVDs) and broadcasting because it required a wide data transmission bandwidth, has become usable even in environments with relatively limited bandwidth, such as the Internet.
[0005]
In fact, a wide variety of products, such as digital video cameras and network video players, that can handle MPEG-4 and MPEG-4-based moving picture coding formats such as the ASF (Advanced Streaming Format) defined by Microsoft Corporation of the United States have recently been brought to market. In particular, at the time of writing (July 2002), a digital video communication service using MPEG-4 has already been launched under the name "i-motion (registered trademark)" on FOMA (registered trademark), the third-generation mobile communication service of NTT DoCoMo. The number of products capable of video communication is therefore expected to grow further, and content providers are expected to have more opportunities to distribute video content, which can convey more appealing information, in addition to the still image and text data used so far.
[0006]
However, in general, moving image data can be reproduced only by a reproducing apparatus having a mechanism for correctly analyzing the encoding format of moving image data to be reproduced. Therefore, there is a problem that a moving image cannot be reproduced by an apparatus having no means for analyzing moving image encoded data.
[0007]
This is unlikely to be a major problem when moving image data is played back in a WWW browser running on a device such as a personal computer. Typical WWW browsers, such as Netscape Navigator (registered trademark) from Netscape Communications of the United States and Internet Explorer (registered trademark) from Microsoft of the United States, support a function that, on detecting an unknown data format, obtains over the network a program module known as a "plug-in" or "ActiveX control" that can analyze that format, and installs it as an extension of the browser. Playback therefore becomes possible once the module for the moving image data format has been obtained.
[0008]
However, terminal devices such as mobile phones often have limited functions because of various constraints, so they normally do not support such a function and cannot display data in any format other than those built into the device in advance. Moreover, there are still many devices that can display only still image data formats, let alone moving images, and users of such models cannot view moving image content.
[0009]
In addition, there may be devices that cannot view moving image data due to physical restrictions instead of such software restrictions. For example, in a device such as a printer or a facsimile that outputs data by printing it on paper, it is no longer possible to view the data as a moving image.
[0010]
Even if a terminal device can analyze MPEG-4 data, the content may still be unviewable on that device depending on the kind of MPEG-4 data. As part of the MPEG-4 standard, ISO defines a data encoding format called BIFS (Binary Format for Scenes), which treats data such as video, audio, text, and graphics as "objects" and describes a "scene" in which these objects are arranged spatially and temporally on the display screen. Adding BIFS information makes it possible to provide content that is more expressive than ordinary moving images. However, because MPEG-4 thus has multiple levels, a playback device that supports the MPEG-4 format and can analyze MPEG-4 encoded video data but does not support BIFS processing cannot play such content at all.
[0011]
As described above, at present, even if moving image content is created and distributed over a network, it cannot always be played on the viewer's device. As a result, the information the content creator intends may not reach the viewer, or may reach only a limited number of viewers.
[0012]
In addition to the above, conveying information as moving image data brings with it the well-known problem, specific to moving images, that "the content is hard to grasp and it takes time to reach the desired information".
[0013]
Unlike still image and text information, moving image information has a time axis. Because of this, grasping its content requires playing the moving image from beginning to end, so the time needed to convey the content to a viewer is inevitably longer than for information such as still images and text.
[0014]
Moreover, a viewer cannot know whether moving image information is useful, or where in the moving image the useful part lies, without watching it. Even if the information turns out to be useless, the viewer must play the moving image from the beginning and check the content while watching, and is forced to sit through unneeded information until the desired information is reached. Watching moving image content to obtain the necessary information can therefore be inefficient, costing a great deal of time and stress.
[0015]
Furthermore, when moving image information is acquired via a network, communication time for data acquisition is required in addition to reproduction time, so that the time and stress imposed on the viewer further increase. In addition, when the network is a public network, there is also a problem that a line usage charge is imposed on data communication. Although compressed, the size of the moving image data is very large, so the time and cost of acquiring data over a network is not negligible.
[0016]
Conventionally, as a technique for extracting scene still images from a moving image and outputting an index, there is a method of providing index data in which still images representing frames in the video data are arranged together with text converted from the audio description and subtitles attached to those images (for example, see Non-Patent Document 1). There is also a method, based on this one, of supporting video browsing on a communication network such as the Internet / World Wide Web (for example, see Patent Document 1).
[0017]
[Patent Document 1]
JP-A-9-244849
[Non-Patent Document 1]
Amy T. Incremona, "Automatically transcribing and Conducting Video: New Technology is Born", ADVANCED IMAGING, August 1995.
[0018]
[Problems to be solved by the invention]
As described above, video content may not play correctly on many playback devices, and its content cannot be grasped easily. There has therefore been a problem that the information it carries is difficult to provide easily.
[0019]
The present invention has been made to solve the above-described problems. Its object is to provide an image processing apparatus and method, a computer program, and a computer-readable storage medium that convert information provided as a moving image, such as MPEG-4, into scene still image data from which the content can be grasped easily, and that provide it in a format viewable on a wide range of terminal devices via a network such as the Internet, so that the information can be conveyed effectively and quickly to a larger number of viewers.
[0020]
[Means for Solving the Problems]
In order to achieve the above object, an image processing apparatus according to the present invention includes: a moving image data input unit that inputs moving image data; a data format setting unit that sets a data format that can be understood by an output destination device; scene update detecting means for detecting a scene update of the moving image data; and scene data creating means for creating, in the data format, scene data of the scene detected by the scene update detecting means.
[0021]
In order to achieve the above object, an image processing method according to the present invention includes: a moving image data inputting step of inputting moving image data; a data format setting step of setting a data format that an output destination device can understand; a scene update detecting step of detecting a scene update of the moving image data input in the moving image data inputting step; and a scene data creating step of creating scene data of the scene detected in the scene update detecting step, in the data format set in the data format setting step.
[0022]
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a diagram showing the overall configuration of the present invention, and shows functional blocks constituting an apparatus for realizing the present invention.
[0023]
The moving image content input unit 1 takes in moving image content data recorded as digital data in a multimedia encoding format such as MPEG-4. The input moving image content may be stored on a recording device of a computer, such as a disk device or a CD, or on a remote device reached via a network. The unit loads the data into internal storage so that the subsequent processing can be executed.
[0024]
The content demultiplexing unit 2 performs demultiplexing when the moving image content data passed from the moving image content input unit 1 consists of a plurality of multiplexed data strings. For MPEG-4, ISO standardizes a multiplexing method called FlexMux, which multiplexes audio data and other types of data together with moving image data into a single data string. The content demultiplexing unit 2 separates the moving image data from, for example, a data string multiplexed in the FlexMux format. Multiplexing is not necessarily applied to all data, so if the input content is not multiplexed, the demultiplexing process is skipped. The data string separated here is passed to the encoded data analysis unit 3.
[0025]
The encoded data analysis unit 3 analyzes the encoded data of the data block sent from the content demultiplexing unit 2 and performs data processing such as decoding. Multimedia data is generally compression-encoded to reduce the amount of data; in that case, the compressed data is expanded here. When data is encrypted using a mechanism such as IPMP (Intellectual Property Management and Protection), a content protection standard defined by ISO, the decryption is also performed here. Multimedia content may be composed of a plurality of types of data, such as video, audio, and still images, together with data describing their mutual relationships and layout information. The encoded data analysis unit 3 therefore has means for performing the appropriate analysis according to the input data type.
[0026]
The scene update detection unit 4 receives the data block processed by the encoded data analysis unit 3, determines whether the scene in the moving image corresponding to that data block has changed, and thereby detects scene updates. For example, in MPEG-4, the above-mentioned BIFS makes it possible to indicate explicitly, together with the update time, that the scene in the content is to be switched. Scene update using BIFS is described later; the scene update detection unit 4 detects a signal indicating some scene change, such as the scene update data indicated by BIFS. When an update is detected, the update time is passed to the data creation control unit 8.
[0027]
The scene data creation unit 5 is for outputting data representing a scene passed from a preceding functional block in a data format that can be viewed as a still image by a viewer. Here, for example, a scene at a predetermined time in moving image data encoded in the MPEG-4 format is converted into data in a still image encoding format such as JPEG (Joint Photographic Experts Group) or GIF (Graphics Interchange Format). Then, a process of outputting to an image file or a memory block is performed. As the time of the target scene, the scene update time acquired by the scene update detection unit 4 is acquired from the data creation control unit 8. The position of the image data output here is passed to a data creation control unit 8 described later.
[0028]
The control information input unit 6 takes in control information that instructs the image processing apparatus of the present invention what processing to perform and what type of data to generate. The data input as control information for creating the index data specifies, among other things, the encoding format of the data finally output and the accuracy of the generated data. For example, if the viewer's terminal can process JPEG and HTML and its maximum displayable size is 176 x 144 pixels, data such as "JPEG, HTML" and "176 x 144" is used as the control data. The control data is recorded in the image processing apparatus in advance, input by an operator of the apparatus, or transmitted from a terminal device via a network.
[0029]
The control information analysis unit 7 analyzes the control data obtained by the control information input unit 6 and converts it into a format that the subsequent data creation control unit 8 can use for control. When the control data is input by an operator or transmitted via a network, it generally arrives as character string data, and the control information analysis unit 7 converts it into a format that the image processing apparatus can process easily. Since the converted data is used only inside the apparatus, any format may be used as long as it is easy to process internally.
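As an illustration only, the conversion from character string control data into an internal format might be sketched as follows. The `ControlInfo` structure and the `parse_control_data` function are hypothetical names chosen for this sketch; the specification leaves the internal format open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlInfo:
    """Hypothetical internal form of the control data (not defined in the text)."""
    formats: tuple      # processable encoding formats, e.g. ("JPEG", "HTML")
    max_width: int      # maximum displayable width in pixels
    max_height: int     # maximum displayable height in pixels


def parse_control_data(formats_text: str, size_text: str) -> ControlInfo:
    """Convert strings such as "JPEG, HTML" and "176 x 144" into ControlInfo."""
    formats = tuple(f.strip().upper() for f in formats_text.split(",") if f.strip())
    width_text, height_text = size_text.lower().split("x")
    return ControlInfo(formats, int(width_text), int(height_text))


info = parse_control_data("JPEG, HTML", "176 x 144")
```

Because the converted data is used only inside the apparatus, a dictionary or any other structure would serve equally well.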
[0030]
The data creation control unit 8 controls the data creation processing performed by the scene data creation unit 5, the index data creation unit 9, and the data combining unit 10, based on the control data passed from the control information analysis unit 7. It also receives each scene update detected by the scene update detection unit 4, together with accompanying information such as the time at which the scene update occurred and the position of the scene still image file, and holds this information internally as control information.
[0031]
The index data creation unit 9 creates index data that describes the positions of the still image data representing the updated scenes and their order of appearance, based on the scene update information detected by the scene update detection unit 4. The index data records either a data pointer value that specifies the position of the scene data, such as the URL character string of a scene image file, or the scene image data itself, and is created in a format that can refer to or include the scene data. The index data may be in any data format that the viewer's terminal can process. For example, when the viewer's terminal can process JPEG and HTML, the index data creation unit 9 may generate, as index data, HTML data in which the still image data is arranged in time series. The control information needed for this, such as the processable data formats, the scene update occurrence times, and the positions of the scene still images, can be acquired from the data creation control unit 8.
[0032]
The data combining unit 10 combines the index data generated by the index data creation unit 9 with the scene data generated by the scene data creation unit 5. Here, for example, when the index data is HTML, the unit checks that the related information in the two is consistent, for instance by resetting URLs so that the referenced still images can be referred to correctly. Depending on the data formats the terminal can process, the unit may also output the index data and the scene data as a single integrated data format. Examples of such integrated formats include a single archive HTML file, in which the binary data of the JPEG files that would otherwise be referenced externally by URL is embedded in the HTML data; single image data, such as JPEG data in which the scene images are laid out in order of appearance; and temporally sequenced JPEG data such as Motion JPEG. The processing result of the data combining unit 10 is output as the final content data in an encoding format that the viewer can view.
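One possible way to fold an externally referenced still image into the HTML itself is sketched below. It uses a base64 `data:` URI, which is an assumption of this sketch; the text does not specify how the archive HTML embeds the binary data. The function name `embed_image` and the placeholder bytes are hypothetical.

```python
import base64


def embed_image(html_text: str, url: str, image_bytes: bytes,
                media_type: str = "image/jpeg") -> str:
    """Replace an external image reference with an inline data: URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{media_type};base64,{encoded}"
    return html_text.replace(f'src="{url}"', f'src="{data_uri}"')


# Placeholder bytes standing in for a real JPEG file (hypothetical content).
fake_jpeg = b"\xff\xd8\xff\xd9"
index_html = '<img src="movie-0001.jpg">'
single_file = embed_image(index_html, "movie-0001.jpg", fake_jpeg)
```

After the replacement, the index no longer depends on a separate image file, so the index data and scene data can be delivered as one unit.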
[0033]
When the control information passed to the control information input unit 6 is sent via the network, the received data analysis unit 11 analyzes the data received from the network and extracts the control information.
[0034]
When the final content data generated by the data combining unit 10 is transmitted to an external device or medium via some transmission means, the transmission data shaping unit 12 shapes the data into a form appropriate to the transmission mode.
[0035]
The communication control unit 13 transmits data to and receives data from external devices and media. It has control means for data transmission with the outside, and is used by the reception data analysis unit 11 and the transmission data shaping unit 12 to acquire and transmit data.
[0036]
Generally, when data is transmitted and received on a network, a process of adding communication control information according to a communication protocol to the data or optimizing the data so that communication can be performed efficiently is performed. Therefore, the data handled by the reception data analysis unit 11, the transmission data shaping unit 12, and the communication control unit 13 may not always match the original data content of the content data. Therefore, it should be noted that in these functional blocks, interpretation and shaping processing are performed on data related to communication processing, but data processing is not performed on content data itself. Therefore, when the image processing apparatus of the present invention does not require data transmission / reception via a network, the reception data analysis unit 11, the transmission data shaping unit 12, and the communication control unit 13 are not required.
[0037]
Subsequently, the basic principle of the image processing apparatus of the present invention will be described below with reference to the drawings.
[0038]
First, the present invention addresses the problem described in the related art section above, namely that information transmitted as moving image data is difficult to grasp and that it takes time to reach the desired information. It scans the "scenes" of the moving image, each representing the picture at a certain point in time, and detects signals indicating that a scene change has occurred. It then generates scene data, in which the image of the scene at the time of the change is output as still image data, and index data, which describes the positions of the scene data and the order of appearance so that the context of the scenes can be understood, and presents them in an encoded data format that the viewer can view. From this output, the viewer can learn the scene structure of the moving image and thus grasp its approximate content.
[0039]
FIG. 3 shows an example of scene data and index data. From the original moving image content file movie.mp4, the scene data and index data at the times when scene switching occurs are extracted; the scene data is output as JPEG files such as movie-0001.jpg to movie-0005.jpg, and the index data is output as an HTML file, movie.html. Since movie.html describes the URLs of the JPEG files representing the scene data, displaying movie.html on a display device yields an image in which the scene still images are arranged in time series, as shown in the figure.
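The relationship between the scene files and the HTML index can be illustrated with a minimal sketch. It assumes five scene files named movie-0001.jpg through movie-0005.jpg; the function `build_index_html` and the markup it emits are hypothetical, since the figure's actual HTML is not reproduced in the text.

```python
from html import escape


def build_index_html(title, scene_urls):
    """Return HTML that lists the scene still images in order of appearance."""
    rows = "\n".join(
        f'<p><img src="{escape(url, quote=True)}" alt="scene {i}"></p>'
        for i, url in enumerate(scene_urls, start=1)
    )
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{rows}\n</body></html>\n"
    )


# Assumed file names for the five scene still images.
scene_urls = [f"movie-{n:04d}.jpg" for n in range(1, 6)]
index_html = build_index_html("movie.mp4", scene_urls)
```

Writing `index_html` to movie.html next to the JPEG files would give a page that shows the scenes in time series when opened in a browser.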
[0040]
According to the present invention, the summary information of the moving image content output as such data is converted into a data format that can be appropriately handled by a terminal device used by a viewer and provided. This makes it possible to absorb the difference in the display performance of the terminal device and to transmit the content of the moving image to more viewers.
[0041]
Therefore, the data formats generated in the present invention are not limited to JPEG as the still image data format and HTML as the index data format, as described with reference to FIG. 3. Depending on the terminal device used by the viewer, the still image data may need to be output in a format such as GIF or PNG (Portable Network Graphics). The index data may likewise be output in a structured language suited to the terminal device, such as Compact HTML, a format for terminal devices with limited display capability proposed by the Internet standardization organization W3C (World Wide Web Consortium), or WML (Wireless Markup Language), similarly proposed by the WAP Forum. Furthermore, to arrange the scene still images temporally as well as spatially, the index data may be described in a presentation-oriented structured language capable of temporal expression, such as SMIL (Synchronized Multimedia Integration Language), standardized by the W3C.
[0042]
The scene data and the index data may also be combined into single output data by the data combining unit 10 in FIG. 1. For example, besides the archive HTML described above, in which the data of the scene still image files that would otherwise be referenced externally is embedded in the HTML, the data may be output as a single file of consecutive still images in which the scene still images are arranged in order of appearance, such as an Animation GIF or Motion JPEG file.
[0043]
Of course, if the terminal device supports it, the data may be presented in a format that allows a plurality of still image files to be contained in a single file, such as QuickTime (registered trademark), a multimedia data format defined by Apple Computer of the United States.
[0044]
Even if the terminal device supports playback of a moving image data format such as MPEG-4, MPEG-4 defines various levels, as described in the related art section. In that case, the data may be re-converted into MPEG-4 data of a level that the terminal device can play back and then presented.
[0045]
In addition, any data format other than those above may be used as long as the terminal device can output the data. For example, when the terminal device is a printer or the like, the data may be presented in a format such as PostScript or Portable Document Format (PDF) instead of the raster formats described above.
[0046]
FIG. 4 shows specific examples of the data formats in which the summary information of the moving image content is output to terminal devices. It illustrates terminal devices to which the moving image index data is presented and the data format used for each. When the terminal devices are (1) a portable information terminal, (2) a mobile phone, (3) a personal computer, (4) a game machine, and (5) a printer, the data generated by the image processing apparatus of the present invention is provided as (1) HTML and JPEG, (2) Compact HTML and GIF, (3) SMIL (Synchronized Multimedia Integration Language) and JPEG, (4) Animation GIF, and (5) PostScript, respectively.
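The correspondence in FIG. 4 amounts to a lookup from terminal type to output formats, which might be sketched as follows. The dictionary keys and the function `formats_for` are hypothetical names for this sketch; only the pairings themselves come from the description of the figure.

```python
# Terminal type -> data formats provided, following the pairings listed for FIG. 4.
OUTPUT_FORMATS = {
    "portable information terminal": ("HTML", "JPEG"),
    "mobile phone": ("Compact HTML", "GIF"),
    "personal computer": ("SMIL", "JPEG"),
    "game machine": ("Animation GIF",),
    "printer": ("PostScript",),
}


def formats_for(terminal: str) -> tuple:
    """Return the output formats for a terminal type (KeyError if unknown)."""
    return OUTPUT_FORMATS[terminal]


phone_formats = formats_for("mobile phone")
```

In the embodiments, the formats are derived from control data sent by the terminal rather than from a fixed table, so this table is only a reading aid for the figure.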
[0047]
According to the present invention, as means for detecting a scene change of moving image data, when the moving image content data is recorded in the MPEG-4 format including the BIFS data as the encoding format, the BIFS specification is used. A function called “BIFS Command” is used.
[0048]
As a preliminary knowledge at the time of the description, an outline of the MPEG-4 "BIFS Command" will be briefly described below.
[0049]
FIG. 2 is a diagram showing the concept of synthesizing a scene from a plurality of object data. In MPEG-4, as shown in FIG. 2, a scene in a moving image can be composed of a plurality of objects. Each object's data has its own data stream. In addition, there is scene description information, written in the BIFS format, that shows how the objects are arranged spatially and temporally, and object descriptor information that associates the data streams with the scene description information. These pieces of information are input to the scene synthesizing means as shown in FIG. 5 and synthesized into the final scene. FIG. 5 is a diagram showing that an MPEG-4 scene is synthesized from a scene description (BIFS), object descriptors, and object data.
[0050]
In MPEG-4, a description method for command data is defined as part of the BIFS data. This command data represents operations that insert, delete, or replace an object constituting a scene, or that replace the scene itself. This command data is the "BIFS Command", and scene updates can be controlled by including BIFS Command data in the content. The complete specification of BIFS, including the BIFS Command, is given in the ISO/IEC 14496-1 standard document issued by ISO, which should be consulted for details.
[0051]
In the present invention, a point at which BIFS Command data, or some similar command data, included in the input content data indicates that some change has been made to a scene is treated as a point at which a scene update occurs. This serves as the means for detecting scene updates.
[0052]
Next, embodiments of the basic principle of the present invention will be described with reference to examples.
[0053]
<Embodiment 1>
As Embodiment 1, a mode will be described in which the image processing apparatus of the present invention operates as a software program running on an Internet server machine that can use HTTP (Hyper Text Transfer Protocol) as its communication protocol. In a typical example of this mode, the communication control unit 13 and the reception data analysis unit 11 are realized by an HTTP server program such as Apache, and the other functional blocks are realized by a CGI (Common Gateway Interface) program or the like. The terminal device used by the viewer is assumed to be the portable information terminal device shown in FIG.
[0054]
In the first embodiment, data creation control data, which instructs the image processing apparatus how to generate data, is transmitted by HTTP from a terminal device as shown in FIG. FIG. 6 is a diagram showing the data format used when control data is transmitted by HTTP to the moving image index data generation device. As shown in the figure, the terminal device transmits, as data creation control data described in the HTTP header portion, a URL indicating the position of the moving image content data to be processed (/cgi/bin/mbindex.exe?movie.mp4), the data formats that the terminal device can process (Accept: text/plain, text/html, image/jpeg), a name identifying the type of terminal device or display program (MoviePlayer/1.0), and so on. Needless to say, the data creation control data may instead be transmitted in the data body rather than the HTTP header.
[0055]
From the transmitted data creation control data, the URL of the moving image content data (/cgi/bin/mbindex.exe?movie.mp4) is extracted by the communication control unit 13 and the reception data analysis unit 11 and passed to the CGI program that implements the image processing apparatus. In this embodiment, this is realized with functions normally provided by an HTTP server program, such as extracting the header portion from an HTTP request and passing it to a CGI program. Because the communication control unit and the data processing unit are separate in this embodiment, the URL specifies both the moving image content data to be processed (movie.mp4) and the CGI program in which the data processing unit is implemented (/cgi/bin/mbindex.exe). When the communication control unit and the data processing unit operate as the same program, only the file name of the content data need be specified.
[0056]
As a method for designating the content data, an identifier of any format associated with the content may be used in addition to the file name. However, the data used must be able to identify the content data to be processed in some way.
[0057]
The received data creation control data is passed to the control information analysis unit 7 via the control information input unit 6. The control information analysis unit 7 performs a process of analyzing the input control information and taking it into the internal storage. Hereinafter, the analysis process performed by the control information analysis unit 7 will be described step by step with reference to the flowchart of FIG.
[0058]
FIG. 7 is a flowchart showing the process executed in the control information analysis unit 7.
[0059]
(Step S71) The data creation control data passed from the control information input unit 6 is acquired.
[0060]
In the example of this embodiment, as shown in FIG. 6, the data creation control data described in the HTTP header portion is passed to the CGI program by the HTTP server program in the form of environment variables. In the example of FIG. 6, the data to the right of each ":" (such as "MoviePlayer/1.0") is assigned to the environment variable defined for the item to the left of the ":" (such as "User-Agent"). The portion of the URL to the right of "?" is assigned to the environment variable "QUERY_STRING". As a result, the following environment variables are available to the control information analysis unit 7 implemented as a CGI program:
"HTTP_ACCEPT=text/plain, text/html, image/jpeg"
"HTTP_USER_AGENT=MoviePlayer/1.0"
"QUERY_STRING=movie.mp4"
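The mapping from the HTTP request to these environment variables follows the usual CGI convention: header names are upper-cased, hyphens become underscores, and "HTTP_" is prefixed. A minimal sketch of that mapping is shown below. In practice the HTTP server program performs this step; the function `cgi_environment` is a hypothetical stand-in used only for illustration.

```python
def cgi_environment(request_target: str, headers: dict) -> dict:
    """Derive CGI-style environment variables from a request target and headers."""
    env = {}
    # The part of the URL to the right of "?" becomes QUERY_STRING.
    _, _, query = request_target.partition("?")
    env["QUERY_STRING"] = query
    # Each header "Name: value" becomes HTTP_NAME=value.
    for name, value in headers.items():
        env["HTTP_" + name.upper().replace("-", "_")] = value
    return env


env = cgi_environment(
    "/cgi/bin/mbindex.exe?movie.mp4",
    {
        "Accept": "text/plain, text/html, image/jpeg",
        "User-Agent": "MoviePlayer/1.0",
    },
)
```

A CGI program written in Python would normally read the same values from `os.environ` instead of building them itself.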
[0061]
(Step S72) The control item and the data indicated by the item are stored in the internal storage.
[0062]
Here, in the example of this embodiment, the environment variables HTTP_ACCEPT, HTTP_USER_AGENT, and QUERY_STRING are used as control items, and the data indicated by these environment variables is stored in the internal storage in a format that can be used in the subsequent processing.
[0063]
In the example of the present embodiment, it is assumed that the environment variable names usable as control items and their data are stored as character strings in an associative-array data structure. However, the data may be stored in the internal storage in any form that can be used in the subsequent processing. For example, the data part of HTTP_ACCEPT is a single character string in which the processable data formats are joined by commas. When it is stored, an arbitrary format conversion may be applied, such as extracting each data format listed in the string and storing the formats as a list.
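A sketch of step S72 under these assumptions might look as follows. The function `store_control_items` and the extra key `ACCEPT_LIST` are hypothetical; the dictionary plays the role of the associative array, and the list is the optional converted form of HTTP_ACCEPT.

```python
# Environment variables treated as control items in this embodiment.
CONTROL_ITEMS = ("HTTP_ACCEPT", "HTTP_USER_AGENT", "QUERY_STRING")


def store_control_items(environ) -> dict:
    """Copy the control items into an associative array kept in internal storage."""
    store = {name: environ.get(name, "") for name in CONTROL_ITEMS}
    # Optional conversion: split the comma-joined formats into a list.
    # "ACCEPT_LIST" is a hypothetical key, not a name used in the text.
    store["ACCEPT_LIST"] = [
        item.strip() for item in store["HTTP_ACCEPT"].split(",") if item.strip()
    ]
    return store


control = store_control_items({
    "HTTP_ACCEPT": "text/plain, text/html, image/jpeg",
    "HTTP_USER_AGENT": "MoviePlayer/1.0",
    "QUERY_STRING": "movie.mp4",
})
```

In a real CGI program, `os.environ` could be passed as the `environ` argument.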
[0064]
(Step S73) When the analysis processing is performed on all the input data creation control data, the position information of the content data is notified to the moving image content input unit 1.
[0065]
In the example of the present embodiment, the value “movie.mp4” of the environment variable QUERY_STRING corresponds to the position information of the content data, and thus “movie.mp4” is passed to the moving image content input unit 1. In some cases, it may be desirable that the position information of the content data is passed in the form of an absolute path. In this case, the process of converting the position information from the relative URL format as described above to the absolute path format is also required. It is assumed that the process is performed before the content is transferred to the moving image content input unit 1.
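Where an absolute path is needed, the conversion could be sketched as below. The content root "/var/content" and the function `to_absolute_path` are assumptions of this sketch, and the rejection of absolute or parent-directory names is an added precaution that the text does not describe.

```python
from pathlib import PurePosixPath


def to_absolute_path(content_root: str, relative_name: str) -> str:
    """Resolve a relative content position against a (hypothetical) content root."""
    relative = PurePosixPath(relative_name)
    # Precaution added for this sketch: refuse names that escape the root.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"unexpected content position: {relative_name!r}")
    return str(PurePosixPath(content_root) / relative)


absolute = to_absolute_path("/var/content", "movie.mp4")
```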
[0066]
However, when the position information is expanded as environment variables and can be referred to from the moving image content input unit 1 as in the present embodiment, this processing is not necessarily performed.
[0067]
(Step S74) The control information held in the internal storage by the process of step S72 is passed to the data creation control unit 8. The passed control information is managed by the data creation control unit 8 as it is.
[0068]
In this embodiment, it is assumed that a pointer to the data structure described in step S72 is passed.
[0069]
Through the above procedure, the data creation control data is analyzed, and the position information of the content data is passed to the moving image content input unit 1.
[0070]
Meanwhile, the moving image content data is acquired by the moving image content input unit 1 based on the position passed from the control information analysis unit 7, demultiplexed by the content demultiplexing unit 2, analyzed as encoded data by the encoded data analysis unit 3, and then passed to the scene update detection unit 4. The scene update detection unit 4 searches the input encoded data for information indicating a change in the scene, and each scene in which a change is detected is treated in the subsequent processing as a target from which a still image is extracted. Hereinafter, the detection processing performed by the scene update detection unit 4 will be described step by step with reference to the flowchart of FIG. It is assumed that the data processed by the scene update detection unit 4 has already been separated by the content demultiplexing unit 2 into data strings for each data type, such as moving image, still image, and scene description. It is also assumed that each separated data string carries embedded time information, such as the DTS (Decoding Time Stamp) and CTS (Composition Time Stamp) defined in the MPEG-4 standard, indicating the time at which a given block in the data string should be processed.
[0071]
FIG. 8 is a flowchart showing the processing executed in the scene update detection unit 4.
[0072]
(Step S81) First, the type of the input encoded data is confirmed.
[0073]
In the present embodiment, the type of the input encoded data is determined based on the object descriptor information described above. The object descriptor information describes the configuration information of the data stream (Elementary Stream) in which the content of the object included in the content is described, the type of the encoding format, and the like. For example, an Initial Object Descriptor, which is a type of object descriptor, collectively describes information such as an ES_Descriptor that describes elementary stream characteristic information included in content and a DecoderConfigDescriptor that describes encoding-type characteristic information. According to the ISO definition, the type of encoded data is specified by the streamType item of the above-mentioned DecoderConfigDescriptor, and can be identified by this value. The encoded data analysis unit 3 needs to analyze the encoded data in advance and make it available in each processing of the scene update detection unit. In the present embodiment, it is assumed that the encoded data analysis unit 3 analyzes the object descriptor information including the type of encoded data and the like and the stream data itself.
[0074]
Note that the encoding according to MPEG-4, including the object descriptor information, is completely defined in the ISO / IEC 14496-1 standard document, and therefore detailed description is omitted in this specification.
[0075]
(Step S82) It is checked whether the type of the input coded data indicates BIFS. In the case of BIFS, the process of step S83 is executed, otherwise, the process skips to step S85.
[0076]
In the case of this embodiment, the type of encoded data is specified by the streamType item of the above-mentioned DecoderConfigDescriptor, and in the case of BIFS, the value of streamType is 0x03. Therefore, it is checked here whether or not streamType = 0x03.
[0077]
(Step S83) It is checked whether the input BIFS data is a BIFS Command. In the case of BIFS Command, the process of step S84 is executed, otherwise, the process skips to the process of step S85.
[0078]
In the case of this embodiment, since it is determined that some change has occurred in the scene by detecting the BIFS Command, it is necessary to check whether the BIFS data includes the BIFS Command. Since the BIFS Command is usually described as attribute data of a type of object called “Conditional”, here, it is checked whether the BIFS data includes a Conditional object and BIFS Command data described as its attribute.
[0079]
(Step S84) When the update of the scene is detected, the update of the scene is notified by passing the update content and the time at which the update occurs to the data creation control unit 8. The data creation control unit 8 holds these scene update information in an internal storage and manages them so as to be used for controlling the subsequent data generation processing.
[0080]
In the case of this embodiment, the content of the update can be determined by the value of the attribute code of the BIFS Command indicating the type of the command (addition / deletion / replacement / scene replacement). The code defines four types of values: 0 (insert object), 1 (delete object), 2 (replace object), and 3 (replace scene). As the time at which the update occurs, a CTS (Composition Time Stamp) assigned to the BIFS Command is used. Therefore, here, the code value of the BIFS Command and the value of the CTS are acquired, and are passed to the data creation control unit 8 as the update contents and the update occurrence time.
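The four code values and the update record passed to the data creation control unit 8 can be written down as a small sketch. The enum member names and the `SceneUpdate` structure are hypothetical labels for this sketch; only the numeric values and their meanings come from the paragraph above.

```python
from dataclasses import dataclass
from enum import IntEnum


class BifsCommandCode(IntEnum):
    """Command types, using the four code values listed in the text."""
    INSERT_OBJECT = 0
    DELETE_OBJECT = 1
    REPLACE_OBJECT = 2
    REPLACE_SCENE = 3


@dataclass(frozen=True)
class SceneUpdate:
    """Hypothetical record of one scene update: its content and occurrence time."""
    content: BifsCommandCode
    cts: int  # Composition Time Stamp assigned to the BIFS Command


def make_scene_update(code: int, cts: int) -> SceneUpdate:
    """Build the record from a raw code value; unknown codes raise ValueError."""
    return SceneUpdate(BifsCommandCode(code), cts)


update = make_scene_update(3, 9000)
```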
[0081]
(Step S85) It is checked whether or not all the input data has been processed. If unprocessed input data remains, the processing of steps S81 to S84 is performed on the remaining data.
[0082]
According to the above procedure, information indicating a scene change is detected from the encoded data input to the scene update detection unit 4.
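
The detection loop of steps S81 to S85 can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the `AccessUnit` record, its field names, and the `detect_scene_updates` function are hypothetical, and parsing of the actual BIFS bitstream (including locating a BIFS Command inside a Conditional object) is assumed to have been done upstream. Only the streamType value 0x03 and the command codes 0 to 3 are taken from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

BIFS_STREAM_TYPE = 0x03  # streamType value for BIFS given in the description

# Command codes listed in the description.
COMMAND_NAMES = {
    0: "insert object",
    1: "delete object",
    2: "replace object",
    3: "replace scene",
}


@dataclass
class AccessUnit:
    """Hypothetical, already-parsed unit of coded data."""
    stream_type: int                     # streamType from DecoderConfigDescriptor
    cts: int                             # Composition Time Stamp
    command_code: Optional[int] = None   # None if the unit holds no BIFS Command


def detect_scene_updates(units: Iterable[AccessUnit]) -> List[Tuple[int, str]]:
    """Return (CTS, update content) pairs, mirroring steps S81 to S85."""
    updates = []
    for unit in units:                              # S81 / S85: loop over all input
        if unit.stream_type != BIFS_STREAM_TYPE:    # S82: not BIFS, skip
            continue
        if unit.command_code is None:               # S83: no BIFS Command, skip
            continue
        # S84: report the update content and the time at which it occurs.
        content = COMMAND_NAMES.get(unit.command_code, "unknown")
        updates.append((unit.cts, content))
    return updates
```

In this sketch the returned list plays the role of the notification to the data creation control unit 8; a real implementation would more likely deliver each update as it is found.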
[0083]
By the procedure shown in FIG. 9, the scene data creation unit 5 creates scene still image data recorded in a visible image format, based on the scene information passed from the scene update detection unit 4. Hereinafter, the processing of the scene data creation unit 5 will be described step by step with reference to the flowchart of FIG. 9.
[0084]
FIG. 9 is a flowchart showing the processing executed in the scene data creation unit 5.
[0085]
(Step S91) A time stamp corresponding to the update time of the input scene is acquired from the data creation control unit 8.
[0086]
(Step S92) Information of a data format that can be processed by the terminal device used by the viewer is acquired from the data creation control unit 8.
[0087]
At this time, since the created still image data needs to be output in a format that can be displayed on the terminal device of the viewer, the data creation control unit 8 is first queried for the data formats accepted by the terminal device. In the present embodiment, as shown in the description of FIG. 7, the processable data formats are obtained as the environment variable "HTTP_ACCEPT=text/plain, text/html, image/jpeg" and are then passed to and managed by the data creation control unit 8. Therefore, the data corresponding to HTTP_ACCEPT can be obtained from the data creation control unit 8.
[0088]
(Step S93) The input scene information is encoded in the still image format of the processable data format acquired in step S92, and still image data of the scene is created.
[0089]
In the present embodiment, the data acquired in step S92 includes "image/jpeg" in HTTP_ACCEPT, indicating that still image data in JPEG format can be received, so the scene still image is output as a JPEG format file. Of course, when this data is "image/gif", dynamic control such as outputting in GIF format is necessary.
[0090]
It is assumed that the image file output here is created with the file names “movie-0001.jpg” to “movie-0005.jpg” as shown in FIG.
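
The format selection of steps S92 and S93 might look like the following sketch. It is an illustration under stated assumptions rather than the embodiment's code: the function names, the MIME-type-to-extension table, and the handling of `;q=` parameters are all hypothetical. Only the HTTP_ACCEPT value and the "movie-0001.jpg" naming pattern come from the description.

```python
from typing import List, Optional

# Still image formats this sketch can name. The table is an assumption
# made for illustration, not a list taken from the embodiment.
STILL_IMAGE_EXTENSIONS = {"image/jpeg": "jpg", "image/gif": "gif"}


def parse_accept(http_accept: str) -> List[str]:
    """Split an HTTP_ACCEPT value into MIME types, dropping parameters such as ;q=0.8."""
    types = []
    for item in http_accept.split(","):
        mime = item.split(";")[0].strip().lower()
        if mime:
            types.append(mime)
    return types


def choose_still_format(http_accept: str) -> Optional[str]:
    """Return the first accepted MIME type usable for still images, or None."""
    for mime in parse_accept(http_accept):
        if mime in STILL_IMAGE_EXTENSIONS:
            return mime
    return None


def scene_file_name(index: int, mime: str, base: str = "movie") -> str:
    """Build a name such as movie-0001.jpg for the index-th scene (1-based)."""
    return f"{base}-{index:04d}.{STILL_IMAGE_EXTENSIONS[mime]}"
```

With the value from this embodiment, `choose_still_format("text/plain, text/html, image/jpeg")` selects `image/jpeg`. The actual encoding of the scene into that format is outside the scope of this sketch.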
[0091]
However, as shown in FIG. 5, when the scene is composed of a plurality of objects, the display position of the objects constituting the scene and other data indicating the display attributes are extracted before the still image data is created. Then, after arranging the object in the virtual space based on the information, it is necessary to output the arrangement image as still image data.
[0092]
Note that this combining process is essential when handling a plurality of objects in MPEG-4, but since it is not directly related to the object of the present invention, a detailed description of a method of performing the combining process is omitted.
[0093]
(Step S94) The position of the still image data output in step S93 is notified to the data creation control unit 8 in association with the time at which the scene update has occurred. The still image data generated here may be output to a still image data file or may be output to a memory block. Therefore, as the position of the data notified to the data creation control unit 8, a file path is used for a still image data file, and an address value is used for a memory block. In the example shown in this embodiment, the output form is a still image file in JPEG format or the like, and the data creation control unit 8 is notified of the still image data file name.
[0094]
The data creation control unit 8 holds the position of the still image data passed from the scene data creation unit 5 in association with the time stamp also passed. Further, the data creation control unit 8 passes the position of the received still image data to the index data creation unit 9, and instructs execution of index data creation processing described later.
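
The bookkeeping described for step S94 can be pictured as a small table keyed by time stamp. The class below is a hypothetical stand-in written for illustration; the name `SceneRegistry` and its methods do not appear in the embodiment, and the position is kept as a plain string so that it can hold either a file path or a textual address value.

```python
from typing import Dict, List, Tuple


class SceneRegistry:
    """Hypothetical stand-in for the bookkeeping of the data creation control unit 8."""

    def __init__(self) -> None:
        # Maps the CTS of a scene update to the position of its still image.
        self._positions: Dict[int, str] = {}

    def register(self, cts: int, position: str) -> None:
        """Step S94: associate a still image position with its update time."""
        self._positions[cts] = position

    def in_order(self) -> List[Tuple[int, str]]:
        """Return (CTS, position) pairs sorted by update time."""
        return sorted(self._positions.items())
```

Sorting by time stamp gives the appearance order of the scenes, which is the order an index would normally present them in.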
[0095]
After the series of steps S91 to S94 is processed, the processing shown in FIGS. 8 to 9 is repeatedly performed on the next input data to be input to the moving image content input unit 1.
[0096]
Through the above procedure, scene still image data recorded in a visible image format is created from the scene information passed from the scene update detection unit 4. The scene still image data generated by the processing so far is finally referenced from the index data generated by the index data creation unit 9. Hereinafter, the processing performed by the index data creation unit 9 will be described step by step with reference to FIG. 10.
[0097]
FIG. 10 is a flowchart showing a process executed in the index data creation unit.
[0098]
(Step S101) The position of the still image data is received from the data creation control unit 8.
[0099]
(Step S102) The data creation control unit 8 is queried for the data formats that can be processed by the terminal device, and they are acquired.
[0100]
In this embodiment, as shown in the description of FIG. 7, the processable data formats are obtained as the environment variable "HTTP_ACCEPT=text/plain, text/html, image/jpeg" and are then passed to and managed by the data creation control unit 8. Therefore, here, the data corresponding to HTTP_ACCEPT is acquired from the data creation control unit 8.
(Step S103) The position of the still image data is recorded in index data of an appropriate format based on the format acquired in step S102, and the index data is output.
[0101]
In the case of the present embodiment, since the data acquired in step S102 includes "text/html", indicating that data in HTML format can be received, the index data creation unit 9 outputs the index data as an HTML format file named "movie.html" shown in FIG. In this embodiment, since the file name is used as the position of the still image data, IMG tags specifying the file names of the still images to be referenced are output as the index data.
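
As a rough illustration of index data made of IMG tags, the following sketch builds a small HTML page from a list of still image file names. The function name `build_index_html` and the page skeleton are assumptions made for this example; the embodiment only states that IMG tags naming the still images are output to "movie.html".

```python
from html import escape
from typing import Iterable


def build_index_html(image_names: Iterable[str], title: str = "movie") -> str:
    """Return a small HTML page with one IMG tag per scene still image."""
    lines = ["<html>", f"<head><title>{escape(title)}</title></head>", "<body>"]
    for name in image_names:
        # The file name is used as the position of the still image data.
        lines.append(f'<img src="{escape(name, quote=True)}">')
    lines.extend(["</body>", "</html>"])
    return "\n".join(lines)
```

Passing the names in appearance order (for example "movie-0001.jpg" through "movie-0005.jpg") yields an index whose images appear in scene order.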
[0102]
Note that the index data created here does not necessarily have to be created in a data format that can be reproduced on the terminal device of the viewer. In this example, since the terminal device can process HTML, the index data is output in HTML format. However, a terminal device that can process only the JPEG format, for example, cannot process index data in HTML format.
[0103]
In such a case, the index data is created as data in an arbitrary format in which the order and arrangement information of the still images are recorded, and the subsequent data combining unit 10 must combine the information of the index data with the scene still image data and output the result in a format that can be processed by the terminal device. That is, for a device that can process only the JPEG format, as mentioned here, the data combining unit 10 needs to perform processing such as generating a single JPEG image in which a plurality of scene still images are laid out.
[0104]
In such a case, the index data creation unit 9 may generate the index data in any format.
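
For the JPEG-only case, the arrangement information could be as simple as a grid of paste positions. The sketch below computes only the canvas size and the top-left offset of each tile; the function name, the fixed tile size, and the three-column default are assumptions made for illustration, and the actual decoding, pasting, and JPEG re-encoding would require an imaging library that is not shown here.

```python
import math
from typing import List, Tuple


def contact_sheet_layout(count: int, tile_w: int, tile_h: int,
                         columns: int = 3) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Return (canvas size, top-left offsets) for `count` equally sized tiles.

    Tiles are placed left to right, top to bottom, in appearance order.
    """
    if count <= 0 or columns <= 0:
        raise ValueError("count and columns must be positive")
    cols = min(columns, count)
    rows = math.ceil(count / cols)
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(count)]
    return (cols * tile_w, rows * tile_h), offsets
```

For five scene images of 160 by 120 pixels, this gives a 480 by 240 canvas with three tiles in the first row and two in the second.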
[0105]
(Step S104) The position of the index data output in step S103 is notified to the data creation control unit 8.
[0106]
In the case of the present embodiment, the file name “movie.html” of the index data output in step S103 is passed to the data creation control unit 8.
[0107]
Alternatively, the index data creation unit 9 may build the index data in internal storage each time the position of still image data is notified and, when the processing of all scenes is completed, convert the index data into its final format and output it. Therefore, the position of the index data notified to the data creation control unit 8 is the HTML file path in the case of the present embodiment, but an address value in the internal storage may be used instead.
[0108]
As described above, the index data for the scene included in the moving image content is created by the processing of FIGS.
[0109]
When the processing of FIGS. 7 to 10 is completed for all scenes, the data creation control unit 8 calls the data combining unit 10 and passes it the positions of the index data and still image data registered so far. In the present embodiment, the data combining unit 10 checks whether the references to the JPEG files described in the output HTML file are correct and, where a reference is written as a file path used in the local environment, performs processing such as converting it to a URL that can be correctly referenced.
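
One way to picture this reference conversion is the sketch below, which rewrites local file paths in IMG `src` attributes to URLs under a base URL. The function name `localize_to_url`, the regular expression, and the example base URL are assumptions made for illustration; a production system would more likely use a real HTML parser than a regular expression.

```python
import re
from urllib.parse import quote, urljoin

# Matches src="..." inside IMG tags; a simple pattern that suffices for this sketch.
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]*)(")', re.IGNORECASE)


def localize_to_url(html: str, base_url: str) -> str:
    """Rewrite local file paths in IMG src attributes to URLs under base_url."""

    def _replace(match: re.Match) -> str:
        src = match.group(2)
        if re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", src):
            return match.group(0)  # already a URL, leave unchanged
        # Keep only the file name of a local path such as
        # /tmp/out/movie-0001.jpg or C:\out\movie-0001.jpg.
        file_name = re.split(r"[\\/]", src)[-1]
        return match.group(1) + urljoin(base_url, quote(file_name)) + match.group(3)

    return _IMG_SRC.sub(_replace, html)
```

For instance, with the hypothetical base URL `http://example.com/movies/`, a reference to `/tmp/out/movie-0001.jpg` becomes `http://example.com/movies/movie-0001.jpg`, while references that are already URLs are left untouched.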
[0110]
Further, when it is necessary to present the index data and the scene data to the terminal device as a single packaged data, the data combining process is performed here.
[0111]
For example, as noted in the description of the flowchart of FIG. 10, for devices that can process only the JPEG format, or devices such as the game machine (4) and the printer (5) of FIG., the index data and the scene data created by the preceding processing are integrated and output in a format that can be processed by the terminal device. When the index data and the scene data must be converted into another encoding format during the data combining process, the data combining unit 10 must perform the necessary encoding format conversion and re-encoding.
[0112]
The data processed by the data combining unit 10 is finally shaped as HTTP response data by the transmission data shaping unit 12 and transmitted to the viewer's terminal device via the communication control unit 13.
[0113]
Because the video index data generated by such means is guaranteed to be in a format supported by the terminal device used by the viewer, the viewer can reliably learn the contents of the video information.
[0114]
In the present embodiment, the index data is generated together with the scene data. However, only the scene data may be generated, and the generated scene data may be sequentially transferred to the terminal device and displayed. When only the scene data is generated, the generated scene data can be viewed before generating all the scene data, and the contents can be transmitted to the viewer more quickly. Also, when the scene data and the index data are generated, the viewer can see all the scene data at a time, which is suitable, for example, when the terminal device is a printer or the like.
[0115]
Further, in the present embodiment, since the command data (BIFS command) for the scene operation is used for the scene update detection, the scene update can be easily detected.
[0116]
Further, in this embodiment, by combining the index data and the scene data and putting them together, the trouble of transmission and reception through the network is reduced, and the data management is also facilitated.
[0117]
<Other embodiments>
Although the above embodiment includes hardware and the like that make up a network, each processing unit can in practice be realized by software. That is, it goes without saying that the object of the present invention is also achieved by supplying a storage medium (or recording medium), on which software program code realizing the functions of the above-described embodiment is recorded, to a system or an apparatus, and having a computer (or CPU or MPU) of the system or apparatus read out and execute the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention.
[0118]
It also goes without saying that the present invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the read program code, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.
[0119]
Further, it goes without saying that the present invention also covers the case where, after the program code read from the storage medium is written into a memory provided in a function expansion card inserted into the computer or in a function expansion unit connected to the computer, a CPU or the like provided in the function expansion card or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.
[0120]
In the above embodiment, the data format is set by inputting the data creation control information, but it goes without saying that the data format may be selected from several data formats.
[0121]
[Effects of the Invention]
As described above, according to the present invention, for information provided as moving image content such as MPEG-4, scene still image data whose content is easy to grasp can be distributed widely via a network such as the Internet and provided in a format that can be viewed on the terminal device. As a result, the information can be conveyed effectively and quickly to a larger number of viewers.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overall configuration of the present invention.
FIG. 2 is a diagram showing a concept of synthesizing a scene from a plurality of object data.
FIG. 3 is a diagram showing an example of scene data and index data.
FIG. 4 is a diagram illustrating an example of a terminal device to which moving image index data is presented and a data format when presented to each terminal.
FIG. 5 is a diagram showing that an MPEG-4 scene is synthesized by a scene description (BIFS), an object descriptor, and object data.
FIG. 6 is a diagram illustrating a data format when control data is transmitted to the image processing apparatus by HTTP.
FIG. 7 is a flowchart showing a process executed in a control information analysis unit 7.
FIG. 8 is a flowchart showing a process executed in a scene update detection unit 4.
FIG. 9 is a flowchart showing a process executed in the scene data creation unit 5.
FIG. 10 is a flowchart showing processing executed in the index data creation unit 9.

Claims

An image processing apparatus comprising:
moving image data input means for inputting moving image data;
data format setting means for setting a data format that an output destination device can understand;
scene update detection means for detecting a scene update of the moving image data; and
scene data creating means for creating, in the data format, scene data of the scene detected by the scene update detection means.

The image processing apparatus according to claim 1, wherein the output destination device is an external device.

The image processing apparatus according to claim 2, further comprising:
communication control means for controlling data transmission to and from the external device;
reception data analysis means for analyzing reception data input from the communication control means; and
transmission data shaping means for shaping data into a transmission format compatible with the communication control means,
wherein the reception data analysis means extracts a data format from the reception data received from the external device by the communication control means, and gives the data format to the data format setting means.

4. The apparatus according to claim 2, further comprising an index data creating unit for creating index data indicating a position and an appearance order of the scene data created by the scene data creating unit in the data format. An image processing apparatus according to claim 1.

The image processing apparatus according to claim 4, wherein the communication control unit transmits the index data created by the index data creation unit to the external device.

5. The image processing apparatus according to claim 4, further comprising a data combining unit that combines and outputs the scene data created by the scene data creating unit and the index data created by the index data creating unit. 6. The image processing device according to 5.

The image processing apparatus according to any one of claims 1 to 6, wherein the moving image data input by the moving image data input unit includes scene description data describing a scene, and the scene update detection unit determines a scene update based on the scene description data.

The moving image data input by the moving image data input unit is data in a data format different from a data format created by the scene data creating unit. The image processing apparatus according to claim 1.

An image processing method comprising:
a moving image data input step of inputting moving image data;
a data format setting step of setting a data format that an output destination device can understand;
a scene update detecting step of detecting a scene update of the moving image data input in the moving image data input step; and
a scene data creating step of creating scene data of the scene detected in the scene update detecting step in the data format set in the data format setting step.

The image processing method according to claim 9, wherein the output destination device is an external device.

11. An index data creation step of creating, in the data format, index data indicating a position and an appearance order of scene data created in the scene data creation step. The image processing method according to 1.

The image processing method according to any one of claims 9 to 11, wherein the moving image data input in the moving image data input step includes scene description data describing a scene, and the scene update detection step determines a scene update based on the scene description data.

The moving image data input in the moving image data input step is data in a data format different from a data format created in the scene data creating step. The image processing apparatus according to claim 1.

A computer program for executing the image processing method according to claim 9.

A computer-readable storage medium storing the computer program according to claim 14.