JP2004159107A - Method for index generation, program, and storage medium with program stored therein

Info

Publication number: JP2004159107A
Application number: JP2002323091A
Authority: JP
Inventors: Taku Nishio; 西尾　　卓; Yukinori Minamida; 幸紀南田; Hisaya Kotani; 尚也小谷; Yukinobu Taniguchi; 行信谷口; Tadashi Nakanishi; 正仲西
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-11-06
Filing date: 2002-11-06
Publication date: 2004-06-03
Anticipated expiration: 2022-11-06
Also published as: JP3891097B2

Abstract

PROBLEM TO BE SOLVED: To generate, with a small workload after a video is completed, an index that temporally associates a scenario with the edited video, so that the scenario document written at the planning stage can be used effectively in association with the edited video.

SOLUTION: The actual time length of the whole video is acquired from the edited video; a predicted time length for each scene and for the whole video is calculated from the scenario; the ratio between the actual time length of the whole edited video and the predicted time length of the whole video calculated from the scenario is obtained; the predicted time length of each scene calculated from the scenario is corrected using the ratio; and an index describing a predicted start time, a predicted end time, and a predicted time length for each scene is generated.

COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、インデックス生成方法及びプログラム及びインデックス生成プログラムを格納した記憶媒体に係り、特に、編集済みの映像とシナリオを対応付けることによって映像のインデックスを生成するためのインデックス生成方法及びプログラム及びインデックス生成プログラムを格納した記憶媒体に関する。
【０００２】
【従来の技術】
映像を制作するワークフローを考えると、通常まずどのようなシーンを組み合わせて映像を構成するかというシナリオを作り、そのシナリオに基づいて撮影、編集等を行うことによって最終的な映像（編集済み映像）を作るというフローになる。
【０００３】
ここで、シナリオとは、映像の企画構成をシーン毎に記述した文書で、例えば、シーンの時間長、タイトル、スケッチ、ナレーション、テロップ、ＢＧＭ、構図、カメラワーク、出演者、撮影場所等の情報が記述されているものである。これらすべての情報が記述されている必要はなく、また、記述するフォーマットも特に決まっているわけではない。手書きのものもあれば、ワードプロセッサや専用ソフトウェアで作成された電子ファイルや、それらを印刷したものもある。
【０００４】
一般に編集済み映像は、元のシナリオとの時間的な対応付けが曖昧である。シナリオに時間に関する記載がない場合もあれば、シーン毎の時間長が記載されている場合もあるが、編集過程で変更されることもあるため、シナリオの時間長と編集済み映像の時間長が必ずしも一致するとは限らない。このような現状では、シナリオの情報と編集済み映像の時間的な対応が曖昧なため、シナリオ記載のメタデータが映像のどの部分に対応するかわからず、シナリオに含まれている情報をシーン毎にメタデータとして付与することが難しい。
【０００５】
これを解決する手段の一つとして、映像制作時に構造中の位置を示す識別子を与えることによってシナリオと編集済み映像を時間的に対応付ける方法がある（例えば、特許文献１参照）。この方法によればシナリオと編集済み映像を時間的に対応付けることが可能となり、シナリオ情報をメタデータとして利用することが可能となる。
【０００６】
別のアプローチとして、編集済み映像からテロップ認識（例えば、特許文献２参照）や音声認識（例えば、特許文献３参照）等によって情報を抽出し、メタデータとして付与するという方法もある。
【０００７】
【特許文献１】
特開２０００−９２４１９「番組情報管理編集システムとそれに使用される階層化番組情報蓄積管理装置」
【特許文献２】
特開２００２−２７９４３３「映像中の文字検索方法及び装置」
【特許文献３】
特開２００２−１７５３０４「映像検索装置及びその方法」
【０００８】
【発明が解決しようとする課題】
しかしながら、上記の「番組情報管理編集システムとそれに使用される階層化番組情報蓄積管理装置」を用いて編集済み映像とシナリオを対応付けるには、映像制作時にカメラで一つのショットを撮影する度に識別子を与える必要があるので、映像完成後に対応付けを行う必要が生じたとしても、撮影段階で識別子を付与していなければシナリオ情報を関連付けることができないという問題がある。また、企画、撮影、編集のすべての段階において、識別子を与えたり記録したりするための機材が必要となり導入コストが高いという問題がある。
【０００９】
勿論、編集済み映像とシナリオを人手で対応付けることも考えられるが、映像に含まれる膨大なシーンすべてに手作業で対応付けを行うことは時間的コストが高いという問題がある。
【００１０】
さらに、編集済み映像からテロップ認識や音声認識により情報を抽出してメタデータとして付与する方法では、抽出可能な情報はシナリオに含まれる情報の一部でしかなく、認識精度も１００％とはいえない。
【００１１】
本発明は、上記の点に鑑みなされたもので、企画時のシナリオ文書を編集済み映像と対応付けて有効活用するため、映像が完成した後に少ない作業量でシナリオと編集済み映像を時間的に対応付けるためのインデックス生成方法及びプログラム及びインデックス生成プログラムを格納した記憶媒体を提供することを目的とする。
【００１２】
ここで、インデックスとは、シナリオ記述のシーン毎に、予測開始時間、予測終了時間、予測時間長、シーンのタイトル等のメタデータを記述したものである。
【００１３】
【課題を解決するための手段】
図１は、本発明の原理を説明するための図である。
【００１４】
本発明は、編集済み映像とシナリオ文書を元に映像とシナリオを対応付ける映像に対するインデックスを生成するためのインデックス生成方法において、
編集済み映像から映像全体の実時間長を求める実時間長計測過程（ステップ１）と、
シナリオからシーン毎の予測時間長と映像全体の予測時間長を算出する予測時間算出過程（ステップ２）と、
編集済み映像全体の実時間長とシナリオから算出した映像全体の予測時間長の比を求める時間長比較過程（ステップ３）と、
比を用いてシナリオから算出したシーン毎の予測時間長を修正する予測時間長修正過程（ステップ４）と、
シーン毎に予測開始時間と予測終了時間と予測時間長を記述したインデックスを生成するインデックス生成過程（ステップ５）とを行う。
【００１５】
また、本発明の予測時間算出過程において、
シナリオに記載されているシーン毎のナレーションやコメント、台詞の全部または、一部の文字数を数え、
シーン毎の文字数の比を求め、
比を用いてシーン毎の予測時間長を算出する過程を更に行う。
【００１６】
また、本発明は、インデックス生成過程終了後に、
編集済み映像からカット点と該カット点の時間を検出し、
修正候補となる検出されたカット点の時間と予測時間算出過程において算出されたシーン毎の予測開始時間を比較して該カット点のいずれかの時間を修正後の予測開始時間とし、
シナリオから算出されたシーン毎の予測開始時間と予測時間長を修正した時間情報を記述したインデックスを生成する過程を更に行う。
【００１７】
また、本発明は、インデックス生成過程終了後に、
シナリオからシーン毎にメタデータを抽出し、
抽出したメタデータをシーン毎に記述したインデックスを生成する過程を更に行う。
【００１８】
本発明は、編集済み映像とシナリオ文書を元に映像とシナリオを対応付ける映像に対するインデックスを生成するためのインデックス生成プログラムであって、
編集済み映像から映像全体の実時間長を求める実時間長計測ステップと、
シナリオからシーン毎の予測時間長と映像全体の予測時間長を算出し、該シナリオに時間に関する記述がない場合には、該シナリオに記載されているシーン毎のナレーションやコメント、台詞の全部または、一部の文字数を数え、該シーン毎の文字数の比を求める予測時間算出ステップと、
比を用いてシーン毎の予測時間長を算出する予測時間算出ステップと、
編集済み映像全体の実時間長とシナリオから算出した映像全体の予測時間長の比を求める時間長比較ステップと、
比を用いてシナリオから算出したシーン毎の予測時間長を修正する予測時間長修正ステップと、
シーン毎に予測開始時間と予測終了時間と予測時間長を記述したインデックスを生成するインデックス生成ステップと、
編集済み映像からカット点と該カット点の時間を検出するカット点検出ステップと、
修正候補となる検出されたカット点の時間と予測時間算出ステップにおいて算出されたシーン毎の予測開始時間を比較して該カット点のいずれかの時間を修正後の予測開始時間とする修正予測開始時間設定ステップと、
シナリオから算出されたシーン毎の予測開始時間と予測時間長を修正した時間情報を記述したインデックスを生成する修正インデックス生成ステップと、
シナリオからシーン毎にメタデータを抽出するメタデータ抽出ステップと、
抽出したメタデータをシーン毎に記述したインデックスを生成するメタデータ付与インデックス生成ステップと、を実行する。
【００１９】
本発明は、編集済み映像とシナリオ文書を元に映像とシナリオを対応付ける映像に対するインデックスを生成するためのインデックス生成プログラムを格納した記憶媒体であって、
編集済み映像から映像全体の実時間長を求める実時間長計測ステップと、
シナリオからシーン毎の予測時間長と映像全体の予測時間長を算出し、該シナリオに時間に関する記述がない場合には、該シナリオに記載されているシーン毎のナレーションやコメント、台詞の全部または、一部の文字数を数え、該シーン毎の文字数の比を求める予測時間算出ステップと、
比を用いてシーン毎の予測時間長を算出する予測時間算出ステップと、
編集済み映像全体の実時間長とシナリオから算出した映像全体の予測時間長の比を求める時間長比較ステップと、
比を用いてシナリオから算出したシーン毎の予測時間長を修正する予測時間長修正ステップと、
シーン毎に予測開始時間と予測終了時間と予測時間長を記述したインデックスを生成するインデックス生成ステップと、
編集済み映像からカット点と該カット点の時間を検出するカット点検出ステップと、
修正候補となる検出されたカット点の時間と予測時間算出ステップにおいて算出されたシーン毎の予測開始時間を比較して該カット点のいずれかの時間を修正後の予測開始時間とする修正予測開始時間設定ステップと、
シナリオから算出されたシーン毎の予測開始時間と予測時間長を修正した時間情報を記述したインデックスを生成する修正インデックス生成ステップと、
シナリオからシーン毎にメタデータを抽出するメタデータ抽出ステップと、
抽出したメタデータをシーン毎に記述したインデックスを生成するメタデータ付与インデックス生成ステップと、からなるプログラムを格納する。
【００２０】
上記のように、本発明では、オペレータが編集済み映像を再生し、映像の全てに目を通してシーン毎の開始点にラベル付けを行う際に、本発明で生成されるインデックスにより開始点が存在する区間を絞り込むことが可能となるため、映像すべてに目を通す必要がなくなり、作業を省力化することができる。
【００２１】
また、本発明では、シナリオシーンの予定開始時間が記載されていない場合にもインデックスを生成することが可能となる。
【００２２】
さらに、インデックス生成の際に編集済みのカット点を用いることで、インデックス中のシーンの予定開始時間をより正確な情報に修正することが可能となる。
【００２３】
また、インデックスにシーンの開始時間の情報だけでなく、シナリオの情報をメタデータとして記述することで、キーワードによるシーン探索が可能となる。
【発明の実施の形態】
以下、図面と共に本発明の実施の形態を説明する。
【００２４】
［第１の実施の形態］
図２は、本発明の第１の実施の形態におけるインデックス生成のフローチャートである。
【００２５】
同図に示すフローチャートは、実時間長計測ステップ（ステップ１０１）、予測時間長算出ステップ（ステップ１０２）、時間長比較ステップ（ステップ１０３）、予測時間長修正ステップ（ステップ１０４）、インデックス生成ステップ（ステップ１０５）より構成される。
【００２６】
以下、図２に基づいてインデックス作成の動作を詳細に説明する。
【００２７】
本発明では、シナリオに記述されている完成予定の映像の時間長と、実際の編集済み映像の時間長にずれがあることから、そのずれをおおまかに修正したインデックスを生成することを目的とする。
【００２８】
まず、実時間長計測ステップ（ステップ１０１）の入力は、既に編集済みの映像となる。映像のジャンルや、内容、編集方法や記録媒体については、特に制限しない。
【００２９】
ステップ１０１では、この入力された編集済み映像から映像全体の実時間長Ｌ１を求める。例えば、編集済みの映像がＶＨＳ等のテープに録画されている場合には、再生デッキ等を用いてカウンタの開始値と終了値の差分により時間長を求める。また、ＭＰＥＧ等の電子フォーマットで録画されている場合には、ファイルのプロパティから映像の時間長を得ることができる。ここで、例えばステップ１０１の出力として編集済みの映像の実時間長Ｌ１＝２００（ｓ）が得られたとする。
【００３０】
次に、ステップ１０２の入力は、上記編集済み映像を制作するときに使用したシナリオとなる。コンピュータプログラムとして実施する場合には、シナリオが機械可読である形式になっているものとする。例えば、図３のように、シーン時間長２０１、タイトル２０２、スケッチ２０３、ナレーション２０４、テロップ２０５、ＢＧＭ２０６が記述されているシナリオが入力されたとする。
【００３１】
このシナリオから各シーン（１〜ｎ）の予測時間長Ｓ１〜Ｓｎを算出する。図３のシナリオのように、予め各シーンに時間長２０１が記載されている場合には、その値をそのまま編集済み映像の予測時間長Ｓ１〜Ｓｎとする。例えば、シナリオにシーンｉの時間長が“０：２０”と記載されていたら、シーンｉの予測時間長Ｓｉを“Ｓｉ＝２０（ｓ）”とする。ところでシナリオには、シーン毎の時間長を記載する代わりに、シーン毎の開始時間が記載されている場合がある。この場合には、シーンｉの予測時間長Ｓ１〜Ｓｎを、
Ｓｉ＝Ｔｉ＋１ −Ｔｉ
により求める。但し、Ｔｉはシーンｉの開始時間とする。
【００３２】
さらに、全体の予測時間長Ｌ２を
Ｌ２＝Ｓ１＋Ｓ２＋…＋Ｓｎ
により算出する。
【００３３】
図３のシナリオの場合、各シーンの予測時間長は、
Ｓ１＝９０（ｓ），
Ｓ２＝５０（ｓ），
Ｓ３＝４０（ｓ）
であり、全体の予測時間長は、
Ｌ２＝Ｓ１＋Ｓ２＋Ｓ３＝１８０（ｓ）
となる。
【００３４】
ステップ１０３の入力は、ステップ１０１で得られた編集済み映像の実時間長Ｌ１とステップ１０２で得られた予測時間長Ｌ２となる。
【００３５】
ステップ１０３では、編集済み映像から算出された全体の実時間長Ｌ１とシナリオから算出された全体の予測時間長Ｌ２の比Ｒ＝Ｌ１／Ｌ２を求める。
【００３６】
図３のシナリオの場合
Ｒ＝Ｌ１／Ｌ２＝２００／１８０＝１０／９
となる。
【００３７】
ステップ１０４では、ステップ１０３で求めた比Ｒを用いて、ステップ１０２で得られた各シーンの予測時間長Ｓ１〜Ｓｎを修正する。修正後の予測時間長Ｓ’１〜Ｓ’ｎを、
Ｓ’１＝Ｓ１＊Ｒ，Ｓ’２＝Ｓ２＊Ｒ，…，Ｓ’ｎ＝Ｓｎ＊Ｒ
の計算式により得る。
【００３８】
図３のシナリオの場合、修正済みの各シーンの予測時間長は、
Ｓ’１＝Ｓ１＊Ｒ＝１００（ｓ），
Ｓ’２＝Ｓ２＊Ｒ≒５６（ｓ），
Ｓ’３＝Ｓ３＊Ｒ≒４４（ｓ）
となる。
【００３９】
ステップ１０５では、ステップ１０４で得られた修正済みのシーン毎の予測時間長Ｓ’１〜Ｓ’ｎ及び映像の実時間長Ｌ１を用いて、図４のように各シーン番号３０１、予測開始時間３０２、予測終了時間３０３、予測時間長３０４、映像全体の開始時間３０５、終了時間３０６、時間長３０７を記述したインデックスを生成する。
【００４０】
ここでシーン１の予測開始時間３０２及び映像全体の開始時間３０５は必ず“０：００”とし、修正後の予測時間長３０４を加算していくことで、そのシーンの予測終了時間３０３及び次のシーンの予測開始時間３０２を順次算出する。
【００４１】
図４のインデックスはシーン１が０：００〜１：４０、シーン２が１：４０〜２：３６、シーン３が２：３６〜３：２０に存在することを示している。
【００４２】
ステップ１０５で生成されたインデックスのシーンの予測開始時間と予測時間長は必ずしも正確ではないが、シナリオに記述されているシーンの開始時間や時間長に比べ、実際の編集済み映像のシーン開始時間、時間長に近くなっている可能性が高い。
【００４３】
ここで生成されたインデックスの予測開始時間は、通常、編集済み映像の開始時間と比べ誤差が生じている。そこで、この誤差を考慮に入れて、シーンの実際の開始時間が存在する目安となる範囲を記述したインデックスを生成してもよい。例えば、予測時間長の修正前後の差程度の誤差があると考えると、図５のインデックスが生成される。例えば、シーンｉの修正前と修正後の予測時間長の差をｄｉ、予測終了時間をＴｉとすると、範囲表現した場合のシーンｉの予測終了時間は、（Ｔｉ −ｄｉ）〜（Ｔｉ＋ｄｉ）と表すことができる。そして予測終了時間の範囲に合わせて、予測時間長、次シーンの予測開始時間の修正も行う。このとき予測終了時間の範囲を次シーンでの予測開始時間の範囲とし、予測時間長の範囲には、予測開始時間と予測終了時間の範囲で最小の値と最大の値を用いる。但し、最初のシーンの予測開始時間と最後のシーンの予測終了時間については、編集済み映像より算出した正確な値であるため範囲表示を行う必要がない。
【００４４】
例えば、シーン１の修正前の予測時間長は９０（ｓ）であり、修正後の予測時間長は１００（ｓ）であるから差分は１０（ｓ）となる。そしてシーン１の予測終了時間は、１：４０であるから、この値から±１０（ｓ）を計算して、１：３０〜１：５０が予測終了時間の目安の範囲となる。このとき、シーン１の開始時間は、０：００であるから、この区間の最小値と最大値を求めるとシーン１の予測時間長は、９０〜１１０ｓとなる。この手順を繰り返せば、図５のインデックスが作成される。
【００４５】
本実施の形態によって生成されたインデックスに記載されているシーンの開始時間情報は、編集済み映像のシーン毎の開始時間に近い値となっていることが期待できる。よって、編集済み映像から正確なシーン開始時間を得るためにオペレータが映像を全てを見なくても、本実施の形態により生成されたインデックスに記述されたシーン開始時間の前後、または記述されている範囲を中心に探せばよく、作業を省力化することができる。
【００４６】
また、インデックス生成のために映像撮影時の特殊な装置、手順が不要である。
【００４７】
［第２の実施の形態］
本実施の形態では、シナリオに時間に関する記述がない場合のインデックス生成方法について説明する。
【００４８】
図６は、本発明の第２の実施の形態におけるシナリオの例であり、シナリオに時間に関する記述がない例を示す。図７は、本発明の第２の実施の形態におけるインデックス生成のフローチャートである。
【００４９】
図７におけるステップ２０１及びステップ２０５は、前述の第１の実施の形態のステップ１０１及びステップ１０５と同様であるのでその説明を省略する。
【００５０】
例えば、図６のような時間に関する記述がないシナリオをステップ２０２に入力する。ステップ２０２では、このシナリオ中のシーン毎に記述されている文字数をカウントする。ここで、シーンｉの文字数をｒｉとする。
【００５１】
図６のシナリオでは、シーン１に記述されているナレーションが８５６文字、シーン２に記述されているナレーションが６３２文字であるとする。本実施の形態では、シナリオに記載されているナレーション、コメント、台詞等のうち、ナレーションの文字数をカウントしｒｉとするが、コメントや、台詞等の文字数をｒｉとしてもよいし、それらすべての合計の文字数をｒｉとしてもよい。シナリオにナレーションや、コメント、台詞等の文字が記載されていないシーンがある場合には、そのシーンが規定の文字数または、時間長を持つと仮定して、予め定めた適当な値ａをｒｉとする。例えば、ａにはシーン毎の平均文字数を利用する。
【００５２】
ステップ２０３では、ステップ２０２で求められた文字数を元に、シーン毎の比ｒ１：ｒ２：…：ｒｎを算出する。図６のシナリオの例では、ｒ１：ｒ２＝８５６：６３２となる。
【００５３】
ステップ２０４では、ステップ２０１で得られた編集済み映像の実時間長Ｌ１とステップ２０３で得られた比ｒ１：ｒ２：…：ｒｎから各シーンの予測時間長Ｓ１〜Ｓｎを算出する。各シーンの予測時間長は次のようになる。
【００５４】
Ｓ１＝Ｌ１＊ｒ１／（ｒ１＋ｒ２＋…＋ｒｎ）
Ｓ２＝Ｌ１＊ｒ２／（ｒ１＋ｒ２＋…＋ｒｎ）
…
Ｓｎ＝Ｌ１＊ｒｎ／（ｒ１＋ｒ２＋…＋ｒｎ）
ステップ２０１で得られた編集済み映像の実時間長が３００（ｓ）であるとき、図６の例では、シーン１、シーン２の予測時間長Ｓ１，Ｓ２が、
Ｓ１＝３００＊８５６／（８５６＋６３２）≒１７３（ｓ），
Ｓ２＝３００＊６３２／（８５６＋６３２）≒１２７（ｓ），
と算出される。
【００５５】
本実施の形態では、シナリオに時間情報の記述がない場合についても、おおまかな時間を記述したインデックスの生成が可能になり、前述の第１の実施の形態と同様の効果が得られる。
【００５６】
また、本実施の形態では、文字数をカウントしたが、シーン毎に含まれるカット数を利用することもできる。
【００５７】
［第３の実施の形態］
本実施の形態では、編集済みの映像のカット点の時間情報を利用してシーンの予測開始時間の修正を行う方法について説明する。
【００５８】
図８は、本発明の第３の実施の形態におけるシーンの予測時間の修正方法のフローチャートである。
【００５９】
ここで、カット点とはショット（カメラで連続的に撮影された映像区間）のつなぎ目のことである。このカット点がシーンとシーンの切れ目の候補となる。但し、すべてのカット点がシーンとシーンの切れ目となるわけではない。図８において、ステップ３０１〜３０５は、前述の第１の実施の形態のステップ１０１〜ステップ１０５と同様であるので、その説明は省略する。ステップ３０１〜ステップ３０５の代わりに、第２の実施の形態のステップ２０１〜ステップ２０５の手順でも本実施の形態は実現できる。
【００６０】
ステップ３０６での入力は編集済みの映像となる。ステップ３０６では、この編集済み映像から、映像の特徴量によりカット点を求める。そのための方法としては、“特開２００２−２１８３７６「カット検出装置及びカット検出方法のプログラムを記録した記録媒体」”等の既存の技術を用いてカット点をその時間と共に検出する。
【００６１】
図９は、本発明の第３の実施の形態における映像からカット点を検出した状態を示している。同図は、映像フレームの時間的な並びを模式的に表したもの８０１があり、それから、それぞれ検出されたカット点におけるフレーム８０２、８０３、８０４が示されている。同図の例では、検出された３つのカット点の時間が、１：３３（フレーム８０２）、２：０５（フレーム８０３）、２：４０（フレーム８０４）となっている。
【００６２】
ステップ３０７では、ステップ３０６で検出したカット点と、ステップ３０５で生成されたインデックスのシーン毎の予測開始時間を比較する。
【００６３】
例えば、ステップ３０５で生成される図４のインデックスの例の場合、シーンの予測開始時間は“００：００，０１：４０，０２：３６”となっている。
【００６４】
シーンの正確な開始時間はステップ３０６で検出されたカット点のいずれかである可能性が高いため、このインデックスの予測開始時間とカット点の検出時間を比較し、予測開始時間の修正を行う。修正方法として、シーンの予測開始時間に最も近いカット点の検出時間を修正後のシーン予測開始時間とする。
【００６５】
図１０は、本発明の第３の実施の形態における予測開始時間の修正例を示す。同図では、シーンの予測開始時間と編集済み映像のカット点の時間との対応を示しており、編集済み映像をタイムラインで表したもの９０１、９０５、検出されたカット点９０２、９０３、９０４、シナリオから得られた予測開始時間９０６、９０７が示されている。それぞれ図１０のようにシーン予測開始時間を最も近いカット点を用いて、１：４４（９０６）が、１：３３（９０２）に、２：３６（９０７）が２：４０（９０４）のように修正される。
【００６６】
但し、カット点を検出する際に、検出漏れが発生している可能性もある。そこで、シーンの予測開始時間と最も近いカット点の時間との間隔が予め定められた時間長以上である場合には、修正を行わないようにしてもよい。
【００６７】
また、最も近いカット点が正しいシーンの開始点とは限らないため、予測開始時間の前後Ｄの範囲内に存在するカット点から、オペレータが目視により判断し、対応するカット点を選び出してもよい（Ｄは予め定められた時間長とする）。あるいは、予測開始時間の前後に存在するＭ個のカット点から同様に選び出してもよい（Ｍは予め定められた個数）。
【００６８】
ステップ３０８では、ステップ３０７で修正されたシーン毎の予測開始時間を用いてインデックスを生成する。生成されたインデックスは図１１のようになる。
【００６９】
本実施の形態によれば、編集済みの映像からカット点を検出し、前述の第１の実施の形態または、第２の実施の形態で生成されたインデックスのシーン毎の予測開始時間と比較し、修正することで、インデックス中の予測時間情報をより正確なものとすることが可能である。
【００７０】
［第４の実施の形態］
本実施の形態では、インデックス生成において、時間だけでなく、シナリオからシーン毎に出現頻度の高いキーワードや、タイトルに含まれるキーワード、人物の位置や、構図、ＢＧＭなどを抽出し、当該シーンに対するメタデータ（付属情報）を付与する例を説明する。
【００７１】
図１２は、本発明の第４の実施の形態におけるインデックス生成のフローチャートである。ステップ４０１〜４０５は、第１の実施の形態のステップ１０１〜ステップ１０５と同様であるため、その説明は省略する。また、本実施の形態では、ステップ４０１〜４０５の代わりに、第２の実施の形態におけるステップ２０１〜２０５、または、第３の実施の形態のステップ３０１〜３０５の手順を用いてもよい。
【００７２】
ステップ４０６では、シナリオよりメタデータを抽出する。メタデータとしては、例えば、タイトル、出演者名、出演者の数、ナレーションやコメントに含まれるキーワード、使用されているＢＧＭ、テロップ、構図等があげられる。
【００７３】
タイトル、出演者名、出演者数、使用されているＢＧＭ、テロップ、構造等の情報については、オペレータがシナリオに記述されている情報を項目毎にインデックスに転記する。シナリオに記載されていない項目は、インデックスでは空欄とする。インデックスの項目は、必要に応じて追加してもよいし、あるいは、一部だけを用いてもよい。
【００７４】
キーワードの抽出方法については、例えば、“特開１９９６−９５９８２「キーワード抽出装置」”等の既存技術により、ナレーションの文章を単語単位に切り分けて、すべての単語をそのままキーワードとして用いる。あるいは、各単語の出現頻度をカウントし、出現頻度の高いものから上位１０個を選択してもよい。または、オペレータが手作業でシナリオからキーワードを抽出してもよい。
【００７５】
ここで、インデックスに記述するキーワードはできるだけ他のシーンに含まれないことが望ましい。そのためには、シーン毎にシナリオに含まれる各単語の出現頻度をカウントし、異なるシーンの出現頻度上位１０個に同じ単語が含まれている場合には、その共通する単語をキーワードから除外する。あるいは、予め定められた数以上のシーンに同じ単語が含まれていれば、その単語をキーワードから除外する。このようにして、共通するキーワードを削除したり、キーワードとなりにくくしたりすることで、各シーンの特徴を表現したメタデータとなる。
【００７６】
例えば、ステップ４０６で、オペレータが図３のシナリオのシーン１から「姓名」、「４６億年」、シーン２から「進化」、シーン３から「人類」のキーワードを抽出し、タイトル、テロップ、ＢＧＭのシナリオ記載の情報を転記したとすると、ステップ４０７により生成されるインデックスは図１３のようになる。
【００７７】
このようなメタデータを付与したインデックスを生成すると、前述の第１、第２、第３の実施の形態における効果に加え、オペレータがキーワードを用いてシーン検索を行い、そのシーンの予測開始時間をもとに、対応する映像区間を見つけることが可能となる。
【００７８】
なお、上記の実施の形態における図２、図７、図８、図１２に示すフローチャートをプログラムとして構築し、インデックス生成装置として利用されるコンピュータにインストールし、ＣＰＵ等の制御手段により実行することが可能である。
【００７９】
また、構築されたプログラムを、インデックス生成装置として利用されるコンピュータに接続されるハードディスク装置や、フレキシブルディスク、ＣＤ−ＲＯＭ等の可搬記憶媒体に格納しておき、本発明を実施する際にコンピュータにインストールすることも可能である。
【００８０】
なお、本発明は、上記の実施の形態に限定されることなく、特許請求の範囲内において、種々変更・応用が可能である。
【００８１】
【発明の効果】
上述のように本発明によれば、生成されたインデックスに記載されているシーンの開始時間情報は、編集済み映像のシーン毎の開始時間に近い値となっていることが期待できる。よって編集済み映像から正確なシーン開始時間を得るためにオペレータが映像のすべてをみなくても、生成されたインデックスに記述されたシーン開始時間の前後だけを探せばよく、オペレータの作業を省力化することができる。
【００８２】
そして、本発明では、シナリオにシーン毎の時間に関する情報の記載がない場合においてもインデックスを生成することが可能となり、また、編集済みの映像のカット点の時間情報を利用することで、インデックスのシーン開始時間情報を正確な値に近づけることが可能となる。
【００８３】
さらに、オペレータがあるキーワードに関連するシーンを探したい場合には、検索キーワードが含まれるシーンの開始時間情報を生成されたインデックスより参照することで目的のシーンを容易に見つけることが可能となる。
【００８４】
本発明では、上記の手段を実現するために映像撮影時の特殊な装置や、手順が不要である。
【００８５】
また、画像や音声の信号特徴といった構成単位ではなく、シナリオに基づいて内容に意味のある構成単位（シーン）の開始点を記述したインデックスを生成しているため、シーンの開始点の映像を並べることで映像全体の構造や映像の概要を把握することが可能である。
【図面の簡単な説明】
【図１】本発明の原理を説明するための図である。
【図２】本発明の第１の実施の形態におけるインデックス生成のフローチャートである。
【図３】本発明の第１の実施の形態における入力されるシナリオの例である。
【図４】本発明の第１の実施の形態における生成されるインデックスの例である。
【図５】本発明の第１の実施の形態における誤差を考慮したインデックスの生成例である。
【図６】本発明の第２の実施の形態におけるシナリオの例である。
【図７】本発明の第２の実施の形態におけるインデックス生成のフローチャートである。
【図８】本発明の第３の実施の形態におけるシーン予測時間の修正方法のフローチャートである。
【図９】本発明の第３の実施の形態における映像からカット点を検出した状態を示す図である。
【図１０】本発明の第３の実施の形態における予想開始時間の修正例である。
【図１１】本発明の第３の実施の形態における生成されたインデックスの例である。
【図１２】本発明の第４の実施の形態におけるインデックス生成のフローチャートである。
【図１３】本発明の第４の実施の形態における生成されたインデックスの例である。
【符号の説明】
２０１シーンの時間長
２０２タイトル
２０３スケッチ
２０４ナレーション
２０５テロップ
２０６ＢＧＭ
３０１各シーン番号
３０２予測開始時間
３０３予測終了時間
３０４予測時間長
３０５映像全体の開始時間
３０６終了時間
３０７時間長
８０１映像フレームの時間的な並び
８０２，８０３，８０４検出されたカット点におけるフレーム
[0001]
[Technical Field of the Invention]
The present invention relates to an index generation method, a program, and a storage medium storing an index generation program, and more particularly to an index generation method, a program, and a storage medium storing an index generation program for generating an index of a video by associating an edited video with a scenario.
[0002]
[Prior art]
Considering the workflow of video production, a scenario describing which scenes are to be combined to compose the video is usually created first, and the final video (the edited video) is then produced by shooting, editing, and so on based on that scenario.
[0003]
Here, a scenario is a document that describes the planned composition of a video scene by scene. It contains information such as the scene time length, title, sketch, narration, telop (on-screen captions), BGM, composition, camera work, performers, and shooting location. Not all of this information needs to be described, and no particular format is prescribed. Some scenarios are handwritten, some are electronic files created with a word processor or dedicated software, and some are printouts of such files.
[0004]
In general, the temporal correspondence between an edited video and its original scenario is ambiguous. Some scenarios contain no description of time, while others give a time length for each scene; even then, the lengths may change during editing, so the time lengths in the scenario do not necessarily match those of the edited video. Because the temporal correspondence between the scenario information and the edited video is ambiguous, it is unclear which part of the video the metadata described in the scenario corresponds to, and it is difficult to attach the information contained in the scenario to each scene as metadata.
[0005]
As one means for solving this, there is a method of temporally associating a scenario with an edited video by giving an identifier indicating a position in a structure at the time of video production (for example, see Patent Document 1). According to this method, the scenario and the edited video can be temporally associated, and the scenario information can be used as metadata.
[0006]
As another approach, there is a method in which information is extracted from the edited video by telop recognition (for example, see Patent Document 2) or speech recognition (for example, see Patent Document 3) and is added as metadata.
[0007]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 2000-92419 "Program information management / editing system and hierarchical program information storage / management device used therein"
[Patent Document 2]
Japanese Patent Application Laid-Open No. 2002-279433 "Method and apparatus for searching characters in video"
[Patent Document 3]
Japanese Patent Application Laid-Open No. 2002-175304 "Video search apparatus and method"
[0008]
[Problems to be solved by the invention]
However, in order to associate the edited video with the scenario using the above “program information management and editing system and the hierarchical program information storage and management device used for the same”, an identifier must be assigned each time a shot is taken with a camera during video production. Consequently, if the need for association arises only after the video is completed, the scenario information cannot be associated unless identifiers were assigned at the shooting stage. In addition, equipment for assigning and recording identifiers is required at every stage of planning, shooting, and editing, so the introduction cost is high.
[0009]
Of course, the edited video and the scenario could be associated manually, but manually associating every one of the enormous number of scenes contained in a video is costly in time.
[0010]
Furthermore, in the method of extracting information from the edited video by telop recognition or speech recognition and adding it as metadata, the information that can be extracted is only part of the information contained in the scenario, and the recognition accuracy cannot be said to be 100%.
[0011]
The present invention has been made in view of the above points. In order to make effective use of the scenario document written at the planning stage in association with the edited video, an object of the present invention is to provide an index generation method, a program, and a storage medium storing an index generation program for temporally associating the scenario with the edited video with a small amount of work after the video is completed.
[0012]
Here, the index describes metadata such as a predicted start time, a predicted end time, a predicted time length, and a scene title for each scene in the scenario description.
[0013]
[Means for Solving the Problems]
FIG. 1 is a diagram for explaining the principle of the present invention.
[0014]
The present invention provides an index generation method for generating an index for a video that associates the video with a scenario based on an edited video and a scenario document, in which the following are performed:
A real time length measuring process (step 1) of obtaining the real time length of the entire video from the edited video;
A predicted time calculation process (step 2) of calculating a predicted time length for each scene and a predicted time length of the entire video from the scenario;
A time length comparing process (step 3) of obtaining the ratio between the real time length of the entire edited video and the predicted time length of the entire video calculated from the scenario;
A predicted time length correcting process (step 4) of correcting the predicted time length of each scene calculated from the scenario using the ratio; and
An index generation process (step 5) of generating an index describing the predicted start time, predicted end time, and predicted time length for each scene.
[0015]
Further, in the predicted time calculation process of the present invention, the following are additionally performed:
The number of characters in all or part of the narration, comments, and dialogue described for each scene in the scenario is counted;
The ratio of the numbers of characters among the scenes is obtained; and
The predicted time length of each scene is calculated using the ratio.
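The character-count variant can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `predict_lengths_from_text` and the placeholder narration strings are assumptions, while the figures (856 and 632 characters, a 300 s video) are those used in the second embodiment.

```python
def predict_lengths_from_text(texts, total_seconds):
    """Split total_seconds across scenes in proportion to character counts."""
    counts = [len(t) for t in texts]
    total = sum(counts)
    if total == 0:
        raise ValueError("no characters to count")
    return [total_seconds * c / total for c in counts]


# Placeholder narration strings (assumption) with 856 and 632 characters.
narrations = ["a" * 856, "b" * 632]
lengths = predict_lengths_from_text(narrations, 300)
rounded_lengths = [round(x) for x in lengths]
print(rounded_lengths)  # [173, 127]
```

A scene with no narration would get a zero length in this sketch; the second embodiment handles that case by substituting a predetermined character count.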
[0016]
In addition, in the present invention, the following are additionally performed after the index generation process ends:
Cut points and the times of the cut points are detected from the edited video;
The times of the detected cut points, which are correction candidates, are compared with the predicted start time of each scene calculated in the predicted time calculation process, and the time of one of the cut points is taken as the corrected predicted start time; and
An index is generated describing time information in which the predicted start time and predicted time length of each scene calculated from the scenario have been corrected.
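A minimal sketch of this correction, assuming the nearest-cut-point rule described in the third embodiment. The function name `snap_to_cut_points` and the `max_gap` parameter are hypothetical; times are in seconds, and the sample values correspond to the cut points 1:33, 2:05, and 2:40 and the predicted start times 1:40 and 2:36 used in the embodiments.

```python
def snap_to_cut_points(predicted_starts, cut_points, max_gap=None):
    """Replace each predicted start time with the nearest detected cut point.

    If max_gap is given and the nearest cut point is max_gap seconds or more
    away, the prediction is kept, since the true cut may have been missed.
    """
    if not cut_points:
        return list(predicted_starts)
    corrected = []
    for t in predicted_starts:
        nearest = min(cut_points, key=lambda c: abs(c - t))
        if max_gap is not None and abs(nearest - t) >= max_gap:
            corrected.append(t)
        else:
            corrected.append(nearest)
    return corrected


cuts = [93, 125, 160]   # 1:33, 2:05, 2:40
starts = [100, 156]     # predicted starts of scenes 2 and 3 (1:40, 2:36)
corrected = snap_to_cut_points(starts, cuts)
print(corrected)  # [93, 160]
```

The start of the first scene is left out because it is fixed at 0:00 and needs no correction.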
[0017]
In addition, in the present invention, the following are additionally performed after the index generation process ends:
Metadata is extracted for each scene from the scenario; and
An index describing the extracted metadata for each scene is generated.
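One of the metadata items discussed in the fourth embodiment is a set of keywords chosen by word frequency, with words that also rank highly in other scenes removed. The sketch below illustrates that idea only. Splitting on whitespace is an assumption made for brevity (Japanese narration would need a morphological analyzer), and the function name `scene_keywords` and the sample sentences are hypothetical.

```python
from collections import Counter


def scene_keywords(scene_texts, top_n=10):
    """Pick up to top_n frequent words per scene, dropping words shared across scenes."""
    tops = [
        [word for word, _ in Counter(text.lower().split()).most_common(top_n)]
        for text in scene_texts
    ]
    # Count in how many scenes each candidate keyword appears.
    scene_count = Counter(word for top in tops for word in set(top))
    return [[word for word in top if scene_count[word] == 1] for top in tops]


# Hypothetical narration text for two scenes.
texts = ["the earth formed the oceans", "the species evolved"]
keywords = scene_keywords(texts)
print(keywords)
```

Here "the" ranks highly in both scenes and is therefore removed from both keyword lists, leaving words that distinguish one scene from the other.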
[0018]
The present invention is also an index generation program for generating an index for a video that associates the video with a scenario based on an edited video and a scenario document, the program executing:
A real time length measuring step of obtaining the real time length of the entire video from the edited video;
A predicted time calculation step of calculating the predicted time length of each scene and the predicted time length of the entire video from the scenario and, if the scenario contains no description of time, counting the number of characters in all or part of the narration, comments, and dialogue described for each scene in the scenario and obtaining the ratio of the numbers of characters among the scenes;
A predicted time calculation step of calculating the predicted time length of each scene using the ratio;
A time length comparing step of obtaining the ratio between the real time length of the entire edited video and the predicted time length of the entire video calculated from the scenario;
A predicted time length correcting step of correcting the predicted time length of each scene calculated from the scenario using the ratio;
An index generation step of generating an index describing the predicted start time, predicted end time, and predicted time length for each scene;
A cut point detection step of detecting cut points and the times of the cut points from the edited video;
A corrected predicted start time setting step of comparing the times of the detected cut points, which are correction candidates, with the predicted start time of each scene calculated in the predicted time calculation step, and taking the time of one of the cut points as the corrected predicted start time;
A corrected index generation step of generating an index describing time information in which the predicted start time and predicted time length of each scene calculated from the scenario have been corrected;
A metadata extraction step of extracting metadata for each scene from the scenario; and
A metadata-attached index generation step of generating an index in which the extracted metadata is described for each scene.
[0019]
The present invention is also a storage medium storing an index generation program for generating an index for a video that associates the video with a scenario based on an edited video and a scenario document, the stored program comprising:
A real time length measuring step of obtaining the real time length of the entire video from the edited video;
A predicted time calculation step of calculating the predicted time length of each scene and the predicted time length of the entire video from the scenario and, if the scenario contains no description of time, counting the number of characters in all or part of the narration, comments, and dialogue described for each scene in the scenario and obtaining the ratio of the numbers of characters among the scenes;
A predicted time calculation step of calculating the predicted time length of each scene using the ratio;
A time length comparing step of obtaining the ratio between the real time length of the entire edited video and the predicted time length of the entire video calculated from the scenario;
A predicted time length correcting step of correcting the predicted time length of each scene calculated from the scenario using the ratio;
An index generation step of generating an index describing the predicted start time, predicted end time, and predicted time length for each scene;
A cut point detection step of detecting cut points and the times of the cut points from the edited video;
A corrected predicted start time setting step of comparing the times of the detected cut points, which are correction candidates, with the predicted start time of each scene calculated in the predicted time calculation step, and taking the time of one of the cut points as the corrected predicted start time;
A corrected index generation step of generating an index describing time information in which the predicted start time and predicted time length of each scene calculated from the scenario have been corrected;
A metadata extraction step of extracting metadata for each scene from the scenario; and
A metadata-attached index generation step of generating an index in which the extracted metadata is described for each scene.
[0020]
As described above, when an operator plays back the edited video and labels the start point of each scene, the index generated by the present invention makes it possible to narrow down the sections in which the start points lie. The operator therefore no longer needs to watch the entire video, and labor can be saved.
[0021]
Further, according to the present invention, an index can be generated even when the scheduled start times of the scenes are not described in the scenario.
[0022]
Furthermore, by using the cut points of the edited video at the time of index generation, the scheduled start time of each scene in the index can be corrected to more accurate information.
[0023]
Also, by describing not only the information on the start time of the scene but also the information on the scenario as metadata in the index, it is possible to perform a scene search using a keyword.
[Best Mode for Carrying Out the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0024]
[First Embodiment]
FIG. 2 is a flowchart of index generation according to the first embodiment of the present invention.
[0025]
The flowchart shown in the figure consists of a real time length measuring step (step 101), a predicted time length calculating step (step 102), a time length comparing step (step 103), a predicted time length correcting step (step 104), and an index generation step (step 105).
[0026]
Hereinafter, the index creation operation will be described in detail with reference to FIG. 2.
[0027]
Because the time length of the planned video described in the scenario differs from the time length of the actual edited video, the present invention aims to generate an index in which this difference is roughly corrected.
[0028]
First, the input of the real time length measurement step (step 101) is an already edited video. The genre, content, editing method, and recording medium of the video are not particularly limited.
[0029]
In step 101, a real time length L1 of the entire video is obtained from the input edited video. For example, if the edited video is recorded on a tape such as a VHS, the time length is obtained from the difference between the start value and the end value of the counter using a playback deck or the like. If the video is recorded in an electronic format such as MPEG, the time length of the video can be obtained from the properties of the file. Here, it is assumed that the real time length L1 = 200 (s) of the edited video is obtained as the output of step 101, for example.
[0030]
Next, the input of step 102 is the scenario used when producing the edited video. When the method is implemented as a computer program, the scenario is assumed to be in a machine-readable format. For example, assume that a scenario describing a scene time length 201, a title 202, a sketch 203, a narration 204, a telop 205, and BGM 206 is input, as shown in FIG. 3.
[0031]
From this scenario, the predicted time lengths S1 to Sn of the scenes (1 to n) are calculated. When a time length 201 is already given for each scene, as in the scenario of FIG. 3, those values are used directly as the predicted time lengths S1 to Sn of the edited video. For example, if the scenario gives the time length of scene i as “0:20”, the predicted time length of scene i is set to Si = 20 (s). Some scenarios give a start time for each scene instead of a time length. In this case, the predicted time length Si of scene i is obtained by
Si = T(i+1) - Ti
where Ti is the start time of scene i.
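The two ways of reading scene lengths from a scenario can be sketched as below. The helper names `parse_mmss` and `lengths_from_start_times` are hypothetical, and the start times in the example are invented to be consistent with the FIG. 3 lengths. The text does not say how the last scene is handled when only start times are given; supplying the planned total length is an assumption of this sketch.

```python
def parse_mmss(text):
    """Convert a scenario time such as "0:20" into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def lengths_from_start_times(start_times, planned_total):
    """Compute each scene length as the next start time minus its own start time.

    planned_total (seconds) closes the last scene; this is an assumption,
    since the formula alone leaves the last scene undefined.
    """
    starts = [parse_mmss(t) for t in start_times]
    ends = starts[1:] + [planned_total]
    return [end - start for start, end in zip(starts, ends)]


print(parse_mmss("0:20"))  # 20
scene_lengths = lengths_from_start_times(["0:00", "1:30", "2:20"], 180)
print(scene_lengths)  # [90, 50, 40]
```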
[0032]
Further, the predicted time length L2 of the entire video is calculated by
L2 = S1 + S2 + ... + Sn.
[0033]
In the case of the scenario of FIG. 3, the predicted time lengths of the scenes are
S1 = 90 (s),
S2 = 50 (s),
S3 = 40 (s),
and the predicted time length of the entire video is
L2 = S1 + S2 + S3 = 180 (s).
[0034]
The input in step 103 is the real time length L1 of the edited video obtained in step 101 and the predicted time length L2 obtained in step 102.
[0035]
In step 103, the ratio R = L1 / L2 between the real time length L1 of the entire video, obtained from the edited video, and the predicted time length L2 of the entire video, calculated from the scenario, is obtained.
[0036]
In the case of the scenario of FIG. 3,
R = L1 / L2 = 200 / 180 = 10 / 9.
[0037]
In step 104, the predicted time lengths S1 to Sn of the scenes obtained in step 102 are corrected using the ratio R obtained in step 103. The corrected predicted time lengths S′1 to S′n are obtained by
S′1 = S1 * R, S′2 = S2 * R, ..., S′n = Sn * R.
[0038]
In the case of the scenario of FIG. 3, the corrected predicted time lengths of the scenes are
S′1 = S1 * R = 100 (s),
S′2 = S2 * R ≈ 56 (s),
S′3 = S3 * R ≈ 44 (s).
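Steps 103 and 104 amount to a single proportional scaling, sketched here with the FIG. 3 values. The function name `correct_lengths` is hypothetical, and using exact fractions is a choice of this sketch to avoid floating-point drift, not something the text prescribes.

```python
from fractions import Fraction


def correct_lengths(predicted, real_length):
    """Scale the predicted scene lengths so that they sum to the real video length."""
    total = sum(predicted)
    if total <= 0:
        raise ValueError("predicted lengths must sum to a positive value")
    ratio = Fraction(real_length, total)  # R = L1 / L2
    return ratio, [s * ratio for s in predicted]


ratio, corrected = correct_lengths([90, 50, 40], 200)
rounded = [round(s) for s in corrected]
print(ratio)    # 10/9
print(rounded)  # [100, 56, 44]
```

The exact corrected lengths always sum to L1. Rounding each scene separately happens to preserve the sum in this example, but that is not guaranteed for other inputs.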
[0039]
In step 105, using the corrected predicted time lengths S′1 to S′n of the scenes obtained in step 104 and the real time length L1 of the video, an index is generated that describes, as shown in FIG. 4, each scene number 301, predicted start time 302, predicted end time 303, and predicted time length 304, together with the start time 305, end time 306, and time length 307 of the entire video.
[0040]
Here, the predicted start time 302 of scene 1 and the start time 305 of the entire video are always set to “0:00”, and by successively adding the corrected predicted time lengths 304, the predicted end time 303 of each scene and the predicted start time 302 of the next scene are calculated in turn.
[0041]
The index in FIG. 4 indicates that scene 1 exists at 0:00 to 1:40, scene 2 exists at 1:40 to 2:36, and scene 3 exists at 2:36 to 3:20.
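The running-sum construction of the index can be sketched as follows. The dictionary layout and the names `build_index` and `format_mmss` are assumptions made for illustration; only the time values come from the FIG. 4 example.

```python
def format_mmss(seconds):
    """Format a number of seconds as m:ss, e.g. 156 -> "2:36"."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_index(scene_lengths):
    """Accumulate corrected scene lengths into start and end times per scene."""
    index = []
    start = 0  # the first scene always starts at 0:00
    for number, length in enumerate(scene_lengths, start=1):
        end = start + length
        index.append({
            "scene": number,
            "start": format_mmss(start),
            "end": format_mmss(end),
            "length": length,
        })
        start = end  # the next scene starts where this one ends
    return index


index = build_index([100, 56, 44])
for row in index:
    print(row["scene"], row["start"], row["end"], row["length"])
# 1 0:00 1:40 100
# 2 1:40 2:36 56
# 3 2:36 3:20 44
```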
[0042]
The predicted start times and predicted time lengths of the scenes in the index generated in step 105 are not necessarily accurate, but they are likely to be closer to the actual scene start times and time lengths of the edited video than the start times and time lengths described in the scenario.
[0043]
The predicted start time in the index generated here usually has an error relative to the actual start time in the edited video. Taking this error into account, an index may instead be generated that describes a range indicating roughly where the actual start time of each scene lies. For example, assuming an error of about the difference between the predicted time length before and after correction, the index of FIG. 5 is obtained. Specifically, let di be the difference between the predicted time length of scene i before correction and after correction, and let Ti be the predicted end time of scene i. The predicted end time of scene i is then expressed as the range (Ti - di) to (Ti + di). The predicted time length of the scene and the predicted start time of the next scene are also adjusted according to this range: the range of the predicted end time becomes the range of the predicted start time of the next scene, and the range of the predicted time length is given by the minimum and maximum values obtainable from the ranges of the predicted start time and the predicted end time. However, since the predicted start time of the first scene and the predicted end time of the last scene are exact values obtained from the edited video, they need not be expressed as ranges.
[0044]
For example, the predicted time length of scene 1 is 90 (s) before correction and 100 (s) after correction, so the difference is 10 (s). Since the predicted end time of scene 1 is 1:40, applying ± 10 (s) to this value gives 1:30 to 1:50 as the estimated range of the predicted end time. Because the start time of scene 1 is 0:00, taking the minimum and maximum of this interval gives a predicted time length of 90 to 110 (s) for scene 1. Repeating this procedure for each scene produces the index shown in FIG. 5.
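The range calculation can be sketched as follows. The function name and row layout are hypothetical. One point is an interpretation rather than something the text states outright: the length range of a scene is taken here as (earliest end minus latest start) to (latest end minus earliest start). This agrees with the scene 1 example above, but FIG. 5 is not available to confirm it for later scenes.

```python
def add_error_ranges(original_lengths, corrected_lengths):
    """Attach (min, max) ranges to start, end, and length of each scene, in seconds."""
    n = len(corrected_lengths)
    rows = []
    start_lo = start_hi = 0.0   # the first scene's start (0:00) is exact
    end_nominal = 0.0           # Ti, the predicted end time without a range
    for i, (before, after) in enumerate(zip(original_lengths, corrected_lengths)):
        end_nominal += after
        d = abs(after - before)  # di: change in length caused by the correction
        if i == n - 1:
            end_lo = end_hi = end_nominal  # the last scene's end is exact
        else:
            end_lo, end_hi = end_nominal - d, end_nominal + d
        rows.append({
            "start": (start_lo, start_hi),
            "end": (end_lo, end_hi),
            # Assumed interpretation of "minimum and maximum values".
            "length": (end_lo - start_hi, end_hi - start_lo),
        })
        start_lo, start_hi = end_lo, end_hi  # end range becomes next start range
    return rows


rows = add_error_ranges([90, 50, 40], [100, 56, 44])
print(rows[0])  # scene 1: end between 90 s and 110 s, length between 90 s and 110 s
```

The scenario lengths 50 s and 40 s for scenes 2 and 3 are again assumed values; only the 90 s of scene 1 is given in the text.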
[0045]
The scene start times described in the index generated according to the present embodiment can be expected to be close to the actual start time of each scene in the edited video. Therefore, to obtain the correct scene start time from the edited video, the operator need not view the entire video. It is sufficient to search around the scene start time, or within the range, described in the index, which saves labor.
[0046]
Also, no special device or procedure is required for video shooting for index generation.
[0047]
[Second Embodiment]
In the present embodiment, an index generation method in a case where there is no description about time in a scenario will be described.
[0048]
FIG. 6 is an example of a scenario according to the second embodiment of the present invention, and shows an example in which there is no description about time in the scenario. FIG. 7 is a flowchart of index generation according to the second embodiment of the present invention.
[0049]
Steps 201 and 205 in FIG. 7 are the same as steps 101 and 105 of the above-described first embodiment, and a description thereof will be omitted.
[0050]
For example, a scenario having no description about time, as shown in FIG. 6, is input. In step 202, the number of characters described for each scene in this scenario is counted. Here, the number of characters in scene i is denoted ri.
[0051]
In the scenario of FIG. 6, it is assumed that the narration described in scene 1 has 856 characters and the narration described in scene 2 has 632 characters. In the present embodiment, of the narration, comments, dialogue, and the like described in the scenario, the number of characters of the narration is counted and used as ri, but the number of characters of the comments or dialogue may also be used as ri. If the scenario contains a scene in which no characters such as narration, comments, or dialogue are described, the scene is assumed to have a prescribed number of characters or time length, and a predetermined appropriate value a is used as ri. For example, the average number of characters per scene is used for a.
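A possible way to supply the value a for scenes without text is sketched below. The function name is hypothetical, and taking the average over only those scenes that do contain characters is an assumption; the text says only that the average number of characters per scene is used.

```python
def fill_missing_counts(char_counts):
    """Replace zero character counts with a default value a.

    Assumption: a is the mean count over the scenes that contain text.
    """
    present = [c for c in char_counts if c > 0]
    if not present:
        raise ValueError("at least one scene must contain text")
    a = sum(present) / len(present)
    return [c if c > 0 else a for c in char_counts]


# Hypothetical three-scene scenario in which the middle scene has no narration.
print(fill_missing_counts([856, 0, 632]))
```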
[0052]
In step 203, the ratio r1: r2: ...: rn for each scene is calculated based on the number of characters obtained in step 202. In the example of the scenario of FIG. 6, r1: r2 = 856: 632.
[0053]
In step 204, the predicted time lengths S1 to Sn of the respective scenes are calculated from the real time length L1 of the edited video obtained in step 201 and the ratio r1: r2: ...: rn obtained in step 203. The predicted time length of each scene is as follows.
[0054]
S1 = L1 * r1 / (r1 + r2 + ... + rn)
S2 = L1 * r2 / (r1 + r2 + ... + rn)
…
Sn = L1 * rn / (r1 + r2 + ... + rn)
When the real time length of the edited video obtained in step 201 is 300 (s), in the example of FIG. 6 the predicted time lengths S1 and S2 of scene 1 and scene 2 are calculated as
S1 = 300 * 856 / (856 + 632) ≒ 173 (s),
S2 = 300 * 632 / (856 + 632) ≒ 127 (s).
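The proportional split of steps 203 and 204 can be sketched in a few lines. The function name is hypothetical; the inputs are the character counts and video length quoted in the example above.

```python
def predict_lengths_from_counts(char_counts, actual_total):
    """Split the real video length L1 among scenes in proportion to r1..rn."""
    total_chars = sum(char_counts)
    if total_chars <= 0:
        raise ValueError("character counts must sum to a positive number")
    return [actual_total * r / total_chars for r in char_counts]  # Si = L1 * ri / sum(r)


lengths = predict_lengths_from_counts([856, 632], 300)
print([round(s) for s in lengths])
```

Since the lengths are derived directly from L1, they already sum to the real video length, and no separate ratio correction is needed in this embodiment.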
[0055]
In the present embodiment, even when there is no description of time information in the scenario, it is possible to generate an index describing a rough time, and the same effect as in the first embodiment can be obtained.
[0056]
Further, although the number of characters is counted in the present embodiment, the number of cuts included in each scene may be used instead.
[0057]
[Third Embodiment]
In the present embodiment, a method for correcting the predicted start time of a scene using the time information of the cut point of the edited video will be described.
[0058]
FIG. 8 is a flowchart of a method for correcting the predicted time of a scene according to the third embodiment of the present invention.
[0059]
Here, a cut point is a joint between shots (video sections captured continuously by a camera). Cut points are candidates for boundaries between scenes. However, not all cut points are scene boundaries. In FIG. 8, steps 301 to 305 are the same as steps 101 to 105 of the first embodiment described above, and their description is not repeated. This embodiment can also be realized by using the procedure of steps 201 to 205 of the second embodiment instead of steps 301 to 305.
[0060]
The input to step 306 is the edited video. In step 306, cut points are obtained from the edited video based on feature amounts of the video. For this purpose, cut points and their times are detected using an existing technique such as that of Japanese Patent Application Laid-Open No. 2002-218376, "Recording medium storing a program of a cut detection device and a cut detection method".
[0061]
FIG. 9 shows how cut points are detected from a video according to the third embodiment of the present invention. The figure contains a diagram 801 schematically showing the temporal arrangement of video frames, followed by the frames 802, 803, and 804 at the detected cut points. In the example shown in the figure, the times of the three detected cut points are 1:33 (frame 802), 2:05 (frame 803), and 2:40 (frame 804).
[0062]
In step 307, the times of the cut points detected in step 306 are compared with the predicted start time of each scene in the index generated in step 305.
[0063]
For example, in the case of the example of the index in FIG. 4 generated in step 305, the predicted start time of the scene is “00:00, 01:40, 02:36”.
[0064]
Since the exact start time of the scene is likely to be one of the cut points detected in step 306, the predicted start time of this index is compared with the cut point detection time to correct the predicted start time. As a correction method, the detection time of the cut point closest to the prediction start time of the scene is set as the corrected scene prediction start time.
[0065]
FIG. 10 shows an example of correcting the predicted start time according to the third embodiment of the present invention. The figure shows the correspondence between the predicted scene start times and the cut point times of the edited video: timelines 901 and 905 representing the edited video, the detected cut points 902, 903, and 904, and the predicted start times 906 and 907 obtained from the scenario. When the cut point closest to each predicted scene start time is used as shown in FIG. 10, 1:40 (906) is corrected to 1:33 (902), and 2:36 (907) is corrected to 2:40 (904).
[0066]
However, when detecting a cut point, there is a possibility that a detection omission has occurred. Therefore, if the interval between the predicted start time of the scene and the time of the closest cut point is equal to or longer than a predetermined time length, the correction may not be performed.
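The nearest-cut-point correction, together with the optional gap limit, can be sketched as follows. The function and parameter names are hypothetical, times are in seconds, and leaving the first scene's start untouched is an assumption based on the statement that it is an exact value. The inputs are the predicted start times of FIG. 4 (0:00, 1:40, 2:36) and the cut points of FIG. 9 (1:33, 2:05, 2:40).

```python
def snap_to_cut_points(predicted_starts, cut_points, max_gap=None):
    """Move each predicted scene start to the nearest detected cut point.

    max_gap: if given, a start is left uncorrected when the nearest cut point
             is max_gap seconds or more away (a possible missed detection).
    """
    corrected = []
    for i, start in enumerate(predicted_starts):
        if i == 0 or not cut_points:
            corrected.append(start)  # first scene start is exact; nothing to snap to
            continue
        nearest = min(cut_points, key=lambda c: abs(c - start))
        if max_gap is not None and abs(nearest - start) >= max_gap:
            corrected.append(start)  # too far away: keep the prediction
        else:
            corrected.append(nearest)
    return corrected


print(snap_to_cut_points([0, 100, 156], [93, 125, 160]))
print(snap_to_cut_points([0, 100, 156], [93, 125, 160], max_gap=5))
```

In the second call the 7-second gap between 1:40 and 1:33 exceeds the limit, so that start is left at its predicted value while 2:36 is still moved to 2:40.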
[0067]
Further, since the closest cut point is not always the correct start point of the scene, the operator may visually judge and select the corresponding cut point from the cut points within a range D before and after the predicted start time (D is a predetermined time length). Alternatively, the selection may be made in the same way from the M cut points before and after the predicted start time (M is a predetermined number).
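The two ways of narrowing down candidates for the operator can be sketched as below. The function names are hypothetical. For the second variant, the text can be read either as M cut points in total or as M on each side of the predicted start time; this sketch assumes M on each side.

```python
def candidates_within(predicted_start, cut_points, d):
    """Cut points no more than d seconds before or after the predicted start."""
    return [c for c in sorted(cut_points) if abs(c - predicted_start) <= d]


def nearest_candidates(predicted_start, cut_points, m):
    """Up to m cut points before and up to m after the predicted start (assumed reading)."""
    ordered = sorted(cut_points)
    before = [c for c in ordered if c <= predicted_start]
    after = [c for c in ordered if c > predicted_start]
    return before[max(0, len(before) - m):] + after[:m]


# Predicted start 1:40 (100 s) against the cut points of FIG. 9.
print(candidates_within(100, [93, 125, 160], 30))
print(nearest_candidates(100, [93, 125, 160], 1))
```

Either list would then be presented to the operator, who picks the cut point that actually begins the scene.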
[0068]
In step 308, an index is generated using the predicted start time for each scene corrected in step 307. The generated index is as shown in FIG. 11.
[0069]
According to the present embodiment, cut points are detected from the edited video and compared with the predicted start time of each scene in the index generated in the first or second embodiment described above, so that the predicted time information in the index can be made more accurate.
[0070]
[Fourth Embodiment]
In the present embodiment, an example will be described in which, in index generation, not only time information but also keywords that appear frequently in each scene, keywords included in the title, the positions of persons, the composition, the BGM, and the like are extracted from the scenario and added to each scene as metadata (attached information).
[0071]
FIG. 12 is a flowchart of index generation according to the fourth embodiment of the present invention. Steps 401 to 405 are the same as steps 101 to 105 of the first embodiment, and thus description thereof is omitted. Further, in the present embodiment, instead of steps 401 to 405, the procedure of steps 201 to 205 in the second embodiment or steps 301 to 305 of the third embodiment may be used.
[0072]
In step 406, metadata is extracted from the scenario. Examples of the metadata include a title, a performer name, the number of performers, a keyword included in a narration or comment, a used BGM, a telop, a composition, and the like.
[0073]
For information such as the title, performer names, number of performers, BGM used, telop, and composition, the operator transcribes the information described in the scenario into the index item by item. Items not described in the scenario are left blank in the index. Index items may be added as necessary, or only some of them may be used.
[0074]
Regarding keyword extraction, for example, the narration text may be split into words using an existing technique such as that of "Japanese Patent Application Laid-Open No. 1996-95982", and all the words may be used as keywords as they are. Alternatively, the frequency of appearance of each word may be counted and the 10 most frequent words selected, or the operator may manually extract keywords from the scenario.
[0075]
Here, it is desirable that the keywords described in the index are not included in other scenes as much as possible. For this purpose, the appearance frequency of each word included in the scenario is counted for each scene, and if the same word is included in the top 10 appearance frequencies of different scenes, the common word is excluded from the keywords. Alternatively, if the same word is included in more than a predetermined number of scenes, the word is excluded from the keywords. In this manner, metadata that expresses the features of each scene is obtained by deleting common keywords or making them less likely to become keywords.
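The first exclusion rule (dropping a word that reaches the top list of more than one scene) can be sketched as follows. The function name and the sample words are hypothetical, and the sketch assumes that the narration has already been split into words by some external segmenter, which is not shown.

```python
from collections import Counter


def extract_keywords(scene_words, top_n=10):
    """Pick the most frequent words per scene, dropping words shared between scenes.

    scene_words: one list of already-segmented words per scene.
    """
    # Top-N words of each scene, most frequent first.
    top_per_scene = [
        [word for word, _ in Counter(words).most_common(top_n)]
        for words in scene_words
    ]
    # Number of scenes in whose top list each word appears.
    scene_hits = Counter(word for top in top_per_scene for word in top)
    # Keep only words that are in the top list of exactly one scene.
    return [[w for w in top if scene_hits[w] == 1] for top in top_per_scene]


# Hypothetical word lists for three scenes.
scenes = [
    ["earth", "earth", "life", "sea"],
    ["life", "evolution", "evolution"],
    ["humanity", "life"],
]
print(extract_keywords(scenes))
```

In this made-up input the word "life" is in the top list of every scene and is therefore removed from all of them. The second rule in the text, excluding words that occur in more than a predetermined number of scenes, would replace the `== 1` test with a comparison against that threshold.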
[0076]
For example, suppose that in step 406 the operator extracts the keywords "first and last name" and "4.6 billion years" from scene 1 of the scenario in FIG. 3, "evolution" from scene 2, and "humanity" from scene 3, and transcribes the title, telop, and BGM information described in the scenario. The index generated in step 407 is then as shown in FIG. 13.
[0077]
When an index with such metadata is generated, in addition to the effects of the first, second, and third embodiments described above, the operator can search for a scene by keyword and find the corresponding video section based on the predicted start time of that scene.
[0078]
Note that the procedures shown in the flowcharts of FIGS. 2, 7, 8, and 12 in the above embodiments can be implemented as a program, installed on a computer used as an index generation device, and executed by a control unit such as a CPU.
[0079]
Further, the program may be stored in a hard disk device connected to the computer used as the index generation device, or in a portable storage medium such as a flexible disk or a CD-ROM, and installed from there.
[0080]
It should be noted that the present invention is not limited to the above-described embodiment, and various modifications and applications are possible within the scope of the claims.
[0081]
[Effects of the Invention]
As described above, according to the present invention, the scene start times described in the generated index can be expected to be close to the actual start time of each scene in the edited video. Therefore, to obtain the correct scene start time from the edited video, the operator need not view the entire video and only has to search around the scene start time described in the generated index, which saves the operator's work.
[0082]
In the present invention, an index can be generated even when the scenario contains no time information for each scene. In addition, by using the time information of the cut points of the edited video, the scene start times in the index can be brought closer to accurate values.
[0083]
Further, when the operator wants to search for a scene related to a certain keyword, the target scene can be easily found by referring to the start time information of the scene including the search keyword from the generated index.
[0084]
According to the present invention, no special device or procedure is required at the time of video shooting to implement the above means.
[0085]
In addition, the generated index describes the start points of structural units (scenes) that are meaningful in terms of content, based on the scenario, rather than structural units based on signal features of the video or audio. Arranging the images at the scene start points therefore makes it possible to grasp the structure and outline of the entire video.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a flowchart of index generation according to the first embodiment of the present invention.
FIG. 3 is an example of an input scenario according to the first embodiment of the present invention.
FIG. 4 is an example of an index generated in the first embodiment of the present invention.
FIG. 5 is an example of generating an index in consideration of an error according to the first embodiment of the present invention.
FIG. 6 is an example of a scenario according to the second embodiment of the present invention.
FIG. 7 is a flowchart of index generation according to the second embodiment of the present invention.
FIG. 8 is a flowchart of a method for correcting a predicted scene time according to the third embodiment of the present invention.
FIG. 9 is a diagram illustrating how cut points are detected from a video according to the third embodiment of the present invention.
FIG. 10 is an example of correcting the predicted start time according to the third embodiment of the present invention.
FIG. 11 is an example of a generated index according to the third embodiment of the present invention.
FIG. 12 is a flowchart of index generation according to the fourth embodiment of the present invention.
FIG. 13 is an example of a generated index according to the fourth embodiment of the present invention.
[Explanation of symbols]
201 Scene length
202 Title
203 Sketch
204 Narration
205 Telop
206 BGM
301 Scene number
302 Predicted start time
303 Predicted end time
304 Predicted time length
305 Start time of entire video
306 End time of entire video
307 Time length of entire video
801 Temporal arrangement of video frames
802, 803, 804 Frames at detected cut points

Claims

An index generation method for generating, based on an edited video and a scenario document, an index for the video that associates the video with the scenario, the method comprising:
a real time length measuring step of obtaining the real time length of the entire video from the edited video;
a predicted time calculation step of calculating a predicted time length for each scene and a predicted time length of the entire video from the scenario;
a time length comparing step of calculating the ratio of the real time length of the entire edited video to the predicted time length of the entire video calculated from the scenario;
a predicted time length correction step of correcting the predicted time length for each scene calculated from the scenario using the ratio; and
an index generating step of generating an index describing a predicted start time, a predicted end time, and a predicted time length for each scene.

2. The index generation method according to claim 1, wherein, in the predicted time calculation step,
all or part of the characters of the narration, comments, and dialogue described for each scene in the scenario are counted,
the ratio of the number of characters for each scene is obtained, and
the predicted time length for each scene is calculated using the ratio.

3. The index generation method according to claim 1, further comprising, after completion of the index generating step:
a step of detecting cut points and the times of the cut points from the edited video;
a step of comparing the times of the detected cut points, which are correction candidates, with the predicted start time of each scene calculated in the predicted time calculation step, and setting one of the cut points as the corrected predicted start time; and
a step of generating an index describing time information obtained by correcting the predicted start time and the predicted time length for each scene calculated from the scenario.

4. The index generation method according to claim 1, further comprising, after completion of the index generating step:
a step of extracting metadata for each scene from the scenario; and
a step of generating an index in which the extracted metadata is described for each scene.

An index generation program for generating, based on an edited video and a scenario document, an index for the video that associates the video with the scenario, the program comprising:
a real time length measuring step of obtaining the real time length of the entire video from the edited video;
a predicted time calculation step of calculating the predicted time length of each scene and the predicted time length of the entire video from the scenario and, if there is no description about time in the scenario, counting all or part of the characters of the narration, comments, and dialogue described for each scene in the scenario and calculating the ratio of the number of characters for each scene;
a predicted time calculation step of calculating a predicted time length for each scene using the ratio;
a time length comparing step of determining the ratio of the real time length of the entire edited video to the predicted time length of the entire video calculated from the scenario;
a predicted time length correction step of correcting the predicted time length for each scene calculated from the scenario using the ratio;
an index generation step of generating an index describing a predicted start time, a predicted end time, and a predicted time length for each scene;
a cut point detection step of detecting cut points and the times of the cut points from the edited video;
a start time setting step of comparing the times of the detected cut points, which are correction candidates, with the predicted start time of each scene calculated in the predicted time calculation step, and setting one of the cut points as the corrected predicted start time;
a corrected index generation step of generating an index describing time information obtained by correcting the predicted start time and the predicted time length for each scene calculated from the scenario;
a metadata extraction step of extracting metadata for each scene from the scenario; and
an index generation step of generating an index describing the extracted metadata for each scene.

A storage medium storing an index generation program for generating, based on an edited video and a scenario document, an index for the video that associates the video with the scenario, the stored program comprising:
a real time length measuring step of obtaining the real time length of the entire video from the edited video;
a predicted time calculation step of calculating the predicted time length of each scene and the predicted time length of the entire video from the scenario and, if there is no description about time in the scenario, counting all or part of the characters of the narration, comments, and dialogue described for each scene in the scenario and calculating the ratio of the number of characters for each scene;
a predicted time calculation step of calculating a predicted time length for each scene using the ratio;
a time length comparing step of determining the ratio of the real time length of the entire edited video to the predicted time length of the entire video calculated from the scenario;
a predicted time length correction step of correcting the predicted time length for each scene calculated from the scenario using the ratio;
an index generation step of generating an index describing a predicted start time, a predicted end time, and a predicted time length for each scene;
a cut point detection step of detecting cut points and the times of the cut points from the edited video;
a start time setting step of comparing the times of the detected cut points, which are correction candidates, with the predicted start time of each scene calculated in the predicted time calculation step, and setting one of the cut points as the corrected predicted start time;
a corrected index generation step of generating an index describing time information obtained by correcting the predicted start time and the predicted time length for each scene calculated from the scenario;
a metadata extraction step of extracting metadata for each scene from the scenario; and
a metadata-added index generation step of generating an index in which the extracted metadata is described for each scene.