JP3601959B2

JP3601959B2 - Video structuring method and apparatus, and recording medium storing video structuring program

Info

Publication number: JP3601959B2
Application number: JP04506098A
Authority: JP
Inventors: 隆佐藤; 憲一南; 明人阿久津; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-02-26
Filing date: 1998-02-26
Publication date: 2004-12-15
Anticipated expiration: 2018-02-26
Also published as: JPH11242685A

Description

【０００１】
【発明の属する技術分野】
本発明は、映像を構造化する方法および装置に関する。
【０００２】
【従来の技術】
映像データベースの検索や表示のためには、映像を適当な時間的区間に分割し、分割された映像を構造化することによって、映像を管理することが望ましい。
【０００３】
従来は、まず、フレーム画像の比較により、画面が大きく変化する時点で映像を分割する。ここで分割された映像の時間的区間は「ショット」と呼ばれる。次にショットをひとつ以上集めて、「シーン」と呼ばれる階層を作る。シーンは、意味的な区切りによって映像を分割した単位であり、例えば、同じ撮影場所や同じテーマが連続するショットを集めてシーンとする。
【０００４】
従来は、シーンを自動的に構成するためには、モデルを用いた。例えば、ニュース映像を対象にする場合、あるニュースキャスター出現ショットから次のニュースキャスター出現ショットまでをひとつのシーンとするというモデルを作る。そして、ニュースキャスターの存在するショットを、フレーム画像の分析によって抽出し、自動的にシーンを切り出すことによって、ニュース映像を構造化する。
【０００５】
【発明が解決しようとする課題】
従来技術では、シーンの意味的な分類に基づいて映像を分割しているため、モデルの定義に任意性があり、一般的な構造化が難しく、実際のデータの性質を反映しない場合があるという問題があった。
【０００６】
具体的には、従来技術には次のような問題がある。まず、映像の意味の解釈は現状の技術では難しいため、自動化が難しい。さらに、大量の映像データを対象にして構造化を行おうとした場合には、均質な構造化ができない。また、モデルに基づく映像解釈は、対象が限定され一般性に乏しく、実際の映像データに含まれている未知の構造を反映するのは難しい。
【０００７】
そこで、本発明の目的は、頻度の高い映像表現を抽出をすることができ、映像の一般的で均質な構造化処理の自動化を容易にし、実際のデータの性質を反映した構造化が可能な映像構造化方法および装置を提供することである。
【０００８】
【課題を解決するための手段】
前記課題を解決するために、本発明の映像構造化方法は、
時間的区間に分割された映像の前記区間における特徴量ベクトルを抽出する特徴量抽出段階と、特徴量ベクトルを番号に変換し、映像を番号列によって表す量子化段階と、番号列において部分番号列の出現回数を計数する計数段階を有し、
前記計数段階が、前記番号列の参照列を作成する参照列作成段階と、前記参照列を一定の順序で並べ替えるソート段階と、並べ替えられた前記参照列において隣接する部分番号列間で一致する番号列の長さである一致長を求めるとともにその出現回数を計数する一致番号計数段階と、部分番号列を、一致長と、出現回数に基づいて、並べ替えるソート段階とを有する。
また、本発明の他の映像構造化方法は、
時間的に区分された映像の特徴量ベクトルを抽出する特徴量抽出段階と、前記特徴量ベクトルを番号に変換し、映像を番号列によって表す量子化段階と、前記番号列において部分番号列の出現回数を計数する計数段階とを有し、
前記計数段階が、前記番号列の参照列を作成する参照列作成段階と、前記参照列を一定の順序で並べ替えるソート段階と、並べ替えられた前記参照列において隣接する部分番号列間で一致する番号列の長さである一致長を求めるとともにその出現回数を計数する一致部分番号列計数段階と、量子化段階が、特徴量ベクトルを番号に変換するときに出力する尤度と、一致長と、出現回数に基づいて、前記部分番号列を並べ替えるソート段階を有する。
【０００９】
本発明は、映像データに繰り返し現れる特徴量の列を抽出することにより映像を構造化する。本発明が扱う映像は、ショットなどの時間的区間に分割されて入力されるとする。また、ショット分割の方法として、例えば、フレーム画像を比較して、画面が大きく変化する時点で映像をショットに分割する方法を用いることができる。
【００１０】
映像データの特徴量を量子化し、番号列化し、その番号列の部分番号列の出現回数を計数するので、頻度の高い映像表現を抽出することができ、映像の構造化処理の自動化が容易になり、一般的で均質な構造化が可能で、実際のデータの性質を反映した構造化ができる。結果を並べ替えて出力するので、結果の観察が容易になる。あるいは、量子化の品質を利用して結果を並べ替えて出力するので、意味のある結果を観察しやすい。
【００１５】
本発明の実施態様によれば、計数段階が、部分番号列に包含され、出現回数が該部分番号列と同じ部分番号列を除外する。断片的な映像表現を除外するので、結果の観察が容易になる。
【００１９】
本発明の映像構造化装置は、時間的区間に分割された映像の前記区間における特徴量ベクトルを抽出する特徴量抽出手段と、特徴量ベクトル番号に変換し、映像を番号列によって表す量子化手段と、前記番号列において部分番号列の出現回数を計数する計数手段を有し、計数手段が、前記番号列の参照列を作成する参照列作成手段と、前記参照列を一定の順序で並べ替えるソート手段と、並べ替えられた前記参照列において隣接する部分番号列間で一致する番号列の長さである一致長を求めるとともにその出現回数を計数する一致部分番号列計数手段と、部分番号列を、一致長と、出現回数に基づいて、並べ替えるソート手段とを有する。
本発明の他の映像構造化装置は、時間的区間における映像の特徴量ベクトルを抽出する特徴量抽出手段と、前記特徴量ベクトル番号に変換し、映像を番号列によって表す量子化手段と、前記番号列において部分番号列の出現回数を計数する計数手段を有し、計数手段が、前記番号列の参照列を作成する参照列作成手段と、前記参照列を一定の順序で並べ替えるソート手段と、並べ替えられた参照列において隣接する部分番号列間で一致する番号列の長さである一致長を求めるとともにその出現回数を計数する一致部分番号列計数手段と、量子化手段が、特徴量ベクトルを番号に変換するときに出力する尤度と、一致長と、出現回数に基づいて、部分番号列を並べ替えるソート手段を有する。
【００２０】
本発明の実施態様によれば、計数手段が、部分番号列に包含される部分番号列を除外して出現回数を計数する。
【００２３】
【発明の実施の形態】
次に、本発明の実施の形態について図面を参照して説明する。
【００２４】
図１を参照すると、本発明の第１の実施形態の映像構造化装置は、時間的区間に分割された映像の前記区間における映像の特徴量ベクトルを抽出する特徴量抽出部１１と、特徴量ベクトルを番号に変換し、映像を番号列によって表す量子化部１２と、番号列において部分番号列の出現回数を計数する計数部１３から構成される。
【００２５】
次に、本実施形態の動作を図２により説明する。
【００２６】
まず、特徴量抽出段階２１において、時間的区間ごとに、映像の特徴量ベクトルを抽出する。次に、量子化段階２２において、特徴量ベクトルを１次元の番号に変換する。これにより、映像は番号列によって表される。最後の計数段階２３において、部分番号列の出現回数を計数する。
【００２７】
図３を用いて、特徴量抽出段階２１の例を説明する。図３（１）〜（３）はそれぞれ、映像の特徴量の例として、フレーム画像と、音声信号と、映像に付加された付加情報の、３つを示している。フレーム画像は、画素値のＲＧＢ値を用いて、ベクトルとして表すことができる。また、音声信号は、サンプリングにより、波形の数値の列としてベクトル化できる。また、付加情報は、個々の属性値を数値として表しベクトル化する。名義尺度については適当に数字を割り当てる。例えば、テロップの有無をそれぞれ１，０として表す。
【００２８】
以上のように、フレーム画像、音声信号、付加情報を特徴量ベクトルとして表す。実際に用いる特徴量ベクトルとして、これらの特徴量ベクトルをすべて、あるいは、任意の組み合わせで連結したもの、あるいは、どれかひとつの特徴量ベクトルを単独で用いることができる。
【００２９】
なお、以上挙げた特徴量はこれに限定されるものではなく、例えば、フレーム画像の画素値として、色相、明度、彩度を用いてもよいし、音声信号の特徴量として、周波数スペクトルを用いてもよい。また。動き情報などの特徴量を用いてもよい。
【００３０】
次に、図４と表１を用いて、量子化段階２２の第１の例を説明する。量子化段階２２では、特徴量ベクトルの次元数を１次元にまで小さくする。次元数を小さくするためには、例えば、フレーム画像の特徴量ベクトルは、空間的に画素値を間引くことによって、次元数を減らすことができる。
【００３１】
【表１】

また、表１のような量子化テーブルを用いて、０〜２５５までの値域を０〜３までに縮退させ、より少ないビット数によって数値を表すことができる。例えば、図４に示すような、［３０，１５０，５０，２００］という４次元のベクトルｘは、表１の量子化テーブルによって、［０，２，１，３］というベクトルｙに変換される。さらに、ｙの要素を４進数の桁の数値とすると、３９という１次元の数値ｃに変換することができる。
【００３２】
次に、図５を用いて、量子化段階２２の第２の例を説明する。この例では、いわゆるベクトル量子化の方法を用いて、特徴量ベクトルを１次元の数値に変換する。まず、分割段階３１において、特徴量ベクトルの集合を部分集合に分割する。このとき、距離の近い特徴量ベクトルは同じ部分集合に属するようにする。部分集合に通し番号をつけ、量子化段階３２において、各特徴量ベクトルを部分集合の番号によって表す。このようにすれば、距離の近い特徴量ベクトルを同じ番号によって表すことができる。
【００３３】
次に、図６と表２〜表５を用いて、計数段階２３の第１の例を説明する。本例では、まず、参照列作成段階４１において、番号列を参照する参照列を作成する。例えば、“１２３４２１２３２”という番号列が入力されたとすると、参照列は表２のように作成される。
【００３４】
【表２】

つまり、参照列の要素ａ〜ｉは、それぞれ入力番号列の１〜９番目の要素から始まる部分番号列を参照する。例えば、ｃは、３番目の要素から始まる“３４２１２３２”という部分番号列を参照する。なお、ここではアルファベットを用いて参照列の要素を表したが、数字を用いて表してもよい。図６に戻って、次に、ソート段階４２において、参照列の個々の要素が参照する部分番号列を比較して、参照列を一定の順序で並べ替える。例えば、部分番号列を辞書順に並べ替えると、表３のようになる。
【００３５】
なお、ここでいう「辞書順」とは、部分番号列のより先頭の数字が小さい順と定義する。つまり、部分番号列ｓのｉ番目の番号をｓ（ｉ）と表すと、２つの部分番号列ａとｂの大小を次のように判定する。まず、先頭のａ（１）とｂ（１）を比較し、ａ（１）＜ｂ（１）ならばａ＜ｂと判定し、ａ（１）＞ｂ（１）ならばａ＞ｂと判定する。もしａ（１）＝ｂ（１）ならば、比較する位置をひとつ進め、ａ（２）とｂ（２）を同様に比較する。以下同様にａとｂの大小関係が判定されるまで、順番に比較する位置を進めながら比較していく。ただし、ａ、ｂのどちらかが終端に達した場合は、終端に達した部分番号列の方が大きいと判定する。例を示すと、
「１２３」＜「１２４」＜「１２」＜「２３４」
という大小関係になる。「１２４」と「１２」の比較では「１２」の方が先に終端に達するため、「１２４」＜「１２」という大小関係になっていることに注意する。
【００３６】
【表３】

次に、一致部分番号列計数段階４３では、並べ替えられた前記参照列において隣接する部分番号列間で一致する部分番号列とその出現回数を計数する。本例では、表３の行の上から２行づつ比較していき、一致する番号の長さを求める。例えば、ｆとａを比較すれば、先頭の“１２３”が一致するので、一致長は３となる。以下、ａとｅ，ｅとｇというように順番に一致長を求め、表４のように一致長の表を作成する。
【００３７】
【表４】

最後に、個々の一致長について、部分番号列と出現回数列を求める。例えば、一致長３については、表４を順番に見ると、ｆが３以上の一致長になっている。したがって、ｆが参照する部分番号列の先頭から３個目までの“１２３”を部分番号列とし、ｆとａの２回を出現回数とする。一致長２については、ｆとｇが条件を満たす。ｆについては、部分番号列が“１２”で、出現回数はｆ，ａの２回、ｇについては、部分番号列が“２３”で、出現回数はｇ，ｂの２回である。一致長１については、ｆと、ｅ，ｇ，ｂと、ｈが条件を満たす。それぞれ、ｆについては、部分番号列が“１”で、出現回数はｆ，ａの２回、ｅ，ｇ，ｂについては、部分番号列が“２”で、出現回数はｅ，ｇ，ｂ，ｉの４回、ｈについては、部分番号列が“３”で、出現回数はｈ，ｃ，の２回となる。このようにして表４を分析することによって、一致長ごとに、部分番号列と、出現回数を求めることができ、その結果、表５のような出現回数表を得ることができる。
【００３８】
【表５】

本例では、ある部分番号列に包含される部分番号列があっても出現回数が数えられてしまう。このため、ある部分番号列がＮ回出現すると、その部分番号列に包含される部分番号列は、少なくともＮ回の計数をもつことになる。例えば、表５では“１２３”という部分番号列に包含される“１２”という部分番号列が、“１２３”と同じ出現回数になっている。このように断片的な結果が数多く生成され、結果を観察するのが困難になる。そこで、ある部分番号列に包含される部分番号列を除外して計数する例を図７を用いて説明する。
【００３９】
図７の５１と５２は、それぞれ図６の４１と４２と同じである。一致長計数段階５３では、一致部分番号列計数段階４３と同様に、並べ替えられた参照列において隣接する部分番号列間で一致する部分番号列の長さを求め、参照列に記録する。表６〜表９の例を用いて説明する。表４の一致長を表６の一致長Ａ欄に記録し、一致長Ｂ欄には、Ａ欄の値か、上の行のＡ欄の値の大きい方を記録する。Ｂ欄は、Ａ欄によって示される部分番号列の長さを意味している。例えば、ｆ，ａのＢ欄は３になるが、これは、ｆのＡ欄の３によって、ｆとａが参照する部分番号列の先頭から３個が一致していることを、ｆとａそれぞれに記録することを意味する。
【００４０】
【表６】

次に、ソート段階５４で、参照列を元の順序に戻すと、表７のようになる。
【００４１】
【表７】

採否判定段階５５において、一致長を比較し、部分番号列の採否を判定する。表７では、Ｂ欄の値を上から２行づつ比較していく、下の行が上の行と等しいか上回る場合に、下の行を採用する。また、第１行目は無条件で採用する。それ以外の場合は採用しない。表７では、採用の場合を○で、不採用の場合を×で表し、Ｃ欄に記録している。次に、ソート段階５６において、ソート段階５２と同じ順序で参照列を並べ替えると、表８のようになる。
【００４２】
【表８】

一致部分番号列計数段階５７では、Ｂ欄の一致長を見ながら、部分番号列とその出現回数を計数する。このとき、Ｃ欄の採否結果を参照し、○の場合のみ計数する。したがって、一致長３の場合は、部分番号列は“１２３”で、出現回数はｆとａの２回となる。一致長２については、ｇ，ｂが該当するが、いずれもＣ欄が×になっているので計数は０となる。同様に、一致長１については、部分番号列が“２”で、出現回数はｅ，ｉの２回となる。以上の結果、表９のような出現回数表を得ることができる。
【００４３】
【表９】

さて、一般に、計数結果は膨大なものになるため、何らかの指針に沿って並べ替えて観察することが必要である。そこで、図８，９を用いて、本発明の第２の実施形態を説明する。この実施形態では、第１の実施形態に、ソート部１４（ソート段階２４）を付加して、計数結果を並べ替えることを特徴とする。
【００４４】
表１０を用いて、ソート段階２４の第１の例を説明する。ここでは、表５の計数結果の出力を出現回数の大きい順に並べ、出現回数が同じ場合は、一致長の大きい順に並べ、さらに一致長が同じ場合は、部分番号列の辞書順に並べる。なお、この例では、出現回数、一致長、部分番号列の優先順に比較したが、他の優先順に比較して並べ替えてもよい。
【００４５】
【表１０】

次に、表１１を用いて、ソート段階２４の第２の例を説明する。この例では、出現回数と一致長の積の値の大きい順に並べ替えている。積の値が等しいときは、一致長、部分番号列を用いて並べ替えている。こうすることにより、出現回数と一致長との積は、部分番号列が占める番号長を表すので、全体のデータをより多く占める順に観察することができる。
【００４６】
【表１１】

次に、図１０と表１２〜１４を参照して本発明の第３の実施形態を説明する。
【００４７】
本実施形態では、量子化部１２（量子段階）において特徴量ベクトルを１次元の番号に変換するときに尤度を出力するものとする。例えば、量子化段階の第１の例のように数値の間引きを行う場合は、間引きによって生じた誤差の逆数や、最大誤差との差の絶対値を尤度として用いることができる。また、量子化段階の第２の例のように、ベクトル量子化を用いる場合には、特徴ベクトルの部分集合の重心から各特徴量ベクトルまでの距離の逆数を尤度として用いることができる。
【００４８】
【表１２】

表１２では、例として表２〜表５と同じ番号列が表１２のような尤度と一緒に入力されるとする。表１３は表５と同じ結果であるが、部分番号列に対応する尤度の和を求めている。例えば、一致長３は、参照列ｆとａで、部分番号列“１２３”が出現しているが、それぞれの尤度和は、（８０＋６０＋９０）と（１００＋９０）となり、これらを合計して５２０という尤度和をもつ。
【００４９】
【表１３】

このようにそれぞれの部分番号列について尤度和を求め、尤度和の大きい順に並べ替えると表１４のようになる。
【００５０】
【表１４】

この例でも、第２の例と同様に、全体のデータをより多く占める順に観察することができる。さらに、量子化段階で尤度が高い場合ほど重み付けされてより上位に観察することができる。逆に、量子化誤差が大きい場合は、尤度が小さくなり下位に位置することになる。つまり、量子化の品質を結果の順位付けに利用することが可能になる。
【００５１】
図１１を参照すると、本発明の第４の実施形態の映像構造化装置は、時間的区間に分割された映像を入力する入力装置６１と、部分番号列の出現回数を出力する、ディスプレイ、プリンタなどの出力装置６２と、以上の各実施形態で説明した特徴量抽出、量子化、計数、さらにはソートの各処理をコンピュータに実行させるための映像構造化プログラムを記録した、ＦＤ，ＣＤ−ＲＯＭ、半導体メモリなどの記録媒体６３と、記録媒体６３から映像構造化プログラムを読み込んで実行するデータ処理装置６４で構成されている。
【００５２】
本発明は、その主旨を逸脱しない範囲で種々の変形が可能である。例えば、並べ変えの順序を逆にしたり、番号の代わりにアルファベットなどの記号を用いてもよい。
【００５３】
【発明の効果】
以上説明したように、本発明は、下記のような効果がある。
【００５４】
請求項１と４と７の発明は、映像データの特徴量を量子化し、番号列化し、その番号列の部分番号列の出現回数を計数するので、頻度の高い映像表現を抽出することができ、映像の構造化処理の自動化が容易になり、一般的で均質な構造化が可能で、実際のデータの性質を反映した構造化ができ、また結果を並べ替えて出力するので、結果の観察が容易である。
請求項２と５と７の発明は、映像データの特徴量を量子化し、番号列化し、その番号列の部分番号列の出現回数を計数するので、頻度の高い映像表現を抽出することができ、映像の構造化処理の自動化が容易になり、一般的で均質な構造化が可能で、実際のデータの性質を反映した構造化ができ、また量子化の品質を利用して結果を並べ替えて出力するので、意味のある結果を観察しやすい。
【００５９】
請求項３，６，７の発明は、ある部分番号列に包含され、出現回数が該部分番号列と同じ部分番号列を除外して出現回数を計数するので、断片的な映像表現を除外して結果の観察が容易になる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態の映像構造化装置の構成図である。
【図２】第１の実施形態の処理の流れ図である。
【図３】特徴量抽出の例を示す図である。
【図４】量子化段階２２の第１の例を示す図である。
【図５】量子化段階２２の第２の例を示す図である。
【図６】計数段階２３の第１の例を示す流れ図である。
【図７】計数段階２３の第２の例を示す流れ図である。
【図８】本発明の第２の実施形態の映像構造化装置の構成図である。
【図９】第２の実施形態の処理を示す流れ図である。
【図１０】本発明の第３の実施形態の映像構造化装置の構成図である。
【図１１】本発明の第４の実施形態の映像構造化装置の構成図である。
【符号の説明】
１１特徴量抽出部
１２量子化部
１３計数部
１４ソート部
２１特徴量抽出段階
２２量子化段階
２３計数段階
２４ソート段階
３１分割段階
３２量子化段階
４１参照列作成段階
４２ソート段階
４３一致部分番号列計数段階
５１参照列作成段階
５２，５４，５６ソート段階
５３一致長計数段階
５５採否判定段階
５７一致部分番号列計数段階
６１入力装置
６２出力装置
６３記録媒体
６４データ処理装置

[0001]
[Technical field of the invention]
The present invention relates to a method and an apparatus for structuring video.
[0002]
[Prior art]
For searching and displaying a video database, it is desirable to manage video by dividing it into appropriate temporal sections and structuring the divided video.
[0003]
Conventionally, frame images are first compared and the video is divided at the points where the picture changes greatly. Each temporal section obtained in this way is called a “shot”. Next, one or more shots are collected to form a layer called a “scene”. A scene is a unit obtained by dividing the video at semantic boundaries; for example, consecutive shots that share the same shooting location or the same theme are collected into one scene.
[0004]
Conventionally, a model has been used to compose scenes automatically. For example, for news video, a model is defined in which the span from one shot in which a newscaster appears to the next such shot forms a single scene. Shots containing the newscaster are then extracted by analyzing the frame images, and the scenes are cut out automatically, thereby structuring the news video.
[0005]
[Problems to be solved by the invention]
In the prior art, because the video is divided on the basis of a semantic classification of scenes, the definition of the model is arbitrary, general structuring is difficult, and the result may not reflect the properties of the actual data.
[0006]
Specifically, the prior art has the following problems. First, interpreting the meaning of video is difficult with current technology, so automation is difficult. Furthermore, when a large amount of video data is to be structured, the structuring cannot be made homogeneous. Moreover, model-based video interpretation applies only to a limited range of material and lacks generality, so it is difficult for it to reflect unknown structure contained in actual video data.
[0007]
Therefore, an object of the present invention is to provide a video structuring method and apparatus that can extract frequently occurring video expressions, facilitate the automation of general and homogeneous video structuring, and produce a structure that reflects the properties of the actual data.
[0008]
[Means for Solving the Problems]
In order to solve the above problems, a video structuring method of the present invention comprises:
a feature extraction step of extracting a feature vector for each section of a video divided into temporal sections; a quantization step of converting the feature vectors into numbers and representing the video as a number sequence; and a counting step of counting the number of occurrences of partial number sequences in the number sequence,
wherein the counting step comprises: a reference sequence creation step of creating a reference sequence for the number sequence; a sorting step of rearranging the reference sequence in a fixed order; a matching number counting step of obtaining the match length, which is the length of the number sequence that matches between adjacent partial number sequences in the rearranged reference sequence, and counting its number of occurrences; and a sorting step of rearranging the partial number sequences based on the match length and the number of occurrences.
Another video structuring method of the present invention comprises:
a feature extraction step of extracting feature vectors of a temporally segmented video; a quantization step of converting the feature vectors into numbers and representing the video as a number sequence; and a counting step of counting the number of occurrences of partial number sequences in the number sequence,
wherein the counting step comprises: a reference sequence creation step of creating a reference sequence for the number sequence; a sorting step of rearranging the reference sequence in a fixed order; a matching partial number sequence counting step of obtaining the match length, which is the length of the number sequence that matches between adjacent partial number sequences in the rearranged reference sequence, and counting its number of occurrences; and a sorting step of rearranging the partial number sequences based on the likelihood that the quantization step outputs when converting a feature vector into a number, the match length, and the number of occurrences.
[0009]
The present invention structures a video by extracting sequences of feature values that appear repeatedly in the video data. The video handled by the present invention is assumed to be input already divided into temporal sections such as shots. As a shot division method, for example, frame images can be compared and the video divided into shots at the points where the picture changes greatly.
[0010]
Because the feature values of the video data are quantized and converted into a number sequence, and the occurrences of the partial number sequences of that number sequence are counted, frequently occurring video expressions can be extracted, the video structuring process is easy to automate, general and homogeneous structuring is possible, and the structure reflects the properties of the actual data. Because the results are rearranged before output, they are easy to observe. Alternatively, because the results are rearranged using the quality of the quantization before output, meaningful results are easy to observe.
[0015]
According to an embodiment of the present invention, the counting step excludes any partial number sequence that is contained in another partial number sequence and has the same number of occurrences as that sequence. Because fragmentary video expressions are excluded, the results are easier to observe.
[0019]
A video structuring apparatus of the present invention comprises: feature extraction means for extracting a feature vector for each section of a video divided into temporal sections; quantization means for converting the feature vectors into numbers and representing the video as a number sequence; and counting means for counting the number of occurrences of partial number sequences in the number sequence, wherein the counting means comprises: reference sequence creation means for creating a reference sequence for the number sequence; sorting means for rearranging the reference sequence in a fixed order; matching partial number sequence counting means for obtaining the match length, which is the length of the number sequence that matches between adjacent partial number sequences in the rearranged reference sequence, and counting its number of occurrences; and sorting means for rearranging the partial number sequences based on the match length and the number of occurrences.
Another video structuring apparatus of the present invention comprises: feature extraction means for extracting feature vectors of a video in temporal sections; quantization means for converting the feature vectors into numbers and representing the video as a number sequence; and counting means for counting the number of occurrences of partial number sequences in the number sequence, wherein the counting means comprises: reference sequence creation means for creating a reference sequence for the number sequence; sorting means for rearranging the reference sequence in a fixed order; matching partial number sequence counting means for obtaining the match length, which is the length of the number sequence that matches between adjacent partial number sequences in the rearranged reference sequence, and counting its number of occurrences; and sorting means for rearranging the partial number sequences based on the likelihood that the quantization means outputs when converting a feature vector into a number, the match length, and the number of occurrences.
[0020]
According to an embodiment of the present invention, the counting means counts the number of occurrences while excluding partial number sequences that are contained in another partial number sequence.
[0023]
[Embodiments of the invention]
Next, embodiments of the present invention will be described with reference to the drawings.
[0024]
Referring to FIG. 1, the video structuring apparatus according to the first embodiment of the present invention comprises a feature extraction unit 11 that extracts a feature vector for each section of a video divided into temporal sections, a quantization unit 12 that converts the feature vectors into numbers and represents the video as a number sequence, and a counting unit 13 that counts the number of occurrences of partial number sequences in the number sequence.
[0025]
Next, the operation of the present embodiment will be described with reference to FIG. 2.
[0026]
First, in the feature extraction step 21, a feature vector of the video is extracted for each temporal section. Next, in the quantization step 22, each feature vector is converted into a one-dimensional number, so that the video is represented by a number sequence. Finally, in the counting step 23, the number of occurrences of partial number sequences is counted.
[0027]
An example of the feature extraction step 21 will be described with reference to FIG. 3. FIGS. 3(1) to 3(3) show three examples of video features: a frame image, an audio signal, and additional information attached to the video. A frame image can be represented as a vector using the RGB values of its pixels. An audio signal can be vectorized by sampling, as a sequence of waveform values. Additional information is vectorized by expressing each attribute value as a number. For nominal scales, numbers are assigned as appropriate; for example, the presence or absence of a caption (telop) is represented as 1 or 0, respectively.
[0028]
As described above, the frame image, the audio signal, and the additional information are each represented as a feature vector. The feature vector actually used may be all of these feature vectors concatenated, any combination of them concatenated, or any single one of them on its own.
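As a minimal sketch of the concatenation described above, the following builds one flat feature vector per shot. The function name `feature_vector` and its arguments are illustrative assumptions; the patent does not prescribe a particular layout.

```python
def feature_vector(frame_rgb, audio_samples, has_caption):
    """Concatenate per-shot features into one flat vector (hypothetical layout).

    frame_rgb     : list of (R, G, B) pixel tuples from a representative frame
    audio_samples : sequence of sampled waveform values
    has_caption   : nominal attribute, encoded as 1 (present) or 0 (absent)
    """
    pixels = [channel for pixel in frame_rgb for channel in pixel]
    return pixels + list(audio_samples) + [1 if has_caption else 0]


vec = feature_vector([(255, 0, 0), (0, 255, 0)], [0.1, -0.2], True)
```

Any subset of the three parts could be used instead, as the text notes; only the encoding of the caption flag as 1 or 0 is taken from the description.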
[0029]
The features are not limited to those listed above. For example, hue, lightness, and saturation may be used as the pixel values of a frame image, and a frequency spectrum may be used as the feature of an audio signal. Features such as motion information may also be used.
[0030]
Next, a first example of the quantization step 22 will be described with reference to FIG. 4 and Table 1. In the quantization step 22, the dimensionality of the feature vector is reduced to one. To reduce the dimensionality, for example, the feature vector of a frame image can be shortened by spatially thinning out pixel values.
[0031]
[Table 1]

In addition, a quantization table such as Table 1 can be used to collapse the value range 0 to 255 into the range 0 to 3, so that each value is represented with fewer bits. For example, the four-dimensional vector x = [30, 150, 50, 200] shown in FIG. 4 is converted by the quantization table of Table 1 into the vector y = [0, 2, 1, 3]. Furthermore, if the elements of y are treated as the digits of a base-4 number, y can be converted into the one-dimensional value c = 39.
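The two steps of this first example (table lookup, then reading the result as base-4 digits) can be sketched as follows. The bin boundaries in `THRESHOLDS` are an assumption chosen only so that the example vector maps to [0, 2, 1, 3]; the actual contents of Table 1 are not reproduced here.

```python
from bisect import bisect_right

# Hypothetical bin boundaries standing in for Table 1:
# v < 40 -> 0, 40 <= v < 100 -> 1, 100 <= v < 180 -> 2, v >= 180 -> 3
THRESHOLDS = [40, 100, 180]


def quantize(x, thresholds=THRESHOLDS):
    """Map each component (0 to 255) to a level 0..len(thresholds)."""
    return [bisect_right(thresholds, v) for v in x]


def pack(y, base=4):
    """Read the levels as digits of a base-`base` number, most significant first."""
    c = 0
    for digit in y:
        c = c * base + digit
    return c


x = [30, 150, 50, 200]
y = quantize(x)  # [0, 2, 1, 3]
c = pack(y)      # 0*64 + 2*16 + 1*4 + 3 = 39
```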
[0032]
Next, a second example of the quantization step 22 will be described with reference to FIG. 5. In this example, the feature vector is converted into a one-dimensional value by so-called vector quantization. First, in the division step 31, the set of feature vectors is divided into subsets such that feature vectors that are close to one another belong to the same subset. The subsets are given serial numbers, and in the quantization step 32 each feature vector is represented by the number of its subset. In this way, feature vectors that are close to one another are represented by the same number.
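A minimal sketch of the numbering in step 32, assuming the division step 31 has already produced one representative point (centroid) per subset, for instance by a clustering method such as k-means. The centroids and shot vectors below are made-up values, and Euclidean distance is an assumption; the text only requires that nearby vectors share a subset.

```python
import math


def nearest_centroid(v, centroids):
    """Return the index of the centroid closest to v (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))


def vector_quantize(vectors, centroids):
    """Represent each feature vector by the number of its nearest subset."""
    return [nearest_centroid(v, centroids) for v in vectors]


centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]        # hypothetical subsets
shots = [(1.0, 0.5), (9.0, 11.0), (0.5, 9.0), (0.0, 1.0)]  # hypothetical features
numbers = vector_quantize(shots, centroids)
```

The first and last shots fall in the same subset and therefore receive the same number, which is what allows repeated patterns to be found later.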
[0033]
Next, a first example of the counting step 23 will be described with reference to FIG. 6 and Tables 2 to 5. In this example, first, in the reference sequence creation step 41, a reference sequence that refers to the number sequence is created. For example, if the number sequence “123421232” is input, the reference sequence is created as shown in Table 2.
[0034]
[Table 2]

That is, the elements a to i of the reference sequence refer to the partial number sequences that start at the first to ninth elements of the input number sequence, respectively. For example, c refers to the partial number sequence “3421232”, which starts at the third element. Although letters are used here to denote the elements of the reference sequence, numbers may be used instead. Returning to FIG. 6, next, in the sorting step 42, the partial number sequences referred to by the individual elements of the reference sequence are compared and the reference sequence is rearranged in a fixed order. For example, rearranging the partial number sequences in dictionary order gives Table 3.
[0035]
Here, “dictionary order” is defined as ascending order of the numbers taken from the head of the partial number sequence. That is, writing the i-th number of a partial number sequence s as s(i), the order of two partial number sequences a and b is determined as follows. First, a(1) and b(1) are compared: if a(1) < b(1), then a < b; if a(1) > b(1), then a > b. If a(1) = b(1), the comparison position is advanced by one and a(2) and b(2) are compared in the same way. The comparison continues in this manner, advancing the position each time, until the order of a and b is determined. However, if either a or b reaches its end, the partial number sequence that has reached its end is judged to be the larger. For example,
“123” < “124” < “12” < “234”
holds. Note that in the comparison of “124” and “12”, “12” reaches its end first, so the order is “124” < “12”.
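The reference sequence and the ordering rule above can be sketched as follows. The helper names are illustrative; positions are 0-based in the code, and the letters a to i are attached only for comparison with the text. Note that the rule differs from the usual string order, in which a prefix sorts first.

```python
from functools import cmp_to_key


def compare(a, b):
    """Ordering defined in the text: compare number by number; a sequence
    that reaches its end first is treated as the larger one."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # One sequence is a prefix of the other: the shorter one is larger.
    return len(b) - len(a)


def sorted_references(seq):
    """Return start positions ordered by the partial number sequence
    (suffix) that each position refers to."""
    return sorted(range(len(seq)),
                  key=cmp_to_key(lambda i, j: compare(seq[i:], seq[j:])))


seq = "123421232"
order = sorted_references(seq)
labels = "".join("abcdefghi"[i] for i in order)
example = sorted(["234", "12", "124", "123"], key=cmp_to_key(compare))
```

For the example input this yields the row order f, a, e, g, b, i, h, c, d, which agrees with the pairs (f and a, a and e, e and g) that the description compares next.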
[0036]
[Table 3]

Next, in the matching partial number sequence counting step 43, the partial number sequences that match between adjacent partial number sequences in the rearranged reference sequence are found and their numbers of occurrences are counted. In this example, the rows of Table 3 are compared two at a time from the top, and the length over which the numbers match is obtained. For example, comparing f and a, the leading “123” matches, so the match length is 3. The match lengths are then obtained in order for a and e, e and g, and so on, and a table of match lengths is created as shown in Table 4.
[0037]
[Table 4]

Finally, for each match length, the partial number sequences and their numbers of occurrences are obtained. For example, for match length 3, scanning Table 4 in order shows that f has a match length of 3 or more. Therefore the first three numbers, “123”, of the partial number sequence referred to by f are taken as the partial number sequence, and its number of occurrences is 2 (f and a). For match length 2, f and g satisfy the condition. For f, the partial number sequence is “12” and it occurs twice (f, a); for g, the partial number sequence is “23” and it occurs twice (g, b). For match length 1, f; e, g, b; and h satisfy the condition. For f, the partial number sequence is “1” and it occurs twice (f, a); for e, g, b, the partial number sequence is “2” and it occurs four times (e, g, b, i); for h, the partial number sequence is “3” and it occurs twice (h, c). By analyzing Table 4 in this way, the partial number sequences and their numbers of occurrences can be obtained for each match length, yielding an occurrence table such as Table 5.
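The counting procedure of step 43 can be sketched as below: compute the match length between each row and the next, then, for each length, treat every run of rows whose match length is at least that length as one repeated partial number sequence. This is one possible reading of the description; the function names are assumptions.

```python
from functools import cmp_to_key


def compare(a, b):
    """Order from the text: a sequence that ends first is the larger one."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return len(b) - len(a)


def match_length(a, b):
    """Number of leading elements on which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_repeats(seq):
    order = sorted(range(len(seq)),
                   key=cmp_to_key(lambda i, j: compare(seq[i:], seq[j:])))
    rows = [seq[i:] for i in order]
    # lcp[k]: match length between row k and row k + 1 (0 for the last row)
    lcp = [match_length(rows[k], rows[k + 1]) for k in range(len(rows) - 1)] + [0]
    counts = {}
    for length in range(max(lcp), 0, -1):
        k = 0
        while k < len(lcp):
            if lcp[k] < length:
                k += 1
                continue
            start = k
            while k < len(lcp) and lcp[k] >= length:
                k += 1
            # rows start..k (inclusive) share their first `length` numbers
            counts[rows[start][:length]] = k - start + 1
    return lcp, counts


lcp, counts = count_repeats("123421232")
```

For the example input, `lcp` lists the match lengths in the row order f, a, e, g, b, i, h, c, d, and `counts` contains the six partial number sequences and occurrence counts enumerated in the paragraph above.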
[0038]
[Table 5]

In this example, a partial number sequence is counted even when it is contained in another partial number sequence. Consequently, when a partial number sequence appears N times, every partial number sequence contained in it has a count of at least N. For example, in Table 5 the partial number sequence “12”, which is contained in “123”, has the same number of occurrences as “123”. Many fragmentary results of this kind are generated, which makes the results difficult to observe. An example in which partial number sequences contained in another partial number sequence are excluded from the count is therefore described with reference to FIG. 7.
[0039]
Steps 51 and 52 in FIG. 7 are the same as steps 41 and 42 in FIG. 6, respectively. In the match length counting step 53, as in the matching partial number sequence counting step 43, the length of the partial number sequence that matches between adjacent partial number sequences in the rearranged reference sequence is obtained and recorded in the reference sequence. This is explained using the example of Tables 6 to 9. The match lengths of Table 4 are recorded in the match length A column of Table 6, and the match length B column records the larger of the row's own A value and the A value of the row above. The B column gives the length of the partial number sequence indicated by the A column. For example, the B column is 3 for both f and a: the 3 in the A column of f shows that the first three numbers of the partial number sequences referred to by f and a match, and this fact is recorded for f and a individually.
[0040]
[Table 6]

Next, in the sorting step 54, the reference sequence is returned to its original order, giving Table 7.
[0041]
[Table 7]

In the acceptance determination step 55, the match lengths are compared to decide whether each partial number sequence is accepted or rejected. In Table 7, the values in the B column are compared two rows at a time from the top, and the lower row is accepted if its value is equal to or greater than that of the upper row. The first row is accepted unconditionally. In all other cases the row is rejected. In Table 7, acceptance is marked ○ and rejection ×, and the result is recorded in the C column. Next, in the sorting step 56, the reference sequence is rearranged in the same order as in the sorting step 52, giving Table 8.
[0042]
[Table 8]

In the matching partial number sequence counting step 57, the partial number sequences and their numbers of occurrences are counted by looking at the match lengths in the B column. At this point the acceptance result in the C column is consulted, and a row is counted only if it is marked ○. Thus, for match length 3, the partial number sequence is “123” and it occurs twice (f and a). For match length 2, g and b qualify, but both are marked × in the C column, so the count is 0. Similarly, for match length 1, the partial number sequence is “2” and it occurs twice (e and i). As a result, an occurrence table such as Table 9 is obtained.
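Steps 51 to 57 can be sketched as follows, under the reading that a row contributes the first B numbers of its partial number sequence when it is accepted. The column names A, B, and C follow the text; everything else (function names, 0-based positions) is an assumption.

```python
from functools import cmp_to_key


def compare(a, b):
    """Order from the text: a sequence that ends first is the larger one."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return len(b) - len(a)


def match_length(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def count_excluding_fragments(seq):
    n = len(seq)
    # Steps 51-52: reference sequence, sorted by the suffix it refers to.
    order = sorted(range(n),
                   key=cmp_to_key(lambda i, j: compare(seq[i:], seq[j:])))
    # Step 53: column A (match with the next row) and column B.
    col_a = [match_length(seq[order[k]:], seq[order[k + 1]:])
             for k in range(n - 1)] + [0]
    col_b = [max(col_a[k], col_a[k - 1] if k > 0 else 0) for k in range(n)]
    # Step 54: back to the original order; b_at[i] is B for start position i.
    b_at = [0] * n
    for k, i in enumerate(order):
        b_at[i] = col_b[k]
    # Step 55: column C. The first row is accepted; any other row is
    # accepted only if its B is not smaller than B of the row above.
    accepted = [True] + [b_at[i] >= b_at[i - 1] for i in range(1, n)]
    # Steps 56-57: walk the rows in sorted order and count accepted ones.
    counts = {}
    for i in order:
        if accepted[i] and b_at[i] > 0:
            key = seq[i:i + b_at[i]]
            counts[key] = counts.get(key, 0) + 1
    return counts


result = count_excluding_fragments("123421232")
```

For the example input only “123” (twice) and “2” (twice) remain, matching the counts stated above; fragments such as “12” and “23”, which occur only inside “123”, are dropped.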
[0043]
[Table 9]

In general, the counting results are enormous, so they need to be rearranged according to some criterion before they are examined. A second embodiment of the present invention is therefore described with reference to FIGS. 8 and 9. This embodiment is characterized in that a sorting unit 14 (sorting step 24) is added to the first embodiment to rearrange the counting results.
[0044]
A first example of the sorting step 24 will be described with reference to Table 10. Here, the counting results of Table 5 are output in descending order of the number of occurrences; entries with the same number of occurrences are arranged in descending order of match length, and entries with the same match length are arranged in dictionary order of the partial number sequence. In this example the comparison priority is number of occurrences, match length, then partial number sequence, but the entries may be sorted using a different priority.
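This three-level ordering maps directly onto a composite sort key. The sketch below uses the partial number sequences and counts stated in the description of the first counting example; the variable names are illustrative.

```python
# Partial number sequences and occurrence counts from the first counting example.
table5 = {"123": 2, "12": 2, "23": 2, "1": 2, "2": 4, "3": 2}

# Priority: occurrences (descending), match length (descending), then the
# partial number sequence itself. The last key only compares sequences of
# equal length, so plain string order coincides with the document's
# dictionary order here.
ranked = sorted(table5.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
```

Changing the order of the tuple elements gives the alternative priorities that the text allows.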
[0045]
[Table 10]

Next, a second example of the sorting step 24 will be described with reference to Table 11. In this example, the entries are rearranged in descending order of the product of the number of occurrences and the match length. When the products are equal, the match length and the partial number sequence are used to order the entries. Because the product of the number of occurrences and the match length represents the total length of the number sequence occupied by the partial number sequence, the results can be examined in order of how much of the whole data they account for.
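A short sketch of this second ordering, again using the counts from the first counting example. The tie-break direction (longer match length first) is an assumption carried over from the first sorting example; the text only says that match length and partial number sequence are used.

```python
table5 = {"123": 2, "12": 2, "23": 2, "1": 2, "2": 4, "3": 2}


def coverage(item):
    """Occurrences times match length: how many positions the sequence covers."""
    sub, count = item
    return count * len(sub)


ranked = sorted(table5.items(),
                key=lambda kv: (-coverage(kv), -len(kv[0]), kv[0]))
order = [sub for sub, _ in ranked]
```

Here “123” covers 6 positions and comes first, while “2”, despite occurring four times, ties with “12” and “23” at 4 positions and is placed after them by the length tie-break.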
[0046]
[Table 11]

Next, a third embodiment of the present invention will be described with reference to FIG. 10 and Tables 12 to 14.
[0047]
In the present embodiment, the quantization unit 12 (quantization step) outputs a likelihood when it converts a feature vector into a one-dimensional number. For example, when values are thinned out as in the first example of the quantization step, the reciprocal of the error introduced by the thinning, or the absolute value of the difference from the maximum error, can be used as the likelihood. When vector quantization is used as in the second example of the quantization step, the reciprocal of the distance from the centroid of the subset to each feature vector can be used as the likelihood.
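For the vector quantization case, a likelihood of this kind might be computed as in the sketch below. The small constant `eps` is an added assumption, used only to avoid division by zero when a vector coincides with a centroid; the centroids and the test vector are made-up values.

```python
import math


def quantize_with_likelihood(v, centroids, eps=1e-9):
    """Return (subset number, likelihood) for feature vector v.

    The likelihood is the reciprocal of the Euclidean distance to the
    nearest centroid; eps is a hypothetical guard against a zero distance.
    """
    k = min(range(len(centroids)), key=lambda j: math.dist(v, centroids[j]))
    distance = math.dist(v, centroids[k])
    return k, 1.0 / (distance + eps)


number, likelihood = quantize_with_likelihood((3.0, 4.0),
                                              [(0.0, 0.0), (100.0, 100.0)])
```

A vector lying close to its centroid receives a high likelihood, and one near the edge of its subset a low likelihood, which is the property the ranking in this embodiment relies on.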
[0048]
[Table 12]

In Table 12, as an example, the same number sequence as in Tables 2 to 5 is assumed to be input together with the likelihoods shown. Table 13 gives the same result as Table 5 but additionally shows the sum of the likelihoods corresponding to each partial number sequence. For example, for match length 3, the partial number sequence “123” appears at reference elements f and a; the respective likelihood sums are (80 + 60 + 90) and (100 + 90), and adding these gives a likelihood sum of 520.
[0049]
[Table 13]

When the likelihood sum is obtained in this way for each partial number sequence and the entries are rearranged in descending order of likelihood sum, Table 14 is obtained.
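The likelihood-sum ranking can be sketched as follows. The per-position likelihoods are largely assumptions, since Table 12 is not reproduced here: the values 80, 60, 90 for positions 6 to 8 and 100 and 90 for positions 1 and 3 are the ones quoted in the description, position 2 is set to 100 so that “123” totals the 520 given in the Japanese original, and the remaining values are made up.

```python
def occurrences(seq, sub):
    """All start positions of `sub` in `seq`, overlapping matches included."""
    return [i for i in range(len(seq) - len(sub) + 1) if seq.startswith(sub, i)]


def likelihood_sum(seq, likelihoods, sub):
    """Sum the likelihoods of every position covered by an occurrence of `sub`."""
    return sum(sum(likelihoods[i:i + len(sub)]) for i in occurrences(seq, sub))


seq = "123421232"
# Hypothetical stand-in for Table 12 (one likelihood per position).
likelihoods = [100, 100, 90, 70, 50, 80, 60, 90, 40]

subs = ["123", "12", "23", "1", "2", "3"]
sums = {s: likelihood_sum(seq, likelihoods, s) for s in subs}
ranked = sorted(subs, key=lambda s: -sums[s])
```

Because a longer or more frequent partial number sequence accumulates more terms, this ranking still favors sequences that cover more of the data, while poorly quantized occurrences contribute less.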
[0050]
[Table 14]

In this example too, as in the second example, the results can be examined in order of how much of the whole data they account for. In addition, partial number sequences quantized with higher likelihood are weighted more heavily and appear higher in the ranking. Conversely, when the quantization error is large, the likelihood is small and the partial number sequence is ranked lower. In other words, the quality of the quantization can be used to rank the results.
[0051]
Referring to FIG. 11, a video structuring apparatus according to a fourth embodiment of the present invention includes an input device 61 for inputting a video divided into time sections; an output device 62, such as a display or a printer, for outputting the number of appearances of the partial number sequences; a recording medium 63, such as an FD, a CD-ROM, or a semiconductor memory, on which is recorded a video structuring program for causing a computer to execute each of the feature amount extraction, quantization, counting, and sorting processes described in the above embodiments; and a data processing device 64 that reads the video structuring program from the recording medium 63 and executes it.
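
As a rough illustration of the counting and sorting processes that such a program would execute, the following Python sketch builds references into the number sequence, sorts them, and derives the matching length and number of appearances from adjacent entries. The function name, the naive sort, and the sample input are assumptions made for this sketch; it is not the program of the embodiment.

```python
def count_repeated_sequences(numbers):
    """Count partial number sequences that appear at least twice.

    One reference (start index) is created per position and the
    references are sorted by the number sequence that starts there.
    Adjacent references that share a prefix of matching length m
    contribute one extra appearance to each prefix of length 1..m.
    """
    n = len(numbers)
    refs = sorted(range(n), key=lambda i: numbers[i:])
    counts = {}
    for a, b in zip(refs, refs[1:]):
        m = 0  # matching length between adjacent sorted references
        while a + m < n and b + m < n and numbers[a + m] == numbers[b + m]:
            m += 1
        for length in range(1, m + 1):
            key = tuple(numbers[a:a + length])
            counts[key] = counts.get(key, 1) + 1
    return counts


# Hypothetical quantized number sequence.
counts = count_repeated_sequences([1, 2, 3, 4, 1, 2, 3])

# Sort by matching length, then number of appearances (descending order
# is an assumption; the order may equally be reversed).
ranked = sorted(counts.items(), key=lambda kv: (-len(kv[0]), -kv[1], kv[0]))
```

References that share a prefix are contiguous after sorting, so a block of k such references yields k - 1 adjacent matches and the stored count becomes k.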
[0052]
The present invention can be variously modified without departing from the gist thereof. For example, the rearrangement order may be reversed, or symbols such as letters of the alphabet may be used instead of numbers.
[0053]
[Effects of the invention]
As described above, the present invention has the following effects.
[0054]
According to the first, fourth and seventh aspects of the present invention, the feature quantities of the video data are quantized and converted into a number sequence, and the number of appearances of each partial number sequence of the number sequence is counted. This makes it easy to automate the video structuring process, enables general and homogeneous structuring as well as structuring that reflects the nature of the actual data, and, because the results are rearranged before being output, makes the results easy to observe.
According to the second, fifth and seventh aspects of the present invention, the feature quantities of the video data are quantized and converted into a number sequence, and the number of appearances of each partial number sequence of the number sequence is counted. This makes it easy to automate the video structuring process, enables general and homogeneous structuring as well as structuring that reflects the nature of the actual data, and, because the results are rearranged using the quality of the quantization before being output, makes meaningful results easy to observe.
[0059]
According to the inventions of claims 3, 6 and 7, the number of appearances is counted while excluding any partial number sequence that is contained in another partial number sequence and has the same number of appearances as that partial number sequence. Fragmentary video expressions are thereby excluded, making the results easy to observe.
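
The exclusion of partial number sequences that are contained in another partial number sequence with the same number of appearances can be sketched as follows. The function names and the sample counts are hypothetical, and the sketch filters an already computed table rather than following the flow of the embodiments.

```python
def is_contained(short, long):
    """True if `short` occurs as a contiguous run inside the longer `long`."""
    k = len(short)
    if k >= len(long):
        return False
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def drop_fragments(counts):
    """Remove sequences contained in another one with the same appearances."""
    return {
        seq: c
        for seq, c in counts.items()
        if not any(
            c == other_c and is_contained(seq, other)
            for other, other_c in counts.items()
        )
    }


# Hypothetical counts: sequence -> number of appearances.
counts = {(1, 2, 3): 2, (1, 2): 3, (2, 3): 2, (1,): 3}
kept = drop_fragments(counts)
```

Here `(2, 3)` is dropped because it always appears as part of `(1, 2, 3)`, and `(1,)` is dropped because it always appears as part of `(1, 2)`, while `(1, 2)` is kept because it appears more often than `(1, 2, 3)`.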
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a video structuring apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart of a process according to the first embodiment.
FIG. 3 is a diagram illustrating an example of feature amount extraction.
FIG. 4 is a diagram illustrating a first example of the quantization step 22.
FIG. 5 shows a second example of the quantization step 22.
FIG. 6 is a flowchart showing a first example of the counting step 23.
FIG. 7 is a flowchart showing a second example of the counting step 23.
FIG. 8 is a configuration diagram of a video structuring apparatus according to a second embodiment of the present invention.
FIG. 9 is a flowchart illustrating processing according to the second embodiment.
FIG. 10 is a configuration diagram of a video structuring apparatus according to a third embodiment of the present invention.
FIG. 11 is a configuration diagram of a video structuring apparatus according to a fourth embodiment of the present invention.
[Explanation of symbols]
11 Feature amount extraction part
12 Quantization part
13 Counting part
14 Sorting part
21 Feature amount extraction stage
22 Quantization stage
23 Counting stage
24 Sort stage
31 Division stage
32 Quantization stage
41 Reference sequence creation stage
42 Sort stage
43 Matching part number sequence counting step
51 Reference string creation step
52, 54, 56 Sorting step
53 Match length counting step
55 Adoption decision step
57 Matching part number string counting step
61 Input device
62 Output device
63 Recording medium
64 Data processing device

Claims

A video structuring method for structuring a video divided into temporal sections, comprising:
a feature amount extraction step of extracting a feature amount vector of the video in each section;
a quantization step of converting the feature amount vector into a number and representing the video by a number sequence; and
a counting step of counting the number of appearances of partial number sequences in the number sequence,
wherein the counting step includes:
a reference sequence creation step of creating a reference sequence of the number sequence;
a sorting step of sorting the reference sequence in a certain order;
a matching partial number sequence counting step of determining a matching length, which is the length of the number sequence that matches between adjacent partial number sequences in the sorted reference sequence, and counting the number of appearances thereof; and
a sorting step of sorting the partial number sequences based on the matching length and the number of appearances.

A video structuring method for structuring a video divided into temporal sections, comprising:
a feature amount extraction step of extracting a feature amount vector of the video in each section;
a quantization step of converting the feature amount vector into a number and representing the video by a number sequence; and
a counting step of counting the number of appearances of partial number sequences in the number sequence,
wherein the counting step includes:
a reference sequence creation step of creating a reference sequence of the number sequence;
a sorting step of sorting the reference sequence in a certain order;
a matching partial number sequence counting step of determining a matching length, which is the length of the number sequence that matches between adjacent partial number sequences in the sorted reference sequence, and counting the number of appearances thereof; and
a sorting step of rearranging the partial number sequences based on the likelihood output when the feature amount vector is converted into a number, the matching length, and the number of appearances.

The video structuring method according to claim 1 or claim 2, wherein the counting step counts the number of appearances while excluding any partial number sequence that is contained in another partial number sequence and has the same number of appearances as that partial number sequence.

A video structuring apparatus for structuring a video divided into temporal sections, comprising:
feature amount extraction means for extracting a feature amount vector of the video in each section;
quantization means for converting the feature amount vector into a number and representing the video by a number sequence; and
counting means for counting the number of appearances of partial number sequences in the number sequence,
wherein the counting means includes:
reference sequence creation means for creating a reference sequence of the number sequence;
sorting means for rearranging the reference sequence in a certain order;
matching partial number sequence counting means for determining a matching length, which is the length of the number sequence that matches between adjacent partial number sequences in the sorted reference sequence, and counting the number of appearances thereof; and
sorting means for rearranging the partial number sequences based on the matching length and the number of appearances.

A video structuring apparatus for structuring a video divided into temporal sections, comprising:
feature amount extraction means for extracting a feature amount vector of the video in each section;
quantization means for converting the feature amount vector into a number and representing the video by a number sequence; and
counting means for counting the number of appearances of partial number sequences in the number sequence,
wherein the counting means includes:
reference sequence creation means for creating a reference sequence of the number sequence;
sorting means for rearranging the reference sequence in a certain order;
matching partial number sequence counting means for determining a matching length, which is the length of the number sequence that matches between adjacent partial number sequences in the sorted reference sequence, and counting the number of appearances thereof; and
sorting means for rearranging the partial number sequences based on a comparison result of the likelihood output when the feature amount vector is converted into a number, the matching length, and the number of appearances.

The video structuring apparatus according to claim 4 or claim 5, wherein the counting means counts the number of appearances while excluding any partial number sequence that is contained in another partial number sequence and has the same number of appearances as that partial number sequence.

A recording medium storing a video structuring program for causing a computer to execute the video structuring method according to claim 1.