JP4253410B2

JP4253410B2 - News article extraction device

Info

Publication number: JP4253410B2
Application number: JP30538099A
Authority: JP
Inventors: 敦史小野; 宏之赤木
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1999-10-27
Filing date: 1999-10-27
Publication date: 2009-04-15
Anticipated expiration: 2019-10-27
Also published as: JP2001126050A

Description

【０００１】
【発明の属する技術分野】
この発明は、ニュース映像をデータベース化する場合に検索単位となるニュース記事を映像から自動的に切り出すニュース記事切り出し装置に関する。
【０００２】
【従来の技術】
従来のニュース映像データベースにおいては、「情報処理学会誌Ｖol.37 Ｎo.9“Informedia:ＣＭＵディジタルビデオライブラリプロジェクト”」等に記載された技術によって、映像のセグメンテーションや検索のための索引付けが行なわれている。
【０００３】
【発明が解決しようとする課題】
しかしながら、上記従来の技術においては、映像のセグメンテーションについて要素技術が列挙されてはいるものの、具体的な解は開示されていない。
【０００４】
そこで、この発明の目的は、ニュース映像からニュース映像の特性を用いてニュース記事を切り出すことができるニュース記事切り出し装置を提供することにある。
【０００５】
【課題を解決するための手段】
上記目的を達成するため、この発明のニュース記事切り出し装置は、映像入力手段と、上記映像入力手段によって入力されたニュース映像を音声と動画とに分離する映像分離手段と、上記映像分離手段によって分離された動画から,動画の変化点であるカット点画像を検出するカット点画像検出手段と、上記分離された動画からテロップを検出するテロップ検出手段と、上記分離された動画から顔の画像を検出する顔検出手段と、上記カット点画像のうち,上記テロップ検出手段によって検出されたテロップの直前に位置して顔が映っているカット点画像間の類似度を算出する類似度算出手段と、上記映像分離手段によって分離された音声から無音部分を検出する無音検出手段と、上記テロップの直前に位置して顔が映っているカット点画像のうち類似度の高いカット点画像を選出し,この選出カット点画像近傍に在る無音部分の間を記事として切り出す記事切り出し手段を備えたことを特徴としている。
【０００６】
上記構成によれば、映像入力手段によって入力されたニュース映像が、映像分離手段によって動画部分と音声部分に分離される。そして、カット点画像検出手段によって、上記分離された動画から、動画の変化点であるカット点画像が検出される。また、テロップ検出手段によって、上記分離された動画からテロップが検出される。また、顔検出手段によって、上記分離された動画から顔の画像が検出される。また、無音検出手段によって、上記分離された音声から無音部分が検出される。さらに、類似度算出手段によって、上記カット点画像のうち、上記テロップ検出手段によって検出されたテロップの直前に位置して顔が映っているカット点画像間の類似度が算出される。そうすると、記事切り出し手段によって、上記テロップの直前に位置して顔が映っているカット点画像のうち類似度の高いカット点画像が選出され、この選出カット点画像近傍に在る無音部分の間が記事として切り出される。
【０００７】
また、上記ニュース記事切り出し装置は、上記記事切り出し手段によって記事を切り出すに先立って、上記カット点画像検出手段によって検出された各カット点画像間のうち、コマーシャルメッセージ(ＣＭ)に該当するカット点画像間を検出して除去するＣＭ除去手段を備えることが望ましい。
【０００８】
上記構成によれば、ニュース記事が切り出されるに先立って、ＣＭ除去手段によって、上記検出された各カット点画像間のうち、ＣＭに該当するカット点画像間が検出されて除去される。したがって、以後に行われる上記記事切り出し手段による記事切り出しの際には、上記ＣＭの区間は除外されてニュース記事のみが切り出される。
【０００９】
【発明の実施の形態】
以下、この発明を図示の実施の形態により詳細に説明する。図１は、本実施の形態のニュース記事切り出し装置のブロック図である。
【００１０】
Ａ/Ｄ変換部１は、入力された映像をＡ/Ｄ変換してデジタル化する。映像分離部２は、Ａ/Ｄ変換部１によってデジタル化された映像を動画像部分と音声部分に分離する。こうして分離された動画像データと音声データはメモリ３に保存される。こうしてメモリ３に保存された動画像は、動画像解析部４によって解析される。また、メモリ３に保存された音声は、音声解析部５によって解析される。そして、動画像および音声の夫々の解析結果はメモリ３に格納される。
【００１１】
解析結果統合部６は、上記動画像解析部４および音声解析部５による解析結果をメモリ３から読み出して統合し、後に詳述するようにしてニュース記事を切り出す。こうして切り出されたニュース記事は、映像蓄積部７に蓄積されるのである。
【００１２】
図２は、図１に示すニュース記事切り出し装置によって実行されるニュース記事切り出し手順の概略を示すフローチャートである。以下、図２に従って、ニュース記事切り出し手順について説明する。
【００１３】
先ず、ステップＳ1で、上記映像分離部２によって、Ａ/Ｄ変換部１からのデジタル映像データが動画像データと音声データとに分離される。ステップＳ2で、動画像解析部４によって、上記分離された動画像データに基づいて動画像が解析される。尚、この動画像解析によって、後に詳述するように、動画の変化点であるカット点画像や、テロップが映っているフレーム(テロップフレーム)や、人物の顔が映っているフレームが検出される。
【００１４】
ステップＳ3で、上記音声解析部５によって、上記分離された音声データに基づいて音声が解析される。尚、この音声解析によって、後に詳述するように、無音区間が検出される。ステップＳ4で、解析結果統合部６によって、動画像解析結果と音声解析結果とが統合されてニュース記事の切り出しが行われる。
【００１５】
図３は、図２に示すニュース記事切り出し手順におけるステップＳ2の動画像解析時に行われるカット点画像検出処理動作のフローチャートである。先ず、ステップＳ11で、フレーム数frameが「０」に初期化される。ステップＳ12で、直前フレームの色相ヒストグラムhist2および現フレームの色相ヒストグラムhist1が「０」に初期化される。ステップＳ13で、上記メモリ３に格納されている動画像データから処理すべきフレームデータが読み出される。ステップＳ14で、色相ヒストグラムhist1が生成されて更新される。尚、色相ヒストグラムhist1の生成については後に詳述する。ステップＳ15で、フレーム数frameがインクリメントされる。
【００１６】
ステップＳ16で、frame＝１であるか否かが判別される。その結果、frame＝１であればステップＳ21に進む一方、そうでなければステップＳ17に進む。ステップＳ17で、直前フレームの色相ヒストグラムhist2と現フレームの色相ヒストグラムhist1との差分Ｄが求められる。尚、差分Ｄは式(１)によって算出するが、他の計算方法によって算出しても構わない。

ステップＳ18で、差分Ｄが閾値ＴＨより小さいか否かが判別される。その結果、上記閾値ＴＨより小さければステップＳ19に進み、閾値ＴＨ以上であればステップＳ20に進む。ステップＳ19で、直前フレームからの色相ヒストグラムの変化量が小さいために現フレームは動画の変化点とは見なされず、非カット点画像であると判定される。ステップＳ20で、現フレームはカット点画像であると判定される。そして、例えばカット点画像の位置を表わすカット点画像テーブルに登録される。ステップＳ21で、直前フレームの色相ヒストグラムhist2が、現フレームの色相ヒストグラムhist1で更新される。ステップＳ22で、メモリ３に未処理のフレームデータが在るか否かが判別される。その結果、在ればステップＳ13に戻り、次のフレームの処理に移行する。なければカット点画像検出処理動作を終了する。
【００１７】
このように、本実施の形態においては、現フレームの色相ヒストグラムhist1が直前フレームの色相ヒストグラムhist2に対して閾値ＴＨ以上変化した場合には、現フレームは動画の変化点であると見なし、現フレームをカット点画像として検出するのである。
【００１８】
図４は、図３示すカット点画像検出処理動作の上記ステップＳ14において実行される色相ヒストグラム生成処理動作のフローチャートである。先ず、ステップＳ31で、図３示すカット点画像検出処理動作の上記ステップＳ13においてメモリ３から取り込まれたフレームデータから、１画素の画素値Ｒ,Ｇ,Ｂが読み出される。ステップＳ32で、式(２)によって座標変換が行われる。

ステップＳ33で、式(３)によってヒストグラムのインクリメントが行なわれる。
hist[ｉ]＝hist[ｉ]＋１ … (３)
ｉ＝Ｈ/Ｈ_ＱＵＡＮＴ
但し、Ｈ_ＱＵＡＮＴ：色相の量子化定数
ステップＳ34で、当該フレームデータに未処理画素が在るか否かが判別される。その結果、在ればステップＳ31に戻って次の画素値Ｒ,Ｇ,Ｂの処理に移行する。なければ色相ヒストグラム生成処理動作を終了する。
【００１９】
図５は、図２示すニュース記事切り出し手順におけるステップＳ2の動画像解析時に行われるテロップフレーム検出処理動作のフローチャートである。以下の説明においては、横書きのテロップに関する検出方法を例に説明するが、縦書きのテロップを検出する場合にはｘ軸とｙ軸とを入れ換えれば同様に実行できる。
【００２０】
ステップＳ41で、フレーム数frameが「０」に初期化される。ステップＳ42で、直前フレームのエッジ画像edge2および現フレームのエッジ画像edge1が「０」に初期化される。ステップＳ43で、メモリ３に格納されている動画像データから処理すべきフレームデータが読み出される。ステップＳ44で、エッジ画像edge1が生成されて更新される。尚、エッジ画像の生成については後に詳述する。ステップＳ45で、フレーム数frameがインクリメントされる。
【００２１】
ステップＳ46で、frame＝１であるか否かが判別される。その結果、frame＝１であればステップＳ55に進む一方、そうでなければステップＳ47に進む。ステップＳ47で、後述する投影ヒストグラム生成方法によって、エッジ画像edge1のｙ軸への投影ヒストグラムが生成される。ステップＳ48で、上記ステップＳ47において生成されたヒストグラムが解析されて、テロップの候補領域となる山の範囲[ｙ1,ｙ2]が閾値等に基づいて検出される。ここで、通常、テロップの周囲にはエッジが集中している。そのために、横書きの場合には、図６に示すようなｙ軸への投影ヒストグラムには山が検出される。そこで、上記ステップＳ48においては、ｙ軸への投影ヒストグラムの山を検出してテロップの候補領域とするのである。次に、ステップＳ49で、上記ステップＳ48における山の範囲の検出結果に基づいて、山が在るか否かが判別される。その結果、山が在ればテロップの候補領域は在りとしてステップＳ50に進む一方、山がなければテロップの候補領域は無しとして上記ステップＳ55に進む。
【００２２】
ステップＳ50で、上記ｙ1からｙ2までの範囲のエッジがｘ軸に投影されてエッジ画像の投影ヒストグラムが生成される。ステップＳ51で、上記ステップＳ50において生成されたヒストグラムから、文字部分の山の範囲が閾値等に基づいて検出される。ステップＳ52で、上記ステップＳ51における山の範囲の検出結果に基づいて、山が在るか否かが判別される。その結果、山が在ればステップＳ54に進み、なければステップＳ53に進む。
【００２３】
ステップＳ53で、現フレームが非テロップフレームであると判定される。ステップＳ54で、現フレームがテロップフレームであると判定される。そして、例えばテロップフレームの位置を表わすテロップフレームテーブルに登録される。ステップＳ55で、直前フレームのエッジ画像edge2が現フレームのエッジ画像edge1で更新される。ステップＳ56で、メモリ３に未処理のフレームデータが在るか否かが判別される。その結果、在ればステップＳ43に戻り、次のフレームの処理に移行する。なければテロップフレーム検出処理動作を終了する。
【００２４】
このように、本実施の形態においては、生成した上記エッジ画像edge1のｙ軸への投影ヒストグラムに山が在り、且つ、ｘ軸への投影ヒストグラムにも山が在る場合には、現フレームにテロップ文字列が在ると判定し、現フレームをテロップフレームとして検出するのである。
【００２５】
図７は、図５示すテロップフレーム検出処理動作の上記ステップＳ44において実行されるエッジ画像生成処理動作のフローチャートである。ステップＳ61で、現フレームのエッジ画像edge1が「０」に初期化される。ステップＳ62で、図５示すテロップフレーム検出処理動作の上記ステップＳ43においてメモリ３から取り込まれたフレームデータから濃淡画像grayが生成される。ここで、濃淡画像grayとは、上記フレームデータから得られた画素値Ｒ,Ｇ,Ｂを式(２)によって座標変換を行ない、Ｖ値を画素値として表現した画像のことである。ステップＳ63で、変数Ｗに濃淡画像grayの幅の値が設定される。一方、変数Ｈには濃淡画像grayの高さの値が設定される。ステップＳ64で、変数ｉに初期値「１」が設定される。ステップＳ65で、変数ｊに初期値「１」が設定される。ステップＳ66で、水平エッジh edge[i][j]および垂直エッジv edge[i][j]が式(４)によって算出される。
h edge[i][j]＝abs(gray[i-1][j]−gray[i+1][j]) … (４)
v edge[i][j]＝abs(gray[i][j-1]−gray[i][j+1])
ここで、gray[i][j]は、濃淡画像grayにおける座標(j,i)の画素値である。
【００２６】
ステップＳ67で、変数ｊの内容がインクリメントされる。ステップＳ68で、ｊ＜(Ｗ−１)であるか否かが判別される。その結果、ｊ＜(Ｗ−１)であればステップＳ66に戻って水平エッジ及び垂直エッジの算出が続行される。一方、ｊ≧(Ｗ−１)であればステップＳ69に進む。ステップＳ69で、変数ｉがインクリメントされる。ステップＳ70で、ｉ＜(Ｈ−１)であるか否かが判別される。その結果ｉ＜(Ｈ−１)であればステップＳ65に戻って水平エッジおよび垂直エッジの算出が続行される。一方、ｉ≧(Ｈ−１)であればエッジ画像生成処理動作を終了する。
【００２７】
つまり、本実施の形態においては、１≦ｊ≦(Ｗ−１)及び１≦ｉ≦(Ｈ−１)の範囲で求めた水平方向の両隣画素のＶ値の差の絶対値であるh edge[i][j]を画素値とする画像h edgeと、１≦ｊ≦(Ｗ−１)及び１≦ｉ≦(Ｈ−１)の範囲で求めた垂直方向の両隣画素のＶ値の差の絶対値であるv edge[i][j]を画素値とする画像v edgeとをもって、上記エッジ画像edgeとするのである。
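The edge image generation of steps S61 to S70 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the grayscale image is assumed to be a list of rows of V values, the function name `edge_images` is hypothetical, and the loops follow the conditions of steps S68 and S70, so indices stop at W−2 and H−2 and border pixels stay at 0.

```python
from typing import List, Tuple

Image = List[List[int]]


def edge_images(gray: Image) -> Tuple[Image, Image]:
    """Compute h_edge and v_edge per equation (4); gray[i][j] is the pixel at (x=j, y=i)."""
    height = len(gray)       # variable H of step S63
    width = len(gray[0])     # variable W of step S63
    h_edge = [[0] * width for _ in range(height)]   # S61: initialise to 0
    v_edge = [[0] * width for _ in range(height)]
    for i in range(1, height - 1):       # S64, S69, S70: i runs while i < H-1
        for j in range(1, width - 1):    # S65, S67, S68: j runs while j < W-1
            # S66: absolute V-value differences of the two neighbours
            h_edge[i][j] = abs(gray[i - 1][j] - gray[i + 1][j])
            v_edge[i][j] = abs(gray[i][j - 1] - gray[i][j + 1])
    return h_edge, v_edge
```

For a 3x3 image only the centre pixel is computed; every border value remains 0.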
【００２８】
尚、本実施の形態においては、上述の方法によってエッジ画像edgeを生成するのであるが、それに限定されるものではなく他のエッジ検出方法を用いても差し支えない。
【００２９】
図８は、図５示すテロップフレーム検出処理動作の上記ステップＳ47あるいはステップＳ50において実行される投影ヒストグラム生成処理動作のフローチャートである。ステップＳ71で、図５示すテロップフレーム検出処理動作の上記ステップＳ44において、図７に示すエッジ画像生成処理動作に従って生成された現フレームのエッジ画像edge1(h edge1,v edge1)、および、作業バッファ等に保持されている前フレームのエッジ画像edge2(h edge2,v edge2)が入力される。
【００３０】
ステップＳ72で、投影する範囲(ｘmin,ｙmin)〜(ｘmax,ｙmax)が設定される。但し、本処理動作が図５示すテロップフレーム検出処理動作の上記ステップＳ47において呼び出された場合には、エッジ画像edge1,edge2の全体が対象となるために、投影範囲は(０,０)〜(Ｗ−１,Ｈ−１)となる。また、テロップフレーム検出処理動作の上記ステップＳ50から呼び出された場合には、投影範囲は(０,ｙ1)〜(Ｗ−１,ｙ2)となる。ステップＳ73で、ｙ軸への投影ヒストグラムｙhistおよびｘ軸への投影ヒストグラムｘhistが「０」に初期化される。ステップＳ74で、上記ステップＳ72において設定された投影範囲内の一つの画素に関して、ｙ軸への投影ヒストグラムｙhistおよびｘ軸への投影ヒストグラムｘhistが式(５)によって生成される。
ｘhist[j]＝ｘhist[j]＋Ｍin(h edge1[i][j],h edge2[i][j]) …（５）
ｙhist[i]＝ｙhist[i]＋Ｍin(v edge1[i][j],v edge2[i][j])
但し、本処理動作が、図５示すテロップフレーム検出処理動作の上記ステップＳ47において呼び出された場合には、ｙ軸への投影ヒストグラムｙhistが算出される。一方、テロップフレーム検出処理動作の上記ステップＳ50から呼び出された場合には、ｘ軸への投影ヒストグラムｘhistが算出される。ステップＳ75で、未処理画素が在るか否かが判別される。その結果、在ればステップＳ74に戻って次の画素に関する処理に移行し、なければ投影ヒストグラム生成処理動作を終了する。
【００３１】
図９は、図２に示すニュース記事切り出し手順におけるステップＳ2の動画像解析時に行われる人物の顔検出処理動作のフローチャートである。尚、本実施の形態においては、図１０に示す状態遷移モデルと呼ばれる階層構造を有するモデルの照合によって顔検出を行なっているが、ニューラルネットワークやその他の手法を用いても差し支えない。
【００３２】
ステップＳ81で、上記メモリ３から顔の検出用の画像が入力される。ステップＳ82で、上記入力された画像が、隣接する画素が類似色であるような画素の集合でなる領域に分割される。ステップＳ83で、上記分割された各領域の色,位置,形状の特徴量が抽出される。ステップＳ84で、上記各領域(領域数Ｎ)が、図１０に示す状態遷移モデルの初期状態であるcolor_segなる状態ラベルが与えられることによって初期化される。ステップＳ85で、領域番号ｉと状態が変化した領域数を表す変数changeとの夫々が、「０」に初期化される。
【００３３】
ステップＳ86で、領域[i]の特徴量と、領域[i]が遷移可能な状態へ遷移する場合に満たすべき状態遷移ルールとの照合が行なわれる。その結果、領域[i]が如何なる状態遷移ルールをも満たさない場合にはステップＳ88に進む。一方、満たす場合にはステップＳ87に進む。ステップＳ87で、領域[i]の状態ラベルが、満たしている状態遷移ルールに対応する状態の状態ラベルに更新される。そうした後、変数changeの内容がインクリメントされる。例えば、領域[i]の状態ラベルがcolor_segであり、図１０に示す状態遷移モデルを用いる場合を考えると、状態ラベルcolor_segから遷移可能な状態はskin_segおよびblack_segである。この場合、領域[i]が上記両状態に遷移するために満たすべき状態遷移ルールは、図１０において上記状態ラベルcolor_segから状態ラベルskin_segおよび状態ラベルblack_segへの矢印に設定されている「ＩsＳkin」及び「ＩsＢlack」である。すなわち、領域[i]の特徴量が状態遷移ルール「ＩsＳkin」を満たしていれば領域[i]の状態ラベルをskin_segに更新する。同様に、状態遷移ルール「ＩsＢlack」を満たしていればblack_segに更新するのである。
【００３４】
ステップＳ88で、領域番号ｉがインクリメントされる。ステップＳ89で、領域番号ｉが領域数Ｎより小さいか否かが判別される。その結果、ｉ＜ＮであればステップＳ86に戻って次の領域に対する処理に移行する。一方、ｉ≧ＮであればステップＳ90に進む。ステップＳ90で、change＝０であるか否か、つまり状態が遷移した領域が在るか否かが判別される。その結果、在ればステップＳ85に戻る。こうして、上述の処理が、状態ラベルが変化した領域が存在しなくなるまで繰り返される。
【００３５】
ステップＳ91で、総ての領域の状態ラベルをチェックすることによって、状態ラベルfaceを持つ領域が存在するか否かが判別される。その結果、存在すればステップＳ92に進み、存在しなければステップＳ93に進む。ステップＳ92で、人物の顔が検出されたとして、例えば人物の顔があるフレームの位置を表わす顔フレームテーブルに登録される。そうした後、人物の顔検出処理動作を終了する。ステップＳ93で、人物の顔は検出されなかったとして、人物の顔検出処理動作を終了する。
【００３６】
このように、本実施の形態においては、入力画像を類似色の領域に分割し、各領域の特徴量を抽出し、各領域の特徴量が図１０に示す状態遷移モデルの状態遷移ルールを満たしていれば当該領域の状態を遷移させ、この処理を総ての領域が状態遷移しなくなるまで繰り返す。そして、状態ラベルfaceを持つ領域が存在した場合には、人物の顔を検出したと判断するのである。
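The relabelling loop of steps S84 to S90 can be sketched as a fixed-point iteration. This is an illustrative sketch only: the state transition model of FIG. 10 is not reproduced in this text, so only the two transitions named above (`color_seg` to `skin_seg` and `color_seg` to `black_seg`) are modelled, and the feature keys `hue` and `value`, the predicate thresholds, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Region = Dict[str, float]
Rule = Tuple[str, str, Callable[[Region], bool]]   # (from label, to label, rule)


def is_skin(region: Region) -> bool:
    """Hypothetical stand-in for the rule IsSkin."""
    return region["hue"] < 50 and region["value"] > 0.3


def is_black(region: Region) -> bool:
    """Hypothetical stand-in for the rule IsBlack."""
    return region["value"] < 0.2


RULES: List[Rule] = [
    ("color_seg", "skin_seg", is_skin),
    ("color_seg", "black_seg", is_black),
]


def label_regions(regions: List[Region], rules: List[Rule] = RULES) -> List[str]:
    """Repeat rule matching until no region changes its state label."""
    labels = ["color_seg"] * len(regions)      # S84: initial state label
    while True:
        change = 0                             # S85
        for i, region in enumerate(regions):   # S86 to S89
            for src, dst, rule in rules:
                if labels[i] == src and rule(region):
                    labels[i] = dst            # S87: update the state label
                    change += 1
                    break
        if change == 0:                        # S90: no transition occurred
            return labels
```

With the full model, step S91 would then amount to checking whether the label `face` appears in the returned list.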
【００３７】
図１１は、図２に示すニュース記事切り出し手順におけるステップＳ3の音声解析時に行われる無音区間検出処理動作のフローチャートである。先ず、ステップＳ101で、無音区間であることを表す変数Ｓilenceが「FALSE」に初期化される。ステップＳ102で、区間[sp,ep]の長さ分の音声データが読み込まれる。ステップＳ103で、上記読み込まれた音声データから音声パワーｐが算出される。ステップＳ104で、上記音声パワーｐの分散値が式(６)によって算出される。

ステップＳ105で、上記算出された分散値Ｖarが閾値ＴＨより小さいか否かが判別される。その結果、Ｖar＜ＴＨであれば区間[sp,ep]は無音区間であると判断されステップＳ108に進む。一方、Ｖar≧ＴＨであれば無音区間ではないと判断されステップＳ106に進む。
【００３８】
ステップＳ106で、上記変数Ｓilenceが「TRUE」であるか、つまり直前の処理区間は無音区間であるか否かが判別される。その結果、「TRUE」でなければ上記ステップＳ101に戻って、同様の処理が繰り返される。一方、「TRUE」であればステップＳ107に進む。ステップＳ107で、後述するようにステップＳ109,Ｓ111において値が設定された始端「start」と終端「end」に基づいて、無音区間[start,end]が検出される。そして、無音区間の位置を表わす無音区間テーブルに登録される。そうした後、上記ステップＳ101に戻って、同様の処理が繰り返される。
【００３９】
ステップＳ108で、上記変数Ｓilenceが「TRUE」であるか否かが判定される。その結果、「TRUE」でなければ、現在の区間[sp,ep]は無音区間の開始点であるとしてステップＳ109に進む。一方、「TRUE」であれば、現在の区間[sp,ep]は直前の無音区間の継続区間であるとしてステップＳ111に進む。ステップＳ109で、無音区間の始端「start」に「sp」が設定される。ステップＳ110で、変数Ｓilenceに「TRUE」が設定される。そうした後にステップＳ112に進む。ステップＳ111で、無音区間の終端「end」に「ep」が設定される。ステップＳ112で、未処理の音声データが在るか否かが判別される。その結果、在ればステップＳ102に戻って次の音声データの処理に移行する。そして、上記ステップＳ105において「Ｖar≧ＴＨ」と判定され、上記ステップＳ106において「直前の処理区間は無音区間である」と判定されると、上記ステップＳ107において無音区間[start(＝sp),end(＝ep)]が検出されるのである。一方、未処理の音声データがなければ無音区間検出処理動作を終了する。
【００４０】
このように、本実施の形態においては、音声区間[sp,ep]におけるパワーｐの分散値Ｖarが閾値ＴＨより小さい場合には、区間[sp,ep]は無音区間であると判断する。さらに、直前区間が無音区間であれば区間[sp,ep]は上記直前の無音区間の継続区間であると判定する。一方、直前区間が無音区間でなければ区間[sp,ep]は無音区間の開始点であると判定する。そして、次に分散値Ｖarが閾値ＴＨ以上になると、無音区間[start(＝sp),end(＝ep)]を検出するのである。
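The silent section detection summarised above can be sketched as follows. This is a hedged illustration rather than the patented implementation: equation (6) is not reproduced in this text, so the ordinary population variance is assumed; the fixed window length, the function name `detect_silence`, and the sample-index representation of [sp, ep] are hypothetical. Unlike the flowchart, the sketch also sets `end` for the first silent window and flushes a silent run that reaches the end of the data.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def detect_silence(power: Sequence[float], window: int, th: float) -> List[Tuple[int, int]]:
    """Return silent sections [start, end] as sample indices of the power sequence."""
    silent_table: List[Tuple[int, int]] = []   # the "silent section table"
    silence = False                            # variable Silence (S101)
    start = end = 0
    for sp in range(0, len(power) - window + 1, window):   # S102: next section
        ep = sp + window - 1
        var = pvariance(power[sp:ep + 1])      # S104: variance of the power p
        if var < th:                           # S105: section is silent
            if not silence:
                start = sp                     # S109: silent section begins
                silence = True                 # S110
            end = ep                           # S111: extend the silent section
        elif silence:                          # S106: previous section was silent
            silent_table.append((start, end))  # S107: register [start, end]
            silence = False
    if silence:                                # assumption: flush a trailing run
        silent_table.append((start, end))
    return silent_table
```

Consecutive silent windows are merged into a single section, which matches the "continuation section" behaviour described above.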
【００４１】
図１２は、図２示すニュース記事切り出し手順のステップＳ4において、解析結果統合部６によって行われるニュース記事切り出し処理動作のフローチャートである。尚、本ニュース記事切り出し処理動作においては、動画像解析部４によって図３に示すカット点画像検出処理動作に従って検出されたカット点画像に基づいて、ニュース記事を切り出すものである。
【００４２】
ステップＳ121で、図３に示すカット点画像検出処理動作によって上記カット点画像テーブルに登録されているカット点画像の集合Ｃut:{ｃ_i|ｉ＝１,２,…,Ｎ_cut}が得られる。そして、この集合{ｃ_i}を対象として、後述するようなクラスタリングによって、第１クラスタに属するカット点画像の集合Ｃlst：{clst_i|ｉ＝１,２,…,Ｎ_clst}⊂Ｃutが得られる。ステップＳ122で、集合{clst_i}のインデックスｉが「１」に初期化される。
【００４３】
ステップＳ123で、clst_iがニュース記事の始点として設定される。ステップＳ124で、ｉがインクリメントされる。ステップＳ125で、clst_iがニュース記事の終点として設定される。こうして一つのニュース記事が切り出されるのである。ステップＳ126で、ｉが最大値「Ｎ_clst」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_clstであれば上記ステップＳ123に戻って次のニュース記事の切り出し処理に移行する。一方、ｉ≧Ｎ_clstであればニュース記事切り出し処理動作を終了する。
【００４４】
図１３は、図１２に示すニュース記事切り出し処理動作のステップＳ121において実行されるクラスタリング処理動作のフローチャートである。ステップＳ131で、総てのカット点画像間の類似度Ｓimilar(ｉ,ｊ)が算出される。ここで、ｉ,ｊは類似度を算出する２つのカット点画像の番号である。尚、本実施の形態においては、類似度Ｓimilar(ｉ,ｊ)として式(１)の逆数を用いるが、他の類似度を用いても構わない。ステップＳ132で、頻度ヒストグラムＨist[i],Ｈist[j]が「０」に初期化される。ステップＳ133で、類似度Ｓimilar(ｉ,ｊ)が閾値ＴＨより大きいか否かが判別される。その結果、Ｓimilar(ｉ,ｊ)＞ＴＨであればステップＳ134に進み、Ｓimilar(ｉ,ｊ)≦ＴＨであればステップＳ135に進む。ステップＳ134で、頻度ヒストグラムＨist[i],Ｈist[j]がインクリメントされる。ステップＳ135で、未処理の類似度Ｓimilar(ｉ,ｊ)が在るか否かが判別される。その結果、在れば上記ステップＳ133に戻って、次の類似度Ｓimilar(ｉ,ｊ)に対する処理に移行する。
【００４５】
ステップＳ136で、上記生成された頻度ヒストグラムＨist[i],Ｈist[j]に基づいて最大頻度位置Ｍaxが検出される。ステップＳ137で、現在のクラスタが空集合であるか否かが判別される。その結果、空集合であればステップＳ139に進む一方、空集合でなければステップＳ138に進む。ステップＳ138で、上記検出された最大頻度位置Ｍaxが第１クラスタに含まれるか否かが判別される。その結果、含まれていればステップＳ139に進む一方、含まれていなければクラスタリング処理動作を終了する。ステップＳ139で、総てのＳimilar(Ｍax,j)が閾値ＴＨより大きくなるようなｊが第１クラスタに追加される。ステップＳ140で、頻度ヒストグラムＨist[i]，Ｈist[j]からＭaxが除外される。そうした後、上記ステップＳ136に戻って上述の処理が繰り返され、上記ステップＳ138において最大頻度位置Ｍaxが第１クラスタに含まれていないと判別されるとクラスタリング処理動作を終了するのである。
【００４６】
一般的に、ニュース映像においては、一つのニュース記事が終了する毎に、静止しているニュースキャスタの映像に切り換り、次のニュース記事の解説等があってから次のニュース記事の映像が開始されるようになっている。つまり、各ニュース記事の間には、「静止しているニュースキャスタの映像」という非常に類似した動画の変化点が存在するのである。
【００４７】
そこで、本実施の形態においては、上述のように、上記カット点画像の集合に対して、総てのカット点画像間の類似度Ｓimilar(ｉ,ｊ)を算出し、この類似度Ｓimilar(ｉ,ｊ)が閾値ＴＨより大きい頻度を表す頻度ヒストグラムＨist[i]，Ｈist[j]の最大頻度位置Ｍaxを含むようにクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出すのである。
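The clustering of steps S131 to S140 can be sketched as follows. This is one possible reading of the flowchart, not a definitive implementation: the similarity matrix is assumed to be precomputed and symmetric, ties for the maximum frequency are broken by the smaller index, the position Max itself is added to the cluster, and the function name `first_cluster` is hypothetical.

```python
from typing import Dict, List, Set


def first_cluster(similar: List[List[float]], th: float) -> List[int]:
    """Return the indices of the cut point images in the first cluster."""
    n = len(similar)
    # S132 to S135: hist[i] counts the partners j with Similar(i, j) > TH
    hist: Dict[int, int] = {
        i: sum(1 for j in range(n) if j != i and similar[i][j] > th)
        for i in range(n)
    }
    cluster: Set[int] = set()
    while hist:
        # S136: position of maximum frequency (assumed tie-break: smaller index)
        max_pos = max(hist, key=lambda i: (hist[i], -i))
        # S137, S138: stop once the maximum lies outside a non-empty cluster
        if cluster and max_pos not in cluster:
            break
        # S139: add every j that is similar to max_pos
        cluster.add(max_pos)
        cluster.update(j for j in range(n) if j != max_pos and similar[max_pos][j] > th)
        del hist[max_pos]                      # S140: exclude Max from the histogram
    return sorted(cluster)
```

If cut point images 0, 2 and 4 are mutually similar newscaster shots and the others are not, the function returns `[0, 2, 4]`, and consecutive members then delimit the news articles.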
【００４８】
図１４は、図２示すニュース記事切り出し手順のステップＳ4における解析結果統合部６によって行われる図１２とは異なるニュース記事切り出し処理動作のフローチャートである。尚、本ニュース記事切り出し処理動作においては、上記カット点画像に加えて、動画像解析部４によって図５に示すテロップフレーム検出処理動作に従って検出されたテロップフレームに基づいて、ニュース記事を切り出すものである。
【００４９】
ステップＳ141で、図３に示すカット点画像検出処理動作によって上記カット点画像テーブルに登録されているカット点画像の集合Ｃut:{ｃ_i|ｉ＝１,２,…,Ｎ_cut}が得られる。さらに、図５に示すテロップフレーム検出処理動作によって上記テロップフレームテーブルに登録されているテロップフレームの集合Ｔelop:{ｔ_i|ｉ＝１,２,…,Ｎ_telop}が得られる。そして、テロップフレームｔ_iの直前のカット点画像がカット点画像の集合{ｃ_i}から抽出される。ステップＳ142で、図１３に示すクラスタリング処理動作によってクラスタリングが行われ、第１クラスタに属するカット点画像の集合Ｃlst：{clst_i|ｉ＝１,２,…,Ｎ_clst}⊂Ｃutが得られる。ステップＳ143で、集合｛clst_i}のインデックスｉが「１」に初期化される。
【００５０】
ステップＳ144で、clst_iがニュース記事の始点として設定される。ステップＳ145で、ｉがインクリメントされる。ステップＳ146で、clst_iがニュース記事の終点として設定される。こうして一つのニュース記事が切り出されるのである。ステップＳ147で、ｉが最大値「Ｎ_clst」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_clstであれば上記ステップＳ144に戻って次のニュース記事の切り出し処理に移行する。一方、ｉ≧Ｎ_clstであればニュース記事切り出し処理動作を終了する。
【００５１】
上述したように、ニュース映像においては、各ニュース記事の間には「静止しているニュースキャスタの映像」という類似映像が存在し、この映像がニュース映像と言う動画像全体の中の変化点となっている。また、上記ニュースキャスタの映像の直後にはテロップフレームが存在するのが常である。
【００５２】
そこで、本実施の形態においては、テロップフレームの直前に在るカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出すのである。
【００５３】
図１５は、図２示すニュース記事切り出し手順のステップＳ4において、解析結果統合部６によって行われる図１２および図１４とは異なるニュース記事切り出し処理動作のフローチャートである。尚、本ニュース記事切り出し処理動作においては、上記カット点画像およびテロップフレームに加えて、動画像解析部４によって図９に示す人物の顔検出処理動作に従って検出された人物の顔に基づいて、ニュース記事を切り出すものである。
【００５４】
ステップＳ151で、図３に示すカット点画像検出処理動作によって上記カット点画像テーブルに登録されているカット点画像の集合Ｃut:{ｃ_i|ｉ＝１,２,…,Ｎ_cut}が得られる。さらに、図５に示すテロップフレーム検出処理動作によって上記テロップフレームテーブルに登録されているテロップフレームの集合Ｔelop:{ｔ_i|ｉ＝１,２,…,Ｎ_telop}が得られる。更に、図９に示す人物の顔検出処理動作によって上記顔フレームテーブルに登録されているフレームの集合Ｆace：｛ｆ_i|ｉ＝１,２,…,Ｎ_face}⊂Ｃutが得られる。そして、テロップフレームｔ_iの直前のカット点画像であり且つ顔が検出されたカット点画像がカット点画像の集合{ｃ_i}から抽出される。
【００５５】
ステップＳ152で、図１３に示すクラスタリング処理動作によってクラスタリングが行われ、第１クラスタに属するカット点画像の集合Ｃlst：{clst_i|ｉ＝１,２,…,Ｎ_clst}⊂Ｆace⊂Ｃutが得られる。ステップＳ153で、集合｛clst_i}のインデックスｉが「１」に初期化される。
【００５６】
ステップＳ154で、clst_iがニュース記事の始点として設定される。ステップＳ155で、ｉがインクリメントされる。ステップＳ156で、clst_iがニュース記事の終点として設定される。こうして一つのニュース記事が切り出されるのである。ステップＳ157で、ｉが最大値「Ｎ_clst」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_clstであれば上記ステップＳ154に戻って次のニュース記事の切り出し処理に移行する。一方、ｉ≧Ｎ_clstであればニュース記事切り出し処理動作を終了する。
【００５７】
上述したように、ニュース映像におけるテロップフレームの直前には「静止しているニュースキャスタの映像」という類似している人物の顔の映像が存在し、この映像がニュース映像と言う動画像全体の中の変化点となっている。
【００５８】
そこで、本実施の形態においては、テロップフレームの直前に在って、且つ、人の顔が検出されたカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出すのである。
【００５９】
図１６は、図２示すニュース記事切り出し手順のステップＳ4において、解析結果統合部６によって行われる図１２,図１４および図１５とは異なるニュース記事切り出し処理動作のフローチャートである。尚、本ニュース記事切り出し処理動作においては、音声解析部５によって図１１示す無音区間検出処理動作に従って検出された無音区間に基づいて、ニュース記事を切り出すものである。
【００６０】
ステップＳ161で、図１１に示す無音区間検出処理動作によって上記無音区間テーブルに登録された上記無音区間の集合Ｓilent：{[ｓ_i,ｅ_i]|ｉ＝１,２,…,Ｎ_silent}が得られる。そして、集合{[ｓ_i,ｅ_i]}のインデックスｉが「１」に初期化されるのである。
【００６１】
ステップＳ162で、上記無音区間の終点ｅ_iがニュース記事の始点として設定される。ステップＳ163で、ｉがインクリメントされる。ステップＳ164で、無音区間の始点ｓ_iがニュース記事の終点として設定される。こうして一つのニュース記事が切り出されるのである。ステップＳ165で、ｉが最大値「Ｎ_silent」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_silentであれば上記ステップＳ162に戻って次のニュース記事の切り出し処理に移行する。一方、ｉ≧Ｎ_silentであればニュース記事切り出し処理動作を終了する。
【００６２】
上述したように、ニュース映像には「静止しているニュースキャスタの映像」が存在するのであるが、このニュースキャスタは、次のニュース記事の解説に入る前に一時的に無言状態となる。そこで、本実施の形態においては、無音区間の間を一つのニュース記事として切り出すのである。
【００６３】
図１７は、図２示すニュース記事切り出し手順のステップＳ4において、解析結果統合部６によって行われる図１２および図１４〜図１６とは異なるニュース記事切り出し処理動作のフローチャートである。尚、本ニュース記事切り出し処理動作においては、上記カット点画像,テロップフレーム,人物の顔および無音区間に基づいて、ニュース記事を切り出すものである。
【００６４】
ステップＳ171で、図３に示すカット点画像検出処理動作によって上記カット点画像テーブルに登録されているカット点画像の集合Ｃut:{ｃ_i|ｉ＝１,２,…,Ｎ_cut}が得られる。さらに、図５に示すテロップフレーム検出処理動作によって上記テロップフレームテーブルに登録されているテロップフレームの集合Ｔelop:{ｔ_i|ｉ＝１,２,…,Ｎ_telop}が得られる。更に、図９に示す人物の顔検出処理動作によって上記顔フレームテーブルに登録されたフレームの集合Ｆace:{ｆ_i|ｉ＝１,２,…,Ｎ_face}⊂Ｃutが得られる。更に、図１１に示す無音区間検出処理動作によって上記無音区間テーブルに登録された無音区間の集合Ｓilent：{[ｓ_i,ｅ_i]|ｉ＝１,２,…,Ｎ_silent}が得られる。そして、テロップフレームｔ_iの直前のカット点画像であり且つ顔が検出されたカット点画像がカット点画像の集合{ｃ_i}から抽出される。
【００６５】
ステップＳ172で、図１３に示すクラスタリング処理動作によってクラスタリングが行われ、第１クラスタに属するカット点画像の集合Ｃlst：{clst_i|ｉ＝１,２,…,Ｎ_clst}⊂Ｆace⊂Ｃutが得られる。ステップＳ173で、集合｛clst_i}のインデックスｉが「１」に初期化される。
【００６６】
ステップＳ174で、clst_iがニュース記事の仮の始点startとして設定される。ステップＳ175で、ｉがインクリメントされる。ステップＳ176で、clst_iがニュース記事の仮の終点endとして設定される。ステップＳ177で、仮の終点end付近に無音区間が存在するか否かが判別される。その結果、存在する場合にはステップＳ179に進み、存在しない場合にはステップＳ178に進む。ステップＳ178で、ｉが最大値「Ｎ_clst」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_clstであれば、上記ステップＳ175に戻って仮の終点endの更新が行われる。一方、ｉ≧Ｎ_clstであれば、ニュース記事切り出し処理動作を終了する。
【００６７】
ステップＳ179で、仮の始点start付近に在る無音区間終点が検出されて「Ｓ」として設定される。ステップＳ180で、仮の終点end付近に在る無音区間始点が検出されて「Ｅ」として設定される。ステップＳ181で、区間[Ｓ,Ｅ]がニュース記事として切り出される。ステップＳ182で、ｉが最大値「Ｎ_clst」よりも小さいか否かが判別される。その結果、ｉ＜Ｎ_clstであれば上記ステップＳ174に戻って次のニュース記事の切り出し処理に移行する。一方、ｉ≧Ｎ_clstであればニュース記事切り出し処理動作を終了する。
【００６８】
上述したように、ニュース映像におけるテロップフレームの直前には「静止しているニュースキャスタの映像」という類似している人物の顔の映像が存在し、この映像がニュース映像と言う動画像全体の中の変化点となっている。また、上記ニュースキャスタは、次のニュース記事の解説に入る前に一時的に無言状態となる。
【００６９】
そこで、本実施の形態においては、テロップフレームの直前に在って、且つ、人の顔が検出されたカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clstから一つのニュース記事の仮の始点startと仮の終点endとを求める。そして、上記仮の始点start付近の無音区間終点Ｓと仮の終点end付近の無音区間始点Ｅとの間を一つのニュース記事として切り出すのである。
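The combined cutout of steps S174 to S182 can be sketched as follows. This is a sketch under stated assumptions: the text only says a silent section must lie "near" the tentative end, so a tolerance `tol` in seconds is assumed; times are assumed to be seconds; falling back to the tentative start when no silent section lies near it is also an assumption; and the names `nearest_silence` and `cut_articles` are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def nearest_silence(t: float, silences: List[Interval], tol: float) -> Optional[Interval]:
    """Return the silent section closest to time t, if one lies within tol of t."""
    near = [s for s in silences if s[0] - tol <= t <= s[1] + tol]
    if not near:
        return None
    return min(near, key=lambda s: min(abs(t - s[0]), abs(t - s[1])))


def cut_articles(clst: List[float], silences: List[Interval], tol: float) -> List[Interval]:
    """Cut articles [S, E] from first-cluster cut point times and silent sections."""
    articles: List[Interval] = []
    i = 0
    while i < len(clst) - 1:
        start = clst[i]                                     # S174: tentative start
        k = i + 1                                           # S175, S176: tentative end
        end_sil = nearest_silence(clst[k], silences, tol)   # S177
        while end_sil is None and k < len(clst) - 1:        # S178: move the end forward
            k += 1
            end_sil = nearest_silence(clst[k], silences, tol)
        if end_sil is None:
            break                                           # no usable end remains
        start_sil = nearest_silence(start, silences, tol)
        s = start_sil[1] if start_sil else start            # S179: end point of silence
        e = end_sil[0]                                      # S180: start point of silence
        articles.append((s, e))                             # S181
        i = k                                               # S182: next article
    return articles
```

A cut point with no silent section nearby is skipped as an article boundary, so the article simply extends to the next cluster member that has one.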
【００７０】
図１８は、上記解析結果統合部６によって、図１２,図１４〜図１７に示すニュース記事切り出し処理動作が行われるに先立って実行されるＣＭ区間を除去するＣＭ除去処理動作のフローチャートである。但し、図１８に示すＣＭ除去処理動作のフローチャートは、１５秒間のＣＭを検出除去するものである。したがって、３０秒間のＣＭを検出除去する場合には、図１８に示すＣＭ除去処理動作のフローチャート中における数字「１５」を「３０」に変更すればよい。
【００７１】
ステップＳ191で、開始カット点画像番号startが「０」に初期化される。ステップＳ192で、カット点画像間累積時間intervalが「０」に初期化され、終了カット点画像番号endが「start＋１」に初期化される。ステップＳ193で、図３に示すカット点画像検出処理動作によって検出されたカット点画像に関して、「end−1」番目のカット点画像と「end」番目のカット点画像との間（カット点画像間[end−1，end]）の時間が取得されて、カット点画像間累積時間intervalに加算される。ステップＳ194で、カット点画像間累積時間intervalが「１５」であるか否かが判別される。その結果、interval＝１５であればステップＳ195に進み、そうでなければステップＳ197に進む。
【００７２】
ステップＳ195で、当該カット点画像間[end−1，end]がＣＭ区間と判定されて映像データが削除される。そして、開始カット点画像番号startが「end」に更新される。こうすることによって、以降のニュース記事切り出し処理動作等においては、ＣＭ区間が処理対象から外されることになる。
【００７３】
ステップＳ196で、上記開始カット点画像番号startが「総カット点画像数Ｎ−１」よりも小さいか否かが判別される。その結果、start＜(Ｎ−１)であれば、上記ステップＳ192に戻って次のＣＭの検出処理に移行する。一方、start≧(Ｎ−１)であればＣＭ除去処理動作を終了する。
【００７４】
ステップＳ197で、上記開始カット点画像番号startが、「総カット点画像数Ｎ−１」よりも小さいか否かが判別される。その結果、start＜(Ｎ−１)であればステップＳ198に進み、start≧(Ｎ−１)であればＣＭ除去処理動作を終了する。ステップＳ198で、上記カット点画像間累積時間intervalが「１５」以上であるか否かが判別される。その結果、interval≧１５である場合にはステップＳ200に進み、interval＜１５である場合にはステップＳ199に進む。ステップＳ199で、終了カット点画像番号endがインクリメントされる。そうした後、上記ステップＳ193に戻って当該ＣＭの検出処理が続行される。ステップＳ200で、上記開始カット点画像番号startがインクリメントされる。そうした後、上記ステップＳ192に戻って、次のＣＭの検出処理に移行する。そして、上記ステップＳ196,Ｓ197において、start≧(Ｎ−１)であると判別されるとＣＭ除去処理動作を終了する。
【００７５】
このように、本実施の形態においては、図３に示すカット点画像検出処理動作によって検出されたカット点画像に関して、先頭から順次カット点画像間の累積時間を取得する。そして、カット点画像間累積時間が１５秒になった場合には、そのカット点画像間はＣＭ区間であるとしてその間の映像を削除する。そうすることによって、以後のニュース記事切り出し処理等において、ＣＭ区間を処理対象から外すことができるのである。
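The CM search of steps S191 to S200 can be sketched as follows. This is an illustrative reading, not the patented implementation: cut point positions are assumed to be timestamps in seconds, the whole accumulated span from cut `start` to cut `end` is reported as the CM section, the small tolerance `eps` replaces the exact equality test of step S194, and a bound check on `end` (absent from the flowchart) is added. The function name `find_cm_spans` is hypothetical.

```python
from typing import List, Sequence, Tuple


def find_cm_spans(cut_times: Sequence[float], cm_length: float = 15.0,
                  eps: float = 1e-6) -> List[Tuple[int, int]]:
    """Return (start, end) cut point index pairs whose span lasts cm_length seconds."""
    n = len(cut_times)
    spans: List[Tuple[int, int]] = []
    start = 0                                   # S191
    while start < n - 1:                        # S196, S197
        interval = 0.0                          # S192
        end = start + 1
        found = False
        while end < n:
            interval += cut_times[end] - cut_times[end - 1]   # S193
            if abs(interval - cm_length) <= eps:              # S194: exactly 15 s
                found = True
                break
            if interval > cm_length:            # S198: overshoot, give up this start
                break
            end += 1                            # S199
        if found:
            spans.append((start, end))          # S195: CM section
            start = end
        else:
            start += 1                          # S200
    return spans
```

Passing `cm_length=30.0` corresponds to replacing "15" with "30" in the flowchart for 30-second CMs.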
【００７６】
尚、本実施の形態においては、上述の方法によってＣＭを検出したが、他の方法によってＣＭ検出を行なっても一向に構わない。
【００７７】
上述のように、本実施の形態においては、映像分離部２によって、入力映像を動画像部分と音声部分に分離する。そして、動画像解析部４によって上記動画像が解析され、音声解析部５によって上記音声が解析される。
【００７８】
その場合における動画像の解析は、
(１) 現フレームの色相ヒストグラムhist1が、直前フレームの色相ヒストグラムhist2に対して閾値ＴＨ以上変化した場合には、現フレームをカット点画像として検出する。
(２) 両隣画素のＶ値の差の絶対値を画素値とするエッジ画像edge1のｙ軸への投影ヒストグラムおよびｘ軸への投影ヒストグラムに山が在る場合には、現フレームをテロップフレームとして検出する。
(３) 入力画像を類似色の領域に分割して特徴量を抽出し、各領域の特徴量に基づいて上記状態遷移モデルにしたがって各領域の状態の遷移を繰り返す。そして、最終的に状態ラベルfaceを持つ領域が存在する場合には、人物の顔を検出したと判断する。
【００７９】
また、上記音声の解析は、
(４) 音声区間[sp,ep]におけるパワーｐの分散値Ｖarが閾値ＴＨより小さく、直前区間が無音区間でなければ無音区間の始端startに「sp」を設定する一方、上記直前区間が無音区間であれば無音区間の終端endに「ep」を設定する。そして、次に上記分散値Ｖarが閾値ＴＨ以上になると、区間[start,end]を無音区間として検出する。
【００８０】
そして、上記動画像解析結果および音声解析結果に基づいて、解析結果統合部６によって、以下の方法によってニュース記事を切り出す。
【００８１】
(Ａ) 総てのカット点画像間の類似度Ｓimilar(ｉ,ｊ)が閾値ＴＨより大きい頻度を表す頻度ヒストグラムＨist[i],Ｈist[j]を求め、最大頻度位置Ｍaxを含むようにクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出す。
【００８２】
したがって、上記色相ヒストグラムの変化点であるカット点画像(ニュースキャスタの画像)に基づいて、ニュース映像からニュース記事を切り出すことができる。
【００８３】
(Ｂ) テロップフレームの直前に在るカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出す。
【００８４】
したがって、上記カット点画像(ニュースキャスタの画像)とテロップフレームとに基づいて、ニュース映像から更に精度よくニュース記事を切り出すことができる。
【００８５】
(Ｃ) テロップフレームの直前に在って、且つ、顔が検出されたカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clst間を一つのニュース記事として切り出す。
【００８６】
したがって、上記テロップフレームと顔が検出されたカット点画像(ニュースキャスタの画像)とに基づいて、ニュース映像から更に精度よくニュース記事を切り出すことができる。
【００８７】
(Ｄ) 無音区間の間を一つのニュース記事として切り出す。したがって、音声情報(ニュースキャスタの無言区間)に基づいて、ニュース映像からニュース記事を切り出すことができる。
【００８８】
(Ｅ) テロップフレームの直前に在って、且つ、顔が検出されたカット点画像の集合に対して、上記類似度を用いたクラスタリングを行う。そして、第１クラスタに属する夫々のカット点画像clstから一つのニュース記事の仮の始点startと仮の終点endとを求める。そして、仮の始点start付近の無音区間終点Ｓと仮の終点end付近の無音区間始点Ｅとの間を一つのニュース記事として切り出す。
【００８９】
したがって、上記テロップフレームと顔が検出されたカット点画像(ニュースキャスタの画像)と音声情報(ニュースキャスタの無言区間)とに基づいて、ニュース映像から更に精度よくニュース記事を切り出すことができる。
【００９０】
さらに、上記解析結果統合部６は、上述のようなニュース記事切り出し処理を行うに先立って、上記カット点画像の列から累積時間が１５秒になるカット点画像間を検索し、累積時間が１５秒であるカット点画像間をＣＭと確定して削除する。したがって、以後のニュース記事切り出し処理等において、上記ＣＭ区間を処理対象から外すことができるのである。
【００９１】
尚、この発明のニュース記事切り出し装置においては、上記色相ヒストグラムの変化点であるカット点画像に基づくニュース記事切り出し方法、上記カット点画像とテロップフレームとに基づくニュース記事切り出し方法、上記テロップフレームと顔が検出されたカット点画像とに基づくニュース記事切り出し方法、音声情報(無音区間)に基づくニュース記事切り出し方法、上記テロップフレームと顔が検出されたカット点画像と音声情報(無音区間)とに基づくニュース記事切り出し方法の総てが実現可能な構成を有する必要は無い。上記各ニュース記事切り出し方法から適宜選択すればよい。
【００９２】
【発明の効果】
以上より明らかなように、この発明のニュース記事切り出し装置は、映像分離手段によってニュース映像を動画部分と音声部分とに分離し、この分離された動画からカット点画像検出手段によってカット点画像(動画の変化点)を検出し、テロップ検出手段によってテロップを検出し、顔検出手段によって顔の画像を検出し、上記分離された音声から無音検出手段によって無音部分を検出し、類似度算出手段によって、上記テロップの直前に位置して人物の顔が映っているカット点画像間の類似度を算出し、記事切り出し手段によって、上記テロップの直前に位置して顔が映っている類似度の高いカット点画像を選出し、この選出カット点画像近傍に在る無音部分の間を記事として切り出すので、テロップの直前に在る類似している人物の顔が映っている動画の変化点の位置の近傍の無音部分間で、ニュース記事を切り出すことができる。
【００９３】
すなわち、この発明によれば、テロップの映像を参照して、上記動画像が人物の映像、すなわちニュースキャスタの映像に切り変った時点であって、且つ、次のニュース記事の開始前に上記ニュースキャスタが無言状態になった時点を的確に検出して、ニュース記事を更に正しく切り出すことができるのである。
【００９４】
また、上記ニュース記事切り出し装置は、上記記事の切り出しに先立って、上記検出された各カット点画像間のうちＣＭに該当するカット点画像間を検出して除去するＣＭ除去手段を備えれば、ＣＭ区間のニュース映像を事前に除去できる。したがって、以後に行われる上記記事切り出し手段による記事切り出しの際にはニュース記事のみを切り出すことができ、誤検出を低減した精度の高いニュース記事切り出しが可能になるのである。
【図面の簡単な説明】
【図１】この発明のニュース記事切り出し装置のブロック図である。
【図２】図１に示すニュース記事切り出し装置によって実行されるニュース記事切り出し手順の概略を示すフローチャートである。
【図３】図２における動画像解析時に行われるカット点画像検出処理動作のフローチャートである。
【図４】図３に示すカット点画像検出処理動作において実行される色相ヒストグラム生成処理動作のフローチャートである。
【図５】図２における動画像解析時に行われるテロップフレーム検出処理動作のフローチャートである。
【図６】テロップの候補領域と投影ヒストグラムの山との関係を示す図である。
【図７】図５示すテロップフレーム検出処理動作において実行されるエッジ画像生成処理動作のフローチャートである。
【図８】図５示すテロップフレーム検出処理動作において実行される投影ヒストグラム生成処理動作のフローチャートである。
【図９】図２における動画像解析時に行われる人物の顔検出処理動作のフローチャートである。
【図１０】状態遷移モデルの一例を示す図である。
【図１１】図２における音声解析時に行われる無音区間検出処理動作のフローチャートである。
【図１２】図１における解析結果統合部によって行われるニュース記事切り出し処理動作のフローチャートである。
【図１３】図１２に示すニュース記事切り出し処理動作において実行されるクラスタリング処理動作のフローチャートである。
【図１４】図１２とは異なるニュース記事切り出し処理動作のフローチャート図である。
【図１５】図１２および図１４とは異なるニュース記事切り出し処理動作のフローチャートである。
【図１６】図１２,図１４および図１５とは異なるニュース記事切り出し処理動作のフローチャートである。
【図１７】図１２および図１４〜図１６とは異なるニュース記事切り出し処理動作のフローチャートである。
【図１８】図１における解析結果統合部によって実行されるＣＭ除去処理動作のフローチャートである。
【符号の説明】
１…Ａ/Ｄ変換部、
２…映像分離部、
３…メモリ、
４…動画像解析部、
５…音声解析部、
６…解析結果統合部、
７…映像蓄積部。
[0001]
BACKGROUND OF THE INVENTION
  The present invention relates to a news article cutout device that automatically cuts out, from a news video, the news articles that serve as search units when the video is compiled into a database.
[0002]
[Prior art]
  In conventional news video databases, video segmentation and indexing for retrieval are performed using the techniques described in, for example, "Informedia: CMU Digital Video Library Project", Journal of the Information Processing Society of Japan, Vol. 37, No. 9.
[0003]
[Problems to be solved by the invention]
  However, in the above conventional technique, although elemental technologies are listed for video segmentation, a specific solution is not disclosed.
[0004]
  Accordingly, an object of the present invention is to provide a news article cutout device that can cut out news articles from a news video by exploiting the characteristics of news video.
[0005]
[Means for Solving the Problems]
  To achieve the above object, the news article cutout device of the present invention comprises: video input means; video separation means for separating the news video input by the video input means into audio and moving images; cut point image detection means for detecting, from the moving images separated by the video separation means, cut point images that are change points of the moving images; telop detection means for detecting telops from the separated moving images; face detection means for detecting face images from the separated moving images; similarity calculation means for calculating the similarity between those cut point images that are located immediately before a telop detected by the telop detection means and in which a face appears; silence detection means for detecting silent portions from the audio separated by the video separation means; and article cutout means for selecting, from the cut point images located immediately before a telop and showing a face, the cut point images with high similarity, and cutting out as an article the span between the silent portions located near the selected cut point images.
[0006]
  According to the above configuration, the news video input by the video input means is separated into a moving image portion and an audio portion by the video separation means. The cut point image detection means then detects, from the separated moving images, cut point images that are change points of the moving images. The telop detection means detects telops from the separated moving images, the face detection means detects face images from the separated moving images, and the silence detection means detects silent portions from the separated audio. Further, the similarity calculation means calculates the similarity between those cut point images that are located immediately before a telop detected by the telop detection means and in which a face appears. The article cutout means then selects, from the cut point images located immediately before a telop and showing a face, the cut point images with high similarity, and cuts out as an article the span between the silent portions located near the selected cut point images.
[0007]
  It is also desirable that the news article cutout device include CM removing means that, before the article cutout means cuts out articles, detects and removes those intervals between the cut point images detected by the cut point image detection means that correspond to commercial messages (CMs).
[0008]
  According to the above configuration, before news articles are cut out, the CM removing means detects and removes those intervals between the detected cut point images that correspond to CMs. Consequently, when the article cutout means subsequently cuts out articles, the CM sections are excluded and only news articles are cut out.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
  Hereinafter, the present invention will be described in detail with reference to the illustrated embodiments. FIG. 1 is a block diagram of a news article clipping device according to the present embodiment.
[0010]
  The A / D converter 1 performs A / D conversion on the input video and digitizes it. The video separation unit 2 separates the video digitized by the A / D conversion unit 1 into a moving image portion and an audio portion. The separated moving image data and audio data are stored in the memory 3. The moving image stored in the memory 3 in this manner is analyzed by the moving image analysis unit 4. The voice stored in the memory 3 is analyzed by the voice analysis unit 5. The analysis results of the moving image and the sound are stored in the memory 3.
[0011]
  The analysis result integration unit 6 reads out and integrates the analysis results obtained by the moving image analysis unit 4 and the voice analysis unit 5 from the memory 3, and extracts a news article as described in detail later. The news articles thus cut out are stored in the video storage unit 7.
[0012]
  FIG. 2 is a flowchart outlining the news article cutout procedure executed by the news article cutout device shown in FIG. 1. The procedure is described below with reference to FIG. 2.
[0013]
  First, in step S1, the video separation unit 2 separates the digital video data from the A / D conversion unit 1 into moving image data and audio data. In step S2, the moving image analysis unit 4 analyzes the moving image based on the separated moving image data. As will be described in detail later, this moving image analysis detects cut point images that are change points of the moving image, frames in which a telop is shown (telop frames), and frames in which a person's face is shown.
[0014]
  In step S3, the voice analysis unit 5 analyzes the voice based on the separated voice data. As will be described in detail later, a silent section is detected by this voice analysis. In step S4, the analysis result integration unit 6 integrates the moving image analysis result and the voice analysis result to cut out a news article.
[0015]
  FIG. 3 is a flowchart of the cut point image detection processing operation performed at the time of moving image analysis in step S2 of the news article cutout procedure shown in FIG. 2. First, in step S11, the frame counter frame is initialized to “0”. In step S12, the hue histogram hist2 of the immediately preceding frame and the hue histogram hist1 of the current frame are initialized to “0”. In step S13, frame data to be processed is read out from the moving image data stored in the memory 3. In step S14, the hue histogram hist1 is generated and updated. The generation of the hue histogram hist1 will be described in detail later. In step S15, the frame counter frame is incremented.
[0016]
  In step S16, it is determined whether or not frame = 1. As a result, if frame = 1, the process proceeds to step S21; otherwise, the process proceeds to step S17. In step S17, a difference D between the hue histogram hist2 of the previous frame and the hue histogram hist1 of the current frame is obtained. Although the difference D is calculated by the equation (1), it may be calculated by other calculation methods.

In step S18, it is determined whether or not the difference D is smaller than a threshold value TH. If it is smaller than the threshold value TH, the process proceeds to step S19; if it is greater than or equal to the threshold value TH, the process proceeds to step S20. In step S19, because the change in the hue histogram from the immediately preceding frame is small, the current frame is not regarded as a change point of the moving image and is determined to be a non-cut point image. In step S20, the current frame is determined to be a cut point image and is registered, for example, in a cut point image table representing the positions of cut point images. In step S21, the hue histogram hist2 of the previous frame is updated with the hue histogram hist1 of the current frame. In step S22, it is determined whether or not unprocessed frame data remains in the memory 3. If so, the process returns to step S13 and moves on to the next frame. If not, the cut point image detection processing operation ends.
[0017]
As described above, in the present embodiment, when the hue histogram hist1 of the current frame has changed by the threshold TH or more relative to the hue histogram hist2 of the immediately preceding frame, the current frame is regarded as a change point of the moving image and is detected as a cut point image.
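The cut point test of steps S16 to S21 can be sketched as follows. This is a minimal illustration, not the patented implementation: equation (1) is not reproduced in this text, so the sum of absolute bin differences is assumed as the difference D, per-frame hue histograms are assumed to be already available, and the names `histogram_difference` and `detect_cut_points` are hypothetical.

```python
from typing import List, Optional, Sequence


def histogram_difference(hist1: Sequence[int], hist2: Sequence[int]) -> int:
    """Assumed stand-in for equation (1): sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(hist1, hist2))


def detect_cut_points(histograms: List[Sequence[int]], th: int) -> List[int]:
    """Return the 1-based frame numbers judged to be cut point images."""
    cut_table: List[int] = []                  # the "cut point image table" (S20)
    hist2: Optional[Sequence[int]] = None      # histogram of the preceding frame
    for frame, hist1 in enumerate(histograms, start=1):
        if hist2 is not None:                  # S16: the first frame is skipped
            d = histogram_difference(hist1, hist2)   # S17
            if d >= th:                        # S18: D >= TH means a cut point
                cut_table.append(frame)        # S20
        hist2 = hist1                          # S21: previous <- current
    return cut_table
```

With four frames whose histograms change sharply only between the second and third frame, the function reports frame 3 as the single cut point image.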
[0018]
FIG. 4 is a flowchart of the hue histogram generation processing operation executed in step S14 of the cut point image detection processing operation shown in FIG. 3. First, in step S31, the pixel values R, G, and B of one pixel are read from the frame data fetched from the memory 3 in step S13 of the cut point image detection processing operation shown in FIG. 3. In step S32, coordinate transformation is performed by equation (2).

In step S33, the histogram is incremented by equation (3).
hist[i] = hist[i] + 1   (3)
i = H / H_QUANT
      where H_QUANT is the hue quantization constant.
In step S34, it is determined whether or not there is an unprocessed pixel in the frame data. As a result, if there is, the process returns to step S31 to shift to the processing of the next pixel value R, G, B. If not, the hue histogram generation processing operation ends.
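The histogram generation of steps S31 to S34 might look like the following Python sketch. Equation (2) is not reproduced in this text, so the sketch assumes it is an RGB-to-HSV conversion and uses the standard library function colorsys.rgb_to_hsv in its place; the value H_QUANT = 10 degrees and the pixel list format are assumptions made only for the example.

```python
import colorsys

H_QUANT = 10  # hypothetical quantization constant, in degrees of hue per bin


def hue_histogram(pixels, h_quant=H_QUANT):
    """Build a hue histogram from an iterable of (R, G, B) values in 0..255."""
    n_bins = 360 // h_quant
    hist = [0] * n_bins
    for r, g, b in pixels:                                     # step S31
        # Assumed form of equation (2): RGB -> HSV, hue scaled to degrees.
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hue_deg = int(round(h * 360.0)) % 360                  # step S32
        i = hue_deg // h_quant                                 # i = H / H_QUANT
        hist[i] += 1                                           # equation (3)
    return hist
```

Rounding the hue to whole degrees before quantizing is a design choice of the sketch; it keeps pure colors from falling on the wrong side of a bin boundary because of floating-point error.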
[0019]
  FIG. 5 is a flowchart of the telop frame detection processing operation performed at the time of moving image analysis in step S2 in the news article cutout procedure shown in FIG. In the following description, a detection method related to horizontal writing telops will be described as an example. However, in the case of detecting vertical writing telops, the detection can be performed in the same manner by replacing the x axis and the y axis.
[0020]
  In step S41, the frame number frame is initialized to “0”. In step S42, the edge image edge2 of the immediately preceding frame and the edge image edge1 of the current frame are initialized to “0”. In step S43, frame data to be processed is read from the moving image data stored in the memory 3. In step S44, the edge image edge1 is generated and updated. The generation of the edge image will be described later in detail. In step S45, the frame number frame is incremented.
[0021]
  In step S46, it is determined whether or not frame = 1. As a result, if frame = 1, the process proceeds to step S55, and if not, the process proceeds to step S47. In step S47, a projection histogram on the y-axis of the edge image edge1 is generated by a projection histogram generation method described later. In step S48, the histogram generated in step S47 is analyzed, and a mountain range [y1, y2] serving as a telop candidate region is detected based on a threshold value or the like. Here, normally, edges are concentrated around the telop. Therefore, in the case of horizontal writing, a mountain is detected in the projection histogram on the y-axis as shown in FIG. Therefore, in step S48, the peak of the histogram projected onto the y-axis is detected and used as a telop candidate area. Next, in step S49, it is determined whether or not there is a mountain based on the detection result of the mountain range in step S48. As a result, if there is a mountain, the telop candidate region is present and the process proceeds to step S50. If there is no mountain, the telop candidate region is absent and the process proceeds to step S55.
[0022]
  In step S50, the edge in the range from y1 to y2 is projected on the x-axis to generate a projection histogram of the edge image. In step S51, the mountain range of the character portion is detected from the histogram generated in step S50 based on a threshold value or the like. In step S52, it is determined whether there is a mountain based on the detection result of the mountain range in step S51. As a result, if there is a mountain, the process proceeds to step S54, and if not, the process proceeds to step S53.
[0023]
  In step S53, it is determined that the current frame is a non-telop frame. In step S54, it is determined that the current frame is a telop frame. Then, for example, it is registered in a telop frame table representing the position of the telop frame. In step S55, the edge image edge2 of the immediately preceding frame is updated with the edge image edge1 of the current frame. In step S56, it is determined whether or not unprocessed frame data exists in the memory 3. As a result, if there is, the process returns to step S43 to shift to the next frame processing. If not, the telop frame detection processing operation is terminated.
[0024]
  As described above, in the present embodiment, when there is a mountain in the projection histogram on the y-axis of the generated edge image edge1, and there is also a mountain in the projection histogram on the x-axis, it is determined that a telop character string is displayed in the current frame, and the current frame is detected as a telop frame.
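The text leaves the mountain detection of steps S48 and S51 open ("based on a threshold value or the like"). One simple possibility, shown here purely as an assumption, is to take the first run of consecutive histogram bins whose values exceed a threshold; the function name and the strict comparison are hypothetical.

```python
def find_peak_range(hist, th):
    """Return (first, last) bin of the first run of bins above th, or None."""
    start = None
    for k, value in enumerate(hist):
        if value > th:
            if start is None:
                start = k            # the run begins here
        elif start is not None:
            return (start, k - 1)    # the run ended at the previous bin
    if start is not None:
        return (start, len(hist) - 1)  # the run extends to the last bin
    return None                        # no mountain: no telop candidate
```

Applied to the y-axis projection, the returned pair would play the role of [y1, y2]; a result of None corresponds to the "no mountain" branches of steps S49 and S52.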
[0025]
  FIG. 7 is a flowchart of the edge image generation processing operation executed in step S44 of the telop frame detection processing operation shown in FIG. In step S61, the edge image edge1 of the current frame is initialized to “0”. In step S62, a gray image gray is generated from the frame data fetched from the memory 3 in step S43 of the telop frame detection processing operation shown in FIG. Here, the gray image gray is an image in which the pixel values R, G, and B obtained from the frame data are subjected to coordinate conversion by equation (2) and the resulting V value is used as the pixel value. In step S63, the width of the gray image gray is set in the variable W, and the height of the gray image gray is set in the variable H. In step S64, an initial value “1” is set to the variable i. In step S65, an initial value “1” is set to the variable j. In step S66, the horizontal edge h_edge[i][j] and the vertical edge v_edge[i][j] are calculated by equation (4).
h_edge[i][j] = abs(gray[i−1][j] − gray[i+1][j])   (4)
v_edge[i][j] = abs(gray[i][j−1] − gray[i][j+1])
Here, gray[i][j] is the pixel value at coordinates (j, i) in the gray image gray.
[0026]
  In step S67, the contents of variable j are incremented. In step S68, it is determined whether j <(W-1). As a result, if j <(W−1), the process returns to step S66 and the calculation of the horizontal edge and the vertical edge is continued. On the other hand, if j ≧ (W−1), the process proceeds to step S69. In step S69, the variable i is incremented. In step S70, it is determined whether i <(H-1). If i <(H-1) as a result, the process returns to step S65 and the calculation of the horizontal edge and the vertical edge is continued. On the other hand, if i ≧ (H−1), the edge image generation processing operation is terminated.
[0027]
  That is, in the present embodiment, the image h_edge, whose pixel values h_edge[i][j] are the absolute values of the difference between the V values of both adjacent pixels in the horizontal direction obtained in the range of 1 ≦ j ≦ (W−1) and 1 ≦ i ≦ (H−1), and the image v_edge, whose pixel values v_edge[i][j] are the absolute values of the difference between the V values of both adjacent pixels in the vertical direction obtained in the same range, are together referred to as the edge image edge.
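For reference, equation (4) and the loops of steps S64 to S70 can be written as the following Python sketch. It assumes gray is a list of rows of V values, follows the loop conditions j < (W−1) and i < (H−1) stated for steps S68 and S70, and leaves border pixels at 0; the function name is hypothetical.

```python
def edge_images(gray):
    """Compute h_edge and v_edge from gray, where gray[i][j] is row i, column j."""
    H, W = len(gray), len(gray[0])                 # step S63
    h_edge = [[0] * W for _ in range(H)]           # step S61: initialize to 0
    v_edge = [[0] * W for _ in range(H)]
    for i in range(1, H - 1):                      # steps S64, S69, S70
        for j in range(1, W - 1):                  # steps S65, S67, S68
            # Equation (4), indices exactly as given in the text.
            h_edge[i][j] = abs(gray[i - 1][j] - gray[i + 1][j])
            v_edge[i][j] = abs(gray[i][j - 1] - gray[i][j + 1])
    return h_edge, v_edge
```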
[0028]
  In the present embodiment, the edge image edge is generated by the above-described method, but the present invention is not limited to this, and other edge detection methods may be used.
[0029]
  FIG. 8 is a flowchart of the projection histogram generation processing operation executed in step S47 or step S50 of the telop frame detection processing operation shown in FIG. In step S71, the edge image edge1 (h_edge1, v_edge1) of the current frame, generated according to the edge image generation processing operation shown in FIG. 7 in step S44 of the telop frame detection processing operation shown in FIG. and the edge image edge2 (h_edge2, v_edge2) of the previous frame, held in a work buffer or the like, are input.
[0030]
  In step S72, the projection range (xmin, ymin) to (xmax, ymax) is set. When this processing operation is called in step S47 of the telop frame detection processing operation shown in FIG. 5, the entire edge images edge1 and edge2 are targeted, so the projection range is (0, 0) to (W−1, H−1). When called from step S50 of the telop frame detection processing operation, the projection range is (0, y1) to (W−1, y2). In step S73, the projection histogram yhist on the y-axis and the projection histogram xhist on the x-axis are initialized to “0”. In step S74, for one pixel within the projection range set in step S72, the projection histogram yhist on the y-axis and the projection histogram xhist on the x-axis are updated by equation (5).
xhist[j] = xhist[j] + Min(h_edge1[i][j], h_edge2[i][j])   (5)
yhist[j] = yhist[j] + Min(v_edge1[i][j], v_edge2[i][j])
However, when this processing operation is called in step S47 of the telop frame detection processing operation shown in FIG. 5, a projection histogram yhist on the y-axis is calculated. On the other hand, when called from step S50 of the telop frame detection processing operation, a projection histogram xhist on the x-axis is calculated. In step S75, it is determined whether or not there is an unprocessed pixel. As a result, if there is, the process returns to step S74 to shift to the process relating to the next pixel, and if not, the projection histogram generation processing operation is terminated.
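The accumulation of equation (5) can be illustrated with the Python sketch below. Taking the minimum of the two frames keeps only edge strength that is present in both the current and the previous frame. The sketch makes two assumptions that are not stated in the text: it computes both histograms in one pass, and it accumulates the y-axis histogram by the row index i (the equation as printed uses j for both). The function name and the tuple format for the edge images are hypothetical.

```python
def projection_histograms(edge1, edge2, xmin, ymin, xmax, ymax):
    """Project the per-pixel minimum of two edge images onto the x and y axes.

    edge1 and edge2 are (h_edge, v_edge) pairs for the current and previous frame.
    """
    h_edge1, v_edge1 = edge1
    h_edge2, v_edge2 = edge2
    H, W = len(h_edge1), len(h_edge1[0])
    xhist = [0] * W                                  # step S73
    yhist = [0] * H
    for i in range(ymin, ymax + 1):                  # steps S74 and S75
        for j in range(xmin, xmax + 1):
            xhist[j] += min(h_edge1[i][j], h_edge2[i][j])
            # Assumption: the y-axis histogram is indexed by the row i.
            yhist[i] += min(v_edge1[i][j], v_edge2[i][j])
    return xhist, yhist
```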
[0031]
  FIG. 9 is a flowchart of the human face detection processing operation performed at the time of moving image analysis in step S2 in the news article cutout procedure shown in FIG. In the present embodiment, face detection is performed by matching a model having a hierarchical structure called a state transition model shown in FIG. 10, but a neural network or other methods may be used.
[0032]
  In step S81, a face detection image is input from the memory 3. In step S82, the input image is divided into regions, each consisting of a set of adjacent pixels with similar colors. In step S83, the color, position, and shape features of each of the divided regions are extracted. In step S84, each of the N regions is initialized by giving it the state label color_seg, which is the initial state of the state transition model shown in FIG. In step S85, the region number i and the variable change, which indicates the number of regions whose state has changed, are each initialized to “0”.
[0033]
  In step S86, the feature quantity of the region [i] is collated with the state transition rules that must be satisfied for the region [i] to transition to each state it can transition to. As a result, if the region [i] does not satisfy any state transition rule, the process proceeds to step S88. On the other hand, if it satisfies one, the process proceeds to step S87. In step S87, the state label of the region [i] is updated to the state label corresponding to the satisfied state transition rule. After that, the variable change is incremented. For example, when the state label of the region [i] is color_seg and the state transition model shown in FIG. 10 is used, the states that can be reached from the state label color_seg are skin_seg and black_seg. In this case, the state transition rules to be satisfied for the region [i] to transition to these two states are “IsSkin” and “IsBlack”, which are set on the arrows from the state label color_seg to the state label skin_seg and to the state label black_seg in FIG. 10. That is, if the feature quantity of the region [i] satisfies the state transition rule “IsSkin”, the state label of the region [i] is updated to skin_seg. Similarly, if the state transition rule “IsBlack” is satisfied, it is updated to black_seg.
[0034]
  In step S88, the region number i is incremented. In step S89, it is determined whether or not the region number i is smaller than the number N of regions. As a result, if i <N, the process returns to step S86 to shift to the process for the next area. On the other hand, if i ≧ N, the process proceeds to step S90. In step S90, it is determined whether or not change = 0, that is, whether or not there is a region whose state has changed. As a result, if there is, the process returns to step S85. Thus, the above-described processing is repeated until there is no area where the state label has changed.
[0035]
  In step S91, the state labels of all the regions are checked to determine whether or not there is a region having the state label face. As a result, if one exists, the process proceeds to step S92, and if not, the process proceeds to step S93. In step S92, it is determined that a person's face has been detected. Then, for example, the frame is registered in a face frame table representing the positions of frames in which a person's face appears. After that, the human face detection processing operation ends. In step S93, it is determined that no human face has been detected, and the human face detection processing operation is terminated.
[0036]
  As described above, in the present embodiment, the input image is divided into regions of similar colors, the features of each region are extracted, and if the feature quantity of a region satisfies a state transition rule of the state transition model shown in FIG. 10, the state of that region is changed. This process is repeated until no region changes state. If there is then a region having the state label “face”, it is determined that a human face has been detected.
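The repeated relabeling of steps S85 to S90 can be sketched in Python as below. Only the transitions from color_seg to skin_seg (“IsSkin”) and to black_seg (“IsBlack”) are taken from the text; the rule leading from skin_seg to face, the feature keys, and all function names are hypothetical placeholders, since the full model of FIG. 10 is not reproduced here. The sketch also assumes the rules contain no cycles, so the loop terminates.

```python
# Hypothetical rule table: state label -> list of (next label, predicate).
EXAMPLE_RULES = {
    "color_seg": [
        ("skin_seg", lambda region: region["is_skin"]),    # "IsSkin"
        ("black_seg", lambda region: region["is_black"]),  # "IsBlack"
    ],
    # Assumed rule, not taken from the text.
    "skin_seg": [("face", lambda region: region["is_face_shaped"])],
}


def run_state_transitions(regions, rules):
    """Relabel regions until a full pass changes no state label."""
    labels = ["color_seg"] * len(regions)             # step S84
    while True:
        change = 0                                    # step S85
        for i, region in enumerate(regions):          # steps S86 to S89
            for next_label, predicate in rules.get(labels[i], []):
                if predicate(region):
                    labels[i] = next_label            # step S87
                    change += 1
                    break
        if change == 0:                               # step S90
            return labels


def face_detected(labels):
    """Step S91: a face is detected if any region carries the label face."""
    return "face" in labels
```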
[0037]
  FIG. 11 is a flowchart of the silent section detection processing operation performed at the time of voice analysis in step S3 in the news article cutout procedure shown in FIG. First, in step S101, a variable Silence indicating a silent section is initialized to “FALSE”. In step S102, audio data corresponding to the length of the section [sp, ep] is read. In step S103, the audio power p is calculated from the read audio data. In step S104, the variance value of the audio power p is calculated by equation (6).

In step S105, it is determined whether or not the calculated variance value Var is smaller than a threshold value TH. As a result, if Var <TH, the section [sp, ep] is determined to be a silent section, and the process proceeds to step S108. On the other hand, if Var ≧ TH, it is determined that it is not a silent section, and the process proceeds to step S106.
[0038]
  In step S106, it is determined whether or not the variable Silence is “TRUE”, that is, whether the immediately preceding processing section is a silent section. As a result, if it is not “TRUE”, the process returns to step S101 and the same processing is repeated. On the other hand, if it is “TRUE”, the process proceeds to step S107. In step S107, a silent section [start, end] is detected based on the start end “start” and end end “end” set in steps S109 and S111 as described later. Then, it is registered in a silent section table representing the position of the silent section. After that, the process returns to step S101 and the same process is repeated.
[0039]
  In step S108, it is determined whether or not the variable Silence is “TRUE”. As a result, if it is not “TRUE”, the current section [sp, ep] is regarded as the start point of a silent section, and the process proceeds to step S109. On the other hand, if it is “TRUE”, the current section [sp, ep] is regarded as a continuation of the immediately preceding silent section, and the process proceeds to step S111. In step S109, “sp” is set to the start “start” of the silent section. In step S110, the variable Silence is set to “TRUE”. After that, the process proceeds to step S112. In step S111, “ep” is set to the end “end” of the silent section. In step S112, it is determined whether there is unprocessed audio data. As a result, if there is, the process returns to step S102 and shifts to the processing of the next audio data. When it is determined in step S105 that “Var ≧ TH” and in step S106 that “the immediately preceding processing section is a silent section”, the silent section [start (= sp), end (= ep)] is detected in step S107. On the other hand, if there is no unprocessed audio data, the silent section detection processing operation is terminated.
[0040]
  Thus, in the present embodiment, when the variance value Var of the power p in the voice section [sp, ep] is smaller than the threshold value TH, it is determined that the section [sp, ep] is a silent section. Further, if the immediately preceding section is a silent section, the section [sp, ep] is determined to be a continuation section of the immediately preceding silent section. On the other hand, if the immediately preceding section is not a silent section, the section [sp, ep] is determined to be the starting point of the silent section. Next, when the variance value Var becomes equal to or greater than the threshold value TH, the silent section [start (= sp), end (= ep)] is detected.
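A minimal Python sketch of this windowed test is given below. Equation (6) is not reproduced in this text, so the sketch assumes the power p is the squared sample value and Var is the population variance of p within the window, computed with statistics.pvariance. The fixed non-overlapping window, the updating of end in every silent window, and the registration of a silent section still open when the data ends are further assumptions, as are the function and parameter names.

```python
from statistics import pvariance


def detect_silent_sections(samples, window, th):
    """Return (start, end) sample indices of sections with low power variance."""
    sections = []
    silence = False                                   # step S101
    start = end = None
    for sp in range(0, len(samples) - window + 1, window):
        ep = sp + window - 1                          # section [sp, ep], step S102
        power = [s * s for s in samples[sp:ep + 1]]   # step S103 (assumed form)
        var = pvariance(power)                        # step S104 (assumed form)
        if var < th:                                  # step S105
            if not silence:                           # step S108
                start = sp                            # step S109
                silence = True                        # step S110
            end = ep                                  # step S111
        elif silence:                                 # step S106
            sections.append((start, end))             # step S107
            silence = False
    if silence:
        sections.append((start, end))  # assumption: register trailing silence
    return sections
```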
[0041]
  FIG. 12 is a flowchart of the news article cutout processing operation performed by the analysis result integration unit 6 in step S4 of the news article cutout procedure shown in FIG. In this news article cutout processing operation, a news article is cut out based on the cut point image detected by the moving image analysis unit 4 according to the cut point image detection processing operation shown in FIG.
[0042]
  In step S121, the set of cut point images Cut: {c_i | i = 1, 2, ..., N_cut} registered in the cut point image table by the cut point image detection processing operation shown in FIG. is obtained. This set {c_i} is then clustered, and the set of cut point images belonging to the first cluster, Clst: {clst_i | i = 1, 2, ..., N_clst} ⊂ Cut, is obtained. In step S122, the index i of the set {clst_i} is initialized to “1”.
[0043]
  In step S123, clst_i is set as the start point of a news article. In step S124, i is incremented. In step S125, clst_i is set as the end point of the news article. In this way, one news article is cut out. In step S126, it is determined whether i is smaller than the maximum value “N_clst”. As a result, if i < N_clst, the process returns to step S123 to move to the next news article cutout process. On the other hand, if i ≧ N_clst, the news article cutout processing operation is terminated.
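The effect of this loop is that each pair of successive first-cluster cut point images delimits one news article. A short Python sketch follows; the function name and the use of plain numbers (for example frame positions) for the cut point images are assumptions.

```python
def cut_articles(clst):
    """Pair successive first-cluster cut points into (start, end) articles."""
    articles = []
    i = 0                        # step S122 (0-based here; the text counts from 1)
    while i < len(clst) - 1:     # step S126: continue while i < N_clst
        start = clst[i]          # step S123
        i += 1                   # step S124
        end = clst[i]            # step S125
        articles.append((start, end))
    return articles
```

For cluster positions 10, 250 and 900 this yields the two articles (10, 250) and (250, 900); the end point of one article serves as the start point of the next.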
[0044]
  FIG. 13 is a flowchart of the clustering processing operation executed in step S121 of the news article cutout processing operation shown in FIG. In step S131, the similarity Similar(i, j) between all cut point images is calculated. Here, i and j are the numbers of the two cut point images for which the similarity is calculated. In this embodiment, the reciprocal of the difference D given by equation (1) is used as the similarity Similar(i, j), but other similarity measures may be used. In step S132, the frequency histograms Hist[i] and Hist[j] are initialized to “0”. In step S133, it is determined whether the similarity Similar(i, j) is greater than a threshold value TH. As a result, if Similar(i, j) > TH, the process proceeds to step S134, and if Similar(i, j) ≦ TH, the process proceeds to step S135. In step S134, the frequency histograms Hist[i] and Hist[j] are incremented. In step S135, it is determined whether or not there is an unprocessed similarity Similar(i, j). As a result, if there is, the process returns to step S133 and shifts to processing for the next similarity Similar(i, j).
[0045]
  In step S136, the maximum frequency position Max is detected based on the generated frequency histograms Hist[i] and Hist[j]. In step S137, it is determined whether or not the first cluster is currently an empty set. As a result, if it is an empty set, the process proceeds to step S139, whereas if it is not an empty set, the process proceeds to step S138. In step S138, it is determined whether or not the detected maximum frequency position Max is included in the first cluster. As a result, if it is included, the process proceeds to step S139. If it is not included, the clustering processing operation is terminated. In step S139, every j for which Similar(Max, j) is larger than the threshold value TH is added to the first cluster. In step S140, Max is excluded from the frequency histograms Hist[i] and Hist[j]. After that, the process returns to step S136, and the above process is repeated. When it is determined in step S138 that the maximum frequency position Max is not included in the first cluster, the clustering processing operation is terminated.
[0046]
  In general, in a news video, every time a news article ends, the video switches to a shot of a stationary newscaster, and after the newscaster introduces the next news article, the video of that article starts. In other words, very similar change points of the moving image, namely the “video of a stationary newscaster”, exist between the news articles.
[0047]
  Therefore, in the present embodiment, as described above, the similarity Similar(i, j) between all cut point images is calculated for the set of cut point images, and clustering is performed so that the first cluster includes the maximum frequency position Max of the frequency histograms Hist[i] and Hist[j], which represent how often the similarity Similar(i, j) is greater than the threshold value TH. Then, each interval between successive cut point images clst belonging to the first cluster is cut out as one news article.
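One possible reading of the clustering of steps S131 to S140 is sketched in Python below. It assumes the similarities are supplied as a square matrix, counts for each image how many other images exceed TH, and grows the first cluster from the image with the highest count. Adding Max itself to the cluster and breaking ties toward the lower image number are assumptions, as are the function and variable names.

```python
def first_cluster(similar, th):
    """Return the sorted image numbers belonging to the first cluster."""
    n = len(similar)
    # Steps S132 to S135: frequency of similarities above th for each image.
    hist = {
        i: sum(1 for j in range(n) if j != i and similar[i][j] > th)
        for i in range(n)
    }
    cluster = set()
    while hist:
        # Step S136: maximum frequency position (ties go to the lower number).
        mx = max(hist, key=lambda k: (hist[k], -k))
        # Steps S137 and S138: stop once Max lies outside a non-empty cluster.
        if cluster and mx not in cluster:
            break
        # Step S139: add every j similar to Max (and, by assumption, Max itself).
        cluster.add(mx)
        cluster.update(j for j in range(n) if j != mx and similar[mx][j] > th)
        del hist[mx]                                  # step S140
    return sorted(cluster)
```

In a programme where images 0, 2 and 4 show the same studio shot and images 1 and 3 resemble nothing else, the cluster grows from image 0 to {0, 2, 4} and stops when the next maximum, image 1, is not a member.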
[0048]
  FIG. 14 is a flowchart of a news article cutout processing operation, different from that of FIG. 12, performed by the analysis result integration unit 6 in step S4 of the news article cutout procedure shown in FIG. In this news article cutout processing operation, a news article is cut out based on the telop frame detected by the moving image analysis unit 4 according to the telop frame detection processing operation shown in FIG. 5, in addition to the cut point image.
[0049]
  In step S141, the set of cut point images Cut: {c_i | i = 1, 2, ..., N_cut} registered in the cut point image table by the cut point image detection processing operation shown in FIG. is obtained. Furthermore, the set of telop frames Telop: {t_i | i = 1, 2, ..., N_telop} registered in the telop frame table by the telop frame detection processing operation shown in FIG. is obtained. Then, the cut point images immediately preceding the telop frames t_i are extracted from the set of cut point images {c_i}. In step S142, clustering is performed by the clustering processing operation shown in FIG. 13, and the set of cut point images belonging to the first cluster, Clst: {clst_i | i = 1, 2, ..., N_clst} ⊂ Cut, is obtained. In step S143, the index i of the set {clst_i} is initialized to “1”.
[0050]
  In step S144, clst_i is set as the start point of a news article. In step S145, i is incremented. In step S146, clst_i is set as the end point of the news article. In this way, one news article is cut out. In step S147, it is determined whether i is smaller than the maximum value “N_clst”. As a result, if i < N_clst, the process returns to step S144 to move to the next news article cutout process. On the other hand, if i ≧ N_clst, the news article cutout processing operation is terminated.
[0051]
  As described above, in a news video, similar shots, namely the “video of a stationary newscaster”, exist between the news articles, and these shots are change points in the news video as a whole. Further, there is usually a telop frame immediately after the video of the newscaster.
[0052]
  Therefore, in the present embodiment, clustering using the similarity is performed on the set of cut point images immediately before the telop frames. Then, each interval between successive cut point images clst belonging to the first cluster is cut out as one news article.
[0053]
  FIG. 15 is a flowchart of a news article cutout processing operation, different from those of FIG. 12 and FIG. 14, performed by the analysis result integration unit 6 in step S4 of the news article cutout procedure shown in FIG. In this news article cutout processing operation, in addition to the cut point image and the telop frame, a news article is cut out based on the human face detected by the moving image analysis unit 4 according to the human face detection processing operation shown in FIG.
[0054]
  In step S151, the set of cut point images Cut: {c_i | i = 1, 2, ..., N_cut} registered in the cut point image table by the cut point image detection processing operation shown in FIG. is obtained. Furthermore, the set of telop frames Telop: {t_i | i = 1, 2, ..., N_telop} registered in the telop frame table by the telop frame detection processing operation shown in FIG. is obtained. Further, the set of frames Face: {f_i | i = 1, 2, ..., N_face} ⊂ Cut registered in the face frame table by the human face detection processing operation shown in FIG. is obtained. Then, the cut point images that immediately precede a telop frame t_i and in which a face is detected are extracted from the set of cut point images {c_i}.
[0055]
  In step S152, clustering is performed by the clustering processing operation shown in FIG. 13, and the set of cut point images belonging to the first cluster, Clst: {clst_i | i = 1, 2, ..., N_clst} ⊂ Face ⊂ Cut, is obtained. In step S153, the index i of the set {clst_i} is initialized to “1”.
[0056]
  In step S154, clst_i is set as the start point of a news article. In step S155, i is incremented. In step S156, clst_i is set as the end point of the news article. In this way, one news article is cut out. In step S157, it is determined whether i is smaller than the maximum value “N_clst”. As a result, if i < N_clst, the process returns to step S154 to move to the next news article cutout process. On the other hand, if i ≧ N_clst, the news article cutout processing operation is terminated.
[0057]
  As described above, immediately before the telop frame in a news video, there are similar shots of a person's face, namely the “video of a stationary newscaster”, and these shots are change points in the news video as a whole.
[0058]
  Therefore, in the present embodiment, clustering using the similarity is performed on the set of cut point images that are immediately before a telop frame and in which a human face is detected. Then, each interval between successive cut point images clst belonging to the first cluster is cut out as one news article.
[0059]
  FIG. 16 is a flowchart of a news article cutout processing operation, different from those of FIGS. 12, 14 and 15, performed by the analysis result integration unit 6 in step S4 of the news article cutout procedure shown in FIG. In this news article cutout processing operation, a news article is cut out based on the silent section detected by the voice analysis unit 5 according to the silent section detection processing operation shown in FIG.
[0060]
  In step S161, the set of silent sections Silent: {[s_i, e_i] | i = 1, 2, ..., N_silent} registered in the silent section table by the silent section detection processing operation shown in FIG. is obtained. Then, the index i of the set {[s_i, e_i]} is initialized to “1”.
[0061]
  In step S162, the end point e_i of the silent section is set as the start point of a news article. In step S163, i is incremented. In step S164, the start point s_i of the silent section is set as the end point of the news article. In this way, one news article is cut out. In step S165, it is determined whether i is smaller than the maximum value “N_silent”. As a result, if i < N_silent, the process returns to step S162 to move to the next news article cutout process. On the other hand, if i ≧ N_silent, the news article cutout processing operation is terminated.
[0062]
  As described above, there is a “video of a stationary newscaster” in the news video, and the newscaster is temporarily silent before beginning the explanation of the next news article. Therefore, in the present embodiment, the section between one silent section and the next is cut out as one news article.
[0063]
  FIG. 17 is a flowchart of the news article cutout processing operation different from that of FIGS. 12 and 14 to 16 performed by the analysis result integration unit 6 in step S4 of the news article cutout procedure shown in FIG. In the news article cutout processing operation, a news article is cut out based on the cut point image, the telop frame, the person's face, and the silent section.
[0064]
  In step S171, the set of cut point images Cut: {c_i | i = 1, 2, ..., N_cut} registered in the cut point image table by the cut point image detection processing operation shown in FIG. is obtained. Furthermore, the set of telop frames Telop: {t_i | i = 1, 2, ..., N_telop} registered in the telop frame table by the telop frame detection processing operation shown in FIG. is obtained. Further, the set of frames Face: {f_i | i = 1, 2, ..., N_face} ⊂ Cut registered in the face frame table by the human face detection processing operation shown in FIG. is obtained. Furthermore, the set of silent sections Silent: {[s_i, e_i] | i = 1, 2, ..., N_silent} registered in the silent section table by the silent section detection processing operation shown in FIG. is obtained. Then, the cut point images that immediately precede a telop frame t_i and in which a face is detected are extracted from the set of cut point images {c_i}.
[0065]
  In step S172, clustering is performed by the clustering processing operation shown in FIG. 13, and the set of cut point images belonging to the first cluster, Clst: {clst_i | i = 1, 2, ..., N_clst} ⊂ Face ⊂ Cut, is obtained. In step S173, the index i of the set {clst_i} is initialized to “1”.
[0066]
  In step S174, clst_i is set as the temporary start point start of the news article. In step S175, i is incremented. In step S176, clst_i is set as the temporary end point end of the news article. In step S177, it is determined whether or not there is a silent section near the temporary end point end. As a result, if it exists, the process proceeds to step S179, and if it does not exist, the process proceeds to step S178. In step S178, it is determined whether i is smaller than the maximum value “N_clst”. As a result, if i < N_clst, the process returns to step S175 to update the temporary end point. On the other hand, if i ≧ N_clst, the news article cutout processing operation is terminated.
[0067]
  In step S179, the end point of the silent section in the vicinity of the temporary start point start is detected and set as “S”. In step S180, the start point of the silent section near the temporary end point end is detected and set as “E”. In step S181, the section [S, E] is cut out as a news article. In step S182, it is determined whether i is smaller than the maximum value “N_clst”. As a result, if i < N_clst, the process returns to step S174 to move to the next news article cutout process. On the other hand, if i ≧ N_clst, the news article cutout processing operation is terminated.
[0068]
  As described above, immediately before the telop frame in a news video, there are similar shots of a person's face, namely the “video of a stationary newscaster”, and these shots are change points in the news video as a whole. The newscaster is also temporarily silent before beginning the explanation of the next news article.
[0069]
  Therefore, in the present embodiment, clustering using the similarity is performed on a set of cut point images that are immediately before the telop frame and from which a human face is detected. Then, a temporary start point start and a temporary end point end of one news article are obtained from each cut point image clst belonging to the first cluster. Then, the section between the silent section end point S near the temporary start point start and the silent section start point E near the temporary end point end is cut out as one news article.
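The integration of steps S174 to S182 can be sketched in Python as follows. The text does not define what “near” means, so the sketch introduces a hypothetical tolerance parameter near, measured in the same time unit as the cut points. Falling back to the temporary start point when no silent section lies near it is also an assumption, as are all function names.

```python
def nearest_silent(silent, t, near):
    """Return the silent section closest to time t within `near`, or None."""
    best = None
    for s, e in silent:
        # Distance from t to the section [s, e]; 0 if t lies inside it.
        dist = 0 if s <= t <= e else min(abs(t - s), abs(t - e))
        if dist <= near and (best is None or dist < best[0]):
            best = (dist, (s, e))
    return best[1] if best else None


def cut_articles_with_silence(clst, silent, near):
    """Cut [S, E] articles between silent sections near first-cluster cut points."""
    articles = []
    n = len(clst)
    i = 0                                                  # step S173 (0-based)
    while i < n - 1:
        start = clst[i]                                    # step S174
        i += 1                                             # step S175
        end_section = nearest_silent(silent, clst[i], near)  # steps S176, S177
        while end_section is None and i < n - 1:           # step S178 -> S175
            i += 1
            end_section = nearest_silent(silent, clst[i], near)
        if end_section is None:
            break                                          # no usable end point
        start_section = nearest_silent(silent, start, near)
        # Step S179; using `start` itself as a fallback is an assumption.
        S = start_section[1] if start_section else start
        E = end_section[0]                                 # step S180
        articles.append((S, E))                            # step S181
    return articles
```

When a cluster cut point has no silent section nearby, the sketch skips it and extends the article to the next cut point, which mirrors the return from step S178 to step S175.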
[0070]
  FIG. 18 is a flowchart of the CM removal processing operation for removing CM sections, which is executed before the analysis result integration unit 6 performs the news article cutout processing operation shown in FIGS. 12 and 14 to 17. Note that the flowchart of the CM removal processing operation shown in FIG. 18 detects and removes a 15-second CM. Therefore, to detect and remove a 30-second CM, the number “15” in the flowchart of the CM removal processing operation shown in FIG. 18 may be changed to “30”.
[0071]
  In step S191, the start cut point image number start is initialized to “0”. In step S192, the cumulative time between cut point images, interval, is initialized to “0”, and the end cut point image number end is initialized to “start + 1”. In step S193, among the cut point images detected by the cut point image detection processing operation shown in FIG. 3, the time between the “end−1”-th cut point image and the “end”-th cut point image (the time between cut point images [end−1, end]) is acquired and added to the cumulative time interval. In step S194, it is determined whether or not the cumulative time interval is “15”. As a result, if interval = 15, the process proceeds to step S195; otherwise, the process proceeds to step S197.
[0072]
  In step S195, the interval between the cut point images [end-1, end] is determined as the CM section, and the video data is deleted. Then, the start cut point image number start is updated to “end”. By doing so, the CM section is excluded from the processing target in subsequent news article cutout processing operations and the like.
[0073]
  In step S196, it is determined whether or not the start cut point image number start is smaller than N−1, where N is the total number of cut point images. If start < (N−1), the process returns to step S192 and moves on to the next CM detection process. If start ≧ (N−1), the CM removal processing operation ends.
[0074]
  In step S197, it is determined whether or not the start cut point image number start is smaller than N−1, where N is the total number of cut point images. If start < (N−1), the process proceeds to step S198; if start ≧ (N−1), the CM removal processing operation ends. In step S198, it is determined whether or not the cumulative time interval is “15” or more. If interval ≧ 15, the process proceeds to step S200; if interval < 15, the process proceeds to step S199. In step S199, the end cut point image number end is incremented, and the process returns to step S193 to continue the CM detection process. In step S200, the start cut point image number start is incremented, and the process returns to step S192 to move on to the next CM detection process. If it is determined in step S196 or S197 that start ≧ (N−1), the CM removal processing operation ends.
[0075]
  As described above, in the present embodiment, the cumulative time between the cut point images is sequentially acquired from the head for the cut point images detected by the cut point image detection processing operation shown in FIG. When the cumulative time between cut point images reaches 15 seconds, the video between the cut point images is deleted because it is a CM section. By doing so, the CM section can be excluded from the processing target in the subsequent news article cutout processing or the like.
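  The loop of FIG. 18 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name find_cm_sections, the list cut_times (cut point timestamps in seconds), and the tolerance tol are hypothetical names introduced here. The sketch records the span from start to end as a CM section instead of deleting video data, and it stops accumulating when end runs past the last cut point image, a guard that the flowchart description does not spell out.

```python
def find_cm_sections(cut_times, cm_length=15.0, tol=0.0):
    """Return (start, end) index pairs of cut points that bound a CM section.

    cut_times: timestamps in seconds of the detected cut point images (assumed input).
    cm_length: CM duration to look for; 15 as in FIG. 18, or 30 for 30-second CMs.
    tol: hypothetical tolerance; 0.0 reproduces the exact equality test of step S194.
    """
    n = len(cut_times)
    sections = []
    start = 0                                   # S191
    while start < n - 1:                        # S196 / S197
        interval = 0.0                          # S192
        end = start + 1
        matched = False
        while end < n:                          # assumed bounds guard
            interval += cut_times[end] - cut_times[end - 1]   # S193
            if abs(interval - cm_length) <= tol:               # S194
                sections.append((start, end))                  # S195
                matched = True
                break
            if interval > cm_length:            # S198: overshoot, give up on this start
                break
            end += 1                            # S199
        start = end if matched else start + 1   # S195 or S200
    return sections


if __name__ == "__main__":
    # Cut points at 40 s, 55 s, 70 s and 85 s bound three 15-second spans.
    print(find_cm_sections([0, 40, 45, 55, 70, 85, 120]))
```

  With tol left at 0.0 the sketch only accepts spans of exactly cm_length seconds, as in step S194; real timestamps would probably need a small nonzero tolerance, which is a design choice outside the source.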
[0076]
  In the present embodiment, the CM is detected by the above-described method. However, the CM may be detected by another method.
[0077]
  As described above, in the present embodiment, the video separation unit 2 separates the input video into a moving image portion and an audio portion. Then, the moving image analysis unit 4 analyzes the moving image, and the sound analysis unit 5 analyzes the sound.
[0078]
  The analysis of the moving image is performed as follows.
(1) When the hue histogram hist1 of the current frame changes by more than the threshold value TH with respect to the hue histogram hist2 of the previous frame, the current frame is detected as a cut point image.
(2) When a peak exists in both the projection histogram on the y-axis and the projection histogram on the x-axis of the edge image edge1, whose pixel values are the absolute differences between the V values of adjacent pixels, the current frame is detected as a telop frame.
(3) The input image is divided into regions of similar colors to extract features, and the state transition of each region is repeated according to the state transition model based on the feature amount of each region. If a region having a state label “face” finally exists, it is determined that a human face has been detected.
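  Item (1) above can be illustrated with a short Python sketch. The names hue_histogram, histogram_change and detect_cut_points are hypothetical, each frame is assumed to be given as a list of hue values in degrees, and the amount of change between hist1 and hist2 is assumed here to be the sum of absolute bin differences; the passage above only states that the change is compared with the threshold TH.

```python
def hue_histogram(hues, bins=16):
    """Count hue values (degrees, 0 to 360) into equal-width bins."""
    hist = [0] * bins
    for h in hues:
        index = min(bins - 1, int((h % 360) * bins / 360))
        hist[index] += 1
    return hist


def histogram_change(hist1, hist2):
    """Assumed change measure: sum of absolute differences between bins."""
    return sum(abs(a - b) for a, b in zip(hist1, hist2))


def detect_cut_points(frames, th):
    """Return indices of frames whose hue histogram changed by more than th."""
    cuts = []
    hist2 = None                      # histogram of the previous frame
    for index, hues in enumerate(frames):
        hist1 = hue_histogram(hues)   # histogram of the current frame
        if hist2 is not None and histogram_change(hist1, hist2) > th:
            cuts.append(index)
        hist2 = hist1
    return cuts


if __name__ == "__main__":
    frames = [[10] * 100, [12] * 100, [200] * 100, [201] * 100]
    print(detect_cut_points(frames, th=50))
```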
[0079]
  The voice analysis is performed as follows.
(4) If the variance Var of the power p in the voice section [sp, ep] is smaller than the threshold TH and the immediately preceding section is not a silent section, “sp” is set as the start point start of a silent section; if the immediately preceding section is a silent section, “ep” is set as the end point end of the silent section. Then, once the variance Var becomes equal to or greater than the threshold TH, the section [start, end] is detected as a silent section.
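  The rule in item (4) can be sketched in Python as follows. The function name detect_silent_sections, the fixed window length, and the use of sample indices as section boundaries are assumptions made for illustration. The sketch also sets end for the first low-variance window, a case the description above leaves implicit.

```python
from statistics import pvariance


def detect_silent_sections(power, window, th):
    """Return (start, end) sample-index pairs of silent sections.

    power: sequence of power values p (assumed input).
    window: number of samples per analysed section [sp, ep) (assumed).
    th: variance threshold TH.
    """
    sections = []
    start = end = None
    in_silence = False
    for sp in range(0, len(power) - window + 1, window):
        ep = sp + window
        var = pvariance(power[sp:ep])
        if var < th:
            if not in_silence:        # previous section was not silent
                start = sp
                in_silence = True
            end = ep                  # extend the current silent section
        elif in_silence:              # variance rose again: close the section
            sections.append((start, end))
            in_silence = False
    if in_silence:                    # input ended while still silent
        sections.append((start, end))
    return sections


if __name__ == "__main__":
    power = [5, 1, 6, 2] + [1, 1, 1, 1] * 2 + [4, 0, 5, 1]
    print(detect_silent_sections(power, window=4, th=0.5))
```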
[0080]
  Then, based on the moving image analysis result and the voice analysis result, the analysis result integration unit 6 cuts out a news article by one of the following methods.
[0081]
  (A) Frequency histograms Hist [i] and Hist [j], which represent the frequency with which the similarity Similarity (i, j) between cut point images exceeds the threshold value TH, are obtained over all cut point images, and clustering is performed so as to include the maximum frequency position Max. Then, the cut point images clst belonging to the first cluster are cut out as one news article.
[0082]
  Therefore, a news article can be cut out from a news video based on a cut point image (newscaster image) that is a change point of the hue histogram.
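  One possible reading of method (A) is sketched below in Python. It is an interpretation, not the clustering procedure of FIG. 13: the function name first_cluster and the similarity matrix input are hypothetical, and the first cluster is assumed to consist of the cut point image at the maximum frequency position Max together with every cut point image whose similarity to it exceeds TH.

```python
def first_cluster(similarity, th):
    """Return indices of the cut point images in the assumed first cluster.

    similarity: square matrix, similarity[i][j] is Similarity(i, j) (assumed input).
    th: similarity threshold TH.
    """
    n = len(similarity)
    if n == 0:
        return []
    # hist[i]: how many other cut point images are similar to image i.
    hist = [
        sum(1 for j in range(n) if j != i and similarity[i][j] > th)
        for i in range(n)
    ]
    max_pos = max(range(n), key=hist.__getitem__)   # maximum frequency position Max
    return [j for j in range(n) if j == max_pos or similarity[max_pos][j] > th]


if __name__ == "__main__":
    # Images 0, 2 and 3 resemble each other (e.g. newscaster shots); image 1 does not.
    sim = [
        [1.0, 0.1, 0.9, 0.8],
        [0.1, 1.0, 0.2, 0.1],
        [0.9, 0.2, 1.0, 0.85],
        [0.8, 0.1, 0.85, 1.0],
    ]
    print(first_cluster(sim, th=0.5))
```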
[0083]
  (B) Clustering using the similarity is performed on a set of cut point images immediately before the telop frame. Then, the cut point images clst belonging to the first cluster are cut out as one news article.
[0084]
  Therefore, based on the cut point image (newscaster image) and the telop frame, a news article can be cut out from the news video with higher accuracy.
[0085]
  (C) Clustering using the similarity is performed on a set of cut point images immediately before the telop frame and from which a face is detected. Then, the cut point images clst belonging to the first cluster are cut out as one news article.
[0086]
  Therefore, based on the telop frame and the cut point image (newscaster image) from which the face is detected, a news article can be cut out from the news video with higher accuracy.
[0087]
  (D) Cut out a silent section as one news article. Therefore, a news article can be cut out from a news video based on voice information (a silent section of a news caster).
[0088]
  (E) Clustering using the similarity is performed on a set of cut point images immediately before the telop frame and from which a face is detected. Then, a temporary start point start and a temporary end point end of one news article are obtained from each cut point image clst belonging to the first cluster. Then, a section between the silent section end point S near the temporary start point start and the silent section start point E near the temporary end point end is cut out as one news article.
[0089]
  Therefore, a news article can be cut out from the news video with higher accuracy based on the telop frame, the cut point image (news caster image) from which the face is detected, and the voice information (silent section of the news caster).
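  The boundary refinement of method (E) can be sketched in Python as follows. The function name cut_articles and the time-based inputs are hypothetical, and "near" is assumed to mean the smallest absolute time difference; the description above does not define how the nearby silent section is chosen.

```python
def cut_articles(anchor_times, silent_sections):
    """Return (S, E) time pairs, one per news article.

    anchor_times: times of the cut point images clst in the first cluster,
        in ascending order (assumed input).
    silent_sections: list of (start, end) times of detected silent sections.
    """
    if not silent_sections:
        return []
    articles = []
    # Consecutive anchors give the temporary start and end points of one article.
    for tmp_start, tmp_end in zip(anchor_times, anchor_times[1:]):
        # S: end point of the silent section nearest the temporary start point.
        s = min((sec[1] for sec in silent_sections), key=lambda t: abs(t - tmp_start))
        # E: start point of the silent section nearest the temporary end point.
        e = min((sec[0] for sec in silent_sections), key=lambda t: abs(t - tmp_end))
        if s < e:                     # skip degenerate spans
            articles.append((s, e))
    return articles


if __name__ == "__main__":
    anchors = [10, 60, 120]
    silences = [(8, 9.5), (58, 59.5), (118, 119)]
    print(cut_articles(anchors, silences))
```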
[0090]
  Furthermore, prior to performing the above-described news article cutout processing, the analysis result integration unit 6 searches the sequence of cut point images for spans between cut point images whose cumulative time is 15 seconds, determines such spans to be CMs, and deletes them. Therefore, the CM sections can be excluded from the processing target in the subsequent news article cutout processing and the like.
[0091]
  The news article cutout device of the present invention need not be configured to carry out all of the following methods: the news article cutout method based on cut point images that are change points of the hue histogram; the method based on cut point images and telop frames; the method based on telop frames and cut point images from which a face is detected; the method based on voice information (silent sections); and the method based on telop frames, cut point images from which a face is detected, and voice information (silent sections). It suffices to select appropriately from among these news article cutout methods.
[0092]
[Effect of the invention]
  As is clear from the above, in the news article cutout device of this invention, the video separation means separates the news video into a moving image portion and an audio portion. From the separated moving image, the cut point image detection means detects cut point images, the telop detection means detects telops, and the face detection means detects face images. The silence detection means detects silent portions from the separated audio, and the similarity calculation means calculates the similarity between cut point images that are located immediately before a telop and show a person's face. The article cutout means then selects, from the cut point images that are located immediately before a telop and show a face, those with a high similarity, and cuts out the span between silent portions in the vicinity of the selected cut point images as an article. A news article can therefore be cut out as the span between the silent portions near the change points of the video at which a similar person's face is shown immediately before a telop.
[0093]
  That is, according to this invention, using the telop video as a reference, the device accurately detects the moment at which the moving image changes to footage of a person, that is, a newscaster, and at which the newscaster is silent before the start of the next news article, so that news articles can be cut out more correctly.
[0094]
  Further, if the news article cutout device includes a CM removal means that, prior to cutting out articles, detects and removes the spans between cut point images that correspond to CMs from among the spans between the detected cut point images, CMs can be removed from the news video in advance. Therefore, only news articles are cut out in the subsequent article cutout by the article cutout means, and news articles can be cut out with high accuracy and fewer erroneous detections.
[Brief description of the drawings]
FIG. 1 is a block diagram of a news article clipping device according to the present invention.
FIG. 2 is a flowchart showing an outline of a news article cutout procedure executed by the news article cutout apparatus shown in FIG. 1;
FIG. 3 is a flowchart of a cut point image detection processing operation performed at the time of moving image analysis in FIG. 2;
FIG. 4 is a flowchart of a hue histogram generation processing operation executed in the cut point image detection processing operation shown in FIG. 3.
FIG. 5 is a flowchart of a telop frame detection processing operation performed at the time of moving image analysis in FIG. 2.
FIG. 6 is a diagram illustrating a relationship between a telop candidate region and a peak of a projection histogram.
FIG. 7 is a flowchart of an edge image generation processing operation executed in the telop frame detection processing operation shown in FIG. 5.
FIG. 8 is a flowchart of a projection histogram generation processing operation executed in the telop frame detection processing operation shown in FIG. 5.
FIG. 9 is a flowchart of a human face detection processing operation performed at the time of moving image analysis in FIG. 2;
FIG. 10 is a diagram illustrating an example of a state transition model.
FIG. 11 is a flowchart of a silent section detection processing operation performed at the time of voice analysis in FIG. 2.
FIG. 12 is a flowchart of a news article cutout processing operation performed by the analysis result integration unit in FIG. 1.
FIG. 13 is a flowchart of the clustering processing operation executed in the news article cutout processing operation shown in FIG. 12.
FIG. 14 is a flowchart of a news article cutout processing operation different from that of FIG. 12.
FIG. 15 is a flowchart of a news article cutout processing operation different from those of FIGS. 12 and 14.
FIG. 16 is a flowchart of news article cutout processing operation different from those of FIGS. 12, 14 and 15;
FIG. 17 is a flowchart of a news article segmenting process operation different from that in FIGS. 12 and 14 to 16;
FIG. 18 is a flowchart of a CM removal processing operation executed by the analysis result integration unit in FIG. 1.
[Explanation of symbols]
  1 ... A/D converter,
  2 ... Video separation unit,
  3 ... Memory,
  4 ... Moving image analysis unit,
  5 ... Voice analysis unit,
  6 ... Analysis result integration unit,
  7 ... Video storage unit.

Claims

Video input means;
Video separation means for separating the news video input by the video input means into audio and video;
A cut point image detecting means for detecting a cut point image that is a change point of the moving picture from the moving picture separated by the video separating means;
A telop detection means for detecting a telop from the separated video;
Face detection means for detecting a face image from the separated video;
Similarity calculation means for calculating the similarity between those cut point images that are located immediately before the telop detected by the telop detection means and in which a face is shown;
Silence detection means for detecting a silence part from the sound separated by the video separation means;
Article cutout means for selecting cut point images having a high similarity from among the cut point images that are located immediately before the telop and show a face, and for cutting out, as an article, the span between silent portions in the vicinity of the selected cut point images; a news article cutout device characterized by comprising the above means.

The news article cutout device according to claim 1,
further comprising commercial message removal means for detecting and removing, prior to the cutting out of articles by the article cutout means, the spans between cut point images that correspond to a commercial message from among the spans between the cut point images detected by the cut point image detection means.