JP3561942B2 - Shot detection method and representative image recording / display device

Info

Publication number: JP3561942B2
Application number: JP02650794A
Authority: JP
Inventors: 雪絵五島; 裕志赤堀; 眞藤本
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-02-24
Filing date: 1994-02-24
Publication date: 2004-09-08
Anticipated expiration: 2019-09-08
Also published as: JPH07236115A

Description

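[0001]
[Field of industrial application]
The present invention relates to a method of grouping moving images into units that follow the image content, for purposes such as quick viewing, retrieval, and editing of moving images, and to an apparatus that automatically extracts representative images from the grouped moving images and records and displays them.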
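[0002]
[Prior art]
Many techniques have been proposed for detecting breaks in moving images and for using such breaks in editing work.
[0003]
One method of automatically detecting breaks in a moving image from the video signal is the scene change detection technique disclosed in Japanese Patent Application Laid-Open No. H3-214364. It determines whether the scene has changed by comparing histograms between adjacent frames. A scene change here corresponds to the start or end point of shooting or recording, or to a joint where separate scenes were spliced together during editing. In addition, VISS (VHS Index Search System) automatically records a VISS signal on the videotape at the start of recording and uses it as a tag for purposes such as high-speed cueing.
[0004]
There is also a need for more flexible techniques that tag not only the recording start point but also the point of any desired image. For example, VISS also allows a VISS signal to be recorded at a scene the user wants to see, so the VISS signals recorded on the videotape can be used for a fast-forward playback mode called intro search. In intro search, whenever a VISS signal is found during fast-forwarding, the deck plays back for a fixed time and then resumes fast-forwarding, repeating this until the end of the tape.
[0005]
Furthermore, methods of automatically extracting representative images have been proposed to spare the user the effort of "selecting a desired image and adding a VISS signal". For example, Japanese Patent Application Laid-Open No. H5-147337 discloses an automatic still image extraction method that automatically extracts a representative image from one cut (the moving image shot continuously between a recording start operation and a recording end operation). The representative image is selected by evaluation based on the photographer's intention, the state of the captured image, and the state of the subject, and is used for quick viewing and retrieval of the moving image.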
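[0006]
[Problems to be solved by the invention]
However, the above automatic still image extraction method also extracts a representative image for each moving image between a recording start and end, so the number of representative images essentially depends on the number of recording start/end breaks contained in the whole moving image.
[0007]
For example, a moving image with many cuts, such as a movie, yields many representative images. Because each representative image is chosen from a small number of images, an image that represents the content of its cut well is likely to be chosen. In contrast, for footage shot continuously for a long time with a video camera, there are no recording start/end breaks no matter how much the image content changes, so hardly any representative images can be extracted. Consequently, looking at the representative images alone may not reveal the content of the whole moving image.
[0008]
As a concrete example, consider a case where two kinds of subjects are shot between the start and end of shooting. Images A to P in FIG. 2 are excerpts from a moving image shot over a long time. The photographer first shoots a "yellow car" (images A and B) and stops shooting; after starting again, the photographer shoots a "person in red clothes" (images C to F), pans the camera without stopping (images G to I), and shoots a "hut with a brown roof" (images J to N); shooting is stopped here, and after resuming, a "high-rise building" is shot (images O and P). The recording start/end breaks lie between images B and C and between images N and O, so images C to N are regarded as one group. Judging from the image content, however, the "person in red clothes" part and the "hut with a brown roof" part within images C to N are different scenes, and it is better to select a representative image for each.
[0009]
Thus, even when moving images are grouped at breaks created by shooting or editing operations, the degree to which the image content changes differs from group to group, so the number of representative images needed to express the content varies, and the conventional methods cannot cope with this. It is therefore necessary to group and handle moving images in units that follow the image content.
[0010]
In view of the above, an object of the present invention is to provide a technique for grouping moving images according to image content, and an apparatus that automatically extracts representative images from the grouped moving images and records and displays them.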
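[0011]
[Means for solving the problems]
To solve the above problems, the shot detection method of the present invention defines as a shot, within the moving image shot between the photographer's shooting start operation and shooting end operation, a moving image in which the photographer kept shooting a specific subject and a moving image that the photographer kept shooting with a specific angle of view, shooting condition, or shooting method; takes as input information at least one of camera operation information, such as zoom and shooting start operations of the imaging apparatus, and shooting state information obtained by processing signals from the camera's sensors during shooting; selects in advance rules for how the input information changes when the moving image transitions from one shot to another; detects, as a shot change degree, the degree to which the input information matches the change rules; and detects the shots in the moving image on the basis of at least one such shot change degree.
[0012]
The representative image recording/display apparatus of the present invention defines shots in the same way, namely as a moving image in which the photographer kept shooting a specific subject and a moving image that the photographer kept shooting with a specific angle of view, shooting condition, or shooting method, within the moving image shot between the photographer's shooting start operation and shooting end operation. The apparatus comprises: an image information output unit that includes at least one of camera operation information acquisition means, which captures camera operation information describing how the photographer operated the camera while shooting the moving image, and shooting state information acquisition means, which captures shooting state information obtained during shooting by processing signals from the camera's sensors, and that outputs the camera operation information or the shooting state information; a shot change degree detection unit that detects, as a shot change degree, the degree to which the output of the image information output unit matches preset change rules; a shot detection unit that detects the shots in the moving image on the basis of at least one output of the shot change degree detection unit; a representative image extraction unit that extracts a representative image from the images belonging to a shot found by the shot detection unit; a video information acquisition unit that captures the video signal of the moving image; and an image recording/display unit that receives from the video information acquisition unit the video signal of the representative image extracted by the representative image extraction unit and records it on an image recording medium or displays it on an image display device.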
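[0013]
[Operation]
In the above configuration, camera operation information and shooting state information are used as input information, rules for how the input information changes when the content of the moving image changes are selected in advance, and the change rules are compared with the input information. This makes it possible to detect, as shots, moving images in which the photographer kept shooting a specific subject and moving images shot continuously with a specific angle of view or shooting condition. As a result, moving images can be grouped into any number of blocks according to image content, without being bound to the unit that runs from recording start to recording end.
[0014]
Moreover, selecting representative images on the basis of the shots detected by the above method yields as many representative images as the degree of change in the content of the moving image requires, and recording and displaying the information of the extracted representative images allows the content of the whole moving image to be represented well with as few images as possible.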
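[0015]
[Embodiments]
(Embodiment 1)
First, a first embodiment of the shot detection method of the present invention is described. Here, a "shot" is a block of moving images that the photographer intentionally shot continuously as one scene while capturing the images of one cut, for example by continuing a camera operation such as panning or zooming, or by continuing to shoot a specific subject. In contrast, a "cut" is a block of moving images shot continuously between a recording start operation and a recording end operation on the camera.
[0016]
For example, dividing the moving image of FIG. 2 into cuts, the images up to image B form a cut showing the "yellow car", images C to N form a cut showing both the "person in red clothes" and the "hut with a brown roof", and a cut showing the "high-rise building" begins at image O. In terms of shots, on the other hand, the cut of images C to N contains the following two shots:
・the section in which the "person in red clothes" was shot in close-up (around images E to F)
・the section in which the "hut with a brown roof" was shot in a somewhat long shot (around images M to N)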
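[0017]
First, an outline of the shot detection method of the present invention is described with reference to FIG. 1. As shown there, the method is realized by three parts: an image information output unit 1, a shot change degree detection unit 2, and a shot detection unit 3.
[0018]
The image information output unit 1 outputs information obtained by processing the video signal, information on the camera operations performed by the user during shooting, and sensor output information. These three kinds of information are hereinafter referred to collectively as image processing information, camera operation information, and shooting state information, respectively. The three are described in detail later. FIG. 1 shows the high-frequency component (magnitude) 21 and the color histogram 22 as examples of image processing information 20, recording start/end 31 and zoom (magnification) 32 as examples of camera operation information 30, and panning speed 41 as an example of shooting state information 40.
[0019]
The shot change degree detection unit 2 takes the information from the image information output unit 1 as input and detects the degree to which the image content is changing. The shot detection unit 3 detects the sections of the moving image that are judged to be shots, on the basis of the results from the shot change degree detection unit 2.
[0020]
Next, the operation of the image information output unit 1, the shot change degree detection unit 2, and the shot detection unit 3 is described in detail.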
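[0021]
First, the image information output unit 1 is described with reference to FIG. 3. In the figure, 4 is a camera, 5 is a video signal, 20 is image processing information, 21 is the magnitude of the high-frequency component, 22 is a color histogram, 23 is a motion vector, 24 is a high-frequency component detection unit, 25 is a color histogram detection unit, 26 is a motion vector detection unit, 30 is camera operation information, 31 is a recording start/end button press signal, 32 is a zoom magnification, 33 is the auto/manual mode button setting, 40 is shooting state information, 41 is the output of an angular velocity sensor (the speed during panning), 42 is the lens focal length used to convert to zoom magnification, 43 is the output of an aperture opening sensor, and 44 is the focus distance. As shown in FIG. 3, the image information output unit 1 consists of a camera and a number of output terminals, and it successively outputs image processing information 20, camera operation information 30, and shooting state information 40 while the video camera is shooting.
[0022]
Image processing information is a general term for information extracted, automatically or with human involvement, from the video signal captured by the image sensor. FIG. 3 shows as examples the magnitude 21 of the high-frequency component of the video signal, which is computed for focus control, the color histogram 22 within the frame, and motion vectors 23 at various positions in the frame. All of this information is obtained from the video signal 5 supplied by the camera 4, in the high-frequency component detection unit 24, the color histogram detection unit 25, and the motion vector detection unit 26, respectively. Image processing information also includes inter-frame difference values, which measure the difference in luminance or color signals between frames, and information on the position and size of the subject region derived from the video signal.
[0023]
Camera operation information is a general term for information based on the button operations the user performed while shooting with the video camera. FIG. 3 shows as examples information 31 indicating the start and end points of shooting, derived from shooting start/end button operations, information 32 indicating the zoom magnification converted from zoom operations during shooting, and information 33 on the Auto/Manual setting.
[0024]
Shooting state information is a general term for information on the state of the camera during shooting as detected by sensors and the like. FIG. 3 shows as examples panning speed information 41 detected by an angular velocity sensor, lens focal length 42 indicating the zoom magnification during shooting, aperture opening information 43 detected by an aperture opening sensor, and focus distance 44.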
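[0025]
Next, the shot change degree detection unit 2 and the shot detection unit 3 are described. The description here is limited, by way of example, to the case where five kinds of information are input: high-frequency component (magnitude) 21, color histogram 22, recording start/end button press signal 31, zoom magnification 32, and panning speed 41.
[0026]
First, the principle by which the shot change degree detection unit 2 and the shot detection unit 3 detect shots is explained using the moving image of FIG. 2 as an example. FIG. 5 plots five kinds of information for each image in FIG. 2: recording start/end, zoom magnification, magnitude of the high-frequency component within the frame, color histogram, and panning speed. For the color histogram, only the frequencies of brown, red, and yellow-green are extracted from the many colors and displayed. The panning speed is limited to the horizontal direction. As the figure shows, in the shot sections (E to F, M to N) all of the information is stable at constant values, whereas in the sections where one shot gives way to another (C to E, F to M) at least one kind of information is changing. It follows that examining the degree of change of the input information should reveal whether the image currently being processed lies within a shot section or in a transition to another shot. Here, the confidence that "the image being processed is in transition to another shot" is called the shot change degree. The following describes a method that detects the degree of change of each kind of input information as that information's shot change degree, and identifies the shot sections from the shot change degrees of the individual kinds of information.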
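[0027]
FIG. 4 is a detailed block diagram of the shot change degree detection unit 2 and the shot detection unit 3. In the figure, 51 is a differential filter, 52 is an absolute value processing unit, 53 is a low-pass filter, 54 is a gain adjustment unit, 56 is a maximum value detection unit, 57 is a state determination unit, 58 is a shot section output unit, and 59 is a counter.
[0028]
First, the shot change degree detection unit 2 detects the shot change degree of each kind of input information. The shot change degree is normalized to a value in the range [0, max] (max: a constant), and a larger value indicates a higher likelihood that the image being processed is in transition to another shot.
[0029]
The method of detecting the shot change degree differs somewhat depending on the input information. For the high-frequency component (magnitude) 21, zoom magnification 32, and panning speed 41, the information is first passed through the differential filter 51, and the absolute value processing unit 52 takes the absolute value, giving the magnitude of the temporal variation of each kind of information. It is then passed through the low-pass filter 53 to reduce the influence of fine noise and detect only broad changes. Finally, the gain adjustment unit 54 normalizes the result by applying a gain specific to each kind of information, so that the output falls within the range of the shot change degree. For multidimensional parameters such as the color histogram, a correlation is computed between adjacent frames. In the case of the color histogram 22, the histogram correlation detection unit 55 performs a correlation computation such as a histogram difference. As with the other information, the result is passed through the low-pass filter 53 and normalized in the gain adjustment unit 54. Here, however, the normalization is adjusted so that a lower correlation produces a larger output. For the recording start/end button press signal 31, the gain adjustment unit 54 is adjusted to output max at a cut boundary and 0 elsewhere. In this way, the shot change degree detection unit 2 detects a shot change degree for each kind of input information.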
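The chain of differential filter 51, absolute value processing unit 52, low-pass filter 53, and gain adjustment unit 54 can be sketched roughly as follows. This is a minimal illustration only: the first-difference filter, the exponential smoothing used as the low-pass filter, the sum of absolute bin differences used as the histogram comparison, and the names `gain`, `alpha`, and `max_value` are all assumptions, since the text does not specify the filters.

```python
def shot_change_degree(signal, gain, max_value=1.0, alpha=0.3):
    """Scalar input (e.g. zoom magnification): returns one degree per sample."""
    degrees = []
    smoothed = 0.0
    previous = signal[0] if signal else 0.0
    for value in signal:
        variation = abs(value - previous)  # differential filter + absolute value (assumed first difference)
        smoothed = alpha * variation + (1.0 - alpha) * smoothed  # low-pass filter (assumed exponential smoothing)
        degrees.append(min(max_value, gain * smoothed))  # gain adjustment, clipped to [0, max_value]
        previous = value
    return degrees


def histogram_change_degree(histograms, gain, max_value=1.0, alpha=0.3):
    """Multidimensional input: a larger bin difference (lower correlation) gives a larger degree."""
    degrees = []
    smoothed = 0.0
    previous = histograms[0] if histograms else []
    for hist in histograms:
        difference = sum(abs(a - b) for a, b in zip(hist, previous))  # assumed histogram difference
        smoothed = alpha * difference + (1.0 - alpha) * smoothed
        degrees.append(min(max_value, gain * smoothed))
        previous = hist
    return degrees


def button_change_degree(cut_boundaries, max_value=1.0):
    """Recording start/end signal: max_value at a cut boundary, 0 elsewhere."""
    return [max_value if is_boundary else 0.0 for is_boundary in cut_boundaries]
```

In this sketch each kind of information gets its own `gain`, which plays the role of the information-specific normalization described above.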
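[0030]
The shot detection unit 3, in turn, detects shot sections on the basis of the shot change degrees of the individual kinds of information. First, the maximum value detection unit 56 compares all the shot change degrees output by the shot change degree detection unit 2 at the same instant and finds their maximum value M. The state determination unit 57 compares M with a predetermined threshold TH and, from the result, determines the state of the image currently being processed as follows.
[0031]
When M > TH, the image currently being processed is outside a shot
(in transition to another shot) ... (A)
When M ≤ TH, the image currently being processed is within a shot section ... (B)
The shot section output unit 58 performs the following processing according to the output (A or B) of the state determination unit 57.
・When the output of the state determination unit 57 is A:
Read the content C of the counter 59.
[0032]
If C > 0, output C and then reset the content of the counter 59 to 0.
・When the output of the state determination unit 57 is B:
Increment the content of the counter 59.
[0033]
The counter is assumed to be set to 0 in the initial state. The value C output at the end of a shot represents the number of processing steps performed while the shot lasted. Therefore, when the value C is output from the shot detection unit 3, the section of the shot that has just ended can be identified from that output.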
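The maximum detection, threshold decision, and counter logic above can be sketched as follows. The function name, the list-of-lists input layout, and the final flush of a shot still in progress when the input ends are assumptions added for illustration; the text only describes outputting C when a transition begins.

```python
def detect_shots(degree_tracks, threshold):
    """degree_tracks: one list of shot change degrees per kind of information,
    all of equal length. Returns (start_index, length) for each detected shot."""
    shots = []
    counter = 0  # counter 59, initially 0
    length = len(degree_tracks[0]) if degree_tracks else 0
    for t in range(length):
        m = max(track[t] for track in degree_tracks)  # maximum value detection unit 56
        if m > threshold:
            # State A: outside a shot. Output the count C if a shot just ended.
            if counter > 0:
                shots.append((t - counter, counter))
                counter = 0
        else:
            # State B: inside a shot section.
            counter += 1
    if counter > 0:
        # Assumption: report a shot that is still running at the end of the input.
        shots.append((length - counter, counter))
    return shots


tracks = [
    [0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1],
    [0.0, 0.2, 0.0, 0.0, 0.0, 0.9, 0.0],
]
print(detect_shots(tracks, threshold=0.5))  # [(1, 2), (4, 1), (6, 1)]
```

The start index is recovered as the current position minus C, which is how the output C identifies the section of the shot that has just ended.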
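[0034]
To examine how the configuration described above behaves on actual images, the flow of information is traced using the moving image of FIG. 2 as an example. FIG. 6 shows how the outputs of the blocks inside the shot change degree detection unit 2 and the shot detection unit 3 change over time. The upper graphs correspond to the shot change degree detection unit 2; in them, the solid lines represent the output of the differential filter 51 or the histogram correlation detection unit 55, and the broken lines represent the final output of the shot change degree detection unit 2 (the shot change degree). In the lower graph, which corresponds to the shot detection unit 3, the outputs of the shot change degree detection unit 2 (broken lines), their maximum (thick solid line), and the threshold are overlaid. The input information handled is limited to five kinds, namely recording start/end 31, zoom magnification 32, magnitude of the high-frequency component within the frame 21, color histogram 22, and panning speed 41; furthermore, only three colors of the color histogram and only the horizontal panning speed are displayed. In FIG. 6, the shot change degree for each kind of input information takes large values in the non-shot sections (C to E, F to M), and the maximum of the shot change degrees exceeds the threshold in those sections. Accordingly, when images F and N are processed, the shot detection unit 3 outputs the number of processing steps (L1, L2) in sections E to F and M to N, respectively, and L1 and L2 identify E to F and M to N as shot sections.
[0035]
As described above, by taking camera operation information, image processing information, or shooting state information as input and detecting the shot change degree for each kind of information, the shot sections can be identified and the moving image can be grouped into units that follow the image content.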
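[0036]
In the above embodiment, FIG. 3 was used to describe, as a concrete example of the image information output unit 1, a method in which the camera outputs information directly during shooting. Alternatively, the same processing can be performed by recording the information from the camera on a recording medium during shooting and reading the information back from the medium when shot detection is performed. This method is described with reference to FIG. 7. In the figure, 4 is a camera, 5 is a video signal, 6 is a recording medium, 7 is an encoding processing unit, 8 is a decoding processing unit, 9 is an image processing unit, 20 is image processing information, 30 is camera operation information, and 40 is shooting state information; the portion inside the broken-line frame corresponds to the image information output unit 1.
[0037]
The operation of each part is as follows. The recording medium 6 has a switch that changes over between recording and playback. Here, "recording" is the period during which various kinds of information are recorded on the recording medium while the camera 4 is shooting, and "playback" is the period during which the information recorded on the medium is output. When the user shoots, the switch of the recording medium 6 is set to the "recording" mode. The video signal 5, image processing information, camera operation information, and shooting state information output from the camera 4 during shooting are encoded and formatted in the encoding processing unit 7 and stored on the recording medium 6. Later, when shots are to be detected, the switch of the recording medium 6 is set to the "playback" mode. The information stored on the recording medium 6 is read out as the individual kinds of information by the decoding processing unit 8 and sent to the shot change degree detection unit 2 as the output of the image information output unit 1. The same holds if the video signal 5 output from the decoding processing unit 8 is processed in the image processing unit 9 and the result is output together with the other image processing information.
[0038]
Thus, even when the information is first stored on a recording medium, it can be output in the same way as when it is obtained directly from the camera.
[0039]
The embodiment above described cases where the image processing information, camera operation information, and shooting state information are output directly from the camera or stored once on a recording medium and read out later; in either case, all the necessary information was originally supplied by the camera. However, even when some or all of this information is not supplied, information equivalent to the missing information can be obtained by processing the video signal output from the camera or the recording medium, and shots can be detected on the basis of the information so obtained. This is described in detail in the shot detection method of the second embodiment below.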
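[0040]
(Embodiment 2)
The second embodiment obtains image processing information, camera operation information, and shooting state information from the video signal alone and detects shots from the information obtained. The overall configuration is the same as FIG. 1 used in the first embodiment, but the concrete configuration of the image information output unit 1 differs. The following describes, with reference to FIG. 8, how the image information output unit 1 of this embodiment detects and outputs each kind of information from the video signal. The description is limited to five kinds of information: the high-frequency component and color histogram for image processing information, the recording start/end information and zoom magnification for camera operation information, and the panning speed for shooting state information.
[0041]
In FIG. 8, 5 is a video signal, 26 is a high-frequency component detection unit, 21 is high-frequency component (magnitude) information, 27 is a color histogram detection unit, 22 is color histogram information, 60 is an inter-frame difference value detection unit, 61 is a memory, 62 is a change amount detection unit, 63 is a cut change detection unit, 31 is recording start/end information, 64 is a camera work detection unit, 65 is a motion vector detection unit, 66 is a camera work parameter estimation unit, 32 is zoom magnification information, and 41 is panning speed information. The operation of each part in this configuration is described below.
[0042]
First, for the image processing information, the high-frequency component detection unit 24 and the color histogram detection unit 25 detect the high-frequency component 21 and the color histogram 22, respectively. These units are the same as the corresponding parts of FIG. 3 and were already described in the first embodiment, so their description is omitted.
[0043]
Next, the method of detecting the recording start/end 31 information, which is part of the camera operation information, is described. This information is detected by the inter-frame difference value detection unit 60 and the cut change detection unit 63; the inter-frame difference value detection unit 60 in turn consists of a memory 61 for delaying the video signal by one frame and a change amount detection unit 62 that finds the difference in the video signal between consecutive frames. The signal used for the difference between consecutive frames is, for example, the luminance value or the RGB values; the change amount detection unit 62 computes the difference of the video signal between consecutive frames pixel by pixel, sums the per-pixel difference values, and outputs the sum as the inter-frame difference value. The cut change detection unit 63 applies threshold processing to the inter-frame difference value found by the inter-frame difference value detection unit 60. That is, it compares the inter-frame difference value with a predetermined threshold, and when the value is larger than the threshold, it considers the image content to have changed greatly between the two frames and judges that a cut change occurred there. Since a cut change detected here corresponds to the start or end of shooting, the signal indicating the presence or absence of a cut change is equivalent to the recording start/end information that would be obtained as a camera button press signal.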
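The inter-frame difference and threshold test can be sketched as follows. Frames are represented here as nested lists of luminance values, and the per-pixel difference is taken as an absolute difference; both choices, along with the function names, are assumptions made for illustration.

```python
def inter_frame_difference(previous_frame, frame):
    """Sum of per-pixel absolute luminance differences between two frames."""
    return sum(
        abs(a - b)
        for row_prev, row_cur in zip(previous_frame, frame)
        for a, b in zip(row_prev, row_cur)
    )


def detect_cut_changes(frames, threshold):
    """Return the indices of frames judged to start a new cut."""
    cuts = []
    for i in range(1, len(frames)):
        # frames[i - 1] plays the role of the one-frame delay memory 61.
        if inter_frame_difference(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
print(detect_cut_changes([dark, dark, bright, bright], threshold=100))  # [2]
```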
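[0044]
Next, the method of detecting the zoom magnification 32 information in the camera operation information and the panning speed 41 information in the shooting state information is described. These two kinds of information are detected in the camera work detection unit 64, which is divided into a motion vector detection unit 65 and a camera work parameter estimation unit 66. The motion vector detection unit 65 sets a plurality of coordinate positions within the frame and detects a motion vector at each position by comparing pixel values with the adjacent frame. The camera work parameter estimation unit 66 uses the detected motion vectors to estimate camera work such as changes in the horizontal and vertical direction of the camera (panning, tilting), changes in the camera's angle of view (zooming), and changes in the camera's horizontal, vertical, and front-back position (tracking, booming, dollying). Because the zoom magnification 32 and the panning speed 41 are estimated as part of the camera work, information equivalent to camera operation information and shooting state information is obtained. The detailed operation of the motion vector detection unit 65 and the camera work parameter estimation unit 66 is described in, for example, Japanese Patent Application Laid-Open No. H4-317267, and is not repeated here.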
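As a rough illustration of how pan and zoom might be estimated from motion vectors, the sketch below fits a simple model by least squares. It is not the method of the cited publication, whose details are not given here. The model assumed is that a point at position (x, y), measured from the frame center, moves by (pan_x + zoom_rate * x, pan_y + zoom_rate * y) between adjacent frames; the function and parameter names are hypothetical.

```python
def estimate_pan_and_zoom(positions, vectors):
    """Least-squares fit of (pan_x, pan_y, zoom_rate) to motion vectors.

    positions: (x, y) coordinates relative to the frame center.
    vectors:   (u, v) motion vectors detected at those positions.
    """
    n = len(positions)
    if n == 0:
        raise ValueError("at least one motion vector is required")
    mean_x = sum(x for x, _ in positions) / n
    mean_y = sum(y for _, y in positions) / n
    mean_u = sum(u for u, _ in vectors) / n
    mean_v = sum(v for _, v in vectors) / n
    numerator = 0.0
    denominator = 0.0
    for (x, y), (u, v) in zip(positions, vectors):
        xc, yc = x - mean_x, y - mean_y
        numerator += xc * (u - mean_u) + yc * (v - mean_v)
        denominator += xc * xc + yc * yc
    # With a single position (or coincident positions) zoom cannot be separated from pan.
    zoom_rate = numerator / denominator if denominator > 0.0 else 0.0
    pan_x = mean_u - zoom_rate * mean_x
    pan_y = mean_v - zoom_rate * mean_y
    return pan_x, pan_y, zoom_rate
```

Under this model `pan_x` corresponds to a horizontal panning speed in pixels per frame, and `zoom_rate` is a per-frame expansion rate rather than the zoom magnification itself; accumulating it over time would be one way to approximate a magnification.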
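[0045]
As described above, even when camera operation information or shooting state information cannot be obtained from the camera, equivalent information can be estimated by processing the video signal. This embodiment did not cover information such as the gamma correction value, color temperature, and backlit or excessively front-lit states, but such information can also be obtained by processing the video signal. The configuration and technique for detecting shots from the information once obtained are the same as in the first embodiment, and their description is omitted.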
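[0046]
(Embodiment 3)
Next, a third embodiment of the shot detection method of the present invention is described. This embodiment concerns the shot change degree detection unit 2 of FIG. 1.
[0047]
In the first embodiment, as shown in FIG. 4, the amount of variation of each kind of information was found by a differential filter or histogram correlation processing, and the shot change degree was detected from it. This exploits the property that "each kind of information changes greatly during the period between the photographer finishing one shot and starting the next". However, examining each kind of input information individually reveals phenomena other than large changes that are also relevant to detecting shots.
[0048]
For example, if the lens is suddenly turned toward a different subject while one subject is being shot, focus is lost and an out-of-focus state may persist for a short time; during this time the high-frequency component is small. Hence the rule "when the high-frequency component is small, the image is in a transition period to another shot" holds. Likewise, panning from one subject to another should be judged as a shot transition, whereas when the camera pans to follow a slowly moving subject, the panning period should be recognized as lying within one shot section. Because the same panning operation may or may not be a shot transition, the shot section sometimes cannot be identified even if the panning period is detected from the variation in panning speed as in the first embodiment. However, panning to move to another shot is fast and ends quickly, whereas tracking a subject involves slow panning at a nearly constant speed over a long time, so the two can be distinguished by panning speed and panning duration.
[0049]
Thus, by examining in advance what variation pattern each kind of information shows when a shot transition occurs and detecting the shot change degree according to whether the information matches that pattern, shot detection performance can be improved. This embodiment describes a method of detecting the shot change degree on the basis of, in addition to the amount of variation of each kind of information, the magnitude of each kind of information and the time for which each remains in a given state.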
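[0050]
FIG. 9 shows the method of detecting the shot change degree for two kinds of information, the high-frequency component 21 and the panning speed 41. In the figure, the differential filter 51 and the absolute value processing unit 52 are the same as the corresponding parts in FIG. 4; 67 and 69 are estimation units, and 68 is a duration detection unit. The operation of the shot change degree detection unit 2 in this configuration is described below.
[0051]
First, the detection method for the high-frequency component 21 is described. As in the first embodiment, the differential filter 51 and the absolute value processing unit 52 find the magnitude of the temporal variation of the high-frequency component. The estimation unit 67 maps the high-frequency component and the output of the absolute value processing unit 52 through a preset function and outputs the result as the shot change degree. A concrete example of the mapping function is described with reference to FIG. 10. FIG. 10 shows a mapping with two inputs (the high-frequency component and the output of the absolute value processing unit 52) and one output (the shot change degree). As the figure shows, the output range is [0, max], and the function is set to output a large value when the variation in the high-frequency component is large or when the high-frequency component is small. With such a mapping function set in advance, a unique output is obtained as the shot change degree for the two inputs.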
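One possible shape for such a two-input mapping is sketched below. The actual function of FIG. 10 is not reproduced in the text, so the piecewise-linear terms, the use of `max` to combine them, and the scale parameters `hf_scale` and `var_scale` are all assumptions; the sketch only preserves the stated behavior, namely an output in [0, max] that is large when the variation is large or the high-frequency component is small.

```python
def focus_shot_change_degree(high_freq, variation, hf_scale, var_scale, max_value=1.0):
    """Map (high-frequency magnitude, |temporal variation|) to a shot change degree.

    hf_scale:  high-frequency magnitude regarded as fully in focus (assumed parameter).
    var_scale: variation regarded as a definite transition (assumed parameter).
    """
    # Close to 1 when the image is blurred (small high-frequency component).
    low_focus_term = max(0.0, 1.0 - high_freq / hf_scale)
    # Close to 1 when the high-frequency component is changing quickly.
    variation_term = min(1.0, variation / var_scale)
    # "Large variation OR small high-frequency component" modeled as a maximum.
    return max_value * max(low_focus_term, variation_term)


print(focus_shot_change_degree(100.0, 0.0, hf_scale=100.0, var_scale=10.0))  # 0.0
print(focus_shot_change_degree(0.0, 0.0, hf_scale=100.0, var_scale=10.0))    # 1.0
```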
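[0052]
Next, the detection method for the panning speed 41 is described. In the same way, the differential filter 51 and the absolute value processing unit 52 find the magnitude of the temporal variation of the panning speed. In addition, the duration detection unit 68 finds the panning duration from the output of the differential filter 51. The panning duration here is the time for which the photographer continued panning at a nearly constant speed. Concretely, after the temporal variation D of the panning speed has left the tolerance range for fluctuation, the first instant at which D re-enters the range is detected as the panning start point, and the time from the panning start point until D first leaves the tolerance range again is taken as the panning duration. The tolerance range is assumed to be set in advance to [-WIDE, WIDE] (WIDE: a predetermined positive number).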
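The duration detection rule can be sketched as a small state machine. The function name, the sample-index time base, and the decision not to report a pan that is still in progress when the input ends are assumptions for illustration.

```python
def panning_durations(variation, wide):
    """variation: differential filter output D for the panning speed, one value per sample.
    wide: half-width of the tolerance range [-wide, wide].
    Returns (start_index, duration_in_samples) for each completed constant-speed pan."""
    results = []
    seen_outside = False  # D has left the tolerance range at least once
    start = None          # panning start point, or None when no pan is being timed
    for t, d in enumerate(variation):
        inside = -wide <= d <= wide
        if start is None:
            if not inside:
                seen_outside = True
            elif seen_outside:
                start = t  # first sample back inside the range: panning start point
        elif not inside:
            results.append((start, t - start))  # D left the range: the pan has ended
            start = None
            seen_outside = True
    return results


d_values = [0.0, 0.0, 5.0, 0.1, -0.1, 0.0, -4.0, 0.0, 0.0]
print(panning_durations(d_values, wide=0.5))  # [(3, 3)]
```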
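[0053]
The estimation unit 69 maps three values, namely the panning speed, the output of the absolute value processing unit 52, and the panning duration, through a preset function and outputs the result as the shot change degree. The mapping function here extends the two-input, one-output relationship of FIG. 10 to three inputs and one output, and is set to output a large value "when the panning speed is high", "when the variation in panning speed is large", or "when the panning duration is short".
[0054]
As described above, by examining in advance what variation pattern each kind of information shows when a shot transition occurs and detecting the shot change degree according to whether the information matches that pattern, shot detection performance can be improved.
[0055]
In the above embodiment, preset two-input/one-output and three-input/one-output mapping functions were described as the way the estimation units 67 and 69 detect the shot change degree, but the input-to-output conversion may instead use, for example, fuzzy inference, a multiple linear function, or a neural network.
[0056]
The above embodiment limited the input information to two kinds, the high-frequency component and the panning speed, but the same applies to other information.
[0057]
The above embodiment also detected the shot change degree on the basis of the amount of variation of each kind of information, its magnitude, and the time for which it remains in a given state, but other parameters may likewise be used depending on the input information.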
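[0058]
(Embodiment 4)
Next, an embodiment of the representative image recording/display apparatus of the present invention is described.
[0059]
The representative image recording/display apparatus of the present invention uses the shot detection method described above to extract representative images from a moving image and record and display them. Here, a representative image is an image selected for use as an image that represents the moving image in browsing, retrieval, and the like, and it is selected so that image content such as the photographer's intention, the state of the captured image, and the state of the subject is well expressed.
[0060]
FIG. 11 shows the configuration of the representative image recording/display apparatus of the present invention. In the figure, the image information output unit 1, shot change degree detection unit 2, shot detection unit 3, image processing information 20, camera operation information 30, and shooting state information 40 are the same as in the block diagram of FIG. 1, as already described in the embodiments of the shot detection method. Further, 10 is a representative image extraction unit, 70 is an in-shot representative information detection unit, 71 is a memory, 72 is a representative information comparison unit, 11 is a video signal output unit, and 12 is an image recording/display unit.
[0061]
The operation of each part is as follows. The image information output unit 1 outputs image processing information 20, camera operation information 30, and shooting state information 40, and the shot change degree detection unit 2 detects the shot change degree for each kind of information. The shot detection unit 3 identifies, on the basis of the multiple shot change degrees, the sections of images that the photographer shot as single shots.
[0062]
The representative image extraction unit 10 consists of the in-shot representative information detection unit 70, the memory 71, and the representative information comparison unit 72, and finds the representative images of the moving image on the basis of the shot sections found by the shot detection unit 3. Its operation is described below.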
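[0063]
First, for each shot detected by the shot detection unit 3, the in-shot representative information detection unit 70 receives the image processing information 20, camera operation information 30, and shooting state information 40 of the images belonging to that shot and computes the average of each kind of information. The averages are output as "information representing the shot". Focusing on the i-th shot detected by the shot detection unit 3, the information representing the shot is expressed as follows.
[0064]
[Equation 1]

[0065]
Here, IN is the number of kinds of information, among the image processing information 20, camera operation information 30, and shooting state information 40, that are used as input information. For example, when the input information consists of the high-frequency component, color histogram, recording start/end, zoom magnification, and panning speed, IN = 5, and m(1, i) is the average of the high-frequency component, m(2, i) that of the color histogram, m(3, i) that of recording start/end, m(4, i) that of the zoom magnification, and m(5, i) that of the panning speed. Accordingly, the representative information m(k, i) for the k-th (0 ≤ k ≤ IN) kind of input information is given by
[0066]
[Equation 2]

[0067]
that is, the average of that kind of information over the images belonging to the i-th shot.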
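Next, the memory 71 stores the representative information M(i) of each shot detected by the in-shot representative information detection unit 70. The representative information comparison unit 72 compares the representative information M(i) between shots, selects from all the detected shots the "shots from which a representative image should be taken", and extracts a representative image from each selected shot. The processing procedure of the representative information comparison unit 72 is as follows.
(1) The first detected shot is selected as a "shot from which a representative image should be taken", and its representative information M(1) is assigned to the variable Mpre in the memory 71, so that Mpre = M(1).
(2) When the second shot is detected, its representative information M(2) is compared with Mpre and the distance between the two pieces of representative information is found. The distance between pieces of representative information is the distance between them when each kind of input information is assigned one axis to form an IN-dimensional space and the two are plotted in that space. For example, the distance Dis(a, b) between the representative information of the a-th shot and that of the b-th shot is expressed by the following equation.
[0068]
[Equation 3]

[0069]
(3) Dis(1, 2) is compared with a predetermined threshold E and the following processing is performed.
・When Dis(1, 2) > E:
The second shot is selected as a "shot from which a representative image should be taken", and from the images belonging to the shot, the image located at the center, F(2, N2/2), is extracted as the representative image. The representative information M(2) is also assigned to the variable Mpre in the memory 71.
・When Dis(1, 2) ≤ E:
The second shot is judged to be similar in image content to the first shot.
(4) The same processing as in (2) and (3) is applied to the shots detected thereafter. That is,
(4-1) When the i-th shot is detected, M(i) is compared with Mpre and the distance Dis(i, pre) between the two pieces of representative information is found.
(4-2) Dis(i, pre) is compared with the threshold E.
・When Dis(i, pre) > E:
The i-th shot is selected as a "shot from which a representative image should be taken", and from the images belonging to the shot, the image located at the center, F(i, Ni/2), is extracted as the representative image. The representative information M(i) is assigned to the variable Mpre in the memory 71.
・When Dis(i, pre) ≤ E:
The i-th shot is judged to be similar in image content to the shot whose information is held in Mpre.
(4-3) Return to 4-1 and perform the same processing on the (i+1)-th shot.
[0070]
The processing of 4-1, 4-2, and 4-3 continues to the last detected shot, and the representative images of the moving image are extracted.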
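The selection procedure of steps (1) to (4-3) can be sketched as follows. Because the equation images are not available in the text, this sketch assumes that M(i) is the vector of per-information averages and that Dis is the Euclidean distance in the IN-dimensional space. It also assumes that the first shot yields its middle image as a representative image and that the middle image is taken at 0-based index N // 2; the function names are hypothetical.

```python
import math


def shot_representative_info(frame_infos):
    """Average each kind of information over the frames of one shot (assumed form of M(i))."""
    n = len(frame_infos)
    return tuple(sum(column) / n for column in zip(*frame_infos))


def distance(a, b):
    """Assumed form of Dis(a, b): Euclidean distance between two representative vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_representative_frames(shots, threshold_e):
    """shots: list of shots, each a list of per-frame information tuples.
    Returns (shot_index, frame_index) for each extracted representative image."""
    selected = []
    m_pre = None  # variable Mpre held in memory 71
    for i, frame_infos in enumerate(shots):
        m_i = shot_representative_info(frame_infos)
        if m_pre is None or distance(m_i, m_pre) > threshold_e:
            selected.append((i, len(frame_infos) // 2))  # image at the center of the shot
            m_pre = m_i
        # Otherwise the shot is judged similar to the shot held in Mpre and is skipped.
    return selected


shots = [
    [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    [(0.1, 0.0), (0.1, 0.0)],
    [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0), (5.0, 5.0)],
]
print(select_representative_frames(shots, threshold_e=1.0))  # [(0, 1), (2, 2)]
```

Note that Mpre is updated only when a shot is selected, so a slow drift across many similar shots is always measured against the last shot that produced a representative image.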
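[0071]
The representative image recording/display unit 12 consists of a display that shows video signals and other information, or a recording medium that records various kinds of information. It receives information about the representative images from the representative information comparison unit 72 and the corresponding video signals from the video signal output unit 11, and displays or records the representative images.
[0072]
In this way, by comparing representative information between shots and extracting representative images from the shots selected on the basis of the comparison, all of the image content can be expressed with as few representative images as possible. Displaying and recording these representative images and using them for browsing and retrieval of moving images therefore makes the image content easier to grasp.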
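[0073]
[Effects of the invention]
As described above, the shot detection method of the present invention detects the degree to which the image content changes using camera operation information and shooting state information as input information, and can thereby detect, as shots, moving images in which the photographer kept shooting a specific subject and moving images shot continuously with a specific angle of view or shooting condition.
[0074]
According to the method of the present invention, the moving image is classified into a separate shot each time the image content changes, so many shots are detected when "the story develops quickly" and few when "much the same picture continues for a long time"; even when the way the image content changes differs from one moving image to another, the moving image can be divided into units that follow the image content.
[0075]
The representative image display/recording apparatus of the present invention extracts representative images using the shots found by the above method, which makes it possible to cover a large amount of image content with few representative images, and recording and displaying the information of the extracted representative images enables efficient retrieval, quick viewing, and editing of moving images.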
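[Brief description of the drawings]
[FIG. 1] Diagram showing the overall configuration of the shot detection method of the present invention
[FIG. 2] Diagram explaining the relationship between shots and cuts in a moving image
[FIG. 3] Diagram showing the configuration of the image information output unit in the first embodiment of the shot detection method of the present invention
[FIG. 4] Diagram showing the detailed configuration of the shot change degree detection unit and the shot detection unit in the first embodiment of the shot detection method of the present invention
[FIG. 5] Diagram showing how the output of the image information output unit changes over time in the shot detection method of the present invention
[FIG. 6] Diagram showing how the outputs of the internal blocks of the shot change degree detection unit and the shot detection unit change over time in the first embodiment of the shot detection method of the present invention
[FIG. 7] Diagram showing a configuration of the image information output unit different from FIG. 3 in the first embodiment of the shot detection method of the present invention
[FIG. 8] Diagram showing the configuration of the image information output unit in the second embodiment of the shot detection method of the present invention
[FIG. 9] Diagram showing the configuration of the shot change degree detection unit in the third embodiment of the shot detection method of the present invention
[FIG. 10] Diagram showing the two-input, one-output mapping function of the estimation unit in the third embodiment of the shot detection method of the present invention
[FIG. 11] Diagram showing the configuration of an embodiment of the representative image recording/display apparatus of the present invention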
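[Explanation of reference signs]
1 Image information output unit
2 Shot change degree detection unit
3 Shot detection unit
4 Camera
5 Video signal
6 Recording medium
7 Encoding processing unit
8 Decoding processing unit
9 Image processing unit
10 Representative image extraction unit
11 Video signal output unit
12 Image recording/display unit
20 Image processing information
21 High-frequency component
22 Color histogram
23 Motion vector
24 High-frequency component detection unit
25 Color histogram detection unit
26 Motion vector detection unit
30 Camera operation information
31 Recording start/end
32 Zoom magnification
33 Auto/manual mode
40 Shooting state information
41 Panning speed
42 Lens focal length
43 Aperture opening sensor output
44 Focus distance
51 Differential filter
52 Absolute value processing unit
53 Low-pass filter
54 Gain adjustment unit
56 Maximum value detection unit
57 State determination unit
58 Shot section output unit
59 Counter
60 Inter-frame difference value detection unit
61 Memory
62 Change amount detection unit
63 Cut change detection unit
64 Camera work detection unit
65 Motion vector detection unit
66 Camera work parameter estimation unit
67, 69 Estimation unit
68 Duration detection unit
70 In-shot representative information detection unit
71 Memory
72 Representative information comparison unit
[0001]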
[Industrial applications]
The present invention relates to a method of grouping moving images into units that follow the image content, for purposes such as quick viewing, retrieval, and editing of moving images, and to an apparatus that automatically extracts representative images from the grouped moving images and records and displays them.
[0002]
[Prior art]
Conventionally, many techniques have been proposed for detecting a break in a moving image and for using a break in a moving image for editing work.
[0003]
As a method of automatically detecting a break of a moving image from a video signal, for example, there is a scene change detection method disclosed in Japanese Patent Application Laid-Open No. 3-214364. This is to determine whether a scene has changed by comparing histograms between adjacent frames. The scene change here corresponds to the start / end point of photographing / recording or the joint when different scenes are linked by editing work. In VISS (VHS Index Search System), a VISS signal is automatically recorded on a video tape at the start of recording, and is used as a tag for high-speed cueing.
[0004]
On the other hand, there is also a need for a flexible technique of tagging not only a starting point of recording but also a point of a desired image. For example, in the above-mentioned VISS, a VISS signal can be recorded even for a scene that the user wants to see, so that fast forward reproduction called intro search can be performed using the VISS signal recorded on the video tape. The intro search repeats the operation of, when a VISS signal is found during fast-forwarding, setting a reproduction state for a certain time and then fast-forwarding again until the end of the tape.
[0005]
In addition, methods of automatically extracting representative images have been proposed to spare the user the effort of "selecting a desired image and adding a VISS signal". For example, Japanese Patent Application Laid-Open No. H5-147337 discloses an automatic still image extraction method that automatically extracts a representative image from one cut (the moving image shot continuously between a recording start operation and a recording end operation). The representative image is selected by evaluation based on the photographer's intention, the state of the captured image, and the state of the subject, and is used for quick viewing and retrieval of the moving image.
[0006]
[Problems to be solved by the invention]
However, the above automatic still image extraction method also extracts a representative image for each moving image between a recording start and end, so the number of representative images essentially depends on the number of recording start/end breaks contained in the whole moving image.
[0007]
For example, in the case of a moving image with many cuts such as a movie, the number of representative images increases. Since this representative image is selected from a small number of images, it is easy to select an image that well represents the image content of each cut. On the other hand, in the case of a video that has been shot for a long time with a video camera, no matter how the image content changes, there is no break between recording start / end, so that a representative image can hardly be extracted. Therefore, the contents of the entire moving image may not be known even when only the representative image is viewed.
[0008]
As a concrete example, consider a case where two kinds of subjects are shot between the start and end of shooting. Images A to P in FIG. 2 are excerpts from a moving image shot over a long time. The photographer first shoots a "yellow car" (images A and B) and stops shooting; after starting again, the photographer shoots a "person in red clothes" (images C to F), pans the camera without stopping (images G to I), and shoots a "hut with a brown roof" (images J to N); shooting is stopped here, and after resuming, a "high-rise building" is shot (images O and P). The recording start/end breaks lie between images B and C and between images N and O, so images C to N are regarded as one group. Judging from the image content, however, the "person in red clothes" part and the "hut with a brown roof" part within images C to N are different scenes, and it is better to select a representative image for each.
[0009]
Thus, even when moving images are grouped at breaks created by shooting or editing operations, the degree to which the image content changes differs from group to group, so the number of representative images needed to express the content varies, and the conventional methods cannot cope with this. It is therefore necessary to group and handle moving images in units that follow the image content.
[0010]
In view of the above, it is an object of the present invention to provide a technique for combining moving images according to image contents, and an apparatus for automatically extracting a representative image based on the combined moving images and recording and displaying the representative image.
[0011]
[Means for Solving the Problems]
To solve the above problems, the shot detection method of the present invention defines as a shot, within the moving image shot between the photographer's shooting start operation and shooting end operation, a moving image in which the photographer kept shooting a specific subject and a moving image that the photographer kept shooting with a specific angle of view, shooting condition, or shooting method; takes as input information at least one of camera operation information, such as zoom and shooting start operations of the imaging apparatus, and shooting state information obtained by processing signals from the camera's sensors during shooting; selects in advance rules for how the input information changes when the moving image transitions from one shot to another; detects, as a shot change degree, the degree to which the input information matches the change rules; and detects the shots in the moving image on the basis of at least one such shot change degree.
[0012]
The representative image recording/display apparatus of the present invention defines shots in the same way, namely as a moving image in which the photographer kept shooting a specific subject and a moving image that the photographer kept shooting with a specific angle of view, shooting condition, or shooting method, within the moving image shot between the photographer's shooting start operation and shooting end operation. The apparatus comprises: an image information output unit that includes at least one of camera operation information acquisition means, which captures camera operation information describing how the photographer operated the camera while shooting the moving image, and shooting state information acquisition means, which captures shooting state information obtained during shooting by processing signals from the camera's sensors, and that outputs the camera operation information or the shooting state information; a shot change degree detection unit that detects, as a shot change degree, the degree to which the output of the image information output unit matches preset change rules; a shot detection unit that detects the shots in the moving image on the basis of at least one output of the shot change degree detection unit; a representative image extraction unit that extracts a representative image from the images belonging to a shot found by the shot detection unit; a video information acquisition unit that captures the video signal of the moving image; and an image recording/display unit that receives from the video information acquisition unit the video signal of the representative image extracted by the representative image extraction unit and records it on an image recording medium or displays it on an image display device.
[0013]
[Action]
In the above configuration, camera operation information and shooting state information are used as input information, rules for how the input information changes when the content of the moving image changes are selected in advance, and the change rules are compared with the input information. This makes it possible to detect, as shots, moving images in which the photographer kept shooting a specific subject and moving images shot continuously with a specific angle of view or shooting condition. As a result, moving images can be grouped into any number of blocks according to image content, without being bound to the unit that runs from recording start to recording end.
[0014]
Moreover, selecting representative images on the basis of the shots detected by the above method yields as many representative images as the degree of change in the content of the moving image requires, and recording and displaying the information of the extracted representative images allows the content of the whole moving image to be represented well with as few images as possible.
[0015]
[Embodiments]
(Embodiment 1)
First, a first embodiment of the shot detection method of the present invention is described. Here, a "shot" is a block of moving images that the photographer intentionally shot continuously as one scene while capturing the images of one cut, for example by continuing a camera operation such as panning or zooming, or by continuing to shoot a specific subject. In contrast, a "cut" is a block of moving images shot continuously between a recording start operation and a recording end operation on the camera.
[0016]
For example, dividing the moving image of FIG. 2 into cuts, the images up to image B form a cut showing the "yellow car", images C to N form a cut showing both the "person in red clothes" and the "hut with a brown roof", and a cut showing the "high-rise building" begins at image O. In terms of shots, on the other hand, the cut of images C to N contains the following two shots:
・the section in which the "person in red clothes" was shot in close-up (around images E to F)
・the section in which the "hut with a brown roof" was shot in a somewhat long shot (around images M to N)
[0017]
First, an outline of the shot detection method of the present invention will be described with reference to FIG. As shown in the figure, the shot detection method of the present invention is realized by three parts: an image information output unit 1, a shot change degree detection unit 2, and a shot detection unit 3.
[0018]
The image information output unit 1 outputs information obtained by processing the video signal, information on the camera operations performed by the user during shooting, and sensor output information. These three kinds of information are hereinafter referred to collectively as image processing information, camera operation information, and shooting state information, respectively. The three are described in detail later. FIG. 1 shows the high-frequency component (magnitude) 21 and the color histogram 22 as examples of image processing information 20, recording start/end 31 and zoom (magnification) 32 as examples of camera operation information 30, and panning speed 41 as an example of shooting state information 40.
[0019]
The shot change degree detection unit 2 receives information from the image information output unit 1 and detects a change degree of image content. The shot detection unit 3 detects a section determined as a shot in the moving image based on the result from the shot change degree detection unit 2.
[0020]
Subsequently, the operations of the image information output unit 1, the shot change degree detection unit 2, and the shot detection unit 3 will be described in detail.
[0021]
First, the image information output unit 1 is described with reference to FIG. 3. In the figure, 4 is a camera, 5 is a video signal, 20 is image processing information, 21 is the magnitude of the high-frequency component, 22 is a color histogram, 23 is a motion vector, 24 is a high-frequency component detection unit, 25 is a color histogram detection unit, 26 is a motion vector detection unit, 30 is camera operation information, 31 is a recording start/end button press signal, 32 is a zoom magnification, 33 is the auto/manual mode button setting, 40 is shooting state information, 41 is the output of an angular velocity sensor (the speed during panning), 42 is the lens focal length used to convert to zoom magnification, 43 is the output of an aperture opening sensor, and 44 is the focus distance. As shown in FIG. 3, the image information output unit 1 consists of a camera and a number of output terminals, and it successively outputs image processing information 20, camera operation information 30, and shooting state information 40 while the video camera is shooting.
[0022]
Here, the image processing information is a general term for information that is extracted automatically or with human involvement based on a video signal captured by an image sensor. FIG. 3 shows, as an example, the magnitude 21 of the high-frequency component of the video signal obtained for performing the focus control, the color histogram 22 in the screen, and the motion vectors 23 in various parts of the screen. All of this information is obtained by inputting the video signal 5 from the camera 4 and by the high-frequency component detector 24, the color histogram detector 25, and the motion vector detector 26, respectively. In addition, the image processing information also includes an inter-frame difference value obtained by calculating a difference between a luminance signal and a chrominance signal between frames, or information on the position and size of a subject area from a video signal.
[0023]
Further, the camera operation information is a general term for information based on button operations performed by a user when shooting with a video camera. In FIG. 3, as an example, information 31 indicating a start / end point of shooting by operating a button of shooting start / end, information 32 indicating a zoom magnification converted from a zoom operation during shooting, and setting of Auto / Manual Information 33 is shown.
[0024]
Shooting state information is a general term for information on the state of the camera during shooting as detected by sensors and the like. FIG. 3 shows as examples panning speed information 41 detected by an angular velocity sensor, lens focal length 42 indicating the zoom magnification during shooting, aperture opening information 43 detected by an aperture opening sensor, and focus distance 44.
[0025]
Next, the shot change degree detection unit 2 and the shot detection unit 3 will be described. Here, as an example, the description is limited to the case where five pieces of information are input: the magnitude of the high-frequency component 21, the color histogram 22, the recording start / end button press signal 31, the zoom magnification 32, and the panning speed 41.
[0026]
First, taking the moving image of FIG. 2 as an example, the principle by which the shot change degree detection unit 2 and the shot detection unit 3 detect a shot will be described. FIG. 5 plots five pieces of information for each image of FIG. 2, namely, recording start / end, zoom magnification, magnitude of the high-frequency component in the screen, color histogram, and panning speed. For the color histogram, only the frequencies of brown, red, and yellow-green are extracted from the many colors and displayed. The panning speed is limited to the horizontal speed. As can be seen from the figure, within the shot sections (E to F, M to N) all information is stable at a constant value, whereas in the sections where one shot changes to another (C to E, F to M) at least one piece of information changes. Therefore, by examining the degree of change of the input information, it can be determined whether the image currently being processed is within a shot section or is shifting to another shot. Here, the degree of certainty that "the image being processed is shifting to another shot" is referred to as the shot change degree. Hereinafter, a method will be described in which the degree of change of each piece of input information is detected as the shot change degree of that information, and the shot sections are specified from the shot change degrees of the individual pieces of information.
[0027]
FIG. 4 is a specific configuration diagram of the shot change degree detection unit 2 and the shot detection unit 3. In the figure, 51 is a differential filter, 52 is an absolute value processing unit, 53 is a low-pass filter, 54 is a gain adjustment unit, 56 is a maximum value detection unit, 57 is a state determination unit, 58 is a shot section output unit, and 59 is a counter.
[0028]
First, the shot change degree detection unit 2 detects the shot change degree of each piece of input information. Here, the shot change degree is normalized to a value in the range [0, max] (max: constant), and a larger value indicates a higher possibility that the image being processed is shifting to another shot.
[0029]
The method of detecting the shot change degree differs slightly depending on the input information. For the magnitude of the high-frequency component 21, the zoom magnification 32, and the panning speed 41, the signal is first passed through the differential filter 51 and its absolute value is taken by the absolute value processing unit 52, which gives the magnitude of the temporal variation of each piece of information. Next, a low-pass filter 53 is applied in order to reduce the influence of fine noise and detect only global changes. Finally, the gain adjustment unit 54 normalizes the result by multiplying it by a gain specific to each piece of information, so that the output takes a value within the range of the shot change degree. For multidimensional parameters such as a color histogram, a correlation calculation between adjacent frames is performed. In the case of the color histogram 22, the histogram correlation detection unit 55 performs a correlation calculation such as a histogram difference. As with the other information, the result is passed through a low-pass filter 53 and normalized by a gain adjustment unit 54. The normalization here is adjusted so that a larger value is output as the correlation becomes lower. For the recording start / end button press signal 31, the gain adjustment unit 54 adjusts the output to max at a cut point and to 0 elsewhere. In this way, the shot change degree detection unit 2 detects the shot change degree for each piece of input information.
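The chain described above (differential filter, absolute value, low-pass filter, gain adjustment) can be sketched roughly as follows. This is only an illustrative sketch: the first-order difference, the moving-average low-pass filter, the window length, the gain values, and the choice max = 1.0 are all assumptions not specified in the text, and the function names are hypothetical.

```python
MAX_DEGREE = 1.0  # "max" in the text; the value 1.0 is an assumption


def differential(signal):
    """First-order difference, standing in for the differential filter 51."""
    return [0.0] + [b - a for a, b in zip(signal, signal[1:])]


def low_pass(signal, window=3):
    """Causal moving average, standing in for the low-pass filter 53."""
    out = []
    for t in range(len(signal)):
        chunk = signal[max(0, t - window + 1):t + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def shot_change_degree(signal, gain, window=3):
    """Differentiate, rectify, smooth, then scale and clip to [0, MAX_DEGREE]."""
    magnitude = [abs(d) for d in differential(signal)]
    return [min(MAX_DEGREE, gain * s) for s in low_pass(magnitude, window)]


def histogram_change_degree(histograms, gain, window=3):
    """Histogram difference between adjacent frames; a larger value means lower correlation."""
    diffs = [0.0] + [
        sum(abs(x - y) for x, y in zip(h0, h1))
        for h0, h1 in zip(histograms, histograms[1:])
    ]
    return [min(MAX_DEGREE, gain * s) for s in low_pass(diffs, window)]


# Hypothetical zoom magnification sequence with a zoom operation in the middle.
zoom = [1.0, 1.0, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
degrees = shot_change_degree(zoom, gain=2.0)
```

In this sketch the degree stays at 0 while the zoom magnification is constant and rises only around the zoom operation, which is the behavior the text attributes to the detection unit.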
[0030]
On the other hand, the shot detecting section 3 detects a section of a shot based on the degree of change in the shot for each piece of information. First, the maximum value detection section 56 compares all shot change degrees output from the shot change degree detection section 2 at the same time, and obtains the maximum value M. The state determination unit 57 compares the obtained maximum value M with a predetermined threshold value TH, and determines the state of the image currently being processed as follows based on the comparison result.
[0031]
When M > TH, the image currently being processed is outside the shot section (shifting to another shot) ... (A)
When M ≦ TH, the image currently being processed is within the shot section ... (B)
The shot section output unit 58 performs the following processing according to the output (A or B) of the state determination unit 57.
When the output of the state determination unit 57 is A, the contents C of the counter 59 are read.
[0032]
If C> 0, C is output, and then the content of the counter 59 is reset to 0.
When the output of the state determination unit 57 is B, the contents of the counter 59 are incremented.
[0033]
However, it is assumed that the counter is set to 0 in the initial state. The value C output at the end of the shot indicates the number of times processing has been performed during the continuation of the shot. Therefore, when the value C is output from the shot detection unit 3, it is possible to specify the section of the shot that ended immediately before based on the output.
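The maximum value detection, state determination, and counter handling described above can be sketched as follows. The function name, the input layout (one sequence of shot change degrees per piece of information), and the sample values are hypothetical and serve only as illustration.

```python
def detect_shot_sections(degrees_per_info, threshold):
    """Return (time_index, C) pairs, one for each shot that has just ended.

    degrees_per_info: list of equal-length sequences, one per piece of input
    information, holding the shot change degree at each time step.
    """
    counter = 0          # counter 59, set to 0 in the initial state
    outputs = []
    for t, degrees in enumerate(zip(*degrees_per_info)):
        m = max(degrees)             # maximum value detection unit 56
        if m > threshold:            # state (A): shifting to another shot
            if counter > 0:
                outputs.append((t, counter))
                counter = 0
        else:                        # state (B): within a shot section
            counter += 1
    return outputs


# Hypothetical shot change degrees for two pieces of information.
zoom_degree = [0.0, 0.0, 0.0, 0.9, 0.9, 0.0, 0.0, 0.0]
pan_degree = [0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.8, 0.0]
sections = detect_shot_sections([zoom_degree, pan_degree], threshold=0.5)
```

Each output (t, C) indicates that the C time steps immediately before t formed one shot section. As in the text, a value is output only when state (A) is entered, so a shot still in progress at the end of the input produces no output in this sketch.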
[0034]
Here, in order to examine how the above-described configuration operates on an actual image, the flow of information will be described using the moving image in FIG. 2 as an example. FIG. 6 is a diagram showing the temporal change in the outputs of the blocks inside the shot change degree detection unit 2 and the shot detection unit 3. The graphs in the upper part of the figure correspond to the shot change degree detection unit 2; the solid line represents the output of the differential filter 51 or the histogram correlation detection unit 55, and the broken line represents the final output of the shot change degree detection unit 2 (the shot change degree). In the graph in the lower part of FIG. 6, which corresponds to the shot detection unit 3, the outputs of the shot change degree detection unit 2 (broken lines), their maximum value (thick solid line), and the threshold value are shown superimposed. The input information handled is limited to five items: recording start / end 31, zoom magnification 32, magnitude of the high-frequency component 21 in the screen, color histogram 22, and panning speed 41. The color histogram is limited to three colors, and the panning speed is shown only for the horizontal direction. In FIG. 6, the shot change degree of each piece of input information takes a large value in the sections other than the shots (C to E, F to M). It can also be seen that the maximum value of the shot change degree exceeds the threshold value in the sections other than the shots. Therefore, when processing the images F and N, the shot detection unit 3 outputs the number of processing operations (L1, L2) in the sections E to F and M to N, respectively, so that the sections E to F and M to N can be specified as shot sections.
[0035]
As described above, by inputting the camera operation information, the image processing information, or the shooting state information and detecting the shot change degree for each piece of information, the shot sections can be specified, and the moving image can be grouped into units corresponding to the image content.
[0036]
In the above embodiment, as a specific example of the image information output unit 1, a method of outputting information directly from the camera during shooting has been described with reference to FIG. 3. As another method, the same processing can be performed by recording the information from the camera on a recording medium during shooting and reading the information from the recording medium when performing shot detection. This method will be described with reference to FIG. 7. In the figure, 4 is a camera, 5 is a video signal, 6 is a recording medium, 7 is an encoding processing unit, 8 is a decoding processing unit, 9 is an image processing unit, 20 is image processing information, 30 is camera operation information, and 40 is shooting state information; the inside of the broken-line frame corresponds to the image information output unit 1.
[0037]
The operation of each unit will be described below. The recording medium 6 has a switch that switches between recording and playback. Here, "recording" refers to the period during which various kinds of information are recorded on the recording medium while shooting with the camera 4, and "playback" refers to the period during which the various kinds of information recorded on the recording medium are output. First, when the user shoots, the switch of the recording medium 6 is set to the "recording" mode. The video signal 5, image processing information, camera operation information, and shooting state information output from the camera 4 during shooting are subjected to encoding and format matching in the encoding processing unit 7 and stored in the recording medium 6. Thereafter, when shot detection is performed, the switch of the recording medium 6 is set to the "playback" mode. The information stored in the recording medium 6 is read out and separated into the respective pieces of information by the decoding processing unit 8, and sent to the shot change degree detection unit 2 as the output of the image information output unit 1. The same applies if the video signal 5 output from the decoding processing unit 8 is processed by the image processing unit 9 and the result is output together with the other image processing information.
[0038]
As described above, even if various information is once stored in the recording medium, the various information can be output in a manner similar to that obtained directly from the camera.
[0039]
In the above embodiment, the cases where the image processing information, the camera operation information, and the shooting state information are output directly from the camera, or are once stored in the recording medium and read out later, have been described. In both cases, all the necessary information was originally supplied by the camera. However, even when some or all of this information is not provided, information corresponding to the missing information can be obtained by processing the video signal output from a camera or a recording medium, and a shot can be detected based on the obtained information. This will be described in detail in the shot detection method of the second embodiment below.
[0040]
(Example 2)
In the second embodiment, the image processing information, camera operation information, and shooting state information are obtained from the video signal alone, and a shot is detected from the obtained information. The overall configuration of the present embodiment is the same as that of FIG. 1 used in the first embodiment, but the specific configuration of the image information output unit 1 is different. Hereinafter, a method of detecting and outputting the various types of information from the video signal in the image information output unit 1 of the present embodiment will be described with reference to FIG. 8. The description here is limited to five pieces of information: the high-frequency component and color histogram as image processing information, the recording start / end information and zoom magnification as camera operation information, and the panning speed as shooting state information.
[0041]
In FIG. 8, 5 is a video signal, 24 is a high-frequency component detection unit, 21 is information on the magnitude of the high-frequency component, 25 is a color histogram detection unit, 22 is color histogram information, 60 is an inter-frame difference value detection unit, 61 is a memory, 62 is a change amount detection unit, 63 is a cut change detection unit, 31 is recording start / end information, 64 is a camera work detection unit, 65 is a motion vector detection unit, 66 is a camera work parameter estimation unit, 32 is information on the zoom magnification, and 41 is information on the panning speed. The operation of each unit in the above configuration will be described below.
[0042]
First, as for the image processing information, the high frequency component detection unit 24 and the color histogram detection unit 25 detect the information of the high frequency component 21 and the color histogram 22, respectively. Here, the high-frequency component detection unit 24 and the color histogram detection unit 25 are the same as the respective units in FIG. 3 and have already been described in the first embodiment, and thus description thereof will be omitted.
[0043]
Next, a method for detecting the recording start / end information 31 of the camera operation information will be described. The recording start / end information is detected by the inter-frame difference value detection unit 60 and the cut change detection unit 63. The inter-frame difference value detection unit 60 is composed of a memory 61 for delaying the video signal by one frame and a change amount detection unit 62 for obtaining the difference in the video signal between consecutive frames. The change amount detection unit 62 calculates the difference in the video signal between consecutive frames pixel by pixel, using luminance values, RGB values, or the like, obtains the sum of the per-pixel difference values, and outputs it as the inter-frame difference value. The cut change detection unit 63 performs threshold processing on the inter-frame difference value obtained by the inter-frame difference value detection unit 60. That is, a predetermined threshold value is compared with the inter-frame difference value, and when the inter-frame difference value is larger than the threshold value, the image content is considered to have changed greatly between the two frames, and it is determined that a cut change occurred at that point. Since the cut change detected here corresponds to the start and end of shooting, the signal indicating the presence or absence of a cut change corresponds to the recording start / end information obtained as a camera button press signal.
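The inter-frame difference and threshold processing described above can be sketched as follows. Frames are modeled as two-dimensional lists of luminance values, and the use of the absolute difference per pixel is an assumption (the text only refers to a sum of per-pixel difference values); the function names and the threshold are hypothetical.

```python
def inter_frame_difference(prev_frame, frame):
    """Sum of per-pixel absolute luminance differences between two frames."""
    return sum(
        abs(a - b)
        for row_prev, row_cur in zip(prev_frame, frame)
        for a, b in zip(row_prev, row_cur)
    )


def detect_cut_changes(frames, threshold):
    """Return one boolean per frame: True where a cut change is judged to occur."""
    cuts = []
    prev = None  # plays the role of the one-frame delay memory 61
    for frame in frames:
        if prev is None:
            cuts.append(False)  # no previous frame to compare with
        else:
            cuts.append(inter_frame_difference(prev, frame) > threshold)
        prev = frame
    return cuts


# Three hypothetical 2x2 luminance frames: a small change, then a large one.
frames = [
    [[10, 10], [10, 10]],
    [[11, 10], [10, 10]],
    [[200, 200], [200, 200]],
]
cuts = detect_cut_changes(frames, threshold=100)
```

The resulting boolean sequence plays the role of the recording start / end information 31 when no button press signal is available from the camera.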
[0044]
Next, a method for detecting the zoom magnification 32 of the camera operation information and the panning speed 41 of the shooting state information will be described. These two pieces of information are detected by the camera work detection unit 64, which is further divided into a motion vector detection unit 65 and a camera work parameter estimation unit 66. First, the motion vector detection unit 65 sets a plurality of coordinate positions in the screen and detects a motion vector at each coordinate position by comparing pixel values with those of the adjacent frame. Based on the detected motion vectors, the camera work parameter estimation unit 66 estimates camera work such as changes in the horizontal and vertical direction of the camera (panning and tilting), changes in the angle of view of the camera (zooming), and changes in the horizontal, vertical, and front-back position of the camera (tracking, booming, and dollying). Since the zoom magnification 32 and the panning speed 41 are estimated as part of the camera work, information corresponding to the camera operation information and the shooting state information can be obtained. The detailed operations of the motion vector detection unit 65 and the camera work parameter estimation unit 66 are described in, for example, Japanese Patent Application Laid-Open No. 4-317267, and thus description thereof will be omitted.
[0045]
As described above, even when the camera operation information and the shooting state information cannot be obtained from the camera, the corresponding information can be estimated by processing the video signal. Although information such as the γ correction value, the color temperature, and the state of backlighting or excessive front lighting is not described in the present embodiment, such information can also be obtained by processing the video signal. Once the above information has been acquired, the configuration and method for detecting a shot based on it are the same as in the first embodiment, and a description thereof will be omitted.
[0046]
(Example 3)
Next, a third embodiment of the shot detection method of the present invention will be described. This embodiment relates to the shot change degree detection unit 2 of FIG.
[0047]
In the first embodiment, as shown in FIG. 4, a variation amount of each information is obtained by a differential filter and a histogram correlation process, and a shot change degree is detected. This utilizes the property that "each information changes greatly during the period from when the photographer finishes taking one shot to when taking another shot". However, when examining each piece of input information, it can be seen that there are other phenomena related to the detection of shots, in addition to the phenomena in which the information greatly changes.
[0048]
For example, if the lens is suddenly pointed at another subject while a certain subject is being photographed, the focus may be lost and the out-of-focus state may continue for a short time. During this time, the high-frequency component is small. Therefore, the rule that "when the high-frequency component is small, the image is in a transition period to another shot" holds. Likewise, panning performed to move from one subject to another should be judged as a shot transition, whereas when the camera pans to follow a slowly moving subject, the panning period should be recognized as lying within one shot section. Thus, the same panning operation may or may not correspond to a shot transition, so even if the panning period can be detected from the fluctuation amount of the panning speed as in the first embodiment, it may not be possible to identify the shot section. However, panning for a transition to another shot is fast and short, whereas panning that tracks a subject is slow, nearly constant in speed, and continues for a long time, so the two can be distinguished by the panning speed and the panning duration.
[0049]
As described above, shot detection performance can be improved by examining in advance what kind of variation pattern each piece of information exhibits when a shot shifts, and detecting the shot change degree based on whether or not the information matches that variation pattern. In the present embodiment, a method will be described in which the shot change degree is detected based on the magnitude of each piece of information and the time during which each piece of information remains in a predetermined state, in addition to the amount of change in each piece of information.
[0050]
FIG. 9 shows a method of detecting the degree of shot change with respect to two pieces of information of the high frequency component 21 and the panning speed 41. In the figure, a differential filter 51 and an absolute value processing section 52 are the same as the respective sections in FIG. 4, 67 and 69 are estimating sections, and 68 is a duration detecting section. Hereinafter, the operation of the shot change degree detection unit 2 in the above configuration will be described.
[0051]
First, a detection method for the high-frequency component 21 will be described. As in the first embodiment, the magnitude of the temporal variation of the high-frequency component is obtained by the differential filter 51 and the absolute value processing unit 52. The estimating unit 67 maps the high-frequency component and the output of the absolute value processing unit 52 using a function set in advance, and outputs the result as the shot change degree. A specific example of the mapping function will be described with reference to FIG. 10. FIG. 10 shows the mapping relationship between two inputs (the high-frequency component and the output of the absolute value processing unit 52) and one output (the shot change degree). As can be seen from the figure, the range of the output is [0, max], and the function is set to output a large value when the fluctuation of the high-frequency component is large or when the high-frequency component is small. By setting such a mapping function in advance, a unique output for the two inputs is obtained as the shot change degree.
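One possible shape of such a two-input, one-output mapping is sketched below. The actual function of FIG. 10 is not reproduced here; the piecewise-linear form, the parameters hf_low and var_gain, and the choice max = 1.0 are all assumptions made only to illustrate the stated property (a large output when the fluctuation is large or when the high-frequency component is small).

```python
MAX_DEGREE = 1.0  # "max" in the text; the value 1.0 is an assumption


def clip(x):
    """Limit a value to the output range [0, MAX_DEGREE]."""
    return max(0.0, min(MAX_DEGREE, x))


def estimate_hf_degree(hf, hf_variation, hf_low=0.2, var_gain=2.0):
    """Hypothetical mapping for the estimating unit 67.

    hf:           magnitude of the high-frequency component
    hf_variation: output of the absolute value processing unit 52
    hf_low:       level below which the image is treated as out of focus (assumed)
    var_gain:     gain applied to the fluctuation (assumed)
    """
    small_term = clip(MAX_DEGREE * (1.0 - hf / hf_low))   # large when hf is small
    variation_term = clip(var_gain * hf_variation)        # large when hf fluctuates
    return max(small_term, variation_term)
```

With these assumed parameters, a sharp and stable image (hf = 1.0, no fluctuation) maps to 0, while a fully defocused image (hf = 0.0) or a strongly fluctuating one maps to max.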
[0052]
Next, a detection method for the panning speed 41 will be described. In this case as well, the magnitude of the temporal fluctuation of the panning speed is obtained by the differential filter 51 and the absolute value processing unit 52. Further, the duration detecting unit 68 determines the panning duration based on the result from the differential filter 51. Here, the panning duration is the time during which the photographer continues panning at a substantially constant speed. As a specific method of detecting the panning duration, first, the point in time at which the temporal fluctuation amount D of the panning speed, having left the vibration allowable range, first falls within the allowable range again is detected as the panning start time. Then, the time from the start time until D first deviates from the vibration allowable range is determined as the panning duration. The vibration allowable range is set in advance to [-WIDE, WIDE] (WIDE: a predetermined positive number).
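One reading of the duration detection above is sketched below: the duration at each time step is the number of consecutive samples for which D has stayed within [-WIDE, WIDE] since it last left that range. Using a first-order difference for D, and not counting anything before the first excursion, are assumptions of this sketch; the function name is hypothetical.

```python
def panning_durations(pan_speed, wide):
    """Running panning duration (in samples) at each time step."""
    durations = []
    started = False  # True once D has left the allowable range at least once
    run = 0
    prev = pan_speed[0] if pan_speed else 0.0
    for speed in pan_speed:
        d = speed - prev          # temporal fluctuation amount D
        prev = speed
        if abs(d) > wide:         # D is outside [-WIDE, WIDE]
            started = True
            run = 0
        elif started:             # D is back within the allowable range
            run += 1
        durations.append(run)
    return durations


# Hypothetical horizontal panning speed: rest, constant-speed pan, rest.
speeds = [0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 0.0, 0.0]
durations = panning_durations(speeds, wide=1.0)
```

In this sketch the duration grows while the speed stays nearly constant and is reset whenever the speed changes abruptly.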
[0053]
The estimating unit 69 maps the three values of the panning speed, the output of the absolute value processing unit 52, and the panning duration using a preset function, and outputs the shot change degree. The mapping function here is obtained by extending the two-input, one-output relationship of FIG. 10 to three inputs and one output, and is set so as to output a large value "when the panning speed is high", "when the fluctuation of the panning speed is large", or "when the panning duration is short".
[0054]
As described above, shot detection performance can be improved by examining in advance what kind of variation pattern each piece of information exhibits when the shot shifts, and detecting the shot change degree based on whether or not the information matches that variation pattern.
[0055]
In the above-described embodiment, a method using predetermined mapping functions of two inputs and one output and of three inputs and one output has been described as the method of detecting the shot change degree in the estimating units 67 and 69. Alternatively, for example, fuzzy inference, multiple linear functions, a neural network, or the like may be used.
[0056]
Further, in the above-described embodiment, the input information is limited to the two types of the high frequency component and the panning speed, but the same applies to other information.
[0057]
Further, in the above embodiment, the degree of shot change is detected based on the amount of change of each information, the size of each information, and the time during which the predetermined state of each information continues. The same applies to the use of the parameter
[0058]
(Example 4)
Next, an embodiment of the representative image recording / display apparatus of the present invention will be described.
[0059]
The representative image recording / display device of the present invention extracts, records, and displays representative images of a moving image using the above-described shot detection method. Here, a representative image is an image selected to represent the moving image in browsing, searching, and the like, and is chosen so that it expresses well the image content, such as the photographer's intention, the state of the captured image, and the state of the subject.
[0060]
FIG. 11 shows a configuration diagram of the representative image recording / display device of the present invention. In the figure, the image information output unit 1, the shot change degree detection unit 2, the shot detection unit 3, the image processing information 20, the camera operation information 30, and the shooting state information 40 have already been described in the embodiments of the shot detection method of the present invention, and their configuration is the same as that of FIG. Reference numeral 10 denotes a representative image extraction unit, 70 denotes an in-shot representative information detection unit, 71 denotes a memory, 72 denotes a representative information comparison unit, 11 denotes a video signal output unit, and 12 denotes an image recording / display unit.
[0061]
Hereinafter, the operation of each unit will be described. The image information output unit 1 outputs the image processing information 20, the camera operation information 30, and the shooting state information 40, and the shot change degree detection unit 2 detects the degree of shot change for each piece of information. The shot detection unit 3 specifies a section of an image photographed as one shot by the photographer based on a plurality of shot change degrees.
[0062]
On the other hand, the representative image extraction unit 10 includes an in-shot representative information detection unit 70, a memory 71, and a representative information comparison unit 72, and obtains the representative images of the moving image based on the shot sections obtained by the shot detection unit 3. Hereinafter, the operation of the representative image extraction unit 10 will be described.
[0063]
First, for each shot detected by the shot detection unit 3, the in-shot representative information detection unit 70 receives the image processing information 20, the camera operation information 30, and the shooting state information 40 of the images belonging to that shot, and obtains the average value of each piece of information. The average values are output as "information representative of the shot". Focusing on the i-th shot detected by the shot detection unit 3, the information representative of that shot is expressed as follows.
[0064]
(Equation 1)

[0065]
Here, IN is the number of pieces of information used as input information among the image processing information 20, the camera operation information 30, and the shooting state information 40. For example, when the input information is the high-frequency component, color histogram, recording start / end, zoom magnification, and panning speed, IN = 5, and m (1, i), m (2, i), m (3, i), m (4, i), and m (5, i) are the average values of the high-frequency component, color histogram, recording start / end, zoom magnification, and panning speed, respectively. Therefore, the representative information m (k, i) for the k-th (1 ≦ k ≦ IN) input information is expressed as follows.
[0066]
(Equation 2)

[0067]
Next, the memory 71 stores the representative information M (i) for each shot detected by the in-shot representative information detection unit 70. The representative information comparison unit 72 compares the representative information M (i) between shots, selects the "shots for which a representative image should be output" from among all the detected shots, and extracts a representative image from each selected shot. The processing procedure of the representative information comparison unit 72 is as follows.
(1) The first detected shot is selected as a "shot for which a representative image should be output", and the information M (1) representative of that shot is substituted into the variable Mpre in the memory 71, so that Mpre = M (1).
(2) When the second shot is detected, the information M (2) representing that shot is compared with Mpre to detect the distance between the two pieces of representative information. Here, the distance between pieces of representative information is the distance obtained when an IN-dimensional space is created by assigning one axis to each piece of input information and the two pieces of representative information are plotted in this space. For example, the distance Dis (a, b) between the representative information of the a-th shot and that of the b-th shot is expressed as follows.
[0068]
(Equation 3)

[0069]
(3) Dis (1,2) is compared with a predetermined threshold value E, and the following processing is performed.
・ When Dis (1,2)> E,
The second shot is selected as a "shot for which a representative image should be output", and an image F (2, N2 / 2) located at the center is extracted as a representative image from the images belonging to the shot. Further, the representative information M (2) is substituted for the variable Mpre in the memory 71.
・ When Dis (1,2) ≦ E,
It is determined that the second shot has similar image content to the first shot.
(4) The same processing as in (2) and (3) is performed for shots detected thereafter. That is,
(4-1) When the i-th shot is detected, M (i) is compared with Mpre, and a distance Dis (i, pre) between two pieces of representative information is detected.
(4-2) Dis (i, pre) is compared with a threshold value E,
When Dis (i, pre)> E,
The i-th shot is selected as a "shot for which a representative image should be output", and an image F (i, Ni / 2) located at the center is extracted as a representative image from the images belonging to the shot. The representative information M (i) is substituted for a variable Mpre in the memory 71.
When Dis (i, pre) ≦ E,
It is determined that the i-th shot is similar in image content to the shot having Mpre information.
(4-3) Return to 4-1 and perform the same processing for the (i + 1) th shot.
[0070]
The processing of (4-1) to (4-3) is continued up to the last detected shot, and the representative images of the moving image are extracted.
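The procedure of steps (1) to (4-3) can be sketched as follows. Several points are assumptions of this sketch rather than statements of the text: each piece of input information is treated as a single scalar per frame (a color histogram would need more dimensions), the distance is taken to be Euclidean, the central image is also used for the first shot, and indices are zero-based. All function and variable names are hypothetical.

```python
import math


def shot_representative_info(frames_info):
    """Average each piece of input information over the frames of one shot."""
    n = len(frames_info)
    return [sum(column) / n for column in zip(*frames_info)]


def distance(m_a, m_b):
    """Distance between two pieces of representative information (Euclidean, assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(m_a, m_b)))


def select_representative_images(shots, threshold_e):
    """Return (shot_index, frame_index) for each shot selected for output.

    shots: list of shots; each shot is a list of per-frame information vectors.
    """
    selected = []
    m_pre = None  # variable Mpre held in the memory 71
    for i, frames_info in enumerate(shots):
        m_i = shot_representative_info(frames_info)
        if m_pre is None or distance(m_i, m_pre) > threshold_e:
            # Steps (1), (3), (4-2): select the shot, take its central image,
            # and update Mpre.
            selected.append((i, len(frames_info) // 2))
            m_pre = m_i
        # Otherwise the shot is judged similar to the shot held in Mpre.
    return selected


# Three hypothetical shots with two pieces of information per frame.
shots = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.1, 0.0], [0.1, 0.0], [0.1, 0.0]],
    [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
]
selected = select_representative_images(shots, threshold_e=1.0)
```

In this example the second shot is judged similar to the first and is skipped, so two representative images cover the three shots.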
[0071]
On the other hand, the representative image recording / display unit 12 is configured by a display for displaying a video signal and other information, or a recording medium for recording various information. Information about the representative image is input from the representative information comparison unit 72, and a video signal corresponding to the representative image is input from the video signal output unit 11 to display or record the representative image.
[0072]
As described above, by comparing the representative information between shots and extracting representative images from the shots selected based on the comparison result, all of the image content can be expressed with as few representative images as possible. Therefore, by displaying and recording these representative images and using them for browsing and searching moving images, the image content can be easily grasped.
[0073]
[Effect of the invention]
As described above, the shot detection method of the present invention detects the degree of change in image content using camera operation information and shooting state information as input information, and can thereby detect, as a shot, a moving image in which the photographer has continuously shot a specific subject, or which has been shot continuously with a specific angle of view and shooting conditions.
[0074]
According to the method of the present invention, the moving image is divided into a new shot each time the image content changes, so that many shots are detected when "the development of the story is fast", while in other cases only a small number of shots are detected. Thus, even when the way the image content changes differs from one moving image to another, the moving image can be divided into units according to its image content.
[0075]
Further, the representative image display / recording apparatus of the present invention makes it possible to cover many image contents with a small number of representative images by extracting a representative image using shots obtained by the above method. By recording and displaying image information, efficient searching, quick viewing, and editing of moving images can be performed.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overall configuration of a shot detection method according to the present invention.
FIG. 2 is a diagram for explaining a relationship between a shot and a cut of a moving image.
FIG. 3 is a diagram showing a configuration of an image information output unit in the first embodiment of the shot detection method of the present invention.
FIG. 4 is a diagram showing a specific configuration of a shot change degree detection unit and a shot detection unit in the first embodiment of the shot detection method of the present invention.
FIG. 5 is a diagram showing a time change of an output of an image information output unit in the shot detection method of the present invention.
FIG. 6 is a diagram showing the time change of the outputs of the internal blocks of the shot change degree detection unit and the shot detection unit in the first embodiment of the shot detection method of the present invention.
FIG. 7 is a diagram showing another configuration of the image information output unit of the first embodiment of the shot detection method according to the present invention, which is different from FIG. 3.
FIG. 8 is a diagram illustrating a configuration of an image information output unit according to a second embodiment of the shot detection method of the present invention.
FIG. 9 is a diagram showing a configuration of a shot change degree detection unit according to a third embodiment of the shot detection method of the present invention.
FIG. 10 is a diagram showing a mapping function of two inputs and one output in an estimating unit of a third embodiment of the shot detection method of the present invention.
FIG. 11 is a diagram showing a configuration of an embodiment of a representative image recording / display device of the present invention.
[Explanation of symbols]
1 Image information output unit
2 Shot change degree detection unit
3 Shot detection unit
4 Camera
5 Video signal
6 Recording medium
7 Encoding unit
8 Decoding processing unit
9 Image processing unit
10 Representative image extraction unit
11 Video signal output unit
12 Representative image recording / display unit
20 Image processing information
21 High frequency components
22 Color histogram
23 Motion vector
24 High frequency component detector
25 Color histogram detector
26 Motion vector detector
30 Camera operation information
31 Recording start / end
32 Zoom magnification
33 Auto / Manual mode
40 Shooting status information
41 Panning speed
42 Lens focal length
43 Aperture opening sensor output
44 Focus distance
51 Differential filter
52 Absolute value processing unit
53 Low-pass filter
54 Gain adjustment unit
56 Maximum value detector
57 State judgment unit
58 Shot section output unit
59 Counter
60 Inter-frame difference value detection unit
61 Memory
62 Change amount detection unit
63 Cut change detector
64 Camera work detector
65 Motion vector detector
66 Camera work parameter estimation unit
67, 69 Estimator
68 Duration detector
70 In-shot representative information detection unit
71 Memory
72 Representative information comparison unit

Claims

A shot detection method in which, among the moving images captured between a photographer's shooting start operation and shooting end operation, a moving image in which the photographer has continued to shoot a specific subject, or a moving image shot continuously with a specific angle of view and shooting conditions, is defined as a shot, the method comprising: taking as input information at least one of camera operation information, such as zooming of the imaging device and the shooting start operation, and shooting state information obtained by processing signals from a sensor of the camera during shooting; selecting in advance a rule describing how the input information changes when the moving image transitions from one shot to another shot; detecting, as a shot change degree, the degree to which the input information matches that change rule; and detecting a shot in the moving image based on at least one of the shot change degrees.

The shot detection method according to claim 1, wherein image processing information obtained by processing the captured image is further added as input information, a change rule is selected in advance for each item of the input information, and the degree to which each item of the input information matches its change rule is detected as a shot change degree.

The shot detection method according to claim 2, wherein the input information, namely the camera operation information, the image processing information, and the shooting state information, is obtained, when shooting with a camera or when playing back a recording medium on which the input information has been stored in advance, either by reading the input information directly from the camera or the recording medium, or by estimating the input information from a video signal output from the camera or the recording medium.

The shot detection method according to claim 2, wherein, utilizing the fact that the input information fluctuates greatly when the moving image transitions from one shot to another shot, the shot change degree is detected for each item of the input information using a correlation value between adjacent frames or the output of a time-axis differential filter.

The shot detection method according to claim 2, wherein the shot change degree is detected for the input information based on the correlation value between adjacent frames or the output of a time-axis differential filter, on the magnitude of the input information, or on the length of time for which the input information continues to fluctuate only slightly.

A representative image recording / display device in which, among the moving images captured between a photographer's shooting start operation and shooting end operation, a moving image in which the photographer has continued to shoot a specific subject, or a moving image shot continuously with a specific angle of view and shooting conditions, is defined as a shot, the device comprising: an image information output unit that includes at least one of camera operation information acquisition means for capturing camera operation information on how the photographer operated the camera when shooting the moving image, and shooting state information acquisition means for capturing shooting state information obtained by processing signals from a sensor of the camera during shooting, and that outputs the camera operation information or the shooting state information; a shot change degree detection unit that detects, as a shot change degree, the degree to which the output from the image information output unit conforms to a preset change rule; a shot detection unit that detects a shot in the moving image based on at least one output from the shot change degree detection unit; a representative image extraction unit that extracts a representative image from the images belonging to a shot obtained by the shot detection unit; a video information acquisition unit that captures a video signal; and an image recording / display unit that receives, from the video information acquisition unit, the video signal of the representative image extracted by the representative image extraction unit and records it on an image recording medium or displays it on an image display device.

The representative image recording / display device according to claim 6, further comprising image processing information acquisition means for capturing image processing information obtained by processing a captured image, wherein the image information output unit further outputs the image processing information, and the shot change degree detection unit detects, for each of the plurality of outputs from the image information output unit, the degree of conformity with a preset change rule as a shot change degree.

The representative image recording / display device according to claim 5, wherein the representative image extraction unit comprises: an in-shot representative information detection unit that, for each shot detected by the shot detection unit, receives various information about the images belonging to that shot and obtains representative information for the shot; a memory that stores the representative information of each shot obtained by the in-shot representative information detection unit; and a representative information comparison unit that compares the representative information between shots, selects a shot based on the comparison result, and extracts a representative image from the images belonging to the selected shot.