JP3498875B2 - Video processing system

Info

Publication number: JP3498875B2
Application number: JP25593995A
Authority: JP
Inventors: 栄二大平; 宏一木村; 浩道藤澤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-10-03
Filing date: 1995-10-03
Publication date: 2004-02-23
Anticipated expiration: 2015-10-03
Also published as: JPH09102045A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a moving image processing system that, from a moving image recording human behavior, stably detects characteristic scenes that are effective for retrieving that moving image, stores those scenes, and automatically summarizes the moving image from the viewpoint of location.

[0002]

[Prior Art] When recalling a past experience while watching a moving image, it is effective to fix the T (Time), P (Place), and W (Who), that is, on what day, where, and who, use them as keywords, and search from an episodic viewpoint. Conventionally, to achieve this, a method has been used in which the people and objects that serve as search keywords are first recognized from the moving image, and the moving image is then summarized and managed based on the recognition results. Even with such a method, however, the moving image is constantly moving and is a record of the past, so the viewer often does not remember the places, dates, or names involved. Moreover, when the place in question appears in only one frame of the moving image, it often cannot be identified.

[0003]

[Problems to Be Solved by the Invention] As described above, it has conventionally been difficult to stably recognize people and objects from a moving image. When recalling a past experience, information about place, that is, "where", is extremely important alongside people and objects. However, a moving image recording behavior rarely contains a single frame from which the place can be identified. It has therefore been difficult to automatically summarize a moving image from the viewpoint of "where". An object of the present invention is to solve these conventional problems and to provide a moving image processing system that stably detects characteristic scenes effective for recalling past experiences and automatically summarizes a moving image from the viewpoint of location.

[0004]

[Means for Solving the Problems] In the moving image processing system of the present invention, for a moving image input from a camera, the distance between adjacent frames is first obtained (image width), and the moving image is then separated into still sections and changing sections based on the obtained distance (see FIG. 3). For each frame of the moving image, regions of characteristic luminance are extracted from the image and tracked across adjacent frames. Luminance is used because, in a moving image, luminance information leaves a stronger perceptual impression than shape. Still sections that share the characteristic regions are summarized as one event section. Likewise, consecutive frames within a changing section that share the characteristic regions are summarized as one event section. In changing sections, the direction of camera movement is also obtained; the video cannot be analyzed in detail in a changing section, so only the direction of movement and the tracking of captured objects are available there. In still sections, on the other hand, a still image that characterizes the section is obtained, and the images are classified according to the mutual similarity of the still images of the event sections. Based on these results, a summary of a place is obtained, expressed as a plurality of still images and the positional relationships among them. The means for extracting regions of characteristic luminance from each frame divides the image into fixed rectangular areas and obtains a luminance histogram for each area (see FIG. 5). For example, one image is divided into areas of 20 × 20 = 400 pixels, with luminance levels from 0 (black) to 255 (white). Fixed areas in which the luminance distribution is concentrated are then detected from the histograms (see the dark portions of FIG. 5). Considering only those areas, adjacent fixed areas of similar luminance are grouped together and extracted as a characteristic region. The means for tracking the detected characteristic regions across adjacent frames finds characteristic regions that overlap in adjacent frames and matches them to obtain the movement amount of each region. The movement amount of the entire screen for one frame is then obtained from the movement amounts of the regions, and the movement amount of each region is corrected using the overall movement amount.

[0005]

[Embodiments of the Invention] In the present invention, images are extracted only from still sections and only the direction of movement is extracted from motion sections. Registering both records, without omission, the information the user will later want to recall. A place is represented by images of still sections related to one another by physical positional information. By treating consecutive frames within a changing section that share a characteristic region as one event section, a section traversed on foot is extracted as one event section. In addition, extracting regions of the same luminance based on luminance features and tracking the extracted regions makes it possible to process moving sections. As a result, characteristic scenes effective for retrieving a moving image can be stably detected from a moving image recording past activities, and the moving image can be automatically summarized from the viewpoint of location. Location is extremely effective when recollecting past experiences.

[0006]

[Examples] The principle and embodiments of the present invention are described in detail below with reference to the drawings. Focusing on a person's behavior in a moving image, the person's line of sight typically alternates between resting and moving. While the gaze rests (a still section), the person looks attentively at the place in front of them. While it moves (a moving, or changing, section), the person usually does not analyze the scene carefully and only follows the direction of movement and tracks the objects in view. Therefore, by extracting images only from still sections, extracting only the direction of movement from motion sections, and registering both, the information the user will later want to recall can be recorded without omission. A place can be represented by images (scenes) of still sections related to one another by physical positional information, for example that the image of a window lies to the right of the image of a desk. In a scene of walking, a corridor or a ceiling, for example, is extracted as a characteristic region, and that region remains present for as long as the person walks straight ahead. By treating consecutive frames within a changing section that share a characteristic region as one event section, a section traversed on foot can be extracted as one event section. The visual system responsible for human motion perception is said not to sense color; that is, it is known to use only luminance information. Motion is also said to be perceived when the picture shifts only slightly between adjacent images. Furthermore, tracking of objects and pictures in motion is based on luminance information rather than shape. For these reasons, moving (changing) sections can be processed by extracting regions of the same luminance based on luminance features and tracking the extracted regions.

[0007] FIG. 2 is a block diagram of an embodiment in which the present invention is applied to an information management apparatus for video recording daily activities in an office or similar setting. As shown in FIG. 2, a moving image recorded by, for example, a camera 1 and a microphone 2 attached to the worker's head is stored in a memory 30 from a control unit (CONT) 40 via a bus 50. The moving image is sampled and input at, for example, 30 frames/second and 640 (horizontal) × 480 (vertical) pixels per frame. The memory 30 also stores programs for image processing and the like, and a CPU 10 reads the programs from the memory 30 and executes them. Executing the programs summarizes the moving image stored in the memory 30, and the result is stored in the memory 30 or a magnetic disk device 20. The summary result is read from the memory 30 or the magnetic disk device 20 by the CPU 10 and displayed on a display (CRT) 3 via an interface unit (IF) 60. The user searches the displayed summary using a keyboard (KEY) 4 and a mouse 5.

[0008] FIG. 1 is a functional block diagram showing an embodiment of the moving image processing program stored in the memory of FIG. 2. The visual system that governs human motion perception normally does not sense color; that is, humans are known to use only luminance information for this purpose. Tracking of objects and pictures in motion is likewise based on luminance information rather than shape. In this embodiment, therefore, the moving image is summarized along the time axis based on luminance features, as follows. In FIG. 1, each of the functional blocks 31 to 37 represents a program, and the arrows show the order in which these programs are started. An image input from the camera 1 is sent to the inter-image difference processing unit 31 and to the luminance-based lump extraction processing unit 33. Each time one frame arrives, the inter-image difference processing unit 31 computes the luminance difference between the images of adjacent frames. A large luminance difference from the previous frame indicates a moving section, so the moving image can be separated into still sections and moving sections. In the difference processing, a difference value D is calculated by, for example, the following equation (1).

[Equation 1]

Here, fij(t) is the luminance at coordinates (i, j) of the image in frame t, and fij(t-1) is the luminance at coordinates (i, j) of the image in frame t-1.
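
Equation (1) itself is not reproduced in this text. The following is a minimal sketch only, assuming that D is the sum of absolute luminance differences over all pixel coordinates, which is one form consistent with the definitions of fij(t) and fij(t-1) but is not confirmed by the source. It also assumes that a threshold, here called `theta`, separates still from moving sections. The function names `frame_difference` and `split_sections` are hypothetical.

```python
def frame_difference(prev, curr):
    """Sum of absolute luminance differences between two frames.

    Assumption: this is one plausible form of the difference value D;
    the source equation is not reproduced in the text.
    Frames are lists of rows of luminance values (0-255).
    """
    return sum(
        abs(c - p)
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
    )


def split_sections(frames, theta):
    """Label each adjacent-frame transition as 'still' or 'moving'.

    Returns a list of (label, first_index, last_index) runs, where the
    indices refer to the later frame of each compared pair.
    """
    runs = []
    for t in range(1, len(frames)):
        d = frame_difference(frames[t - 1], frames[t])
        label = "still" if d <= theta else "moving"
        if runs and runs[-1][0] == label:
            # Extend the current run of identical labels.
            runs[-1] = (label, runs[-1][1], t)
        else:
            runs.append((label, t, t))
    return runs


# Tiny 2x2 example: frames 0-2 are identical, frame 3 changes sharply.
flat = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
sections = split_sections([flat, flat, flat, bright, bright], theta=50)
print(sections)
```

In this toy input the single sharp change between frames 2 and 3 is reported as a one-frame moving run between two still runs. A real threshold would have to be tuned to the frame size and camera noise.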

[0009] FIG. 3 shows an example of the result of the difference processing according to the present invention, and FIG. 4 is a detailed block diagram of the luminance-based lump extraction processing unit of FIG. 1. FIG. 5 illustrates the method of detecting characteristic rectangular areas based on luminance histograms. As shown in FIG. 3, the still/moving section detection unit 32 treats a section in which the difference value is at or below a threshold θ as a still section and a section in which it is greater than θ as a moving section. When a moving section is detected, the luminance-based lump extraction processing unit 33 starts. It applies the processing shown in FIG. 4 to the images of the moving section and of zero to several frames of the still sections adjoining it. Specifically, the image is divided into rectangular areas of, for example, 20 (vertical) × 20 (horizontal) pixels, called cells; a luminance histogram is calculated for each cell; and cells with the same histogram are grouped together and represented as a lump. The purpose is to detect the movement of lumps between frames. In FIG. 4, the luminance histogram calculation unit 330 divides the image into cells of 20 × 20 pixels as described above and obtains, for each cell, a luminance histogram like the one shown in FIG. 5. The characteristic-luminance cell extraction unit 331 extracts, as a cell of characteristic luminance, any cell in which the total frequency of four consecutive classes including the class of maximum frequency is at least 3/4 of the number of pixels in the cell (300). In FIG. 5, the four consecutive classes including the class of maximum frequency are shown hatched. The labeling unit 332 considers only the cells of characteristic luminance, groups adjacent cells showing the same luminance, and extracts lumps of the same luminance. The bottom of FIG. 4 shows five lumps as luminance-based lumps, shaded darker in order of higher frequency. Each lump is a set of characteristic-luminance cells showing the same luminance, enclosed together in a circle or rectangle.
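
The cell test and labeling step described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the number of histogram classes is not given in the text, so 16 equal-width classes (`BINS`) are assumed; "the same luminance" is taken to mean the same peak class; and adjacency is taken to be 4-connectivity. The names `cell_histogram`, `characteristic_bin`, and `label_lumps` are hypothetical, and the image is assumed to have already been split into 20 × 20 cells.

```python
BINS = 16  # assumption: the number of histogram classes is not given
RUN = 4    # consecutive classes that must hold most of the pixels


def cell_histogram(cell_pixels):
    """Histogram of 0-255 luminance values using BINS equal-width classes."""
    hist = [0] * BINS
    for v in cell_pixels:
        hist[v * BINS // 256] += 1
    return hist


def characteristic_bin(hist):
    """Return the peak class if some RUN consecutive classes containing it
    hold at least 3/4 of the cell's pixels, otherwise None."""
    total = sum(hist)
    peak = hist.index(max(hist))
    # Try every window of RUN classes that includes the peak class.
    for start in range(max(0, peak - RUN + 1), min(peak, BINS - RUN) + 1):
        if 4 * sum(hist[start:start + RUN]) >= 3 * total:
            return peak
    return None


def label_lumps(grid):
    """Group 4-connected cells that have the same characteristic class.

    grid[r][c] is a class index or None (not a characteristic cell).
    Returns a list of (class, sorted list of (row, col)) pairs.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    lumps = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None or (r, c) in seen:
                continue
            stack, cells = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == grid[r][c]):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            lumps.append((grid[r][c], sorted(cells)))
    return lumps


uniform_cell = [100] * 400                   # one flat 20x20 cell
spread_cell = [i % 256 for i in range(400)]  # luminance spread over all classes
print(characteristic_bin(cell_histogram(uniform_cell)))
print(characteristic_bin(cell_histogram(spread_cell)))
print(label_lumps([[6, 6, None], [None, 6, 3]]))
```

The flat cell passes the 3/4 test and the spread cell does not, so only the former would take part in labeling. Comparing peak classes is a simplification; comparing whole histograms, as the text suggests, would be stricter.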

[0010] FIG. 6 is a detailed functional block diagram of the lump tracking processing unit of FIG. 1. The lump tracking processing unit 34 detects the movement of each lump, determines the movement between frames as a whole, and tracks the movement amount. To do so, it finds characteristic regions that overlap in adjacent frames, matches them to obtain the movement amount of each region, obtains the movement of the entire screen for one frame from the movement amounts of the regions, and corrects the movement amount of each region using the overall movement amount. In human motion perception, motion is said to be perceived when the picture shifts only very slightly between adjacent images. Moving images are generally sampled and displayed at 30 frames/second, a sampling interval sufficient for the perception of natural motion. An object in an image input at 30 frames/second therefore hardly moves between adjacent frames; that is, its positions in the two frames can be expected to overlap. In the lump tracking processing unit 34 shown in FIG. 6, the same-lump candidate detection unit 340 detects lumps that overlap each other in adjacent frames and treats them as candidates for the same lump. The matching unit 341 then matches the candidate lumps against each other to obtain the movement amount of each lump. Between frames sampled at 30 frames/second, everything in the picture moves by nearly the same amount. Because lump extraction is imprecise, however, local matching yields movement amounts that differ from lump to lump. The correction processing unit 342 therefore adopts the majority opinion among the movement amounts of the lumps (a majority vote among the differing values) as the movement amount between the frames as a whole. For any lump whose movement amount differs greatly from this value, matching is performed again based on the frame-wide movement amount. That is, only lumps with markedly different movement amounts are rematched. If the value then changes little, the overall movement amount is used. The majority opinion among the movement amounts can be found with an existing classification method, for example agglomerative clustering. In this method, the two closest movement amounts among the lumps are selected and merged into one class, whose movement amount is, for example, the average of the two. This step is repeated until the smallest distance between the movement amounts of any two classes reaches or exceeds a fixed value. Among the remaining classes, the movement amount of the class whose member lumps have the largest total area is adopted.
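
The majority-vote step can be illustrated with a small agglomerative clustering sketch. It follows the description above (merge the two closest classes, use the mean of the two as the merged movement, stop when the closest pair is at least a fixed distance apart, then pick the class with the largest total lump area). The Euclidean distance between movement vectors, the name `dominant_motion`, and the parameter `stop_distance` are assumptions made for the example.

```python
import math


def dominant_motion(lumps, stop_distance):
    """Pick the frame-wide movement by agglomerative clustering.

    lumps: list of ((dx, dy), area) pairs, one per tracked lump.
    Repeatedly merges the two classes whose movements are closest,
    using the mean of the two as the merged movement, until the
    closest pair is at least stop_distance apart. Returns the movement
    of the class with the largest total lump area.
    """
    classes = [{"move": move, "area": area} for move, area in lumps]
    while len(classes) > 1:
        # Find the closest pair of classes.
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                d = math.dist(classes[i]["move"], classes[j]["move"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= stop_distance:
            break
        a, b = classes[i], classes[j]
        merged = {
            "move": ((a["move"][0] + b["move"][0]) / 2,
                     (a["move"][1] + b["move"][1]) / 2),
            "area": a["area"] + b["area"],
        }
        classes = [c for k, c in enumerate(classes) if k not in (i, j)]
        classes.append(merged)
    return max(classes, key=lambda c: c["area"])["move"]


# Two lumps agree on (4, 0); one larger lump reports an outlying movement.
motion = dominant_motion([((4, 0), 30), ((4, 0), 20), ((-9, 3), 40)],
                         stop_distance=2)
print(motion)
```

In the example the outlying lump is the largest single lump, but the two agreeing lumps together cover more area, so their shared movement is adopted for the frame.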

[0011] FIG. 7 illustrates the method of identifying lumps in adjacent frames according to the present invention, and FIG. 8 illustrates the method of identifying lumps between frames that are further apart. The example shows lumps A and B detected at time t and lumps C and D, shown hatched, detected at time t+1. The lump tracking processing unit 34 determines that lump A and part of lump B have merged and moved to lump C, and that lump B has split and moved to lumps C and D. However, if the correction processing unit 342 finds that the luminance similarity between lumps A and C and between lumps B and D is high while that between lumps B and C is low, the transition from lump B to lump C is deleted. That is, lump A is judged to have moved to lump C and lump B to lump D. The luminance similarity is obtained, for example, as the correlation value of the luminance histograms shown in FIG. 5. Also, as shown in FIG. 8, when lumps that once separated merge again (lump B), the lump tracking processing unit 34 judges lump A and lump B to be the same lump if their luminance similarity is high, and deletes lump A and all the intermediate lumps between lump A and lump B. Such splitting and merging of lumps by luminance occurs with objects that generate light artificially, such as a display; this processing extracts the lumps correctly and reduces the amount of intermediate results that must be recorded.
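
The similarity test and the deletion of the B-to-C transition can be sketched as below. The text says only that similarity is "the correlation value" of the luminance histograms; the Pearson correlation coefficient is assumed here. The thresholds `high` and `low`, the generalization of the A/B/C/D rule to arbitrary transitions, and the names `histogram_correlation` and `prune_transitions` are likewise assumptions for illustration.

```python
import math


def histogram_correlation(h1, h2):
    """Pearson correlation coefficient of two equal-length histograms."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    var1 = sum((a - m1) ** 2 for a in h1)
    var2 = sum((b - m2) ** 2 for b in h2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a flat histogram carries no shape to compare
    return cov / math.sqrt(var1 * var2)


def prune_transitions(transitions, hists, high=0.8, low=0.3):
    """Drop a low-similarity transition when both of its end lumps
    already have a high-similarity transition to some other lump.

    transitions: list of (source, destination) lump names.
    hists: mapping from lump name to its luminance histogram.
    """
    sim = {(s, d): histogram_correlation(hists[s], hists[d])
           for s, d in transitions}
    kept = []
    for s, d in transitions:
        src_has_better = any(v >= high for (s2, d2), v in sim.items()
                             if s2 == s and d2 != d)
        dst_has_better = any(v >= high for (s2, d2), v in sim.items()
                             if d2 == d and s2 != s)
        if sim[(s, d)] <= low and src_has_better and dst_has_better:
            continue
        kept.append((s, d))
    return kept


# Lumps A and C are mid-dark, lumps B and D are bright (4-class toy histograms).
hists = {"A": [0, 10, 0, 0], "C": [0, 9, 1, 0],
         "B": [0, 0, 0, 10], "D": [0, 0, 1, 9]}
kept = prune_transitions([("A", "C"), ("B", "C"), ("B", "D")], hists)
print(kept)
```

With these toy histograms the A-to-C and B-to-D transitions correlate strongly and B-to-C correlates negatively, so only B-to-C is removed, matching the outcome described for FIG. 7.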

[0012] In FIG. 1, based on the result of the lump tracking processing unit 34, the still section classification unit 35 examines how many lumps the still sections before and after a moving section have in common. If they share many lumps, the two still sections are regarded as the same still section and combined into one event section. Whether lumps are shared can be determined from the amount of lumps that disappeared from the field of view during the moving section, or by detecting whether the central areas of the two still sections contain the same lumps. In this way, the moving image can be summarized as a sequence of several still images. Next, like the still section classification unit 35, the moving section classification unit 36 uses the result of the lump tracking processing unit 34, first examining how lumps are shared across frames. Consecutive frames in which the same lump is present in a given area of the image are combined into one event section. Event sections combined in this way include, for example, a section of walking down a corridor, in which the corridor and ceiling are extracted as the same lumps in every frame. For the other moving sections, the direction of movement is obtained. When the movement is rapid, the direction is obtained when the lumps of adjacent frames are matched. When the movement is gradual, it is obtained from the record of lumps appearing and disappearing. For example, if during a moving section new lumps appear from the left and lumps disappear at the right, the camera can be judged to be moving to the left. Also, when the same lump is present in the images of the first and last frames of a moving section, the direction in which the camera moved can be detected by matching the lumps between those two frames.
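
For gradual movement, the direction rule above can be sketched as a simple vote over lump events. The text gives only the leftward case (lumps appear at the left and disappear at the right); the mirrored rightward case, the voting scheme, the event encoding, and the name `camera_direction` are assumptions added for illustration.

```python
def camera_direction(events):
    """Vote on horizontal camera motion from lump events in a moving section.

    events: list of (kind, side) pairs, kind in {"appear", "vanish"},
    side in {"left", "right"}, naming the image edge involved.
    A lump appearing at the left edge or vanishing at the right edge
    votes for leftward camera motion; the mirror cases vote rightward.
    """
    votes = {"left": 0, "right": 0}
    for kind, side in events:
        if kind == "appear":
            votes[side] += 1
        elif kind == "vanish":
            votes["right" if side == "left" else "left"] += 1
    if votes["left"] == votes["right"]:
        return "undetermined"
    return max(votes, key=votes.get)


direction = camera_direction([("appear", "left"), ("vanish", "right")])
print(direction)
```

A tie is reported as undetermined rather than forced to a direction, since the text gives no rule for conflicting evidence.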

[0013] FIG. 9 shows an example of the summary result. The still section summarizing unit 37 recognizes the still images obtained by the still section classification unit 35 (for example, the first frame of each still section) and detects identical still images. Through the above processing, still images related to one another by positional relationships, as shown in FIG. 9, are obtained in the episode memory 38. FIG. 9 represents the scene around a desk. First, the desk of the person seated behind the user and a window are captured; to its right, the edge of the user's own CRT and the keyboard; to the right of that, the keyboard seen from the front; to the right of that, books on the user's own desk; and above that, the books seen obliquely from above. Presenting this to the user through the CRT of FIG. 2 shows accurately where the work done between one point in time and another took place. The result of FIG. 9 also serves as a template for recognizing the place thereafter. That is, by matching input video against the still images of FIG. 9 and their positional relationships as a template, the same place as in FIG. 9 can be recognized; in other words, the activity can be recognized as desk work.

[0014]

[Effects of the Invention] As described above, according to the present invention, characteristic scenes effective for retrieving a moving image can be stably detected from a moving image recording past activities, and the moving image can be automatically summarized from the viewpoint of location.

[Brief description of drawings]

FIG. 1 is a functional block diagram of the moving image processing of an information management apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of an information management apparatus to which the present invention is applied.

FIG. 3 is a diagram for explaining the method of detecting still and moving sections from a moving image in the inter-image difference processing unit of FIG. 1.

FIG. 4 is a functional block diagram of the luminance-based lump extraction processing of FIG. 1.

FIG. 5 is a diagram for explaining the method of detecting characteristic rectangular areas based on the luminance histogram in FIG. 4.

FIG. 6 is a functional block diagram of the lump tracking unit of FIG. 1.

FIG. 7 is a diagram for explaining the method of identifying lumps in adjacent frames according to the present invention.

FIG. 8 is a diagram for explaining the method of identifying lumps between distant frames according to the present invention.

FIG. 9 is a diagram showing an example of a summary result stored in the episode memory of FIG. 1.

[Explanation of symbols]

1: camera, 2: microphone, 3: display, 4: keyboard, 5: mouse, 10: CPU, 20: magnetic disk, 30: memory, 40: control unit, 50: bus, 60: interface unit, 31: inter-image difference processing unit, 32: still/moving section detection unit, 33: luminance-based lump extraction processing unit, 34: lump tracking unit, 35: still section classification unit, 36: moving section classification unit, 37: still section summarizing unit, 38: episode memory, 330: luminance histogram calculation unit, 331: characteristic-luminance cell extraction unit, 332: labeling unit, 340: same-lump candidate detection unit, 341: matching unit, 342: correction processing unit.

Continuation of the front page

(56) References: JP-A-6-333048 (JP, A); JP-A-5-189562 (JP, A); JP-A-7-193748 (JP, A); JP-A-7-236115 (JP, A); JP-A-5-37848 (JP, A); JP-A-61-200789 (JP, A); JP-A-6-153155 (JP, A)

(58) Fields investigated (Int. Cl.7, DB name): G06T 7/00 - 7/60; H04N 5/91; G06F 17/30

Claims

(57) [Claims]

1. A moving image processing system comprising: means for obtaining a distance between images of adjacent frames from a moving image input from a camera; means for separating the moving image into still sections and changing sections based on the obtained distance; means for extracting, for each frame, a region of characteristic luminance from the image as one lump; means for finding lumps that overlap the lump in adjacent frames and for obtaining and tracking the movement amount of the lump; means for summarizing still sections having the lump in common as one event section; and means for summarizing consecutive frames within a changing section having the lump in common as one event section.

2. The moving image processing system according to claim 1, further comprising: means for determining the direction of camera movement in the changing section; means for obtaining a still image that characterizes the still section summarized as the one event section; means for classifying the images based on the mutual similarity of the still images of the event sections; and means for obtaining, based on the results, a summary of a place represented by a plurality of still images and the positional relationships between the still images.

3. The moving image processing system according to claim 1, wherein the means for extracting a region of characteristic luminance comprises: means for obtaining a luminance histogram for each fixed area of the image; means for detecting fixed areas in which the luminance distribution is concentrated; and means for grouping, considering only the fixed areas in which the luminance distribution is concentrated, adjacent fixed areas of similar luminance.

4. The moving image processing system according to claim 1, wherein the means for tracking the characteristic region in adjacent frames comprises: means for finding characteristic regions that overlap in adjacent frames; means for matching the overlapping characteristic regions to obtain a movement amount for each region; means for obtaining the movement amount of the entire screen of one frame from the movement amounts of the regions; and means for correcting the movement amount of each region from the overall movement amount.