JP2001016579A

JP2001016579A - Image monitor method and image monitor device

Info

Publication number: JP2001016579A
Application number: JP11185862A
Authority: JP
Inventors: Mayumi Yuasa; 真由美湯浅
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1999-06-30
Filing date: 1999-06-30
Publication date: 2001-01-19
Anticipated expiration: 2019-06-30
Also published as: JP3617937B2

Abstract

PROBLEM TO BE SOLVED: To make a subject extracted from a moving image or a still image easy and reliable to recognize, in accordance with each user's convenience and needs. This is done by comparing the feature value of the subject extracted from the image with the feature value of each category registered in a dictionary, presenting the recognized category of the subject, and updating the dictionary on the basis of the subject's feature value. SOLUTION: A visitor monitoring device comprises an image input unit 1, a voice input unit 2, a person detection unit 3, an information extraction unit 4, a storage unit 5, a monitor unit 6, and a user information input unit 7. The person detection unit 3 detects a person, for example by detecting a change in an image acquired by a camera or the like. The information extraction unit 4 extracts the information needed for recording or recognition from the input image or voice. The storage unit 5 consists of a storage medium that stores dictionary information, extracted image information, recorded voice information, and the like. The user information input unit 7 is used to enter or correct a visitor's name and category and to search and organize the recorded information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image monitoring device that can also be used in, for example, video intercoms, video mail, and videophones, and more particularly to an image monitoring device for monitoring a subject such as a person in moving images that contain the person.

[0002]

2. Description of the Related Art

A video intercom installed at the entrance of a condominium or the front door of a house lets the resident check a visitor on a monitor, but it is desirable for it to record video and audio while the resident is away, as an answering machine does. To realize such a function by storing moving images, a VTR or similar device makes the apparatus large and unwieldy, while a storage medium such as a hard disk makes it difficult to reduce size and cost because of the storage capacity required.

[0003] On the other hand, if still images are stored instead to avoid these problems, the stored image does not always allow the person to be identified, and the person may prove unidentifiable when the image is viewed later.

[0004] Even when an identifiable image exists, it is not always easy to identify people other than acquaintances, such as bill collectors, delivery persons, or suspicious persons. Moreover, as society ages, people are expected to forget even acquaintances' names more often.

[0005] Recently, falling prices of personal computers with video cameras have made video mail, videophone calls, and the like on a personal computer easy to achieve even at home. Once moving images are used routinely on personal computers in this way, new problems arise: storage devices such as hard disks have limited capacity, and reading out and searching stored data takes time.

[0006]

SUMMARY OF THE INVENTION

The present invention has been made in view of the above problems, and its object is to provide an image monitoring method, and an image monitoring device using the method, with which a subject such as a person extracted from a moving image or still image can be recognized easily and reliably in accordance with each user's convenience and needs.

[0007]

MEANS FOR SOLVING THE PROBLEMS

The image monitoring method of the present invention generates a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognizes the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary; presents the recognition result; and updates the dictionary on the basis of the subject's feature value. As a result, a subject such as a person extracted from a moving image or still image can be recognized easily and reliably in accordance with each user's convenience and needs.
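The four steps of this method (generate, recognize and present, update) can be illustrated with a minimal sketch. It is not the subspace method used in the embodiments below: as a simplifying assumption, each category is represented by a single running-mean feature vector compared by cosine similarity, and the class name `CategoryDictionary` and the 0.9 acceptance threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CategoryDictionary:
    """Hypothetical dictionary: one running-mean feature vector per category."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed acceptance threshold
        self.means = {}             # category name -> mean feature vector
        self.counts = {}            # category name -> number of samples folded in

    def register(self, category, feature):
        """Dictionary generation: create an entry for a new category."""
        self.means[category] = list(feature)
        self.counts[category] = 1

    def recognize(self, feature):
        """Return (best category, similarity); category is None below threshold."""
        best, best_sim = None, -1.0
        for name, mean in self.means.items():
            sim = cosine(feature, mean)
            if sim > best_sim:
                best, best_sim = name, sim
        return (best if best_sim >= self.threshold else None), best_sim

    def update(self, category, feature):
        """Dictionary update: fold a new sample into the category's running mean."""
        n = self.counts[category]
        self.means[category] = [(m * n + f) / (n + 1)
                                for m, f in zip(self.means[category], feature)]
        self.counts[category] = n + 1

    def process(self, feature):
        """One pass of the method: recognize, present the result, then update."""
        category, sim = self.recognize(feature)
        print(f"recognized: {category} (similarity {sim:.2f})")  # presentation step
        if category is not None:
            self.update(category, feature)
        # An unrecognized subject (None) could be registered later under a
        # category name that the user supplies.
        return category


if __name__ == "__main__":
    d = CategoryDictionary()
    d.register("Postal delivery", [1.0, 0.0, 0.1])
    d.register("Newspaper collection", [0.0, 1.0, 0.0])
    d.process([0.9, 0.05, 0.1])  # prints: recognized: Postal delivery (similarity 1.00)
```

Updating the mean only when a category is accepted mirrors the text's split between new registration and dictionary update.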

[0008] The image monitoring device of the present invention generates a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognizes the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary; presents the recognition result; updates the dictionary on the basis of the subject's feature value; and stores an input voice message in storage means in association with the recognition result or the dictionary. As a result, a subject such as a person extracted from a moving image or still image can be recognized easily and reliably in accordance with each user's convenience and needs, and the device can respond more intelligently to the user's needs.

[0009] Preferably, of a plurality of input time-series images, only the image from which the subject whose feature value is most similar to one of the categories registered in the dictionary was extracted is stored in the storage means. In this way, only the images suitable for recognizing a subject such as a person are selected from the moving or still images and stored, so the required storage capacity can be reduced.

[0010] The image monitoring device of the present invention comprises: dictionary generating means for generating a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognition means for recognizing the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary; presentation means for presenting the recognition result of the recognition means; and updating means for updating the dictionary on the basis of the subject's feature value. As a result, a subject such as a person extracted from a moving image or still image can be recognized easily and reliably in accordance with each user's convenience and needs.

[0011] Preferably, the device further comprises storage means for storing, of a plurality of input time-series images, only the image from which the subject whose feature value is most similar to one of the categories registered in the dictionary was extracted. In this way, only the images suitable for recognizing a subject such as a person are selected from the moving or still images and stored, which reduces the required storage capacity and, with it, the size and cost of the device.

[0012]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention will be described below with reference to the drawings.

[0013] (First Embodiment) The first embodiment describes a visitor monitoring device that uses, for example, an intercom with a TV camera installed at the entrance of a condominium or house to record visitors' messages and images while the resident is away, and that enables a smooth response while the resident is at home by displaying the visitor's name or whether the visitor belongs to a pre-registered category.

[0014] FIG. 1 shows an example configuration of the visitor monitoring device according to this embodiment, which comprises an image input unit 1, a voice input unit 2, a person detection unit 3, an information extraction unit 4, a storage unit 5, a monitor unit 6, and a user information input unit 7.

[0015] The image input unit 1 inputs images acquired by a surveillance camera such as a video camera. The voice input unit 2 inputs voice acquired from, for example, the microphone of an intercom.

[0016] The person detection unit 3 is not particularly limited. It may be something with which the visitor voluntarily announces the visit, such as a doorbell or buzzer; something that detects a person by detecting a change in images acquired from a camera or the like; or something that detects a physical action such as knocking or calling out.
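The text leaves the detection method open. As a minimal sketch of the image-change option only, assuming grayscale frames given as nested lists of intensities from 0 to 255, frame differencing flags the first frame whose mean absolute difference from the previous frame exceeds a threshold. The threshold of 10 and the function names are assumptions; a practical detector would also have to tolerate lighting changes and camera noise.

```python
def mean_abs_diff(prev, curr):
    """Mean absolute pixel difference between two equal-size grayscale frames."""
    total = count = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(p - c)
            count += 1
    return total / count if count else 0.0


def detect_person(frames, threshold=10.0):
    """Index of the first frame whose change from the previous frame exceeds
    `threshold` (an assumed value), or None if nothing changes enough."""
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            return i
    return None


if __name__ == "__main__":
    empty = [[0] * 4 for _ in range(4)]
    visitor = [row[:] for row in empty]
    visitor[1][1] = visitor[1][2] = visitor[2][1] = visitor[2][2] = 200
    print(detect_person([empty, empty, visitor]))  # prints 2
```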

[0017] The information extraction unit 4 extracts the information needed for recording or recognition from the input image or voice.

[0018] The storage unit 5 consists of a recording medium, such as a magnetic tape, magnetic disk, magneto-optical disk, optical disk, or semiconductor memory, that stores dictionary information, extracted image information, recorded voice information, and the like.

[0019] The monitor unit 6 consists of a display device, a speaker, and the like, and presents the various extracted information as well as signals input directly from the camera or microphone.

[0020] The user information input unit 7 is used to enter or correct a visitor's name and category and to search and organize the recorded information.

[0021] FIG. 2 shows an example configuration of the information extraction unit 4, which comprises a face detection unit 8, a dictionary generation unit 9, a face recognition unit 10, and a recording image determination unit 11.

[0022] The face detection unit 8 detects a person's face region in the input image. For example, using a face-region template prepared in advance, it cuts out the region of the image with the highest correlation with the template.
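Finding the region with the highest correlation to a face template can be sketched as exhaustive template matching with normalized cross-correlation (NCC), one common choice of correlation measure; the text does not specify which measure is used. The sketch assumes a grayscale image and template given as nested lists, and is written for clarity rather than speed.

```python
import math


def ncc(image, template, top, left):
    """Normalized cross-correlation between `template` and the same-size
    window of `image` whose top-left corner is (top, left)."""
    th, tw = len(template), len(template[0])
    win = [image[top + r][left + c] for r in range(th) for c in range(tw)]
    tpl = [template[r][c] for r in range(th) for c in range(tw)]
    mw = sum(win) / len(win)
    mt = sum(tpl) / len(tpl)
    num = sum((w - mw) * (t - mt) for w, t in zip(win, tpl))
    den = math.sqrt(sum((w - mw) ** 2 for w in win) * sum((t - mt) ** 2 for t in tpl))
    return num / den if den else 0.0  # flat windows get a neutral score


def find_face(image, template):
    """Slide the template over every position and return (top, left, score)
    of the window with the highest correlation."""
    th, tw = len(template), len(template[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so any window beats this
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = ncc(image, template, top, left)
            if score > best[2]:
                best = (top, left, score)
    return best


if __name__ == "__main__":
    template = [[0, 9], [9, 0]]
    image = [[0] * 4 for _ in range(4)]
    image[1][3] = image[2][2] = 9  # plant the pattern with top-left at (1, 2)
    print(find_face(image, template))  # prints (1, 2, 1.0)
```

NCC is insensitive to overall brightness and contrast of the window, which is why it is a usual choice over raw correlation.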

[0023] The dictionary generation unit 9 creates a face recognition dictionary from a plurality of cut-out face region images. For example, when face recognition is performed with the conventional subspace method, feature points such as the pupils and nostrils are detected in the face region image as in the prior art, and the face region is normalized using the detected feature points. The normalized image information is converted into feature vectors in a feature space, and a subspace generated from these feature vectors serves as the face recognition dictionary. The subspace is then classified into a category and registered in the storage unit 5 as a face recognition dictionary.

[0024] This operation (new dictionary registration), however, applies when the visitor does not belong to any pre-registered category; when the visitor does belong to one, it is preferable to update that category's dictionary instead. To support such updates, it is preferable to store in the dictionary not only the subspace but also statistical information such as its eigenvalues and correlation matrix. When the dictionary is updated, the statistical information of the subspace can then be updated on the basis of the feature values extracted from the visitor's face region images.
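Dictionary generation ([0023]) and update ([0024]) can be sketched together, assuming feature vectors have already been extracted from normalized face images. Following the classical subspace method, the sketch takes a category's subspace to be spanned by the top k eigenvectors of the autocorrelation matrix of the length-normalized feature vectors. The value of k, the class name, and the use of power iteration (a stand-in for a proper symmetric eigensolver) are assumptions. Because the running sum of x x^T and the sample count are kept, as the text recommends, an update simply adds the new vectors and recomputes the eigenvectors.

```python
import math
import random


def normalize(v):
    """Scale a (nonzero) vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def top_eigenvectors(mat, k, iters=500, seed=0):
    """Top-k eigenpairs of a small symmetric positive semidefinite matrix,
    found by power iteration with deflation (adequate for illustration only)."""
    rng = random.Random(seed)
    d = len(mat)
    a = [row[:] for row in mat]
    vals, vecs = [], []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        vals.append(lam)
        vecs.append(v)
        # Remove the found component so the next pass finds the next eigenvector.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return vals, vecs


class SubspaceDictionary:
    """One category's dictionary: the subspace plus the statistics the text
    recommends keeping (autocorrelation matrix, eigenvalues, sample count)."""

    def __init__(self, dim, k):
        self.k = k                                         # assumed subspace dimension
        self.count = 0
        self.corr_sum = [[0.0] * dim for _ in range(dim)]  # running sum of x x^T
        self.eigvals, self.basis = [], []

    def add(self, vectors):
        """New registration or update: accumulate statistics, rebuild the subspace."""
        for x in map(normalize, vectors):
            for i, xi in enumerate(x):
                for j, xj in enumerate(x):
                    self.corr_sum[i][j] += xi * xj
            self.count += 1
        corr = [[c / self.count for c in row] for row in self.corr_sum]
        self.eigvals, self.basis = top_eigenvectors(corr, self.k)


if __name__ == "__main__":
    d = SubspaceDictionary(dim=3, k=2)
    d.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])  # new registration
    d.add([[0.0, 0.0, 2.0]])                                      # later update
    print(d.count, [round(v, 3) for v in d.eigvals])  # prints 4 [0.5, 0.25]
```

In practice the eigenvectors would come from a library routine; keeping the correlation sum rather than only the basis is what makes the update exact.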

[0025] The face recognition unit 10 determines whether the visitor belongs to a pre-registered category by calculating the similarity between the input image sequence and the dictionary of each category (a conventional technique may be used, such as computing inner products of the feature vectors in each subspace). If the visitor does not belong to any registered category, a dictionary is created as described above.
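One conventional way to compute this similarity, assuming each category dictionary is stored as an orthonormal basis {φ_i} of its subspace, is the subspace-method similarity S = Σ_i (x·φ_i)² / ‖x‖², that is, the summed squared inner products mentioned above, normalized by the input's length. The sketch averages S over the frames of the input sequence (a simplifying assumption) and reports no category when the best average falls below an assumed threshold of 0.8, which corresponds to the case in which a new dictionary is created.

```python
def subspace_similarity(x, basis):
    """Squared length of x's projection onto an orthonormal basis divided by
    the squared length of x; 1.0 means x lies entirely in the subspace."""
    xx = sum(v * v for v in x)
    if xx == 0:
        return 0.0
    return sum(sum(b * v for b, v in zip(phi, x)) ** 2 for phi in basis) / xx


def classify(frames, dictionaries, threshold=0.8):
    """Average the per-frame similarity over a non-empty input sequence and pick
    the best category; None means no category reached the (assumed) threshold."""
    best, best_sim = None, 0.0
    for name, basis in dictionaries.items():
        s = sum(subspace_similarity(x, basis) for x in frames) / len(frames)
        if s > best_sim:
            best, best_sim = name, s
    return (best if best_sim >= threshold else None), best_sim


if __name__ == "__main__":
    dicts = {  # hypothetical one- and two-dimensional category subspaces
        "Postal delivery": [[1.0, 0.0, 0.0]],
        "Home delivery": [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    }
    cat, sim = classify([[1.0, 0.1, 0.0]], dicts)
    print(cat, f"{sim:.2f}")  # prints Postal delivery 0.99
```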

[0026] Each time a moving image is input from the image input unit 1, the recording image determination unit 11 selects from the input image sequence the image that best represents the visitor's features and stores it in the storage unit 5. Specifically, it selects, from the image frames of the input sequence, the image with the highest similarity to the visitor's dictionary. Storing the input video in the storage unit 5 as is would require a large storage capacity; keeping only one image but having the user choose which one is tedious; and choosing one mechanically is likely to yield an image in which the face is not captured properly. These problems can be solved by selecting and storing only the single image with the highest similarity, even when the images in the input sequence belong to none of the categories in the dictionary (that is, their similarity is too low to assign them to any category).

[0027] Next, an example of the operation of the visitor monitoring device of FIG. 1 will be described, assuming that a visitor arrives while the resident is away.

[0028] The user sets the device to the away mode in advance; the user may in fact be at home. This operation may be the same as on an ordinary answering machine. When a visitor comes in front of the camera installed at the entrance of the user's home and rings the doorbell, or, in an apartment building, enters the room number, that action triggers the image input unit 1 and the voice input unit 2 to start capturing the image from the camera and the voice from the microphone, respectively, and these are displayed on the monitor unit 6 installed indoors.

[0029] Meanwhile, in the information extraction unit 4, the face detection unit 8 first detects the person's face region in the input image. The face recognition unit 10 then refers to the feature vectors extracted via the dictionary generation unit 9 and to the face recognition dictionary stored in the storage unit 5 to recognize the visitor's face in the image from the image input unit 1 and, for example, determine the visitor's category. The category name obtained as the recognition result is displayed on the monitor unit 6 together with the image captured by the camera.

[0030] Category names are registered in advance and are not limited to personal names. For newspaper bill collection, parcel delivery, mail delivery, and the like, the same person is likely to come every time but that person's name is of no particular use, so the recognition results can instead be classified under category names that are not personal names, such as "Newspaper collection", "Home delivery", and "Postal delivery". Likewise, a category that does not need to be classified further may be given a category name such as "Suspicious person".

[0031] If the visitor does not belong to any pre-registered category, a dictionary is generated from the input image sequence as described above; if the visitor does belong to one, the existing dictionary is updated as described above. However, so that the user can correct the classification later if the assigned category was wrong, it is preferable to also create a dictionary for that visitor alone and to keep the category's original dictionary rather than deleting it.
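Keeping a visitor-only dictionary next to the category dictionary can be sketched as follows. Assuming each dictionary is stored as additive statistics (sample count and sum of x x^T, in the spirit of the correlation-matrix statistics of [0024]), a wrong assignment can be undone by subtracting the visit's own statistics; this gives the same result as restoring the saved original dictionary when no later visit updated that category. All names here are hypothetical.

```python
def stats_from(vectors):
    """Hypothetical dictionary statistics: [sample count, sum of x x^T]."""
    d = len(vectors[0])
    s = [[0.0] * d for _ in range(d)]
    for x in vectors:
        for i in range(d):
            for j in range(d):
                s[i][j] += x[i] * x[j]
    return [len(vectors), s]


def combine(a, b, sign=1):
    """Add (sign=1) or remove (sign=-1) one set of statistics from another;
    exact because both the count and the matrix are plain sums."""
    return [a[0] + sign * b[0],
            [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a[1], b[1])]]


class CorrectableStore:
    """Keeps each visit's own statistics next to the category statistics so a
    wrong category assignment can be undone later by the user."""

    def __init__(self):
        self.categories = {}  # category name -> statistics
        self.visits = {}      # visit id -> (assigned category, visit-only statistics)

    def record(self, visit_id, category, vectors):
        own = stats_from(vectors)
        current = self.categories.get(category)
        self.categories[category] = own if current is None else combine(current, own)
        self.visits[visit_id] = (category, own)

    def correct(self, visit_id, new_category):
        """Move a visit from its assigned category to `new_category`."""
        old, own = self.visits[visit_id]
        remaining = combine(self.categories[old], own, sign=-1)
        if remaining[0] == 0:
            del self.categories[old]  # the visit was that category's only sample
        else:
            self.categories[old] = remaining
        target = self.categories.get(new_category)
        self.categories[new_category] = own if target is None else combine(target, own)
        self.visits[visit_id] = (new_category, own)


if __name__ == "__main__":
    store = CorrectableStore()
    store.record("visit-1", "Home delivery", [[1.0, 0.0]])
    store.record("visit-2", "Home delivery", [[0.0, 1.0]])  # misclassified
    store.correct("visit-2", "Postal delivery")             # user fixes it later
    print(sorted(store.categories))  # prints ['Home delivery', 'Postal delivery']
```

If dictionaries were stored only as subspace bases, without additive statistics, subtraction would not be possible and the saved copy of the original dictionary described in the text would be needed instead.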

[0032] The recording image determination unit 11 selects the single image in the input image sequence from the image input unit 1 that has the highest similarity to the created dictionary. Only that one image is saved, whether or not a category was recognized. The dictionary is generated from the moving image, but keeping only one image saves storage capacity. To reduce storage further, only an image cropped to the face region may be kept.
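Selecting and storing the single recording image reduces to an argmax over per-frame similarities, optionally followed by cropping to the face region. The sketch assumes the caller supplies a `similarity` function (for example, the subspace similarity of a frame's feature vector to the visitor's dictionary) and a face box from the detector; both function names are hypothetical.

```python
def select_recording_frame(frames, similarity):
    """Return (index, frame) of the single frame to keep: the one whose
    similarity to the visitor's dictionary is highest (earliest on ties).
    It applies whether or not a category was recognized."""
    if not frames:
        raise ValueError("empty input sequence")
    scores = [similarity(f) for f in frames]
    best = max(range(len(frames)), key=scores.__getitem__)
    return best, frames[best]


def crop(image, top, left, height, width):
    """Keep only the face region (e.g. the box found by the face detector)."""
    return [row[left:left + width] for row in image[top:top + height]]


if __name__ == "__main__":
    scores = {"frame0": 0.42, "frame1": 0.91, "frame2": 0.77}  # hypothetical
    print(select_recording_frame(list(scores), scores.get))  # prints (1, 'frame1')
```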

[0033] The visitor's voice message input from the voice input unit 2 is recorded in the storage unit 5 together with the face image and the recognition result.

[0034] Keywords in the message can be recognized by speech recognition with keyword spotting; keeping the result makes it easier to name the category later when registering a new dictionary. To make this easier still, it is also effective to issue a prompt such as "May I have your name, please?" by voice or as text.
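Turning spotted keywords into a suggested category name can be sketched at the text level, assuming a speech recognizer has already turned the message into a list of words; true keyword spotting works on the audio itself. The keyword table and function name below are hypothetical.

```python
import string

# Hypothetical keyword -> category-name table; a real system would use a
# keyword-spotting speech recognizer working directly on the audio.
KEYWORDS = {
    "newspaper": "Newspaper collection",
    "parcel": "Home delivery",
    "delivery": "Home delivery",
    "mail": "Postal delivery",
}


def suggest_categories(recognized_words, keywords=KEYWORDS):
    """Suggested category names for the spotted keywords, in order of first
    appearance and without duplicates."""
    suggestions = []
    for word in recognized_words:
        name = keywords.get(word.lower().strip(string.punctuation))
        if name is not None and name not in suggestions:
            suggestions.append(name)
    return suggestions


if __name__ == "__main__":
    words = "Hello, I have a parcel delivery for you.".split()
    print(suggest_categories(words))  # prints ['Home delivery']
```

The suggestions would only pre-fill the category name; the user still confirms or corrects it when reviewing messages.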

[0035] The user reviews the messages at any convenient time, such as on returning home, and at that point corrects recognized categories, enters new category names, and deletes unneeded messages.

[0036] Even when the user is at home but wants to check who the visitor is before answering, the image and the recognition result are displayed on the monitor, so the user can answer only the visitors they wish to deal with. When an unwanted visitor comes again, the user can respond on the basis of the category name alone without having to remember the visitor's face each time, which is convenient.

[0037] (Second Embodiment) The second embodiment uses images sent by the other party through video mail (which may, for example, generally mean transmitting a moving image by predetermined communication means) or video conferencing on a camera-equipped personal computer or the like, in order to reduce the storage space taken by video mail, to confirm the other party by face recognition, and to create a face recognition dictionary.

[0038] The visitor monitoring device described in the first embodiment is used mainly for security. The second embodiment describes an image monitoring device that can also be used, for example, to check image contents casually on a personal computer or the like.

[0039] FIG. 3 shows an example configuration of the image monitoring device according to the second embodiment, which comprises an image input unit 12, a face detection unit 13, a dictionary generation unit 14, a face recognition unit 15, a display unit 16, a storage unit 17, and a recording image determination unit 18. The image monitoring device shown in FIG. 3 may be implemented on a personal computer. That is, the functions of the above units can be implemented as a program that causes a computer to execute them using the personal computer's hardware, and the program can be stored and distributed on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory.

[0040] The image input unit 12 reads moving images recorded in advance on the hard disk or the like of the personal computer, or moving images sent over a communication line such as a LAN or telephone line.

[0041] The face detection unit 13, dictionary generation unit 14, face recognition unit 15, storage unit 17, and recording image determination unit 18 are the same as the face detection unit 8, dictionary generation unit 9, face recognition unit 10, storage unit 5, and recording image determination unit 11 described in the first embodiment.

[0042] Next, an actual operation example will be described, taking the reduction of video mail storage as an example.

[0043] Video mail sent from the other party is normally saved on the hard disk of the personal computer. The image input unit 12 reads that data and extracts only the moving image portion, and the face detection unit 13 then detects the face region. The face recognition unit 15 determines from the detected face region whether the person belongs to a pre-registered person category, and the result is displayed together with the image on the display unit 16, such as a display.

[0044] At the same time, or on the user's instruction, the dictionary generation unit 14 creates a dictionary from the moving image. For a new category, the user can enter a new category name, but normally it is convenient to use the sender's e-mail address as the category name. For an existing category, the existing dictionary is updated instead of a new one being created.

[0045] The recording image determination unit 18 determines the image to be recorded from the moving image. Specifically, as described in the first embodiment, it selects the image in the input image sequence with the highest similarity to the dictionary.

[0046] Because video mail is moving-image information, storing it as is requires a large storage capacity. Keeping only one image but having the user choose which one is tedious, and choosing one mechanically is likely to yield an image in which the face is not captured properly; selecting the image most similar to the dictionary solves these problems.

[0047] In the first and second embodiments, the case where the monitored subject is a person has been described as an example. The invention is not limited to this case: the monitored subject may be anything, and the description above applies in the same way.

[0048] Furthermore, the present invention is not limited to these examples and can be modified and applied in various ways.

[0049]

EFFECTS OF THE INVENTION

As described above, according to the present invention, a subject such as a person extracted from a moving image or still image can be recognized easily and reliably in accordance with each user's convenience and needs.

[Brief description of the drawings]

FIG. 1 is a diagram showing an example configuration of the visitor monitoring device according to the first embodiment of the present invention.

FIG. 2 is a diagram showing an example configuration of the information extraction unit.

FIG. 3 is a diagram showing an example configuration of the image monitoring device according to the second embodiment of the present invention.

[Description of Reference Numerals]

1: Image input unit
2: Voice input unit
3: Person detection unit
4: Information extraction unit
5: Storage unit
6: Monitor unit
7: User information input unit
8: Face detection unit
9: Dictionary generation unit
10: Face recognition unit
11: Recording image determination unit
12: Image input unit
13: Face detection unit
14: Dictionary generation unit
15: Face recognition unit
16: Display unit
17: Storage unit
18: Recording image determination unit

Continuation of the front page
(51) Int.Cl.⁷, identification code, FI, theme code (reference): H04M 9/00 H04M 11/00 301 H04N 7/14 11/00 301 7/16 Z H04N 5/765 G10L 3/00 551A 5/781 551P 7/14 H04N 5/781 510E 7/16
F-terms (reference): 5C054 AA02 CA04 CA08 CC02 CD06 CE15 CG06 CG07 CG08 CH04 CH05 DA01 DA08 EA01 EA03 EA05 EA07 EJ00 FA01 FA02 FA05 FB01 FB03 FC12 FC16 FE22 FE23 FE26 FF03 GA01 GA02 GA04 GB02 GB15 GC01 GC03 GD03 GD09 HA18 HA22 HA24; 5C064 AA01 AA06 AB04 AC04 AC06 AC12 AC16 AC18 AD06 AD13 BA01 BB10 BC23 BC25 BD03 BD08; 5D015 AA02 BB02 KK02; 5K038 AA08 CC00 DD15 DD23 GG03; 5K101 KK04 LL04 NN06 NN18

Claims


1. An image monitoring method comprising: generating a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognizing the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary, and presenting the recognition result; and updating the dictionary on the basis of the subject's feature value.

2. The image monitoring method according to claim 1, wherein the input voice message is stored in a storage unit in association with the recognition result or the dictionary.

3. The image monitoring method according to claim 1, wherein, of a plurality of input time-series images, only the image from which the subject whose feature value is most similar to one of the categories registered in the dictionary was extracted is stored in a storage unit.

4. An image monitoring device comprising: dictionary generating means for generating a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognition means for recognizing the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary; presentation means for presenting the recognition result of the recognition means; and updating means for updating the dictionary on the basis of the subject's feature value.

5. The image monitoring apparatus according to claim 4, further comprising storage means for storing the input voice message in association with the recognition result or the dictionary.

6. The image monitoring device according to claim 4, further comprising storage means for storing, of a plurality of input time-series images, only the image from which the subject whose feature value is most similar to one of the categories registered in the dictionary was extracted.

7. A machine-readable recording medium storing a program that causes a computer to function as: dictionary generating means for generating a dictionary in which a feature value for each category is registered for recognizing a subject extracted from an input image; recognition means for recognizing the category of a subject extracted from an input image by comparing the subject's feature value with the feature value of each category registered in the dictionary; presentation means for presenting the recognition result of the recognition means; and updating means for updating the dictionary on the basis of the subject's feature value.

8. The recording medium according to claim 7, wherein the program further causes the computer to function as storage means for storing, of a plurality of input time-series images, only the image from which the subject whose feature value is most similar to one of the categories registered in the dictionary was extracted.