JPH1196373A

JPH1196373A - Image pickup device

Info

Publication number: JPH1196373A
Application number: JP9275128A
Authority: JP
Inventors: Nobuyoshi Enomoto; 暢芳榎本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-09-22
Filing date: 1997-09-22
Publication date: 1999-04-09

Abstract

PROBLEM TO BE SOLVED: To reliably capture an image of a desired object by identifying the object through comparison of the feature information registered in an object identification dictionary with the extracted image area of the object, and by displaying that image area together with the identification result. SOLUTION: An object area extraction dictionary 3 registers feature information for each of a plurality of search areas that contain the image area of the object and stand in a hierarchical inclusion relationship, for searching for the image area of the object in a captured image. An object area extracting part 4 extracts the image area of the object by comparing the feature information of the search area at each hierarchy level registered in the dictionary 3 with the captured image. An object identifying part 9 identifies the object by comparing an object identification dictionary 8, in which feature information of the object is registered, with the image area extracted by the object area extracting part 4. A display part 10 then displays the image area extracted by the object area extracting part 4 and the identification result of the object identifying part 9. The desired object can thus be captured reliably.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an imaging device for capturing an image of a desired object.

[0002]

2. Description of the Related Art

One known means of finding and identifying a desired object in general video is the so-called video editing apparatus. Images are captured with an optical or digital still camera or a television camera, converted into analog or digital video signals, and stored. The stored images are presented to the user, who judges whether each contains the desired object, and a data structure associating the image with the attributes of the object (name, etc.) is saved in the storage means.

[0003] In such a video editing apparatus, the input video is digitized and presented so that the user can easily identify the target visually. However, it is the user who must search the image sequences for those containing the desired object and identify it, so the burden on the user is large.

[0004] There is also a video editing apparatus that, for a sequence of captured and stored digital images (frames), finds scene breaks from the amount of image change between frames or from the magnitude and direction of the motion vectors of pixels within a frame, and takes the frames at those breaks as representative frames. By presenting to the user only the representative frames, or a small number of frames around them, it raises the likelihood that the user can find the desired object without viewing all of the video.

[0005] Such an apparatus captures variations in the video sequence and determines and presents representative images (frames), so the user is spared viewing the whole image sequence. However, a human must still judge whether each representative frame shows the target.

[0006] Furthermore, there are apparatuses that identify the target by holding image features of the desired target as a dictionary and applying to the representative frames a function that judges whether they resemble one, or several, of the stored dictionary patterns.

[0007] Because such an apparatus has means for determining whether the target is present in the extracted representative frames, the burden on the user is reduced. However, the representative image sequence may not contain the desired target in the first place.

[0008] When images of a general scene are captured, the nature of the desired target is often known in advance. Cases (1) to (3) below are examples.

[0009] (1) Using a handheld VCR or camera at some event, the user wants to focus on and capture only the people who appear or objects roughly known in advance (for example, products), or wants to capture a signboard and then read its content.

[0010] (2) Reading the destination (arrival-office) code on moving parcels to sort them by delivery destination, or reading parcel serial numbers to track delivery status.

[0011] (3) Using a portable device to capture the numbers, marks, and the like written on vehicles such as cars, ships, trains, and airplanes, read their content, and thereby implement functions such as detecting unauthorized passage or measuring travel time.

[0012] In cases such as (1) to (3), the conventional practice was for the user to aim carefully at the desired target and, so as not to miss the desired footage, to capture a large amount of unwanted footage as well. Afterwards, either the user extracted only the desired portions and judged their content, or scene breaks were detected automatically in the large volume of data brought back, using some feature such as the amount of pixel change between consecutive images or motion vectors. Desired images were then selected with those breaks as clues, and whether each selected image was a desired image was identified automatically.

[0013]

Problems to be Solved by the Invention

As described above, whether a captured image actually contains the desired object could conventionally be determined only by the user viewing the captured image, so the desired object could not be captured reliably.

[0014] Moreover, in the conventional approach of capturing and storing video and then extracting and identifying the desired object from the stored video, capture, photoelectric conversion, and storage are performed on a portable device such as a video camera, whereas target extraction and identification are performed on a separate apparatus. A large amount of data must therefore be collected without knowing in advance whether it contains the desired target. In addition, the user had to adjust the imaging-system parameters in advance, according to the conditions at the time of capture, so that the target could be identified easily.

[0015] Accordingly, an object of the present invention is to provide an imaging device that can reliably capture a desired imaging target while determining, according to the imaging conditions, whether the imaging target is the desired one.

[0016]

Means for Solving the Problems

The imaging device of the present invention (claim 1) comprises: image capturing means for capturing an image; a target area extraction dictionary in which is registered feature information for each of a plurality of search areas that contain the image area of an object and stand in a hierarchical inclusion relationship, for searching for the image area of the object in the image; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary with the image captured by the image capturing means; a target identification dictionary in which feature information of the object is registered; target identifying means for identifying the object by comparing the feature information registered in the target identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the target identifying means. With this configuration, the desired imaging target can be captured reliably.

[0017] The imaging device of the present invention (claim 2) comprises: image capturing means for capturing an image; a plurality of target area extraction dictionaries, corresponding to the environmental conditions under which the image is captured, in each of which is registered feature information for each of a plurality of search areas that contain the image area of an object and stand in a hierarchical inclusion relationship, for searching for the image area of the object in the image; selecting means for selecting one of the plurality of target area extraction dictionaries on the basis of the image captured by the image capturing means; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary selected by the selecting means with the image captured by the image capturing means; a target identification dictionary in which feature information of the object is registered; target identifying means for identifying the object by comparing the feature information registered in the target identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the target identifying means. With this configuration, the desired imaging target can be captured reliably regardless of changes in the environment in which the target is imaged.

[0018] The imaging device of the present invention (claim 3) comprises: image capturing means for capturing an image; a plurality of target area extraction dictionaries, corresponding to the environmental conditions under which the image is captured, in each of which is registered feature information for each of a plurality of search areas that contain the image area of an object and stand in a hierarchical inclusion relationship, for searching for the image area of the object in the image; an environmental condition dictionary in which are registered feature information of the image corresponding to each environmental condition and the identifier of the corresponding target area extraction dictionary; selecting means for selecting one of the plurality of target area extraction dictionaries by comparing the feature information registered in the environmental condition dictionary with the image captured by the image capturing means; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary selected by the selecting means with the image captured by the image capturing means; a target identification dictionary in which feature information of the object is registered; target identifying means for identifying the object by comparing the feature information registered in the target identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the target identifying means. With this configuration, the desired imaging target can be captured reliably regardless of changes in the imaging environment.

[0019]

Embodiments of the Invention

Embodiments of the present invention are described below with reference to the drawings.

(1) Outline of the imaging device

FIG. 1 schematically shows a configuration example of the imaging device according to this embodiment. The device comprises: an image input unit 1 consisting of an optical system whose aperture and focusing can be digitally controlled; an imaging unit 2 consisting of an A/D converter and a CMOS image sensor array with a built-in electronic shutter whose shutter speed can be digitally controlled; a target area extraction unit 4 that randomly accesses the digital output of the imaging unit 2 and detects the desired target area using a target area extraction dictionary 3; an imaging situation selection unit 6 that, prior to this operation, estimates the current imaging situation using an imaging situation dictionary 5 and a plurality of images obtained by sequentially accessing the imaging unit 2 and temporarily stored in a predetermined memory, and selects the target area extraction dictionary 3 accordingly; a geometric transformation unit 7 that, after the target area has been extracted by the target area extraction unit 4, inversely transforms the three-dimensional geometric deformation of the extracted area, caused by the positional relationship between the optical system and the target, into the shape at the position registered in a target identification dictionary 8; a target identification unit 9 that, after extraction and geometric transformation of the desired target area, identifies the content of the target area using the target identification dictionary 8; and a display unit 10 that presents the target area extraction result and the identification result to the user.

[0020] The device further comprises a user operation unit 11 that allows the user to adjust, based on what the display unit 10 shows, the parameters used by the image input unit 1 and the target area extraction unit 4; an imaging adjustment parameter dictionary 16 in which those operating parameters are recorded; and an automatic imaging adjustment unit 17 that performs the adjustment automatically on another imaging occasion, based on the imaging adjustment parameter dictionary 16.

[0021] The device further comprises storage means 12 for temporarily storing the data used to create the target area extraction dictionary 3, the target identification dictionary 8, the imaging situation dictionary 5, and the imaging adjustment parameter dictionary 16; a data input/output unit 13 that outputs such data to an external device and receives dictionary data from an external device; and a dictionary creation unit 14 that creates the target area extraction dictionary 3, the target identification dictionary 8, and the imaging situation dictionary 5 from a plurality of input images.

[0022] All of the above units are housed in a single casing and constitute a portable imaging device. However, because the dictionaries are created by statistical methods, accuracy improves when more image data is used. The dictionary creation unit 14 may therefore be built on a high-speed, large-capacity computer outside the imaging device, with the data needed for dictionary creation sent to it through the data input/output unit 13.

[0023] In this imaging device, the imaging element in the imaging unit 2 is a CMOS image sensor device, whose structure is almost the same as that of a semiconductor dynamic memory. This makes random access to the image data possible, and incorporating the target area extraction unit 4 into the same LSI as the imaging unit 2 allows the target area extraction processing to be parallelized and accelerated.

[0024] Each unit of the imaging device of FIG. 1 is described in detail below.

[0025] (2) Image input unit 1

Any existing image input means may be used, provided that it has a built-in table mapping digital aperture and focusing signals to lens aperture values and focusing values, and that it automatically sets the specified aperture value and focusing value when given such signals.

[0026] (3) Imaging unit 2

The imaging unit 2 consists of an imaging element, an electronic shutter that controls the time during which light is incident on the surface of the element (the shutter speed) in response to an external digital control signal, and an A/D converter that converts the output of the imaging element into a digital signal.

[0027] A CMOS image sensor array whose output can be randomly accessed directly may be used as the imaging element, and a liquid-crystal shutter or the like may be used as the electronic shutter.

[0028] The A/D converter converts the output of the sensor array into digital data of a fixed bit length.

[0029] (4) Target area extraction unit 4 and target area extraction dictionary 3

Before identifying what appears in the digital output image of the imaging unit 2, the area to be identified is narrowed down by extracting from the image a target area of the desired category (for example, to identify a particular person in video, the area of the desired category is the face area). This processing is performed by the target area extraction unit 4. Specifically, template matching is performed between two-dimensional templates that roughly represent images of candidate target objects captured in advance and input to the device (the target area extraction dictionary 3) and partial images obtained by random access to the digital output image of the imaging unit 2. An area whose matching result satisfies a certain statistical criterion is extracted as the desired target area.

[0030] One possible criterion for selecting the desired area by template matching is as follows. The template is shifted over the output image of the imaging unit 2, the image within the area overlapping the template is regarded as a vector, and the position is chosen that minimizes the Euclidean distance between that vector and the template, that is, the square root of the sum over the template area of the squared differences between corresponding pixel values. Alternatively, the similarity obtained by the subspace method (Reference 1: Iijima, Pattern Recognition Theory, p. 119) may be used in place of the Euclidean distance. In this method, the template vectors taken from the dictionary are a finite number of eigenvectors, ordered from the largest corresponding eigenvalue, obtained by principal component analysis of the variance-covariance matrix of a plurality of sample image vectors representing the desired target. The image within the area overlapping the template is regarded as a vector, and the area that maximizes the sum of the inner products between this image vector and each eigenvector in the template is taken as the desired target area.
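The Euclidean-distance criterion in [0030] can be illustrated with a minimal sketch. The function name `euclidean_match` and the list-of-lists image representation are assumptions made for illustration only; the patent does not specify an implementation, and the subspace-method variant is not shown.

```python
import math


def euclidean_match(image, template):
    """Slide `template` over `image`; return (row, col, distance) of the best match.

    Both arguments are 2-D lists of grayscale pixel values.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of squared differences between corresponding pixels.
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            dist = math.sqrt(ssd)  # Euclidean distance between the two vectors
            if best is None or dist < best[2]:
                best = (r, c, dist)
    return best


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
print(euclidean_match(image, template))  # (1, 2, 0.0)
```

Because the square root is monotonic, minimizing the sum of squared differences selects the same position as minimizing the distance itself.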

[0031] Instead of the subspace-method similarity above, which concerns only the desired object, a mixed similarity (Reference 2: Iijima, Pattern Recognition Theory, p. 122) that also uses dictionary data for objects other than the desired area may be computed, and the area where it is maximal may be extracted as the desired target area.

[0032] The output image of the imaging unit 2 undergoes various geometric transformations depending on the positional relationship between the optical system and the object. The target area extraction dictionary 3 therefore holds multiple templates (multi-entry): dictionary data (a template) for the object is created from a plurality of sample images of the object in one particular arrangement, and various geometric transformations are then applied to it. As the matching result, the template among these that best satisfies the statistical criterion described above is selected.
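The multi-entry idea in [0032] can be sketched as follows. This is an illustrative simplification, not the patent's method: the only geometric transformations generated are rotations in 90-degree steps, the matching score is the sum of squared differences, and the names `rotate90`, `build_entries`, and `best_entry` are hypothetical.

```python
def rotate90(t):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*t[::-1])]


def build_entries(base):
    """Return {angle: template} for rotations of 0, 90, 180 and 270 degrees."""
    entries = {0: base}
    t = base
    for angle in (90, 180, 270):
        t = rotate90(t)
        entries[angle] = t
    return entries


def best_entry(image, entries):
    """Return (score, angle, row, col) of the entry and position with the lowest SSD."""
    best = None
    for angle, t in entries.items():
        th, tw = len(t), len(t[0])
        for r in range(len(image) - th + 1):
            for c in range(len(image[0]) - tw + 1):
                score = sum(
                    (image[r + i][c + j] - t[i][j]) ** 2
                    for i in range(th)
                    for j in range(tw)
                )
                if best is None or score < best[0]:
                    best = (score, angle, r, c)
    return best


base = [[1, 2], [3, 4]]
# The object appears in the image rotated 90 degrees clockwise.
image = [
    [0, 0, 0, 0],
    [0, 3, 1, 0],
    [0, 4, 2, 0],
    [0, 0, 0, 0],
]
result = best_entry(image, build_entries(base))
print(result)  # (0, 90, 1, 1)
```

A fuller version would also generate scaled entries and arbitrary rotation angles, which require image resampling.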

[0033] In addition, the target area extraction dictionary 3 contains not only patterns representing the target identified by the target identification unit 9 itself, but also one or more dictionary patterns that roughly represent areas geometrically containing that target. These are selected in order from the largest to the smallest, progressively narrowing the search area for the desired object in the input image.

[0034] To perform this hierarchical matching, the target area extraction dictionary 3 is organized, for each category of desired object, into a hierarchy such as that shown in FIG. 2. Each hierarchy level has a multi-entry dictionary with one entry per type of geometric transformation. The dictionary entry of an upper level is followed by the dictionary of the more detailed area of the lower level, and each upper-level entry records the coordinate relationship to the lower-level area it contains (with the pattern centroid of that level as the origin and the pixel as the unit). Following this structure, the desired target area is narrowed down from the upper levels to the lower ones: once the area containing the desired target area has been determined at an upper level, the area to be matched at the next level is narrowed down by referring to the coordinate relationship with the lower-level area, and the desired target area is finally extracted.

[0035] The target area extraction processing of the target area extraction unit 4 is described with reference to the flowchart shown in FIG. 3.

[0036] The geometric transformations at each hierarchy level include, for example, rotation of the target within the image plane and changes in its size, so the actual processing procedure is as follows.

[0037] The search area for the target area is set to the whole frame, and the output image of the imaging unit 2 is matched against the dictionary data of the top hierarchy level of the target area extraction dictionary 3 (for example, "hierarchy 1" in FIG. 2) (steps S1 to S2). In doing so, a plurality of images created by reducing the output image of the imaging unit 2 to several resolutions are matched against the dictionary data (geometric transformation patterns) of all geometric transformation types at the top level (steps S3 to S4). That is, the matching of the dictionary data (a vector) of one geometric transformation type (rotation) against the images reduced to several resolutions is carried out for the dictionary data of every geometric transformation type, and the result that best satisfies the statistical criterion described above is found.

[0038] The image at the resolution that gave the best matching result (at which the desired area was detected) is selected, and the search area for the next level is obtained from the search area of that image (here, the whole frame) and the relative position of the next level's search area recorded in the dictionary 3 (step S5). In other words, the image output from the imaging unit 2 is matched against the dictionary data of each level in turn, from the upper levels to the lower ones, and based on the results the search area is narrowed to a smaller area in which the desired object may exist.

[0039] The new search area is then matched against the dictionary data of the next level in the same way as in steps S3 to S5: matching is performed for the plurality of geometric-transformation-type dictionaries at that level, and the one that best satisfies the statistical criterion is found. Steps S3 to S5 are repeated until the lowest level is reached.
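The coarse-to-fine narrowing of steps S1 to S5 can be sketched as below. Several simplifications are assumed and are not taken from the patent: each level holds a single template (no multi-resolution images or geometric variants), the score is the sum of squared differences, and the next search area is simply the area matched at the current level rather than a coordinate relationship read from the dictionary. The names `match` and `hierarchical_search` are hypothetical.

```python
def match(image, template, window):
    """Best (score, row, col) of `template` inside window = (top, left, bottom, right).

    The bottom and right bounds are exclusive.
    """
    top, left, bottom, right = window
    th, tw = len(template), len(template[0])
    best = None
    for r in range(top, bottom - th + 1):
        for c in range(left, right - tw + 1):
            score = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best is None or score < best[0]:
                best = (score, r, c)
    return best


def hierarchical_search(image, levels):
    """Match templates from the top level down, shrinking the search window each time."""
    window = (0, 0, len(image), len(image[0]))  # start with the whole frame
    position = None
    for template in levels:
        _, r, c = match(image, template, window)
        position = (r, c)
        # Assumption: the lower-level area lies inside the area just matched.
        window = (r, c, r + len(template), c + len(template[0]))
    return position


outer = [
    [5, 5, 5, 5],
    [5, 5, 9, 9],
    [5, 5, 9, 9],
    [5, 5, 5, 5],
]
inner = [[9, 9], [9, 9]]
image = [
    [0, 0, 0, 0, 0, 0, 9, 9],  # decoy that looks like `inner`
    [5, 5, 5, 5, 0, 0, 9, 9],
    [5, 5, 9, 9, 0, 0, 0, 0],
    [5, 5, 9, 9, 0, 0, 0, 0],
    [5, 5, 5, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(hierarchical_search(image, [outer, inner]))  # (2, 2)
```

In this example a flat search for `inner` over the whole frame would stop at the decoy in the top-right corner, whereas the hierarchical search first locates the containing area and only then looks for the detail inside it.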

[0040] (5) Creation of the target area extraction dictionary 3

The processing by which the dictionary creation unit 14 creates the target area extraction dictionary 3 is described with reference to the flowchart of FIG. 4.

[0041] For one image (frame) among a plurality of images of a given scene, the user designates, via the user operation unit 11, an area circumscribing the object area that is actually to be identified (step S11).

[0042] Using the plurality of images of that scene, pixel-by-pixel correspondences between them are obtained. The inter-frame correspondences (motion vectors) of the designated target area specified via the user operation unit 11 are obtained, and their representative vector U is determined (step S12).

[0043] The inter-frame correspondences (motion vectors) Uo(x, y) are obtained for an area that contains the designated target area within the frame, where x and y are the x and y coordinates in the frame (step S13).

[0044] Among all the motion vectors Uo(x, y) in the area obtained in step S13, those whose distance from U is no more than a threshold Tu, denoted Ut(x, y), are found, and the area circumscribing the coordinates (x, y) of the vectors Ut(x, y) is obtained (step S14).

[0045] Principal component analysis is applied to the image pattern within the circumscribed region obtained in step S14 (step S15).

[0046] The motion vectors in step S12 may be calculated by, for example, a method based on the correlation of small regions between frames (Reference 3: Takagi and Shimoda, "Image Analysis Handbook", p. 553).

[0047] The threshold Tu in step S13 may be obtained by, for example, computing over the screen the Euclidean distance of the difference vector between Uo(x, y) and U and applying discriminant analysis to these distances (Reference 4: Otsu, Kurita, and Sekida, "Pattern Recognition: Theory and Applications", Asakura Shoten, pp. 65-71, 1996).
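One common form of discriminant-analysis thresholding picks the split that maximizes the between-class variance of the two resulting groups. The sketch below assumes that form and operates directly on a list of distances; the function name `discriminant_threshold` is hypothetical, and the patent does not specify this exact procedure.

```python
def discriminant_threshold(values):
    """Return the threshold that maximizes between-class variance for a
    non-empty list of scalars (an Otsu-style two-class split)."""
    vs = sorted(values)
    n = len(vs)
    total = sum(vs)
    best_t, best_score = vs[0], -1.0
    low_sum = 0.0
    for i in range(1, n):              # candidate split: vs[:i] | vs[i:]
        low_sum += vs[i - 1]
        if vs[i] == vs[i - 1]:
            continue                   # no boundary between equal values
        w0, w1 = i / n, (n - i) / n    # class weights
        m0 = low_sum / i               # mean of the lower class
        m1 = (total - low_sum) / (n - i)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score = score
            best_t = (vs[i - 1] + vs[i]) / 2
    return best_t
```

Applied to the distances between Uo(x, y) and U, the returned value would play the role of Tu, separating vectors that follow the designated region from those that do not.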

[0048] To obtain the circumscribed region in step S14, the circumscribed rectangle of the coordinates of Ut(x, y) may first be obtained by connected-region labeling (Reference 5: Takagi and Shimoda, "Image Analysis Handbook", p. 578), and the region may then be obtained with a B-spline snake whose initial control points are placed on that circumscribed rectangle (Reference 6: R. Cipolla and A. Blake, "The dynamic analysis of apparent contours", in Proc. 3rd Int. Conf. on Computer Vision, pp. 616-623 (1990)).

[0049] The motion vectors may be calculated using the correlation of the luminance of corresponding pixels across a plurality of frames, or using the luminance correlation of result images (edge-enhanced images) obtained by applying a differential filter such as the Sobel operator (Reference 7: Takagi and Shimoda, "Image Analysis Handbook", p. 553) to each frame image beforehand. Furthermore, when the desired object consists of specific colors, each frame image may first be converted into a color system that emphasizes those colors, after which the edge-enhanced image described above is created and its luminance correlation is used. For example, when a person's face is to be extracted, the I component of the YIQ color system (Reference 8: Takagi and Shimoda, "Image Analysis Handbook", p. 103) may be used as a color system that emphasizes the skin color of the face.
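As a rough illustration of the color conversion mentioned above, the I component of YIQ can be computed with Python's standard `colorsys` module. The image layout (rows of RGB tuples with channels in [0, 1]) and the function name `i_component` are assumptions made for this sketch.

```python
import colorsys

def i_component(image):
    """image: rows of (r, g, b) tuples with channels in [0, 1].
    Returns rows holding the I (in-phase) value of each pixel."""
    return [[colorsys.rgb_to_yiq(r, g, b)[1] for (r, g, b) in row]
            for row in image]
```

Neutral grays map to an I value near zero, while orange-to-red tones such as skin give positive values, which is why this channel tends to make faces stand out before edge enhancement.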

[0050] In the above description, pixel-by-pixel correspondences are obtained. Alternatively, the edge-enhanced image described above may be obtained beforehand for the region containing the target region and binarized with a threshold, and the motion vectors may be obtained from inter-frame correspondences of the resulting edge regions.

[0051] (6) Imaging situation selection unit 6 and imaging situation dictionary 5
Using the imaging situation dictionary 5, the imaging situation selection unit 6 selects, from the plurality of target area extraction dictionaries 3 prepared for the respective imaging situations, the target area extraction dictionary best suited to the imaging situation of the desired object (either by user selection or automatically).

[0052] The imaging situation dictionary 5 describes a representative pattern for each situation and the identification number of the corresponding target area extraction dictionary 3.

[0053] First, the operation by which the dictionary creation unit 14 creates the imaging situation dictionary 5 is described with reference to the flowchart of FIG. 5.

[0054] Imaging is started under a given imaging condition, and several digital output images of the imaging unit 2 are saved in the storage unit 12 (steps S21 to S22). From these images, statistically derived representative patterns of the situation are created (step S23). The representative patterns are created by applying the subspace method described above (Reference 1) to a plurality of frame images of a scene. Consequently, not just one representative pattern but as many as there are eigenvalues can be generated; n of them are used.

[0055] At this time, the input images are also used to perform the target area extraction dictionary creation process described above (see FIG. 4) (step S24).

[0056] The identification number of the target area extraction dictionary created in step S24 is registered in the imaging situation dictionary in association with the representative patterns under that imaging condition obtained in steps S22 to S23, thereby building up the imaging situation dictionary (step S25).

[0057] Steps S21 to S25 are performed for each situation in which use is anticipated, creating an imaging situation dictionary 5 that covers those situations (step S26).

[0058] Next, the case where the user selects, in the imaging situation selection unit 6, the target area extraction dictionary best suited to the imaging situation of the desired object is described.

[0059] In the case of user selection, several digital output images of the imaging unit 2 are saved in the storage unit 12, and the display unit 10 shows reduced versions of them as a list on the screen. The dictionary patterns in the imaging situation dictionary 5 are displayed in another area of the screen, and the user selects one with the user operation unit 11. The dictionary patterns in the imaging situation dictionary 5 are representative patterns of a plurality of situation scenes, as described later. The imaging situation selection unit 6 then reads the identification number of the target area extraction dictionary 3 corresponding to the selected dictionary pattern in the imaging situation dictionary 5.

[0060] Next, the case where the imaging situation selection unit 6 automatically selects the target area extraction dictionary is described with reference to the flowchart of FIG. 6.

[0061] Immediately before actual imaging, a plurality of input images are used to obtain representative patterns for the current imaging situation by the same statistical method as in step S23 of FIG. 5 (steps S31 to S33).

[0062] The similarity or distance between each representative pattern in the imaging situation dictionary 5 and the representative pattern of the current imaging situation obtained in step S33 is computed. The representative pattern (dictionary pattern) in the imaging situation dictionary 5 with the maximum similarity or the minimum distance is selected, and the identification number of the target area extraction dictionary 3 corresponding to the selected dictionary pattern is read (step S35). Thereafter, when the actual imaging operation starts, the target area extraction unit 4 uses the selected target area extraction dictionary 3 to extract the target area from the images output by the imaging unit 2.

[0063] The similarity obtained in step S34 between each representative pattern in the imaging situation dictionary 5 and the representative pattern of the current imaging situation may be, for example, the similarity of the subspace method (see Reference 1). As the distance, the Euclidean distance between the representative pattern with the largest eigenvalue in the imaging situation dictionary 5 and the one with the largest eigenvalue among the current representative patterns may be used.
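The automatic selection can be sketched as below, assuming the usual subspace-method similarity (squared length of the projection onto an orthonormal basis, normalized by the squared length of the input). The data layout and the names `subspace_similarity` and `select_dictionary` are hypothetical illustrations, not taken from the patent.

```python
def subspace_similarity(x, basis):
    """Squared projection length of x onto span(basis), divided by |x|^2.
    basis: orthonormal vectors, e.g. the leading eigenvectors of a scene."""
    norm2 = sum(v * v for v in x)
    if norm2 == 0:
        return 0.0
    return sum(sum(a * b for a, b in zip(x, phi)) ** 2 for phi in basis) / norm2

def select_dictionary(current, situation_dict):
    """situation_dict: list of (dictionary_id, basis) entries.
    Returns the id whose subspace is most similar to the current pattern."""
    return max(situation_dict,
               key=lambda entry: subspace_similarity(current, entry[1]))[0]
```

The similarity lies between 0 and 1, so entries for different situations can be compared directly and the largest one chosen.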

[0064] (7) Correction of the object extraction result
When the display unit 10 shows the desired-region extraction result of the target area extraction unit 4 and it is not what the user wants, the user adjusts the imaging parameters of the imaging unit 2 via the user operation unit 11 and then selects the desired region. The user's adjustment amount at this time, the matching result obtained at that time in the target area extraction unit 4, and the number Di of the representative pattern of the target area extraction dictionary 3 used for the matching are then temporarily saved in the storage unit 12.

[0065] Based on this content stored in the storage unit 12, the dictionary creation unit 14 creates the imaging adjustment parameter dictionary 16 as follows, described with reference to the flowchart of FIG. 7.

[0066] The matching result vectors Vmi corresponding to a plurality of correction operations for the representative pattern Di of one target area extraction dictionary 3 are classified beforehand into a statistically plausible number k of classes, and a representative vector pattern Vmk is obtained for each class (steps S41 to S42). Here, each Vmi is a vector whose elements are the region extracted as the best match in the target area extraction described above (by similarity or by distance to the representative pattern), together with the similarity or distance at that time. For the classification, the K-means algorithm, which is widely known as a clustering method, may be used (Reference 9: Takagi and Shimoda, "Image Analysis Handbook", p. 650).
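A minimal K-means sketch for the classification in steps S41 to S42 follows. It assumes that k is given and that the initial centers are simply the first k vectors; both are simplifications made for this example. The function name `kmeans` is hypothetical.

```python
def kmeans(vectors, k, iterations=20):
    """Plain K-means. Initial centers are the first k vectors (an arbitrary,
    deterministic choice for this sketch). Returns (centers, labels)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2
                                        for a, b in zip(v, centers[j])))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels
```

The returned centers would correspond to the representative vector patterns Vmk, and the labels indicate which class each matching result vector Vmi belongs to.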

[0067] For each representative vector pattern Vmk, the correction operation parameter Vpki corresponding to the matching result vector Vmki in class k that is closest to Vmk is obtained (step S43). The representative pattern number Di of the target area extraction dictionary 3, the matching result representative vector Vmk, and the correction operation parameter vector Vpki are then registered in association with one another in the imaging adjustment parameter dictionary 16 (step S44).

[0068] The distance between Vmk and Vmki may be, for example, the Euclidean distance or the Mahalanobis distance (Reference 10: Takagi and Shimoda, "Image Analysis Handbook", p. 654).

[0069] The automatic imaging adjustment unit 17 uses the imaging adjustment parameter dictionary 16 to obtain imaging adjustment parameters and automatically adjust the imaging parameters as follows. The imaging adjustment operation of the automatic imaging adjustment unit 17 is described below with reference to the flowchart of FIG. 8.

[0070] A target area extraction dictionary 3 is selected through imaging, and the number Di of the representative pattern currently in use within it is obtained (step S51).

[0071] A matching result vector Vmt is obtained from the current target area extraction result (step S52). The distances between Vmt and the matching result representative vector patterns Vmj (j = 1, ..., k) of the k classes registered in the imaging adjustment parameter dictionary 16 for the representative pattern number Di of the area extraction dictionary are computed, and the class j with the minimum distance is found (steps S53 to S54).

[0072] The imaging adjustment parameter vector Vpj corresponding to class j is obtained from the entry of the imaging adjustment parameter dictionary 16 that corresponds to the representative pattern number Di of the area extraction dictionary (step S55).
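The lookup in steps S53 to S55 amounts to a nearest-neighbor search over the class representatives registered for pattern Di. The sketch below assumes Euclidean distance and a dictionary keyed by pattern number; the structure of `param_dict` and the name `lookup_adjustment` are hypothetical.

```python
from math import dist

def lookup_adjustment(param_dict, di, vmt):
    """param_dict: {pattern_number: [(Vmj, Vpj), ...]}, one pair per class.
    Returns the adjustment vector Vpj of the class whose representative
    matching vector Vmj is nearest (Euclidean) to the current result vmt."""
    classes = param_dict[di]
    _, vpj = min(classes, key=lambda pair: dist(pair[0], vmt))
    return vpj
```

For example, with two classes registered for pattern 2, a current matching result close to the second representative returns the second class's adjustment parameters.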

[0073] The imaging system is automatically adjusted according to the imaging adjustment parameter vector Vpj obtained in step S55 (step S55), and the resulting image is presented to the user on the display unit 10 (step S56). If the user's approval of the presented result is obtained from the user operation unit 11, the result of the adjustment operation is taken as the target area extraction result (step S58).

[0074] In this way, the extraction result of the target area extraction unit 4 is displayed on the display unit 10, and the user operation unit 11 is provided so that, when the result is not what the user wants, the user can manually adjust the imaging parameters of the imaging unit 2 and select the desired region. The user's adjustment operation amount, the matching result obtained at that time in the target area extraction unit 4, and the target area extraction dictionary used for the matching are stored in association with one another. During a later imaging session, when a matching result is obtained that is close, under some evaluation criterion, to a stored matching result for the stored dictionary, the automatic imaging adjustment unit 17 automatically performs the stored adjustment operation and presents the result to the user. If the user approves the presented result, the result of the adjustment operation is taken as the target area extraction result.

[0075] The user may perform adjustment operations on multiple occasions. In that case as well, the user's adjustment operation amount, the matching result in the target area extraction unit 4 at each adjustment operation, and the dictionary used for the matching are stored in association with one another. The matching results corresponding to the plurality of correction operations for a given matching dictionary are classified beforehand into a statistically plausible number of classes. During a new imaging session, the representative value of the stored result class that is closest, under some evaluation criterion, to the current matching result of the target area extraction unit 4 is found, and the result of performing the adjustment operation stored in association with it is presented to the user. If the user approves the result, the result of the adjustment operation is taken as the target area extraction result.

[0076] (8) Geometric transformation unit 7
For a region that the target area extraction unit 4 has determined to be a candidate for the desired target region, the geometric transformation unit 7 approximates, by a geometric transformation within a two-dimensional plane, how much geometric deformation the region has undergone relative to the standard desired target region set in advance in the object identification dictionary 8.

[0077] The general form of the geometric transformation is the projective transformation, in which the mapping from an input (X Y w) to an output (X* Y* w*) is expressed by the following linear equations.

[0078]

[Equation 1]

Here, the input is taken to be the circumscribed-polygon coordinates of the standard desired target region, and the output the circumscribed-polygon coordinates of the target region candidate extracted by the target area extraction unit 4. Assuming that distortion within the two-dimensional plane is small, equations (1), (2), and (3) are approximated by the following affine transformation.

[0079]

[Equation 2]

Here, (x, y) are the coordinates of each vertex of the circumscribed rectangle of the standard desired target region, and (x*, y*) are the coordinates of each vertex of the circumscribed rectangle of the region candidate extracted by the target area extraction unit 4. At least three vertices are required to solve this equation.
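The need for at least three vertices can be illustrated with a small solver. The equation image is not reproduced in this text, so the sketch assumes the usual six-parameter affine form x* = a·x + b·y + c, y* = d·x + e·y + f, and solves it from exactly three point pairs by Cramer's rule. The name `solve_affine` and the parameter naming are hypothetical.

```python
def solve_affine(src, dst):
    """src, dst: three (x, y) pairs each. Returns (a, b, c, d, e, f) such that
    x* = a*x + b*y + c and y* = d*x + e*y + f map src[i] onto dst[i]."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("source points are collinear")

    def coeffs(t1, t2, t3):
        # Cramer's rule for [x y 1] . (p, q, r) = t at the three points.
        p = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        q = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        r = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = coeffs(*(pt[0] for pt in dst))
    d, e, f = coeffs(*(pt[1] for pt in dst))
    return a, b, c, d, e, f
```

Each point pair contributes two equations, so three non-collinear pairs determine the six unknowns exactly; with all four rectangle vertices, a least-squares fit could be used instead.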

[0080] Next, coordinates are transformed according to the coefficients of the mapping of equation (4), and for every pixel in the standard desired target region the corresponding pixel in the extracted target region is fetched. The result is the extracted target region with the inverse geometric transformation applied, and its circumscribed shape is approximately equal to that of the standard desired target region.

[0081] The inverse-geometric-transformed image obtained in this way is then subjected to the identification process of the object identification unit 9, described below.

[0082] (9) Object identification unit 9
The object identification unit 9 identifies whether the input image is the desired object by a matching operation against the standard desired object pattern registered in advance in the object identification dictionary 8. Specifically, the inverse-geometric-transformed image obtained by the geometric transformation unit 7 is matched against the standard desired region pattern. The subspace method (see Reference 1) or the like may be used for this matching operation.

[0083] If the subspace method (see Reference 1) is used for the matching operation, the object identification dictionary 8 is structured as follows. A plurality of sample patterns in which the desired object varies in various ways (optically, geometrically, and so on) are represented as vectors, their variance-covariance matrix (or correlation matrix) is subjected to principal component analysis, and the resulting eigenvectors are stored in matrix form, ordered by decreasing eigenvalue. In addition, the dictionary holds, as point sequence data, the coordinates of the circumscribed rectangle (or circumscribed polygon) surrounding the sample pattern image regions used to create the dictionary.

[0084] The point sequence data in the dictionary is used by the geometric transformation unit 7 to calculate the geometric transformation parameters. It is created by expressing, as a vector in the manner shown below, the coordinate sequence of the vertices of the circumscribed rectangle (or circumscribed polygon) surrounding each sample pattern image, and then obtaining a statistical representative vector of these vectors over all sample patterns. There are various ways to calculate the representative vector; here, the mean, principal component analysis, or the like may be used as follows. Let the number of points around each sample pattern be a constant n and the number of samples be M. The point sequence vector Pi for sample pattern i is then expressed as Pi = (x1i, x2i, ..., xni, y1i, y2i, ..., yni), where i = 1, ..., M.

[0085] When the mean Pa of these vectors over the samples is used as the point sequence data registered in the dictionary, it is given by the following.

[0086]

[Equation 3]
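The image for Equation 3 is not reproduced in this text. Assuming it is the ordinary arithmetic mean of the point sequence vectors over the M samples, which is what the surrounding description indicates, it would read:

```latex
P_a = \frac{1}{M} \sum_{i=1}^{M} P_i
```

Under this reading, each element of Pa is the average position of the corresponding vertex coordinate across all sample patterns.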

[0087] When principal component analysis is used, the n x n variance-covariance matrix Mp is obtained over all Pi, its eigenvalues λj and eigenvectors Vi (i = 1, ..., n) are obtained, and the eigenvector corresponding to the largest eigenvalue, Vimax, is used as the point sequence data registered in the dictionary.

(10) Effects
As described above, according to the embodiment, rough knowledge about the imaging situation and the imaging target is held in advance as dictionaries, and images are captured while judging whether the target appears to be the desired one, which reduces the need to collect large amounts of image data. Because imaging, selection of the desired image region, and identification of the object can all be performed by a single integrated device, the device is particularly suitable for portable use. Because the imaging means and the target area extraction means are formed on the same LSI, this part alone can readily be reused for other purposes, for example to build a camera or handheld VCR that outputs only images resembling the desired object. Furthermore, if the desired object cannot be extracted successfully, the user can adjust the parameters so that extraction becomes possible. The adjustment parameters used at region extraction are then learned and applied in later imaging sessions so that the desired object is automatically extracted and identified more reliably. As a result, the device gradually becomes more tolerant of variations in imaging conditions, and maintenance of device performance is needed less often.

[0088]

[Effects of the Invention] As described above, according to the present invention, a desired imaging target can be reliably imaged while determining, in accordance with the imaging situation of the imaging target, whether that target is the desired one.

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of the imaging device according to the embodiment.

FIG. 2 is a diagram showing a configuration example of the target area extraction dictionary.

FIG. 3 is a flowchart illustrating the processing operation of the target area extraction unit.

FIG. 4 is a flowchart illustrating the operation by which the dictionary creation unit creates the target area extraction dictionary.

FIG. 5 is a flowchart illustrating the operation by which the dictionary creation unit creates the imaging situation dictionary.

FIG. 6 is a flowchart illustrating the processing operation of the imaging situation selection unit.

FIG. 7 is a flowchart illustrating the operation by which the dictionary creation unit creates the imaging adjustment parameter dictionary.

FIG. 8 is a flowchart illustrating the processing operation of the automatic imaging adjustment unit.

[Explanation of symbols]

1: image input unit
2: imaging unit
3: target area extraction dictionary
4: target area extraction unit
5: imaging situation dictionary
6: imaging situation selection unit
7: geometric transformation unit
8: object identification dictionary
9: object identification unit
10: display unit
11: user operation unit
12: storage unit
13: data input/output unit
14: dictionary creation unit
16: imaging adjustment parameter dictionary
17: automatic imaging adjustment unit

Claims

[Claims]

1. An imaging device comprising: image capturing means for capturing an image; a target area extraction dictionary in which is registered feature information of each of a plurality of search areas that contain the image area of an object and are in a hierarchical inclusion relationship, for searching for the image area of the object within the image; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary with the image captured by the image capturing means; an object identification dictionary in which feature information of the object is registered; object identifying means for identifying the object by comparing the feature information registered in the object identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the object identifying means.

2. An imaging device comprising: image capturing means for capturing an image; a plurality of target area extraction dictionaries, provided according to environmental conditions at the time of capturing the image, in each of which is registered feature information of each of a plurality of search areas that contain the image area of an object and are in a hierarchical inclusion relationship, for searching for the image area of the object within the image; selecting means for selecting one of the plurality of target area extraction dictionaries based on the image captured by the image capturing means; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary selected by the selecting means with the image captured by the image capturing means; an object identification dictionary in which feature information of the object is registered; object identifying means for identifying the object by comparing the feature information registered in the object identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the object identifying means.

3. An imaging device comprising: image capturing means for capturing an image; a plurality of target area extraction dictionaries, provided according to environmental conditions at the time of capturing the image, in each of which is registered feature information of each of a plurality of search areas that contain the image area of an object and are in a hierarchical inclusion relationship, for searching for the image area of the object within the image; an environmental condition dictionary in which feature information of the image corresponding to each environmental condition and an identifier of the target area extraction dictionary are registered; selecting means for selecting one of the plurality of target area extraction dictionaries by comparing the feature information registered in the environmental condition dictionary with the image captured by the image capturing means; target area extracting means for extracting the image area of the object by comparing the feature information of the search area of each hierarchy level registered in the target area extraction dictionary selected by the selecting means with the image captured by the image capturing means; an object identification dictionary in which feature information of the object is registered; object identifying means for identifying the object by comparing the feature information registered in the object identification dictionary with the image area of the object extracted by the target area extracting means; and display means for displaying the image area of the object extracted by the target area extracting means and the identification result of the object identifying means.

4. The imaging device according to any one of the preceding claims, wherein the target area extracting means extracts the image area of the object by narrowing the search area from a higher hierarchy level to a lower hierarchy level.

5. The imaging device according to claim 1, further comprising dictionary creating means for obtaining, based on a pixel-by-pixel correspondence between the inside of the circumscribed contour of the image area of the object designated in one of a plurality of images captured by the image capturing means and the plurality of images, a plurality of search areas that include the image area of the object and have a hierarchical inclusion relationship, and for creating the target area extraction dictionary by registering the feature information of each of the plurality of search areas according to the hierarchical inclusion relationship.

6. The imaging device according to claim 1, further comprising dictionary creating means for obtaining, based on a correspondence in the luminance of each pixel between the inside of the circumscribed contour of the image area of the object designated in one of a plurality of images captured by the image capturing means and the plurality of images, a plurality of search areas that include the image area of the object and have a hierarchical inclusion relationship, and for creating the target area extraction dictionary by registering the feature information of each of the plurality of search areas according to the hierarchical inclusion relationship.

7. The imaging device according to claim 2 or 3, further comprising first dictionary creating means for obtaining, while the environmental conditions under which the image capturing means captures images are changed, and based on a pixel-by-pixel correspondence between the inside of the circumscribed contour of the image area of the object designated in one of a plurality of images captured under each environmental condition and the plurality of images, a plurality of search areas that include the image area of the object and have a hierarchical inclusion relationship, and for creating a target area extraction dictionary corresponding to the environmental condition by registering the feature information of each of the plurality of search areas according to the hierarchical inclusion relationship.

8. The imaging device according to claim 2 or 3, further comprising first dictionary creating means for obtaining, while the environmental conditions under which the image capturing means captures images are changed, and based on a correspondence in the luminance of each pixel between the inside of the circumscribed contour of the image area of the object designated in one of a plurality of images captured under each environmental condition and the plurality of images, a plurality of search areas that include the image area of the object and have a hierarchical inclusion relationship, and for creating a target area extraction dictionary corresponding to the environmental condition by registering the feature information of each of the plurality of search areas according to the hierarchical inclusion relationship.

9. The imaging device according to claim 3, further comprising second dictionary creating means for extracting, based on a plurality of images captured under different environmental conditions, feature information of the image corresponding to each environmental condition, and for creating the environmental condition dictionary by registering the extracted feature information together with the identifier of the corresponding target area extraction dictionary.

10. The imaging device according to claim 3, further comprising: first dictionary creating means for obtaining, while the environmental conditions under which the image capturing means captures images are changed, and based on a pixel-by-pixel correspondence between the inside of the circumscribed contour of the image area of the object designated in one of a plurality of images captured under each environmental condition and the plurality of images, a plurality of search areas that include the image area of the object and have a hierarchical inclusion relationship, and for creating a target area extraction dictionary corresponding to the environmental condition by registering the feature information of each of the plurality of search areas according to the hierarchical inclusion relationship; and second dictionary creating means for extracting, from a plurality of images captured by the image capturing means under different environmental conditions, feature information of the image corresponding to each environmental condition, and for creating the environmental condition dictionary by registering the extracted feature information of the image corresponding to each environmental condition together with the identifier of the corresponding target area extraction dictionary.

11. The imaging device according to any one of claims 1 to 3, further comprising adjustment means for adjusting an image capturing parameter of the image capturing means based on an image area designated on the image displayed on the display means and on an adjustment amount of the image capturing parameter of the image capturing means.

12. The imaging device according to any one of the above claims, wherein the target area extracting means obtains the image of the search area by randomly accessing, pixel by pixel, the image captured by the image capturing means.
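
For illustration only, the dictionary selection of claim 3 and the narrowing of the search area from a higher hierarchy to a lower hierarchy of claim 4 might be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of the claimed means: the claims do not specify the feature information, so mean luminance is assumed as a stand-in, and the names `Level`, `mean_luma`, `select_dictionary`, `extract_area`, and `make_demo_image`, as well as the identifiers "dim" and "bright", are hypothetical. Each lower level is assumed to be no larger than the level above it.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One search area of a (hypothetical) target area extraction dictionary."""
    height: int   # height of the search area in pixels
    width: int    # width of the search area in pixels
    luma: float   # assumed feature information: expected mean luminance


def mean_luma(img, top, left, h, w):
    """Mean luminance of the h-by-w window whose top-left pixel is (top, left)."""
    total = sum(img[r][c] for r in range(top, top + h)
                for c in range(left, left + w))
    return total / (h * w)


def select_dictionary(img, env_dict):
    """Pick the dictionary identifier whose registered whole-image feature
    is closest to the captured image. env_dict is a list of
    (whole-image mean luminance, dictionary identifier) pairs."""
    g = mean_luma(img, 0, 0, len(img), len(img[0]))
    _, dict_id = min(env_dict, key=lambda entry: abs(entry[0] - g))
    return dict_id


def extract_area(img, levels):
    """Narrow the search from the highest level to the lowest.
    At each level, only windows inside the area found at the previous
    level are compared with the registered feature."""
    top, left, h, w = 0, 0, len(img), len(img[0])
    for lv in levels:
        best = None
        for r in range(top, top + h - lv.height + 1):
            for c in range(left, left + w - lv.width + 1):
                err = abs(mean_luma(img, r, c, lv.height, lv.width) - lv.luma)
                if best is None or err < best[0]:
                    best = (err, r, c)
        _, top, left = best
        h, w = lv.height, lv.width
    return top, left, h, w


def make_demo_image():
    """8x8 grayscale image: background 10, a 4x4 surrounding area of 100,
    and a 2x2 object of 200 inside that area."""
    img = [[10] * 8 for _ in range(8)]
    for r in range(2, 6):
        for c in range(3, 7):
            img[r][c] = 100
    for r in range(3, 5):
        for c in range(4, 6):
            img[r][c] = 200
    return img


if __name__ == "__main__":
    img = make_demo_image()
    env_dict = [(40.0, "dim"), (150.0, "bright")]
    extraction_dicts = {
        "dim": [Level(4, 4, 125.0), Level(2, 2, 200.0)],
        "bright": [Level(4, 4, 220.0), Level(2, 2, 250.0)],
    }
    dict_id = select_dictionary(img, env_dict)
    print(dict_id, extract_area(img, extraction_dicts[dict_id]))
```

Run on the demo image, the sketch selects the "dim" dictionary and returns the window (3, 4, 2, 2), that is, the 2x2 object whose top-left pixel is at row 3, column 4. Because the second level is searched only inside the 4x4 area found at the first level, far fewer windows are compared than in an exhaustive search of the whole image at the finest level.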