JP3894522B2

JP3894522B2 - Image recognition method

Info

Publication number: JP3894522B2
Application number: JP34539297A
Authority: JP
Inventors: 広司辻野; ケルナーエドガー; 知彦桝谷
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 1996-12-17
Filing date: 1997-12-15
Publication date: 2007-03-22
Anticipated expiration: 2017-12-15
Also published as: JPH10232938A

Description

【０００１】
【発明の属する技術分野】
本発明は、人工頭脳システムに関するものであり、さらに詳しくは生物の視覚認識に近似した画像処理方法によりカメラ等の光学機器より入力した種々の画像情報を数値化し、コンピュ−タにより計算処理して画像を認識することが可能であり、自己学習によりさらに効率的に画像を認識しうるコンピュ−タによる画像情報処理システムに関するものである。
【０００２】
【従来の技術】
従来より生物における視覚認識の大脳生理学的機能が解明されるにつれて、その認識機能に近似したモデルをコンピュ−タ等を用いて構成し人工的な視覚学習システムを構築する努力がなされてきた。かかる視覚学習システムは、例えばビデオカメラにより入力された光景を数値化して表現しこれを解析することにより、光景中の或る対象を認識して特定または分類する。即ち入力画像パターンと学習により蓄積されている認識対象の画像パターンとの一致を確認する解析処理をおこなうシステムである。
【０００３】
入力された光景は画素上の光量に対応する電圧値等の数値に変換され、ベクトルとして表現される。例えば画素領域２７×２７のイメージサイズの場合は７２９次元のベクトルが直交軸空間に分散している形として表現される。かかる多量のデータを解析処理して目標の画像パターンを識別するのは、現存するコンピュータの能力からも不可能に近い。
【０００４】
従って、これらの解析には認識対象の入力画像パターンを特徴を表現する圧縮されたデータに置換する処理をおこない、蓄積された学習パターンと比較的容易に対比しうるようにすることが要求される。効率的にデータを解析するためには、データ空間の最も特徴的な領域に限定されるよう、いわゆる部分空間に区分することが望ましい。
【０００５】
このような要求を解決する手法として主成分分析（ＰｒｉｎｃｉｐａｌＣｏｍｐｏｎｅｎｔｓＡｎａｌｙｓｉｓ；ＰＣＡ）がある。この方法は多次元のイメージ空間内の対象画像データの分布を特徴的分布にグループ化し、その分布を特定するために固有ベクトルの主要成分を用いることができるという認識に基づいている。即ちこれら固有ベクトルはイメージ群中の変化に対応する画素上の光量の変化量それぞれに起因するものであり、イメージ群中の変化を協同して特徴づける特徴群であると考えられる。
【０００６】
対象画像に対応する各ベクトルは固有ベクトルに対して大きく寄与するものも、それほど寄与しないものもある。対象画像はイメージ群内の大きな変化に起因する、例えば大きい固有値を持つ固有ベクトルの主要成分の組み合せにより殆ど表現できるのである。
【０００７】
言葉を換えて言えば、対象画像を非常に正確に再現するためには多数の固有ベクトルが必要であるが、対象画像の外観の特徴を表現するだけなら少ない固有ベクトルで充分に表現できるということである。以上の方法を利用して人間の顔を認識するシステムが米国特許第５１６４９９２号に開示されている。
【０００８】
その技術を以下に要約する。まず多数の既知の人間の顔の画像を学習する。顔画像の画素数をＮ×Ｎとすると、Ｍ個の顔画像はＮ²次元のベクトルΓ₁、Γ₂、Γ₃、…、Γ_Mで表され、これらの顔画像ベクトルの平均値を求める。
【０００９】
各個人の顔のベクトルと平均値との差をとると、〔Φ_i＝Γ_i−平均ベクトル〕のＭ個のベクトル群となる。このベクトル群をＡ＝〔Φ₁,────Φ_M〕と定義すると、Ａの共分散マトリックスＣ＝ＡＡ^Tの固有ベクトル及び固有値としてベクトルｕ_k及びスカラー量λ_kを算出することにより、顔の固有空間が求められる。
【００１０】
マトリックスＣはイメージの画素数がＮ×Ｎの場合Ｎ²個の固有ベクトル及び固有値を有する。しかし、顔の背景をも含む全体イメージ空間の次元Ｎ²より顔データの数Ｍが少ない場合（Ｍ＜＜Ｎ²）は、顔の画像認識のためにはＭ×Ｍ次元のマトリックスＬ＝Ａ^TＡの固有ベクトルを計算すればよい。マトリックスＬのＭ個の固有ベクトルν_iから顔の固有空間のベクトルｕ_i＝Ａν_iが求められる。以上の解析によりデータは圧縮され計算回数は大幅に減少する。
【００１１】
入力する顔のイメ−ジ（Γ）は単純な操作により顔空間の成分に変換される。この処理を顔の固有空間に投射するという。
【００１２】
ω_k＝ｕ_k ^T（Γ−Ψ） Ψ；平均ベクトル
この操作はイメージ処理装置により行われる。Ω^T＝〔ω_1,ω₂───ω_M〕は重みで入力イメージのパターンに各顔の固有空間が寄与する程度を表す。このベクトルΩが標準的パターン認識に用いられる。
【００１３】
入力イメージΦ＝Γ−Ψと次式（１）で定義される顔の固有空間Φ_fとの距離は次式（２）で定義され、εが閾値以内であれば入力イメージはΦ_fであると認識する。
【００１４】
【数１】

【００１５】
つまり全体のイメージの光景の内、顔のイメージの配分を最も良く評価するベクトルを決定することにより顔のイメージの部分空間を限定することができる。このためデータ数をかなり低減して顔の特徴を形成する限定された１セットのデータに焦点を合わせることができる。
【００１６】
一度評価ベクトルを決定すれば入力イメージが顔であるか否か分類でき、顔であることが判れば蓄積された既知の個人の顔のパターンと比較することにより、特定の個人の顔が認識が可能となる。
【００１７】
前記米国特許の出願人であるＰｅｎｔｌａｎｄ等は１２８枚の顔の画像を学習画像として主成分分析を行い、主要な２０の固有ベクトルを用いて顔の認識を行うテストを実施し、２００の顔画像に対し９５％の認識率を得ている。
【００１８】
【発明が解決しようとする課題】
このような固有空間法による画像認識方法は、テンプレートマッチングや基準化された相関関係を用いる標準的認識技術より効果的ではある。しかし、高次元のベクトルで表現される画像の場合、想定により画像処理の計算の省略を行う推論技術がないと膨大な計算が必要となり実際的には不可能である。
【００１９】
また、固有空間法のみでは画像情報に関する知識の構造的記述が難しく画像理解への適応に問題があり、現実に或る対象の認識に適用した場合、必ず生じる誤った処理結果を修正する手法も確立されてない。従って、固有空間法を種々の画像認識に適用を拡張するためにはシステム理論が不可欠といえる。
【００２０】
本発明は、光学手段により入力される認識対象の全体的画像の画像情報処理を行う全体画像処理手段と前記全体画像の部分画像である局所画像の画像情報処理を行う局所画像処理手段とで構成し、前記各画像情報処理手段が入力画像の特徴を抽出する機能と整合性を評価する機能とを有し、その整合性の評価に基づいて他の像情報処理手段の処理機能を活性化又は抑制することによって、不確実な画像認識からより確実な画像認識へと展開することにより対象画像の認識を行うに好適なシステムを提供することを目的とする。
【００２１】
更に、本発明は生物の大脳皮質における情報処理機構のモデルである「単純な形状の認識から最終認識画像に至るボトムアップの画像情報の流れと最終認識画像から逆に単純な初期の形状認識に至るトップダウンな画像情報の流れとからなる処理制御法」（Koerner, Tsujino, Masutani, "A Cortical-type Modular Network for Hypothetical Reasoning", Neural Networks, vol. 10, no. 5, pp. 791-814）により、コンピュータの演算負荷を軽減することが可能で、かつ従来のものより短時間に画像認識が可能なシステムを提供する事を目的とする。
【００２２】
更に、本発明は認識対象である全体画像とその対象の部分である特徴的局所の画像とを並行して画像処理を行わせ、自動的に対比しながら不整合な仮説を抑制し、整合する仮説を活性化するシステムを用い、自己学習により認識の確度が改善される高い可能性を有するシステムを提供することを目的とする。
【００２３】
【課題を解決するための手段】
上記目的を達成するため、本発明による画像認識システムは、カメラ等の光学的手段により入力される認識対象の全体的画像と特徴的局所の画像から対象の認識を行う画像認識システムにおいて、全体的画像及び局所画像を解析する全体画像処理手段及び局所画像処理手段を備え、前記局所画像処理手段は局所画像毎に対応する複数の局所モジュールから成り、各局所モジュールは担当する入力局所画像の特徴を抽出する機能と、抽出された特徴と認識画像との整合性を評価する機能とを有し、前記全体画像処理手段は入力される全体画像の全体的特徴を抽出する機能と抽出された全体的特徴と認識画像との整合性を評価する機能とを有し、前記全体画像処理手段は前記局所モジュールからの入力を受けて全体的特徴と不整合な前記局所モジュールの前記機能を抑制し、全体的特徴と整合する前記局所モジュールの前記機能を活性化するよう構成したことを特徴とする。
【００２４】
上記構成によれば、全体画像処理手段による認識対象の全体画像の解析処理と局所画像処理手段による全体画像の特徴的な局所の画像の解析処理とにより画像認識を行うため、複数の狭い画素領域である局所毎の画像情報の解析処理を並行して且つ順次探索的に行うことができ、全体画像と特徴的な局所画像とにより対象の認識を行うシステムであるので、複雑な形状の認識対象であっても従来の如く困難な演算を伴う確率的評価を必要とせず仮説推論が可能なシステムとなり、画像の解析処理に要するコンピュータ等の負荷が軽減でき、対象画像の認識に至る時間を短縮できる。
【００２５】
また、局所画像処理手段を構成する局所モジュールにおいて整合性を評価した画像情報を全体画像処理手段において再評価して、整合する局所モジュールのみを活性化し、不整合な局所モジュールは抑制されるシステムであるので、ボトムアップ処理の流れとトップダウン的処理の流れを有し、これら２つの処理の流れが合致した時点で画像が認識されたと見做すことができるので、更にコンピュータの負荷が軽減でき、認識に要する時間を短縮できる。
【００２６】
更に本発明においては、前記局所画像処理手段を構成する各局所モジュールは、局所画像の位相に対応して担当する局所画像が定められており、同一局所画像の異なる位相に対応する複数の局所モジュールとして構成したので、顔のような全体的には類似形状の認識対象において、位相的に略同位置にある目や鼻等の特徴的な局所画像の解析処理を行うことにより短時間に認識対象の画像認識を行うことができ、局所モジュールの入力を受けて全体画像処理手段が行う整合性の評価も容易に行える。
【００２７】
更に本発明においては、前記局所画像処理手段を構成する各局所モジュールは、局所画像の形状に対応して担当する局所画像が定められており、同一局所画像の異なる位相に対応する複数の局所モジュールとして構成することができるので、複雑な形状の認識対象の最も特徴を表す局所の形状に対応して局所モジュールを設定して当該局所の画像情報に的を絞った解析処理を行い全体画像との整合性を評価することにより認識対象の画像認識ができる。
【００２８】
更に本発明においては、前記局所画像処理手段を構成する各局所モジュールは、入力局所画像の特徴を抽出する機能を有する第１副モジュールと、予め学習した認識対象の当該局所画像の固有空間の知識を有する第２副モジュールと、前記第１副モジュールが抽出した特徴を有する入力局所画像の画像情報を第２副モジュールの固有空間へ投射した結果の射影距離に基づき整合性を評価する機能を有する第３副モジュールとから構成する。
【００２９】
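With this configuration, the image processing function of each local module in the local image processing means is divided into three submodules that evaluate consistency in stages. The initial image information in local image processing can therefore be a feature as simple as a shape, a contour, or a blob, and increasingly precise consistency evaluations can follow, so the image processing load needed to reach recognition is reduced according to how distinctive the features of the local image are.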
【００３０】
更に上記構成によれば、各副モジュール単位で機能の抑制及び活性化が可能となり、より早期に無駄な画像処理が抑制され、順次探索的な画像処理がより合理的に行える画像認識システムとなる。
【００３１】
更に、本発明においては、前記第１副モジュールは、予め学習した認識対象の当該局所画像の平均ベクトルの知識を有するので、局所画像処理手段を構成する各局所モジュールの副モジュールの機能は固有空間法に基づく画像処理により実施することとなり、圧縮されたデータにより画像解析処理が可能となってよりコンピュータの負荷軽減が図れる。
【００３２】
更に本発明においては、前記全体画像処理手段は、予め学習した認識対象の全体画像の平均ベクトルの知識を有する第１副モジュールと、予め学習した認識対象の全体画像の固有空間の知識を有する第２副モジュールと、前記第１副モジュールが抽出した特徴を有する入力全体画像の画像情報を第２副モジュールの固有空間へ投射した結果の射影距離に基づき整合性を評価する機能を有する第３副モジュールとから構成する。
【００３３】
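With this configuration, the image processing function of the module constituting the whole-image processing means is divided into three submodules that evaluate consistency in stages. Starting from an initial, hypothetical result of whole-image processing, increasingly precise consistency evaluations can be performed, so the image processing load needed to reach recognition is reduced according to how distinctive the features of the whole image are. In addition, functions can be suppressed and activated per submodule, so wasted image processing is suppressed earlier and sequential, exploratory image processing proceeds more rationally.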
【００３４】
更に本発明においては、前記局所画像処理手段において、同一局所画像の異なる位相に対応する複数の局所モジュール間においては、第１副モジュールが活性化した局所モジュールは異なる位相の解析処理を行っている他の局所モジュールに活性化する信号を出力するようにしたので、最初に整合性が評価された局所モジュールの仮説が早急に他局所モジュールにより確認されるように処理が行われ局所画像の早期の画像認識が可能となる。
【００３５】
更に本発明においては、前記局所画像処理手段において、異なる局所画像を担当する複数の局所モジュールが近似した位相領域からの画像情報を入力している場合において、第１副モジュールが活性化した局所モジュールは他の局所モジュールを抑制する信号を出力するようにしたので、同一入力局所画像情報に基づいて２つ以上の仮説が発生することが防止できるため、より効率的に画像処理が行える。
【００３６】
【発明の実施の形態】
次に、添付の図面を参照しながら本発明の画像認識システムについてさらに詳しく説明する。
【００３７】
図１は、本発明によるシステム構成の全体的説明図である。
【００３８】
図２は、特に局所モジュールを構成する副モジュールが入出力する制御信号を示し、さらに上位モジュール（ＡＩＴ）及び下位モジュール（入力制御手段）との関連を説明するための図である。
【００３９】
図３は、局所モジュールにおける全体の画像処理のフローを示す。
【００４０】
図４は、システム全体の画像解析処理のフローを示す。
【００４１】
図５は、本発明による画像認識システムにより人の顔を認識画像の対象として行ったシミュレーションの結果を示す図表である。
【００４２】
図１において、認識対象の画像を含む視覚イメージ情報はビデオカメラのような適当な光学的手段（１０）により入力され、公知の技術によりピクセルマップのフレームにデジタルデータとして保持される。カメラ（１０）にはカメラ制御手段（１２）が付設されて人間の目の機能に類似した機能をカメラ（１０）に与えるよう準備され、イメージの或る部分に焦点を合わせたり、所定の特定部分のみに個別に向けるようカメラ（１０）を操作することができる。
【００４３】
全体的なイメージ情報から認識対象の画像を選択的に限定するために入力イメージ情報は前注意処理手段（１３）により処理される。前注意処理手段（１３）には低透過度フィルタやカラーフィルタによるフィルタ処理や、場合によっては全体的イメージ中に動きのある対象物を特定するためのモーション検知手段や、認識対象の画像のスケールを正規化するための処理手段を備えていてもよい。
【００４４】
又、前注意処理手段（１３）にはカメラ入力時の照明の変動を補償するコントラスト正規化処理や、使用するＣＣＤカメラのリニアレスポンス特性を補償する正規化処理を行う手段等を含むこともある。
【００４５】
本発明の全体構成には、図には示されていないが、グラフィック又はイメージ処理ボード及び使用者がシステムに介入するためのキーボードやマウスを備える。これらは当技術分野では良く知られた標準機器である。前処理をされデジタル化された入力イメージ情報はメモリ機能及び主成分分析機能を有する前部モジュール（１１）のメモリに書き込まれる。
【００４６】
前部モジュール（１１）のメモリに入力されるイメージ情報は認識対象の画像情報に前処理手段により限定されても良いし、対象の認識に背景が重要な情報となる場合は全体のイメージ情報であっても良い。前部モジュール（１１）のメモリに入力された画像情報は操作指令により解析システムに送出されるが、解析システムは図１に概念的に示すように以下に説明する２つの副システムから構成される。
【００４７】
第１の副システムは局所画像処理手段（１５）（以下ＰＩＴと略称する）である。第２の副システムは全体画像処理手段（１４）（以下ＡＩＴと略称する）である。
【００４８】
副システムＰＩＴ（１５）は認識対象の画像の一部を構成する局所の画像の解析処理を担当するシステムで、全体画像の認識に有用な特徴を有する局部の画像に対応して解析処理を行うモジュールを備える。この解析処理モジュールを局所モジュール（２１）と名付ける。
【００４９】
この局所モジュール（２１）は全体画像に対する局所の位相に対応して設けられ、前記前部モジュール（１１）のメモリから位相に応じた局所画像情報がそれぞれ対応する局所モジュール（２１）に送出される。
【００５０】
局所モジュール（２１）に入力する画像情報は前部モジュール（１１）においてＰＣＡ処理等により圧縮された情報であってよい。入力された局所画像情報は、対応する位相の局所モジュール（２１）により並行的に特徴を抽出するための解析処理がなされる。抽出した入力局所画像の特徴は、局所モジュール（２１）が知識として有する予め学習された認識対象の画像の固有空間に投射されて整合性が評価される。整合する局所画像情報は副システムＡＩＴ（１４）に入力されて、副システムＡＩＴ（１４）が解析処理する全体画像との整合性が評価されて認識対象の画像認識がなされる。
【００５１】
副システムＡＩＴ（１４）は、前部モジュール（１１）より全体画像情報を入力し認識対象の全体画像の特徴を抽出する。抽出された特徴は、副システムＡＩＴ（１４）が知識として有している予め学習した認識対象の固有空間に投射されて、整合性を評価する解析処理が行われ、更に副システムＰＩＴ（１５）の局所モジュール（２１）から局所画像の情報を入力し、全体画像との整合性を評価して認識対象の画像認識が行われる。
【００５２】
例えば、認識対象が人間の顔や自動車であれば、局所とは顔や自動車の特徴を表す顔や自動車の部分的な局所、例えば目、鼻、口又は顔の輪郭等であり、自動車の場合は例えばフロントグリル、ホイール形状等である。
【００５３】
本発明においては、以上のシステム構成により、画像処理の初期における抽出された単純な輪郭形状やブロッブの特徴に基づく整合性を仮説と見做し、かかる仮説に基づいて局所モジュール（２１）間でその機能を活性化または抑制する通信を行いながらより高精度の整合性の評価を順次行い、更に全体画像の画像処理を行う副システムＡＩＴ（１４）との間での通信による活性化、抑制を通じて全体的に整合性のある解釈をアナログ的に徐々に形成していくマルチエージェント計算による仮説推論により画像認識を行う。
【００５４】
副システムＰＩＴ（１５）は認識対象画像の局所に対応する複数の局所モジュール（２１）の配列からなり、各局所モジュール（２１）は担当する局所画像領域の位置を主要な測度として入力位相と関連して配置される。一つの局所画像に対応する局所モジュール（２１）は同一局所画像の異なる位相を担当する複数の局所モジュールより構成される。副システムＡＩＴ（１４）は全体画像の処理モジュールで構成されるが、画像の全体的特徴を扱うため、副システムＰＩＴ（１５）のように入力位相との関連はない。
【００５５】
図２に基づき副システムＰＩＴ（１５）の各局所モジュール（２１）の構成とその機能について説明する。説明をより簡便にするため次のように定義する。各局所モジュール（２１）は３つの副モジュールＲ０，Ｒ１，Ｒ２から構成される。副モジュールＲ０，Ｒ１，Ｒ２は計算モジュールとしては同質であるが内部知識は異なる。即ち、画像情報の処理ルートは下記の如きルートとなる。
【００５６】
カメラ→前部モジュール→Ｒ０→Ｒ１→Ｒ２→ＡＩＴ
【００５７】
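Each module is an image processing agent that interprets its input data and acts on its own, and is referred to below as an agent. Taking one agent as the reference, an agent closer to the input is called its lower agent, and an agent closer to the AIT than the reference agent is called its upper agent.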
【００５８】
基準のエージェントと同じ位置にある他局所モジュールのエージェントは同位エージェントと呼称する。即ち、或る局所モジュール（２１）にとって、前部モジュール（１１）は下位エージェントであり、副システムＡＩＴ（１４）を構成するモジュールは上位エージェントであり、副システムＰＩＴ（１５）の他局所モジュールは同位エージェントとなる。又、局所モジュール（２１）の担当する局所画像と位相が近似している局所画像を担当する局所モジュール（２１）を近傍エージェントと呼称する。
【００５９】
さらに、本発明における画像情報処理は、前記矢印の方向の下位エージェントから上位エージェントへのボトムアップ処理と矢印と反対の流れ即ち上位エージェントから下位エージェントを制御するトップダウン処理により行われる。
【００６０】
図２において、副モジュールＲ０（２２）は担当する局所画像について予め学習された画像の平均ベクトルの知識Ψを持つ。本発明のシミュレーションを行った人間の顔の場合、鼻の画像を担当する局所モジュール（２１）の副モジュールＲ０（２２）の平均ベクトルΨは、鼻を中心としたＭ個の画像からＮ次元の局所データＭ個を収集し、各データを正規Ｎ次元ベクトルΓ_i（１≦ｉ≦Ｍ）とし、次式（３）により平均ベクトルΨを求める。
【００６１】
【数２】

【００６２】
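Submodule R0 (22) is a "hypothesis control agent" whose activity represents the strength of a hypothesis. For newly input, normalized image information Γ_NEW, its initial value is the inner product Γ_NEW · Ψ, which is regarded as the initial hypothesis. The value of submodule R0 (22) is then controlled, as described later, by inputs from neighboring or upper agents, and develops toward a globally consistent image recognition.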
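The average vector Ψ and the initial R0 value Γ_NEW · Ψ can be illustrated with a small sketch. It assumes that equation (3), whose image is not reproduced in this text, is the ordinary component-wise mean of the M training vectors, and that both vectors are scaled to unit length before the inner product is taken. The function names and the toy two-dimensional data are hypothetical and are not taken from the patent.

```python
import math

def normalize(v):
    """Scale v to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_vector(samples):
    """Average M equal-length vectors component by component."""
    m = len(samples)
    return [sum(column) / m for column in zip(*samples)]

def initial_activation(gamma_new, psi):
    """Inner product of the unit-length input and the unit-length mean.

    Normalizing psi as well is an assumption made for this sketch.
    """
    a, b = normalize(gamma_new), normalize(psi)
    return sum(x * y for x, y in zip(a, b))

# Toy training set of two normalized 2-dimensional "local images".
training = [normalize(v) for v in ([1.0, 0.0], [0.0, 1.0])]
psi = mean_vector(training)
activation = initial_activation([2.0, 2.0], psi)
```

Under these assumptions the activation behaves like a cosine similarity: an input pointing in the same direction as Ψ gives a value close to 1, which matches the role of R0 as a coarse, template-like hypothesis generator.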
【００６３】
副モジュールＲ１（２３）は担当する局所画像について予め学習された画像の固有空間の知識Ｕを有する。固有空間Ｕは前記局所画像データのＮ次元ベクトルΓ_iから平均ベクトルΨを差し引いた Φ_i＝Γ_i− Ψ で求められるベクトルΦの分布の最小二乗平均誤差の意味で最も良い記述として、次式（４）で示す正規直交ベクトルｕ_kをｋ＝１から順次ｋ＝Ｍ′（Ｍ′≦ Ｍ）まで求める。
【００６４】
【数３】

【００６５】
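The k-th vector u_k is chosen so that expression (4-1) is maximized under the condition of expression (4-2). The vectors u_k and scalars λ_k are the eigenvectors and eigenvalues of the covariance matrix C given by equation (5) below, and the space spanned by the vectors u_k is called the eigenspace U of this local image.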
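The equation images for (4-1), (4-2), and (5) are not reproduced in this text. As a hedged reconstruction, the standard principal component formulation that matches the surrounding description (a quantity maximized under an orthonormality condition, and a covariance matrix built from the vectors Φ_i) would read as follows. The 1/M scaling and the Kronecker delta notation are assumptions.

```latex
\begin{aligned}
\lambda_k &= \frac{1}{M}\sum_{i=1}^{M}\left(u_k^{T}\Phi_i\right)^{2} && \text{(4-1)}\\
u_l^{T}u_k &= \delta_{lk} && \text{(4-2)}\\
C &= \frac{1}{M}\sum_{i=1}^{M}\Phi_i\Phi_i^{T} && \text{(5)}
\end{aligned}
```

With these definitions, maximizing (4-1) under (4-2) for k = 1, 2, ..., M′ selects the eigenvectors of C in order of decreasing eigenvalue.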
【００６６】
【数４】

【００６７】
副モジュールＲ０（２２）で抽出された入力画像Γ_NEWの特徴を表現するベクトル（Γ_NEW−Ψ）は副モジュールＲ１（２３）が有する学習局所画像の個々の特徴を表現する固有空間Ｕに投射され、入力ベクトルの固有空間への射影距離（ＤｉｓｔａｎｃｅＦｒｏｍＦｅａｔｕｒｅＳｐａｃｅ；以下ＤＦＦＳという）を算出する。
【００６８】
副モジュールＲ１（２３）に入力される局所画像情報は、副モジュールＲ０（２２）における特徴の評価により作動するスイッチ（ＳＷ２）を介して前部モジュール（１１）のメモリからより精度の高い情報を入力してもよい。また、副モジュールＲ１（２３）は上位エージェントからの信号により、固有空間Ｕのベクトルを入力局所画像のベクトル空間へ投射した逆写像から下位エージェントを制御するトップダウン処理を行う。
【００６９】
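Submodule R2 (24) evaluates the consistency between the input local image and the eigenspace onto which it is projected, using the projection distance of the input vector to the eigenspace (DFFS) as its main information. The evaluation uses the DFFS of the input local image vector Γ_new to the eigenspace U, which is obtained from the inner products of the feature vector (Γ_new − Ψ) with the eigenvectors u_k.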
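A minimal sketch of a DFFS computation follows. It assumes the eigenvectors u_k are orthonormal, so that the squared residual equals the squared length of (Γ_new − Ψ) minus the sum of the squared inner products. The patent's own evaluation formula, equation (8), is not reproduced in this text, so this is the usual definition of distance from feature space rather than a confirmed copy of it. The function name and the toy vectors are hypothetical.

```python
import math

def dffs(gamma_new, psi, eigenvectors):
    """Distance from feature space for orthonormal eigenvectors u_k."""
    # Feature vector: input minus the learned average vector.
    phi = [g - p for g, p in zip(gamma_new, psi)]
    # Inner products of the feature vector with each eigenvector.
    weights = [sum(u_j * x for u_j, x in zip(u, phi)) for u in eigenvectors]
    # Energy of phi that the eigenspace cannot explain.
    residual_sq = sum(x * x for x in phi) - sum(w * w for w in weights)
    # Clamp tiny negative values caused by rounding.
    return math.sqrt(max(residual_sq, 0.0))

# Toy 3-dimensional example with a 2-dimensional eigenspace.
psi = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
distance = dffs([3.0, 0.0, 5.0], psi, basis)
```

In this toy case the feature vector is (2, −1, 4); the eigenspace explains the first two components, so the distance is the remaining component, 4.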
【００７０】
また、副モジュールＲ２（２４）は上位エージェントである副システムＡＩＴ（１４）からのトップダウン制御により、上述したボトムアップ処理による入力局所画像の整合性の評価に関して活性化、抑制の制御を受ける。
【００７１】
副システムＡＩＴ（１４）は全体画像の解析処理を行うモジュールからなる。このモジュールは上記した局所モジュール（２１）の構成と同一構成とすることが出来る。即ち、副モジュールＲ０，Ｒ１，Ｒ２で構成し、各副モジュールの所有する知識も機能も対象画像が局所画像の代わりに全体画像となる以外は全く前記の説明と同じ知識と機能を有する。
【００７２】
図２に、局所モジュール（２１）を構成する副モジュール（２２）（２３）（２４）の制御作用を矢印で示している。以下制御作用について説明する。制御作用には主として局所モジュール（２１）相互の間の制御作用として４種類、さらに、主として同一局所モジュール（２１）内の副モジュール（２２）（２３）（２４）間の制御作用として２種類の制御作用がある。
【００７３】
まず局所モジュール（２１）相互の間又は局所モジュール（２１）と上位または下位エージェントとの間の制御作用について説明する。
▲１▼活性型局所制御Ａ型（矢印Ａで示す。）
副システムＰＩＴ（１５）において、同一局所画像の位相の異なる入力画像を担当している複数の局所モジュール（２１）のうちの一つの局所モジュール（２１）の副モジュール（２２）が活性した場合、同一局所画像を担当している他の局所モジュールの副モジュールＲ０（２２）に活性化信号を出力して、当該局所画像を担当するすべての局所モジュール（２１）の副モジュールＲ０を活性化して、入力局所画像の画像処理を促進する。
【００７４】
例えば、鼻の画像処理を担当している複数の局所モジュール（２１）の一つが副モジュールＲ０（２２）において抽出した入力画像の特徴が整合性ありと評価されて活性化した場合、鼻の画像処理を担当しているすべての局所モジュール（２１）の副モジュールＲ０（２２）を活性化することにより鼻の画像処理を促進するのである。特に近傍エージェントを構成する局所モジュール同士でこの制御を行う事により位相が近似している局所画像の特徴の推定が短時間で可能となる。
【００７５】
▲２▼抑制型局所制御Ｂ型（矢印Ｂで示す。）
異なる局所画像を担当している複数の局所モジュール（２１）が、近似した位相領域からの画像情報を受けている場合は、ある特定の局所モジュール（２１）の副モジュールＲ０（２２）が活性化した場合には他の局所画像を担当している局所モジュール（２１）の副モジュールＲ０（２２）に抑制信号を与える。これにより近似した位相領域の画像情報に対して複数の仮説が競合することを防止する事が出来る。
【００７６】
▲３▼トップダウン制御Ｃ型（矢印Ｃで示す。）
トップダウン制御とは活性化されて作動しているエージェントが自ら入力を受けている下位エージェントに対して行う制御である。トップダウン制御には活性化制御と抑制制御の２種類がある。局所モジュール（２１）において整合性を評価された局所画像情報は、副システムＡＩＴ（１４）に入力されて、副システムＡＩＴ（１４）が入力画像を解析処理して整合性を評価した全体画像の当該局所画像と整合するか否かが評価される。非整合と判定された場合、当該局所モジュール（２１）は抑制信号により作動は停止される。
【００７７】
局所モジュール（２１）の副モジュールＲ０（２２）において非整合と判定された場合は下位エージェントである前部モジュールに抑制信号が出力され、当該局所モジュール（２１）への入力画像情報が制御される。また、副システムＡＩＴ（１４）の全体画像処理により整合性が評価された結果に基づいて、当該全体画像の局所画像を担当する不活性状態にある局所モジュール（２１）に活性化信号が出力されて、当該局所モジュール（２１）の作動が促される。
【００７８】
▲４▼ボトムアップ制御Ｄ型（矢印Ｄで示す。）
局所モジュール（２１）の副モジュールＲ２（２４）における射影距離（ＤＦＦＳ）を主情報とする評価が閾値以下になると、スイッチ（ＳＷ１）が閉じ副モジュールＲ１（２３）の固有空間の情報が整合した局所画像情報として、副システムＡＩＴ（１４）に出力される。副モジュールＲ２（２４）の評価が閾値以下と判定される場合は、局所モジュール（２１）におけるボトムアップ的な画像情報処理により各局所モジュール（２１）の持つ知識で充分説明できた場合と、副システムＡＩＴ（１４）からのトップダウン制御（Ｃ型）により活性化された局所モジュール（２１）の画像情報処理により整合性が認められる場合とがある。
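The lateral controls of type A and type B described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the `LocalModule` class, the `is_near` neighborhood test, and the additive `gain` are hypothetical, since the text does not say how strongly a signal changes the R0 activity or how "approximate topological regions" are measured.

```python
from dataclasses import dataclass

@dataclass
class LocalModule:
    part: str         # local image handled by the module, e.g. "nose"
    position: tuple   # (row, col) of the image region it receives
    r0: float = 0.0   # activity of submodule R0 (hypothesis strength)

def is_near(a, b, radius=1):
    """Assumed neighborhood test: regions within `radius` cells of each other."""
    return (abs(a.position[0] - b.position[0]) <= radius
            and abs(a.position[1] - b.position[1]) <= radius)

def lateral_control(winner, modules, gain=0.5):
    """Apply type A and type B signals sent by an activated module."""
    for m in modules:
        if m is winner:
            continue
        if m.part == winner.part:
            # Type A: activate every module handling the same local image.
            m.r0 += gain
        elif is_near(m, winner):
            # Type B: inhibit modules for other parts fed by a nearby region.
            m.r0 = max(0.0, m.r0 - gain)

modules = [
    LocalModule("nose", (5, 5), 1.0),
    LocalModule("nose", (5, 6), 0.2),
    LocalModule("mouth", (5, 5), 0.4),
    LocalModule("mouth", (9, 5), 0.4),
]
lateral_control(modules[0], modules)
```

After the call, the second nose module is boosted, the mouth module watching the same region is suppressed to zero, and the distant mouth module is left unchanged, so only one hypothesis survives for that region.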
【００７９】
次に局所モジュール（２１）を構成する副モジュールＲ０，Ｒ１，Ｒ２間の２種類の制御作用について述べる。
▲１▼局所モジュール内のトップダウン制御Ｅ型（矢印Ｅで示す。）
局所モジュール（２１）を構成する副モジュールＲ２（２４）での評価が閾値以下と判定されると、副モジュールＲ２（２４）は同局所モジュール（２１）の副モジュールＲ０（２２）を抑制する。副モジュールＲ２（２４）の評価値が閾値以下になる場合、上記ボトムアップ制御（Ｄ型）により副モジュールＲ１（２３）の情報はスイッチ（ＳＷ１）により副システムＡＩＴ（１４）に出力され、それが整合的であれば画像の認識は完了する。
【００８０】
副モジュールＲ０（２２）を活性化し続けた場合、前記抑制型局所制御（Ｂ型）により近傍の異なる局所モジュール（２１）の副モジュールＲ０（２２）は抑制されたままである。これは局所画像の認識がまだ不確定な状態の場合の順次探索の選択肢を狭めることになる。既に認識を完了した局所モジュール（２１）の副モジュールＲ０（２２）を抑制することにより近傍エージェントを構成する他の局所モジュール（２１）を活性化することが可能となる。
【００８１】
▲２▼局所モジュール内の副モジュール入力制御Ｆ型（矢印Ｆで示す。）
局所モジュール（２１）において、副モジュールＲ０（２２）の値が閾値以下にならない限りスイッチ（ＳＷ２）が閉じる事がなく副モジュールＲ１（２３）に画像情報は入力しない。副モジュールＲ０（２２）による抽出特徴のレベルが閾値により評価されるとスイッチ（ＳＷ２）が閉じ、前部モジュール（１１）から副モジュールＲ１（２３）の固有空間に投射される画像情報が入力される。
【００８２】
このような制御により無駄な計算を避けてコンピュータの負荷を軽減すると共に、副モジュールＲ１（２３）で投射される画像情報の質を向上し上位エージェントに与える雑音情報を減少させることができる。同様に、副モジュールＲ２（２４）も値が閾値以下になり整合性が評価されるまでは副モジュールＲ１（２３）の画像情報を副システムＡＩＴ（１４）に発信しないようにスイッチ（ＳＷ１）を設けて制御している。
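The gating by switches SW2 and SW1 can be pictured as two early exits in a local module's processing. The sketch below assumes that the R0 value is the Euclidean distance between the input and the average vector (the R0 function, equation (6), is not reproduced in this text), that the eigenvectors are orthonormal, and that the "eigenspace information" sent to the AIT is the list of projection weights. The function name and thresholds are hypothetical.

```python
def run_local_module(gamma, psi, eigenvectors, r0_threshold, r2_threshold):
    """Return the weights passed to the AIT, or None if a switch stays open."""
    phi = [g - p for g, p in zip(gamma, psi)]

    # R0: coarse template match against the average vector.
    r0_value = sum(x * x for x in phi) ** 0.5
    if r0_value > r0_threshold:
        return None  # SW2 stays open: R1 never receives the image.

    # R1: projection onto the learned eigenspace.
    weights = [sum(u_j * x for u_j, x in zip(u, phi)) for u in eigenvectors]

    # R2: consistency check using the distance from feature space.
    residual_sq = sum(x * x for x in phi) - sum(w * w for w in weights)
    dffs = max(residual_sq, 0.0) ** 0.5
    if dffs > r2_threshold:
        return None  # SW1 stays open: nothing is sent to the AIT.

    return weights

psi = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sent = run_local_module([3.0, 0.0, 5.0], psi, basis, 5.0, 4.5)
blocked_at_r0 = run_local_module([3.0, 0.0, 5.0], psi, basis, 1.0, 4.5)
blocked_at_r2 = run_local_module([3.0, 0.0, 5.0], psi, basis, 5.0, 1.0)
```

The point of the design is visible in the control flow: the projection in R1 is only computed for inputs that pass R0, and the AIT only receives results that pass R2.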
【００８３】
図３に示す副システムＰＩＴ（１５）における画像処理の全体的フローについて説明する。以下の説明は実際に本発明のシミュレーションを行った人間の顔の認識の場合を例として説明するが、それは顔が幾つかの副記号による部品を持った対象画像であり、構造的記号表現に適しているからであって、本発明の適用が顔の認識に限定されるものではなく、局所の画像から全体画像を再構築する事により入力の全体画像を認識する画像認識のすべてに適用可能な技術である。
【００８４】
尚、本システムのシミュレーションはＳｕｎワークステーションＳＳ１０上にＧＮＵのＣ＋＋言語を用いて実施された。
【００８５】
ステップ１０１において、画像情報のデータが入力される。データはカメラにより入力される全体画像から任意に設定される注意の窓によりスキャンされる画素領域の画像情報データで、ステップ１０２で正規化モジュールにより、カメラ入力の際の照明の変化がもたらす入力イメージ画像データの変化や、又はＣＣＤカメラのリニアレスポンス特性がもたらす入力画像データの変化を補償する正規化処理、画像スケールの正規化処理等の必要な前処理が行われた後、前部モジュールのメモリに格納される。
【００８６】
メモリに格納された画像情報は注意の窓のスキャン位置情報に基づき、入力位相と対応する局所モジュール（２１）の副モジュールＲ０（２２）にそれぞれ入力される。入力画像情報は前部モジュール（１１）により適切なＰＣＡを行われてもよい。
【００８７】
副モジュールＲ０（２２）はすべての局所画像について学習により得られた多数の画像データから、画素位置に対応する局所ごとの平均ベクトルにより定義される。シミュレーションにおいては、左右の目、鼻、口、１２種類に区分された顔の輪郭について、主成分の２００次元ベクトルの平均ベクトルを用いた。ステップ１０３において、各局所モジュール（２１）の副モジュールＲ０（２２）は入力局所画像データについて平均ベクトルをテンプレートとしてパターンマッチングを行い距離に応じたアナログ出力をする。
【００８８】
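Submodule R0 (22) can scan the neighborhood (a 3 × 3 pixel region) of the input local image and select the most active input local image. However, pattern detection using such an average vector as a template is weak and ambiguous, so submodule R0 (22) can be regarded as the hypothesis generator for the local image.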
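The 3 × 3 neighborhood scan can be sketched as a template match over nine offsets around a nominal position. The sketch assumes that "most active" means the smallest Euclidean distance to the average-vector template, and it represents images as nested lists; the function names and the toy image are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a height x width patch whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def distance(patch, template):
    """Euclidean distance between two equally sized 2-D patches."""
    return sum((p - t) ** 2
               for patch_row, template_row in zip(patch, template)
               for p, t in zip(patch_row, template_row)) ** 0.5

def best_match(image, template, top, left):
    """Scan the 3 x 3 offsets around (top, left); return (distance, row, col)."""
    height, width = len(template), len(template[0])
    best = None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            y, x = top + dy, left + dx
            # Skip offsets where the patch would leave the image.
            if y < 0 or x < 0 or y + height > len(image) or x + width > len(image[0]):
                continue
            d = distance(crop(image, y, x, height, width), template)
            if best is None or d < best[0]:
                best = (d, y, x)
    return best

image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
]
template = [[1, 2], [3, 4]]
match = best_match(image, template, 1, 1)
```

Here the nominal position (1, 1) is one row above the true location, and the scan recovers the exact match at (2, 1).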
【００８９】
副モジュールＲ０（２２）の所有する知識を平均ベクトルとすることは、入力画像情報の特徴抽出には好都合であり、抽出された特徴を副モジュールＲ１（２３）に出力して固有空間に投射することができる。しかし副モジュールＲ０を上記の如く仮説生成部と見做し、副モジュールＲ１（２３）に出力する画像情報を副モジュールＲ０（２２）の活性度に応じて前部モジュール（１１）のメモリから入力する場合は、より数の少ない主成分ベクトルによる簡単なテンプレートを副モジュールＲ０（２２）の知識とすることもできる。
【００９０】
ステップ（１０４）において、上記マッチングの結果が整合評価値として特定される。本実施例においては、副モジュールＲ０（２２）における整合性の評価を副モジュールＲ２（２４）で行うようにしている。即ち副モジュールＲ２（２４）を局所モジュール（２１）の評価モジュールとして構成している。ステップ１０４では前述した副システムＡＩＴ（１４）からの活性化入力（Ｃ型）や他局所モジュール（２１）からの抑制信号入力（Ｂ型）、活性化信号入力（Ａ型）及び自局所モジュール（２１）の副モジュールＲ２（２４）からの抑制信号入力による制御を受ける。抑制信号の入力がなく活性が維持される場合は、Ｒ０関数で評価されてそれぞれの評価結果に基づいて他エージェントへ信号出力が行われる。Ｒ０関数は次式（６）で表現される。
【００９１】
【数５】

【００９２】
副モジュールＲ０（２２）においては、前記制御作用の項で説明したように、他エージェントの副エージェントＲ０から活性化型局所制御Ａ型（矢印Ａ）又は抑制型局所制御Ｂ型（矢印Ｂ）の制御信号により活性化又は抑制の制御を受け、更に副モジュールＲ２（２４）及び副システムＡＩＴ（１４）からのトップダウン制御を受ける。副モジュールＲ２（２４）からのトップダウン制御（矢印Ｃ）は抑制性のもので、副モジュールＲ０（２２）の信号処理レベルに対し上位エージェントで既に解釈された仮説についての処理を停止させる。
【００９３】
副システムＡＩＴ（１４）のトップダウン制御（矢印Ｃ）は活性化と抑制の双方があり、局所モジュール（２１）が画像処理を行い整合性を評価した局所画像情報を副システムＡＩＴ（１４）の全体画像へ投射して再評価される結果により局所モジュール（２１）の機能をトップダウン的に制御する。ステップ１０５において、テンプレートｋと入力画像データのベクトル間の距離に基づく値（ｆ）が閾値以下であると、同一局所モジュール（２１）の副モジュールＲ１（２３）に局所画像情報が前部モジュール（１１）のメモリから送出される。更に、ステップ１０４における評価に基づいて他局所モジュールの副モジュールＲ０へＡ型、Ｂ型の制御信号が送出される。
【００９４】
副モジュールＲ１（２３）は学習された各局所画像の個々の画像情報による固有空間の知識を有する。本シミュレーションにおいては学習された画像情報の２００次元のベクトルに対して大きい値の上位２０主成分のベクトルを用いて局所画像の個々の固有空間を生成した。
【００９５】
ステップ１０６において、局所モジュール（２１）の副モジュールＲ０（２２）が活性化すると、副モジュールＲ１（２３）が当該局所の入力画像情報を前部モジュール（１１）のメモリから入力される。入力する局所画像情報Γ_NEWは平均ベクトルを減じた特徴を表すベクトルであり、当該ベクトルの上位２０の主成分ベクトルが副モジュールＲ１（２３）の有する個々の固有空間に投射される。即ち次式（７）で表現される。
【００９６】
【数６】

【００９７】
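The projection distance (DFFS) between the input local image vector in submodule R1 (23) and the eigenspace that submodule R1 (23) holds as knowledge is used to evaluate image recognition, and this evaluation is performed by submodule R2 (24) in step 107.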
【００９８】
ステップ１０７の評価を行う副モジュールＲ２（２４）は、上位エージェントである副システムＡＩＴ（１４）における全体画像の画像処理結果に基づいて、優先的に作動させる局所モジュール（２１）を活性化させるトップダウン制御（矢印Ｃ）を副システムＡＩＴ（１４）から受け、また他局所モジュール（２１）において入力画像の整合性が評価された場合には、当該入力画像の画像処理を行っている局所モジュール（２１）に抑制信号を出力する。これらの制御信号により副モジュールＲ２（２４）における不要の処理が抑制され、システム全体の画像認識処理の時間が短縮される。
【００９９】
副モジュールＲ２（２４）における評価は次式（８）で表される。
【０１００】
【数７】

【０１０１】
ステップ１０７において、評価値が閾値以下であれば、副モジュールＲ１（２３）の局所画像情報を上位エージェントである副システムＡＩＴ（１４）に送出する（矢印Ｄ）。更に、副システムＡＩＴ（１４）からのＣ型トップダウン制御を受けて活性化された局所モジュールの副モジュールＲ２（２４）は副モジュールＲ１（２３）の固有空間を入力画像ベクトルに投射した逆写像をトップダウンの仮説として、副モジュールＲ０（矢印Ｒ０）に出力する。トップダウン仮説と整合的な局所モジュールは支持され、非整合的な局所モジュールは抑制され、整合的であるのに作動していない局所モジュールは活性化を促される。
【０１０２】
副モジュールＲ２（２４）において射影距離（ＤＦＦＳ）を主情報とする整合性の評価が閾値以下となり入力画像が認識された場合、自局所モジュール（２１）の副モジュールＲ０（２２）へ抑制信号（矢印Ｅ）が転送される。
【０１０３】
副システムＡＩＴ（１４）は全体画像の画像情報を解析処理するモジュールで構成され、前部モジュール（１１）から全体画像情報を、副システムＰＩＴ（１５）から各局所モジュール（２１）により処理された局所画像情報を入力し、各局所画像が整合する全体画像の認識を行う。
【０１０４】
副システムＡＩＴ（１４）のモジュールは、局所モジュール（２１）と同様の副モジュールＲ０，Ｒ１，Ｒ２で構成される。その機能も局所モジュールの該当する副モジュールの機能と同一の機能とした。即ち、副モジュールＲ０は学習した全体画像の平均ベクトルの知識を有して入力全体画像のテンプレートの機能を有している。副モジュールＲ１は学習全体画像の個々の固有空間の知識を有して副モジュールＲ０に於ける入力全体画像のテンプレートによる評価に基づき全体画像情報を固有空間に投射する機能を有している。副モジュールＲ２は射影距離（ＤＦＦＳ）の閾値の知識を有し副モジュールＲ１の表現値を用いＤＦＦＳにより整合性の評価を行う。
【０１０５】
本シミュレーションにおいては、副モジュールＲ０の平均ベクトルは学習した１０５人の顔画像の中から３５人の顔の全体画像（１２８×１２８画素位置×２００主成分ベクトル）を適当に中心を合わせて選択した画像データ（１０５×１０５画素位置×３主成分ベクトル）の平均により定義した。
【０１０６】
１０５×１０５画素位置は図４に示す注意の窓（２０３）の画素領域で、実際のカメラからの入力画像からは顔の位置は特定できないので、副システムＡＩＴ（１４）の指令によりカメラを走査し、副モジュールＲ０（２０４）におけるテンプレートとのマッチングにより注意の窓の位置が定められる。
【０１０７】
副システムＡＩＴ（１４）で定義される注意の窓のサイズは１０５×１０５画素であるが、副システムＰＩＴ（１５）からの副システムＡＩＴ（１４）への入力は３５×３５画素位置×１６局所となる。本シミュレーションにおいては使用したコンピュータの計算能力を考慮し１２×１２画素位置×１６局所×２０成分ベクトルとした。
【０１０８】
図４に示す本発明の全体のフローチャートについて、顔のシミュレーションの例に基づいて説明する。
【０１０９】
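The image input from the camera undergoes the normalization processes described above and is then stored in the memory of the front module (11) as the whole face image (202). This simulation used 256-level grayscale images of 128 × 128 pixels.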
【０１１０】
全体画像はデータバス（３００）を介して副システムＡＩＴ（１４）に入力される（２０４）。顔の全体画像（３主成分ベクトル）は副モジュールＲ０のテンプレートマッチングにより人間の顔であるかが評価され、テンプレートである平均ベクトルにより入力顔画像の特徴が抽出される。副モジュールＲ０（２０４）の評価の後、顔画像は副モジュールＲ１の１０５人の顔の固有空間に投射される（２０５）。
【０１１１】
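In submodule R2 of subsystem AIT (14), the DFFS values are computed and compared with a threshold (206). If the evaluation in submodule R2 singles out the DFFS of one eigenspace by the threshold, the input face image can be recognized as the face of the individual identified by that eigenspace. However, because the image data is processed with a small sample of principal component vectors for ease and speed of computation, the evaluation in submodule R2 may match more than one eigenspace.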
【０１１２】
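In that case, an activation signal is sent as top-down control to the local modules (21) of subsystem PIT (15) that handle the local images belonging to the faces (whole images) of the eigenspaces evaluated as matching (step 207).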
【０１１３】
一方、入力された全体画像は副システムＰＩＴの各局所モジュールの位相情報により設定される注意の窓により走査され（２０３）、口、鼻、左右の目、１２の顔の輪郭線分の局所画像が生成されてデータバス（３００）により対応する局所モジュールに入力される。例えば鼻の局所モジュールに入力する主成分の局所画像データは副モジュールＲ０において学習した鼻の平均ベクトルをテンプレートとして鼻の画像であることを認識されると同時に入力鼻画像の特徴が抽出される（２０８）。
【０１１４】
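When the input local image is recognized as a nose image, either the vector representing the extracted features is projected onto the nose eigenspace of submodule R1, or higher-dimensional input image information is read from the memory of the front module and projected onto the nose eigenspace of submodule R1 (209). When the projection distance DFFS obtained from the projection is evaluated against the threshold held by submodule R2 and judged to match (210), the eigenspace image information of submodule R1 (23) is sent to subsystem AIT (14) over the data bus (300).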
【０１１５】
副システムＡＩＴ（１４）は、全体画像の画像処理により認識した個人の顔の固有空間に位相を合わせて入力される鼻の画像情報を投射して整合性を評価する（２０５）。この場合、副システムＡＩＴ（１４）への局所画像情報はメモリ（１１）からより高次元の主成分の画像情報を入力するよう構成してもよい。また、副システムＡＩＴ（１４）における局所モジュール（２１）からの画像情報の整合性の評価は、副システムＡＩＴ（１４）の認識した固有空間へ局所モジュール（２１）からの入力画像情報を投射する方法に換えて、副システムＡＩＴ（１４）が認識した個人の画像情報に基づいて、対応する局所画像のテンプレートを用意し、それを用いて整合性を評価するよう構成してもよい。
【０１１６】
副システムＰＩＴ（１５）の副モジュールＲ０（２２）において鼻画像であると認識されると他の局所モジュールへＢ型の制御信号を送出し、同一局所画像情報を処理する他の局所モジュール（２１）を抑制し、位相が近似する他局所画像を担当する局所モジュール（２１）の副モジュールＲ０（２２）へ活性化信号（Ａ型）を出力する。
【０１１７】
局所モジュールＲ２における評価で整合性が評価された場合（２１０）は、同局所モジュールの副モジュールＲ０にＥ型の制御信号を出力し（２０８）作動を抑制する。
【０１１８】
副システムＡＩＴ（１４）における全体画像処理の評価に基づき、トップダウン制御により活性化された副システムＰＩＴ（１５）の局所モジュール（２１）は副モジュールＲ０（２２）で評価された局所画像情報について副システムＡＩＴ（１４）が認識した個人の局所の固有空間との射影距離ＤＦＦＳのみ評価すればよい。副システムＡＩＴ（１４）のトップダウン制御により複数の個人が仮説として局所モジュール（２１）に与えられた場合は副モジュールＲ２（２４）における評価においてその中の一個人の局所画像であることが認識されればその段階で全体画像がだれの顔であるかが特定され認識されたこととなる。
【０１１９】
入力画像が未学習の対象のものである場合は結果として入力画像を特定する認識は不可能である。このような未学習の入力画像がある場合、その画像処理情報を各副システムの副モジュールの知識に自動的に追加する機能を備えることにより、学習能力を有するシステムとなる。更に、一度画像認識を行った入力画像情報の特徴と整合した認識画像または各副システムの固有空間とを対応して記憶しておくことにより、入力画像の特徴に対応する特定のモジュールを優先的に活性化させることも可能であり、より学習効果の高いシステムとなる。
【０１２０】
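Fig. 5 shows the progress of image processing in the local modules (21) during the simulation. Fig. 5a) is the case without top-down control from subsystem AIT (14), and Fig. 5b) is the case with top-down control from subsystem AIT (14) for recognition of the same input image. In the figure, the column direction represents time steps (one step is 5 milliseconds). The first row shows the progress of whole-image processing in subsystem AIT (14). The second row shows the activity of submodules R2 (24) of the 16 local modules (21) of subsystem PIT, and the following rows show, in time series, the activity of submodules R0 (22) of the local modules (21) for the right eye, left eye, nose, and mouth.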
【０１２１】
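The input image shown in Fig. 5 was one that had not been learned. Without top-down control (Fig. 5a), the contour of the face is detected first, followed immediately by both eyes and the nose. However, because an average vector is used as the template of submodule R0 (22), there are many errors in recognizing the left and right eyes, and the mouth could not be detected by step 9.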
【０１２２】
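With top-down control (Fig. 5b), detection of the face contour, both eyes, and the nose is the same as in Fig. 5a), but the mouth is detected at step 5. Comparing the activity of submodules R0 (22) in Fig. 5a) and Fig. 5b) shows that there is less activation in submodule R0 (22) with top-down control. This indicates that top-down control suppresses the activity of inconsistent submodules R0 (22), and that submodule R2 (24) selects the best-matching candidate from a small amount of activation information.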
【０１２３】
更に、１行目の副システムＡＩＴにおける全体画像処理の時系列の変化から判るように、最初は弱く、粗いが徐々に強く詳細になっている。従ってトップダウン制御も時間とともに詳細且つ確実になってくることが理解できる。
【０１２４】
表１はシミュレーションに用いた１０５の顔画像について目、鼻、口の局所の検出率を示したものである。従来の固有空間法と比較しても本発明による画像処理方法が検出能力においてかなりの向上を示すことが判る。
【０１２５】
【表１】

【０１２６】
以上、本発明の画像認識システムは、全体画像の認識処理と局所の画像の認識処理を、機能を分割したモジュールにより同時並行的に処理を進行せしめ、処理を行うモジュール間の制御信号により非整合な処理は早期に抑制し、且つ全体画像の認識のため必要な局所の画像処理を促進する機能を有するため、計算の負荷を軽減することが可能でしかも短時間で画像認識が可能となる。
【０１２７】
更に、本発明の画像処理は、上記の構成としたため画像処理の初期においては、極めて圧縮された画像データによる仮説的な画像認識を基に、全体整合的な認識の形成をマルチエージェントによる計算で実現するシステムとなり、従来の仮説推論のごとく困難な演算を伴う確率的評価を必要としない。
【０１２８】
更に、本発明の画像認識システムは多くの局所モジュールが入力画像の処理を順次探索的に行って、全体画像の認識に至るボトムアップ的な処理の流れと、全体画像の処理から得る仮説的認識を局所モジュールによる処理により確認するトップダウン的な処理の流れを有し、これら２つの処理の流れが合致する処理エージェントにおいて整合すると判断された時点で全体画像の認識が確立されたと見做し得るため早い時期での画像認識が達成できる。
【図面の簡単な説明】
【図１】本発明の実施形態の一例を示すシステム構成の全体的説明図。
【図２】局所モジュールを構成する副モジュールが入出力する制御信号を示す説明図。
【図３】局所モジュールにおける全体の画像処理を示すフローチャート。
【図４】システム全体の画像解析処理を示すフローチャート。
【図５】本実施形態のシミュレーション結果を示す説明図。
【符号の説明】
10: optical means (camera); 14: whole-image processing means (AIT); 15: local image processing means (PIT); 21: local module.
[0001]
[Technical field of the invention]
The present invention relates to an artificial brain system. More specifically, it relates to a computer-based image information processing system that digitizes image information input from an optical device such as a camera, using an image processing method modeled on biological visual recognition, recognizes images by computation, and can recognize images more efficiently through self-learning.
[0002]
[Prior art]
Conventionally, as the cerebral physiological function of visual recognition in living organisms has been elucidated, efforts have been made to construct an artificial visual learning system by constructing a model that approximates the recognition function using a computer or the like. Such a visual learning system recognizes and identifies or classifies a certain object in the scene by, for example, expressing the scene inputted by a video camera in numerical form and analyzing it. In other words, this is a system that performs an analysis process for confirming a match between an input image pattern and a recognition target image pattern accumulated by learning.
[0003]
The input scene is converted into numerical values, such as voltage values corresponding to the amount of light on each pixel, and represented as a vector. For example, an image of 27 × 27 pixels is represented as a vector in a 729-dimensional space with orthogonal axes. Analyzing such a large amount of data to identify a target image pattern is close to impossible given the capability of existing computers.
[0004]
These analyses therefore require replacing the input image pattern to be recognized with compressed data that represents its features, so that it can be compared relatively easily with the stored learned patterns. To analyze the data efficiently, it is desirable to partition the data space into so-called subspaces restricted to its most characteristic regions.
[0005]
Principal component analysis (PCA) is one technique that addresses this requirement. It rests on the observation that the distribution of target image data in the multidimensional image space can be grouped into characteristic distributions, and that the principal eigenvectors can be used to describe those distributions. Each eigenvector reflects a pattern of variation in pixel intensity across the image set, and together the eigenvectors can be regarded as a set of features that characterize the variation among the images.
[0006]
Each vector corresponding to the target image may greatly contribute to the eigenvector, or may not contribute so much. The target image can be almost expressed by a combination of main components of eigenvectors having large eigenvalues due to large changes in the image group.
[0007]
In other words, a large number of eigenvectors are needed to reproduce the target image very accurately, but a small number of eigenvectors suffice to express the characteristic appearance of the target image. A system that recognizes human faces using this method is disclosed in US Pat. No. 5,164,992.
[0008]
The technology is summarized below. First, a large number of known human face images are learned. If each face image is N × N pixels, the M face images are represented by N^2-dimensional vectors Γ_1, Γ_2, Γ_3, ..., Γ_M, and the average of these face image vectors is computed.
[0009]
Taking the difference between each person's face vector and the average gives a set of M vectors Φ_i = Γ_i − (average vector). Defining this set as A = [Φ_1, ..., Φ_M], the eigenspace of the face is obtained by computing the vectors u_k and scalars λ_k as the eigenvectors and eigenvalues of the covariance matrix C = A A^T.
[0010]
The matrix C has N^2 eigenvectors and eigenvalues when the image is N × N pixels. However, when the number of face images M is much smaller than the dimension N^2 of the whole image space, which includes the background of the face (M << N^2), it suffices for face recognition to compute the eigenvectors of the M × M matrix L = A^T A. The vectors of the face eigenspace are then obtained from the M eigenvectors ν_i of L as u_i = A ν_i. This analysis compresses the data and greatly reduces the amount of computation.
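The reason the small matrix L is sufficient can be shown in one step. Suppose ν_i is an eigenvector of L = A^T A with eigenvalue μ_i (the symbol μ_i is introduced here for illustration and does not appear in the source). Multiplying both sides by A from the left gives:

```latex
\begin{aligned}
A^{T}A\,\nu_i &= \mu_i\,\nu_i \\
\Rightarrow\quad A A^{T}\,(A\nu_i) &= \mu_i\,(A\nu_i)
\end{aligned}
```

So whenever A ν_i is nonzero, it is an eigenvector of C = A A^T with the same eigenvalue, and an M × M eigenproblem replaces an N^2 × N^2 one.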
[0011]
The input face image (Γ) is converted into face-space components by a simple operation. This operation is called projection onto the face eigenspace.
[0012]
ω_k = u_k^T (Γ − Ψ), where Ψ is the average vector
This operation is performed by the image processing apparatus. Ω^T = [ω_1, ω_2, ..., ω_M] is a weight vector that expresses how much each eigenvector of the face eigenspace contributes to the input image pattern. This vector Ω is used for standard pattern recognition.
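The projection ω_k = u_k^T (Γ − Ψ) amounts to one subtraction and one inner product per eigenvector. The following sketch uses plain Python lists; the function names and the toy three-pixel data are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def subtract(a, b):
    """Component-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(gamma, psi, eigenvectors):
    """Return the weights omega_k = u_k^T (gamma - psi) for each u_k."""
    phi = subtract(gamma, psi)
    return [dot(u, phi) for u in eigenvectors]

# Toy 3-pixel "image", average vector, and two orthonormal eigenvectors.
psi = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
omega = project([3.0, 0.0, 5.0], psi, basis)
```

For this input, Γ − Ψ is (2, −1, 4), so the two weights are 2 and −1; the third component is not represented by the two-vector eigenspace.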
[0013]
The distance between the input image Φ = Γ − Ψ and the face eigenspace Φ_f defined by equation (1) below is given by equation (2) below. If ε is within a threshold, the input image is recognized as Φ_f.
[0014]
[Expression 1]
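The image for equations (1) and (2) is not reproduced in this text. As a hedged reconstruction, the standard eigenface formulation that fits the surrounding description (a projection Φ_f built from the weights ω_k, and a distance ε between Φ and Φ_f) would be written as below. The upper limit M′, the number of eigenvectors retained, is an assumption.

```latex
\begin{aligned}
\Phi_f &= \sum_{k=1}^{M'} \omega_k\,u_k && \text{(1)}\\
\varepsilon^{2} &= \lVert \Phi - \Phi_f \rVert^{2} && \text{(2)}
\end{aligned}
```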

[0015]
That is, the subspace of face images can be delimited by determining the vectors that best account for the distribution of face images within the space of all images. This greatly reduces the amount of data and allows the analysis to focus on a limited set of data that forms the facial features.
[0016]
Once the evaluation vectors are determined, an input image can be classified as a face or not, and if it is a face, a specific individual's face can be recognized by comparing it with the stored face patterns of known individuals.
[0017]
Pentland et al., the applicants of the US patent, carried out principal component analysis with 128 face images as training images and, in a face recognition test using the 20 principal eigenvectors, obtained a 95% recognition rate on 200 face images.
[0018]
[Problems to be solved by the invention]
Image recognition by the eigenspace method is more effective than standard recognition techniques that use template matching or normalized correlation. However, for images represented by high-dimensional vectors, the computation becomes enormous and practically infeasible without an inference technique that skips image processing computations on the basis of assumptions.
[0019]
Moreover, with the eigenspace method alone it is difficult to describe knowledge about image information structurally, which is a problem for its application to image understanding, and no method has been established for correcting the erroneous processing results that inevitably arise when it is applied to recognizing a real object. System theory is therefore indispensable for extending the eigenspace method to a variety of image recognition tasks.
[0020]
An object of the present invention is to provide a system suited to recognizing a target image, comprising whole-image processing means that processes image information of the whole image of the recognition target input by optical means, and local image processing means that processes image information of local images that are partial images of the whole image. Each image information processing means has a function of extracting features of the input image and a function of evaluating consistency, and activates or suppresses the processing functions of the other image information processing means based on that evaluation, so that recognition develops from uncertain image recognition toward more reliable image recognition.
[0021]
A further object of the present invention is to provide a system that reduces the computational load on the computer and recognizes images in a shorter time than conventional systems by using a model of the information processing mechanism of the biological cerebral cortex: a "processing control method comprising a bottom-up flow of image information from recognition of simple shapes up to the final recognized image, and a top-down flow of image information from the final recognized image back down to the simple initial shape recognition" (Koerner, Tsujino, Masutani, "A Cortical-type Modular Network for Hypothetical Reasoning", Neural Networks, vol. 10, no. 5, pp. 791-814).
[0022]
A further object of the present invention is to provide a system in which the whole image of the recognition target and the characteristic local images that are parts of it are processed in parallel and automatically compared, suppressing inconsistent hypotheses and activating consistent ones, and which has a high potential for its recognition accuracy to improve through self-learning.
[0023]
[Means for Solving the Problems]
In order to achieve the above object, the image recognition system according to the present invention recognizes an object from a whole image of the recognition target and characteristic local images input by optical means such as a camera, and comprises whole-image processing means and local image processing means for analyzing the whole image and the local images. The local image processing means comprises a plurality of local modules, one for each local image. Each local module has a function of extracting features of the input local image it is responsible for and a function of evaluating the consistency between the extracted features and the recognized image. The whole-image processing means has a function of extracting overall features of the input whole image and a function of evaluating the consistency between the extracted overall features and the recognized image. The whole-image processing means receives input from the local modules, suppresses the functions of local modules that are inconsistent with the overall features, and activates the functions of local modules that are consistent with them.
[0024]
According to the above configuration, image recognition is performed by analyzing the whole image of the recognition target in the whole image processing means and analyzing the characteristic local images of that whole image in the local image processing means. Using a plurality of narrow pixel regions, the system can analyze the image information of each local area in parallel and sequentially, and it recognizes an object from both the whole image and its characteristic local images. The system therefore permits hypothetical reasoning about the recognition target without the evaluation by difficult operations that was conventionally required, so the load on the computer for image analysis processing is reduced and the time required to recognize the target image can be shortened.
[0025]
Further, the image information whose consistency has been evaluated in the local modules constituting the local image processing means is re-evaluated in the whole image processing means; only the consistent local modules are activated, and the inconsistent local modules are suppressed. The system therefore has both a bottom-up process flow and a top-down process flow, and an image can be regarded as recognized when these two flows agree, so the load on the computer can be further reduced and the time required for recognition can be shortened.
[0026]
Furthermore, in the present invention, each local module constituting the local image processing means is assigned a local image according to the phase of that local image, and a plurality of local modules are provided for different phases of the same local image. For recognition targets of generally similar shape, such as faces, characteristic local images such as the eyes and nose lie at approximately the same phase, so analyzing them allows the target image to be recognized in a short time, and the consistency evaluation that the whole image processing means performs on receiving input from the local modules is made easier.
[0027]
Further, in the present invention, each local module constituting the local image processing means is assigned a local image according to the shape of that local image, and a plurality of local modules are provided for different phases of the same local image. Because the local modules are set to correspond to the local shapes that best characterize a recognition target of complex shape, the analysis processing concentrates on that local image information, and the target image can be recognized by evaluating consistency with the whole image.
[0028]
Further, in the present invention, each local module constituting the local image processing means comprises a first submodule having a function of extracting the features of the input local image, a second submodule having knowledge of the eigenspace of the local images of the recognition target learned in advance, and a third submodule having a function of evaluating consistency based on the projection distance obtained by projecting the image information of the input local image, with the features extracted by the first submodule, onto the eigenspace of the second submodule.
[0029]
With this configuration, the image processing function of each local module constituting the local image processing means is divided among three submodules, and the consistency evaluation is performed step by step. The evaluation can start from features such as simple shapes, contours, or blobs and proceed sequentially to more accurate consistency evaluation, so the image processing load leading to recognition is reduced according to the degree of the features of the local image.
[0030]
Furthermore, according to the above configuration, functions can be suppressed and activated in units of submodules, so wasteful image processing is suppressed at an earlier stage and an image recognition system capable of more rational search processing is obtained.
[0031]
Furthermore, in the present invention, the first submodule has knowledge of the average vector of the local images of the recognition target learned in advance. The submodules of each local module constituting the local image processing means thus perform image processing based on the eigenspace method, so image analysis can be carried out on compressed data and the load on the computer can be further reduced.
[0032]
Further, in the present invention, the whole image processing means comprises a first submodule having knowledge of the average vector of the whole images of the recognition target learned in advance, a second submodule having knowledge of the eigenspace of the whole images of the recognition target learned in advance, and a third submodule having a function of evaluating consistency based on the projection distance obtained by projecting the image information of the input whole image, with the features extracted by the first submodule, onto the eigenspace of the second submodule.
[0033]
With this configuration, the image processing function of each module constituting the whole image processing means is divided among three submodules, and the consistency evaluation is performed step by step. Consistency can therefore be evaluated sequentially with increasing accuracy from the results of the image processing, and the image processing load leading to recognition is reduced according to the degree of the features of the whole image. In addition, functions can be suppressed and activated in units of submodules, so wasteful image processing is suppressed at an earlier stage and sequential-search image processing can be performed more rationally.
[0034]
Further, in the present invention, among the plurality of local modules of the local image processing means that handle different phases of the same local image, a local module activated by its first submodule outputs an activation signal to the other local modules. The hypothesis of the local module whose consistency was evaluated first is thereby checked by the other local modules as early as possible, which speeds up image recognition.
[0035]
Furthermore, in the present invention, when a plurality of local modules of the local image processing means that handle different local images receive image information from approximately the same phase region, a local module activated by its first submodule outputs a signal that suppresses the other local modules. This prevents two or more hypotheses from being generated from the same input local image information, so image processing can be performed more efficiently.
[0036]
[Detailed Description of the Invention]
Next, the image recognition system of the present invention will be described in more detail with reference to the accompanying drawings.
[0037]
FIG. 1 is an overall explanatory view of a system configuration according to the present invention.
[0038]
FIG. 2 is a diagram illustrating the control signals input and output by the submodules that constitute a local module, and the relationship of the local module to the upper module (AIT) and the lower module (input control means).
[0039]
FIG. 3 shows the overall image processing flow in the local module.
[0040]
FIG. 4 shows a flow of image analysis processing of the entire system.
[0041]
FIG. 5 is a chart showing the results of a simulation of recognizing a target image performed with the image recognition system according to the present invention.
[0042]
In FIG. 1, visual image information including the image to be recognized is input by suitable optical means (10) such as a video camera and held as digital data in a pixel-map frame by a known technique. Camera control means (12) is attached to the camera (10) to give the camera a function similar to that of the human eye: the camera (10) can be operated so that it is directed at only a particular part of the scene.
[0043]
To selectively restrict the overall image information to the image to be recognized, the input image information is processed by pre-attention processing means (13). The pre-attention processing means (13) may include filter processing using a low-pass filter and a color filter, motion detection means for identifying a moving object in the overall image, and processing means for normalizing the scale of the image to be recognized.
[0044]
Further, the pre-attention processing means (13) may include means for performing contrast normalization, which compensates for variations in illumination at the time of camera input, and normalization that compensates for the linear response characteristics of the CCD camera used.
[0045]
Although not shown in the figure, the overall configuration of the present invention includes a graphic or image processing board and a keyboard and mouse for the user to intervene in the system. These are standard equipment well known in the art. The preprocessed and digitized input image information is written in the memory of the front module (11) having a memory function and a principal component analysis function.
[0046]
The image information written to the memory of the front module (11) may be limited by the pre-attention processing means (13) to the image information to be recognized or, when the background is important for recognizing the object, may be the entire image information. The image information in the memory of the front module (11) is sent to the analysis system by an operation command. As shown conceptually in the drawings, the analysis system consists of the two subsystems described below.
[0047]
The first sub-system is local image processing means (15) (hereinafter abbreviated as PIT). The second sub system is a whole image processing means (14) (hereinafter abbreviated as AIT).
[0048]
The subsystem PIT (15) handles the analysis of the local images that form parts of the image to be recognized. It provides an analysis processing module for each local image having features useful for recognizing the whole image. Such an analysis processing module is called a local module (21).
[0049]
Each local module (21) is provided for a local phase relative to the whole image, and the local image information for that phase is sent from the memory of the front module (11) to the corresponding local module (21).
[0050]
The image information input to a local module (21) may be information compressed in the front module (11) by PCA processing or the like. The local modules (21) of the corresponding phases analyze the input local image information in parallel to extract its features. The extracted features of the input local image are projected onto the eigenspace of the recognition target images learned in advance, which the local module (21) holds as knowledge, and the consistency is evaluated. Local image information found consistent is input to the subsystem AIT (14), its consistency with the whole image analyzed by the subsystem AIT (14) is evaluated, and the recognition target image is recognized.
[0051]
The subsystem AIT (14) receives the whole image information from the front module (11) and extracts the features of the whole image to be recognized. The extracted features are projected onto the eigenspace of the recognition target learned in advance, which the subsystem AIT (14) holds as knowledge, and the consistency is evaluated. The subsystem AIT (14) also receives local image information from the local modules (21) of the subsystem PIT (15) and evaluates its consistency with the whole image to recognize the recognition target.
[0052]
For example, if the object to be recognized is a human face or a car, the local images are the local parts that characterize the face or the car: for a face, the eyes, nose, mouth, or facial contour; for a car, the front grille, the wheel shape, and so on.
[0053]
In the present invention, with the above system configuration, a match based on the simple contour shapes and blob features extracted at the initial stage of image processing is treated as a hypothesis. On the basis of such hypotheses, the local modules (21) communicate to activate or suppress one another's functions, so that consistency is evaluated with progressively higher accuracy; further activation and suppression take place through communication with the subsystem AIT (14), which processes the whole image. Image recognition is thus performed by hypothetical reasoning through multi-agent computation that gradually forms a globally consistent interpretation.
[0054]
The subsystem PIT (15) includes an array of a plurality of local modules (21) corresponding to local areas of the recognition target image, and the local modules (21) are arranged in relation to the input phase, with the position of the local image area each handles as the main measure. The local module (21) for one local image consists of a plurality of local modules responsible for different phases of that same local image. The subsystem AIT (14) consists of a whole image processing module; because it deals with the overall features of the image, it is not tied to the input phase as the subsystem PIT (15) is.
[0055]
The configuration and function of each local module (21) of the subsystem PIT (15) will now be described with reference to the drawings. To simplify the explanation, the following definitions are used. Each local module (21) consists of three submodules R0, R1, and R2. The submodules R0, R1, and R2 are identical as calculation modules but hold different internal knowledge. The image information processing route is as follows.
[0056]
Camera → Front module → R0 → R1 → R2 → AIT
[0057]
Each module is an image processing agent that interprets the input data and acts on it individually, and is referred to below as an agent. Taking one agent as the reference, an agent closer to the input is called a lower agent of that agent, and an agent closer to the AIT is called an upper agent of that agent.
[0058]
Agents of other local modules at the same level as the reference agent are called peer agents. That is, for a given local module (21), the front module (11) is a lower agent, the modules constituting the subsystem AIT (14) are upper agents, and the other local modules of the subsystem PIT (15) are peer agents. A local module (21) in charge of a local image whose phase approximates that of the local image handled by the given local module (21) is called a neighborhood agent.
[0059]
Further, image information processing in the present invention is performed by a bottom-up process, which runs from the lower agents to the upper agents in the direction of the arrows, and a top-down process, which runs against the arrows and controls the lower agents from the upper agents.
[0060]
In FIG. 2, the submodule R0 (22) holds as knowledge the average vector Ψ of the images learned in advance for the local image it handles. In the case of the human face used in the simulation of the present invention, the average vector Ψ of the submodule R0 (22) of a local module (21) responsible for the nose image is calculated as follows: M local images centered on the nose are collected, each is normalized and expressed as an N-dimensional vector Γ_i (1 ≦ i ≦ M), and the average vector Ψ is obtained by the following equation (3).
[0061]
[Expression 2]
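Equation (3) is not reproduced in this text. Since Ψ is defined above as the average of the M normalized N-dimensional vectors Γ_i, it presumably takes the usual form of an arithmetic mean. The following is a reconstruction under that assumption, not the original typeset formula:

```latex
\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i \tag{3}
```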

[0062]
The submodule R0 (22) is a "hypothesis control agent" whose activity indicates the strength of the hypothesis. For new image information Γ_NEW input after normalization, the initial value of the activity is the value of Γ_NEW・Ψ, and this is regarded as the initial hypothesis. The value of the submodule R0 (22) is then controlled, as described later, by inputs from neighborhood agents and upper agents, and develops into a globally consistent image recognition.
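The learning of Ψ and the initial hypothesis value can be illustrated with a minimal Python sketch. It assumes that "・" denotes the inner product of the two vectors; the function names `mean_vector` and `initial_hypothesis` and the toy pixel values are hypothetical and do not come from the patent.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(samples: Sequence[Sequence[float]]) -> Vector:
    """Average vector Psi of M learned N-dimensional local image vectors."""
    m = len(samples)
    n = len(samples[0])
    return [sum(s[j] for s in samples) / m for j in range(n)]


def initial_hypothesis(gamma_new: Sequence[float], psi: Sequence[float]) -> float:
    """Initial R0 value, assumed here to be the inner product of input and Psi."""
    return sum(g * p for g, p in zip(gamma_new, psi))


# Toy data: three learned 4-pixel patches (made-up values for illustration).
learned = [
    [1.0, 0.0, 1.0, 0.0],
    [0.8, 0.2, 1.0, 0.0],
    [0.9, 0.1, 0.7, 0.3],
]
psi = mean_vector(learned)
activity = initial_hypothesis([1.0, 0.0, 1.0, 0.0], psi)
```

An input that resembles the learned patches yields a larger inner product than one that does not, which is why such a value could serve as a first, coarse hypothesis strength.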
[0063]
The submodule R1 (23) holds as knowledge the eigenspace U of the images learned in advance for the local image it handles. The eigenspace U is obtained as follows. The average vector Ψ is subtracted from each N-dimensional vector Γ_i of the local image data to give the vector Φ_i = Γ_i − Ψ. As the best description, in the least-mean-square-error sense, of the distribution of the vectors Φ_i, the orthonormal vectors u_k expressed by the following equation (4) are obtained sequentially from k = 1 to k = M′ (M′ ≦ M).
[0064]
[Equation 3]
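Equations (4-1) and (4-2) are not reproduced in this text. The paragraph that follows states that u_k maximizes (4-1) subject to (4-2) and that u_k and λ_k are eigenvectors and eigenvalues of a covariance matrix, which matches the standard principal component formulation. A reconstruction on that assumption (the original typeset form may differ):

```latex
\lambda_k = \frac{1}{M} \sum_{i=1}^{M} \left( u_k^{T} \Phi_i \right)^{2} \tag{4-1}

u_l^{T} u_k = \delta_{lk} =
\begin{cases}
1 & (l = k) \\
0 & (l \neq k)
\end{cases} \tag{4-2}
```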

[0065]
The k-th vector u_k is selected so that expression (4-1) is maximized under the condition of expression (4-2). The vectors u_k and the scalars λ_k are the eigenvectors and eigenvalues of the covariance matrix C expressed by the following equation (5), and the space spanned by the vectors u_k is called the local eigenspace U.
[0066]
[Expression 4]
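Equation (5) is not reproduced in this text. With Φ_i = Γ_i − Ψ as defined above, the covariance matrix of the learned local images would normally be written as below. This is a reconstruction; the 1/M factor is an assumption that rescales the eigenvalues but leaves the eigenvectors unchanged.

```latex
C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^{T}, \qquad C\,u_k = \lambda_k u_k \tag{5}
```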

[0067]
The vector (Γ_NEW − Ψ), which represents the features of the input image Γ_NEW extracted by the submodule R0 (22), is projected onto the eigenspace U representing the individual features of the learned local images of the submodule R1 (23), and the projection distance of the input vector to the eigenspace (Distance From Feature Space; hereinafter DFFS) is calculated.
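The projection and distance computation can be sketched in Python as follows. The sketch assumes the usual definition of DFFS as the norm of the part of (Γ_NEW − Ψ) that lies outside the span of the u_k, and it assumes the basis vectors are orthonormal; the patent's own expression is not reproduced in this text, and the function name `dffs` is hypothetical.

```python
import math
from typing import Sequence


def dffs(
    gamma_new: Sequence[float],
    psi: Sequence[float],
    basis: Sequence[Sequence[float]],
) -> float:
    """Distance from feature space of gamma_new, for an orthonormal basis u_k."""
    # Feature vector: input minus the learned average vector.
    phi = [g - p for g, p in zip(gamma_new, psi)]
    # Projection coefficients onto each eigenvector u_k.
    coeffs = [sum(u_j * phi_j for u_j, phi_j in zip(u, phi)) for u in basis]
    # For an orthonormal basis, residual energy = total energy - projected energy.
    residual_sq = sum(x * x for x in phi) - sum(w * w for w in coeffs)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(residual_sq, 0.0))


# Toy eigenspace: the plane spanned by the first two axes of a 3-D space.
plane = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
origin = [0.0, 0.0, 0.0]
inside = dffs([1.0, 2.0, 0.0], origin, plane)   # lies in the plane
outside = dffs([3.0, 4.0, 5.0], origin, plane)  # 5 units off the plane
```

A small DFFS means the learned eigenspace explains the input well; a large DFFS means the input does not resemble the learned local images.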
[0068]
The local image information input to the submodule R1 (23) may be input from the memory of the front module (11) with higher accuracy via the switch (SW2), which is activated by the feature evaluation in the submodule R0 (22). Further, in response to a signal from an upper agent, the submodule R1 (23) performs a top-down process that controls the lower agents using the inverse mapping obtained by projecting vectors of the eigenspace U back onto the vector space of the input local image.
[0069]
The submodule R2 (24) has a function of evaluating the consistency between the input local image and the eigenspace onto which it is projected, using the projection distance (DFFS) of the input vector to the eigenspace as its main information. The consistency is evaluated from the projection distance DFFS of the input local image vector Γ_new to the eigenspace U, which is obtained from the vector (Γ_new − Ψ) representing the features of Γ_new and the eigenvectors u_k.
[0070]
Further, by top-down control from the subsystem AIT (14), which is the upper agent, the submodule R2 (24) is activated or suppressed in its evaluation of the consistency of the input local image by the bottom-up process described above.
[0071]
The subsystem AIT (14) consists of modules that analyze the whole image. These modules can have the same configuration as the local module (21) described above. That is, they consist of submodules R0, R1, and R2 whose knowledge and functions are the same as described above, except that the target image is the whole image instead of a local image.
[0072]
In FIG. 2, the control actions of the submodules (22), (23), and (24) constituting the local module (21) are indicated by arrows. These control actions are described below. There are four main kinds of control action between local modules (21), and two main kinds between the submodules (22), (23), and (24) within the same local module (21).
[0073]
First, the control action between the local modules (21) or between the local module (21) and the upper or lower agent will be described.
(1) Activation-type local control: type A (indicated by arrow A)
In the subsystem PIT (15), when the submodule R0 (22) of one of the plurality of local modules (21) that handle input images at different phases of the same local image is activated, an activation signal is output to the submodules R0 (22) of the other local modules in charge of the same local image. The submodules R0 of all the local modules (21) in charge of that local image are thus activated, which promotes image processing of the input local image.
[0074]
For example, when one of the plurality of local modules (21) in charge of nose image processing is activated because the features of the input image extracted in its submodule R0 (22) are evaluated as consistent, the submodules R0 (22) of all the local modules (21) in charge of nose image processing are activated, and the image processing of the nose is promoted. In particular, performing this control between the local modules that constitute neighborhood agents makes it possible to estimate, in a short time, the features of local images whose phases are close to one another.
[0075]
(2) Suppression-type local control: type B (indicated by arrow B)
When a plurality of local modules (21) in charge of different local images receive image information from approximately the same phase region and the submodule R0 (22) of one particular local module (21) is activated, a suppression signal is given to the submodules R0 (22) of the local modules (21) in charge of the other local images. This prevents a plurality of hypotheses from competing over the image information of that phase region.
[0076]
(3) Top-down control: type C (indicated by arrow C)
Top-down control is control exerted by an activated, operating agent on a lower agent from which it receives input. There are two kinds of top-down control: activation control and suppression control. Local image information whose consistency has been evaluated in a local module (21) is input to the subsystem AIT (14). The subsystem AIT (14) analyzes the input image, evaluates the consistency of the whole image, and evaluates whether the whole image matches the local image. When a mismatch is determined, the operation of the local module (21) is stopped by a suppression signal.
[0077]
When the submodule R0 (22) of a local module (21) is determined to be inconsistent, a suppression signal is output to the front module, which is the lower agent, to control the image information input to that local module (21). Conversely, on the basis of the consistency evaluation by the whole image processing of the subsystem AIT (14), an activation signal is output to an inactive local module (21) in charge of a local image of the whole image, prompting that local module (21) to operate.
[0078]
(4) Bottom-up control: type D (indicated by arrow D)
When the evaluation in the submodule R2 (24) of a local module (21), which uses the projection distance (DFFS) as its main information, falls to or below the threshold, the switch (SW1) is closed and the local image information found consistent in the eigenspace of the submodule R1 (23) is output to the subsystem AIT (14). The evaluation of the submodule R2 (24) may be judged to be at or below the threshold in two cases: when the knowledge of the local module (21) adequately explains the input through bottom-up image information processing within the local module (21), and when consistency is recognized through image information processing in a local module (21) activated by top-down control (type C) from the subsystem AIT (14).
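The two peer-to-peer controls above (types A and B) can be illustrated with a short Python sketch. The data layout is an assumption made for illustration: each local module is reduced to the local image it handles, a 2-D grid position standing in for its phase, and two flags for its submodule R0. The class `LocalModule`, the helper `near`, and the `radius` parameter are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalModule:
    feature: str                 # local image handled, e.g. "nose"
    phase: Tuple[int, int]       # position of the handled region (assumed 2-D)
    r0_active: bool = False
    r0_suppressed: bool = False


def near(a: Tuple[int, int], b: Tuple[int, int], radius: int = 1) -> bool:
    """Assumed notion of 'approximate phase': within `radius` grid cells."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= radius


def on_r0_activated(source: LocalModule, modules: List[LocalModule], radius: int = 1) -> None:
    """Propagate type A and type B signals when the R0 of `source` becomes active."""
    source.r0_active = True
    for m in modules:
        if m is source:
            continue
        if m.feature == source.feature:
            # Type A: activate R0 of every module handling the same local image.
            m.r0_active = True
        elif near(m.phase, source.phase, radius):
            # Type B: suppress R0 of modules handling a different local image
            # that receive input from approximately the same phase region.
            m.r0_suppressed = True
            m.r0_active = False
```

For instance, activating a nose module at one position would activate the nose modules at other positions and suppress a mouth module looking at nearly the same region, while leaving a distant eye module untouched.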
[0079]
Next, two types of control actions between the submodules R0, R1, and R2 constituting the local module (21) will be described.
(1) Top-down control within the local module: type E (indicated by arrow E)
When the evaluation in the submodule R2 (24) of a local module (21) is determined to be at or below the threshold, the submodule R2 (24) suppresses the submodule R0 (22) of that local module (21). When the evaluation value of the submodule R2 (24) is at or below the threshold, the information of the submodule R1 (23) is output to the subsystem AIT (14) through the switch (SW1) by bottom-up control (type D); if it is found consistent there, the image recognition is complete.
[0080]
When a submodule R0 (22) remains activated, the submodules R0 (22) of the neighboring local modules (21) are suppressed by the suppression-type local control (type B). This narrows the options for sequential search while recognition of the local image is still undetermined. Suppressing the submodule R0 (22) of a local module (21) whose image has already been recognized makes it possible to activate the other local modules (21) that constitute its neighborhood agents.
[0081]
(2) Submodule input control within the local module: type F (indicated by arrow F)
In the local module (21), the switch (SW2) is not closed, and no image information is input to the submodule R1 (23), unless the value of the submodule R0 (22) is at or below the threshold. When the level of the features extracted by the submodule R0 (22) is evaluated as meeting the threshold, the switch (SW2) is closed and the image information to be projected onto the eigenspace of the submodule R1 (23) is input from the front module (11).
[0082]
Such control avoids unnecessary calculation and reduces the load on the computer, improves the quality of the image information projected by the submodule R1 (23), and reduces the noise passed to the upper agent. Similarly, the switch (SW1) is provided and controlled so that the image information of the submodule R1 (23) is not transmitted to the subsystem AIT (14) until the value of the submodule R2 (24) is at or below the threshold and consistency has been established.
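The gating by the two switches can be sketched in Python as below. This is a simplified illustration under stated assumptions: the R2 evaluation is taken to be the DFFS value itself, both thresholds are plain numbers, and the expensive projection is passed in as a callable so that it is only executed once SW2 has closed. The function name `process_local_module` and its parameters are hypothetical.

```python
from typing import Callable, Dict


def process_local_module(
    r0_value: float,
    compute_dffs: Callable[[], float],
    r0_threshold: float,
    r2_threshold: float,
) -> Dict[str, bool]:
    """Return which switches close for one local module."""
    # Type F: SW2 closes only when the R0 value is at or below its threshold,
    # so the projection in R1 is never computed for weak hypotheses.
    sw2_closed = r0_value <= r0_threshold
    if not sw2_closed:
        return {"sw2_closed": False, "sw1_closed": False}
    # Type D: SW1 closes only when the R2 evaluation (assumed here to be the
    # DFFS) is at or below its threshold; only then is R1's information
    # passed up to the AIT.
    sw1_closed = compute_dffs() <= r2_threshold
    return {"sw2_closed": True, "sw1_closed": sw1_closed}
```

Passing the projection as a callable makes the saving explicit: a module whose R0 value misses the threshold returns immediately, and no eigenspace computation is performed for it.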
[0083]
The overall flow of image processing in the subsystem PIT (15), shown in FIG. 3, will now be described. The description takes as its example the recognition of a human face, which was actually simulated according to the present invention. A face was chosen because it is a target image made up of several sub-symbol parts and is therefore well suited to the method. Application of the present invention is not limited to face recognition: the technique applies to any image recognition that recognizes the whole input image by reconstructing it from local images.
[0084]
The simulation of this system was implemented in GNU C++ on a Sun SS10 workstation.
[0085]
In step 101, image information data is input. The data is the image information of the pixel area scanned by an attention window set arbitrarily within the whole image input by the camera. In step 102, the normalization module applies the necessary preprocessing, such as normalization to compensate for changes in the input image data caused by variations in illumination at the time of camera input or by the linear response characteristics of the CCD camera, and normalization of the image scale, after which the data is stored in the memory of the front module.
[0086]
The image information stored in the memory is input to the submodule R0 (22) of the local module (21) corresponding to the input phase based on the scan position information of the attention window. The input image information may be subjected to appropriate PCA by the front module (11).
[0087]
The submodule R0 (22) is defined by an average vector for each local image, corresponding to its pixel position, obtained from a large number of image data by learning for all the local images. In the simulation, average vectors of 200-dimensional principal component vectors were used for the left and right eyes, the nose, the mouth, and the outline of the face, divided into 12 types. In step 103, the submodule R0 (22) of each local module (21) performs pattern matching on the input local image data using the average vector as a template and produces an analog output corresponding to the distance.
[0088]
The submodule R0 (22) can scan the neighborhood (a 3 × 3 pixel region) of the input local image and select the input local image with the highest activity. However, because pattern detection with such an average vector as the template is weak and imprecise, the submodule R0 (22) is best regarded as a hypothesis generation unit for the local image.
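The neighborhood scan can be sketched in Python as follows. The sketch assumes Euclidean distance to the average-vector template and treats the smallest distance as the highest activity; neither choice is specified in this text. The function names and the nested-list image layout are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]


def extract_patch(image: Image, top: int, left: int, size: int) -> List[float]:
    """Flatten the size x size patch whose upper-left corner is (top, left)."""
    return [image[r][c] for r in range(top, top + size) for c in range(left, left + size)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match_in_neighborhood(
    image: Image, template: Sequence[float], top: int, left: int, size: int
) -> Optional[Tuple[float, int, int]]:
    """Scan the 3 x 3 offsets around (top, left); return (distance, row, col) of the best."""
    best: Optional[Tuple[float, int, int]] = None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = top + dr, left + dc
            # Skip offsets whose patch would fall outside the image.
            if r < 0 or c < 0 or r + size > len(image) or c + size > len(image[0]):
                continue
            d = distance(extract_patch(image, r, c, size), template)
            if best is None or d < best[0]:
                best = (d, r, c)
    return best
```

Because the template is only an average, several nearby offsets usually score similarly, which is consistent with treating the result as a hypothesis to be checked by the later submodules rather than as a detection.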
[0089]
Holding an average vector as the knowledge of the submodule R0 (22) is convenient for extracting the features of the input image information, and the extracted features can be output to the submodule R1 (23) and projected onto the eigenspace. Alternatively, when the submodule R0 is treated as a hypothesis generation unit as described above, and the image information passed to the submodule R1 (23) is input from the memory of the front module (11) according to the activity of the submodule R0 (22), a simple template using fewer principal component vectors can serve as the knowledge of the submodule R0 (22).
[0090]
In step 104, the result of the matching is expressed as a consistency evaluation value. In the present embodiment, the consistency evaluation for the submodule R0 (22) is performed in the submodule R2 (24); that is, the submodule R2 (24) serves as the evaluation module of the local module (21). In step 104, processing is controlled by the activation input (type C) from the subsystem AIT (14), the suppression signal input (type B) and activation signal input (type A) from other local modules (21), and the suppression signal input from the submodule R2 (24) of the local module (21). When no suppression signal is input and the activity is maintained, the R0 function is evaluated and signals are output to other agents according to the evaluation result. The R0 function is expressed by the following equation (6).
[0091]
[Equation 5]

[0092]
As described in the section on control actions, the submodule R0 (22) is activated or suppressed by control signals of the activation-type local control (type A, arrow A) or the suppression-type local control (type B, arrow B) from the submodules R0 of other agents, and it also receives top-down control from the submodule R2 (24) and the subsystem AIT (14). The top-down control (arrow C) from the submodule R2 (24) is inhibitory: at the signal processing level of the submodule R0 (22), it stops processing of hypotheses that have already been interpreted by the upper agent.
[0093]
The top-down control (arrow C) from the subsystem AIT (14) includes both activation and suppression. The local image information whose consistency a local module (21) has evaluated through its image processing is projected onto the whole image in the subsystem AIT (14) and re-evaluated, and the function of the local module (21) is controlled top-down according to the result. In step 105, if the value (f) based on the distance between the template k and the vector of the input image data is at or below the threshold, the local image information is transferred from the front module (11) to the submodule R1 (23). Further, on the basis of the evaluation in step 104, type A and type B control signals are sent to the submodules R0 of the other local modules.
[0094]
The submodule R1 (23) holds eigenspace knowledge based on the individual image information of each learned local image. In this simulation, the individual eigenspace of each local image was generated from the top 20 principal component vectors, those with the largest values, of the 200-dimensional vectors of learned image information.
[0095]
In step 106, when the submodule R0 (22) of a local module (21) is activated, the submodule R1 (23) receives the local input image information from the memory of the front module (11). The average vector is subtracted from the input local image information Γ_NEW to give a vector representing its features, and the top 20 principal component vectors of that vector are projected onto the individual eigenspace of the submodule R1 (23). This is expressed by the following equation (7).
[0096]
[Formula 6]

[0097]
The projection distance (DFFS) between the input local image vector in the submodule R1 (23) and the eigenspace held by the submodule R1 (23) is used to evaluate image recognition, and this evaluation is performed by the submodule R2 (24) in step 107.
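The projection and the projection distance can be illustrated with a minimal sketch. It assumes the standard eigenspace formulation rather than the patent's own equations: the coefficients are inner products of the mean-subtracted input with orthonormal eigenvectors, and DFFS is the norm of the residual left after reconstruction from those coefficients. The names `project` and `dffs` and the toy data are hypothetical.

```python
import math

def project(phi, basis):
    """Coefficients of the mean-subtracted vector phi on each basis vector."""
    return [sum(p * u for p, u in zip(phi, vec)) for vec in basis]

def dffs(gamma, mean, basis):
    """Distance from feature space: norm of the part of (gamma - mean)
    that the eigenspace spanned by `basis` cannot represent.
    Assumes the basis vectors are orthonormal."""
    phi = [g - m for g, m in zip(gamma, mean)]
    omega = project(phi, basis)
    # Reconstruct phi from its coefficients, then measure what is left over.
    recon = [sum(w * vec[i] for w, vec in zip(omega, basis))
             for i in range(len(phi))]
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(phi, recon)))

# Toy example: 3-pixel "images", eigenspace spanned by the first two axes.
mean = [1.0, 1.0, 1.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
inside = [3.0, 2.0, 1.0]    # differs from the mean only within the eigenspace
outside = [1.0, 1.0, 5.0]   # differs from the mean only outside the eigenspace
print(dffs(inside, mean, basis))   # 0.0
print(dffs(outside, mean, basis))  # 4.0
```

A small DFFS means the input is well described by the learned eigenspace, which is why a threshold on this value can serve as a consistency test.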
[0098]
The submodule R2 (24), which performs the evaluation in step 107, receives top-down control (arrow C) from the subsystem AIT (14). As the higher agent, the subsystem AIT (14) uses the image processing result for the whole image to decide which local modules (21) to activate preferentially. When the consistency of the input image has been evaluated in another local module (21), that local module (21) outputs a suppression signal. These control signals suppress unnecessary processing in the submodule R2 (24) and shorten the image recognition time of the entire system.
[0099]
Evaluation in the submodule R2 (24) is expressed by the following equation (8).
[0100]
[Expression 7]

[0101]
In step 107, if the evaluation value is equal to or smaller than the threshold value, the local image information of the submodule R1 (23) is sent to the subsystem AIT (14), the higher agent (arrow D). Further, the submodule R2 (24) of a local module activated by C-type top-down control from the subsystem AIT (14) outputs the projection of the eigenspace of the submodule R1 (23) onto the input image vector to the submodule R0 (arrow R0) as a top-down hypothesis. Local modules consistent with the top-down hypothesis are supported, inconsistent local modules are suppressed, and local modules that are consistent but not yet activated are prompted to activate.
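The three outcomes in the last sentence above can be written as a small decision function. This is only an illustrative sketch: the function name, the action labels, and the example module states are hypothetical and do not come from the patent.

```python
def top_down_action(consistent, active):
    """Control applied to one local module under a top-down hypothesis."""
    if not consistent:
        return "suppress"           # inconsistent modules are suppressed
    if active:
        return "support"            # consistent and already active
    return "prompt_activation"      # consistent but not yet active

# (consistent with hypothesis, currently active) for a few local modules
modules = {
    "nose": (True, True),
    "mouth": (True, False),
    "left_eye": (False, True),
}
for name, (consistent, active) in modules.items():
    print(name, top_down_action(consistent, active))
# nose support
# mouth prompt_activation
# left_eye suppress
```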
[0102]
In the submodule R2 (24), when the consistency evaluation using the projection distance (DFFS) as the main information is less than the threshold and the input image is recognized, a suppression signal (arrow E) is sent to the submodule R0 (22) of the local module (21).
[0103]
The subsystem AIT (14) is composed of modules that analyze the image information of the whole image. It receives the whole image information from the front module (11) and the local image information processed by each local module (21) from the subsystem PIT (15), and recognizes the whole image with which the local images are consistent.
[0104]
The modules of the subsystem AIT (14) are composed of submodules R0, R1, and R2 similar to those of the local module (21), and each has the same function as the corresponding submodule of the local module. That is, the submodule R0 has knowledge of the average vector of the learned whole images and acts as a template for the input whole image. The submodule R1 has knowledge of the individual eigenspaces of the learned whole images and, based on the template evaluation of the input whole image in the submodule R0, projects the whole image information onto those eigenspaces. The submodule R2 has knowledge of the projection distance (DFFS) threshold value and evaluates consistency by DFFS using the representation obtained in the submodule R1.
[0105]
In this simulation, the average vector of the submodule R0 was defined as the average of image data (105 × 105 pixel positions × 3 principal component vectors) selected by appropriately centering the whole face images (128 × 128 pixel positions × 200 principal component vectors) of the 105 learned people.
[0106]
The 105 × 105 pixel positions are the pixel area of the attention window (203) shown in FIG. 4. Because the face position cannot be specified in advance in an actual input image from the camera, the camera is scanned under the command of the subsystem AIT (14), and the position of the attention window is determined by matching with the template in the submodule R0 (204).
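Positioning the attention window by template matching can be sketched as follows. The sketch assumes an exhaustive scan with a sum-of-squared-differences score, which the text does not specify, and uses a tiny 4 × 4 image with a 2 × 2 template in place of the 128 × 128 image and 105 × 105 window. All names are hypothetical.

```python
def window_distance(image, template, top, left):
    """Sum of squared differences between the template and the patch at (top, left)."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def best_window(image, template):
    """Scan every window position; return (distance, top, left) of the best match."""
    h, w = len(template), len(template[0])
    return min(
        (window_distance(image, template, top, left), top, left)
        for top in range(len(image) - h + 1)
        for left in range(len(image[0]) - w + 1)
    )

template = [[9, 9], [9, 9]]
image = [
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 8],
    [0, 0, 0, 0],
]
print(best_window(image, template))  # (1, 1, 2)

# With a 105-pixel window in a 128-pixel image, each axis has this many offsets:
print(128 - 105 + 1)  # 24
```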
[0107]
The size of the attention window defined by the subsystem AIT (14) is 105 × 105 pixels, whereas the input from the subsystem PIT (15) to the subsystem AIT (14) is 35 × 35 pixel positions × 16 local images. In this simulation, 12 × 12 pixel positions × 16 local images × 20 component vectors were used in consideration of the calculation capability of the computer used.
[0108]
The overall flowchart of the present invention shown in FIG. 4 will be described based on an example of face simulation.
[0109]
The image input by the camera is stored in the memory of the front module (11) as an entire face image after the various normalization processes described above (202). In this simulation, a 256 gradation grayscale image was used in a 128 × 128 pixel region.
[0110]
The whole image is input (204) to the subsystem AIT (14) via the data bus (300). Template matching in the submodule R0 determines whether the whole face image (three principal component vectors) is a human face, and the features of the input face image are extracted using the average vector as the template. After the evaluation in the submodule R0 (204), the face image is projected onto the eigenspaces of the 105 faces in the submodule R1 (205).
[0111]
In the submodule R2 of the subsystem AIT (14), the DFFS values for these eigenspaces are obtained and compared with the threshold (206). When the evaluation in the submodule R2 finds that the DFFS of exactly one eigenspace satisfies the threshold, the input face image can be recognized as the individual face specified by that eigenspace. However, because the above image data processing samples only a small number of principal component vectors for ease and speed of calculation, the evaluation in the submodule R2 sometimes matches a plurality of eigenspaces.
[0112]
In such a case, an activation signal is sent as top-down control to the local modules (21) of the subsystem PIT (15) that are in charge of the local images belonging to the faces (whole images) of the plurality of eigenspaces evaluated as matching (step 207).
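The branching described in the two preceding paragraphs can be summarized in a short sketch. The function name, the decision labels, and the DFFS values are hypothetical. The comparison uses "less than or equal to the threshold", following the wording of step 107.

```python
def ait_decision(dffs_by_person, threshold):
    """Decide the next step from the whole-image DFFS of each learned individual."""
    matches = sorted(p for p, d in dffs_by_person.items() if d <= threshold)
    if len(matches) == 1:
        return "recognized", matches              # a single eigenspace matches
    if len(matches) > 1:
        return "activate_local_modules", matches  # ambiguous: ask the PIT modules
    return "no_match", matches                    # possibly an unlearned face

dffs_by_person = {"person_03": 0.6, "person_17": 0.9, "person_42": 3.1}
print(ait_decision(dffs_by_person, threshold=1.0))
# ('activate_local_modules', ['person_03', 'person_17'])
print(ait_decision(dffs_by_person, threshold=0.7))
# ('recognized', ['person_03'])
```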
[0113]
Meanwhile, the input whole image is scanned with the attention window set by the phase information of each local module of the subsystem PIT (203), and local images of the mouth, nose, left and right eyes, and 12 face outlines are generated and input to the corresponding local modules via the data bus (300). For example, the principal component local image data input to the nose local module is recognized as a nose image using the average nose vector learned in the submodule R0 as a template, and at the same time the features of the input nose image are extracted (208).
[0114]
When the input local image is recognized as a nose image, either the vector representing the extracted feature is projected onto the eigenspace of the submodule R1, or higher-dimensional input image information is read from the memory of the front module and projected onto the nose eigenspace of the submodule R1 (209). When the projection distance DFFS calculated from the projection result is evaluated against the threshold of the submodule R2 and found to match (210), the image information in the eigenspace of the submodule R1 (23) is transferred via the data bus (300) to the subsystem AIT (14).
[0115]
The subsystem AIT (14) evaluates consistency by projecting the nose image information, input together with its phase, onto the eigenspace of the individual face recognized by image processing of the whole image (205). In this case, the system may be configured so that higher-dimensional principal component image information from the memory of the front module (11) is input to the subsystem AIT (14) as the local image information. Further, instead of projecting the input image information from the local module (21) onto the eigenspace recognized by the subsystem AIT (14), the subsystem AIT (14) may prepare a template of the corresponding local image from the image information of the individual it has recognized and evaluate the consistency of the image information from the local module (21) using that template.
[0116]
When the submodule R0 (22) of a local module of the subsystem PIT (15) recognizes the input as a nose image, it sends a B-type control signal to the other local modules (21) to suppress them, and outputs an A-type activation signal to the submodule R0 (22) of the local module (21) in charge of another local image whose phase is close.
[0117]
When consistency is confirmed in the evaluation by the submodule R2 (210), an E-type control signal is output to the submodule R0 (208) of that local module to suppress its operation.
[0118]
A local module (21) of the subsystem PIT (15) that has been activated by top-down control, based on the evaluation of the whole image processing in the subsystem AIT (14), need only evaluate the projection distance DFFS between the local image information evaluated in its submodule R0 (22) and the local eigenspace recognized by the subsystem AIT (14). When the top-down control of the subsystem AIT (14) gives a plurality of individuals to the local module (21) as hypotheses, the face in the whole image is identified and recognized as soon as the evaluation in the submodule R2 (24) recognizes the local image as that of one of those individuals.
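The way several top-down hypotheses are narrowed down by local evaluation can be sketched as below. This is an illustration under assumptions: the sequential order in which local modules are consulted, the function name, and the DFFS values are hypothetical, and the real system evaluates its modules in parallel.

```python
def resolve_hypotheses(hypotheses, local_dffs, threshold):
    """Return (individual, module) for the first local module whose DFFS
    evaluation accepts exactly one hypothesized individual, else None."""
    for module, dffs_by_person in local_dffs.items():
        # Only the individuals proposed top-down are evaluated.
        accepted = [p for p in hypotheses if dffs_by_person[p] <= threshold]
        if len(accepted) == 1:
            return accepted[0], module
    return None

hypotheses = ["person_03", "person_17"]
local_dffs = {
    "nose": {"person_03": 0.8, "person_17": 0.9, "person_42": 0.1},
    "mouth": {"person_03": 2.4, "person_17": 0.7, "person_42": 0.2},
}
print(resolve_hypotheses(hypotheses, local_dffs, threshold=1.0))
# ('person_17', 'mouth')
```

In this toy data the nose module cannot separate the two hypotheses, while the mouth module accepts only one, so recognition is settled at that stage. The individual outside the hypothesis set is never evaluated, which is the saving in computation that the paragraph describes.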
[0119]
If the input image is an unlearned target, the input image cannot be identified. For such unlearned input images, the system can be given a learning ability by providing a function that automatically adds the image processing information to the knowledge of the submodules of each subsystem. Further, once an image has been recognized, the recognized image, or the eigenspace of each subsystem corresponding to the features of the input image information, can be stored so that the specific modules corresponding to the features of an input image are activated preferentially, resulting in a system with a higher learning effect.
[0120]
FIG. 5 visually shows the progress of image processing in the local modules (21) in the simulation. FIG. 5a) shows the case without top-down control from the subsystem AIT (14), and FIG. 5b) shows the case with top-down control from the subsystem AIT (14), for recognition of the same input image. In the figure, the column direction represents time steps (one step is 5 milliseconds), and the first row shows the progress of the whole image processing in the subsystem AIT (14). The second row shows the active state of the submodules R2 (24) of the 16 local modules (21) of the subsystem PIT, and the following rows show, in time series, the active state of the submodules R0 (22) of the local modules (21) for the right eye, left eye, nose, and mouth.
[0121]
The input image shown in FIG. 5 was an unlearned one. In FIG. 5a), where top-down control is not performed, the face outline is detected first, and both eyes and the nose are detected immediately afterwards. However, since the average vector is used as the template of the submodule R0 (22), there are many errors in the recognition of the left eye and the right eye, and the mouth is not detected until step 9.
[0122]
In FIG. 5b), where top-down control is performed, the face contour, both eyes, and the nose are detected in the same manner as in FIG. 5a), but the mouth is detected in step 5. Comparing the activation states of the submodules R0 (22) in FIGS. 5a) and 5b), it can be seen that there is less activation in the submodules R0 (22) when top-down control is performed. This indicates that the activity of inconsistent submodules R0 (22) is suppressed by the top-down control, and that the submodule R2 (24) selects the most consistent information from the smaller amount of activation information.
[0123]
Further, as can be seen from the time-series change of the entire image processing in the sub-system AIT in the first row, the initial image is weak and coarse but gradually becomes more detailed. Therefore, it can be understood that the top-down control also becomes detailed and reliable over time.
[0124]
Table 1 shows the local detection rates of the eyes, nose, and mouth for the 105 face images used in the simulation. It can be seen that the image processing method according to the present invention shows a considerable improvement in detection capability even compared with the conventional eigenspace method.
[0125]
[Table 1]

[0126]
As described above, in the image recognition system of the present invention, the whole image recognition process and the local image recognition process are carried out in parallel by modules with divided functions, and control signals between these modules suppress inconsistent processing at an early stage and promote the local image processing needed to recognize the whole image. The calculation load is therefore reduced, and image recognition can be performed in a short time.
[0127]
Furthermore, because the image processing of the present invention is configured as described above, it realizes a system in which globally consistent recognition is formed by multi-agent computation, starting from hypothetical image recognition based on highly compressed image data at the initial stage of image processing. Unlike conventional hypothetical reasoning, it does not require probabilistic evaluation involving difficult operations.
[0128]
Furthermore, the image recognition system of the present invention has two processing flows: a bottom-up flow, in which many local modules process the input image sequentially in an exploratory manner and lead to recognition of the whole image, and a top-down flow based on the hypothetical recognition obtained from whole image processing. Recognition of the whole image can be regarded as established when the agent that evaluates consistency determines that the two processing flows agree. Therefore, image recognition can be achieved at an early stage.
[Brief description of the drawings]
FIG. 1 is an overall explanatory diagram of a system configuration showing an example of an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing control signals input / output by sub-modules constituting a local module.
FIG. 3 is a flowchart showing overall image processing in a local module.
FIG. 4 is a flowchart showing image analysis processing of the entire system.
FIG. 5 is an explanatory diagram showing a simulation result of the present embodiment.
[Explanation of symbols]
10 ... Optical means (camera), 14 ... Whole image processing means (AIT), 15 ... Local image processing means (PIT), 21 ... Local module.

Claims

An image recognition system for recognizing an object from a whole image of the recognition object and characteristic local images input by optical means such as a camera, comprising whole image processing means and local image processing means for analyzing the whole image and the local images, wherein the local image processing means comprises a plurality of local modules corresponding to the respective local images, each local module having a function of extracting a feature of the input local image it is in charge of and a function of evaluating the consistency between the extracted feature and the image to be recognized; the whole image processing means has a function of extracting the whole feature of the input whole image and a function of evaluating the consistency between the extracted whole feature and the image to be recognized; and the whole image processing means is configured to receive input from the local modules, to suppress the function of a local module that is inconsistent with the whole feature, and to activate the function of a local module that is consistent with the whole feature.

The image recognition system according to claim 1, wherein each local module constituting the local image processing means is assigned the local image it is in charge of according to the phase of the local image, and a plurality of local modules are provided corresponding to different phases of the same local image.

The image recognition system according to claim 1, wherein each local module constituting the local image processing means is assigned the local image it is in charge of according to the shape of the local image, and a plurality of local modules are provided corresponding to different phases of the same local image.

The image recognition system according to claim 1, wherein each local module constituting the local image processing means comprises a first submodule having a function of extracting features of the input local image, a second submodule having knowledge of the eigenspace of the local image to be recognized that has been learned in advance, and a third submodule having a function of evaluating consistency based on a projection distance obtained by projecting the image information of the input local image, whose features were extracted by the first submodule, onto the eigenspace of the second submodule.

The image recognition system according to claim 4, wherein the first submodule has knowledge of an average vector of the local image of the recognition target learned in advance.

The image recognition system according to claim 1, wherein the whole image processing means comprises a first submodule having knowledge of an average vector of the whole image of the recognition target learned in advance, a second submodule having knowledge of an eigenspace of the whole image of the recognition target learned in advance, and a third submodule having a function of evaluating consistency based on a projection distance obtained by projecting the image information of the input whole image, whose features were extracted by the first submodule, onto the eigenspace of the second submodule.

The image processing system according to claim 4, wherein, in the local image processing means, among a plurality of local modules corresponding to different phases of the same local image, the local module activated by the first submodule outputs an activating signal to the other local modules that perform analysis processing for different phases.

The image processing system according to claim 4, wherein, in the local image processing means, when a plurality of local modules in charge of different local images receive image information from approximately the same phase region, the local module activated by the first submodule outputs a suppression signal to the other local modules.